linux-kernel.vger.kernel.org archive mirror
* Linux 2.4.33-rc2
@ 2006-06-21 19:27 Marcelo Tosatti
  2006-06-21 23:35 ` Grant Coady
  2006-07-03 22:07 ` Willy Tarreau
  0 siblings, 2 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2006-06-21 19:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Willy Tarreau, Grant Coady


A few problems appeared on -rc1... More networking security updates.


Summary of changes from v2.4.33-rc1 to v2.4.33-rc2
============================================

Marcelo Tosatti:
      Change VERSION to v2.4.33-rc2

Mikael Pettersson:
      [PATCH 2.4.33-rc1] repair __ide_dma_no_op breakage

Solar Designer:
      [NETFILTER]: Fix do_add_counters race, possible oops or info leak (CVE-2006-0039)

Vlad Yasevich:
      [SCTP]: Validate the parameter length in HB-ACK chunk. (CVE-2006-1857)
      [SCTP]: Respect the real chunk length when walking parameters. (CVE-2006-1858)

Willy Tarreau:
      Fix vfs_unlink/NFS NULL pointer dereference
      range checking for sleep states sent to /proc/acpi/sleep


* Re: Linux 2.4.33-rc2
  2006-06-21 19:27 Linux 2.4.33-rc2 Marcelo Tosatti
@ 2006-06-21 23:35 ` Grant Coady
  2006-07-03 22:07 ` Willy Tarreau
  1 sibling, 0 replies; 8+ messages in thread
From: Grant Coady @ 2006-06-21 23:35 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, Willy Tarreau

On Wed, 21 Jun 2006 16:27:56 -0300, Marcelo Tosatti <marcelo@kvack.org> wrote:

>
>A few problems appeared on -rc1... More networking security updates.
>
>
>Summary of changes from v2.4.33-rc1 to v2.4.33-rc2
>============================================
>
>Marcelo Tosatti:
>      Change VERSION to v2.4.33-rc2
>
>Mikael Pettersson:
>      [PATCH 2.4.33-rc1] repair __ide_dma_no_op breakage
>
>Solar Designer:
>      [NETFILTER]: Fix do_add_counters race, possible oops or info leak (CVE-2006-0039)
>
>Vlad Yasevich:
>      [SCTP]: Validate the parameter length in HB-ACK chunk. (CVE-2006-1857)
>      [SCTP]: Respect the real chunk length when walking parameters. (CVE-2006-1858)
>
>Willy Tarreau:
>      Fix vfs_unlink/NFS NULL pointer dereference
>      range checking for sleep states sent to /proc/acpi/sleep

Things are looking up ;)  <http://bugsplatter.mine.nu/test/linux-2.4/>

+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| kernel version  |deltree|hal    |niner  |peetoo |pooh   |sempro |silly  |tosh   |
+ - - - - - - - - + - - - + - - - + - - - + - - - + - - - + - - - + - - - + - - - +
| 2.4.33-rc2      |   Y   |   Y   |   Y   |   Y   |       |   Y   |   Y   |   Y   |
| 2.4.33-rc1      |   -   |   -   |   -   |   -   |       |   X   |   -   |   X   |
+ - - - - - - - - + - - - + - - - + - - - + - - - + - - - + - - - + - - - + - - - +

Grant.


* Re: Linux 2.4.33-rc2
  2006-06-21 19:27 Linux 2.4.33-rc2 Marcelo Tosatti
  2006-06-21 23:35 ` Grant Coady
@ 2006-07-03 22:07 ` Willy Tarreau
  2006-07-05  1:51   ` Grant Coady
  1 sibling, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2006-07-03 22:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, Trond Myklebust

On Wed, Jun 21, 2006 at 04:27:56PM -0300, Marcelo Tosatti wrote:
 
> Willy Tarreau:
>       Fix vfs_unlink/NFS NULL pointer dereference

Marcelo, I'm not sure this one is perfect yet. Today, while packaging
a lot of files for our distro at work, I came up with a problem where
deleting a file on NFS, and later simply accessing (read/write/create)
a file on the NFS file system did block. However, I could kill all the
offending processes. This was after a full day of mkdir/create/open/
unlink... (tens of thousands of those), so it is not easily reproducible.

I could not unmount the NFS anymore, while other users had no problem.
Rebooting the client solved the problem. I caught an RPC trace (attached),
not sure if it can help. I must say that I'm also running Trond's NFS
patches which I suspected first, but with which I never encountered a
single problem for years.

The fact that the problem appeared during an rm -rf made me think about
the vfs_unlink() patch. I went to read it again and I'm wondering if we
have not inserted a new problem (please forgive my ignorance here) :

in 2.4.32, we had the following sequence :
        down(&dir->i_zombie);
        if (may_delete(dir, dentry, 0) != 0) return;
        lock_kernel();
        error = dir->i_op->unlink(dir, dentry);
        unlock_kernel();
        if (!error)
              d_delete(dentry);
        up(&dir->i_zombie);
        if (!error)
                inode_dir_notify(dir, DN_DELETE);


in 2.4.33-rc2, we have :
        if (may_delete(dir, dentry, 0) != 0) return;
        inode = dentry->d_inode;

        atomic_inc(&inode->i_count);
        double_down(&dir->i_zombie, &inode->i_zombie);
 
        lock_kernel();
        error = dir->i_op->unlink(dir, dentry);
        unlock_kernel();

        double_up(&dir->i_zombie, &inode->i_zombie);
        iput(inode);

        if (!error) {
                d_delete(dentry);
                inode_dir_notify(dir, DN_DELETE);
        }

What I notice is that in 2.4.32, d_delete(dentry) was performed
between down(&dir->i_zombie) and up(&dir->i_zombie), while now
it's completely outside. I wonder if this can cause race conditions
or not, but at least, I'm sure that we have changed the locking
sequence, which might have some impact.

Do you think I'm searching in the wrong direction ? I worry a
bit, because getting a deadlock after only one day, it's a bit
early :-/

Thanks,
Willy

--- dmesg after writing to /proc/sys/sunrpc/* ---
nfs: flush(a/100663641)
nfs: write(utm-gateway/truc(100663641), 5@0)
nfs: flush(a/100663641)
RPC: 43724 new task procpid 12145
RPC: 43724 rpc_execute flgs 1
RPC: 43724 deleting timer
RPC: 43724 call_start nfs3 proc 7 (async)
RPC: 43724 deleting timer
RPC: 43724 call_reserve
RPC: 43724 reserved req 925b01cc xid 8c207ac6
RPC: 43724 deleting timer
RPC: 43724 call_reserveresult (status 0)
RPC: 43724 deleting timer
RPC: 43724 call_allocate (status 0)
RPC:      allocated buffer 7faa7800
RPC: 43724 deleting timer
RPC: 43724 call_encode (status 0)
RPC: 43724 deleting timer
RPC: 43724 call_bind xprt 925b0000 is connected
RPC: 43724 deleting timer
RPC: 43724 call_transmit (status 0)
RPC: 43724 xprt_transmit(8c207ac6)
RPC: 43724 xprt_cwnd_limited cong = 0 cwnd = 4021
RPC:      xprt_sendmsg(0) = 188
RPC: 43724 xmit complete
RPC: 43724 sleep_on(queue "xprt_pending" time 7214624)
RPC: 43724 added to queue 925b0058 "xprt_pending"
RPC: 43724 setting alarm for 104 ms
RPC:      wake_up_next(925b004c "xprt_resend")
RPC:      wake_up_next(925b0040 "xprt_sending")
RPC:      udp_data_ready...
RPC:      udp_data_ready client 925b0000
RPC: 43724 received reply
RPC:      cong 256, cwnd was 4021, now 4021
RPC:      wake_up_next(925b004c "xprt_resend")
RPC:      wake_up_next(925b0040 "xprt_sending")
RPC: 43724 has input (136 bytes)
RPC: 43724 __rpc_wake_up_task (now 7214625 inh 0)
RPC: 43724 disabling timer
RPC: 43724 removed from queue 925b0058 "xprt_pending"
RPC: 43724 added to queue 9c46448c "schedq"
RPC:      __rpc_wake_up_task done
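
For anyone wanting to capture a similar trace: the header above only says
that the files under /proc/sys/sunrpc/* were written, not which values were
used. A minimal sketch, assuming the control files are the ones ending in
_debug (rpc_debug, nfs_debug, ...) and that each takes an integer bitmask
where 0 turns tracing off. set_sunrpc_debug and SUNRPC_DIR are illustrative
names, not part of any existing tool.

```shell
#!/bin/sh
# Sketch: toggle SUNRPC/NFS debug flags by writing a bitmask to each
# *_debug control file in a directory.
# SUNRPC_DIR and set_sunrpc_debug are hypothetical names; the default
# path is the one mentioned in the dmesg header.
SUNRPC_DIR="${SUNRPC_DIR:-/proc/sys/sunrpc}"

set_sunrpc_debug() {
    # $1 = bitmask to write (0 disables), $2 = optional directory override
    mask="$1"
    dir="${2:-$SUNRPC_DIR}"
    for f in "$dir"/*_debug; do
        [ -w "$f" ] || continue      # skip missing or read-only entries
        echo "$mask" > "$f"
    done
}
```

Run as root, for example "set_sunrpc_debug 32767" before reproducing the hang
and "set_sunrpc_debug 0" afterwards. The mask value is an assumption; check
the kernel's sunrpc debug header for the bits you actually need.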





* Re: Linux 2.4.33-rc2
  2006-07-03 22:07 ` Willy Tarreau
@ 2006-07-05  1:51   ` Grant Coady
  2006-07-05  5:18     ` Willy Tarreau
  2006-07-05 20:51     ` Willy Tarreau
  0 siblings, 2 replies; 8+ messages in thread
From: Grant Coady @ 2006-07-05  1:51 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Marcelo Tosatti, linux-kernel, Trond Myklebust

On Tue, 4 Jul 2006 00:07:36 +0200, Willy Tarreau <w@1wt.eu> wrote:

>On Wed, Jun 21, 2006 at 04:27:56PM -0300, Marcelo Tosatti wrote:
> 
>> Willy Tarreau:
>>       Fix vfs_unlink/NFS NULL pointer dereference
>
>Marcelo, I'm not sure this one is perfect yet. Today, while packaging
>a lot of files for our distro at work, I came up with a problem where
>deleting a file on NFS, and later simply accessing (read/write/create)
>a file on the NFS file system did block. However, I could kill all the
>offending processes. This was after a full day of mkdir/create/open/
>unlink... (tens of thousands of those), so it is not easily reproducible.
>
>I could not unmount the NFS anymore, while other users had no problem.
>Rebooting the client solved the problem. I caught an RPC trace (attached),
>not sure if it can help. I must say that I'm also running Trond's NFS
>patches which I suspected first, but with which I never encountered a
>single problem for years.
>
>The fact that the problem appeared during an rm -rf made me think about
>the vfs_unlink() patch. I went to read it again and I'm wondering if we
>have not inserted a new problem (please forgive my ignorance here) :
>
>in 2.4.32, we had the following sequence :
>        down(&dir->i_zombie);
>        if (may_delete(dir, dentry, 0) != 0) return;
>        lock_kernel();
>        error = dir->i_op->unlink(dir, dentry);
>        unlock_kernel();
>        if (!error)
>              d_delete(dentry);
>        up(&dir->i_zombie);
>        if (!error)
>                inode_dir_notify(dir, DN_DELETE);
>
>
>in 2.4.33-rc2, we have :
>        if (may_delete(dir, dentry, 0) != 0) return;
>        inode = dentry->d_inode;
>
>        atomic_inc(&inode->i_count);
>        double_down(&dir->i_zombie, &inode->i_zombie);
> 
>        lock_kernel();
>        error = dir->i_op->unlink(dir, dentry);
>        unlock_kernel();
>
>        double_up(&dir->i_zombie, &inode->i_zombie);
>        iput(inode);
>
>        if (!error) {
>                d_delete(dentry);
>                inode_dir_notify(dir, DN_DELETE);
>        }
>
>What I notice is that in 2.4.32, d_delete(dentry) was performed
>between down(&dir->i_zombie) and up(&dir->i_zombie), while now
>it's completely outside. I wonder if this can cause race conditions
>or not, but at least, I'm sure that we have changed the locking
>sequence, which might have some impact.
>
>Do you think I'm searching in the wrong direction ? I worry a
>bit, because getting a deadlock after only one day, it's a bit
>early :-/
>
Assuming you mean something like the patch below?  Doesn't cause any 
problems (yet, still testing) like eat files or segfault here as 
reported for -rc1 +/- various patches ;)

Cheers,
Grant.
--- linux-2.4.33-rc2/fs/namei.c	2006-06-22 07:27:47.000000000 +1000
+++ linux-2.4.33-rc2b/fs/namei.c	2006-07-05 11:43:19.000000000 +1000
@@ -1497,13 +1497,14 @@
 			lock_kernel();
 			error = dir->i_op->unlink(dir, dentry);
 			unlock_kernel();
+			if (!error)
+				d_delete(dentry);
 		}
 	}
 	double_up(&dir->i_zombie, &inode->i_zombie);
 	iput(inode);
 
 	if (!error) {
-		d_delete(dentry);
 		inode_dir_notify(dir, DN_DELETE);
 	}
 	return error;


* Re: Linux 2.4.33-rc2
  2006-07-05  1:51   ` Grant Coady
@ 2006-07-05  5:18     ` Willy Tarreau
  2006-07-05 20:51     ` Willy Tarreau
  1 sibling, 0 replies; 8+ messages in thread
From: Willy Tarreau @ 2006-07-05  5:18 UTC (permalink / raw)
  To: Grant Coady; +Cc: Marcelo Tosatti, linux-kernel, Trond Myklebust

Hi Grant,

On Wed, Jul 05, 2006 at 11:51:35AM +1000, Grant Coady wrote:
> On Tue, 4 Jul 2006 00:07:36 +0200, Willy Tarreau <w@1wt.eu> wrote:
(...)
> >What I notice is that in 2.4.32, d_delete(dentry) was performed
> >between down(&dir->i_zombie) and up(&dir->i_zombie), while now
> >it's completely outside. I wonder if this can cause race conditions
> >or not, but at least, I'm sure that we have changed the locking
> >sequence, which might have some impact.
> >
> >Do you think I'm searching in the wrong direction ? I worry a
> >bit, because getting a deadlock after only one day, it's a bit
> >early :-/
> >
> Assuming you mean something like the patch below?  Doesn't cause any 
> problems (yet, still testing) like eat files or segfault here as 
> reported for -rc1 +/- various patches ;)

yes, exactly this. I don't know if it's correct and/or needed. In 2.6,
the d_delete() is performed outside the lock. I'd like someone's advice
on this one. Also, I'll look for an NFS client stress test to try to
reproduce the problem, because I don't like it when problems like this
only appear once a day. And playing with the VFS does not make me happy
at all.

> Cheers,
> Grant.

Cheers,
Willy

> --- linux-2.4.33-rc2/fs/namei.c	2006-06-22 07:27:47.000000000 +1000
> +++ linux-2.4.33-rc2b/fs/namei.c	2006-07-05 11:43:19.000000000 +1000
> @@ -1497,13 +1497,14 @@
>  			lock_kernel();
>  			error = dir->i_op->unlink(dir, dentry);
>  			unlock_kernel();
> +			if (!error)
> +				d_delete(dentry);
>  		}
>  	}
>  	double_up(&dir->i_zombie, &inode->i_zombie);
>  	iput(inode);
>  
>  	if (!error) {
> -		d_delete(dentry);
>  		inode_dir_notify(dir, DN_DELETE);
>  	}
>  	return error;


* Re: Linux 2.4.33-rc2
  2006-07-05  1:51   ` Grant Coady
  2006-07-05  5:18     ` Willy Tarreau
@ 2006-07-05 20:51     ` Willy Tarreau
  2006-07-06  7:42       ` Grant Coady
  1 sibling, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2006-07-05 20:51 UTC (permalink / raw)
  To: Grant Coady; +Cc: Marcelo Tosatti, linux-kernel, Trond Myklebust

Hi,

On Wed, Jul 05, 2006 at 11:51:35AM +1000, Grant Coady wrote:
> On Tue, 4 Jul 2006 00:07:36 +0200, Willy Tarreau <w@1wt.eu> wrote:
> 
> >On Wed, Jun 21, 2006 at 04:27:56PM -0300, Marcelo Tosatti wrote:
> > 
> >> Willy Tarreau:
> >>       Fix vfs_unlink/NFS NULL pointer dereference
> >
> >Marcelo, I'm not sure this one is perfect yet. Today, while packaging
> >a lot of files for our distro at work, I came up with a problem where
> >deleting a file on NFS, and later simply accessing (read/write/create)
> >a file on the NFS file system did block. However, I could kill all the
> >offending processes. This was after a full day of mkdir/create/open/
> >unlink... (tens of thousands of those), so it is not easily reproducible.
> >
> >I could not unmount the NFS anymore, while other users had no problem.
> >Rebooting the client solved the problem. I caught an RPC trace (attached),
> >not sure if it can help. I must say that I'm also running Trond's NFS
> >patches which I suspected first, but with which I never encountered a
> >single problem for years.
> >
> >The fact that the problem appeared during an rm -rf made me think about
> >the vfs_unlink() patch. I went to read it again and I'm wondering if we
> >have not inserted a new problem (please forgive my ignorance here) :
> >
> >in 2.4.32, we had the following sequence :
> >        down(&dir->i_zombie);
> >        if (may_delete(dir, dentry, 0) != 0) return;
> >        lock_kernel();
> >        error = dir->i_op->unlink(dir, dentry);
> >        unlock_kernel();
> >        if (!error)
> >              d_delete(dentry);
> >        up(&dir->i_zombie);
> >        if (!error)
> >                inode_dir_notify(dir, DN_DELETE);
> >
> >
> >in 2.4.33-rc2, we have :
> >        if (may_delete(dir, dentry, 0) != 0) return;
> >        inode = dentry->d_inode;
> >
> >        atomic_inc(&inode->i_count);
> >        double_down(&dir->i_zombie, &inode->i_zombie);
> > 
> >        lock_kernel();
> >        error = dir->i_op->unlink(dir, dentry);
> >        unlock_kernel();
> >
> >        double_up(&dir->i_zombie, &inode->i_zombie);
> >        iput(inode);
> >
> >        if (!error) {
> >                d_delete(dentry);
> >                inode_dir_notify(dir, DN_DELETE);
> >        }
> >
> >What I notice is that in 2.4.32, d_delete(dentry) was performed
> >between down(&dir->i_zombie) and up(&dir->i_zombie), while now
> >it's completely outside. I wonder if this can cause race conditions
> >or not, but at least, I'm sure that we have changed the locking
> >sequence, which might have some impact.
> >
> >Do you think I'm searching in the wrong direction ? I worry a
> >bit, because getting a deadlock after only one day, it's a bit
> >early :-/
> >
> Assuming you mean something like the patch below?  Doesn't cause any 
> problems (yet, still testing) like eat files or segfault here as 
> reported for -rc1 +/- various patches ;)
> 
> Cheers,
> Grant.
> --- linux-2.4.33-rc2/fs/namei.c	2006-06-22 07:27:47.000000000 +1000
> +++ linux-2.4.33-rc2b/fs/namei.c	2006-07-05 11:43:19.000000000 +1000
> @@ -1497,13 +1497,14 @@
>  			lock_kernel();
>  			error = dir->i_op->unlink(dir, dentry);
>  			unlock_kernel();
> +			if (!error)
> +				d_delete(dentry);
>  		}
>  	}
>  	double_up(&dir->i_zombie, &inode->i_zombie);
>  	iput(inode);
>  
>  	if (!error) {
> -		d_delete(dentry);
>  		inode_dir_notify(dir, DN_DELETE);
>  	}
>  	return error;

after a full day of stress-test of multiple parallel tar xUf, and ffsb at
full CPU load, I could not reproduce the problem on the exact same kernel
I first saw it on. So I think I had bad luck and the problem is not related
to the vfs_unlink() patch, so unless anyone else reports a problem or tells
us why it is right or wrong, it would seem reasonable to keep it as it is
in -rc2.

Regards,
Willy



* Re: Linux 2.4.33-rc2
  2006-07-05 20:51     ` Willy Tarreau
@ 2006-07-06  7:42       ` Grant Coady
  2006-07-06  8:25         ` Willy Tarreau
  0 siblings, 1 reply; 8+ messages in thread
From: Grant Coady @ 2006-07-06  7:42 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Marcelo Tosatti, linux-kernel, Trond Myklebust

On Wed, 5 Jul 2006 22:51:37 +0200, Willy Tarreau <w@1wt.eu> wrote:

>Hi,
>
>On Wed, Jul 05, 2006 at 11:51:35AM +1000, Grant Coady wrote:
>> On Tue, 4 Jul 2006 00:07:36 +0200, Willy Tarreau <w@1wt.eu> wrote:
>> 
>> >On Wed, Jun 21, 2006 at 04:27:56PM -0300, Marcelo Tosatti wrote:
>> > 
>> >> Willy Tarreau:
>> >>       Fix vfs_unlink/NFS NULL pointer dereference
>> >
>> >Marcelo, I'm not sure this one is perfect yet. Today, while packaging
>> >a lot of files for our distro at work, I came up with a problem where
>> >deleting a file on NFS, and later simply accessing (read/write/create)
>> >a file on the NFS file system did block. However, I could kill all the
>> >offending processes. This was after a full day of mkdir/create/open/
>> >unlink... (tens of thousands of those), so it is not easily reproducible.
>> >
>> >I could not unmount the NFS anymore, while other users had no problem.
>> >Rebooting the client solved the problem. I caught an RPC trace (attached),
>> >not sure if it can help. I must say that I'm also running Trond's NFS
>> >patches which I suspected first, but with which I never encountered a
>> >single problem for years.
>> >
>> >The fact that the problem appeared during an rm -rf made me think about
>> >the vfs_unlink() patch. I went to read it again and I'm wondering if we
>> >have not inserted a new problem (please forgive my ignorance here) :
>> >
>> >in 2.4.32, we had the following sequence :
>> >        down(&dir->i_zombie);
>> >        if (may_delete(dir, dentry, 0) != 0) return;
>> >        lock_kernel();
>> >        error = dir->i_op->unlink(dir, dentry);
>> >        unlock_kernel();
>> >        if (!error)
>> >              d_delete(dentry);
>> >        up(&dir->i_zombie);
>> >        if (!error)
>> >                inode_dir_notify(dir, DN_DELETE);
>> >
>> >
>> >in 2.4.33-rc2, we have :
>> >        if (may_delete(dir, dentry, 0) != 0) return;
>> >        inode = dentry->d_inode;
>> >
>> >        atomic_inc(&inode->i_count);
>> >        double_down(&dir->i_zombie, &inode->i_zombie);
>> > 
>> >        lock_kernel();
>> >        error = dir->i_op->unlink(dir, dentry);
>> >        unlock_kernel();
>> >
>> >        double_up(&dir->i_zombie, &inode->i_zombie);
>> >        iput(inode);
>> >
>> >        if (!error) {
>> >                d_delete(dentry);
>> >                inode_dir_notify(dir, DN_DELETE);
>> >        }
>> >
>> >What I notice is that in 2.4.32, d_delete(dentry) was performed
>> >between down(&dir->i_zombie) and up(&dir->i_zombie), while now
>> >it's completely outside. I wonder if this can cause race conditions
>> >or not, but at least, I'm sure that we have changed the locking
>> >sequence, which might have some impact.
>> >
>> >Do you think I'm searching in the wrong direction ? I worry a
>> >bit, because getting a deadlock after only one day, it's a bit
>> >early :-/
>> >
>> Assuming you mean something like the patch below?  Doesn't cause any 
>> problems (yet, still testing) like eat files or segfault here as 
>> reported for -rc1 +/- various patches ;)
>> 
>> Cheers,
>> Grant.
>> --- linux-2.4.33-rc2/fs/namei.c	2006-06-22 07:27:47.000000000 +1000
>> +++ linux-2.4.33-rc2b/fs/namei.c	2006-07-05 11:43:19.000000000 +1000
>> @@ -1497,13 +1497,14 @@
>>  			lock_kernel();
>>  			error = dir->i_op->unlink(dir, dentry);
>>  			unlock_kernel();
>> +			if (!error)
>> +				d_delete(dentry);
>>  		}
>>  	}
>>  	double_up(&dir->i_zombie, &inode->i_zombie);
>>  	iput(inode);
>>  
>>  	if (!error) {
>> -		d_delete(dentry);
>>  		inode_dir_notify(dir, DN_DELETE);
>>  	}
>>  	return error;
>
>after a full day of stress-test of multiple parallel tar xUf, and ffsb at
>full CPU load, I could not reproduce the problem on the exact same kernel
>I first saw it on. So I think I had bad luck and the problem is not related
>to the vfs_unlink() patch, so unless anyone else reports a problem or tells
>us why it is right or wrong, it would seem reasonable to keep it as it is
>in -rc2.

Hi Willy,

Got this with unpatched -rc2, tosh is NFS server, niner is client:

grant@niner:/home/nfstest$ ls -l
total 228474
drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16/
-rw-r--r--   1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16b/
grant@niner:/home/nfstest$ x=0; while [ ! $(diff -rq linux-2.6.16 linux-2.6.16b) ]; do ((x++)); echo "trial $x"; rm -rf linux-2.6.16b; mv linux-2.6.16 linux-2.6.16b; tar xf linux-2.6.16.tar; done
trial 1
...
trial 29
rm: cannot remove directory `linux-2.6.16b/drivers/cdrom': Directory not empty
-bash: [: too many arguments
grant@niner:/home/nfstest$ ls -l
total 228474
drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16/
-rw-r--r--   1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
drwxr-xr-x   4 grant wheel       104 2006-07-06 11:01 linux-2.6.16b/
grant@niner:/home/nfstest$ rm -rf linux-2.6.16b/

The 'rm -rf linux-2.6.16b' completed okay, a mystery?  

This is with two slow (500MHz) boxen with -rc2.
Only idea I get from logs is during the test:

Jul  5 19:01:19 niner kernel: nfs: server tosh not responding, still trying
Jul  5 19:01:19 niner kernel: nfs: server tosh OK

... about one pair each 2 to 5 mins

Jul  6 11:16:08 niner kernel: nfs: server tosh not responding, still trying
Jul  6 11:16:08 niner kernel: nfs: server tosh OK
Jul  6 11:26:57 niner -- MARK --
Jul  6 11:46:57 niner -- MARK --

Other pair of boxen with patched -rc2 completed 146 trials overnight along 
with compiling 2.4 kernel over NFS as well since morning, 64 completed. 
No 'server not responding' messages logged.

I'll change the two running boxen to straight -rc2 and see if I catch
anything.

Grant.
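
The "[: too many arguments" message in the transcript above comes from the
loop test itself: once diff prints more than one word, [ ! $(diff ...) ] no
longer has a single operand. A sketch of the same loop driven by diff's exit
status instead, which also stops at the first failing rm so the leftover
directory can be inspected. The function names and the trial cap are
hypothetical additions, not what was actually run.

```shell
#!/bin/sh
# trees_identical A B: succeed only if diff -rq finds no differences.
# Relies on diff's exit status, so multi-word output cannot break a [ ] test.
trees_identical() {
    diff -rq "$1" "$2" > /dev/null 2>&1
}

# stress_loop SRC COPY TARBALL [MAX]
# Repeat rm/mv/untar until a step fails, the trees differ, or MAX trials
# have run. TARBALL is expected to unpack into ./SRC, as in the transcript.
stress_loop() {
    src="$1"; copy="$2"; tarball="$3"; max="${4:-100}"
    x=0
    while [ "$x" -lt "$max" ] && trees_identical "$src" "$copy"; do
        x=$((x + 1))
        echo "trial $x"
        rm -rf "$copy"      || return 1   # stop at the first rm failure
        mv "$src" "$copy"   || return 1
        tar xf "$tarball"   || return 1
    done
    if trees_identical "$src" "$copy"; then
        echo "completed $x trials"
    else
        echo "trees differ after $x trials"
        return 1
    fi
}
```

Usage, from the directory holding both trees and the tarball:
"stress_loop linux-2.6.16 linux-2.6.16b linux-2.6.16.tar 200".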


* Re: Linux 2.4.33-rc2
  2006-07-06  7:42       ` Grant Coady
@ 2006-07-06  8:25         ` Willy Tarreau
  0 siblings, 0 replies; 8+ messages in thread
From: Willy Tarreau @ 2006-07-06  8:25 UTC (permalink / raw)
  To: Grant Coady; +Cc: Marcelo Tosatti, linux-kernel, Trond Myklebust

Hi Grant,

On Thu, Jul 06, 2006 at 05:42:17PM +1000, Grant Coady wrote:
> On Wed, 5 Jul 2006 22:51:37 +0200, Willy Tarreau <w@1wt.eu> wrote:
(...)
> >after a full day of stress-test of multiple parallel tar xUf, and ffsb at
> >full CPU load, I could not reproduce the problem on the exact same kernel
> >I first saw it on. So I think I had bad luck and the problem is not related
> >to the vfs_unlink() patch, so unless anyone else reports a problem or tells
> >us why it is right or wrong, it would seem reasonable to keep it as it is
> >in -rc2.
> 
> Hi Willy,
> 
> Got this with unpatched -rc2, tosh is NFS server, niner is client:
> 
> grant@niner:/home/nfstest$ ls -l
> total 228474
> drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16/
> -rw-r--r--   1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
> drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16b/
> grant@niner:/home/nfstest$ x=0; while [ ! $(diff -rq linux-2.6.16 linux-2.6.16b) ]; do ((x++)); echo "trial $x"; rm -rf linux-2.6.16b; mv linux-2.6.16 linux-2.6.16b; tar xf linux-2.6.16.tar; done
> trial 1
> ...
> trial 29
> rm: cannot remove directory `linux-2.6.16b/drivers/cdrom': Directory not empty
> -bash: [: too many arguments
> grant@niner:/home/nfstest$ ls -l
> total 228474
> drwxr-xr-x  19 grant wheel       680 2006-03-20 16:53 linux-2.6.16/
> -rw-r--r--   1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
> drwxr-xr-x   4 grant wheel       104 2006-07-06 11:01 linux-2.6.16b/
> grant@niner:/home/nfstest$ rm -rf linux-2.6.16b/
> 
> The 'rm -rf linux-2.6.16b' completed okay, a mystery?  

you might have had a '.nfs0000*' file in the directory which prevented rm
from working, but it was finally removed by the rm -rf.
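
For background: an NFS client renames a file that is unlinked while still
open to a .nfsXXXX name and removes it on last close, so such a file can
make a directory look non-empty for a moment. A quick way to check for
leftovers, as a sketch (list_silly_renames is a hypothetical helper name):

```shell
#!/bin/sh
# list_silly_renames DIR: print any NFS "silly rename" leftovers
# (.nfsXXXX entries) found anywhere under DIR.
list_silly_renames() {
    find "$1" -name '.nfs*' -print
}
```

For example, "list_silly_renames linux-2.6.16b" right after a failed rm -rf
would show whether such a file was what kept drivers/cdrom from being removed.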

> This is with two slow (500MHz) boxen with -rc2.
> Only idea I get from logs is during the test:
> 
> Jul  5 19:01:19 niner kernel: nfs: server tosh not responding, still trying
> Jul  5 19:01:19 niner kernel: nfs: server tosh OK
> 
> ... about one pair each 2 to 5 mins
> 
> Jul  6 11:16:08 niner kernel: nfs: server tosh not responding, still trying
> Jul  6 11:16:08 niner kernel: nfs: server tosh OK
> Jul  6 11:26:57 niner -- MARK --
> Jul  6 11:46:57 niner -- MARK --

I get this if the server spends too much time writing data back to the disks.
Doing this on the server fixed the problem for me :

# echo 50 25000 0 0 100 100 60 45 0 >/proc/sys/vm/bdflush
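
Since the write above replaces all nine bdflush values at once, it can help
to keep the previous settings around. A small sketch that saves the old
contents before applying new ones. apply_bdflush, restore_bdflush and the
backup path are made-up names, and the sketch assumes the file accepts the
same whitespace-separated integers that it prints.

```shell
#!/bin/sh
# apply_bdflush FILE BACKUP V1 ... V9
# Save the current contents of FILE to BACKUP, then write nine new values.
apply_bdflush() {
    file="$1"; backup="$2"; shift 2
    if [ "$#" -ne 9 ]; then
        echo "expected 9 values, got $#" >&2
        return 1
    fi
    for v in "$@"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "not a non-negative integer: $v" >&2
                return 1 ;;
        esac
    done
    cat "$file" > "$backup" || return 1   # keep the old settings
    echo "$*" > "$file"                   # values joined by single spaces
}

# restore_bdflush FILE BACKUP: write the saved settings back.
restore_bdflush() {
    cat "$2" > "$1"
}
```

On the server, as root, this would be something like
"apply_bdflush /proc/sys/vm/bdflush /root/bdflush.saved 50 25000 0 0 100 100 60 45 0",
followed later by "restore_bdflush /proc/sys/vm/bdflush /root/bdflush.saved".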

> Other pair of boxen with patched -rc2 completed 146 trials overnight along 
> with compiling 2.4 kernel over NFS as well since morning, 64 completed. 
> No 'server not responding' messages logged.

Was it on the same server and while other clients saw the server disappear ?

> I'll change the two running boxen to straight -rc2 and see if I catch
> anything.

OK, similarly, it might be interesting to apply your patch to niner to see
if the rm error happens again.

> Grant.

Thanks for your tests,
Willy



