* 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
@ 2003-04-10 16:38 CaT
  2003-04-10 17:30 ` Dave Jones
  0 siblings, 1 reply; 6+ messages in thread
From: CaT @ 2003-04-10 16:38 UTC
  To: linux-kernel; +Cc: sct, akpm, adilger, davem, jmorris

Buggered if I know what triggered this exactly. At the time I was
abusing the living snot out of the system by compiling mplayer twice,
compiling the kernel, getting mozilla to load 50 or so pages and running
a program that allocs ram until it's oom-killed a few times.

These happened well after the last running of the memory eater and the
slab/tcp error happened 10 mins before the ext3/block error. The
slab/tcp happened just before the first of the mplayer compiles
finished. The ext3/block happened just before the 2nd mplayer compile
finished. Through both mplayer compiles the kernel was compiling and
mozilla was trying to do its thing. Also, mplayer compiles were
happening on the same partition as each other but on a different
partition to the kernel.

I think I'll run a fs check now... :)

Slab corruption: start=ce6130c4, expend=ce6131f3, problemat=ce613128
Last user: [<c032ff78>](destroy_conntrack+0x9c/0xac)
Data: ****************************************************************************************************28 31 61 CE 28 31 61 CE ***************************************************************************************************************************************************************************************************A5 
Next: 71 F0 2C .78 FF 32 C0 71 F0 2C .********************
slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing
Call Trace:
 [<c0131d5d>] __slab_error+0x21/0x28
 [<c013214c>] check_poison_obj+0x174/0x180
 [<c01331b9>] kmem_cache_alloc+0x8d/0x128
 [<c033075f>] init_conntrack+0xcf/0x310
 [<c033075f>] init_conntrack+0xcf/0x310
 [<c0330ad8>] ip_conntrack_in+0x138/0x274
 [<c013e280>] shmem_readlink+0x64/0xc0
 [<c032fad6>] ip_conntrack_local+0x52/0x58
 [<c030bfc8>] dst_output+0x0/0x28
 [<c03013b0>] nf_iterate+0x34/0x88
 [<c030bfc8>] dst_output+0x0/0x28
 [<c0301676>] nf_hook_slow+0xb6/0x140
 [<c030bfc8>] dst_output+0x0/0x28
 [<c030a804>] ip_queue_xmit+0x400/0x458
 [<c030bfc8>] dst_output+0x0/0x28
 [<c0108e9d>] error_code+0x2d/0x38
 [<c031e4ba>] tcp_v4_send_check+0x6e/0xb0
 [<c031926c>] tcp_transmit_skb+0x424/0x588
 [<c0319320>] tcp_transmit_skb+0x4d8/0x588
 [<c031b8bf>] tcp_connect+0x3c3/0x45c
 [<c031dcb6>] tcp_v4_connect+0x432/0x4f8
 [<c032af17>] inet_stream_connect+0xd7/0x1f0
 [<c02f61cf>] sys_connect+0x5f/0x84
 [<c02f530a>] sock_map_fd+0xc2/0x120
 [<c02f5358>] sock_map_fd+0x110/0x120
 [<c02f5dc8>] sock_create+0xb4/0xe0
 [<c02f5e22>] sys_socket+0x2e/0x4c
 [<c02f6b8c>] sys_socketcall+0xa4/0x1c0
 [<c013a2b0>] sys_munmap+0x44/0x64
 [<c0108cf3>] syscall_call+0x7/0xb

buffer layer error at fs/buffer.c:127
Call Trace:
 [<c0143cea>] __buffer_error+0x2a/0x30
 [<c0143de6>] __wait_on_buffer+0x66/0xb0
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c014567f>] __block_prepare_write+0x2a3/0x3b0
 [<c0145eb4>] block_prepare_write+0x20/0x3c
 [<c016e84c>] ext3_get_block+0x0/0x68
 [<c016ec9c>] ext3_prepare_write+0x48/0xe0
 [<c016e84c>] ext3_get_block+0x0/0x68
 [<c012e90e>] generic_file_aio_write_nolock+0x5e2/0x9c8
 [<c012edfb>] generic_file_aio_write+0x7b/0x94
 [<c016cb1a>] ext3_file_write+0x2e/0xc4
 [<c0142fff>] do_sync_write+0x7f/0xb0
 [<c026fb0c>] do_rw_disk+0x440/0x67c
 [<c01207f0>] update_process_times+0x2c/0x38
 [<c01206d6>] update_wall_time+0xe/0x38
 [<c012096d>] do_timer+0x4d/0xd4
 [<c0142b11>] generic_file_llseek+0x31/0xd0
 [<c01430ce>] vfs_write+0x9e/0xd0
 [<c014316a>] sys_write+0x2a/0x40
 [<c0108cf3>] syscall_call+0x7/0xb

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm


* Re: 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
  2003-04-10 16:38 2.5.67: ext3 and tcp BUG()/oops/error/whatnot? CaT
@ 2003-04-10 17:30 ` Dave Jones
  2003-04-10 21:14   ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Jones @ 2003-04-10 17:30 UTC
  To: CaT; +Cc: linux-kernel, sct, akpm, adilger, davem, jmorris

On Fri, Apr 11, 2003 at 02:38:58AM +1000, CaT wrote:

 > Slab corruption: start=ce6130c4, expend=ce6131f3, problemat=ce613128
 > Last user: [<c032ff78>](destroy_conntrack+0x9c/0xac)
 > Data: ****************************************************************************************************28 31 61 CE 28 31 61 CE ***************************************************************************************************************************************************************************************************A5 
 > Next: 71 F0 2C .78 FF 32 C0 71 F0 2C .********************
 > slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing
 > Call Trace:
 >  [<c0131d5d>] __slab_error+0x21/0x28
 >  [<c013214c>] check_poison_obj+0x174/0x180
 >  [<c01331b9>] kmem_cache_alloc+0x8d/0x128
 >  [<c033075f>] init_conntrack+0xcf/0x310
 >  [<c033075f>] init_conntrack+0xcf/0x310

Known bug, with known fix. This really should go to Linus.
http://bugzilla.kernel.org/show_bug.cgi?id=497
 
 > buffer layer error at fs/buffer.c:127
 > Call Trace:
 >  [<c0143cea>] __buffer_error+0x2a/0x30
 >  [<c0143de6>] __wait_on_buffer+0x66/0xb0
 >  [<c0119088>] autoremove_wake_function+0x0/0x3c
 >  [<c0119088>] autoremove_wake_function+0x0/0x3c

False alarm caused by buggy debugging code.
See my posting yesterday in the thread about breaking
reiserfs.

		Dave



* Re: 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
  2003-04-10 17:30 ` Dave Jones
@ 2003-04-10 21:14   ` Andrew Morton
  2003-04-10 22:35     ` Martin Josefsson
  2003-04-10 23:33     ` David S. Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2003-04-10 21:14 UTC
  To: Dave Jones; +Cc: cat, linux-kernel, sct, akpm, adilger, davem, jmorris

Dave Jones <davej@codemonkey.org.uk> wrote:
>
> On Fri, Apr 11, 2003 at 02:38:58AM +1000, CaT wrote:
> 
>  > Slab corruption: start=ce6130c4, expend=ce6131f3, problemat=ce613128
>  > Last user: [<c032ff78>](destroy_conntrack+0x9c/0xac)
>  > Data: ****************************************************************************************************28 31 61 CE 28 31 61 CE ***************************************************************************************************************************************************************************************************A5 
>  > Next: 71 F0 2C .78 FF 32 C0 71 F0 2C .********************
>  > slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing
>  > Call Trace:
>  >  [<c0131d5d>] __slab_error+0x21/0x28
>  >  [<c013214c>] check_poison_obj+0x174/0x180
>  >  [<c01331b9>] kmem_cache_alloc+0x8d/0x128
>  >  [<c033075f>] init_conntrack+0xcf/0x310
>  >  [<c033075f>] init_conntrack+0xcf/0x310
> 
> Known bug, with known fix. This really should go to Linus.
> http://bugzilla.kernel.org/show_bug.cgi?id=497

I've had the below patch in -mm for some time, but am not sure what to do
with it.  My last attempt to contact netfilter people didn't work.

James?  Help?


From: Martin Josefsson <gandalf@wlug.westbo.se>

You are correct. It was a list_del() that caused it (at least I think
so, it's 2am right now).

1. conntrack helper adds an expectation and adds that to a list hanging
off a connection.

2. the expected connection arrives. the expectation is still on the
list.

3. the original connection that caused the expectation terminates but
the expectation still thinks it's added to the list.

4. the expected connection terminates and list_del() is called to remove
it from the list which doesn't exist anymore. boom!

(forwarded by akpm@digeo.com)


 25-akpm/net/ipv4/netfilter/ip_conntrack_core.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff -puN net/ipv4/netfilter/ip_conntrack_core.c~conntrack-use-after-free-fix net/ipv4/netfilter/ip_conntrack_core.c
--- 25/net/ipv4/netfilter/ip_conntrack_core.c~conntrack-use-after-free-fix	Thu Apr  3 14:53:46 2003
+++ 25-akpm/net/ipv4/netfilter/ip_conntrack_core.c	Thu Apr  3 14:53:46 2003
@@ -273,6 +273,7 @@ static void remove_expectations(struct i
 		 * the un-established ones only */
 		if (exp->sibling) {
 			DEBUGP("remove_expectations: skipping established %p of %p\n", exp->sibling, ct);
+			exp->expectant = NULL;
 			continue;
 		}
 
@@ -326,9 +327,11 @@ destroy_conntrack(struct nf_conntrack *n
 	WRITE_LOCK(&ip_conntrack_lock);
 	/* Delete our master expectation */
 	if (ct->master) {
-		/* can't call __unexpect_related here,
-		 * since it would screw up expect_list */
-		list_del(&ct->master->expected_list);
+		if (ct->master->expectant) {
+			/* can't call __unexpect_related here,
+			 * since it would screw up expect_list */
+			list_del(&ct->master->expected_list);
+		}
 		kfree(ct->master);
 	}
 	WRITE_UNLOCK(&ip_conntrack_lock);

_



* Re: 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
  2003-04-10 21:14   ` Andrew Morton
@ 2003-04-10 22:35     ` Martin Josefsson
  2003-04-11  0:18       ` James Morris
  2003-04-10 23:33     ` David S. Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Martin Josefsson @ 2003-04-10 22:35 UTC
  To: Andrew Morton; +Cc: Dave Jones, cat, linux-kernel, akpm, davem, jmorris

On Thu, 2003-04-10 at 23:14, Andrew Morton wrote:
> Dave Jones <davej@codemonkey.org.uk> wrote:
> >
> > On Fri, Apr 11, 2003 at 02:38:58AM +1000, CaT wrote:
> > 
> >  > Slab corruption: start=ce6130c4, expend=ce6131f3, problemat=ce613128
> >  > Last user: [<c032ff78>](destroy_conntrack+0x9c/0xac)
> >  > Data: ****************************************************************************************************28 31 61 CE 28 31 61 CE ***************************************************************************************************************************************************************************************************A5 
> >  > Next: 71 F0 2C .78 FF 32 C0 71 F0 2C .********************
> >  > slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing
> >  > Call Trace:
> >  >  [<c0131d5d>] __slab_error+0x21/0x28
> >  >  [<c013214c>] check_poison_obj+0x174/0x180
> >  >  [<c01331b9>] kmem_cache_alloc+0x8d/0x128
> >  >  [<c033075f>] init_conntrack+0xcf/0x310
> >  >  [<c033075f>] init_conntrack+0xcf/0x310
> > 
> > Known bug, with known fix. This really should go to Linus.
> > http://bugzilla.kernel.org/show_bug.cgi?id=497
> 
> I've had the below patch in -mm for some time, but am not sure what to do
> with it.  My last attempt to contact netfilter people didn't work.
> 
> James?  Help?

I have an IMO better patch than the one you have and I have tried to get
Harald to approve one of the patches, I just can't get any kind of
response from him.

The new one doesn't leave any dangling pointers around. 
This is the one I'd prefer, I just need to get it blessed.

I'd say we get this to Linus and then Harald can submit a different fix
if he doesn't approve of this one since it seems it can cause crashes.

(although only if the conntrack memory was released by slab before it is
touched by the faulty code in conntrack (which hasn't been the case in
the reports I've seen since it was caught when allocating new
conntracks), shouldn't be able to cause crashes in conntrack itself)

Compiled but not booted (more than what I did with the other fix :)

diff -urN linux-2.5.65.orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.5.65.fixed/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.5.65.orig/net/ipv4/netfilter/ip_conntrack_core.c	2003-03-17 22:43:37.000000000 +0100
+++ linux-2.5.65.fixed/net/ipv4/netfilter/ip_conntrack_core.c	2003-03-26 14:58:54.000000000 +0100
@@ -274,6 +274,7 @@
 		 * the un-established ones only */
 		if (exp->sibling) {
 			DEBUGP("remove_expectations: skipping established %p of %p\n", exp->sibling, ct);
+			exp->expectant = NULL;
 			continue;
 		}
 
@@ -325,6 +326,7 @@
 		ip_conntrack_destroyed(ct);
 
 	WRITE_LOCK(&ip_conntrack_lock);
+	list_del(&ct->sibling_list);
 	/* Delete our master expectation */
 	if (ct->master) {
 		/* can't call __unexpect_related here,


-- 
/Martin


* Re: 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
  2003-04-10 21:14   ` Andrew Morton
  2003-04-10 22:35     ` Martin Josefsson
@ 2003-04-10 23:33     ` David S. Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 2003-04-10 23:33 UTC
  To: akpm; +Cc: davej, cat, linux-kernel, sct, akpm, adilger, jmorris

   From: Andrew Morton <akpm@digeo.com>
   Date: Thu, 10 Apr 2003 14:14:43 -0700
   
   I've had the below patch in -mm for some time, but am not sure what to do
   with it.  My last attempt to contact netfilter people didn't work.

Whatever fix eventually is used, we need a 2.4.x copy as well
as the code is identical there.


* Re: 2.5.67: ext3 and tcp BUG()/oops/error/whatnot?
  2003-04-10 22:35     ` Martin Josefsson
@ 2003-04-11  0:18       ` James Morris
  0 siblings, 0 replies; 6+ messages in thread
From: James Morris @ 2003-04-11  0:18 UTC
  To: gandalf
  Cc: Andrew Morton, Dave Jones, cat, linux-kernel, akpm, David S. Miller

On 11 Apr 2003, Martin Josefsson wrote:

> 
> I have an IMO better patch than the one you have and I have tried to get
> Harald to approve one of the patches, I just can't get any kind of
> response from him.
> 
> The new one doesn't leave any dangling pointers around. 
> This is the one I'd prefer, I just need to get it blessed.

Martin, it looks ok to me, and I'm also fine to go with your judgement on
this.

> 
> I'd say we get this to Linus and then Harald can submit a different fix
> if he doesn't approve of this one since it seems it can cause crashes.
> 

Agreed.  The fix can be tested in 2.5 then backported to 2.4 once everyone
is happy.



- James
-- 
James Morris
<jmorris@intercode.com.au>




