linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
@ 2006-01-25 22:04 Al Boldi
  2006-01-26 14:51 ` Ed L. Cashin
  2006-01-27 16:12 ` Al Boldi
  0 siblings, 2 replies; 6+ messages in thread
From: Al Boldi @ 2006-01-25 22:04 UTC
  To: Ed L. Cashin; +Cc: linux-kernel, linux-raid, netdev

Ed L. Cashin wrote:
> This patch is a bugfix that follows and depends on the
> eight aoe driver patches sent January 19th.

Will they also fix this?
Or is this an md bug?
It only happens with aoe.
Also, why is aoe slower than nbd?

md: bind<etherd/e0.0>
------------[ cut here ]------------
kernel BUG at fs/sysfs/symlink.c:87!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c0188166>]    Not tainted VLI
EFLAGS: 00210246   (2.6.15) 
EIP is at sysfs_create_link+0x56/0x60
eax: c66de390   ebx: 00000000   ecx: c03db91f   edx: c7ee0040
esi: c211bdf8   edi: c7ca0400   ebp: c66de360   esp: c211bdb4
ds: 007b   es: 007b   ss: 0068
Process mkraid (pid: 701, threadinfo=c211b000 task=c2300600)
Stack: c7ca0424 c66de390 c211bdf8 c66de390 c02e5997 c66de390 c6b1b5ec c03db91f
       00200296 c0207d56 c66de3a8 c66de360 c02e650f c66de390 09800000 5c4725a7
       98831dc4 65687465 652f6472 00302e30 3feed8a3 891a1652 7f3dc64e ab9a9a72
Call Trace:
 [<c02e5997>] bind_rdev_to_array+0x157/0x1a0
 [<c0207d56>] kobject_init+0x16/0x50
 [<c02e650f>] md_import_device+0xbf/0x1c0
 [<c02e80ad>] add_new_disk+0x22d/0x390
 [<c024403f>] get_random_bytes+0x2f/0x40
 [<c020be9e>] copy_from_user+0x4e/0x90
 [<c02e8ef8>] md_ioctl+0x2e8/0x710
 [<c01fdb46>] blkdev_driver_ioctl+0x56/0x70
 [<c01fdbf3>] blkdev_ioctl+0x93/0x1a0
 [<c015a83b>] block_ioctl+0x2b/0x30
 [<c01641ce>] do_ioctl+0x6e/0x80
 [<c016435a>] vfs_ioctl+0x6a/0x1e0
 [<c0164515>] sys_ioctl+0x45/0x70
 [<c0103009>] syscall_call+0x7/0xb
Code: 4c 24 04 8b 44 24 18 89 1c 24 89 44 24 08 e8 f2 fe ff ff 8b 53 08 89 c1 
ff 42 70 0f 8e 0b 02 00 00 8b 5c 24 0c 89 c8 83 c4 10 c3 <0f> 0b 57 00 5e a6 
3d c0 eb be 8b 44 24 04 8b 40 30 89 44 24 04 
 



* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
  2006-01-25 22:04 [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down Al Boldi
@ 2006-01-26 14:51 ` Ed L. Cashin
  2006-01-27 16:12 ` Al Boldi
  1 sibling, 0 replies; 6+ messages in thread
From: Ed L. Cashin @ 2006-01-26 14:51 UTC
  To: Al Boldi; +Cc: linux-kernel, linux-raid, netdev

On Thu, Jan 26, 2006 at 01:04:37AM +0300, Al Boldi wrote:
> Ed L. Cashin wrote:
> > This patch is a bugfix that follows and depends on the
> > eight aoe driver patches sent January 19th.
> 
> Will they also fix this?
> Or is this an md bug?

No, this patch fixes a bug that would cause an AoE device to be
totally unusable, so I think mdadm or mkraid would get an error that
the device was not available before it tried to make a new md device.

> It only happens with aoe.

It looks like in setting up the raid, sysfs_create_link probably has
this going off:

	BUG_ON(!kobj || !kobj->dentry || !name);

> Also, why is aoe slower than nbd?

It wasn't when I tried it.  The userland vblade is slow.  Maybe that's
affecting your results?


-- 
  Ed L Cashin <ecashin@coraid.com>


* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
  2006-01-25 22:04 [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down Al Boldi
  2006-01-26 14:51 ` Ed L. Cashin
@ 2006-01-27 16:12 ` Al Boldi
  1 sibling, 0 replies; 6+ messages in thread
From: Al Boldi @ 2006-01-27 16:12 UTC
  To: Ed L. Cashin; +Cc: linux-kernel, linux-raid, netdev

Ed L. Cashin wrote:
> On Thu, Jan 26, 2006 at 01:04:37AM +0300, Al Boldi wrote:
> > Ed L. Cashin wrote:
> > > This patch is a bugfix that follows and depends on the
> > > eight aoe driver patches sent January 19th.
> >
> > Will they also fix this?
> > Or is this an md bug?
>
> No, this patch fixes a bug that would cause an AoE device to be
> totally unusable, so I think mdadm or mkraid would get an error that
> the device was not available before it tried to make a new md device.
>
> > It only happens with aoe.
>
> It looks like in setting up the raid, sysfs_create_link probably has
> this going off:
>
>         BUG_ON(!kobj || !kobj->dentry || !name);
>
> > Also, why is aoe slower than nbd?
>
> It wasn't when I tried it.  The userland vblade is slow.  Maybe that's
> affecting your results?

Why is the userland vblade server slower than the userland nbd-server?

Thanks!

--
Al



* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
  2006-01-30 16:49 devzero
@ 2006-01-31 15:56 ` Al Boldi
  0 siblings, 0 replies; 6+ messages in thread
From: Al Boldi @ 2006-01-31 15:56 UTC
  To: devzero; +Cc: linux-kernel, linux-net, Ed L. Cashin

devzero@web.de wrote:
> Hello!
>
> >Why is the userland vblade server slower than the userland nbd-server?
>
> Maybe it isn't optimized for speed yet?
> nbd is probably more mature, too.
>
> As of this writing, the userspace vblade may be meant for
> demonstration, testing, and learning purposes. You wouldn't buy an
> etherblade from Coraid just for testing AoE, would you?

It's running on layer 2, so it should be faster no matter what.
If it's not faster, that would indicate some underlying problem.

Also, is there an nbd-server/client that runs over UDP?

Thanks!

--
Al



* Re: [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
@ 2006-01-30 16:49 devzero
  2006-01-31 15:56 ` Al Boldi
  0 siblings, 1 reply; 6+ messages in thread
From: devzero @ 2006-01-30 16:49 UTC
  To: linux-kernel; +Cc: a1426z

Hello!

>Why is the userland vblade server slower than the userland nbd-server?

Maybe it isn't optimized for speed yet?
nbd is probably more mature, too.

As of this writing, the userspace vblade may be meant for demonstration, testing, and learning purposes. You wouldn't buy an etherblade from Coraid just for testing AoE, would you?

But anyway, there is a second vblade implementation (independent of Coraid) at http://lpk.com.price.ru/~lelik/AoE/

This one is implemented as an LKM (loadable kernel module) and should be faster.
Give it a try!

regards
roland

PS:
I successfully booted a Linux system with AoE-root (just like NFS-root) today!


* [PATCH 2.6.15-git9a] aoe [1/1]: do not stop retransmit timer when device goes down
@ 2006-01-25 18:54 Ed L. Cashin
  0 siblings, 0 replies; 6+ messages in thread
From: Ed L. Cashin @ 2006-01-25 18:54 UTC
  To: linux-kernel; +Cc: ecashin, Greg K-H

This patch is a bugfix that follows and depends on the
eight aoe driver patches sent January 19th.

Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>

When taking an AoE device down, keep the retransmit timer
going so that it re-appears properly when detected later.

diff -upr 2.6.15-git9a-orig/drivers/block/aoe/aoecmd.c 2.6.15-git9a-aoe/drivers/block/aoe/aoecmd.c
--- 2.6.15-git9a-orig/drivers/block/aoe/aoecmd.c	2006-01-19 13:31:23.000000000 -0500
+++ 2.6.15-git9a-aoe/drivers/block/aoe/aoecmd.c	2006-01-25 13:49:07.000000000 -0500
@@ -331,7 +331,7 @@ rexmit_timer(ulong vp)
 	spin_lock_irqsave(&d->lock, flags);
 
 	if (d->flags & DEVFL_TKILL) {
-tdie:		spin_unlock_irqrestore(&d->lock, flags);
+		spin_unlock_irqrestore(&d->lock, flags);
 		return;
 	}
 	f = d->frames;
@@ -342,7 +342,7 @@ tdie:		spin_unlock_irqrestore(&d->lock, 
 			n /= HZ;
 			if (n > MAXWAIT) { /* waited too long.  device failure. */
 				aoedev_downdev(d);
-				goto tdie;
+				break;
 			}
 			rexmit(d, f);
 		}


-- 
  "Ed L. Cashin" <ecashin@coraid.com>

