All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                   ` <f3177b9e0909081529i4dd74faq9a6c5a4783b5ded4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-09 16:38                     ` Bart Van Assche
       [not found]                       ` <e2e108260909090938x6b72c519teed7dfd280eac804-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2009-09-09 16:38 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, scst-devel, OpenIB,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Sep 9, 2009 at 12:29 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I'm wondering why it's so easily repeatable by me, and those I work
> with, and nobody else?  I have another completely different
> configuration w/ the same issue...

It would help if you could run the following test:
* Connect an SRP initiator back-to-back to the SRP target.
* Install an operating system combination on initiator and target with
which SRP did not work properly under heavy load.
* Install and run OpenSM on the target if this software is not yet
running on the target.
* Repeat the SRP stress test on the initiator system that is connected
back-to-back to the SRP target.

This will tell us whether or not the IB switch or its firmware is
causing the SRP issues.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                       ` <e2e108260909090938x6b72c519teed7dfd280eac804-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-11 19:50                         ` Chris Worley
       [not found]                           ` <f3177b9e0909111250w159def51h9b720366e27fa3a7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-11 19:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vladislav Bolkhovitin, scst-devel, OpenIB,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Sep 9, 2009 at 6:38 PM, Bart Van Assche<bart.vanassche-Re5JQEeQqe8@public.gmane.orgm> wrote:
> On Wed, Sep 9, 2009 at 12:29 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I'm wondering why it's so easily repeatable by me, and those I work
>> with, and nobody else?  I have another completely different
>> configuration w/ the same issue...
>
> It would help if you could run the following test:
> * Connect an SRP initiator back-to-back to the SRP target.

Many of my tests have been back-to-back, and using different switches,
with  no effect.

> * Install an operating system combination on initiator and target with
> which SRP did not work properly under heavy load.

I'm not sure what test is being requested.  I have tried many tests
using the same OS/Dirtro/etc... combination on both target/initiator
sides.

> * Install and run OpenSM on the target if this software is not yet
> running on the target.

I do generally avoid running it in the target, so will try this test.

> * Repeat the SRP stress test on the initiator system that is connected
> back-to-back to the SRP target.

Does "SRP Stress Test" mean some specific test included w/ SCST?

>
> This will tell us whether or not the IB switch or its firmware is
> causing the SRP issues.

I've definitely removed the switch/firmware from being the cause.

I'm thinking the reason you can't repeat the test may be latency
related.  We get ~50usecs average latency (on small block sizes),
which can't be achieved using regular SSD's (and rotating drives are
nowhere close).  Maybe a ramdisk would help repeat the issue.

Thanks,

Chris
>
> Bart.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                           ` <f3177b9e0909111250w159def51h9b720366e27fa3a7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-12 15:24                             ` Bart Van Assche
  2009-09-14 18:51                             ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2009-09-12 15:24 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, scst-devel, OpenIB,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 11, 2009 at 9:50 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I'm thinking the reason you can't repeat the test may be latency
> related.  We get ~50usecs average latency (on small block sizes),
> which can't be achieved using regular SSD's (and rotating drives are
> nowhere close).  Maybe a ramdisk would help repeat the issue.

SRPT is being tested routinely with a RAM disk as target and this works fine.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                           ` <f3177b9e0909111250w159def51h9b720366e27fa3a7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-12 15:24                             ` Bart Van Assche
@ 2009-09-14 18:51                             ` Vladislav Bolkhovitin
       [not found]                               ` <4AAE909F.6030202-d+Crzxg7Rs0@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-14 18:51 UTC (permalink / raw)
  To: Chris Worley
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

Chris Worley, on 09/11/2009 11:50 PM wrote:
> I've definitely removed the switch/firmware from being the cause.
> 
> I'm thinking the reason you can't repeat the test may be latency
> related.  We get ~50usecs average latency (on small block sizes),
> which can't be achieved using regular SSD's (and rotating drives are
> nowhere close).  Maybe a ramdisk would help repeat the issue.

I think you should try to reproduce the problem with ramdisk or nullio. 
By so you will eliminate possible influence of the SSD backend.

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                               ` <4AAE909F.6030202-d+Crzxg7Rs0@public.gmane.org>
@ 2009-09-14 23:03                                 ` Chris Worley
       [not found]                                   ` <f3177b9e0909141603j2dc61663j4c6bbcc0dda631d4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-14 23:03 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>
>> I've definitely removed the switch/firmware from being the cause.
>>
>> I'm thinking the reason you can't repeat the test may be latency
>> related.  We get ~50usecs average latency (on small block sizes),
>> which can't be achieved using regular SSD's (and rotating drives are
>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>
> I think you should try to reproduce the problem with ramdisk or nullio. By
> so you will eliminate possible influence of the SSD backend.

W/ 12GB RAM in the target, I created a 7GB ramdisk:

mount -t ramfs -o size=7g ramfs /mnt/
dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices

Then, on the initiator, I tested it... and it hung during sequential
8KB block reads:

fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
--randrepeat=0 \
   --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
--loops=10000 --runtime=600

Note that I was running the SM on the target this time too.

Thanks,

Chris
>
> Vlad
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                   ` <f3177b9e0909141603j2dc61663j4c6bbcc0dda631d4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-15  6:10                                     ` Bart Van Assche
       [not found]                                       ` <e2e108260909142310vb353718uea99d50ab638a865-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-15 16:39                                     ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2009-09-15  6:10 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
> > Chris Worley, on 09/11/2009 11:50 PM wrote:
> >>
> >> I've definitely removed the switch/firmware from being the cause.
> >>
> >> I'm thinking the reason you can't repeat the test may be latency
> >> related.  We get ~50usecs average latency (on small block sizes),
> >> which can't be achieved using regular SSD's (and rotating drives are
> >> nowhere close).  Maybe a ramdisk would help repeat the issue.
> >
> > I think you should try to reproduce the problem with ramdisk or nullio. By
> > so you will eliminate possible influence of the SSD backend.
>
> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>
> mount -t ramfs -o size=7g ramfs /mnt/
> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>
> Then, on the initiator, I tested it... and it hung during sequential
> 8KB block reads:
>
> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
> --randrepeat=0 \
>   --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
> --loops=10000 --runtime=600
>
> Note that I was running the SM on the target this time too.

Which Linux distro was installed on the inititiator and on the target
? And if applicable, which OFED version ? Which kernel messages were
logged by SRPT around the time the issue occurred (after having
enabled SRPT logging first) ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                       ` <e2e108260909142310vb353718uea99d50ab638a865-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-15 15:50                                         ` Chris Worley
       [not found]                                           ` <f3177b9e0909150850r2d0f5e15id7f4e14b015f68ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-15 15:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-UyIK/FWETgo@public.gmane.orgt> wrote:
>> > Chris Worley, on 09/11/2009 11:50 PM wrote:
>> >>
>> >> I've definitely removed the switch/firmware from being the cause.
>> >>
>> >> I'm thinking the reason you can't repeat the test may be latency
>> >> related.  We get ~50usecs average latency (on small block sizes),
>> >> which can't be achieved using regular SSD's (and rotating drives are
>> >> nowhere close).  Maybe a ramdisk would help repeat the issue.
>> >
>> > I think you should try to reproduce the problem with ramdisk or nullio. By
>> > so you will eliminate possible influence of the SSD backend.
>>
>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>
>> mount -t ramfs -o size=7g ramfs /mnt/
>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>
>> Then, on the initiator, I tested it... and it hung during sequential
>> 8KB block reads:
>>
>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>> --randrepeat=0 \
>>   --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>> --loops=10000 --runtime=600
>>
>> Note that I was running the SM on the target this time too.
>
> Which Linux distro was installed on the inititiator and on the target
> ? And if applicable, which OFED version ? Which kernel messages were
> logged by SRPT around the time the issue occurred (after having
> enabled SRPT logging first) ?

As logging hadn't helped this issue previously, I've not been enabling
it.  That plus the kernel hacks needed to invoke logging, it's not
worth enabling.

This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.

I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:

echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk

I get the error:

dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required capabilities

... which doesn't occur in the Ubuntu kernel, so I've been unable to
test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
w/ RHEL kernels.

Chris
>
> Bart.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                   ` <f3177b9e0909141603j2dc61663j4c6bbcc0dda631d4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-15  6:10                                     ` Bart Van Assche
@ 2009-09-15 16:39                                     ` Vladislav Bolkhovitin
  2009-09-15 16:52                                       ` Chris Worley
  1 sibling, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-15 16:39 UTC (permalink / raw)
  To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

Chris Worley, on 09/15/2009 03:03 AM wrote:
> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>> I've definitely removed the switch/firmware from being the cause.
>>>
>>> I'm thinking the reason you can't repeat the test may be latency
>>> related.  We get ~50usecs average latency (on small block sizes),
>>> which can't be achieved using regular SSD's (and rotating drives are
>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>> I think you should try to reproduce the problem with ramdisk or nullio. By
>> so you will eliminate possible influence of the SSD backend.
> 
> W/ 12GB RAM in the target, I created a 7GB ramdisk:
> 
> mount -t ramfs -o size=7g ramfs /mnt/
> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
> 
> Then, on the initiator, I tested it... and it hung during sequential
> 8KB block reads:
> 
> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
> --randrepeat=0 \
>    --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
> --loops=10000 --runtime=600
> 
> Note that I was running the SM on the target this time too.

Should you try then with iSCSI with IPoIB? It will eliminate the SRP stack.

> Thanks,
> 
> Chris
>> Vlad
>>
> 
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay 
> ahead of the curve. Join us from November 9&#45;12, 2009. Register now&#33;
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Scst-devel mailing list
> Scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                           ` <f3177b9e0909150850r2d0f5e15id7f4e14b015f68ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-15 16:43                                             ` Vladislav Bolkhovitin
  2009-09-15 16:53                                               ` Chris Worley
  2009-09-16  6:38                                             ` Bart Van Assche
  2009-09-16  6:42                                             ` Bart Van Assche
  2 siblings, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-15 16:43 UTC (permalink / raw)
  To: Chris Worley
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

Chris Worley, on 09/15/2009 07:50 PM wrote:
> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>
>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>> I think you should try to reproduce the problem with ramdisk or nullio. By
>>>> so you will eliminate possible influence of the SSD backend.
>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>
>>> mount -t ramfs -o size=7g ramfs /mnt/
>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>
>>> Then, on the initiator, I tested it... and it hung during sequential
>>> 8KB block reads:
>>>
>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>> --randrepeat=0 \
>>>   --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>> --loops=10000 --runtime=600
>>>
>>> Note that I was running the SM on the target this time too.
>> Which Linux distro was installed on the inititiator and on the target
>> ? And if applicable, which OFED version ? Which kernel messages were
>> logged by SRPT around the time the issue occurred (after having
>> enabled SRPT logging first) ?
> 
> As logging hadn't helped this issue previously, I've not been enabling
> it.  That plus the kernel hacks needed to invoke logging, it's not
> worth enabling.
> 
> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
> 
> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
> 
> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
> 
> I get the error:
> 
> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required capabilities
> 
> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
> w/ RHEL kernels.

Use ramfs instead.

> Chris
>> Bart.
>>
> 
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay 
> ahead of the curve. Join us from November 9&#45;12, 2009. Register now&#33;
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Scst-devel mailing list
> Scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
  2009-09-15 16:39                                     ` Vladislav Bolkhovitin
@ 2009-09-15 16:52                                       ` Chris Worley
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Worley @ 2009-09-15 16:52 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: linux-rdma, scst-devel, OpenIB

On Tue, Sep 15, 2009 at 10:39 AM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Chris Worley, on 09/15/2009 03:03 AM wrote:
>>
>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst@vlnb.net>
>> wrote:
>>>
>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>
>>>> I've definitely removed the switch/firmware from being the cause.
>>>>
>>>> I'm thinking the reason you can't repeat the test may be latency
>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>
>>> I think you should try to reproduce the problem with ramdisk or nullio.
>>> By
>>> so you will eliminate possible influence of the SSD backend.
>>
>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>
>> mount -t ramfs -o size=7g ramfs /mnt/
>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>
>> Then, on the initiator, I tested it... and it hung during sequential
>> 8KB block reads:
>>
>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>> --randrepeat=0 \
>>   --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>> --loops=10000 --runtime=600
>>
>> Note that I was running the SM on the target this time too.
>
> Should you try then with iSCSI with IPoIB? It will eliminate the SRP stack.

I don't see this issue w/ iSCSI over IPoIB... I also don't see the performance.

Chris
>
>> Thanks,
>>
>> Chris
>>>
>>> Vlad
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
>> is the only developer event you need to attend this year. Jumpstart your
>> developing skills, take BlackBerry mobile applications to market and stay
>> ahead of the curve. Join us from November 9&#45;12, 2009. Register now&#33;
>> http://p.sf.net/sfu/devconf
>> _______________________________________________
>> Scst-devel mailing list
>> Scst-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
  2009-09-15 16:43                                             ` Vladislav Bolkhovitin
@ 2009-09-15 16:53                                               ` Chris Worley
       [not found]                                                 ` <f3177b9e0909150953x11d19210mf07cbdcc57928d42-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-15 16:53 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: linux-rdma, scst-devel, OpenIB

On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>
>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>> <bart.vanassche@gmail.com> wrote:
>>>
>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys@gmail.com> wrote:
>>>>
>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>> wrote:
>>>>>
>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>
>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>
>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>
>>>>> I think you should try to reproduce the problem with ramdisk or nullio.
>>>>> By
>>>>> so you will eliminate possible influence of the SSD backend.
>>>>
>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>
>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>
>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>> 8KB block reads:
>>>>
>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>> --randrepeat=0 \
>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>> --loops=10000 --runtime=600
>>>>
>>>> Note that I was running the SM on the target this time too.
>>>
>>> Which Linux distro was installed on the inititiator and on the target
>>> ? And if applicable, which OFED version ? Which kernel messages were
>>> logged by SRPT around the time the issue occurred (after having
>>> enabled SRPT logging first) ?
>>
>> As logging hadn't helped this issue previously, I've not been enabling
>> it.  That plus the kernel hacks needed to invoke logging, it's not
>> worth enabling.
>>
>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>
>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>
>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>
>> I get the error:
>>
>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>> capabilities
>>
>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>> w/ RHEL kernels.
>
> Use ramfs instead.

Do you mean:

mount -t ramfs -o size=7g ramfs /mnt/

?

That's what I'm doing.

Chris
>
>> Chris
>>>
>>> Bart.
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
>> is the only developer event you need to attend this year. Jumpstart your
>> developing skills, take BlackBerry mobile applications to market and stay
>> ahead of the curve. Join us from November 9&#45;12, 2009. Register now&#33;
>> http://p.sf.net/sfu/devconf
>> _______________________________________________
>> Scst-devel mailing list
>> Scst-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                 ` <f3177b9e0909150953x11d19210mf07cbdcc57928d42-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-15 16:57                                                   ` Vladislav Bolkhovitin
  2009-09-15 17:01                                                     ` Chris Worley
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-15 16:57 UTC (permalink / raw)
  To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

Chris Worley, on 09/15/2009 08:53 PM wrote:
> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>> wrote:
>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>
>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>> I think you should try to reproduce the problem with ramdisk or nullio.
>>>>>> By
>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>
>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>
>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>> 8KB block reads:
>>>>>
>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>> --randrepeat=0 \
>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>> --loops=10000 --runtime=600
>>>>>
>>>>> Note that I was running the SM on the target this time too.
>>>> Which Linux distro was installed on the inititiator and on the target
>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>> logged by SRPT around the time the issue occurred (after having
>>>> enabled SRPT logging first) ?
>>> As logging hadn't helped this issue previously, I've not been enabling
>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>> worth enabling.
>>>
>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>
>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>
>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>
>>> I get the error:
>>>
>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>> capabilities
>>>
>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>> w/ RHEL kernels.
>> Use ramfs instead.
> 
> Do you mean:
> 
> mount -t ramfs -o size=7g ramfs /mnt/

You should then create a file on it and use it.

> ?
> 
> That's what I'm doing.
> 
> Chris
>>> Chris
>>>> Bart.
>>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
>>> is the only developer event you need to attend this year. Jumpstart your
>>> developing skills, take BlackBerry mobile applications to market and stay
>>> ahead of the curve. Join us from November 9&#45;12, 2009. Register now&#33;
>>> http://p.sf.net/sfu/devconf
>>> _______________________________________________
>>> Scst-devel mailing list
>>> Scst-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
>>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>>
>>
> _______________________________________________
> general mailing list
> general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
  2009-09-15 16:57                                                   ` Vladislav Bolkhovitin
@ 2009-09-15 17:01                                                     ` Chris Worley
  2009-09-15 17:10                                                       ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-15 17:01 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: linux-rdma, scst-devel, OpenIB

On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>
>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst@vlnb.net>
>> wrote:
>>>
>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>
>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>> <bart.vanassche@gmail.com> wrote:
>>>>>
>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>>>> wrote:
>>>>>>>
>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>
>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>
>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>
>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>> nullio.
>>>>>>> By
>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>
>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>
>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>
>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>> 8KB block reads:
>>>>>>
>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>> --randrepeat=0 \
>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>> --loops=10000 --runtime=600
>>>>>>
>>>>>> Note that I was running the SM on the target this time too.
>>>>>
>>>>> Which Linux distro was installed on the inititiator and on the target
>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>> logged by SRPT around the time the issue occurred (after having
>>>>> enabled SRPT logging first) ?
>>>>
>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>> worth enabling.
>>>>
>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>
>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>
>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>
>>>> I get the error:
>>>>
>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>> capabilities
>>>>
>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>> w/ RHEL kernels.
>>>
>>> Use ramfs instead.
>>
>> Do you mean:
>>
>> mount -t ramfs -o size=7g ramfs /mnt/
>
> You should then create a file on it and use it.

That's what I'm doing, I believe.  From above:

>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices

... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
following kernel messages:

dev_vdisk: Registering virtual FILEIO device ramdisk
scst: Processing thread started, PID 9629
scst: Processing thread started, PID 9630
scst: Processing thread started, PID 9631
scst: Processing thread started, PID 9632
scst: Processing thread started, PID 9633
dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required capabilities
scst: ***ERROR***: New device handler's vdisk attach() failed: -22
scst: Processing thread PID 9629 finished
scst: Processing thread PID 9630 finished
scst: Processing thread PID 9631 finished
scst: Processing thread PID 9632 finished
scst: Processing thread PID 9633 finished
scst: Failed to attach to virtual device ramdisk

Chris
>
>> ?
>>
>> That's what I'm doing.
>>
>> Chris
>>>>
>>>> Chris
>>>>>
>>>>> Bart.
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Come build with us! The BlackBerry&reg; Developer Conference in SF, CA
>>>> is the only developer event you need to attend this year. Jumpstart your
>>>> developing skills, take BlackBerry mobile applications to market and
>>>> stay
>>>> ahead of the curve. Join us from November 9&#45;12, 2009. Register
>>>> now&#33;
>>>> http://p.sf.net/sfu/devconf
>>>> _______________________________________________
>>>> Scst-devel mailing list
>>>> Scst-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>>>
>>>
>> _______________________________________________
>> general mailing list
>> general@lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
  2009-09-15 17:01                                                     ` Chris Worley
@ 2009-09-15 17:10                                                       ` Vladislav Bolkhovitin
       [not found]                                                         ` <4AAFCA77.6050305-d+Crzxg7Rs0@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-15 17:10 UTC (permalink / raw)
  To: Chris Worley; +Cc: linux-rdma, scst-devel, OpenIB

Chris Worley, on 09/15/2009 09:01 PM wrote:
> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst@vlnb.net>
>>> wrote:
>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>> <bart.vanassche@gmail.com> wrote:
>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys@gmail.com>
>>>>>> wrote:
>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>>>>> wrote:
>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>>
>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>>>> which can't be achieved using regular SSD's (and rotating drives are
>>>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>> nullio.
>>>>>>>> By
>>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>
>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>
>>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>>> 8KB block reads:
>>>>>>>
>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>>> --randrepeat=0 \
>>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>>> --loops=10000 --runtime=600
>>>>>>>
>>>>>>> Note that I was running the SM on the target this time too.
>>>>>> Which Linux distro was installed on the inititiator and on the target
>>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>> enabled SRPT logging first) ?
>>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>>> worth enabling.
>>>>>
>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>>
>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>>
>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>
>>>>> I get the error:
>>>>>
>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>> capabilities
>>>>>
>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>> w/ RHEL kernels.
>>>> Use ramfs instead.
>>> Do you mean:
>>>
>>> mount -t ramfs -o size=7g ramfs /mnt/
>> You should then create a file on it and use it.
> 
> That's what I'm doing, I believe.  From above:
> 
>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
> 
> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
> following kernel messages:
> 
> dev_vdisk: Registering virtual FILEIO device ramdisk
> scst: Processing thread started, PID 9629
> scst: Processing thread started, PID 9630
> scst: Processing thread started, PID 9631
> scst: Processing thread started, PID 9632
> scst: Processing thread started, PID 9633
> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required capabilities
> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
> scst: Processing thread PID 9629 finished
> scst: Processing thread PID 9630 finished
> scst: Processing thread PID 9631 finished
> scst: Processing thread PID 9632 finished
> scst: Processing thread PID 9633 finished
> scst: Failed to attach to virtual device ramdisk
> 
> Chris
>>> ?
>>>
>>> That's what I'm doing.

That's strange. I'm doing it all the time, although with not so old 
kernels as 2.6.18.

>>> Chris
>>>>> Chris
>>>>>> Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                         ` <4AAFCA77.6050305-d+Crzxg7Rs0@public.gmane.org>
@ 2009-09-15 20:51                                                           ` Chris Worley
       [not found]                                                             ` <f3177b9e0909151351p12173c78oe01cc8bcca957550-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-15 20:51 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
> Chris Worley, on 09/15/2009 09:01 PM wrote:
>>
>> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst-UyIK/FWETgo@public.gmane.orgt>
>> wrote:
>>>
>>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>>>
>>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>> wrote:
>>>>>
>>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>>>
>>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>>> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>
>>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8@public.gmane.orgm>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin
>>>>>>>> <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>>>
>>>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>>>
>>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>>>>> which can't be achieved using regular SSD's (and rotating drives
>>>>>>>>>> are
>>>>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>>>
>>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>>> nullio.
>>>>>>>>> By
>>>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>>>
>>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>>
>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>
>>>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>>>> 8KB block reads:
>>>>>>>>
>>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>>>> --randrepeat=0 \
>>>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>>>> --loops=10000 --runtime=600
>>>>>>>>
>>>>>>>> Note that I was running the SM on the target this time too.
>>>>>>>
>>>>>>> Which Linux distro was installed on the inititiator and on the target
>>>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>>> enabled SRPT logging first) ?
>>>>>>
>>>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>>>> worth enabling.
>>>>>>
>>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>>>
>>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>>>
>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>
>>>>>> I get the error:
>>>>>>
>>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>>> capabilities
>>>>>>
>>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>>> w/ RHEL kernels.
>>>>>
>>>>> Use ramfs instead.
>>>>
>>>> Do you mean:
>>>>
>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>
>>> You should then create a file on it and use it.
>>
>> That's what I'm doing, I believe.  From above:
>>
>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>
>> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
>> following kernel messages:
>>
>> dev_vdisk: Registering virtual FILEIO device ramdisk
>> scst: Processing thread started, PID 9629
>> scst: Processing thread started, PID 9630
>> scst: Processing thread started, PID 9631
>> scst: Processing thread started, PID 9632
>> scst: Processing thread started, PID 9633
>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>> capabilities
>> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
>> scst: Processing thread PID 9629 finished
>> scst: Processing thread PID 9630 finished
>> scst: Processing thread PID 9631 finished
>> scst: Processing thread PID 9632 finished
>> scst: Processing thread PID 9633 finished
>> scst: Failed to attach to virtual device ramdisk
>>
>> Chris
>>>>
>>>> ?
>>>>
>>>> That's what I'm doing.
>
> That's strange. I'm doing it all the time, although with not so old kernels
> as 2.6.18.

In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:

[  330.155992] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
[  357.207052] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  357.207100] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
[  357.207104] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  357.241429] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  357.250234] ------------[ cut here ]------------
[  357.250537] ib_srpt: srpt_xmit_response: tag= 26 channel in bad state 2
[  357.250539] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
[  357.250553] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  357.250560] ib_srpt: srpt_xmit_response: tag= 27 channel in bad state 2
<repeated many times>
[  357.301253] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
[  357.301253] invalid opcode: 0000 [1] SMP
[  357.301253] CPU 0
...
[  357.301253] RIP: 0010:[<ffffffffa04759f6>]  [<ffffffffa04759f6>]
scst_tgt_cmd_done+0x26/0x30 [scst]
[  357.301253] RSP: 0018:ffff88039ad27b50  EFLAGS: 00010297
[  357.301253] RAX: 0000000000000200 RBX: ffff8803ad9c68f8 RCX: 0000000000000000
[  357.301253] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff8803ad9c68f8
[  357.301253] RBP: ffff88039ad27b50 R08: 0000000000000000 R09: 0000000000000000
[  357.301253] R10: ffff88039ad277c0 R11: ffff88041ad278cf R12: ffff8803c2972180
[  357.301253] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803fb00c2b0
[  357.301253] FS:  0000000000000000(0000) GS:ffffffff807dd000(0000)
knlGS:0000000000000000
[  357.301253] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  357.301253] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
[  357.301253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  357.301253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  357.301253] Process ib_cm/0 (pid: 8299, threadinfo
ffff88039ad26000, task ffff88039ad40000)
[  357.301253] Stack:  ffff88039ad27b80 ffffffffa04c0c47
ffff88039a8db900 ffff8803c2972180
[  357.301253]  ffff8803fb00c240 ffff8803fb00c284 ffff88039ad27bc0
ffffffffa04c0d93
[  357.301253]  ffff88042a4959c0 ffff88042a9d7800 ffff88042544da00
ffff88042a9d7898
[  357.301253] Call Trace:
[  357.301253]  [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
[  357.301253]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[  357.301253]  [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[  357.301253]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[  357.301253]  [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[  357.301253]  [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[  357.301253]  [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[  357.301253]  [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[  357.301253]  [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[  357.301253]  [<ffffffff80263271>] ? __queue_work+0x41/0x50
[  357.301253]  [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[  357.301253]  [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[  357.301253]  [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[  357.301253]  [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[  357.301253]  [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[  357.301253]  [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[  357.301253]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  357.301253]  [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[  357.301253]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  357.301253]  [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[  357.301253]  [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[  357.301253]  [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[  357.301253]  [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[  357.301253]  [<ffffffff80266cbe>] kthread+0x4e/0x90
[  357.301253]  [<ffffffff80213c99>] child_rip+0xa/0x11
[  357.301253]  [<ffffffff80266c70>] ? kthread+0x0/0x90
[  357.301253]  [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[  357.301253]
[  357.301253]
[  357.301253] Code: 00 00 00 00 00 55 48 89 e5 e8 a7 cc d9 df 83 7f
28 78 75 17 80 67 2d f7 c7 47 28 0d 00 00 00 ba 01 00 00 00 e8 8c fc
ff ff c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 54 53 e8 74
cc d9
[  357.301253] RIP  [<ffffffffa04759f6>] scst_tgt_cmd_done+0x26/0x30 [scst]
[  357.301253]  RSP <ffff88039ad27b50>
[  358.302745] ---[ end trace a7f20725e9471e16 ]---
[  384.258076] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  384.276974] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  384.290055] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  411.329136] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  411.348297] ib_srpt: srpt_xmit_response: tag= 24 channel in bad state 2
[  411.348301] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  411.348323] ib_srpt: srpt_xmit_response: tag= 20 channel in bad state 2
[  411.348326] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  411.348331] ib_srpt: srpt_xmit_response: tag= 42 channel in bad state 2
<many repeats>
[  411.349705] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  411.621319] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  411.629805] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
[  411.636690] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  411.636698] ------------[ cut here ]------------
[  411.636699] WARNING: at /root/scst/srpt/src/ib_srpt.c:924
srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]()
...
[  411.636735]
[  411.636736] Call Trace:
[  411.636742]  [<ffffffff8024eb14>] warn_on_slowpath+0x64/0x90
[  411.636747]  [<ffffffff802b3aa0>] ? free_hot_page+0x10/0x20
[  411.636750]  [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[  411.636753]  [<ffffffff80234089>] ? __phys_addr+0x9/0x50
[  411.636757]  [<ffffffff803b3e1c>] ? swiotlb_unmap_sg_attrs+0x9c/0xc0
[  411.636759]  [<ffffffffa04c0c1c>] srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]
[  411.636762]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[  411.636764]  [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[  411.636766]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[  411.636769]  [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[  411.636771]  [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[  411.636774]  [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[  411.636776]  [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[  411.636778]  [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[  411.636781]  [<ffffffff80263271>] ? __queue_work+0x41/0x50
[  411.636783]  [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[  411.636784]  [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[  411.636785]  [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[  411.636790]  [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[  411.636794]  [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[  411.636797]  [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[  411.636799]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  411.636802]  [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[  411.636804]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  411.636806]  [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[  411.636808]  [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[  411.636809]  [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[  411.636811]  [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[  411.636812]  [<ffffffff80266cbe>] kthread+0x4e/0x90
[  411.636815]  [<ffffffff80213c99>] child_rip+0xa/0x11
[  411.636816]  [<ffffffff80266c70>] ? kthread+0x0/0x90
[  411.636817]  [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[  411.636818]
[  411.636819] ---[ end trace a7f20725e9471e16 ]---
[  411.636838] ------------[ cut here ]------------
[  411.636839] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
[  411.636840] invalid opcode: 0000 [2] SMP
[  411.636841] CPU 3
...
[  411.636863] RIP: 0010:[<ffffffffa04759f6>]  [<ffffffffa04759f6>]
scst_tgt_cmd_done+0x26/0x30 [scst]
[  411.636874] RSP: 0018:ffff88039ad4fb50  EFLAGS: 00010297
[  411.636875] RAX: 0000000000000200 RBX: ffff880071d34558 RCX: 00000000000213e2
[  411.636876] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff880071d34558
[  411.636877] RBP: ffff88039ad4fb50 R08: 0000000000000006 R09: 0000000000000000
[  411.636878] R10: ffff88039ad4f7c0 R11: ffff88041ad4f8cf R12: ffff88039af49240
[  411.636879] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803ad8b97f0
[  411.636881] FS:  0000000000000000(0000) GS:ffff88042e4f7080(0000)
knlGS:0000000000000000
[  411.636882] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  411.636883] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
[  411.636884] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  411.636885] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  411.636887] Process ib_cm/3 (pid: 8302, threadinfo
ffff88039ad4e000, task ffff88039ad44350)
[  411.636888] Stack:  ffff88039ad4fb80 ffffffffa04c0c47
ffff88039a995a80 ffff88039af49240
[  411.636890]  ffff8803ad8b9780 ffff8803ad8b97c4 ffff88039ad4fbc0
ffffffffa04c0d93
[  411.636891]  ffff88042a4959c0 ffff88042896f300 ffff88041801d800
ffff88042896f398
[  411.636893] Call Trace:
[  411.636895]  [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
[  411.636897]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[  411.636899]  [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[  411.636901]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[  411.636903]  [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[  411.636905]  [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[  411.636906]  [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[  411.636908]  [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[  411.636909]  [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[  411.636911]  [<ffffffff80263271>] ? __queue_work+0x41/0x50
[  411.636912]  [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[  411.636914]  [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[  411.636915]  [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[  411.636918]  [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[  411.636921]  [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[  411.636923]  [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[  411.636926]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  411.636928]  [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[  411.636930]  [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[  411.636932]  [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[  411.636934]  [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[  411.636935]  [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[  411.636937]  [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[  411.636938]  [<ffffffff80266cbe>] kthread+0x4e/0x90
[  411.636939]  [<ffffffff80213c99>] child_rip+0xa/0x11
[  411.636941]  [<ffffffff80266c70>] ? kthread+0x0/0x90
[  411.636942]  [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[  411.636943]
[  411.636943]
[  411.636943] Code: 00 00 00 00 00 55 48 89 e5 e8 a7 cc d9 df 83 7f
28 78 75 17 80 67 2d f7 c7 47 28 0d 00 00 00 ba 01 00 00 00 e8 8c fc
ff ff c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 54 53 e8 74
cc d9
[  411.636954] RIP  [<ffffffffa04759f6>] scst_tgt_cmd_done+0x26/0x30 [scst]
[  411.636960]  RSP <ffff88039ad4fb50>
[  411.636966] ---[ end trace a7f20725e9471e16 ]---
[  412.361070] ib_srpt: srpt_xmit_response: tag= 8 channel in bad state 2
[  412.367835] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  412.376335] ib_srpt: srpt_xmit_response: tag= 16 channel in bad state 2
[  412.383183] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[  412.391677] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
<many repeats>
[  456.831960] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  456.850851] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  456.863891] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  483.902158] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  483.921040] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  483.934101] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  510.971720] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  510.990587] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  511.003643] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  543.542817] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000055:0x24710000000055, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  543.563208] scst: Using security group "Default" for initiator
"0x00247100000000550024710000000055"
[  544.186067] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  544.204962] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  544.218003] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  571.256063] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  571.274934] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  571.287930] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  598.330955] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  598.349836] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  598.362854] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  625.400865] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  625.419753] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  625.432786] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  652.470722] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  652.489625] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  652.502814] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  679.540621] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  679.559503] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  679.572514] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  706.610483] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  706.629376] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  706.642369] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  733.680405] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  733.699243] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  733.712237] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  760.750170] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  760.769019] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  760.782038] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  800.108971] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  800.127840] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  800.140847] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  827.178873] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  827.197731] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  827.210717] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  854.248829] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  854.267662] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  854.280645] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  881.318906] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[  881.337874] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[  881.351139] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[  893.147951] BUG: unable to handle kernel paging request at ffffe3e00e700cc0
[  893.155156] IP: [<ffffffff802e2589>] kfree+0x49/0x100
[  893.157386] PGD 0
[  893.157386] Oops: 0000 [3] SMP
[  893.157386] CPU 2
...
[  893.157386] RIP: 0010:[<ffffffff802e2589>]  [<ffffffff802e2589>]
kfree+0x49/0x100
[  893.157386] RSP: 0018:ffff88042e4d7ce0  EFLAGS: 00010216
[  893.157386] RAX: 000001e00e700cc0 RBX: ffffe3e00e700cc0 RCX: 0100000000002081
[  893.157386] RDX: ffffe20000000000 RSI: ffff88039accdf00 RDI: 000000039c033000
[  893.157386] RBP: ffff88042e4d7d10 R08: 0000000000000000 R09: 0000000000000001
[  893.157386] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8803c09d7c20
[  893.157386] R13: 000000039c033000 R14: ffff88039a108240 R15: ffff880071dc4558
[  893.157386] FS:  0000000000000000(0000) GS:ffff88042fc02d00(0000)
knlGS:0000000000000000
[  893.157386] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  893.157386] CR2: ffffe3e00e700cc0 CR3: 0000000000201000 CR4: 00000000000006e0
[  893.157386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  893.157386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  893.157386] Process swapper (pid: 0, threadinfo ffff88042e4d2000,
task ffff88042e4cace0)
[  893.157386] Stack:  ffff88042e4d7cf0 0000000000000001
ffff8803c09d7c20 ffff88039accdf00
[  893.157386]  ffff88039a108240 ffff880071dc4558 ffff88042e4d7d80
ffffffffa04bf301
[  893.157386]  0000000000000000 ffffffff802e2430 ffffe20001c77100
0000000000000202
[  893.157386] Call Trace:
[  893.157386]  <IRQ>  [<ffffffffa04bf301>] srpt_reset_ioctx+0x41/0xf0 [ib_srpt]
[  893.157386]  [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[  893.157386]  [<ffffffffa047b8f7>] ? scst_destroy_put_cmd+0x47/0x80 [scst]
[  893.157386]  [<ffffffffa04bf42d>] srpt_on_free_cmd+0x7d/0xa0 [ib_srpt]
[  893.157386]  [<ffffffffa047bc59>] scst_free_cmd+0x39/0x1a0 [scst]
[  893.157386]  [<ffffffffa0474c7d>]
scst_process_active_cmd+0x144d/0x1bd0 [scst]
[  893.157386]  [<ffffffffa04c180e>] ? srpt_completion+0x4e/0x230 [ib_srpt]
[  893.157386]  [<ffffffffa047546d>] scst_do_job_active+0x6d/0x90 [scst]
[  893.157386]  [<ffffffffa04754b7>] scst_cmd_tasklet+0x27/0x40 [scst]
[  893.157386]  [<ffffffffa0348a7d>] ? mlx4_eq_int+0x7d/0x2a0 [mlx4_core]
[  893.157386]  [<ffffffff80254776>] tasklet_action+0x86/0x110
[  893.157386]  [<ffffffff80254e3c>] __do_softirq+0x8c/0x100
[  893.157386]  [<ffffffff8021417c>] call_softirq+0x1c/0x30
[  893.157386]  [<ffffffff80215875>] do_softirq+0x65/0xa0
[  893.157386]  [<ffffffff80254ba5>] irq_exit+0x95/0xa0
[  893.157386]  [<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[  893.157386]  [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
[  893.157386]  <EOI>  [<ffffffff8021aba2>] ? mwait_idle+0x52/0x60
[  893.157386]  [<ffffffff8021ab59>] ? mwait_idle+0x9/0x60
[  893.157386]  [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[  893.157386]  [<ffffffff804ff405>] ? start_secondary+0x97/0xc2
[  893.157386]
[  893.157386]
[  893.157386] Code: 01 f3 ff 48 83 ff 10 49 89 fd 0f 86 85 00 00 00
e8 0d 1b f5 ff 48 c1 e8 0c 48 ba 00 00 00 00 00 e2 ff ff 48 c1 e0 06
48 8d 1c 10 <48> 8b 03 f6 c4 40 74 07 48 8b 5b 10 48 8b 03 84 c0 0f 89
87 00
[  893.157386] RIP  [<ffffffff802e2589>] kfree+0x49/0x100
[  893.157386]  RSP <ffff88042e4d7ce0>
[  893.157386] CR2: ffffe3e00e700cc0
[  893.157386] ---[ end trace a7f20725e9471e16 ]---
[  893.533612] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  893.540006] ------------[ cut here ]------------
[  893.540617] WARNING: at /build/buildd/linux-2.6.27/kernel/smp.c:332
smp_call_function_mask+0x22c/0x240()
...
[  893.630002]
[  893.630002] Call Trace:
[  893.632644]  <IRQ>  [<ffffffff8024eb14>] warn_on_slowpath+0x64/0x90
[  893.640002]  [<ffffffff8023e8f3>] ? __enqueue_entity+0x93/0xa0
[  893.643556]  [<ffffffff8023fdd9>] ? enqueue_entity+0xd9/0x260
[  893.650002]  [<ffffffff802463f9>] ? enqueue_task_fair+0x59/0x60
[  893.660002]  [<ffffffff8023c460>] ? enqueue_task+0x50/0x60
[  893.661747]  [<ffffffff80245e9d>] ? resched_task+0x2d/0x90
[  893.670002]  [<ffffffff8027876c>] smp_call_function_mask+0x22c/0x240
[  893.672916]  [<ffffffff802256c0>] ? stop_this_cpu+0x0/0x40
[  893.680003]  [<ffffffff80502d69>] ? mutex_unlock+0x9/0x20
[  893.690003]  [<ffffffff80284d44>] ? crash_kexec+0x74/0x100
[  893.690515]  [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[  893.700002]  [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[  893.703245]  [<ffffffff802787a0>] smp_call_function+0x20/0x30
[  893.710002]  [<ffffffff80225688>] native_smp_send_stop+0x28/0x60
[  893.720003]  [<ffffffff80501b7d>] panic+0xb4/0x177
[  893.721449]  [<ffffffff8026c3a1>] ? __blocking_notifier_call_chain+0x21/0x90
[  893.730002]  [<ffffffff8026c426>] ? blocking_notifier_call_chain+0x16/0x20
[  893.740002]  [<ffffffff80253505>] do_exit+0x3f5/0x410
[  893.740638]  [<ffffffff80504ea0>] oops_begin+0x0/0xb0
[  893.750002]  [<ffffffff80506e60>] do_page_fault+0x270/0x790
[  893.751374]  [<ffffffff802b2816>] ? get_page_from_freelist+0x2a6/0x380
[  893.760003]  [<ffffffff8026b3e1>] ? ktime_get_ts+0x61/0x70
[  893.764197]  [<ffffffff8026d0ac>] ? sched_clock_cpu+0xcc/0x160
[  893.770002]  [<ffffffff805045ea>] error_exit+0x0/0x70
[  893.780003]  [<ffffffff802e2589>] ? kfree+0x49/0x100
[  893.780898]  [<ffffffffa04bf301>] srpt_reset_ioctx+0x41/0xf0 [ib_srpt]
[  893.790003]  [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[  893.793126]  [<ffffffffa047b8f7>] ? scst_destroy_put_cmd+0x47/0x80 [scst]
[  893.800002]  [<ffffffffa04bf42d>] srpt_on_free_cmd+0x7d/0xa0 [ib_srpt]
[  893.810002]  [<ffffffffa047bc59>] scst_free_cmd+0x39/0x1a0 [scst]
[  893.812078]  [<ffffffffa0474c7d>]
scst_process_active_cmd+0x144d/0x1bd0 [scst]
[  893.820002]  [<ffffffffa04c180e>] ? srpt_completion+0x4e/0x230 [ib_srpt]
[  893.830005]  [<ffffffffa047546d>] scst_do_job_active+0x6d/0x90 [scst]
[  893.833429]  [<ffffffffa04754b7>] scst_cmd_tasklet+0x27/0x40 [scst]
[  893.840003]  [<ffffffffa0348a7d>] ? mlx4_eq_int+0x7d/0x2a0 [mlx4_core]
[  893.850003]  [<ffffffff80254776>] tasklet_action+0x86/0x110
[  893.852828]  [<ffffffff80254e3c>] __do_softirq+0x8c/0x100
[  893.860004]  [<ffffffff8021417c>] call_softirq+0x1c/0x30
[  893.864467]  [<ffffffff80215875>] do_softirq+0x65/0xa0
[  893.870005]  [<ffffffff80254ba5>] irq_exit+0x95/0xa0
[  893.880004]  [<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[  893.880448]  [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
[  893.890003]  <EOI>  [<ffffffff8021aba2>] ? mwait_idle+0x52/0x60
[  893.893008]  [<ffffffff8021ab59>] ? mwait_idle+0x9/0x60
[  893.900003]  [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[  893.902676]  [<ffffffff804ff405>] ? start_secondary+0x97/0xc2
[  893.910004]
[  893.913956] ---[ end trace a7f20725e9471e16 ]---

This may have been due to low memory, as I was using most target
memory for the ramdisk.

I do seem to be able to push the issue harder with:

fio --rw=randrw --bs=1k --numjobs=64 --iodepth=64 --sync=0 \
    --direct=1 --randrepeat=0  --group_reporting --ioengine=libaio \
    --filename=/dev/sdp --name=test --loops=10000 --runtime=1600 \
    --rwmixread=100

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                           ` <f3177b9e0909150850r2d0f5e15id7f4e14b015f68ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-15 16:43                                             ` Vladislav Bolkhovitin
@ 2009-09-16  6:38                                             ` Bart Van Assche
  2009-09-16  6:42                                             ` Bart Van Assche
  2 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2009-09-16  6:38 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Tue, Sep 15, 2009 at 5:50 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>
> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>
> I get the error:
>
> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required capabilities
>
> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
> w/ RHEL kernels.

The following should work on RHEL systems:

echo "open ramdisk /dev/ram0" > /proc/scsi_tgt/vdisk/vdisk

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                           ` <f3177b9e0909150850r2d0f5e15id7f4e14b015f68ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-15 16:43                                             ` Vladislav Bolkhovitin
  2009-09-16  6:38                                             ` Bart Van Assche
@ 2009-09-16  6:42                                             ` Bart Van Assche
  2 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2009-09-16  6:42 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Tue, Sep 15, 2009 at 5:50 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Which Linux distro was installed on the inititiator and on the target
>> ? And if applicable, which OFED version ? Which kernel messages were
>> logged by SRPT around the time the issue occurred (after having
>> enabled SRPT logging first) ?
>
> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.

A 2.6.27-14-server kernel with or without SCST patches ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                             ` <f3177b9e0909151351p12173c78oe01cc8bcca957550-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-16  7:03                                                               ` Bart Van Assche
  2009-09-16 15:11                                                                 ` Chris Worley
  2009-09-16 18:15                                                               ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2009-09-16  7:03 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Tue, Sep 15, 2009 at 10:51 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:
>
> [  330.155992] ib_srpt: disconnected session
> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
> been received.

The above message means that an initiator logged in, did not log out
and logged in again. Has one of the initiator systems e.g. been power
cycled while an SRP connection was active ?

> [  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2

This means that an attempt was made by SRPT to transmit a response for
a channel in the state 2 (disconnecting). This must be analyzed
further, just like the bug report triggered from
srpt_abort_scst_cmd().

[ ... ]

> [  411.636699] WARNING: at /root/scst/srpt/src/ib_srpt.c:924
> srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]()
> ...

This message has been triggered by the statement WARN_ON("unexpected
cmd state"). It must be analyzed whether this is a consequence of what
went wrong before or whether this is a separate issue.

[ ... ]

> This may have been due to low memory, as I was using most target
> memory for the ramdisk.

The kernel warning message in ib_srpt.c at line 924 should never be
triggered, not even under low memory circumstances. I'll have a look
at this anyway.

Thanks for the detailed report.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
  2009-09-16  7:03                                                               ` Bart Van Assche
@ 2009-09-16 15:11                                                                 ` Chris Worley
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Worley @ 2009-09-16 15:11 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma, scst-devel, Vladislav Bolkhovitin, OpenIB

On Wed, Sep 16, 2009 at 1:03 AM, Bart Van Assche
<bart.vanassche@gmail.com> wrote:
> On Tue, Sep 15, 2009 at 10:51 PM, Chris Worley <worleys@gmail.com> wrote:
>> In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:
>>
>> [  330.155992] ib_srpt: disconnected session
>> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
>> been received.
>
> The above message means that an initiator logged in, did not log out
> and logged in again. Has one of the initiator systems e.g. been power
> cycled while an SRP connection was active ?

Yes.  Once ib_srp is hung on one device, I re-login to get the device
and test again.  I can't log out of the previous, as it's hung... this
"hanging" is the issue I'm having.  When I re-login, I get a new
device... i.e. I hung /dev/sdc and re-login to get /dev/sdd... then
test that until it hangs.

Using the ramdisks makes the problem much easier to trigger, and
occasionally causes the panic, especially using:

fio --rw=randrw --bs=1k --numjobs=64 --iodepth=64 --sync=0 \
   --direct=1 --randrepeat=0  --group_reporting --ioengine=libaio \
   --filename=/dev/sdp --name=test --loops=10000 --runtime=1600 \
   --rwmixread=100

...on the initiator to cause it.

Chris
>
>> [  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
>
> This means that an attempt was made by SRPT to transmit a response for
> a channel in the state 2 (disconnecting). This must be analyzed
> further, just like the bug report triggered from
> srpt_abort_scst_cmd().
>
> [ ... ]
>
>> [  411.636699] WARNING: at /root/scst/srpt/src/ib_srpt.c:924
>> srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]()
>> ...
>
> This message has been triggered by the statement WARN_ON("unexpected
> cmd state"). It must be analyzed whether this is a consequence of what
> went wrong before or whether this is a separate issue.
>
> [ ... ]
>
>> This may have been due to low memory, as I was using most target
>> memory for the ramdisk.
>
> The kernel warning message in ib_srpt.c at line 924 should never be
> triggered, not even under low memory circumstances. I'll have a look
> at this anyway.
>
> Thanks for the detailed report.
>
> Bart.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                             ` <f3177b9e0909151351p12173c78oe01cc8bcca957550-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-09-16  7:03                                                               ` Bart Van Assche
@ 2009-09-16 18:15                                                               ` Vladislav Bolkhovitin
       [not found]                                                                 ` <4AB12B40.9050902-d+Crzxg7Rs0@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Vladislav Bolkhovitin @ 2009-09-16 18:15 UTC (permalink / raw)
  To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

Chris Worley, on 09/16/2009 12:51 AM wrote:
> On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
>> Chris Worley, on 09/15/2009 09:01 PM wrote:
>>> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
>>> wrote:
>>>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>> wrote:
>>>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>>>> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>>>>>> wrote:
>>>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin
>>>>>>>>> <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>>>>>> wrote:
>>>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>>>>
>>>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>>>> related.  We get ~50usecs average latency (on small block sizes),
>>>>>>>>>>> which can't be achieved using regular SSD's (and rotating drives
>>>>>>>>>>> are
>>>>>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>>>> nullio.
>>>>>>>>>> By
>>>>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>>>
>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>>
>>>>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>>>>> 8KB block reads:
>>>>>>>>>
>>>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>>>>> --randrepeat=0 \
>>>>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>>>>> --loops=10000 --runtime=600
>>>>>>>>>
>>>>>>>>> Note that I was running the SM on the target this time too.
>>>>>>>> Which Linux distro was installed on the inititiator and on the target
>>>>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>>>> enabled SRPT logging first) ?
>>>>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>>>>> worth enabling.
>>>>>>>
>>>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>>>>
>>>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>>>>
>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>
>>>>>>> I get the error:
>>>>>>>
>>>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>>>> capabilities
>>>>>>>
>>>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/ 8KB
>>>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>>>> w/ RHEL kernels.
>>>>>> Use ramfs instead.
>>>>> Do you mean:
>>>>>
>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>> You should then create a file on it and use it.
>>> That's what I'm doing, I believe.  From above:
>>>
>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
>>> following kernel messages:
>>>
>>> dev_vdisk: Registering virtual FILEIO device ramdisk
>>> scst: Processing thread started, PID 9629
>>> scst: Processing thread started, PID 9630
>>> scst: Processing thread started, PID 9631
>>> scst: Processing thread started, PID 9632
>>> scst: Processing thread started, PID 9633
>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>> capabilities
>>> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
>>> scst: Processing thread PID 9629 finished
>>> scst: Processing thread PID 9630 finished
>>> scst: Processing thread PID 9631 finished
>>> scst: Processing thread PID 9632 finished
>>> scst: Processing thread PID 9633 finished
>>> scst: Failed to attach to virtual device ramdisk
>>>
>>> Chris
>>>>> ?
>>>>>
>>>>> That's what I'm doing.
>> That's strange. I'm doing it all the time, although with not so old kernels
>> as 2.6.18.
> 
> In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:
> 
> [  330.155992] ib_srpt: disconnected session
> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
> been received.
> [  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
> [  357.207052] ib_srpt: disconnected session
> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
> been received.
> [  357.207100] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
> [  357.207104] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.241429] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250234] ------------[ cut here ]------------
> [  357.250537] ib_srpt: srpt_xmit_response: tag= 26 channel in bad state 2
> [  357.250539] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
> [  357.250553] scst: ***ERROR***: Target driver ib_srpt
> xmit_response() returned fatal error
> [  357.250560] ib_srpt: srpt_xmit_response: tag= 27 channel in bad state 2
> <repeated many times>
> [  357.301253] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
> [  357.301253] invalid opcode: 0000 [1] SMP
> [  357.301253] CPU 0
> ...
> [  357.301253] RIP: 0010:[<ffffffffa04759f6>]  [<ffffffffa04759f6>]
> scst_tgt_cmd_done+0x26/0x30 [scst]
> [  357.301253] RSP: 0018:ffff88039ad27b50  EFLAGS: 00010297
> [  357.301253] RAX: 0000000000000200 RBX: ffff8803ad9c68f8 RCX: 0000000000000000
> [  357.301253] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff8803ad9c68f8
> [  357.301253] RBP: ffff88039ad27b50 R08: 0000000000000000 R09: 0000000000000000
> [  357.301253] R10: ffff88039ad277c0 R11: ffff88041ad278cf R12: ffff8803c2972180
> [  357.301253] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803fb00c2b0
> [  357.301253] FS:  0000000000000000(0000) GS:ffffffff807dd000(0000)
> knlGS:0000000000000000
> [  357.301253] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  357.301253] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
> [  357.301253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  357.301253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  357.301253] Process ib_cm/0 (pid: 8299, threadinfo
> ffff88039ad26000, task ffff88039ad40000)
> [  357.301253] Stack:  ffff88039ad27b80 ffffffffa04c0c47
> ffff88039a8db900 ffff8803c2972180
> [  357.301253]  ffff8803fb00c240 ffff8803fb00c284 ffff88039ad27bc0
> ffffffffa04c0d93
> [  357.301253]  ffff88042a4959c0 ffff88042a9d7800 ffff88042544da00
> ffff88042a9d7898
> [  357.301253] Call Trace:
> [  357.301253]  [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
> [  357.301253]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
> [  357.301253]  [<ffffffffa04c0e82>]
> srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
> [  357.301253]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]

It's because srpt called scst_tgt_cmd_done() when the corresponding 
command hasn't yet been sent to xmit_response() callback, so srpt should 
use another function to abort commands in this state.

Vlad

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                                 ` <4AB12B40.9050902-d+Crzxg7Rs0@public.gmane.org>
@ 2009-09-16 19:41                                                                   ` Chris Worley
       [not found]                                                                     ` <f3177b9e0909161241h89dabdbybaf98edc5b10f735-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Worley @ 2009-09-16 19:41 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel, OpenIB

On Wed, Sep 16, 2009 at 12:15 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
> Chris Worley, on 09/16/2009 12:51 AM wrote:
>>
>> On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst-UyIK/FWETgo@public.gmane.orgt>
>> wrote:
>>>
>>> Chris Worley, on 09/15/2009 09:01 PM wrote:
>>>>
>>>> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>> wrote:
>>>>>
>>>>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>>>>>
>>>>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst@vlnb.net>
>>>>>> wrote:
>>>>>>>
>>>>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>>>>>
>>>>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>>>>> <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin
>>>>>>>>>> <vst-d+Crzxg7Rs0@public.gmane.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've definitely removed the switch/firmware from being the
>>>>>>>>>>>> cause.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>>>>> related.  We get ~50usecs average latency (on small block
>>>>>>>>>>>> sizes),
>>>>>>>>>>>> which can't be achieved using regular SSD's (and rotating drives
>>>>>>>>>>>> are
>>>>>>>>>>>> nowhere close).  Maybe a ramdisk would help repeat the issue.
>>>>>>>>>>>
>>>>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>>>>> nullio.
>>>>>>>>>>> By
>>>>>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>>>>>
>>>>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>>>>
>>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>>>
>>>>>>>>>> Then, on the initiator, I tested it... and it hung during
>>>>>>>>>> sequential
>>>>>>>>>> 8KB block reads:
>>>>>>>>>>
>>>>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0
>>>>>>>>>> --direct=1
>>>>>>>>>> --randrepeat=0 \
>>>>>>>>>>  --group_reporting --ioengine=libaio --filename=/dev/sde
>>>>>>>>>> --name=test
>>>>>>>>>> --loops=10000 --runtime=600
>>>>>>>>>>
>>>>>>>>>> Note that I was running the SM on the target this time too.
>>>>>>>>>
>>>>>>>>> Which Linux distro was installed on the inititiator and on the
>>>>>>>>> target
>>>>>>>>> ? And if applicable, which OFED version ? Which kernel messages
>>>>>>>>> were
>>>>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>>>>> enabled SRPT logging first) ?
>>>>>>>>
>>>>>>>> As logging hadn't helped this issue previously, I've not been
>>>>>>>> enabling
>>>>>>>> it.  That plus the kernel hacks needed to invoke logging, it's not
>>>>>>>> worth enabling.
>>>>>>>>
>>>>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server
>>>>>>>> kernel.
>>>>>>>>
>>>>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2.  When running:
>>>>>>>>
>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>
>>>>>>>> I get the error:
>>>>>>>>
>>>>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>>>>> capabilities
>>>>>>>>
>>>>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>>>>> test RHEL kernels w/ ramdisks.  In general, this problem occurs w/
>>>>>>>> 8KB
>>>>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>>>>> w/ RHEL kernels.
>>>>>>>
>>>>>>> Use ramfs instead.
>>>>>>
>>>>>> Do you mean:
>>>>>>
>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>
>>>>> You should then create a file on it and use it.
>>>>
>>>> That's what I'm doing, I believe.  From above:
>>>>
>>>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>
>>>> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
>>>> following kernel messages:
>>>>
>>>> dev_vdisk: Registering virtual FILEIO device ramdisk
>>>> scst: Processing thread started, PID 9629
>>>> scst: Processing thread started, PID 9630
>>>> scst: Processing thread started, PID 9631
>>>> scst: Processing thread started, PID 9632
>>>> scst: Processing thread started, PID 9633
>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>> capabilities
>>>> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
>>>> scst: Processing thread PID 9629 finished
>>>> scst: Processing thread PID 9630 finished
>>>> scst: Processing thread PID 9631 finished
>>>> scst: Processing thread PID 9632 finished
>>>> scst: Processing thread PID 9633 finished
>>>> scst: Failed to attach to virtual device ramdisk
>>>>
>>>> Chris
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> That's what I'm doing.
>>>
>>> That's strange. I'm doing it all the time, although with not so old
>>> kernels
>>> as 2.6.18.
>>
>> In lots of testing today, I've seen this panic twice on the Ubuntu 8.10
>> targets:
>>
>> [  330.155992] ib_srpt: disconnected session
>> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
>> been received.
>> [  357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
>> [  357.207052] ib_srpt: disconnected session
>> 0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
>> been received.
>> [  357.207100] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
>> [  357.207104] scst: ***ERROR***: Target driver ib_srpt
>> xmit_response() returned fatal error
>> [  357.241429] scst: ***ERROR***: Target driver ib_srpt
>> xmit_response() returned fatal error
>> [  357.250234] ------------[ cut here ]------------
>> [  357.250537] ib_srpt: srpt_xmit_response: tag= 26 channel in bad state 2
>> [  357.250539] scst: ***ERROR***: Target driver ib_srpt
>> xmit_response() returned fatal error
>> [  357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
>> [  357.250553] scst: ***ERROR***: Target driver ib_srpt
>> xmit_response() returned fatal error
>> [  357.250560] ib_srpt: srpt_xmit_response: tag= 27 channel in bad state 2
>> <repeated many times>
>> [  357.301253] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
>> [  357.301253] invalid opcode: 0000 [1] SMP
>> [  357.301253] CPU 0
>> ...
>> [  357.301253] RIP: 0010:[<ffffffffa04759f6>]  [<ffffffffa04759f6>]
>> scst_tgt_cmd_done+0x26/0x30 [scst]
>> [  357.301253] RSP: 0018:ffff88039ad27b50  EFLAGS: 00010297
>> [  357.301253] RAX: 0000000000000200 RBX: ffff8803ad9c68f8 RCX:
>> 0000000000000000
>> [  357.301253] RDX: 00000000ffffffff RSI: 0000000000000000 RDI:
>> ffff8803ad9c68f8
>> [  357.301253] RBP: ffff88039ad27b50 R08: 0000000000000000 R09:
>> 0000000000000000
>> [  357.301253] R10: ffff88039ad277c0 R11: ffff88041ad278cf R12:
>> ffff8803c2972180
>> [  357.301253] R13: ffff88039ada0000 R14: 0000000000000001 R15:
>> ffff8803fb00c2b0
>> [  357.301253] FS:  0000000000000000(0000) GS:ffffffff807dd000(0000)
>> knlGS:0000000000000000
>> [  357.301253] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> [  357.301253] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4:
>> 00000000000006e0
>> [  357.301253] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [  357.301253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [  357.301253] Process ib_cm/0 (pid: 8299, threadinfo
>> ffff88039ad26000, task ffff88039ad40000)
>> [  357.301253] Stack:  ffff88039ad27b80 ffffffffa04c0c47
>> ffff88039a8db900 ffff8803c2972180
>> [  357.301253]  ffff8803fb00c240 ffff8803fb00c284 ffff88039ad27bc0
>> ffffffffa04c0d93
>> [  357.301253]  ffff88042a4959c0 ffff88042a9d7800 ffff88042544da00
>> ffff88042a9d7898
>> [  357.301253] Call Trace:
>> [  357.301253]  [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160
>> [ib_srpt]
>> [  357.301253]  [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190
>> [ib_srpt]
>> [  357.301253]  [<ffffffffa04c0e82>]
>> srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
>> [  357.301253]  [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
>
> It's because srpt called scst_tgt_cmd_done() when the corresponding command
> hasn't yet been sent to xmit_response() callback, so srpt should use another
> function to abort commands in this state.

Could this be related to the hang (i.e. the command has been aborted
before xmit_response has been called... but w/o causing a panic)?

Thanks,

Chris
>
> Vlad
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
       [not found]                                                                     ` <f3177b9e0909161241h89dabdbybaf98edc5b10f735-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-17 10:22                                                                       ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2009-09-17 10:22 UTC (permalink / raw)
  To: Chris Worley
  Cc: Vladislav Bolkhovitin, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	scst-devel, OpenIB

On Wed, Sep 16, 2009 at 9:41 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Wed, Sep 16, 2009 at 12:15 PM, Vladislav Bolkhovitin <vst-d+Crzxg7Rs0@public.gmane.org> wrote:
> > Chris Worley, on 09/16/2009 12:51 AM wrote:
> > >
> > > On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst@vlnb.net>
> > > wrote:
> > > [ ... ]
> > > [  357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
> > > [  357.250553] scst: ***ERROR***: Target driver ib_srpt
> > > xmit_response() returned fatal error
> >
> > It's because srpt called scst_tgt_cmd_done() when the corresponding command
> > hasn't yet been sent to xmit_response() callback, so srpt should use another
> > function to abort commands in this state.
>
> Could this be related to the hang (i.e. the command has been aborted
> before xmit_response has been called... but w/o causing a panic)?

When analyzing such logs it's important to distinguish between cause
and consequence. What happened first is that the OFED SRP initiator
noticed that something went wrong with the IB communication, as
indicated by the log message "srp_qp_in_err_timer called". This means
that an error occurred in the IB network or in one of the two IB
stacks. This resulted in the SRP initiator trying to relogin without
intervening logout. The error messages logged by SRPT are a
consequence of the initiator relogin. While the SRPT issue will be
fixed, such a fix won't solve the slow reads and the hang you
observed.

Regarding the SRP communication problems you observed: since my
attempts to reproduce this issue have been unsuccessful so far, I'm
afraid these communication problems are caused by some component in
your IB network that is not working as reliable as it should.

By the way, the description of the patch that generated the message
"srp_qp_in_err_timer called" is interesting. The patch description
indicates that the condition "srp_qp_in_err_timer called" should only
happen during multipath failover. See also
http://www.mail-archive.com/ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org/msg01959.html
(which is not the latest version of this patch).

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-09-17 10:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <f3177b9e0908091009x23813cbdq4fbd9ebe6d8e174f@mail.gmail.com>
     [not found] ` <f3177b9e0909022108uc826c3ct3db0ae7bfa2c8128@mail.gmail.com>
     [not found]   ` <4A9FA945.4070408@vlnb.net>
     [not found]     ` <f3177b9e0909031038s22f0a1bg401629d4208fd82@mail.gmail.com>
     [not found]       ` <f3177b9e0909031620i10db945ep69ddc36a7044334a@mail.gmail.com>
     [not found]         ` <e2e108260909060617y13a1ded8jd6686d02521ecb4@mail.gmail.com>
     [not found]           ` <f3177b9e0909060636k6e293c06la3dfe9f929da4915@mail.gmail.com>
     [not found]             ` <f3177b9e0909060641i296c7aefp322712f52de9786a@mail.gmail.com>
     [not found]               ` <4AA4F561.504@vlnb.net>
     [not found]                 ` <f3177b9e0909081529i4dd74faq9a6c5a4783b5ded4@mail.gmail.com>
     [not found]                   ` <f3177b9e0909081529i4dd74faq9a6c5a4783b5ded4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-09 16:38                     ` [Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs Bart Van Assche
     [not found]                       ` <e2e108260909090938x6b72c519teed7dfd280eac804-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-11 19:50                         ` Chris Worley
     [not found]                           ` <f3177b9e0909111250w159def51h9b720366e27fa3a7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-12 15:24                             ` Bart Van Assche
2009-09-14 18:51                             ` Vladislav Bolkhovitin
     [not found]                               ` <4AAE909F.6030202-d+Crzxg7Rs0@public.gmane.org>
2009-09-14 23:03                                 ` Chris Worley
     [not found]                                   ` <f3177b9e0909141603j2dc61663j4c6bbcc0dda631d4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-15  6:10                                     ` Bart Van Assche
     [not found]                                       ` <e2e108260909142310vb353718uea99d50ab638a865-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-15 15:50                                         ` Chris Worley
     [not found]                                           ` <f3177b9e0909150850r2d0f5e15id7f4e14b015f68ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-15 16:43                                             ` Vladislav Bolkhovitin
2009-09-15 16:53                                               ` Chris Worley
     [not found]                                                 ` <f3177b9e0909150953x11d19210mf07cbdcc57928d42-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-15 16:57                                                   ` Vladislav Bolkhovitin
2009-09-15 17:01                                                     ` Chris Worley
2009-09-15 17:10                                                       ` Vladislav Bolkhovitin
     [not found]                                                         ` <4AAFCA77.6050305-d+Crzxg7Rs0@public.gmane.org>
2009-09-15 20:51                                                           ` Chris Worley
     [not found]                                                             ` <f3177b9e0909151351p12173c78oe01cc8bcca957550-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-16  7:03                                                               ` Bart Van Assche
2009-09-16 15:11                                                                 ` Chris Worley
2009-09-16 18:15                                                               ` Vladislav Bolkhovitin
     [not found]                                                                 ` <4AB12B40.9050902-d+Crzxg7Rs0@public.gmane.org>
2009-09-16 19:41                                                                   ` Chris Worley
     [not found]                                                                     ` <f3177b9e0909161241h89dabdbybaf98edc5b10f735-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-17 10:22                                                                       ` Bart Van Assche
2009-09-16  6:38                                             ` Bart Van Assche
2009-09-16  6:42                                             ` Bart Van Assche
2009-09-15 16:39                                     ` Vladislav Bolkhovitin
2009-09-15 16:52                                       ` Chris Worley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.