linux-kernel.vger.kernel.org archive mirror
* Booting from Qlogic qla2300 fibre channel card
@ 2003-04-16  6:18 Jurjen Oskam
  2003-04-16  6:56 ` Lincoln Dale
  2003-04-16 16:10 ` Patrick Mansfield
  0 siblings, 2 replies; 8+ messages in thread
From: Jurjen Oskam @ 2003-04-16  6:18 UTC (permalink / raw)
  To: linux-kernel

Hi everybody,

At work, we are looking to deploy several Linux boxes on our SAN. The
machines will be IBM eServer xSeries 345 with Qlogic qla2340 Fibre Channel
cards, and no internal disks.

The storage array is an EMC Symmetrix model 8530. EMC created a document
where they explain how to make such a configuration work. When they mention
booting from a Symmetrix-provided volume, they mention the following:

"If Linux loses connectivity long enough, the disks disappear from the
system. [...] For [this reason], EMC recommends that you do not boot a
Linux host from the EMC storage array."


When making an online configuration change on the Symmetrix (such as
remapping volumes), it is possible for the attached hosts to experience
a temporary error while accessing a storage array volume. For example,
when changing the Symmetrix configuration, it is not uncommon for the
RS/6000s (also attached to the SAN) to log one or two temporary
SCSI-errors. They don't cause any problems at all, the AIX volume manager
never notices a problem.


Does the warning describe a real-world possibility? For example, what is
"long enough"?


Of course, we'll test this configuration before putting it into production
(and I'll mention our results here if they prove to be interesting), but
I'm hoping somebody here has some useful comments. :-)

Thanks,
-- 
Jurjen Oskam

PGP Key available at http://www.stupendous.org/

* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16  6:18 Booting from Qlogic qla2300 fibre channel card Jurjen Oskam
@ 2003-04-16  6:56 ` Lincoln Dale
  2003-04-16  9:48   ` Jurjen Oskam
  2003-04-16 15:32   ` Michael Clark
  2003-04-16 16:10 ` Patrick Mansfield
  1 sibling, 2 replies; 8+ messages in thread
From: Lincoln Dale @ 2003-04-16  6:56 UTC (permalink / raw)
  To: Jurjen Oskam; +Cc: linux-kernel

Hi,

At 08:18 AM 16/04/2003 +0200, Jurjen Oskam wrote:
>At work, we are looking to deploy several Linux boxes on our SAN. The
>machines will be IBM eServer xSeries 345 with Qlogic qla2340 Fibre Channel
>cards, and no internal disks.
>
>The storage array is an EMC Symmetrix model 8530. EMC created a document
>where they explain how to make such a configuration work. When they mention
>booting from a Symmetrix-provided volume, they mention the following:
>
>"If Linux loses connectivity long enough, the disks disappear from the
>system. [...] For [this reason], EMC recommends that you do not boot a
>Linux host from the EMC storage array."

in general, all OSes get rather upset if disks disappear under 
them.  particularly if those disks contain swap -- exactly how is the 
machine meant to recover from that?

some recommendations:
  - run with Matthew Jacob's "feral" driver rather than QLogic's driver
    it has much better error recovery
  - you may want to increase the delay of SCSI_TIMEOUT in drivers/scsi/scsi.h

in my lab here, i do a ton of work on Fibre Channel & iSCSI.
the best setup i've found is that i end up using ramfs as my root and 
having lots of things in there.  sure, it burns a bit of ram, but i can be 
sure if i'm doing anything that could impact the i/o path, it's on less 
system-critical stuff.  since it's a lab and the things running on the hosts 
aren't RAM hogs, i don't have swap either.  you probably can't get away 
with that, so i'd recommend doing some extensive testing pulling cables out 
and seeing what happens and tuning timers to cope with it accordingly.

>When making an online configuration change on the Symmetrix (such as
>remapping volumes), it is possible for the attached hosts to experience
>a temporary error while accessing a storage array volume. For example,

are you sure this tech note will still apply with the DMX?
i'd imagine that there are still bin file changes that can cause this kind 
of thing, but it's something i believe EMC was addressing with the DMX.

>when changing the Symmetrix configuration, it is not uncommon for the
>RS/6000s (also attached to the SAN) to log one or two temporary
>SCSI-errors. They don't cause any problems at all, the AIX volume manager
>never notices a problem.

on RS/6000's, the rules were somewhat different.  the HBAs that IBM had for 
RS6Ks typically only tried to issue FLOGIs once every 30 seconds - so you 
would be more likely to see timeout errors if you impacted the flow of 
traffic temporarily.


cheers,

lincoln.


* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16  6:56 ` Lincoln Dale
@ 2003-04-16  9:48   ` Jurjen Oskam
  2003-04-16 15:32   ` Michael Clark
  1 sibling, 0 replies; 8+ messages in thread
From: Jurjen Oskam @ 2003-04-16  9:48 UTC (permalink / raw)
  To: linux-kernel

On Wed, Apr 16, 2003 at 04:56:16PM +1000, Lincoln Dale wrote:

> in general, all OSes get rather upset if disks disappear under 
> them.  particularly if those disks contain swap -- exactly how is the 
> machine meant to recover from that?

Of course, if cables are pulled out or something like that, I'm not expecting
the OS to recover from that. :-)

I'm not trying to recover from or survive physical configuration changes.
I'm more interested in what happens when a volume generates a temporary
error, such as the ones that sometimes occur when doing logical
configuration changes (BIN-file changes on Symmetrix, for example).

> >When making an online configuration change on the Symmetrix (such as
> >remapping volumes), it is possible for the attached hosts to experience
> >a temporary error while accessing a storage array volume. For example,
> are you sure this tech note will still apply with the DMX?

I'm not sure, but that doesn't apply to us anyway since we have an 8530.


Anyway, I'll take a look at the SCSI_TIMEOUT value. Thanks for your
suggestions.


-- 
Jurjen Oskam

PGP Key available at http://www.stupendous.org/

* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16  6:56 ` Lincoln Dale
  2003-04-16  9:48   ` Jurjen Oskam
@ 2003-04-16 15:32   ` Michael Clark
  2003-04-16 15:56     ` James Bourne
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Clark @ 2003-04-16 15:32 UTC (permalink / raw)
  To: Lincoln Dale; +Cc: Jurjen Oskam, linux-kernel

Hi,

On 04/16/03 14:56, Lincoln Dale wrote:
> Hi,
> 
> At 08:18 AM 16/04/2003 +0200, Jurjen Oskam wrote:
> 
>> At work, we are looking to deploy several Linux boxes on our SAN. The
>> machines will be IBM eServer xSeries 345 with Qlogic qla2340 Fibre 
>> Channel
>> cards, and no internal disks.
>>
>> The storage array is an EMC Symmetrix model 8530. EMC created a document
>> where they explain how to make such a configuration work. When they 
>> mention
>> booting from a Symmetrix-provided volume, they mention the following:
>>
>> "If Linux loses connectivity long enough, the disks disappear from the
>> system. [...] For [this reason], EMC recommends that you do not boot a
>> Linux host from the EMC storage array."
> 
> 
> in general, all OSes get rather upset if disks disappear under them.  
> particularly if those disks contain swap -- exactly how is the machine 
> meant to recover from that?
> 
> some recommendations:
>  - run with Matthew Jacob's "feral" driver rather than QLogic's driver
>    it has much better error recovery

Although this is certainly a matter of opinion. When i tried the feral
driver a month ago - upon unplugging the fibre (and getting loop down)
the SCSI layer started spewing IO errors and the files copied during
this test (on ext3) had invalid checksums. The qlogic driver however
handled this test fine (handling multiple fibre unplugs while copying a
multi gigabyte file). Certainly the qlogic driver has its fair share of
recovery problems such as an abort function that tries to re-init the
hardware but always fails.

I'm currently looking for alternatives to qlogic HBAs after a year of
not being able to find a stable driver combo (one that can stand up
for more than a few weeks). Does any one out there have experience
with the LSI HBAs and Fusion MPT drivers or perhaps Emulex?

We get the following with latest 6.1 qlogic driver and our 2300s about
every 2 weeks (we are about to file a bug report to qlogic).

Apr  2 10:54:13 prodapp3 kernel: qla2x00: Status Entry invalid handle.
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:13 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:13 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:14 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:14 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:15 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:15 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:16 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:16 prodapp3 kernel: qla2x00_abort_isp(2): **** FAILED ****
Apr  2 10:54:17 prodapp3 kernel: qla2x00: Performing ISP error recovery - ha= c3afc07c.
Apr  2 10:54:17 prodapp3 kernel: qla2x00(2): ISP error recovery failed - board disabled



* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16 15:32   ` Michael Clark
@ 2003-04-16 15:56     ` James Bourne
  2003-04-16 16:25       ` jds
  2003-04-16 16:43       ` Michael Clark
  0 siblings, 2 replies; 8+ messages in thread
From: James Bourne @ 2003-04-16 15:56 UTC (permalink / raw)
  To: Michael Clark; +Cc: Lincoln Dale, Jurjen Oskam, linux-kernel

On Wed, 16 Apr 2003, Michael Clark wrote:

> Hi,
...
> I'm currently looking for alternatives to qlogic HBAs after a year of
> not being able to find a stable driver combo (one that can stand up
> for more than a few weeks). Does any one out there have experience
> with the LSI HBAs and Fusion MPT drivers or perhaps Emulex?

We are currently using the EMC approved 6.04-fo qla2300 driver with great
success.  With multiple connections to a CX600 fail over occurs properly, it
also does failover for the tape drives, and the system has been running for
about 40 days without any problems...

YMMV, but for us it has been working quite well.

Regards
James Bourne


-- 
James Bourne                  | Email:            jbourne@hardrock.org          
Unix Systems Administrator    | WWW:           http://www.hardrock.org
Custom Unix Programming       | Linux:  The choice of a GNU generation
----------------------------------------------------------------------
 "All you need's an occasional kick in the philosophy." Frank Herbert  


* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16  6:18 Booting from Qlogic qla2300 fibre channel card Jurjen Oskam
  2003-04-16  6:56 ` Lincoln Dale
@ 2003-04-16 16:10 ` Patrick Mansfield
  1 sibling, 0 replies; 8+ messages in thread
From: Patrick Mansfield @ 2003-04-16 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-scsi

On Wed, Apr 16, 2003 at 08:18:30AM +0200, Jurjen Oskam wrote:

> The storage array is an EMC Symmetrix model 8530. EMC created a document
> where they explain how to make such a configuration work. When they mention
> booting from a Symmetrix-provided volume, they mention the following:
> 
> "If Linux loses connectivity long enough, the disks disappear from the
> system. [...] For [this reason], EMC recommends that you do not boot a
> Linux host from the EMC storage array."

That probably means the EMC is returning a vendor specific (or perhaps even
a standard) sense code, you'd have to ask EMC specifically what they mean,
and what sense code is returned, or maybe check the logs on your other box
for specifics.

This also means that changing the scsi timeout will have no effect, since
the IO will complete without timing out.

See scsi_error.c scsi_check_sense for how they are handled. One important
piece is that the VENDOR SPECIFIC sense key (value of 9 in
sense_buffer[2]) falls into the default case of SUCCESS (i.e. complete the
IO as failed). Whatever the code is (if there is one), you would
effectively want scsi_check_sense to return NEEDS_RETRY rather than SUCCESS.
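
A minimal, standalone sketch of that idea follows. It is not the actual
scsi_error.c code: the enum, the macro names, the function name
check_sense_sketch and the retry_vendor_specific flag are all hypothetical,
chosen only for illustration. It assumes fixed-format sense data, where the
sense key is the low four bits of byte 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dispositions named above; the real
 * kernel constants and their values are not reproduced here. */
enum disposition { DISP_NEEDS_RETRY, DISP_SUCCESS };

#define SENSE_KEY_MASK        0x0f /* sense key: low nibble of byte 2 */
#define SENSE_VENDOR_SPECIFIC 0x09 /* the "value of 9" mentioned above */

/*
 * Illustration only. With retry_vendor_specific == 0 this mirrors the
 * behaviour described above: the vendor specific key is not special-cased,
 * so the IO is completed (as failed). With the flag set, the command is
 * handed back for a retry instead.
 */
enum disposition check_sense_sketch(const unsigned char *sense_buffer,
                                    int retry_vendor_specific)
{
    assert(sense_buffer != NULL);

    switch (sense_buffer[2] & SENSE_KEY_MASK) {
    case SENSE_VENDOR_SPECIFIC:
        if (retry_vendor_specific)
            return DISP_NEEDS_RETRY;
        return DISP_SUCCESS;
    default:
        /* Every other key is lumped together here for brevity. */
        return DISP_SUCCESS;
    }
}
```

A real change would also have to look at the ASC/ASCQ values the array
returns, so that only the transient condition is retried and not every
vendor specific error.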

There is also some range (can't remember what, or where they are in the
SCSI spec) for vendor specific ASC/ASCQ for any (or all?) sense keys.

We don't have a way (2.4 or 2.5) to dynamically (and cleanly) add vendor
specific sense codes and handling of them.

> When making an online configuration change on the Symmetrix (such as
> remapping volumes), it is possible for the attached hosts to experience
> a temporary error while accessing a storage array volume. For example,
> when changing the Symmetrix configuration, it is not uncommon for the
> RS/6000s (also attached to the SAN) to log one or two temporary
> SCSI-errors. They don't cause any problems at all, the AIX volume manager
> never notices a problem.


> Does the warning describe a real-world possibility? For example, what is
> "long enough"?

Long enough probably means long enough for any IO to be sent to the EMC.

-- Patrick Mansfield

* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16 15:56     ` James Bourne
@ 2003-04-16 16:25       ` jds
  2003-04-16 16:43       ` Michael Clark
  1 sibling, 0 replies; 8+ messages in thread
From: jds @ 2003-04-16 16:25 UTC (permalink / raw)
  To: James Bourne, Michael Clark; +Cc: Lincoln Dale, Jurjen Oskam, linux-kernel


Hi:

  We currently use the qlogic qla2200 driver with Hitachi storage, and it
works perfectly in Linux. Whether you boot from the qlogic card or from SCSI
is defined by the order in which the modules are loaded.

  I have two Qlogic Fibre Channel cards in failover to the Hitachi.

  Regards





* Re: Booting from Qlogic qla2300 fibre channel card
  2003-04-16 15:56     ` James Bourne
  2003-04-16 16:25       ` jds
@ 2003-04-16 16:43       ` Michael Clark
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Clark @ 2003-04-16 16:43 UTC (permalink / raw)
  To: James Bourne; +Cc: Lincoln Dale, Jurjen Oskam, linux-kernel

Actually i just realized after checking - we are using 6.04 also (standard
- not failover). The error and abort messages in my previous mail were from
the 6.04 driver.

Happens on 2 different machines that do around 200 IOs/sec during the day.

We are beginning to suspect heat from an e1000 in a slot next door.
All the crashes occur when our thermostat switches to one aircon instead
of two although the ambient temp is around 25 Celsius which is still
relatively cool. Sometimes after the failure, the card will fail to
re-initialise after a cold boot but works after leaving the machine
off for about 20 minutes.

~mc


