* Problems with 3ware error event notification in kernel
@ 2001-08-15 15:46 Neulinger, Nathan
  2001-09-15 23:45 ` David Johnson
  0 siblings, 1 reply; 3+ messages in thread
From: Neulinger, Nathan @ 2001-08-15 15:46 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org'

I've got a situation with the 3ware driver and 3dm monitor where it appears
to stop receiving notification of status changes from the kernel.

I've seen this with 2.2.19+variouspatches and 2.4.7-ac3. (On some other
machines, it appears to work fine.)

Basically, real-time monitoring, instantaneous status requests, and
configuration changes all work fine, but after a while the 3dm monitor no
longer gets messages about drive failures (pulling a drive on a hot-swap
tray) or about rebuilds starting and stopping. Even restarting the 3dm
monitor doesn't seem to help.

Strace doesn't seem to work on the 3dm executable since it's threaded... (Is
there a way to get that to work?)

Anyone have any ideas on this or have you seen this? The important thing is,
if I reboot the server it's on, I'll generally be able to get a few alert
messages, but after some time period (I think it's time based and not
#-of-alerts based) it no longer receives new alert messages.

3ware tech support doesn't have any idea on what could be wrong and didn't
have any suggestions of what to try other than harping about what version of
sendmail I'm running, etc. (Turning off the mail support didn't have any
effect on the alert processing.)

----

On a side note - is anyone aware of any effort to reverse engineer the
status probing code so that we could monitor the RAID arrays using something
other than 3ware's 3dm tool? I presume it's just a matter of knowing the
right ioctls and parameters to issue, but there is no info on this, and 3ware
has no plans (according to tech support) to release any documentation or
source for the monitoring code/protocols. Being able to strace 3dmd
would likely make this a lot easier, but I haven't been able to get strace to
work against it.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


* Re: Problems with 3ware error event notification in kernel
  2001-08-15 15:46 Problems with 3ware error event notification in kernel Neulinger, Nathan
@ 2001-09-15 23:45 ` David Johnson
  2001-09-15 23:48   ` Nathan Neulinger
  0 siblings, 1 reply; 3+ messages in thread
From: David Johnson @ 2001-09-15 23:45 UTC (permalink / raw)
  To: Neulinger, Nathan, linux-kernel

On 8/15/01, Neulinger, Nathan wrote:
>I've got a situation with the 3ware driver and 3dm monitor where it appears
>to stop receiving notification of status changes from the kernel.
>
>I've seen this with 2.2.19+variouspatches and 2.4.7-ac3. (On some other
>machines, it appears to work fine.)
>
>Basically, real-time monitoring, instantaneous status requests, and
>configuration changes all work fine, but after a while the 3dm monitor no
>longer gets messages about drive failures (pulling a drive on a hot-swap
>tray) or about rebuilds starting and stopping. Even restarting the 3dm
>monitor doesn't seem to help.

I've noticed this problem several times.  It's not just the email 
notification.  The 'alarms' section of 3dm as well as the syslog 
entries just stop happening.

>Strace doesn't seem to work on the 3dm executable since it's threaded... (Is
>there a way to get that to work?)

Have you tried '-f -F' ?  I haven't tried it since my controller is 
on a production system.

>Anyone have any ideas on this or have you seen this? The important thing is,
>if I reboot the server it's on, I'll generally be able to get a few alert
>messages, but after some time period (I think it's time based and not
>#-of-alerts based) it no longer receives new alert messages.
>
>3ware tech support doesn't have any idea on what could be wrong and didn't
>have any suggestions of what to try other than harping about what version of
>sendmail I'm running, etc. (Turning off the mail support didn't have any
>effect on the alert processing.)
>
>----
>
>On a side note - is anyone aware of any effort to reverse engineer the
>status probing code so that we could monitor the RAID arrays using something
>other than 3ware's 3dm tool? I presume it's just a matter of knowing the
>right ioctls and parameters to issue, but there is no info on this, and 3ware
>has no plans (according to tech support) to release any documentation or
>source for the monitoring code/protocols. Being able to strace 3dmd
>would likely make this a lot easier, but I haven't been able to get strace to
>work against it.
>

On another note something (I assume 3dm) loves to look for eth 
interfaces just prior to every event:

Sep  4 15:30:20 alliance modprobe: modprobe: Can't locate module eth3
Sep  4 15:30:20 alliance modprobe: modprobe: Can't locate module eth1
Sep  4 15:30:20 alliance 3w-xxxx[874]: Drive error encountered on port 2 on controller ID:1. Check cables and drives for media errors. (0xa)

Somewhat annoying, and very unnerving, as it shouldn't be doing anything
with the networking setup.  Anyone else seen this?
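
To check whether the failed eth lookups really accompany every 3w-xxxx
event, the syslog could be scanned for timestamps that carry both kinds of
message. The following is only a minimal sketch, not anything shipped with
3dm: the function name `correlate` and both regular expressions are
assumptions modelled on the three log lines quoted above, and it treats
"same second" as "just prior", which is a rough approximation.

```python
import re
from collections import defaultdict

# Matches classic syslog lines such as:
#   "Sep  4 15:30:20 alliance modprobe: modprobe: Can't locate module eth3"
# The layout is assumed from the excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)
MODPROBE_RE = re.compile(r"Can't locate module (\S+)")


def correlate(lines):
    """Return {timestamp: [module names]} for every timestamp that has both
    a 3w-xxxx message and at least one failed modprobe lookup."""
    probes = defaultdict(list)   # timestamp -> modules modprobe could not find
    events = set()               # timestamps with a 3w-xxxx message
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue             # skip lines that are not in the assumed layout
        if m.group("tag") == "modprobe":
            hit = MODPROBE_RE.search(m.group("msg"))
            if hit:
                probes[m.group("stamp")].append(hit.group(1))
        elif m.group("tag") == "3w-xxxx":
            events.add(m.group("stamp"))
    return {stamp: probes[stamp] for stamp in events if stamp in probes}
```

Feeding it the lines of the syslog file (with each entry on a single line)
and comparing the number of returned timestamps against the total number of
3w-xxxx entries would show whether the probes occur on every event or only
on some of them.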




* Re: Problems with 3ware error event notification in kernel
  2001-09-15 23:45 ` David Johnson
@ 2001-09-15 23:48   ` Nathan Neulinger
  0 siblings, 0 replies; 3+ messages in thread
From: Nathan Neulinger @ 2001-09-15 23:48 UTC (permalink / raw)
  To: David Johnson; +Cc: linux-kernel, uetrecht

David Johnson wrote:
> 
> On 8/15/01, Neulinger, Nathan wrote:
> >I've got a situation with the 3ware driver and 3dm monitor where it appears
> >to stop receiving notification of status changes from the kernel.
> >
> >I've seen this with 2.2.19+variouspatches and 2.4.7-ac3. (On some other
> >machines, it appears to work fine.)
> >
> >Basically, real-time monitoring, instantaneous status requests, and
> >configuration changes all work fine, but after a while the 3dm monitor no
> >longer gets messages about drive failures (pulling a drive on a hot-swap
> >tray) or about rebuilds starting and stopping. Even restarting the 3dm
> >monitor doesn't seem to help.
> 
> I've noticed this problem several times.  It's not just the email
> notification.  The 'alarms' section of 3dm as well as the syslog
> entries just stop happening.
> 
> >Strace doesn't seem to work on the 3dm executable since it's threaded... (Is
> >there a way to get that to work?)
> 
> Have you tried '-f -F' ?  I haven't tried it since my controller is
> on a production system.

Yeah, it seems like the threads are causing an issue. Even attaching to
an existing process fails.

> On another note something (I assume 3dm) loves to look for eth
> interfaces just prior to every event:
> 
> Sep  4 15:30:20 alliance modprobe: modprobe: Can't locate module eth3
> Sep  4 15:30:20 alliance modprobe: modprobe: Can't locate module eth1
> Sep  4 15:30:20 alliance 3w-xxxx[874]: Drive error encountered on port 2 on controller ID:1. Check cables and drives for media errors. (0xa)
> 
> Somewhat annoying, and very unnerving, as it shouldn't be doing anything
> with the networking setup.  Anyone else seen this?

Hmm... I haven't seen that; it's a bit odd, though. Perhaps something in
your modules.conf is causing the probes when 3dm tries to send mail or
something similar?
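
One way to test that guess is to see which of the probed names have no
alias line at all in modules.conf, since "Can't locate module ethN" is what
modprobe typically reports when it is asked for a name it has no module or
alias for. The sketch below is an assumption-laden illustration: the helper
name `unaliased` is made up, it assumes the modutils-style
`alias <name> <module>` syntax, and the file path in the usage note is the
usual location rather than something confirmed in this thread.

```python
import re

# "alias <name> <module>" as used in modutils-style modules.conf files.
ALIAS_RE = re.compile(r"^\s*alias\s+(\S+)\s+(\S+)")


def unaliased(conf_text, names):
    """Return the entries of `names` that have no alias line in conf_text."""
    aliased = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]      # drop trailing comments
        m = ALIAS_RE.match(line)
        if m:
            aliased.add(m.group(1))       # remember the aliased name
    return [n for n in names if n not in aliased]
```

Called as `unaliased(open("/etc/modules.conf").read(), ["eth1", "eth3"])`,
it lists the interface names from the log that modprobe cannot resolve. If
those interfaces genuinely don't exist, an `alias eth1 off` style entry is
the conventional way to silence the lookups, though that only hides the
message and doesn't explain why 3dm asks for the interfaces.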

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
CIS - Systems Programming                Fax: (573) 341-4216
