From: "Justin Stephenson" <justin@evensteveninc.com>
To: Roger Heflin <rogerheflin@gmail.com>
Cc: stan@hardwarefreak.com, Linux RAID <linux-raid@vger.kernel.org>
Subject: Re[4]: RAID 6 crashes system when being accessed
Date: Sat, 05 Jul 2014 19:22:33 +0000
Message-ID: <em48073ef1-3321-4b13-b0f7-835a907c4967@littlez>
In-Reply-To: <CAAMCDefrCAQNLgPc54uUQZVW8TGDPzR-JRpmJo+raRj3uZd4eQ@mail.gmail.com>

Hello Roger,

Thank you for your email and for laying out some troubleshooting steps 
for me. I will take these to heart and keep them on file for the future.

I can report that during the crashes there was a screen of rapidly 
scrolling text and some kind of memory dump with a progress indicator. 
From what I could see, there was a kernel panic and a message about 
ATA-9. I found nothing in /var/log/messages, as far as I could see.

I had tried unmounting and running fsck before, but not with the 
-f -y flags you specified.

Here are the steps I took based on your input.

- ran the system overnight with the md raid unmounted.
- let the resync run to completion.
- performed fsck -f -y. It took approx. 6 minutes (on a 12TB volume). No 
errors were reported in the printout.
- rebooted.
- locally initiated and completed a 22 GB copy in both directions 
between the md raid and a local eSATA external drive.
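
As a rough sketch of the preconditions in the steps above (no resync 
running, volume unmounted, then a forced check), something like the 
following could be used. The device /dev/md0, the mount point /mnt/raid 
and the sample /proc/mdstat text are assumptions made up for 
illustration; they are not taken from this system. The parser only 
recognises sync actions reported with a percentage, so states such as a 
delayed or pending resync are not covered.

```python
import re

# Illustrative /proc/mdstat excerpt. Device names and numbers are invented
# for this sketch and do not come from the system discussed in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      14650670080 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
      [==>..................]  resync = 12.6% (369340416/2930134016) finish=310.2min speed=137544K/sec

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
ACTION_RE = re.compile(r"\b(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")


def sync_actions(mdstat_text):
    """Map each md device to (action, percent) while a sync action is running."""
    running = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        action = ACTION_RE.search(line)
        if action and current:
            running[current] = (action.group(1), float(action.group(2)))
    return running


def forced_fsck_commands(device, mount_point):
    """Return the command lines for an unmount followed by a forced check.

    The commands are only built here, not executed; both arguments are
    hypothetical placeholders supplied by the caller.
    """
    return [
        ["umount", mount_point],
        ["fsck", "-f", "-y", device],
    ]


if __name__ == "__main__":
    busy = sync_actions(SAMPLE_MDSTAT)
    if busy:
        print("sync still running:", busy)
    else:
        for cmd in forced_fsck_commands("/dev/md0", "/mnt/raid"):
            print(" ".join(cmd))
```

On a live system the text would be read from /proc/mdstat instead of 
the sample string, and the printed commands would be run as root only 
once no sync action is listed.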

---

- from a workstation, opened an SMB share to the md raid.
- from the workstation, initiated a copy to and from the CentOS box (and 
md drive) of the same 22 GB folder over SMB.
- opened a VNC client to the CentOS box from a workstation.

Up until the fsck -f -y, any of these three operations would cause a 
crash.


In summary, it would seem that the issue has been resolved by the fsck 
-f -y. Up until running fsck -f -y, the system was completely 
unpredictable when the MD drive was mounted - either during a sync or 
after it was completed. I find this surprising, but perhaps I should 
not?

Based on Stan's email, I checked my UPS power settings, and I am certain 
I was ending up with a hard power-down when the battery ran out. I have 
remedied this.
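
The timing problem Stan described can be sketched as simple arithmetic: 
the battery has to outlast both the delay before the shutdown is 
triggered and the shutdown itself. The function names and every number 
below are hypothetical placeholders, not measurements from this system 
or settings from any particular UPS software.

```python
def shutdown_margin_seconds(battery_runtime_s, shutdown_trigger_s, shutdown_duration_s):
    """Seconds of battery left once a clean shutdown has finished.

    battery_runtime_s   -- how long the UPS can carry the load on battery
    shutdown_trigger_s  -- delay between the outage and the start of shutdown
    shutdown_duration_s -- time a clean shutdown takes, including stopping
                           the array
    A negative result means power is cut before the shutdown completes.
    """
    return battery_runtime_s - (shutdown_trigger_s + shutdown_duration_s)


def hard_powerdown_likely(battery_runtime_s, shutdown_trigger_s, shutdown_duration_s):
    """True when the battery is expected to run out mid-shutdown."""
    margin = shutdown_margin_seconds(
        battery_runtime_s, shutdown_trigger_s, shutdown_duration_s
    )
    return margin < 0


if __name__ == "__main__":
    # Hypothetical case: 5 minutes of runtime, shutdown triggered after
    # 4 minutes, and a shutdown that needs 2 minutes.
    print(shutdown_margin_seconds(300, 240, 120))
    print(hard_powerdown_likely(300, 240, 120))
```

Triggering the shutdown earlier or using a larger battery both widen 
the margin.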

Could this have caused the MD volume to become unstable?

In any event, everything is up and running. I will report back with a 
log entry if anything else appears.

Thanks again,

- Justin



------ Original Message ------
From: "Roger Heflin" <rogerheflin@gmail.com>
To: "Justin Stephenson" <justin@evensteveninc.com>
Cc: stan@hardwarefreak.com; "Linux RAID" <linux-raid@vger.kernel.org>
Sent: 05/07/2014 12:17:45 AM
Subject: Re: Re[2]: RAID 6 crashes system when being accessed

>Some questions.
>
>Do you get any messages on the screen when it crashes and/or is there
>anything in /var/log/messages from the crashes?
>
>Is a sync running when it crashes? If so what kind of SATA
>controllers/setup are you using? I have had 2 previous setups that
>would run fairly stably so long as a sync was not running, but if a
>sync was running then the machine became unstable.
>
>Did you umount it and run a "fsck -f -y" that took a while (at least
>30 seconds), or did you just umount it and run fsck, which finished
>quickly and indicated clean? Generally, if you umount it cleanly, the fs
>thinks it is clean even when it is not because of some previous event.
>
>On Fri, Jul 4, 2014 at 8:08 PM, Justin Stephenson
><justin@evensteveninc.com> wrote:
>>  Hi,
>>
>>  Thanks for your reply.
>>
>>  I should clarify that the crashes continue to be an issue in the absence of
>>  any power outage so this issue is now independent of power. I mentioned the
>>  UPS only with the thought that my problems may have been caused by a sudden
>>  power-down.
>>
>>  Please let me know if there are any logs or status print outs I could pull
>>  to help troubleshoot this.
>>
>>  Thanks Again,
>>
>>  - J
>>
>>
>>
>>
>>  ------ Original Message ------
>>  From: "Stan Hoeppner" <stan@hardwarefreak.com>
>>  To: "Justin Stephenson" <justin@evensteveninc.com>;
>>  linux-raid@vger.kernel.org
>>  Sent: 04/07/2014 3:34:17 PM
>>  Subject: Re: RAID 6 crashes system when being accessed
>>
>>>  On 7/4/2014 9:11 AM, Justin Stephenson wrote:
>>>>
>>>>   Hello,
>>>>
>>>>   I am experiencing some issues with my md raid. It is crashing my system
>>>>   when accessed with any "verve". The reboot initiates a resync of the
>>>>   raid. I have gone through the crash/reboot/resync cycle a number of times
>>>>   now and the crash happens within minutes of mounting the raid.
>>>>
>>>>   Here are some details:
>>>>
>>>>   - It is a raid 6 with 7 3TB devices.
>>>>   - Formatted as EXT4
>>>>   - mdadm v3.2.6 - 25th October 2012
>>>>   - centos 6.5 kernel 2.6.32-431.3.1.el6.x86_64
>>>>   - It has been running flawlessly for the previous 6 months.
>>>>   - I have a cron script running that resyncs monthly.
>>>>   - When the raid is unmounted, the system runs fine. (I have an
>>>>   additional "dumb" hardware raid 1 for dailies attached to an ESATA port.
>>>>   This runs perfectly).
>>>>   - I am in the process of re-syncing the raid 6 again right now.
>>>>   - I have run an fsck on the raid volume after it was fully synced and
>>>>   everything came up clean.
>>>>
>>>>   - there have been lots of power outages the last while with the hot
>>>>   summer in Toronto. My UPS shuts the system down for me, though I think I
>>>>   can correlate the issues with the power outages.
>>>
>>>
>>>  This sounds like the UPS is cutting power to the system before the
>>>  shutdown sequence completes, before the array is stopped. This assumes
>>>  you are already using apcupsd or similar. If you are, check the
>>>  configuration to make sure the system has plenty of time to shut down
>>>  after the UPS sends notification to the system. If you are not, then
>>>  this will always happen as the UPS is simply cutting power when the
>>>  battery gets low.
>>>
>>>  Note that if the UPS is undersized for this system and only yields a few
>>>  minutes of on-battery time, it may simply not have enough juice to keep
>>>  the machine up throughout the shutdown process.
>>>
>>>  In summary, either your shutdown software isn't configured properly, you
>>>  are not using it, or the UPS is too small. This isn't an md problem.
>>>
>>>
>>>  Cheers,
>>>
>>>  Stan
>>
>>
>>  --
>>  To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>  the body of a message to majordomo@vger.kernel.org
>>  More majordomo info at http://vger.kernel.org/majordomo-info.html



Thread overview: 8+ messages
2014-07-04 14:11 RAID 6 crashes system when being accessed Justin Stephenson
2014-07-04 19:34 ` Stan Hoeppner
2014-07-05  1:08   ` Re[2]: " Justin Stephenson
2014-07-05  4:17     ` Roger Heflin
2014-07-05 19:22       ` Justin Stephenson [this message]
2014-07-05 20:42         ` Re[4]: " Roger Heflin
2014-07-07  0:54           ` Re[6]: " Justin Stephenson
2014-07-07  1:56             ` Roger Heflin
