From: Chris Murphy <lists@colorremedies.com>
To: Fredrik Tolf <fredrik@dolda2000.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Rebalancing RAID1
Date: Thu, 14 Feb 2013 00:27:21 -0700
Message-ID: <17E4F30E-2945-44FE-A87F-0F475DFD794F@colorremedies.com>
In-Reply-To: <alpine.DEB.2.02.1302140733070.8810@shack.dolda2000.com>


On Feb 13, 2013, at 11:42 PM, Fredrik Tolf <fredrik@dolda2000.com> wrote:

> 
> That's interesting to read. I haven't ever actually experienced missing a bad sector reported by a hard drive, though; and not for a lack of experience with bad sectors.

That experience is consistent with a consumer drive whose ECC (error recovery) timeout is longer than Linux's command timeout. Well before the drive gives up, Linux does (by default, anyway).
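To make the timeout mismatch concrete, here is a minimal Python sketch. It assumes the kernel's per-device command timeout is exposed in seconds at /sys/block/<dev>/device/timeout (commonly 30 by default). The function names and the 120-second drive recovery figure are illustrative assumptions, not values taken from your drives.

```python
from pathlib import Path

# Common Linux default for the per-device command timeout, in seconds.
DEFAULT_KERNEL_TIMEOUT_S = 30


def kernel_command_timeout(device, sysfs_root="/sys/block"):
    """Read the kernel's command timeout (seconds) for a device such as 'sdd'."""
    path = Path(sysfs_root) / device / "device" / "timeout"
    return int(path.read_text().strip())


def kernel_gives_up_first(drive_recovery_s, kernel_timeout_s=DEFAULT_KERNEL_TIMEOUT_S):
    """True when the drive may still be retrying a bad sector after the
    kernel has already timed the command out."""
    return drive_recovery_s > kernel_timeout_s


# Hypothetical consumer drive that retries for up to two minutes.
print(kernel_gives_up_first(120))
# Hypothetical drive limited to 7 seconds of error recovery.
print(kernel_gives_up_first(7))
```

When the first case applies, the kernel resets the link before the drive ever reports the bad sector, which would explain never seeing the drive's own read error.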

>> However, in your case, with both the kernel message ICRC ABRT, and the following SMART entry, this is your cable problem.
> 
> ... I'd still like to solve the problem as it is, so that I know what to do the next time I get some device error.

It depends on the error, but top of that list would be to stop writing to the disk. The last thing I'd do is a rebalance.

> 
>> So the question is whether the cable problem has actually been fixed, and if you're still getting ICRC errors from the kernel.
> 
> I'm not getting any block-layer errors from the kernel. The errors I posted originally are the only ones I'm getting.

Previously you reported:
Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }
Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }

These are not block errors; they come from the ATA layer, and ICRC is an interface CRC error on the link. You should not proceed until you're certain this isn't still intermittently occurring.
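One way to check for recurrence is to scan the kernel log for ATA error lines carrying the ICRC flag. The Python sketch below is only an illustration: the regular expression is an assumption based on the two lines quoted above, and the function name icrc_events is hypothetical. Note that the bracketed timestamp is seconds since boot, so it restarts from zero after a reboot.

```python
import re

# Matches lines such as:
#   Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }
ATA_ERROR_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<port>ata\d+(?:\.\d+)?): "
    r"error: \{(?P<flags>[^}]*)\}"
)


def icrc_events(log_lines, since_uptime=0.0):
    """Return (uptime, port) for each ATA error line with the ICRC flag set
    at or after since_uptime (seconds since boot, as printed by the kernel)."""
    events = []
    for line in log_lines:
        match = ATA_ERROR_RE.search(line)
        if not match:
            continue  # not an ATA "error:" line (e.g. a "status:" line)
        uptime = float(match.group("uptime"))
        flags = match.group("flags").split()
        if "ICRC" in flags and uptime >= since_uptime:
            events.append((uptime, match.group("port")))
    return events


sample = [
    "Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }",
    "Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }",
]
print(icrc_events(sample))
```

Passing a since_uptime value from after the cable was reseated, and getting an empty list back over a period of heavy I/O, would be some evidence that the link problem is gone.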


> With the general change, I actually decreased the number of drives in the system from 10 to 8, so unless the new drives are incredibly more power-hungry than the old ones, that shouldn't be a problem.

I'd find out and be certain. Insufficient power is one of the possible causes of that ICRC error, not just a cable problem.


>> Once that's solved, you should do a scrub, rather than a rebalance.
> 
> Oh, will scrubbing actually rebalance the array? I was under the impression that it only checked for bad checksums.

Scrubbing does not balance the volume. Based on the information you supplied I don't really see the reason for a rebalance.

What you do next depends on what your goal is for this data, on these two disks, using btrfs. If the goal is to trust the data on the volume, then since you still have the source data, I'd run mkfs.btrfs on the disks and start over. If the goal is to experiment and learn, you might want to run btrfsck, followed by a scrub.
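The two paths can be sketched as a small Python helper that only builds the command lines and never runs them. The helper name next_steps, the device names /dev/sdX1 and /dev/sdY1, and the mount point are placeholders; the raid1 profile options are an assumption based on the subject of this thread. btrfsck is meant for an unmounted filesystem, while scrub runs on a mounted one, and mkfs.btrfs destroys whatever is on the devices.

```python
def next_steps(goal, devices, mountpoint):
    """Return the commands (as argument lists) for the chosen goal.

    Nothing is executed here; the caller decides whether to run them.
    """
    if goal == "trust":
        # Start over: recreate the filesystem with RAID1 data and metadata.
        return [["mkfs.btrfs", "-d", "raid1", "-m", "raid1", *devices]]
    if goal == "experiment":
        return [
            # Check the unmounted filesystem via one of its devices.
            ["btrfsck", devices[0]],
            # After mounting again, verify checksums; -B stays in the foreground.
            ["btrfs", "scrub", "start", "-B", mountpoint],
        ]
    raise ValueError(f"unknown goal: {goal!r}")


for cmd in next_steps("experiment", ["/dev/sdX1", "/dev/sdY1"], "/mnt/data"):
    print(" ".join(cmd))
```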


> I'm still wondering what those errors actually mean, though. I'm still getting them occasionally, even when I'm not rebalancing (just not as often). I'm also very curious about what it means that it's still complaining about sdd rather than sdi.

I have no idea what errors you're still getting, or in what context. This:

Feb 12 22:57:45 nerv kernel: [59626.644110] lost page write due to I/O error on /dev/sdd1

This:
Feb 12 16:36:51 nerv kernel: [36769.574831] ata6.00: status: { DRDY ERR }
Feb 12 16:36:52 nerv kernel: [36769.578867] ata6.00: error: { ICRC ABRT }

Neither of these is a btrfs error. So if you're still getting them, you still have hardware problems to figure out.


> 
> It's worth noting that I still haven't un- and remounted the filesystem since the drive disconnected. I assumed that I shouldn't need to and that the multiple-device layer of btrfs should handle the situation correctly. Is that assumption correct?

Btrfs is stable on stable hardware. Your hardware most definitely was not stable during a series of writes. So I'd say all bets are off. That doesn't mean it can't be fixed, but the very fact you're still getting errors indicates something is still wrong.


Chris Murphy
