From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 28 Jun 2015 09:01:21 +0300
From: Andrei Borzenkov <arvidjaar@gmail.com>
To: Dale Carstensen  <dlc@lampinc.com>
Subject: Re: grub rescue read or write sector outside of partition
Message-ID: <20150628090121.465303f2@opensuse.site>
In-Reply-To: <20150627221525.55BF529C@lacn.los-alamos.net>
References: <20150626000953.M20169@lampinc.com>
	<20150626111114.247a6e3a@opensuse.site>
	<20150627221525.55BF529C@lacn.los-alamos.net>
X-Mailer: Claws Mail 3.11.0 (GTK+ 2.24.28; x86_64-suse-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: grub-devel@gnu.org

On Sat, 27 Jun 2015 15:17:38 -0700
Dale Carstensen  <dlc@lampinc.com> wrote:

>
> The bash history is long gone.  My feeble memory is that it
> was simply
>
>  grub2-install /dev/sdf
>
> and it responded there were no errors.
>
> Eventually I booted from a DVD and used chroot to do
>
>  grub2-install /dev/sdb
>
> The disk configuration, as shown by /proc/mdstat, is:
>
> md126 : active raid6 sdf8[5] sdd1[4] sdc1[3] sdb1[2] sda1[1]
>       87836160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
>
> md127 : active raid6 sdf10[5] sdd3[4] sdc3[3] sdb3[2] sda3[1]
>       840640512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       bitmap: 1/3 pages [4KB], 65536KB chunk
>
> / is mounted from md126p1, /home from md127p1.
>
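
For reference, the membership shown above can be read back programmatically. This is a minimal Python sketch; the name parse_mdstat and the embedded SAMPLE text are illustrative only. It assumes the simple "mdX : state level member[n] ..." layout of the active arrays quoted above and shows that sde8 and sde10 are no longer listed as members:

```python
import re

# One member token, e.g. "sdf8[5]" or "sdb1[2](S)" (the optional flag marks spare/faulty).
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](?:\(([A-Z])\))?")

# The two arrays quoted above, as they appear in /proc/mdstat.
SAMPLE = """md126 : active raid6 sdf8[5] sdd1[4] sdc1[3] sdb1[2] sda1[1]
      87836160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

md127 : active raid6 sdf10[5] sdd3[4] sdc3[3] sdb3[2] sda3[1]
      840640512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 1/3 pages [4KB], 65536KB chunk
"""


def parse_mdstat(text):
    """Map each array name to its state, RAID level and {device: (number, flag)}."""
    arrays = {}
    for line in text.splitlines():
        # Only array header lines start with "mdX : "; detail lines are indented.
        if not line.startswith("md") or " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        tokens = rest.split()
        level = None
        members = {}
        for tok in tokens[1:]:
            m = MEMBER_RE.fullmatch(tok)
            if m:
                dev, number, flag = m.groups()
                members[dev] = (int(number), flag)
            elif tok.startswith("raid") or tok in ("linear", "multipath"):
                level = tok
        arrays[name.strip()] = {"state": tokens[0], "level": level, "members": members}
    return arrays


for name, info in parse_mdstat(SAMPLE).items():
    print(name, info["level"], sorted(info["members"]))
```

On a live system one would pass open("/proc/mdstat").read() instead of SAMPLE. Note that absence from /proc/mdstat says nothing about whether an old superblock is still present on the replaced partitions.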

Do you really have partitioned MD RAID? Why? The only reason to have one
is firmware RAID, but here you already build the MD arrays from
partitions. Having additional partitions on top of them just
complicates things.

> The sdf8 and sdf10 partitions are on the replacement drive.
> The former partitions those replaced are still on sde8 and
> sde10.
>

Was drive sde still present in the system after you had replaced it
with sdf in the MD RAID? You said it failed; how exactly? Was it a media
failure on some sectors, or a complete failure that made the drive
inaccessible?

> Grub calls sde (hd0), sdf (hd1), md126 (md/1) and md127 (md/3).
> The DVD boot calls sde sda, and sdf sdb.  All neatly made
> consistent by those long UUID strings.  And grub calls
> md126p1 (md/1,gpt1), but for command input seems to like
> (md/1,1) without the label-type distinction.
>

Yes, you do indeed have partitioned RAID. And GRUB even works with it.
Wow! :)

...
>
> Well, it seems to work again.
>
> The first baby step was to make a partition on (hd1)/sdb/sdf
> starting at block 34 and ending at block 2047.  Partition 8
> begins at block 2048, and originally I set it to type ef02.

If at this stage you ran grub2-install /dev/sdf *and* it completed
without errors, it must have overwritten the beginning of /dev/sdf8,
because partition 8 was marked as the BIOS boot partition (ef02). I
wonder what we can do to check that the partition is not in use. Maybe
opening it exclusively would help in this case.
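
A minimal sketch of the exclusive-open idea in Python. It assumes the Linux behaviour documented in open(2): O_EXCL without O_CREAT on a block device fails with EBUSY while the device is mounted or otherwise claimed, for example by md holding an array member. The helper name partition_in_use is hypothetical, and this is not what grub2-install currently does:

```python
import errno
import os
import stat


def partition_in_use(path):
    """Return True if the kernel refuses an exclusive open of block device `path`.

    Linux-specific: O_EXCL without O_CREAT is only meaningful for block
    devices; for anything else the behaviour is undefined, so refuse early.
    """
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return True  # mounted, or claimed by md/dm/another holder
        raise  # permission problems etc. are not an answer either way
    os.close(fd)
    return False
```

Run as root, partition_in_use("/dev/sdf8") would be expected to return True while md126 is assembled, which is exactly the case where writing boot code into it must be refused.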

> Then I changed it to fd00 and made the block 34 partition (11)
> type ef02.  I tried to make that partition 11 ext3 and
> put some of /boot in it, but obviously it's way too short
> for anything but grub's persistent environment.  So I used
> dd with if=/dev/zero to clear it.  And I did grub2-install
> with the --recheck option.  All while booted from DVD and
> using chroot, keeping in mind the device was /dev/sdb.
>
> That avoided "grub rescue", but the only kernel it found was
> the old one, 3.8.11.
>
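
The size of the new partition 11 follows directly from the sector numbers above. Here is a small Python sketch; it assumes 512-byte logical sectors (not stated in the thread) and inclusive start and end sectors:

```python
SECTOR_SIZE = 512        # assumed logical sector size
BIOS_BOOT = (34, 2047)   # partition 11, type ef02: first and last sector, inclusive
SDF8_START = 2048        # first sector of sdf8


def size_bytes(first, last, sector_size=SECTOR_SIZE):
    """Size of a partition whose first and last sectors are both inclusive."""
    return (last - first + 1) * sector_size


def ends_before(part, next_start):
    """True if the partition's last sector lies strictly before next_start."""
    return part[1] < next_start


print(size_bytes(*BIOS_BOOT))              # 1031168 (bytes, i.e. 1007 KiB)
print(ends_before(BIOS_BOOT, SDF8_START))  # True
```

So partition 11 is 2014 sectors, just under 1 MiB, and ends immediately before sdf8 begins. Any boot code embedded there no longer touches the MD member.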

I am not yet sure I understand how that could fix "grub rescue".

> I stabbed in the dark through another 4 or 5 reboots, until
> eventually pager=1 and cat to look at /boot/grub/grub.cfg
> showed that the only menuentry in it for Linux was for
> 3.8.11, while I knew the latest grub.cfg I had also had
> the new 4.0.5, as well as older 3.8.9 and 3.8.7 ones.
> I'm still not sure where that grub.cfg came from, but I
> made the assumption that it had to do with grub being too
> liberal about failed members of RAID6 partitions.
>
> So I ran
>
>  mdadm --zero-superblock /dev/sde8
>
> and also for 10.
>=20
> I think that fixed things.

Yes, *that* is quite possible. Unfortunately, GRUB does not currently
check whether the superblock generations of the member disks match each
other, and it stops scanning as soon as enough disks are found, so it
could pick up stale pieces from sde instead of the new ones from sdf.
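
To illustrate the failure mode, here is a toy Python model. It is not GRUB's actual diskfilter code. It compares taking members in scan order until enough roles are filled against first discarding members whose superblock event counter (the "Events" value that mdadm --examine prints) lags behind. The role assignments and event counts are invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    device: str  # partition name as seen by the booted system
    role: int    # position of this member within the array
    events: int  # superblock event counter (hypothetical values below)


def first_n(scan, needed):
    """Naive model: accept members in scan order until `needed` roles are filled."""
    chosen = {}
    for m in scan:
        chosen.setdefault(m.role, m)  # first device seen for a role wins
        if len(chosen) == needed:
            break
    return [chosen[r] for r in sorted(chosen)]


def newest_only(scan, needed):
    """Generation-aware model: ignore members whose event count is behind."""
    newest = max(m.events for m in scan)
    return first_n([m for m in scan if m.events == newest], needed)


# Hypothetical scan: sde8 (hd0) is scanned first and still carries an old
# superblock for role 0, which sdf8 has since taken over.
SCAN = [
    Member("sde8", 0, 100),
    Member("sdf8", 0, 250),
    Member("sda1", 1, 250),
    Member("sdb1", 2, 250),
    Member("sdc1", 3, 250),
    Member("sdd1", 4, 250),
]
NEEDED = 3  # a 5-disk RAID6 survives two missing members, so 3 suffice

print([m.device for m in first_n(SCAN, NEEDED)])
print([m.device for m in newest_only(SCAN, NEEDED)])
```

The naive strategy reads from the stale sde8 together with current members, which would be consistent with an old grub.cfg turning up. The generation-aware one skips it.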

So, to be on the safe side, it is necessary to either remove the
replaced drive or zero it out, so that it is not detected as part of the
RAID.
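
For the "zero it out" route, here is a quick way to double-check a partition afterwards, sketched in Python. It assumes the 1.2 metadata format reported in /proc/mdstat above, where the superblock sits 4 KiB from the start of the device and begins with the little-endian magic 0xa92b4efc. The helper name is hypothetical. Other metadata versions store the superblock at other offsets, so this does not replace mdadm --examine:

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC  # md superblock magic number
V1_2_OFFSET = 4096        # metadata 1.2: superblock starts 4 KiB into the device


def has_md_v12_superblock(path):
    """True if `path` carries an md 1.2 superblock magic at the 4 KiB offset."""
    with open(path, "rb") as dev:
        dev.seek(V1_2_OFFSET)
        raw = dev.read(4)
    # The magic is stored little-endian; a short read means no superblock.
    return len(raw) == 4 and struct.unpack("<I", raw)[0] == MD_SB_MAGIC
```

After mdadm --zero-superblock /dev/sde8, has_md_v12_superblock("/dev/sde8") (run as root) should return False.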

Anyway, I'm happy you fixed it, and thank you very much for sharing your
experience; it is quite helpful!