To: Andrei Borzenkov
Subject: Re: grub rescue read or write sector outside of partition
In-reply-to: <20150626111114.247a6e3a@opensuse.site>
References: <20150626000953.M20169@lampinc.com> <20150626111114.247a6e3a@opensuse.site>
Comments: In-reply-to Andrei Borzenkov message dated "Fri, 26 Jun 2015 11:11:14 +0300."
Date: Sat, 27 Jun 2015 15:17:38 -0700
From: Dale Carstensen
Message-Id: <20150627221525.55BF529C@lacn.los-alamos.net>
Cc: grub-devel@gnu.org

TL;DR it looks to me like grub has a problem with leaving failed
mdadm RAID6 members around

Thanks to Fajar A. Nugraha for the advice about --modules for
grub-install (seems to me to be undocumented). I managed to
stumble through without enhancing the commands for "grub rescue",
but it's good to know I could have.

I still have a question, though. The grub.cfg file has menuentry
nesting, with an outer name of "Gentoo GNU/Linux", and inner names
by version/recovery. But I can't find any documentation of how to
navigate to choose, say, 3.8.11, now that I've made 4.0.5 default.
Seems to me all the lines used to show up. Maybe I manually took
out the nesting before??

So what key(s) drill down into sub-menus on the grub menu? Did I
miss it in the info page / manual?

>Date: Fri, 26 Jun 2015 11:11:14 +0300
>To: "Dale Carstensen"
>cc: grub-devel@gnu.org
>From: Andrei Borzenkov
>Subject: Re: grub rescue read or write sector outside of partition

> Thu, 25 Jun 2015 17:33:25 -0700
>"Dale Carstensen" :
>
>> I had a drive fail, and it is the one that had grub on it.
>> It had parts of two RAID-6 partitions, too. So I bought a
>> new drive and added partitions on it to replace the failed
>> RAID-6 parts. That was still booting OK from the failed
>> drive, but then I updated the kernel, and I decided to also
>> install a new grub on the new drive.
>
>How?
>Please show exact commands you used as well as your disk
>configuration.

The bash history is long gone. My feeble memory is that it was
simply

 grub2-install /dev/sdf

and it responded there were no errors.

Eventually I booted from a DVD and used chroot to do

 grub2-install /dev/sdb

The disk configuration, as shown by /proc/mdstat, is:

md126 : active raid6 sdf8[5] sdd1[4] sdc1[3] sdb1[2] sda1[1]
      87836160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

md127 : active raid6 sdf10[5] sdd3[4] sdc3[3] sdb3[2] sda3[1]
      840640512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

/ is mounted from md126p1, /home from md127p1.

The sdf8 and sdf10 partitions are on the replacement drive.
The former partitions those replaced are still on sde8 and
sde10.

Grub calls sde (hd0), sdf (hd1), md126 (md/1) and md127 (md/3).

The DVD boot calls sde sda, and sdf sdb. All neatly made
consistent by those long UUID strings. And grub calls
md126p1 (md/1,gpt1), but for command input seems to like
(md/1,1) without the label-type distinction. Or maybe I
have md/1 and md/3 swapped?? I hope not.

The command that replaced the bad drive with the good in RAID6
was

 mdadm --add /dev/md126 /dev/sdf8

Below, I'll show what I think has made it stable and useful
again.

>> That seemed to go OK until I tried to reboot. I landed in
>> grub rescue. Fortunately I have several computers, so I can
>> look up documentation, etc. without my main desktop functioning.
>> Somewhere I found that grub rescue has only a few commands, none
>> of them "help" or a list of commands, and no TAB-expansions.
>> Well, they seem to be ls, set, unset and insmod. Supposedly,
>> running insmod normal, then normal, will get back to the
>> fuller set of commands with help, but that's where it gets
>> the "outside of partition" error, it seems.
>>
>> I can ls the /boot/grub/i386-pc/ directory, where normal.mod
>> is, so I would think grub rescue could find and read normal.mod,
>> too, but, I guess not.
>>
>
>Please show output of "set" command at this point.

In the original grub rescue event, I think set output this:

cmdpath=(hd0)
prefix=(mduuid/73fc9531-525f-05e9-6992-6654b5b95a33,1)/boot/grub
root=mduuid/73fc9531-525f-05e9-6992-6654b5b95a33,1

And the number 73fc...5a33 is the blkid for /dev/sdf8.

I think it was just the three variables. Note that I booted
from (hd1), but somehow cmdpath got diverted to (hd0), though
the UUID for prefix and root were still on (hd1). Unless I
misremember.

>
>> So, set debug=all helped a little, expanding the message
>> from just something like (I'd have to keep trying to
>> reboot to get it verbatim) read or write bad, to
>> the specific size of the partition (in decimal, around
>> 175 million 512-byte blocks) and the sector it is trying
>> to read (read.c:461) (in hexadecimal), around 10 million.
>> But 10 million hex really is larger than 175 million
>> decimal.
>>
>> So maybe my BIOS has some limitation on how deep it can
>> read into this 2 TB drive, or maybe the drive having
>> hardware sectors of 4096 bytes replacing one with
>> 512 confuses grub. But the old drive with the failures
>> gets the same problem.
>>
>> It's gentoo, grub2 (I could look up the version once it's
>> running again),

Part of the output of eix grub | cat is

[I] sys-boot/grub
...
     Installed versions: 2.02_beta2-r3(2)^t(07:25:12 03/13/15)(multislot nls sdl truetype -debug -device-mapper -doc -efiemu -libzfs -mount -static -test GRUB_PLATFORMS="-coreboot -efi-32 -efi-64 -emu -ieee1275 -loongson -multiboot -pc -qemu -qemu-mips -xen")

>> 64-bit (although grub seems not to really
>> notice 32- vs 64-bit, or the kernel, so I'm not sure it's
>> just smart or really dumb),

It is multilib, so 32- vs 64-bit appearance is nuanced.

>> and, like I say, the / partition
>> is RAID-6, including /boot.
>> I'm going to try making a
>> non-RAID /boot, maybe later I'll try making it RAID-1,
>> to see if that helps.
>>
>> Any advise?
>>
>> Thanks.

Well, it seems to work again.

The first baby step was to make a partition on (hd1)/sdb/sdf
starting at block 34 and ending at block 2047. Partition 8
begins at block 2048, and originally I set it to type ef02.
Then I changed it to fd00 and made the block 34 partition (11)
type ef02.

I tried to make that partition 11 ext3 and put some of /boot in
it, but obviously it's way too short for anything but grub's
persistent environment. So I used dd with if=/dev/zero to clear
it. And I did grub2-install with the --recheck option. All
while booted from DVD and using chroot, keeping in mind the
device was /dev/sdb.

That avoided "grub rescue", but the only kernel it found was
the old one, 3.8.11.

I stabbed in the dark through another 4 or 5 reboots, until
eventually pager=1 and cat to look at /boot/grub/grub.cfg
showed that the only menuentry in it for Linux was for 3.8.11,
while I knew the latest grub.cfg I had also had the new 4.0.5,
as well as older 3.8.9 and 3.8.7 ones. I'm still not sure
where that grub.cfg came from, but I made the assumption that
it had to do with grub being too liberal about failed members
of RAID6 partitions. So I ran

 mdadm --zero-superblock /dev/sde8

and also for 10.

I think that fixed things.

Oh, I also had, before the zero-superblock, changed
/etc/default/grub to set the default menu item to the long
weird id for 4.0.5.

So, it's working, or at least appears to work.

I suppose I should check whether cmdpath in grub is (hd1) or
maybe is still the incorrect (hd0).
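
For anyone wanting to check the two bits of arithmetic in this thread, here is a minimal shell sketch. It rests on assumptions rather than the verbatim debug output: that "around 10 million" in hexadecimal means 0x10000000, that the "around 175 million" figure is the md126 size from /proc/mdstat (87836160 blocks of 1 KiB) counted in 512-byte sectors, and that blocks 34 through 2047 are 512-byte logical sectors. The variable names are made up for illustration.

```shell
#!/bin/sh
# Assumption: the partition size in the debug message is the md126 size,
# i.e. 87836160 one-KiB blocks from /proc/mdstat, counted in 512-byte sectors.
part_sectors=$((87836160 * 2))

# Assumption: "10 million" in hexadecimal, read as 0x10000000.
read_sector=$((0x10000000))

echo "partition size (512-byte sectors): $part_sectors"
echo "requested sector (decimal):        $read_sector"

# Compare the requested sector against the partition size.
if [ "$read_sector" -gt "$part_sectors" ]; then
    verdict="outside"
else
    verdict="inside"
fi
echo "requested sector is $verdict the partition"

# Size of the small partition 11, blocks 34 through 2047 inclusive.
# Assumption: 512-byte logical sectors.
first=34
last=2047
bios_sectors=$((last - first + 1))
bios_bytes=$((bios_sectors * 512))
echo "partition 11: $bios_sectors sectors, $bios_bytes bytes"
```

Under those assumptions the requested sector does land past the end of the partition, which is consistent with the "outside of partition" message, and partition 11 comes out a little under 1 MiB, which is why an ext3 file system holding part of /boot was never going to fit.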