linux-btrfs.vger.kernel.org archive mirror
* GRUB bug with Btrfs multiple devices
@ 2019-11-26  4:05 Chris Murphy
  2019-11-26 21:11 ` Goffredo Baroncelli
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-26  4:05 UTC (permalink / raw)
  To: Btrfs BTRFS

grub2-efi-x64-2.02-100.fc31.x86_64
kernel-5.3.13-300.fc31.x86_64

I've seen this before, so it isn't a regression in either of the above
versions. But I'm also not certain when the regression occurred,
because the last time I tested Btrfs multiple devices (specifically
data single profile), was years ago and I didn't run into this.

The gist to reproduce:
1. btrfs single device, single profile data, single profile metadata
2. device starts to run out of space; no problem 'btrfs device add
/dev/'  voila it works, reboots, keeps on working for a while, but
then...
3. install a kernel or two or three or four

I suspect that at some point kernels end up on the newly added device
due to new block groups eventually being created there, and GRUB
subsequently gets confused, starts spewing a bunch of error
information which I have to page through. Eventually it does find
everything and does boot. But it's kinda ugly and I'm not really sure
how to gather more information.

Shaky cam video of the boot is here:
https://photos.app.goo.gl/wvJbB6kBEFzNwogo6


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-26  4:05 GRUB bug with Btrfs multiple devices Chris Murphy
@ 2019-11-26 21:11 ` Goffredo Baroncelli
  2019-11-26 23:53   ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-26 21:11 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On 26/11/2019 05.05, Chris Murphy wrote:
> grub2-efi-x64-2.02-100.fc31.x86_64
> kernel-5.3.13-300.fc31.x86_64
> 
> I've seen this before, so it isn't a regression in either of the above
> versions. But I'm also not certain when the regression occurred,
> because the last time I tested Btrfs multiple devices (specifically
> data single profile), was years ago and I didn't run into this.

From the video, it seems that GRUB complains about a "failure reading". However, GRUB is able to boot, and because the profiles are "single" (no redundancy), it seems to be a "false positive" error.

When I added the RAID5/6 support to grub, I remember seeing errors like the ones you showed. However, that was a year ago, so my memory may be wrong.
I noticed that GRUB tests a lot of disks (hd0 ... hd3). Could you kindly share the disk layout? Most errors are something like "failure reading sector 0xXX". However, I can't read the XX number: could you kindly tell us what "XX" is? It seems to be 0x80... but my eyes are bad and your video is even worse :-)

I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?

> The gist to reproduce:
> 1. btrfs single device, single profile data, single profile metadata
> 2. device starts to run out of space; no problem 'btrfs device add
> /dev/'  voila it works, reboots, keeps on working for a while, but
> then...
> 3. install a kernel or two or three or four
> 
> I suspect that at some point kernels end up on the newly added device
> due to new block groups eventually being created there, and GRUB
> subsequently gets confused, starts spewing a bunch of error
> information which I have to page through. Eventually it does find
> everything and does boot. But it's kinda ugly and I'm not really sure
> how to gather more information.
> 
> Shaky cam video of the boot is here:
> https://photos.app.goo.gl/wvJbB6kBEFzNwogo6
> 
> 

[1] http://git.savannah.gnu.org/cgit/grub.git/commit/grub-core/fs/btrfs.c?id=fd5a1d82f1d6a0482f5fe201ce646ddba8574bab


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-26 21:11 ` Goffredo Baroncelli
@ 2019-11-26 23:53   ` Chris Murphy
  2019-11-27  1:35     ` Chris Murphy
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Chris Murphy @ 2019-11-26 23:53 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Nov 26, 2019 at 2:11 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>
> On 26/11/2019 05.05, Chris Murphy wrote:
> > grub2-efi-x64-2.02-100.fc31.x86_64
> > kernel-5.3.13-300.fc31.x86_64
> >
> > I've seen this before, so it isn't a regression in either of the above
> > versions. But I'm also not certain when the regression occurred,
> > because the last time I tested Btrfs multiple devices (specifically
> > data single profile), was years ago and I didn't run into this.
>
> From the video, it seems that GRUB complains about a "failure reading". However, GRUB is able to boot, and because the profiles are "single" (no redundancy), it seems to be a "false positive" error.
>
> When I added the RAID5/6 support to grub, I remember seeing errors like the ones you showed. However, that was a year ago, so my memory may be wrong.
> I noticed that GRUB tests a lot of disks (hd0 ... hd3). Could you kindly share the disk layout? Most errors are something like "failure reading sector 0xXX". However, I can't read the XX number: could you kindly tell us what "XX" is? It seems to be 0x80... but my eyes are bad and your video is even worse :-)

It was a dark room and shaky cam was seeking for focus :-D It's 0x80.

The storage is one CD-ROM drive and one SSD drive. That's it. So I
don't know why there's hd2 and hd3, it seems like GRUB is confused
about how many drives there are, but that pre-dates this problem.


> I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?

Yes Fedora Rawhide has 2.04 in it, so I'll give that a shot next time
I rebuild this particular laptop, which should be relatively soon; or
even maybe I can reproduce this problem in a VM with two virtio
devices.


-- 
Chris Murphy


* Re: GRUB bug with Btrfs multiple devices
  2019-11-26 23:53   ` Chris Murphy
@ 2019-11-27  1:35     ` Chris Murphy
  2019-11-27  6:07       ` Goffredo Baroncelli
  2019-11-27  6:09     ` Goffredo Baroncelli
  2019-11-29 20:50     ` Andrei Borzenkov
  2 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-27  1:35 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Btrfs BTRFS

On Tue, Nov 26, 2019 at 4:53 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Tue, Nov 26, 2019 at 2:11 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
> >
> > I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?
>
> Yes Fedora Rawhide has 2.04 in it, so I'll give that a shot next time
> I rebuild this particular laptop, which should be relatively soon; or
> even maybe I can reproduce this problem in a VM with two virtio
> devices.

I was able to just update to the Fedora 2.04-4.fc32 packages. It's not
upstream's but it's a quick and dirty way to give it a shot. Turns
out, the same errors happen, although the line number for efidisk.c
has changed:
https://photos.app.goo.gl/aKWRYhJkkJRDtC1W7

For grins, I dropped to a grub prompt, and issued ls and get a different result:
https://photos.app.goo.gl/MvL9QZa6zGsiktAf9

Also for what it's worth, the Btrfs in question is on hd5,gpt4 and
hd5,gpt5 - same physical device, different partitions.


-- 
Chris Murphy


* Re: GRUB bug with Btrfs multiple devices
  2019-11-27  1:35     ` Chris Murphy
@ 2019-11-27  6:07       ` Goffredo Baroncelli
  2019-11-28  0:42         ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-27  6:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 27/11/2019 02.35, Chris Murphy wrote:
> On Tue, Nov 26, 2019 at 4:53 PM Chris Murphy <lists@colorremedies.com> wrote:
>>
>> On Tue, Nov 26, 2019 at 2:11 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>>>
>>> I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?
>>
>> Yes Fedora Rawhide has 2.04 in it, so I'll give that a shot next time
>> I rebuild this particular laptop, which should be relatively soon; or
>> even maybe I can reproduce this problem in a VM with two virtio
>> devices.
> 
> I was able to just update to the Fedora 2.04-4.fc32 packages. It's not
> upstream's but it's a quick and dirty way to give it a shot. Turns
> out, the same errors happen, although the line number for efidisk.c
> has changed:
> https://photos.app.goo.gl/aKWRYhJkkJRDtC1W7
> 
> For grins, I dropped to a grub prompt, and issued ls and get a different result:
> https://photos.app.goo.gl/MvL9QZa6zGsiktAf9

Looking at the second picture, it seems that grub has problems accessing disks 0..3, and not only when it is doing btrfs activity.
There is no problem accessing hd4 and hd5*.

Could you enable debugging with

	set pager=1
	set debug=all

?

> 
> Also for what it's worth, the Btrfs in question is on hd5,gpt4 and
> hd5,gpt5 - same physical device, different partitions.
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-26 23:53   ` Chris Murphy
  2019-11-27  1:35     ` Chris Murphy
@ 2019-11-27  6:09     ` Goffredo Baroncelli
  2019-11-29 20:50     ` Andrei Borzenkov
  2 siblings, 0 replies; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-27  6:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 27/11/2019 00.53, Chris Murphy wrote:
> On Tue, Nov 26, 2019 at 2:11 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>>
>> On 26/11/2019 05.05, Chris Murphy wrote:
>>> grub2-efi-x64-2.02-100.fc31.x86_64
>>> kernel-5.3.13-300.fc31.x86_64
>>>
>>> I've seen this before, so it isn't a regression in either of the above
>>> versions. But I'm also not certain when the regression occurred,
>>> because the last time I tested Btrfs multiple devices (specifically
>>> data single profile), was years ago and I didn't run into this.
>>
>> From the video, it seems that GRUB complains about a "failure reading". However, GRUB is able to boot, and because the profiles are "single" (no redundancy), it seems to be a "false positive" error.
>>
>> When I added the RAID5/6 support to grub, I remember seeing errors like the ones you showed. However, that was a year ago, so my memory may be wrong.
>> I noticed that GRUB tests a lot of disks (hd0 ... hd3). Could you kindly share the disk layout? Most errors are something like "failure reading sector 0xXX". However, I can't read the XX number: could you kindly tell us what "XX" is? It seems to be 0x80... but my eyes are bad and your video is even worse :-)
> 
> It was a dark room and shaky cam was seeking for focus :-D It's 0x80.
> 
> The storage is one CD-ROM drive and one SSD drive. That's it. So I
> don't know why there's hd2 and hd3, it seems like GRUB is confused
> about how many drives there are, but that pre-dates this problem.

If these drives are phantom ones, they could be the root of the problem...

> 
> 
>> I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?
> 
> Yes Fedora Rawhide has 2.04 in it, so I'll give that a shot next time
> I rebuild this particular laptop, which should be relatively soon; or
> even maybe I can reproduce this problem in a VM with two virtio
> devices.
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-27  6:07       ` Goffredo Baroncelli
@ 2019-11-28  0:42         ` Chris Murphy
  2019-11-28 17:58           ` Goffredo Baroncelli
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-28  0:42 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Btrfs BTRFS

On Tue, Nov 26, 2019 at 11:07 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
> On 27/11/2019 02.35, Chris Murphy wrote:
> > On Tue, Nov 26, 2019 at 4:53 PM Chris Murphy <lists@colorremedies.com> wrote:
> >>
> >> On Tue, Nov 26, 2019 at 2:11 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
> >>>
> >>> I think that the errors are due to the "rescan" logic (see grub commit [1]). Could you try a more recent grub (2.04 instead of 2.02)?
> >>
> >> Yes Fedora Rawhide has 2.04 in it, so I'll give that a shot next time
> >> I rebuild this particular laptop, which should be relatively soon; or
> >> even maybe I can reproduce this problem in a VM with two virtio
> >> devices.
> >
> > I was able to just update to the Fedora 2.04-4.fc32 packages. It's not
> > upstream's but it's a quick and dirty way to give it a shot. Turns
> > out, the same errors happen, although the line number for efidisk.c
> > has changed:
> > https://photos.app.goo.gl/aKWRYhJkkJRDtC1W7
> >
> > For grins, I dropped to a grub prompt, and issued ls and get a different result:
> > https://photos.app.goo.gl/MvL9QZa6zGsiktAf9
>
> Looking at the second picture, it seems that grub has problems accessing disks 0..3, and not only when it is doing btrfs activity.
> There is no problem accessing hd4 and hd5*.
>
> Could you enable debugging with
>
>         set pager=1
>         set debug=all

I need to narrow the scope. Adding 'set debug=all', there's just way
too much to video, minutes of pages just holding down space bar full
time which is even too fast to video. There must be over 1000 pages, a
tiny minority contain efidisk.c references, the vast majority are
btrfs.c references. As many pages as there are, I was never able to
stop right on a boundary between efidisk.c and btrfs.c. So I gave up
on that approach.

Since the errors happen with efidisk.c I've enabled 'set
debug=efidisk' and captured 74 photos, available at the link below
(they are in pager order)

https://photos.app.goo.gl/nuDH5hFMRxUVKXpX6

It does seem that the errors only happen in efidisk.c and only when
trying to read from what might be phantom devices; I do not know how a
second device in a Btrfs volume triggers this though. There must be
some interaction between efidisk.c and btrfs.c? The grubx64.efi,
grubenv, grub.cfg, and grub modules are all on an HFS+ (no journal)
file system acting as the EFI System partition (as is the default
behavior in Fedora on Macs for many years now). Only vmlinuz and
initramfs are on Btrfs. So I'm not really even sure why btrfs.c gets
called before the GRUB menu is displayed.

I'll see about reproducing this with a VM using edk2 UEFI and two
virtio devices, at least get to a cleaner environment so we're not
confusing multiple system specific weird things. And I can also leave
this particular Mac laptop as it is for further study.


-- 
Chris Murphy


* Re: GRUB bug with Btrfs multiple devices
  2019-11-28  0:42         ` Chris Murphy
@ 2019-11-28 17:58           ` Goffredo Baroncelli
  2019-11-28 20:05             ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-28 17:58 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 28/11/2019 01.42, Chris Murphy wrote:
> On Tue, Nov 26, 2019 at 11:07 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>>
[...]
>> Could you enable the debug, doing
>>
>>          set pager=1
>>          set debug=all
> 
> I need to narrow the scope. Adding 'set debug=all', there's just way
> too much to video, minutes of pages just holding down space bar full
> time which is even too fast to video. There must be over 1000 pages, a
> tiny minority contain efidisk.c references, the vast majority are
> btrfs.c references. As many pages as there are, I was never able to
> stop right on a boundary between efidisk.c and btrfs.c. So I gave up
> on that approach.

If I remember correctly, in the previous email you reported that even a simple "ls" at the grub prompt raises an error.
So you could watch what happens when doing something simpler like "ls" or "ls (hd0)".


> 
> Since the errors happen with efidisk.c I've enabled 'set
> debug=efidisk' and captured 74 photos, available at the link below
> (they are in pager order)
> 
> 
> 
> It does seem that the errors only happen in efidisk.c and only when
> trying to read from what might be phantom devices; I do not know how a
> second device in a Btrfs volume triggers this though. There must be
> some interaction between efidisk.c and btrfs.c? The grubx64.efi,
> grubenv, grub.cfg, and grub modules are all on an HFS+ (no journal)
> file system acting as the EFI System partition (as is the default
> behavior in Fedora on Macs for many years now). Only vmlinuz and
> initramfs are on Btrfs. So I'm not really even sure why btrfs.c gets
> called before the GRUB menu is displayed.
> 
> I'll see about reproducing this with a VM using edk2 UEFI and two
> virtio devices, at least get to a cleaner environment so we're not
> confusing multiple system specific weird things. And I can also leave
> this particular Mac laptop as it is for further study.
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-28 17:58           ` Goffredo Baroncelli
@ 2019-11-28 20:05             ` Chris Murphy
  2019-11-28 21:57               ` Goffredo Baroncelli
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-28 20:05 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Chris Murphy, Btrfs BTRFS

On Thu, Nov 28, 2019 at 10:58 AM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
> On 28/11/2019 01.42, Chris Murphy wrote:
> > On Tue, Nov 26, 2019 at 11:07 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
> >>
> [...]
> >> Could you enable the debug, doing
> >>
> >>          set pager=1
> >>          set debug=all
> >
> > I need to narrow the scope. Adding 'set debug=all', there's just way
> > too much to video, minutes of pages just holding down space bar full
> > time which is even too fast to video. There must be over 1000 pages, a
> > tiny minority contain efidisk.c references, the vast majority are
> > btrfs.c references. As many pages as there are, I was never able to
> > stop right on a boundary between efidisk.c and btrfs.c. So I gave up
> > on that approach.
>
> If I remember correctly, in the previous email you reported that even a simple "ls" at the grub prompt raises an error.
> So you could watch what happens when doing something simpler like "ls" or "ls (hd0)".

Errors with only ls.
https://photos.app.goo.gl/BJpsLvwpL6yf19uj6

Errors with ls per device
https://photos.app.goo.gl/pgxQDdj1JDjq86mZ9

But without rebooting, just repeating the ls for the same devices, I
don't get the error for hd4 again.
https://photos.app.goo.gl/M6yraHfgfAsMigaP8

From the first ls, it shows GPT on hd5, shouldn't 'ls (hd5)' report
GPT rather than no file system? gdisk finds no problem with the GPT on
/dev/sda which is hd5.


-- 
Chris Murphy


* Re: GRUB bug with Btrfs multiple devices
  2019-11-28 20:05             ` Chris Murphy
@ 2019-11-28 21:57               ` Goffredo Baroncelli
  2019-11-29 17:57                 ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-28 21:57 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 28/11/2019 21.05, Chris Murphy wrote:
> On Thu, Nov 28, 2019 at 10:58 AM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>>
>> On 28/11/2019 01.42, Chris Murphy wrote:
>>> On Tue, Nov 26, 2019 at 11:07 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>>>>
>> [...]
>>>> Could you enable the debug, doing
>>>>
>>>>           set pager=1
>>>>           set debug=all
>>>
>>> I need to narrow the scope. Adding 'set debug=all', there's just way
>>> too much to video, minutes of pages just holding down space bar full
>>> time which is even too fast to video. There must be over 1000 pages, a
>>> tiny minority contain efidisk.c references, the vast majority are
>>> btrfs.c references. As many pages as there are, I was never able to
>>> stop right on a boundary between efidisk.c and btrfs.c. So I gave up
>>> on that approach.
>>
>> If I remember correctly, in the previous email you reported that even a simple "ls" at the grub prompt raises an error.
>> So you could watch what happens when doing something simpler like "ls" or "ls (hd0)".
> 
> Errors with only ls.
> https://photos.app.goo.gl/BJpsLvwpL6yf19uj6

It seems that my supposition is true: the problem exists independently of btrfs.
It would be useful to see the debug output (set debug=all + set pager=1) when doing "ls". It is not such a huge amount of information (although it still spans a few pages).

> 
> Errors with ls per device
> https://photos.app.goo.gl/pgxQDdj1JDjq86mZ9

Grub sees hd0..hd3 as disks of ~120GB; to be exact, the size is 125753602048 bytes. The error is reported as a failure to access sector 0xea3bfc8, which is located at 0xea3bf00*512=125753491456 bytes, which is less than the previous value...
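
A minimal Python sketch to double-check the arithmetic above, assuming 512-byte sectors (the sector size GRUB reports elsewhere in this thread). The paragraph quotes two slightly different hex values (0xea3bfc8 for the failing sector and 0xea3bf00 in the multiplication), and it is not clear which one the screen showed, so the sketch evaluates both. The helper name `check` is only illustrative.

```python
SECTOR_SIZE = 512  # bytes per sector (assumption: matches what GRUB reports)

disk_size_bytes = 125753602048  # size GRUB reports for hd0..hd3
total_sectors = disk_size_bytes // SECTOR_SIZE


def check(sector):
    """Return (byte offset, is inside the disk, sectors before the end)."""
    offset = sector * SECTOR_SIZE
    return offset, offset < disk_size_bytes, total_sectors - sector


# Both hex values that appear in the message above.
for sector in (0xEA3BFC8, 0xEA3BF00):
    offset, inside, from_end = check(sector)
    print(f"sector {sector:#x}: byte offset {offset}, "
          f"inside disk: {inside}, {from_end} sectors before the end")
```

With either value the offset falls inside the reported disk size, close to the end of the device (16 and 216 sectors before the end respectively), so the conclusion that the read targets a valid location does not depend on which value is the right one.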

It seems that GRUB is correct in complaining. It is trying to access a valid disk location, which returns an error.
Why is grub trying to access this location? My supposition is that grub is trying to probe a filesystem (or a partition type...).

The problem seems to be related to the first 4 disks, which all have the same size and are "phantom" disks...
Maybe the problem is that GRUB detects the disks incorrectly?
> 
> But without rebooting, just repeating the ls for the same devices, I
> don't get the error for hd4 again.
> https://photos.app.goo.gl/M6yraHfgfAsMigaP8

My understanding is that GRUB tried to load some external modules (zfs, ufs2, ...) without success. However, this attempt was made only the first time. This could explain why the error appeared only once.
> 
> From the first ls, it shows GPT on hd5, shouldn't 'ls (hd5)' report
> GPT rather than no file system? gdisk finds no problem with the GPT on
> /dev/sda which is hd5.

It seems not:
-----------------------------------------------------------------------
                             GNU GRUB  version 2.03

    Minimal BASH-like line editing is supported. For the first word, TAB
    lists possible command completions. Anywhere else TAB lists possible
    device or file completions.


grub> ls
(proc) (hd11) (hd13) (hd14) (hd15) (hd19) (hd20) (hd31) (hd31,msdos1) (hd32) (h
d32,msdos2) (hd32,msdos1) (hd51) (hd52) (hd53) (hd61) (hd62) (hd63) (hd64) (hd7
1) (hd72) (hd73) (hd74) (hd99) (hd99,gpt2) (hd99,gpt1) (host) (md/0)
grub> ls (hd99)
Device hd99: No known filesystem detected - Sector size 512B - Total size
10485760KiB
grub>
-----------------------------------------------------------------------
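
The listing above is wrapped at the terminal width, which splits names such as (hd32,msdos2) and (hd71) across lines. Below is a small Python sketch, assuming the text is pasted exactly as shown, that undoes the wrap and groups partitions by disk. `parse_ls` is a hypothetical helper for reading such transcripts, not a GRUB tool.

```python
import re
from collections import defaultdict

# Listing copied from the GRUB prompt above; the terminal wrapped it,
# so some device names are split across two lines.
wrapped = """\
(proc) (hd11) (hd13) (hd14) (hd15) (hd19) (hd20) (hd31) (hd31,msdos1) (hd32) (h
d32,msdos2) (hd32,msdos1) (hd51) (hd52) (hd53) (hd61) (hd62) (hd63) (hd64) (hd7
1) (hd72) (hd73) (hd74) (hd99) (hd99,gpt2) (hd99,gpt1) (host) (md/0)
"""


def parse_ls(text):
    """Map each device name to the list of partitions listed for it."""
    joined = text.replace("\n", "")  # undo the terminal wrap
    devices = defaultdict(list)
    for name in re.findall(r"\(([^)]+)\)", joined):
        disk, _, part = name.partition(",")
        if part:
            devices[disk].append(part)
        else:
            devices.setdefault(disk, [])  # whole-device entry, no partition
    return dict(devices)


for disk, parts in parse_ls(wrapped).items():
    print(disk, parts or "-")
```

For this listing, hd99 maps to gpt2 and gpt1, which matches the point being made: plain "ls" shows the GPT partitions even though "ls (hd99)" reports no known filesystem for the whole device.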

> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-28 21:57               ` Goffredo Baroncelli
@ 2019-11-29 17:57                 ` Chris Murphy
  2019-11-29 19:54                   ` Goffredo Baroncelli
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-29 17:57 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Chris Murphy, Btrfs BTRFS

On Thu, Nov 28, 2019 at 2:57 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
> It seems that my supposition is true: the problem exists independently of btrfs.
> It would be useful to see the debug output (set debug=all + set pager=1) when doing "ls". It is not such a huge amount of information (although it still spans a few pages).

OK I did debug=all on the grub command line instead of in the
grub.cfg, and it's much more manageable.
https://photos.app.goo.gl/75Lbobg39R4D9QUk6

It's a very strange coincidence that these errors only began soon
after the Btrfs volume became a two device fs. I forgot to mention
that while grub.cfg is on hfsplus, Fedora GRUB now uses blscfg.mod by
default which goes looking for BLS snippets, which happen to be on
/boot/loader/entries, which is on Btrfs. So even drawing the GRUB menu
does in fact need to read from the 2 device Btrfs.

> Grub sees hd0..hd3 as disks of ~120GB; to be exact, the size is 125753602048 bytes. The error is reported as a failure to access sector 0xea3bfc8, which is located at 0xea3bf00*512=125753491456 bytes, which is less than the previous value...

Looks to me that hd0, hd1, hd2, hd3, hd4 are all phantom devices. hd5
is the SSD, /dev/sda. cd0 is the empty dvd-rom drive.


>
> It seems that GRUB is correct in complaining. It is trying to access a valid disk location, which returns an error.
> Why is grub trying to access this location? My supposition is that grub is trying to probe a filesystem (or a partition type...).
>
> The problem seems to be related to the first 4 disks, which all have the same size and are "phantom" disks...
> Maybe the problem is that GRUB detects the disks incorrectly?
> >
> > But without rebooting, just repeating the ls for the same devices, I
> > don't get the error for hd4 again.
> > https://photos.app.goo.gl/M6yraHfgfAsMigaP8
>
> My understanding is that GRUB tried to load some external modules (zfs, ufs2, ...) without success. However, this attempt was made only the first time. This could explain why the error appeared only once.

These errors may be misleading because the Fedora grubx64.efi doesn't
contain them, and I've only copied a few GRUB modules from
/usr/lib/grub/x86_64-efi to /boot/efi/EFI/fedora/x86_64-efi

The default installation on Fedora doesn't copy external modules to
the ESP at all, so only the ones already in the grubx64.efi are
available.

-- 
Chris Murphy


* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 17:57                 ` Chris Murphy
@ 2019-11-29 19:54                   ` Goffredo Baroncelli
  2019-11-29 21:17                     ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-29 19:54 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 29/11/2019 18.57, Chris Murphy wrote:
> On Thu, Nov 28, 2019 at 2:57 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>>
>> It seems that my supposition is true: the problem exists independently of btrfs.
>> It would be useful to see the debug output (set debug=all + set pager=1) when doing "ls". It is not such a huge amount of information (although it still spans a few pages).
> 
> OK I did debug=all on the grub command line instead of in the
> grub.cfg, and it's much more manageable.
> https://photos.app.goo.gl/75Lbobg39R4D9QUk6
> 
> It's a very strange coincidence that these errors only began soon
> after the Btrfs volume became a two device fs. I forgot to mention
> that while grub.cfg is on hfsplus, Fedora GRUB now uses blscfg.mod by
> default which goes looking for BLS snippets, which happen to be on
> /boot/loader/entries, which is on Btrfs. So even drawing the GRUB menu
> does in fact need to read from the 2 device Btrfs.
> 
>> Grub sees hd0..hd3 as disks of ~120GB; to be exact, the size is 125753602048 bytes. The error is reported as a failure to access sector 0xea3bfc8, which is located at 0xea3bf00*512=125753491456 bytes, which is less than the previous value...
> 
> Looks to me that hd0, hd1, hd2, hd3, hd4 are all phantom devices. hd5
> is the SSD, /dev/sda. cd0 is the empty dvd-rom drive.

Based on this information, it seems that when "ls" is run the errors come from the fact that:
- hd0..hd3 return errors when read (even before the end of the device)
- hd4 returns an error, because its size is 0 (as reported by grub)

However, btrfs does not seem to be related to these errors.

Regarding the error at boot time: my hypothesis is that while loading the kernel, grub tries (I don't know why) to read from hd0..hd4, which returns an error. Unfortunately, the video is not available anymore.

Could you kindly share pictures of the loading of the kernel/initramdisk? Something like:

grub> set debug=all
grub> initrd /boot/initrd....

I hope that the errors come quickly. I don't think that we need pictures of the whole load. Pictures up to the first (or better, the second) error would be sufficient...

BR
G.Baroncelli



> 
>>
>> It seems that GRUB is correct in complaining. It is trying to access a valid disk location, which returns an error.
>> Why is grub trying to access this location? My supposition is that grub is trying to probe a filesystem (or a partition type...).
>>
>> The problem seems to be related to the first 4 disks, which all have the same size and are "phantom" disks...
>> Maybe the problem is that GRUB detects the disks incorrectly?
>>>
>>> But without rebooting, just repeating the ls for the same devices, I
>>> don't get the error for hd4 again.
>>> https://photos.app.goo.gl/M6yraHfgfAsMigaP8
>>
>> My understanding is that GRUB tried to load some external modules (zfs, ufs2, ...) without success. However, this attempt was made only the first time. This could explain why the error appeared only once.
> 
> These errors may be misleading because the Fedora grubx64.efi doesn't
> contain them, and I've only copied a few GRUB modules from
> /usr/lib/grub/x86_64-efi to /boot/efi/EFI/fedora/x86_64-efi
> 
> The default installation on Fedora doesn't copy external modules to
> the ESP at all, so only the ones already in the grubx64.efi are
> available.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: GRUB bug with Btrfs multiple devices
  2019-11-26 23:53   ` Chris Murphy
  2019-11-27  1:35     ` Chris Murphy
  2019-11-27  6:09     ` Goffredo Baroncelli
@ 2019-11-29 20:50     ` Andrei Borzenkov
  2019-11-29 21:11       ` Chris Murphy
  2 siblings, 1 reply; 23+ messages in thread
From: Andrei Borzenkov @ 2019-11-29 20:50 UTC (permalink / raw)
  To: Chris Murphy, Goffredo Baroncelli; +Cc: Btrfs BTRFS

27.11.2019 02:53, Chris Murphy wrote:
> 
> The storage is one CD-ROM drive and one SSD drive. That's it. So I
> don't know why there's hd2 and hd3, it seems like GRUB is confused
> about how many drives there are, but that pre-dates this problem.
> 

grub enumerates what EFI provides. What does "lsefi" in grub say?


* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 20:50     ` Andrei Borzenkov
@ 2019-11-29 21:11       ` Chris Murphy
  2019-11-30  7:31         ` Andrei Borzenkov
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-29 21:11 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Goffredo Baroncelli, Btrfs BTRFS

On Fri, Nov 29, 2019 at 1:50 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> 27.11.2019 02:53, Chris Murphy wrote:
> >
> > The storage is one CD-ROM drive and one SSD drive. That's it. So I
> > don't know why there's hd2 and hd3, it seems like GRUB is confused
> > about how many drives there are, but that pre-dates this problem.
> >
>
> grub enumerates what EFI provides. What "lsefi" in grub says?

https://photos.app.goo.gl/pBxLJNdbzz6J9Vo56

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 19:54                   ` Goffredo Baroncelli
@ 2019-11-29 21:17                     ` Chris Murphy
  2019-11-30  7:33                       ` Andrei Borzenkov
  2019-11-30  8:12                       ` Goffredo Baroncelli
  0 siblings, 2 replies; 23+ messages in thread
From: Chris Murphy @ 2019-11-29 21:17 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Btrfs BTRFS

On Fri, Nov 29, 2019 at 12:54 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
> Could you be so kindly to share the picture of the loading of the kernel/initramdisk ? Something like:
>
> grub> set debug=all
> grub> initrd /boot/initrd....
>
> I hope that the errors come quickly. I don't think that we need the pictuers of all the download. It would be sufficient the pictures until the first (or better second) error....

I paged through it for minutes, hundreds of pages, and never found any
errors. But these are the first pages. This might actually be some
kind of search, not a load of the kernel, because I pressed tab to
autocomplete. But it didn't autocomplete; it immediately started
spitting out debug pages.

https://photos.app.goo.gl/kpa7dJ9spAy29yj26

Is it possible to redirect grub debug output to a FAT file?



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 21:11       ` Chris Murphy
@ 2019-11-30  7:31         ` Andrei Borzenkov
  2019-11-30 16:31           ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Andrei Borzenkov @ 2019-11-30  7:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Goffredo Baroncelli, Btrfs BTRFS

30.11.2019 00:11, Chris Murphy wrote:
> On Fri, Nov 29, 2019 at 1:50 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>
>> 27.11.2019 02:53, Chris Murphy wrote:
>>>
>>> The storage is one CD-ROM drive and one SSD drive. That's it. So I
>>> don't know why there's hd2 and hd3, it seems like GRUB is confused
>>> about how many drives there are, but that pre-dates this problem.
>>>
>>
>> grub enumerates what EFI provides. What "lsefi" in grub says?
> 
> https://photos.app.goo.gl/pBxLJNdbzz6J9Vo56
> 

These are vendor media device path handles that are children of (some)
disk partitions. GRUB already tries to skip such handles:


       /* Ghosts proudly presented by Apple.  */
       if (GRUB_EFI_DEVICE_PATH_TYPE (dp) == GRUB_EFI_MEDIA_DEVICE_PATH_TYPE
           && GRUB_EFI_DEVICE_PATH_SUBTYPE (dp)
           == GRUB_EFI_VENDOR_MEDIA_DEVICE_PATH_SUBTYPE)
         {
           grub_efi_vendor_device_path_t *vendor =
             (grub_efi_vendor_device_path_t *) dp;
           const struct grub_efi_guid apple = GRUB_EFI_VENDOR_APPLE_GUID;

           if (vendor->header.length == sizeof (*vendor)
               && grub_memcmp (&vendor->vendor_guid, &apple,
                               sizeof (vendor->vendor_guid)) == 0
               && find_parent_device (devices, d))
             continue;
         }

but these have a different GUID. A Google search turns up results that
still hint at Apple (such as
https://www.macos86.it/topic/1136-asus-x202e-hm76-vs-n56vb-hm76/page/2/?tab=comments#comment-31186).
The device paths look like

PciRoot(0x0)\Pci(0x1F,0x2)\Sata(0x0,0xFFFF,0x0)\HD(4,GPT,A640EF60-F7E9-4945-81A9-B04CCE53EE97,0x176F4800,0x482FC88)\VenMedia(BE74FCF7-0B7C-49F3-9147-01F4042E6842,4F20CFA89785973FAAF730597BFC41BA)

where the vendor GUID is BE74FCF7-0B7C-49F3-9147-01F4042E6842

So we have the hard disk, then the partition as its child, and then this
vendor media node as a child of the partition.

This should certainly be reported to the grub list. What system is it?
Is it Apple?
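
A minimal sketch of how a vendor GUID could be checked against a list of
known ghost GUIDs rather than against a single one. This is not GRUB code:
struct guid, ghost_guids, guid_equal and is_ghost_guid are hypothetical
names used only for illustration, and the only GUID listed is the one from
the VenMedia() node in the device path above. Whether GRUB should skip
such handles is a question for the grub list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an EFI GUID (not GRUB's own type). */
struct guid
{
  uint32_t data1;
  uint16_t data2;
  uint16_t data3;
  uint8_t data4[8];
};

/* Vendor GUIDs to treat as ghosts. The only entry is the GUID from the
   VenMedia() node in the device path above. */
static const struct guid ghost_guids[] = {
  { 0xBE74FCF7, 0x0B7C, 0x49F3,
    { 0x91, 0x47, 0x01, 0xF4, 0x04, 0x2E, 0x68, 0x42 } },
};

/* Compare field by field so the result does not depend on padding. */
static int
guid_equal (const struct guid *a, const struct guid *b)
{
  size_t i;

  if (a->data1 != b->data1 || a->data2 != b->data2 || a->data3 != b->data3)
    return 0;
  for (i = 0; i < sizeof (a->data4); i++)
    if (a->data4[i] != b->data4[i])
      return 0;
  return 1;
}

/* Return 1 if G matches any entry in ghost_guids, 0 otherwise. */
static int
is_ghost_guid (const struct guid *g)
{
  size_t i;

  for (i = 0; i < sizeof (ghost_guids) / sizeof (ghost_guids[0]); i++)
    if (guid_equal (g, &ghost_guids[i]))
      return 1;
  return 0;
}
```

A table keeps the skip condition in one place if more vendor GUIDs of
this kind turn up on other machines.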

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 21:17                     ` Chris Murphy
@ 2019-11-30  7:33                       ` Andrei Borzenkov
  2019-11-30  8:12                       ` Goffredo Baroncelli
  1 sibling, 0 replies; 23+ messages in thread
From: Andrei Borzenkov @ 2019-11-30  7:33 UTC (permalink / raw)
  To: Chris Murphy, Goffredo Baroncelli; +Cc: Btrfs BTRFS

30.11.2019 00:17, Chris Murphy wrote:
> 
> Is it possible to redirect grub debug output to a FAT file?
> 

No.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-29 21:17                     ` Chris Murphy
  2019-11-30  7:33                       ` Andrei Borzenkov
@ 2019-11-30  8:12                       ` Goffredo Baroncelli
  2019-11-30 16:38                         ` Chris Murphy
  1 sibling, 1 reply; 23+ messages in thread
From: Goffredo Baroncelli @ 2019-11-30  8:12 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 29/11/2019 22.17, Chris Murphy wrote:
> On Fri, Nov 29, 2019 at 12:54 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>> Could you be so kindly to share the picture of the loading of the kernel/initramdisk ? Something like:
>>
>> grub> set debug=all
>> grub> initrd /boot/initrd....
>>
>> I hope that the errors come quickly. I don't think that we need the pictuers of all the download. It would be sufficient the pictures until the first (or better second) error....
> 
> I paged through it for minutes, hundreds of pages and never found any
> errors. But these are the first pages. This might actually be some
> kind of search, not load of the kernel, because I pressed tab to
> autocomplete. But it didn't autocomplete it immediately started
> spitting out debug pages.
> 
> https://photos.app.goo.gl/kpa7dJ9spAy29yj26
> 
> Is it possible to redirect grub debug output to a FAT file?

It is possible to redirect to a serial console.
Does the machine have a serial port?
> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-30  7:31         ` Andrei Borzenkov
@ 2019-11-30 16:31           ` Chris Murphy
  2019-11-30 17:02             ` Andrei Borzenkov
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-30 16:31 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Goffredo Baroncelli, Btrfs BTRFS

On Sat, Nov 30, 2019 at 12:31 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> 30.11.2019 00:11, Chris Murphy wrote:
> > On Fri, Nov 29, 2019 at 1:50 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >>
> >> 27.11.2019 02:53, Chris Murphy wrote:
> >>>
> >>> The storage is one CD-ROM drive and one SSD drive. That's it. So I
> >>> don't know why there's hd2 and hd3, it seems like GRUB is confused
> >>> about how many drives there are, but that pre-dates this problem.
> >>>
> >>
> >> grub enumerates what EFI provides. What "lsefi" in grub says?
> >
> > https://photos.app.goo.gl/pBxLJNdbzz6J9Vo56
> >
>
> These are vendor media device paths handles that are children of (some)
> disk partitions. GRUB already tries to skip such handles:
>
>
>        /* Ghosts proudly presented by Apple.  */
>        if (GRUB_EFI_DEVICE_PATH_TYPE (dp) == GRUB_EFI_MEDIA_DEVICE_PATH_TYPE
>            && GRUB_EFI_DEVICE_PATH_SUBTYPE (dp)
>            == GRUB_EFI_VENDOR_MEDIA_DEVICE_PATH_SUBTYPE)
>          {
>            grub_efi_vendor_device_path_t *vendor =
> (grub_efi_vendor_device_path_t *) dp;
>            const struct grub_efi_guid apple = GRUB_EFI_VENDOR_APPLE_GUID;
>
>            if (vendor->header.length == sizeof (*vendor)
>                && grub_memcmp (&vendor->vendor_guid, &apple,
>                                sizeof (vendor->vendor_guid)) == 0
>                && find_parent_device (devices, d))
>              continue;
>          }
>
> but these have different GUID. Google search comes with something
> hinting on Apple still (like
> https://www.macos86.it/topic/1136-asus-x202e-hm76-vs-n56vb-hm76/page/2/?tab=comments#comment-31186).
>   Device paths look like
>
> PciRoot(0x0)\Pci(0x1F,0x2)\Sata(0x0,0xFFFF,0x0)\HD(4,GPT,A640EF60-F7E9-4945-81A9-B04CCE53EE97,0x176F4800,0x482FC88)\VenMedia(BE74FCF7-0B7C-49F3-9147-01F4042E6842,4F20CFA89785973FAAF730597BFC41BA)
>
> where vendor GUID is BE74FCF7-0B7C-49F3-9147-01F4042E6842
>
> So we have hard disk, then partition as child and then this vendor media
> as child of partition.
>
> This should certainly be reported to grub list. What system is it - is
> it Apple?

Yes. MacBook Pro 8,2 (2011). I'll report the phantom device problem to
grub-devel@

But still an open question is what's the instigator or secondary
factor because this wasn't happening before adding an unused but
already existing partition as a 2nd Btrfs device. Last time this
happened, all I did was remove the 2nd device and the problem went
away. I'm ready to try that again (remove the 2nd device) and see if
the problem goes away, but has enough information been collected about
the present state?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-30  8:12                       ` Goffredo Baroncelli
@ 2019-11-30 16:38                         ` Chris Murphy
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Murphy @ 2019-11-30 16:38 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Nov 30, 2019 at 1:12 AM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
> On 29/11/2019 22.17, Chris Murphy wrote:
> > On Fri, Nov 29, 2019 at 12:54 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
> >> Could you be so kindly to share the picture of the loading of the kernel/initramdisk ? Something like:
> >>
> >> grub> set debug=all
> >> grub> initrd /boot/initrd....
> >>
> >> I hope that the errors come quickly. I don't think that we need the pictuers of all the download. It would be sufficient the pictures until the first (or better second) error....
> >
> > I paged through it for minutes, hundreds of pages and never found any
> > errors. But these are the first pages. This might actually be some
> > kind of search, not load of the kernel, because I pressed tab to
> > autocomplete. But it didn't autocomplete it immediately started
> > spitting out debug pages.
> >
> > https://photos.app.goo.gl/kpa7dJ9spAy29yj26
> >
> > Is it possible to redirect grub debug output to a FAT file?
>
> It is possible to redirect to a serial console ..
> Did the machine has a serial port ?

USB and wired ethernet.

So far I'm unable to reproduce in a VM with 2 partitions used for 2
device Btrfs. It might be a multi-layer bug where the 1st bug must
happen before the 2nd one has a chance of being revealed. The 1st bug
being the issue of phantom devices, which *are* present when the Btrfs
is a single device volume, but none of the errors show up in the
GRUB/pre-boot environment until the 2nd device was added (and new
kernel installed).

It's too bad GRUB doesn't have a debug option to write a file to a FAT
file system. The btrfs debug output is extremely long.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-30 16:31           ` Chris Murphy
@ 2019-11-30 17:02             ` Andrei Borzenkov
  2019-11-30 17:14               ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Andrei Borzenkov @ 2019-11-30 17:02 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Goffredo Baroncelli, Btrfs BTRFS

30.11.2019 19:31, Chris Murphy wrote:
> On Sat, Nov 30, 2019 at 12:31 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>
>> 30.11.2019 00:11, Chris Murphy wrote:
>>> On Fri, Nov 29, 2019 at 1:50 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>>>
>>>> 27.11.2019 02:53, Chris Murphy wrote:
>>>>>
>>>>> The storage is one CD-ROM drive and one SSD drive. That's it. So I
>>>>> don't know why there's hd2 and hd3, it seems like GRUB is confused
>>>>> about how many drives there are, but that pre-dates this problem.
>>>>>
>>>>
>>>> grub enumerates what EFI provides. What "lsefi" in grub says?
>>>
>>> https://photos.app.goo.gl/pBxLJNdbzz6J9Vo56
>>>
>>
>> These are vendor media device paths handles that are children of (some)
>> disk partitions. GRUB already tries to skip such handles:
>>
>>
>>         /* Ghosts proudly presented by Apple.  */
>>         if (GRUB_EFI_DEVICE_PATH_TYPE (dp) == GRUB_EFI_MEDIA_DEVICE_PATH_TYPE
>>             && GRUB_EFI_DEVICE_PATH_SUBTYPE (dp)
>>             == GRUB_EFI_VENDOR_MEDIA_DEVICE_PATH_SUBTYPE)
>>           {
>>             grub_efi_vendor_device_path_t *vendor =
>> (grub_efi_vendor_device_path_t *) dp;
>>             const struct grub_efi_guid apple = GRUB_EFI_VENDOR_APPLE_GUID;
>>
>>             if (vendor->header.length == sizeof (*vendor)
>>                 && grub_memcmp (&vendor->vendor_guid, &apple,
>>                                 sizeof (vendor->vendor_guid)) == 0
>>                 && find_parent_device (devices, d))
>>               continue;
>>           }
>>
>> but these have different GUID. Google search comes with something
>> hinting on Apple still (like
>> https://www.macos86.it/topic/1136-asus-x202e-hm76-vs-n56vb-hm76/page/2/?tab=comments#comment-31186).
>>    Device paths look like
>>
>> PciRoot(0x0)\Pci(0x1F,0x2)\Sata(0x0,0xFFFF,0x0)\HD(4,GPT,A640EF60-F7E9-4945-81A9-B04CCE53EE97,0x176F4800,0x482FC88)\VenMedia(BE74FCF7-0B7C-49F3-9147-01F4042E6842,4F20CFA89785973FAAF730597BFC41BA)
>>
>> where vendor GUID is BE74FCF7-0B7C-49F3-9147-01F4042E6842
>>
>> So we have hard disk, then partition as child and then this vendor media
>> as child of partition.
>>
>> This should certainly be reported to grub list. What system is it - is
>> it Apple?
> 
> Yes. Macbook Pro 8,2 (2011). I'll report the phantom device problem to
> grub-devel@
> 
> But still an open question is what's the instigator or secondary
> factor because this wasn't happening before adding an unused but
> already existing partition as a 2nd Btrfs device.

GRUB normally uses hints: grub-install (and grub-mkconfig) tries to
guess the firmware device name. At boot time GRUB tries the hinted
device first; if that succeeds, it does not try anything else. With a
second Btrfs partition, GRUB needs to find the second device at boot
time, so it now probes everything and hits those vendor media devices.

At least this explains what you see as well as ...

> Last time this
> happened, all I did was remove the 2nd device and the problem went
> away.

... this.

If you go into the GRUB shell in this state (without errors), do you see
those ghost devices?

> I'm ready to try that again (remove the 2nd device) and see if
> the problem goes away, but has enough information been collected about
> the present state?
> 
> 

If you are reasonably sure that all errors are related to those phantom
devices, then I would say yes; the reason these phantom devices exist
is already clear.
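
The hint-first lookup described above can be sketched roughly as follows.
This is a simplified model, not GRUB's implementation: struct dev,
collect_members and the integer fs_id are hypothetical, and a device with
fs_id 0 stands for one that holds no recognisable filesystem, such as a
ghost vendor media handle. The sketch stops as soon as every member has
been found, which is an assumption made here for brevity.

```c
#include <stddef.h>

/* Hypothetical device record: a firmware name plus the id of the
   filesystem found on it (0 = nothing recognisable). */
struct dev
{
  const char *name;
  int fs_id;
};

/* Collect NEEDED members of filesystem FS_ID, trying the hinted index
   first. Returns how many devices were probed; *FOUND receives the
   number of members located. */
size_t
collect_members (const struct dev *devs, size_t ndevs, size_t hint,
                 int fs_id, size_t needed, size_t *found)
{
  size_t probed = 0;
  size_t i;

  *found = 0;

  /* Step 1: the hinted device. For a single-device filesystem this is
     normally the only probe. */
  if (hint < ndevs)
    {
      probed++;
      if (devs[hint].fs_id == fs_id)
        (*found)++;
    }

  /* Step 2: members are still missing, so walk every other device,
     including any ghosts the firmware exposes. */
  for (i = 0; i < ndevs && *found < needed; i++)
    {
      if (i == hint)
        continue;
      probed++;
      if (devs[i].fs_id == fs_id)
        (*found)++;
    }

  return probed;
}
```

With one member needed and a correct hint, only the hinted device is
touched; with two members needed, the walk also reaches the devices that
hold no usable filesystem, which matches the point at which the errors
started to appear.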

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-30 17:02             ` Andrei Borzenkov
@ 2019-11-30 17:14               ` Chris Murphy
  2019-11-30 17:34                 ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2019-11-30 17:14 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Goffredo Baroncelli, Btrfs BTRFS

On Sat, Nov 30, 2019 at 10:02 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> GRUB is normally using hints - grub-install (and grub-mkconfig) tries to
> guess firmware device name. At boot time grub tries to access hinted
> device first, if it succeeds, it does not try anything else. With second
> btrfs partition grub needs to find second device at boot time so it now
> probes everything and hits those vendor media devices.
>
> At least this explains what you see as well as ...
>
> > Last time this
> > happened, all I did was remove the 2nd device and the problem went
> > away.
>
> ... this.

Ahhh, that makes complete sense. So it is Btrfs multiple device
related, but not a bug in btrfs.c per se.

>
> If you go in grub shell in this state (without errors), do you see those
> ghost devices?

Uncertain. My vague recollection is that yes, they are there, because
I found their existence strange and different compared to pre-GRUB
2.02, where on this same system I'd see only either hd0 or hd1 (one
without the other), along with cd0. But something changed, either with
a firmware update from Apple or with GRUB, that resulted in additional
GRUB devices: hd2, hd3, hd4, hd5.

> > I'm ready to try that again (remove the 2nd device) and see if
> > the problem goes away, but has enough information been collected about
> > the present state?
> >
> >
>
> If you are reasonably sure that all errors are related to those phantom
> devices - I would say yes, the reason for these phantom devices to exist
> is already clear.

I'll give it a shot in a bit.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: GRUB bug with Btrfs multiple devices
  2019-11-30 17:14               ` Chris Murphy
@ 2019-11-30 17:34                 ` Chris Murphy
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Murphy @ 2019-11-30 17:34 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrei Borzenkov, Goffredo Baroncelli, Btrfs BTRFS

On Sat, Nov 30, 2019 at 10:14 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sat, Nov 30, 2019 at 10:02 AM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >
> > GRUB is normally using hints - grub-install (and grub-mkconfig) tries to
> > guess firmware device name. At boot time grub tries to access hinted
> > device first, if it succeeds, it does not try anything else. With second
> > btrfs partition grub needs to find second device at boot time so it now
> > probes everything and hits those vendor media devices.
> >
> > At least this explains what you see as well as ...
> >
> > > Last time this
> > > happened, all I did was remove the 2nd device and the problem went
> > > away.
> >
> > ... this.
>
> Ahhh, that makes complete sense. So it is Btrfs multiple device
> related, but not a bug in btrfs.c per se.
>
> >
> > If you go in grub shell in this state (without errors), do you see those
> > ghost devices?
>
> Uncertain. My vague memory recall is that yes they are there, because
> I found their existence strange and different compared to pre-GRUB
> 2.02 where on this same system I'd see only either hd0 or hd1 (one
> without the other), along with cd0. But something changed either with
> a firmware update from Apple, or GRUB, that resulted in additional
> GRUB devices, hd2, hd3, hd4, hd5.

OK, my vague memory is correct: the phantom devices are still present
after Btrfs device removal.


>
> > > I'm ready to try that again (remove the 2nd device) and see if
> > > the problem goes away, but has enough information been collected about
> > > the present state?
> > >
> > >
> >
> > If you are reasonably sure that all errors are related to those phantom
> > devices - I would say yes, the reason for these phantom devices to exist
> > is already clear.
>
> I'll give it a shot in a bit.

Yep, the errors no longer happen, but the phantom devices are still there.
I've posted to grub-devel@ and updated it with this latest
information.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-11-30 17:36 UTC | newest]

Thread overview: 23+ messages
2019-11-26  4:05 GRUB bug with Btrfs multiple devices Chris Murphy
2019-11-26 21:11 ` Goffredo Baroncelli
2019-11-26 23:53   ` Chris Murphy
2019-11-27  1:35     ` Chris Murphy
2019-11-27  6:07       ` Goffredo Baroncelli
2019-11-28  0:42         ` Chris Murphy
2019-11-28 17:58           ` Goffredo Baroncelli
2019-11-28 20:05             ` Chris Murphy
2019-11-28 21:57               ` Goffredo Baroncelli
2019-11-29 17:57                 ` Chris Murphy
2019-11-29 19:54                   ` Goffredo Baroncelli
2019-11-29 21:17                     ` Chris Murphy
2019-11-30  7:33                       ` Andrei Borzenkov
2019-11-30  8:12                       ` Goffredo Baroncelli
2019-11-30 16:38                         ` Chris Murphy
2019-11-27  6:09     ` Goffredo Baroncelli
2019-11-29 20:50     ` Andrei Borzenkov
2019-11-29 21:11       ` Chris Murphy
2019-11-30  7:31         ` Andrei Borzenkov
2019-11-30 16:31           ` Chris Murphy
2019-11-30 17:02             ` Andrei Borzenkov
2019-11-30 17:14               ` Chris Murphy
2019-11-30 17:34                 ` Chris Murphy
