linux-xfs.vger.kernel.org archive mirror
* Bug when mounting XFS with external SATA drives in USB enclosures
@ 2019-11-23  3:21 Pedro Ribeiro
  2019-11-23 16:56 ` Chris Murphy
  2019-11-23 18:26 ` Eric Sandeen
From: Pedro Ribeiro @ 2019-11-23  3:21 UTC
  To: linux-xfs

[-- Attachment #1: Type: text/plain, Size: 2094 bytes --]

Hi,

I have been trying to find out the cause of a bug that's affecting all
my external hard drive backups.

I have three external drives, in different USB enclosures, with the same
configuration and the same problem.

Drive A: 2TB HDD, USB3 Seagate self enclosed drive
Drive B: 4TB HDD, USB3 Toshiba self enclosed drive
Drive C: 512GB SSD, Crucial MX500 with USB-C third party enclosure

All of the drives have a dm-crypt / LUKS layer on top, with an XFS
filesystem inside. Drive A is a few months old, Drive B is about 3 years
old, and Drive C is about 1.5 years old. They are seldom used (they're
backup drives), so they are all fine mechanically.

The problem is that when I attach any of the drives, enter the LUKS
password and then try to mount, this happens:
[   66.039772] XFS (dm-0): Mounting V5 Filesystem
[   66.060934] XFS (dm-0): log recovery read I/O error at daddr 0x0 len 8 error -5
[   66.060939] XFS (dm-0): empty log check failed
[   66.060940] XFS (dm-0): log mount/recovery failed: error -5
[   66.061064] XFS (dm-0): log mount failed
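For readers decoding the log above: XFS expresses disk addresses and I/O lengths in 512-byte basic blocks, and the `BBTOB()` macro that appears later in this thread converts basic blocks to bytes. The sketch below re-creates that conversion in userspace, assuming the kernel's basic-block shift of 9; `describe_read()` is a hypothetical helper for illustration, not kernel code. By this arithmetic, the failed read (`len 8`) was a single 4096-byte I/O.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace stand-ins for the XFS basic-block helpers. Assumption: a basic
 * block is 512 bytes (1 << 9), matching the kernel's BBSHIFT.
 */
#define BBSHIFT 9
#define BBSIZE (1 << BBSHIFT)
#define BBTOB(bbs) ((bbs) << BBSHIFT)

/*
 * Hypothetical helper: print the byte range covered by a log message of the
 * form "... at daddr 0x<daddr> len <nbblks> ...". Both inputs are in basic
 * blocks; the byte offsets are relative to whatever base daddr counts from.
 */
static void describe_read(unsigned long long daddr, unsigned long long nbblks)
{
    assert(nbblks > 0);

    unsigned long long start = BBTOB(daddr);
    unsigned long long bytes = BBTOB(nbblks);

    printf("daddr 0x%llx len %llu -> bytes %llu..%llu (%llu bytes)\n",
           daddr, nbblks, start, start + bytes - 1, bytes);
}
```

Calling `describe_read(0x0, 8)` prints `daddr 0x0 len 8 -> bytes 0..4095 (4096 bytes)`.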

No matter what I try, including all the recovery tools, it's impossible
to mount...

The thing is that there is NOTHING wrong with these drives. The above
only happens when running my specific, stripped-down and locked-down
kernel config.

If I take Debian's 4.19 kernel config, put it on a 5.3.11 tree, run make
oldconfig and just answer the defaults on all prompts, all of the drives
above mount fine:
[   46.184068] XFS (dm-0): Mounting V5 Filesystem
[   46.412566] XFS (dm-0): Ending clean mount

I hit this problem recently when I moved from kernel 4.18.20, which I
had been using for a long time, to 5.3.x. In kernel 4.18.20, I did not
have any problems with my specific stripped-down config.

I have asked for help in IRC at #xfs, and one of the people there
(ailiop) was very helpful in trying to track down the problem, but we
ultimately failed, which is why I'm asking for help here.

I'm attaching the kernel configs and the dmesg outputs. There is nothing
obvious in the kernel config diff that should make this happen... it's a
very weird bug.

Regards,
Pedro

[-- Attachment #2: bugs.tar.xz --]
[-- Type: application/x-xz, Size: 75304 bytes --]


* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-23  3:21 Bug when mounting XFS with external SATA drives in USB enclosures Pedro Ribeiro
@ 2019-11-23 16:56 ` Chris Murphy
  2019-11-23 18:12   ` Pedro Ribeiro
  2019-11-23 18:26 ` Eric Sandeen
From: Chris Murphy @ 2019-11-23 16:56 UTC
  To: Pedro Ribeiro; +Cc: xfs list

On Fri, Nov 22, 2019 at 8:21 PM Pedro Ribeiro <pedrib@gmail.com> wrote:
> [...]

What about checking for differences in kernel messages between the
stripped-down and stock kernels during device discovery? That is, boot
the stripped kernel that has the problem with no drives connected,
connect one of the problem USB devices, and record the resulting kernel
messages. Then repeat that with the stock Debian kernel that doesn't
exhibit the bug.

My guess is this is some obscure USB related bug. There are a ton of
bugs with USB enclosure firmware, controllers, and drivers.

Also, is this USB enclosure directly connected to the computer? Or to
a powered hub? I have inordinate problems with USB enclosures directly
connected to an Intel NUC, but when connected to a Dyconn USB hub with
external power source, the problems all go away. And my understanding
is the hub doesn't just act like a repeater. It pretty much rewrites
the entire stream. So there's something screwy going on with either
the Intel controller I have or the USB-SATA bridge chip, causing
confusion that the hub eliminates.

And it may be that your stripped down kernel has turned off some
obscure USB related error checking or mode switching that this
particular setup needs.


-- 
Chris Murphy


* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-23 16:56 ` Chris Murphy
@ 2019-11-23 18:12   ` Pedro Ribeiro
From: Pedro Ribeiro @ 2019-11-23 18:12 UTC
  To: Chris Murphy; +Cc: xfs list


On 23/11/2019 23:56, Chris Murphy wrote:
> 
> What about checking for differences in kernel messages between the
> stripped down and stocked kernel, during device discovery. That is
> connect no drives, boot the stripped kernel with the problem, connect
> one of the problem USB devices, record the kernel messages that
> result. Repeat that with the stock Debian kernel that doesn't exhibit
> the bug.

I'm not sure if you received it in the first email, but I attached an
archive with the dmesg output of both the Debian config and my own
config, as well as the configs themselves.

If for some reason the mailing list didn't process the attachment, you
can download it from here:
https://gofile.io/?c=6TaB3p

Is there a way to enable more verbose output?

> 
> My guess is this is some obscure USB related bug. There are a ton of
> bugs with USB enclosure firmware, controllers, and drivers.
> 

Possibly, although it affects 3 different enclosures, so it's unlikely to
be enclosure-specific; more likely it's something in a common layer.


> Also, is this USB enclosure directly connected to the computer? Or to
> a powered hub? I have inordinate problems with USB enclosures directly
> connected to an Intel NUC, but when connected to a Dyconn USB hub with
> external power source, the problems all go away. And my understanding
> is the hub doesn't just act like a repeater. It pretty much rewrites
> the entire stream. So there's something screwy going on either with
> the Intel controller I have, or the USB-SATA bridge chip, that causes
> confusing that the hub eliminates.

All are connected directly to the computer, two via USB 3 and one via
USB-C, with the same errors.



* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-23  3:21 Bug when mounting XFS with external SATA drives in USB enclosures Pedro Ribeiro
  2019-11-23 16:56 ` Chris Murphy
@ 2019-11-23 18:26 ` Eric Sandeen
  2019-11-24  6:49   ` Pedro Ribeiro
From: Eric Sandeen @ 2019-11-23 18:26 UTC
  To: Pedro Ribeiro, linux-xfs

On 11/22/19 9:21 PM, Pedro Ribeiro wrote:
> Hi,
> 
> I have been trying to find out the cause of a bug that's affecting all
> my external hard drive backups.
> 
> I have three external drives, in different USB enclosures, with the same
> configuration and the same problem.
> 
> Drive A: 2TB HDD, USB3 Seagate self enclosed drive
> Drive B: 4TB HDD, USB3 Toshiba self enclosed drive
> Drive C: 512GB SSD, Crucial MX500 with USB-C third party enclosure
> 
> All of the drives have a dm-crypt / LUKS on top, with a XFS partition
> inside. Drive A is a few months old, Drive B is about 3 years old, drive
> C about 1.5 years old. They are seldomly used (they're backup drives) so
> they are all fine mechanically.
> 
> The problem is when I attach any of the drives, enter the LUKS password
> and then try to mount, this happens:
> [   66.039772] XFS (dm-0): Mounting V5 Filesystem
> [   66.060934] XFS (dm-0): log recovery read I/O error at daddr 0x0 len
> 8 error -5
> [   66.060939] XFS (dm-0): empty log check failed
> [   66.060940] XFS (dm-0): log mount/recovery failed: error -5
> [   66.061064] XFS (dm-0): log mount failed

I assume that it used to work, right?  When did it stop working?
<reads further, sees 5.3>

> No matter what I do, using all the recovery tools, etc, it's impossible
> to mount...
> 
> The thing is that is there is NOTHING wrong with these drives. The above
> happens when running my specific, stripped and locked down kernel config.
> 
> If I take Debian's 4.19 kernel config, put it on a 5.3.11 tree, run make
> oldconfig and just answer the defaults on all prompts, all of the drives
> above mount fine:
> [   46.184068] XFS (dm-0): Mounting V5 Filesystem
> [   46.412566] XFS (dm-0): Ending clean mount

> I hit this problem recently when I moved from kernel 4.18.20, which I
> was using for a long time, to 5.3.X. In kernel 4.18.20, I did not have
> any problems with my specific stripped down config.

Could be related to memory alignment.  Commit:

commit f8f9ee479439c1be9e33c4404912a2a112c46200
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Aug 26 12:08:39 2019 -0700

    xfs: add kmem_alloc_io()
    
    Memory we use to submit for IO needs strict alignment to the
    underlying driver constraints. Worst case, this is 512 bytes. Given
    that all allocations for IO are always a power of 2 multiple of 512
    bytes, the kernel heap provides natural alignment for objects of
    these sizes and that suffices.

went into kernel 5.4, and doesn't look like it's in the 5.3.x stable
stream.
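The commit message quoted above rests on one check: a buffer handed to the block layer is only safe if its address is a multiple of the device's required alignment (512 bytes in the worst case, per the commit message). Expressed as a mask (alignment - 1, so 511 for 512 bytes), that check is a single AND. A minimal userspace sketch; `align_to_mask()` and `is_io_aligned()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a power-of-two alignment (e.g. 512) into a
 * mask (e.g. 511). */
static size_t align_to_mask(size_t alignment)
{
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return alignment - 1;
}

/* Hypothetical helper: true if ptr satisfies the alignment encoded in mask. */
static bool is_io_aligned(const void *ptr, size_t align_mask)
{
    return ((uintptr_t)ptr & align_mask) == 0;
}
```

For example, an address of 4104 (0x1008) is 8-byte aligned but fails the 512-byte check, because 4104 & 511 == 8.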

I haven't looked very closely at your config deltas for what might change
alignment but it'd be worth giving:

f8f9ee479439 xfs: add kmem_alloc_io()
d916275aa4dd xfs: get allocation alignment from the buftarg
0ad95687c3ad xfs: add kmem allocation trace points

a try.

-Eric

> I have asked for help in IRC at #xfs, and one of the guys there (ailiop)
> was very helpful in trying to track down the problem, but we ultimately
> failed, hence why I'm asking for help here.
> 
> I'm attaching the kernel configs and the dmesg outputs. There is nothing
> obvious in the kernel config diff that should make this happen... it's a
> very weird bug.
> 
> Regards,
> Pedro
> 


* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-23 18:26 ` Eric Sandeen
@ 2019-11-24  6:49   ` Pedro Ribeiro
  2019-11-24 19:16     ` Eric Sandeen
From: Pedro Ribeiro @ 2019-11-24  6:49 UTC
  To: Eric Sandeen, linux-xfs



On 24/11/2019 01:26, Eric Sandeen wrote:

> I haven't looked very closely at your config deltas for what might change
> alignment but it'd be worth giving:
> 
> f8f9ee479439 xfs: add kmem_alloc_io()
> d916275aa4dd xfs: get allocation alignment from the buftarg
> 0ad95687c3ad xfs: add kmem allocation trace points
> 
> a try.
> 
> -Eric

Hi Eric,

That did the trick. Took me some time to resolve the rejects, but now
5.3.11 and 5.3.12 work like a charm.

While trying to track down the patches, I found your reply here:
https://bugzilla.redhat.com/show_bug.cgi?id=1762596

I ended up applying:
f8f9ee479439 xfs: add kmem_alloc_io()
d916275aa4dd xfs: get allocation alignment from the buftarg
0ad95687c3ad xfs: add kmem allocation trace points

For some reason (I was sleepy at the time), I also ended up applying
this one:
xfs: assure zeroed memory buffers for certain kmem allocations

When applying this last one, I had to remove the second argument to
kmem_alloc_io, because in the 5.3.12 tree plus the 3 patches above it
took two arguments instead of the three used in the actual patch:
return kmem_alloc_io(BBTOB(nbblks), align_mask, KM_MAYFAIL | KM_ZERO);
return kmem_alloc_io(BBTOB(nbblks), KM_MAYFAIL | KM_ZERO);

Do you think it's safe to keep these 4 patches on top of the 5.3.12
tree? So far everything looks fine; the filesystems mount and work normally.

Regards,
Pedro


* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-24  6:49   ` Pedro Ribeiro
@ 2019-11-24 19:16     ` Eric Sandeen
  2019-11-25  2:30       ` Pedro Ribeiro
From: Eric Sandeen @ 2019-11-24 19:16 UTC
  To: Pedro Ribeiro, linux-xfs

On 11/24/19 12:49 AM, Pedro Ribeiro wrote:
> 
> 
> On 24/11/2019 01:26, Eric Sandeen wrote:
> 
>> I haven't looked very closely at your config deltas for what might change
>> alignment but it'd be worth giving:
>>
>> f8f9ee479439 xfs: add kmem_alloc_io()
>> d916275aa4dd xfs: get allocation alignment from the buftarg
>> 0ad95687c3ad xfs: add kmem allocation trace points
>>
>> a try.
>>
>> -Eric
> 
> Hi Eric,
> 
> That did the trick. Took me some time to resolve the rejects, but now
> 5.3.11 and 5.3.12 work like a charm.
> 
> While trying to track down the patches, I found your reply here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1762596
> 
> I ended up applying:
> f8f9ee479439 xfs: add kmem_alloc_io()
> d916275aa4dd xfs: get allocation alignment from the buftarg
> 0ad95687c3ad xfs: add kmem allocation trace points
> 
> And I don't know why at the time (I was sleepy), I ended up applying
> this one too:
> xfs: assure zeroed memory buffers for certain kmem allocations

that one's not needed for this problem.

> I had to remove the second argument to kmem_alloc_io when applying this
> last one, as kmem_alloc_io had two arguments in the 5.3.12 tree + those
> 3 patches above, instead of three arguments in the actual patch:

Hm, that doesn't make sense; f8f9ee479439 introduces kmem_alloc_io
with 3 arguments.  Calling kmem_alloc_io with only 2 arguments, i.e.
without the alignment mask, would be a problem.

> return kmem_alloc_io(BBTOB(nbblks), align_mask, KM_MAYFAIL | KM_ZERO);
> return kmem_alloc_io(BBTOB(nbblks), KM_MAYFAIL | KM_ZERO);
> 
> Do you think it's safe to keep these 4 patches on top of the 5.3.12
> tree? So far it all looks fine, filesystems mount and work fine.

Yes, but ... they should probably be applied correctly.  A quick test here
seems to show the three I suggested apply to 5.3.12 cleanly.

-Eric
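To illustrate why the alignment-mask argument matters, here is a userspace sketch of an allocator that takes the mask explicitly: it tries an ordinary allocation, keeps it only if the address passes the mask check, and otherwise falls back to an explicitly aligned allocation. This is an illustration under assumptions, not the kernel's kmem_alloc_io(); `alloc_io_sketch()` is a hypothetical name, and C11 `aligned_alloc()` stands in for whatever fallback the kernel actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an I/O-buffer allocator that takes an
 * alignment mask (alignment - 1). Not the kernel implementation.
 */
static void *alloc_io_sketch(size_t size, size_t align_mask)
{
    size_t alignment = align_mask + 1;

    /* The mask must describe a power-of-two alignment. */
    assert((alignment & align_mask) == 0);

    /* First try: an ordinary heap allocation, kept only if it is aligned. */
    void *ptr = malloc(size);
    if (ptr != NULL && ((uintptr_t)ptr & align_mask) == 0)
        return ptr;
    free(ptr); /* free(NULL) is a no-op */

    /* Fallback: aligned_alloc() wants size to be a multiple of alignment. */
    size_t rounded = (size + align_mask) & ~align_mask;
    return aligned_alloc(alignment, rounded);
}
```

Without the mask parameter, the function would have nothing to check the returned address against, which is the problem with the two-argument call quoted above.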


* Re: Bug when mounting XFS with external SATA drives in USB enclosures
  2019-11-24 19:16     ` Eric Sandeen
@ 2019-11-25  2:30       ` Pedro Ribeiro
From: Pedro Ribeiro @ 2019-11-25  2:30 UTC
  To: Eric Sandeen, linux-xfs



On 25/11/2019 02:16, Eric Sandeen wrote:

> Hm that doesn't make sense; f8f9ee479439 introduces kmem_alloc_io
> with 3 arguments.  2 arguments to kmem_alloc_io, missing the alignment
> mask, would be a problem.
> 
>> return kmem_alloc_io(BBTOB(nbblks), align_mask, KM_MAYFAIL | KM_ZERO);
>> return kmem_alloc_io(BBTOB(nbblks), KM_MAYFAIL | KM_ZERO);
>>
>> Do you think it's safe to keep these 4 patches on top of the 5.3.12
>> tree? So far it all looks fine, filesystems mount and work fine.
> 
> Yes, but ... they should probably be applied correctly.  A quick test here
> seems to show the three I suggested apply to 5.3.12 cleanly.
> 
> -Eric
> 

You're right, my bad. I applied them in order and now they work fine. I
guess there's no point in fixing this in stable, since 5.3 is not a
long-term kernel and the fix is already in 5.4?

Regards,
Pedro
