* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
2019-10-15 12:38 [Bug 205197] New: kernel BUG at fs/ext4/extents_status.c:884 bugzilla-daemon
@ 2019-10-15 13:53 ` bugzilla-daemon
2019-10-17 6:50 ` bugzilla-daemon
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2019-10-15 13:53 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
Theodore Tso (tytso@mit.edu) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu
--- Comment #1 from Theodore Tso (tytso@mit.edu) ---
It looks like the journal inode is corrupted but it shouldn't have BUG'ed on
you.
Can you reproduce this crash? If so, does this fairly simple patch cause it
not to BUG? (It will still fail to mount, but it shouldn't crash.)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f203bf989a4c..d83b325fb54b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -375,7 +375,7 @@ static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
 	 *  - zero length
 	 *  - overflow/wrap-around
 	 */
-	if (lblock + len <= lblock)
+	if (lblock + (ext4_lblk_t) len <= lblock)
 		return 0;
 	return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len);
 }
Apologies if this is whitespace damaged, but it's a fairly simple edit to apply,
and I'm currently on a chromebook so I can't easily get a patch uploaded into
bugzilla.
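The check touched by the patch above rejects extents that are empty or that wrap past the end of the logical block space. A minimal user-space sketch of that idea follows; `lblk_t` and `extent_range_ok` are hypothetical names for illustration, not kernel code, and the sketch assumes a 32-bit logical block number.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit logical block number type. */
typedef uint32_t lblk_t;

/*
 * Return 1 if the range [lblock, lblock + len) is non-empty and does not
 * wrap around the 32-bit logical block space, 0 otherwise.
 *
 * Unsigned addition wraps modulo 2^32, so both a zero length and a
 * wrapped sum leave the sum less than or equal to the start block.
 * The cast keeps the sum in 32 bits even if int were wider than 32 bits.
 */
static int extent_range_ok(lblk_t lblock, lblk_t len)
{
	if ((lblk_t)(lblock + len) <= lblock)
		return 0;
	return 1;
}
```

A single comparison covers both rejected cases, which is why the original code needs only one `if`.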
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2019-10-17 6:50 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #2 from Arnaud Bétrémieux (arnaud@btmx.fr) ---
Sorry for the delay. I can confirm that although the partition still does not
mount, there is indeed no "BUG" with this patch applied.
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2019-10-23 13:13 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
It's been pointed out to me that the patch in #1 should have been a no-op,
since a signed integer gets converted to be unsigned before it is added to an
unsigned int.
Can you confirm that without the patch, you can still reliably reproduce the
failure?
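The conversion rule mentioned above can be checked in user space. In the sketch below, `lblk_t` is a hypothetical stand-in for the unsigned block-number type and both function names are illustrative assumptions; the two functions mirror the check before and after the patch in #1.

```c
/* Hypothetical stand-in for the unsigned logical block number type. */
typedef unsigned int lblk_t;

/*
 * Check as written before the patch. The usual arithmetic conversions
 * turn the signed len into unsigned int before the addition.
 */
static int overflows_implicit(lblk_t lblock, int len)
{
	return lblock + len <= lblock;
}

/* Check as written in the patch: the same conversion, spelled out. */
static int overflows_explicit(lblk_t lblock, int len)
{
	return lblock + (lblk_t) len <= lblock;
}
```

Both functions return the same value for every input, including negative lengths, which is consistent with the patch being a no-op at the language level.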
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2019-10-24 6:58 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #4 from Arnaud Bétrémieux (arnaud@btmx.fr) ---
I just tried it with the same kernel I used at the time of the bug report, and
no, I can't reproduce the failure anymore. I'm not sure what changed, sorry!
Strangely, I'm pretty sure I did test with and without the patch and it all
seemed to work at the time (BUG with no patch, no BUG with patch).
The partition is automounted, so maybe there was an auto-fsck at some point. I
should have thought of removing the automount to keep things testable.
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2024-03-04 4:17 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
Antony Amburose (antony.ambrose@in.bosch.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antony.ambrose@in.bosch.com
--- Comment #5 from Antony Amburose (antony.ambrose@in.bosch.com) ---
Working with kernel 5.4.233 on an aarch64 (Qualcomm/Android) platform, we get
the same error. I am able to reliably reproduce this problem even after
applying the patch from #1. Could you please let me know what additional
information is required? As the partition is FBE encrypted, I am not able to
look at the hex dump to check the nature of the corruption.
--
You may reply to this email to add a comment.
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2024-03-14 21:31 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #6 from Theodore Tso (tytso@mit.edu) ---
The reason why no one has paid much attention to it is because the bug is
reported against a very old kernel, and upstream developers generally only
worry about the upstream kernel. Companies which insist on using old stable
kernels need to either engage paid support (e.g., contacting Red Hat if you are
using RHEL, etc.) or have their own kernel developers on staff to debug the
problem. Upstream developers are volunteers and don't have the time to provide
free support to companies that are using old kernels. In general, at the
minimum we ask kernel engineers working on these kernels to try to reproduce
the problem on the latest upstream kernel, and if they can't.... maybe they
should work on using a newer upstream kernel, or they should figure out how to
backport fixes to old LTS kernels.
Also, it seems... weird.... that you can't look at the hex dump. The kernel is
able to mount the file system, so you have access to the encryption key, or at
least, to a block device which has the encryption key set up by your user
space. So you should be able to run e2fsck -fn /dev/hdXX. This would help
provide a hint to the nature of the corruption, so that we could try to
reproduce the problem on an upstream kernel. But what we really don't have
time to do is to hand-hold users who don't know how to run fsck, apply kernel
patches, or run test kernels.
If you can let us know what you actually can do, perhaps we might bend the
rules and try to give you some debugging help. But it will only be on a best
efforts basis, and when we have time, since after all, we're volunteers....
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2024-03-15 12:14 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #7 from Antony Amburose (antony.ambrose@in.bosch.com) ---
Thank you for the response. I now understand why this issue did not get much
attention. Sorry for providing so little information in my first message.
We have backported the interesting changes from upstream (~70 of them) and
can still see the problem. I reported the issue against the old kernel for
continuity. The original report was also seen while mounting an encrypted SD
card; we see it on an encrypted volume too, but on onboard storage. It seemed
logical to continue the discussion here because you had given some debugging
hints, and the issue stalled only because the original reporter could no
longer reproduce the problem, whereas we can, even after backporting the
change. In future I will file bugs against the latest kernel. Thanks for the
hint.
The issue can be reproduced with a sequence in which we interrupt the power.
In our decade-long experience working with ext4, we have never seen a
power-loss sequence corrupt an ext4 volume so badly that it cannot be mounted.
That was the main reason for reporting the issue to the community experts. Of
course we have some paid support and also in-house kernel engineers, but I
thought it better to report it to the community experts as well, since the old
bug is still open and we have a reliable reproduction. My current assumption
is that we have either a problem with our sequence or a problem with the
handling of encrypted ext4 partitions.
Regarding our know-how and tooling: we can work with the hex dump, understand
the ext4 disk layout, and use e2fsprogs to debug the problem. Hence we expect
only some debugging hints and direction, and hopefully we can solve the issue
together.
As the device resets cyclically, we could not hook into the device and get the
/dev/sdXX. The existing tooling only gets the encrypted data. We will try to
resolve this situation and somehow get the hex dump, provide more details on
the nature of the corruption, and also provide the fsck output.
* [Bug 205197] kernel BUG at fs/ext4/extents_status.c:884
From: bugzilla-daemon @ 2024-03-15 17:03 UTC
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=205197
--- Comment #8 from Theodore Tso (tytso@mit.edu) ---
One of the things I'd recommend doing is grabbing a compressed raw e2image
dump. See the e2image man page for the -r or the -Q option. It's not hard to
build e2image for Android. At one point I had added support for building
e2image in the AOSP build files (although this might predate updates to the
AOSP build system, so it might require some minor work on your side; still,
it's really not hard to build an AOSP image with e2image and debugfs enabled,
and if you're trying to do file system debugging on Android, this is a Really
Good idea.)