* e2fsck dir_info corruption
@ 2020-02-01  4:07 Andreas Dilger
  2020-02-01 17:02 ` Andreas Dilger
From: Andreas Dilger @ 2020-02-01  4:07 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4


I've been trying to track down a failure with e2fsck on a large filesystem.  Running e2fsck-1.45.2 repeatedly and consistently reports an error in pass2:

Internal error: couldn't find dir_info for <ino>

The inode number is consistent across multiple runs, but examining it with
debugfs shows that it is a normal directory with no problems.  In order to
complete the e2fsck, we tried using "debugfs -w -R 'clri <ino>'" to erase
the inode, but this just shifted the error onto a slightly later directory
(which was then itself reported consistently on subsequent runs).

The filesystem is 6TB with 2.3B inodes, about 1.8B of which are in use.  The
inode numbers reporting problems are in the range of 1.2B.  e2fsck is using
about 45GB of RAM (with some swap on an NVMe SSD, so much faster than the
HDD the filesystem image is on).  It takes about 3-4h to hit this problem.

I was able to run e2fsck under gdb, though conditional breakpoints slow it
down significantly (10-100x), since they need to be evaluated for every
inode, which makes them essentially unusable here.

What I've discovered is quite interesting.  The ctx->dir_info->array is
filled in properly for the bad inode, and the array is OK at the start of
pass2 (2 is the root inode, and $66 holds the bad inode number):

    (gdb) p e2fsck_get_dir_info(ctx, 2)
    $171 = (struct dir_info *) 0x7ffabf2bd010
    (gdb) p *$171
    $172 = {ino = 2, dotdot = 0, parent = 2}
    (gdb) p $66
    $173 = 1222630490
    (gdb) p e2fsck_get_dir_info(ctx, $66)
    $174 = (struct dir_info *) 0x7ffb3d327f54
    (gdb) p *$174
    $175 = {ino = 1222630490, dotdot = 0, parent = 0}
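
For reference, the lookup itself is nothing fancy; as far as I can tell it
is just a binary search by inode number over this array, roughly (a sketch,
not the exact dirinfo.c code):

    /* Rough sketch of the lookup, assuming the array is sorted by ino. */
    static struct dir_info *lookup_dir_info(struct dir_info *array,
                                            ext2_ino_t count, ext2_ino_t ino)
    {
        ext2_ino_t low = 0, high = count;

        while (low < high) {
            ext2_ino_t mid = low + (high - low) / 2;

            if (array[mid].ino == ino)
                return &array[mid];
            if (array[mid].ino < ino)
                low = mid + 1;
            else
                high = mid;
        }
        return NULL;    /* -> "couldn't find dir_info for <ino>" */
    }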

The dir_info array itself is over 4GB in size (not sure if this is relevant
or not), with 380M directories, but the bad inode is only about half way
through the array ($140 is the index of the problematic entry):

    (gdb) p $140
    $176 = 176197275
    (gdb) p ctx->dir_info->array[$140]
    $177 = {ino = 1222630490, dotdot = 0, parent = 0}
    (gdb) p *ctx->dir_info
    $178 = {count = 381539950, size = 381539965,
      array = 0x7ffabf2bd010, last_lookup = 0x0,
      tdb_fn = 0x0, tdb = 0x0}
    (gdb) p $178.count * sizeof(ctx->dir_info->array[0])
    $179 = 4578479400

I put a hardware watchpoint on the ".parent" and ".dotdot" fields of this
entry so that I could watch as e2fsck_dir_info_set_dotdot() updated it,
but I didn't think to put one on the ".ino" field, since I thought it would
be static.

    (gdb) info break
    Num     Type           Disp Enb Address    What
    15      hw watchpoint  keep y              -location $177->parent
    16      hw watchpoint  keep y              -location $177->dotdot

The watchpoint triggered and showed that the entry was changed by qsort_r().
At first I thought "OK, the dir_info array needs to be sorted, because a
binary search is used on it", but in fact the array is *created* in order
during the inode table scan and never needs to be sorted.  As can be seen
from the stack, it is *another* array being sorted that overwrites the
dir_info entry:

    Hardware watchpoint 16: -location $177->dotdot

    Old value = 0
    New value = 1581603653
    0x0000003b2ea381f7 in _quicksort ()
       from /lib64/libc.so.6
    (gdb) p $141
    $179 = {ino = 1222630490, dotdot = 0, parent = 0}
    (gdb) bt
    #0  0x0000003b2ea381e1 in _quicksort ()
      from /lib64/libc.so.6
    #1  0x0000003b2ea38f82 in qsort_r ()
      from /lib64/libc.so.6
    #2  0x000000000045447b in ext2fs_dblist_sort2 (
        dblist=0x6c9a90,
        sortfunc=0x428c3d <special_dir_block_cmp>)
        at dblist.c:189
    #3  0x00000000004285bc in e2fsck_pass2 (
        ctx=0x6c4000) at pass2.c:180
    #4  0x000000000041646b in e2fsck_run (ctx=0x6c4000)
        at e2fsck.c:253
    #5  0x0000000000415210 in main (argc=4,
        argv=0x7fffffffe088) at unix.c:2047
    (gdb) p ctx->dir_info->array[$140]
    $207 = {ino = 0, dotdot = 1581603653, parent = 0}
    (gdb) f 3
    #3  0x00000000004285bc in e2fsck_pass2 (
        ctx=0x6c4000) at pass2.c:180
    180       ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);

AFAIK, the ext2fs_dblist_sort2() is for the directory *blocks*, and should
not be changing the dir_info at all.  Is this a bug in qsort or glibc?

What I just noticed writing this email is that the fs->dblist.list address
is right in the middle of the dir_info array address range:

    (gdb) p *fs->dblist
    $210 = {magic = 2133571340, fs = 0x6c4460,
      size = 763079922, count = 388821313, sorted = 0,
      list = 0x7ffad011e010}
    (gdb) p &ctx->dir_info->array[0]
    $211 = (struct dir_info *) 0x7ffabf2bd010
    (gdb) p &ctx->dir_info->array[$140]
    $212 = (struct dir_info *) 0x7ffb3d327f54

which might explain why sorting the dblist is messing with dir_info.  I don't
_think_ it is a problem with my build or swap setup, both of which differ
from the system where this was originally reproduced.
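
Doing the arithmetic on the numbers above makes the overlap concrete
(assuming 12-byte dir_info entries, per the sizeof calculation earlier):

    &array[$140] = 0x7ffabf2bd010 + 176197275 * 12
                 = 0x7ffabf2bd010 + 0x7e06af44 = 0x7ffb3d327f54
    dblist.list  = 0x7ffad011e010
                 = 0x7ffabf2bd010 + 0x10e61000  (~270MB past dir_info start)

so everything in dir_info beyond entry ~23.6M shares addresses with dblist.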

Any thoughts on how to continue debugging this?

Cheers, Andreas







* Re: e2fsck dir_info corruption
  2020-02-01  4:07 e2fsck dir_info corruption Andreas Dilger
@ 2020-02-01 17:02 ` Andreas Dilger
From: Andreas Dilger @ 2020-02-01 17:02 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4


On Jan 31, 2020, at 9:07 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> 
> I've been trying to track down a failure with e2fsck on a large filesystem.  Running e2fsck-1.45.2 repeatedly and consistently reports an error in pass2:
> 
> Internal error: couldn't find dir_info for <ino>

[snip]

> The dir_info array itself is over 4GB in size (not sure if this is relevant
> or not), with 380M directories, but the bad inode is only about half way
> through the array ($140 is the index of the problematic entry):

[snip]

> The watchpoint triggered and showed that the entry was changed by qsort_r().
> At first I thought "OK, the dir_info array needs to be sorted, because a
> binary search is used on it", but in fact the array is *created* in order
> during the inode table scan and never needs to be sorted.  As can be seen
> from the stack, it is *another* array being sorted that overwrites the
> dir_info entry:

[snip]

> AFAIK, the ext2fs_dblist_sort2() is for the directory *blocks*, and should
> not be changing the dir_info at all.  Is this a bug in qsort or glibc?
> 
> What I just noticed writing this email is that the fs->dblist.list address
> is right in the middle of the dir_info array address range:
> 
>    (gdb) p *fs->dblist
>    $210 = {magic = 2133571340, fs = 0x6c4460,
>      size = 763079922, count = 388821313, sorted = 0,
>      list = 0x7ffad011e010}
>    (gdb) p &ctx->dir_info->array[0]
>    $211 = (struct dir_info *) 0x7ffabf2bd010
>    (gdb) p &ctx->dir_info->array[$140]
>    $212 = (struct dir_info *) 0x7ffb3d327f54
> 
> which might explain why sorting the dblist is messing with dir_info.  I don't
> _think_ it is a problem with my build or swap setup, both of which differ
> from the system where this was originally reproduced.

Just like any good mystery, the information I needed was there all along.

After abandoning the previous e2fsck-under-gdb run (which took me a couple
of days to get into the right state, so wasn't done lightly), I restarted
and was tracking the dblist and dir_info allocations, thinking that I might
catch when they became "bad" due to realloc() or similar.  In fact, these
allocations were bad from the beginning (similar to what was shown above,
with dblist in the middle of what should be the dir_info array).  This is
because calls to e2fsck_allocate_memory() request more than 4GB through its
"unsigned int size" argument, so the byte count silently wraps around.  That
dooms the allocations from the start, though it is very surprising that there
wasn't massive corruption visible earlier in the test run...
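
To make the failure mode concrete, here is a standalone illustration of the
truncation (not the actual e2fsck code; the struct and count just mirror the
dir_info array on this filesystem):

    #include <stdio.h>

    struct dir_info_demo { unsigned int ino, dotdot, parent; }; /* 12 bytes */

    int main(void)
    {
        unsigned long count = 381539950UL;   /* dir_info count from above */
        unsigned long want = count * sizeof(struct dir_info_demo);
        unsigned int trunc = (unsigned int)want; /* 32-bit size argument */

        printf("want %lu bytes, actually allocate %u bytes\n", want, trunc);
        /* want 4578479400 bytes, actually allocate 283512104 bytes */
        return 0;
    }

so malloc() hands back a ~270MB buffer while e2fsck indexes it as if it held
all 380M+ entries, and later allocations (such as the dblist) land right on
top of the "missing" part of the array.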

I think there are two options to fix this:
- change the e2fsck_allocate_memory() argument to "unsigned long size", and
  fix all of the callers to typecast to unsigned long before calling, but
  this is error prone if something is missed or a new allocation is added
- add a new e2fsck_allocate_array() function that takes size and count as
  arguments and does the calculation internally, which I think is more
  robust (rough sketch below)
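
Something along these lines (sketch only, not the final patch; it assumes
e2fsck_allocate_memory() itself has been switched to an unsigned long size
per the first option):

    /* Sketch: allocate "count" elements of "size" bytes each, refusing to
     * let the byte count wrap instead of silently truncating it. */
    void *e2fsck_allocate_array(e2fsck_t ctx, unsigned long count,
                                unsigned long size, const char *description)
    {
        if (count && size > ~0UL / count)
            fatal_error(ctx, "array allocation overflows");

        return e2fsck_allocate_memory(ctx, count * size, description);
    }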

Patches forthcoming, after I have verified that they are working (this may
take a few days due to the lengthy runtime for this filesystem).


Cheers, Andreas






