On Jun 17, 2020, at 1:01 PM, Eric Sandeen wrote:
>
> We recently had a report of a panic in do_split; the filesystem in question
> panicked a distribution kernel when trying to add a new directory entry;
> the behavior/bug persists upstream.
>
> The directory block in question had lots of unused and un-coalesced
> entries, like this, printed from the loop in ext4_insert_dentry():
>
> [32778.024654] reclen 44 for name len 36
> [32778.028745] start: de ffff9f4cb5309800 top ffff9f4cb5309bd4
> [32778.034971]  offset 0 nlen 28 rlen 40, rlen-nlen 12, reclen 44 name
> [32778.042744]  offset 40 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name
> [32778.050521]  offset 68 nlen 32 rlen 32, rlen-nlen 0, reclen 44 name
> [32778.058294]  offset 100 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name
> [32778.066166]  offset 128 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name
> [32778.074035]  offset 156 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name
> [32778.081907]  offset 184 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name
> [32778.089779]  offset 208 nlen 36 rlen 36, rlen-nlen 0, reclen 44 name
> [32778.097648]  offset 244 nlen 12 rlen 12, rlen-nlen 0, reclen 44 name REDACTED
> [32778.105227]  offset 256 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name
> [32778.113099]  offset 280 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name REDACTED
> [32778.122134]  offset 304 nlen 20 rlen 20, rlen-nlen 0, reclen 44 name REDACTED
> [32778.130780]  offset 324 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name REDACTED
> [32778.138746]  offset 340 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name
> [32778.146616]  offset 364 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name
> [32778.154487]  offset 392 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name
> [32778.162362]  offset 416 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name
> ...
>
> the file we were trying to insert needed a record length of 44, and none of the
> non-coalesced slots were big enough, so we failed and told do_split
> to get to work.
>
> However, the sum of the non-empty entries didn't exceed half the block size, so
> the loop in do_split() iterated over all of the entries, ended at "count," and
> told us to split at (count - move) which is zero, and eventually:
>
>     continued = hash2 == map[split - 1].hash;
>
> exploded on the negative index.
>
> It's an open question as to how this directory got into this format; I'm not
> sure if this should ever happen or not.  But at a minimum, I think we should
> be defensive here, hence [PATCH 1/1] will do that as an expedient fix and
> backportable patch for this situation.  There may be some other underlying
> problem which led to this directory structure if it's unexpected, and maybe that
> can come as another patch if anyone can investigate.

I thought this might be a bit of a conundrum.  There is *supposed* to be
merging of adjacent entries, but some quick testing on RHEL7 (kernel
3.10.0-957.12.1.el7, same with Debian 4.14.79) shows this to be broken
if the files are deleted in dirent order (which would seem to be the
most common order):

    # mkdir tmp; cd tmp
    # touch file{1..100}
    # rm file{33,36,37,39,41,42,43,46,47}
    # debugfs -c -R "ls -ld tmp" /dev/sda1

      366 100644 (1) 0 0 0 18-Jun-2020 18:43 file30
    < 369> 0 (1) 0 0 file33
    < 372> 0 (1) 0 0 file36
    < 373> 0 (1) 0 0 file37
    < 375> 0 (1) 0 0 file39
    < 377> 0 (1) 0 0 file41
    < 378> 0 (1) 0 0 file42
    < 379> 0 (1) 0 0 file43
    < 382> 0 (1) 0 0 file46
    < 383> 0 (1) 0 0 file47
      386 100644 (1) 0 0 0 18-Jun-2020 18:43 file50

The above shows (with debugfs modified to show the reclen of deleted
files) that the dirents are *not* combined.  If the dirent *before* the
other entries is deleted, then they are merged:

    # rm file30

    < 366> 0 (1) 0 0 file30
    < 369> 0 (1) 0 0 file33
    < 372> 0 (1) 0 0 file36
    < 373> 0 (1) 0 0 file37
    < 375> 0 (1) 0 0 file39
    < 377> 0 (1) 0 0 file41
    < 378> 0 (1) 0 0 file42
    < 379> 0 (1) 0 0 file43
    < 382> 0 (1) 0 0 file46
    < 383> 0 (1) 0 0 file47
      386 100644 (1) 0 0 0 18-Jun-2020 18:43 file50

Cheers, Andreas
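
For anyone following along, the do_split() failure Eric describes can be
illustrated with a small user-space C sketch.  The struct and function
names below are hypothetical stand-ins rather than the kernel's own, and
the count / 2 fallback is only one plausible defensive choice; it is an
assumption here, not a statement of what [PATCH 1/1] actually does.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-entry map built during a split. */
struct map_entry {
    unsigned int hash;
    unsigned int size;   /* record length of one live dirent */
};

/*
 * Walk backwards from the last entry, accumulating sizes until roughly
 * half a block has been collected.  If the live entries never add up to
 * half a block, the loop never breaks, move == count, and the result is 0.
 */
static int split_index_unguarded(const struct map_entry *map, int count,
                                 unsigned int blocksize)
{
    unsigned int size = 0;
    int move = 0;

    for (int i = count - 1; i >= 0; i--) {
        if (size + map[i].size / 2 > blocksize / 2)
            break;
        size += map[i].size;
        move++;
    }
    return count - move;
}

/*
 * Assumed defensive variant: if the size-based walk consumed every entry,
 * split by count instead, so that map[split - 1] is a valid index.
 */
static int split_index_guarded(const struct map_entry *map, int count,
                               unsigned int blocksize)
{
    int split;

    assert(count >= 2);   /* count / 2 is only a usable index for count >= 2 */
    split = split_index_unguarded(map, count, blocksize);
    if (split == 0)
        split = count / 2;
    return split;
}
```

With four live entries of 28 to 32 bytes in a 4096-byte block, the
unguarded version returns 0, so a later map[split - 1] would index
element -1; the guarded version returns 2.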