All of lore.kernel.org
 help / color / mirror / Atom feed
* open bugs found by fuzzing
@ 2016-07-14 21:10 Vegard Nossum
  2016-07-15 13:39 ` kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing) Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-07-14 21:10 UTC (permalink / raw)
  To: linux-ext4

Hi all,

I've been doing some ext4 fuzzing with AFL lately and run into a number
of crashes/warnings. Below is a list of these present in a 100% vanilla
mainline kernel. I will keep debugging and submitting patches until the
list is empty. In the meantime, the list is a useful way to keep track
of each bug and gauge the overall progress.

If anybody thinks they know what causes a particular bug, I'm happy to
test patches or provide more info. The only thing I can't do is to post
full-blown disk images or reproducers. Also note that several of these
may actually be the same underlying bug.

1. kasan: GPF could be caused by NULL-ptr deref or user memory 
accessgeneral protection fault: 0000 [#1] KASAN
http://139.162.151.198/f/ext4/57be666646a37e9821d52bc64846a3b3b785ee7a

2. kernel BUG at fs/buffer.c:2994!
http://139.162.151.198/f/ext4/7df880da89c82579c15ca8bc786a3467ca9c47f7

3. kernel BUG at fs/ext4/inode.c:3709!
http://139.162.151.198/f/ext4/5bdefda69f39b2f2c56d9b67d5b7d9e2cc8dfd5f

4. kernel BUG at fs/ext4/mballoc.c:3188!
http://139.162.151.198/f/ext4/34284738d67f0405325b2c43211c56020b9d0211

5. kernel BUG at fs/ext4/mballoc.c:3518!
http://139.162.151.198/f/ext4/0f702e84173b87861c4ce226cc2e82f600ad9d0c

6. kernel BUG at fs/jbd2/commit.c:825!
http://139.162.151.198/f/ext4/3143febf7925bd1ea398bd1a775551133bd69ffd

7. WARNING: CPU: 0 PID: 58 at fs/ext4/ext4.h:2807 
ext4_block_bitmap_csum_set+0x358/0x600
http://139.162.151.198/f/ext4/9628c19aff0bbaaae4149a03486305c7f6cd7523

8. WARNING: CPU: 0 PID: 58 at fs/ext4/mballoc.c:3987 
ext4_discard_preallocations+0x6cb/0x8b0
http://139.162.151.198/f/ext4/0181e37a689dfcb8565695d93172e790a34a3d14

9. WARNING: CPU: 0 PID: 58 at fs/jbd2/transaction.c:293 
start_this_handle+0xab6/0xcf0
http://139.162.151.198/f/ext4/55c691ba260963ffe20b365298e1f79f3b81968a

10. WARNING: CPU: 0 PID: 58 at kernel/locking/mutex-debug.c:78 
debug_mutex_unlock+0x214/0x520
http://139.162.151.198/f/ext4/000ac1bce9ae7640565328ddcceb31a675e3052a

11. WARNING: CPU: 0 PID: 58 at lib/idr.c:401 idr_preload+0xec/0x110
http://139.162.151.198/f/ext4/7eace56beb912159fba1776ede9c2566f35f95ca

12. WARNING: CPU: 0 PID: 58 at lib/list_debug.c:36 __list_add+0x169/0x1c0
http://139.162.151.198/f/ext4/488a8e50b5137e01d1dd54e30e0e2fe34d8f0b27

13. WARNING: CPU: 0 PID: 58 at lib/list_debug.c:56 
__list_del_entry+0x135/0x1d0
http://139.162.151.198/f/ext4/2e2c6122422aa6007cec500846fe8f891e954fee

14. WARNING: CPU: 0 PID: 58 at lib/list_debug.c:59 
__list_del_entry+0x14f/0x1d0
http://139.162.151.198/f/ext4/1ac079bb08a23c32500cf5d4c29a29ca615f9295

15. WARNING: CPU: 0 PID: 58 at mm/slab_common.c:861 kmalloc_slab+0x8a/0x90
http://139.162.151.198/f/ext4/53b3aab7ddab0fb156047ea5cf72c359511f2726


Vegard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing)
  2016-07-14 21:10 open bugs found by fuzzing Vegard Nossum
@ 2016-07-15 13:39 ` Vegard Nossum
  2016-07-15 17:24   ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-07-15 13:39 UTC (permalink / raw)
  To: linux-ext4, Michael Halcrow; +Cc: Ildar Muslukhov, Jaegeuk Kim

On 07/14/2016 11:10 PM, Vegard Nossum wrote:
> 3. kernel BUG at fs/ext4/inode.c:3709!
> http://139.162.151.198/f/ext4/5bdefda69f39b2f2c56d9b67d5b7d9e2cc8dfd5f

I don't see any evidence of memory corruption here, so this one seems
pretty straightforward: we have an encrypted orphan inode and when we
try to truncate it during the orphan list cleanup it results in a BUG
because we haven't loaded the encryption key for it.

The inode in question has ->i_ino == 16 so I don't think this has
anything to do with special inodes.

Something quick and dirty like this does solve the BUG_ON() for me, but
it looks a lot like papering over an underlying bug:

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5a6277d..794b33a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3735,9 +3735,11 @@ static int __ext4_block_zero_page_range(handle_t 
*handle,
  		if (S_ISREG(inode->i_mode) &&
  		    ext4_encrypted_inode(inode)) {
  			/* We expect the key to be set. */
-			BUG_ON(!fscrypt_has_encryption_key(inode));
-			BUG_ON(blocksize != PAGE_SIZE);
-			WARN_ON_ONCE(fscrypt_decrypt_page(page));
+		        if (list_empty(&EXT4_I(inode)->i_orphan)) {
+				BUG_ON(!fscrypt_has_encryption_key(inode));
+				BUG_ON(blocksize != PAGE_SIZE);
+				WARN_ON_ONCE(fscrypt_decrypt_page(page));
+			}
  		}
  	}
  	if (ext4_should_journal_data(inode)) {

I'm a bit puzzled that we're actually creating a mapping and trying to
decrypt here in the first place, since if this is an orphan inode that
is being recovered at mount time it means that we know _for sure_ that
there is no existing memory mappings and we're truncating it to 0.

Anyway, adding some Ccs.


Vegard

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing)
  2016-07-15 13:39 ` kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing) Vegard Nossum
@ 2016-07-15 17:24   ` Theodore Ts'o
  2016-07-15 17:57     ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2016-07-15 17:24 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: linux-ext4, Michael Halcrow, Ildar Muslukhov, Jaegeuk Kim

On Fri, Jul 15, 2016 at 03:39:19PM +0200, Vegard Nossum wrote:
> 
> I'm a bit puzzled that we're actually creating a mapping and trying to
> decrypt here in the first place, since if this is an orphan inode that
> is being recovered at mount time it means that we know _for sure_ that
> there is no existing memory mappings and we're truncating it to 0.

There are times when we need to make sure i_size is truncated down
(and/or blocks are removed) if we crash in the middle of an operation
that for whatever reason, spans multiple trnsactions.

The simplest such example is truncating down to a non-zero i_size.

If your proposed patch to ext4_block_zero_page_range() helps, then
presumably we're *not* truncating down to zero, but instead truncating
to some non-zero size.  Solving this problem is going to be a bit
tricky.  We could try zeroing the data page at the very begionning of
the truncate operation, but then we could run into the case where data
page write makes it to disk but the truncate oepration was never
committed before a crash.  This would result in a consistent file
system, but the data wouldn't be consistent (that is, it wouldn't be
either the contents before the truncate or after truncate, but
something in between --- so the truncate would no longer be atomic
with respect to crashes).

The other thing I've noticed is I think we're adding some inodes to
the orphan list which shouldn't be necessary if delayed allocation or
data=writeback is enabled.  And when we're deciding whether or not to
add the inode to orphan list we're testing against i_size instead of
i_disksize.  So there are some cases where I think we can avoid adding
inodes to the orphan list in the buffered write case.  And that is a
good thing not just because it avoids the problem with respect to
encrypted inodes, but the orphan list is also a scalability bottleneck
for ext4.

						- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing)
  2016-07-15 17:24   ` Theodore Ts'o
@ 2016-07-15 17:57     ` Vegard Nossum
  2016-07-15 19:49       ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2016-07-15 17:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: linux-ext4, Michael Halcrow, Ildar Muslukhov, Jaegeuk Kim

On 07/15/2016 07:24 PM, Theodore Ts'o wrote:
> On Fri, Jul 15, 2016 at 03:39:19PM +0200, Vegard Nossum wrote:
>>
>> I'm a bit puzzled that we're actually creating a mapping and trying to
>> decrypt here in the first place, since if this is an orphan inode that
>> is being recovered at mount time it means that we know _for sure_ that
>> there is no existing memory mappings and we're truncating it to 0.
>
> There are times when we need to make sure i_size is truncated down
> (and/or blocks are removed) if we crash in the middle of an operation
> that for whatever reason, spans multiple trnsactions.
>
> The simplest such example is truncating down to a non-zero i_size.
>
> If your proposed patch to ext4_block_zero_page_range() helps, then
> presumably we're *not* truncating down to zero, but instead truncating
> to some non-zero size.

You're right; I just checked, it's truncating it to 4 bytes.

I thought all the inodes on the orphan list were completely unreachable,
but that's obviously not true following your explanations (thanks!) and
another peek at the cleanup function which only does the truncate in the
first place if ->i_nlink is non-zero, I missed that earlier. I guess
this comment confused me:

/* ext4_orphan_cleanup() walks a singly-linked list of inodes (starting at
  * the superblock) which were deleted from all directories, but held 
open by
  * a process at the time of a crash.

But in any case my simple patch is definitely the wrong thing to do.

Thanks,


Vegard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing)
  2016-07-15 17:57     ` Vegard Nossum
@ 2016-07-15 19:49       ` Theodore Ts'o
  2016-07-16 16:15         ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2016-07-15 19:49 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: linux-ext4, Michael Halcrow, Ildar Muslukhov, Jaegeuk Kim

On Fri, Jul 15, 2016 at 07:57:03PM +0200, Vegard Nossum wrote:
> I thought all the inodes on the orphan list were completely unreachable,
> but that's obviously not true following your explanations (thanks!) and
> another peek at the cleanup function which only does the truncate in the
> first place if ->i_nlink is non-zero, I missed that earlier. I guess
> this comment confused me:
> 
> /* ext4_orphan_cleanup() walks a singly-linked list of inodes (starting at
>  * the superblock) which were deleted from all directories, but held open by
>  * a process at the time of a crash.

Yeah, that comment is **badly** (as well over a decade) out-of-date.
Sorry.  :-/

The problem is fixing this is messy.  The problem is if we just set
i_size without zeroing out the bytes between i_size and the end of the
page, we're asking for trouble.  For example, you could think that
it's enough to have ext4_readpage() care of doing the zeroing after
the block is read from disk and then decrypted, but what if the user
does a sparse write beyond i_size?

So either we allow truncate(2) to be non-atomic with respect to
crashes for encrypted files, or alternatively we would have to set a
flag in the inode indicating that the bytes between i_size and the end
of the page must be zeroed before the file can be modified in any way,
or before an attempted O_DIRECT read of the last block, and then when
we read the inode from diks into the inode cache, if the bit is set
and we have the encryption key (which we would need if we want to
modify the file), we could take zeroing the tail end of the file then.

Yulch.   Definitely a bit of a hack.  :-(

					- Ted

P.S. Fortunately, this is a very rare case in practice, so I doubt we
would have hit it in a while, but it's definitely something we need to
fix, one way or another.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing)
  2016-07-15 19:49       ` Theodore Ts'o
@ 2016-07-16 16:15         ` Vegard Nossum
  0 siblings, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2016-07-16 16:15 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: linux-ext4, Michael Halcrow, Ildar Muslukhov, Jaegeuk Kim

[-- Attachment #1: Type: text/plain, Size: 1833 bytes --]

On 07/15/2016 09:49 PM, Theodore Ts'o wrote:
> The problem is fixing this is messy.  The problem is if we just set
> i_size without zeroing out the bytes between i_size and the end of the
> page, we're asking for trouble.  For example, you could think that
> it's enough to have ext4_readpage() care of doing the zeroing after
> the block is read from disk and then decrypted, but what if the user
> does a sparse write beyond i_size?
>
> So either we allow truncate(2) to be non-atomic with respect to
> crashes for encrypted files, or alternatively we would have to set a
> flag in the inode indicating that the bytes between i_size and the end
> of the page must be zeroed before the file can be modified in any way,
> or before an attempted O_DIRECT read of the last block, and then when
> we read the inode from diks into the inode cache, if the bit is set
> and we have the encryption key (which we would need if we want to
> modify the file), we could take zeroing the tail end of the file then.

Can we delay the truncation/cleanup of that inode until the correct key
gets loaded? It's not like anybody could open (or otherwise operate on
the data of) that inode anyway until they key is present, right?

Something like the attached (very hacky) patch seems to allow my fuzzed
filesystem to mount without error and keeps the inode in question on the
on-disk orphan list:

+ mount (...)
EXT4-fs (loop0): 1 encrypted truncate delayed
EXT4-fs (loop0): recovery complete
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: 
errors=remount-ro

You'd of course have to check whether the inode needs the
cleanup/truncate if it's ever accessed and return an error or something.
Maybe in ext4_iget()/ext4_lookup()?

It's probably a stupid idea for a number of reasons, but I'd be happy to
learn exactly why :-)


Vegard

[-- Attachment #2: ext4-encrypted-orphans.patch --]
[-- Type: text/x-patch, Size: 2265 bytes --]

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c13a4e4..f7b662a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2303,10 +2303,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
 				struct ext4_super_block *es)
 {
 	unsigned int s_flags = sb->s_flags;
-	int nr_orphans = 0, nr_truncates = 0;
+	int nr_orphans = 0, nr_truncates = 0, nr_encrypted = 0;
 #ifdef CONFIG_QUOTA
 	int i;
 #endif
+	LIST_HEAD(encrypted_orphans);
+
 	if (!es->s_last_orphan) {
 		jbd_debug(4, "no orphan inodes to clean up\n");
 		return;
@@ -2374,6 +2376,19 @@ static void ext4_orphan_cleanup(struct super_block *sb,
 			break;
 		}
 
+		if (inode->i_nlink && ext4_encrypted_inode(inode)) {
+			if (fscrypt_get_encryption_info(inode)
+				|| !fscrypt_has_encryption_key(inode))
+			{
+				/* Can't cleanup this yet */
+				list_add(&EXT4_I(inode)->i_orphan, &encrypted_orphans);
+				es->s_last_orphan = cpu_to_le32(NEXT_ORPHAN(inode));
+				NEXT_ORPHAN(inode) = 0;
+				nr_encrypted++;
+				continue;
+			}
+		}
+
 		list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
 		dquot_initialize(inode);
 		if (inode->i_nlink) {
@@ -2400,6 +2415,24 @@ static void ext4_orphan_cleanup(struct super_block *sb,
 		iput(inode);  /* The delete magic happens here! */
 	}
 
+	{
+		/*
+		 * Put the delayed orphans back on the on-disk list. In order
+		 * to do the final cleanup of these inodes, the filesystem
+		 * must (currently) be remounted with the corresponding
+		 * encryption keys loaded.
+		 */
+		struct ext4_inode_info *ei, *tmp;
+		list_for_each_entry_safe(ei, tmp, &encrypted_orphans, i_orphan) {
+			struct inode *inode = &ei->vfs_inode;
+
+			NEXT_ORPHAN(inode) = es->s_last_orphan;
+			es->s_last_orphan = cpu_to_le32(inode->i_ino);
+			INIT_LIST_HEAD(&ei->i_orphan);
+			iput(inode);
+		}
+	}
+
 #define PLURAL(x) (x), ((x) == 1) ? "" : "s"
 
 	if (nr_orphans)
@@ -2408,6 +2441,10 @@ static void ext4_orphan_cleanup(struct super_block *sb,
 	if (nr_truncates)
 		ext4_msg(sb, KERN_INFO, "%d truncate%s cleaned up",
 		       PLURAL(nr_truncates));
+	if (nr_encrypted)
+		ext4_msg(sb, KERN_INFO, "%d encrypted truncate%s delayed",
+		       PLURAL(nr_encrypted));
+
 #ifdef CONFIG_QUOTA
 	/* Turn quotas off */
 	for (i = 0; i < EXT4_MAXQUOTAS; i++) {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-07-16 16:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 21:10 open bugs found by fuzzing Vegard Nossum
2016-07-15 13:39 ` kernel BUG at fs/ext4/inode.c:3709! (Re: open bugs found by fuzzing) Vegard Nossum
2016-07-15 17:24   ` Theodore Ts'o
2016-07-15 17:57     ` Vegard Nossum
2016-07-15 19:49       ` Theodore Ts'o
2016-07-16 16:15         ` Vegard Nossum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.