* Regression in next with ext4 oops @ 2016-10-03 23:30 Tony Lindgren 2016-10-04 9:00 ` Jan Kara 0 siblings, 1 reply; 15+ messages in thread From: Tony Lindgren @ 2016-10-03 23:30 UTC (permalink / raw) To: Eric Biggers, Theodore Ts'o Cc: Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap Hi, I'm seeing a repeatable oops with Linux next while running update-initramfs, see below. I tried reverting commit 59aa5a3aeead ("fscrypto: make filename crypto functions return 0 on success") as that's the only commit changing ext4_htree_store_dirent, but that did not help. Anybody else seeing something like this? Regards, Tony 8< -------------- Unable to handle kernel NULL pointer dereference at virtual address 00000004 pgd = ee7e7280 [00000004] *pgd=00000000 Internal error: Oops: 205 [#1] SMP ARM Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid smsc95xx smsc] CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974 Hardware name: Generic OMAP5 (Flattened Device Tree) task: ed0b8380 task.stack: ec216000 PC is at rb_insert_color+0x1c/0x1b8 LR is at ext4_htree_store_dirent+0xe0/0x104 pc : [<c0594930>] lr : [<c042aa70>] psr: 600e0013 sp : ec217db8 ip : ed177948 fp : c0a17618 r10: 00000001 r9 : ec217e08 r8 : ed177940 r7 : ec224b80 r6 : 2d7d188a r5 : 83aa4108 r4 : 00000000 r3 : 00000000 r2 : ed389e88 r1 : ec224b80 r0 : ed177948 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: ae7e7280 DAC: 55555555 Process mkinitramfs (pid: 2299, stack limit = 0xec216218) Stack: (0xec217db8 to 0xec218000) 7da0: 00000000 83aa4108 7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8 7de0: 00000001 c043ecc8 ec217e08 ed80c040 c0f1a000 00001000 00001628 00000000 7e00: 0000005c ebaf9c80 c0f1a630 0000000b ec224bc2 00000000 00000002 ed80d6d0 7e20: 00000000 ebaf9c80 00000000 ec217e60 ec217e70 c043fb8c 00000000 00000000 7e40: 00000006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 00000002 7e60: 83aa4108 2d7d188a 00000004 ee7bb8b8 ed80c080 c0f19020 c0f19020 00000000 7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770 7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 00000055 ed80d7a4 00000000 00000000 7ec0: 00000001 00000000 600e0013 c1093070 00000000 00000000 00000001 00000000 7ee0: 00000000 c03b49e8 00000000 00000000 00000000 600e0013 00000000 00020000 7f00: ed80d6d0 ed80d770 00000000 ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80 7f20: 00000000 00000001 ec217f70 ed80d770 ec216000 00000000 7f5edef4 c03b4ae8 7f40: 7f60261c 7f602611 000000c5 7f617428 ebaf9c80 00008000 0000008d ebaf9c80 7f60: ec216000 00000000 7f5edef4 c03b4f80 c03b4b30 00000000 00000000 00000000 7f80: 7f617428 00000000 00008000 00000000 7f617408 7f60261c 7f617428 0000008d 7fa0: c02080c4 c0207f40 7f617408 7f60261c 00000003 7f617428 00008000 00000000 7fc0: 7f617408 7f60261c 7f617428 0000008d b6f98820 00000002 7f6004c8 7f5edef4 7fe0: 0000008d bed19808 b6edaf41 b6e836f6 600e0030 00000003 cf30eaa8 d31d966f [<c0594930>] (rb_insert_color) from [<c042aa70>] (ext4_htree_store_dirent+0xe0/0x104) [<c042aa70>] (ext4_htree_store_dirent) from [<c043ecc8>] (htree_dirblock_to_tree+0xb0/0x) [<c043ecc8>] (htree_dirblock_to_tree) from [<c043fb8c>] (ext4_htree_fill_tree+0x1c8/0x2a) [<c043fb8c>] (ext4_htree_fill_tree) from [<c042a694>] (ext4_readdir+0x62c/0x910) [<c042a694>] (ext4_readdir) from [<c03b4ae8>] (iterate_dir+0x14c/0x194) [<c03b4ae8>] (iterate_dir) from [<c03b4f80>] (SyS_getdents+0x7c/0x118) [<c03b4f80>] (SyS_getdents) from [<c0207f40>] (ret_fast_syscall+0x0/0x1c) Code: e5923000 e3130001 1a000062 e92d4070 (e593c004) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-03 23:30 Regression in next with ext4 oops Tony Lindgren @ 2016-10-04 9:00 ` Jan Kara 2016-10-04 14:02 ` Theodore Ts'o ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Jan Kara @ 2016-10-04 9:00 UTC (permalink / raw) To: Tony Lindgren Cc: Eric Biggers, Theodore Ts'o, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Al Viro [-- Attachment #1: Type: text/plain, Size: 4540 bytes --] Hi! On Mon 03-10-16 16:30:55, Tony Lindgren wrote: > I'm seeing a repeatable oops with Linux next while running > update-initramfs, see below. I tried reverting commit 59aa5a3aeead > ("fscrypto: make filename crypto functions return 0 on success") > as that's the only commit changing ext4_htree_store_dirent, but > that did not help. > > Anybody else seeing something like this? Never seen this but I suspect it is a fallout from Al's directory locking changes. In particular ext4_htree_fill_tree() builds rb-tree of found directory entries in file->private_data (and generally modifies the structure stored there) but after Al's changes we don't have exclusive access to struct file if I'm right so if two processes end up calling getdents() for the same 'struct file' we are doomed. That being said two getdents() calls for a single fd looks like a stupid things to do but I don't see anything that would prevent this. Does the attached patch fix the issue for you? If I'm right, better fix would be to exclude two ->readdir callbacks for one fd but that would be slightly more complicated patch... Honza > 8< -------------- > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > pgd = ee7e7280 > [00000004] *pgd=00000000 > Internal error: Oops: 205 [#1] SMP ARM > Modules linked in: ledtrig_default_on ledtrig_heartbeat hid_generic usbhid smsc95xx smsc] > CPU: 1 PID: 2299 Comm: mkinitramfs Not tainted 4.8.0-rc8-next-20160930+ #974 > Hardware name: Generic OMAP5 (Flattened Device Tree) > task: ed0b8380 task.stack: ec216000 > PC is at rb_insert_color+0x1c/0x1b8 > LR is at ext4_htree_store_dirent+0xe0/0x104 > pc : [<c0594930>] lr : [<c042aa70>] psr: 600e0013 > sp : ec217db8 ip : ed177948 fp : c0a17618 > r10: 00000001 r9 : ec217e08 r8 : ed177940 > r7 : ec224b80 r6 : 2d7d188a r5 : 83aa4108 r4 : 00000000 > r3 : 00000000 r2 : ed389e88 r1 : ec224b80 r0 : ed177948 > Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user > Control: 30c5387d Table: ae7e7280 DAC: 55555555 > Process mkinitramfs (pid: 2299, stack limit = 0xec216218) > Stack: (0xec217db8 to 0xec218000) > 7da0: 00000000 83aa4108 > 7dc0: 2d7d188a c042aa70 c0f1a628 ed80c040 ed80d6d0 ec217e60 c0f1a630 c0f1aff8 > 7de0: 00000001 c043ecc8 ec217e08 ed80c040 c0f1a000 00001000 00001628 00000000 > 7e00: 0000005c ebaf9c80 c0f1a630 0000000b ec224bc2 00000000 00000002 ed80d6d0 > 7e20: 00000000 ebaf9c80 00000000 ec217e60 ec217e70 c043fb8c 00000000 00000000 > 7e40: 00000006 024080c0 ec224ba0 ec216000 600e0013 c100253c c0f19014 00000002 > 7e60: 83aa4108 2d7d188a 00000004 ee7bb8b8 ed80c080 c0f19020 c0f19020 00000000 > 7e80: ec224b80 c028ac34 ab5552a8 ec224b80 ed80d800 ec217f70 ed80d6d0 ed80d770 > 7ea0: ebaf9c80 ed80d6d0 ee7b8000 c042a694 00000055 ed80d7a4 00000000 00000000 > 7ec0: 00000001 00000000 600e0013 c1093070 00000000 00000000 00000001 00000000 > 7ee0: 00000000 c03b49e8 00000000 00000000 00000000 600e0013 00000000 00020000 > 7f00: ed80d6d0 ed80d770 00000000 ed80d6d0 ec217f70 ed80d770 ec216000 ebaf9c80 > 7f20: 00000000 00000001 ec217f70 ed80d770 ec216000 00000000 7f5edef4 c03b4ae8 > 7f40: 7f60261c 7f602611 000000c5 7f617428 ebaf9c80 00008000 0000008d ebaf9c80 > 7f60: ec216000 00000000 7f5edef4 c03b4f80 c03b4b30 00000000 00000000 00000000 > 7f80: 7f617428 00000000 00008000 00000000 7f617408 7f60261c 7f617428 0000008d > 7fa0: c02080c4 c0207f40 7f617408 7f60261c 00000003 7f617428 00008000 00000000 > 7fc0: 7f617408 7f60261c 7f617428 0000008d b6f98820 00000002 7f6004c8 7f5edef4 > 7fe0: 0000008d bed19808 b6edaf41 b6e836f6 600e0030 00000003 cf30eaa8 d31d966f > [<c0594930>] (rb_insert_color) from [<c042aa70>] (ext4_htree_store_dirent+0xe0/0x104) > [<c042aa70>] (ext4_htree_store_dirent) from [<c043ecc8>] (htree_dirblock_to_tree+0xb0/0x) > [<c043ecc8>] (htree_dirblock_to_tree) from [<c043fb8c>] (ext4_htree_fill_tree+0x1c8/0x2a) > [<c043fb8c>] (ext4_htree_fill_tree) from [<c042a694>] (ext4_readdir+0x62c/0x910) > [<c042a694>] (ext4_readdir) from [<c03b4ae8>] (iterate_dir+0x14c/0x194) > [<c03b4ae8>] (iterate_dir) from [<c03b4f80>] (SyS_getdents+0x7c/0x118) > [<c03b4f80>] (SyS_getdents) from [<c0207f40>] (ret_fast_syscall+0x0/0x1c) > Code: e5923000 e3130001 1a000062 e92d4070 (e593c004) > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.com> SUSE Labs, CR [-- Attachment #2: 0001-ext4-Use-exclusive-locking-for-ext4_readdir.patch --] [-- Type: text/x-patch, Size: 1089 bytes --] >From d13477d654e60ec4434c266d11828347a135ca32 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Tue, 4 Oct 2016 10:56:00 +0200 Subject: [PATCH] ext4: Use exclusive locking for ext4_readdir() We use file->private_data to track information for readdir calls. As such we cannot allow two readdir calls to happen at the same time for one struct file as they end up corrupting the information. For now just revert back to the old behavior where readdirs are synchronized. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 67415e0e6af0..5ee86243620b 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -651,7 +651,7 @@ int ext4_check_all_de(struct inode *dir, struct buffer_head *bh, void *buf, const struct file_operations ext4_dir_operations = { .llseek = ext4_dir_llseek, .read = generic_read_dir, - .iterate_shared = ext4_readdir, + .iterate = ext4_readdir, .unlocked_ioctl = ext4_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = ext4_compat_ioctl, -- 2.6.6 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 9:00 ` Jan Kara @ 2016-10-04 14:02 ` Theodore Ts'o 2016-10-04 14:44 ` Tony Lindgren 2016-10-04 14:59 ` Al Viro 2016-10-04 14:18 ` Tony Lindgren 2016-10-04 14:56 ` Al Viro 2 siblings, 2 replies; 15+ messages in thread From: Theodore Ts'o @ 2016-10-04 14:02 UTC (permalink / raw) To: Jan Kara Cc: Tony Lindgren, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Al Viro On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > Never seen this but I suspect it is a fallout from Al's directory locking > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > directory entries in file->private_data (and generally modifies the > structure stored there) but after Al's changes we don't have exclusive > access to struct file if I'm right so if two processes end up calling > getdents() for the same 'struct file' we are doomed. I haven't seen it either, and I've been doing a lot of testing on the ext4 test branch. So I'm guessing Tony has the only reliable repro for the problem at the moment. That being said, it shouldn't be that hard to create a test case for this and add it to xfstests. I'm pretty sure Jan is right about this, though, but it would be great to a get a quick confirmation from Tony if at all possible. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 14:02 ` Theodore Ts'o @ 2016-10-04 14:44 ` Tony Lindgren 2016-10-04 14:59 ` Al Viro 1 sibling, 0 replies; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 14:44 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Al Viro * Theodore Ts'o <tytso@mit.edu> [161004 07:04]: > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > > Never seen this but I suspect it is a fallout from Al's directory locking > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > > directory entries in file->private_data (and generally modifies the > > structure stored there) but after Al's changes we don't have exclusive > > access to struct file if I'm right so if two processes end up calling > > getdents() for the same 'struct file' we are doomed. > > I haven't seen it either, and I've been doing a lot of testing on the > ext4 test branch. So I'm guessing Tony has the only reliable repro > for the problem at the moment. That being said, it shouldn't be that > hard to create a test case for this and add it to xfstests. > > I'm pretty sure Jan is right about this, though, but it would be great > to a get a quick confirmation from Tony if at all possible. Looks like Jan's patch somehow makes things worse for me. Regards, Tony ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 14:02 ` Theodore Ts'o 2016-10-04 14:44 ` Tony Lindgren @ 2016-10-04 14:59 ` Al Viro 2016-10-04 19:07 ` Theodore Ts'o 2016-10-04 19:16 ` Tony Lindgren 1 sibling, 2 replies; 15+ messages in thread From: Al Viro @ 2016-10-04 14:59 UTC (permalink / raw) To: Theodore Ts'o, Jan Kara, Tony Lindgren, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote: > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > > Never seen this but I suspect it is a fallout from Al's directory locking > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > > directory entries in file->private_data (and generally modifies the > > structure stored there) but after Al's changes we don't have exclusive > > access to struct file if I'm right so if two processes end up calling > > getdents() for the same 'struct file' we are doomed. > > I haven't seen it either, and I've been doing a lot of testing on the > ext4 test branch. So I'm guessing Tony has the only reliable repro > for the problem at the moment. That being said, it shouldn't be that > hard to create a test case for this and add it to xfstests. > > I'm pretty sure Jan is right about this, though, but it would be great > to a get a quick confirmation from Tony if at all possible. Jan is wrong - we do have per-struct-file serialization for getdents() et.al. It might be a race between getdents() on *different* struct file for the same directory, but ->private_data is not a problem. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 14:59 ` Al Viro @ 2016-10-04 19:07 ` Theodore Ts'o 2016-10-04 19:19 ` Tony Lindgren 2016-10-04 19:16 ` Tony Lindgren 1 sibling, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2016-10-04 19:07 UTC (permalink / raw) To: Al Viro Cc: Jan Kara, Tony Lindgren, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote: > Jan is wrong - we do have per-struct-file serialization for getdents() > et.al. It might be a race between getdents() on *different* struct > file for the same directory, but ->private_data is not a problem. So the rb_insert_color() OOPS seems to indicate that the rbtree has gotten corrupted, and it hangs off of private_data. And I've just checked the ext4.git tree and I can't see any changes in the dev branch (which I was just about to push to Linus) that would impact the readdir path for non-encrypted directories. So if it's a race on ->private_data I'm not sure what else might be going wrong. Hmm... Tony, what version was the last good version where it wouldn't reproduce for you? v4.7? Does it reproduce for you on v4.8? Also, would you mind seeing if you can reproduce it on the dev branch of the ext4 git tree? git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev Thanks, - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 19:07 ` Theodore Ts'o @ 2016-10-04 19:19 ` Tony Lindgren 0 siblings, 0 replies; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 19:19 UTC (permalink / raw) To: Theodore Ts'o, Al Viro, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap * Theodore Ts'o <tytso@mit.edu> [161004 12:08]: > On Tue, Oct 04, 2016 at 03:59:15PM +0100, Al Viro wrote: > > Jan is wrong - we do have per-struct-file serialization for getdents() > > et.al. It might be a race between getdents() on *different* struct > > file for the same directory, but ->private_data is not a problem. > > So the rb_insert_color() OOPS seems to indicate that the rbtree has > gotten corrupted, and it hangs off of private_data. Yes so it seems.. > And I've just checked the ext4.git tree and I can't see any changes in > the dev branch (which I was just about to push to Linus) that would > impact the readdir path for non-encrypted directories. So if it's a > race on ->private_data I'm not sure what else might be going wrong. > > Hmm... Tony, what version was the last good version where it wouldn't > reproduce for you? v4.7? Does it reproduce for you on v4.8? Well I just found it's caused by my commit d776fc86b82f ("wlcore: sdio: Populate config firmware data") and has nothing to do with ext4, see the mail I just posted. > Also, would you mind seeing if you can reproduce it on the dev > branch of the ext4 git tree? > > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev Seems like there's nothing wrong with ext4, this is probably some memory corruption issue. Thanks everybody for help and sorry for the false ext4 alert. Regards, Tony ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 14:59 ` Al Viro 2016-10-04 19:07 ` Theodore Ts'o @ 2016-10-04 19:16 ` Tony Lindgren 2016-10-04 19:29 ` Tony Lindgren 1 sibling, 1 reply; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 19:16 UTC (permalink / raw) To: Al Viro Cc: Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Kalle Valo Hi, * Al Viro <viro@ZenIV.linux.org.uk> [161004 08:00]: > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote: > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > > > Never seen this but I suspect it is a fallout from Al's directory locking > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > > > directory entries in file->private_data (and generally modifies the > > > structure stored there) but after Al's changes we don't have exclusive > > > access to struct file if I'm right so if two processes end up calling > > > getdents() for the same 'struct file' we are doomed. > > > > I haven't seen it either, and I've been doing a lot of testing on the > > ext4 test branch. So I'm guessing Tony has the only reliable repro > > for the problem at the moment. That being said, it shouldn't be that > > hard to create a test case for this and add it to xfstests. > > > > I'm pretty sure Jan is right about this, though, but it would be great > > to a get a quick confirmation from Tony if at all possible. > > Jan is wrong - we do have per-struct-file serialization for getdents() > et.al. It might be a race between getdents() on *different* struct > file for the same directory, but ->private_data is not a problem. OK found the guilty person after git bisect and that's me. Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config firmware data"), so adding Kalle to Cc. Looks like update-initramfs does rmmod of wlcore_sdio and that triggers some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory corruption issue. I don't know yet exactly what's going on here yet but I plan to find out after some lunch. Regards, Tony ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 19:16 ` Tony Lindgren @ 2016-10-04 19:29 ` Tony Lindgren 2016-10-04 19:41 ` Kalle Valo 0 siblings, 1 reply; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 19:29 UTC (permalink / raw) To: Al Viro Cc: Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Kalle Valo * Tony Lindgren <tony@atomide.com> [161004 12:17]: > Hi, > > * Al Viro <viro@ZenIV.linux.org.uk> [161004 08:00]: > > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote: > > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > > > > Never seen this but I suspect it is a fallout from Al's directory locking > > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > > > > directory entries in file->private_data (and generally modifies the > > > > structure stored there) but after Al's changes we don't have exclusive > > > > access to struct file if I'm right so if two processes end up calling > > > > getdents() for the same 'struct file' we are doomed. > > > > > > I haven't seen it either, and I've been doing a lot of testing on the > > > ext4 test branch. So I'm guessing Tony has the only reliable repro > > > for the problem at the moment. That being said, it shouldn't be that > > > hard to create a test case for this and add it to xfstests. > > > > > > I'm pretty sure Jan is right about this, though, but it would be great > > > to a get a quick confirmation from Tony if at all possible. > > > > Jan is wrong - we do have per-struct-file serialization for getdents() > > et.al. It might be a race between getdents() on *different* struct > > file for the same directory, but ->private_data is not a problem. > > OK found the guilty person after git bisect and that's me. > > Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config > firmware data"), so adding Kalle to Cc. > > Looks like update-initramfs does rmmod of wlcore_sdio and that triggers > some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory > corruption issue. I don't know yet exactly what's going on here yet but > I plan to find out after some lunch. And the patch below seems to fix the issue as the driver is now using devm_kzalloc. Will do some more testing and then will post a proper patch. The same issue might be there for SPI glue also. Regards, Tony 8< ----------------- --- a/drivers/net/wireless/ti/wlcore/sdio.c +++ b/drivers/net/wireless/ti/wlcore/sdio.c @@ -391,7 +391,6 @@ static void wl1271_remove(struct sdio_func *func) pm_runtime_get_noresume(&func->dev); platform_device_unregister(glue->core); - kfree(glue); } #ifdef CONFIG_PM ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 19:29 ` Tony Lindgren @ 2016-10-04 19:41 ` Kalle Valo 2016-10-04 20:34 ` Tony Lindgren 0 siblings, 1 reply; 15+ messages in thread From: Kalle Valo @ 2016-10-04 19:41 UTC (permalink / raw) To: Tony Lindgren Cc: Al Viro, Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap Tony Lindgren <tony@atomide.com> writes: > * Tony Lindgren <tony@atomide.com> [161004 12:17]: >> Hi, >> >> * Al Viro <viro@ZenIV.linux.org.uk> [161004 08:00]: >> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote: >> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: >> > > > Never seen this but I suspect it is a fallout from Al's directory locking >> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found >> > > > directory entries in file->private_data (and generally modifies the >> > > > structure stored there) but after Al's changes we don't have exclusive >> > > > access to struct file if I'm right so if two processes end up calling >> > > > getdents() for the same 'struct file' we are doomed. >> > > >> > > I haven't seen it either, and I've been doing a lot of testing on the >> > > ext4 test branch. So I'm guessing Tony has the only reliable repro >> > > for the problem at the moment. That being said, it shouldn't be that >> > > hard to create a test case for this and add it to xfstests. >> > > >> > > I'm pretty sure Jan is right about this, though, but it would be great >> > > to a get a quick confirmation from Tony if at all possible. >> > >> > Jan is wrong - we do have per-struct-file serialization for getdents() >> > et.al. It might be a race between getdents() on *different* struct >> > file for the same directory, but ->private_data is not a problem. >> >> OK found the guilty person after git bisect and that's me. >> >> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config >> firmware data"), so adding Kalle to Cc. >> >> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers >> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory >> corruption issue. I don't know yet exactly what's going on here yet but >> I plan to find out after some lunch. > > And the patch below seems to fix the issue as the driver is now > using devm_kzalloc. Will do some more testing and then will post > a proper patch. The same issue might be there for SPI glue also. This was already posted and you even acked it :) wlcore: sdio: drop kfree for memory allocated with devm_kzalloc https://patchwork.kernel.org/patch/9353985/ -- Kalle Valo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 19:41 ` Kalle Valo @ 2016-10-04 20:34 ` Tony Lindgren 2016-10-05 9:51 ` Kalle Valo 0 siblings, 1 reply; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 20:34 UTC (permalink / raw) To: Kalle Valo Cc: Al Viro, Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap * Kalle Valo <kvalo@codeaurora.org> [161004 12:42]: > Tony Lindgren <tony@atomide.com> writes: > > > * Tony Lindgren <tony@atomide.com> [161004 12:17]: > >> Hi, > >> > >> * Al Viro <viro@ZenIV.linux.org.uk> [161004 08:00]: > >> > On Tue, Oct 04, 2016 at 10:02:31AM -0400, Theodore Ts'o wrote: > >> > > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > >> > > > Never seen this but I suspect it is a fallout from Al's directory locking > >> > > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > >> > > > directory entries in file->private_data (and generally modifies the > >> > > > structure stored there) but after Al's changes we don't have exclusive > >> > > > access to struct file if I'm right so if two processes end up calling > >> > > > getdents() for the same 'struct file' we are doomed. > >> > > > >> > > I haven't seen it either, and I've been doing a lot of testing on the > >> > > ext4 test branch. So I'm guessing Tony has the only reliable repro > >> > > for the problem at the moment. That being said, it shouldn't be that > >> > > hard to create a test case for this and add it to xfstests. > >> > > > >> > > I'm pretty sure Jan is right about this, though, but it would be great > >> > > to a get a quick confirmation from Tony if at all possible. > >> > > >> > Jan is wrong - we do have per-struct-file serialization for getdents() > >> > et.al. It might be a race between getdents() on *different* struct > >> > file for the same directory, but ->private_data is not a problem. > >> > >> OK found the guilty person after git bisect and that's me. > >> > >> Git bisect points to commit d776fc86b82f ("wlcore: sdio: Populate config > >> firmware data"), so adding Kalle to Cc. > >> > >> Looks like update-initramfs does rmmod of wlcore_sdio and that triggers > >> some issue with the wlcore driver or with SDIO/MMC. Or maybe it's a memory > >> corruption issue. I don't know yet exactly what's going on here yet but > >> I plan to find out after some lunch. > > > > And the patch below seems to fix the issue as the driver is now > > using devm_kzalloc. Will do some more testing and then will post > > a proper patch. The same issue might be there for SPI glue also. > > This was already posted and you even acked it :) > > wlcore: sdio: drop kfree for memory allocated with devm_kzalloc > > https://patchwork.kernel.org/patch/9353985/ Heh well now we know :) Can you please apply it as it fixes a memory corruption issue? The SPI glue does not have this same issue. Regards, Tony ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 20:34 ` Tony Lindgren @ 2016-10-05 9:51 ` Kalle Valo 0 siblings, 0 replies; 15+ messages in thread From: Kalle Valo @ 2016-10-05 9:51 UTC (permalink / raw) To: Tony Lindgren Cc: Al Viro, Theodore Ts'o, Jan Kara, Eric Biggers, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap Tony Lindgren <tony@atomide.com> writes: > * Kalle Valo <kvalo@codeaurora.org> [161004 12:42]: >> Tony Lindgren <tony@atomide.com> writes: >> >> > And the patch below seems to fix the issue as the driver is now >> > using devm_kzalloc. Will do some more testing and then will post >> > a proper patch. The same issue might be there for SPI glue also. >> >> This was already posted and you even acked it :) >> >> wlcore: sdio: drop kfree for memory allocated with devm_kzalloc >> >> https://patchwork.kernel.org/patch/9353985/ > > Heh well now we know :) Can you please apply it as it fixes a memory > corruption issue? Yeah, I'll apply it to wireless-drivers.git soon. > The SPI glue does not have this same issue. Good to know, thanks for checking. -- Kalle Valo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 9:00 ` Jan Kara 2016-10-04 14:02 ` Theodore Ts'o @ 2016-10-04 14:18 ` Tony Lindgren 2016-10-04 14:56 ` Al Viro 2 siblings, 0 replies; 15+ messages in thread From: Tony Lindgren @ 2016-10-04 14:18 UTC (permalink / raw) To: Jan Kara Cc: Eric Biggers, Theodore Ts'o, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap, Al Viro * Jan Kara <jack@suse.cz> [161004 02:01]: > Hi! > > On Mon 03-10-16 16:30:55, Tony Lindgren wrote: > > I'm seeing a repeatable oops with Linux next while running > > update-initramfs, see below. I tried reverting commit 59aa5a3aeead > > ("fscrypto: make filename crypto functions return 0 on success") > > as that's the only commit changing ext4_htree_store_dirent, but > > that did not help. > > > > Anybody else seeing something like this? > > Never seen this but I suspect it is a fallout from Al's directory locking > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > directory entries in file->private_data (and generally modifies the > structure stored there) but after Al's changes we don't have exclusive > access to struct file if I'm right so if two processes end up calling > getdents() for the same 'struct file' we are doomed. OK.. > That being said two getdents() calls for a single fd looks like a stupid > things to do but I don't see anything that would prevent this. Does the > attached patch fix the issue for you? Looks like with your patch I saw the oops few times during kernel boot already, see below. And the errors seem more random now.. I've verified I can run mkinitramfs just fine with my curent for-next that's based on 4.8.0-rc6, so this does not seem to be any memory or MMC related issue. Trying to bisect this would be a pain as the error does not happen every time.. Regards, Tony 8< ------------------- Unable to handle kernel NULL pointer dereference at virtual address 00000004 pgd = c22a46c0 [00000004] *pgd=00000000 Internal error: Oops: 205 [#1] SMP ARM Modules linked in: usbkbd usb_f_rndis usb_f_ecm usb_f_acm u_ether usb_f_mass_storage usb] CPU: 1 PID: 364 Comm: startpar Not tainted 4.8.0-rc8-next-20161003+ #1005 Hardware name: Generic OMAP5 (Flattened Device Tree) task: c2090ec0 task.stack: c25b2000 PC is at rb_insert_color+0x1c/0x1b8 LR is at ext4_htree_store_dirent+0xe0/0x104 pc : [<c0594b10>] lr : [<c042ade0>] psr: 600f0013 sp : c25b3db0 ip : c2297bd8 fp : c0a17618 r10: 00000000 r9 : c25b3e00 r8 : c2297bd0 r7 : c2298f80 r6 : 79de0293 r5 : 998623c8 r4 : 00000000 r3 : 00000000 r2 : c2297c08 r1 : c2298f80 r0 : c2297bd8 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: 822a46c0 DAC: 55555555 Process startpar (pid: 364, stack limit = 0xc25b2218) Stack: (0xc25b3db0 to 0xc25b4000) 3da0: 00000000 998623c8 79de0293 c042ade0 3dc0: e92460b4 ed080200 ed084790 c25b3e58 e92460bc e9246ff8 00000000 c043ee9c 3de0: c25b3e00 ed080200 e9246000 00001000 000000b4 c028a6e4 00000009 ee48a400 3e00: e92460bc 0000000d c1025cb4 c2298f80 ed0848c0 ed084790 00000000 ee48a400 3e20: 00000000 c25b3e58 c2257800 c043fc14 00000000 00000000 00000006 024080c0 3e40: c2298fa0 00000000 600f0013 c100253c c10936b4 c02871ac 998623c8 79de0293 3e60: 00000004 c22570b8 c2298f80 c0383f40 030d8fb9 69db434f c2298f80 c20aa000 3e80: 7f5c2000 c2298f80 ed0848c0 c25b3f68 ed084790 ed084830 ee48a400 ed084790 3ea0: c2257800 c042aa04 00000000 00000000 600f0013 c10930b0 00000001 c028c230 3ec0: 00000001 00000000 00000000 00000000 00000000 00000000 00000800 600f0013 3ee0: 00000000 00000002 00000000 ed084830 00000000 00000001 c03b4e3c ed084830 3f00: ee48a400 00000000 bef92f0c c08d4530 00000001 ee48a400 00000000 00000000 3f20: c25b3f68 ed084830 ee48a400 00000000 bef92f0c c03b4e70 ee48a400 bef90800 3f40: 7f59d01d 7f5ba6f0 00008000 00000004 000000d9 ee48a400 ee48a400 00000000 3f60: bef92f0c c03b54a0 c03b5124 00000000 00000000 00000000 7f5ba6f0 00000000 3f80: 00008000 00000000 00000020 00000020 7f5ba6d0 7f5ba6f0 000000d9 c02080c4 3fa0: c25b2000 c0207f40 00000020 7f5ba6d0 00000004 7f5ba6f0 00008000 00000000 3fc0: 00000020 7f5ba6d0 7f5ba6f0 000000d9 b6f15850 00000000 00000000 bef92f0c 3fe0: 000000d9 bef9080c b6e57fa3 b6e006f6 200f0030 00000004 00000000 00150072 [<c0594b10>] (rb_insert_color) from [<c042ade0>] (ext4_htree_store_dirent+0xe0/0x104) [<c042ade0>] (ext4_htree_store_dirent) from [<c043ee9c>] (htree_dirblock_to_tree+0xb0/0x) [<c043ee9c>] (htree_dirblock_to_tree) from [<c043fc14>] (ext4_htree_fill_tree+0x7c/0x2a0) [<c043fc14>] (ext4_htree_fill_tree) from [<c042aa04>] (ext4_readdir+0x62c/0x910) [<c042aa04>] (ext4_readdir) from [<c03b4e70>] (iterate_dir+0xcc/0x194) [<c03b4e70>] (iterate_dir) from [<c03b54a0>] (SyS_getdents64+0x7c/0x10c) [<c03b54a0>] (SyS_getdents64) from [<c0207f40>] (ret_fast_syscall+0x0/0x1c) Code: e5923000 e3130001 1a000062 e92d4070 (e593c004) ---[ end trace 8970a95866a25898 ]--- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 9:00 ` Jan Kara 2016-10-04 14:02 ` Theodore Ts'o 2016-10-04 14:18 ` Tony Lindgren @ 2016-10-04 14:56 ` Al Viro 2016-10-05 5:44 ` Jan Kara 2 siblings, 1 reply; 15+ messages in thread From: Al Viro @ 2016-10-04 14:56 UTC (permalink / raw) To: Jan Kara Cc: Tony Lindgren, Eric Biggers, Theodore Ts'o, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > Hi! > > On Mon 03-10-16 16:30:55, Tony Lindgren wrote: > > I'm seeing a repeatable oops with Linux next while running > > update-initramfs, see below. I tried reverting commit 59aa5a3aeead > > ("fscrypto: make filename crypto functions return 0 on success") > > as that's the only commit changing ext4_htree_store_dirent, but > > that did not help. > > > > Anybody else seeing something like this? > > Never seen this but I suspect it is a fallout from Al's directory locking > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > directory entries in file->private_data (and generally modifies the > structure stored there) but after Al's changes we don't have exclusive > access to struct file if I'm right so if two processes end up calling > getdents() for the same 'struct file' we are doomed. RTFS. We sure as hell *do* have exclusive access to struct file. See /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */ if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) f->f_mode |= FMODE_ATOMIC_POS; in do_dentry_open() as well as if (file && (file->f_mode & FMODE_ATOMIC_POS)) { if (file_count(file) > 1) { v |= FDPUT_POS_UNLOCK; mutex_lock(&file->f_pos_lock); } } in __fdget_pos() and f = fdget_pos(fd); if (!f.file) return -EBADF; in getdents(2). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Regression in next with ext4 oops 2016-10-04 14:56 ` Al Viro @ 2016-10-05 5:44 ` Jan Kara 0 siblings, 0 replies; 15+ messages in thread From: Jan Kara @ 2016-10-05 5:44 UTC (permalink / raw) To: Al Viro Cc: Jan Kara, Tony Lindgren, Eric Biggers, Theodore Ts'o, Andreas Dilger, Jaegeuk Kim, linux-ext4, linux-kernel, linux-omap On Tue 04-10-16 15:56:39, Al Viro wrote: > On Tue, Oct 04, 2016 at 11:00:41AM +0200, Jan Kara wrote: > > Hi! > > > > On Mon 03-10-16 16:30:55, Tony Lindgren wrote: > > > I'm seeing a repeatable oops with Linux next while running > > > update-initramfs, see below. I tried reverting commit 59aa5a3aeead > > > ("fscrypto: make filename crypto functions return 0 on success") > > > as that's the only commit changing ext4_htree_store_dirent, but > > > that did not help. > > > > > > Anybody else seeing something like this? > > > > Never seen this but I suspect it is a fallout from Al's directory locking > > changes. In particular ext4_htree_fill_tree() builds rb-tree of found > > directory entries in file->private_data (and generally modifies the > > structure stored there) but after Al's changes we don't have exclusive > > access to struct file if I'm right so if two processes end up calling > > getdents() for the same 'struct file' we are doomed. > > RTFS. We sure as hell *do* have exclusive access to struct file. See > /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */ > if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) > f->f_mode |= FMODE_ATOMIC_POS; > in do_dentry_open() as well as > if (file && (file->f_mode & FMODE_ATOMIC_POS)) { > if (file_count(file) > 1) { > v |= FDPUT_POS_UNLOCK; > mutex_lock(&file->f_pos_lock); > } > } > in __fdget_pos() and > f = fdget_pos(fd); > if (!f.file) > return -EBADF; > in getdents(2). Yeah, sorry. I've misunderstood how the FMODE_ATOMIC_POS thing works... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-10-05 9:52 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-10-03 23:30 Regression in next with ext4 oops Tony Lindgren 2016-10-04 9:00 ` Jan Kara 2016-10-04 14:02 ` Theodore Ts'o 2016-10-04 14:44 ` Tony Lindgren 2016-10-04 14:59 ` Al Viro 2016-10-04 19:07 ` Theodore Ts'o 2016-10-04 19:19 ` Tony Lindgren 2016-10-04 19:16 ` Tony Lindgren 2016-10-04 19:29 ` Tony Lindgren 2016-10-04 19:41 ` Kalle Valo 2016-10-04 20:34 ` Tony Lindgren 2016-10-05 9:51 ` Kalle Valo 2016-10-04 14:18 ` Tony Lindgren 2016-10-04 14:56 ` Al Viro 2016-10-05 5:44 ` Jan Kara
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.