linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fs: gpf in simple_setattr
@ 2013-12-19  0:25 Sasha Levin
  2014-01-08 16:00 ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2013-12-19  0:25 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML

Hi all,

While fuzzing with trinity inside a KVM tools guest running latest -next kernel, I've stumbled on 
the following spew.

This happens when sb is dereferenced in __mark_inode_dirty():

                 if (sb->s_op->dirty_inode) <--- HERE
                         sb->s_op->dirty_inode(inode, flags);

'sb' is pointing to a memory full of poisoned memory (6b6b6b6b6b6b6b6b).

[  590.469520] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  590.470737] Dumping ftrace buffer:
[  590.471331]    (ftrace buffer empty)
[  590.471903] Modules linked in:
[  590.472349] CPU: 3 PID: 9685 Comm: trinity-child97 Tainted: G        W 
3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4156
[  590.472396] task: ffff8800bc520000 ti: ffff8800bc4fa000 task.ti: ffff8800bc4fa000
[  590.472396] RIP: 0010:[<ffffffff81302d34>]  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
[  590.475691] RSP: 0018:ffff8800bc4fbda8  EFLAGS: 00010246
[  590.475691] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802f9002530 RCX: 000000006b6b6b6b
[  590.475691] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8802f9002530
[  590.475691] RBP: ffff8800bc4fbdc8 R08: 0000000000000000 R09: 0000000000000000
[  590.475691] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000007
[  590.475691] R13: 0000000000000000 R14: ffff8802f8795668 R15: ffff8802f9002530
[  590.475691] FS:  00007f9bba1b7700(0000) GS:ffff8801c8000000(0000) knlGS:0000000000000000
[  590.475691] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  590.475691] CR2: 00007f9bba17da44 CR3: 00000000bc4e6000 CR4: 00000000000006e0
[  590.475691] Stack:
[  590.475691]  ffff8802f9002530 ffff8800bc4fbe98 0000000000000000 ffff880161244000
[  590.475691]  ffff8800bc4fbdf8 ffffffff8130001b ffff8800bc4fbde8 0000000000001846
[  590.475691]  ffff8800bc4fbe98 0000000000000000 ffff8800bc4fbe78 ffffffff812f3016
[  590.475691] Call Trace:
[  590.475691]  [<ffffffff8130001b>] simple_setattr+0x5b/0x70
[  590.475691]  [<ffffffff812f3016>] notify_change+0x216/0x300
[  590.475691]  [<ffffffff812d0180>] ? zs_malloc+0x1b0/0x200
[  590.475691]  [<ffffffff812d0b15>] chown_common+0x135/0x1c0
[  590.475691]  [<ffffffff812d0c20>] SyS_fchown+0x80/0xd0
[  590.475691]  [<ffffffff843a6d50>] tracesys+0xdd/0xe2
[  590.494436] VFS: Warning: trinity-child15 using old stat() call. Recompile your binary.
[  590.475691] Code: c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 65 ff 0c 25 5c da 00 00 0f 94 
c0 84 c0 74 08 e8 b3 33 d7 ff 0f 1f 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff 
d0 8b 05 bd 6b
[  590.475691] RIP  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
[  590.475691]  RSP <ffff8800bc4fbda8>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2013-12-19  0:25 fs: gpf in simple_setattr Sasha Levin
@ 2014-01-08 16:00 ` Sasha Levin
  2014-03-01 20:05   ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-01-08 16:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML

ping? still happening in -next.

On 12/18/2013 07:25 PM, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running latest -next kernel, I've stumbled on
> the following spew.
>
> This happens when sb is dereferenced in __mark_inode_dirty():
>
>                  if (sb->s_op->dirty_inode) <--- HERE
>                          sb->s_op->dirty_inode(inode, flags);
>
> 'sb' is pointing to a memory full of poisoned memory (6b6b6b6b6b6b6b6b).
>
> [  590.469520] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  590.470737] Dumping ftrace buffer:
> [  590.471331]    (ftrace buffer empty)
> [  590.471903] Modules linked in:
> [  590.472349] CPU: 3 PID: 9685 Comm: trinity-child97 Tainted: G        W
> 3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4156
> [  590.472396] task: ffff8800bc520000 ti: ffff8800bc4fa000 task.ti: ffff8800bc4fa000
> [  590.472396] RIP: 0010:[<ffffffff81302d34>]  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
> [  590.475691] RSP: 0018:ffff8800bc4fbda8  EFLAGS: 00010246
> [  590.475691] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802f9002530 RCX: 000000006b6b6b6b
> [  590.475691] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8802f9002530
> [  590.475691] RBP: ffff8800bc4fbdc8 R08: 0000000000000000 R09: 0000000000000000
> [  590.475691] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000007
> [  590.475691] R13: 0000000000000000 R14: ffff8802f8795668 R15: ffff8802f9002530
> [  590.475691] FS:  00007f9bba1b7700(0000) GS:ffff8801c8000000(0000) knlGS:0000000000000000
> [  590.475691] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  590.475691] CR2: 00007f9bba17da44 CR3: 00000000bc4e6000 CR4: 00000000000006e0
> [  590.475691] Stack:
> [  590.475691]  ffff8802f9002530 ffff8800bc4fbe98 0000000000000000 ffff880161244000
> [  590.475691]  ffff8800bc4fbdf8 ffffffff8130001b ffff8800bc4fbde8 0000000000001846
> [  590.475691]  ffff8800bc4fbe98 0000000000000000 ffff8800bc4fbe78 ffffffff812f3016
> [  590.475691] Call Trace:
> [  590.475691]  [<ffffffff8130001b>] simple_setattr+0x5b/0x70
> [  590.475691]  [<ffffffff812f3016>] notify_change+0x216/0x300
> [  590.475691]  [<ffffffff812d0180>] ? zs_malloc+0x1b0/0x200
> [  590.475691]  [<ffffffff812d0b15>] chown_common+0x135/0x1c0
> [  590.475691]  [<ffffffff812d0c20>] SyS_fchown+0x80/0xd0
> [  590.475691]  [<ffffffff843a6d50>] tracesys+0xdd/0xe2
> [  590.494436] VFS: Warning: trinity-child15 using old stat() call. Recompile your binary.
> [  590.475691] Code: c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 65 ff 0c 25 5c da 00 00 0f 94
> c0 84 c0 74 08 e8 b3 33 d7 ff 0f 1f 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff
> d0 8b 05 bd 6b
> [  590.475691] RIP  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
> [  590.475691]  RSP <ffff8800bc4fbda8>
>
>
> Thanks,
> Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-01-08 16:00 ` Sasha Levin
@ 2014-03-01 20:05   ` Sasha Levin
  2014-03-02  3:35     ` Linus Torvalds
  2014-03-03 21:40     ` Jan Kara
  0 siblings, 2 replies; 23+ messages in thread
From: Sasha Levin @ 2014-03-01 20:05 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, LKML, Linus Torvalds

ping again?

I've been working on it, but don't see an obvious issue.

It does look like an access to invalid memory easily doable from userspace, so it should probably 
get fixed soon...


Thanks,
Sasha

On 01/08/2014 11:00 AM, Sasha Levin wrote:
> ping? still happening in -next.
>
> On 12/18/2013 07:25 PM, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running latest -next kernel, I've stumbled on
>> the following spew.
>>
>> This happens when sb is dereferenced in __mark_inode_dirty():
>>
>>                  if (sb->s_op->dirty_inode) <--- HERE
>>                          sb->s_op->dirty_inode(inode, flags);
>>
>> 'sb' is pointing to a memory full of poisoned memory (6b6b6b6b6b6b6b6b).
>>
>> [  590.469520] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [  590.470737] Dumping ftrace buffer:
>> [  590.471331]    (ftrace buffer empty)
>> [  590.471903] Modules linked in:
>> [  590.472349] CPU: 3 PID: 9685 Comm: trinity-child97 Tainted: G        W
>> 3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4156
>> [  590.472396] task: ffff8800bc520000 ti: ffff8800bc4fa000 task.ti: ffff8800bc4fa000
>> [  590.472396] RIP: 0010:[<ffffffff81302d34>]  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
>> [  590.475691] RSP: 0018:ffff8800bc4fbda8  EFLAGS: 00010246
>> [  590.475691] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802f9002530 RCX: 000000006b6b6b6b
>> [  590.475691] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8802f9002530
>> [  590.475691] RBP: ffff8800bc4fbdc8 R08: 0000000000000000 R09: 0000000000000000
>> [  590.475691] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000007
>> [  590.475691] R13: 0000000000000000 R14: ffff8802f8795668 R15: ffff8802f9002530
>> [  590.475691] FS:  00007f9bba1b7700(0000) GS:ffff8801c8000000(0000) knlGS:0000000000000000
>> [  590.475691] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  590.475691] CR2: 00007f9bba17da44 CR3: 00000000bc4e6000 CR4: 00000000000006e0
>> [  590.475691] Stack:
>> [  590.475691]  ffff8802f9002530 ffff8800bc4fbe98 0000000000000000 ffff880161244000
>> [  590.475691]  ffff8800bc4fbdf8 ffffffff8130001b ffff8800bc4fbde8 0000000000001846
>> [  590.475691]  ffff8800bc4fbe98 0000000000000000 ffff8800bc4fbe78 ffffffff812f3016
>> [  590.475691] Call Trace:
>> [  590.475691]  [<ffffffff8130001b>] simple_setattr+0x5b/0x70
>> [  590.475691]  [<ffffffff812f3016>] notify_change+0x216/0x300
>> [  590.475691]  [<ffffffff812d0180>] ? zs_malloc+0x1b0/0x200
>> [  590.475691]  [<ffffffff812d0b15>] chown_common+0x135/0x1c0
>> [  590.475691]  [<ffffffff812d0c20>] SyS_fchown+0x80/0xd0
>> [  590.475691]  [<ffffffff843a6d50>] tracesys+0xdd/0xe2
>> [  590.494436] VFS: Warning: trinity-child15 using old stat() call. Recompile your binary.
>> [  590.475691] Code: c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 65 ff 0c 25 5c da 00 00 0f 94
>> c0 84 c0 74 08 e8 b3 33 d7 ff 0f 1f 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff
>> d0 8b 05 bd 6b
>> [  590.475691] RIP  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
>> [  590.475691]  RSP <ffff8800bc4fbda8>
>>
>>
>> Thanks,
>> Sasha
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-01 20:05   ` Sasha Levin
@ 2014-03-02  3:35     ` Linus Torvalds
  2014-03-03  2:01       ` Sasha Levin
  2014-03-03 21:40     ` Jan Kara
  1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-03-02  3:35 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Al Viro, linux-fsdevel, LKML

On Sat, Mar 1, 2014 at 2:05 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> ping again?
>
> I've been working on it, but don't see an obvious issue.
>
> It does look like an access to invalid memory easily doable from userspace,
> so it should probably get fixed soon...

It doesn't happen in mainline? Any possibility that you could try to bisect it?

            Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-02  3:35     ` Linus Torvalds
@ 2014-03-03  2:01       ` Sasha Levin
  0 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-03-03  2:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, LKML

On 03/01/2014 10:35 PM, Linus Torvalds wrote:
> On Sat, Mar 1, 2014 at 2:05 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> ping again?
>>
>> I've been working on it, but don't see an obvious issue.
>>
>> It does look like an access to invalid memory easily doable from userspace,
>> so it should probably get fixed soon...
>
> It doesn't happen in mainline? Any possibility that you could try to bisect it?

It might be in mainline, it just happens once in a couple of days and since I'm not usually fuzzing 
mainline I can't say for sure.

I've tried bisecting but since I don't have a reliable way to reproduce it the bisection goes wrong 
pretty fast.

I've tried adding some debug code in, which clearly suggests that the object is gone, but I can't 
pinpoint to where it disappears.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-01 20:05   ` Sasha Levin
  2014-03-02  3:35     ` Linus Torvalds
@ 2014-03-03 21:40     ` Jan Kara
  2014-03-05  0:00       ` Sasha Levin
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2014-03-03 21:40 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Sat 01-03-14 15:05:21, Sasha Levin wrote:
> ping again?
> 
> I've been working on it, but don't see an obvious issue.
> 
> It does look like an access to invalid memory easily doable from
> userspace, so it should probably get fixed soon...
  Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
maybe even the whole path using dentry_path() (but not sure if that will
be workable on half-torn-down fs)? Maybe it will give us a hint at which
filesystem to look...

								Honza

> On 01/08/2014 11:00 AM, Sasha Levin wrote:
> >ping? still happening in -next.
> >
> >On 12/18/2013 07:25 PM, Sasha Levin wrote:
> >>Hi all,
> >>
> >>While fuzzing with trinity inside a KVM tools guest running latest -next kernel, I've stumbled on
> >>the following spew.
> >>
> >>This happens when sb is dereferenced in __mark_inode_dirty():
> >>
> >>                 if (sb->s_op->dirty_inode) <--- HERE
> >>                         sb->s_op->dirty_inode(inode, flags);
> >>
> >>'sb' is pointing to a memory full of poisoned memory (6b6b6b6b6b6b6b6b).
> >>
> >>[  590.469520] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>[  590.470737] Dumping ftrace buffer:
> >>[  590.471331]    (ftrace buffer empty)
> >>[  590.471903] Modules linked in:
> >>[  590.472349] CPU: 3 PID: 9685 Comm: trinity-child97 Tainted: G        W
> >>3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4156
> >>[  590.472396] task: ffff8800bc520000 ti: ffff8800bc4fa000 task.ti: ffff8800bc4fa000
> >>[  590.472396] RIP: 0010:[<ffffffff81302d34>]  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
> >>[  590.475691] RSP: 0018:ffff8800bc4fbda8  EFLAGS: 00010246
> >>[  590.475691] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802f9002530 RCX: 000000006b6b6b6b
> >>[  590.475691] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffff8802f9002530
> >>[  590.475691] RBP: ffff8800bc4fbdc8 R08: 0000000000000000 R09: 0000000000000000
> >>[  590.475691] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000007
> >>[  590.475691] R13: 0000000000000000 R14: ffff8802f8795668 R15: ffff8802f9002530
> >>[  590.475691] FS:  00007f9bba1b7700(0000) GS:ffff8801c8000000(0000) knlGS:0000000000000000
> >>[  590.475691] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>[  590.475691] CR2: 00007f9bba17da44 CR3: 00000000bc4e6000 CR4: 00000000000006e0
> >>[  590.475691] Stack:
> >>[  590.475691]  ffff8802f9002530 ffff8800bc4fbe98 0000000000000000 ffff880161244000
> >>[  590.475691]  ffff8800bc4fbdf8 ffffffff8130001b ffff8800bc4fbde8 0000000000001846
> >>[  590.475691]  ffff8800bc4fbe98 0000000000000000 ffff8800bc4fbe78 ffffffff812f3016
> >>[  590.475691] Call Trace:
> >>[  590.475691]  [<ffffffff8130001b>] simple_setattr+0x5b/0x70
> >>[  590.475691]  [<ffffffff812f3016>] notify_change+0x216/0x300
> >>[  590.475691]  [<ffffffff812d0180>] ? zs_malloc+0x1b0/0x200
> >>[  590.475691]  [<ffffffff812d0b15>] chown_common+0x135/0x1c0
> >>[  590.475691]  [<ffffffff812d0c20>] SyS_fchown+0x80/0xd0
> >>[  590.475691]  [<ffffffff843a6d50>] tracesys+0xdd/0xe2
> >>[  590.494436] VFS: Warning: trinity-child15 using old stat() call. Recompile your binary.
> >>[  590.475691] Code: c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 65 ff 0c 25 5c da 00 00 0f 94
> >>c0 84 c0 74 08 e8 b3 33 d7 ff 0f 1f 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff
> >>d0 8b 05 bd 6b
> >>[  590.475691] RIP  [<ffffffff81302d34>] __mark_inode_dirty+0xd4/0x360
> >>[  590.475691]  RSP <ffff8800bc4fbda8>
> >>
> >>
> >>Thanks,
> >>Sasha
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-03 21:40     ` Jan Kara
@ 2014-03-05  0:00       ` Sasha Levin
  2014-03-05 12:45         ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-05  0:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/03/2014 04:40 PM, Jan Kara wrote:
> On Sat 01-03-14 15:05:21, Sasha Levin wrote:
>> >ping again?
>> >
>> >I've been working on it, but don't see an obvious issue.
>> >
>> >It does look like an access to invalid memory easily doable from
>> >userspace, so it should probably get fixed soon...
>    Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
> maybe even the whole path using dentry_path() (but not sure if that will
> be workable on half-torn-down fs)? Maybe it will give us a hint at which
> filesystem to look...

It's just garbage, this is why I'm having a hard time making any progress with
this bug.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-05  0:00       ` Sasha Levin
@ 2014-03-05 12:45         ` Jan Kara
  2014-03-06 16:02           ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2014-03-05 12:45 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Tue 04-03-14 19:00:32, Sasha Levin wrote:
> On 03/03/2014 04:40 PM, Jan Kara wrote:
> >On Sat 01-03-14 15:05:21, Sasha Levin wrote:
> >>>ping again?
> >>>
> >>>I've been working on it, but don't see an obvious issue.
> >>>
> >>>It does look like an access to invalid memory easily doable from
> >>>userspace, so it should probably get fixed soon...
> >   Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
> >maybe even the whole path using dentry_path() (but not sure if that will
> >be workable on half-torn-down fs)? Maybe it will give us a hint at which
> >filesystem to look...
> 
> It's just garbage, this is why I'm having a hard time making any progress with
> this bug.
  OK, but that is strange because we hold a reference to the dentry so
noone should free it. So dentry->d_name should be valid... Is the rest of
the dentry also garbage? E.g. does dentry->d_inode still point to the inode
we call __mark_inode_dirty() on? Is dentry->d_sb == dentry->d_inode->i_sb?
Also if the inode isn't completely garbage, we can maybe infer something
from inode->i_op - that should point to some statically allocated
operations struct so we should be able to guess fs type from that.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-05 12:45         ` Jan Kara
@ 2014-03-06 16:02           ` Sasha Levin
  2014-03-08  2:14             ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-06 16:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/05/2014 07:45 AM, Jan Kara wrote:
> On Tue 04-03-14 19:00:32, Sasha Levin wrote:
>> On 03/03/2014 04:40 PM, Jan Kara wrote:
>>> On Sat 01-03-14 15:05:21, Sasha Levin wrote:
>>>>> ping again?
>>>>>
>>>>> I've been working on it, but don't see an obvious issue.
>>>>>
>>>>> It does look like an access to invalid memory easily doable from
>>>>> userspace, so it should probably get fixed soon...
>>>    Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
>>> maybe even the whole path using dentry_path() (but not sure if that will
>>> be workable on half-torn-down fs)? Maybe it will give us a hint at which
>>> filesystem to look...
>>
>> It's just garbage, this is why I'm having a hard time making any progress with
>> this bug.
>    OK, but that is strange because we hold a reference to the dentry so
> noone should free it. So dentry->d_name should be valid... Is the rest of
> the dentry also garbage? E.g. does dentry->d_inode still point to the inode
> we call __mark_inode_dirty() on? Is dentry->d_sb == dentry->d_inode->i_sb?
> Also if the inode isn't completely garbage, we can maybe infer something
> from inode->i_op - that should point to some statically allocated
> operations struct so we should be able to guess fs type from that.

It's actually pretty tricky. This issue being a race makes catching it at the right time
difficult.

I've tried catching it in simple_setattr() before calling mark_inode_dirty() by testing
for the poison values inside inode, but they seem to be perfectly fine there and still
show up as bad within mark_inode_dirty().

Then I tried trapping it inside mark_inode_dirty(), but at that point I usually get garbage
inside inode, and have no way to go back to dentry.

Right now I'm just trying to dump everything that goes through simple_setattr() in hopes that
I could easily figure out what went wrong by looking at the log, but that just stops the bug
from reproducing.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-06 16:02           ` Sasha Levin
@ 2014-03-08  2:14             ` Sasha Levin
  2014-03-10 10:43               ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-08  2:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/06/2014 11:02 AM, Sasha Levin wrote:
> On 03/05/2014 07:45 AM, Jan Kara wrote:
>> On Tue 04-03-14 19:00:32, Sasha Levin wrote:
>>> On 03/03/2014 04:40 PM, Jan Kara wrote:
>>>> On Sat 01-03-14 15:05:21, Sasha Levin wrote:
>>>>>> ping again?
>>>>>>
>>>>>> I've been working on it, but don't see an obvious issue.
>>>>>>
>>>>>> It does look like an access to invalid memory easily doable from
>>>>>> userspace, so it should probably get fixed soon...
>>>>    Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
>>>> maybe even the whole path using dentry_path() (but not sure if that will
>>>> be workable on half-torn-down fs)? Maybe it will give us a hint at which
>>>> filesystem to look...
>>>
>>> It's just garbage, this is why I'm having a hard time making any progress with
>>> this bug.
>>    OK, but that is strange because we hold a reference to the dentry so
>> noone should free it. So dentry->d_name should be valid... Is the rest of
>> the dentry also garbage? E.g. does dentry->d_inode still point to the inode
>> we call __mark_inode_dirty() on? Is dentry->d_sb == dentry->d_inode->i_sb?
>> Also if the inode isn't completely garbage, we can maybe infer something
>> from inode->i_op - that should point to some statically allocated
>> operations struct so we should be able to guess fs type from that.
>
> It's actually pretty tricky. This issue being a race makes catching it at the right time
> difficult.
>
> I've tried catching it in simple_setattr() before calling mark_inode_dirty() by testing
> for the poison values inside inode, but they seem to be perfectly fine there and still
> show up as bad within mark_inode_dirty().
>
> Then I tried trapping it inside mark_inode_dirty(), but at that point I usually get garbage
> inside inode, and have no way to go back to dentry.
>
> Right now I'm just trying to dump everything that goes through simple_setattr() in hopes that
> I could easily figure out what went wrong by looking at the log, but that just stops the bug
> from reproducing.

I've tried the following code in simple_setattr() right before the call to mark_inode_dirty():

         p = dentry_path(dentry, pth, 200);
         printk(KERN_ERR "doh: %s %s\n", p, inode->i_sb->s_type->name);

but it seems that while 'p' ends up being "/", inode->i_sb is garbage and we can't pull out anything
about the file system.

Any way I could get anything useful any other way?


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-08  2:14             ` Sasha Levin
@ 2014-03-10 10:43               ` Jan Kara
  2014-03-10 14:13                 ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2014-03-10 10:43 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Fri 07-03-14 21:14:21, Sasha Levin wrote:
> On 03/06/2014 11:02 AM, Sasha Levin wrote:
> >On 03/05/2014 07:45 AM, Jan Kara wrote:
> >>On Tue 04-03-14 19:00:32, Sasha Levin wrote:
> >>>On 03/03/2014 04:40 PM, Jan Kara wrote:
> >>>>On Sat 01-03-14 15:05:21, Sasha Levin wrote:
> >>>>>>ping again?
> >>>>>>
> >>>>>>I've been working on it, but don't see an obvious issue.
> >>>>>>
> >>>>>>It does look like an access to invalid memory easily doable from
> >>>>>>userspace, so it should probably get fixed soon...
> >>>>   Hum, can you maybe dump the name in dentry passed to simple_setattr()? Or
> >>>>maybe even the whole path using dentry_path() (but not sure if that will
> >>>>be workable on half-torn-down fs)? Maybe it will give us a hint at which
> >>>>filesystem to look...
> >>>
> >>>It's just garbage, this is why I'm having a hard time making any progress with
> >>>this bug.
> >>   OK, but that is strange because we hold a reference to the dentry so
> >>noone should free it. So dentry->d_name should be valid... Is the rest of
> >>the dentry also garbage? E.g. does dentry->d_inode still point to the inode
> >>we call __mark_inode_dirty() on? Is dentry->d_sb == dentry->d_inode->i_sb?
> >>Also if the inode isn't completely garbage, we can maybe infer something
> >>from inode->i_op - that should point to some statically allocated
> >>operations struct so we should be able to guess fs type from that.
> >
> >It's actually pretty tricky. This issue being a race makes catching it at the right time
> >difficult.
> >
> >I've tried catching it in simple_setattr() before calling mark_inode_dirty() by testing
> >for the poison values inside inode, but they seem to be perfectly fine there and still
> >show up as bad within mark_inode_dirty().
> >
> >Then I tried trapping it inside mark_inode_dirty(), but at that point I usually get garbage
> >inside inode, and have no way to go back to dentry.
> >
> >Right now I'm just trying to dump everything that goes through simple_setattr() in hopes that
> >I could easily figure out what went wrong by looking at the log, but that just stops the bug
> >from reproducing.
> 
> I've tried the following code in simple_setattr() right before the call to mark_inode_dirty():
> 
>         p = dentry_path(dentry, pth, 200);
>         printk(KERN_ERR "doh: %s %s\n", p, inode->i_sb->s_type->name);
> 
> but it seems that while 'p' ends up being "/", inode->i_sb is garbage and we can't pull out anything
> about the file system.
  By garbage, do you mean that it is a poison, completely random data or does
inode->i_sb look like a valid pointer but just superblock isn't where it
points to?

> Any way I could get anything useful any other way?
  Hum, can you dump the whole contents of 'dentry' at that place? Maybe it
will tell us something.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-10 10:43               ` Jan Kara
@ 2014-03-10 14:13                 ` Sasha Levin
  2014-03-24 14:42                   ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-10 14:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/10/2014 06:43 AM, Jan Kara wrote:
>    By garbage, do you mean that it is a poison, completely random data or does
> inode->i_sb look like a valid pointer but just superblock isn't where it
> points to?

It's poison.

>> >Any way I could get anything useful any other way?
>    Hum, can you dump the whole contents of 'dentry' at that place? Maybe it
> will tell us something.

I'll give it a go, will update when it happens again.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-10 14:13                 ` Sasha Levin
@ 2014-03-24 14:42                   ` Sasha Levin
  2014-03-24 21:48                     ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-24 14:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/10/2014 10:13 AM, Sasha Levin wrote:
> On 03/10/2014 06:43 AM, Jan Kara wrote:
>>    By garbage, do you mean that it is a poison, completely random data or does
>> inode->i_sb look like a valid pointer but just superblock isn't where it
>> points to?
>
> It's poison.
>
>>> >Any way I could get anything useful any other way?
>>    Hum, can you dump the whole contents of 'dentry' at that place? Maybe it
>> will tell us something.
>
> I'll give it a go, will update when it happens again.

Okay, I've added this:

diff --git a/fs/libfs.c b/fs/libfs.c
index a184424..2492dc4 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -377,6 +377,7 @@ int simple_setattr(struct dentry *dentry, struct iattr *iattr)
         if (iattr->ia_valid & ATTR_SIZE)
                 truncate_setsize(inode, iattr->ia_size);
         setattr_copy(inode, iattr);
+       printk(KERN_ERR "** %u %p %s %p %s %p %p %p\n", dentry->d_flags, dentry->d_pare
         mark_inode_dirty(inode);
         return 0;
  }

And got the following:

[  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0 [eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
[  339.956028] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  339.958199] Dumping ftrace buffer:
[  339.959158]    (ftrace buffer empty)
[  339.960075] Modules linked in:
[  339.960712] CPU: 2 PID: 9702 Comm: trinity-c2 Tainted: G        W     3.14.0-rc7-next-20140321-sasha-00018-g0516fe6-dirty #266
[  339.962657] task: ffff88012b8b0000 ti: ffff88012b88a000 task.ti: ffff88012b88a000
[  339.964089] RIP: 0010:[<ffffffff8133af2c>]  [<ffffffff8133af2c>] __mark_inode_dirty+0x10c/0x4a0
[  339.965274] RSP: 0018:ffff88012b88bdb8  EFLAGS: 00010206
[  339.965274] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8806ec051fe0 RCX: 0000000000000006
[  339.965274] RDX: 0000000000008a90 RSI: 0000000000000007 RDI: ffff8806ec051fe0
[  339.965274] RBP: ffff88012b88bdd8 R08: 0000000000000000 R09: 0000000000000000
[  339.965274] R10: 0000000000000001 R11: 3330636135303838 R12: 0000000000000007
[  339.965274] R13: ffff8806ec051fe0 R14: ffff8806ec6bb3d8 R15: ffff8806ec051fe0
[  339.965274] FS:  00007f1993d82700(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000
[  339.965274] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  339.974036] CR2: 0000000000000000 CR3: 0000000123b16000 CR4: 00000000000006a0
[  339.974036] DR0: 0000000000698000 DR1: 0000000000000000 DR2: 0000000000000000
[  339.974036] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[  339.974036] Stack:
[  339.974036]  0000000000000000 ffff8805ac03ba38 ffff8806ec051fe0 ffff88012b88bec8
[  339.974036]  ffff88012b88be28 ffffffff81336f7b ffffffff84666040 ffff88056c73e7b0
[  339.974036]  0000000000000000 ffff8806ec0520c8 0000000000000041 ffff88012b88bec8
[  339.974036] Call Trace:
[  339.974036]  [<ffffffff81336f7b>] simple_setattr+0xab/0xd0
[  339.974036]  [<ffffffff8132b148>] notify_change+0x258/0x390
[  339.974036]  [<ffffffff81307db2>] ? chmod_common+0x72/0x150
[  339.974036]  [<ffffffff81307df4>] chmod_common+0xb4/0x150
[  339.974036]  [<ffffffff8132c274>] ? __fget_light+0xe4/0x130
[  339.974036]  [<ffffffff81309382>] SyS_fchmod+0x62/0xa0
[  339.974036]  [<ffffffff84506a58>] tracesys+0xe1/0xe6
[  339.974036] Code: 8b 45 00 0f 1f 40 00 49 8b 7d 08 44 89 e2 49 83 c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 eb c5 0f 1f 44 00 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff d0 66 66 66 66
[  339.974036] RIP  [<ffffffff8133af2c>] __mark_inode_dirty+0x10c/0x4a0
[  339.974036]  RSP <ffff88012b88bdb8>


Thanks,
Sasha

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-24 14:42                   ` Sasha Levin
@ 2014-03-24 21:48                     ` Jan Kara
  2014-03-25  0:44                       ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2014-03-24 21:48 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Mon 24-03-14 10:42:25, Sasha Levin wrote:
> On 03/10/2014 10:13 AM, Sasha Levin wrote:
> >On 03/10/2014 06:43 AM, Jan Kara wrote:
> >>   By garbage, do you mean that it is a poison, completely random data or does
> >>inode->i_sb look like a valid pointer but just superblock isn't where it
> >>points to?
> >
> >It's poison.
> >
> >>>>Any way I could get anything useful any other way?
> >>   Hum, can you dump the whole contents of 'dentry' at that place? Maybe it
> >>will tell us something.
> >
> >I'll give it a go, will update when it happens again.
> 
> Okay, I've added this:
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index a184424..2492dc4 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -377,6 +377,7 @@ int simple_setattr(struct dentry *dentry, struct iattr *iattr)
>         if (iattr->ia_valid & ATTR_SIZE)
>                 truncate_setsize(inode, iattr->ia_size);
>         setattr_copy(inode, iattr);
> +       printk(KERN_ERR "** %u %p %s %p %s %p %p %p\n", dentry->d_flags, dentry->d_pare
>         mark_inode_dirty(inode);
>         return 0;
>  }
> 
> And got the following:
> 
> [  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
> [eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
  OK, great. So finally we have something useful. We know we have problems
with [eventpoll] dentry. That is actually a special filesystem not mounted
anywhere - likely you get to that dentry through /proc/<pid>/fd/. Now
eventpoll is interesting because it uses single anon inode for all
eventpoll instances. And that inode should stay in place as long as
eventpoll filesystem exists. So it's not clear how come that inode is
freed. The basic check of handling of inode use count didn't find anything
suspicious. But I can check in more detail and if I fail, we now have a
pretty narrow area where to look...

								Honza


> [  339.956028] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  339.958199] Dumping ftrace buffer:
> [  339.959158]    (ftrace buffer empty)
> [  339.960075] Modules linked in:
> [  339.960712] CPU: 2 PID: 9702 Comm: trinity-c2 Tainted: G        W     3.14.0-rc7-next-20140321-sasha-00018-g0516fe6-dirty #266
> [  339.962657] task: ffff88012b8b0000 ti: ffff88012b88a000 task.ti: ffff88012b88a000
> [  339.964089] RIP: 0010:[<ffffffff8133af2c>]  [<ffffffff8133af2c>] __mark_inode_dirty+0x10c/0x4a0
> [  339.965274] RSP: 0018:ffff88012b88bdb8  EFLAGS: 00010206
> [  339.965274] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8806ec051fe0 RCX: 0000000000000006
> [  339.965274] RDX: 0000000000008a90 RSI: 0000000000000007 RDI: ffff8806ec051fe0
> [  339.965274] RBP: ffff88012b88bdd8 R08: 0000000000000000 R09: 0000000000000000
> [  339.965274] R10: 0000000000000001 R11: 3330636135303838 R12: 0000000000000007
> [  339.965274] R13: ffff8806ec051fe0 R14: ffff8806ec6bb3d8 R15: ffff8806ec051fe0
> [  339.965274] FS:  00007f1993d82700(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000
> [  339.965274] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  339.974036] CR2: 0000000000000000 CR3: 0000000123b16000 CR4: 00000000000006a0
> [  339.974036] DR0: 0000000000698000 DR1: 0000000000000000 DR2: 0000000000000000
> [  339.974036] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [  339.974036] Stack:
> [  339.974036]  0000000000000000 ffff8805ac03ba38 ffff8806ec051fe0 ffff88012b88bec8
> [  339.974036]  ffff88012b88be28 ffffffff81336f7b ffffffff84666040 ffff88056c73e7b0
> [  339.974036]  0000000000000000 ffff8806ec0520c8 0000000000000041 ffff88012b88bec8
> [  339.974036] Call Trace:
> [  339.974036]  [<ffffffff81336f7b>] simple_setattr+0xab/0xd0
> [  339.974036]  [<ffffffff8132b148>] notify_change+0x258/0x390
> [  339.974036]  [<ffffffff81307db2>] ? chmod_common+0x72/0x150
> [  339.974036]  [<ffffffff81307df4>] chmod_common+0xb4/0x150
> [  339.974036]  [<ffffffff8132c274>] ? __fget_light+0xe4/0x130
> [  339.974036]  [<ffffffff81309382>] SyS_fchmod+0x62/0xa0
> [  339.974036]  [<ffffffff84506a58>] tracesys+0xe1/0xe6
> [  339.974036] Code: 8b 45 00 0f 1f 40 00 49 8b 7d 08 44 89 e2 49 83 c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 eb c5 0f 1f 44 00 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff d0 66 66 66 66
> [  339.974036] RIP  [<ffffffff8133af2c>] __mark_inode_dirty+0x10c/0x4a0
> [  339.974036]  RSP <ffff88012b88bdb8>
> 
> 
> Thanks,
> Sasha
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-24 21:48                     ` Jan Kara
@ 2014-03-25  0:44                       ` Sasha Levin
  2014-03-25 17:33                         ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-25  0:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/24/2014 05:48 PM, Jan Kara wrote:
>> >[  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
>> >[eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
>    OK, great. So finally we have something useful. We know we have problems
> with [eventpoll] dentry. That is actually a special filesystem not mounted
> anywhere - likely you get to that dentry through/proc/<pid>/fd/. Now
> eventpoll is interesting because it uses single anon inode for all
> eventpoll instances. And that inode should stay in place as long as
> eventpoll filesystem exists. So it's not clear how come that inode is
> freed. The basic check of handling of inode use count didn't find anything
> suspicious. But I can check in more detail and if I fail, we now have a
> pretty narrow area where to look...

Seems like it's not specific to eventpoll, I saw the same error message with
"eventfd" and "perf_event".


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25  0:44                       ` Sasha Levin
@ 2014-03-25 17:33                         ` Jan Kara
  2014-03-25 17:51                           ` Sasha Levin
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2014-03-25 17:33 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Mon 24-03-14 20:44:14, Sasha Levin wrote:
> On 03/24/2014 05:48 PM, Jan Kara wrote:
> >>>[  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
> >>>[eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
> >   OK, great. So finally we have something useful. We know we have problems
> >with [eventpoll] dentry. That is actually a special filesystem not mounted
> >anywhere - likely you get to that dentry through/proc/<pid>/fd/. Now
> >eventpoll is interesting because it uses single anon inode for all
> >eventpoll instances. And that inode should stay in place as long as
> >eventpoll filesystem exists. So it's not clear how come that inode is
> >freed. The basic check of handling of inode use count didn't find anything
> >suspicious. But I can check in more detail and if I fail, we now have a
> >pretty narrow area where to look...
> 
> Seems like it's not specific to eventpoll, I saw the same error message with
> "eventfd" and "perf_event".
  Yup, all these use anon_inode_getfile() so it all points to the fact that
for some reason we freed anon_inode_inode. But I don't see where the
problem is. Can you maybe make 'anon_inode_inode' external to
fs/anon_inodes.c and dump stack for all iput() calls to anon_inode_inode?
There must be some suckers which don't belong there...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25 17:33                         ` Jan Kara
@ 2014-03-25 17:51                           ` Sasha Levin
  2014-03-25 21:12                             ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Levin @ 2014-03-25 17:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/25/2014 01:33 PM, Jan Kara wrote:
> On Mon 24-03-14 20:44:14, Sasha Levin wrote:
>> On 03/24/2014 05:48 PM, Jan Kara wrote:
>>>>> [  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
>>>>> [eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
>>>    OK, great. So finally we have something useful. We know we have problems
>>> with [eventpoll] dentry. That is actually a special filesystem not mounted
>>> anywhere - likely you get to that dentry through/proc/<pid>/fd/. Now
>>> eventpoll is interesting because it uses single anon inode for all
>>> eventpoll instances. And that inode should stay in place as long as
>>> eventpoll filesystem exists. So it's not clear how come that inode is
>>> freed. The basic check of handling of inode use count didn't find anything
>>> suspicious. But I can check in more detail and if I fail, we now have a
>>> pretty narrow area where to look...
>>
>> Seems like it's not specific to eventpoll, I saw the same error message with
>> "eventfd" and "perf_event".
>    Yup, all these use anon_inode_getfile() so it all points to the fact that
> for some reason we freed anon_inode_inode. But I don't see where the
> problem is. Can you maybe make 'anon_inode_inode' external to
> fs/anon_inodes.c and dump stack for all iput() calls to anon_inode_inode?
> There must be some suckers which don't belong there...

Okay, this is straightforward. It happened right after boot so we're lucky.

I'm also looking into that, but odds you'll spot the issue faster than me.


[  396.414556] CPU: 6 PID: 9838 Comm: trinity-c99 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00015-g142d039-dirty #275
[  396.415969]  0000000000000000 ffff88012554dd78 ffffffff844bd702 0000000000000001
[  396.417776]  ffff880bec133fc0 ffff88012554dda8 ffffffff8132db8e ffff8801ec065100
[  396.421972]  0000000000000000 ffff880bec133fc0 ffff8801ec065190 ffff88012554dde8
[  396.425483] Call Trace:
[  396.426543]  [<ffffffff844bd702>] dump_stack+0x4f/0x7c
[  396.428485]  [<ffffffff8132db8e>] iput+0x2e/0x1a0
[  396.429964]  [<ffffffff81327660>] dentry_kill+0x1b0/0x240
[  396.431414]  [<ffffffff813277fe>] dput+0x10e/0x130
[  396.433203]  [<ffffffff8130ffd7>] __fput+0x297/0x2e0
[  396.435423]  [<ffffffff8131006e>] ____fput+0xe/0x10
[  396.436233]  [<ffffffff811850ee>] task_work_run+0xae/0xf0
[  396.437842]  [<ffffffff8115e9c8>] do_exit+0x4a8/0xc80
[  396.439770]  [<ffffffff81b37003>] ? __this_cpu_preempt_check+0x13/0x20
[  396.442399]  [<ffffffff811c28f4>] ? trace_hardirqs_on_caller+0x1f4/0x290
[  396.444903]  [<ffffffff811c299d>] ? trace_hardirqs_on+0xd/0x10
[  396.446929]  [<ffffffff8115f274>] do_group_exit+0x84/0xd0
[  396.448845]  [<ffffffff8115f2d7>] SyS_exit_group+0x17/0x20
[  396.451430]  [<ffffffff84513258>] tracesys+0xe1/0xe6

[...]

[  396.896374] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  396.899164] Dumping ftrace buffer:
[  396.900236]    (ftrace buffer empty)
[  396.900236] Modules linked in:
[  396.900236] CPU: 2 PID: 9737 Comm: trinity-c2 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00015-g142d039-dirty #275
[  396.900236] task: ffff88012c790000 ti: ffff88012b992000 task.ti: ffff88012b992000
[  396.900236] RIP: 0010:[<ffffffff8133e85c>]  [<ffffffff8133e85c>] __mark_inode_dirty+0x10c/0x4a0
[  396.900236] RSP: 0018:ffff88012b993dd8  EFLAGS: 00010206
[  396.900236] RAX: 6b6b6b6b6b6b6b6b RBX: ffff880bec133fc0 RCX: 0000000000000001
[  396.900236] RDX: ffff8800cb8f0000 RSI: 0000000000000007 RDI: ffff880bec133fc0
[  396.900236] RBP: ffff88012b993df8 R08: 000000005331c188 R09: 0000000000000000
[  396.900236] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000007
[  396.900236] R13: 0000000000000000 R14: ffff880bec790000 R15: ffff880bec133fc0
[  396.900236] FS:  00007f3eeb0ff700(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000
[  396.900236] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  396.900236] CR2: 00007f3eeb0c4504 CR3: 000000012b984000 CR4: 00000000000006a0
[  396.900236] DR0: 0000000000698000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.900236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000050602
[  396.900236] Stack:
[  396.900236]  ffff880bec133fc0 ffff88012b993ec8 0000000000000000 0000000000000180
[  396.900236]  ffff88012b993e28 ffffffff8133a8bb ffff88012b993e98 0000000000000041
[  396.900236]  ffff88012b993ec8 ffff88016c0e6cd8 ffff88012b993e98 ffffffff8132ead8
[  396.900236] Call Trace:
[  396.900236]  [<ffffffff8133a8bb>] simple_setattr+0x5b/0x70
[  396.900236]  [<ffffffff8132ead8>] notify_change+0x258/0x390
[  396.900236]  [<ffffffff8130b732>] ? chmod_common+0x72/0x150
[  396.900236]  [<ffffffff8130b774>] chmod_common+0xb4/0x150
[  396.900236]  [<ffffffff8132fc04>] ? __fget_light+0xe4/0x130
[  396.900236]  [<ffffffff8130cd02>] SyS_fchmod+0x62/0xa0
[  396.900236]  [<ffffffff84513258>] tracesys+0xe1/0xe6
[  396.900236] Code: 8b 45 00 0f 1f 40 00 49 8b 7d 08 44 89 e2 49 83 c5 10 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e7 eb c5 0f 1f 44 00 00 49 8b 46 30 <48> 8b 40 10 48 85 c0 74 08 44 89 e6 48 89 df ff d0 66 66 66 66
[  396.939254] RIP  [<ffffffff8133e85c>] __mark_inode_dirty+0x10c/0x4a0
[  396.939254]  RSP <ffff88012b993dd8>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25 17:51                           ` Sasha Levin
@ 2014-03-25 21:12                             ` Jan Kara
  2014-03-26  0:12                               ` Sasha Levin
                                                 ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jan Kara @ 2014-03-25 21:12 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1750 bytes --]

On Tue 25-03-14 13:51:11, Sasha Levin wrote:
> On 03/25/2014 01:33 PM, Jan Kara wrote:
> >On Mon 24-03-14 20:44:14, Sasha Levin wrote:
> >>On 03/24/2014 05:48 PM, Jan Kara wrote:
> >>>>>[  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
> >>>>>[eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
> >>>   OK, great. So finally we have something useful. We know we have problems
> >>>with [eventpoll] dentry. That is actually a special filesystem not mounted
> >>>anywhere - likely you get to that dentry through/proc/<pid>/fd/. Now
> >>>eventpoll is interesting because it uses single anon inode for all
> >>>eventpoll instances. And that inode should stay in place as long as
> >>>eventpoll filesystem exists. So it's not clear how come that inode is
> >>>freed. The basic check of handling of inode use count didn't find anything
> >>>suspicious. But I can check in more detail and if I fail, we now have a
> >>>pretty narrow area where to look...
> >>
> >>Seems like it's not specific to eventpoll, I saw the same error message with
> >>"eventfd" and "perf_event".
> >   Yup, all these use anon_inode_getfile() so it all points to the fact that
> >for some reason we freed anon_inode_inode. But I don't see where the
> >problem is. Can you maybe make 'anon_inode_inode' external to
> >fs/anon_inodes.c and dump stack for all iput() calls to anon_inode_inode?
> >There must be some suckers which don't belong there...
> 
> Okay, this is straightforward. It happened right after boot so we're lucky.
> 
> I'm also looking into that, but odds you'll spot the issue faster than me.
  Can you try whether the following patch fixes the issue for you?

							Thanks
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-fs-Avoid-userspace-mounting-anon_inodefs-filesystem.patch --]
[-- Type: text/x-patch, Size: 1303 bytes --]

>From 08effd8156660b889af7df31c22695f1469d12ad Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 25 Mar 2014 21:37:09 +0100
Subject: [PATCH] fs: Avoid userspace mounting anon_inodefs filesystem

anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).

This fixes an oops triggered by trinity when it tried mounting
anon_inodefs which overwrote anon_inode_inode pointer while other CPU
has been in anon_inode_getfile() between ihold() and d_instantiate().
Thus effectively creating dentry pointing to an inode without holding a
reference to it.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/anon_inodes.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 24084732b1d0..4b4543b8b894 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -177,9 +177,6 @@ static int __init anon_inode_init(void)
 {
 	int error;
 
-	error = register_filesystem(&anon_inode_fs_type);
-	if (error)
-		goto err_exit;
 	anon_inode_mnt = kern_mount(&anon_inode_fs_type);
 	if (IS_ERR(anon_inode_mnt)) {
 		error = PTR_ERR(anon_inode_mnt);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25 21:12                             ` Jan Kara
@ 2014-03-26  0:12                               ` Sasha Levin
  2014-03-26  0:41                               ` Linus Torvalds
  2014-03-26  5:53                               ` Dave Jones
  2 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-03-26  0:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/25/2014 05:12 PM, Jan Kara wrote:
> On Tue 25-03-14 13:51:11, Sasha Levin wrote:
>> On 03/25/2014 01:33 PM, Jan Kara wrote:
>>> On Mon 24-03-14 20:44:14, Sasha Levin wrote:
>>>> On 03/24/2014 05:48 PM, Jan Kara wrote:
>>>>>>> [  339.948946] ** 4194304 ffff8805ac03ba38 [eventpoll] ffff8806ec051fe0
>>>>>>> [eventpoll] ffffffff84666040 ffff88056c73e7b0           (null)
>>>>>    OK, great. So finally we have something useful. We know we have problems
>>>>> with [eventpoll] dentry. That is actually a special filesystem not mounted
>>>>> anywhere - likely you get to that dentry through/proc/<pid>/fd/. Now
>>>>> eventpoll is interesting because it uses single anon inode for all
>>>>> eventpoll instances. And that inode should stay in place as long as
>>>>> eventpoll filesystem exists. So it's not clear how come that inode is
>>>>> freed. The basic check of handling of inode use count didn't find anything
>>>>> suspicious. But I can check in more detail and if I fail, we now have a
>>>>> pretty narrow area where to look...
>>>>
>>>> Seems like it's not specific to eventpoll, I saw the same error message with
>>>> "eventfd" and "perf_event".
>>>    Yup, all these use anon_inode_getfile() so it all points to the fact that
>>> for some reason we freed anon_inode_inode. But I don't see where the
>>> problem is. Can you maybe make 'anon_inode_inode' external to
>>> fs/anon_inodes.c and dump stack for all iput() calls to anon_inode_inode?
>>> There must be some suckers which don't belong there...
>>
>> Okay, this is straightforward. It happened right after boot so we're lucky.
>>
>> I'm also looking into that, but odds you'll spot the issue faster than me.
>    Can you try whether the following patch fixes the issue for you?

Looks like it fixes it, thanks!


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25 21:12                             ` Jan Kara
  2014-03-26  0:12                               ` Sasha Levin
@ 2014-03-26  0:41                               ` Linus Torvalds
  2014-03-26  5:34                                 ` Jan Kara
  2014-03-26  5:53                               ` Dave Jones
  2 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-03-26  0:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Sasha Levin, Al Viro, linux-fsdevel, LKML

On Tue, Mar 25, 2014 at 2:12 PM, Jan Kara <jack@suse.cz> wrote:
>
>   Can you try whether the following patch fixes the issue for you?

Good catch, Honza.

I hate how fragile that code ends up being and would love to see that
"anon_inode_inode" allocation and assignment just once in
anon_inode_init(), but considering that it wants the superblock
pointer, I suspect it's not cleanly doable. Oh well. Your patch looks
like it should make the issue moot, I just dread this happening again
due to the layout of the code.

Will apply the patch,

             Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-26  0:41                               ` Linus Torvalds
@ 2014-03-26  5:34                                 ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2014-03-26  5:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Kara, Sasha Levin, Al Viro, linux-fsdevel, LKML

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]

On Tue 25-03-14 17:41:59, Linus Torvalds wrote:
> On Tue, Mar 25, 2014 at 2:12 PM, Jan Kara <jack@suse.cz> wrote:
> >
> >   Can you try whether the following patch fixes the issue for you?
> 
> Good catch, Honza.
> 
> I hate how fragile that code ends up being and would love to see that
> "anon_inode_inode" allocation and assignment just once in
> anon_inode_init(), but considering that it wants the superblock
> pointer, I suspect it's not cleanly doable. Oh well. Your patch looks
> like it should make the issue moot, I just dread this happening again
> due to the layout of the code.
  Well, I think it could be done relatively cleanly... How about the
attached patch (it boots for me)?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-vfs-Allocate-anon_inode_inode-in-anon_inode_init.patch --]
[-- Type: text/x-patch, Size: 2020 bytes --]

>From de87b1e6c3613d92f659c632be57461e87828e42 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 26 Mar 2014 06:20:14 +0100
Subject: [PATCH] vfs: Allocate anon_inode_inode in anon_inode_init()

Currently we allocated anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/anon_inodes.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4b4543b8b894..7f34f7702204 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -41,19 +41,8 @@ static const struct dentry_operations anon_inodefs_dentry_operations = {
 static struct dentry *anon_inodefs_mount(struct file_system_type *fs_type,
 				int flags, const char *dev_name, void *data)
 {
-	struct dentry *root;
-	root = mount_pseudo(fs_type, "anon_inode:", NULL,
+	return mount_pseudo(fs_type, "anon_inode:", NULL,
 			&anon_inodefs_dentry_operations, ANON_INODE_FS_MAGIC);
-	if (!IS_ERR(root)) {
-		struct super_block *s = root->d_sb;
-		anon_inode_inode = alloc_anon_inode(s);
-		if (IS_ERR(anon_inode_inode)) {
-			dput(root);
-			deactivate_locked_super(s);
-			root = ERR_CAST(anon_inode_inode);
-		}
-	}
-	return root;
 }
 
 static struct file_system_type anon_inode_fs_type = {
@@ -180,12 +169,15 @@ static int __init anon_inode_init(void)
 	anon_inode_mnt = kern_mount(&anon_inode_fs_type);
 	if (IS_ERR(anon_inode_mnt)) {
 		error = PTR_ERR(anon_inode_mnt);
-		goto err_unregister_filesystem;
+		goto err_exit;
+	}
+	anon_inode_inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+	if (IS_ERR(anon_inode_inode)) {
+		error = PTR_ERR(anon_inode_inode);
+		goto err_exit;
 	}
 	return 0;
 
-err_unregister_filesystem:
-	unregister_filesystem(&anon_inode_fs_type);
 err_exit:
 	panic(KERN_ERR "anon_inode_init() failed (%d)\n", error);
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-25 21:12                             ` Jan Kara
  2014-03-26  0:12                               ` Sasha Levin
  2014-03-26  0:41                               ` Linus Torvalds
@ 2014-03-26  5:53                               ` Dave Jones
  2014-03-26 15:00                                 ` Sasha Levin
  2 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-03-26  5:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Sasha Levin, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On Tue, Mar 25, 2014 at 10:12:29PM +0100, Jan Kara wrote:

 > This fixes an oops triggered by trinity when it tried mounting
 > anon_inodefs which overwrote anon_inode_inode pointer while other CPU
 > has been in anon_inode_getfile() between ihold() and d_instantiate().
 > Thus effectively creating dentry pointing to an inode without holding a
 > reference to it.
 
<raises eyebrows>

Sasha, do you have changes to trinity's syscall/mount.c ?
It's kind of miraculous we managed to get the type arg right
there, because right now we're just passing a random address as an arg.
The only way that could work is if we randomly managed to do an allocation,
and then a seek & read from /proc/filesystems to that buffer.  If you're
lucky enough to get all those conditions right from rand() calls, you should
probably give up on kernel hacking and buy some powerball tickets.

We should add some code to make that only return strings from /proc/filesystems,
which makes me wonder if you already did that..

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: fs: gpf in simple_setattr
  2014-03-26  5:53                               ` Dave Jones
@ 2014-03-26 15:00                                 ` Sasha Levin
  0 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-03-26 15:00 UTC (permalink / raw)
  To: Dave Jones, Jan Kara, Al Viro, linux-fsdevel, LKML, Linus Torvalds

On 03/26/2014 01:53 AM, Dave Jones wrote:
> On Tue, Mar 25, 2014 at 10:12:29PM +0100, Jan Kara wrote:
>
>   > This fixes an oops triggered by trinity when it tried mounting
>   > anon_inodefs which overwrote anon_inode_inode pointer while other CPU
>   > has been in anon_inode_getfile() between ihold() and d_instantiate().
>   > Thus effectively creating dentry pointing to an inode without holding a
>   > reference to it.
>
> <raises eyebrows>
>
> Sasha, do you have changes to trinity's syscall/mount.c ?
> It's kind of miraculous we managed to get the type arg right
> there, because right now we're just passing a random address as an arg.
> The only way that could work is if we randomly managed to do an allocation,
> and then a seek & read from /proc/filesystems to that buffer.  If you're
> lucky enough to get all those conditions right from rand() calls, you should
> probably give up on kernel hacking and buy some powerball tickets.
>
> We should add some code to make that only return strings from /proc/filesystems,
> which makes me wonder if you already did that..

Almost :)

I'm mounting every filesystem under /prov/filesystems into trinity's victims path
before trinity starts, so it gets to fuzz every filesystem known to the kernel.

For nodev filesystems it just mounts then and for nodev ones it will try to
mkfs.$fs a block device and mount that.

So basically I'm not surprised by the explanation for that bug or that trinity
has triggered it.


Thanks,
Sasha



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-03-26 15:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-19  0:25 fs: gpf in simple_setattr Sasha Levin
2014-01-08 16:00 ` Sasha Levin
2014-03-01 20:05   ` Sasha Levin
2014-03-02  3:35     ` Linus Torvalds
2014-03-03  2:01       ` Sasha Levin
2014-03-03 21:40     ` Jan Kara
2014-03-05  0:00       ` Sasha Levin
2014-03-05 12:45         ` Jan Kara
2014-03-06 16:02           ` Sasha Levin
2014-03-08  2:14             ` Sasha Levin
2014-03-10 10:43               ` Jan Kara
2014-03-10 14:13                 ` Sasha Levin
2014-03-24 14:42                   ` Sasha Levin
2014-03-24 21:48                     ` Jan Kara
2014-03-25  0:44                       ` Sasha Levin
2014-03-25 17:33                         ` Jan Kara
2014-03-25 17:51                           ` Sasha Levin
2014-03-25 21:12                             ` Jan Kara
2014-03-26  0:12                               ` Sasha Levin
2014-03-26  0:41                               ` Linus Torvalds
2014-03-26  5:34                                 ` Jan Kara
2014-03-26  5:53                               ` Dave Jones
2014-03-26 15:00                                 ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).