linux-kernel.vger.kernel.org archive mirror
* general protection fault (btrfs_real_readdir)
@ 2016-05-18 11:31 Markus Trippelsdorf
  2016-05-18 12:21 ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Markus Trippelsdorf @ 2016-05-18 11:31 UTC
  To: Al Viro; +Cc: linux-kernel

I'm running the latest Linus git tree and the parallel filesystem directory
handling update seems to cause the following issue:

 general protection fault: 0000 [#1] SMP
 CPU: 0 PID: 24801 Comm: ld Not tainted 4.6.0-03623-g0b7962a6c4a3-dirty #118
 Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503    04/13/2011
 task: ffff88016cafa800 ti: ffff8801076e0000 task.ti: ffff8801076e0000
 RIP: 0010:[<ffffffff8134e9f3>]  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
 RSP: 0018:ffff8801076e3dc0  EFLAGS: 00010202
 RAX: dead000000000100 RBX: ffff8800dbbd3840 RCX: ffff8800dbbd3880
 RDX: dead000000000200 RSI: ffff8801076e3e30 RDI: ffff8801076e3ef0
 RBP: ffff8801076e3e30 R08: ffff88006ac4b798 R09: ffff880000000000
 R10: 0000160000000000 R11: 0000000000001000 R12: dead000000000200
 R13: dead000000000100 R14: ffff8801076e3ef0 R15: dead0000000000c0
 FS:  00007f4782e04740(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f47821ef000 CR3: 00000001b9e38000 CR4: 00000000000006f0
 Stack:
  ffff8801076e3e2f 0000000000003000 0000000000000001 ffff88006ac4b6f8
  ffff8801076e3e30 ffff8801076e3ef0 0000000000000000 0000000000000008
  ffffffff812f038b 0000001281e62a80 ffff880214ea8800 ffff8801986fdbd0
 Call Trace:
  [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
  [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
  [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
  [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
 Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
 RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
  RSP <ffff8801076e3dc0>
 ---[ end trace 91067801e8a68a7e ]---

This happened while I was building gcc, so the system was very busy.

-- 
Markus


* Re: general protection fault (btrfs_real_readdir)
  2016-05-18 11:31 general protection fault (btrfs_real_readdir) Markus Trippelsdorf
@ 2016-05-18 12:21 ` Al Viro
  2016-05-18 12:32   ` Markus Trippelsdorf
                     ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Al Viro @ 2016-05-18 12:21 UTC
  To: Markus Trippelsdorf; +Cc: linux-kernel, Chris Mason

On Wed, May 18, 2016 at 01:31:40PM +0200, Markus Trippelsdorf wrote:
> I'm running the latest Linus git tree and the parallel filesystem directory
> handling update seems to cause the following issue:
 
>  Call Trace:
>   [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
>   [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
>   [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
>   [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
>  Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
>  RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
>   RSP <ffff8801076e3dc0>
>  ---[ end trace 91067801e8a68a7e ]---
> 
> This happened while I was building gcc, so the system was very busy.

From a very superficial reading of delayed-inode.c, it looks like delayed
node might need locking...  This
        list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
                list_del(&curr->readdir_list);
looks particularly unpleasant.  Just to make sure that this *is* just a
readdir issue (and not something involving lookups), could you try to
reproduce the breakage with 972b241f8 reverted?
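
An illustrative aside, not part of the original message: the register dump
in the report shows RAX = dead000000000100 and RDX = dead000000000200, which
are the x86_64 list poison values that list_del() writes into an entry after
unlinking it. That would be consistent with list_del() running on an entry
that had already been removed by another walker. The userspace sketch below
mimics that behaviour; the poison constants assume the usual x86_64
configuration, and list_entry_is_poisoned() is a hypothetical helper, not a
kernel function.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's list poison values on x86_64
 * (assumption: CONFIG_ILLEGAL_POINTER_VALUE is 0xdead000000000000). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000200ULL)

struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty circular list. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append entry just before head, i.e. at the tail. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Same shape as the kernel's list_del(): unlink, then poison. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;	/* the store that faulted in the oops */
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical helper: has this entry already been through list_del()? */
static int list_entry_is_poisoned(const struct list_head *entry)
{
	return entry->next == LIST_POISON1 && entry->prev == LIST_POISON2;
}
```

Calling list_del() a second time on a poisoned entry would execute
next->prev = prev with next == LIST_POISON1, which is a non-canonical
address on x86_64 and so raises a general protection fault rather than a
page fault.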


* Re: general protection fault (btrfs_real_readdir)
  2016-05-18 12:21 ` Al Viro
@ 2016-05-18 12:32   ` Markus Trippelsdorf
  2016-05-18 13:01   ` Markus Trippelsdorf
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Markus Trippelsdorf @ 2016-05-18 12:32 UTC
  To: Al Viro; +Cc: linux-kernel, Chris Mason

On 2016.05.18 at 13:21 +0100, Al Viro wrote:
> On Wed, May 18, 2016 at 01:31:40PM +0200, Markus Trippelsdorf wrote:
> > I'm running the latest Linus git tree and the parallel filesystem directory
> > handling update seems to cause the following issue:
>  
> >  Call Trace:
> >   [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
> >   [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
> >   [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
> >   [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> >  Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
> >  RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
> >   RSP <ffff8801076e3dc0>
> >  ---[ end trace 91067801e8a68a7e ]---
> > 
> > This happened while I was building gcc, so the system was very busy.
> 
> From a very superficial reading of delayed-inode.c, it looks like delayed
> node might need locking...  This
>         list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
>                 list_del(&curr->readdir_list);
> looks particularly unpleasant.  Just to make sure that this *is* just a
> readdir issue (and not something involving lookups), could you try to
> reproduce the breakage with 972b241f8 reverted?

Will give it a try later.

Just in case it may help:

(gdb) list *(btrfs_readdir_delayed_dir_index+0x73)
0xffffffff8134e9f3 is in btrfs_readdir_delayed_dir_index (include/linux/list.h:89).
84       * This is only for internal list manipulation where we know
85       * the prev/next entries already!
86       */
87      static inline void __list_del(struct list_head * prev, struct list_head * next)
88      {
89              next->prev = prev;
90              WRITE_ONCE(prev->next, next);
91      }
92
93      /**
(gdb) list *(btrfs_real_readdir+0x44b)
0xffffffff812f038b is in btrfs_real_readdir (fs/btrfs/inode.c:5858).
5853
5854            if (key_type == BTRFS_DIR_INDEX_KEY) {
5855                    if (is_curr)
5856                            ctx->pos++;
5857                    ret = btrfs_readdir_delayed_dir_index(ctx, &ins_list, &emitted);
5858                    if (ret)
5859                            goto nopos;
5860            }
5861
5862            /*

-- 
Markus


* Re: general protection fault (btrfs_real_readdir)
  2016-05-18 12:21 ` Al Viro
  2016-05-18 12:32   ` Markus Trippelsdorf
@ 2016-05-18 13:01   ` Markus Trippelsdorf
  2016-05-18 13:11   ` Chris Mason
  2016-05-18 14:14   ` Chris Mason
  3 siblings, 0 replies; 6+ messages in thread
From: Markus Trippelsdorf @ 2016-05-18 13:01 UTC
  To: Al Viro; +Cc: linux-kernel, Chris Mason

On 2016.05.18 at 13:21 +0100, Al Viro wrote:
> On Wed, May 18, 2016 at 01:31:40PM +0200, Markus Trippelsdorf wrote:
> > I'm running the latest Linus git tree and the parallel filesystem directory
> > handling update seems to cause the following issue:
>  
> >  Call Trace:
> >   [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
> >   [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
> >   [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
> >   [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> >  Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
> >  RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
> >   RSP <ffff8801076e3dc0>
> >  ---[ end trace 91067801e8a68a7e ]---
> > 
> > This happened while I was building gcc, so the system was very busy.
> 
> From a very superficial reading of delayed-inode.c, it looks like delayed
> node might need locking...  This
>         list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
>                 list_del(&curr->readdir_list);
> looks particularly unpleasant.  Just to make sure that this *is* just a
> readdir issue (and not something involving lookups), could you try to
> reproduce the breakage with 972b241f8 reverted?

For what it's worth, gcc bootstrapped fine with 972b241f8 reverted.

-- 
Markus


* Re: general protection fault (btrfs_real_readdir)
  2016-05-18 12:21 ` Al Viro
  2016-05-18 12:32   ` Markus Trippelsdorf
  2016-05-18 13:01   ` Markus Trippelsdorf
@ 2016-05-18 13:11   ` Chris Mason
  2016-05-18 14:14   ` Chris Mason
  3 siblings, 0 replies; 6+ messages in thread
From: Chris Mason @ 2016-05-18 13:11 UTC
  To: Al Viro; +Cc: Markus Trippelsdorf, linux-kernel

On Wed, May 18, 2016 at 01:21:14PM +0100, Al Viro wrote:
> On Wed, May 18, 2016 at 01:31:40PM +0200, Markus Trippelsdorf wrote:
> > I'm running the latest Linus git tree and the parallel filesystem directory
> > handling update seems to cause the following issue:
>  
> >  Call Trace:
> >   [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
> >   [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
> >   [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
> >   [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> >  Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
> >  RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
> >   RSP <ffff8801076e3dc0>
> >  ---[ end trace 91067801e8a68a7e ]---
> > 
> > This happened while I was building gcc, so the system was very busy.
> 
> From a very superficial reading of delayed-inode.c, it looks like delayed
> node might need locking...  This
>         list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
>                 list_del(&curr->readdir_list);
> looks particularly unpleasant.  Just to make sure that this *is* just a
> readdir issue (and not something involving lookups), could you try to
> reproduce the breakage with 972b241f8 reverted?

Yeah, it does expect the mutex to be held; sorry Al, I missed this when
you asked.  I'll cook a patch today.

-chris
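
An illustrative aside, not part of the original message and not the btrfs
fix: the general idea of "the walk expects a lock to be held" can be
sketched in userspace as a drain loop that unlinks entries only while
holding a lock, so that two concurrent callers can never unlink the same
entry. All names here (struct delayed_list, dl_push, dl_drain) are
hypothetical, and a C11 atomic_flag spinlock stands in for the kernel mutex.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int index;
};

struct delayed_list {
	atomic_flag lock;	/* stand-in for the kernel mutex */
	struct node *first;
};

static void dl_lock(struct delayed_list *l)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		;
}

static void dl_unlock(struct delayed_list *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Insert n at the front of the list under the lock. */
static void dl_push(struct delayed_list *l, struct node *n)
{
	dl_lock(l);
	n->next = l->first;
	l->first = n;
	dl_unlock(l);
}

/* Unlink every entry while holding the lock; returns how many were removed. */
static int dl_drain(struct delayed_list *l)
{
	int removed = 0;

	dl_lock(l);
	while (l->first) {
		struct node *n = l->first;

		l->first = n->next;
		n->next = NULL;
		removed++;
	}
	dl_unlock(l);
	return removed;
}
```

With the lock held across the whole loop, a second caller of dl_drain()
either waits or finds the list already empty; it never sees a half-unlinked
entry.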


* Re: general protection fault (btrfs_real_readdir)
  2016-05-18 12:21 ` Al Viro
                     ` (2 preceding siblings ...)
  2016-05-18 13:11   ` Chris Mason
@ 2016-05-18 14:14   ` Chris Mason
  3 siblings, 0 replies; 6+ messages in thread
From: Chris Mason @ 2016-05-18 14:14 UTC
  To: Al Viro; +Cc: Markus Trippelsdorf, linux-kernel

On Wed, May 18, 2016 at 01:21:14PM +0100, Al Viro wrote:
> On Wed, May 18, 2016 at 01:31:40PM +0200, Markus Trippelsdorf wrote:
> > I'm running the latest Linus git tree and the parallel filesystem directory
> > handling update seems to cause the following issue:
>  
> >  Call Trace:
> >   [<ffffffff812f038b>] ? btrfs_real_readdir+0x44b/0x540
> >   [<ffffffff811b064d>] ? SyS_getdents+0x12d/0x2a0
> >   [<ffffffff811affa0>] ? SyS_ioctl+0x6a0/0x6a0
> >   [<ffffffff810923db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> >  Code: 02 00 00 00 00 ad de eb 1e f0 ff 4b 60 74 73 49 8b 47 40 49 8d 57 40 4c 89 fb 48 39 d5 4c 8d 78 c0 0f 84 8d 00 00 00 48 8b 53 48 <48> 89 50 08 48 89 02 4c 89 6b 40 4c 89 63 48 48 8b 4b 21 49 3b 
> >  RIP  [<ffffffff8134e9f3>] btrfs_readdir_delayed_dir_index+0x73/0x120
> >   RSP <ffff8801076e3dc0>
> >  ---[ end trace 91067801e8a68a7e ]---
> > 
> > This happened while I was building gcc, so the system was very busy.
> 
> From a very superficial reading of delayed-inode.c, it looks like delayed
> node might need locking...  This
>         list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
>                 list_del(&curr->readdir_list);
> looks particularly unpleasant.  Just to make sure that this *is* just a
> readdir issue (and not something involving lookups), could you try to
> reproduce the breakage with 972b241f8 reverted?

OK, really fixing this means redoing the delayed directory walking
completely.  I'm not comfortable with that for a one-day fix; let's just
revert 972b241f8 while I bash this out.

-chris
