* Kernel Bug: unable to handle kernel paging request
@ 2013-07-12  5:24 Jérôme Poulin
       [not found] ` <CALJXSJquK6YxGKuH97Ec2CTMyJaZrJjOfePSKtgPDm8_9YXzzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jérôme Poulin @ 2013-07-12  5:24 UTC (permalink / raw)
  To: linux-nilfs

In response to Vyacheslav Dubeyko, here is the problem I am encountering.
I'm not sure how to reproduce it now, but before deleting
/var/cache/apt, I was able to reproduce it by issuing apt-get update.
Now it triggers when Ubuntu launches the apt daemon. Afterward, the
whole system is frozen; not even SysRq+I returns me to the
console.

After redirecting the log to another partition, here is what I have in syslog:

Jul 12 01:08:43 bluetoothd[635]: Stopping discovery
Jul 12 01:10:43 dbus[622]: [system] Activating service
name='org.freedesktop.PackageKit' (using servicehelper)
Jul 12 01:10:43 AptDaemon: INFO: Initializing daemon
Jul 12 01:10:43 AptDaemon.PackageKit: INFO: Initializing PackageKit compat layer
Jul 12 01:10:43 dbus[622]: [system] Successfully activated service
'org.freedesktop.PackageKit'
[ 1677.310656] BUG: unable to handle kernel paging request at 0000000000004c83
[ 1677.310683] IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
[ 1677.310708] PGD 0
[ 1677.310715] Oops: 0000 [#1] SMP
[ 1677.310726] Modules linked in: pci_stub vboxpci(OF) vboxnetadp(OF)
vboxnetflt(OF) vboxdrv(OF) nfsd(F) auth_rpcgss(F) nfs_acl(F) lockd(F)
sunrpc(F) dm_crypt(F) bbswitch(OF) zram(C) intel_powerclamp kvm_intel
kvm parport_pc(F) ppdev(F) lp(F) uvcvideo parport(F) crc32_pclmul(F)
ghash_clmulni_intel(F) aesni_intel(F) snd_hda_codec_realtek
snd_hda_intel snd_hda_codec aes_x86_64(F) asus_wmi lrw(F)
sparse_keymap gf128mul(F) snd_hwdep(F) glue_helper(F) ablk_helper(F)
arc4(F) snd_pcm(F) cryptd(F) joydev(F) videobuf2_vmalloc
videobuf2_memops videobuf2_core mxm_wmi iwldvm snd_page_alloc(F) bnep
snd_seq_midi(F) videodev snd_seq_midi_event(F) snd_rawmidi(F) mac80211
snd_seq(F) snd_seq_device(F) btusb snd_timer(F) iwlwifi snd(F)
soundcore(F) microcode(F) psmouse(F) rfcomm bluetooth serio_raw(F)
cfg80211 lpc_ich mei_me wmi mei mac_hid coretemp binfmt_misc(F) nilfs2
btrfs(F) xor(F) zlib_deflate(F) raid6_pq(F) libcrc32c(F) nbd(F) i915
i2c_algo_bit drm_kms_helper drm alx mdio ahci(F) libahci(F) video(F)
[last unloaded: ipmi_msghandler]
[ 1677.311066] CPU: 7 PID: 414 Comm: segctord Tainted: GF        C O
3.10.0-2-generic #10-Ubuntu
[ 1677.311096] Hardware name: ASUSTeK COMPUTER INC. N56VZ/N56VZ, BIOS
N56VZ.216 12/06/2012
[ 1677.311124] task: ffff88021c484650 ti: ffff88021eaa2000 task.ti:
ffff88021eaa2000
[ 1677.311155] RIP: 0010:[<ffffffffa024d0f2>]  [<ffffffffa024d0f2>]
nilfs_end_page_io+0x12/0xd0 [nilfs2]
[ 1677.311199] RSP: 0000:ffff88021eaa3d00  EFLAGS: 00010202
[ 1677.311218] RAX: ffff880167625180 RBX: 0000000000004c83 RCX: 0000000000000034
[ 1677.311248] RDX: 000000000000000d RSI: 0000000000000000 RDI: 0000000000004c83
[ 1677.311277] RBP: ffff88021eaa3d08 R08: 7800000000000000 R09: a8001fa0bc000000
[ 1677.311305] R10: 57ffca5f4be82f00 R11: 0000000000000019 R12: ffff880213f46288
[ 1677.311328] R13: 0000000000000000 R14: ffffea0007321f80 R15: ffff880167625138
[ 1677.311353] FS:  0000000000000000(0000) GS:ffff88022efc0000(0000)
knlGS:0000000000000000
[ 1677.311383] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1677.311403] CR2: 0000000000004c83 CR3: 0000000001c0e000 CR4: 00000000001407e0
[ 1677.311428] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1677.311455] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1677.311478] Stack:
[ 1677.311487]  ffff880213f461e0 ffff88021eaa3e00 ffffffffa024e4c5
ffffffff81019d09
[ 1677.311520]  ffff88021c484650 ffff88021c484650 ffff88021c484650
ffff880221282270
[ 1677.311541]  ffff88021e691f58 ffff88021e691e00 ffff880221282260
000000031c484650
[ 1677.311562] Call Trace:
[ 1677.311575]  [<ffffffffa024e4c5>]
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
[ 1677.311596]  [<ffffffff81019d09>] ? sched_clock+0x9/0x10
[ 1677.311614]  [<ffffffffa024f3ab>]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
[ 1677.311636]  [<ffffffffa024f5e2>] nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
[ 1677.311657]  [<ffffffffa024f4c0>] ?
nilfs_segctor_construct+0x290/0x290 [nilfs2]
[ 1677.311677]  [<ffffffff8107cae0>] kthread+0xc0/0xd0
[ 1677.311690]  [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
[ 1677.311709]  [<ffffffff816dd16c>] ret_from_fork+0x7c/0xb0
[ 1677.311724]  [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
[ 1677.311740] Code: 2d ee e0 5b 5d c3 48 89 df e8 fb 25 ee e0 eb db
66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 85 ff 48 89 e5 53 48
89 fb 74 29 <48> 8b 07 f6 c4 08 0f 84 9c 00 00 00 48 8b 47 30 48 8b 00
a9 00
[ 1677.311821] RIP  [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
[ 1677.311841]  RSP <ffff88021eaa3d00>
[ 1677.311850] CR2: 0000000000004c83
[ 1677.320046] ---[ end trace 0e7c8d51bd66cbe6 ]---
Jul 12 01:11:50 kernel: [ 1741.418989] SysRq : Emergency Sync
Jul 12 01:11:53 kernel: [ 1744.788020] SysRq : Terminate All Tasks
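
For readers following the trace, the register dump can be cross-checked mechanically. The sketch below is illustrative Python and not part of the original report: it parses a few lines copied from the oops above and checks that the faulting address in CR2 equals RDI, which on x86-64 carries the first function argument. The helper names `parse_registers` and `fault_site` are made up for this example, and the reading of the `Code:` bytes in the comments is an interpretation that is worth confirming with a disassembler.

```python
import re

# Lines copied from the oops above, with the mail line wrapping removed.
OOPS = """\
BUG: unable to handle kernel paging request at 0000000000004c83
IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
RAX: ffff880167625180 RBX: 0000000000004c83 RCX: 0000000000000034
RDX: 000000000000000d RSI: 0000000000000000 RDI: 0000000000004c83
CR2: 0000000000004c83 CR3: 0000000001c0e000 CR4: 00000000001407e0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def fault_site(text):
    """Return (symbol, offset, size) from the 'IP:' line."""
    m = re.search(r"IP: \[<[0-9a-f]+>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", text)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


regs = parse_registers(OOPS)
symbol, offset, size = fault_site(OOPS)

# Interpretation of the bytes around the <48> marker in the Code: line
# (an assumption of this sketch, not stated in the report):
#   48 85 ff     test %rdi,%rdi     ; NULL check on the first argument
#   74 29        je   ...           ; return early if NULL
#   <48> 8b 07   mov  (%rdi),%rax   ; first dereference, faults here
print(f"fault in {symbol} at +{offset:#x} of {size:#x} bytes")
print(f"CR2 == RDI: {regs['CR2'] == regs['RDI']} (address {regs['CR2']:#x})")
```

If that reading is right, nilfs_end_page_io() received a non-NULL but invalid pointer (0x4c83), which passes the NULL check and then faults on the first dereference.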
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Kernel Bug: unable to handle kernel paging request
       [not found] ` <CALJXSJquK6YxGKuH97Ec2CTMyJaZrJjOfePSKtgPDm8_9YXzzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-12 18:58   ` Jérôme Poulin
       [not found]     ` <CALJXSJoW9Qpp9t42u_k4cW3gO6qzSPoeCjtQDU3tDKq6TJ=K8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jérôme Poulin @ 2013-07-12 18:58 UTC (permalink / raw)
  To: linux-nilfs

The problem happened again today after resume; it seems to have become
more frequent since last week. Here is a pastebin of the traceback +
sysrq+W.

http://pastebin.ca/2426059


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]     ` <CALJXSJoW9Qpp9t42u_k4cW3gO6qzSPoeCjtQDU3tDKq6TJ=K8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-18 17:30       ` Vyacheslav Dubeyko
       [not found]         ` <F4156394-8A25-4F81-81C3-9921CB00BD92-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-18 17:30 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-nilfs

Hi Jérôme,

On Jul 12, 2013, at 10:58 PM, Jérôme Poulin wrote:

Thank you for the details. Sorry for the delay in answering; I was on vacation.

> The problem happened again today after resume, it seems to be more
> frequent since last week. Here is a pastebin of the traceback +
> sysrq+W.
> 
> http://pastebin.ca/2426059
> 

Unfortunately, I currently have no access to this share.

> On Fri, Jul 12, 2013 at 1:24 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
>> In response to Vyacheslav Dubeyko, here is the problem I encounter,
>> I'm not sure how to reproduce it now, but before deleting
>> /var/cache/apt, I was able to reproduce it by issuing apt-get update.
>> Now, it triggers when Ubuntu launches the apt daemon. Afterward, the
>> whole system is frozen, not even SysRq+I would return me to the
>> console.
>> 

So, as I see, the reproducing path is: (1) delete /var/cache/apt; (2) issue apt-get update.

Could you share additional details about the issue on your side?

I mean such details:
(1) The strace output for the case of issuing the apt-get update (in the case of issue reproducing).
(2) I need more details about your NILFS2 partition. Could you share output of "nilfs-tune -l"?

Thanks,
Vyacheslav Dubeyko.


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]         ` <F4156394-8A25-4F81-81C3-9921CB00BD92-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-07-22 19:11           ` Jérôme Poulin
       [not found]             ` <CALJXSJrj0J_-ZUCOurJXaYhx_wEJwxb2_5OOJjQSSmmP-PQDgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jérôme Poulin @ 2013-07-22 19:11 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs

On Thu, Jul 18, 2013 at 1:30 PM, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
>
> > The problem happened again today after resume, it seems to be more
> > frequent since last week. Here is a pastebin of the traceback +
> > sysrq+W.
> >
> > http://pastebin.ca/2426059
> >
>
> Unfortunately, currently I haven't access to this share.

I'm back on my laptop but wasn't able to reproduce it again. It is
not as easy to reproduce as it was before: sometimes apt-get update
works, sometimes it freezes. Here is a different pastebin, as
pastebin.ca seems to have been down for some time now:
http://pastebin.com/ALmuHdfh

> So, as I see, the reproducing path is: (1) delete /var/cache/apt; (2) issue apt-get update.

In fact it is the other way around;
(1) Issue apt-get update: Crash.
(2) Reboot, try again: Crash.
(3) Delete /var/lib/apt/lists/* and try again: Works for some time.


> (1) The strace output for the case of issuing the apt-get update (in the case of issue reproducing).

That will be hard to obtain, except maybe if I alias my apt-get update
to strace -f -o /boot/somefile.txt apt-get update and hope the problem
occurs again. It sometimes happens when launched from the Ubuntu Store,
which doesn't use bash to launch the update.
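
A minimal sketch of such a wrapper, in Python rather than a shell alias, is shown here for illustration only. It builds the same `strace -f -o` command line mentioned above; the function names `strace_command` and `run_traced` are hypothetical. Writing the trace to /boot presumably keeps it off the NILFS2 root, so that it survives a freeze; that motivation is an assumption, not something stated in the thread.

```python
import shlex
import subprocess


def strace_command(trace_path, command):
    """Build the argv for: strace -f -o TRACE_PATH COMMAND..."""
    return ["strace", "-f", "-o", trace_path, *command]


def run_traced(trace_path, command):
    """Run COMMAND under strace and return its exit status."""
    return subprocess.run(strace_command(trace_path, command)).returncode


# Only build and print the command line here; call run_traced() to execute it.
argv = strace_command("/boot/somefile.txt", ["apt-get", "update"])
print(shlex.join(argv))
```

A wrapper like this only helps when the update is started from a shell; as noted above, an update launched by the Ubuntu Store would bypass it.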

> (2) I need more details about your NILFS2 partition. Could you share output of "nilfs-tune -l"?

$ sudo nilfs-tune -l /dev/vgUbuntu/root
nilfs-tune 2.1.4
Filesystem volume name:  root
Filesystem UUID:  336f247d-c8d1-4e91-887a-258121c4face
Filesystem magic number:  0x3434
Filesystem revision #:  2.0
Filesystem features:      (none)
Filesystem state:  invalid or mounted
Filesystem OS type:  Linux
Block size:  4096
Filesystem created:  Thu Apr 11 22:34:29 2013
Last mount time:  Mon Jul 22 13:35:46 2013
Last write time:  Mon Jul 22 15:06:02 2013
Mount count:  86
Maximum mount count:  50
Reserve blocks uid:  0 (user root)
Reserve blocks gid:  0 (group root)
First inode:  11
Inode size:  128
DAT entry size:  32
Checkpoint size:  192
Segment usage size:  16
Number of segments:  25599
Device size:  214748364800
First data block:  1
# of blocks per segment:  2048
Reserved segments %:  5
Last checkpoint #:  3229012
Last block address:  17242181
Last sequence #:  136170
Free blocks count:  17827840
Commit interval:  0
# of blks to create seg:  0
CRC seed:  0x7ab1d7ed
CRC check sum:  0xfab54710
CRC check data size:  0x00000118
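
The geometry reported above can be sanity-checked with a little arithmetic. The following Python sketch is illustrative only: the constants are copied from the nilfs-tune output, while the variable names and the choice of what to compare are assumptions of this example. It does not claim that any of these values is related to the crash.

```python
# Values copied from the nilfs-tune output above.
BLOCK_SIZE = 4096
BLOCKS_PER_SEGMENT = 2048
NUM_SEGMENTS = 25599
DEVICE_SIZE = 214748364800
FREE_BLOCKS = 17827840
MOUNT_COUNT = 86
MAX_MOUNT_COUNT = 50

# Size of one segment in bytes (block size times blocks per segment).
segment_bytes = BLOCK_SIZE * BLOCKS_PER_SEGMENT

# How many whole segments would fit on the device, versus how many are reported.
device_segments = DEVICE_SIZE // segment_bytes
covered_bytes = NUM_SEGMENTS * segment_bytes

# Free space expressed in segments and in GiB.
free_segments, leftover_blocks = divmod(FREE_BLOCKS, BLOCKS_PER_SEGMENT)
free_gib = FREE_BLOCKS * BLOCK_SIZE / 2**30

print(f"segment size:        {segment_bytes // 2**20} MiB")
print(f"segments on device:  {device_segments} (reported: {NUM_SEGMENTS})")
print(f"bytes in segments:   {covered_bytes} of {DEVICE_SIZE}")
print(f"free:                {free_segments} segments, {free_gib:.2f} GiB")
print(f"mount count over maximum: {MOUNT_COUNT > MAX_MOUNT_COUNT}")
```

Two things stand out from these numbers alone: the reported segment count is one less than the number of 8 MiB segments that fit in the device size, and the mount count (86) exceeds the maximum mount count (50).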

* Re: Kernel Bug: unable to handle kernel paging request
       [not found]             ` <CALJXSJrj0J_-ZUCOurJXaYhx_wEJwxb2_5OOJjQSSmmP-PQDgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-23 11:15               ` Vyacheslav Dubeyko
  2013-07-26 18:13                 ` Jérôme Poulin
  0 siblings, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-23 11:15 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-nilfs

On Mon, 2013-07-22 at 15:11 -0400, Jérôme Poulin wrote:

[snip]
> 
> I'm back on my laptop but wasn't able to reproduce it again. It is
> not as easy to reproduce as it was before: sometimes apt-get update
> works, sometimes it freezes. Here is a different pastebin, as
> pastebin.ca seems to have been down for some time now:
> http://pastebin.com/ALmuHdfh
> 

Thank you. I have now downloaded this output.

> > So, as I see, the reproducing path is: (1) delete /var/cache/apt; (2) issue apt-get update.
> 
> In fact it is the other way around;
> (1) Issue apt-get update: Crash.
> (2) Reboot, try again: Crash.
> (3) Delete /var/lib/apt/lists/* and try again: Works for some time.
> 

Ok. Thank you for additional details.

> 
> > (1) The strace output for the case of issuing the apt-get update (in the case of issue reproducing).
> 
> That will be hard to obtain, except maybe if I alias my apt-get update
> to strace -f -o /boot/somefile.txt apt-get update and hope the problem
> occurs again. It sometimes happens when launched from the Ubuntu Store,
> which doesn't use bash to launch the update.
> 

Ok. I see. I'll try to analyze the issue on the basis of the available
information. But it would be great to have the requested strace output.

Thanks,
Vyacheslav Dubeyko.


* Re: Kernel Bug: unable to handle kernel paging request
  2013-07-23 11:15               ` Vyacheslav Dubeyko
@ 2013-07-26 18:13                 ` Jérôme Poulin
       [not found]                   ` <CALJXSJrY22eGkYA76wwL4moAdsjV+_PUtvVO6tt5K16hzMh8xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jérôme Poulin @ 2013-07-26 18:13 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs

Good afternoon,

I have more information to add to the bug report.

On Tue, Jul 23, 2013 at 7:15 AM, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
>> > So, as I see, the reproducing path is: (1) delete /var/cache/apt; (2) issue apt-get update.
>>
>> In fact it is the other way around;
>> (1) Issue apt-get update: Crash.
>> (2) Reboot, try again: Crash.
>> (3) Delete /var/lib/apt/lists/* and try again: Works for some time.
>>
>
> Ok. Thank you for additional details.

I was able to reproduce the problem consistently:
1. Make a snapshot of a checkpoint and mount it.
2. Read from the checkpoint (make a backup).
3. Issue apt-get update.
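
The steps above can be sketched as a command plan. This Python example is a hedged illustration: the report does not say which commands were used, so `chcp ss` (turn a checkpoint into a snapshot) and the `cp=` mount option from nilfs-utils are assumed here, and the checkpoint number, mount point, backup directory, and the use of `cp -a` as the "backup" are all hypothetical. The plan is only printed, not executed.

```python
import shlex


def repro_plan(device, checkpoint, mountpoint, backup_dir):
    """Return the assumed command sequence as a list of argv lists."""
    return [
        # 1. Make a snapshot of a checkpoint and mount it read-only.
        ["chcp", "ss", device, str(checkpoint)],
        ["mount", "-t", "nilfs2", "-r", "-o", f"cp={checkpoint}", device, mountpoint],
        # 2. Read from the mounted snapshot (stand-in for the backup).
        ["cp", "-a", f"{mountpoint}/.", backup_dir],
        # 3. Issue apt-get update on the live filesystem.
        ["apt-get", "update"],
    ]


# Device name taken from the nilfs-tune invocation earlier in the thread;
# the remaining arguments are placeholders.
plan = repro_plan("/dev/vgUbuntu/root", 1234, "/mnt/snapshot", "/backup")
for argv in plan:
    print(shlex.join(argv))
```

Note that the plan is sequential, whereas the logs below suggest the backup was still reading when apt-get update was issued, so steps 2 and 3 probably need to overlap to match the report.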

>
>>
>> > (1) The strace output for the case of issuing the apt-get update (in the case of issue reproducing).
>>
>> That will be hard to obtain, except maybe if I alias my apt-get update
>> to strace -f -o /boot/somefile.txt apt-get update and hope the problem
>> occurs again. It sometimes happens when launched from the Ubuntu Store,
>> which doesn't use bash to launch the update.
>>
>
> Ok. I see. I'll try to analyze the issue on the basis of the available
> information. But it would be great to have the requested strace output.

Here are the log files.
1. The strace of the backup shows it was still reading; it did not stop.
2. apt-get-crash.log
4362  13:32:04 read(6, " Debugging symbols\nHomepage: htt"..., 32052) = 32052
4362  13:32:04 read(6, "tcp-wrappers/libwrap0-dev_7.6.q-"..., 32460) = 32460
4362  13:32:04 read(6, "bdevel\nInstalled-Size: 542\nMaint"..., 32710) = 32710
4362  13:32:04 read(6, "ian.org>\nArchitecture: i386\nSour"..., 32558) = 32558
4362  13:32:04 read(6, "l-Maintainer: XCB Developers <xc"..., 32613) = 32613
4362  13:32:04 read(6, "ed: 9m\n\nPackage: libxcb-xvmc0-db"..., 31979) = 31979
4362  13:32:04 read(6, "ibz-dev\nFilename: pool/main/x/xf"..., 32335) = 32335
4362  13:32:04 read(6, "n: Ubuntu\nSupported: 9m\nTask: my"..., 31903) = 31903
4362  13:32:04 read(6, "\nFilename: pool/main/libx/libxpm"..., 32467) = 32467
4362  13:32:04 read(6, "buntu/+filebug\nOrigin: Ubuntu\nSu"..., 31982) = 31982
4362  13:32:04 read(6, "chpad.net/ubuntu/+filebug\nOrigin"..., 31976) = 31976
4362  13:32:04 read(6, "buntu Developers <ubuntu-devel-d"..., 32681) = 32681
4362  13:32:04 read(6, "-utils\nDepends: debconf (>= 0.5)"..., 32468) = 32468
4362  13:32:04 read(6, "top, mythbuntu-backend-slave, my"..., 31678) = 31678
4362  13:32:04 read(6, "d5: 1b0992eebd45ca5ceadc775532a4"..., 32126) = 32126
4362  13:32:04 read(6, " all\nSource: munin\nVersion: 2.0."..., 32535) = 32535
4362  13:32:04 read(6, "fice-core | openoffice.org-hunsp"..., 32373) = 32373
4362  13:32:04 read(6, "ffice.org-dictionaries (1:3.3.0~"..., 32513) = 32513
4362  13:32:04 read(6, "hbuntu-backend-master\n\nPackage: "..., 31756) = 31756
4362  13:32:04 read(6, "2614b9\nSHA256: bcc70d6577dd06565"..., 32118) = 32118
4362  13:32:04 read(6, "17\nSHA256: d903f798a8c38fef2b992"..., 32339) = 32339
-- End of file --

3. syslog:
Jul 26 13:32:04 kernel: [  317.525021] BUG: unable to handle kernel
paging request at 00000000000033e5
Jul 26 13:32:04 kernel: [  317.525371] IP: [<ffffffffa02930f2>]
nilfs_end_page_io+0x12/0xd0 [nilfs2]
Jul 26 13:32:04 kernel: [  317.525715] PGD 0
Jul 26 13:32:04 kernel: [  317.526044] Oops: 0000 [#1] SMP
Jul 26 13:32:04 kernel: [  317.526372] Modules linked in: pci_stub
vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) nfsd(F)
auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F) dm_crypt(F) bbswitch(OF)
intel_powerclamp kvm uvcvideo videobuf2_vmalloc videobuf2_memops
videobuf2_core crc32_pclmul(F) ghash_clmulni_intel(F) aesni_intel(F)
videodev aes_x86_64(F) lrw(F) gf128mul(F) btusb glue_helper(F)
ablk_helper(F) cryptd(F) snd_hda_codec_realtek snd_hda_intel
snd_hda_codec arc4(F) asus_wmi snd_hwdep(F) sparse_keymap mxm_wmi
snd_pcm(F) joydev(F) snd_page_alloc(F) iwldvm snd_seq_midi(F)
snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) mac80211
snd_seq_device(F) snd_timer(F) iwlwifi snd(F) soundcore(F) cfg80211
mei_me mei lpc_ich psmouse(F) wmi microcode(F) serio_raw(F) mac_hid
parport_pc(F) ppdev(F) lp(F) parport(F) bnep rfcomm bluetooth
binfmt_misc(F) coretemp nilfs2 btrfs(F) xor(F) zlib_deflate(F)
raid6_pq(F) libcrc32c(F) nbd(F) hid_generic usbhid hid usb_storage(F)
i915 i2c_algo_bit drm_kms_helper drm alx mdio ahci(F) libahci(F)
video(F) [last unloaded: ipmi_msghandler]
Jul 26 13:32:04 kernel: [  317.528925] CPU: 4 PID: 388 Comm: segctord
Tainted: GF        C O 3.10.0-4-generic #13-Ubuntu
Jul 26 13:32:04 kernel: [  317.529487] Hardware name: ASUSTeK COMPUTER
INC. N56VZ/N56VZ, BIOS N56VZ.216 12/06/2012
Jul 26 13:32:04 kernel: [  317.530046] task: ffff88021d159770 ti:
ffff88022067a000 task.ti: ffff88022067a000
Jul 26 13:32:04 kernel: [  317.530606] RIP: 0010:[<ffffffffa02930f2>]
[<ffffffffa02930f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
Jul 26 13:32:04 kernel: [  317.531184] RSP: 0018:ffff88022067bd00
EFLAGS: 00010202
Jul 26 13:32:04 kernel: [  317.531756] RAX: ffff8801fd7f16c8 RBX:
00000000000033e5 RCX: 0000000000000034
Jul 26 13:32:04 kernel: [  317.532371] RDX: 000000000000000d RSI:
0000000000000000 RDI: 00000000000033e5
Jul 26 13:32:04 kernel: [  317.532952] RBP: ffff88022067bd08 R08:
7800000000000000 R09: a80022ed3c000000
Jul 26 13:32:04 kernel: [  317.533541] R10: 57ffc712ccbb4f00 R11:
0000000000000019 R12: ffff88021db8eeb8
Jul 26 13:32:04 kernel: [  317.534129] R13: 0000000000000000 R14:
ffffea0007a481c0 R15: ffff8801fd7f1680
Jul 26 13:32:04 kernel: [  317.534717] FS:  0000000000000000(0000)
GS:ffff88022ef00000(0000) knlGS:0000000000000000
Jul 26 13:32:04 kernel: [  317.535318] CS:  0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Jul 26 13:32:04 kernel: [  317.535919] CR2: 00000000000033e5 CR3:
0000000001c0e000 CR4: 00000000001407e0
Jul 26 13:32:04 kernel: [  317.536528] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 26 13:32:04 kernel: [  317.537176] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jul 26 13:32:04 kernel: [  317.537791] Stack:
Jul 26 13:32:04 kernel: [  317.538406]  ffff88021db8ee10
ffff88022067be00 ffffffffa02944c5 ffffffff81019d09
Jul 26 13:32:04 kernel: [  317.539057]  ffff88021d159770
ffff88021d159770 ffff88021d159770 ffff88021f1bec70
Jul 26 13:32:04 kernel: [  317.539715]  ffff88021dcc1158
ffff88021dcc1000 ffff88021f1bec60 000000031d159770
Jul 26 13:32:04 kernel: [  317.540384] Call Trace:
Jul 26 13:32:04 kernel: [  317.541056]  [<ffffffffa02944c5>]
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
Jul 26 13:32:04 kernel: [  317.541744]  [<ffffffff81019d09>] ?
sched_clock+0x9/0x10
Jul 26 13:32:04 kernel: [  317.542440]  [<ffffffffa02953ab>]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
Jul 26 13:32:04 kernel: [  317.543145]  [<ffffffffa02955e2>]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
Jul 26 13:32:04 kernel: [  317.543840]  [<ffffffffa02954c0>] ?
nilfs_segctor_construct+0x290/0x290 [nilfs2]
Jul 26 13:32:04 kernel: [  317.544534]  [<ffffffff8107cca0>] kthread+0xc0/0xd0
Jul 26 13:32:04 kernel: [  317.545225]  [<ffffffff8107cbe0>] ?
kthread_create_on_node+0x120/0x120
Jul 26 13:32:04 kernel: [  317.545924]  [<ffffffff816f026c>]
ret_from_fork+0x7c/0xb0
Jul 26 13:32:04 kernel: [  317.546583]  [<ffffffff8107cbe0>] ?
kthread_create_on_node+0x120/0x120
Jul 26 13:32:04 kernel: [  317.547208] Code: d9 e9 e0 5b 5d c3 48 89
df e8 db d1 e9 e0 eb db 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
48 85 ff 48 89 e5 53 48 89 fb 74 29 <48> 8b 07 f6 c4 08 0f 84 9c 00 00
00 48 8b 47 30 48 8b 00 a9 00
Jul 26 13:32:04 kernel: [  317.548013] RIP  [<ffffffffa02930f2>]
nilfs_end_page_io+0x12/0xd0 [nilfs2]
Jul 26 13:32:04 kernel: [  317.548674]  RSP <ffff88022067bd00>
Jul 26 13:32:04 kernel: [  317.549286] CR2: 00000000000033e5
Jul 26 13:32:04 kernel: [  317.549897] ---[ end trace ffe6496742ccfbe8 ]---
Jul 26 13:32:06 AptDaemon: INFO: Initializing daemon
Jul 26 13:32:07 AptDaemon.PackageKit: INFO: Initializing PackageKit compat layer
Jul 26 13:32:07 dbus[569]: [system] Successfully activated service
'org.freedesktop.PackageKit'
-- Reboot --
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                   ` <CALJXSJrY22eGkYA76wwL4moAdsjV+_PUtvVO6tt5K16hzMh8xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-27 17:06                     ` Vyacheslav Dubeyko
       [not found]                       ` <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-27 17:06 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-nilfs


On Jul 26, 2013, at 10:13 PM, Jérôme Poulin wrote:

> Good afternoon,
> 
> I have more information to add to the bug.
> 

Thank you for additional info.

[snip]
> I was able to reproduce the problem consistently.
> 1. Make a snapshot of a checkpoint and mount it.
> 2. Read from the checkpoint (make a backup).
> 3. Issue apt-get update.
> 

I tried to reproduce the issue, but I have not succeeded.
Could you describe the reproduction steps in more detail?

Could you also share the lscp utility output for your NILFS2 partition?

[snip]
> 
> Here are the log files.
> 1. strace from the backup was still reading, it did not stop.
> 2. apt-get-crash.log
> 4362  13:32:04 read(6, " Debugging symbols\nHomepage: htt"..., 32052) = 32052
> 4362  13:32:04 read(6, "tcp-wrappers/libwrap0-dev_7.6.q-"..., 32460) = 32460
> 4362  13:32:04 read(6, "bdevel\nInstalled-Size: 542\nMaint"..., 32710) = 32710
> 4362  13:32:04 read(6, "ian.org>\nArchitecture: i386\nSour"..., 32558) = 32558
> 4362  13:32:04 read(6, "l-Maintainer: XCB Developers <xc"..., 32613) = 32613
> 4362  13:32:04 read(6, "ed: 9m\n\nPackage: libxcb-xvmc0-db"..., 31979) = 31979
> 4362  13:32:04 read(6, "ibz-dev\nFilename: pool/main/x/xf"..., 32335) = 32335
> 4362  13:32:04 read(6, "n: Ubuntu\nSupported: 9m\nTask: my"..., 31903) = 31903
> 4362  13:32:04 read(6, "\nFilename: pool/main/libx/libxpm"..., 32467) = 32467
> 4362  13:32:04 read(6, "buntu/+filebug\nOrigin: Ubuntu\nSu"..., 31982) = 31982
> 4362  13:32:04 read(6, "chpad.net/ubuntu/+filebug\nOrigin"..., 31976) = 31976
> 4362  13:32:04 read(6, "buntu Developers <ubuntu-devel-d"..., 32681) = 32681
> 4362  13:32:04 read(6, "-utils\nDepends: debconf (>= 0.5)"..., 32468) = 32468
> 4362  13:32:04 read(6, "top, mythbuntu-backend-slave, my"..., 31678) = 31678
> 4362  13:32:04 read(6, "d5: 1b0992eebd45ca5ceadc775532a4"..., 32126) = 32126
> 4362  13:32:04 read(6, " all\nSource: munin\nVersion: 2.0."..., 32535) = 32535
> 4362  13:32:04 read(6, "fice-core | openoffice.org-hunsp"..., 32373) = 32373
> 4362  13:32:04 read(6, "ffice.org-dictionaries (1:3.3.0~"..., 32513) = 32513
> 4362  13:32:04 read(6, "hbuntu-backend-master\n\nPackage: "..., 31756) = 31756
> 4362  13:32:04 read(6, "2614b9\nSHA256: bcc70d6577dd06565"..., 32118) = 32118
> 4362  13:32:04 read(6, "17\nSHA256: d903f798a8c38fef2b992"..., 32339) = 32339
> -- End of file --
> 

Could you share the full strace output?

In the part of the strace output you shared I can't see any issue or failure.
Moreover, the reported issue occurs in the segctor thread, and segctor activity
usually takes place after a write or flush operation. So I think the cause of the
issue is hidden in another part of the strace output or, maybe, in the strace
output of the backup operation. Could you share the strace output for the backup
operation too?

Thanks,
Vyacheslav Dubeyko.


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                         ` <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-29 17:30                           ` Vyacheslav Dubeyko
  2013-08-09 13:15                           ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-29 17:30 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-nilfs


On Jul 27, 2013, at 10:05 PM, Jérôme Poulin wrote:

> On Sat, Jul 27, 2013 at 1:06 PM, Vyacheslav Dubeyko <slava-yeENwD64cLyIwRZHo2/mJg@public.gmane.org> wrote:
>> I tried to reproduce the issue, but I have not succeeded.
>> Could you describe the reproduction steps in more detail?
>> 
> 
> Since I have a reproducible path, here is how I did it:
> 1. init S to get to single user mode.
> 2. sysrq+E to make sure only my shell is running
> 3. start network-manager to get my wifi connection up
> 4. login as root and launch "screen"
> 5. cd /boot/log/nilfs, which is an ext3 mount point and can log when NILFS dies.
> 6. lscp | xz -9e > lscp.txt.xz
> 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
> 8. start a screen to dump /proc/kmsg to a text file since rsyslog is killed
> 9. start a screen and launch strace -f -o find-cat.log -t find
> /mnt/nilfs -type f -exec cat {} > /dev/null \;
> 10. start a screen and launch strace -f -o apt-get.log -t apt-get update
> 11. launch the last command again as it did not crash the first time
> 12. apt-get crashes
> 13. ps aux > ps-aux-crashed.log
> 14. sysrq+W
> 15. sysrq+E  wait for everything to terminate
> 16. sysrq+SUSB
> 

I have reproduced the issue successfully, so I can begin to investigate it.

Thank you for your efforts in reproducing the issue and for the detailed
information about it.

With the best regards,
Vyacheslav Dubeyko.


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                         ` <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-07-29 17:30                           ` Vyacheslav Dubeyko
@ 2013-08-09 13:15                           ` Vyacheslav Dubeyko
  2013-08-14 22:38                             ` Ryusuke Konishi
  1 sibling, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-09 13:15 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs, Jérôme Poulin

Hi Ryusuke,

I have been investigating the issue for the last two weeks, and I think
it is time to share my current results and thoughts. I would like to
discuss possible causes of the issue. Maybe I am missing something, and
I would appreciate advice on the proper way to investigate it.

I can reproduce the issue by running a Linux kernel compilation on the
rootfs and an apt-get update in parallel. The issue results in the
following crash:

[  220.130662] BUG: unable to handle kernel paging request at 0000000000004612
[  220.130666] IP: [<ffffffff812b55ae>] nilfs_end_page_io+0x3e/0x180

[  220.130574] Call Trace:
[  220.130587]  [<ffffffff816c6b57>] dump_stack+0x19/0x1b
[  220.130593]  [<ffffffff812b5667>] nilfs_end_page_io+0xf7/0x180
[  220.130598]  [<ffffffff812ba2c4>] nilfs_segctor_do_construct+0x1984/0x2410
[  220.130603]  [<ffffffff812bb1f3>] nilfs_segctor_construct+0x1c3/0x450
[  220.130608]  [<ffffffff812bb5da>] nilfs_segctor_thread+0x15a/0x4c0
[  220.130612]  [<ffffffff816cad1f>] ? __schedule+0x3cf/0x810
[  220.130617]  [<ffffffff812bb480>] ? nilfs_segctor_construct+0x450/0x450
[  220.130622]  [<ffffffff81069760>] kthread+0xc0/0xd0
[  220.130626]  [<ffffffff810696a0>] ? flush_kthread_worker+0xb0/0xb0
[  220.130631]  [<ffffffff816d519c>] ret_from_fork+0x7c/0xb0
[  220.130635]  [<ffffffff810696a0>] ? flush_kthread_worker+0xb0/0xb0

I don't have a clear picture of the issue yet, but I have some stable,
reproducible results from the investigation.

As far as I can see, the issue is reproduced when many blocks of a big
file (for example, 1518 blocks) are written to the volume, with some
blocks of other small files mixed into the buffer heads chain. Usually
the issue occurs for a buffer heads chain that contains about 1500 -
2000 blocks.

This is the picture during the phase of adding payload buffers:

[  959.803987] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579166, i_ino 3, i_size 0, nblocks 1762
[  959.803990] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209838a08
[  959.803993] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209839ad8
[  959.803997] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579166, bh->b_size 4096, bh->b_page ffffea000895db40
[  959.804000] NILFS [nilfs_segctor_apply_buffers]:1160 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209838a08
[  959.804006] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579167, i_ino 3, i_size 0, nblocks 1763
[  959.804009] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff8802267aac78
[  959.804013] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209836ad8
[  959.804016] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579167, bh->b_size 4096, bh->b_page ffffea00082b73c0
[  959.804025] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579168, i_ino 3, i_size 0, nblocks 1764
[  959.804028] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209839ad8
[  959.804032] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209836a70
[  959.804035] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579168, bh->b_size 4096, bh->b_page ffffea00082afc00
[  959.804044] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579169, i_ino 3, i_size 0, nblocks 1765
[  959.804047] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.804051] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880220345ba8, listp->next ffff880220345ba8
[  959.804054] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579169, bh->b_size 4096, bh->b_page ffffea00082a9b40
[  959.804058] NILFS [nilfs_segctor_apply_buffers]:1160 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.804092] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22583013, i_ino 0, i_size 242770509824, nblocks 1766
[  959.804096] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836a70

It is possible to see that:
(1) 1766 blocks were added to the list.
(2) The last blocks are blocks of inode (ino = 3): #1762, #1763, #1764,
#1765.
(3) The last buffer head has a next pointer ffff8802247e3af8 that points
to the first buffer head in the list (as I understand it).

But at the complete write stage we have this picture:

[  959.848722] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1
[  959.848735] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 21394345, bh->b_size 4096, bh->b_page ffffea00076ffd80
[  959.848739] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff88021de434c0, bh->b_assoc_buffers.prev ffff8802247e3828
[  959.848744] NILFS [nilfs_segctor_complete_write]:2227 page->index 12, i_ino 1005398, i_size 77824
[  959.848752] NILFS [nilfs_segctor_complete_write]:2224 bh_count 2
[  959.848756] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 21394887, bh->b_size 4096, bh->b_page ffffea00078db900
[  959.848759] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff88021de10048, bh->b_assoc_buffers.prev ffff88021de42048
[  959.848763] NILFS [nilfs_segctor_complete_write]:2227 page->index 13, i_ino 1005398, i_size 77824
[  959.848771] NILFS [nilfs_segctor_complete_write]:2224 bh_count 3
[  959.848774] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 50231152, bh->b_size 4096, bh->b_page ffffea000889ae80
[  959.848778] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182abab40, bh->b_assoc_buffers.prev ffff88021de434c0
[  959.848782] NILFS [nilfs_segctor_complete_write]:2227 page->index 50231152, i_ino 1005398, i_size 77824

[............................................................................................................................................]

[  959.874242] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1761
[  959.874245] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583012, bh->b_size 4096, bh->b_page ffffea00082a9b40
[  959.874249] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182b97fb8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.874252] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583012, i_ino 3, i_size 0
[  959.874255] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1762
[  959.874259] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583013, bh->b_size 4096, bh->b_page ffffea0005fe3080
[  959.874262] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836a70
[  959.874266] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583013, i_ino 0, i_size 242770509824
[  959.874270] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1763
[  959.874274] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22581248, bh->b_size 22583295, bh->b_page 0000000000002b13
[  959.874277] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182abab40, bh->b_assoc_buffers.prev ffff880182b97fb8


It is possible to see that buffer head {page->index 22583013, i_ino 0,
i_size 242770509824, nblocks 1766} has index #1762 in the complete write
phase, and it is the next item in the list that raises the crash because
of an illegal page address {bh->b_page 0000000000002b13}. All content of
that next item is very strange, so I think that it is not list memory.
It is even more strange that bh->b_assoc_buffers.prev ffff880182b97fb8
of this corrupted item has an address that points to a previous good
item (that item was last in the list). As I can see, item #1762
{page->index 22583013, i_ino 0, i_size 242770509824} has unchanged next
and prev pointers {bh->b_assoc_buffers.next ffff8802247e3af8,
bh->b_assoc_buffers.prev ffff880209836a70}. So I suspect that the cause
of the issue lies somewhere between the add payload buffer phase and the
complete write phase. But, currently, I don't have a clear understanding
of the whole picture or of the cause of the issue.

I think it makes sense to try to simplify the environment in order to
investigate the issue more deeply. But maybe you can advise something.

Do you have any ideas about the cause of the issue? Could you share
your view of its possible cause? In any case, I will continue
investigating the issue, but unfortunately I haven't found the cause
yet.

With the best regards,
Vyacheslav Dubeyko.



* Re: Kernel Bug: unable to handle kernel paging request
  2013-08-09 13:15                           ` Vyacheslav Dubeyko
@ 2013-08-14 22:38                             ` Ryusuke Konishi
       [not found]                               ` <20130815.073806.260411879.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2013-08-14 22:38 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w

Hi Vyacheslav,
On Fri, 09 Aug 2013 17:15:25 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
> 
> I have been investigating the issue for the last two weeks, and I think
> it is time to share my current results and thoughts. I would like to
> discuss possible causes of the issue. Maybe I am missing something, and
> I would appreciate advice on the proper way to investigate it.
> 
> I can reproduce the issue by running a Linux kernel compilation on the
> rootfs and an apt-get update in parallel. The issue results in the
> following crash:
<snip>

> [  959.874242] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1761
> [  959.874245] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583012, bh->b_size 4096, bh->b_page ffffea00082a9b40
> [  959.874249] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182b97fb8, bh->b_assoc_buffers.prev ffff880209836ad8
> [  959.874252] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583012, i_ino 3, i_size 0

> [  959.874255] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1762
> [  959.874259] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583013, bh->b_size 4096, bh->b_page ffffea0005fe3080
> [  959.874262] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836a70
> [  959.874266] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583013, i_ino 0, i_size 242770509824

This block (physical block number = #22583013) looks to be a super
root block, so the strange i_ino and i_size may be correct.

> [  959.874270] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1763
> [  959.874274] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22581248, bh->b_size 22583295, bh->b_page 0000000000002b13
> [  959.874277] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182abab40, bh->b_assoc_buffers.prev ffff880182b97fb8

This looks like the list head structure itself, at &segbuf->sb_payload_buffers.
So the strange b_blocknr, b_size, and b_page values may be expected.

How did you determine the end condition of this loop?

Is this buffer head actually causing the oops in nilfs_end_page_io()?


Regards,
Ryusuke Konishi



* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                               ` <20130815.073806.260411879.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2013-08-16  4:49                                 ` Ryusuke Konishi
       [not found]                                   ` <20130816.134934.27810145.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2013-08-16  4:49 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w

Hi Vyacheslav,

I haven't yet succeeded in reproducing this issue, even with the apt-get
update operation.

How long did it take to reproduce this issue in your environment?

According to the reported logs, the crash seems to occur at the following
BUG_ON(), which is inlined in the nilfs_end_page_io() function:

#define page_buffers(page)                                      \
        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
                ((struct buffer_head *)page_private(page));     \
        })

However, it's hard to narrow down the cause without reproducing the
issue.  The page private flag indicates that the given page has buffer
heads.  So this issue seems to be caused either by an invalid page being
passed to nilfs_end_page_io(), or by try_to_free_buffers() freeing the
buffer head for some reason.

The latter situation can occur if the following buffer_busy() function
unexpectedly returns false for the buffer head:

static inline int buffer_busy(struct buffer_head *bh)
{
        return atomic_read(&bh->b_count) |
                (bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
}

Since BH_Dirty is dropped in the nilfs_segctor_complete_write() function,
I suspect that bh->b_count mistakenly reached zero.

Anyhow, further debugging seems hard without reproducing the issue.


Regards,
Ryusuke Konishi



* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                                   ` <20130816.134934.27810145.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2013-08-16  7:03                                     ` Vyacheslav Dubeyko
  2013-08-29 19:10                                       ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-16  7:03 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w

Hi Ryusuke,

On Fri, 2013-08-16 at 13:49 +0900, Ryusuke Konishi wrote:
> Hi Vyacheslav,
> 
> I haven't yet succeeded in reproducing this issue, even with the apt-get
> update operation.
> 
> How long did it take to reproduce this issue in your environment?
> 

I can reproduce the issue reliably in my environment, but sometimes I
need to repeat the reproduction steps several times before hitting it.
Usually the issue occurs during the "Reading package lists..." phase,
but it is hard to predict at what percentage of progress it will occur.

My kernel version is: Linux 3.10.0-rc5+ #45 SMP Thu Aug 8
17:20:43 MSK 2013 x86_64 x86_64 x86_64 GNU/Linux. The distro is Ubuntu
12.04.2 LTS (GNU/Linux 3.10.0-rc5+ x86_64).

I simply start four terminal windows in parallel with root permissions:
(1) "tail -n 30 -f /var/log/syslog" output;
(2) "top" output;
(3) start kernel compilation;
(4) start apt-get update.

> According to reported logs, the crash seems to occur at the following
> BUG_ON() which is inlined in nilfs_end_page_io() function:
> 
> #define page_buffers(page)                                      \
>         ({                                                      \
>                 BUG_ON(!PagePrivate(page));                     \
>                 ((struct buffer_head *)page_private(page));     \
>         })
> 
> However, it's hard to narrow down the cause without reproducing the
> issue.  The page private flag indicates that the given page has buffer
> heads.  So this issue seems to be caused either by an invalid page being
> passed to nilfs_end_page_io(), or by try_to_free_buffers() freeing the
> buffer head for some reason.
> 
> The latter situation can occur if the following buffer_busy() function
> unexpectedly returns false for the buffer head:
> 
> static inline int buffer_busy(struct buffer_head *bh)
> {
>         return atomic_read(&bh->b_count) |
>                 (bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
> }
> 
> Since BH_Dirty is dropped in the nilfs_segctor_complete_write() function,
> I suspect that bh->b_count mistakenly reached zero.
> 
> Anyhow, further debugging seems hard without reproducing the issue.
> 

Yes, I see. I will take into account your thoughts about the possible
cause of the issue. Thank you.

Unfortunately, I haven't had an opportunity to investigate the issue
this week. I think I can check your suspicion today. In any case,
I will continue investigating the issue next week.

Sorry that I didn't answer your previous e-mail. I was busy.

With the best regards,
Vyacheslav Dubeyko.



* Re: Kernel Bug: unable to handle kernel paging request
  2013-08-16  7:03                                     ` Vyacheslav Dubeyko
@ 2013-08-29 19:10                                       ` Vyacheslav Dubeyko
       [not found]                                         ` <72C60256-983E-43D0-9DA1-D4A446B578BB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-29 19:10 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: Ryusuke Konishi, linux-nilfs

Hi Jérôme,

I need to independently check some of my suspicions about the issue, so I need some additional details.

Did you have any ext4/ext3 partitions mounted when the issue was reproduced?

Could you check whether you can reproduce the issue with no ext4/ext3
partitions mounted?

Could you also check whether you can reproduce the issue on kernel 3.2 or an earlier version?

Thanks,
Vyacheslav Dubeyko. 



* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                                         ` <72C60256-983E-43D0-9DA1-D4A446B578BB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-08-29 23:37                                           ` Jérôme Poulin
       [not found]                                             ` <CALJXSJpbHN2SQWz0e2gC_hrRKG8EcnV2bWf068GWsuoa8AX5Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jérôme Poulin @ 2013-08-29 23:37 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Ryusuke Konishi, linux-nilfs

On Thu, Aug 29, 2013 at 3:10 PM, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
> I need to check independently some my suspicions about the issue. So, I need in additional details.
>
> Did you have any mounted ext4/ext3 partitions in the background of the reproduced issue?

I had /boot as ext3 mounted all that time but completely unused until
we started diagnosing the logs.

>
> Could you check that you can reproduce the issue in the case of absence any mounted
> ext4/ext3 partitions?
>

Would you like me to try with /boot unmounted?

> Could you check also that you can reproduce the issue for 3.2 or earlier kernel version?

That would be harder to test but possible.


I have a bigger issue, though. Right now I'm running into the problem
that the cleaner no longer works and the partition is full. I migrated
to ext4 until I decide to make a new nilfs2 partition. I'm not sure
I'll be able to reproduce the problem on a full FS; I could resize it
a bit, though.

Link to this problem:
http://permalink.gmane.org/gmane.comp.file-systems.nilfs.user/3072


* Re: Kernel Bug: unable to handle kernel paging request
       [not found]                                             ` <CALJXSJpbHN2SQWz0e2gC_hrRKG8EcnV2bWf068GWsuoa8AX5Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-08-30  5:43                                               ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 15+ messages in thread
From: Vyacheslav Dubeyko @ 2013-08-30  5:43 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: Ryusuke Konishi, linux-nilfs

On Thu, 2013-08-29 at 19:37 -0400, Jérôme Poulin wrote:
> On Thu, Aug 29, 2013 at 3:10 PM, Vyacheslav Dubeyko <slava-yeENwD64cLyIwRZHo2/mJg@public.gmane.org> wrote:
> > I need to check independently some my suspicions about the issue. So, I need in additional details.
> >
> > Did you have any mounted ext4/ext3 partitions in the background of the reproduced issue?
> 
> I had /boot as ext3 mounted all that time but completely unused until
> we started diagnosing the logs.
> 

Yes, I also had an ext4 partition mounted when the issue occurred.

> >
> > Could you check that you can reproduce the issue in the case of absence any mounted
> > ext4/ext3 partitions?
> >
> 
> Would you like me to try with /boot umounted?
> 

Yes, it needs to be checked without any ext3/ext4 partitions mounted in
the background. I suspect some strange interaction between jbd (the
ext3/ext4 journaling daemon) and segctor at the block layer. Maybe I am
wrong. But I can't reproduce the issue on an earlier kernel version
(3.2, for example). That kernel version lacks some commits for
jbd/ext4/ext3. So, I need my assumption to be checked independently
because I may be misunderstanding something. I have checked many
assumptions about the issue earlier, but I haven't found the cause yet.
I hope that I now have some real hints about the cause of the issue.
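
One way to confirm that no ext3/ext4 file systems are mounted before a test run is to scan the mount table. The following is a minimal sketch, assuming input in the /proc/mounts format ("device mountpoint fstype options dump pass", one entry per line, no blank lines); count_ext_mounts() is a hypothetical helper, not part of any existing tool.

```c
/* Sketch only: count_ext_mounts() is a hypothetical helper.
 * It expects well-formed text in /proc/mounts format. */
#include <stdio.h>
#include <string.h>

static int count_ext_mounts(const char *mounts_text)
{
        int count = 0;
        const char *line = mounts_text;

        while (line && *line) {
                char fstype[64];

                /* The third whitespace-separated field is the fs type. */
                if (sscanf(line, "%*s %*s %63s", fstype) == 1 &&
                    (strcmp(fstype, "ext3") == 0 ||
                     strcmp(fstype, "ext4") == 0))
                        count++;

                line = strchr(line, '\n');
                if (line)
                        line++;  /* step past the newline */
        }
        return count;
}
```

To use it on a live system, the contents of /proc/mounts would be read into a buffer and passed in; a result of zero means the test can proceed with no ext3/ext4 partitions mounted.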

> > Could you check also that you can reproduce the issue for 3.2 or earlier kernel version?
> 
> That would be harder to test but possible.
> 
> 
> I have a bigger issue though. Right now I'm running in the problem
> that the cleaner won't work anymore and partition is full. I migrated
> to ext4 until I decide making a new nilfs2 partition, I'm not sure
> I'll be able to reproduce the problem on a full FS, I could resize it
> a bit though.
> 
> Link to this problem:
> http://permalink.gmane.org/gmane.comp.file-systems.nilfs.user/3072

Yes, that is bad. It may be a separate issue.

But if you can try to reproduce the issue and confirm (or refute) my
assumption, that would be great.

Thanks,
Vyacheslav Dubeyko. 





Thread overview: 15+ messages
2013-07-12  5:24 Kernel Bug: unable to handle kernel paging request Jérôme Poulin
     [not found] ` <CALJXSJquK6YxGKuH97Ec2CTMyJaZrJjOfePSKtgPDm8_9YXzzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-12 18:58   ` Jérôme Poulin
     [not found]     ` <CALJXSJoW9Qpp9t42u_k4cW3gO6qzSPoeCjtQDU3tDKq6TJ=K8Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-18 17:30       ` Vyacheslav Dubeyko
     [not found]         ` <F4156394-8A25-4F81-81C3-9921CB00BD92-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-07-22 19:11           ` Jérôme Poulin
     [not found]             ` <CALJXSJrj0J_-ZUCOurJXaYhx_wEJwxb2_5OOJjQSSmmP-PQDgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-23 11:15               ` Vyacheslav Dubeyko
2013-07-26 18:13                 ` Jérôme Poulin
     [not found]                   ` <CALJXSJrY22eGkYA76wwL4moAdsjV+_PUtvVO6tt5K16hzMh8xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-27 17:06                     ` Vyacheslav Dubeyko
     [not found]                       ` <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A@mail.gmail.com>
     [not found]                         ` <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-29 17:30                           ` Vyacheslav Dubeyko
2013-08-09 13:15                           ` Vyacheslav Dubeyko
2013-08-14 22:38                             ` Ryusuke Konishi
     [not found]                               ` <20130815.073806.260411879.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-08-16  4:49                                 ` Ryusuke Konishi
     [not found]                                   ` <20130816.134934.27810145.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-08-16  7:03                                     ` Vyacheslav Dubeyko
2013-08-29 19:10                                       ` Vyacheslav Dubeyko
     [not found]                                         ` <72C60256-983E-43D0-9DA1-D4A446B578BB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-08-29 23:37                                           ` Jérôme Poulin
     [not found]                                             ` <CALJXSJpbHN2SQWz0e2gC_hrRKG8EcnV2bWf068GWsuoa8AX5Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-08-30  5:43                                               ` Vyacheslav Dubeyko
