* Crash...
@ 2009-07-23 12:55 Andrea Gelmini
       [not found] ` <9cdbb57f0907230555k768383c2ld1690d31cc6fff83-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-07-23 12:55 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

Hi all,
   thanks a lot for nilfs2.
   I'm trying to migrate my /home to nilfs2, but I hit this problem
while running rsync:

[ 4808.492544] ------------[ cut here ]------------
[ 4808.492547] kernel BUG at fs/nilfs2/segment.c:744!
[ 4808.492549] invalid opcode: 0000 [#1] SMP
[ 4808.492551] last sysfs file: /sys/devices/virtual/block/dm-0/range
[ 4808.492552] Modules linked in: nilfs2 reiserfs binfmt_misc rfcomm
bridge stp llc bnep sco l2cap bluetooth rfkill kqemu video backlight
output sbs sbshc pci_slot container battery ac loop snd_hda_codec_idt
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
snd_seq_midi_event snd_seq iTCO_wdt psmouse snd_timer snd_seq_device
iTCO_vendor_support pcspkr serio_raw processor rtc_cmos rtc_core
rtc_lib rt2860sta(C) snd joydev evdev button soundcore snd_page_alloc
xfs exportfs sha256_generic usbhid hid sg sr_mod usb_storage
usb_libusual sd_mod cdrom ata_generic pata_acpi pata_marvell ata_piix
ohci1394 ieee1394 uhci_hcd ehci_hcd libata scsi_mod e1000e usbcore
nls_base dm_snapshot thermal fan thermal_sys hwmon dm_mirror
dm_region_hash dm_log
[ 4808.492597]
[ 4808.492599] Pid: 15705, comm: segctord Tainted: G         C
(2.6.31-rc4g #6)
[ 4808.492601] EIP: 0060:[<f8439ffe>] EFLAGS: 00010246 CPU: 1
[ 4808.492611] EIP is at nilfs_segctor_scan_file+0xa5/0x19e [nilfs2]
[ 4808.492613] EAX: 40000020 EBX: 00000000 ECX: 0000000e EDX: c16b5cc0
[ 4808.492615] ESI: ef0b9500 EDI: dfe0f318 EBP: f4900dd0 ESP: f4900d5c
[ 4808.492616]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 4808.492618] Process segctord (pid: 15705, ti=f4900000 task=e2448dc0
task.ti=f4900000)
[ 4808.492620] Stack:
[ 4808.492621]  e52e42c0 f8449c98 e52e42c0 dfe0f2b8 dfe0f230 0000000e
00000000 c16b5cc0
[ 4808.492624] <0> c1580340 c14b9c20 c19dc5a0 c170f300 c1a4e4e0
c1996320 c18c9f80 c1950820
[ 4808.492628] <0> c1951de0 c18c9660 c19f9420 c1817ea0 c194fb80
f4900db0 f4900db0 d86ea228
[ 4808.492632] Call Trace:
[ 4808.492642]  [<f843abc8>] ? nilfs_segctor_do_construct+0x639/0x1dfe [nilfs2]
[ 4808.492652]  [<f843c511>] ? nilfs_segctor_construct+0x35/0xae [nilfs2]
[ 4808.492661]  [<f843ce6f>] ? nilfs_segctor_thread+0x138/0x2af [nilfs2]
[ 4808.492670]  [<f843cad5>] ? nilfs_construction_timeout+0x0/0xa [nilfs2]
[ 4808.492678]  [<f843cd37>] ? nilfs_segctor_thread+0x0/0x2af [nilfs2]
[ 4808.492683]  [<c1038145>] ? kthread+0x69/0x6e
[ 4808.492685]  [<c10380dc>] ? kthread+0x0/0x6e
[ 4808.492688]  [<c10033e7>] ? kernel_thread_helper+0x7/0x10
[ 4808.492690] Code: 8d 47 a0 89 55 9c 89 45 98 c7 45 f0 00 00 00 00
c7 45 a0 00 00 00 00 c7 45 a4 00 00 00 00 eb 5e 8b 54 9d a8 8b 02 f6
c4 08 75 04 <0f> 0b eb fe 8b 52 0c 89 55 94 89 d1 f6 01 02 74 21 8d 41
34 f0
[ 4808.492711] EIP: [<f8439ffe>] nilfs_segctor_scan_file+0xa5/0x19e
[nilfs2] SS:ESP 0068:f4900d5c
[ 4808.492721] ---[ end trace b07556da5ec25007 ]---

I'm using Ubuntu 9.04 with a vanilla 2.6.31-rc4 kernel, into which I
pulled "git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git
experimental".
What other information would be useful to you? Instead of a giant attachment
with .config, /proc/cpu, lspci and so on, I can provide all of this
via the web.

Thanks a lot for your work,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found] ` <9cdbb57f0907230555k768383c2ld1690d31cc6fff83-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-07-23 16:12   ` Ryusuke Konishi
       [not found]     ` <20090724.011249.110726474.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-23 16:12 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, andrea.gelmini-Re5JQEeQqe8AvxtiuMwx3w

Hi,
On Thu, 23 Jul 2009 14:55:01 +0200, Andrea Gelmini wrote:
> Hi all,
>    thanks a lot for nilfs2.
>    I'm trying to migrate my /home on nilfs2, but I've got this problem
> doing rsync:
> 
> [ 4808.492544] ------------[ cut here ]------------
> [ 4808.492547] kernel BUG at fs/nilfs2/segment.c:744!
> [ 4808.492549] invalid opcode: 0000 [#1] SMP
> [ 4808.492551] last sysfs file: /sys/devices/virtual/block/dm-0/range
> [ 4808.492552] Modules linked in: nilfs2 reiserfs binfmt_misc rfcomm
> bridge stp llc bnep sco l2cap bluetooth rfkill kqemu video backlight
> output sbs sbshc pci_slot container battery ac loop snd_hda_codec_idt
> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
> snd_seq_midi_event snd_seq iTCO_wdt psmouse snd_timer snd_seq_device
> iTCO_vendor_support pcspkr serio_raw processor rtc_cmos rtc_core
> rtc_lib rt2860sta(C) snd joydev evdev button soundcore snd_page_alloc
> xfs exportfs sha256_generic usbhid hid sg sr_mod usb_storage
> usb_libusual sd_mod cdrom ata_generic pata_acpi pata_marvell ata_piix
> ohci1394 ieee1394 uhci_hcd ehci_hcd libata scsi_mod e1000e usbcore
> nls_base dm_snapshot thermal fan thermal_sys hwmon dm_mirror
> dm_region_hash dm_log
> [ 4808.492597]
> [ 4808.492599] Pid: 15705, comm: segctord Tainted: G         C
> (2.6.31-rc4g #6)
> [ 4808.492601] EIP: 0060:[<f8439ffe>] EFLAGS: 00010246 CPU: 1
> [ 4808.492611] EIP is at nilfs_segctor_scan_file+0xa5/0x19e [nilfs2]
> [ 4808.492613] EAX: 40000020 EBX: 00000000 ECX: 0000000e EDX: c16b5cc0
> [ 4808.492615] ESI: ef0b9500 EDI: dfe0f318 EBP: f4900dd0 ESP: f4900d5c
> [ 4808.492616]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 4808.492618] Process segctord (pid: 15705, ti=f4900000 task=e2448dc0
> task.ti=f4900000)
> [ 4808.492620] Stack:
> [ 4808.492621]  e52e42c0 f8449c98 e52e42c0 dfe0f2b8 dfe0f230 0000000e
> 00000000 c16b5cc0
> [ 4808.492624] <0> c1580340 c14b9c20 c19dc5a0 c170f300 c1a4e4e0
> c1996320 c18c9f80 c1950820
> [ 4808.492628] <0> c1951de0 c18c9660 c19f9420 c1817ea0 c194fb80
> f4900db0 f4900db0 d86ea228
> [ 4808.492632] Call Trace:
> [ 4808.492642]  [<f843abc8>] ? nilfs_segctor_do_construct+0x639/0x1dfe [nilfs2]
> [ 4808.492652]  [<f843c511>] ? nilfs_segctor_construct+0x35/0xae [nilfs2]
> [ 4808.492661]  [<f843ce6f>] ? nilfs_segctor_thread+0x138/0x2af [nilfs2]
> [ 4808.492670]  [<f843cad5>] ? nilfs_construction_timeout+0x0/0xa [nilfs2]
> [ 4808.492678]  [<f843cd37>] ? nilfs_segctor_thread+0x0/0x2af [nilfs2]
> [ 4808.492683]  [<c1038145>] ? kthread+0x69/0x6e
> [ 4808.492685]  [<c10380dc>] ? kthread+0x0/0x6e
> [ 4808.492688]  [<c10033e7>] ? kernel_thread_helper+0x7/0x10
> [ 4808.492690] Code: 8d 47 a0 89 55 9c 89 45 98 c7 45 f0 00 00 00 00
> c7 45 a0 00 00 00 00 c7 45 a4 00 00 00 00 eb 5e 8b 54 9d a8 8b 02 f6
> c4 08 75 04 <0f> 0b eb fe 8b 52 0c 89 55 94 89 d1 f6 01 02 74 21 8d 41
> 34 f0
> [ 4808.492711] EIP: [<f8439ffe>] nilfs_segctor_scan_file+0xa5/0x19e
> [nilfs2] SS:ESP 0068:f4900d5c
> [ 4808.492721] ---[ end trace b07556da5ec25007 ]---
> 
> Well, I'm using Ubuntu 9.04 with a vanilla kernel 2.6.31-rc4 with
> pulled "git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git
> experimental".

Thank you for your report.

The log gives us really meaningful information.

First, I could identify the call that raised the assertion
(it's a page_buffers() call in nilfs_lookup_dirty_node_buffers() in
segment.c).

It also suggests that the function hit an inconsistent state in the page
cache of B-tree nodes: it found a dirty page, but the page had no buffer
heads, which was supposed to be impossible for the B-tree of nilfs.

I don't know yet why this can happen, but the log helps me narrow down
the cause.

> What other info can be useful to you? Instead of a giant attachment
> with .config, /proc/cpu, lspci and so on, I can provide all of this
> via web.
> 
> Thanks a lot for your work,
> Andrea

I will ask for your help if the need arises.

Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]     ` <20090724.011249.110726474.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-07-23 21:02       ` Andrea Gelmini
       [not found]         ` <9cdbb57f0907231402i1a92cb4qfe5a9d81346a4665-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-07-23 21:02 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

2009/7/23 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> It also suggests that an inconsistent state in page cache of B-tree
> nodes hit the function; the function found a dirty page, but the page
> didn't have buffer heads which was supposed to be impossible for the
> b-tree of nilfs.

Well,
   thanks for your quick reply.
   Anyway, I can reproduce the same problem by doing the same thing with
a stable kernel (2.6.29.6) and nilfs2-module from the git repository.
   I do this:
   -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
is the 1K block size?)
   -> mount /dev/mapper/VG-NilfHome /tmp/test/
   -> I run mirrordir (which here behaves exactly like "cp -a")

   It gets stuck at the same file as in the earlier crash.
   It's a 5G file, if that helps.

  Here is the log:
[ 1429.034107] ------------[ cut here ]------------
[ 1429.034111] kernel BUG at /home/gelma/dev/nilfs2-module/fs/segment.c:846!
[ 1429.034113] invalid opcode: 0000 [#1] SMP
[ 1429.034115] last sysfs file: /sys/devices/virtual/block/dm-1/range
[ 1429.034116] Modules linked in: nilfs2 binfmt_misc rfcomm bridge stp
llc bnep sco l2cap bluetooth kqemu video backlight output sbs sbshc
pci_slot container battery ac loop rtc_cmos psmouse nvidia(P) rtc_core
rt2860sta(C) rtc_lib serio_raw i2c_core pcspkr iTCO_wdt
iTCO_vendor_support evdev joydev button sha256_generic sr_mod cdrom sg
usbhid hid sd_mod ata_generic usb_storage libusual pata_acpi ohci1394
ieee1394 ata_piix ehci_hcd pata_marvell libata scsi_mod uhci_hcd
e1000e usbcore dm_snapshot thermal processor fan thermal_sys hwmon
dm_mirror dm_region_hash dm_log
[ 1429.034149]
[ 1429.034151] Pid: 10294, comm: segctord Tainted: P         C
(2.6.29.6 #1)
[ 1429.034153] EIP: 0060:[<f842fccc>] EFLAGS: 00010246 CPU: 1
[ 1429.034162] EIP is at nilfs_segctor_scan_file+0xa5/0x19e [nilfs2]
[ 1429.034164] EAX: 40000020 EBX: 00000000 ECX: 0000000e EDX: c149c880
[ 1429.034165] ESI: f1c1e900 EDI: eba22468 EBP: f15c6dec ESP: f15c6d78
[ 1429.034167]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1429.034168] Process segctord (pid: 10294, ti=f15c6000 task=f2c07c90
task.ti=f15c6000)
[ 1429.034170] Stack:
[ 1429.034170]  e3918e80 f843faa8 e3918e80 eba22408 eba22380 0000000e
00000000 c149c880
[ 1429.034174]  c139cd20 c16c99c0 c112f9e0 c127dca0 c127e560 c127e580
c127dda0 c127ddc0
[ 1429.034178]  c127e5c0 c1599640 c1599660 c1599680 c15996a0 f15c6dcc
f15c6dcc e38fe628
[ 1429.034182] Call Trace:
[ 1429.034187]  [<f84308d6>] ? nilfs_segctor_do_construct+0x66a/0x1e7f [nilfs2]
[ 1429.034196]  [<c010edab>] ? smp_reschedule_interrupt+0x13/0x25
[ 1429.034200]  [<c0103414>] ? reschedule_interrupt+0x28/0x30
[ 1429.034203]  [<c01226e6>] ? finish_task_switch+0x2c/0xa9
[ 1429.034210]  [<f843227c>] ? nilfs_segctor_construct+0x35/0x7f [nilfs2]
[ 1429.034218]  [<f8432bab>] ? nilfs_segctor_thread+0x141/0x26d [nilfs2]
[ 1429.034226]  [<f8432831>] ? nilfs_construction_timeout+0x0/0xa [nilfs2]
[ 1429.034235]  [<f8432a6a>] ? nilfs_segctor_thread+0x0/0x26d [nilfs2]
[ 1429.034243]  [<c013564f>] ? kthread+0x3b/0x61
[ 1429.034245]  [<c0135614>] ? kthread+0x0/0x61
[ 1429.034247]  [<c0103623>] ? kernel_thread_helper+0x7/0x10
[ 1429.034250] Code: 8d 47 a0 89 55 9c 89 45 98 c7 45 f0 00 00 00 00
c7 45 a0 00 00 00 00 c7 45 a4 00 00 00 00 eb 5e 8b 54 9d a8 8b 02 f6
c4 08 75 04 <0f> 0b eb fe 8b 52 0c 89 55 94 89 d1 f6 01 02 74 21 8d 41
34 f0
[ 1429.034271] EIP: [<f842fccc>] nilfs_segctor_scan_file+0xa5/0x19e
[nilfs2] SS:ESP 0068:f15c6d78
[ 1429.034285] ---[ end trace 0ea06b01ef382693 ]---

Thanks a lot again,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]         ` <9cdbb57f0907231402i1a92cb4qfe5a9d81346a4665-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-07-23 21:20           ` Andrea Gelmini
       [not found]             ` <9cdbb57f0907231420y4122d649y69fee2273a05b4cc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-07-24  8:58           ` Crash Reinoud Zandijk
  2009-07-29  2:46           ` Crash Ryusuke Konishi
  2 siblings, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-07-23 21:20 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

2009/7/23 Andrea Gelmini <andrea.gelmini@gmail.com>:
>   -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
> is the 1K block size?)

OK, without '-b 1024' it works like a charm...

Thanks a lot,
Andrea
_______________________________________________
users mailing list
users@nilfs.org
https://www.nilfs.org/mailman/listinfo/users

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]         ` <9cdbb57f0907231402i1a92cb4qfe5a9d81346a4665-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-07-23 21:20           ` Crash Andrea Gelmini
@ 2009-07-24  8:58           ` Reinoud Zandijk
       [not found]             ` <20090724085803.GA23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  2009-07-29  2:46           ` Crash Ryusuke Konishi
  2 siblings, 1 reply; 45+ messages in thread
From: Reinoud Zandijk @ 2009-07-24  8:58 UTC (permalink / raw)
  To: NILFS Users mailing list

On Thu, Jul 23, 2009 at 11:02:47PM +0200, Andrea Gelmini wrote:
> 2009/7/23 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> > It also suggests that an inconsistent state in page cache of B-tree
> > nodes hit the function; the function found a dirty page, but the page
> > didn't have buffer heads which was supposed to be impossible for the
> > b-tree of nilfs.
>    -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
> is the 1K block size?)

1024 is lower than the page size so that might explain a lot! I think it's a
missing check in mkfs.nilfs2, which should never allow a block size lower
than the page size.... but Ryusuke can better answer that :-D

With regards,
Reinoud

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]             ` <20090724085803.GA23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2009-07-24  9:47               ` Andrea Gelmini
       [not found]                 ` <9cdbb57f0907240247n5ffd6f81yaee39eb386516c25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-07-24 10:46               ` Crash Ryusuke Konishi
  1 sibling, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-07-24  9:47 UTC (permalink / raw)
  To: NILFS Users mailing list

2009/7/24 Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org>:
> 1024 is lower than the page size so that might explain a lot! I think its a
> missing check in the mkfs.nilfs2 to never allow lower values than the page
> size for block size.... but Ryusuke can better answer that :-D

I agree with you, but the man page of mkfs.nilfs2 says it can be
1024/2048/4096/8192.
Of course it can't be greater than the page size, but that's a VFS problem.

Ciao,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <9cdbb57f0907240247n5ffd6f81yaee39eb386516c25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-07-24 10:02                   ` Reinoud Zandijk
  2009-07-24 10:47                   ` Crash Ryusuke Konishi
  1 sibling, 0 replies; 45+ messages in thread
From: Reinoud Zandijk @ 2009-07-24 10:02 UTC (permalink / raw)
  To: NILFS Users mailing list

On Fri, Jul 24, 2009 at 11:47:21AM +0200, Andrea Gelmini wrote:
> 2009/7/24 Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org>:
> > 1024 is lower than the page size so that might explain a lot! I think its a
> > missing check in the mkfs.nilfs2 to never allow lower values than the page
> > size for block size.... but Ryusuke can better answer that :-D
> 
> I agree with you, but man page of mkfs.nilfs2 says it can be
> 1024/2048/4096/8192.
> Of course it can't be > of page size, but that's a VFS problem.

It could be an implementation limit or a bug in the sub-page handling... I
don't know Linux that well :-S

With regards,
Reinoud

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]             ` <20090724085803.GA23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  2009-07-24  9:47               ` Crash Andrea Gelmini
@ 2009-07-24 10:46               ` Ryusuke Konishi
       [not found]                 ` <20090724.194617.88653682.ryusuke-sG5X7nlA6pw@public.gmane.org>
  1 sibling, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-24 10:46 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A

On Fri, 24 Jul 2009 10:58:03 +0200, Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Jul 23, 2009 at 11:02:47PM +0200, Andrea Gelmini wrote:
> > 2009/7/23 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> > > It also suggests that an inconsistent state in page cache of B-tree
> > > nodes hit the function; the function found a dirty page, but the page
> > > didn't have buffer heads which was supposed to be impossible for the
> > > b-tree of nilfs.
> >    -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
> > is the 1K block size?)
> 
> 1024 is lower than the page size so that might explain a lot!

Yes, in fact I suspect exactly this.  A block size smaller than the page
size changes several code paths.
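
To make the block-size point concrete, here is a minimal Python sketch (not
nilfs code) of how many block buffers share one page for a given block size.
It assumes a 4096-byte page, the usual size on x86; buffers_per_page and
PAGE_SIZE are illustrative names, not kernel identifiers.

```python
PAGE_SIZE = 4096  # assumption: the usual x86 page size, not read from the system


def buffers_per_page(block_size, page_size=PAGE_SIZE):
    """Return how many block-sized buffers fit in one page.

    Illustrative helper, not a kernel or nilfs-utils API.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    if block_size > page_size:
        # The thread notes that a block larger than a page is ruled out at the VFS level.
        raise ValueError("block size must not exceed the page size")
    return page_size // block_size


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, buffers_per_page(size))
```

With -b 1024 a page would carry four buffers, so per-buffer state and
per-page state can disagree. With one buffer per page that particular
mismatch cannot occur, which is one plausible way the code paths differ.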

> I think its a missing check in the mkfs.nilfs2 to never allow lower
> values than the page size for block size.... but Ryusuke can better
> answer that :-D
>
> With regards,
> Reinoud
 
We have implemented nilfs to allow smaller block sizes in order to
support some devices such as DVD/CD-ROM or flash-based ones.

So I consider it a bug that we should fix.

Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <9cdbb57f0907240247n5ffd6f81yaee39eb386516c25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-07-24 10:02                   ` Crash Reinoud Zandijk
@ 2009-07-24 10:47                   ` Ryusuke Konishi
  1 sibling, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-24 10:47 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, andrea.gelmini-Re5JQEeQqe8AvxtiuMwx3w

On Fri, 24 Jul 2009 11:47:21 +0200, Andrea Gelmini wrote:
> 2009/7/24 Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org>:
> > 1024 is lower than the page size so that might explain a lot! I think its a
> > missing check in the mkfs.nilfs2 to never allow lower values than the page
> > size for block size.... but Ryusuke can better answer that :-D
> 
> I agree with you, but man page of mkfs.nilfs2 says it can be
> 1024/2048/4096/8192.
> Of course it can't be > of page size, but that's a VFS problem.

Exactly.
 
> Ciao,
> Andrea

Cheers,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <20090724.194617.88653682.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-07-24 11:13                   ` Reinoud Zandijk
       [not found]                     ` <20090724111333.GE23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Reinoud Zandijk @ 2009-07-24 11:13 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi Ryusuke,

On Fri, Jul 24, 2009 at 07:46:17PM +0900, Ryusuke Konishi wrote:
> We have implemented nilfs to allow smaller block sizes in order to
> support some devices such like DVD/CD-ROM or flash based ones.

sounds fine :) i had some ideas about using NiLFS on CD-R/DVD*R and some ideas
about refining NiLFS on flash based media.

For CD-R/DVD*R the last block written (or written several times) can be the
superblock so sequential media will work just fine. No use for a garbage
collector there ;)

For flash media, the entire first segment/erase block can be used to store the
superblock in; filling it up sequentially and when full, trigger the erase
block wiping :)

Idea?
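
One possible reading of the flash idea above, as a toy Python model:
superblock copies are appended to successive slots of one erase block, and
the block is erased only once every slot has been used. The class name, the
slot count and the method names are all hypothetical; nothing here reflects
actual NILFS on-disk behaviour.

```python
class SuperblockLog:
    """Toy model of an erase block used as an append-only superblock log."""

    def __init__(self, slots):
        self.slots = [None] * slots  # None stands for an erased (unwritten) slot
        self.next = 0                # index of the next free slot
        self.erase_count = 0         # how often the whole block was erased

    def write(self, superblock):
        if self.next == len(self.slots):
            # Block is full: erase it as a whole, then start again at slot 0.
            self.slots = [None] * len(self.slots)
            self.next = 0
            self.erase_count += 1
        self.slots[self.next] = superblock
        self.next += 1

    def latest(self):
        """Return the most recently written copy, or None if nothing was written."""
        return self.slots[self.next - 1] if self.next else None


if __name__ == "__main__":
    log = SuperblockLog(slots=3)
    for generation in range(1, 5):
        log.write(generation)
    print(log.latest(), log.erase_count)
```

In this model one erase is paid per full pass over the slots rather than per
superblock update. Note that erasing before the rewrite leaves a short window
with no valid copy in the block, so a real design would need another copy
elsewhere.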

With regards,
Reinoud

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]             ` <9cdbb57f0907231420y4122d649y69fee2273a05b4cc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-07-27  0:40               ` Jiro SEKIBA
       [not found]                 ` <873a8jhsbd.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Jiro SEKIBA @ 2009-07-27  0:40 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

I tried to reproduce the situation, but I can not reproduce the bug
with rc4, rc4+experimental on debian/lenny. 

At Thu, 23 Jul 2009 23:20:13 +0200,
Andrea Gelmini wrote:
> 
> 2009/7/23 Andrea Gelmini <andrea.gelmini-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >   -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
> > is the 1K block size?)
> 
> Ok, not defining '-b 1024' it works like a charm...

Looks like you are using lvm.  So the differences are:
- debian or Ubuntu
- bare disk (/dev/sda?) or lvm.

Can you reproduce the bug without using lvm volume?

thanks,

regards,
-- 
Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                     ` <20090724111333.GE23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2009-07-27  7:45                       ` Ryusuke Konishi
  0 siblings, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-27  7:45 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A

Hi,
On Fri, 24 Jul 2009 13:13:33 +0200, Reinoud Zandijk wrote:
> Hi Ryusuke,
> 
> On Fri, Jul 24, 2009 at 07:46:17PM +0900, Ryusuke Konishi wrote:
> > We have implemented nilfs to allow smaller block sizes in order to
> > support some devices such like DVD/CD-ROM or flash based ones.
> 
> sounds fine :) i had some ideas about using NiLFS on CD-R/DVD*R and some ideas
> about refining NiLFS on flash based media.
> 
> For CD-R/DVD*R the last block written (or written several times) can be the
> superblock so sequential media will work just fine. No use for a garbage
> collector there ;)

I didn't mean on-the-fly burning, but it sounds interesting.

NILFS already has a secondary superblock at the tail of the device, so
it seems achievable by adjusting the writeback of the two superblocks.

It sounds like UDF, except that NILFS allows snapshot access,
though even UDF seems to be capable of restoring past data in theory.
(I don't know if it can in reality.)
 
> For flash media, the entire first segment/erase block can be used to store the
> superblock in; filling it up sequentially and when full, trigger the erase
> block wiping :)
> 
> Idea?
>
> With regards,
> Reinoud

Sounds nice.

I have a pending patch in the experimental tree which allows nilfs to
issue erase commands for the segments that GC has reclaimed.  But I felt
it lacked something needed by flash devices.

If we add an ssd mount option to nilfs, I think this sort of ingenuity
for increasing the lifespan of the device should be incorporated.

Cheers,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <873a8jhsbd.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
@ 2009-07-27  7:58                   ` Andrea Gelmini
  2009-07-29  2:49                   ` Crash Jiro SEKIBA
  2009-08-01 13:39                   ` Crash Andrea Gelmini
  2 siblings, 0 replies; 45+ messages in thread
From: Andrea Gelmini @ 2009-07-27  7:58 UTC (permalink / raw)
  To: NILFS Users mailing list

2009/7/27 Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>:
> - debian or Ubuntu
> - bare disk(/dev/sda?) or lvm.
>
> Can you reproduce the bug without using lvm volume?

Today I'll stress it with a normal primary partition.
I'll run the test with a stable kernel release to make it easier
for others to replicate it.

Thanks a lot,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]         ` <9cdbb57f0907231402i1a92cb4qfe5a9d81346a4665-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-07-23 21:20           ` Crash Andrea Gelmini
  2009-07-24  8:58           ` Crash Reinoud Zandijk
@ 2009-07-29  2:46           ` Ryusuke Konishi
       [not found]             ` <20090729.114604.56042421.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-29  2:46 UTC (permalink / raw)
  To: andrea.gelmini-Re5JQEeQqe8AvxtiuMwx3w
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

[-- Attachment #1: Type: Text/Plain, Size: 2262 bytes --]

Hi Andrea,
On Thu, 23 Jul 2009 23:02:47 +0200, Andrea Gelmini wrote:
> 2009/7/23 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> > It also suggests that an inconsistent state in page cache of B-tree
> > nodes hit the function; the function found a dirty page, but the page
> > didn't have buffer heads which was supposed to be impossible for the
> > b-tree of nilfs.
> 
> Well,
>    thanks for your quick reply.
>    Anyway, I can reproduce the same problem doing same things with
> stable kernel (2.6.29.6) and nilfs2-module from git repository.
>    I do this:
>    -> mkfs.nilfs2 -b 1024 /dev/mapper/VG-NilfHome (maybe the problem
> is the 1K block size?)
>    -> mount /dev/mapper/VG-NilfHome /tmp/test/
>    -> I run mirrordir (here's exactly as a "cp -a")
> 
>    It stucks at the same file as the crash before.
>    It's a 5G file, if it could help.

I found a bug which may cause the kernel oops you reported.
The bug can arise only if the buffer size is smaller than the page size.

Here I attach a patch that will hopefully fix this problem.

Could you test whether the patch makes a difference for the same file?

Regards,
Ryusuke Konishi
---
 fs/nilfs2/segment.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 8b5e477..51ff3d0 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1859,12 +1859,26 @@ static void nilfs_end_page_io(struct page *page, int err)
 	if (!page)
 		return;
 
-	if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page))
+	if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) {
 		/*
 		 * For b-tree node pages, this function may be called twice
 		 * or more because they might be split in a segment.
 		 */
+		if (PageDirty(page)) {
+			/*
+			 * For pages holding split b-tree node buffers, dirty
+			 * flag on the buffers may be cleared discretely.
+			 * In that case, the page is once redirtied for
+			 * remaining buffers, and it must be cancelled if
+			 * all the buffers get cleaned later.
+			 */
+			lock_page(page);
+			if (nilfs_page_buffers_clean(page))
+				__nilfs_clear_page_dirty(page);
+			unlock_page(page);
+		}
 		return;
+	}
 
 	__nilfs_end_page_io(page, err);
 }
-- 
1.6.3.3
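
The intent of the patch, as its comment describes it, can be illustrated
with a toy user-space model in Python. This is only a sketch under
assumptions: Page, end_page_io and the "fixed" switch are hypothetical
names, the page is assumed to hold four node buffers, and the model shows
only the stale page-level dirty flag, not how the kernel reaches the BUG.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # True means the buffer is still dirty; four buffers per page is an assumption.
    buffer_dirty: list = field(default_factory=lambda: [True] * 4)
    dirty: bool = True


def buffers_clean(page):
    """Return True when no buffer on the page is dirty any more."""
    return not any(page.buffer_dirty)


def end_page_io(page, written, fixed=True):
    """Model the end of I/O for the buffers of one page written in one segment."""
    for index in written:
        page.buffer_dirty[index] = False
    # The extra step modelled after the patch: drop the page-level dirty
    # flag once every buffer on the page has been cleaned.
    if fixed and page.dirty and buffers_clean(page):
        page.dirty = False


def run(fixed):
    """Write one page's buffers in two separate segments and return the page."""
    page = Page()
    end_page_io(page, written=[0, 1], fixed=fixed)
    end_page_io(page, written=[2, 3], fixed=fixed)
    return page


if __name__ == "__main__":
    print("with the extra check:   ", run(fixed=True).dirty)
    print("without the extra check:", run(fixed=False).dirty)
```

Without the extra check the model ends with a page that is still flagged
dirty although none of its buffers is, which is the kind of inconsistency
the patch comment talks about cancelling.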


[-- Attachment #2: nilfs2-fix-oops-due-to-inconsistent-page-state.patch.bz2 --]
[-- Type: Application/Octet-Stream, Size: 1128 bytes --]


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <873a8jhsbd.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
  2009-07-27  7:58                   ` Crash Andrea Gelmini
@ 2009-07-29  2:49                   ` Jiro SEKIBA
       [not found]                     ` <87eis0mcev.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
  2009-08-01 13:39                   ` Crash Andrea Gelmini
  2 siblings, 1 reply; 45+ messages in thread
From: Jiro SEKIBA @ 2009-07-29  2:49 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi, 

> I tried to reproduce the situation, but I can not reproduce the bug
> with rc4, rc4+experimental on debian/lenny. 

 Well, when I tried, I got a different kernel dump.
I don't know if it's related or not, but I'm posting it just in case.

 I got the following with rc4 on device mapper, during an rsync onto a
nilfs2 filesystem created on it.


[405816.059174] general protection fault: 0000 [#1] SMP 
[405816.059205] last sysfs file: /sys/block/dm-0/removable
[405816.059233] CPU 0 
[405816.059255] Modules linked in: dm_mod nilfs2 ipv6 loop snd_hda_codec_realtek i2c_i801 i2c_core iTCO_wdt serio_raw snd_hda_intel snd_hda_codec pcspkr psmouse snd_pcm snd_timer snd button processor soundcore intel_agp snd_page_alloc evdev ext3 jbd mbcache raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sg sr_mod sd_mod cdrom ahci libata scsi_mod tg3 libphy uhci_hcd ehci_hcd thermal fan thermal_sys
[405816.059462] Pid: 215, comm: kswapd0 Not tainted 2.6.31-rc4 #1 N8-S720XMZCUUA2
[405816.059504] RIP: 0010:[<ffffffff810a2ac0>]  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
[405816.059554] RSP: 0018:ffff88016cec1a40  EFLAGS: 00010282
[405816.059579] RAX: e7d50c6d4d1428d2 RBX: ffffea0003f5fbb0 RCX: 0000000000000800
[405816.059621] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[405816.059663] RBP: ffffea0003f5fbd8 R08: 0000000000000001 R09: ffff88016fc075c0
[405816.059705] R10: 00003ffffffff000 R11: ffff88007a26f880 R12: ffff8801506f41a8
[405816.059747] R13: 0000000000000001 R14: 000000000000e800 R15: ffff88016cec1e10
[405816.059789] FS:  0000000000000000(0000) GS:ffff880028028000(0000) knlGS:0000000000000000
[405816.059833] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[405816.059858] CR2: 0000000001ae9618 CR3: 000000016c802000 CR4: 00000000000406f0
[405816.059900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[405816.059943] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[405816.059985] Process kswapd0 (pid: 215, threadinfo ffff88016cec0000, task ffff88016f27f930)
[405816.060028] Stack:
[405816.060047]  0000000000000001 ffff88016cec1af0 0000000000000000 ffff88016cec1cb0
[405816.060078] <0> 0000000000000000 0000000000000017 0000000000000009 0000000000000001
[405816.060124] <0> ffffea00039702b8 ffffea0004aa0458 ffffea0004aa0810 ffffea0003970248
[405816.060185] Call Trace:
[405816.060208]  [<ffffffff8100c40e>] ? common_interrupt+0xe/0x13
[405816.060234]  [<ffffffff810a1cdb>] ? isolate_pages_global+0xa9/0x1f3
[405816.060262]  [<ffffffff810a338a>] ? shrink_list+0x2d8/0x5ec
[405816.060289]  [<ffffffff8110ac52>] ? proc_delete_inode+0x0/0x40
[405816.060317]  [<ffffffff8109f621>] ? determine_dirtyable_memory+0xd/0x1d
[405816.060345]  [<ffffffff8109f697>] ? get_dirty_limits+0x1d/0x256
[405816.060371]  [<ffffffff8100a54d>] ? __switch_to+0xae/0x266
[405816.060397]  [<ffffffff810a3921>] ? shrink_zone+0x283/0x335
[405816.060427]  [<ffffffffa0189217>] ? mb_cache_shrink_fn+0x26/0x117 [mbcache]
[405816.060456]  [<ffffffff810a3b14>] ? shrink_slab+0x141/0x153
[405816.060482]  [<ffffffff810a42ff>] ? kswapd+0x482/0x631
[405816.060507]  [<ffffffff810a1c32>] ? isolate_pages_global+0x0/0x1f3
[405816.060536]  [<ffffffff81053522>] ? autoremove_wake_function+0x0/0x2e
[405816.060564]  [<ffffffff810a3e7d>] ? kswapd+0x0/0x631
[405816.060588]  [<ffffffff810531d9>] ? kthread+0x84/0x8c
[405816.060614]  [<ffffffff8100caca>] ? child_rip+0xa/0x20
[405816.060639]  [<ffffffff81053155>] ? kthread+0x0/0x8c
[405816.060664]  [<ffffffff8100cac0>] ? child_rip+0x0/0x20
[405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
[405816.060819] RIP  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
[405816.060847]  RSP <ffff88016cec1a40>
[405816.061045] ---[ end trace c44a8d41c1aab2f3 ]---




-- 
Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                     ` <87eis0mcev.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
@ 2009-07-29  3:46                       ` Ryusuke Konishi
       [not found]                         ` <20090729.124638.38314632.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-29  3:46 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, jir-hfpbi5WX9J54Eiagz67IpQ

Hi,
On Wed, 29 Jul 2009 11:49:12 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> Hi, 
> 
> > I tried to reproduce the situation, but I can not reproduce the bug
> > with rc4, rc4+experimental on debian/lenny. 
> 
>  Well, when I tried I got different kernel dump.
> I don't know if it's related or not, but just in case.
> 
>  I got following with rc4 with device mapper, created nilfs2 filesystem on
> it during rsync on the filesystem.

shrink_page_list() is a core memory management function that reclaims
pages under memory pressure.

Could you send me the disassembly of mm/vmscan.o?
You can get it with the objdump command:

 $ cd linux/mm
 $ objdump -D vmscan.o > vmscan.s

The instruction in question is the one that starts at the byte marked
"<48>" in the following sequence.

> [405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
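For reference, the "<..>" marker in the Code: line can also be located mechanically. The following is only a sketch, assuming the usual oops layout in which the bracketed byte is the one at the faulting RIP; the function name faulting_byte_index() is made up for this example and is not part of any kernel tool.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: scan the hex bytes of an oops "Code:" line and
 * return the zero-based index of the byte wrapped in <..>, i.e. the byte
 * at the faulting RIP.  The byte value is stored in *opcode.
 * Returns -1 if the line contains no <..> marker.
 */
static int faulting_byte_index(const char *code, unsigned *opcode)
{
    const char *p = code;
    int index = 0;

    assert(code != NULL && opcode != NULL);

    while (*p) {
        char *end;

        if (*p == ' ') {            /* skip separators */
            p++;
            continue;
        }
        if (*p == '<') {            /* marked byte: parse it and stop */
            *opcode = (unsigned)strtoul(p + 1, NULL, 16);
            return index;
        }
        strtoul(p, &end, 16);       /* consume one ordinary hex byte */
        if (end == p)
            return -1;              /* not a hex token: give up */
        p = end;
        index++;
    }
    return -1;
}
```

Fed the Code: line quoted below, this yields index 43 and opcode 0x48, so 43 bytes of context precede the faulting instruction.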

Cheers,
Ryusuke Konishi 
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                         ` <20090729.124638.38314632.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-07-29  4:40                           ` Jiro SEKIBA
       [not found]                             ` <874osw14pz.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Jiro SEKIBA @ 2009-07-29  4:40 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

At Wed, 29 Jul 2009 12:46:38 +0900 (JST),
Ryusuke Konishi wrote:
> 
> Hi,
> On Wed, 29 Jul 2009 11:49:12 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > Hi, 
> > 
> > > I tried to reproduce the situation, but I can not reproduce the bug
> > > with rc4, rc4+experimental on debian/lenny. 
> > 
> >  Well, when I tried I got different kernel dump.
> > I don't know if it's related or not, but just in case.
> > 
> >  I got following with rc4 with device mapper, created nilfs2 filesystem on
> > it during rsync on the filesystem.
> 
> shrink_page_list() is a core memory management function to reclaim
> free pages.
> 
> Could you send me the disassembled source of mm/vmscan.o ?
> You cat get it by using objdump command:

Here is the corresponding disassembly of shrink_page_list:

     ee4:       85 c0                   test   %eax,%eax
     ee6:       0f 84 d1 02 00 00       je     11bd <shrink_page_list+0x559>
     eec:       f0 80 65 d8 ef          lock andb $0xef,-0x28(%rbp)
     ef1:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
     ef8:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
     eff:       31 c0                   xor    %eax,%eax
     f01:       e8 00 00 00 00          callq  f06 <shrink_page_list+0x2a2>
     f06:       e9 b6 01 00 00          jmpq   10c1 <shrink_page_list+0x45d>
     f0b:       49 8b 44 24 58          mov    0x58(%r12),%rax
     f10:       48 83 38 00             cmpq   $0x0,(%rax)
     f14:       0f 84 79 02 00 00       je     1193 <shrink_page_list+0x52f>
     f1a:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
     f21:       00 00 
     f23:       f6 40 16 80             testb  $0x80,0x16(%rax)

It looks like <48> is the cmpq at f10.
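
The match can be cross-checked against the RIP line (shrink_page_list+0x2ac) using only numbers from the dump above. This is a small sketch; the helper names symbol_start() and fault_address() are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Start of a function, derived from any "address <sym+offset>" label. */
static uint64_t symbol_start(uint64_t labelled_addr, uint64_t offset)
{
    assert(labelled_addr >= offset);
    return labelled_addr - offset;
}

/* Address inside the object file for the sym+offset shown on the RIP line. */
static uint64_t fault_address(uint64_t sym_start, uint64_t rip_offset)
{
    return sym_start + rip_offset;
}

/* Self-check using the values from the disassembly and the oops. */
static void check_against_dump(void)
{
    /* "callq f06 <shrink_page_list+0x2a2>" puts the function start at 0xc64 */
    uint64_t start = symbol_start(0xf06, 0x2a2);

    assert(start == 0xc64);
    /* RIP is shrink_page_list+0x2ac, which lands on the cmpq at f10 */
    assert(fault_address(start, 0x2ac) == 0xf10);
}
```

So the offset from the oops and the position of the <48> byte both point at the same cmpq instruction.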



-- 
Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                             ` <874osw14pz.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
@ 2009-07-29  5:08                               ` Ryusuke Konishi
       [not found]                                 ` <20090729.140821.103585622.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-07-29  5:08 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, jir-hfpbi5WX9J54Eiagz67IpQ

On Wed, 29 Jul 2009 13:40:56 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> Hi,
> 
> At Wed, 29 Jul 2009 12:46:38 +0900 (JST),
> Ryusuke Konishi wrote:
> > 
> > Hi,
> > On Wed, 29 Jul 2009 11:49:12 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > > Hi, 
> > > 
> > > > I tried to reproduce the situation, but I can not reproduce the bug
> > > > with rc4, rc4+experimental on debian/lenny. 
> > > 
> > >  Well, when I tried I got different kernel dump.
> > > I don't know if it's related or not, but just in case.
> > > 
> > >  I got following with rc4 with device mapper, created nilfs2 filesystem on
> > > it during rsync on the filesystem.
> > 
> > shrink_page_list() is a core memory management function to reclaim
> > free pages.
> > 
> > Could you send me the disassembled source of mm/vmscan.o ?
> > You cat get it by using objdump command:
> 
> Here is the corresponded dump of shrink_page_list:
> 
>      ee4:       85 c0                   test   %eax,%eax
>      ee6:       0f 84 d1 02 00 00       je     11bd <shrink_page_list+0x559>
>      eec:       f0 80 65 d8 ef          lock andb $0xef,-0x28(%rbp)
>      ef1:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
>      ef8:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>      eff:       31 c0                   xor    %eax,%eax
>      f01:       e8 00 00 00 00          callq  f06 <shrink_page_list+0x2a2>
>      f06:       e9 b6 01 00 00          jmpq   10c1 <shrink_page_list+0x45d>
>      f0b:       49 8b 44 24 58          mov    0x58(%r12),%rax
>      f10:       48 83 38 00             cmpq   $0x0,(%rax)
>      f14:       0f 84 79 02 00 00       je     1193 <shrink_page_list+0x52f>
>      f1a:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
>      f21:       00 00 
>      f23:       f6 40 16 80             testb  $0x80,0x16(%rax)
> 
> looks like <48> is cmpq at f10.

Thanks.

It looks like part of the pageout() function, inlined into
shrink_page_list().

static pageout_t pageout(struct page *page, struct address_space *mapping,
                                                enum pageout_io sync_writeback)
{
	...
        if (!is_page_cache_freeable(page))
                return PAGE_KEEP;
        if (!mapping) {
                /*
                 * Some data journaling orphaned pages can have
                 * page->mapping == NULL while being dirty with clean buffers.
                 */
                if (page_has_private(page)) {
                        if (try_to_free_buffers(page)) {
                                ClearPageDirty(page);
                                printk("%s: orphaned page\n", __func__);
                                return PAGE_CLEAN;
                        }
                }
                return PAGE_KEEP;
        }
        if (mapping->a_ops->writepage == NULL)
                return PAGE_ACTIVATE;
       ...

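The ``mapping->a_ops->writepage'' test above is what triggers the
invalid access.  According to your log, mapping->a_ops (= RAX) holds a
garbage value (e7d50c6d4d1428d2).  That is not a canonical x86-64
address, which is why the oops is a general protection fault rather
than a page fault.

Maybe mapping->a_ops is not set for the mapping of this page.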
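
One way to see why the RAX value cannot be a valid pointer: on x86-64 with 48-bit virtual addresses, bits 63..47 of an address must be all zeros or all ones, and dereferencing anything else raises a general protection fault. The sketch below assumes 48-bit addressing (the usual case for hardware of this period); is_canonical_48() is a name chosen for the example, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if addr is canonical under 48-bit virtual addressing,
 * i.e. bits 63..47 are all zeros or all ones; returns 0 otherwise.
 */
static int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1FFFF;
}

/* Self-check using register values copied from the oops. */
static void check_logged_registers(void)
{
    assert(!is_canonical_48(0xe7d50c6d4d1428d2ULL)); /* RAX: garbage a_ops */
    assert(is_canonical_48(0xffff8801506f41a8ULL));  /* R12: looks like a kernel pointer */
}
```

R12, which the disassembly uses as the base for the 0x58 load, passes the check, while the value loaded into RAX does not. That fits a mapping whose a_ops field holds garbage.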

Ok, I'll check if nilfs can cause this sort of problem.

Thanks,
Ryusuke Konishi


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]             ` <20090729.114604.56042421.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-08-01 13:36               ` Andrea Gelmini
       [not found]                 ` <9cdbb57f0908010636u7296da29p61df192dc35d0d12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-08-01 13:36 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

2009/7/29 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> Could you test if the patch makes a difference for the same file ?

Well,
   I used this:
   git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git
(branch fixes)

   I stressed it a lot, and it works perfectly.
   In a few days I will test the 2048 block size, too (even though I
guess it's not necessary).

Thanks a lot for your work,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <873a8jhsbd.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
  2009-07-27  7:58                   ` Crash Andrea Gelmini
  2009-07-29  2:49                   ` Crash Jiro SEKIBA
@ 2009-08-01 13:39                   ` Andrea Gelmini
       [not found]                     ` <9cdbb57f0908010639l26c26182ma121b0d7672003e0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2 siblings, 1 reply; 45+ messages in thread
From: Andrea Gelmini @ 2009-08-01 13:39 UTC (permalink / raw)
  To: NILFS Users mailing list

2009/7/27 Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>:
> Can you reproduce the bug without using lvm volume?

Hi,
   to trigger the oops I had to write a file >4G.
   Anyway, with Ryusuke's latest patch I had no problem at all, even
with nilfs on top of MD+LVM.

Thanks a lot,
Andrea

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                 ` <9cdbb57f0908010636u7296da29p61df192dc35d0d12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-08-01 13:56                   ` Ryusuke Konishi
  0 siblings, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2009-08-01 13:56 UTC (permalink / raw)
  To: andrea.gelmini-Re5JQEeQqe8AvxtiuMwx3w; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Sat, 1 Aug 2009 15:36:22 +0200, Andrea Gelmini wrote:
> 2009/7/29 Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>:
> > Could you test if the patch makes a difference for the same file ?
> 
> Well,
>    I used this:
>    git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git
> (branch fixes)
> 
>    I stressed it a lot, and it works perfectly.
>    In a few days I will test the 2048 block size, too (even I guess
> it's not necessary).
> 
> Thanks a lot for your work,
> Andrea

Thanks for your response.
I will send the fix to Linus for the next -rc release.

Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash...
       [not found]                     ` <9cdbb57f0908010639l26c26182ma121b0d7672003e0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-08-02  7:58                       ` Jiro SEKIBA
  0 siblings, 0 replies; 45+ messages in thread
From: Jiro SEKIBA @ 2009-08-02  7:58 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

At Sat, 1 Aug 2009 15:39:17 +0200,
Andrea Gelmini wrote:
> 
> 2009/7/27 Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>:
> > Can you reproduce the bug without using lvm volume?
> 
> Hi,
>    to trigger the ops I had to write a file >4G.

Hmm, I rsynced lots of DVD images, so some of those must have been >4G.

>    Anyway, with latest Ryusuke's patch, I had no problem at all, even
> with nilfs on top of MD+LVM.

That is good news anyway.

thanks

regards,
-- 
Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* kernel oops on shrink_page_list (was Re: Crash...)
       [not found]                                 ` <20090729.140821.103585622.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-08-10  6:54                                   ` Ryusuke Konishi
       [not found]                                     ` <20090810.155420.42596352.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2009-08-10  6:54 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg; +Cc: jir-hfpbi5WX9J54Eiagz67IpQ

Hi,

Recently I have been seeing this oops rather frequently with the latest
nilfs2-module.git, which includes a bunch of patches backported from
2.6.31-rc.

So I'm running git bisect over the changes since nilfs-2.0.15 to find
the cause.  So far, nilfs-2.0.15 seems stable to me.

If you see this oops on nilfs-2.0.15 or earlier versions (or on
kernel 2.6.30), please let me know.


Thanks,
Ryusuke Konishi

On Wed, 29 Jul 2009 14:08:21 +0900 (JST), Ryusuke Konishi wrote:
> On Wed, 29 Jul 2009 13:40:56 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > Hi,
> > 
> > At Wed, 29 Jul 2009 12:46:38 +0900 (JST),
> > Ryusuke Konishi wrote:
> > > 
> > > Hi,
> > > On Wed, 29 Jul 2009 11:49:12 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > > > Hi, 
> > > > 
> > > > > I tried to reproduce the situation, but I can not reproduce the bug
> > > > > with rc4, rc4+experimental on debian/lenny. 
> > > > 
> > > >  Well, when I tried I got different kernel dump.
> > > > I don't know if it's related or not, but just in case.
> > > > 
> > > >  I got following with rc4 with device mapper, created nilfs2 filesystem on
> > > > it during rsync on the filesystem.
> > > 
> > > shrink_page_list() is a core memory management function to reclaim
> > > free pages.
> > > 
> > > Could you send me the disassembled source of mm/vmscan.o ?
> > > You cat get it by using objdump command:
> > 
> > Here is the corresponded dump of shrink_page_list:
> > 
> >      ee4:       85 c0                   test   %eax,%eax
> >      ee6:       0f 84 d1 02 00 00       je     11bd <shrink_page_list+0x559>
> >      eec:       f0 80 65 d8 ef          lock andb $0xef,-0x28(%rbp)
> >      ef1:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
> >      ef8:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
> >      eff:       31 c0                   xor    %eax,%eax
> >      f01:       e8 00 00 00 00          callq  f06 <shrink_page_list+0x2a2>
> >      f06:       e9 b6 01 00 00          jmpq   10c1 <shrink_page_list+0x45d>
> >      f0b:       49 8b 44 24 58          mov    0x58(%r12),%rax
> >      f10:       48 83 38 00             cmpq   $0x0,(%rax)
> >      f14:       0f 84 79 02 00 00       je     1193 <shrink_page_list+0x52f>
> >      f1a:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> >      f21:       00 00 
> >      f23:       f6 40 16 80             testb  $0x80,0x16(%rax)
> > 
> > looks like <48> is cmpq at f10.
> 
> Thanks.
> 
> It looks like a part of the pageout() function inlined in the
> shrink_page_list().
> 
> static pageout_t pageout(struct page *page, struct address_space *mapping,
>                                                 enum pageout_io sync_writeback)
> {
> 	...
>         if (!is_page_cache_freeable(page))
>                 return PAGE_KEEP;
>         if (!mapping) {
>                 /*
>                  * Some data journaling orphaned pages can have
>                  * page->mapping == NULL while being dirty with clean buffers.
>                  */
>                 if (page_has_private(page)) {
>                         if (try_to_free_buffers(page)) {
>                                 ClearPageDirty(page);
>                                 printk("%s: orphaned page\n", __func__);
>                                 return PAGE_CLEAN;
>                         }
>                 }
>                 return PAGE_KEEP;
>         }
>         if (mapping->a_ops->writepage == NULL)
>                 return PAGE_ACTIVATE;
>        ...
> 
> The above ``a_ops->writepage'' causes the violative access.  According
> to your log, mapping->a_ops (= RAX) seems to store a meaningless value.
> 
> Maybe the mapping->a_ops is not set on the page.
> 
> Ok, I'll check if nilfs can cause this sort of problem.
> 
> Thanks,
> Ryusuke Konishi
> 
> > >  $ cd linux/mm
> > >  $ objdump -D vmscan.o > vmscan.s
> > > 
> > > The instruction in question is that has the code "<48>" in the
> > > following sequence.
> > > 
> > > > [405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
> > > 
> > > Cheers,
> > > Ryusuke Konishi 
> > >  
> > > > [405816.059174] general protection fault: 0000 [#1] SMP 
> > > > [405816.059205] last sysfs file: /sys/block/dm-0/removable
> > > > [405816.059233] CPU 0 
> > > > [405816.059255] Modules linked in: dm_mod nilfs2 ipv6 loop snd_hda_codec_realtek i2c_i801 i2c_core iTCO_wdt serio_raw snd_hda_intel snd_hda_codec pcspkr psmouse snd_pcm snd_timer snd button processor soundcore intel_agp snd_page_alloc evdev ext3 jbd mbcache raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sg sr_mod sd_mod cdrom ahci libata scsi_mod tg3 libphy uhci_hcd ehci_hcd thermal fan thermal_sys
> > > > [405816.059462] Pid: 215, comm: kswapd0 Not tainted 2.6.31-rc4 #1 N8-S720XMZCUUA2
> > > > [405816.059504] RIP: 0010:[<ffffffff810a2ac0>]  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
> > > > [405816.059554] RSP: 0018:ffff88016cec1a40  EFLAGS: 00010282
> > > > [405816.059579] RAX: e7d50c6d4d1428d2 RBX: ffffea0003f5fbb0 RCX: 0000000000000800
> > > > [405816.059621] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
> > > > [405816.059663] RBP: ffffea0003f5fbd8 R08: 0000000000000001 R09: ffff88016fc075c0
> > > > [405816.059705] R10: 00003ffffffff000 R11: ffff88007a26f880 R12: ffff8801506f41a8
> > > > [405816.059747] R13: 0000000000000001 R14: 000000000000e800 R15: ffff88016cec1e10
> > > > [405816.059789] FS:  0000000000000000(0000) GS:ffff880028028000(0000) knlGS:0000000000000000
> > > > [405816.059833] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > > [405816.059858] CR2: 0000000001ae9618 CR3: 000000016c802000 CR4: 00000000000406f0
> > > > [405816.059900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [405816.059943] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > [405816.059985] Process kswapd0 (pid: 215, threadinfo ffff88016cec0000, task ffff88016f27f930)
> > > > [405816.060028] Stack:
> > > > [405816.060047]  0000000000000001 ffff88016cec1af0 0000000000000000 ffff88016cec1cb0
> > > > [405816.060078] <0> 0000000000000000 0000000000000017 0000000000000009 0000000000000001
> > > > [405816.060124] <0> ffffea00039702b8 ffffea0004aa0458 ffffea0004aa0810 ffffea0003970248
> > > > [405816.060185] Call Trace:
> > > > [405816.060208]  [<ffffffff8100c40e>] ? common_interrupt+0xe/0x13
> > > > [405816.060234]  [<ffffffff810a1cdb>] ? isolate_pages_global+0xa9/0x1f3
> > > > [405816.060262]  [<ffffffff810a338a>] ? shrink_list+0x2d8/0x5ec
> > > > [405816.060289]  [<ffffffff8110ac52>] ? proc_delete_inode+0x0/0x40
> > > > [405816.060317]  [<ffffffff8109f621>] ? determine_dirtyable_memory+0xd/0x1d
> > > > [405816.060345]  [<ffffffff8109f697>] ? get_dirty_limits+0x1d/0x256
> > > > [405816.060371]  [<ffffffff8100a54d>] ? __switch_to+0xae/0x266
> > > > [405816.060397]  [<ffffffff810a3921>] ? shrink_zone+0x283/0x335
> > > > [405816.060427]  [<ffffffffa0189217>] ? mb_cache_shrink_fn+0x26/0x117 [mbcache]
> > > > [405816.060456]  [<ffffffff810a3b14>] ? shrink_slab+0x141/0x153
> > > > [405816.060482]  [<ffffffff810a42ff>] ? kswapd+0x482/0x631
> > > > [405816.060507]  [<ffffffff810a1c32>] ? isolate_pages_global+0x0/0x1f3
> > > > [405816.060536]  [<ffffffff81053522>] ? autoremove_wake_function+0x0/0x2e
> > > > [405816.060564]  [<ffffffff810a3e7d>] ? kswapd+0x0/0x631
> > > > [405816.060588]  [<ffffffff810531d9>] ? kthread+0x84/0x8c
> > > > [405816.060614]  [<ffffffff8100caca>] ? child_rip+0xa/0x20
> > > > [405816.060639]  [<ffffffff81053155>] ? kthread+0x0/0x8c
> > > > [405816.060664]  [<ffffffff8100cac0>] ? child_rip+0x0/0x20
> > > > [405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
> > > > [405816.060819] RIP  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
> > > > [405816.060847]  RSP <ffff88016cec1a40>
> > > > [405816.061045] ---[ end trace c44a8d41c1aab2f3 ]---
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>
> > > > _______________________________________________
> > > > users mailing list
> > > > users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
> > > > https://www.nilfs.org/mailman/listinfo/users
> > > 
> > > 
> > > 
> > 
> > 
> > -- 
> > Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org>
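
For anyone matching this oops against their own vmscan.o: the "Code:" line
of the trace marks the first byte of the faulting instruction with <..>,
and the bytes in front of it are the ones leading up to RIP.  Counting them
gives the distance to step back from the faulting address when searching
the objdump output.  A minimal sketch in C follows; the function name
count_bytes_before_fault is made up for illustration and is not a kernel
or binutils interface.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Scan the hex dump taken from an oops "Code:" line.
 * Returns the number of bytes that precede the byte wrapped in <..> and
 * stores that byte in *fault.  Returns -1 if no marked byte is found or
 * if the string contains something other than hex bytes.
 */
int count_bytes_before_fault(const char *code, unsigned int *fault)
{
	int count = 0;
	char *end;

	while (*code) {
		if (isspace((unsigned char)*code)) {
			code++;
		} else if (*code == '<') {
			/* the marked byte: first byte of the faulting instruction */
			*fault = (unsigned int)strtoul(code + 1, NULL, 16);
			return count;
		} else if (isxdigit((unsigned char)*code)) {
			strtoul(code, &end, 16);	/* skip one hex byte */
			code = end;
			count++;
		} else {
			return -1;
		}
	}
	return -1;
}
```

Fed the "Code:" line quoted above, this returns 43 and stores 0x48, so the
dump starts 43 bytes before shrink_page_list+0x2ac.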

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: kernel oops on shrink_page_list
       [not found]                                     ` <20090810.155420.42596352.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-08-25 10:50                                       ` Ryusuke Konishi
  0 siblings, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2009-08-25 10:50 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg; +Cc: jir-hfpbi5WX9J54Eiagz67IpQ

Hi everyone,

On Mon, 10 Aug 2009 15:54:20 +0900 (JST), Ryusuke Konishi wrote:
> Hi,
> 
> Recently I have been seeing this oops rather frequently with the latest
> nilfs2-module.git, which includes a bunch of patches backported from
> 2.6.31-rc.
> 
> So I'm running git bisect over the changes since nilfs-2.0.15 to find
> the cause.  So far, nilfs-2.0.15 seems stable to me.
> 
> If you see this oops on nilfs-2.0.15 or earlier versions (or on
> kernel 2.6.30), please let me know.
> 
> 
> Thanks,
> Ryusuke Konishi

After I applied a nilfs patch that was merged in 2.6.31-rc6, this oops
disappeared completely, so I consider it fixed by that patch.

The patch has been included in the stable kernel 2.6.30.5, and I've
pushed it to nilfs2-module.git, too.

I recommend that users who have the same problem try upgrading to these
updated kernels.

Thanks,
Ryusuke Konishi


> On Wed, 29 Jul 2009 14:08:21 +0900 (JST), Ryusuke Konishi wrote:
> > On Wed, 29 Jul 2009 13:40:56 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > > Hi,
> > > 
> > > At Wed, 29 Jul 2009 12:46:38 +0900 (JST),
> > > Ryusuke Konishi wrote:
> > > > 
> > > > Hi,
> > > > On Wed, 29 Jul 2009 11:49:12 +0900, Jiro SEKIBA <jir-hfpbi5WX9J54Eiagz67IpQ@public.gmane.org> wrote:
> > > > > Hi, 
> > > > > 
> > > > > > I tried to reproduce the situation, but I can not reproduce the bug
> > > > > > with rc4, rc4+experimental on debian/lenny. 
> > > > > 
> > > > >  Well, when I tried, I got a different kernel dump.
> > > > > I don't know whether it's related, but I'm sending it just in case.
> > > > > 
> > > > >  I got the following on rc4 with a nilfs2 filesystem created on a
> > > > > device-mapper volume, while running rsync on that filesystem.
> > > > 
> > > > shrink_page_list() is a core memory management function that
> > > > reclaims pages.
> > > > 
> > > > Could you send me the disassembly of mm/vmscan.o?
> > > > You can get it with the objdump command:
> > > 
> > > Here is the corresponding dump of shrink_page_list:
> > > 
> > >      ee4:       85 c0                   test   %eax,%eax
> > >      ee6:       0f 84 d1 02 00 00       je     11bd <shrink_page_list+0x559>
> > >      eec:       f0 80 65 d8 ef          lock andb $0xef,-0x28(%rbp)
> > >      ef1:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
> > >      ef8:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
> > >      eff:       31 c0                   xor    %eax,%eax
> > >      f01:       e8 00 00 00 00          callq  f06 <shrink_page_list+0x2a2>
> > >      f06:       e9 b6 01 00 00          jmpq   10c1 <shrink_page_list+0x45d>
> > >      f0b:       49 8b 44 24 58          mov    0x58(%r12),%rax
> > >      f10:       48 83 38 00             cmpq   $0x0,(%rax)
> > >      f14:       0f 84 79 02 00 00       je     1193 <shrink_page_list+0x52f>
> > >      f1a:       65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
> > >      f21:       00 00 
> > >      f23:       f6 40 16 80             testb  $0x80,0x16(%rax)
> > > 
> > > It looks like <48> is the cmpq at f10.
> > 
> > Thanks.
> > 
> > It looks like part of the pageout() function, which is inlined into
> > shrink_page_list().
> > 
> > static pageout_t pageout(struct page *page, struct address_space *mapping,
> >                                                 enum pageout_io sync_writeback)
> > {
> > 	...
> >         if (!is_page_cache_freeable(page))
> >                 return PAGE_KEEP;
> >         if (!mapping) {
> >                 /*
> >                  * Some data journaling orphaned pages can have
> >                  * page->mapping == NULL while being dirty with clean buffers.
> >                  */
> >                 if (page_has_private(page)) {
> >                         if (try_to_free_buffers(page)) {
> >                                 ClearPageDirty(page);
> >                                 printk("%s: orphaned page\n", __func__);
> >                                 return PAGE_CLEAN;
> >                         }
> >                 }
> >                 return PAGE_KEEP;
> >         }
> >         if (mapping->a_ops->writepage == NULL)
> >                 return PAGE_ACTIVATE;
> >        ...
> > 
> > The ``a_ops->writepage'' access above causes the invalid memory access.
> > According to your log, mapping->a_ops (= RAX) seems to hold a
> > meaningless value.
> > 
> > Maybe mapping->a_ops was not set for the page.
> > 
> > OK, I'll check whether nilfs can cause this sort of problem.
> > 
> > Thanks,
> > Ryusuke Konishi
> > 
> > > >  $ cd linux/mm
> > > >  $ objdump -D vmscan.o > vmscan.s
> > > > 
> > > > The instruction in question is the one marked with "<48>" in the
> > > > following sequence.
> > > > 
> > > > > [405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
> > > > 
> > > > Cheers,
> > > > Ryusuke Konishi 
> > > >  
> > > > > [405816.059174] general protection fault: 0000 [#1] SMP 
> > > > > [405816.059205] last sysfs file: /sys/block/dm-0/removable
> > > > > [405816.059233] CPU 0 
> > > > > [405816.059255] Modules linked in: dm_mod nilfs2 ipv6 loop snd_hda_codec_realtek i2c_i801 i2c_core iTCO_wdt serio_raw snd_hda_intel snd_hda_codec pcspkr psmouse snd_pcm snd_timer snd button processor soundcore intel_agp snd_page_alloc evdev ext3 jbd mbcache raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sg sr_mod sd_mod cdrom ahci libata scsi_mod tg3 libphy uhci_hcd ehci_hcd thermal fan thermal_sys
> > > > > [405816.059462] Pid: 215, comm: kswapd0 Not tainted 2.6.31-rc4 #1 N8-S720XMZCUUA2
> > > > > [405816.059504] RIP: 0010:[<ffffffff810a2ac0>]  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
> > > > > [405816.059554] RSP: 0018:ffff88016cec1a40  EFLAGS: 00010282
> > > > > [405816.059579] RAX: e7d50c6d4d1428d2 RBX: ffffea0003f5fbb0 RCX: 0000000000000800
> > > > > [405816.059621] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
> > > > > [405816.059663] RBP: ffffea0003f5fbd8 R08: 0000000000000001 R09: ffff88016fc075c0
> > > > > [405816.059705] R10: 00003ffffffff000 R11: ffff88007a26f880 R12: ffff8801506f41a8
> > > > > [405816.059747] R13: 0000000000000001 R14: 000000000000e800 R15: ffff88016cec1e10
> > > > > [405816.059789] FS:  0000000000000000(0000) GS:ffff880028028000(0000) knlGS:0000000000000000
> > > > > [405816.059833] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > > > [405816.059858] CR2: 0000000001ae9618 CR3: 000000016c802000 CR4: 00000000000406f0
> > > > > [405816.059900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [405816.059943] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > [405816.059985] Process kswapd0 (pid: 215, threadinfo ffff88016cec0000, task ffff88016f27f930)
> > > > > [405816.060028] Stack:
> > > > > [405816.060047]  0000000000000001 ffff88016cec1af0 0000000000000000 ffff88016cec1cb0
> > > > > [405816.060078] <0> 0000000000000000 0000000000000017 0000000000000009 0000000000000001
> > > > > [405816.060124] <0> ffffea00039702b8 ffffea0004aa0458 ffffea0004aa0810 ffffea0003970248
> > > > > [405816.060185] Call Trace:
> > > > > [405816.060208]  [<ffffffff8100c40e>] ? common_interrupt+0xe/0x13
> > > > > [405816.060234]  [<ffffffff810a1cdb>] ? isolate_pages_global+0xa9/0x1f3
> > > > > [405816.060262]  [<ffffffff810a338a>] ? shrink_list+0x2d8/0x5ec
> > > > > [405816.060289]  [<ffffffff8110ac52>] ? proc_delete_inode+0x0/0x40
> > > > > [405816.060317]  [<ffffffff8109f621>] ? determine_dirtyable_memory+0xd/0x1d
> > > > > [405816.060345]  [<ffffffff8109f697>] ? get_dirty_limits+0x1d/0x256
> > > > > [405816.060371]  [<ffffffff8100a54d>] ? __switch_to+0xae/0x266
> > > > > [405816.060397]  [<ffffffff810a3921>] ? shrink_zone+0x283/0x335
> > > > > [405816.060427]  [<ffffffffa0189217>] ? mb_cache_shrink_fn+0x26/0x117 [mbcache]
> > > > > [405816.060456]  [<ffffffff810a3b14>] ? shrink_slab+0x141/0x153
> > > > > [405816.060482]  [<ffffffff810a42ff>] ? kswapd+0x482/0x631
> > > > > [405816.060507]  [<ffffffff810a1c32>] ? isolate_pages_global+0x0/0x1f3
> > > > > [405816.060536]  [<ffffffff81053522>] ? autoremove_wake_function+0x0/0x2e
> > > > > [405816.060564]  [<ffffffff810a3e7d>] ? kswapd+0x0/0x631
> > > > > [405816.060588]  [<ffffffff810531d9>] ? kthread+0x84/0x8c
> > > > > [405816.060614]  [<ffffffff8100caca>] ? child_rip+0xa/0x20
> > > > > [405816.060639]  [<ffffffff81053155>] ? kthread+0x0/0x8c
> > > > > [405816.060664]  [<ffffffff8100cac0>] ? child_rip+0x0/0x20
> > > > > [405816.060688] Code: c0 0f 84 d1 02 00 00 f0 80 65 d8 ef 48 c7 c6 e0 b9 2a 81 48 c7 c7 26 fc 32 81 31 c0 e8 02 d0 1e 00 e9 b6 01 00 00 49 8b 44 24 58 <48> 83 38 00 0f 84 79 02 00 00 65 48 8b 04 25 00 b0 00 00 f6 40 
> > > > > [405816.060819] RIP  [<ffffffff810a2ac0>] shrink_page_list+0x2ac/0x609
> > > > > [405816.060847]  RSP <ffff88016cec1a40>
> > > > > [405816.061045] ---[ end trace c44a8d41c1aab2f3 ]---
> > > > > 
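
One more observation on the register dump, offered as a reading aid rather
than as part of the diagnosis quoted above: the value in RAX
(e7d50c6d4d1428d2) is not a canonical x86-64 address.  On CPUs with 48-bit
virtual addresses, bits 63..47 of a pointer must all be equal, and
dereferencing an address that breaks this rule raises a general protection
fault rather than a page fault.  That would be consistent with the
"general protection fault" header of this oops.  A small sketch of the
check in C, assuming 48-bit addressing (is_canonical_48 is a hypothetical
helper name, not a kernel function):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses.  An address is canonical when
 * bits 63..47 are either all zero (lower half) or all one (upper half).
 */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

By this check RAX is non-canonical, while R12 (ffff8801506f41a8), the
register RAX was loaded through in the disassembly, is a canonical
kernel address.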

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
  2020-02-06 20:58 crash Frank Esposito
@ 2020-02-07  5:28 ` Rebecca Cran
  0 siblings, 0 replies; 45+ messages in thread
From: Rebecca Cran @ 2020-02-07  5:28 UTC (permalink / raw)
  To: Frank Esposito, fio

On 2020-02-06 13:58, Frank Esposito wrote:
> Hello --
>
> I am running Windows 10 version 1909, 64-bit, with all current updates applied.
>
> I downloaded the Windows binaries, including the zip ones.
>
> I unzipped fio-3.16-x64.zip and ran fio.exe. Is it looking for a DLL?

That's strange. Could you try again with the 3.18 packages I've just
uploaded to https://bsdio.com/fio/ please? 


-- 
Rebecca Cran




^ permalink raw reply	[flat|nested] 45+ messages in thread

* crash
@ 2020-02-06 20:58 Frank Esposito
  2020-02-07  5:28 ` crash Rebecca Cran
  0 siblings, 1 reply; 45+ messages in thread
From: Frank Esposito @ 2020-02-06 20:58 UTC (permalink / raw)
  To: fio

Hello --

I am running Windows 10 version 1909, 64-bit, with all current updates applied.

I downloaded the Windows binaries, including the zip ones.

I unzipped fio-3.16-x64.zip and ran fio.exe. Is it looking for a DLL?


This is the text from the event log ---


Windows cannot access the file for one of the following reasons:  there is a
problem with the network connection, the disk that the file is stored on, or the
storage drivers installed on this computer; or the disk is missing.  Windows
closed the program fio.exe because of this error.

Program:  fio.exe
File:

The error value is listed in the Additional Data section.

User Action

1. Open the file again.  This situation might be a temporary problem that
corrects itself when the program runs again.

2. If the file still cannot be accessed and

- It is on the network, your network administrator should verify that there is
not a problem with the network and that the server can be contacted.

- It is on a removable disk, for example, a floppy disk or CD-ROM, verify that
the disk is fully inserted into the computer.

3. Check and repair the file system by running CHKDSK.  To run CHKDSK, click
Start, click Run, type CMD, and then click OK.  At the command prompt, type
CHKDSK /F, and then press ENTER.

4. If the problem persists, restore the file from a backup copy.

5. Determine whether other files on the same disk can be opened.  If not, the
disk might be damaged.  If it is a hard disk, contact your administrator or
computer hardware vendor for further assistance.

Additional Data
Error value:  00000000
Disk type:  0 -


-- 
Frank Esposito


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
       [not found] ` <ee5afd761002182235r1fe20b0kc7ef7082a5a907e3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-20 13:38   ` Ryusuke Konishi
  0 siblings, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2010-02-20 13:38 UTC (permalink / raw)
  To: jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
On Fri, 19 Feb 2010 08:35:56 +0200, Jan de Kruyf wrote:
> Hallo
> Just to check that you did receive my email re nilfs on febr 14 with
> 'crashedVar1.tgz' (346k) attached.
> 
> Enjoy your day,
> Jan de Kruyf.

The mail I received was broken, and it is also missing from the archives.

Could you resend it with the files packed into a new tarball?

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
       [not found]                   ` <ee5afd761002110943y1ca061bdi610de2f1a5df3c32-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-14  7:34                     ` Ryusuke Konishi
  0 siblings, 0 replies; 45+ messages in thread
From: Ryusuke Konishi @ 2010-02-14  7:34 UTC (permalink / raw)
  To: Jan de Kruyf; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
2010/2/12 Jan de Kruyf <jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> Pliny the Elder (ad 23–79): ‘Semper aliquid novi Africam adferre
> [Africa always brings [us] something new]’
>
> Hallo,
> Something is not quite right between the git repository and my git
> master copy: somehow the patches don't update the local copy with git am.
> I have to extract the patch and apply it with 'patch -l' (Match patterns
> loosely, in case tabs or spaces have been munged in your files.),
> which then has other complications, like finding the wrong spot in
> super.c for one patch, so that hunk was rejected and had to be
> done by hand.
>
>
> On the crash:
> My initial idea was quite wrong. Something went amiss somewhere in the
> middle of nowhere.
> It might be connected to cleanerd, but I cannot say for sure.
>
> See the attached .tgz for details of my post mortem effort. If more
> data is needed please say so.
>
> Regards,
>
> Jan de Kruyf.

Thank you for the detailed report.

According to your log, many tiny checkpoints were created
between 2010-02-09 20:30:01 and 2010-02-09 20:31:00.

I looked at the dump data of segment 343 to check the logs that have
checkpoints at "20:30:01", but they looked normal to me.

The series of checkpoints after 1844624 seems to need attention, as you say.

             1844624  2010-02-09 20:30:59   cp    -         55      16621
             1844625  2010-02-09 20:30:59   cp    -         33      16621
             1844626  2010-02-09 20:30:59   cp    -         33      16621
             1844627  2010-02-09 20:30:59   cp    -         34      16621
             1844628  2010-02-09 20:30:59   cp    -         33      16621
             1844629  2010-02-09 20:30:59   cp    -         33      16621
             1844630  2010-02-09 20:30:59   cp    -         33      16621

However, that series is not in segment 343 but in segment 344.

Checkpoints can be created when an application performs synchronous writes
(e.g. fsync() or O_SYNC writes).

So we need additional dumpseg data to confirm whether it's normal or not.

Can you still take that dump?

In addition, I think "ls -laiR" output for the var directory would be helpful
because it prints inode numbers along with file names.

Thanks in advance,
Ryusuke Konishi
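
For reference, the two kinds of synchronous write mentioned above look
like this from user space.  This is only a sketch of the system calls
involved; the helper names and any path passed to them are illustrative,
and whether a particular call ends up producing a checkpoint is decided
by nilfs2, not by this code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open path with the given extra flags and write msg; returns fd or -1. */
static int open_and_write(const char *path, const char *msg, int extra_flags)
{
	size_t len = strlen(msg);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, msg, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Ordinary buffered write followed by an explicit fsync(). */
int write_then_fsync(const char *path, const char *msg)
{
	int fd = open_and_write(path, msg, 0);
	int err;

	if (fd < 0)
		return -1;
	err = fsync(fd);	/* returns after the data has been flushed */
	close(fd);
	return err;
}

/* O_SYNC write: each write() itself waits for the data to be flushed. */
int write_with_o_sync(const char *path, const char *msg)
{
	int fd = open_and_write(path, msg, O_SYNC);

	if (fd < 0)
		return -1;
	return close(fd);
}
```

A program that calls either helper in a tight loop issues one synchronous
write per iteration, which is the kind of workload that could explain a
burst of small checkpoints within the same second.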

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
       [not found]               ` <20100211.143001.184824921.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2010-02-11 17:43                 ` Jan de Kruyf
       [not found]                   ` <ee5afd761002110943y1ca061bdi610de2f1a5df3c32-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Jan de Kruyf @ 2010-02-11 17:43 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

Pliny the Elder (ad 23–79): ‘Semper aliquid novi Africam adferre
[Africa always brings [us] something new]’

Hallo,
Something is not quite right between the git repository and my git
master copy: somehow the patches don't update the local copy with git am.
I have to extract the patch and apply it with 'patch -l' (Match patterns
loosely, in case tabs or spaces have been munged in your files.),
which then has other complications, like finding the wrong spot in
super.c for one patch, so that hunk was rejected and had to be
done by hand.


On the crash:
My initial idea was quite wrong. Something went amiss somewhere in the
middle of nowhere.
It might be connected to cleanerd, but I cannot say for sure.

See the attached .tgz for details of my post mortem effort. If more
data is needed please say so.

Regards,

Jan de Kruyf.

[-- Attachment #2: crashedVar.tgz --]
[-- Type: application/x-gzip, Size: 68863 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
       [not found]           ` <ee5afd761002092326k3bb3a74fq7145cdb9925d88d5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-11  5:30             ` Ryusuke Konishi
       [not found]               ` <20100211.143001.184824921.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2010-02-11  5:30 UTC (permalink / raw)
  To: jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
On Wed, 10 Feb 2010 09:26:46 +0200, Jan de Kruyf wrote:
> Hallo,
> This patch never made it to the module tree:
> 
> from    Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
> to    users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org,
> sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> cc    jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
> konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
> date    Thu, Nov 19, 2009 at 9:10 PM
> subject    Re: [NILFS users] [PATCH 4/4] nilfs2: add norepair mount option
> 
> 
> Could I apply the original patch without a problem to the latest module tree?
> 
> from    Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
> to    users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org,
> jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
> date    Sat, Nov 7, 2009 at 9:28 PM
> subject    Re: [NILFS users] [SPAM] Re: [SPAM] Re: urgent help need!
> disk partition info lost
> 
> Regards,
> 
> Jan de Kruyf.

The patch looks a bit old.

I attached the latest patch against 2.0.18.

Maybe both are safe, but this new one would be better.

Regards,
Ryusuke Konishi

--
diff --git a/fs/nilfs2_fs.h b/fs/nilfs2_fs.h
index ce52040..2e4cbd1 100644
--- a/fs/nilfs2_fs.h
+++ b/fs/nilfs2_fs.h
@@ -151,6 +151,8 @@ struct nilfs_super_root {
 #define NILFS_MOUNT_BARRIER		0x1000  /* Use block barriers */
 #define NILFS_MOUNT_STRICT_ORDER	0x2000  /* Apply strict in-order
 						   semantics also for data */
+#define NILFS_MOUNT_NORECOVERY		0x4000  /* Disable write access during
+						   mount-time recovery */
 
 
 /**
diff --git a/fs/super.c b/fs/super.c
index c9acc9a..d15ede2 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -505,22 +505,6 @@ void nilfs_detach_checkpoint(struct nilfs_sb_info *sbi)
 	nilfs_debug(2, "detached ifile\n");
 }
 
-static int nilfs_mark_recovery_complete(struct nilfs_sb_info *sbi)
-{
-	struct the_nilfs *nilfs = sbi->s_nilfs;
-	int err = 0;
-
-	down_write(&nilfs->ns_sem);
-	if (!(nilfs->ns_mount_state & NILFS_VALID_FS)) {
-		nilfs->ns_mount_state |= NILFS_VALID_FS;
-		err = nilfs_commit_super(sbi, 1);
-		if (likely(!err))
-			printk(KERN_INFO "NILFS: recovery complete.\n");
-	}
-	up_write(&nilfs->ns_sem);
-	return err;
-}
-
 static int nilfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -649,7 +633,7 @@ static struct export_operations nilfs_export_ops = {
 
 enum {
 	Opt_err_cont, Opt_err_panic, Opt_err_ro,
-	Opt_barrier, Opt_snapshot, Opt_order,
+	Opt_barrier, Opt_snapshot, Opt_order, Opt_norecovery,
 	Opt_err,
 };
 
@@ -660,6 +644,7 @@ static match_table_t tokens = {
 	{Opt_barrier, "barrier=%s"},
 	{Opt_snapshot, "cp=%u"},
 	{Opt_order, "order=%s"},
+	{Opt_norecovery, "norecovery"},
 	{Opt_err, NULL}
 };
 
@@ -728,6 +713,9 @@ static int parse_options(char *options, struct super_block *sb)
 			sbi->s_snapshot_cno = option;
 			nilfs_set_opt(sbi, SNAPSHOT);
 			break;
+		case Opt_norecovery:
+			nilfs_set_opt(sbi, NORECOVERY);
+			break;
 		default:
 			printk(KERN_ERR
 			       "NILFS: Unrecognized mount option \"%s\"\n", p);
@@ -753,9 +741,7 @@ static int nilfs_setup_super(struct nilfs_sb_info *sbi)
 	int mnt_count = le16_to_cpu(sbp->s_mnt_count);
 
 	/* nilfs->sem must be locked by the caller. */
-	if (!(nilfs->ns_mount_state & NILFS_VALID_FS)) {
-		printk(KERN_WARNING "NILFS warning: mounting unchecked fs\n");
-	} else if (nilfs->ns_mount_state & NILFS_ERROR_FS) {
+	if (nilfs->ns_mount_state & NILFS_ERROR_FS) {
 		printk(KERN_WARNING
 		       "NILFS warning: mounting fs with errors\n");
 #if 0
@@ -865,11 +851,10 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent,
 	sb->s_root = NULL;
 	sb->s_time_gran = 1;
 
-	if (!nilfs_loaded(nilfs)) {
-		err = load_nilfs(nilfs, sbi);
-		if (err)
-			goto failed_sbi;
-	}
+	err = load_nilfs(nilfs, sbi);
+	if (err)
+		goto failed_sbi;
+
 	cno = nilfs_last_cno(nilfs);
 
 	if (sb->s_flags & MS_RDONLY) {
@@ -946,12 +931,6 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent,
 		up_write(&nilfs->ns_sem);
 	}
 
-	err = nilfs_mark_recovery_complete(sbi);
-	if (unlikely(err)) {
-		printk(KERN_ERR "NILFS: recovery failed.\n");
-		goto failed_root;
-	}
-
 	down_write(&nilfs->ns_super_sem);
 	if (!nilfs_test_opt(sbi, SNAPSHOT))
 		nilfs->ns_current = sbi;
@@ -960,10 +939,6 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent,
 	nilfs_debug(1, "mounted filesystem\n");
 	return 0;
 
- failed_root:
-	dput(sb->s_root);
-	sb->s_root = NULL;
-
  failed_segctor:
 	nilfs_detach_segment_constructor(sbi);
 
@@ -1008,6 +983,14 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
 		goto restore_opts;
 	}
 
+	if (!nilfs_valid_fs(nilfs)) {
+		printk(KERN_WARNING "NILFS (device %s): couldn't "
+		       "remount because the filesystem is in an "
+		       "incomplete recovery state.\n", sb->s_id);
+		err = -EINVAL;
+		goto restore_opts;
+	}
+
 	if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
 		goto out;
 	if (*flags & MS_RDONLY) {
diff --git a/fs/the_nilfs.c b/fs/the_nilfs.c
index 365971b..028f378 100644
--- a/fs/the_nilfs.c
+++ b/fs/the_nilfs.c
@@ -294,29 +294,30 @@ int load_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi)
 	struct nilfs_recovery_info ri;
 	unsigned int s_flags = sbi->s_super->s_flags;
 	int really_read_only = bdev_read_only(nilfs->ns_bdev);
-	unsigned valid_fs;
-	int err = 0;
-
-	nilfs_init_recovery_info(&ri);
+	int valid_fs = nilfs_valid_fs(nilfs);
+	int err;
 
-	down_write(&nilfs->ns_sem);
-	valid_fs = (nilfs->ns_mount_state & NILFS_VALID_FS);
-	up_write(&nilfs->ns_sem);
+	if (nilfs_loaded(nilfs)) {
+		if (valid_fs ||
+		    ((s_flags & MS_RDONLY) && nilfs_test_opt(sbi, NORECOVERY)))
+			return 0;
+		printk(KERN_ERR "NILFS: the filesystem is in an incomplete "
+		       "recovery state.\n");
+		return -EINVAL;
+	}
 
-	if (!valid_fs && (s_flags & MS_RDONLY)) {
-		printk(KERN_INFO "NILFS: INFO: recovery "
-		       "required for readonly filesystem.\n");
-		if (really_read_only) {
-			printk(KERN_ERR "NILFS: write access "
-			       "unavailable, cannot proceed.\n");
-			err = -EROFS;
-			goto failed;
+	if (!valid_fs) {
+		printk(KERN_WARNING "NILFS warning: mounting unchecked fs\n");
+		if (s_flags & MS_RDONLY) {
+			printk(KERN_INFO "NILFS: INFO: recovery "
+			       "required for readonly filesystem.\n");
+			printk(KERN_INFO "NILFS: write access will "
+			       "be enabled during recovery.\n");
 		}
-		printk(KERN_INFO "NILFS: write access will "
-		       "be enabled during recovery.\n");
-		sbi->s_super->s_flags &= ~MS_RDONLY;
 	}
 
+	nilfs_init_recovery_info(&ri);
+
 	err = nilfs_search_super_root(nilfs, sbi, &ri);
 	if (unlikely(err)) {
 		printk(KERN_ERR "NILFS: error searching super root.\n");
@@ -329,19 +330,56 @@ int load_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi)
 		goto failed;
 	}
 
-	if (!valid_fs) {
-		err = nilfs_recover_logical_segments(nilfs, sbi, &ri);
-		if (unlikely(err)) {
-			nilfs_mdt_destroy(nilfs->ns_cpfile);
-			nilfs_mdt_destroy(nilfs->ns_sufile);
-			nilfs_mdt_destroy(nilfs->ns_dat);
-			goto failed;
+	if (valid_fs)
+		goto skip_recovery;
+
+	if (s_flags & MS_RDONLY) {
+		if (nilfs_test_opt(sbi, NORECOVERY)) {
+			printk(KERN_INFO "NILFS: norecovery option specified. "
+			       "skipping roll-forward recovery\n");
+			goto skip_recovery;
 		}
-		if (ri.ri_need_recovery == NILFS_RECOVERY_SR_UPDATED)
-			sbi->s_super->s_dirt = 1;
+		if (really_read_only) {
+			printk(KERN_ERR "NILFS: write access "
+			       "unavailable, cannot proceed.\n");
+			err = -EROFS;
+			goto failed_unload;
+		}
+		sbi->s_super->s_flags &= ~MS_RDONLY;
+	} else if (nilfs_test_opt(sbi, NORECOVERY)) {
+		printk(KERN_ERR "NILFS: recovery cancelled because norecovery "
+		       "option was specified for a read/write mount\n");
+		err = -EINVAL;
+		goto failed_unload;
 	}
 
+	err = nilfs_recover_logical_segments(nilfs, sbi, &ri);
+	if (err)
+		goto failed_unload;
+
+	down_write(&nilfs->ns_sem);
+	nilfs->ns_mount_state |= NILFS_VALID_FS;
+	nilfs->ns_sbp[0]->s_state = cpu_to_le16(nilfs->ns_mount_state);
+	err = nilfs_commit_super(sbi, 1);
+	up_write(&nilfs->ns_sem);
+
+	if (err) {
+		printk(KERN_ERR "NILFS: failed to update super block. "
+		       "recovery unfinished.\n");
+		goto failed_unload;
+	}
+	printk(KERN_INFO "NILFS: recovery complete.\n");
+
+ skip_recovery:
 	set_nilfs_loaded(nilfs);
+	nilfs_clear_recovery_info(&ri);
+	sbi->s_super->s_flags = s_flags;
+	return 0;
+
+ failed_unload:
+	nilfs_mdt_destroy(nilfs->ns_cpfile);
+	nilfs_mdt_destroy(nilfs->ns_sufile);
+	nilfs_mdt_destroy(nilfs->ns_dat);
 
  failed:
 	nilfs_clear_recovery_info(&ri);
diff --git a/fs/the_nilfs.h b/fs/the_nilfs.h
index fa3a1df..25c24a0 100644
--- a/fs/the_nilfs.h
+++ b/fs/the_nilfs.h
@@ -242,6 +242,16 @@ static inline void nilfs_put_sbinfo(struct nilfs_sb_info *sbi)
 		kfree(sbi);
 }
 
+static inline int nilfs_valid_fs(struct the_nilfs *nilfs)
+{
+	unsigned valid_fs;
+
+	down_read(&nilfs->ns_sem);
+	valid_fs = (nilfs->ns_mount_state & NILFS_VALID_FS);
+	up_read(&nilfs->ns_sem);
+	return valid_fs;
+}
+
 static inline void
 nilfs_get_segment_range(struct the_nilfs *nilfs, __u64 segnum,
 			sector_t *seg_start, sector_t *seg_end)
diff --git a/nilfs2.txt b/nilfs2.txt
index dbbda00..906b505 100644
--- a/nilfs2.txt
+++ b/nilfs2.txt
@@ -71,6 +71,10 @@ order=strict		Apply strict in-order semantics that preserves sequence
 			blocks.  That means, it is guaranteed that no
 			overtaking of events occurs in the recovered file
 			system after a crash.
+norecovery		Disable recovery of the filesystem on mount.
+			This disables every write access on the device for
+			read-only mounts or snapshots.  This option will fail
+			for r/w mounts on an unclean volume.
 
 NILFS2 usage
 ============
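
As a reading aid for the load_nilfs() hunk in the patch above, the
mount-time decision it makes on the first mount of a device can be
modelled as a small pure function.  This is a user-space sketch derived
from the diff, not kernel code: the enum and function names are invented
for illustration, and the early-return path for an already loaded
the_nilfs object is left out.

```c
#include <errno.h>
#include <stdbool.h>

enum mount_action {
	MOUNT_SKIP_RECOVERY,	/* mount proceeds without roll-forward recovery */
	MOUNT_RECOVER,		/* roll-forward recovery runs before mounting */
};

/*
 * valid_fs:   NILFS_VALID_FS is set in the mount state
 * rdonly:     the mount was requested read-only (MS_RDONLY)
 * norecovery: the norecovery mount option was given
 * bdev_ro:    the block device itself is read-only
 *
 * Returns a mount_action, or a negative errno when the mount must fail.
 */
int recovery_decision(bool valid_fs, bool rdonly, bool norecovery, bool bdev_ro)
{
	if (valid_fs)
		return MOUNT_SKIP_RECOVERY;

	if (rdonly) {
		if (norecovery)
			return MOUNT_SKIP_RECOVERY;
		if (bdev_ro)
			return -EROFS;	/* recovery needs write access */
	} else if (norecovery) {
		return -EINVAL;		/* r/w mount of an unclean volume */
	}
	return MOUNT_RECOVER;
}
```

In this model "-o ro,norecovery" on an unclean volume is the only
combination that mounts without writing to the device, which matches the
description added to nilfs2.txt by the patch.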

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* crash
       [not found]       ` <ee5afd761002092314i1ba1ec66ie6fe7d0f22d6927e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-10  7:26         ` Jan de Kruyf
       [not found]           ` <ee5afd761002092326k3bb3a74fq7145cdb9925d88d5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Jan de Kruyf @ 2010-02-10  7:26 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hallo,
This patch never made it to the module tree:

from    Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
to    users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org,
sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
cc    jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
date    Thu, Nov 19, 2009 at 9:10 PM
subject    Re: [NILFS users] [PATCH 4/4] nilfs2: add norepair mount option


Could I apply the original patch without a problem to the latest module tree?

from    Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
to    users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org,
jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
date    Sat, Nov 7, 2009 at 9:28 PM
subject    Re: [NILFS users] [SPAM] Re: [SPAM] Re: urgent help need!
disk partition info lost

Regards,

Jan de Kruyf.



On Wed, Feb 10, 2010 at 6:07 AM, Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org> wrote:
>
> ----
> Hi,
> > (Cc'ed to linux-nilfs)
> > Hallo,
> > here is a copy of a post to the list in which you might have an interest.
> > If you need me to do anything please feel free to ask, I hope I will have
> > some time tomorrow.
> >
> > Regards
> >
> > Jan de Kruyf.
> >
> > ------------------------------------------------
> > Hallo,
> > I did it again.
> > computer locked up. Most likely X, since the keyboard was dead. But the
> > cleanerd was still running
> > from the flashing of the hard-drive LED.
> >
> > Press the big hard reset button
> > restart . . . /var partition (nilfs2) is now full and the restart cannot run
> > to completion.
> >
> > This is the 3rd time it happened to me!
> >
> > The only thing I can think of right now is that the cleaner daemon was
> > interrupted at the wrong moment.
> > And the /var partition was left in a full state. Is this possible?
> >
> > Does anybody want anything of the image for a post mortem?
> >
> > Does anybody want me to do some tests before I reformat?
> >
> > Further details will follow once I got the rescue HD hooked up.
> >
> > Regards,
> >
> > Jan de Kruyf.
>
> Can you mount the partition with -o ro,norecovery ?
>
> If so, can you see if there are warnings or errors of some kind in
> syslog or daemon or messages?
>
>
> Thanks,
> Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: crash
       [not found] ` <201002100059.AA01340-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
@ 2010-02-10  4:07   ` Ryusuke Konishi
       [not found]     ` <ee5afd761002092314i1ba1ec66ie6fe7d0f22d6927e@mail.gmail.com>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2010-02-10  4:07 UTC (permalink / raw)
  To: jan.de.kruyf-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg

----
Hi,
> (Cc'ed to linux-nilfs)
> Hallo,
> here is a copy of a post to the list in which you might have an interest.
> If you need me to do anything please feel free to ask, I hope I will have
> some time tomorrow.
> 
> Regards
> 
> Jan de Kruyf.
> 
> ------------------------------------------------
> Hallo,
> I did it again.
> computer locked up. Most likely X, since the keyboard was dead. But the
> cleanerd was still running
> from the flashing of the hard-drive LED.
> 
> Press the big hard reset button
> restart . . . /var partition (nilfs2) is now full and the restart cannot run
> to completion.
> 
> This is the 3rd time it happened to me!
> 
> The only thing I can think of right now is that the cleaner daemon was
> interrupted at the wrong moment.
> And the /var partition was left in a full state. Is this possible?
> 
> Does anybody want anything of the image for a post mortem?
> 
> Does anybody want me to do some tests before I reformat?
> 
> Further details will follow once I got the rescue HD hooked up.
> 
> Regards,
> 
> Jan de Kruyf.

Can you mount the partition with -o ro,norecovery ?

If so, can you see if there are warnings or errors of some kind in
syslog or daemon or messages?
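
One way to do the log check suggested above is sketched here in Python. It is only a rough filter under stated assumptions: the function name `nilfs_problem_lines` is made up for illustration, and it assumes that relevant kernel messages mention "nilfs" together with one of the words "warning", "error" or "BUG" on the same line, which may miss messages that are worded differently.

```python
import re

# Assumption: interesting lines mention nilfs and a severity word on one line.
_NILFS = re.compile(r"nilfs", re.IGNORECASE)
_SEVERITY = re.compile(r"\b(warning|error|bug)\b", re.IGNORECASE)


def nilfs_problem_lines(lines):
    """Return the log lines that mention nilfs together with a severity word.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    return [
        line.rstrip("\n")
        for line in lines
        if _NILFS.search(line) and _SEVERITY.search(line)
    ]
```

It could be fed the lines of whichever log file the distribution uses, for example `nilfs_problem_lines(open("/var/log/messages"))`; that path is an assumption and differs between systems.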


Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* crash
@ 2010-02-10  0:59 Ryusuke Konishi
       [not found] ` <201002100059.AA01340-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Ryusuke Konishi @ 2010-02-10  0:59 UTC (permalink / raw)
  To: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Jan de Kruyf

(Cc'ed to linux-nilfs)
Hallo,
here is a copy of a post to the list in which you might have an interest.
If you need me to do anything please feel free to ask, I hope I will have
some time tomorrow.

Regards

Jan de Kruyf.

------------------------------------------------
Hallo,
I did it again.
The computer locked up. Most likely it was X, since the keyboard was dead. But
cleanerd was still running, judging from the flashing of the hard-drive LED.

I pressed the big hard reset button and restarted . . . the /var partition
(nilfs2) is now full and the restart cannot run to completion.

This is the 3rd time it happened to me!

The only thing I can think of right now is that the cleaner daemon was
interrupted at the wrong moment.
And the /var partition was left in a full state. Is this possible?

Does anybody want anything of the image for a post mortem?

Does anybody want me to do some tests before I reformat?

Further details will follow once I have the rescue HD hooked up.

Regards,

Jan de Kruyf.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-03-02  7:32           ` Crash Keir Fraser
@ 2005-03-02 11:52             ` visik7
  0 siblings, 0 replies; 45+ messages in thread
From: visik7 @ 2005-03-02 11:52 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Keir Fraser wrote:
|
| On 1 Mar 2005, at 23:07, visik7 wrote:
|
|> Really I can't figure out how to check if I'm building against arch/xen or
|> arch/i386
|> can you help me
|> thanks
|>     Marco
|
|
| We've just done some comparison of modules built against arch/i386 and
| arch/xen, and decided that there isn't so much difference between them
| after all. I'll add CLI/STI emulation to xen-unstable and let you know
| when I've done that. With that support checked in your driver ought to
| work.
|
|  -- Keir
|
|
Well, I'll wait for it.
Anyway, do I need to recompile only the hypervisor from the unstable
branch, or also the kernels and tools?
Thanks
	Marco
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCJakQihYL426P5W4RAilRAJ0SkpnKW3g/KtZbApg32EOQBFZ0hQCfSW2K
4kohoWGvBVkwkgr9HOYw59I=
=5YU2
-----END PGP SIGNATURE-----


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-25 15:00       ` Crash Keir Fraser
  2005-02-27 23:46         ` Crash visik7
@ 2005-03-01 23:07         ` visik7
  2005-03-02  7:32           ` Crash Keir Fraser
  1 sibling, 1 reply; 45+ messages in thread
From: visik7 @ 2005-03-01 23:07 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Keir Fraser wrote:
|
| On 25 Feb 2005, at 07:59, visik7 wrote:

| Really, since you are building this driver from source, the correct
| course of action is to find why it compiled against i386 rather than xen
| architecture. It is probably something really simple: e.g., did you
| build the driver while in native Linux and only after that reboot into
| XenLinux? If you know how, taking a look at the compiler command line
| and Makefiles will probably make it fairly obvious how to make the
| driver look at arch/xen rather than arch/i386.
|
|  -- Keir


Really, I can't figure out how to check if I'm building against arch/xen or
arch/i386.
Can you help me?
Thanks
	Marco

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCJPWUihYL426P5W4RAsk0AKCeb6y8FF4jspbh82tAhz8WwmWuGACePkV1
EM8KcdbrXrJnBF8QyhmtAq8=
=ED8j
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-25 15:00       ` Crash Keir Fraser
@ 2005-02-27 23:46         ` visik7
  2005-03-01 23:07         ` Crash visik7
  1 sibling, 0 replies; 45+ messages in thread
From: visik7 @ 2005-02-27 23:46 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Keir Fraser wrote:

| After writing my previous email I've concluded that, even if we fix
|  cli/sti instructions, there are other arch-dependencies that are
| much harder for us to handle automatically, or even with static
| rewriting of the driver binary.
|
| Really, since you are building this driver from source, the correct
|  course of action is to find why it compiled against i386 rather
| than xen architecture. It is probably something really simple:
| e.g., did you build the driver while in native Linux and only
| after that reboot into XenLinux? If you know how, taking a look at
| the compiler command line and Makefiles will probably make it
| fairly obvious how to make the driver look at arch/xen rather than
| arch/i386.
|
| -- Keir
|
|
OK, I did the following steps:
I compiled a linux-2.6.10-xen0 kernel and use it for both dom0 and domU.
Then, after starting a domU, I put the source of the linux-2.6.10-xen0 tree into
/usr/src/linux-2.6.10 (also getting linux-2.6.10-xen-sparse and
putting it in the same directory as the source, so there are no broken links).
After doing this I compiled my driver against this source.
I checked the source tree of the module, but there is no explicit
reference to i386.
Thank you
    Marco
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCIlvTihYL426P5W4RAkErAJ9+h9J1ZF0N3aSeFq+S1437IvKLHQCfSvAD
+al2WPxpY2sr0wqXjMfE8HI=
=HNSj
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-25  7:59     ` Crash visik7
@ 2005-02-25 15:00       ` Keir Fraser
  2005-02-27 23:46         ` Crash visik7
  2005-03-01 23:07         ` Crash visik7
  0 siblings, 2 replies; 45+ messages in thread
From: Keir Fraser @ 2005-02-25 15:00 UTC (permalink / raw)
  To: visik7; +Cc: xen-devel


On 25 Feb 2005, at 07:59, visik7 wrote:

> I understand.
> I'm running xen stable 2.0.4 but I'm planning to upgrade to unstable
> if it solves my problem, the question is
> how much unstable is 'unstable', I mean for a production system.
> still one thing: the first oops occured while apt-get install cvs on
> domain1 then no modprobe or access to the zaptel driver,
> what happened ?
> I ask you this 'couse I'm totally incompetent about reading an oops or
> debugging a problem like this
> thank you
>
Marco,

After writing my previous email I've concluded that, even if we fix
cli/sti instructions, there are other arch-dependencies that are much 
harder for us to handle automatically, or even with static rewriting of 
the driver binary.

Really, since you are building this driver from source, the correct 
course of action is to find why it compiled against i386 rather than 
xen architecture. It is probably something really simple: e.g., did you 
build the driver while in native Linux and only after that reboot into 
XenLinux? If you know how, taking a look at the compiler command line 
and Makefiles will probably make it fairly obvious how to make the 
driver look at arch/xen rather than arch/i386.

  -- Keir
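
As a rough starting point for the check described above, the following Python sketch reports which architecture a 2.6-era kernel source tree is currently set up for. It relies on the assumption that the tree has been prepared, so that `include/asm` is a symlink to `asm-<arch>` (for example `asm-xen` or `asm-i386`). The function names are made up for illustration. This only inspects the tree itself; the compiler command line of the driver build remains the more reliable evidence.

```python
import os


def asm_symlink_target(kernel_tree):
    """Return the target of <kernel_tree>/include/asm, or None if it is not a symlink."""
    link = os.path.join(kernel_tree, "include", "asm")
    if not os.path.islink(link):
        return None
    return os.readlink(link)


def configured_arch(kernel_tree):
    """Guess the configured architecture from the include/asm symlink.

    Assumption: the symlink points at a directory named asm-<arch>.
    Returns None when the tree has no such symlink.
    """
    target = asm_symlink_target(kernel_tree)
    if target is None:
        return None
    name = os.path.basename(target.rstrip("/"))
    return name[len("asm-"):] if name.startswith("asm-") else name
```

For the tree mentioned in this thread, `configured_arch("/usr/src/linux-2.6.10")` would be expected to give `"xen"` if the tree is set up for XenLinux and `"i386"` if it is set up for native Linux.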




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-25  7:10   ` Crash Keir Fraser
@ 2005-02-25  7:59     ` visik7
  2005-02-25 15:00       ` Crash Keir Fraser
  0 siblings, 1 reply; 45+ messages in thread
From: visik7 @ 2005-02-25  7:59 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
Keir Fraser wrote:

| Your compiled driver contains privileged instructions that guests
| are not allowed to execute. Either these are hardcoded within the
| driver or it has compiled itself against arch/i386 rather than
| arch/xen.
|
| If you are running on xen-unstable then we can add CLI/STI to the
| instructions that we emulate and that may get the driver working.
| But we don't emulate instructions in 2.0-testing so if you are
| running the stable series you will have to dig into the driver a
| bit and work out how to pull in the Xen definitions of cli() and
| sti().
|
| -- Keir

I understand.
I'm running Xen stable 2.0.4, but I'm planning to upgrade to unstable
if it solves my problem. The question is
how unstable 'unstable' is, I mean for a production system.
One more thing: the first oops occurred during apt-get install cvs on
domain1, so there was no modprobe of or access to the zaptel driver at the time.
What happened?
I ask you this because I'm totally incompetent at reading an oops or
debugging a problem like this.
Thank you
    Marco

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
 
iD8DBQFCHtrtihYL426P5W4RAgCqAJ9Ss3TWdgnwpLqEykANftaG4jAp8QCfQczh
ZUGSn7iBAoWjweq1/A+z8TY=
=ELnH
-----END PGP SIGNATURE-----




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-25  0:24 ` Crash visik7
@ 2005-02-25  7:10   ` Keir Fraser
  2005-02-25  7:59     ` Crash visik7
  0 siblings, 1 reply; 45+ messages in thread
From: Keir Fraser @ 2005-02-25  7:10 UTC (permalink / raw)
  To: visik7; +Cc: xen-devel


On 25 Feb 2005, at 00:24, visik7 wrote:

> Ok At least I'm able to reproduce something
> all the stuff are in
> http://www.junghanns.net/asterisk/downloads/bristuff-0.2.0-RC7f.tar.gz
> this tgz contain a script that download zaptel drivers
> and compile it against the running kernel
> this is the problem:
>
Your compiled driver contains privileged instructions that guests are 
not allowed to execute. Either these are hardcoded within the driver or 
it has compiled itself against arch/i386 rather than arch/xen.

If you are running on xen-unstable then we can add CLI/STI to the 
instructions that we emulate and that may get the driver working. But 
we don't emulate instructions in 2.0-testing so if you are running the 
stable series you will have to dig into the driver a bit and work out
how to pull in the Xen definitions of cli() and sti().

  -- Keir
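
The diagnosis above can be cross-checked against the `Code:` line of an oops, where the byte at the faulting EIP is shown in angle brackets. On x86, opcode `0xfa` is CLI and `0xfb` is STI, and both general protection faults reported in this thread mark `<fa>`. A minimal Python sketch of that check follows; the function names are made up for illustration, and only these two opcodes are looked up.

```python
import re

# x86 single-byte opcodes for the two interrupt-flag instructions.
PRIVILEGED = {0xFA: "cli", 0xFB: "sti"}


def faulting_byte(code_line):
    """Return the byte marked <xx> in an oops 'Code:' line, or None if absent."""
    match = re.search(r"<([0-9a-fA-F]{2})>", code_line)
    return int(match.group(1), 16) if match else None


def privileged_mnemonic(code_line):
    """Return 'cli' or 'sti' if the faulting byte is one of them, else None."""
    return PRIVILEGED.get(faulting_byte(code_line))
```

A result of `None` only means the faulting byte is not CLI or STI; it says nothing about other privileged instructions.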




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2005-02-24 22:45 Crash visik7
@ 2005-02-25  0:24 ` visik7
  2005-02-25  7:10   ` Crash Keir Fraser
  0 siblings, 1 reply; 45+ messages in thread
From: visik7 @ 2005-02-25  0:24 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
OK, at least I'm able to reproduce something.
All the stuff is in
http://www.junghanns.net/asterisk/downloads/bristuff-0.2.0-RC7f.tar.gz
This tgz contains a script that downloads the zaptel drivers
and compiles them against the running kernel.
This is the problem:

domain1:~# lsmod
Module                  Size  Used by
domain1:~# modprobe zaptel
Zapata Telephony Interface Registered on major 196
domain1:~# lsmod
Module                  Size  Used by
zaptel                222468  0
crc_ccitt               2176  1 zaptel
domain1:~# insmod bristuff-0.2.0-RC7f/zaphfc/zaphfc.ko
zaphfc: no version for "zt_receive" found: kernel tainted.
PCI: Enabling device 0000:00:0b.0 (0002 -> 0003)
PCI: Obtained IRQ 19 for device 0000:00:0b.0
zaphfc: CCD/Billion/Asuscom 2BD0 configured at mem 0xc48be000 fifo 0xc3b08000(0x3b08000) IRQ 19 HZ 1000
zaphfc: Card 0 configured for TE mode
general protection fault: 0000 [#1]
PREEMPT
Modules linked in: zaphfc zaptel crc_ccitt
CPU:    0
EIP:    0061:[<c49301ad>]    Tainted: GF     VLI
EFLAGS: 00010202   (2.6.10)
EIP is at zt_chan_reg+0x9/0xd8 [zaptel]
eax: 00000002   ebx: 00000001   ecx: c48e896a   edx: c3af0114
esi: c3af0114   edi: 00000202   ebp: c3af0114   esp: c39b1ec8
ds: 007b   es: 007b   ss: 0069
Process insmod (pid: 970, threadinfo=c39b0000 task=c10b7080)
Stack: 00000001 00000000 c3af0004 c492256c c3af0114 ffffffff c01b9a8f c3af1241
       ffffffff 00000003 00000000 0000000a ffffffff 00000000 00000002 ffffffff
       ffffffff 00000003 c3af1110 c3af1aac 00000000 c01b9d52 c3af1238 3c50edc8
Call Trace:
 [<c492256c>] zt_register+0xcc/0x218 [zaptel]
 [<c01b9a8f>] vsnprintf+0x223/0x46c
 [<c01b9d52>] vsprintf+0x16/0x1c
 [<c01b9d6a>] sprintf+0x12/0x18
 [<c48e81b9>] zthfc_initialize+0x17d/0x1b8 [zaphfc]
 [<c0118655>] vprintk+0xfd/0x188
 [<c48e83d1>] hfc_findCards+0x1dd/0x3a4 [zaphfc]
 [<c48e85d2>] init_module+0x3a/0x60 [zaphfc]
 [<c012e1e2>] sys_init_module+0x15e/0x20c
 [<c0109368>] syscall_call+0x7/0xb
Code: 00 6a 00 ba 01 00 00 00 e8 89 50 7e fb 59 eb 95 89 f6 5b c3 89 f6 8b 93 f4 01 00 00 e9 53 ff ff ff 90 57 56 53 8b 74 24 10 9c 5f <fa> ba 00 e0 ff ff 21 e2 ff 42 14 bb 01 00 00 00 8d 76 00 8b 14
 <1>general protection fault: 0000 [#2]
PREEMPT
Modules linked in: zaphfc zaptel crc_ccitt
CPU:    0
EIP:    0061:[<c48e7bda>]    Tainted: GF     VLI
EFLAGS: 00010286   (2.6.10)
EIP is at hfc_interrupt+0x1e/0x33c [zaphfc]
eax: 00000000   ebx: c3ae6800   ecx: c130ef60   edx: c1323998
esi: 00000286   edi: c1323998   ebp: 00000013   esp: c1323918
ds: 007b   es: 007b   ss: 0069
Process syslogd (pid: 892, threadinfo=c1322000 task=c12f15a0)
Stack: 00000000 c130e380 c130ef60 00000000 c1323998 00000013 c012faf7 00000013
       c3ae6800 c1323998 00000000 c1322000 00000013 c0364400 c1322000 c012fc09
       c130ef60 c1323998 00000000 00000000 fbffb000 00000000 c010ddad c0105cc7
Call Trace:
 [<c012faf7>] handle_IRQ_event+0x2f/0x74
 [<c012fc09>] __do_IRQ+0xcd/0x128
 [<c010ddad>] do_IRQ+0x19/0x24
 [<c0105cc7>] evtchn_do_upcall+0x9f/0x100
 [<c0109517>] hypervisor_callback+0x37/0x40
 [<c0105c26>] force_evtchn_callback+0xa/0xc
 [<c01fa21c>] xencons_tx_flush_task_routine+0x4c/0x50
 [<c010545a>] __ctrl_if_tx_tasklet+0x14a/0x164
 [<c011cb97>] tasklet_action+0x4f/0x94
 [<c011c91b>] __do_softirq+0x8f/0xa0
 [<c011c975>] do_softirq+0x49/0x4c
 [<c012fac5>] irq_exit+0x35/0x38
 [<c010ddb2>] do_IRQ+0x1e/0x24
 [<c0105cc7>] evtchn_do_upcall+0x9f/0x100
 [<c0109517>] hypervisor_callback+0x37/0x40
 [<c019c212>] reiserfs_paste_into_item+0x17e/0x238
 [<c018bb11>] reiserfs_allocate_blocks_for_region+0x8d5/0x1358
 [<c0199e38>] search_for_position_by_key+0x8c/0x37c
 [<c01861d0>] make_cpu_key+0x30/0x34
 [<c0198951>] pathrelse+0x21/0x30
 [<c018d996>] reiserfs_file_write+0x42a/0x5ec
 [<c0108785>] setup_sigcontext+0x385/0x47c
 [<c0231c59>] sys_recvfrom+0x9d/0xf4
 [<c0231c99>] sys_recvfrom+0xdd/0xf4
 [<c02366c3>] datagram_poll+0x1f/0xbc
 [<c015cba4>] poll_freewait+0x3c/0x44
 [<c015ced3>] do_select+0x1a7/0x2b0
 [<c01bab16>] copy_from_user+0x2e/0x54
 [<c014cc40>] do_readv_writev+0x158/0x218
 [<c018d56c>] reiserfs_file_write+0x0/0x5ec
 [<c0231cc9>] sys_recv+0x19/0x20
 [<c014cd84>] vfs_writev+0x3c/0x48
 [<c014ce33>] sys_writev+0x3b/0x68
 [<c0109368>] syscall_call+0x7/0xb
Code: f6 81 c3 00 02 00 00 e9 27 fe ff ff 90 55 57 56 53 83 ec 08 8b 5c 24 20 31 c0 85 db 74 30 8b 73 10 85 f6 0f 84 08 03 00 00 9c 5e <fa> ba 00 e0 ff ff 21 e2 ff 42 14 8b 4b 10 8a 41 70 84 c0 78 1d
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <0>Rebooting in 1 seconds..

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
 
iD8DBQFCHnA8ihYL426P5W4RAshFAJ9c4t03dbql+rhGV19tPZ4tOepA/ACbBNtB
fEBQcGxAs6Ev5szT0Vj2Rpc=
=Arxz
-----END PGP SIGNATURE-----




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Crash
@ 2005-02-24 22:45 visik7
  2005-02-25  0:24 ` Crash visik7
  0 siblings, 1 reply; 45+ messages in thread
From: visik7 @ 2005-02-24 22:45 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
This happened while running apt-get install cvs in a non-0 domain:

Unable to handle kernel NULL pointer dereference at virtual address 000000b2
 printing eip:
c019e50c
*pde = ma 00000000 pa 55555000
 [<c017eeb6>] scan_bitmap_block+0xea/0x2c0
 [<c017f338>] scan_bitmap+0x1f4/0x228
 [<c0180397>] reiserfs_allocate_blocknrs+0x237/0x514
 [<c018b458>] reiserfs_allocate_blocks_for_region+0x21c/0x1358
 [<c0199e38>] search_for_position_by_key+0x8c/0x37c
 [<c0199f2e>] search_for_position_by_key+0x182/0x37c
 [<c0150afd>] alloc_buffer_head+0x11/0x44
 [<c01861d0>] make_cpu_key+0x30/0x34
 [<c0198951>] pathrelse+0x21/0x30
 [<c018d996>] reiserfs_file_write+0x42a/0x5ec
 [<c013ef45>] do_wp_page+0x381/0x414
 [<c01201b1>] update_process_times+0x29/0x30
 [<c01124bd>] do_page_fault+0x3c5/0x5b0
 [<c0112544>] do_page_fault+0x44c/0x5b0
 [<c0127598>] rcu_check_quiescent_state+0x58/0x6c
 [<c0127658>] __rcu_process_callbacks+0xa8/0x10c
 [<c012030a>] run_timer_softirq+0x13a/0x1d0
 [<c012faf7>] handle_IRQ_event+0x2f/0x74
 [<c012faf7>] handle_IRQ_event+0x2f/0x74
 [<c014c8b3>] vfs_write+0x8b/0xd0
 [<c014c99b>] sys_write+0x3b/0x68
 [<c0109368>] syscall_call+0x7/0xb
Oops: 0000 [#1]
PREEMPT
Modules linked in: zaptel crc_ccitt
CPU:    0
EIP:    0061:[<c019e50c>]    Not tainted VLI
EFLAGS: 00010206   (2.6.10)
EIP is at reiserfs_in_journal+0xc0/0x180
eax: 000000aa   ebx: 0003b700   ecx: 000004bd   edx: c4811000
esi: 00000001   edi: c0455000   ebp: 00003700   esp: c1da9be0
ds: 007b   es: 007b   ss: 0069
Process frontend (pid: 5190, threadinfo=c1da8000 task=c1f97a40)
Stack: c4811000 00003700 c480f038 c0455000 c1da9c60 c017eeb6 c0455000 00000007
       00003700 00000001 c1da9c10 000036f2 00000000 00000000 00000007 c0455000
       c1da9c60 c0444800 c017f338 c1da9eb4 00000007 c1da9c60 00008000 00000001
Call Trace:
 [<c017eeb6>] scan_bitmap_block+0xea/0x2c0
 [<c017f338>] scan_bitmap+0x1f4/0x228
 [<c0180397>] reiserfs_allocate_blocknrs+0x237/0x514
 [<c018b458>] reiserfs_allocate_blocks_for_region+0x21c/0x1358
 [<c0199e38>] search_for_position_by_key+0x8c/0x37c
 [<c0199f2e>] search_for_position_by_key+0x182/0x37c
 [<c0150afd>] alloc_buffer_head+0x11/0x44
 [<c01861d0>] make_cpu_key+0x30/0x34
 [<c0198951>] pathrelse+0x21/0x30
 [<c018d996>] reiserfs_file_write+0x42a/0x5ec
 [<c013ef45>] do_wp_page+0x381/0x414
 [<c01201b1>] update_process_times+0x29/0x30
 [<c01124bd>] do_page_fault+0x3c5/0x5b0
 [<c0112544>] do_page_fault+0x44c/0x5b0
 [<c0127598>] rcu_check_quiescent_state+0x58/0x6c
 [<c0127658>] __rcu_process_callbacks+0xa8/0x10c
 [<c012030a>] run_timer_softirq+0x13a/0x1d0
 [<c012faf7>] handle_IRQ_event+0x2f/0x74
 [<c012faf7>] handle_IRQ_event+0x2f/0x74
 [<c014c8b3>] vfs_write+0x8b/0xd0
 [<c014c99b>] sys_write+0x3b/0x68
 [<c0109368>] syscall_call+0x7/0xb
Code: d8 c1 f8 0d c1 e2 07 31 c2 89 f9 8d 04 1b 31 c2 c1 e9 07 31 d1 81 e1 ff 1f 00 00 8b 14 24 8b 84 8a 0c 81 00 00 85 c0 74 12 89 f6 <39> 58 08 0f 84 83 00 00 00 8b 40 20 85 c0 75 f0 31 d2 85 d2 b8
 dpkg: error processing cvs (--configure):
 subprocess post-installation script killed by signal (Segmentation fault)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
 
iD8DBQFCHlkjihYL426P5W4RAtOmAJwJ+XrtMX4m8RwIGsZbFZBgBuoeKgCdFF/M
oFwnci+jObh1YtltQWIlCXk=
=q0O+
-----END PGP SIGNATURE-----




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Crash
@ 2002-07-07 23:34 Jarda Gress
  0 siblings, 0 replies; 45+ messages in thread
From: Jarda Gress @ 2002-07-07 23:34 UTC (permalink / raw)
  To: linux-kernel, jgress

[-- Attachment #1: Type: text/plain, Size: 201 bytes --]

The kernel oopsed.
I think it's in the driver for sda. I have 2 IDE drives. One CD-ROM and one Zip.
No SCSI controllers or devices.
Attached is the log.
Thanks for delivering this to the right person.
Jarda



[-- Attachment #2: messages --]
[-- Type: text/plain, Size: 4845 bytes --]

Jul  7 01:49:37 localhost syslogd 1.4.1: restart.
Jul  7 01:50:04 localhost net_monitor.real[1671]: launched command: /usr/sbin/logdrake --file=/var/log/messages &
Jul  7 01:50:08 localhost net_monitor.real[1671]: launched command: /usr/sbin/logdrake --file=/var/log/messages &
Jul  7 01:50:09 localhost logdrake[1876]: ### Program is starting ###
Jul  7 01:50:13 localhost net_monitor.real[1671]: launched command: /usr/sbin/logdrake --file=/var/log/messages &
Jul  7 01:50:19 localhost logdrake[1878]: ### Program is starting ###
Jul  7 01:50:21 localhost dhcpcd[1710]: timed out waiting for a valid DHCP server response 
Jul  7 01:50:26 localhost logdrake[1880]: ### Program is starting ###
Jul  7 01:51:11 localhost kernel: CSLIP: code copyright 1989 Regents of the University of California
Jul  7 01:51:11 localhost kernel: PPP generic driver version 2.4.1
Jul  7 01:51:11 localhost pppd[1892]: pppd 2.4.1 started by jarda, uid 501
Jul  7 01:51:11 localhost pppd[1892]: Using interface ppp0
Jul  7 01:51:11 localhost pppd[1892]: Connect: ppp0 <--> /dev/tts/1
Jul  7 01:51:35 localhost pppd[1892]: Hangup (SIGHUP)
Jul  7 01:51:35 localhost pppd[1892]: Modem hangup
Jul  7 01:51:35 localhost pppd[1892]: Connection terminated.
Jul  7 01:51:35 localhost pppd[1892]: Exit.
Jul  7 01:52:18 localhost pppd[1900]: pppd 2.4.1 started by jarda, uid 501
Jul  7 01:52:18 localhost pppd[1900]: Using interface ppp0
Jul  7 01:52:18 localhost pppd[1900]: Connect: ppp0 <--> /dev/tts/1
Jul  7 01:52:22 localhost kernel: PPP BSD Compression module registered
Jul  7 01:52:23 localhost kernel: PPP Deflate Compression module registered
Jul  7 01:52:23 localhost pppd[1900]: local  IP address 210.50.68.22
Jul  7 01:52:23 localhost pppd[1900]: remote IP address 192.168.34.1
Jul  7 01:52:23 localhost pppd[1900]: primary   DNS address 203.134.64.66
Jul  7 01:52:23 localhost pppd[1900]: secondary DNS address 203.134.65.66
Jul  7 01:53:44 localhost kernel: Device not ready.  Make sure there is a disc in the drive.
Jul  7 01:53:44 localhost kernel: sda : READ CAPACITY failed.
Jul  7 01:53:44 localhost kernel: sda : status = 0, message = 00, host = 0, driver = 28 
Jul  7 01:53:44 localhost kernel: Current sd00:00: sense key Not Ready
Jul  7 01:53:44 localhost kernel: Additional sense indicates Medium not present
Jul  7 01:53:44 localhost kernel: sda : block size assumed to be 512 bytes, disk size 1GB.  
Jul  7 01:53:44 localhost kernel:  /dev/scsi/host0/bus0/target0/lun0: I/O error: dev 08:00, sector 0
Jul  7 01:53:44 localhost kernel:  I/O error: dev 08:00, sector 0
Jul  7 01:53:44 localhost kernel: Unable to handle kernel paging request at virtual address 204f2f8d
Jul  7 01:53:44 localhost kernel:  printing eip:
Jul  7 01:53:44 localhost kernel: c0160783
Jul  7 01:53:44 localhost kernel: *pde = 00000000
Jul  7 01:53:44 localhost kernel: Oops: 0000
Jul  7 01:53:44 localhost kernel: CPU:    0
Jul  7 01:53:44 localhost kernel: EIP:    0010:[scan_dir_for_removable+19/64]    Tainted: P 
Jul  7 01:53:44 localhost kernel: EIP:    0010:[<c0160783>]    Tainted: P 
Jul  7 01:53:44 localhost kernel: EFLAGS: 00010202
Jul  7 01:53:44 localhost kernel: eax: c4b4dd20   ebx: 204f2f49   ecx: 00000000   edx: c4b4dd20
Jul  7 01:53:44 localhost kernel: esi: c54cb500   edi: c58efb60   ebp: c18525c0   esp: c4f03f28
Jul  7 01:53:44 localhost kernel: ds: 0018   es: 0018   ss: 0018
Jul  7 01:53:44 localhost kernel: Process msec_find (pid: 1873, stackpage=c4f03000)
Jul  7 01:53:44 localhost kernel: Stack: c54cb500 c0160c16 c58efb60 c0265a40 00000000 c54cb500 c54cb580 c54cb56c 
Jul  7 01:53:44 localhost kernel:        c18525c0 c0141690 c18525c0 c4f03fa0 c0141b90 c18525c0 fffffff7 0000000d 
Jul  7 01:53:44 localhost kernel:        bfffeae8 c0141d3f c18525c0 c0141b90 c4f03fa0 c5816c40 c01338f7 c5816c40 
Jul  7 01:53:44 localhost kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144] [filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352] 
Jul  7 01:53:44 localhost kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>] [<c0141d3f>] [<c0141b90>] 
Jul  7 01:53:44 localhost kernel:    [sys_fchdir+199/224] [system_call+51/64] 
Jul  7 01:53:44 localhost kernel:    [<c01338f7>] [<c0106f23>] 
Jul  7 01:53:44 localhost kernel: 
Jul  7 01:53:44 localhost kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6 43 10 04 74 
Jul  7 01:53:56 localhost anacron[1285]: Job `cron.daily' terminated (mailing output)
Jul  7 02:01:00 localhost CROND[2006]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly) 
Jul  7 02:06:07 localhost init: Switching to runlevel: 0
Jul  7 02:06:10 localhost autologin(pam_unix)[1388]: session closed for user jarda
Jul  7 02:07:45 localhost init: Switching to runlevel: 6
Jul  7 02:14:22 localhost syslogd 1.4.1: restart.
Jul  7 02:14:22 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Crash
  2001-05-07 15:01 ` Crash Alan Cox
@ 2001-05-13 18:11   ` Anuradha Ratnaweera
  0 siblings, 0 replies; 45+ messages in thread
From: Anuradha Ratnaweera @ 2001-05-13 18:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: C.Praveen, linux-kernel


On Mon, 7 May 2001, Alan Cox wrote:

> > Is it possible to screw up the hardware entirely from software? I made
> 
> In an abstract theoretical sense yes. Accidentally almost impossible.

There _were_ some viruses (in M$ world) that added "expensive" operations
to every disk access, such as reading from the extreme ends of the disk,
so that the head of the hard disk might eventually fail.

Also, I have heard a hum coming out of a slightly old monitor (optiplex)
when set to run X with high resolutions (well above horizontal/vertical
frequency limits). This monitor eventually failed when operating in the
_safe_ regime. However, I suspect that this was a problem with the monitor
rather than software.

Regards,

Anuradha





* Re: Crash
  2001-05-07 14:37 Crash C.Praveen
@ 2001-05-07 15:01 ` Alan Cox
  2001-05-13 18:11   ` Crash Anuradha Ratnaweera
  0 siblings, 1 reply; 45+ messages in thread
From: Alan Cox @ 2001-05-07 15:01 UTC (permalink / raw)
  To: C.Praveen; +Cc: linux-kernel

> Is it possible to screw up the hardware entirely from software? I made

In an abstract theoretical sense yes. Accidentally almost impossible.

> know is if there is any way to screw the board from software in such a way
> that power off and power on does not bring it up ?.

The only thing people are ever likely to do is corrupt the CMOS, which is
easily cleared.

> It's a dual pentium-3 machine. The power supply is gone also, the power
> supply from the crashed machine does not bring up another normal computer,
> also power supply from normal computer does not bring up crashed computer.

Sounds like a rather more physical layer problem - like a power spike and
PSU failure.

BTW: Always put a voltmeter on a power supply before you swap it like that
to test it. You need to check that the voltages under load look sane,
otherwise you may end up using a failed PSU to blow up other motherboards,
which is a rather expensive debugging error ;)

Alan



* Crash
@ 2001-05-07 14:37 C.Praveen
  2001-05-07 15:01 ` Crash Alan Cox
  0 siblings, 1 reply; 45+ messages in thread
From: C.Praveen @ 2001-05-07 14:37 UTC (permalink / raw)
  To: linux-kernel

Hello,

Is it possible to screw up the hardware entirely from software? I made
some changes to the 2.4.2 kernel to support save/restore of the event
counters. It crashed and does not come up at all. What I would like to
know is if there is any way to screw the board from software in such a way
that power off and power on does not bring it up?

It's a dual pentium-3 machine. The power supply is gone too: the power
supply from the crashed machine does not bring up another normal computer,
and the power supply from a normal computer does not bring up the crashed
computer, so there must be something really wrong with the motherboard.
I'd like to know if it was because of me. Is it possible to do things to
the motherboard from software (I did change things in the kernel, the timer
ISR too) so that the machine won't boot at all when power is turned off and
then on? From the above, is it very likely that the power supply went out
and took the board with it?

And by "doesn't come up" I mean totally blank: absolutely no output at
all.

*Any* help/comments please!

Praveen C





Thread overview: 45+ messages
2009-07-23 12:55 Crash Andrea Gelmini
     [not found] ` <9cdbb57f0907230555k768383c2ld1690d31cc6fff83-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-07-23 16:12   ` Crash Ryusuke Konishi
     [not found]     ` <20090724.011249.110726474.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-07-23 21:02       ` Crash Andrea Gelmini
     [not found]         ` <9cdbb57f0907231402i1a92cb4qfe5a9d81346a4665-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-07-23 21:20           ` Crash Andrea Gelmini
     [not found]             ` <9cdbb57f0907231420y4122d649y69fee2273a05b4cc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-07-27  0:40               ` Crash Jiro SEKIBA
     [not found]                 ` <873a8jhsbd.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
2009-07-27  7:58                   ` Crash Andrea Gelmini
2009-07-29  2:49                   ` Crash Jiro SEKIBA
     [not found]                     ` <87eis0mcev.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
2009-07-29  3:46                       ` Crash Ryusuke Konishi
     [not found]                         ` <20090729.124638.38314632.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-07-29  4:40                           ` Crash Jiro SEKIBA
     [not found]                             ` <874osw14pz.wl%jir-27yqGEOhnJbQT0dZR+AlfA@public.gmane.org>
2009-07-29  5:08                               ` Crash Ryusuke Konishi
     [not found]                                 ` <20090729.140821.103585622.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-08-10  6:54                                   ` kernel oops on shrink_page_list (was Re: Crash...) Ryusuke Konishi
     [not found]                                     ` <20090810.155420.42596352.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-08-25 10:50                                       ` kernel oops on shrink_page_list Ryusuke Konishi
2009-08-01 13:39                   ` Crash Andrea Gelmini
     [not found]                     ` <9cdbb57f0908010639l26c26182ma121b0d7672003e0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-08-02  7:58                       ` Crash Jiro SEKIBA
2009-07-24  8:58           ` Crash Reinoud Zandijk
     [not found]             ` <20090724085803.GA23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2009-07-24  9:47               ` Crash Andrea Gelmini
     [not found]                 ` <9cdbb57f0907240247n5ffd6f81yaee39eb386516c25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-07-24 10:02                   ` Crash Reinoud Zandijk
2009-07-24 10:47                   ` Crash Ryusuke Konishi
2009-07-24 10:46               ` Crash Ryusuke Konishi
     [not found]                 ` <20090724.194617.88653682.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-07-24 11:13                   ` Crash Reinoud Zandijk
     [not found]                     ` <20090724111333.GE23256-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2009-07-27  7:45                       ` Crash Ryusuke Konishi
2009-07-29  2:46           ` Crash Ryusuke Konishi
     [not found]             ` <20090729.114604.56042421.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-08-01 13:36               ` Crash Andrea Gelmini
     [not found]                 ` <9cdbb57f0908010636u7296da29p61df192dc35d0d12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-08-01 13:56                   ` Crash Ryusuke Konishi
  -- strict thread matches above, loose matches on Subject: below --
2020-02-06 20:58 crash Frank Esposito
2020-02-07  5:28 ` crash Rebecca Cran
     [not found] <ee5afd761002182235r1fe20b0kc7ef7082a5a907e3@mail.gmail.com>
     [not found] ` <ee5afd761002182235r1fe20b0kc7ef7082a5a907e3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-20 13:38   ` crash Ryusuke Konishi
2010-02-10  0:59 crash Ryusuke Konishi
     [not found] ` <201002100059.AA01340-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
2010-02-10  4:07   ` crash Ryusuke Konishi
     [not found]     ` <ee5afd761002092314i1ba1ec66ie6fe7d0f22d6927e@mail.gmail.com>
     [not found]       ` <ee5afd761002092314i1ba1ec66ie6fe7d0f22d6927e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-10  7:26         ` crash Jan de Kruyf
     [not found]           ` <ee5afd761002092326k3bb3a74fq7145cdb9925d88d5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-11  5:30             ` crash Ryusuke Konishi
     [not found]               ` <20100211.143001.184824921.ryusuke-sG5X7nlA6pw@public.gmane.org>
2010-02-11 17:43                 ` crash Jan de Kruyf
     [not found]                   ` <ee5afd761002110943y1ca061bdi610de2f1a5df3c32-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-14  7:34                     ` crash Ryusuke Konishi
2005-02-24 22:45 Crash visik7
2005-02-25  0:24 ` Crash visik7
2005-02-25  7:10   ` Crash Keir Fraser
2005-02-25  7:59     ` Crash visik7
2005-02-25 15:00       ` Crash Keir Fraser
2005-02-27 23:46         ` Crash visik7
2005-03-01 23:07         ` Crash visik7
2005-03-02  7:32           ` Crash Keir Fraser
2005-03-02 11:52             ` Crash visik7
2002-07-07 23:34 Crash Jarda Gress
2001-05-07 14:37 Crash C.Praveen
2001-05-07 15:01 ` Crash Alan Cox
2001-05-13 18:11   ` Crash Anuradha Ratnaweera
