* Broken nilfs2 filesystem
@ 2013-05-22 20:33 Anton Eliasson
       [not found] ` <519D2B96.9000106-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-22 20:33 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: devel-17Olwe7vw2dLC78zk6coLg

[-- Attachment #1: Type: text/plain, Size: 18942 bytes --]

Greetings!
It pains me to report that my /home filesystem broke down today. My 
system is running Arch Linux 64-bit. The filesystem resides on a Crucial 
M4 256 GB SSD, on top of an LVM2 volume. The drive and filesystem are 
both around six months old. Partition table and error log excerpts are 
at the bottom of this e-mail. Full logs are available upon request.

I am providing this information as a bug report. I have no reason to 
suspect the hardware but I cannot exclude it either. If you (the 
developers) are interested in troubleshooting this for posterity, I can 
be your hands and run whatever tools are required. If not, I'll reformat 
the filesystem, restore the data from backup and forget that this happened.

In case the formatting gets mangled, this e-mail is also available at

What happened today, in chronological order:

~18:00
======
I am troubleshooting some issues that turn out to be caused by a wrongly 
configured system clock. The RTC (hardware clock) is set to local time 
(UTC+2) but the OS is configured to treat the RTC as UTC. This is 
because it was set to UTC previously, but then I reinstalled Windows 
which promptly reset it to local time.

This set the mtime of some files in both / and /home to dates in the 
future. When I discovered this, I `touch`ed all affected files (`touch 
now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset 
their mtime and rebooted the system. I do not know if this is relevant; 
if not, it makes reading the log files more fun.
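
A non-destructive variant of the same find invocation (a sketch, reusing 
the `now` reference file) just lists the files with a future mtime 
instead of touching them:

     touch now
     sudo find / /home -xdev -newer now -print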

I then launch my command-line backup program "bup", Firefox and some 
other apps.

~18:50-19:00
============
Firefox freezes. The system keeps running but I can't launch new 
programs. It looked like all I/O broke down. However, bup kept running. 
I left the computer alone for perhaps 30-60 min.

~20:00
======
When I came back, bup had frozen (/var/log/messages at 18:53:31).[1] I 
restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and 
return to the login screen. The system freezes during login, though, 
probably because /home had been remounted read-only. So I reboot 
using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some 
I/O errors during shutdown.

After the reboot there are no immediate signs of disaster. I launch bup 
again. Some time later, /home is remounted read-only. I notice that bup 
has reported I/O errors while reading some files in /home.[2] dmesg and 
/var/log/kern.log contain errors mentioning "bad btree node" and 
"nilfs_bmap_lookup_contig: broken bmap".[3]

I proceed to examine one of the files that bup reported I/O errors for:

     [2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> LANG=C stat 179.JPG
       File: '179.JPG'
       Size: 3774546   	Blocks: 7416       IO Block: 4096   regular file
     Device: fe03h/65027d	Inode: 136492      Links: 1
     Access: (0644/-rw-r--r--)  Uid: ( 1000/   anton)   Gid: ( 1000/ 
anton)
     Access: 2013-03-28 22:04:48.000000000 +0100
     Modify: 2013-03-28 22:04:48.000000000 +0100
     Change: 2013-04-30 16:40:34.053914840 +0200
      Birth: -
     [2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> LANG= cat 179.JPG > /dev/null
     cat: 179.JPG: Input/output error
     [2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> LANG=C cat 179.JPG > /dev/null
     cat: 179.JPG: Input/output error
     [2/5.0.2]{1}anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> dmesg | tail

     [ 3762.363260] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [ 3762.363269] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     [ 3855.881972] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [ 3855.881980] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     [ 3857.977754] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [ 3857.977763] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     [2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> sudo journalctl -xn
     [sudo] password for anton:
     -- Logs begin at fre 2012-10-19 17:29:01 CEST, end at ons 
2013-05-22 21:12:17 CEST. --
     maj 22 21:09:45 riven sudo[1545]: pam_unix(sudo:session): session 
closed for user root
     maj 22 21:10:42 riven kernel: NILFS: bad btree node 
(blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
     maj 22 21:10:42 riven kernel: [87B blob data]
     maj 22 21:11:12 riven sudo[1563]: anton : TTY=pts/3 ; 
PWD=/Athena/Dump/nilfs-felsökning-2013-05-22 ; USER=root ; COMM
     maj 22 21:11:12 riven sudo[1563]: pam_unix(sudo:session): session 
opened for user root by anton(uid=0)
     maj 22 21:11:18 riven sudo[1563]: pam_unix(sudo:session): session 
closed for user root
     maj 22 21:12:15 riven kernel: NILFS: bad btree node 
(blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
     maj 22 21:12:15 riven kernel: [87B blob data]
     maj 22 21:12:17 riven kernel: NILFS: bad btree node 
(blocknr=14351789): level = 0, flags = 0x0, nchildren = 0
     maj 22 21:12:17 riven kernel: [87B blob data]
     [2/5.0.2]anton@riven:~/Bilder/20130321-28 Jakobs bilder från 
Nederländerna> LANG=C rm 179.JPG
     rm: cannot remove '179.JPG': Read-only file system

System configuration
====================
Summary
-------
/ is on the logical volume riven/arch. /home is on the logical volume 
riven/home. The volume group riven is on the physical volume sda2. The 
physical volume sda2 is on the Crucial M4 SSD.

There is also a volume group riven-proto on a 1 TB hard disk drive 
containing an old unused Arch Linux installation and a filesystem 
"supplement" which is still used.

There is one NTFS partition on the SSD sda and one on the HDD sdb. sdb3 
is an old unused ext4 partition.
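
A quick way to visualize this stacking is util-linux's lsblk (a sketch; 
the columns available depend on the lsblk version):

     lsblk -o NAME,TYPE,SIZE,MOUNTPOINT /dev/sda /dev/sdb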

/etc/fstab
----------
     tmpfs		/tmp	tmpfs	nodev,nosuid	0	0
     /dev/mapper/riven-arch	/         	nilfs2    	rw,noatime,discard 0 0
     /dev/mapper/riven-home	/home     	nilfs2    	rw,noatime,discard 0 0
     /dev/mapper/riven-swap  none            swap            defaults        0 0
     /dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0
# some NFS mounts excluded

$ sudo lvdisplay
----------------
   --- Logical volume ---
   LV Path                /dev/riven-proto/arch
   LV Name                arch
   VG Name                riven-proto
   LV UUID                GA0SNf-N1rZ-ErCU-ALG5-0L6D-Ix4j-s3cLe4
   LV Write Access        read/write
   LV Creation host, time archiso, 2012-09-27 22:42:09 +0200
   LV Status              available
   # open                 0
   LV Size                30,00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     256
   Block device           254:1

   --- Logical volume ---
   LV Path                /dev/riven-proto/supplement
   LV Name                supplement
   VG Name                riven-proto
   LV UUID                jgrYcK-fEm9-tAOq-I6PR-PB4N-Khet-E0qsaU
   LV Write Access        read/write
   LV Creation host, time riven, 2012-10-31 18:31:31 +0100
   LV Status              available
   # open                 1
   LV Size                200,00 GiB
   Current LE             51200
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     256
   Block device           254:4

   --- Logical volume ---
   LV Path                /dev/riven/swap
   LV Name                swap
   VG Name                riven
   LV UUID                28r67b-M7hy-2orC-5snN-7CUu-jqGn-x1vxXc
   LV Write Access        read/write
   LV Creation host, time archiso, 2012-10-06 15:47:56 +0200
   LV Status              available
   # open                 2
   LV Size                1,00 GiB
   Current LE             256
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     256
   Block device           254:0

   --- Logical volume ---
   LV Path                /dev/riven/arch
   LV Name                arch
   VG Name                riven
   LV UUID                QAjHWq-5eDv-IyQe-ihiq-dpyZ-acIR-Imox4Y
   LV Write Access        read/write
   LV Creation host, time archiso, 2012-10-06 15:48:50 +0200
   LV Status              available
   # open                 2
   LV Size                30,00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     256
   Block device           254:2

   --- Logical volume ---
   LV Path                /dev/riven/home
   LV Name                home
   VG Name                riven
   LV UUID                YAQgTA-3Cvo-fuzu-6Uaj-0C6m-Tzt9-SyGv2y
   LV Write Access        read/write
   LV Creation host, time archiso, 2012-10-06 15:50:25 +0200
   LV Status              available
   # open                 1
   LV Size                109,68 GiB
   Current LE             28079
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     256
   Block device           254:3


$ sudo vgdisplay
----------------------
   --- Volume group ---
   VG Name               riven-proto
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  11
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                2
   Open LV               1
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               512,41 GiB
   PE Size               4,00 MiB
   Total PE              131178
   Alloc PE / Size       58880 / 230,00 GiB
   Free  PE / Size       72298 / 282,41 GiB
   VG UUID               HGfujG-CdYE-zQuD-xIOf-Lyus-8B3f-4WPPxp

   --- Volume group ---
   VG Name               riven
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  4
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                3
   Open LV               3
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               140,68 GiB
   PE Size               4,00 MiB
   Total PE              36015
   Alloc PE / Size       36015 / 140,68 GiB
   Free  PE / Size       0 / 0
   VG UUID               EZSXiE-F9Ec-vUX2-Ny0l-Fa2U-IIr6-6sAmIV


$ sudo pvdisplay
--------------------------
   --- Physical volume ---
   PV Name               /dev/sdb2
   VG Name               riven-proto
   PV Size               512,42 GiB / not usable 4,00 MiB
   Allocatable           yes
   PE Size               4,00 MiB
   Total PE              131178
   Free PE               72298
   Allocated PE          58880
   PV UUID               dAN0do-QWac-iBO6-tBxg-fr2z-ciNL-EZyOko

   --- Physical volume ---
   PV Name               /dev/sda2
   VG Name               riven
   PV Size               140,68 GiB / not usable 0
   Allocatable           yes (but full)
   PE Size               4,00 MiB
   Total PE              36015
   Free PE               0
   Allocated PE          36015
   PV UUID               KtuR1D-G2vj-8qb9-Kzyz-xkDP-weI9-NtHFMc


$ LANG=C sudo fdisk -l
----------------------

Disk /dev/sda: 256.1 GB, 256060514304 bytes, 500118192 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000e08ae

    Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   205078527   102538240    7  HPFS/NTFS/exFAT
/dev/sda2       205078528   500115455   147518464   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b5f5c

    Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63   615353759   307676848+   7  HPFS/NTFS/exFAT
/dev/sdb2       878905344  1953523711   537309184   83  Linux
/dev/sdb3   *   877930496   878905343      487424   83  Linux

Partition table entries are not in disk order

Disk /dev/mapper/riven-swap: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/riven--proto-arch: 32.2 GB, 32212254720 bytes, 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x20ac7dda

This doesn't look like a partition table
Probably you selected the wrong device.

                         Device Boot      Start         End      Blocks   Id  System
/dev/mapper/riven--proto-arch1   ?  3224498923  3657370039   216435558+   7  HPFS/NTFS/exFAT
/dev/mapper/riven--proto-arch2   ?  3272020941  5225480974   976730017   16  Hidden FAT16
/dev/mapper/riven--proto-arch3   ?           0           0           0   6f  Unknown
/dev/mapper/riven--proto-arch4        50200576   974536369   462167897    0  Empty

Partition table entries are not in disk order

Disk /dev/mapper/riven-arch: 32.2 GB, 32212254720 bytes, 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/riven-home: 117.8 GB, 117771862016 bytes, 230023168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/riven--proto-supplement: 214.7 GB, 214748364800 bytes, 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

$ ls -l /dev/riven/
-------------------
totalt 0
lrwxrwxrwx 1 root root 7 22 maj 20.08 arch -> ../dm-2
lrwxrwxrwx 1 root root 7 22 maj 20.08 home -> ../dm-3
lrwxrwxrwx 1 root root 7 22 maj 20.08 swap -> ../dm-0

/etc/nilfs-cleanerd.conf
------------------------
protection_period	0
min_clean_segments	0
max_clean_segments	100%
clean_check_interval	10
selection_policy	timestamp	# timestamp in ascend order
nsegments_per_clean	2
mc_nsegments_per_clean	4
cleaning_interval	5
mc_cleaning_interval	1
retry_interval		60
use_mmap
log_priority		info


References
==========

[1]:
See attached file "messages-excerpt-Anton_Eliasson-20130522"

[2]:
     [2/5.0.2]anton@riven:~> bup.sh
     /home/anton/etc-tomahna-20120615/etc/vmware/: [Errno 13] Permission 
denied
     Indexing: 117930, done.
     bup: merging indexes (211106/211106), done.
     WARNING: 1 errors encountered.
     Reading index: 211091, done.
     /home/anton/vmware/WXP/WXP-disk1.vmdk: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/179.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/172.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/170.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/165.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/164.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/163.JPG: [Errno 5] Input/output error
     /home/anton/Bilder/20130321-28 Jakobs bilder från 
Nederländerna/160.JPG: [Errno 5] Input/output error
     Saving: 100.00% (52192637/52192637k, 211091/211091 files), done.
     bloom: adding 1 file (59294 objects).
     Traceback (most recent call last):
       File "/usr/lib/bup/cmd/bup-save", line 431, in <module>
         w.close()  # must close before we can update the ref
       File "/usr/lib/bup/bup/client.py", line 316, in close
         id = self._end()
       File "/usr/lib/bup/bup/client.py", line 313, in _end
         return self.suggest_packs() # Returns last idx received
       File "/usr/lib/bup/bup/client.py", line 233, in _suggest_packs
         self.sync_index(idx)
       File "/usr/lib/bup/bup/client.py", line 193, in sync_index
         f = open(fn + '.tmp', 'w')
     IOError: [Errno 30] Read-only file system: 
'/home/anton/.bup/index-cache/hermes__Athena_Backup_bup/pack-90022de5619eee12a7611a33caf41047fb57a90a.idx.tmp'
     [2/5.0.2]{1}anton@riven:~>

[3]:
     [  233.951973] NILFS: bad btree node (blocknr=19549978): level = 0, 
flags = 0x0, nchildren = 0
     [  233.951982] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=4301)

     [  233.955999] Remounting filesystem read-only
     [  233.956119] NILFS: bad btree node (blocknr=19549978): level = 0, 
flags = 0x0, nchildren = 0
     [  233.956125] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=4301)

     [  233.956417] NILFS: bad btree node (blocknr=19549978): level = 0, 
flags = 0x0, nchildren = 0
     [  233.956422] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=4301)

     [  233.956524] NILFS: bad btree node (blocknr=19549978): level = 0, 
flags = 0x0, nchildren = 0
     [  233.956530] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=4301)

     ...

     [  819.004092] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [  819.004101] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     [  819.004177] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [  819.004181] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     [  819.004257] NILFS: bad btree node (blocknr=14351789): level = 0, 
flags = 0x0, nchildren = 0
     [  819.004263] NILFS error (device dm-3): nilfs_bmap_lookup_contig: 
broken bmap (inode number=136492)

     ...

-- 
Best Regards,
Anton Eliasson

[-- Attachment #2: messages-excerpt-Anton_Eliasson-20130522 --]
[-- Type: text/plain, Size: 36354 bytes --]

May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 
May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP 
May 22 18:53:31 riven kernel: [ 3821.605602] Modules linked in: nfsv3 nfs_acl ppdev parport_pc parport fuse vsock btrfs nvidia(PO) raid6_pq crc32c libcrc32c zlib_deflate iTCO_wdt iTCO_vendor_support gpio_ich ext4 crc16 mbcache xor jbd2 coretemp kvm_intel kvm snd_hda_codec_realtek microcode psmouse pcspkr serio_raw lpc_ich i2c_i801 snd_hda_intel r8169 evdev snd_hda_codec drm mii i2c_core snd_hwdep snd_pcm acpi_cpufreq snd_page_alloc mperf snd_timer snd button soundcore intel_agp intel_gtt processor loop nfs lockd sunrpc fscache nilfs2 dm_mod sd_mod sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid pata_it8213 ahci libahci firewire_ohci libata firewire_core scsi_mod crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common
May 22 18:53:31 riven kernel: [ 3821.605669] CPU 2 
May 22 18:53:31 riven kernel: [ 3821.605674] Pid: 250, comm: nilfs_cleanerd Tainted: P           O 3.9.3-1-ARCH #1 Gigabyte Technology Co., Ltd. EP45-DS4/EP45-DS4
May 22 18:53:31 riven kernel: [ 3821.605677] RIP: 0010:[<ffffffffa027f1a2>]  [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605686] RSP: 0018:ffff8801960f7b30  EFLAGS: 00010202
May 22 18:53:31 riven kernel: [ 3821.605690] RAX: ffff880101b49250 RBX: 00000000000036cd RCX: 0000000000000034
May 22 18:53:31 riven kernel: [ 3821.605692] RDX: 000000000000000d RSI: 0000000000000000 RDI: 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605695] RBP: ffff8801960f7b38 R08: 1c00000000000000 R09: a80000c80e000000
May 22 18:53:31 riven kernel: [ 3821.605697] R10: 57ffe937f2320380 R11: 0000000000000019 R12: ffff8801a27ac738
May 22 18:53:31 riven kernel: [ 3821.605700] R13: ffff880101b49208 R14: ffffea00001eab40 R15: ffffea00001dee80
May 22 18:53:31 riven kernel: [ 3821.605703] FS:  00007f44fa5d0740(0000) GS:ffff8801afd00000(0000) knlGS:0000000000000000
May 22 18:53:31 riven kernel: [ 3821.605706] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 22 18:53:31 riven kernel: [ 3821.605709] CR2: 00000000000036cd CR3: 000000019636a000 CR4: 00000000000007e0
May 22 18:53:31 riven kernel: [ 3821.605711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 22 18:53:31 riven kernel: [ 3821.605714] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 22 18:53:31 riven kernel: [ 3821.605717] Process nilfs_cleanerd (pid: 250, threadinfo ffff8801960f6000, task ffff8801a509cb60)
May 22 18:53:31 riven kernel: [ 3821.605719] Stack:
May 22 18:53:31 riven kernel: [ 3821.605721]  ffff8801a27ac690 ffff8801960f7c20 ffffffffa0280ed5 0000000000000002
May 22 18:53:31 riven kernel: [ 3821.605727]  ffff8801a509cb60 ffff8801a509cb60 ffff8801a509cb60 ffff8801a3dc6070
May 22 18:53:31 riven kernel: [ 3821.605731]  ffff8801a5cddd60 ffff8801a5cddc00 00000001001ead00 ffff8801a3dc6060
May 22 18:53:31 riven kernel: [ 3821.605736] Call Trace:
May 22 18:53:31 riven kernel: [ 3821.605747]  [<ffffffffa0280ed5>] nilfs_segctor_do_construct+0xd65/0x1ab0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605756]  [<ffffffffa0281e42>] nilfs_segctor_construct+0x172/0x290 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605765]  [<ffffffffa0282ead>] nilfs_clean_segments+0xed/0x270 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605771]  [<ffffffff811bc4bc>] ? __set_page_dirty+0x6c/0xc0
May 22 18:53:31 riven kernel: [ 3821.605780]  [<ffffffffa028906f>] nilfs_ioctl_clean_segments.isra.14+0x4bf/0x740 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605788]  [<ffffffffa0279a8d>] ? nilfs_btree_lookup+0x4d/0x70 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605797]  [<ffffffffa028970c>] nilfs_ioctl+0x21c/0x740 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605802]  [<ffffffff814d0b76>] ? __schedule+0x3f6/0x940
May 22 18:53:31 riven kernel: [ 3821.605808]  [<ffffffff8119cf65>] do_vfs_ioctl+0x2e5/0x4d0
May 22 18:53:31 riven kernel: [ 3821.605813]  [<ffffffff81152930>] ? do_munmap+0x2b0/0x3e0
May 22 18:53:31 riven kernel: [ 3821.605818]  [<ffffffff8119d1d1>] sys_ioctl+0x81/0xa0
May 22 18:53:31 riven kernel: [ 3821.605822]  [<ffffffff814d3769>] ? do_device_not_available+0x19/0x20
May 22 18:53:31 riven kernel: [ 3821.605827]  [<ffffffff814d9e9d>] system_call_fastpath+0x1a/0x1f
May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 
May 22 18:53:31 riven kernel: [ 3821.605881]  RSP <ffff8801960f7b30>
May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605887] ---[ end trace 21dfcc9b8d62edba ]---
May 22 18:55:14 riven slim[274]: 18:55:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:58:33 riven slim[274]: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp (722) : Assertion Failed: Stalled cross-thread pipe
May 22 18:58:33 riven slim[274]: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp (722) : Fatal assert failed: /home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp, line 722.  Application exiting.
May 22 18:58:33 riven slim[274]: Assert( Fatal assert ):/home/buildbot/buildslave_steam/steam_rel_client_ubuntu12_linux/build/src/clientdll/../common/pipes.cpp:722
May 22 18:58:33 riven slim[274]: Installing breakpad exception handler for appid(steam)/version(1368838102_client)
May 22 20:05:56 riven slim[274]: ESC[31m[16:43:03.489783 Warning]ESC[0m ESC[35m[ZeitgeistPlugin]ESC[0m Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: ESC[31m[16:43:03.570057 Warning]ESC[0m ESC[35m[ZeitgeistPlugin]ESC[0m Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: ESC[31m[16:43:03.681706 Warning]ESC[0m ESC[35m[ZeitgeistPlugin]ESC[0m Zeitgeist search failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface `org.gnome.zeitgeist.Index' on object at path /org/gnome/zeitgeist/index/activity
May 22 20:05:56 riven slim[274]: Got Event! 2, -1
May 22 20:05:56 riven slim[274]: Got KeyPress! keycode: 65, modifiers: 64
May 22 20:05:56 riven slim[274]: Calling handler for '<Super>space'...
May 22 20:06:23 riven slim[274]: ** (zeitgeist-datahub:592): WARNING **: zeitgeist-datahub.vala:209: Error during inserting events: Timeout was reached
May 22 20:06:33 riven kernel: [ 8203.651816] SysRq : SAK
May 22 20:06:33 riven kernel: [ 8203.651872] SAK: killed process 318 (X): task_session(p)==tty->session
May 22 20:06:33 riven kernel: [ 8203.651977] SAK: killed process 318 (X): task_session(p)==tty->session
May 22 20:06:33 riven slim[274]: xfce4-session: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfwm4: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfsettingsd: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: Thunar: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfce4-panel: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: wrapper: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: (pasystray:564): Gdk-WARNING **: pasystray: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: xfdesktop: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: wrapper: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: terminator: Fatal IO error 11 (Resursen tillfälligt otillgänglig) on X server :0.0.
May 22 20:06:33 riven slim[274]: AL lib: pulseaudio.c:353: Received context failure!
May 22 20:06:33 riven slim[274]: AL lib: pulseaudio.c:366: Received stream failure!
May 22 20:06:33 riven slim[274]: /usr/bin/xauth:  file /var/run/slim.auth does not exist
May 22 20:06:33 riven slim[274]: X.Org X Server 1.14.1
May 22 20:06:33 riven slim[274]: Release Date: 2013-04-17
May 22 20:06:33 riven slim[274]: X Protocol Version 11, Revision 0
May 22 20:06:33 riven slim[274]: Build Operating System: Linux 3.8.7-1-ARCH x86_64
May 22 20:06:33 riven slim[274]: Current Operating System: Linux riven 3.9.3-1-ARCH #1 SMP PREEMPT Sun May 19 22:50:29 CEST 2013 x86_64
May 22 20:06:33 riven slim[274]: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:06:33 riven slim[274]: Build Date: 17 April 2013  02:37:06PM
May 22 20:06:33 riven slim[274]: Current version of pixman: 0.30.0
May 22 20:06:33 riven slim[274]: Before reporting problems, check http://wiki.x.org
May 22 20:06:33 riven slim[274]: to make sure that you have the latest version.
May 22 20:06:33 riven slim[274]: Markers: (--) probed, (**) from config file, (==) default setting,
May 22 20:06:33 riven slim[274]: (++) from command line, (!!) notice, (II) informational,
May 22 20:06:33 riven slim[274]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
May 22 20:06:33 riven slim[274]: (==) Log file: "/var/log/Xorg.0.log", Time: Wed May 22 20:06:33 2013
May 22 20:06:33 riven slim[274]: (==) Using config directory: "/etc/X11/xorg.conf.d"
May 22 20:06:33 riven slim[274]: Initializing built-in extension Generic Event Extension
May 22 20:06:33 riven slim[274]: Initializing built-in extension SHAPE
May 22 20:06:33 riven slim[274]: Initializing built-in extension MIT-SHM
May 22 20:06:33 riven slim[274]: Initializing built-in extension XInputExtension
May 22 20:06:33 riven slim[274]: Initializing built-in extension XTEST
May 22 20:06:33 riven slim[274]: Initializing built-in extension BIG-REQUESTS
May 22 20:06:33 riven slim[274]: Initializing built-in extension SYNC
May 22 20:06:33 riven slim[274]: Initializing built-in extension XKEYBOARD
May 22 20:06:33 riven slim[274]: Initializing built-in extension XC-MISC
May 22 20:06:33 riven slim[274]: Initializing built-in extension SECURITY
May 22 20:06:33 riven slim[274]: Initializing built-in extension XINERAMA
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFIXES
May 22 20:06:33 riven slim[274]: Initializing built-in extension RENDER
May 22 20:06:33 riven slim[274]: Initializing built-in extension RANDR
May 22 20:06:33 riven slim[274]: Initializing built-in extension COMPOSITE
May 22 20:06:33 riven slim[274]: Initializing built-in extension DAMAGE
May 22 20:06:33 riven slim[274]: Initializing built-in extension MIT-SCREEN-SAVER
May 22 20:06:33 riven slim[274]: Initializing built-in extension DOUBLE-BUFFER
May 22 20:06:33 riven slim[274]: Initializing built-in extension RECORD
May 22 20:06:33 riven slim[274]: Initializing built-in extension DPMS
May 22 20:06:33 riven slim[274]: Initializing built-in extension X-Resource
May 22 20:06:33 riven slim[274]: Initializing built-in extension XVideo
May 22 20:06:33 riven slim[274]: Initializing built-in extension XVideo-MotionCompensation
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-VidModeExtension
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-DGA
May 22 20:06:33 riven slim[274]: Initializing built-in extension XFree86-DRI
May 22 20:06:33 riven slim[274]: Initializing built-in extension DRI2
May 22 20:06:33 riven slim[274]: Loading extension GLX
May 22 20:06:33 riven slim[274]: Loading extension NV-GLX
May 22 20:06:33 riven slim[274]: Loading extension NV-CONTROL
May 22 20:06:33 riven slim[274]: Loading extension XINERAMA
May 22 20:06:34 riven slim[274]: The XKEYBOARD keymap compiler (xkbcomp) reports:
May 22 20:06:34 riven slim[274]: > Warning:          Type "ONE_LEVEL" has 1 levels, but <RALT> has 2 symbols
May 22 20:06:34 riven slim[274]: >                   Ignoring extra symbols
May 22 20:06:34 riven slim[274]: Errors from xkbcomp are not fatal to the X server
May 22 20:07:05 riven kernel: [ 8235.987564] SysRq : Keyboard mode set to system default
May 22 20:07:06 riven kernel: [ 8236.819557] SysRq : Terminate All Tasks
May 22 20:07:06 riven [ 8236.827760] systemd-journald[150]: Received SIGTERM
May 22 20:07:06 riven [ 8236.842434] systemd[1]: getty-2yfe/R6NngVTDjBF/Jpztg@public.gmane.org holdoff time over, scheduling restart.
May 22 20:07:06 riven [ 8236.842702] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven [ 8236.842783] systemd[1]: Stopping Getty on tty1...
May 22 20:07:06 riven [ 8236.842948] systemd[1]: Starting Getty on tty1...
May 22 20:07:06 riven [ 8236.843662] systemd-udevd[4989]: starting version 204
May 22 20:07:06 riven [ 8236.843883] systemd[1]: Started Getty on tty1.
May 22 20:07:06 riven [ 8236.844554] systemd[1]: Started udev Kernel Device Manager.
May 22 20:07:06 riven [ 8236.847758] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
May 22 20:07:06 riven [ 8236.847820] systemd[1]: Stopping Journal Service...
May 22 20:07:06 riven [ 8236.847915] systemd[1]: Starting Journal Service...
May 22 20:07:06 riven [ 8236.849708] systemd[1]: Started Journal Service.
May 22 20:07:06 riven [ 8236.849770] systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
May 22 20:07:06 riven ntpd[441]: ntpd exiting on signal 15
May 22 20:07:06 riven dhcpcd[415]: received SIGTERM, stopping
May 22 20:07:06 riven dhcpcd[415]: eth0: removing interface
May 22 20:07:06 riven nilfs_cleanerd[182]: shutdown
May 22 20:07:06 riven systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Stopping udev Kernel Device Manager...
May 22 20:07:06 riven systemd[1]: Starting udev Kernel Device Manager...
May 22 20:07:06 riven systemd[1]: rpcbind.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 22 20:07:06 riven systemd[1]: Unit rpcbind.service entered failed state.
May 22 20:07:06 riven systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Stopping Login Service...
May 22 20:07:06 riven systemd[1]: Starting Login Service...
May 22 20:07:06 riven systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: dhcpcd-ET31U/T6GptTDjBF/Jpztg@public.gmane.org: main process exited, code=exited, status=1/FAILURE
May 22 20:07:06 riven systemd[1]: dhcpcd-ET31U/T6GptTDjBF/Jpztg@public.gmane.org: control process exited, code=exited status=1
May 22 20:07:06 riven systemd[1]: Unit dhcpcd-ET31U/T6GptTDjBF/Jpztg@public.gmane.org entered failed state.
May 22 20:07:06 riven dhcpcd[5022]: dhcpcd[5022]: dhcpcd not running
May 22 20:07:06 riven systemd[1]: Started Login Service.
May 22 20:07:06 riven systemd[1]: sshd.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Stopping OpenSSH Daemon...
May 22 20:07:06 riven systemd[1]: Started SSH Key Generation.
May 22 20:07:06 riven systemd[1]: Starting OpenSSH Daemon...
May 22 20:07:06 riven systemd[1]: Started OpenSSH Daemon.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Starting System Logger Daemon...
May 22 20:07:06 riven systemd[1]: Started System Logger Daemon.
May 22 20:07:06 riven systemd[1]: cronie.service holdoff time over, scheduling restart.
May 22 20:07:06 riven systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:06 riven systemd[1]: Started NFSv2/3 Network Status Monitor Daemon.
May 22 20:07:09 riven systemd[1]: sshd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:09 riven systemd[1]: Unit sshd.service entered failed state.
May 22 20:07:09 riven systemd[1]: syslog-ng.service: main process exited, code=killed, status=9/KILL
May 22 20:07:09 riven systemd[1]: Unit syslog-ng.service entered failed state.
May 22 20:07:09 riven systemd[1]: systemd-journald.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven kernel: [ 8239.939532] SysRq : Kill All Tasks
May 22 20:07:10 riven [ 8239.942395] systemd[1]: Unit systemd-journald.service entered failed state.
May 22 20:07:10 riven [ 8239.943000] systemd[1]: systemd-udevd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.943778] systemd[1]: Unit systemd-udevd.service entered failed state.
May 22 20:07:10 riven [ 8239.944266] systemd[1]: dbus.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.944989] systemd[1]: Unit dbus.service entered failed state.
May 22 20:07:10 riven [ 8239.945422] systemd[1]: cronie.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.946121] systemd[1]: Unit cronie.service entered failed state.
May 22 20:07:10 riven [ 8239.946582] systemd[1]: rpcbind.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.947427] systemd[1]: Unit rpcbind.service entered failed state.
May 22 20:07:10 riven rpc.statd[5056]: Version 1.2.8 starting
May 22 20:07:10 riven [ 8239.947857] systemd[1]: rpc-statd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8239.948624] systemd[1]: Unit rpc-statd.service entered failed state.
May 22 20:07:10 riven [ 8239.949063] systemd[1]: getty-2yfe/R6NngVTDjBF/Jpztg@public.gmane.org holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8239.949661] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8239.950658] systemd[1]: Stopping Getty on tty1...
May 22 20:07:10 riven [ 8239.951036] systemd[1]: Starting Getty on tty1...
May 22 20:07:10 riven [ 8239.951991] systemd[1]: Started Getty on tty1.
May 22 20:07:10 riven [ 8239.952501] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8239.953037] systemd[1]: Stopping Journal Service...
May 22 20:07:10 riven [ 8239.961433] systemd[1]: Starting Journal Service...
May 22 20:07:10 riven [ 8239.970154] systemd[1]: Started Journal Service.
May 22 20:07:10 riven [ 8239.978363] systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
May 22 20:07:10 riven [ 8239.994267] systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.009916] systemd[1]: Stopping udev Kernel Device Manager...
May 22 20:07:10 riven [ 8240.012250] systemd-journald[5047]: File /var/log/journal/5b2137919d6f4039a3b2c2f21333daa6/system.journal corrupted or uncleanly shut down, renaming and replacing.
May 22 20:07:10 riven [ 8240.041143] systemd[1]: Starting udev Kernel Device Manager...
May 22 20:07:10 riven [ 8240.050383] systemd[1]: ntpd.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8240.053378] systemd-udevd[5049]: starting version 204
May 22 20:07:10 riven [ 8240.074818] systemd[1]: Unit ntpd.service entered failed state.
May 22 20:07:10 riven [ 8240.082734] systemd[1]: sshd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.090582] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.115806] systemd[1]: Stopping OpenSSH Daemon...
May 22 20:07:10 riven [ 8240.122627] systemd[1]: Started SSH Key Generation.
May 22 20:07:10 riven [ 8240.129163] systemd[1]: Starting OpenSSH Daemon...
May 22 20:07:10 riven [ 8240.136086] systemd[1]: Started OpenSSH Daemon.
May 22 20:07:10 riven [ 8240.142750] systemd[1]: cronie.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.149212] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.169042] systemd[1]: Stopping Periodic Command Scheduler...
May 22 20:07:10 riven [ 8240.175785] systemd[1]: Starting Periodic Command Scheduler...
May 22 20:07:10 riven [ 8240.182815] systemd[1]: Started Periodic Command Scheduler.
May 22 20:07:10 riven [ 8240.189721] systemd[1]: rpcbind.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.203185] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.223731] systemd[1]: Stopping RPC Bind...
May 22 20:07:10 riven [ 8240.230514] systemd[1]: Starting RPC Bind...
May 22 20:07:10 riven [ 8240.237587] systemd[1]: Started udev Kernel Device Manager.
May 22 20:07:10 riven [ 8240.244349] systemd[1]: systemd-logind.service: main process exited, code=killed, status=9/KILL
May 22 20:07:10 riven [ 8240.257182] systemd[1]: Unit systemd-logind.service entered failed state.
May 22 20:07:10 riven [ 8240.263939] systemd[1]: Started RPC Bind.
May 22 20:07:10 riven [ 8240.270314] systemd[1]: Starting NFSv2/3 Network Status Monitor Daemon...
May 22 20:07:10 riven [ 8240.277481] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.297612] systemd[1]: Starting System Logger Daemon...
May 22 20:07:10 riven [ 8240.304684] systemd[1]: Started System Logger Daemon.
May 22 20:07:10 riven [ 8240.311549] systemd[1]: ntpd.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.318164] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.338387] systemd[1]: Stopping Network Time Service...
May 22 20:07:10 riven [ 8240.345345] systemd[1]: Starting Network Time Service...
May 22 20:07:10 riven ntpd[5059]: ntpd 4.2.6p5@1.2349-o Mon May  6 10:20:10 UTC 2013 (1)
May 22 20:07:10 riven ntpd[5060]: proto: precision = 0.558 usec
May 22 20:07:10 riven ntpd[5060]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen and drop on 1 v6wildcard :: UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 2 lo 127.0.0.1 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 3 lo ::1 UDP 123
May 22 20:07:10 riven ntpd[5060]: Listen normally on 4 eth0 fe80::21f:d0ff:fe26:7491 UDP 123
May 22 20:07:10 riven ntpd[5060]: peers refreshed
May 22 20:07:10 riven ntpd[5060]: Listening on routing socket on fd #21 for interface updates
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 0.pool.ntp.org 1
May 22 20:07:10 riven [ 8240.352774] systemd[1]: Started NFSv2/3 Network Status Monitor Daemon.
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 1.pool.ntp.org 1
May 22 20:07:10 riven ntpd[5060]: Deferring DNS for 2.pool.ntp.org 1
May 22 20:07:10 riven [ 8240.360274] systemd[1]: Started Network Time Service.
May 22 20:07:10 riven [ 8240.366831] systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
May 22 20:07:10 riven [ 8240.379809] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.400281] systemd[1]: Stopping Login Service...
May 22 20:07:10 riven [ 8240.407082] systemd[1]: Starting Login Service...
May 22 20:07:10 riven [ 8240.416399] systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
May 22 20:07:10 riven [ 8240.430330] systemd[1]: Cannot add dependency job for unit lvm.service, ignoring: Unit lvm.service failed to load: No such file or directory. See system logs and 'systemctl status lvm.service' for details.
May 22 20:07:10 riven [ 8240.451224] systemd[1]: Starting D-Bus System Message Bus...
May 22 20:07:10 riven [ 8240.458934] systemd[1]: Started D-Bus System Message Bus.
May 22 20:07:10 riven systemd[1]: Started Login Service.
May 22 20:07:12 riven kernel: [ 8242.947507] SysRq : Emergency Sync
May 22 20:07:14 riven kernel: [ 8244.899488] SysRq : Emergency Remount R/O
May 22 20:08:06 riven systemd-sysctl[148]: Duplicate assignment of kernel/sysrq in file '/usr/lib/sysctl.d/50-default.conf', ignoring.
May 22 20:08:06 riven systemd[1]: Mounting Arbitrary Executable File Formats File System...
May 22 20:08:06 riven systemd[1]: Mounted Debug File System.
May 22 20:08:06 riven systemd[1]: Mounted POSIX Message Queue File System.
May 22 20:08:06 riven systemd[1]: Mounted Huge Pages File System.
May 22 20:08:06 riven systemd[1]: Started udev Kernel Device Manager.
May 22 20:08:06 riven systemd[1]: Mounted Arbitrary Executable File Formats File System.
May 22 20:08:06 riven systemd[1]: Started Set Up Additional Binary Formats.
May 22 20:08:06 riven systemd-fsck[161]: Ignoring error.
May 22 20:08:07 riven kernel: [    0.000000] Initializing cgroup subsys cpuset
May 22 20:08:07 riven kernel: [    0.000000] Initializing cgroup subsys cpu
May 22 20:08:07 riven kernel: [    0.000000] Linux version 3.9.3-1-ARCH (tobias@T-POWA-LX) (gcc version 4.8.0 20130502 (prerelease) (GCC) ) #1 SMP PREEMPT Sun May 19 22:50:29 CEST 2013
May 22 20:08:07 riven kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:08:07 riven kernel: [    0.000000] e820: BIOS-provided physical RAM map:
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cfeaffff] usable
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000cfeb0000-0x00000000cfee2fff] ACPI NVS
May 22 20:08:06 riven systemd[1]: Started File System Check on Root Device.
May 22 20:08:07 riven systemd[1]: Reached target Sockets.
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000cfee3000-0x00000000cfeeffff] ACPI data
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000cfef0000-0x00000000cfefffff] reserved
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000e3ffffff] reserved
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
May 22 20:08:07 riven kernel: [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000001afffffff] usable
May 22 20:08:07 riven kernel: [    0.000000] NX (Execute Disable) protection: active
May 22 20:08:07 riven kernel: [    0.000000] SMBIOS 2.4 present.
May 22 20:08:07 riven dhcpcd[274]: eth0: waiting for carrier
May 22 20:08:07 riven kernel: [    0.000000] No AGP bridge found
May 22 20:08:07 riven kernel: [    0.000000] e820: last_pfn = 0x1b0000 max_arch_pfn = 0x400000000
May 22 20:08:07 riven kernel: [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
May 22 20:08:07 riven kernel: [    0.000000] e820: last_pfn = 0xcfeb0 max_arch_pfn = 0x400000000
May 22 20:08:07 riven kernel: [    0.000000] found SMP MP-table at [mem 0x000f58e0-0x000f58ef] mapped at [ffff8800000f58e0]
May 22 20:08:07 riven kernel: [    0.000000] Scanning 1 areas for low memory corruption
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x1afe00000-0x1afffffff]
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x1ac000000-0x1afdfffff]
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x180000000-0x1abffffff]
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x00100000-0xcfeaffff]
May 22 20:08:07 riven kernel: [    0.000000] init_memory_mapping: [mem 0x100000000-0x17fffffff]
May 22 20:08:07 riven kernel: [    0.000000] RAMDISK: [mem 0x37ae4000-0x37d69fff]
May 22 20:08:07 riven kernel: [    0.000000] ACPI: RSDP 00000000000f7710 00014 (v00 GBT   )
May 22 20:08:07 riven kernel: [    0.000000] ACPI: RSDT 00000000cfee3040 00034 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: FACP 00000000cfee30c0 00074 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: DSDT 00000000cfee3180 03C78 (v01 GBT    GBTUACPI 00001000 MSFT 0100000C)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: FACS 00000000cfeb0000 00040
May 22 20:08:07 riven kernel: [    0.000000] ACPI: MCFG 00000000cfee6fc0 0003C (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: APIC 00000000cfee6e40 00084 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: SSDT 00000000cfee79a0 003AB (v01  PmRef    CpuPm 00003000 INTL 20040311)
May 22 20:08:07 riven kernel: [    0.000000] No NUMA configuration found
May 22 20:08:07 riven kernel: [    0.000000] Faking a node at [mem 0x0000000000000000-0x00000001afffffff]
May 22 20:08:07 riven kernel: [    0.000000] Initmem setup node 0 [mem 0x00000000-0x1afffffff]
May 22 20:08:07 riven kernel: [    0.000000]   NODE_DATA [mem 0x1afff6000-0x1afffafff]
May 22 20:08:07 riven kernel: [    0.000000] Zone ranges:
May 22 20:08:07 riven kernel: [    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
May 22 20:08:07 riven kernel: [    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
May 22 20:08:07 riven kernel: [    0.000000]   Normal   [mem 0x100000000-0x1afffffff]
May 22 20:08:07 riven kernel: [    0.000000] Movable zone start for each node
May 22 20:08:07 riven kernel: [    0.000000] Early memory node ranges
May 22 20:08:07 riven kernel: [    0.000000]   node   0: [mem 0x00001000-0x0009dfff]
May 22 20:08:07 riven kernel: [    0.000000]   node   0: [mem 0x00100000-0xcfeaffff]
May 22 20:08:07 riven kernel: [    0.000000]   node   0: [mem 0x100000000-0x1afffffff]
May 22 20:08:07 riven kernel: [    0.000000] ACPI: PM-Timer IO Port: 0x408
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x03] enabled)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
May 22 20:08:07 riven kernel: [    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
May 22 20:08:07 riven kernel: [    0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
May 22 20:08:07 riven kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
May 22 20:08:07 riven kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 22 20:08:07 riven kernel: [    0.000000] Using ACPI (MADT) for SMP configuration information
May 22 20:08:07 riven kernel: [    0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000cfeb0000 - 00000000cfee3000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000cfee3000 - 00000000cfef0000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000cfef0000 - 00000000cff00000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000cff00000 - 00000000e0000000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000e4000000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000e4000000 - 00000000fec00000
May 22 20:08:07 riven kernel: [    0.000000] PM: Registered nosave memory: 00000000fec00000 - 0000000100000000
May 22 20:08:07 riven kernel: [    0.000000] e820: [mem 0xe4000000-0xfebfffff] available for PCI devices
May 22 20:08:07 riven kernel: [    0.000000] Booting paravirtualized kernel on bare hardware
May 22 20:08:07 riven kernel: [    0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:4 nr_node_ids:1
May 22 20:08:07 riven kernel: [    0.000000] PERCPU: Embedded 28 pages/cpu @ffff8801afc00000 s85824 r8192 d20672 u524288
May 22 20:08:07 riven kernel: [    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1547837
May 22 20:08:07 riven kernel: [    0.000000] Policy zone: Normal
May 22 20:08:07 riven kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet
May 22 20:08:07 riven kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 22 20:08:07 riven kernel: [    0.000000] __ex_table already sorted, skipping sort
May 22 20:08:07 riven kernel: [    0.000000] Checking aperture...
May 22 20:08:07 riven kernel: [    0.000000] No AGP bridge found
May 22 20:08:07 riven kernel: [    0.000000] Memory: 6110480k/7077888k available (4983k kernel code, 788172k absent, 179236k reserved, 3967k data, 1092k init)
May 22 20:08:07 riven kernel: [    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
May 22 20:08:07 riven kernel: [    0.000000] Preemptible hierarchical RCU implementation.
May 22 20:08:07 riven kernel: [    0.000000]    RCU dyntick-idle grace-period acceleration is enabled.
May 22 20:08:07 riven kernel: [    0.000000]    Dump stacks of tasks blocking RCU-preempt GP.



* Re: Broken nilfs2 filesystem
       [not found] ` <519D2B96.9000106-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-22 20:36   ` Anton Eliasson
       [not found]     ` <519D2C32.5040600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  2013-05-23  6:44   ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-22 20:36 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2013-05-22 22:33, Anton Eliasson wrote:
> Greetings!
> It pains me to report that my /home filesystem broke down today. My 
> system is running Arch Linux 64-bit. The filesystem resides on a 
> Crucial M4 256 GB SSD, on top of a LVM2 volume. The drive and 
> filesystem are both around six months old. Partition table and error 
> log excerpts are at the bottom of this e-mail. Full logs are available 
> upon request.
>
> I am providing this information as a bug report. I have no reason to 
> suspect the hardware but I cannot exclude it either. If you (the 
> developers) are interested in troubleshooting this for posterity, I 
> can be your hands and run whatever tools are required. If not, I'll 
> reformat the filesystem, restore the data from backup and forget that 
> this happened.
>
> In case the formatting gets mangled, this e-mail is also available at
Right here: http://paste.debian.net/5841/

-- 
Best Regards,
Anton Eliasson


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]     ` <519D2C32.5040600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-23  1:40       ` Ryusuke Konishi
  0 siblings, 0 replies; 27+ messages in thread
From: Ryusuke Konishi @ 2013-05-23  1:40 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
On Wed, 22 May 2013 22:36:02 +0200, Anton Eliasson wrote:
> Anton Eliasson skrev 2013-05-22 22:33:
>> Greetings!
>> It pains me to report that my /home filesystem broke down today. My
>> system is running Arch Linux 64-bit. The filesystem resides on a
>> Crucial M4 256 GB SSD, on top of a LVM2 volume. The drive and
>> filesystem are both around six months old. Partition table and error
>> log excerpts are at the bottom of this e-mail. Full logs are available
>> upon request.
>>
>> I am providing this information as a bug report. I have no reason to
>> suspect the hardware but I cannot exclude it either. If you (the
>> developers) are interested in troubleshooting this for prosperity, I
>> can be your hands and run whatever tools are required. If not, I'll
>> reformat the filesystem, restore the data from backup and forget that
>> this happened.
>>
>> In case the formatting gets mangled, this e-mail is also available at
> Right here: http://paste.debian.net/5841/

Thank you for the report.

According to the log, the btree of a regular file has been destroyed for some reason.
I think we should look into how the btree block got broken.

Could you try the following commands to inspect the broken disk segment?

 $ sudo dd if=/dev/dm-3 bs=4k count=2048 skip=14350336 iflag=direct 2>/dev/null | hexdump -C

This will print out the blocks of segment 7007, which contains the
broken btree block.
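For reference, the skip value is simply the segment number multiplied by
the blocks per segment, assuming the default of 2048 blocks of 4 KB per
segment (nilfs-tune -l should confirm this on your volume):

 $ echo $(( 7007 * 2048 ))   # first 4 KB block of segment 7007
 14350336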

The following commands are also useful for getting debug information.
Could you try them, too?

 $ sudo nilfs-tune -l /dev/dm-3
 $ sudo dumpseg /dev/dm-3 7007
 $ lssu -a /dev/dm-3

The third command requires the device to be mounted, so /home should
first be mounted read-only with the norecovery option:

 $ sudo mount -t nilfs2 -o ro,norecovery /dev/dm-3 /home


With regards,
Ryusuke Konishi

> -- 
> Best Regards,
> Anton Eliasson
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs"
> in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found] ` <519D2B96.9000106-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  2013-05-22 20:36   ` Anton Eliasson
@ 2013-05-23  6:44   ` Vyacheslav Dubeyko
  2013-05-25 11:59     ` Anton Eliasson
  1 sibling, 1 reply; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-23  6:44 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Wed, 2013-05-22 at 22:33 +0200, Anton Eliasson wrote:
> Greetings!
> It pains me to report that my /home filesystem broke down today. My 
> system is running Arch Linux 64-bit. The filesystem resides on a Crucial 
> M4 256 GB SSD, on top of a LVM2 volume. The drive and filesystem are 
> both around six months old. Partition table and error log excerpts are 
> at the bottom of this e-mail. Full logs are available upon request.
> 
> I am providing this information as a bug report. I have no reason to 
> suspect the hardware but I cannot exclude it either. If you (the 
> developers) are interested in troubleshooting this for prosperity, I can 
> be your hands and run whatever tools are required. If not, I'll reformat 
> the filesystem, restore the data from backup and forget that this happened.
> 
> In case the formatting gets mangled, this e-mail is also available at 
> What happened today, in chronological order:
> 
> ~18:00
> ======
> I am troubleshooting some issues that turn out to be caused by a wrongly 
> configured system clock. The RTC (hardware clock) is set to local time 
> (UTC+2) but the OS is configured to treat the RTC as UTC. This is 
> because it was set to UTC previously, but then I reinstalled Windows 
> which promptly reset it to local time.
> 
> This set the mtime of some files in both / and /home to dates in the 
> future. When I discovered this, I `touch`ed all affected files (`touch 
> now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset 
> their mtime and rebooted the system. I do not know if this is relevant; 
> if not, it makes reading the log files more fun.
> 
> I then launch my command line backup program "bup", Firefox and some 
> other apps.
> 
> ~18:50-19:00
> ============
> Firefox freezes. The system keeps running but I can't launch new 
> programs. It looked like all I/O broke down. However, bup kept running. 
> I left the computer alone for perhaps 30-60 min.
> 

So, as I understand it, the reproduction path is:
(1) set the mtime of some files to a date in the future;
(2) touch all affected files;
(3) reboot the system;
(4) launch the backup program "bup", Firefox and some other apps.

I think it makes sense to try this reproduction path. However, we have
had reports from other users about issues with similar symptoms
(nilfs_bmap_lookup_contig: broken bmap) for the 4 KB block size case.
Unfortunately, I have not been able to reproduce such an issue with a
4 KB block size so far, and I feel that a clear reproduction path is
crucial for this issue.

I understand that it can be hard to reproduce the issue again, but do
you have an opportunity to try to reproduce it on another NILFS2
partition on your side?

In any case, I am going to try to reproduce the issue along this path on
my side.
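As a rough, untested sketch (device and mount point names are just
examples), that path on a scratch NILFS2 partition could look like:

 $ sudo mkfs -t nilfs2 /dev/sdX1                   # scratch partition, example name
 $ sudo mount -t nilfs2 /dev/sdX1 /mnt/test
 $ sudo cp -a /home/anton/some-data /mnt/test/     # populate with some files
 $ sudo touch -d "2 hours" /mnt/test/some-data/*   # push mtimes into the future
 $ touch now; sudo find /mnt/test -newer now -exec touch {} \;   # reset them, as in the report
 $ sudo reboot
 # after the reboot, run bup (or another read-heavy backup) against /mnt/test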

> ~20:00
> ======
> When I came back, bup hade frozen (/var/log/messages at 18:53:31).[1] I 
> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and 
> return to the login screen. The system freezes during login though, 
> probably because /home had probably been mounted read only). So I reboot 
> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some 
> I/O errors during shutdown.
> 
> After the reboot there are no immediate signs of disaster. I launch bup 
> again. Some time later, /home remounts as read only. I notice that bup 
> has reported I/O errors while reading some files in /home.[2] dmesg and 
> /var/log/kern.log contains errors mentioning "bad btree node" and 
> "nilfs_bmap_lookup_contig: broken bmap".[3]
> 

We now have a patch to overcome the system freeze after such an issue:
http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.

With the best regards,
Vyacheslav Dubeyko.



--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
  2013-05-23  6:44   ` Vyacheslav Dubeyko
@ 2013-05-25 11:59     ` Anton Eliasson
       [not found]       ` <51A0A7A0.6010207-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-25 11:59 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko skrev 2013-05-23 08:44:
> Hi Anton,
>
> On Wed, 2013-05-22 at 22:33 +0200, Anton Eliasson wrote:
>> Greetings!
>> It pains me to report that my /home filesystem broke down today. My
>> system is running Arch Linux 64-bit. The filesystem resides on a Crucial
>> M4 256 GB SSD, on top of a LVM2 volume. The drive and filesystem are
>> both around six months old. Partition table and error log excerpts are
>> at the bottom of this e-mail. Full logs are available upon request.
>>
>> I am providing this information as a bug report. I have no reason to
>> suspect the hardware but I cannot exclude it either. If you (the
>> developers) are interested in troubleshooting this for prosperity, I can
>> be your hands and run whatever tools are required. If not, I'll reformat
>> the filesystem, restore the data from backup and forget that this happened.
>>
>> In case the formatting gets mangled, this e-mail is also available at
>> What happened today, in chronological order:
>>
>> ~18:00
>> ======
>> I am troubleshooting some issues that turn out to be caused by a wrongly
>> configured system clock. The RTC (hardware clock) is set to local time
>> (UTC+2) but the OS is configured to treat the RTC as UTC. This is
>> because it was set to UTC previously, but then I reinstalled Windows
>> which promptly reset it to local time.
>>
>> This set the mtime of some files in both / and /home to dates in the
>> future. When I discovered this, I `touch`ed all affected files (`touch
>> now; sudo find / /home -xdev -newer now -exec touch {} \;`) to reset
>> their mtime and rebooted the system. I do not know if this is relevant;
>> if not, it makes reading the log files more fun.
>>
>> I then launch my command line backup program "bup", Firefox and some
>> other apps.
>>
>> ~18:50-19:00
>> ============
>> Firefox freezes. The system keeps running but I can't launch new
>> programs. It looked like all I/O broke down. However, bup kept running.
>> I left the computer alone for perhaps 30-60 min.
>>
> So, as I understand, a reproducing path is:
> (1) set mtime of some files in the future;
> (2) touch all affected files;
> (3) reboot the system;
> (4) launch backup program "bup", Firefox and some other apps.
That about sums up what I did, yes. While debugging the clock problems I 
rebooted more than once in a short time period.
> I think that it makes sense to try this reproducing path. But we had
> reports about the issue with likewise symptoms
> (nilfs_bmap_lookup_contig: broken bmap) for the case of 4 KB block size
> from other users. Unfortunately, I can't reproduce such issue for the
> case of 4 KB blocks size earlier. As I feel the clear reproducing path
> is crucial for this issue.
>
> I understand that it can be hard to reproduce the issue again. But,
> anyway, have you opportunity to try to reproduce the issue on another
> NILFS2 partition on your side?
>
> Anyway, I am going to reproduce the issue by this reproducing path on my
> side.
I have created a new nilfs filesystem about the same size as the old one 
on another drive and restored /home to it. If I find the time this 
weekend, I'll give it the same treatment.
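For reference, something along these lines (commands and sizes are
approximate, not a verbatim record):

 $ sudo lvcreate -L 200G -n newhome riven-proto    # size only roughly matches the old volume
 $ sudo mkfs -t nilfs2 -L newhome /dev/riven-proto/newhome
 $ sudo mount -t nilfs2 /dev/riven-proto/newhome /mnt
 $ sudo rsync -aAXH /home/ /mnt/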
>> ~20:00
>> ======
>> When I came back, bup hade frozen (/var/log/messages at 18:53:31).[1] I
>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>> return to the login screen. The system freezes during login though,
>> probably because /home had probably been mounted read only). So I reboot
>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>> I/O errors during shutdown.
>>
>> After the reboot there are no immediate signs of disaster. I launch bup
>> again. Some time later, /home remounts as read only. I notice that bup
>> has reported I/O errors while reading some files in /home.[2] dmesg and
>> /var/log/kern.log contains errors mentioning "bad btree node" and
>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>
> Now we have patch for overcome the freezing of system after such issue:
> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.
That is good. I shall await the next release with great anticipation.
> With the best regards,
> Vyacheslav Dubeyko.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Best Regards,
Anton Eliasson

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]       ` <51A0A7A0.6010207-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-25 16:26         ` Anton Eliasson
       [not found]           ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-25 16:26 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Anton Eliasson skrev 2013-05-25 13:59:
[...]
>>> ~20:00
>>> ======
>>> When I came back, bup hade frozen (/var/log/messages at 18:53:31).[1] I
>>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>>> return to the login screen. The system freezes during login though,
>>> probably because /home had probably been mounted read only). So I reboot
>>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>>> I/O errors during shutdown.
>>>
>>> After the reboot there are no immediate signs of disaster. I launch bup
>>> again. Some time later, /home remounts as read only. I notice that bup
>>> has reported I/O errors while reading some files in /home.[2] dmesg and
>>> /var/log/kern.log contains errors mentioning "bad btree node" and
>>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>>
>> Now we have patch for overcome the freezing of system after such issue:
>> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.
> That is good. I shall await the next release with great anticipation.
I don't think the bug described in the patch you linked to is 
responsible for my crashes. Check this out:

May 25 17:15:12 riven kernel: [ 1165.629786] /dev/vmnet: port on hub 0 
successfully opened
May 25 17:15:15 riven kernel: [ 1168.871258] /dev/vmnet: open called by 
PID 2073 (vmx-vcpu-0)
May 25 17:15:15 riven kernel: [ 1168.871281] /dev/vmnet: port on hub 0 
successfully opened
May 25 17:15:34 riven kernel: [ 1187.572676] /dev/vmnet: open called by 
PID 2075 (vmx-vcpu-1)
May 25 17:15:34 riven kernel: [ 1187.572693] /dev/vmnet: port on hub 0 
successfully opened
May 25 17:15:38 riven kernel: [ 1192.188770] BUG: unable to handle 
kernel NULL pointer dereference at 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.188781] IP: [<ffffffffa03021a2>] 
nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188798] PGD 1982f8067 PUD 198e2b067 
PMD 0
May 25 17:15:38 riven kernel: [ 1192.188803] Oops: 0000 [#1] PREEMPT SMP
May 25 17:15:38 riven kernel: [ 1192.188809] Modules linked in: nfsv3 
nfs_acl vmnet(O) ppdev parport_pc parport fuse vsock vmci(O) vmmon(O) 
ext4 crc16 mbcache jbd2 nvidia(PO) gpio_ich iTCO_wdt iTCO_vendor_support 
coretemp kvm_intel kvm snd_hda_codec_realtek pcspkr psmouse microcode 
serio_raw i2c_i801 snd_hda_intel lpc_ich snd_hda_codec drm evdev r8169 
snd_hwdep snd_pcm i2c_core snd_page_alloc mii acpi_cpufreq snd_timer 
intel_agp mperf intel_gtt snd soundcore button processor loop nfs lockd 
sunrpc fscache nilfs2 dm_mod sd_mod sr_mod cdrom ata_generic pata_acpi 
hid_generic usbhid hid ahci libahci pata_it8213 libata firewire_ohci 
scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore 
usb_common
May 25 17:15:38 riven kernel: [ 1192.188877] CPU 1
May 25 17:15:38 riven kernel: [ 1192.188883] Pid: 262, comm: 
nilfs_cleanerd Tainted: P           O 3.9.3-1-ARCH #1 Gigabyte 
Technology Co., Ltd. EP45-DS4/EP45-DS4
May 25 17:15:38 riven kernel: [ 1192.188888] RIP: 
0010:[<ffffffffa03021a2>]  [<ffffffffa03021a2>] 
nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188897] RSP: 0018:ffff880195afdb30 
  EFLAGS: 00010206
May 25 17:15:38 riven kernel: [ 1192.188900] RAX: ffff8801a25e7d48 RBX: 
0000000000000b95 RCX: 0000000000000034
May 25 17:15:38 riven kernel: [ 1192.188903] RDX: 000000000000000d RSI: 
0000000000000000 RDI: 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.188906] RBP: ffff880195afdb38 R08: 
a200000000000000 R09: a800028051000000
May 25 17:15:38 riven kernel: [ 1192.188908] R10: 57ffe77fafa01440 R11: 
0000000000000019 R12: ffff8801988b2648
May 25 17:15:38 riven kernel: [ 1192.188911] R13: ffff8801a25e7d00 R14: 
ffffea00000d04c0 R15: ffffea0000a01180
May 25 17:15:38 riven kernel: [ 1192.188914] FS:  00007f8bf81f3740(0000) 
GS:ffff8801afc80000(0000) knlGS:0000000000000000
May 25 17:15:38 riven kernel: [ 1192.188917] CS:  0010 DS: 0000 ES: 0000 
CR0: 000000008005003b
May 25 17:15:38 riven kernel: [ 1192.188920] CR2: 0000000000000b95 CR3: 
00000001959eb000 CR4: 00000000000007e0
May 25 17:15:38 riven kernel: [ 1192.188923] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
May 25 17:15:38 riven kernel: [ 1192.188925] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
May 25 17:15:38 riven kernel: [ 1192.188928] Process nilfs_cleanerd 
(pid: 262, threadinfo ffff880195afc000, task ffff880195f2c300)
May 25 17:15:38 riven kernel: [ 1192.188930] Stack:
May 25 17:15:38 riven kernel: [ 1192.188932]  ffff8801988b25a0 
ffff880195afdc20 ffffffffa0303ed5 ffffea0002dfb7c0
May 25 17:15:38 riven kernel: [ 1192.188938]  ffff880195f2c300 
ffff880195f2c300 ffff880195f2c300 ffff8801a56b8a70
May 25 17:15:38 riven kernel: [ 1192.188942]  ffff8801a49d0b60 
ffff8801a49d0a00 0000000102dfb7c0 ffff8801a56b8a60
May 25 17:15:38 riven kernel: [ 1192.188947] Call Trace:
May 25 17:15:38 riven kernel: [ 1192.188959]  [<ffffffffa0303ed5>] 
nilfs_segctor_do_construct+0xd65/0x1ab0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188969]  [<ffffffffa0304e42>] 
nilfs_segctor_construct+0x172/0x290 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188978]  [<ffffffffa0305ead>] 
nilfs_clean_segments+0xed/0x270 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.188985]  [<ffffffff811bc4bc>] ? 
__set_page_dirty+0x6c/0xc0
May 25 17:15:38 riven kernel: [ 1192.188994]  [<ffffffffa030c06f>] 
nilfs_ioctl_clean_segments.isra.14+0x4bf/0x740 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189003]  [<ffffffffa02fca8d>] ? 
nilfs_btree_lookup+0x4d/0x70 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189012]  [<ffffffffa030c70c>] 
nilfs_ioctl+0x21c/0x740 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189018]  [<ffffffff8119cf65>] 
do_vfs_ioctl+0x2e5/0x4d0
May 25 17:15:38 riven kernel: [ 1192.189025]  [<ffffffff81152930>] ? 
do_munmap+0x2b0/0x3e0
May 25 17:15:38 riven kernel: [ 1192.189029]  [<ffffffff8119d1d1>] 
sys_ioctl+0x81/0xa0
May 25 17:15:38 riven kernel: [ 1192.189036]  [<ffffffff814d3769>] ? 
do_device_not_available+0x19/0x20
May 25 17:15:38 riven kernel: [ 1192.189042]  [<ffffffff814d9e9d>] 
system_call_fastpath+0x1a/0x1f
May 25 17:15:38 riven kernel: [ 1192.189044] Code: ff ff ff 48 81 c4 88 
00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 
48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 
00 48 8b 47 30 48 8b 00 f6 c4
May 25 17:15:38 riven kernel: [ 1192.189089] RIP  [<ffffffffa03021a2>] 
nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 25 17:15:38 riven kernel: [ 1192.189098]  RSP <ffff880195afdb30>
May 25 17:15:38 riven kernel: [ 1192.189100] CR2: 0000000000000b95
May 25 17:15:38 riven kernel: [ 1192.189104] ---[ end trace 
0c7496171e3b9dfd ]---
May 25 18:03:02 riven kernel: [    0.000000] Initializing cgroup subsys 
cpuset
May 25 18:03:02 riven kernel: [    0.000000] Initializing cgroup subsys cpu
May 25 18:03:02 riven kernel: [    0.000000] Linux version 3.9.3-1-ARCH 
(tobias@T-POWA-LX) (gcc version 4.8.0 20130502 (prerelease) (GCC) ) #1 
SMP PREEMPT Sun May 19 22:50:29 CEST 2013
May 25 18:03:02 riven kernel: [    0.000000] Command line: 
BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/riven-arch ro quiet

No remounts, just a kernel oops. I can reproduce this without fail by 
booting a VMware Workstation (9.0.2) virtual machine that resides on the 
nilfs /home volume while another virtual machine is doing something 
I/O-intensive.

More specifically, I have a virtual machine running Windows XP in /home, 
a nilfs filesystem, and a virtual machine running Windows 7 in 
/Supplement. /Supplement is an ext4 volume in the same LVM volume group 
as /home on the same slow hard drive. I can crash the host by either:

* Starting both machines at the same time.
* Starting the W7 machine first and when it is fully booted to the 
desktop, but still doing I/O intensive Windows stuff, starting the WXP 
machine.

If I first start the WXP machine and let it boot to the desktop, at the 
point where it is actually I/O idle, I can safely start the W7 machine. 
After that I found no trouble installing software updates and logging in 
and out of both machines at the same time, though the HDD made it very 
slow of course.

After the host had crashed, I could still list and read files in /home 
but as soon as I attempted to `touch` a file, that terminal froze. Any 
terminal that attempted to read a file after that point froze as well 
and there was nothing left to do but to Alt+SysRq+B.
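For what it's worth, a crude way to approximate that concurrent I/O
pattern without VMware (untested whether it triggers the same oops; the
file names are just examples) would be one big direct-I/O writer on each
volume at the same time:

 $ dd if=/dev/zero of=/home/anton/vmware/stress1.img bs=1M count=4096 oflag=direct &
 $ dd if=/dev/zero of=/Supplement/anton/vmware/stress2.img bs=1M count=4096 oflag=direct &
 $ wait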

-- 
Best Regards,
Anton Eliasson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]           ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-26 12:54             ` Vyacheslav Dubeyko
  2013-05-29  6:39             ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-26 12:54 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On May 25, 2013, at 8:26 PM, Anton Eliasson wrote:

> Anton Eliasson skrev 2013-05-25 13:59:
> [...]
>>>> ~20:00
>>>> ======
>>>> When I came back, bup hade frozen (/var/log/messages at 18:53:31).[1] I
>>>> restart X by pressing Alt+SysRq+K (/var/log/messages at 20:06:33) and
>>>> return to the login screen. The system freezes during login though,
>>>> probably because /home had probably been mounted read only). So I reboot
>>>> using Alt+SysRq+REISUB (/var/log/messages at 20:07:05). I noticed some
>>>> I/O errors during shutdown.
>>>> 
>>>> After the reboot there are no immediate signs of disaster. I launch bup
>>>> again. Some time later, /home remounts as read only. I notice that bup
>>>> has reported I/O errors while reading some files in /home.[2] dmesg and
>>>> /var/log/kern.log contains errors mentioning "bad btree node" and
>>>> "nilfs_bmap_lookup_contig: broken bmap".[3]
>>>> 
>>> Now we have patch for overcome the freezing of system after such issue:
>>> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01614.html.
>> That is good. I shall await the next release with great anticipation.
> I don't think the bug described in the patch you linked to is responsible for my crashes. Check this out:
> 

I didn't state that this patch solves your issue. I meant that, after the
remount in read-only mode, the NILFS2 driver still has dirty pages in this
failure case. The patch fixes the kernel flush thread's endless attempts to
flush those dirty pages; as I understand it, that endless retrying is what
can result in the system freezing.

[snip]

> No remounts, just a kernel oops. I can reproduce this without fail by booting a VMware Workstation (9.0.2) virtual machine that resides on the nilfs /home volume while another virtual machine is doing something IO-intensive.
> 

Sorry, I am slightly confused by the different descriptions of the issue
in your e-mails. Initially, my understanding was that you had an issue with
a remount in read-only mode, but now you are talking about a crash without
a remount.

Could you share the full system log that you have for this case?

I need to understand the sequence of events. Maybe you have two issues
instead of one. Currently, I don't have a clear picture of the issue's
environment.

Thanks,
Vyacheslav Dubeyko.

> More specifically, I have a virtual machine running Windows XP in /home, a nilfs filesystem, and a virtual machine running Windows 7 in /Supplement. /Supplement is an ext4 volume in the same LVM volume group as /home on the same slow hard drive. I can crash the host by either:
> 
> * Starting both machines at the same time.
> * Starting the W7 machine first and when it is fully booted to the desktop, but still doing I/O intensive Windows stuff, starting the WXP machine.
> 
> If I first start the WXP machine and let it boot to the desktop, at the point where it is actually I/O idle, I can safely start the W7 machine. After that I found no trouble installing software updates and logging in and out of both machines at the same time, though the HDD made it very slow of course.
> 
> After the host had crashed, I could still list and read files in /home but as soon as I attempted to `touch` a file, that terminal froze. Any terminal that attempted to read a file after that point froze as well and there was nothing left to do but to Alt+SysRq+B.
> 
> -- 
> Best Regards,
> Anton Eliasson
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]           ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  2013-05-26 12:54             ` Vyacheslav Dubeyko
@ 2013-05-29  6:39             ` Vyacheslav Dubeyko
  2013-05-29 14:37               ` Ryusuke Konishi
  2013-05-30  8:10               ` Anton Eliasson
  1 sibling, 2 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-29  6:39 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote:

[snip]
> More specifically, I have a virtual machine running Windows XP in /home, 
> a nilfs filesystem, and a virtual machine running Windows 7 in 
> /Supplement. /Supplement is an ext4 volume in the same LVM volume group 
> as /home on the same slow hard drive. I can crash the host by either:
> 
> * Starting both machines at the same time.
> * Starting the W7 machine first and when it is fully booted to the 
> desktop, but still doing I/O intensive Windows stuff, starting the WXP 
> machine.
> 
> If I first start the WXP machine and let it boot to the desktop, at the 
> point where it is actually I/O idle, I can safely start the W7 machine. 
> After that I found no trouble installing software updates and logging in 
> and out of both machines at the same time, though the HDD made it very 
> slow of course.

Currently, I am thinking about the reproduction path. It is really
important to have a clear reproduction path, but I don't have a clear
picture of your environment yet. As I understand it, you have two VMware
virtual machines (Windows XP and Windows 7). Am I correct?

Moreover, I am thinking about the fact that the virtual machines on
different volumes influence each other in this failure environment.
Currently, I don't have a clear understanding of this.

> /etc/fstab
> ----------
>      tmpfs		/tmp	tmpfs	nodev,nosuid	0	0
>      /dev/mapper/riven-arch	/         	nilfs2    	rw,noatime,discard 0 0
>      /dev/mapper/riven-home	/home     	nilfs2    	rw,noatime,discard 0 0
>      /dev/mapper/riven-swap  none            swap            defaults 
>          0 0
>      /dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0
> # some NFS mounts excluded
>

As far as I can see, riven-arch, riven-home and riven-swap are under the
device mapper but riven-proto is not. Could you share more details about
how your logical volume environment was prepared?

The current state of fsck.nilfs2 doesn't give many useful details, but
its debug output contains detailed info about the first superblock, the
second superblock and the segment summaries of all segments. I think this
output can give me a better understanding of the NILFS2 volume's state.
Could you share the debug output of fsck.nilfs2 with me?

You can find an archive with the fsck.nilfs2 source code here:
http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz
Please build fsck.nilfs2 but don't install it; it is at an early stage of
development and currently doesn't perform any write operations. You can
run it like this: "fsck.nilfs2 -v debug [device] 2> [output-file]". The
output file is usually quite large.
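As a rough outline of the build-and-run steps (the unpacked directory
name and the in-tree path of the resulting binary may differ; this
assumes the usual nilfs-utils autotools layout):

 $ tar xzf nilfs-utils-fsck-v.0.04-under-development.tar.gz
 $ cd nilfs-utils*/
 $ ./configure && make
 $ sudo ./sbin/fsck/fsck.nilfs2 -v debug /dev/riven/home 2> fsck-home.log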

I am preparing a patch for the NILFS2 driver with debug output. I think
it makes sense to gather more details about the issue on your side,
because you can reproduce it reliably, so I'll send you this patch as
soon as it is ready. Do you have an opportunity to patch your kernel and
share the debug output for a reproduced occurrence of the issue?

Thanks,
Vyacheslav Dubeyko.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
  2013-05-29  6:39             ` Vyacheslav Dubeyko
@ 2013-05-29 14:37               ` Ryusuke Konishi
       [not found]                 ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2013-05-30  8:10               ` Anton Eliasson
  1 sibling, 1 reply; 27+ messages in thread
From: Ryusuke Konishi @ 2013-05-29 14:37 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA


I don't know whether this is a hint about the trouble, but according
to the system log, page_buffers() in nilfs_end_page_io() seems to hit
an Oops due to an invalid page address "0x36cd":

May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 
May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP 
<snip>
May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 
May 22 18:53:31 riven kernel: [ 3821.605873] RIP  [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
May 22 18:53:31 riven kernel: [ 3821.605881]  RSP <ffff8801960f7b30>
May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd

where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov
(%rdi),%rax; test $0x8, %ah", and corresponds to the part testing
PagePrivate(page) in page_buffers() macro called within
nilfs_end_page_io() routine:

      if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) {

This cannot happen, but there may be something we missed.


Regards,
Ryusuke Konishi



On Wed, 29 May 2013 10:39:33 +0400, Vyacheslav Dubeyko wrote:
> Hi Anton
> 
> On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote:
> 
> [snip]
>> More specifically, I have a virtual machine running Windows XP in /home, 
>> a nilfs filesystem, and a virtual machine running Windows 7 in 
>> /Supplement. /Supplement is an ext4 volume in the same LVM volume group 
>> as /home on the same slow hard drive. I can crash the host by either:
>> 
>> * Starting both machines at the same time.
>> * Starting the W7 machine first and when it is fully booted to the 
>> desktop, but still doing I/O intensive Windows stuff, starting the WXP 
>> machine.
>> 
>> If I first start the WXP machine and let it boot to the desktop, at the 
>> point where it is actually I/O idle, I can safely start the W7 machine. 
>> After that I found no trouble installing software updates and logging in 
>> and out of both machines at the same time, though the HDD made it very 
>> slow of course.
> 
> Currently, I am thinking about reproducing path. It is really important
> to have clear reproducing path. But I haven't clear picture of your
> environment yet. As I understand, you have two virtual VmWare machine
> (Win XP and Win 7). Am I correct?
> 
> Moreover, I am thinking about the fact that virtual machine on different
> volumes influence on each other in the issue environment. Currently, I
> haven't clear understanding of this.
> 
>> /etc/fstab
>> ----------
>>      tmpfs		/tmp	tmpfs	nodev,nosuid	0	0
>>      /dev/mapper/riven-arch	/         	nilfs2    	rw,noatime,discard 0 0
>>      /dev/mapper/riven-home	/home     	nilfs2    	rw,noatime,discard 0 0
>>      /dev/mapper/riven-swap  none            swap            defaults 
>>          0 0
>>      /dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0
>> # some NFS mounts excluded
>>
> 
> As I can see, riven-arch, riven-home and riven-swap are under device
> mapper but riven-proto is not. Could you share more details about how
> your Logical Volumes environment was prepared?
> 
> Current state of fsck.nilfs2 doesn't give many useful details. But debug
> output of fsck.nilfs2 contains detailed info about first superblock,
> second superblock and segment summaries of all segments. I think that
> this output can give to me more understanding about NILFS2 volume state.
> Could you share debug output of fsck.nilfs2 for me?
> 
> You can found archive with fsck.nilf2 source code in this place:
> (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz). Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the initial state of development. Currently, fsck.nilfs2 doesn't make any writing operations. So, you can execute command in such way: "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has a big size, usually.
> 
> I am preparing patch for NILFS2 driver with debug output. I think that
> it makes sense to get more detail about the issue on your side because
> you can reproduce the issue stably. So, I'll send you this patch as it
> will be ready. Have you opportunity to patch your kernel and share debug
> output for the reproduced issue case?
> 
> Thanks,
> Vyacheslav Dubeyko.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]                 ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2013-05-30  6:13                   ` Vyacheslav Dubeyko
  2013-05-30  6:55                     ` Ryusuke Konishi
  0 siblings, 1 reply; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-30  6:13 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> I don't know whether this may be a hint of this trouble, but according
> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> an Oops due to an invalid page address "0x36cd":
> 

Yes. There are two possible paths into nilfs_end_page_io(): (1)
nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
suspect nilfs_abort_logs(), because of compiler optimization, but I have
no evidence of it yet. The issue needs to be investigated more deeply
before stating anything definite, I think.

With the best regards,
Vyacheslav Dubeyko.

> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
> May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 
> May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP 
> <snip>
> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 
> May 22 18:53:31 riven kernel: [ 3821.605873] RIP  [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
> May 22 18:53:31 riven kernel: [ 3821.605881]  RSP <ffff8801960f7b30>
> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd
> 
> where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov
> (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing
> PagePrivate(page) in page_buffers() macro called within
> nilfs_end_page_io() routine:
> 
>       if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) {
> 
> This cannot happen, but there may be something we missed.
> 
> 
> Regards,
> Ryusuke Konishi


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
  2013-05-30  6:13                   ` Vyacheslav Dubeyko
@ 2013-05-30  6:55                     ` Ryusuke Konishi
       [not found]                       ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Ryusuke Konishi @ 2013-05-30  6:55 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
>> I don't know whether this may be a hint of this trouble, but according
>> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
>> an Oops due to an invalid page address "0x36cd":
>> 
> 
> Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> suspect the nilfs_abort_logs()

That sounds like a likely cause.

Can you test nilfs_abort_logs by injecting a random fault in some easy
way?

Regards,
Ryusuke Konishi


> because of compiler optimization. But now
> I haven't evidence of it. And it needs to investigate issue more deeply
> for stating something definitely, I think.


> With the best regards,
> Vyacheslav Dubeyko.
> 
>> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
>> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>> May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 
>> May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP 
>> <snip>
>> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 
>> May 22 18:53:31 riven kernel: [ 3821.605873] RIP  [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>> May 22 18:53:31 riven kernel: [ 3821.605881]  RSP <ffff8801960f7b30>
>> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd
>> 
>> where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov
>> (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing
>> PagePrivate(page) in page_buffers() macro called within
>> nilfs_end_page_io() routine:
>> 
>>       if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) {
>> 
>> This cannot happen, but there may be something we missed.
>> 
>> 
>> Regards,
>> Ryusuke Konishi
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]                       ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2013-05-30  7:21                         ` Vyacheslav Dubeyko
  2013-06-06  6:56                         ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-30  7:21 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> > On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> >> I don't know whether this may be a hint of this trouble, but according
> >> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> >> an Oops due to an invalid page address "0x36cd":
> >> 
> > 
> > Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> > nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> > suspect the nilfs_abort_logs()
> 
> That sounds a likely cause.
> 
> Can you test nilfs_abort_logs by injecting a random fault in some easy
> way ?
> 

Yes, sure. I am now thinking about the proper place for such an
injection. I'll share the results of that attempt.

With the best regards,
Vyacheslav Dubeyko.

> Regards,
> Ryusuke Konishi
> 
> 
> > because of compiler optimization. But now
> > I haven't evidence of it. And it needs to investigate issue more deeply
> > for stating something definitely, I think.
> 
> 
> > With the best regards,
> > Vyacheslav Dubeyko.
> > 
> >> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
> >> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
> >> May 22 18:53:31 riven kernel: [ 3821.605591] PGD 19636d067 PUD 19636e067 PMD 0 
> >> May 22 18:53:31 riven kernel: [ 3821.605597] Oops: 0000 [#1] PREEMPT SMP 
> >> <snip>
> >> May 22 18:53:31 riven kernel: [ 3821.605829] Code: ff ff ff 48 81 c4 88 00 00 00 5b 41 5c 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 53 48 89 fb 74 4e <48> 8b 07 f6 c4 08 0f 84 8c 00 00 00 48 8b 47 30 48 8b 00 f6 c4 
> >> May 22 18:53:31 riven kernel: [ 3821.605873] RIP  [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
> >> May 22 18:53:31 riven kernel: [ 3821.605881]  RSP <ffff8801960f7b30>
> >> May 22 18:53:31 riven kernel: [ 3821.605884] CR2: 00000000000036cd
> >> 
> >> where the instruction sequence of "<48> 8b 07 f6 c4 08" is "mov
> >> (%rdi),%rax; test $0x8, %ah", and corresponds to the part testing
> >> PagePrivate(page) in page_buffers() macro called within
> >> nilfs_end_page_io() routine:
> >> 
> >>       if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) {
> >> 
> >> This cannot happen, but there may be something we missed.
> >> 
> >> 
> >> Regards,
> >> Ryusuke Konishi
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
  2013-05-29  6:39             ` Vyacheslav Dubeyko
  2013-05-29 14:37               ` Ryusuke Konishi
@ 2013-05-30  8:10               ` Anton Eliasson
       [not found]                 ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-30  8:10 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko skrev 2013-05-29 08:39:
> Hi Anton
>
> On Sat, 2013-05-25 at 18:26 +0200, Anton Eliasson wrote:
>
> [snip]
>> More specifically, I have a virtual machine running Windows XP in /home,
>> a nilfs filesystem, and a virtual machine running Windows 7 in
>> /Supplement. /Supplement is an ext4 volume in the same LVM volume group
>> as /home on the same slow hard drive. I can crash the host by either:
>>
>> * Starting both machines at the same time.
>> * Starting the W7 machine first and when it is fully booted to the
>> desktop, but still doing I/O intensive Windows stuff, starting the WXP
>> machine.
>>
>> If I first start the WXP machine and let it boot to the desktop, at the
>> point where it is actually I/O idle, I can safely start the W7 machine.
>> After that I found no trouble installing software updates and logging in
>> and out of both machines at the same time, though the HDD made it very
>> slow of course.
> Currently, I am thinking about reproducing path. It is really important
> to have clear reproducing path. But I haven't clear picture of your
> environment yet. As I understand, you have two virtual VmWare machine
> (Win XP and Win 7). Am I correct?
Correct. The Windows XP machine is stored in /home/anton/vmware/ and the 
Windows 7 machine is stored in /Supplement/anton/vmware/.
> Moreover, I am thinking about the fact that virtual machine on different
> volumes influence on each other in the issue environment. Currently, I
> haven't clear understanding of this.
>
>> /etc/fstab
>> ----------
>>       tmpfs		/tmp	tmpfs	nodev,nosuid	0	0
>>       /dev/mapper/riven-arch	/         	nilfs2    	rw,noatime,discard 0 0
>>       /dev/mapper/riven-home	/home     	nilfs2    	rw,noatime,discard 0 0
>>       /dev/mapper/riven-swap  none            swap            defaults
>>           0 0
>>       /dev/riven-proto/supplement /Supplement ext4 defaults,noatime 0 0
>> # some NFS mounts excluded
>>
> As I can see, riven-arch, riven-home and riven-swap are under device
> mapper but riven-proto is not. Could you share more details about how
> your Logical Volumes environment was prepared?
I drew some sketches to help illustrate my partitioning scheme because 
it's getting quite complicated: http://imgur.com/HC8GstJ,MlKc3DN

riven-proto is also managed by device mapper. An LVM volume can be 
specified either as /dev/mapper/<vg>-<lv> or /dev/<vg>/<lv>. Both of 
these files are symlinks to the same device. I think I've read somewhere 
that the former symlinks are created earlier in the boot process but 
other than that these two ways are basically equivalent.
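For example, both of these should resolve to the same node (note that
device mapper doubles any hyphen that is part of a VG or LV name):

 $ readlink -f /dev/riven-proto/supplement /dev/mapper/riven--proto-supplement
 # both lines of output should be the same /dev/dm-N node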
> Current state of fsck.nilfs2 doesn't give many useful details. But debug
> output of fsck.nilfs2 contains detailed info about first superblock,
> second superblock and segment summaries of all segments. I think that
> this output can give to me more understanding about NILFS2 volume state.
> Could you share debug output of fsck.nilfs2 for me?
>
> You can found archive with fsck.nilf2 source code in this place:
> (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz). Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the initial state of development. Currently, fsck.nilfs2 doesn't make any writing operations. So, you can execute command in such way: "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has a big size, usually.
I'll see if I can do that tonight.
> I am preparing patch for NILFS2 driver with debug output. I think that
> it makes sense to get more detail about the issue on your side because
> you can reproduce the issue stably. So, I'll send you this patch as it
> will be ready. Have you opportunity to patch your kernel and share debug
> output for the reproduced issue case?
I've never patched a kernel before but I could give it a try. I probably 
won't have time for that until next weekend though, as I will be away 
from this particular computer for the next week.
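The basic procedure would presumably be something like this (a sketch
only, assuming a vanilla 3.9.3 source tree rather than the Arch
packaging, and a hypothetical patch file name):

 $ cd linux-3.9.3
 $ patch -p1 < nilfs2-debug.patch      # hypothetical patch file from Vyacheslav
 $ zcat /proc/config.gz > .config      # reuse the running kernel's config, if available
 $ make oldconfig && make -j4
 $ sudo make modules_install install
 # plus regenerating the initramfs / bootloader entries as appropriate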
> Thanks,
> Vyacheslav Dubeyko.
>
>


-- 
Best Regards,
Anton Eliasson

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]                 ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-30 15:30                   ` Anton Eliasson
       [not found]                     ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-30 15:30 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Anton Eliasson skrev 2013-05-30 10:10:
> Vyacheslav Dubeyko skrev 2013-05-29 08:39:
[snip]
>> Current state of fsck.nilfs2 doesn't give many useful details. But debug
>> output of fsck.nilfs2 contains detailed info about first superblock,
>> second superblock and segment summaries of all segments. I think that
>> this output can give to me more understanding about NILFS2 volume state.
>> Could you share debug output of fsck.nilfs2 for me?
>>
>> You can found archive with fsck.nilf2 source code in this place:
>> (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz).
>> Please, build fsck.nilfs2 but don't install it. The fsck.nilfs2 on the
>> initial state of development. Currently, fsck.nilfs2 doesn't make any
>> writing operations. So, you can execute command in such way:
>> "fsck.nilfs2 -v debug [device] 2> [output-file]". The output file has
>> a big size, usually.
> I'll see if I can do that tonight.

Okay, this is what was printed to stdout by fsck.nilfs2 with debug 
verbosity:

fsck.nilfs2 v.0.04-under-development (nilfs-utils 2.1.4)
[UI_INFO]: The NILFS superblocks checking begins.
[FS_INFO]: [SB] [ID: 0x10200020000005f] [SEG: 0 LOG: 0] Superblock state flag *tells* that filesystem stays in mounted state.
[FS_INFO]: [SB] [ID: 0x10200020000006a] Primary and secondary superblocks have different info about last checkpoint.
[FS_INFO]: [SB] [ID: 0x10200020000006b] Primary and secondary superblocks have different info about disk block address of partial segment.
[FS_INFO]: [SB] [ID: 0x10200020000006c] Primary and secondary superblocks have different info about sequential number of partial segment.
[FS_INFO]: [SB] [ID: 0x102000200000070] Primary and secondary superblocks have different last write time.
[FS_INFO]: [SB] [ID: 0x102000200000073] Primary and secondary superblocks have different file system state flags.
[FS_INFO]: [SB] [ID: 0x10200020000003d] NILFS has valid primary and secondary superblocks.
[INTERNAL_INFO]: NILFS has valid primary and secondary superblocks. Requested device: /dev/riven/home.
[UI_INFO]: NILFS volume's segments checking begins.
[UI_INFO]: FSCK currently has partial and experimental checking functionality. Sorry. Functionality is not implemented yet.
[UI_INFO]: All is OK. Have a nice day.

And here's the log that was printed to stderr: 
http://antoneliasson.se/publicdump/riven-home-fsck-stderr.log.gz
It's 12 MB gzipped and 457 MB uncompressed.

-- 
Best Regards,
Anton Eliasson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]                     ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-30 20:50                       ` Anton Eliasson
       [not found]                         ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-30 20:50 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Sorry for my frequent posting today. It looks like my system is falling 
apart. Earlier today /home died again. This is the volume "newhome" in 
the sketch I sent two e-mails ago. Just like last time, it involved a 
file related to VMware, and just like last time, the read-only remount 
was triggered by my backup program bup as it tried to read the corrupted 
file. VMware itself was not running at the time.

I created a new logical volume riven-proto/homesweethome, formatted it as 
ext4, changed the fstab entry for /home to 
/dev/riven-proto/homesweethome, rebooted and checked the logs. kernel.log said:

May 30 19:41:24 riven kernel: [88324.864707] NILFS: bad btree node (blocknr=16521285): level = 100, flags = 0x3c, nchildren = 29793
May 30 19:41:24 riven kernel: [88324.864716] NILFS error (device dm-4): nilfs_bmap_lookup_contig: broken bmap (inode number=117612)
May 30 19:41:24 riven kernel: [88324.864716]
May 30 19:41:24 riven kernel: [88324.875626] Remounting filesystem read-only
May 30 19:41:24 riven kernel: [88324.875803] NILFS: bad btree node (blocknr=16521285): level = 100, flags = 0x3c, nchildren = 29793
May 30 19:41:24 riven kernel: [88324.875809] NILFS error (device dm-4): nilfs_bmap_lookup_contig: broken bmap (inode number=117612)
May 30 19:41:24 riven kernel: [88324.875809]

This output makes me believe that only one file is corrupted:

$ sudo mount -o ro,norecovery /dev/riven-proto/newhome /mnt
$ cd /mnt/anton/
$ LANG=C find . -type f -exec cat {} >/dev/null \;
cat: ./vmware/WXP/WXP-15dc29db.vmem: Input/output error




Next issue: after said reboot I got these errors:

May 30 20:09:35 riven kernel: [    7.298727] nilfs_ioctl_move_inode_block: conflicting data buffer: ino=8079, cno=726783, offset=911, blocknr=4812804, vblocknr=565882
May 30 20:09:35 riven kernel: [    7.299406] NILFS: GC failed during preparation: cannot read source blocks: err=-17

nilfs_cleanerd won't start on the root fs. Same errors if I try to start 
it manually (`nilfs_cleanerd /dev/riven/arch` as root).

I'd really like to have my SSD back now. Can I dd /home ("old" home on 
volume group "riven" that we've been debugging these last few days) to 
an image file and then reformat? I could keep riven-proto/newhome around 
if you want to debug that as well. As far as I know, riven-proto/newhome 
died very cleanly, with no rw mounts after the corruption was first 
discovered.

-- 
Best Regards,
Anton Eliasson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Broken nilfs2 filesystem
       [not found]                         ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-31  6:39                           ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-31  6:39 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Thu, 2013-05-30 at 22:50 +0200, Anton Eliasson wrote:

[snip]
> 
> I'd really like to have my SSD back now. Can I dd /home ("old" home on 
> volume group "riven" that we've been debugging these last few days) to 
> an image file and then reformat? I could keep riven-proto/newhome around 
> if you want to debug that as well. As far as I know, riven-proto/newhome 
> died very cleanly, with no rw mounts after the corruption was first 
> discovered.
> 

Yes, of course, you can reformat your drive and get a working file
system back. Please simply make an image of the corrupted partition
that reproduces the issue. It would be great to have such an image, so
that there is a further opportunity to investigate the issue on your
side if necessary. Anyway, first of all, I'll try to investigate and
fix the issue on my side.
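Something like the following should be enough to capture the image (the
device path and output file name are only placeholders; use whatever
device node the old /home LV actually has):

$ sudo umount /home
$ sudo dd if=/dev/riven/home of=/path/to/riven-home.img bs=4M
$ gzip /path/to/riven-home.img    # optional, to save space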

Thank you for all information and details that you provided to us.

Thanks,
Vyacheslav Dubeyko.



* Re: Broken nilfs2 filesystem
       [not found]                       ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2013-05-30  7:21                         ` Vyacheslav Dubeyko
@ 2013-06-06  6:56                         ` Vyacheslav Dubeyko
  2013-06-06  9:20                           ` Reinoud Zandijk
  2013-06-12 20:31                           ` Anton Eliasson
  1 sibling, 2 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-06-06  6:56 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Anton Eliasson, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
> > On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
> >> I don't know whether this may be a hint of this trouble, but according
> >> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
> >> an Oops due to an invalid page address "0x36cd":
> >> 
> > 
> > Yes. There are two possible way to be in nilfs_end_page_io(): (1)
> > nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
> > suspect the nilfs_abort_logs()
> 
> That sounds a likely cause.
> 
> Can you test nilfs_abort_logs by injecting a random fault in some easy
> way ?
> 

So, here is what I have discovered so far.

First of all, unfortunately, I can't reproduce the issue yet. I suspect
that the aging state of the volume and the peculiarities of the
workload and environment play a very important role in this issue. As I
remember, all reporters of similar symptoms (broken btree node error
messages) talked about several months of successful use of the NILFS2
file system.

I tried to set up an LVM environment as described by Anton, but I
didn't catch the issue in that environment. So I think that I haven't
aged the NILFS2 volume properly and haven't used the right workload;
the proper workload needs to be thought about more deeply. As I can see
from Anton's system log, there was frequent update and git activity,
and both took place shortly before the crash:

May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (37 782 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (38 390 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (39 066 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (39 742 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 311 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 956 of 41 158 KB)...
May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:45] Downloading update (41 158 of 41 158 KB)...
May 22 18:50:13 riven slim[274]: [2013-05-22 18:48:45] Downl18:50:13 | Git | default | Checking for remote changes...
May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git rev-parse HEAD
May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git ls-remote --heads --exit-code "ssh://storage@hephaestus/home/storage/default" master
May 22 18:50:13 riven slim[274]: 18:50:13 | Git | default | No remote changes, local+remote: 8eab1e96aa618010ff17c11a955f4423d823beb6
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]

So, maybe, git activity is a suitable workload for reproducing the
issue. It needs to be checked, I suppose.

I tried to simulate the occurrence of errors in the
nilfs_segctor_do_construct() method by disabling the error checking in
the following places:

http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1942
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1953
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1962
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1976
http://lxr.free-electrons.com/source/fs/nilfs2/segment.c#L1989

Initially, I simply commented out the error checking statement. Then I
commented out the error checking statement and additionally set the
error code to -EINVAL. It is strange, but if I set the error code I
don't see any visible failure in the operation of the NILFS2 driver.
However, I get a very interesting error in the case where I simply
comment out the error checking statement without setting the error
code:

May 31 15:05:49 slavad-ubuntu nilfs_cleanerd[2409]: run (manual)
May 31 15:05:50 slavad-ubuntu kernel: [  737.725827] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: cannot clean segments: File exists
May 31 15:05:50 slavad-ubuntu nilfs_cleanerd[2409]: shutdown
May 31 15:05:50 slavad-ubuntu kernel: [  737.744660] ------------[ cut here ]------------
May 31 15:05:50 slavad-ubuntu kernel: [  737.744674] WARNING: at fs/nilfs2/ioctl.c:449 nilfs_ioctl_clean_segments.isra.11+0x667/0x690()
May 31 15:05:50 slavad-ubuntu kernel: [  737.744676] Hardware name: OptiPlex 760                 
May 31 15:05:50 slavad-ubuntu kernel: [  737.744679] Modules linked in: snd_hda_codec_analog snd_hda_intel i915 snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event bnep rfcomm snd_seq drm_kms_helper drm bluetooth nfsv4 snd_timer snd_seq_device i2c_algo_bit snd joydev hid_generic soundcore dell_wmi video dcdbas coretemp psmouse serio_raw mei sparse_keymap ppdev snd_page_alloc lpc_ich mac_hid parport_pc microcode wmi lp parport binfmt_misc nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc e1000e ptp pps_core usbhid hid
May 31 15:05:50 slavad-ubuntu kernel: [  737.744746] Pid: 2409, comm: nilfs_cleanerd Tainted: G          I  3.9.0-rc6+ #35
May 31 15:05:50 slavad-ubuntu kernel: [  737.744748] Call Trace:
May 31 15:05:50 slavad-ubuntu kernel: [  737.744756]  [<ffffffff8105c7df>] warn_slowpath_common+0x7f/0xc0
May 31 15:05:50 slavad-ubuntu kernel: [  737.744760]  [<ffffffff8105c83a>] warn_slowpath_null+0x1a/0x20
May 31 15:05:50 slavad-ubuntu kernel: [  737.744765]  [<ffffffff81301837>] nilfs_ioctl_clean_segments.isra.11+0x667/0x690
May 31 15:05:50 slavad-ubuntu kernel: [  737.744771]  [<ffffffff81098f0f>] ? local_clock+0x6f/0x80
May 31 15:05:50 slavad-ubuntu kernel: [  737.744776]  [<ffffffff81301e44>] nilfs_ioctl+0x3d4/0x690
May 31 15:05:50 slavad-ubuntu kernel: [  737.744781]  [<ffffffff810c370f>] ? lock_release_non_nested+0x30f/0x350
May 31 15:05:50 slavad-ubuntu kernel: [  737.744785]  [<ffffffff81098ca5>] ? sched_clock_local+0x25/0x90
May 31 15:05:50 slavad-ubuntu kernel: [  737.744790]  [<ffffffff811b7e26>] do_vfs_ioctl+0x96/0x570
May 31 15:05:50 slavad-ubuntu kernel: [  737.744795]  [<ffffffff81169e4c>] ? might_fault+0x5c/0xb0
May 31 15:05:50 slavad-ubuntu kernel: [  737.744801]  [<ffffffff81748985>] ? sysret_check+0x22/0x5d
May 31 15:05:50 slavad-ubuntu kernel: [  737.744805]  [<ffffffff811b8391>] sys_ioctl+0x91/0xb0
May 31 15:05:50 slavad-ubuntu kernel: [  737.744809]  [<ffffffff813a70be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
May 31 15:05:50 slavad-ubuntu kernel: [  737.744813]  [<ffffffff81748959>] system_call_fastpath+0x16/0x1b
May 31 15:05:50 slavad-ubuntu kernel: [  737.744816] ---[ end trace 374fc1d251cc46c6 ]---
May 31 15:05:50 slavad-ubuntu kernel: [  737.744933] NILFS: GC failed during preparation: cannot read source blocks: err=-17
May 31 15:09:44 slavad-ubuntu kernel: [  972.324583] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:09:49 slavad-ubuntu kernel: [  977.349257] [nilfs_segctor_do_construct] fs/nilfs2/segment.c:1944
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: start
May 31 15:11:57 slavad-ubuntu nilfs_cleanerd[2820]: pause (clean check)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: run (manual)
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: cannot clean segments: File exists
May 31 15:12:08 slavad-ubuntu nilfs_cleanerd[2820]: shutdown
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562880] nilfs_ioctl_move_inode_block: conflicting data buffer: ino=4, cno=0, offset=0, blocknr=2086, vblocknr=232528
May 31 15:12:08 slavad-ubuntu kernel: [ 1115.562887] NILFS: GC failed during preparation: cannot read source blocks: err=-17

As I understand it, this error looks like Anton's latest reports about
the complete inability to use the corrupted NILFS2 volume. So maybe it
is possible to assume that, when the issue occurs, segment construction
is aborted continuously and permanently. But simulating this by
commenting out the error checking statement without setting an error
code is not a valid driver workflow, as I understand it, and that
confuses me. Currently I don't have a clear understanding of it.
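Schematically, the two variants described above look like this (an
illustration only, not the actual code at the lines linked above;
nilfs_segctor_collect() stands in for whichever call's error check was
disabled):

        /* original (schematic) */
        err = nilfs_segctor_collect(sci, nilfs, mode);
        if (unlikely(err))
                goto failed;

        /* variant 1: error check disabled, returned code left as-is */
        err = nilfs_segctor_collect(sci, nilfs, mode);
        /* if (unlikely(err)) goto failed; */

        /* variant 2: error check disabled and an error forced */
        err = nilfs_segctor_collect(sci, nilfs, mode);
        err = -EINVAL;
        /* if (unlikely(err)) goto failed; */

The interesting failure above came from variant 1; variant 2 produced
no visible failure.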

So, from my viewpoint, the investigation of the issue needs to
continue.

With the best regards,
Vyacheslav Dubeyko.

> Regards,
> Ryusuke Konishi
> 



* Re: Broken nilfs2 filesystem
  2013-06-06  6:56                         ` Vyacheslav Dubeyko
@ 2013-06-06  9:20                           ` Reinoud Zandijk
       [not found]                             ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
  2013-06-12 20:31                           ` Anton Eliasson
  1 sibling, 1 reply; 27+ messages in thread
From: Reinoud Zandijk @ 2013-06-06  9:20 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

Hi,

just my $0.02, so to speak:

On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
> First of all, unfortunately, I can't reproduce the issue yet, currently.
> I suspect that in this issue the aging state of volume, peculiarity of
> workload and environment play very important role. As I remember, all
> reporters of likewise symptoms (broken bnode error messages) talked
> about several months of successful working of NILFS2 file system.

It sounds to me as if a btree is in a peculiar state and updating the
btree results in this corruption.

Have you tried to mount one of the earlier checkpoints/snapshots
read-only and see if those are correct? If so, dumping both DATs and
both btrees might give a clue as to what went wrong. At the very least
it would give a clue as to how complicated the btree was before the
update and what actions were taken on it.
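For example, something along these lines (the checkpoint number is a
placeholder; lscp shows which checkpoints actually exist):

# mount -t nilfs2 -o ro,norecovery /dev/dm-2 /mnt
$ lscp /dev/dm-2
# chcp ss /dev/dm-2 <CNO>
# umount /mnt
# mount -t nilfs2 -o ro,norecovery,cp=<CNO> /dev/dm-2 /mnt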

With regards,
Reinoud


[-- Attachment #2: Type: application/pgp-signature, Size: 487 bytes --]


* Re: Broken nilfs2 filesystem
       [not found]                             ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
@ 2013-06-06  9:34                               ` Vyacheslav Dubeyko
  2013-06-06 14:19                                 ` Reinoud Zandijk
  2013-06-12 20:12                               ` Anton Eliasson
  1 sibling, 1 reply; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-06-06  9:34 UTC (permalink / raw)
  To: Reinoud Zandijk; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-06-06 at 11:20 +0200, Reinoud Zandijk wrote:
> Hi,
> 
> just my $0.02 so to say:
> 
> On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
> > First of all, unfortunately, I can't reproduce the issue yet, currently.
> > I suspect that in this issue the aging state of volume, peculiarity of
> > workload and environment play very important role. As I remember, all
> > reporters of likewise symptoms (broken bnode error messages) talked
> > about several months of successful working of NILFS2 file system.
> 
> sounds to me as if a b-tree is in a perculiar state and that updating the
> btree results in this corruption.
> 
> Have you tried to mount one of the checkpoints/snapshots earlier as RO and see
> if those are correct? If so, dumping both DATs and both btrees might give a
> clue as to what went wrong. If only it gives a clue as to how complicated the
> btree is before the updating and what actions are taken on it.
> 

Unfortunately, I haven't reproduced the issue on my side; here
everything is OK. I am trying to reproduce the issue that has been
reported many times by different users, but so far without any success,
so I can't investigate the essence of the issue. I know the symptoms,
but I don't know a reproduction path.

Thank you for your advice. But without a corruption on my side I can't
investigate anything.

Thanks,
Vyacheslav Dubeyko.

> With regards,
> Reinoud
> 



* Re: Broken nilfs2 filesystem
  2013-06-06  9:34                               ` Vyacheslav Dubeyko
@ 2013-06-06 14:19                                 ` Reinoud Zandijk
  0 siblings, 0 replies; 27+ messages in thread
From: Reinoud Zandijk @ 2013-06-06 14:19 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, Jun 06, 2013 at 01:34:48PM +0400, Vyacheslav Dubeyko wrote:
> On Thu, 2013-06-06 at 11:20 +0200, Reinoud Zandijk wrote:
> Unfortunately, I haven't reproduced issue on my side. On my side all is
> OK. I am trying to reproduce the issue that was reported by many times
> of different users. But, currently, without any success on my side. So,
> I can't investigate the essence of the issue. I know symptoms but I
> don't know reproducing path of the issue.

Oops, I must have CC'd it to you instead of the submitter. Well, I hope
he reads it on the list :)

With regards,
Reinoud


* Re: Broken nilfs2 filesystem
       [not found]                             ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
  2013-06-06  9:34                               ` Vyacheslav Dubeyko
@ 2013-06-12 20:12                               ` Anton Eliasson
  1 sibling, 0 replies; 27+ messages in thread
From: Anton Eliasson @ 2013-06-12 20:12 UTC (permalink / raw)
  To: Reinoud Zandijk; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Reinoud Zandijk skrev 2013-06-06 11:20:
> Hi,
>
> just my $0.02 so to say:
>
> On Thu, Jun 06, 2013 at 10:56:09AM +0400, Vyacheslav Dubeyko wrote:
>> First of all, unfortunately, I can't reproduce the issue yet, currently.
>> I suspect that in this issue the aging state of volume, peculiarity of
>> workload and environment play very important role. As I remember, all
>> reporters of likewise symptoms (broken bnode error messages) talked
>> about several months of successful working of NILFS2 file system.
> sounds to me as if a b-tree is in a perculiar state and that updating the
> btree results in this corruption.
>
> Have you tried to mount one of the checkpoints/snapshots earlier as RO and see
> if those are correct? If so, dumping both DATs and both btrees might give a
> clue as to what went wrong. If only it gives a clue as to how complicated the
> btree is before the updating and what actions are taken on it.
>
> With regards,
> Reinoud
>
I have configured nilfs_cleanerd.conf to clean very aggressively so my 
earliest checkpoint is from after the incident. I included the contents 
of that file in my first email sent on May 22 
(http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2920).
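("Aggressively" here means settings along these lines in
/etc/nilfs_cleanerd.conf -- illustrative values only, not the exact
contents of that file:

protection_period       300
min_clean_segments      10%
max_clean_segments      20%
cleaning_interval       5

i.e. a short protection_period so that old checkpoints become
reclaimable quickly.)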

Even so, I tried to loopback mount the oldest checkpoint I have, and
found that it too is affected by the same corruption.

# losetup /dev/loop0 /Athena/Dump/riven/riven-home-20130531.img
# mount /dev/loop0 /mnt
$ mount | tail -1
/dev/loop0 on /mnt type nilfs2 (ro,relatime,norecovery)
$ lscp /dev/loop0
                  CNO        DATE     TIME  MODE  FLG NBLKINC       ICNT
              1260571  2013-05-23 16:51:49   cp    -          140 155496
              1260572  2013-05-23 16:51:51   cp    -         1632 155495
              1260575  2013-05-23 16:52:06   cp    -         1473 155496
              1260576  2013-05-23 16:52:09   cp    -           49 155495
              1260580  2013-05-24 23:36:11   cp    -         1345 155496
              1260581  2013-05-24 23:36:16   cp    -         1500 155495
              1260582  2013-05-24 23:36:21   cp    -         1356 155497
              1260583  2013-05-24 23:36:26   cp    -         1465 155495
# chcp ss /dev/loop0 1260571
# umount /mnt
# mount -o ro,norecovery,cp=1260571 /dev/loop0 /mnt
$ cd /mnt/anton/Bilder/20130321-28\ Jakobs\ bilder\ från\ Nederländerna
$ LANG=C cat *>/dev/null
cat: 160.JPG: Input/output error
cat: 163.JPG: Input/output error
cat: 164.JPG: Input/output error
cat: 165.JPG: Input/output error
cat: 170.JPG: Input/output error
cat: 172.JPG: Input/output error
cat: 179.JPG: Input/output error

-- 
Best Regards,
Anton Eliasson


* Re: Broken nilfs2 filesystem
  2013-06-06  6:56                         ` Vyacheslav Dubeyko
  2013-06-06  9:20                           ` Reinoud Zandijk
@ 2013-06-12 20:31                           ` Anton Eliasson
       [not found]                             ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-06-12 20:31 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko skrev 2013-06-06 08:56:
> On Thu, 2013-05-30 at 15:55 +0900, Ryusuke Konishi wrote:
>> On Thu, 30 May 2013 10:13:05 +0400, Vyacheslav Dubeyko wrote:
>>> On Wed, 2013-05-29 at 23:37 +0900, Ryusuke Konishi wrote:
>>>> I don't know whether this may be a hint of this trouble, but according
>>>> to the system log, page_buffers() of nilfs_end_page_io() seems to hit
>>>> an Oops due to an invalid page address "0x36cd":
>>>>
>>> Yes. There are two possible way to be in nilfs_end_page_io(): (1)
>>> nilfs_segctor_complete_write(); (2) nilfs_abort_logs(). Currently, I
>>> suspect the nilfs_abort_logs()
>> That sounds a likely cause.
>>
>> Can you test nilfs_abort_logs by injecting a random fault in some easy
>> way ?
>>
> So, what I discovered currently.
>
> First of all, unfortunately, I can't reproduce the issue yet, currently.
> I suspect that in this issue the aging state of volume, peculiarity of
> workload and environment play very important role. As I remember, all
> reporters of likewise symptoms (broken bnode error messages) talked
> about several months of successful working of NILFS2 file system.
>
> I tried to make LVM environment as it was described by Anton. But I
> didn't catch the issue in this environment. So, I think that I haven't
> properly aged NILFS2 volume state and I tried not proper workload. It
> needs to think about proper workload more deeply. As I can see from
> Anton's system log that it took place frequent update and git activity.
> Moreover, update and git were nearly before crash:
I'm not so sure that my issues are caused by aging of the filesystem. As 
I described in my third e-mail on May 30 
(http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2957), I 
was able to trash my new /home which was only a week old. I'm starting 
to think it has something to do with either VMware or bup (which is git 
based) or a combination of both.
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (37 782 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (38 390 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:43] Downloading update (39 066 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (39 742 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 311 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:44] Downloading update (40 956 of 41 158 KB)...
> May 22 18:48:45 riven slim[274]: [2013-05-22 18:48:45] Downloading update (41 158 of 41 158 KB)...
> May 22 18:50:13 riven slim[274]: [2013-05-22 18:48:45] Downl18:50:13 | Git | default | Checking for remote changes...
> May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git rev-parse HEAD
> May 22 18:50:13 riven slim[274]: 18:50:13 | Cmd | default | git ls-remote --heads --exit-code "ssh://storage@hephaestus/home/storage/default" master
> May 22 18:50:13 riven slim[274]: 18:50:13 | Git | default | No remote changes, local+remote: 8eab1e96aa618010ff17c11a955f4423d823beb6
> May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Pinging tcp://notifications.sparkleshare.org:443/
> May 22 18:50:14 riven slim[274]: 18:50:14 | ListenerTcp | Received pong from tcp://notifications.sparkleshare.org:443/
> May 22 18:53:31 riven kernel: [ 3821.605568] BUG: unable to handle kernel paging request at 00000000000036cd
> May 22 18:53:31 riven kernel: [ 3821.605577] IP: [<ffffffffa027f1a2>] nilfs_end_page_io+0x12/0xc0 [nilfs2]
>
> So, maybe, git activity is a possible workload for the issue
> reproducing. It needs to check it, I suppose.
Git in this case is part of SparkleShare, a Git-based file
synchronisation program, much like Dropbox but self-hosted. However,
I've made very few changes to the files tracked by SparkleShare, so the
Git workload should be extremely light.

I believe Steam is what's printing "Downloading update".
> I tried to simulate errors occurrence in nilfs_segctor_do_construct()
> method by means of excluding of error checking in places:
>
[...]

-- 
Best Regards,
Anton Eliasson


* Re: Broken nilfs2 filesystem
       [not found]                             ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-06-13 10:01                               ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-06-13 10:01 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-06-12 at 22:31 +0200, Anton Eliasson wrote:

[snip]
> > I tried to make LVM environment as it was described by Anton. But I
> > didn't catch the issue in this environment. So, I think that I haven't
> > properly aged NILFS2 volume state and I tried not proper workload. It
> > needs to think about proper workload more deeply. As I can see from
> > Anton's system log that it took place frequent update and git activity.
> > Moreover, update and git were nearly before crash:

> I'm not so sure that my issues are caused by aging of the filesystem. As 
> I described in my third e-mail on May 30 
> (http://article.gmane.org/gmane.comp.file-systems.nilfs.user/2957), I 
> was able to trash my new /home which was only a week old. I'm starting 
> to think it has something to do with either VMware or bup (which is git 
> based) or a combination of both.

As I understand it, the issue takes place on the GC side. This
complicates the situation, because the real problem can be far from the
detected symptoms; I mean that the real issue can occur earlier without
being detected. So I assume that the possible reasons are: (1) a
special file system aging state; (2) a race condition. Either of these
can be reproduced only with a clear and strict reproduction path, so I
have to find the reproduction path first.

Soon I'll finish preparing the debugging output patch set. I hope that
this patch set can give more details about the issue when you reproduce
it on your side.

With the best regards,
Vyacheslav Dubeyko.



* Re: Broken nilfs2 filesystem
       [not found]       ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-07-27 22:32         ` Anton Eliasson
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Eliasson @ 2013-07-27 22:32 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Vyacheslav Dubeyko skrev 2013-07-27 18:23:
> Hi Anton,
>
> On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:
>
> Thank you for your efforts. But, as I understand, currently, you
> don't reproduce the issue and shared system log doesn't contain
> any new details about the issue. Please, see my description below.
That is correct, I just wanted to know if I was on the right track (and
it turned out that I wasn't).
> [snip]
>> I have aborted the experiments for today. kernel.log has 35 million
>> lines and compresses to 220 MB. I've uploaded it here
>> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
>> I do next?
>>
> Unfortunately, the shared system log content doesn't contain any NILFS2
> error messages. So, it means that the issue doesn't be reproduced. Do you
> really confident that you can reproduce the issue before beginning of getting
> debug output? Could you check firstly the issue reproducibility?
You're right, I should try that first. Unfortunately, I'll be away from 
this computer again for the next week or two. I'll try to allocate some 
time for this investigation after that. I can access it via SSH so I 
might spend an evening recompiling the kernel remotely, but I don't want 
to reboot the computer remotely.
>
> You made one mistake during configuration of debug output. Please, see
> my description below.
>
> [snip]
>> * Append the following lines to config (just in case) and config.x86_64
>> (which I assume I will use):
>>
>>     CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>>     CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
>>     CONFIG_NILFS2_DEBUG_MDT_FILES=y
>>     CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
>>     CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
>>     CONFIG_NILFS2_DEBUG_DUMP_STACK=y
>>
> I think that better to use "make menuconfig" for debug output configuration
> because above-mentioned options have dependencies from other ones.
> Please, use "make menuconfig" way because it is not so easy to describe
> what set of configuration options are valid.
>
> [snip]
>>     * File systems
>>     *
>>     [...]
>>     NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
>>       NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
>>         Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG)
>> [N/y/?] (NEW) y
> No, no, no... When you select using pr_debug() then you disable
> CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
> CONFIG_NILFS2_DEBUG_MDT_FILES, CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
> CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options because you need to use
> dynamic debug opportunity (please, see Documentation/dynamic-debug-howto.txt).
> Moreover, when you select CONFIG_NILFS2_DEBUG_DUMP_STACK in dynamic
> debug output case then every function emits dump_stack() output. Please, read
> comments for configuration options.
>
> Firstly, I want to get debug output without enabling pr_debug(). We will have debug output
> only from requested subsystems in the case of using simple printk(). So, improper
> configuration of debug output is the reason of huge size of system log. I suggest
> not to use  CONFIG_NILFS2_DEBUG_DUMP_STACK option, firstly.
>
>>         Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
>>         Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
>>
> So, first of all, we need to reproduce the issue in initial state. Then, it needs to configure
> debug output properly and to get debug output for the case of reproduced issue.
>
> Thanks,
> Vyacheslav Dubeyko.
>
Thanks for all your advice. I'm very new to compiling and configuring 
kernels. I will keep you updated on how my next attempt works out.

-- 
Best Regards,
Anton Eliasson


* Re: Broken nilfs2 filesystem
       [not found]   ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-07-27 16:23     ` Vyacheslav Dubeyko
       [not found]       ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-07-27 16:23 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Jul 26, 2013, at 8:52 PM, Anton Eliasson wrote:

Thank you for your efforts. But, as I understand it, you haven't
reproduced the issue yet, and the shared system log doesn't contain any
new details about it. Please see my description below.

[snip]
> 
> I have aborted the experiments for today. kernel.log has 35 million
> lines and compresses to 220 MB. I've uploaded it here
> (http://antoneliasson.se/publicdump/kernel.log.20130726.gz). What should
> I do next?
> 

Unfortunately, the shared system log doesn't contain any NILFS2 error
messages, which means that the issue was not reproduced. Are you really
confident that you can reproduce the issue before you start gathering
debug output? Could you first check that the issue is reproducible?

You made one mistake in the configuration of the debug output. Please
see my description below.

[snip]
> * Append the following lines to config (just in case) and config.x86_64
> (which I assume I will use):
> 
>    CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
>    CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
>    CONFIG_NILFS2_DEBUG_MDT_FILES=y
>    CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
>    CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
>    CONFIG_NILFS2_DEBUG_DUMP_STACK=y
> 

I think it is better to use "make menuconfig" for the debug output
configuration, because the above-mentioned options depend on other
ones. Please use the "make menuconfig" way, because it is not so easy
to describe which sets of configuration options are valid.

[snip]
>    * File systems
>    *
>    [...]
>    NILFS2 file system support (NILFS2_FS) [M/n/y/?] m
>      NILFS2 debugging (NILFS2_DEBUG) [N/y/?] (NEW) y
>        Use pr_debug() instead of printk() (NILFS2_USE_PR_DEBUG)
> [N/y/?] (NEW) y

No, no, no... When you select pr_debug(), you disable the
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS, CONFIG_NILFS2_DEBUG_MDT_FILES,
CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM and
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING options, because you then need to use
the dynamic debug facility to get their output (please see
Documentation/dynamic-debug-howto.txt). Moreover, when you select
CONFIG_NILFS2_DEBUG_DUMP_STACK in the dynamic debug case, every
function emits dump_stack() output. Please read the comments for the
configuration options.
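(If pr_debug() were used, the messages would additionally have to be
switched on at run time through dynamic debug, roughly like this --
assuming CONFIG_DYNAMIC_DEBUG is enabled and the patch set uses
ordinary pr_debug() call sites:

# mount -t debugfs none /sys/kernel/debug
# echo 'module nilfs2 +p' > /sys/kernel/debug/dynamic_debug/control

That is an extra step we do not need when plain printk() is selected.)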

Firstly, I want to get debug output without enabling pr_debug(). With
simple printk() we will get debug output only from the requested
subsystems. So the improper configuration of the debug output is the
reason for the huge system log. I also suggest not using the
CONFIG_NILFS2_DEBUG_DUMP_STACK option for now.

>        Show internal errors (NILFS2_DEBUG_SHOW_ERRORS) [N/y/?] (NEW) y
>        Enable dump stack output (NILFS2_DEBUG_DUMP_STACK) [N/y/?] (NEW) y
> 

So, first of all, we need to reproduce the issue in initial state. Then, it needs to configure
debug output properly and to get debug output for the case of reproduced issue.

Thanks,
Vyacheslav Dubeyko.


* Re: Broken nilfs2 filesystem
       [not found]       ` <51A35558.1080503-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
@ 2013-05-27 13:23         ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 27+ messages in thread
From: Vyacheslav Dubeyko @ 2013-05-27 13:23 UTC (permalink / raw)
  To: Anton Eliasson; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Anton,

On Mon, 2013-05-27 at 14:45 +0200, Anton Eliasson wrote:

[snip]
> Additionally, I have uploaded /var/log/everything.log spanning May 19-22 
> here:
> http://antoneliasson.se/publicdump/everything.log.gz
> 
> The first system crash is on line 14748. On line 15829 onwards nilfs 
> warns that an fs is unchecked and has a bad checksum. On line 16206 is 
> the first bad btree node error. I copied the entire /var/log tree a 
> reboot or two after I figured out that I had a bad fs. Please tell me if 
> you need any other log files from there.
> 

Thank you for all the additional details and, especially, for the
system log. Your system log is really interesting: it is the first time
I can see a crash dump before the error messages about a broken bmap.
That is a really important detail, I suppose. So I need to investigate
all your information more deeply. Maybe I will ask for additional
information later, but for now I have enough, and I need to think over
what is known about the issue's environment.

Thanks,
Vyacheslav Dubeyko.



* Re: Broken nilfs2 filesystem
       [not found]   ` <713B7146-DC0C-45AE-9ED2-30EB8F84FA57-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-05-27 12:45     ` Anton Eliasson
       [not found]       ` <51A35558.1080503-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Eliasson @ 2013-05-27 12:45 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: devel-17Olwe7vw2dLC78zk6coLg

Vyacheslav Dubeyko skrev 2013-05-26 14:59:
> Hi Anton,
>
> On May 25, 2013, at 4:07 PM, Anton Eliasson wrote:
>
>>
> Thank you for additional details.
>
> But, as I remember, Ryusuke asked to try such commands too:
>
> $ sudo nilfs-tune -l /dev/dm-3
> $ sudo dumpseg /dev/dm-3 7007
> $ lssu -a /dev/dm-3
>
> Could you share output of these commands?
>
>
My messages are being silently swallowed! Maybe your list doesn't like 
my attachments? This is the third attempt and this time without attachments.

Ryusuke Konishi skrev 2013-05-23 03:40:
  > Hi,
  > On Wed, 22 May 2013 22:36:02 +0200, Anton Eliasson wrote:
  >> Anton Eliasson skrev 2013-05-22 22:33:
  >>> Greetings!
  >>> It pains me to report that my /home filesystem broke down today. My
  >>> system is running Arch Linux 64-bit. The filesystem resides on a
  >>> Crucial M4 256 GB SSD, on top of a LVM2 volume. The drive and
  >>> filesystem are both around six months old. Partition table and error
  >>> log excerpts are at the bottom of this e-mail. Full logs are available
  >>> upon request.
  >>>
  >>> I am providing this information as a bug report. I have no reason to
  >>> suspect the hardware but I cannot exclude it either. If you (the
  >>> developers) are interested in troubleshooting this for prosperity, I
  >>> can be your hands and run whatever tools are required. If not, I'll
  >>> reformat the filesystem, restore the data from backup and forget that
  >>> this happened.
  >>>
  >>> In case the formatting gets mangled, this e-mail is also available at
  >> Right here: http://paste.debian.net/5841/
  > Thank you for the report.
  >
  > According to the log, btree of a regular file is destroyed for some 
reason.
  > I think we should look into how the btree block is broken.
  >
  > Could you try the following commands to inspect the broken disk 
segment ?
  >
  >   $ sudo dd if=/dev/dm-3 bs=4k count=2048 skip=14350336 iflag=direct 
2>/dev/null | hexdump -C
There's some semi-private stuff in there so I'll e-mail it separately to 
Ryusuke Konishi and Vyacheslav Dubeyko.
  >
  > This will print out blocks of the segment 7007 which includes the
  > broken btree block.
  >
  > The following commands are also useful to get debug information.
  > Could you try them, too ?
  >
  >   $ sudo nilfs-tune -l /dev/dm-3
Today (May 23) it's called dm-2 but I don't think that should matter.

nilfs-tune 2.1.5
Filesystem volume name:      home
Filesystem UUID:      e4e8bd9a-12f6-4c2a-b32f-9471f1b321fc
Filesystem magic number:  0x3434
Filesystem revision #:      2.0
Filesystem features:      (none)
Filesystem state:      invalid or mounted,error
Filesystem OS type:      Linux
Block size:          4096
Filesystem created:      Sat Oct  6 15:52:11 2012
Last mount time:      Sat May 25 10:42:30 2013
Last write time:      Sat May 25 10:42:30 2013
Mount count:          143
Maximum mount count:      50
Reserve blocks uid:      0 (user root)
Reserve blocks gid:      0 (group root)
First inode:          11
Inode size:          128
DAT entry size:          32
Checkpoint size:      192
Segment usage size:      16
Number of segments:      14039
Device size:          117771862016
First data block:      1
# of blocks per segment:  2048
Reserved segments %:      5
Last checkpoint #:      1260585
Last block address:      430080
Last sequence #:      1557848
Free blocks count:      10317824
Commit interval:      0
# of blks to create seg:  0
CRC seed:          0xfb8deb0b
CRC check sum:          0x0db18bf2
CRC check data size:      0x00000118

  >   $ sudo dumpseg /dev/dm-3 7007
http://antoneliasson.se/publicdump/dumpseg-home-Anton_Eliasson-20130525.gz
  >   $ lssu -a /dev/dm-3
I ran this on May 23 but didn't have time to compose this e-mail until
two days ago. During that period I mounted the filesystem read-write
once or twice and unfortunately forgot to kill nilfs_cleanerd, so some
of the segments might have moved around. I have therefore rerun lssu
and uploaded both outputs here:
http://antoneliasson.se/publicdump/lssu-Anton_Eliasson-20130523.gz
http://antoneliasson.se/publicdump/lssu-Anton_Eliasson-20130525.gz
  >
  > The third command requires the device is mounted, so /home should be
  > mounted previously with a readonly option and a norecovery option:
  >
  >   $ sudo mount -t nilfs2 -o ro,norecovery /dev/dm-3 /home
  >
Additionally, I have uploaded /var/log/everything.log spanning May 19-22 
here:
http://antoneliasson.se/publicdump/everything.log.gz

The first system crash is on line 14748. From line 15829 onwards nilfs
warns that an fs is unchecked and has a bad checksum. Line 16206 has
the first bad btree node error. I copied the entire /var/log tree a
reboot or two after I figured out that I had a bad fs. Please tell me
if you need any other log files from there.

-- 
Best Regards
Anton Eliasson



Thread overview: 27+ messages
2013-05-22 20:33 Broken nilfs2 filesystem Anton Eliasson
     [not found] ` <519D2B96.9000106-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-22 20:36   ` Anton Eliasson
     [not found]     ` <519D2C32.5040600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-23  1:40       ` Ryusuke Konishi
2013-05-23  6:44   ` Vyacheslav Dubeyko
2013-05-25 11:59     ` Anton Eliasson
     [not found]       ` <51A0A7A0.6010207-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-25 16:26         ` Anton Eliasson
     [not found]           ` <51A0E62D.5060600-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-26 12:54             ` Vyacheslav Dubeyko
2013-05-29  6:39             ` Vyacheslav Dubeyko
2013-05-29 14:37               ` Ryusuke Konishi
     [not found]                 ` <20130529.233757.27789741.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-05-30  6:13                   ` Vyacheslav Dubeyko
2013-05-30  6:55                     ` Ryusuke Konishi
     [not found]                       ` <20130530.155543.480320022.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2013-05-30  7:21                         ` Vyacheslav Dubeyko
2013-06-06  6:56                         ` Vyacheslav Dubeyko
2013-06-06  9:20                           ` Reinoud Zandijk
     [not found]                             ` <20130606092054.GA201-HNv6YvNvQKMNqjISwOrxaLFspR4gePGN@public.gmane.org>
2013-06-06  9:34                               ` Vyacheslav Dubeyko
2013-06-06 14:19                                 ` Reinoud Zandijk
2013-06-12 20:12                               ` Anton Eliasson
2013-06-12 20:31                           ` Anton Eliasson
     [not found]                             ` <51B8DA8E.6020802-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-06-13 10:01                               ` Vyacheslav Dubeyko
2013-05-30  8:10               ` Anton Eliasson
     [not found]                 ` <51A70971.40602-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-30 15:30                   ` Anton Eliasson
     [not found]                     ` <51A770A8.9070105-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-30 20:50                       ` Anton Eliasson
     [not found]                         ` <51A7BB84.3010505-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-31  6:39                           ` Vyacheslav Dubeyko
     [not found] <51A0A97A.4020503@antoneliasson.se>
     [not found] ` <713B7146-DC0C-45AE-9ED2-30EB8F84FA57@dubeyko.com>
     [not found]   ` <713B7146-DC0C-45AE-9ED2-30EB8F84FA57-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-05-27 12:45     ` Anton Eliasson
     [not found]       ` <51A35558.1080503-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-05-27 13:23         ` Vyacheslav Dubeyko
     [not found] <51F2A8A4.4020400@antoneliasson.se>
2013-07-26 16:52 ` Fwd: " Anton Eliasson
     [not found]   ` <51F2A945.6050909-17Olwe7vw2dLC78zk6coLg@public.gmane.org>
2013-07-27 16:23     ` Vyacheslav Dubeyko
     [not found]       ` <9016EBD5-1E01-476F-B1B9-66AE593F4728-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-07-27 22:32         ` Anton Eliasson
