* NILFS error after power loss
From: mikael-m2T68/X/qvZAfugRpC6u6w @ 2017-07-16 19:37 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

The battery in my laptop, Lenovo X1 Carbon, with NixOS version 17.03 ran 
out of power in May. When I started the laptop again, it did not boot 
properly. I have used another laptop since then.

I booted it with a Live USB with NixOS 17.03 and followed the 
instructions under the section 'NILFS got stuck' at 
http://nilfs.sourceforge.net/en/faq.html:

# echo t > /proc/sysrq-trigger
# dmesg

I should say I have no prior experience of sysrq. Anyhow, you can see 
the dmesg output at the following link:

https://mega.nz/#F!4UoFwYaQ!u7GEnZ3n0BUyNexe3vI2-w

Please get back to me if you want more information.

Not sure if this is of any interest but this is an excerpt from 
journalctl:

Apr 24 06:55:22 nixos kernel: NILFS (dm-0): bad btree node (ino=1609, blocknr=13626216): level = 51, flags = 0x44, nchildren = 12298
Apr 24 06:55:22 nixos kernel: NILFS error (device dm-0): nilfs_bmap_lookup_contig: broken bmap (inode number=1609)
Apr 24 06:55:22 nixos kernel: Remounting filesystem read-only
[ ... the same "bad btree node" / "broken bmap" pair repeats eight more times with the same timestamp ... ]
Apr


Kind regards,

Mikael Andersson
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: NILFS error after power loss
From: Peter Grandi @ 2017-07-16 22:47 UTC (permalink / raw)
  To: Linux fs NILFS

[ ... ]

> Apr 24 06:55:22 nixos kernel: NILFS (dm-0): bad btree node (ino=1609, blocknr=13626216): level = 51, flags = 0x44, nchildren = 12298
> Apr 24 06:55:22 nixos kernel: NILFS error (device dm-0): nilfs_bmap_lookup_contig: broken bmap (inode number=1609)
> Apr 24 06:55:22 nixos kernel: Remounting filesystem read-only

The standard NILFS2 recovery is to mount an earlier
checkpoint. This should be done quickly, to ensure the older
checkpoints don't get deleted by the cleaner; but then, if the
system does not start, the cleaner should not start either.

Usually corruption like this is because of the storage system
not implementing barriers correctly, and that might be related
to the use of DM (some versions lack that implementation). If
barriers do work, the checkpoint before the faulty one
should always be correct, because it has been checkpointed.

Otherwise there is a known bug that apparently is triggered
easily only by very high concurrent load, as it involves a race
condition:

  http://marc.info/?l=linux-nilfs&m=149992828611084&w=2

* Re: NILFS error after power loss
From: Peter Grandi @ 2017-07-22 15:21 UTC (permalink / raw)
  To: Linux fs NILFS

>> The standard NILFS2 recovery is to mount an earlier
>> checkpoint. This should be done quickly, to ensure the older
>> checkpoints don't get deleted by the cleaner; but then, if the
>> system does not start, the cleaner should not start either.

> I have started on Ubuntu Live and mounted the filesystem

That can be done with option 'nogc' to avoid garbage collection
of checkpoints.

> and have made the earliest checkpoint a snapshot in order not
> to lose it.

Good idea; it would perhaps be worth doing the same for other
checkpoints.

> How do I rollback in the simplest way?

As the manual explains, checkpoints can be mounted, and if they
fail to be mounted, be deleted. The NILFS2 code tries to mount
the latest checkpoint (with a valid checksum).
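
The steps discussed so far (mount with 'nogc', list checkpoints, turn
one into a snapshot, mount that snapshot read-only) can be sketched as
a dry run. This is only a sketch, assuming nilfs-utils (lscp, chcp,
mount.nilfs2) is installed; the function name and its four arguments
(device, checkpoint number, two mount points) are hypothetical
placeholders. It prints the commands for review and runs nothing.

```shell
# Print (do not run) the commands for inspecting one checkpoint read-only.
# Arguments are placeholders: device, checkpoint number, mount point of the
# current file system, mount point for the snapshot.
nilfs_cp_plan() {
    dev=$1 cno=$2 mnt=$3 snapmnt=$4
    case $cno in
        ''|*[!0-9]*) echo "checkpoint number must be numeric: $cno" >&2; return 1 ;;
    esac
    # Mount without starting the cleaner, so no checkpoint is reclaimed.
    printf 'mount -t nilfs2 -o nogc %s %s\n' "$dev" "$mnt"
    # List checkpoints and snapshots.
    printf 'lscp %s\n' "$dev"
    # Mark the chosen checkpoint as a snapshot so it can be mounted.
    printf 'chcp ss %s %s\n' "$dev" "$cno"
    # Snapshots are read-only, so "ro" goes together with cp=.
    printf 'mount -t nilfs2 -o ro,cp=%s %s %s\n' "$cno" "$dev" "$snapmnt"
}
```

For example, `nilfs_cp_plan /dev/mapper/root 42 /mnt/cur /mnt/cp42`
prints four lines that can be checked and then typed as root.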

>> Usually corruption like this is because of the storage system
>> not implementing barriers correctly, and that might be related
>> to the use of DM (some versions lack that implementation). If
>> barriers do work, the checkpoint before the faulty one
>> should always be correct, because it has been checkpointed.

> I do not know the technical details well enough. Where can I
> read more about what barriers mean in this context?

I think that there are things like "web search engines" that
might help, but to start that search: barriers are features of
hardware and software that ensure that all critical updates are
recorded on persistent storage in the intended order, the things
used to implement "fsync" and its in-kernel equivalent. Since you
use a laptop you may have been tempted to disable or weaken them.
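
One place where barriers can be weakened is the mount options. As a
rough sketch, the helper below lists NILFS2 mounts that carry the
'nobarrier' option. The function name is made up for illustration, and
it assumes the option appears in the fourth field of /proc/mounts when
barriers are disabled. It says nothing about the drive's own write
cache or about layers below the file system.

```shell
# List nilfs2 mounts whose options include "nobarrier".
# The mounts file defaults to /proc/mounts; another path can be passed.
nilfs_nobarrier_mounts() {
    awk '$3 == "nilfs2" && ("," $4 ",") ~ /,nobarrier,/ { print $1, $2 }' \
        "${1:-/proc/mounts}"
}
```

No output means no NILFS2 file system is currently mounted with
barriers disabled at that level.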

> What does DM mean?

"Device Mapper", as indicated by the error message including
"dm-0":

>> Apr 24 06:55:22 nixos kernel: NILFS (dm-0): bad btree node (ino=1609, blocknr=13626216): level = 51, flags = 0x44, nchildren = 12298

It is used to support LVM2 or LUKS etc.
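
To see what a node such as dm-0 actually is, its device-mapper name
and UUID can be read from sysfs. A minimal sketch follows; the
function name is hypothetical, and the second argument (an alternative
sysfs root) exists only so the helper can be tried on a fake tree. The
UUID typically starts with "LVM-" for LVM2 volumes and "CRYPT-" for
dm-crypt/LUKS mappings.

```shell
# Print the device-mapper name and UUID for a dm-N node, read from sysfs.
# The sysfs root defaults to /sys.
dm_identity() {
    node=$1 sysroot=${2:-/sys}
    dir=$sysroot/block/$node/dm
    [ -r "$dir/name" ] || { echo "$node: not a device-mapper node" >&2; return 1; }
    printf '%s %s\n' "$(cat "$dir/name")" "$(cat "$dir/uuid" 2>/dev/null)"
}
```

Running `dm_identity dm-0` on the affected machine would show which
mapping the NILFS2 file system sits on.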

* Re: NILFS error after power loss
From: Peter Grandi @ 2017-07-31 13:20 UTC (permalink / raw)
  To: Linux fs NILFS

[ ... ]

> But as far as I understand it is not possible to mount a
> previous snapshot as writable if there are snapshots/checkpoints
> after this snapshot. Since I only get a filesystem error when
> mounting a snapshot writable,

That seems unlikely to me. After mounting read-only, check whether
the whole filetree can be accessed error-free, with something like

  find "$DIR" -xdev -perm /07777 | wc -l

for metadata and then for data too:

  tar -f /dev/zero -c --one-file-system "$DIR"
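
The two passes above can be wrapped so that the exit status reports
whether either one hit an error. This is a sketch rather than a tested
recovery tool: the function name is made up, GNU tar is assumed for
--one-file-system, and '-links +0' is swapped in for '-perm /07777' as
a portable test that still needs each entry's inode to be read.

```shell
# Walk DIR on one file system, reading metadata (find) and then data (tar).
# Prints the entry count; returns non-zero if either pass reported an error.
check_tree() {
    dir=$1
    list=$(mktemp) || return 1
    status=0
    # Metadata pass: -links +0 matches every entry but requires a stat.
    find "$dir" -xdev -links +0 > "$list" || status=1
    printf '%s entries\n' "$(wc -l < "$list" | tr -d ' ')"
    rm -f "$list"
    # Data pass: tar reads every file; the archive is discarded in /dev/zero.
    tar -c -f /dev/zero --one-file-system "$dir" || status=1
    return $status
}
```

With the snapshot mounted read-only at a directory such as /mnt/cp42,
`check_tree /mnt/cp42 && echo clean` prints "clean" only when both
passes completed without errors; I/O errors from tar stay visible on
stderr.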

> I will then have to remove one checkpoint at a time from the end
> and make the latest one a snapshot and mount it rewritable. [
> ... ]

Eventually, if you can find a checkpoint/snapshot that is
error-free, you can delete any newer corrupted ones and mount that
one read-write. Ideally you would do a nice backup before doing
that.

If you cannot find any that is error-free, probably that was
either a grievous IO error (most likely lack of proper barriers)
or the consequences of that recently discovered bug, if you are
very unlucky.

Usually the second newest checkpoint/snapshot is error-free when
a system crashes and the newest has got errors; that is, usually
only the newest checkpoint is invalid.