* couldn't mount because of unsupported optional features (477e7ad1e859f753)
@ 2021-12-30  9:16 Hendrik Levsen
       [not found] ` <39575f5e-b47a-d971-6c15-35985a35c9d5-j5CO6tLloWodnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Hendrik Levsen @ 2021-12-30  9:16 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

Trying to mount a NILFS partition fails with "couldn't mount because of
unsupported optional features (477e7ad1e859f753)". Neither the OS nor
the partition has been touched/modified since the last successful
mount. Might the underlying block device be corrupt? Is
"477e7ad1e859f753" a valid set of feature flags?

Thank you

Hendrik




* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found] ` <39575f5e-b47a-d971-6c15-35985a35c9d5-j5CO6tLloWodnm+yROfE0A@public.gmane.org>
@ 2021-12-30 12:00   ` Peter Grandi
       [not found]     ` <25037.40801.808565.662504-5EdyzN1Ji+RYO2OccljXW7VCufUGDwFn@public.gmane.org>
  2021-12-31 12:50   ` Ryusuke Konishi
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Grandi @ 2021-12-30 12:00 UTC (permalink / raw)
  To: list Linux fs NILFS

> Trying to mount a NILFS partition

To be pedantic, because it matters: on UNIX-like systems the term
is "block device", or more precisely "NILFS2 instance", as there
could be multiple NILFS2 instances even in a single block device
(but that is a very rare setup, usually requiring 'losetup' mounts).

> fails with "couldn't mount because of unsupported optional
> features (477e7ad1e859f753)". [...]

That does not look like a lucky situation. You can use 'lscp
/dev/...' to list the checkpoints and then try to mount an older
checkpoint with 'mount -t nilfs2 -o cp=... /dev/... ...' and
resume work from that. In theory older checkpoints will be fully
consistent even if the latest one is corrupted.

Unless that message means that the NILFS2 instance is corrupted
because of "issues" (usually hardware, most commonly with block
devices on USB storage devices).


* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found]     ` <25037.40801.808565.662504-5EdyzN1Ji+RYO2OccljXW7VCufUGDwFn@public.gmane.org>
@ 2021-12-31  9:51       ` Hendrik Levsen
       [not found]         ` <37be5d12-adea-6399-65c3-6d50008c18ff-j5CO6tLloWodnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Hendrik Levsen @ 2021-12-31  9:51 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 30/12/21 11:00 pm, Peter Grandi wrote:
>> Trying to mount a NILFS partition
> To be pedantic, because it matters: on UNIX-like systems the term
> is "block device", or more precisely "NILFS2 instance", as there
> could be multiple NILFS2 instances even in a single block device
> (but that is a very rare setup, usually requiring 'losetup' mounts).

Correct, I was being sloppy with my terminology. It's a block device
which is a RAID-1 (+ dm-crypt/LUKS), which gives me confidence that the
underlying hardware is ok.

>> fails with "couldn't mount because of unsupported optional
>> features (477e7ad1e859f753)". [...]
> That does not look like a lucky situation. You can use 'lscp
> /dev/...' to list the checkpoints and then try to mount an older
> checkpoint with 'mount -t nilfs2 -o cp=... /dev/... ...' and
> resume work from that. In theory older checkpoints will be fully
> consistent even if the latest one is corrupted.

Thanks Peter, it seems both lscp and mount -o cp need a functioning
super block though.

> Unless that message means that the NILFS2 instance is corrupted
> because of "issues" (usually hardware, most commonly with block
> devices on USB storage devices).

I might dig into this a little deeper, the data isn't that important but
gaining a correct understanding of NILFS working principles is. My
understanding so far was that it's quite hard for data to become
entirely inaccessible.

This looks like a good idea, linear scan for segment nodes:
https://www.spinics.net/lists/linux-nilfs/msg02198.html Could be the
start of the fsck that never happened.

-- h.



* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found]         ` <37be5d12-adea-6399-65c3-6d50008c18ff-j5CO6tLloWodnm+yROfE0A@public.gmane.org>
@ 2021-12-31 11:43           ` Peter Grandi
       [not found]             ` <25038.60666.361700.270143-5EdyzN1Ji+RYO2OccljXW7VCufUGDwFn@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Grandi @ 2021-12-31 11:43 UTC (permalink / raw)
  To: list Linux fs NILFS

[...]
>> use 'lscp /dev/...' to list the checkpoints and try to mount
>> an older checkpoint with 'mount -t nilfs2 -o cp=... /dev/...
>> ...' to mount it and resume work from that. In theory older
>> checkpoints will be fully consistent even if the latest one
>> is corrupted.

> Thanks Peter, it seems both lscp and mount -o cp need a
> functioning super block though.

If the superblock is gone, it is a rather unlucky situation. But
note that NILFS2 has got a redundant copy of the superblock like
most other filesystem types. This is described here:

  https://github.com/nilfs-dev/nilfs2-kmod7/blob/master/fs/nilfs2/the_nilfs.c#L490

These mailing list messages may be particularly relevant:

  https://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01438.html
  https://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01239.html
  https://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01238.html

In my experience NILFS2 has never corrupted a superblock, so the
cause is most likely external.

> I might dig into this a little deeper, the data isn't that
> important but gaining a correct understanding of NILFS working
> principles is. My understanding so far was that it's quite
> hard for data to become entirely inaccessible.

The same holds for most other filesystem types, but for
log-structured ones it is even harder. The NILFS2 idea is that,
since all metadata blocks are checksummed, one can just roll back
to a checkpoint where all checksums verify, and the filesystem is
then consistent up to that point. This does not protect against
most cases of data corruption, damage to the superblock, or
widespread damage to metadata (in the latter case it may be
impossible to find a sequence of segments with valid checksums).
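
The roll-back idea can be illustrated with a small sketch. It is not
NILFS2's recovery code: the LogRecord type, its fields and the use of
zlib.crc32 are hypothetical stand-ins chosen only to show "walk the
log in order and stop at the first checksum that fails".

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LogRecord:
    """Hypothetical stand-in for one checkpointed log; not a NILFS2 structure."""
    checkpoint_no: int
    payload: bytes
    stored_crc: int


def make_record(checkpoint_no: int, payload: bytes) -> LogRecord:
    """Build a record whose stored checksum matches its payload."""
    return LogRecord(checkpoint_no, payload, zlib.crc32(payload))


def latest_consistent(records: Sequence[LogRecord]) -> Optional[int]:
    """Return the newest checkpoint for which it and every older record verify.

    Returns None when even the oldest record fails its checksum.
    """
    last_good = None
    for rec in sorted(records, key=lambda r: r.checkpoint_no):
        if zlib.crc32(rec.payload) != rec.stored_crc:
            break  # everything from here on cannot be trusted
        last_good = rec.checkpoint_no
    return last_good
```

If the very first record fails, the function returns None, which
corresponds to the case where no valid sequence can be found.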

NILFS2 has some interesting recovery logic here:

  https://github.com/nilfs-dev/nilfs2-kmod7/blob/master/fs/nilfs2/recovery.c

> This looks like a good idea, linear scan for segment nodes:
> https://www.spinics.net/lists/linux-nilfs/msg02198.html Could
> be the start of the fsck that never happened.

That is not quite an 'fsck' but a recovery tool; many 'fsck'
implementations also attempt to do a bit of recovery too, but
their primary function is to repair metadata in case of partial
writes, which because of the checksums mentioned above is not
necessary for NILFS2, and the same argument is used for ZFS,
which is "log based" or "log inspired".

I find the lack of 'fsck' for NILFS2 and ZFS a mild issue:
whether or not a filesystem type needs a repair tool, another
core function of 'fsck' is as an auditing tool, to be run
periodically even if there are no known issues (ZFS "resilvering"
is not a full audit). But then how many people nowadays
regularly run 'fsck', where it is available, as an auditing tool
even if there are no known issues?

One of the most profound quotes in the history of information
engineering:

  "As far as we know, our computer has never had an undetected
  error" Conrad H. Weisert (Union Carbide Corporation) in
  "Datamation" (1969)


* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found]             ` <25038.60666.361700.270143-5EdyzN1Ji+RYO2OccljXW7VCufUGDwFn@public.gmane.org>
@ 2021-12-31 11:55               ` Peter Grandi
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Grandi @ 2021-12-31 11:55 UTC (permalink / raw)
  To: list Linux fs NILFS

>> This looks like a good idea, linear scan for segment nodes:
>> https://www.spinics.net/lists/linux-nilfs/msg02198.html Could
>> be the start of the fsck that never happened.

There was something like that:

  https://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01391.html

The same person has written a very detailed overview of the
implementation of NILFS2:

  http://dubeyko.com/development/FileSystems/NILFS/nilfs2-design.pdf


* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found] ` <39575f5e-b47a-d971-6c15-35985a35c9d5-j5CO6tLloWodnm+yROfE0A@public.gmane.org>
  2021-12-30 12:00   ` Peter Grandi
@ 2021-12-31 12:50   ` Ryusuke Konishi
       [not found]     ` <CAKFNMo=gkj_9wzw+qjmfHr53-4WZeMjSgwHnrDakfTdZkSGdNw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Ryusuke Konishi @ 2021-12-31 12:50 UTC (permalink / raw)
  To: Hendrik Levsen, Peter Grandi; +Cc: linux-nilfs

Hi,

On Fri, Dec 31, 2021 at 7:16 PM Hendrik Levsen <hendrik-j5CO6tLloWodnm+yROfE0A@public.gmane.org> wrote:
>
> Hi,
>
> Trying to mount a NILFS partition fails with "couldn't mount because of
> unsupported optional features (477e7ad1e859f753)". Neither the OS nor
> the partition has been touched/modified since the last successful
> mount. Might the underlying block device be corrupt? Is
> "477e7ad1e859f753" a valid set of feature flags?

This value is not a valid set of feature flags.
Only 0x00000001 is in use at present.
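
As a rough illustration of why the reported value cannot be a real
flag set, the following sketch masks a 64-bit feature word against
the bits in use. The mask value comes from the statement above; the
function name and the 64-bit width are assumptions of this sketch,
not the kernel's code.

```python
# Assumption: the only feature bit in use, as stated in the message above.
SUPPORTED_FEATURES = 0x00000001
MASK_64 = 0xFFFFFFFFFFFFFFFF  # assumption: the feature word is 64 bits wide


def unsupported_bits(features: int, supported: int = SUPPORTED_FEATURES) -> int:
    """Return the bits of `features` that are not in the supported mask."""
    return features & ~supported & MASK_64


if __name__ == "__main__":
    reported = 0x477E7AD1E859F753  # value from the mount error message
    bad = unsupported_bits(reported)
    print(f"unsupported bits: 0x{bad:016x}")
    print(f"number of unsupported bits set: {bin(bad).count('1')}")
```

A genuine flag word would leave few or no bits after masking; a value
with bits set all over the word looks more like arbitrary data than
like flags.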

The strange thing is that the test of feature flags is done after
validity checks of super blocks (with CRC and a magic number).
So, this type of corruption usually doesn't happen.

NILFS maintains two super blocks, at the beginning and end of the
partition.  If one of them is destroyed, this is automatically detected
and it is repaired from the spare.
It seems that something unexpected has happened; for example,
some external tool or underlying device driver directly overwrote
the super block, and the online NILFS driver then, unfortunately,
signed the broken super block with a new CRC and wrote it back to
the block device.

If you want to inspect further, the nilfs-tune command may give some
information. It displays a summary of one of the valid super blocks,
as follows:

$ sudo nilfs-tune -l /dev/xxxx
nilfs-tune 2.2.5
Filesystem volume name:   (none)
Filesystem UUID:          xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Filesystem magic number:  0x3434
Filesystem revision #:    2.0
Filesystem features:      (none)
Filesystem state:         valid
Filesystem OS type:       Linux
Block size:               4096
Filesystem created:       Tue Mar  2 20:10:57 2021
Last mount time:          Fri Dec 31 20:52:02 2021
Last write time:          Fri Dec 31 20:52:22 2021
Mount count:              28
Maximum mount count:      50
Reserve blocks uid:       0 (user root)
Reserve blocks gid:       0 (group root)
First inode:              11
Inode size:               128
DAT entry size:           32
Checkpoint size:          192
Segment usage size:       16
Number of segments:       59617
Device size:              500107862016
First data block:         1
# of blocks per segment:  2048
Reserved segments %:      5
Last checkpoint #:        529322
Last block address:       50448632
Last sequence #:          24605
Free blocks count:        110473216
Commit interval:          0
# of blks to create seg:  0
CRC seed:                 0xxxxxxxxx
CRC check sum:            0xxxxxxxxx
CRC check data size:      0x00000118
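
A listing like the one above can be sanity-checked mechanically. The
sketch below parses "Key: value" lines into a dictionary and applies
one rough plausibility test: the segments should cover the device to
within less than one segment. That test is an assumption of this
sketch, not something nilfs-tune itself performs, and the function
names are made up for illustration.

```python
def parse_tune_listing(text: str) -> dict:
    """Split each 'Key:   value' line at the first colon; skip other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def segment_bytes(fields: dict) -> int:
    """Size of one full segment in bytes."""
    return int(fields["# of blocks per segment"]) * int(fields["Block size"])


def geometry_plausible(fields: dict) -> bool:
    """Heuristic (assumed): segment area fits the device with < 1 segment slack."""
    covered = int(fields["Number of segments"]) * segment_bytes(fields)
    slack = int(fields["Device size"]) - covered
    return 0 <= slack < segment_bytes(fields)
```

With the values shown above (59617 segments of 2048 blocks of 4096
bytes on a 500107862016-byte device) the check passes; a super block
filled with arbitrary data would be unlikely to pass it.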


As for recovery, if a spare superblock survives with
valid data, you may in theory be able to repair the file system by
manually erasing the broken one.   However, this operation
must be done very carefully and is not an intended repair method.
Therefore, I don't recommend it, and I think we should not make it
a usual option.
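
Before attempting anything like that, it may help to check read-only
which copy still looks like a super block. The sketch below is based
on assumptions that should be verified against the NILFS2 on-disk
header before use: the primary copy at byte offset 1024, the
secondary at the last 4 KiB-aligned block of the device, and a
little-endian 16-bit magic at byte 6 of the structure. Only the magic
value 0x3434 comes from the listing above. It works on an in-memory
image and never writes anything.

```python
import struct

SB_MAGIC = 0x3434        # from the nilfs-tune listing above
SB_OFFSET = 1024         # assumption: byte offset of the primary super block
MAGIC_FIELD_OFFSET = 6   # assumption: magic follows a 32-bit and a 16-bit field


def secondary_sb_offset(dev_size: int) -> int:
    """Assumed location of the spare: the last 4 KiB-aligned block."""
    return ((dev_size >> 12) - 1) << 12


def magic_present(image: bytes, sb_offset: int) -> bool:
    """True if the two bytes at the assumed magic position equal SB_MAGIC."""
    start = sb_offset + MAGIC_FIELD_OFFSET
    raw = image[start:start + 2]
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == SB_MAGIC
```

For the Device size shown in the listing, secondary_sb_offset() gives
500107857920, i.e. 4096 bytes before the end of the device. A
matching magic alone does not prove that a copy is valid; the CRC
would still have to be checked.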

OTOH, it is worth considering whether there is room for improvement
in the logic or implementation of the tandem method of NILFS
super blocks.

Thanks,
Ryusuke Konishi


>
> Thank you
>
> Hendrik
>


* Re: couldn't mount because of unsupported optional features (477e7ad1e859f753)
       [not found]     ` <CAKFNMo=gkj_9wzw+qjmfHr53-4WZeMjSgwHnrDakfTdZkSGdNw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-12-31 13:40       ` Hendrik Levsen
  0 siblings, 0 replies; 7+ messages in thread
From: Hendrik Levsen @ 2021-12-31 13:40 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Peter and Ryusuke,

Awesome, thanks for the detailed info and pointers. As for the risky
recovery attempts, presumably I can just copy the whole block device
block-by-block to a new disk or even a file and mess with it there to my
heart's content.
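
A minimal sketch of such a copy, assuming the source can be opened
for reading as an ordinary file (for a block device this usually
needs root). The function name and chunk size are arbitrary choices;
the destination is opened in exclusive mode so an existing file is
never overwritten.

```python
import os


def copy_image(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Copy src to a new file dst in fixed-size chunks; return bytes copied.

    The source is opened read-only; "xb" makes the call fail if dst exists.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "xb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the copy has reached the disk
    return copied
```

The sketch stops at the first read error instead of skipping bad
areas, so it only suits a source device that still reads cleanly.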

-- h.


