linux-nilfs.vger.kernel.org archive mirror
From: Martin Vahi <martin.vahi@softf1.com>
To: linux-nilfs@vger.kernel.org
Subject: Continuation of the topic "error Foo(now -5, once -22) while searching super root"
Date: Fri, 29 Dec 2023 17:51:37 +0200
Message-ID: <853218a8-507e-18de-d745-4e6e51b43025@softf1.com>
In-Reply-To: <4f12ac12-0cae-7959-6aea-9b2fc6e1e4f5@softf1.com>

This letter is a continuation of the thread that I started on 2023_10_15

https://marc.info/?l=linux-nilfs&m=169738371518323&w=2
archival copy: https://archive.is/Fbw5e

The purpose of this letter is to write down
information that might contain hints about the flaw.

This time the computer was the same,

     (Two 2-threaded cores, 12GiB RAM minus the video memory. A line from /proc/cpuinfo)
     Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz

The Linux distribution on the computer was different:
a live-DVD with KNOPPIX version 9.1,
freshly written on an MDisc DVD. (As of 2023_12 MDisc DVDs can still
be bought from the original manufacturer, ritek-europe dot com, which
redirects to conrexx dot com, in small quantities like a pack of
100 discs for about 200€, including all shipping and handling.
They order them from Taiwan to the Netherlands and then use FedEx to
send them from the Netherlands to the rest of the EU. No, I do NOT sell
them myself, NOR do I earn from that business in any way.
I just had a hard time getting MDisc DVDs and that's where I got those,
but the delivery time was about 2 months.)

The USB-HDD was a ~465GiB (nominally 500GB) magnetic disc, which was
mounted with the mount options "noatime,nodiratime". The error occurred
during a long "git commit -a". The repository resided
on the USB-HDD. There was plenty of CPU-time free, because
only the window manager with a "few" "standard" KNOPPIX
programs was running, and there was no shortage of RAM, because
multiple GiB were free. The "git commit -a" was given over
an SSH session, id est the USB cable stayed put, there was no movement
of the USB cable due to the use of the laptop keyboard.
The laptop had been booted from the MDisc DVD about 2 days before
the error occurred and the rest of the programs on the laptop
seemed to work fine after the error without rebooting the laptop, id est
the kernel did not totally crash.

The laptop with the ~465GiB USB-HDD was not on the same table as the
keyboard that was in use, id est keyboard vibrations did not
reach the USB-HDD or the laptop in any significant amount.


     ----start--of--citation--of--dmesg--output--last--lines---
     [  150.848200] usb 3-4: new high-speed USB device number 5 using xhci_hcd
     [  150.989381] usb 3-4: New USB device found, idVendor=152d, idProduct=2329, bcdDevice= 1.00
     [  150.989390] usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=5
     [  150.989394] usb 3-4: Product: USB to ATA/ATAPI bridge
     [  150.989397] usb 3-4: Manufacturer: JMicron
     [  150.989401] usb 3-4: SerialNumber: 801130168383
     [  150.990928] usb-storage 3-4:1.0: USB Mass Storage device detected
     [  150.991138] usb-storage 3-4:1.0: Quirks match for vid 152d pid 2329: 8020
     [  150.991193] scsi host7: usb-storage 3-4:1.0
     [  154.102691] scsi 7:0:0:0: Direct-Access     WDC WD50 00LPLX-60ZNTT1   02.0 PQ: 0 ANSI: 2 CCS
     [  154.103140] sd 7:0:0:0: Attached scsi generic sg2 type 0
     [  154.103556] sd 7:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB)
     [  154.103931] sd 7:0:0:0: [sdc] Write Protect is off
     [  154.103940] sd 7:0:0:0: [sdc] Mode Sense: 28 00 00 00
     [  154.104321] sd 7:0:0:0: [sdc] No Caching mode page found
     [  154.104328] sd 7:0:0:0: [sdc] Assuming drive cache: write through
     [  154.187547]  sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7 sdc8
     [  154.188768] sd 7:0:0:0: [sdc] Attached SCSI disk
     [  222.853951] NILFS version 2 loaded
     [  222.855302] NILFS (sdc7): mounting unchecked fs
     [  226.181683] sd 7:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
     [  226.181687] sd 7:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
     [  226.181689] sd 7:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
     [  226.181692] sd 7:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 28 13 20 28 00 00 08 00
     [  226.181694] blk_update_request: critical medium error, dev sdc, sector 672342056 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
     [  226.181726] NILFS (sdc7): I/O error reading segment
     [  226.181729] NILFS (sdc7): error -5 while searching super root
     root@Microknoppix:/home/knoppix/haakimiskataloogid# mount -t nilfs2 /dev/sdc7 ./h1
     mount.nilfs2: Error while mounting /dev/sdc7 on /home/knoppix/haakimiskataloogid/h1: Input/output error
     root@Microknoppix:/home/knoppix/haakimiskataloogid#
     ----end----of--citation--of--dmesg--output--last--lines---
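A side note on reading the citation above: the "CDB: Read(10)" line
holds the raw SCSI command that failed, and it can be decoded by hand.
The following is only a minimal sketch, assuming the standard Read(10)
layout (opcode 0x28, a 4-byte big-endian block address at bytes 2..5,
a 2-byte big-endian block count at bytes 7..8) and the 512-byte logical
block size that the sd driver reports for sdc. The function name
decode_read10 is made up for this illustration.

```python
def decode_read10(cdb_hex):
    """Return (first block address, block count) of a SCSI Read(10) CDB.

    cdb_hex is the hex dump as printed by the kernel, e.g. "28 00 ...".
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) command descriptor block")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


# The CDB from the dmesg citation; 512 is the logical block size of sdc.
lba, blocks = decode_read10("28 00 28 13 20 28 00 00 08 00")
print(lba, blocks, blocks * 512)  # prints: 672342056 8 4096
```

The decoded address equals the sector number in the blk_update_request
line, so both log lines describe the same failed read of 8 blocks
(4096 bytes).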

I find it scary that a file system can become so unusable
during ordinary use while the hardware seems to be just fine
and there is no standard tool to recover even a fraction
of the files on the NilFS2 partition. As of 2023_12_29
I have most of my files on NilFS2 partitions with the hope
that it helps to preserve them, but it turns out that
while ext4 fails (loses files) at power failures, NilFS2 fails
in plain usage scenarios, where there is no power
failure or any other relevant event. As things stand now (2023_12_29),
my strategy is to mirror my files on different file systems:
NilFS2 to not lose files during power failures or resets and
ext4 to not lose files during plain, calm, low-intensity HDD usage.

Another line of thought is that RAIDs are useless if the
kernel of the computer that the RAID is connected to
gets corrupted in RAM. Therefore the HDDs with different file system
types should be connected to different computers, preferably running
different operating systems. As my main operating system is some
Linux distribution (varies over time and between machines),
FreeBSD, OpenBSD and Solaris derivatives (illumos and the like)
come to mind as hopefully sufficiently different options.

I mentioned MDisc DVDs, because if an operating system
boots from a DVD, then there CAN NOT BE ANY BOOT BINARY
CORRUPTION THAT IS CAUSED BY KERNEL FILE SYSTEM CORRUPTION
and unlike plain DVDs, MDisc DVDs last longer than 10 years.
MDisc DVDs can be reliably written only with a special DVD-writer that
has a slightly more powerful laser than other, "ordinary", DVD-writers,
but the prices of such USB-DVD-writers are roughly the same as
those of other USB-DVD-writers. Only the specs differ and one
must make a slightly greater effort to find the USB-DVD-writers
that have "MDisc support". Supposedly other DVD-writers
also write MDisc DVDs, but with a great error rate. MDisc DVDs
are designed to be readable with plain DVD-writers/readers
and even with old DVD-readers that lack DVD writing capability.

I mention that aspect, because with Raspberry_Pi-like computers
the Flash memory card wear related errors also appear
in the Linux kernel binary and in other installed binaries and
that makes the Raspberry_Pi-like computers unstable over time.
As of 2023 the newest official Raspberry_Pi Linux, the "Raspberry Pi OS",
has the option to use a "readonly-write-only-to-RAM filesystem", a lot
like the one that live Linux DVDs use, to reduce the wear of the memory
card, but various scientific papers (You can search for them Yourself,
there are plenty on the net, easy to find, semanticscholar dot org),
depending on how one interprets the graphs and the
temperature conditions of memory cards and memory sticks,
basically put the "sufficiently reliable" data retention time at
about one year, 2 years tops. Again, depending on interpretation.
My interpretation is that if I touch a memory card or
a USB memory stick, then I feel that it's hot and therefore
the Flash memory die in the device must be even hotter.
My personal intuitive observation matches the
roughly 1 year retention time, after which the data should
be rewritten, including the file system formatting information.

That is to say, for reliability, the Flash memory card should be taken
out of a Raspberry_Pi-like computer roughly once per year and
the Linux program dd should/might be used to rewrite the
original image to the memory card, even if the memory card
is used in "readonly-write-only-to-RAM mode".
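After such a rewrite it might be worth reading the card back and
comparing it with the image. The following is only a minimal sketch,
not a tested procedure: the function name first_mismatch and the
example paths are made up for this illustration, and the card usually
has to be re-inserted (or the caches dropped) before the read-back,
otherwise the comparison may be answered from RAM instead of the Flash.

```python
def first_mismatch(image_path, device_path, chunk_size=1024 * 1024):
    """Compare the start of device_path with the whole of image_path.

    Returns None if the device begins with an exact copy of the image,
    otherwise the byte offset of the first difference. The device may
    be larger than the image; only the image-sized prefix is compared.
    """
    offset = 0
    with open(image_path, "rb") as image, open(device_path, "rb") as device:
        while True:
            expected = image.read(chunk_size)
            if not expected:
                return None  # whole image matched
            actual = device.read(len(expected))
            if actual != expected:
                for i, (a, b) in enumerate(zip(expected, actual)):
                    if a != b:
                        return offset + i
                # No differing byte found: the device is shorter than the image.
                return offset + len(actual)
            offset += len(expected)


# Hypothetical usage, run as root after re-inserting the card:
#     print(first_mismatch("/path/to/original.img", "/dev/sdX"))
```

The prefix comparison is used instead of hashing the whole device,
because the memory card is usually bigger than the image.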

The F-RAM that is used for storing program code in
car-industry microcontrollers (MCUs) does not
seem to be any better than Flash, despite the initial hype,
except that maybe MCUs are "stored"/used in cooler conditions.
I do not know. Car engines do get hot. That is to say,
some old-fashioned ROM can be a pretty nice thing to have
and for laptops and desktops an MDisc DVD or BluRay
can be that "ROM", except that according to
some sources on the wild-wild-web the BluRays,
including the proper MDisc BluRays that have
the inorganic data layer (not the
Verbatim (yes, that famous brand) fakes that only
use the MDisc as a trade-mark), supposedly have
a higher error rate than MDisc DVDs. But even
plain DVDs can be a lot of help for at least a 5 year period,
possibly for a 10 year period. Again, if a DVD
is like ROM, then no driver binary, including the NilFS2 driver
binary, gets corrupted by storage bitrot or by
file system information corruption.

And the bad news is that the so hyped up "cloud storage",
where the storage providers advertise that they store
their clients' data on some really fancy and fast
solid state storage devices (essentially Flash memory),
has exactly the same bitrotting issue, which is why
at least one person that I know of (not me, yet)
keeps an off-cloud list of file hashes (MD5, SHA256, ...)
of the files that his clients' web application
on a server consists of. I mean, if banking information
or other critical information is stored on modern Flash-memory
based fast storage devices in the greatest and fanciest
servers and that information were to corrode due to
some file system driver issue that no RAID can compensate for...
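Such a list of file hashes is simple to produce. The following is
only a minimal sketch of the idea, not the tool that the person
mentioned above uses (I do not know what he uses). The function names
are made up for this illustration and SHA-256 is picked only because
it was one of the hash functions named above.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks, so that big files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file path below root (relative to root) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of_file(full)
    return manifest


def changed_files(root, manifest):
    """List the paths that were added, removed or altered since manifest."""
    current = build_manifest(root)
    return sorted(
        path
        for path in set(manifest) | set(current)
        if manifest.get(path) != current.get(path)
    )
```

The manifest has to be stored somewhere else than on the storage
device that it describes. The same two functions can also compare
two mirrors: build a manifest of one mirror and pass it, together with
the root folder of the other mirror, to changed_files.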


Summary of my compromise-semi-workaround:

     x) Boot from a live-DVD and use a Bash script to
        customize the running instance, id est copy
        the /etc/passwd and /etc/shadow files and
        /etc/ssh folder. Plain DVDs will do, but
        MDisc DVDs are better and with some long-term planning
        it is still (as of 2023_12) possible to get them, not as "new-old-stock", but
        as brand new products that are still being produced in relatively small volumes.
        Once their patents expire, there might be even multiple MDisc DVD producers,
        if there is enough demand for MDisc DVDs. If people still
        buy plain DVDs, then there might be also some market for MDisc DVDs.

        (I like to think of DVDs like I think of paper: relatively low data capacity,
        we don't produce them at home, id est it takes a factory to
        produce them, yet we still use the old-fashioned paper for
        data storage in many situations, like labels on apple-jam jars,
        the packaging of many products contains text and image information, etc.
        In that sense the DVD format as such might last for a long time,
        especially if it overcomes storage reliability issues like
        the MDisc DVDs have overcome them, and if there are
        multiple producers like there are multiple paper producing
        factories.)

     x) Mirror files on different HDDs/SSDs that have
        different file system types,
        one HDD/SSD per computer, to counter a situation where
        a running kernel/file_system_driver instance gets corrupted
        by some typical C/C++ related memory corruption.

     x) With Raspberry_Pi-like computers, overwrite the
        memory card once per year (with "dd", id est
        including filesystem formatting information)
        and try to avoid wearing the memory card by switching off
        the swap ("swapoff --all").

     x) With various Linux file systems use the
        "noatime,nodiratime" mount options
        ("mount -o noatime,nodiratime /dev/foodevice /bar/folder")
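Whether the mount options of the last item really took effect can be
read from /proc/mounts. The following is only a minimal sketch: the
function names and the sample line are made up for this illustration.
According to the mount(8) manual page "noatime" already implies
"nodiratime", so the check looks only at "noatime".

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options of mount_point, or None if not mounted.

    mounts_text is text in /proc/mounts format:
        device mount_point fs_type options dump pass
    Spaces inside a mount point are written as \\040 in that file and
    are not decoded by this sketch.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # a later entry shadows an earlier one
    return found


def atime_updates_disabled(options):
    """True if access time updates are off for files and folders."""
    return options is not None and "noatime" in options


# Hypothetical sample line, NOT copied from the failing machine.
SAMPLE = "/dev/sdc7 /mnt/h1 nilfs2 rw,noatime,nodiratime 0 0\n"
print(atime_updates_disabled(mount_options(SAMPLE, "/mnt/h1")))  # prints: True

# On a running system:
#     with open("/proc/mounts") as f:
#         print(mount_options(f.read(), "/bar/folder"))
```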

Thank You for reading my letter.
I hope that it helps to somehow get by
till the core of the NilFS2
corruption issue gets solved.

Yours sincerely,
Martin.Vahi@softf1.com




Thread overview: 3+ messages
2023-10-15 10:12 How to Elegantly Handle "error -22 while searching super root" with Multi-TiB USB-HDDs Martin Vahi
2023-10-15 15:31 ` Ryusuke Konishi
2023-12-29 15:51 ` Martin Vahi [this message]