* Re: Corrupted nilfs2 volume
       [not found] ` <20130315024019.7DC1BD00-foQFVU7ZUqen8/iIaQqUdCjHesifYMBh@public.gmane.org>
@ 2013-03-15  6:21   ` Vyacheslav Dubeyko
  2013-03-15 23:21     ` Alexander Bezrukov
  0 siblings, 1 reply; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-15  6:21 UTC (permalink / raw)
  To: Alexander Bezrukov; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Alexander,

On Fri, 2013-03-15 at 06:40 +0400, Alexander Bezrukov wrote:
> Hello!
> 
> First of all, thank you for the nilfs2.
> 
> I have been using this filesystem for a few years on different types of storage, ranging from small SSDs to huge 10+ TB volumes on big arrays, without a single problem, but now I am stuck with a broken volume.
> 
> Long story short: I have a laptop which runs vanilla linux-3.8.2 with nilfs on the root partition (and nilfs-utils-2.1.4, if that matters; the volume was created with the 2.0.* series of nilfs-utils with default options). Yesterday I noticed that it didn't switch its display off (probably because of some failed I/O) and continuously displayed a screensaver, but I didn't touch it. Today I touched it and saw that it had hung completely. I had to power cycle the laptop, and since then the kernel cannot mount the rootfs, failing with:
> 
> NILFS: Invalid checkpoint (checkpoint number=5439464)
> NILFS: error loading last checkpoint (checkpoint number=5439464)
> 

Did you have any snapshots on this volume?

We need to investigate the issue more deeply. Could you try to mount this volume on another system? I hope that will give us more details about the issue from the system log. Could you share the NILFS2-related messages from such an attempt?

> I booted from a USB flash and inspected S.M.A.R.T attributes of the HDD. It looks absolutely healthy.
> 
> As for notable events: a couple of months ago I managed to fill the / partition. My distro made a change in its initscripts (or maybe udev rules, I don't remember exactly) which led to the /dev/root symlink no longer being created. With no /dev/root, nilfs_cleanerd didn't start; this led to a full root partition, and in the end /etc/mtab could no longer be created early at boot. I rebooted with init=/bin/bash, remounted / read-write, started nilfs_cleanerd manually, waited until it had reclaimed space, and manually fixed the problem with /dev/root. After that I used my laptop for about two months, absolutely flawlessly.
> 
> To the best of my knowledge there is no "official" fsck tool for nilfs2, but around the time nilfs2 was added to the mainline kernel I read somewhere about an "unofficial" version on some developer branch.
> 

I have sent you personally an archive with the current state of fsck.nilfs2. I need the debug output of fsck.nilfs2 to analyze the situation on the volume.

> Anyway, is there any chance to debug the problem and probably cure the volume?

I hope that we can recover your volume. But, anyway, we need to analyze the issue and try to cure the volume.

With the best regards,
Vyacheslav Dubeyko.

> I would be happy to provide any additional information.
> 
> Thanks in advance.
> Alexander
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Corrupted nilfs2 volume
  2013-03-15  6:21   ` Corrupted nilfs2 volume Vyacheslav Dubeyko
@ 2013-03-15 23:21     ` Alexander Bezrukov
  2013-03-16 15:11       ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Bezrukov @ 2013-03-15 23:21 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


Hi Vyacheslav,

> Had you any snapshots on this volume?

Unfortunately no. I had one, but I recently converted
it to an ordinary cp.

> It needs to investigate the issue more deeply. Could you
> try to mount this volume under another system? I hope that
> it gives opportunity to get more details about the issue
> from system log. So, could you share NILFS2 related messages
> for such try?

That is indeed what I did in the first place.

>> NILFS: Invalid checkpoint (checkpoint number=5439464)
>> NILFS: error loading last checkpoint (checkpoint number=5439464)

In fact, I copied these lines from the system log of another
system (with some Debian distro kernel 3.2.0). But this is
exactly how my original kernel complains too.

> > To my best knowledge there is no "official" fsck tool for nilfs2
> > but at times nilfs2 has just been added to the mainline kernel I
> > read somewhere about some "unofficial" version at some developer
> > branch.
>
> I send you personally archive with actual state of fsck.nilfs2.
> I need in debug output of the fsck.nilfs2 for analysis of situation
> on the volume.

I am sorry, I didn't receive any archive.

> I hope that we can recover your volume. But, anyway, we need to analyze
> the issue and try to cure the volume.

For me the latter is even more important.

Best regards.
Alexander Bezrukov


* Re: Corrupted nilfs2 volume
  2013-03-15 23:21     ` Alexander Bezrukov
@ 2013-03-16 15:11       ` Vyacheslav Dubeyko
       [not found]         ` <70219EF1-8083-4DD5-BA18-84CD1914DC3E-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-16 15:11 UTC (permalink / raw)
  To: Alexander Bezrukov; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Alexander,

On Mar 16, 2013, at 2:21 AM, Alexander Bezrukov wrote:

> 
> Hi Vyacheslav,
> 
>> Had you any snapshots on this volume?
> 
> Unfortunately no. I had one, but I recently converted
> it to an ordinary cp.
> 
>> It needs to investigate the issue more deeply. Could you
>> try to mount this volume under another system? I hope that
>> it gives opportunity to get more details about the issue
>> from system log. So, could you share NILFS2 related messages
>> for such try?
> 
> This is what I've indeed done in the first place.
> 
>>> NILFS: Invalid checkpoint (checkpoint number=5439464)
>>> NILFS: error loading last checkpoint (checkpoint number=5439464)
> 
> In fact, I copied these lines from the system log of another
> system (with some Debian distro kernel 3.2.0). But this is
> exactly how my original kernel complains too.
> 

Ok. I think I may prepare a special patch with additional debug output for deeper investigation of the situation on your volume. But first I need the whole picture of your volume. Namely, I need the debug output of the fsck.nilfs2 utility. Also, could you share the output of the "nilfs-tune -l" utility?

>>> To my best knowledge there is no "official" fsck tool for nilfs2
>>> but at times nilfs2 has just been added to the mainline kernel I
>>> read somewhere about some "unofficial" version at some developer
>>> branch.
>> 
>> I send you personally archive with actual state of fsck.nilfs2.
>> I need in debug output of the fsck.nilfs2 for analysis of situation
>> on the volume.
> 
> I am sorry, I didn't receive any archive.
> 

That is strange. You can now download a nilfs-utils archive with the latest version of fsck.nilfs2 from http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz. Unfortunately, I haven't had enough time to work on fsck during the last two months, but I am going to continue the implementation. For now, fsck can check only superblocks and segment summaries. Still, the debug output of fsck.nilfs2 in its current state will be very helpful for getting a picture of the volume. Please compile the utilities set with fsck and run "fsck -v debug [device] 2> [output-file]". Then, please, share only the output file, because it will be large.

Thanks,
Vyacheslav Dubeyko.

>> I hope that we can recover your volume. But, anyway, we need to analyze
>> the issue and try to cure the volume.
> 
> For me the latter is even more important.
> 
> Best regards.
> Alexander Bezrukov
> 

* Re: Corrupted nilfs2 volume
       [not found]         ` <70219EF1-8083-4DD5-BA18-84CD1914DC3E-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-03-18 14:23           ` Alexander Bezrukov
  2013-03-18 20:44             ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Bezrukov @ 2013-03-18 14:23 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,


> Moreover, could you share output of "nilfs-tune -l" utility?

Here you are.

nilfs-tune 2.1.4
Filesystem volume name:	  ROOT
Filesystem UUID:	  0b4ff339-f246-41c6-9c4c-55064d9ffea9
Filesystem magic number:  0x3434
Filesystem revision #:	  2.0
Filesystem features:      (none)
Filesystem state:	  invalid or mounted
Filesystem OS type:	  Linux
Block size:		  4096
Filesystem created:	  Wed May  9 16:52:59 2012
Last mount time:	  Tue Mar 12 07:24:52 2013
Last write time:	  Thu Mar 14 23:39:36 2013
Mount count:		  81
Maximum mount count:	  50
Reserve blocks uid:	  0 (user root)
Reserve blocks gid:	  0 (group root)
First inode:		  11
Inode size:		  128
DAT entry size:		  32
Checkpoint size:	  192
Segment usage size:	  16
Number of segments:	  56432
Device size:		  473390895104
First data block:	  1
# of blocks per segment:  2048
Reserved segments %:	  5
Last checkpoint #:	  5439414
Last block address:	  48011326
Last sequence #:	  305301
Free blocks count:	  45154304
Commit interval:	  0
# of blks to create seg:  0
CRC seed:		  0xf8f795ab
CRC check sum:		  0xc4ea5e0d
CRC check data size:	  0x00000118
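
As an aside to the listing above, the "Last block address" can be translated into a segment number. The following is a minimal Python sketch, assuming that segments are laid out contiguously so that segment N starts at block N * (blocks per segment); this layout is an assumption, and the helper name locate_block is purely illustrative.

```python
# Values copied from the nilfs-tune listing above.
BLOCKS_PER_SEGMENT = 2048
LAST_BLOCK_ADDRESS = 48011326


def locate_block(block, blocks_per_segment=BLOCKS_PER_SEGMENT):
    """Return (segment number, block offset inside that segment).

    Assumption: segment N starts at block N * blocks_per_segment.
    """
    return divmod(block, blocks_per_segment)


segment, offset = locate_block(LAST_BLOCK_ADDRESS)
print(f"block {LAST_BLOCK_ADDRESS} -> segment {segment}, offset {offset}")
```

Under that assumption the primary superblock's last block address falls 62 blocks into segment 23443.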

> > I am sorry, I didn't receive any archive.
> > 
> 
> It is strange.

[offtopic]: I observed, and am still observing, continuous timeouts
after DATA for mail coming from the bluehost.com domain, like this:

Mar 15 10:22:18 localhost postfix/smtpd[26071]: connect from oproxy5-pub.bluehost.com[67.222.38.55]
Mar 15 10:22:19 localhost postfix/smtpd[26071]: 53590D03:client=oproxy5-pub.bluehost.com[67.222.38.55]
Mar 15 10:24:01 localhost postfix/smtpd[26054]: disconnect from vger.kernel.org[209.132.180.67]
Mar 15 10:25:18 localhost postfix/smtpd[26037]: timeout after DATA (407 bytes) from unknown[144.76.18.105]
Mar 15 10:25:18 localhost postfix/smtpd[26037]: disconnect from unknown[144.76.18.105]
Mar 15 10:26:46 localhost postfix/smtpd[26052]: timeout after DATA (999 bytes) from oproxy1-pub.bluehost.com[66.147.249.253]
Mar 15 10:26:46 localhost postfix/smtpd[26052]: disconnect from oproxy1-pub.bluehost.com[66.147.249.253]
Mar 15 10:26:46 localhost postfix/cleanup[26053]: 39E48256: message-id=<1363328491.2078.21.camel@slavad-ubuntu>
Mar 15 10:27:19 localhost postfix/smtpd[26071]: timeout after DATA (1006 bytes) from oproxy5-pub.bluehost.com[67.222.38.55]
Mar 15 10:27:19 localhost postfix/smtpd[26071]: disconnect from oproxy5-pub.bluehost.com[67.222.38.55]
Mar 15 10:27:19 localhost postfix/cleanup[26060]: 53590D03: message-id=<1363328512.2078.23.camel@slavad-ubuntu>
Mar 15 10:28:23 localhost postfix/smtpd[26052]: connect from oproxy1-pub.bluehost.com[66.147.249.253]
Mar 15 10:28:24 localhost postfix/smtpd[26052]: 1995D256: client=oproxy1-pub.bluehost.com[66.147.249.253]

This may or may not be connected to the delivery of your previous mail.
What makes me suspect a relation is the hostname in the
message-id. [/offtopic]


> Now you can download nilfs-utils archive with last actual
> version of fsck.nilfs2 from
> http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz.

Thank you for the tool.

> Please, compile utilities set with fsck and run "fsck -v debug [device] 2> [output-file]".

I first tried with '-n' and discovered that this mode is not supported.
Then I ran the program as you suggested. I didn't study the program
source, but it seems that this tool never writes. The report came out
really huge; please download it from
http://www.dragonworks.ru/nilfs2/fsck.nilfs2.debug.log.xz

I also prepared the '-v info' version which is considerably smaller:
http://www.dragonworks.ru/nilfs2/fsck.nilfs2.info.log.xz

Besides seemingly many problems in the filesystem's structures, it
suggests that the primary and secondary superblocks are not identical,
so I simply copied the last 4 kB of the volume to offset 1 kB, to no avail.
(Of course, I experimented with a copy of the volume data.)
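
For reference, here is a small Python sketch of where the two superblock copies would sit on a volume of this size. It assumes that the primary superblock starts at byte offset 1024 and that the secondary one starts at the last complete 4 KiB block of the device. Both offsets are assumptions about the NILFS2 on-disk layout and should be checked against the kernel headers before writing anything; the function name is illustrative.

```python
DEVICE_SIZE = 473390895104  # "Device size" from the nilfs-tune output

PRIMARY_SB_OFFSET = 1024  # assumed byte offset of the primary superblock


def secondary_sb_offset(device_size):
    """Assumed location: start of the last complete 4 KiB block."""
    return ((device_size >> 12) - 1) << 12


sb2 = secondary_sb_offset(DEVICE_SIZE)
print("primary superblock at byte", PRIMARY_SB_OFFSET)
print("secondary superblock at byte", sb2)
print("bytes from secondary superblock to end of device:", DEVICE_SIZE - sb2)
```

Under those assumptions, the copy described above reads from the second offset and writes to the first.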

What else can I do?

Thanks and regards.
Alexander.


* Re: Corrupted nilfs2 volume
  2013-03-18 14:23           ` Alexander Bezrukov
@ 2013-03-18 20:44             ` Vyacheslav Dubeyko
       [not found]               ` <4E5BDA60-7615-4F82-AC0F-4459DD9EF544-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-18 20:44 UTC (permalink / raw)
  To: Alexander Bezrukov; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Alexander,

On Mar 18, 2013, at 5:23 PM, Alexander Bezrukov wrote:

[snip]
> 
> [offtopic]: I observed, and am still observing, continuous timeouts
> after DATA for mail coming from the bluehost.com domain, like this:
> 
> Mar 15 10:22:18 localhost postfix/smtpd[26071]: connect from oproxy5-pub.bluehost.com[67.222.38.55]
> Mar 15 10:22:19 localhost postfix/smtpd[26071]: 53590D03:client=oproxy5-pub.bluehost.com[67.222.38.55]
> Mar 15 10:24:01 localhost postfix/smtpd[26054]: disconnect from vger.kernel.org[209.132.180.67]
> Mar 15 10:25:18 localhost postfix/smtpd[26037]: timeout after DATA (407 bytes) from unknown[144.76.18.105]
> Mar 15 10:25:18 localhost postfix/smtpd[26037]: disconnect from unknown[144.76.18.105]
> Mar 15 10:26:46 localhost postfix/smtpd[26052]: timeout after DATA (999 bytes) from oproxy1-pub.bluehost.com[66.147.249.253]
> Mar 15 10:26:46 localhost postfix/smtpd[26052]: disconnect from oproxy1-pub.bluehost.com[66.147.249.253]
> Mar 15 10:26:46 localhost postfix/cleanup[26053]: 39E48256: message-id=<1363328491.2078.21.camel@slavad-ubuntu>
> Mar 15 10:27:19 localhost postfix/smtpd[26071]: timeout after DATA (1006 bytes) from oproxy5-pub.bluehost.com[67.222.38.55]
> Mar 15 10:27:19 localhost postfix/smtpd[26071]: disconnect from oproxy5-pub.bluehost.com[67.222.38.55]
> Mar 15 10:27:19 localhost postfix/cleanup[26060]: 53590D03: message-id=<1363328512.2078.23.camel@slavad-ubuntu>
> Mar 15 10:28:23 localhost postfix/smtpd[26052]: connect from oproxy1-pub.bluehost.com[66.147.249.253]
> Mar 15 10:28:24 localhost postfix/smtpd[26052]: 1995D256: client=oproxy1-pub.bluehost.com[66.147.249.253]
> 
> This may or may not be connected to delivery of your previous mail. What
> makes me to suspect this may have relation is the hostname in the
> message-id. [/offtopic]
> 

Sorry about that. I send e-mails from several machines. I need to configure my e-mail client (Evolution) on Ubuntu better. Currently, it fills the "References" field with something like "some-uid.camel@slavad-ubuntu" instead of a real e-mail address. I haven't found a solution for it on the internet yet.

> 
>> Now you can download nilfs-utils archive with last actual
>> version of fsck.nilfs2 from
>> http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz.
> 
> Thank you for the tool.
> 
>> Please, compile utilities set with fsck and run "fsck -v debug [device] 2> [output-file]".
> 
> I first tried with '-n' and discovered that this mode is not supported.
> Then I ran the program as you suggested. I didn't study the program
> source, but it seems that this tool never writes. The report came out
> really huge; please download it from
> http://www.dragonworks.ru/nilfs2/fsck.nilfs2.debug.log.xz
> 
> I also prepared the '-v info' version which is considerably smaller:
> http://www.dragonworks.ru/nilfs2/fsck.nilfs2.info.log.xz
> 
> Besides seemingly many problems in the filesystem's structures it
> suggests that the primary and secondary superblocks are not identical
> so I simply copied last 4kB of the volume to offset 1kB --- to no avail.
> (Of course I experimented with a copy of the volume data)
> 
> What else can I do?
> 

As I can see from the debug output, the last write was on Friday, March 15, 2013 at 00:02:24. This write operation took place in segment #23445 (log #0) with sequence number ss_seq = 305303, which begins at block #48015360. Something bad occurred during this write operation or after it. The secondary superblock points to the segment with sequence number ss_seq = 305303, but the primary superblock points to the segment with sequence number ss_seq = 305301.

So, I need a raw dump of three segments: #23443 (ss_seq = 305301, first block #48011264), #23444 (ss_seq = 305302, first block #48013312), and #23445 (ss_seq = 305303, first block #48015360). Please make a raw dump from block #48011264 up to #48017408 (namely, 6144 blocks) and share it with me. Also, please share the dumpseg output for these segments.
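
The block numbers above follow from the segment numbers if each segment holds 2048 blocks of 4096 bytes (the values in the nilfs-tune output) and segment N starts at block N * 2048. That layout is an assumption, although it reproduces the numbers quoted here. A minimal Python sketch that derives the first block of each segment and the byte range a raw dump would cover (all names are illustrative):

```python
BLOCK_SIZE = 4096          # "Block size" from nilfs-tune
BLOCKS_PER_SEGMENT = 2048  # "# of blocks per segment" from nilfs-tune


def segment_start(segnum):
    """First block of a segment, assuming segment N starts at N * 2048."""
    return segnum * BLOCKS_PER_SEGMENT


segments = [23443, 23444, 23445]
for seg in segments:
    print(f"segment {seg}: first block {segment_start(seg)}")

first_block = segment_start(segments[0])
end_block = segment_start(segments[-1] + 1)  # one past the last wanted block
count = end_block - first_block

print(f"dump {count} blocks starting at block {first_block}")
print(f"byte offset {first_block * BLOCK_SIZE}, length {count * BLOCK_SIZE}")
```

With a tool such as dd, these values would correspond to bs=4096, skip=48011264 and count=6144 when reading from the device or from an image of it.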

Tomorrow I will prepare a patch with additional debug output for deeper investigation of the issue on your side.

Thanks,
Vyacheslav Dubeyko.

> Thanks and regards.
> Alexander.
> 

* Re: Corrupted nilfs2 volume
       [not found]               ` <4E5BDA60-7615-4F82-AC0F-4459DD9EF544-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-03-18 22:31                 ` Alexander Bezrukov
  2013-03-19  8:06                   ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Bezrukov @ 2013-03-18 22:31 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

thank you for your analysis and support!

> As I can see from the debug output last write was on Friday
> March 15, 2013 at 00:02:24.

This write (unless something has happened to the system clock) must have
occurred during the mount attempt after the reboot. The time you
mention (which timezone, btw?) is approximately the time when I
discovered that my laptop had hung and when I rebooted it. My
impression (which may be wrong) is that quite a while passed
after the hang before I noticed it.

> So, I need in raw dump of three segments #23443 (ss_seq = 305301,
> first block #48011264), #23444 (ss_seq = 305302, first block
> #48013312), #23445 (ss_seq = 305303, first block #48015360).
> Please, make raw dump from #48011264 till #48017408 (namely,
> 6144 blocks)

A small note: 48017408-48011264+1 = 6145
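
The difference between 6144 and 6145 comes down to whether the end block itself is counted. A short Python check, assuming 2048 blocks per segment (from the nilfs-tune output) and segments starting at multiples of that size; the original request does not state explicitly whether block #48017408 is included:

```python
BLOCKS_PER_SEGMENT = 2048
start, end = 48011264, 48017408

exclusive_count = end - start      # end block not included
inclusive_count = end - start + 1  # end block included

# Number of whole segments covered when the end block is excluded.
print(exclusive_count, exclusive_count // BLOCKS_PER_SEGMENT)
print(inclusive_count)
# Segment that the end block itself would belong to under this layout.
print(end // BLOCKS_PER_SEGMENT)
```

Excluding the end block gives exactly three segments' worth of blocks; including it adds the first block of the following segment.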

> and share with me. Moreover, please, share dumpseg output
> for this segments.

I prepared the dumpseg output for these segments. Please find it
at http://www.dragonworks.ru/nilfs2/dumpseg.log .

As to the raw data, these blocks happen to contain some quite
sensitive information. I won't share this information
publicly and would like to avoid leaking it as far as seems
reasonable. If this data is of real and great help, please
let me know, and I will prepare and send a link privately.
Sorry about that.

> Tomorrow, I prepare patch with additional debug output for
> deeper investigation of the issue on your side.

Is there any chance to register a different fs alongside the
"official" nilfs2? Would such a simple change be sufficient?

diff -ur org/super.c new/super.c
--- org/super.c	2013-03-19 02:17:23.922469000 +0400
+++ new/super.c	2013-03-19 02:16:20.440634698 +0400
@@ -1356,7 +1356,7 @@
 
 struct file_system_type nilfs_fs_type = {
 	.owner    = THIS_MODULE,
-	.name     = "nilfs2",
+	.name     = "nilfs2-dbg",
 	.mount    = nilfs_mount,
 	.kill_sb  = kill_block_super,
 	.fs_flags = FS_REQUIRES_DEV,

What I want is to avoid, whenever possible, rebooting the machine
I am using for experimentation. It has /home on a nilfs2
partition and there are usually open user files on it.

Thanks and regards.
Alexander Bezrukov.

* Re: Corrupted nilfs2 volume
  2013-03-18 22:31                 ` Alexander Bezrukov
@ 2013-03-19  8:06                   ` Vyacheslav Dubeyko
  2013-03-19  8:30                     ` ARAI Shun-ichi
  2013-03-19 17:35                     ` Alexander Bezrukov
  0 siblings, 2 replies; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-19  8:06 UTC (permalink / raw)
  To: Alexander Bezrukov; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Alexander,

On Tue, 2013-03-19 at 02:31 +0400, Alexander Bezrukov wrote:

[snip]
> 
> > So, I need in raw dump of three segments #23443 (ss_seq = 305301,
> > first block #48011264), #23444 (ss_seq = 305302, first block
> > #48013312), #23445 (ss_seq = 305303, first block #48015360).
> > Please, make raw dump from #48011264 till #48017408 (namely,
> > 6144 blocks)
> 
> A small note: 48017408-48011264+1 = 6145
> 
> > and share with me. Moreover, please, share dumpseg output
> > for this segments.
> 
> I prepared the dumpseg output for these segments. Please find it
> at http://www.dragonworks.ru/nilfs2/dumpseg.log .
> 

Thank you for dumpseg output.

> As to the raw data, these blocks happened to contain some quite
> sensitive information. I won't share this information
> publicly and would avoid leaking it as much as it seems
> reasonable. If this data is of real and great help, please
> let me know, I will prepare and send a link privately.
> Sorry about that.
> 

Ok. I see. But I really need, as a minimum, the raw content of the last
segment, #23445 (ss_seq = 305303, first block #48015360), for analysis,
because I need to determine the reason for the failure to attach the
checkpoint. I can see from code analysis that only a raw dump can give
me more hints than additional debug output.

As far as I can see, most of the last partial segment consists of
metadata files. Of course, you can share the raw dump privately, only
with me. I need to understand the metadata state only.

> > Tomorrow, I prepare patch with additional debug output for
> > deeper investigation of the issue on your side.
> 
> Is there any chance to register a different fs aside with
> "official" nilfs2? Would such a simple change be sufficient?
> 
> diff -ur org/super.c new/super.c
> --- org/super.c	2013-03-19 02:17:23.922469000 +0400
> +++ new/super.c	2013-03-19 02:16:20.440634698 +0400
> @@ -1356,7 +1356,7 @@
>  
>  struct file_system_type nilfs_fs_type = {
>  	.owner    = THIS_MODULE,
> -	.name     = "nilfs2",
> +	.name     = "nilfs2-dbg",
>  	.mount    = nilfs_mount,
>  	.kill_sb  = kill_block_super,
>  	.fs_flags = FS_REQUIRES_DEV,
> 
> What I want is to whenever possible avoid rebooting a machine
> I am using for experimentation. It has /home on a nilfs2
> partition and there're usually open user files on it.
> 

I think there is a simpler solution. I suggest having a special
experimental build of the kernel that you can choose in the GRUB
menu. Moreover, debug output can be completely disabled by means of an
#undef of the macro. Anyway, I need to analyze the raw dump before
offering you a patch with additional debug output.

Thanks,
Vyacheslav Dubeyko.

> Thanks and regards.
> Alexander Bezrukov.

* Re: Corrupted nilfs2 volume
  2013-03-19  8:06                   ` Vyacheslav Dubeyko
@ 2013-03-19  8:30                     ` ARAI Shun-ichi
       [not found]                       ` <20130319.173020.3478740326553917.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
  2013-03-19 17:35                     ` Alexander Bezrukov
  1 sibling, 1 reply; 14+ messages in thread
From: ARAI Shun-ichi @ 2013-03-19  8:30 UTC (permalink / raw)
  To: nilfs-issues-tvrqcU/I3+LBxmVkcvZ8vQ
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, hermes-akuOmOme3sQYOdUovKs6ag

Hello,

I reported nilfs2 filesystem corruption in Nov.-Dec. 2012.
I have the following information/data about the error:

- 'remount' kernel log
- daily snapshots before/after corruption

Can I help you?

* Re: Corrupted nilfs2 volume
       [not found]                       ` <20130319.173020.3478740326553917.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
@ 2013-03-19  9:15                         ` Vyacheslav Dubeyko
       [not found]                           ` <20121130.164616.1703426007244850753.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-19  9:15 UTC (permalink / raw)
  To: ARAI Shun-ichi
  Cc: nilfs-issues-tvrqcU/I3+LBxmVkcvZ8vQ, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2013-03-19 at 17:30 +0900, ARAI Shun-ichi wrote:
> Hello,
> 
> I reported nilfs2 filesystem corruption in Nov.-Dec., 2012.

Could you point me to the initial e-mail with the report about the file
system corruption?

It is possible that your file system corruption had a different cause.

Thanks,
Vyacheslav Dubeyko.

> Well, I have following information/data about the error.
> 
> - 'remount' kernel log
> - daily snapshots before/after corruption
> 
> Can I help you?

* Re: Corrupted nilfs2 volume
       [not found]                           ` <20121130.164616.1703426007244850753.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
@ 2013-03-19 11:17                             ` ARAI Shun-ichi
       [not found]                               ` <20130319.201730.1543938923107719972.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: ARAI Shun-ichi @ 2013-03-19 11:17 UTC (permalink / raw)
  To: nilfs-issues-tvrqcU/I3+LBxmVkcvZ8vQ, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
  Cc: hermes-akuOmOme3sQYOdUovKs6ag

Hi,

In <1363684508.2229.17.camel@slavad-ubuntu>;
   Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote
   as Subject "Re: Corrupted nilfs2 volume":

> On Tue, 2013-03-19 at 17:30 +0900, ARAI Shun-ichi wrote:
>> Hello,
>> 
>> I reported nilfs2 filesystem corruption in Nov.-Dec., 2012.
> 
> Could you point out on initial e-mail with report about file system
> corruption?
> 
> It is possible that you had another reason of file system corruption.

Please refer to this message:

In <20121130.164616.1703426007244850753.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>;
   ARAI Shun-ichi <hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> wrote
   as Subject "Re: A lot of NILFS: bad btree node messages (readonly fs)":

To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Date: Fri, 30 Nov 2012 16:46:16 +0900 (JST)

* Re: Corrupted nilfs2 volume
       [not found]                               ` <20130319.201730.1543938923107719972.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
@ 2013-03-19 11:42                                 ` Vyacheslav Dubeyko
  2013-03-19 12:52                                   ` ARAI Shun-ichi
  0 siblings, 1 reply; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-19 11:42 UTC (permalink / raw)
  To: ARAI Shun-ichi
  Cc: nilfs-issues-tvrqcU/I3+LBxmVkcvZ8vQ, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi ARAI,

On Tue, 2013-03-19 at 20:17 +0900, ARAI Shun-ichi wrote:
> Hi,
> 
> In <1363684508.2229.17.camel@slavad-ubuntu>;
>    Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote
>    as Subject "Re: Corrupted nilfs2 volume":
> 
> > On Tue, 2013-03-19 at 17:30 +0900, ARAI Shun-ichi wrote:
> >> Hello,
> >> 
> >> I reported nilfs2 filesystem corruption in Nov.-Dec., 2012.
> > 
> > Could you point out on initial e-mail with report about file system
> > corruption?
> > 
> > It is possible that you had another reason of file system corruption.
> 
> Please refer this.
> 
> In <20121130.164616.1703426007244850753.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>;
>    ARAI Shun-ichi <hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> wrote
>    as Subject "Re: A lot of NILFS: bad btree node messages (readonly fs)":
> 

But that is a different issue. For now I don't know exactly what the
cause of the issue reported in this thread is. And the symptoms differ
between the case of "Re: A lot of NILFS: bad btree node messages
(readonly fs)" and the issue in this thread.

Yes, I am investigating the issue with "bad b-tree node messages". A
script with a reproduction path was reported recently, and it is
reproducible on my side. My only worry is that this reproduction
path may be a 1 KB related issue (maybe I am wrong).

But if you can provide additional details about the "bad b-tree node
messages" issue, then please share them.

Thanks,
Vyacheslav Dubeyko.


> To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Date: Fri, 30 Nov 2012 16:46:16 +0900 (JST)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: Corrupted nilfs2 volume
  2013-03-19 11:42                                 ` Vyacheslav Dubeyko
@ 2013-03-19 12:52                                   ` ARAI Shun-ichi
  0 siblings, 0 replies; 14+ messages in thread
From: ARAI Shun-ichi @ 2013-03-19 12:52 UTC (permalink / raw)
  To: nilfs-issues-tvrqcU/I3+LBxmVkcvZ8vQ, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
  Cc: hermes-akuOmOme3sQYOdUovKs6ag


In <1363693343.2229.28.camel@slavad-ubuntu>;
   Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote
   as Subject "Re: Corrupted nilfs2 volume":

> But it is another issue.
I see, it should be divided. My misunderstanding.

> But if you can provide some additional details about issue with "bad
> b-tree node messages" then, please, share it.

I will report if I find new info (in the appropriate place).


* Re: Corrupted nilfs2 volume
  2013-03-19  8:06                   ` Vyacheslav Dubeyko
  2013-03-19  8:30                     ` ARAI Shun-ichi
@ 2013-03-19 17:35                     ` Alexander Bezrukov
  2013-03-21  6:31                       ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Bezrukov @ 2013-03-19 17:35 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

sorry about my slow turnaround.

> Ok. I see. But I really need to have for analysis as minimum last
> segment's raw content #23445 (ss_seq = 305303, first block #48015360).
> Because I need to conclude what the reason of failure with checkpoint
> attach. I can see from code analysis that only raw dump can give to me
> more hints than additional debug output.

I have prepared the data and sent it to you privately.

> > Is there any chance to register a different fs aside with
> > "official" nilfs2? Would such a simple change be sufficient?
> > 
> > diff -ur org/super.c new/super.c
> > --- org/super.c	2013-03-19 02:17:23.922469000 +0400
> > +++ new/super.c	2013-03-19 02:16:20.440634698 +0400
> > @@ -1356,7 +1356,7 @@
> >  
> >  struct file_system_type nilfs_fs_type = {
> >  	.owner    = THIS_MODULE,
> > -	.name     = "nilfs2",
> > +	.name     = "nilfs2-dbg",
> >  	.mount    = nilfs_mount,
> >  	.kill_sb  = kill_block_super,
> >  	.fs_flags = FS_REQUIRES_DEV,
> > 
> > What I want is to whenever possible avoid rebooting a machine
> > I am using for experimentation. It has /home on a nilfs2
> > partition and there're usually open user files on it.
> > 
> 
> I think that you can have more simple solution. I suggest to have
> special experimental build of the kernel that you can choose in the grub
> menu. Moreover, debug output will be completely disabled by means of
> #undef macro. Anyway, I need to analyze raw dump before offering to you
> a patch with additional debug output.

I surely can do this but my goal was to avoid rebooting whenever
possible. That's why I asked to register the fs with a different name.
If this is non-trivial I will boot into a new kernel, no problem.

Thanks for all your support.

Regards.
Alexander Bezrukov.



* Re: Corrupted nilfs2 volume
  2013-03-19 17:35                     ` Alexander Bezrukov
@ 2013-03-21  6:31                       ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 14+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-21  6:31 UTC (permalink / raw)
  To: Alexander Bezrukov; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Alexander,

On Tue, 2013-03-19 at 21:35 +0400, Alexander Bezrukov wrote:
> Hi Vyacheslav,
> 
> sorry about my slow turnaround.
> 
> > Ok. I see. But I really need to have for analysis as minimum last
> > segment's raw content #23445 (ss_seq = 305303, first block #48015360).
> > Because I need to conclude what the reason of failure with checkpoint
> > attach. I can see from code analysis that only raw dump can give to me
> > more hints than additional debug output.
> 
> I have prepared and sent you privately the data.
> 

Thank you for the additional details.

The reason for the reported symptoms (NILFS: Invalid checkpoint
(checkpoint number=5439464)) is clear. The current superblock points to
the last checkpoint with number s_last_cno = 5439464. But the actual
state of the cpfile in the last partial segment doesn't contain a valid
record for #5439464. The last valid record in the cpfile is the record
for checkpoint #5439463. Moreover, the dumpseg output shows that the
files in the last partial segment live in checkpoint #5439463. So, it is
strange. I think that some failure occurred during NILFS2 driver
operations on this volume. Maybe the system log on this volume contains
some details about the errors that caused this corruption.
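
The mismatch described above can be illustrated with a small toy model.
This is only a sketch in Python, not the NILFS2 driver code: the real
cpfile is an on-disk metadata file, and the dictionary layout and the
function names last_valid_checkpoint() and check_superblock() are
assumptions made for this example. Only s_last_cno and the checkpoint
numbers come from this thread.

```python
# Toy model only: the cpfile is represented as a dict that maps a
# checkpoint number to a "record is valid" flag. The real on-disk
# format and the kernel's checks are not modeled here.


def last_valid_checkpoint(cpfile):
    """Return the highest checkpoint number whose record is valid, or None."""
    valid = [cno for cno, is_valid in cpfile.items() if is_valid]
    return max(valid) if valid else None


def check_superblock(s_last_cno, cpfile):
    """Compare the superblock's last checkpoint number with the cpfile.

    Returns (consistent, last_valid_cno). A missing record is treated
    the same way as an invalid one.
    """
    consistent = cpfile.get(s_last_cno, False)
    return consistent, last_valid_checkpoint(cpfile)


# Hypothetical cpfile contents that mirror the numbers in this thread.
cpfile = {5439462: True, 5439463: True, 5439464: False}
consistent, last_valid = check_superblock(5439464, cpfile)
print(consistent, last_valid)
```

In this model the superblock refers to a checkpoint one number ahead of
the last valid cpfile record, which matches the symptom seen at mount
time.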

Currently, I think that this corruption is recoverable. But for some
reason the recovery doesn't take place during mount. I am preparing a
patchset with debug output so that the mount issue can be investigated
in more detail on your side and so that I can check my assumptions about
a possible way to fix the issue. So, I'll send you this patchset soon.

> > > Is there any chance to register a different fs aside with
> > > "official" nilfs2? Would such a simple change be sufficient?
> > > 
> > > diff -ur org/super.c new/super.c
> > > --- org/super.c	2013-03-19 02:17:23.922469000 +0400
> > > +++ new/super.c	2013-03-19 02:16:20.440634698 +0400
> > > @@ -1356,7 +1356,7 @@
> > >  
> > >  struct file_system_type nilfs_fs_type = {
> > >  	.owner    = THIS_MODULE,
> > > -	.name     = "nilfs2",
> > > +	.name     = "nilfs2-dbg",
> > >  	.mount    = nilfs_mount,
> > >  	.kill_sb  = kill_block_super,
> > >  	.fs_flags = FS_REQUIRES_DEV,
> > > 
> > > What I want is to whenever possible avoid rebooting a machine
> > > I am using for experimentation. It has /home on a nilfs2
> > > partition and there're usually open user files on it.
> > > 
> > 
> > I think that you can have more simple solution. I suggest to have
> > special experimental build of the kernel that you can choose in the grub
> > menu. Moreover, debug output will be completely disabled by means of
> > #undef macro. Anyway, I need to analyze raw dump before offering to you
> > a patch with additional debug output.
> 
> I surely can do this but my goal was to avoid rebooting whenever
> possible. That's why I asked to register the fs with a different name.
> If this is non-trivial I will boot into a new kernel, no problem.
> 

I think that building the NILFS2 driver as a loadable kernel module may
be a solution for you.

With the best regards,
Vyacheslav Dubeyko.

> Thanks for all your support.
> 
> Regards.
> Alexander Bezrukov.
> 




end of thread, other threads:[~2013-03-21  6:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20130315024019.7DC1BD00@mail.dragonworks.ru>
     [not found] ` <20130315024019.7DC1BD00-foQFVU7ZUqen8/iIaQqUdCjHesifYMBh@public.gmane.org>
2013-03-15  6:21   ` Corrupted nilfs2 volume Vyacheslav Dubeyko
2013-03-15 23:21     ` Alexander Bezrukov
2013-03-16 15:11       ` Vyacheslav Dubeyko
     [not found]         ` <70219EF1-8083-4DD5-BA18-84CD1914DC3E-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-03-18 14:23           ` Alexander Bezrukov
2013-03-18 20:44             ` Vyacheslav Dubeyko
     [not found]               ` <4E5BDA60-7615-4F82-AC0F-4459DD9EF544-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-03-18 22:31                 ` Alexander Bezrukov
2013-03-19  8:06                   ` Vyacheslav Dubeyko
2013-03-19  8:30                     ` ARAI Shun-ichi
     [not found]                       ` <20130319.173020.3478740326553917.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
2013-03-19  9:15                         ` Vyacheslav Dubeyko
     [not found]                           ` <20121130.164616.1703426007244850753.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
2013-03-19 11:17                             ` ARAI Shun-ichi
     [not found]                               ` <20130319.201730.1543938923107719972.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
2013-03-19 11:42                                 ` Vyacheslav Dubeyko
2013-03-19 12:52                                   ` ARAI Shun-ichi
2013-03-19 17:35                     ` Alexander Bezrukov
2013-03-21  6:31                       ` Vyacheslav Dubeyko
