All of lore.kernel.org
 help / color / mirror / Atom feed
* very large mount time after unxepected power down
       [not found] ` <CAFPMYnE3ybWO4o=E1UonAZJ7Uwn5y9n4840ksYGAu7qAYJ0zKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-30 13:20   ` Сергей Александров
       [not found]     ` <CAFPMYnEZ28qvwkE3kaB59h2rD_8noT+gQtp7Hs6uvmHcL6KzYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-10-30 13:20 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Good time of the day!

I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
distribution.
Just updated utils to 2.1.4 but no failure since.

After power shutdown, mount takes about several hours.

For the first time I thought that it won't mount at all and tried to
use fsck tool, found somewhere in the internet(don't really remember).
It reported that superblock is ok.
Than I commented the check in the source file and the default number
of blocks to check appeared to be too small. It failed to find the
next superblock. I've increased the number, but increasing it to *100
didn't help.
So, probably the reserved SB is too far from away and it takes too
long to find it.

Does anybody knows, how can it be speed up? I know, UPS is a solution,
but I consider it be a bug.

nilfs-tune -l /dev/md0
nilfs-tune 2.1.1
Filesystem volume name:   (none)
Filesystem UUID:          9adc7cf2-4c04-4030-91d9-fae9a19ded03
Filesystem magic number:  0x3434
Filesystem revision #:    2.0
Filesystem features:      (none)
Filesystem state:         invalid or mounted
Filesystem OS type:       Linux
Block size:               4096
Filesystem created:       Wed Feb 29 20:32:28 2012
Last mount time:          Fri Oct 26 23:54:26 2012
Last write time:          Mon Oct 29 20:05:34 2012
Mount count:              11
Maximum mount count:      50
Reserve blocks uid:       0 (user root)
Reserve blocks gid:       0 (group root)
First inode:              11
Inode size:               128
DAT entry size:           32
Checkpoint size:          192
Segment usage size:       16
Number of segments:       119233
Device size:              1000202649600
First data block:         1
# of blocks per segment:  2048
Reserved segments %:      5
Last checkpoint #:        1003256
Last block address:       160163840
Last sequence #:          3416007
Free blocks count:        53164032
Commit interval:          0
# of blks to create seg:  0
CRC seed:                 0xc2316108
CRC check sum:            0x56f9857c
CRC check data size:      0x00000118

--------------------------------------------------
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]     ` <CAFPMYnEZ28qvwkE3kaB59h2rD_8noT+gQtp7Hs6uvmHcL6KzYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-30 13:49       ` Vyacheslav Dubeyko
  2012-10-30 14:30         ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-10-30 13:49 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
> Good time of the day!
> 
> I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
> HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
> distribution.
> Just updated utils to 2.1.4 but no failure since.
> 
> After power shutdown, mount takes about several hours.
> 

What about RAID1 consistency? Could you describe more about your RAID
configuration?

> For the first time I thought that it won't mount at all and tried to
> use fsck tool, found somewhere in the internet(don't really remember).
> It reported that superblock is ok.

So, I am implementing the fsck tool for NILFS2. I guess that you take
sources from NILFS2 e-mail list. 

> Than I commented the check in the source file and the default number
> of blocks to check appeared to be too small. It failed to find the
> next superblock. I've increased the number, but increasing it to *100
> didn't help.

Sorry, I can't understand about what sources you are talking. Could you
describe more details about what and where you commented?

> So, probably the reserved SB is too far from away and it takes too
> long to find it.
> 

If you try to find the second superblock then it is placed in the begin
of last 4 KB of the volume. Your device size is 1000202649600 bytes.

> Does anybody knows, how can it be speed up? I know, UPS is a solution,
> but I consider it be a bug.
> 

Could you share more details about situation during mount operations? I
mean: (1) NILFS2-related messages in the system log; (2) "ps ax" output;
(3) maybe "top" output can be useful also; (4) "mount" output before
trying to mount NILFS2 volume.

Currently, your situation is unclear and, first of all, it needs to
understand what do you have. It needs to have more info before any
advices.

With the best regards,
Vyacheslav Dubeyko.

> nilfs-tune -l /dev/md0
> nilfs-tune 2.1.1
> Filesystem volume name:   (none)
> Filesystem UUID:          9adc7cf2-4c04-4030-91d9-fae9a19ded03
> Filesystem magic number:  0x3434
> Filesystem revision #:    2.0
> Filesystem features:      (none)
> Filesystem state:         invalid or mounted
> Filesystem OS type:       Linux
> Block size:               4096
> Filesystem created:       Wed Feb 29 20:32:28 2012
> Last mount time:          Fri Oct 26 23:54:26 2012
> Last write time:          Mon Oct 29 20:05:34 2012
> Mount count:              11
> Maximum mount count:      50
> Reserve blocks uid:       0 (user root)
> Reserve blocks gid:       0 (group root)
> First inode:              11
> Inode size:               128
> DAT entry size:           32
> Checkpoint size:          192
> Segment usage size:       16
> Number of segments:       119233
> Device size:              1000202649600
> First data block:         1
> # of blocks per segment:  2048
> Reserved segments %:      5
> Last checkpoint #:        1003256
> Last block address:       160163840
> Last sequence #:          3416007
> Free blocks count:        53164032
> Commit interval:          0
> # of blks to create seg:  0
> CRC seed:                 0xc2316108
> CRC check sum:            0x56f9857c
> CRC check data size:      0x00000118
> 
> --------------------------------------------------
> Aleksandrov Sergey Vasil'evich
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-10-30 13:49       ` Vyacheslav Dubeyko
@ 2012-10-30 14:30         ` Сергей Александров
       [not found]           ` <CAFPMYnHhtFxuVZOMu9MZ6Xb74mFPm1a-4axyFKkHiJjDEW_4BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-10-30 14:30 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

--------------------------------------------------
Александров Сергей Васильевич


2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> Hi,
>
> On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
>> Good time of the day!
>>
>> I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
>> HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
>> distribution.
>> Just updated utils to 2.1.4 but no failure since.
>>
>> After power shutdown, mount takes about several hours.
>>
>
> What about RAID1 consistency? Could you describe more about your RAID
> configuration?

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[0] sdc1[2]
      976760400 blocks super 1.2 [2/2] [UU]

So, raid is consistent. Reading speed from md device is about 60MB/s
according to iostat.

>> For the first time I thought that it won't mount at all and tried to
>> use fsck tool, found somewhere in the internet(don't really remember).
>> It reported that superblock is ok.
>
> So, I am implementing the fsck tool for NILFS2. I guess that you take
> sources from NILFS2 e-mail list.
>
>> Than I commented the check in the source file and the default number
>> of blocks to check appeared to be too small. It failed to find the
>> next superblock. I've increased the number, but increasing it to *100
>> didn't help.
>
> Sorry, I can't understand about what sources you are talking. Could you
> describe more details about what and where you commented?
>
I've forced test_latest_log to return negative result. And changed
MAX_SCAN_SEGMENTS to 100000
That was not enough. It finished without finding the SB.


The load from fsck was the same as from mount.
About 60MB/s read from md device and about 30% load on one core.

>> So, probably the reserved SB is too far from away and it takes too
>> long to find it.
>>
>
> If you try to find the second superblock then it is placed in the begin
> of last 4 KB of the volume. Your device size is 1000202649600 bytes.
>
>> Does anybody knows, how can it be speed up? I know, UPS is a solution,
>> but I consider it be a bug.
>>
>
> Could you share more details about situation during mount operations? I
> mean: (1) NILFS2-related messages in the system log; (2) "ps ax" output;
> (3) maybe "top" output can be useful also; (4) "mount" output before
> trying to mount NILFS2 volume.
last situation:

messages log:
Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting
unchecked fs
.....
.....
Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
Oct 30 13:03:06 router kernel: [ 2810.325240] segctord starting.
Construction interval = 5 seconds, CP frequency < 30 seconds
Oct 30 13:03:07 router nilfs_cleanerd[15453]: start
Oct 30 13:03:07 router nilfs_cleanerd[15453]: pause (clean check)

It took about 45 minutes.
Previous time it took more than 4 hours.
Both times RAID was consistent.

top showed one process eating about 27% of cpu (2 cores, AMD Athon II
X2 250 @3000MHz)
Also, about 80% of memory is used for cache.
Sory, have not saved ps output...

I can repeat the situation if it helps.

--------------------------------------------------
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]           ` <CAFPMYnHhtFxuVZOMu9MZ6Xb74mFPm1a-4axyFKkHiJjDEW_4BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-30 14:52             ` Vyacheslav Dubeyko
  2012-10-30 15:02               ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-10-30 14:52 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2012-10-30 at 17:30 +0300, Сергей Александров wrote:
> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> > Hi,
> >
> > On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
> >> Good time of the day!
> >>
> >> I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
> >> HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
> >> distribution.
> >> Just updated utils to 2.1.4 but no failure since.
> >>
> >> After power shutdown, mount takes about several hours.
> >>
> >
> > What about RAID1 consistency? Could you describe more about your RAID
> > configuration?
> 
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid1 sdb1[0] sdc1[2]
>       976760400 blocks super 1.2 [2/2] [UU]
> 
> So, raid is consistent. Reading speed from md device is about 60MB/s
> according to iostat.
> 
> >> For the first time I thought that it won't mount at all and tried to
> >> use fsck tool, found somewhere in the internet(don't really remember).
> >> It reported that superblock is ok.
> >
> > So, I am implementing the fsck tool for NILFS2. I guess that you take
> > sources from NILFS2 e-mail list.
> >
> >> Than I commented the check in the source file and the default number
> >> of blocks to check appeared to be too small. It failed to find the
> >> next superblock. I've increased the number, but increasing it to *100
> >> didn't help.
> >
> > Sorry, I can't understand about what sources you are talking. Could you
> > describe more details about what and where you commented?
> >
> I've forced test_latest_log to return negative result. And changed
> MAX_SCAN_SEGMENTS to 100000
> That was not enough. It finished without finding the SB.
> 
> 
> The load from fsck was the same as from mount.
> About 60MB/s read from md device and about 30% load on one core.
> 
> >> So, probably the reserved SB is too far from away and it takes too
> >> long to find it.
> >>
> >
> > If you try to find the second superblock then it is placed in the begin
> > of last 4 KB of the volume. Your device size is 1000202649600 bytes.
> >
> >> Does anybody knows, how can it be speed up? I know, UPS is a solution,
> >> but I consider it be a bug.
> >>
> >
> > Could you share more details about situation during mount operations? I
> > mean: (1) NILFS2-related messages in the system log; (2) "ps ax" output;
> > (3) maybe "top" output can be useful also; (4) "mount" output before
> > trying to mount NILFS2 volume.
> last situation:
> 
> messages log:
> Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting
> unchecked fs
> .....
> .....
> Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
> Oct 30 13:03:06 router kernel: [ 2810.325240] segctord starting.
> Construction interval = 5 seconds, CP frequency < 30 seconds
> Oct 30 13:03:07 router nilfs_cleanerd[15453]: start
> Oct 30 13:03:07 router nilfs_cleanerd[15453]: pause (clean check)
> 

Could you share content of your /etc/nilfs_cleanerd.conf file?

Could you try to reproduce the issue with log_priority enhanced to debug
level (I mean option in nilfs_cleanerd.conf) and share messages log
again?

> It took about 45 minutes.
> Previous time it took more than 4 hours.

You mean that your console returns input after 45 minutes when you try
to execute mount. Am I correct?

With the best regards,
Vyacheslav Dubeyko.

> Both times RAID was consistent.
> 
> top showed one process eating about 27% of cpu (2 cores, AMD Athon II
> X2 250 @3000MHz)
> Also, about 80% of memory is used for cache.
> Sory, have not saved ps output...
> 
> I can repeat the situation if it helps.
> 
> --------------------------------------------------
> Aleksandrov Sergey Vasil'evich


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-10-30 14:52             ` Vyacheslav Dubeyko
@ 2012-10-30 15:02               ` Сергей Александров
       [not found]                 ` <CAFPMYnGn4aNf=5B9v93TtTc6x4hG1ULgt0P9i75uO=xGX0U2bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-10-30 15:02 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

--------------------------------------------------
Александров Сергей Васильевич


2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> On Tue, 2012-10-30 at 17:30 +0300, Сергей Александров wrote:
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> > Hi,
>> >
>> > On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
>> >> Good time of the day!
>> >>
>> >> I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
>> >> HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
>> >> distribution.
>> >> Just updated utils to 2.1.4 but no failure since.
>> >>
>> >> After power shutdown, mount takes about several hours.
>> >>
>> >
>> > What about RAID1 consistency? Could you describe more about your RAID
>> > configuration?
>>
>> # cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md0 : active raid1 sdb1[0] sdc1[2]
>>       976760400 blocks super 1.2 [2/2] [UU]
>>
>> So, raid is consistent. Reading speed from md device is about 60MB/s
>> according to iostat.
>>
>> >> For the first time I thought that it won't mount at all and tried to
>> >> use fsck tool, found somewhere in the internet(don't really remember).
>> >> It reported that superblock is ok.
>> >
>> > So, I am implementing the fsck tool for NILFS2. I guess that you take
>> > sources from NILFS2 e-mail list.
>> >
>> >> Than I commented the check in the source file and the default number
>> >> of blocks to check appeared to be too small. It failed to find the
>> >> next superblock. I've increased the number, but increasing it to *100
>> >> didn't help.
>> >
>> > Sorry, I can't understand about what sources you are talking. Could you
>> > describe more details about what and where you commented?
>> >
>> I've forced test_latest_log to return negative result. And changed
>> MAX_SCAN_SEGMENTS to 100000
>> That was not enough. It finished without finding the SB.
>>
>>
>> The load from fsck was the same as from mount.
>> About 60MB/s read from md device and about 30% load on one core.
>>
>> >> So, probably the reserved SB is too far from away and it takes too
>> >> long to find it.
>> >>
>> >
>> > If you try to find the second superblock then it is placed in the begin
>> > of last 4 KB of the volume. Your device size is 1000202649600 bytes.
>> >
>> >> Does anybody knows, how can it be speed up? I know, UPS is a solution,
>> >> but I consider it be a bug.
>> >>
>> >
>> > Could you share more details about situation during mount operations? I
>> > mean: (1) NILFS2-related messages in the system log; (2) "ps ax" output;
>> > (3) maybe "top" output can be useful also; (4) "mount" output before
>> > trying to mount NILFS2 volume.
>> last situation:
>>
>> messages log:
>> Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting
>> unchecked fs
>> .....
>> .....
>> Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
>> Oct 30 13:03:06 router kernel: [ 2810.325240] segctord starting.
>> Construction interval = 5 seconds, CP frequency < 30 seconds
>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: start
>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: pause (clean check)
>>
>
> Could you share content of your /etc/nilfs_cleanerd.conf file?


protection_period       3600
min_clean_segments      10%
max_clean_segments      20%
clean_check_interval    10
selection_policy        timestamp
nsegments_per_clean     4
mc_nsegments_per_clean  8
cleaning_interval       5
mc_cleaning_interval    2
retry_interval          30
use_mmap
log_priority            info

> Could you try to reproduce the issue with log_priority enhanced to debug
> level (I mean option in nilfs_cleanerd.conf) and share messages log
> again?

I can, but a bit later, if you don''t mind(in 24 hours).

>> It took about 45 minutes.
>> Previous time it took more than 4 hours.
>
> You mean that your console returns input after 45 minutes when you try
> to execute mount. Am I correct?

Yes, you are correct.

--------------------------------------------------
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                 ` <CAFPMYnGn4aNf=5B9v93TtTc6x4hG1ULgt0P9i75uO=xGX0U2bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-30 17:35                   ` Vyacheslav Dubeyko
       [not found]                     ` <AFFE5823-0AD0-488C-B465-55CF45A10785-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-10-30 17:35 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Oct 30, 2012, at 6:02 PM, Сергей Александров wrote:

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> On Tue, 2012-10-30 at 17:30 +0300, Сергей Александров wrote:
>>> --------------------------------------------------
>>> Александров Сергей Васильевич
>>> 
>>> 
>>> 2012/10/30 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>>> Hi,
>>>> 
>>>> On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
>>>>> Good time of the day!
>>>>> 
>>>>> I'v got a nilfs2 partition on a 1TB md RAID1 partition composed of two
>>>>> HDD's. Kernel 3.5.3, userspace utils v2.1.1. Gentoo linux
>>>>> distribution.
>>>>> Just updated utils to 2.1.4 but no failure since.
>>>>> 
>>>>> After power shutdown, mount takes about several hours.
>>>>> 
>>>> 
>>>> What about RAID1 consistency? Could you describe more about your RAID
>>>> configuration?
>>> 
>>> # cat /proc/mdstat
>>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>>> md0 : active raid1 sdb1[0] sdc1[2]
>>>      976760400 blocks super 1.2 [2/2] [UU]
>>> 
>>> So, raid is consistent. Reading speed from md device is about 60MB/s
>>> according to iostat.
>>> 
>>>>> For the first time I thought that it won't mount at all and tried to
>>>>> use fsck tool, found somewhere in the internet(don't really remember).
>>>>> It reported that superblock is ok.
>>>> 
>>>> So, I am implementing the fsck tool for NILFS2. I guess that you take
>>>> sources from NILFS2 e-mail list.
>>>> 
>>>>> Than I commented the check in the source file and the default number
>>>>> of blocks to check appeared to be too small. It failed to find the
>>>>> next superblock. I've increased the number, but increasing it to *100
>>>>> didn't help.
>>>> 
>>>> Sorry, I can't understand about what sources you are talking. Could you
>>>> describe more details about what and where you commented?
>>>> 
>>> I've forced test_latest_log to return negative result. And changed
>>> MAX_SCAN_SEGMENTS to 100000
>>> That was not enough. It finished without finding the SB.
>>> 
>>> 
>>> The load from fsck was the same as from mount.
>>> About 60MB/s read from md device and about 30% load on one core.
>>> 
>>>>> So, probably the reserved SB is too far from away and it takes too
>>>>> long to find it.
>>>>> 
>>>> 
>>>> If you try to find the second superblock then it is placed in the begin
>>>> of last 4 KB of the volume. Your device size is 1000202649600 bytes.
>>>> 
>>>>> Does anybody knows, how can it be speed up? I know, UPS is a solution,
>>>>> but I consider it be a bug.
>>>>> 
>>>> 
>>>> Could you share more details about situation during mount operations? I
>>>> mean: (1) NILFS2-related messages in the system log; (2) "ps ax" output;
>>>> (3) maybe "top" output can be useful also; (4) "mount" output before
>>>> trying to mount NILFS2 volume.
>>> last situation:
>>> 
>>> messages log:
>>> Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting
>>> unchecked fs
>>> .....
>>> .....
>>> Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
>>> Oct 30 13:03:06 router kernel: [ 2810.325240] segctord starting.
>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: start
>>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: pause (clean check)
>>> 
>> 
>> Could you share content of your /etc/nilfs_cleanerd.conf file?
> 
> 
> protection_period       3600
> min_clean_segments      10%
> max_clean_segments      20%
> clean_check_interval    10
> selection_policy        timestamp
> nsegments_per_clean     4
> mc_nsegments_per_clean  8
> cleaning_interval       5
> mc_cleaning_interval    2
> retry_interval          30
> use_mmap
> log_priority            info
> 
>> Could you try to reproduce the issue with log_priority enhanced to debug
>> level (I mean option in nilfs_cleanerd.conf) and share messages log
>> again?
> 
> I can, but a bit later, if you don''t mind(in 24 hours).
> 

Please, try to mount with using "strace". The output of "strace" can be useful. Please, share it after trying.

Thanks,
Vyacheslav Dubeyko.

>>> It took about 45 minutes.
>>> Previous time it took more than 4 hours.
>> 
>> You mean that your console returns input after 45 minutes when you try
>> to execute mount. Am I correct?
> 
> Yes, you are correct.
> 
> --------------------------------------------------
> Aleksandrov Sergey Vasil'evich
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                     ` <AFFE5823-0AD0-488C-B465-55CF45A10785-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2012-10-31  3:51                       ` Сергей Александров
       [not found]                         ` <CAFPMYnEtXMr1UOVYdNNRxxH83=O-_UOR_ZhCdqjh+JuUNrFiDA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-10-31  3:51 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

>>> Could you try to reproduce the issue with log_priority enhanced to debug
>>> level (I mean option in nilfs_cleanerd.conf) and share messages log
>>> again?
>>
>> I can, but a bit later, if you don''t mind(in 24 hours).
>>
>
> Please, try to mount with using "strace". The output of "strace" can be useful. Please, share it after trying.

I really dont'n know, what happened, but I failed to reproduce...
The only thing that have changed is version of nilfs-utils. It seems
that somewhere between 2.1.1 and 2.1.4 the bug was fixed.
Should I try to reprodce the bug with v 2.1.1, or we will consider the
question closed?

--------------------------------------------------
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                         ` <CAFPMYnEtXMr1UOVYdNNRxxH83=O-_UOR_ZhCdqjh+JuUNrFiDA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-10-31  6:13                           ` Vyacheslav Dubeyko
  2012-10-31 18:06                             ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-10-31  6:13 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2012-10-31 at 06:51 +0300, Сергей Александров wrote:
> >>> Could you try to reproduce the issue with log_priority enhanced to debug
> >>> level (I mean option in nilfs_cleanerd.conf) and share messages log
> >>> again?
> >>
> >> I can, but a bit later, if you don''t mind(in 24 hours).
> >>
> >
> > Please, try to mount with using "strace". The output of "strace" can be useful. Please, share it after trying.
> 
> I really dont'n know, what happened, but I failed to reproduce...
> The only thing that have changed is version of nilfs-utils. It seems
> that somewhere between 2.1.1 and 2.1.4 the bug was fixed.
> Should I try to reprodce the bug with v 2.1.1, or we will consider the
> question closed?
> 

I think that it makes sense to try to reproduce the issue with 2.1.1
version. If you can reproduce on 2.1.1 version but on 2.1.4 can't then
it means that bug was fixed.

But if you are happy without it then don't worry. :-)

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Aleksandrov Sergey Vasil'evich


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-10-31  6:13                           ` Vyacheslav Dubeyko
@ 2012-10-31 18:06                             ` Сергей Александров
       [not found]                               ` <CAFPMYnHB=x2y3C-bVSEcaT2nMYn12zc5Jnr56ph31zBbym4Kfw@mail.gmail.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-10-31 18:06 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Thank you very much for your help, sorry for diverting you from
probably more important and interesting things. May be I'll try to
reproduce the issue later(As a programmer I'm interested in finding
the bug), but (as a user) for now I'm just happy with present
situation.
Thanks for your help again.

>> I really dont'n know, what happened, but I failed to reproduce...
>> The only thing that have changed is version of nilfs-utils. It seems
>> that somewhere between 2.1.1 and 2.1.4 the bug was fixed.
>> Should I try to reprodce the bug with v 2.1.1, or we will consider the
>> question closed?
>>
>
> I think that it makes sense to try to reproduce the issue with 2.1.1
> version. If you can reproduce on 2.1.1 version but on 2.1.4 can't then
> it means that bug was fixed.
>
> But if you are happy without it then don't worry. :-)
>
> With the best regards,
> Vyacheslav Dubeyko.
>
--------------------------------------------------
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                 ` <CAFPMYnHB=x2y3C-bVSEcaT2nMYn12zc5Jnr56ph31zBbym4Kfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-15  4:07                                   ` Сергей Александров
       [not found]                                     ` <CAFPMYnE2j0DjiqcSuJRiJX5hfDjHoyh-WUhG0cMav9K=tbsLDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-15  4:07 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
The bug appeared again. And apparently it's a kernel one.
My bad that I didn't notice that the CPU activity wals in kernel space.
The mount process hangs inside mount call itself. I've an strace
output, but it's not informative, it perfectly normal besides the hang
in mount call.
I've a very few experience in catching kernel bugs like that, but I'll
do my best to help.


According to my observations, the bug takes place only after emergency
power down after a rather long uptime, may be even the time it takes
to mount the partition is proportional to ether uptime or the amount
of io since last mount.

I left FS unclean, so it is possible to do some tests, but I would
like to see it up and running as soon as possible.


Best regards,
Aleksandrov Sergey Vasil'evich
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                     ` <CAFPMYnE2j0DjiqcSuJRiJX5hfDjHoyh-WUhG0cMav9K=tbsLDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-15  6:32                                       ` Vyacheslav Dubeyko
       [not found]                                         ` <CAFPMYnH4npNU8dJKAHwjatxAA=WoT10EWho5xyYjZJjz4uOYBA@mail.gmail.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-15  6:32 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, 2012-11-15 at 07:07 +0300, Сергей Александров wrote:
> Hi,
> The bug appeared again. And apparently it's a kernel one.
> My bad that I didn't notice that the CPU activity wals in kernel space.
> The mount process hangs inside mount call itself. I've an strace
> output, but it's not informative, it perfectly normal besides the hang
> in mount call.

I will try to reproduce the issue. As I remember, you described your
environment with details. But what size of files you have preferably on
your partition? Moreover, could you share strace output anyway? Because
it can be a basis for understanding of achieving the issue reproducing
and analysis of execution environment.

If you want to try to investigate and fix the issue by yourself then
your strace output is needed also.

By the way, could you share your dmesg output for analysis?

I think also it can be very useful lscp and lssu output before sudden
power off and after mount is ended successfully. Could you share it?

With the best regards,
Vyacheslav Dubeyko.

> I've a very few experience in catching kernel bugs like that, but I'll
> do my best to help.
> 
> 
> According to my observations, the bug takes place only after emergency
> power down after a rather long uptime, may be even the time it takes
> to mount the partition is proportional to ether uptime or the amount
> of io since last mount.
> 
> I left FS unclean, so it is possible to do some tests, but I would
> like to see it up and running as soon as possible.
> 
> 
> Best regards,
> Aleksandrov Sergey Vasil'evich


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                             ` <CAFPMYnG6zjT6-=x7XcVuuCp1__H0FhCBfNmyrfQi8dNpWC_m2w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-16  6:26                                               ` Vyacheslav Dubeyko
  2012-11-16  6:40                                                 ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-16  6:26 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
> lssu, lscp after mount. Actually I missed the moment and
> nilfs_cleanerd has cleaned some data.
> Mount took about 50 minutes.
> 

Thank you for info.

I have some additional questions after thinking about issue. As I
remember, you wrote that you tried to understand what process eats CPU
time during issue. But you don't share details about it. Could you share
details of "top" and "ps ax" outputs for the case of issue reproducing?

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/15 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> > 2012/11/15 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >
> >>> The bug appeared again. And apparently it's a kernel one.
> >>> My bad that I didn't notice that the CPU activity wals in kernel space.
> >>> The mount process hangs inside mount call itself. I've an strace
> >>> output, but it's not informative, it perfectly normal besides the hang
> >>> in mount call.
> >>
> >> I will try to reproduce the issue. As I remember, you described your
> >> environment with details. But what size of files you have preferably on
> >> your partition? Moreover, could you share strace output anyway? Because
> >> it can be a basis for understanding of achieving the issue reproducing
> >> and analysis of execution environment.
> >
> > Essentially there are three kind of files: large ones about 700MB-2GB,
> > video, written by rtorrent/libtorrent (with no preallocation, if it
> > matters)and physics numerical simulation files - about 2GB data files
> > and few killobytes long description files.
> >
> >>
> >> If you want to try to investigate and fix the issue by yourself then
> >> your strace output is needed also.
> >>
> >> By the way, could you share your dmesg output for analysis?
> >
> > strace and dmesg output attached. Mount is in progress.
> >
> >>
> >> I think also it can be very useful lscp and lssu output before sudden
> >> power off and after mount is ended successfully. Could you share it?
> >
> > It will be a bit hard to get the before data, but when mount will be
> > finished I'll send you after results.
> >
> >
> > I'll try to reproduce and get the before results too, but I am not
> > sure about it.
> >
> >
> > Best regards,
> > Aleksandrov Sergey Vasil'evich


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-16  6:26                                               ` Vyacheslav Dubeyko
@ 2012-11-16  6:40                                                 ` Сергей Александров
       [not found]                                                   ` <CAFPMYnFLSZW068cFZ4FqDKF5sS_zF3SoV=vPG2=m+kvaxq-BZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-16  6:40 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Sorry, but I didn't save top output this time..
But for sure, it was "mount /dev/md0 /nfs/raid -o ...." process. The
CPU load was fully in kernel space.
So while the mount call, the kernel was doing something very both IO
and CPU intensive for almost 50 minutes.
As I have already written the load was about 80MB/s read IO according
to iotop, and about 60% of the first CPU core according to top.

If this info is not sufficient I'll try to reproduce the case as soon
as possible.
--------------------------------------------------
Александров Сергей Васильевич


2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
>> lssu, lscp after mount. Actually I missed the moment and
>> nilfs_cleanerd has cleaned some data.
>> Mount took about 50 minutes.
>>
>
> Thank you for info.
>
> I have some additional questions after thinking about issue. As I
> remember, you wrote that you tried to understand what process eats CPU
> time during issue. But you don't share details about it. Could you share
> details of "top" and "ps ax" outputs for the case of issue reproducing?
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/11/15 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> > 2012/11/15 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> >
>> >>> The bug appeared again. And apparently it's a kernel one.
>> >>> My bad that I didn't notice that the CPU activity wals in kernel space.
>> >>> The mount process hangs inside mount call itself. I've an strace
>> >>> output, but it's not informative, it perfectly normal besides the hang
>> >>> in mount call.
>> >>
>> >> I will try to reproduce the issue. As I remember, you described your
>> >> environment with details. But what size of files you have preferably on
>> >> your partition? Moreover, could you share strace output anyway? Because
>> >> it can be a basis for understanding of achieving the issue reproducing
>> >> and analysis of execution environment.
>> >
>> > Essentially there are three kind of files: large ones about 700MB-2GB,
>> > video, written by rtorrent/libtorrent (with no preallocation, if it
>> > matters)and physics numerical simulation files - about 2GB data files
>> > and few killobytes long description files.
>> >
>> >>
>> >> If you want to try to investigate and fix the issue by yourself then
>> >> your strace output is needed also.
>> >>
>> >> By the way, could you share your dmesg output for analysis?
>> >
>> > strace and dmesg output attached. Mount is in progress.
>> >
>> >>
>> >> I think also it can be very useful lscp and lssu output before sudden
>> >> power off and after mount is ended successfully. Could you share it?
>> >
>> > It will be a bit hard to get the before data, but when mount will be
>> > finished I'll send you after results.
>> >
>> >
>> > I'll try to reproduce and get the before results too, but I am not
>> > sure about it.
>> >
>> >
>> > Best regards,
>> > Aleksandrov Sergey Vasil'evich
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                   ` <CAFPMYnFLSZW068cFZ4FqDKF5sS_zF3SoV=vPG2=m+kvaxq-BZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-16  6:53                                                     ` Vyacheslav Dubeyko
  2012-11-16  7:11                                                       ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-16  6:53 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 2012-11-16 at 09:40 +0300, Сергей Александров wrote:
> Sorry, but I didn't save top output this time..
> But for sure, it was "mount /dev/md0 /nfs/raid -o ...." process. The
> CPU load was fully in kernel space.
> So while the mount call, the kernel was doing something very both IO
> and CPU intensive for almost 50 minutes.
> As I have already written the load was about 80MB/s read IO according
> to iotop, and about 60% of the first CPU core according to top.
> 

Ok. I see.

I suspect currently that you can have some special corruption of the
volume state that is resulted in so long recovery code working time. But
if so, then you can have some warning messages in system log from
recovery subsystem (maybe not, of course). As I know, Gentoo has special
log that keeps error and warning messages from the kernel. Could you
check that shared by you the dmesg output contains error messages from
kernel?

Moreover, current functionality state of fsck.nilfs2 is not very useful
yet. But it can check superblocks and segment summary headers validity.
Maybe it makes sense to check your volume by fsck.nilfs2. Could you try
to check your volume?

With the best regards,
Vyacheslav Dubeyko.


> If this info is not sufficient I'll try to reproduce the case as soon
> as possible.
> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> > On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
> >> lssu, lscp after mount. Actually I missed the moment and
> >> nilfs_cleanerd has cleaned some data.
> >> Mount took about 50 minutes.
> >>
> >
> > Thank you for info.
> >
> > I have some additional questions after thinking about issue. As I
> > remember, you wrote that you tried to understand what process eats CPU
> > time during issue. But you don't share details about it. Could you share
> > details of "top" and "ps ax" outputs for the case of issue reproducing?
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-16  6:53                                                     ` Vyacheslav Dubeyko
@ 2012-11-16  7:11                                                       ` Сергей Александров
       [not found]                                                         ` <CAFPMYnEYnLv5e6a3ZcFRjw-8cNB80T5=mpuiX9jaWa+pEj8Q-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-16  7:11 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

dmesg:
[53994.254432] NILFS warning: mounting unchecked fs
[56686.968229] NILFS: recovery complete.
[56686.969316] segctord starting. Construction interval = 5 seconds,
CP frequency < 30 seconds

messages:
Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
unchecked fs
Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
Construction interval = 5 seconds, CP frequency < 30 seconds

May be there is some kernel config option to get more debug output?

As for fsck, I have not found it in git public repo, so where can I
get the latest version?
--------------------------------------------------
Александров Сергей Васильевич


2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> On Fri, 2012-11-16 at 09:40 +0300, Сергей Александров wrote:
>> Sorry, but I didn't save top output this time..
>> But for sure, it was "mount /dev/md0 /nfs/raid -o ...." process. The
>> CPU load was fully in kernel space.
>> So while the mount call, the kernel was doing something very both IO
>> and CPU intensive for almost 50 minutes.
>> As I have already written the load was about 80MB/s read IO according
>> to iotop, and about 60% of the first CPU core according to top.
>>
>
> Ok. I see.
>
> I suspect currently that you can have some special corruption of the
> volume state that is resulted in so long recovery code working time. But
> if so, then you can have some warning messages in system log from
> recovery subsystem (maybe not, of course). As I know, Gentoo has special
> log that keeps error and warning messages from the kernel. Could you
> check that shared by you the dmesg output contains error messages from
> kernel?
>
> Moreover, current functionality state of fsck.nilfs2 is not very useful
> yet. But it can check superblocks and segment summary headers validity.
> Maybe it makes sense to check your volume by fsck.nilfs2. Could you try
> to check your volume?
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>
>> If this info is not sufficient I'll try to reproduce the case as soon
>> as possible.
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> > On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
>> >> lssu, lscp after mount. Actually I missed the moment and
>> >> nilfs_cleanerd has cleaned some data.
>> >> Mount took about 50 minutes.
>> >>
>> >
>> > Thank you for info.
>> >
>> > I have some additional questions after thinking about issue. As I
>> > remember, you wrote that you tried to understand what process eats CPU
>> > time during issue. But you don't share details about it. Could you share
>> > details of "top" and "ps ax" outputs for the case of issue reproducing?
>> >
>> > With the best regards,
>> > Vyacheslav Dubeyko.
>> >
>> >> --------------------------------------------------
>> >> Александров Сергей Васильевич
>> >>
>> >>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                         ` <CAFPMYnEYnLv5e6a3ZcFRjw-8cNB80T5=mpuiX9jaWa+pEj8Q-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-16  7:37                                                           ` Vyacheslav Dubeyko
  2012-11-16 13:52                                                           ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-16  7:37 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> dmesg:
> [53994.254432] NILFS warning: mounting unchecked fs
> [56686.968229] NILFS: recovery complete.
> [56686.969316] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
> 
> messages:
> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> unchecked fs
> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> Construction interval = 5 seconds, CP frequency < 30 seconds
> 
> May be there is some kernel config option to get more debug output?
> 

I am afraid that it is all that we can get from NILFS2 driver currently.
So, as I understand, we haven't any messages about detected corruptions.
It needs to analyze situation further.

But maybe, it makes sense to enable some kernel options from the kernel
hacking part (maybe, synchronization related). 

> As for fsck, I have not found it in git public repo, so where can I
> get the latest version?

Unfortunately, you can get it in the form of path set only. I sent the
last version (v4) in the e-mail list at November 12. 

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> > On Fri, 2012-11-16 at 09:40 +0300, Сергей Александров wrote:
> >> Sorry, but I didn't save top output this time..
> >> But for sure, it was "mount /dev/md0 /nfs/raid -o ...." process. The
> >> CPU load was fully in kernel space.
> >> So while the mount call, the kernel was doing something very both IO
> >> and CPU intensive for almost 50 minutes.
> >> As I have already written the load was about 80MB/s read IO according
> >> to iotop, and about 60% of the first CPU core according to top.
> >>
> >
> > Ok. I see.
> >
> > I suspect currently that you can have some special corruption of the
> > volume state that is resulted in so long recovery code working time. But
> > if so, then you can have some warning messages in system log from
> > recovery subsystem (maybe not, of course). As I know, Gentoo has special
> > log that keeps error and warning messages from the kernel. Could you
> > check that shared by you the dmesg output contains error messages from
> > kernel?
> >
> > Moreover, current functionality state of fsck.nilfs2 is not very useful
> > yet. But it can check superblocks and segment summary headers validity.
> > Maybe it makes sense to check your volume by fsck.nilfs2. Could you try
> > to check your volume?
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >
> >> If this info is not sufficient I'll try to reproduce the case as soon
> >> as possible.
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>
> >> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >> > On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
> >> >> lssu, lscp after mount. Actually I missed the moment and
> >> >> nilfs_cleanerd has cleaned some data.
> >> >> Mount took about 50 minutes.
> >> >>
> >> >
> >> > Thank you for info.
> >> >
> >> > I have some additional questions after thinking about issue. As I
> >> > remember, you wrote that you tried to understand what process eats CPU
> >> > time during issue. But you don't share details about it. Could you share
> >> > details of "top" and "ps ax" outputs for the case of issue reproducing?
> >> >
> >> > With the best regards,
> >> > Vyacheslav Dubeyko.
> >> >
> >> >> --------------------------------------------------
> >> >> Александров Сергей Васильевич
> >> >>
> >> >>
> >
> >


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                         ` <CAFPMYnEYnLv5e6a3ZcFRjw-8cNB80T5=mpuiX9jaWa+pEj8Q-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-11-16  7:37                                                           ` Vyacheslav Dubeyko
@ 2012-11-16 13:52                                                           ` Vyacheslav Dubeyko
  2012-11-16 13:56                                                             ` Сергей Александров
  1 sibling, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-16 13:52 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> dmesg:
> [53994.254432] NILFS warning: mounting unchecked fs
> [56686.968229] NILFS: recovery complete.
> [56686.969316] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
> 
> messages:
> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> unchecked fs
> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> Construction interval = 5 seconds, CP frequency < 30 seconds
> 
> May be there is some kernel config option to get more debug output?
> 

I prepared small patch (please, find in attachment). This patch simply
adds several debug messages into recovery module. I suggest to apply the
patch on your NILFS2 driver and try to mount again in the issue
environment. I hope that these debug messages can give more information
about issue on your side.

The patch uses pr_debug() call. Please, see
Documentation/dynamic-debug-howto.txt for more details.

To be honest, I don't test the patch yet. So, if you will have any
troubles, please, e-mail to me.

With the best regards,
Vyacheslav Dubeyko.

> As for fsck, I have not found it in git public repo, so where can I
> get the latest version?
> --------------------------------------------------


[-- Attachment #2: 0001-nilfs-add-debug-messages-for-recovery-module.patch --]
[-- Type: text/x-patch, Size: 2657 bytes --]

From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
Subject: [PATCH] nilfs: add debug messages for recovery module

This patch simply adds several pr_debug() messages for debugging recovery code functioning.

Signed-off-by: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
---
 fs/nilfs2/recovery.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index f1626f5..cb5c0c7 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -435,6 +435,12 @@ static int nilfs_prepare_segment_for_recovery(struct the_nilfs *nilfs,
 	int err;
 	int i;
 
+	pr_debug("NILFS: nilfs_prepare_segment_for_recovery\n");
+	pr_debug("NILFS: ns_segnum %lld, ns_nextnum %lld\n",
+			nilfs->ns_segnum, nilfs->ns_nextnum);
+	pr_debug("NILFS: ri_segnum %lld, ri_nextnum %lld\n",
+			ri->ri_segnum, ri->ri_nextnum);
+
 	segnum[0] = nilfs->ns_segnum;
 	segnum[1] = nilfs->ns_nextnum;
 	segnum[2] = ri->ri_segnum;
@@ -602,6 +608,11 @@ static int nilfs_do_roll_forward(struct the_nilfs *nilfs,
 	segnum = nilfs_get_segnum_of_block(nilfs, pseg_start);
 	nilfs_get_segment_range(nilfs, segnum, &seg_start, &seg_end);
 
+	pr_debug("NILFS: nilfs_do_roll_forward\n");
+	pr_debug("NILFS: pseg_start %lld, seg_seq %lld\n", pseg_start, seg_seq);
+	pr_debug("NILFS: segnum %lld\n", segnum);
+	pr_debug("NILFS: seg_start %lld, seg_end %lld\n", seg_start, seg_end);
+
 	while (segnum != ri->ri_segnum || pseg_start <= ri->ri_pseg_start) {
 		brelse(bh_sum);
 		bh_sum = nilfs_read_log_header(nilfs, pseg_start, &sum);
@@ -705,6 +716,8 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
 	struct buffer_head *bh;
 	int err;
 
+	pr_debug("NILFS: nilfs_finish_roll_forward\n");
+
 	if (nilfs_get_segnum_of_block(nilfs, ri->ri_lsegs_start) !=
 	    nilfs_get_segnum_of_block(nilfs, ri->ri_super_root))
 		return;
@@ -750,6 +763,8 @@ int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs,
 	if (ri->ri_lsegs_start == 0 || ri->ri_lsegs_end == 0)
 		return 0;
 
+	pr_debug("NILFS: nilfs_salvage_orphan_logs\n");
+
 	err = nilfs_attach_checkpoint(sb, ri->ri_cno, true, &root);
 	if (unlikely(err)) {
 		printk(KERN_ERR
@@ -831,6 +846,10 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 	cno = nilfs->ns_last_cno;
 	segnum = nilfs_get_segnum_of_block(nilfs, pseg_start);
 
+	pr_debug("NILFS: nilfs_search_super_root\n");
+	pr_debug("NILFS: pseg_start %lld, seg_seq %lld\n", pseg_start, seg_seq);
+	pr_debug("NILFS: cno %lld, segnum %lld\n", cno, segnum);
+
 	/* Calculate range of segment */
 	nilfs_get_segment_range(nilfs, segnum, &seg_start, &seg_end);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-16 13:52                                                           ` Vyacheslav Dubeyko
@ 2012-11-16 13:56                                                             ` Сергей Александров
       [not found]                                                               ` <CAFPMYnHzghu8k36wQj5h4K7a2wS6EcURcQmCOUTb5B2CJB9ufQ@mail.gmail.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-16 13:56 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Ok, thanks. I'll try to apply it. I've also turned on function graph
tracing, may be the graph for init_nilfs function will help.
--------------------------------------------------
Александров Сергей Васильевич


2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>> dmesg:
>> [53994.254432] NILFS warning: mounting unchecked fs
>> [56686.968229] NILFS: recovery complete.
>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>> CP frequency < 30 seconds
>>
>> messages:
>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>> unchecked fs
>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>> Construction interval = 5 seconds, CP frequency < 30 seconds
>>
>> May be there is some kernel config option to get more debug output?
>>
>
> I prepared small patch (please, find in attachment). This patch simply
> adds several debug messages into recovery module. I suggest to apply the
> patch on your NILFS2 driver and try to mount again in the issue
> environment. I hope that these debug messages can give more information
> about issue on your side.
>
> The patch uses pr_debug() call. Please, see
> Documentation/dynamic-debug-howto.txt for more details.
>
> To be honest, I don't test the patch yet. So, if you will have any
> troubles, please, e-mail to me.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>> As for fsck, I have not found it in git public repo, so where can I
>> get the latest version?
>> --------------------------------------------------
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                     ` <782558C7-081D-4466-8780-51886E209A62-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2012-11-26 19:17                                                                       ` Сергей Александров
       [not found]                                                                         ` <CAFPMYnGT1byuVA1hCnETWc2GZAbDsjeS95-F8f15QYPe_YHABA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-26 19:17 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

Unfortunately, the only output I saw is:
Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
unchecked fs
Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
115603456, seg_seq 4347757
Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447

I suppose the last three lines are due to the patch.

Dynamic debug was on:
router dynamic_debug # cat control | grep nilfs
fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
cno %lld, segnum %lld\012"
fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
pseg_start %lld, seg_seq %lld\012"
fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
nilfs_search_super_root\012"
fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
nilfs_salvage_orphan_logs\012"
fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
seg_start %lld, seg_end %lld\012"
fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
segnum %lld\012"
fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
pseg_start %lld, seg_seq %lld\012"
fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
nilfs_do_roll_forward\012"
fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
"NILFS: ri_segnum %lld, ri_nextnum %lld\012"
fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
"NILFS: ns_segnum %lld, ns_nextnum %lld\012"
fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
"NILFS: nilfs_prepare_segment_for_recovery\012"
fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
nilfs_finish_roll_forward\012"
--------------------------------------------------
Александров Сергей Васильевич


2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> Hi,
>
> On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
>
>> The bug finally appeared again. Here are the trace log and kernel log.
>> I've hard rebooted machine so FS is in unchecked state for now and it
>> is possible to make some other tests.
>> I'll manage to do without this partition for a day may be two.
>>
>
> Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>>
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>>> Ok, thanks. I'll try to apply it. I've also turned on function graph
>>> tracing, may be the graph for init_nilfs function will help.
>>> --------------------------------------------------
>>> Александров Сергей Васильевич
>>>
>>>
>>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>>>>> dmesg:
>>>>> [53994.254432] NILFS warning: mounting unchecked fs
>>>>> [56686.968229] NILFS: recovery complete.
>>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>>>>> CP frequency < 30 seconds
>>>>>
>>>>> messages:
>>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>>>>> unchecked fs
>>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>>>>>
>>>>> May be there is some kernel config option to get more debug output?
>>>>>
>>>>
>>>> I prepared small patch (please, find in attachment). This patch simply
>>>> adds several debug messages into recovery module. I suggest to apply the
>>>> patch on your NILFS2 driver and try to mount again in the issue
>>>> environment. I hope that these debug messages can give more information
>>>> about issue on your side.
>>>>
>>>> The patch uses pr_debug() call. Please, see
>>>> Documentation/dynamic-debug-howto.txt for more details.
>>>>
>>>> To be honest, I don't test the patch yet. So, if you will have any
>>>> troubles, please, e-mail to me.
>>>>
>>>> With the best regards,
>>>> Vyacheslav Dubeyko.
>>>>
>>>>> As for fsck, I have not found it in git public repo, so where can I
>>>>> get the latest version?
>>>>> --------------------------------------------------
>>>>
>> <kern.log><trace_fail>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                 ` <CAFPMYnHzghu8k36wQj5h4K7a2wS6EcURcQmCOUTb5B2CJB9ufQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-26 20:10                                                                   ` Vyacheslav Dubeyko
       [not found]                                                                     ` <782558C7-081D-4466-8780-51886E209A62-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-26 20:10 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:

> The bug finally appeared again. Here are the trace log and kernel log.
> I've hard rebooted machine so FS is in unchecked state for now and it
> is possible to make some other tests.
> I'll manage to do without this partition for a day may be two.
> 

Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?

With the best regards,
Vyacheslav Dubeyko. 

> 
> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> Ok, thanks. I'll try to apply it. I've also turned on function graph
>> tracing, may be the graph for init_nilfs function will help.
>> --------------------------------------------------
>> Александров Сергей Васильевич
>> 
>> 
>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>>>> dmesg:
>>>> [53994.254432] NILFS warning: mounting unchecked fs
>>>> [56686.968229] NILFS: recovery complete.
>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>>>> CP frequency < 30 seconds
>>>> 
>>>> messages:
>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>>>> unchecked fs
>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>>>> 
>>>> May be there is some kernel config option to get more debug output?
>>>> 
>>> 
>>> I prepared small patch (please, find in attachment). This patch simply
>>> adds several debug messages into recovery module. I suggest to apply the
>>> patch on your NILFS2 driver and try to mount again in the issue
>>> environment. I hope that these debug messages can give more information
>>> about issue on your side.
>>> 
>>> The patch uses pr_debug() call. Please, see
>>> Documentation/dynamic-debug-howto.txt for more details.
>>> 
>>> To be honest, I don't test the patch yet. So, if you will have any
>>> troubles, please, e-mail to me.
>>> 
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>> 
>>>> As for fsck, I have not found it in git public repo, so where can I
>>>> get the latest version?
>>>> --------------------------------------------------
>>> 
> <kern.log><trace_fail>

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                         ` <CAFPMYnGT1byuVA1hCnETWc2GZAbDsjeS95-F8f15QYPe_YHABA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-27  7:12                                                                           ` Vyacheslav Dubeyko
  2012-11-27  7:47                                                                             ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-27  7:12 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
> Hi,
> 
> Unfortunately, the only output I saw is:
> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
> unchecked fs
> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
> 115603456, seg_seq 4347757
> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
> 

Sorry, the patch is working. The output about searching super root is
the result of debug output. But I expected more information in this
output. Did you break mount operation before ending?

As I understand, you have output not for whole mount operation. Am I
correct? Could you share whole debug output from the mount beginning to
the end?

Or do you share all available output that you had during so long mount
operation?

> I suppose the last three lines are due to the patch.
> 
> Dynamic debug was on:
> router dynamic_debug # cat control | grep nilfs
> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
> cno %lld, segnum %lld\012"
> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
> pseg_start %lld, seg_seq %lld\012"
> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
> nilfs_search_super_root\012"
> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
> nilfs_salvage_orphan_logs\012"
> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> seg_start %lld, seg_end %lld\012"
> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> segnum %lld\012"
> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> pseg_start %lld, seg_seq %lld\012"
> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> nilfs_do_roll_forward\012"
> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
> "NILFS: nilfs_prepare_segment_for_recovery\012"
> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
> nilfs_finish_roll_forward\012"

This is simply list of known debug messages format. 

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> > Hi,
> >
> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
> >
> >> The bug finally appeared again. Here are the trace log and kernel log.
> >> I've hard rebooted machine so FS is in unchecked state for now and it
> >> is possible to make some other tests.
> >> I'll manage to do without this partition for a day may be two.
> >>
> >
> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >>
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>
> >> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
> >>> tracing, may be the graph for init_nilfs function will help.
> >>> --------------------------------------------------
> >>> Александров Сергей Васильевич
> >>>
> >>>
> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> >>>>> dmesg:
> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
> >>>>> [56686.968229] NILFS: recovery complete.
> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
> >>>>> CP frequency < 30 seconds
> >>>>>
> >>>>> messages:
> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> >>>>> unchecked fs
> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
> >>>>>
> >>>>> May be there is some kernel config option to get more debug output?
> >>>>>
> >>>>
> >>>> I prepared small patch (please, find in attachment). This patch simply
> >>>> adds several debug messages into recovery module. I suggest to apply the
> >>>> patch on your NILFS2 driver and try to mount again in the issue
> >>>> environment. I hope that these debug messages can give more information
> >>>> about issue on your side.
> >>>>
> >>>> The patch uses pr_debug() call. Please, see
> >>>> Documentation/dynamic-debug-howto.txt for more details.
> >>>>
> >>>> To be honest, I don't test the patch yet. So, if you will have any
> >>>> troubles, please, e-mail to me.
> >>>>
> >>>> With the best regards,
> >>>> Vyacheslav Dubeyko.
> >>>>
> >>>>> As for fsck, I have not found it in git public repo, so where can I
> >>>>> get the latest version?
> >>>>> --------------------------------------------------
> >>>>
> >> <kern.log><trace_fail>
> >


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-27  7:12                                                                           ` Vyacheslav Dubeyko
@ 2012-11-27  7:47                                                                             ` Сергей Александров
       [not found]                                                                               ` <CAFPMYnFTvQVTC7sgmq=9sx4hX7fKkXyjNHSRXay3yTuRQsOq4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-27  7:47 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

Yes, I it's not a full output, it's an output for abut 2 minutes.
I can get a full one, but after that the partition apparently will be
checked and it' will be hard to reproduce.
I also can split md device, so that there will be another attempt, should I?
--------------------------------------------------
Александров Сергей Васильевич


2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> Hi,
>
> On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
>> Hi,
>>
>> Unfortunately, the only output I saw is:
>> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
>> unchecked fs
>> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
>> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
>> 115603456, seg_seq 4347757
>> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
>>
>
> Sorry, the patch is working. The output about searching super root is
> the result of debug output. But I expected more information in this
> output. Did you break mount operation before ending?
>
> As I understand, you have output not for whole mount operation. Am I
> correct? Could you share whole debug output from the mount beginning to
> the end?
>
> Or do you share all available output that you had during so long mount
> operation?
>
>> I suppose the last three lines are due to the patch.
>>
>> Dynamic debug was on:
>> router dynamic_debug # cat control | grep nilfs
>> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
>> cno %lld, segnum %lld\012"
>> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
>> pseg_start %lld, seg_seq %lld\012"
>> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
>> nilfs_search_super_root\012"
>> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
>> nilfs_salvage_orphan_logs\012"
>> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> seg_start %lld, seg_end %lld\012"
>> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> segnum %lld\012"
>> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> pseg_start %lld, seg_seq %lld\012"
>> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> nilfs_do_roll_forward\012"
>> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
>> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
>> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> "NILFS: nilfs_prepare_segment_for_recovery\012"
>> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
>> nilfs_finish_roll_forward\012"
>
> This is simply list of known debug messages format.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> > Hi,
>> >
>> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
>> >
>> >> The bug finally appeared again. Here are the trace log and kernel log.
>> >> I've hard rebooted machine so FS is in unchecked state for now and it
>> >> is possible to make some other tests.
>> >> I'll manage to do without this partition for a day may be two.
>> >>
>> >
>> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
>> >
>> > With the best regards,
>> > Vyacheslav Dubeyko.
>> >
>> >>
>> >> --------------------------------------------------
>> >> Александров Сергей Васильевич
>> >>
>> >>
>> >> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
>> >>> tracing, may be the graph for init_nilfs function will help.
>> >>> --------------------------------------------------
>> >>> Александров Сергей Васильевич
>> >>>
>> >>>
>> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>> >>>>> dmesg:
>> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
>> >>>>> [56686.968229] NILFS: recovery complete.
>> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>> >>>>> CP frequency < 30 seconds
>> >>>>>
>> >>>>> messages:
>> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>> >>>>> unchecked fs
>> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>> >>>>>
>> >>>>> May be there is some kernel config option to get more debug output?
>> >>>>>
>> >>>>
>> >>>> I prepared small patch (please, find in attachment). This patch simply
>> >>>> adds several debug messages into recovery module. I suggest to apply the
>> >>>> patch on your NILFS2 driver and try to mount again in the issue
>> >>>> environment. I hope that these debug messages can give more information
>> >>>> about issue on your side.
>> >>>>
>> >>>> The patch uses pr_debug() call. Please, see
>> >>>> Documentation/dynamic-debug-howto.txt for more details.
>> >>>>
>> >>>> To be honest, I don't test the patch yet. So, if you will have any
>> >>>> troubles, please, e-mail to me.
>> >>>>
>> >>>> With the best regards,
>> >>>> Vyacheslav Dubeyko.
>> >>>>
>> >>>>> As for fsck, I have not found it in git public repo, so where can I
>> >>>>> get the latest version?
>> >>>>> --------------------------------------------------
>> >>>>
>> >> <kern.log><trace_fail>
>> >
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                               ` <CAFPMYnFTvQVTC7sgmq=9sx4hX7fKkXyjNHSRXay3yTuRQsOq4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-27  7:58                                                                                 ` Vyacheslav Dubeyko
  2012-11-27  8:03                                                                                   ` Сергей Александров
  0 siblings, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-27  7:58 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2012-11-27 at 10:47 +0300, Сергей Александров wrote:
> Hi,
> 
> Yes, I it's not a full output, it's an output for abut 2 minutes.
> I can get a full one, but after that the partition apparently will be
> checked and it' will be hard to reproduce.
> I also can split md device, so that there will be another attempt, should I?

If you get enough disk space then it is possible to get full volume
image. And, as a result, to get issue reproducible volume state by means
of mount volume replica as loop device anytime. Could you do it?

I think that we need to get whole output of mount operation anyway for
the issue analysis.

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> > Hi,
> >
> > On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
> >> Hi,
> >>
> >> Unfortunately, the only output I saw is:
> >> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
> >> unchecked fs
> >> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
> >> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
> >> 115603456, seg_seq 4347757
> >> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
> >>
> >
> > Sorry, the patch is working. The output about searching super root is
> > the result of debug output. But I expected more information in this
> > output. Did you break mount operation before ending?
> >
> > As I understand, you have output not for whole mount operation. Am I
> > correct? Could you share whole debug output from the mount beginning to
> > the end?
> >
> > Or do you share all available output that you had during so long mount
> > operation?
> >
> >> I suppose the last three lines are due to the patch.
> >>
> >> Dynamic debug was on:
> >> router dynamic_debug # cat control | grep nilfs
> >> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
> >> cno %lld, segnum %lld\012"
> >> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
> >> pseg_start %lld, seg_seq %lld\012"
> >> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
> >> nilfs_search_super_root\012"
> >> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
> >> nilfs_salvage_orphan_logs\012"
> >> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >> seg_start %lld, seg_end %lld\012"
> >> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >> segnum %lld\012"
> >> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >> pseg_start %lld, seg_seq %lld\012"
> >> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >> nilfs_do_roll_forward\012"
> >> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
> >> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
> >> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >> "NILFS: nilfs_prepare_segment_for_recovery\012"
> >> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
> >> nilfs_finish_roll_forward\012"
> >
> > This is simply list of known debug messages format.
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>
> >> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >> > Hi,
> >> >
> >> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
> >> >
> >> >> The bug finally appeared again. Here are the trace log and kernel log.
> >> >> I've hard rebooted machine so FS is in unchecked state for now and it
> >> >> is possible to make some other tests.
> >> >> I'll manage to do without this partition for a day may be two.
> >> >>
> >> >
> >> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
> >> >
> >> > With the best regards,
> >> > Vyacheslav Dubeyko.
> >> >
> >> >>
> >> >> --------------------------------------------------
> >> >> Александров Сергей Васильевич
> >> >>
> >> >>
> >> >> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
> >> >>> tracing, may be the graph for init_nilfs function will help.
> >> >>> --------------------------------------------------
> >> >>> Александров Сергей Васильевич
> >> >>>
> >> >>>
> >> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> >> >>>>> dmesg:
> >> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
> >> >>>>> [56686.968229] NILFS: recovery complete.
> >> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
> >> >>>>> CP frequency < 30 seconds
> >> >>>>>
> >> >>>>> messages:
> >> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> >> >>>>> unchecked fs
> >> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> >> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> >> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
> >> >>>>>
> >> >>>>> May be there is some kernel config option to get more debug output?
> >> >>>>>
> >> >>>>
> >> >>>> I prepared small patch (please, find in attachment). This patch simply
> >> >>>> adds several debug messages into recovery module. I suggest to apply the
> >> >>>> patch on your NILFS2 driver and try to mount again in the issue
> >> >>>> environment. I hope that these debug messages can give more information
> >> >>>> about issue on your side.
> >> >>>>
> >> >>>> The patch uses pr_debug() call. Please, see
> >> >>>> Documentation/dynamic-debug-howto.txt for more details.
> >> >>>>
> >> >>>> To be honest, I don't test the patch yet. So, if you will have any
> >> >>>> troubles, please, e-mail to me.
> >> >>>>
> >> >>>> With the best regards,
> >> >>>> Vyacheslav Dubeyko.
> >> >>>>
> >> >>>>> As for fsck, I have not found it in git public repo, so where can I
> >> >>>>> get the latest version?
> >> >>>>> --------------------------------------------------
> >> >>>>
> >> >> <kern.log><trace_fail>
> >> >
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-27  7:58                                                                                 ` Vyacheslav Dubeyko
@ 2012-11-27  8:03                                                                                   ` Сергей Александров
       [not found]                                                                                     ` <CAFPMYnEPJLQs9TVMy8PFXqV-XvsM8oTvrf_XTE0+g9cQ+5MJXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-27  8:03 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Sorry, I don't have another multi-TB volume.. And this is a 1TB large.
But I can do split/resync on md to restore broken state.

I'll send full log as soon as possible.
--------------------------------------------------
Александров Сергей Васильевич


2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> On Tue, 2012-11-27 at 10:47 +0300, Сергей Александров wrote:
>> Hi,
>>
>> Yes, I it's not a full output, it's an output for abut 2 minutes.
>> I can get a full one, but after that the partition apparently will be
>> checked and it' will be hard to reproduce.
>> I also can split md device, so that there will be another attempt, should I?
>
> If you get enough disk space then it is possible to get full volume
> image. And, as a result, to get issue reproducible volume state by means
> of mount volume replica as loop device anytime. Could you do it?
>
> I think that we need to get whole output of mount operation anyway for
> the issue analysis.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>> --------------------------------------------------
>> Александров Сергей Васильевич
>>
>>
>> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> > Hi,
>> >
>> > On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
>> >> Hi,
>> >>
>> >> Unfortunately, the only output I saw is:
>> >> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
>> >> unchecked fs
>> >> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
>> >> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
>> >> 115603456, seg_seq 4347757
>> >> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
>> >>
>> >
>> > Sorry, the patch is working. The output about searching super root is
>> > the result of debug output. But I expected more information in this
>> > output. Did you break mount operation before ending?
>> >
>> > As I understand, you have output not for whole mount operation. Am I
>> > correct? Could you share whole debug output from the mount beginning to
>> > the end?
>> >
>> > Or do you share all available output that you had during so long mount
>> > operation?
>> >
>> >> I suppose the last three lines are due to the patch.
>> >>
>> >> Dynamic debug was on:
>> >> router dynamic_debug # cat control | grep nilfs
>> >> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
>> >> cno %lld, segnum %lld\012"
>> >> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
>> >> pseg_start %lld, seg_seq %lld\012"
>> >> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
>> >> nilfs_search_super_root\012"
>> >> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
>> >> nilfs_salvage_orphan_logs\012"
>> >> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> >> seg_start %lld, seg_end %lld\012"
>> >> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> >> segnum %lld\012"
>> >> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> >> pseg_start %lld, seg_seq %lld\012"
>> >> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>> >> nilfs_do_roll_forward\012"
>> >> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> >> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
>> >> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> >> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
>> >> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
>> >> "NILFS: nilfs_prepare_segment_for_recovery\012"
>> >> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
>> >> nilfs_finish_roll_forward\012"
>> >
>> > This is simply list of known debug messages format.
>> >
>> > With the best regards,
>> > Vyacheslav Dubeyko.
>> >
>> >> --------------------------------------------------
>> >> Александров Сергей Васильевич
>> >>
>> >>
>> >> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> >> > Hi,
>> >> >
>> >> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
>> >> >
>> >> >> The bug finally appeared again. Here are the trace log and kernel log.
>> >> >> I've hard rebooted machine so FS is in unchecked state for now and it
>> >> >> is possible to make some other tests.
>> >> >> I'll manage to do without this partition for a day may be two.
>> >> >>
>> >> >
>> >> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
>> >> >
>> >> > With the best regards,
>> >> > Vyacheslav Dubeyko.
>> >> >
>> >> >>
>> >> >> --------------------------------------------------
>> >> >> Александров Сергей Васильевич
>> >> >>
>> >> >>
>> >> >> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> >> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
>> >> >>> tracing, may be the graph for init_nilfs function will help.
>> >> >>> --------------------------------------------------
>> >> >>> Александров Сергей Васильевич
>> >> >>>
>> >> >>>
>> >> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> >> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>> >> >>>>> dmesg:
>> >> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
>> >> >>>>> [56686.968229] NILFS: recovery complete.
>> >> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>> >> >>>>> CP frequency < 30 seconds
>> >> >>>>>
>> >> >>>>> messages:
>> >> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>> >> >>>>> unchecked fs
>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>> >> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>> >> >>>>>
>> >> >>>>> May be there is some kernel config option to get more debug output?
>> >> >>>>>
>> >> >>>>
>> >> >>>> I prepared small patch (please, find in attachment). This patch simply
>> >> >>>> adds several debug messages into recovery module. I suggest to apply the
>> >> >>>> patch on your NILFS2 driver and try to mount again in the issue
>> >> >>>> environment. I hope that these debug messages can give more information
>> >> >>>> about issue on your side.
>> >> >>>>
>> >> >>>> The patch uses pr_debug() call. Please, see
>> >> >>>> Documentation/dynamic-debug-howto.txt for more details.
>> >> >>>>
>> >> >>>> To be honest, I don't test the patch yet. So, if you will have any
>> >> >>>> troubles, please, e-mail to me.
>> >> >>>>
>> >> >>>> With the best regards,
>> >> >>>> Vyacheslav Dubeyko.
>> >> >>>>
>> >> >>>>> As for fsck, I have not found it in git public repo, so where can I
>> >> >>>>> get the latest version?
>> >> >>>>> --------------------------------------------------
>> >> >>>>
>> >> >> <kern.log><trace_fail>
>> >> >
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                                     ` <CAFPMYnEPJLQs9TVMy8PFXqV-XvsM8oTvrf_XTE0+g9cQ+5MJXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-27  9:58                                                                                       ` Сергей Александров
       [not found]                                                                                         ` <CAFPMYnFKKwfRD6zpO_AYCMXP5U_deHfOF6CXGdd6jjMjVkMJ6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Сергей Александров @ 2012-11-27  9:58 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

As I expected, no new info:
[57291.417113] NILFS warning: mounting unchecked fs
[57291.417125] NILFS: nilfs_search_super_root
[57291.417132] NILFS: pseg_start 115603456, seg_seq 4347757
[57291.417137] NILFS: cno 1252241, segnum 56447
[62768.349242] NILFS: recovery complete.
[62768.351804] segctord starting. Construction interval = 5 seconds,
CP frequency < 30 seconds

So, I suppose, nilfs is searching for a super root for a very long
time and find's out that it is consistent.
--------------------------------------------------
Александров Сергей Васильевич


2012/11/27 Сергей Александров <splavgm@gmail.com>:
> Sorry, I don't have another multi-TB volume.. And this is a 1TB large.
> But I can do split/resync on md to restore broken state.
>
> I'll send full log as soon as possible.
> --------------------------------------------------
> Александров Сергей Васильевич
>
>
> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>> On Tue, 2012-11-27 at 10:47 +0300, Сергей Александров wrote:
>>> Hi,
>>>
>>> Yes, I it's not a full output, it's an output for abut 2 minutes.
>>> I can get a full one, but after that the partition apparently will be
>>> checked and it' will be hard to reproduce.
>>> I also can split md device, so that there will be another attempt, should I?
>>
>> If you get enough disk space then it is possible to get full volume
>> image. And, as a result, to get issue reproducible volume state by means
>> of mount volume replica as loop device anytime. Could you do it?
>>
>> I think that we need to get whole output of mount operation anyway for
>> the issue analysis.
>>
>> With the best regards,
>> Vyacheslav Dubeyko.
>>
>>> --------------------------------------------------
>>> Александров Сергей Васильевич
>>>
>>>
>>> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>> > Hi,
>>> >
>>> > On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
>>> >> Hi,
>>> >>
>>> >> Unfortunately, the only output I saw is:
>>> >> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
>>> >> unchecked fs
>>> >> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
>>> >> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
>>> >> 115603456, seg_seq 4347757
>>> >> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
>>> >>
>>> >
>>> > Sorry, the patch is working. The output about searching super root is
>>> > the result of debug output. But I expected more information in this
>>> > output. Did you break mount operation before ending?
>>> >
>>> > As I understand, you have output not for whole mount operation. Am I
>>> > correct? Could you share whole debug output from the mount beginning to
>>> > the end?
>>> >
>>> > Or do you share all available output that you had during so long mount
>>> > operation?
>>> >
>>> >> I suppose the last three lines are due to the patch.
>>> >>
>>> >> Dynamic debug was on:
>>> >> router dynamic_debug # cat control | grep nilfs
>>> >> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
>>> >> cno %lld, segnum %lld\012"
>>> >> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
>>> >> pseg_start %lld, seg_seq %lld\012"
>>> >> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
>>> >> nilfs_search_super_root\012"
>>> >> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
>>> >> nilfs_salvage_orphan_logs\012"
>>> >> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>>> >> seg_start %lld, seg_end %lld\012"
>>> >> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>>> >> segnum %lld\012"
>>> >> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>>> >> pseg_start %lld, seg_seq %lld\012"
>>> >> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
>>> >> nilfs_do_roll_forward\012"
>>> >> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
>>> >> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
>>> >> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
>>> >> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
>>> >> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
>>> >> "NILFS: nilfs_prepare_segment_for_recovery\012"
>>> >> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
>>> >> nilfs_finish_roll_forward\012"
>>> >
>>> > This is simply list of known debug messages format.
>>> >
>>> > With the best regards,
>>> > Vyacheslav Dubeyko.
>>> >
>>> >> --------------------------------------------------
>>> >> Александров Сергей Васильевич
>>> >>
>>> >>
>>> >> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>> >> > Hi,
>>> >> >
>>> >> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
>>> >> >
>>> >> >> The bug finally appeared again. Here are the trace log and kernel log.
>>> >> >> I've hard rebooted machine so FS is in unchecked state for now and it
>>> >> >> is possible to make some other tests.
>>> >> >> I'll manage to do without this partition for a day may be two.
>>> >> >>
>>> >> >
>>> >> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
>>> >> >
>>> >> > With the best regards,
>>> >> > Vyacheslav Dubeyko.
>>> >> >
>>> >> >>
>>> >> >> --------------------------------------------------
>>> >> >> Александров Сергей Васильевич
>>> >> >>
>>> >> >>
>>> >> >> 2012/11/16 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>>> >> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
>>> >> >>> tracing, may be the graph for init_nilfs function will help.
>>> >> >>> --------------------------------------------------
>>> >> >>> Александров Сергей Васильевич
>>> >> >>>
>>> >> >>>
>>> >> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
>>> >> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
>>> >> >>>>> dmesg:
>>> >> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
>>> >> >>>>> [56686.968229] NILFS: recovery complete.
>>> >> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
>>> >> >>>>> CP frequency < 30 seconds
>>> >> >>>>>
>>> >> >>>>> messages:
>>> >> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
>>> >> >>>>> unchecked fs
>>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
>>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
>>> >> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
>>> >> >>>>>
>>> >> >>>>> May be there is some kernel config option to get more debug output?
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>> I prepared small patch (please, find in attachment). This patch simply
>>> >> >>>> adds several debug messages into recovery module. I suggest to apply the
>>> >> >>>> patch on your NILFS2 driver and try to mount again in the issue
>>> >> >>>> environment. I hope that these debug messages can give more information
>>> >> >>>> about issue on your side.
>>> >> >>>>
>>> >> >>>> The patch uses pr_debug() call. Please, see
>>> >> >>>> Documentation/dynamic-debug-howto.txt for more details.
>>> >> >>>>
>>> >> >>>> To be honest, I don't test the patch yet. So, if you will have any
>>> >> >>>> troubles, please, e-mail to me.
>>> >> >>>>
>>> >> >>>> With the best regards,
>>> >> >>>> Vyacheslav Dubeyko.
>>> >> >>>>
>>> >> >>>>> As for fsck, I have not found it in git public repo, so where can I
>>> >> >>>>> get the latest version?
>>> >> >>>>> --------------------------------------------------
>>> >> >>>>
>>> >> >> <kern.log><trace_fail>
>>> >> >
>>> >
>>> >
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                                         ` <CAFPMYnFKKwfRD6zpO_AYCMXP5U_deHfOF6CXGdd6jjMjVkMJ6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-27 11:29                                                                                           ` Vyacheslav Dubeyko
  2012-12-03 13:56                                                                                             ` Vyacheslav Dubeyko
  2012-11-28  7:10                                                                                           ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-27 11:29 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2012-11-27 at 12:58 +0300, Сергей Александров wrote:
> As I expected, no new info:
> [57291.417113] NILFS warning: mounting unchecked fs
> [57291.417125] NILFS: nilfs_search_super_root
> [57291.417132] NILFS: pseg_start 115603456, seg_seq 4347757
> [57291.417137] NILFS: cno 1252241, segnum 56447
> [62768.349242] NILFS: recovery complete.
> [62768.351804] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
> 
> So, I suppose, nilfs is searching for a super root for a very long
> time and find's out that it is consistent.


Ok. I think that I understand situation currently. I try to simulate the
issue and investigate it more deeply. If my presupposition will be wrong
then I will ask about more details.

Thanks,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/27 Сергей Александров <splavgm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> > Sorry, I don't have another multi-TB volume.. And this is a 1TB large.
> > But I can do split/resync on md to restore broken state.
> >
> > I'll send full log as soon as possible.
> > --------------------------------------------------
> > Александров Сергей Васильевич
> >
> >
> > 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >> On Tue, 2012-11-27 at 10:47 +0300, Сергей Александров wrote:
> >>> Hi,
> >>>
> >>> Yes, I it's not a full output, it's an output for abut 2 minutes.
> >>> I can get a full one, but after that the partition apparently will be
> >>> checked and it' will be hard to reproduce.
> >>> I also can split md device, so that there will be another attempt, should I?
> >>
> >> If you get enough disk space then it is possible to get full volume
> >> image. And, as a result, to get issue reproducible volume state by means
> >> of mount volume replica as loop device anytime. Could you do it?
> >>
> >> I think that we need to get whole output of mount operation anyway for
> >> the issue analysis.
> >>
> >> With the best regards,
> >> Vyacheslav Dubeyko.
> >>
> >>> --------------------------------------------------
> >>> Александров Сергей Васильевич
> >>>
> >>>
> >>> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >>> > Hi,
> >>> >
> >>> > On Mon, 2012-11-26 at 22:17 +0300, Сергей Александров wrote:
> >>> >> Hi,
> >>> >>
> >>> >> Unfortunately, the only output I saw is:
> >>> >> Nov 26 15:54:00 router kernel: [ 5808.068533] NILFS warning: mounting
> >>> >> unchecked fs
> >>> >> Nov 26 15:54:00 router kernel: [ 5808.068546] NILFS: nilfs_search_super_root
> >>> >> Nov 26 15:54:00 router kernel: [ 5808.068552] NILFS: pseg_start
> >>> >> 115603456, seg_seq 4347757
> >>> >> Nov 26 15:54:00 router kernel: [ 5808.068558] NILFS: cno 1252241, segnum 56447
> >>> >>
> >>> >
> >>> > Sorry, the patch is working. The output about searching super root is
> >>> > the result of debug output. But I expected more information in this
> >>> > output. Did you break mount operation before ending?
> >>> >
> >>> > As I understand, you have output not for whole mount operation. Am I
> >>> > correct? Could you share whole debug output from the mount beginning to
> >>> > the end?
> >>> >
> >>> > Or do you share all available output that you had during so long mount
> >>> > operation?
> >>> >
> >>> >> I suppose the last three lines are due to the patch.
> >>> >>
> >>> >> Dynamic debug was on:
> >>> >> router dynamic_debug # cat control | grep nilfs
> >>> >> fs/nilfs2/recovery.c:851 [nilfs2]nilfs_search_super_root =p "NILFS:
> >>> >> cno %lld, segnum %lld\012"
> >>> >> fs/nilfs2/recovery.c:850 [nilfs2]nilfs_search_super_root =p "NILFS:
> >>> >> pseg_start %lld, seg_seq %lld\012"
> >>> >> fs/nilfs2/recovery.c:849 [nilfs2]nilfs_search_super_root =p "NILFS:
> >>> >> nilfs_search_super_root\012"
> >>> >> fs/nilfs2/recovery.c:766 [nilfs2]nilfs_salvage_orphan_logs =p "NILFS:
> >>> >> nilfs_salvage_orphan_logs\012"
> >>> >> fs/nilfs2/recovery.c:614 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >>> >> seg_start %lld, seg_end %lld\012"
> >>> >> fs/nilfs2/recovery.c:613 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >>> >> segnum %lld\012"
> >>> >> fs/nilfs2/recovery.c:612 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >>> >> pseg_start %lld, seg_seq %lld\012"
> >>> >> fs/nilfs2/recovery.c:611 [nilfs2]nilfs_do_roll_forward =p "NILFS:
> >>> >> nilfs_do_roll_forward\012"
> >>> >> fs/nilfs2/recovery.c:442 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >>> >> "NILFS: ri_segnum %lld, ri_nextnum %lld\012"
> >>> >> fs/nilfs2/recovery.c:440 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >>> >> "NILFS: ns_segnum %lld, ns_nextnum %lld\012"
> >>> >> fs/nilfs2/recovery.c:438 [nilfs2]nilfs_prepare_segment_for_recovery =p
> >>> >> "NILFS: nilfs_prepare_segment_for_recovery\012"
> >>> >> fs/nilfs2/recovery.c:719 [nilfs2]nilfs_finish_roll_forward =p "NILFS:
> >>> >> nilfs_finish_roll_forward\012"
> >>> >
> >>> > This is simply list of known debug messages format.
> >>> >
> >>> > With the best regards,
> >>> > Vyacheslav Dubeyko.
> >>> >
> >>> >> --------------------------------------------------
> >>> >> Александров Сергей Васильевич
> >>> >>
> >>> >>
> >>> >> 2012/11/27 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >>> >> > Hi,
> >>> >> >
> >>> >> > On Nov 26, 2012, at 7:19 PM, Сергей Александров wrote:
> >>> >> >
> >>> >> >> The bug finally appeared again. Here are the trace log and kernel log.
> >>> >> >> I've hard rebooted machine so FS is in unchecked state for now and it
> >>> >> >> is possible to make some other tests.
> >>> >> >> I'll manage to do without this partition for a day may be two.
> >>> >> >>
> >>> >> >
> >>> >> > Thank you for logs. But what about my patch with debug output? The output of the patch can be very helpful for the issue analysis. It needs to understand what segments are processed during mount. Could you share debug output of my patch?
> >>> >> >
> >>> >> > With the best regards,
> >>> >> > Vyacheslav Dubeyko.
> >>> >> >
> >>> >> >>
> >>> >> >> --------------------------------------------------
> >>> >> >> Александров Сергей Васильевич
> >>> >> >>
> >>> >> >>
> >>> >> >> 2012/11/16 Сергей Александров <splavgm@gmail.com>:
> >>> >> >>> Ok, thanks. I'll try to apply it. I've also turned on function graph
> >>> >> >>> tracing, may be the graph for init_nilfs function will help.
> >>> >> >>> --------------------------------------------------
> >>> >> >>> Александров Сергей Васильевич
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> 2012/11/16 Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>:
> >>> >> >>>> On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> >>> >> >>>>> dmesg:
> >>> >> >>>>> [53994.254432] NILFS warning: mounting unchecked fs
> >>> >> >>>>> [56686.968229] NILFS: recovery complete.
> >>> >> >>>>> [56686.969316] segctord starting. Construction interval = 5 seconds,
> >>> >> >>>>> CP frequency < 30 seconds
> >>> >> >>>>>
> >>> >> >>>>> messages:
> >>> >> >>>>> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> >>> >> >>>>> unchecked fs
> >>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> >>> >> >>>>> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> >>> >> >>>>> Construction interval = 5 seconds, CP frequency < 30 seconds
> >>> >> >>>>>
> >>> >> >>>>> May be there is some kernel config option to get more debug output?
> >>> >> >>>>>
> >>> >> >>>>
> >>> >> >>>> I prepared small patch (please, find in attachment). This patch simply
> >>> >> >>>> adds several debug messages into recovery module. I suggest to apply the
> >>> >> >>>> patch on your NILFS2 driver and try to mount again in the issue
> >>> >> >>>> environment. I hope that these debug messages can give more information
> >>> >> >>>> about issue on your side.
> >>> >> >>>>
> >>> >> >>>> The patch uses pr_debug() call. Please, see
> >>> >> >>>> Documentation/dynamic-debug-howto.txt for more details.
> >>> >> >>>>
> >>> >> >>>> To be honest, I don't test the patch yet. So, if you will have any
> >>> >> >>>> troubles, please, e-mail to me.
> >>> >> >>>>
> >>> >> >>>> With the best regards,
> >>> >> >>>> Vyacheslav Dubeyko.
> >>> >> >>>>
> >>> >> >>>>> As for fsck, I have not found it in git public repo, so where can I
> >>> >> >>>>> get the latest version?
> >>> >> >>>>> --------------------------------------------------
> >>> >> >>>>
> >>> >> >> <kern.log><trace_fail>
> >>> >> >
> >>> >
> >>> >
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> >>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
       [not found]                                                                                         ` <CAFPMYnFKKwfRD6zpO_AYCMXP5U_deHfOF6CXGdd6jjMjVkMJ6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-11-27 11:29                                                                                           ` Vyacheslav Dubeyko
@ 2012-11-28  7:10                                                                                           ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-28  7:10 UTC (permalink / raw)
  To: Сергей
	Александров,
	Ryusuke Konishi
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Tue, 2012-11-27 at 12:58 +0300, Сергей Александров wrote:
> As I expected, no new info:
> [57291.417113] NILFS warning: mounting unchecked fs
> [57291.417125] NILFS: nilfs_search_super_root
> [57291.417132] NILFS: pseg_start 115603456, seg_seq 4347757
> [57291.417137] NILFS: cno 1252241, segnum 56447
> [62768.349242] NILFS: recovery complete.
> [62768.351804] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
> 
> So, I suppose, nilfs is searching for a super root for a very long
> time and find's out that it is consistent.

Currently, I have such picture of the issue's nature. The key details
are: (1) long system uptime; (2) sudden power-off; (3) special case of
interrelation of s_last_pseg fields in primary and secondary
superblocks. This combination of factors leads to the situation when the
issue takes place.

The NILFS2 volume has two superblocks (primary and secondary). Only one
superblock is active after filesystem mount and to keep actual
superblock state. Another superblock keeps previous superblock's state
and previous value of s_last_pseg field. When it is detected an invalid
state of log for the value of s_last_pseg in actual superblock then it
makes trying to search a previous valid supper root block. If we have
previous value of s_last_pseg in another superblock then it is possible
to find previous valid super root by means of passing through the chain
of log starting from the s_last_pseg of backup superblock. Usually, all
works fine for the case when the s_last_pseg field in actual superblock
is greater than s_last_pseg field in backup superblock. So, if we have
other situation (s_last_pseg field of actual superblock is lesser than
s_last_pseg field of backup superblock) then it needs to pass through
logs' chain from the end of volume to the volume begin. Such logic
results in the issue only when we have s_last_pseg field of actual
superblock is lesser than s_last_pseg field of backup superblock and a
slightly small difference in values. So, for the case if s_last_pseg of
actual superblock has 115603456 and s_last_pseg of backup superblock has
160163840 values then we need to passing through the whole volume
starting from the end.

I think that a proper fix of the issue is to make periodic flushing of
backup superblock in the case of long system uptime with the purpose to
have reasonable difference between of s_last_pseg field of actual
superblock and s_last_pseg field of backup superblock. I am going to fix
the issue in such way.

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: very large mount time after unxepected power down
  2012-11-27 11:29                                                                                           ` Vyacheslav Dubeyko
@ 2012-12-03 13:56                                                                                             ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 28+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-03 13:56 UTC (permalink / raw)
  To: Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

Hi,

On Tue, 2012-11-27 at 15:29 +0400, Vyacheslav Dubeyko wrote:
> On Tue, 2012-11-27 at 12:58 +0300, Сергей Александров wrote:
> > As I expected, no new info:
> > [57291.417113] NILFS warning: mounting unchecked fs
> > [57291.417125] NILFS: nilfs_search_super_root
> > [57291.417132] NILFS: pseg_start 115603456, seg_seq 4347757
> > [57291.417137] NILFS: cno 1252241, segnum 56447
> > [62768.349242] NILFS: recovery complete.
> > [62768.351804] segctord starting. Construction interval = 5 seconds,
> > CP frequency < 30 seconds
> > 
> > So, I suppose, nilfs is searching for a super root for a very long
> > time and find's out that it is consistent.
> 
> 
> Ok. I think that I understand situation currently. I try to simulate the
> issue and investigate it more deeply. If my presupposition will be wrong
> then I will ask about more details.
> 

Currently, I understand that my presuppositions were wrong or I simulate
situation in not proper environment. Sorry.

We can view from outputs that it really takes place situation of super
root searching by means of validation of logs (nilfs_validate_log). But
it is not so clear a reason of it. My presupposition about values in
superblocks can be a one of possible reason of reported issue. But in
current NILFS implementation primary and secondary superblocks are
swapped and flushed frequently. So, it can be a reason only in the case
when RAID influences on flush process, I think. But it is only my
suspicion. The algorithm of superblocks swapping works fine on my side
and for sudden power-off also.

So, anyway, I can suggest currently only to investigate situation more
deeply. The reason can be in some special corruption of logs or in
peculiarities of algorithm of nilfs_search_super_root() method (I think
that it is a more probable reason). But we need to have more details
about environment of the issue.

I prepared additional two patches (please, find in attachment) that add
more debug messages (especially, in the nilfs_search_super_root()
method). I suggest to apply these patches temporary and get this debug
output for the case of issue reproducing. The most of this output will
occur during mount operation and tryings to recover filesystem.

With the best regards,
Vyacheslav Dubeyko.

> Thanks,
> Vyacheslav Dubeyko.
> 
> > --------------------------------------------------
> > Александров Сергей Васильевич



[-- Attachment #2: 0001-nilfs2-debug-messages-for-log-validation-sequence-in.patch --]
[-- Type: text/x-patch, Size: 2895 bytes --]

From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
Subject: [PATCH 1/2] nilfs2: debug messages for log validation sequence investigation

Add debug messages for log validation sequence investigation

Signed-off-by: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
---
 fs/nilfs2/recovery.c  |    6 ++++++
 fs/nilfs2/super.c     |    6 ++++++
 fs/nilfs2/the_nilfs.c |    6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index f1626f5..1e8b044 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -840,6 +840,12 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		__breadahead(nilfs->ns_bdev, b++, nilfs->ns_blocksize);
 
 	for (;;) {
+
+
+		pr_debug("NILFS: pseg_start %ld, seg_seq %lld\n", pseg_start, seg_seq);
+		pr_debug("NILFS: cno %lld, segnum %lld\n", cno, segnum);
+		pr_debug("NILFS: nextnum %lld\n", nextnum);
+
 		brelse(bh_sum);
 		ret = NILFS_SEG_FAIL_IO;
 		bh_sum = nilfs_read_log_header(nilfs, pseg_start, &sum);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 3c991dc..c88cb7a 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -254,6 +254,10 @@ void nilfs_set_log_cursor(struct nilfs_super_block *sbp,
 	nilfs_count_free_blocks(nilfs, &nfreeblocks);
 	sbp->s_free_blocks_count = cpu_to_le64(nfreeblocks);
 
+	pr_debug("NILFS: nilfs_set_log_cursor\n");
+	pr_debug("NILFS: nilfs->ns_last_cno: %lld\n", nilfs->ns_last_cno);
+	pr_debug("NILFS: nilfs->ns_last_pseg: %lld\n", nilfs->ns_last_pseg);
+
 	spin_lock(&nilfs->ns_last_segment_lock);
 	sbp->s_last_seq = cpu_to_le64(nilfs->ns_last_seq);
 	sbp->s_last_pseg = cpu_to_le64(nilfs->ns_last_pseg);
@@ -294,6 +298,8 @@ int nilfs_commit_super(struct super_block *sb, int flag)
 	struct nilfs_super_block **sbp = nilfs->ns_sbp;
 	time_t t;
 
+	pr_debug("NILFS: nilfs_commit_super: sb %#x, flags %x\n", sb, flag);
+
 	/* nilfs->ns_sem must be locked by the caller. */
 	t = get_seconds();
 	nilfs->ns_sbwtime = t;
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 41e6a04..ceeaf9b 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -475,6 +475,12 @@ void nilfs_swap_super_block(struct the_nilfs *nilfs)
 	struct buffer_head *tsbh = nilfs->ns_sbh[0];
 	struct nilfs_super_block *tsbp = nilfs->ns_sbp[0];
 
+	pr_debug("NILFS: swap superblocks\n");
+	pr_debug("NILFS: nilfs->ns_sbp[0]->s_last_cno: %lld\n", le64_to_cpu(nilfs->ns_sbp[0]->s_last_cno));
+	pr_debug("NILFS: nilfs->ns_sbp[0]->s_last_pseg: %lld\n", le64_to_cpu(nilfs->ns_sbp[0]->s_last_pseg));
+	pr_debug("NILFS: nilfs->ns_sbp[1]->s_last_cno: %lld\n", le64_to_cpu(nilfs->ns_sbp[1]->s_last_cno));
+	pr_debug("NILFS: nilfs->ns_sbp[1]->s_last_pseg: %lld\n", le64_to_cpu(nilfs->ns_sbp[1]->s_last_pseg));
+
 	nilfs->ns_sbh[0] = nilfs->ns_sbh[1];
 	nilfs->ns_sbp[0] = nilfs->ns_sbp[1];
 	nilfs->ns_sbh[1] = tsbh;
-- 
1.7.9.5


[-- Attachment #3: 0002-nilfs2-debug-messages-for-log-validation-sequence-in.patch --]
[-- Type: text/x-patch, Size: 4852 bytes --]

From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
Subject: [PATCH 2/2] nilfs2: debug messages for log validation sequence investigation

Add debug messages for log validation sequence investigation

Signed-off-by: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
---
 fs/nilfs2/recovery.c |   52 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index 1e8b044..cb9dd62 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -841,19 +841,23 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 
 	for (;;) {
 
-
-		pr_debug("NILFS: pseg_start %ld, seg_seq %lld\n", pseg_start, seg_seq);
+		pr_debug("NILFS: pseg_start %ld, seg_seq %lld\n",
+							pseg_start, seg_seq);
 		pr_debug("NILFS: cno %lld, segnum %lld\n", cno, segnum);
 		pr_debug("NILFS: nextnum %lld\n", nextnum);
 
 		brelse(bh_sum);
 		ret = NILFS_SEG_FAIL_IO;
 		bh_sum = nilfs_read_log_header(nilfs, pseg_start, &sum);
-		if (!bh_sum)
+		if (!bh_sum) {
+			pr_debug("NILFS: nilfs_read_log_header error\n");
 			goto failed;
+		}
 
 		ret = nilfs_validate_log(nilfs, seg_seq, bh_sum, sum);
 		if (ret) {
+			pr_debug("NILFS: nilfs_validate_log: err %d\n",
+					ret);
 			if (ret == NILFS_SEG_FAIL_IO)
 				goto failed;
 			goto strayed;
@@ -862,6 +866,9 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		nblocks = le32_to_cpu(sum->ss_nblocks);
 		pseg_end = pseg_start + nblocks - 1;
 		if (unlikely(pseg_end > seg_end)) {
+			pr_debug("NILFS: failed consistency\n");
+			pr_debug("NILFS: pseg_start %lld, nblocks %d\n",
+						pseg_start, nblocks);
 			ret = NILFS_SEG_FAIL_CONSISTENCY;
 			goto strayed;
 		}
@@ -877,6 +884,7 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 
 		flags = le16_to_cpu(sum->ss_flags);
 		if (!(flags & NILFS_SS_SR) && !scan_newer) {
+			pr_debug("NILFS: !NILFS_SS_SR && !scan_newer\n");
 			/* This will never happen because a superblock
 			   (last_segment) always points to a pseg
 			   having a super root. */
@@ -897,10 +905,16 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 			}
 			if (flags & NILFS_SS_LOGEND)
 				ri->ri_lsegs_end = pseg_start;
+
+			pr_debug("NILFS: !NILFS_SS_SR\n");
+
 			goto try_next_pseg;
 		}
 
 		/* A valid super root was found. */
+		pr_debug("NILFS: A valid super root was found\n");
+		pr_debug("NILFS: cno %lld, pseg_end %lld\n", cno, pseg_end);
+
 		ri->ri_cno = cno++;
 		ri->ri_super_root = pseg_end;
 		ri->ri_lsegs_start = ri->ri_lsegs_end = 0;
@@ -914,15 +928,28 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		nilfs->ns_ctime = le64_to_cpu(sum->ss_create);
 		nilfs->ns_nextnum = nextnum;
 
-		if (scan_newer)
+		if (scan_newer) {
+
+			pr_debug("NILFS: scan_newer -> need_recovery\n");
+
 			ri->ri_need_recovery = NILFS_RECOVERY_SR_UPDATED;
-		else {
-			if (nilfs->ns_mount_state & NILFS_VALID_FS)
+		} else {
+			if (nilfs->ns_mount_state & NILFS_VALID_FS) {
+
+				pr_debug("NILFS: NILFS_VALID_FS\n");
+				pr_debug("NILFS: super_root_found\n");
+
 				goto super_root_found;
+			} else {
+
+				pr_debug("NILFS: *not* NILFS_VALID_FS\n");
+
+			}
 			scan_newer = 1;
 		}
 
  try_next_pseg:
+		pr_debug("NILFS: try_next_pseg\n");
 		/* Standing on a course, or met an inconsistent state */
 		pseg_start += nblocks;
 		if (pseg_start < seg_end)
@@ -930,6 +957,7 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		goto feed_segment;
 
  strayed:
+		pr_debug("NILFS: strayed: scan_newer %d\n", scan_newer);
 		/* Off the trail */
 		if (!scan_newer)
 			/*
@@ -939,13 +967,16 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 			goto failed;
 
  feed_segment:
+		pr_debug("NILFS: feed_segment: empty_seg %d\n", empty_seg);
 		/* Looking to the next full segment */
 		if (empty_seg++)
 			goto super_root_found; /* found a valid super root */
 
 		ret = nilfs_segment_list_add(&segments, segnum);
-		if (unlikely(ret))
+		if (unlikely(ret)) {
+			pr_debug("NILFS: nilfs_segment_list_add err %d\n", ret);
 			goto failed;
+		}
 
 		seg_seq++;
 		segnum = nextnum;
@@ -954,15 +985,22 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 	}
 
  super_root_found:
+	pr_debug("NILFS: super_root_found\n");
+
 	/* Updating pointers relating to the latest checkpoint */
 	brelse(bh_sum);
 	list_splice_tail(&segments, &ri->ri_used_segments);
 	nilfs->ns_last_pseg = sr_pseg_start;
 	nilfs->ns_last_seq = nilfs->ns_seg_seq;
 	nilfs->ns_last_cno = ri->ri_cno;
+
+	pr_debug("NILFS: last_pseg %lld, last_seq %lld, last_cno %lld\n",
+				sr_pseg_start, nilfs->ns_seg_seq, ri->ri_cno);
+
 	return 0;
 
  failed:
+	pr_debug("NILFS: nilfs_search_super_root failed\n");
 	brelse(bh_sum);
 	nilfs_dispose_segment_list(&segments);
 	return (ret < 0) ? ret : nilfs_warn_segment_error(ret);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-12-03 13:56 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAFPMYnE3ybWO4o=E1UonAZJ7Uwn5y9n4840ksYGAu7qAYJ0zKw@mail.gmail.com>
     [not found] ` <CAFPMYnE3ybWO4o=E1UonAZJ7Uwn5y9n4840ksYGAu7qAYJ0zKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-30 13:20   ` very large mount time after unxepected power down Сергей Александров
     [not found]     ` <CAFPMYnEZ28qvwkE3kaB59h2rD_8noT+gQtp7Hs6uvmHcL6KzYA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-30 13:49       ` Vyacheslav Dubeyko
2012-10-30 14:30         ` Сергей Александров
     [not found]           ` <CAFPMYnHhtFxuVZOMu9MZ6Xb74mFPm1a-4axyFKkHiJjDEW_4BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-30 14:52             ` Vyacheslav Dubeyko
2012-10-30 15:02               ` Сергей Александров
     [not found]                 ` <CAFPMYnGn4aNf=5B9v93TtTc6x4hG1ULgt0P9i75uO=xGX0U2bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-30 17:35                   ` Vyacheslav Dubeyko
     [not found]                     ` <AFFE5823-0AD0-488C-B465-55CF45A10785-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2012-10-31  3:51                       ` Сергей Александров
     [not found]                         ` <CAFPMYnEtXMr1UOVYdNNRxxH83=O-_UOR_ZhCdqjh+JuUNrFiDA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-31  6:13                           ` Vyacheslav Dubeyko
2012-10-31 18:06                             ` Сергей Александров
     [not found]                               ` <CAFPMYnHB=x2y3C-bVSEcaT2nMYn12zc5Jnr56ph31zBbym4Kfw@mail.gmail.com>
     [not found]                                 ` <CAFPMYnHB=x2y3C-bVSEcaT2nMYn12zc5Jnr56ph31zBbym4Kfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-15  4:07                                   ` Сергей Александров
     [not found]                                     ` <CAFPMYnE2j0DjiqcSuJRiJX5hfDjHoyh-WUhG0cMav9K=tbsLDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-15  6:32                                       ` Vyacheslav Dubeyko
     [not found]                                         ` <CAFPMYnH4npNU8dJKAHwjatxAA=WoT10EWho5xyYjZJjz4uOYBA@mail.gmail.com>
     [not found]                                           ` <CAFPMYnG6zjT6-=x7XcVuuCp1__H0FhCBfNmyrfQi8dNpWC_m2w@mail.gmail.com>
     [not found]                                             ` <CAFPMYnG6zjT6-=x7XcVuuCp1__H0FhCBfNmyrfQi8dNpWC_m2w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-16  6:26                                               ` Vyacheslav Dubeyko
2012-11-16  6:40                                                 ` Сергей Александров
     [not found]                                                   ` <CAFPMYnFLSZW068cFZ4FqDKF5sS_zF3SoV=vPG2=m+kvaxq-BZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-16  6:53                                                     ` Vyacheslav Dubeyko
2012-11-16  7:11                                                       ` Сергей Александров
     [not found]                                                         ` <CAFPMYnEYnLv5e6a3ZcFRjw-8cNB80T5=mpuiX9jaWa+pEj8Q-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-16  7:37                                                           ` Vyacheslav Dubeyko
2012-11-16 13:52                                                           ` Vyacheslav Dubeyko
2012-11-16 13:56                                                             ` Сергей Александров
     [not found]                                                               ` <CAFPMYnHzghu8k36wQj5h4K7a2wS6EcURcQmCOUTb5B2CJB9ufQ@mail.gmail.com>
     [not found]                                                                 ` <CAFPMYnHzghu8k36wQj5h4K7a2wS6EcURcQmCOUTb5B2CJB9ufQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-26 20:10                                                                   ` Vyacheslav Dubeyko
     [not found]                                                                     ` <782558C7-081D-4466-8780-51886E209A62-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2012-11-26 19:17                                                                       ` Сергей Александров
     [not found]                                                                         ` <CAFPMYnGT1byuVA1hCnETWc2GZAbDsjeS95-F8f15QYPe_YHABA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-27  7:12                                                                           ` Vyacheslav Dubeyko
2012-11-27  7:47                                                                             ` Сергей Александров
     [not found]                                                                               ` <CAFPMYnFTvQVTC7sgmq=9sx4hX7fKkXyjNHSRXay3yTuRQsOq4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-27  7:58                                                                                 ` Vyacheslav Dubeyko
2012-11-27  8:03                                                                                   ` Сергей Александров
     [not found]                                                                                     ` <CAFPMYnEPJLQs9TVMy8PFXqV-XvsM8oTvrf_XTE0+g9cQ+5MJXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-27  9:58                                                                                       ` Сергей Александров
     [not found]                                                                                         ` <CAFPMYnFKKwfRD6zpO_AYCMXP5U_deHfOF6CXGdd6jjMjVkMJ6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-27 11:29                                                                                           ` Vyacheslav Dubeyko
2012-12-03 13:56                                                                                             ` Vyacheslav Dubeyko
2012-11-28  7:10                                                                                           ` Vyacheslav Dubeyko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.