All of lore.kernel.org
 help / color / mirror / Atom feed
* A lot of NILFS: bad btree node messages (readonly fs)
@ 2012-11-29 21:06 Piotr Szymaniak
  2012-11-30  6:09 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2012-11-29 21:06 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

Hello.

Doing some work today (system update, kernel update, rsync rootfs with
external drive) I noticed some unwanted behaviour:
maszn ~ # touch afile
touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
                        it's Read only filesystem

dmesg shows a lot (1812 in the log) of:
[ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
flags = 0x0, nchildren = 0
[ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
broken bmap (inode number=102230)

After mount -o remount,rw /
[ 8384.734024] segctord starting. Construction interval = 300 seconds,
CP frequency < 30 seconds
[ 8384.734028] NILFS warning: mounting fs with errors

Never happend before and it looks like it works fine now (using vanilla
kernel 3.6.4).

After a reboot (to vanilla kernel 3.6.8) dmesg shows:
[    6.046883] segctord starting. Construction interval = 300 seconds,
CP frequency < 30 seconds
[    6.046884] NILFS warning: mounting fs with errors

I don't like the "errors" part...


Piotr Szymaniak.
-- 
 - Nie doceniasz sie - zaoponowal. - Jestes samoukiem,  madrym facetem.
W dodatku, moim zdaniem, dosc niezwyklym.
 - Hej, nawet nie mam matury.
 - Wiem - odparl. - Jednak to nie kawalek papieru stanowi o czlowieku.
  -- Stephen King, "Rita Hayworth and Shawshank Redemption"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-11-29 21:06 A lot of NILFS: bad btree node messages (readonly fs) Piotr Szymaniak
@ 2012-11-30  6:09 ` Vyacheslav Dubeyko
  2012-11-30  7:32   ` Piotr Szymaniak
  2012-11-30  7:46   ` ARAI Shun-ichi
  0 siblings, 2 replies; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-30  6:09 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, 2012-11-29 at 22:06 +0100, Piotr Szymaniak wrote:
> Hello.
> 
> Doing some work today (system update, kernel update, rsync rootfs with
> external drive) I noticed some unwanted behaviour:
> maszn ~ # touch afile
> touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
>                         it's Read only filesystem
> 
> dmesg shows a lot (1812 in the log) of:
> [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
> flags = 0x0, nchildren = 0
> [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
> broken bmap (inode number=102230)
> 
> After mount -o remount,rw /
> [ 8384.734024] segctord starting. Construction interval = 300 seconds,
> CP frequency < 30 seconds
> [ 8384.734028] NILFS warning: mounting fs with errors
> 
> Never happend before and it looks like it works fine now (using vanilla
> kernel 3.6.4).
> 
> After a reboot (to vanilla kernel 3.6.8) dmesg shows:
> [    6.046883] segctord starting. Construction interval = 300 seconds,
> CP frequency < 30 seconds
> [    6.046884] NILFS warning: mounting fs with errors
> 
> I don't like the "errors" part...
> 

Can you reproduce this issue (I mean that the issue reproducible or
not)? Have you any side effects of the issue such as data loss or any
other bad things?

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-11-30  6:09 ` Vyacheslav Dubeyko
@ 2012-11-30  7:32   ` Piotr Szymaniak
  2012-12-25  6:05     ` Vyacheslav Dubeyko
  2012-11-30  7:46   ` ARAI Shun-ichi
  1 sibling, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2012-11-30  7:32 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

On Fri, Nov 30, 2012 at 10:09:04AM +0400, Vyacheslav Dubeyko wrote:
> Hi,
> 
> On Thu, 2012-11-29 at 22:06 +0100, Piotr Szymaniak wrote:
> > Hello.
> > 
> > Doing some work today (system update, kernel update, rsync rootfs with
> > external drive) I noticed some unwanted behaviour:
> > maszn ~ # touch afile
> > touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
> >                         it's Read only filesystem
> > 
> > dmesg shows a lot (1812 in the log) of:
> > [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
> > flags = 0x0, nchildren = 0
> > [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
> > broken bmap (inode number=102230)
> > 
> > After mount -o remount,rw /
> > [ 8384.734024] segctord starting. Construction interval = 300 seconds,
> > CP frequency < 30 seconds
> > [ 8384.734028] NILFS warning: mounting fs with errors
> > 
> > Never happend before and it looks like it works fine now (using vanilla
> > kernel 3.6.4).
> > 
> > After a reboot (to vanilla kernel 3.6.8) dmesg shows:
> > [    6.046883] segctord starting. Construction interval = 300 seconds,
> > CP frequency < 30 seconds
> > [    6.046884] NILFS warning: mounting fs with errors
> > 
> > I don't like the "errors" part...
> > 
> 
> Can you reproduce this issue (I mean that the issue reproducible or
> not)? Have you any side effects of the issue such as data loss or any
> other bad things?

It was the first time, but I sure will update my system and kernel in
few days, so if this was related to disk-intense work maybe it could be
reproduced.

Didn't noticed any data loss (at least my system works and kernel boots
the way it should). But I will check last updated packages to see if
there's something wrong.


Piotr Szymaniak.
-- 
Nagle zdalam sobie sprawe, ze wszyscy mezczyzni,  ktorzy wciskaja sie w
cialo  kobiety sila,  podstepem czy szantazem,  ktorzy rzna kobiete nie
kochajac  jej,  sa  gwalcicielami.   Przestepcami  godnymi  najwyzszych
wyrokow.
  -- Graham Masterton, "Famine"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-11-30  6:09 ` Vyacheslav Dubeyko
  2012-11-30  7:32   ` Piotr Szymaniak
@ 2012-11-30  7:46   ` ARAI Shun-ichi
  1 sibling, 0 replies; 18+ messages in thread
From: ARAI Shun-ichi @ 2012-11-30  7:46 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: hermes-akuOmOme3sQYOdUovKs6ag

In <1354255744.2090.3.camel@slavad-ubuntu>;
   Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote
   as Subject "Re: A lot of NILFS: bad btree node messages (readonly fs)":

>> dmesg shows a lot (1812 in the log) of:
>> [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
>> flags = 0x0, nchildren = 0
>> [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
>> broken bmap (inode number=102230)

> Can you reproduce this issue (I mean that the issue reproducible or
> not)? Have you any side effects of the issue such as data loss or any
> other bad things?

I met similar error.
Errors reported by my PC is like (/var/log/messages):

Aug 11 20:58:23 celio kernel: [365058.267458] scsi_verify_blk_ioctl: 6 callbacks suppressed
Aug 11 20:58:23 celio kernel: [365058.267460] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267489] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267504] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267519] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267534] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267548] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267563] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.267578] VBoxSVC: sending ioctl 5393 to a partition!
Aug 11 20:58:23 celio kernel: [365058.285133] NILFS: bad btree node (blocknr=17620479): level = 146, flags = 0x9a, nchildren = 49895
Aug 11 20:58:23 celio kernel: [365058.285137] NILFS error (device dm-9): nilfs_bmap_lookup_contig: broken bmap (inode number=571846)
Aug 11 20:58:23 celio kernel: [365058.285138] 
Aug 11 20:58:23 celio kernel: [365058.286426] Remounting filesystem read-only
Aug 11 20:58:23 celio kernel: [365058.286631] NILFS: bad btree node (blocknr=17620479): level = 146, flags = 0x9a, nchildren = 49895
Aug 11 20:58:23 celio kernel: [365058.286635] NILFS error (device dm-9): nilfs_bmap_lookup_contig: broken bmap (inode number=571846)
Aug 11 20:58:23 celio kernel: [365058.286636] 
Aug 11 20:58:23 celio kernel: [365058.286855] NILFS: bad btree node (blocknr=17620479): level = 146, flags = 0x9a, nchildren = 49895
Aug 11 20:58:23 celio kernel: [365058.286857] NILFS error (device dm-9): nilfs_bmap_lookup_contig: broken bmap (inode number=571846)
Aug 11 20:58:23 celio kernel: [365058.286858] 
(snip)

These errors ware reported at VirtualBox VM (Win7) startup (not
VirtualBox process startup). VirtualBox configuration directory
(~/.VirtualBox) was in NILFS.

Currently, my .VirtualBox directory is in other FS and symlinked from
home directory. After the workaround, my PC reports no error.

But I can say nothing about dependency with VirtualBox...

Kernel version was 3.4.5, VirtualBox was 4.1.18.

-- 
ARAI, Shun-ichi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-11-30  7:32   ` Piotr Szymaniak
@ 2012-12-25  6:05     ` Vyacheslav Dubeyko
  2012-12-25 20:54       ` Piotr Szymaniak
  2013-01-03 21:49       ` Piotr Szymaniak
  0 siblings, 2 replies; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-25  6:05 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Piotr,

On Fri, 2012-11-30 at 08:32 +0100, Piotr Szymaniak wrote:
> On Fri, Nov 30, 2012 at 10:09:04AM +0400, Vyacheslav Dubeyko wrote:
> > Hi,
> > 
> > On Thu, 2012-11-29 at 22:06 +0100, Piotr Szymaniak wrote:
> > > Hello.
> > > 
> > > Doing some work today (system update, kernel update, rsync rootfs with
> > > external drive) I noticed some unwanted behaviour:
> > > maszn ~ # touch afile
> > > touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
> > >                         it's Read only filesystem
> > > 
> > > dmesg shows a lot (1812 in the log) of:
> > > [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
> > > flags = 0x0, nchildren = 0
> > > [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
> > > broken bmap (inode number=102230)
> > > 
> > > After mount -o remount,rw /
> > > [ 8384.734024] segctord starting. Construction interval = 300 seconds,
> > > CP frequency < 30 seconds
> > > [ 8384.734028] NILFS warning: mounting fs with errors
> > > 
> > > Never happend before and it looks like it works fine now (using vanilla
> > > kernel 3.6.4).
> > > 
> > > After a reboot (to vanilla kernel 3.6.8) dmesg shows:
> > > [    6.046883] segctord starting. Construction interval = 300 seconds,
> > > CP frequency < 30 seconds
> > > [    6.046884] NILFS warning: mounting fs with errors
> > > 
> > > I don't like the "errors" part...
> > > 

Did you have any snapshots on this volume or not?

With the best regards,
Vyacheslav Dubeyko.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-12-25  6:05     ` Vyacheslav Dubeyko
@ 2012-12-25 20:54       ` Piotr Szymaniak
  2013-01-03 21:49       ` Piotr Szymaniak
  1 sibling, 0 replies; 18+ messages in thread
From: Piotr Szymaniak @ 2012-12-25 20:54 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 370 bytes --]

On Tue, Dec 25, 2012 at 10:05:52AM +0400, Vyacheslav Dubeyko wrote:
> Hi Piotr,
> 
> *snip*
> 
> Did you have any snapshots on this volume or not?

I wont be able to check this in few (or more) days, but afair there were
no snapshosts, only checkpoints.

Piotr Szymaniak.
-- 
Marriage is like a coffin and each kid is like another nail.
  -- Homer Simpson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2012-12-25  6:05     ` Vyacheslav Dubeyko
  2012-12-25 20:54       ` Piotr Szymaniak
@ 2013-01-03 21:49       ` Piotr Szymaniak
  2013-01-04 12:46         ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2013-01-03 21:49 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 4395 bytes --]

On Tue, Dec 25, 2012 at 10:05:52AM +0400, Vyacheslav Dubeyko wrote:
> Hi Piotr,
> 
> On Fri, 2012-11-30 at 08:32 +0100, Piotr Szymaniak wrote:
> > On Fri, Nov 30, 2012 at 10:09:04AM +0400, Vyacheslav Dubeyko wrote:
> > > Hi,
> > > 
> > > On Thu, 2012-11-29 at 22:06 +0100, Piotr Szymaniak wrote:
> > > > Hello.
> > > > 
> > > > Doing some work today (system update, kernel update, rsync rootfs with
> > > > external drive) I noticed some unwanted behaviour:
> > > > maszn ~ # touch afile
> > > > touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
> > > >                         it's Read only filesystem
> > > > 
> > > > dmesg shows a lot (1812 in the log) of:
> > > > [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
> > > > flags = 0x0, nchildren = 0
> > > > [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
> > > > broken bmap (inode number=102230)
> > > > 
> > > > After mount -o remount,rw /
> > > > [ 8384.734024] segctord starting. Construction interval = 300 seconds,
> > > > CP frequency < 30 seconds
> > > > [ 8384.734028] NILFS warning: mounting fs with errors
> > > > 
> > > > Never happend before and it looks like it works fine now (using vanilla
> > > > kernel 3.6.4).
> > > > 
> > > > After a reboot (to vanilla kernel 3.6.8) dmesg shows:
> > > > [    6.046883] segctord starting. Construction interval = 300 seconds,
> > > > CP frequency < 30 seconds
> > > > [    6.046884] NILFS warning: mounting fs with errors
> > > > 
> > > > I don't like the "errors" part...
> > > > 
> 
> Did you have any snapshots on this volume or not?

No, only checkpoints (just checked).

Also, got this issue again. A *lot* of:
Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
Jan 03 22:36:38 [kernel] [  953.289976] 
Jan 03 22:36:38 [kernel] [  953.290248] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:36:38 [kernel] [  953.290251] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
Jan 03 22:36:38 [kernel] [  953.290251] 
Jan 03 22:36:38 [kernel] [  953.290523] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:36:38 [kernel] [  953.290526] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)

Then I remounted rw and tried again (i was rsyncing my whole rootfs to
another drive):
Jan 03 22:38:15 [kernel] [ 1050.290134] segctord starting. Construction
interval = 300 seconds, CP frequency < 30 seconds
Jan 03 22:38:15 [kernel] [ 1050.290135] NILFS warning: mounting fs with
errors
Jan 03 22:38:15 [nilfs_cleanerd] start
Jan 03 22:38:15 [nilfs_cleanerd] pause (clean check)
Jan 03 22:38:40 [kernel] [ 1075.157346] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:38:40 [kernel] [ 1075.157351] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
Jan 03 22:38:40 [kernel] [ 1075.157351] 
Jan 03 22:38:40 [kernel] [ 1075.157352] Remounting filesystem read-only
Jan 03 22:38:40 [kernel] [ 1075.159162] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:38:40 [kernel] [ 1075.159168] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
Jan 03 22:38:40 [kernel] [ 1075.159168] 
Jan 03 22:38:40 [kernel] [ 1075.161003] NILFS: bad btree node
(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:38:40 [kernel] [ 1075.161019] NILFS error (device sda2):
nilfs_bmap_lookup_contig: broken bmap (inode number=102230)

But, thanks to rsync I got a (corupted?) file that causes the problem.
It's some /var/tmp/kdecache-$USER/foo.kcache. copying this file to
another place ends with readonly.


Piotr Szymaniak.
-- 
  - Chyba nie jest pan jednym z tych roniacych lzy liberalow?
  - Odmawiam odpowiedzi, poniewaz moglaby zostac wykorzystana przeciwko
mnie - odparlem.  Taksiarz  wydal  prychniecie oznaczajace dlaczego-ja-
zawsze-trafiam-na-takich-cwaniakow... ale zamknal sie.
  -- Stephen King, "The Breathing Method"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-03 21:49       ` Piotr Szymaniak
@ 2013-01-04 12:46         ` Vyacheslav Dubeyko
       [not found]           ` <28ABE57F-66DD-491E-9FDB-A6CE3CF773FB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-04 12:46 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Jan 4, 2013, at 12:49 AM, Piotr Szymaniak wrote:

> On Tue, Dec 25, 2012 at 10:05:52AM +0400, Vyacheslav Dubeyko wrote:
>> Hi Piotr,
>> 
>> On Fri, 2012-11-30 at 08:32 +0100, Piotr Szymaniak wrote:
>>> On Fri, Nov 30, 2012 at 10:09:04AM +0400, Vyacheslav Dubeyko wrote:
>>>> Hi,
>>>> 
>>>> On Thu, 2012-11-29 at 22:06 +0100, Piotr Szymaniak wrote:
>>>>> Hello.
>>>>> 
>>>>> Doing some work today (system update, kernel update, rsync rootfs with
>>>>> external drive) I noticed some unwanted behaviour:
>>>>> maszn ~ # touch afile
>>>>> touch: cannot touch ‘afile’: System plików wyłącznie do odczytu
>>>>>                        it's Read only filesystem
>>>>> 
>>>>> dmesg shows a lot (1812 in the log) of:
>>>>> [ 8288.319012] NILFS: bad btree node (blocknr=26229286): level = 178,
>>>>> flags = 0x0, nchildren = 0
>>>>> [ 8288.319017] NILFS error (device sda2): nilfs_bmap_lookup_contig:
>>>>> broken bmap (inode number=102230)
>>>>> 
>>>>> After mount -o remount,rw /
>>>>> [ 8384.734024] segctord starting. Construction interval = 300 seconds,
>>>>> CP frequency < 30 seconds
>>>>> [ 8384.734028] NILFS warning: mounting fs with errors
>>>>> 
>>>>> Never happend before and it looks like it works fine now (using vanilla
>>>>> kernel 3.6.4).
>>>>> 
>>>>> After a reboot (to vanilla kernel 3.6.8) dmesg shows:
>>>>> [    6.046883] segctord starting. Construction interval = 300 seconds,
>>>>> CP frequency < 30 seconds
>>>>> [    6.046884] NILFS warning: mounting fs with errors
>>>>> 
>>>>> I don't like the "errors" part...
>>>>> 
>> 
>> Did you have any snapshots on this volume or not?
> 
> No, only checkpoints (just checked).
> 

Ok. It means that snapshots don't play any role for the issue.

> Also, got this issue again. A *lot* of:
> Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> Jan 03 22:36:38 [kernel] [  953.289976] 
> Jan 03 22:36:38 [kernel] [  953.290248] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.290251] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> Jan 03 22:36:38 [kernel] [  953.290251] 
> Jan 03 22:36:38 [kernel] [  953.290523] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.290526] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> 
> Then I remounted rw and tried again (i was rsyncing my whole rootfs to
> another drive):

So, the issue was occurred not because of using rsync. Am I correct? I simply remember that rsync was used in another report about this issue also.

> Jan 03 22:38:15 [kernel] [ 1050.290134] segctord starting. Construction
> interval = 300 seconds, CP frequency < 30 seconds
> Jan 03 22:38:15 [kernel] [ 1050.290135] NILFS warning: mounting fs with
> errors
> Jan 03 22:38:15 [nilfs_cleanerd] start
> Jan 03 22:38:15 [nilfs_cleanerd] pause (clean check)
> Jan 03 22:38:40 [kernel] [ 1075.157346] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:38:40 [kernel] [ 1075.157351] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> Jan 03 22:38:40 [kernel] [ 1075.157351] 
> Jan 03 22:38:40 [kernel] [ 1075.157352] Remounting filesystem read-only
> Jan 03 22:38:40 [kernel] [ 1075.159162] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:38:40 [kernel] [ 1075.159168] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> Jan 03 22:38:40 [kernel] [ 1075.159168] 
> Jan 03 22:38:40 [kernel] [ 1075.161003] NILFS: bad btree node
> (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:38:40 [kernel] [ 1075.161019] NILFS error (device sda2):
> nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> 
> But, thanks to rsync I got a (corupted?) file that causes the problem.
> It's some /var/tmp/kdecache-$USER/foo.kcache. copying this file to
> another place ends with readonly.
> 

So, maybe the nature of /var/tmp/kdecache-$USER/foo.kcache is a reason of the issue. It is a very interesting detail. Could you share more details about this file? I mean size, creation time, modification time, owner, rights and so on. Moreover, could you share more details about your hardware environment? I mean CPU and RAM details.

I think that it can be very interesting to know about how this file is distributed on the volume. Could you get by means of dumpseg information about  several last segments? I don't know how many it needs for understanding but maybe about 10 can be enough for the beginning.

Thanks,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.
> -- 
>  - Chyba nie jest pan jednym z tych roniacych lzy liberalow?
>  - Odmawiam odpowiedzi, poniewaz moglaby zostac wykorzystana przeciwko
> mnie - odparlem.  Taksiarz  wydal  prychniecie oznaczajace dlaczego-ja-
> zawsze-trafiam-na-takich-cwaniakow... ale zamknal sie.
>  -- Stephen King, "The Breathing Method"

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
       [not found]           ` <28ABE57F-66DD-491E-9FDB-A6CE3CF773FB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-01-04 19:16             ` Piotr Szymaniak
  2013-01-05 11:40               ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2013-01-04 19:16 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2676 bytes --]

On Fri, Jan 04, 2013 at 03:46:13PM +0300, Vyacheslav Dubeyko wrote:
> 
> On Jan 4, 2013, at 12:49 AM, Piotr Szymaniak wrote:
> > No, only checkpoints (just checked).
> 
> Ok. It means that snapshots don't play any role for the issue.

Yes, it seems so.


> So, the issue was occurred not because of using rsync. Am I correct? I
> simply remember that rsync was used in another report about this issue
> also.

Yes, rsync just gave me information where it had an i/o error.


> > But, thanks to rsync I got a (corupted?) file that causes the
> > problem.  It's some /var/tmp/kdecache-$USER/foo.kcache. copying this
> > file to another place ends with readonly.
> > 
> 
> So, maybe the nature of /var/tmp/kdecache-$USER/foo.kcache is a reason
> of the issue. It is a very interesting detail. Could you share more
> details about this file? I mean size, creation time, modification
> time, owner, rights and so on. Moreover, could you share more details
> about your hardware environment? I mean CPU and RAM details.

maszn ~ # file /var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache
gives me an i/o error and remounts ro.

maszn ~ # stat /var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache
  File: ‘/var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache’
  Size: 84213856        Blocks: 165144     IO Block: 4096   regular file
Device: 802h/2050d      Inode: 102230      Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/   appe)   Gid: (  414/   appe)
Access: 2012-11-14 16:15:01.377207479 +0100
Modify: 2012-11-14 16:15:01.377207479 +0100
Change: 2012-11-14 16:15:01.377207479 +0100

Intel Pentium Dual-Core E5400, 4GB RAM, nilfs2 is located on part of
Kingston SSD. It runs x86 Gentoo Linux with vanilla kernel (from
kernel.org) and I try to keep it up to date (3.6.8 right now, but I was
away for some time) and recent nilfs-utils from the portage tree (this
could also be considered as "up to date" or "up to upstream releases").


> I think that it can be very interesting to know about how this file is
> distributed on the volume. Could you get by means of dumpseg
> information about several last segments? I don't know how many it
> needs for understanding but maybe about 10 can be enough for the
> beginning.

The file is pretty old so last 10 segments won't solve the issue, right?

Piotr Szymaniak.
-- 
 Natezyl sluch. Nic poza odglosami owadow z dziedzinca i naszeptywaniem
spryskiwaczy trawnika. Nic poza miarowym kapaniem z kranu w lazience.
 To bylo niczym zycie, skapujace kropla po kropli. K a p, k a p, k a p,
wszystko w kanal.
  -- Graham Masterton, "The Burning"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-04 19:16             ` Piotr Szymaniak
@ 2013-01-05 11:40               ` Vyacheslav Dubeyko
       [not found]                 ` <C2F4A5DE-538B-4468-9A6A-BA619EDDA462-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-05 11:40 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Jan 4, 2013, at 10:16 PM, Piotr Szymaniak wrote:

[snip]
>> 
>> So, maybe the nature of /var/tmp/kdecache-$USER/foo.kcache is a reason
>> of the issue. It is a very interesting detail. Could you share more
>> details about this file? I mean size, creation time, modification
>> time, owner, rights and so on. Moreover, could you share more details
>> about your hardware environment? I mean CPU and RAM details.
> 
> maszn ~ # file /var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache
> gives me an i/o error and remounts ro.
> 
> maszn ~ # stat /var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache
>  File: ‘/var/tmp/kdecache-appe/plasma_theme_Transparent-sima84.kcache’
>  Size: 84213856        Blocks: 165144     IO Block: 4096   regular file
> Device: 802h/2050d      Inode: 102230      Links: 1
> Access: (0664/-rw-rw-r--)  Uid: ( 1001/   appe)   Gid: (  414/   appe)
> Access: 2012-11-14 16:15:01.377207479 +0100
> Modify: 2012-11-14 16:15:01.377207479 +0100
> Change: 2012-11-14 16:15:01.377207479 +0100
> 
> Intel Pentium Dual-Core E5400, 4GB RAM, nilfs2 is located on part of
> Kingston SSD. It runs x86 Gentoo Linux with vanilla kernel (from
> kernel.org) and I try to keep it up to date (3.6.8 right now, but I was
> away for some time) and recent nilfs-utils from the portage tree (this
> could also be considered as "up to date" or "up to upstream releases").
> 

Thank you for details.

> 
>> I think that it can be very interesting to know about how this file is
>> distributed on the volume. Could you get by means of dumpseg
>> information about several last segments? I don't know how many it
>> needs for understanding but maybe about 10 can be enough for the
>> beginning.
> 
> The file is pretty old so last 10 segments won't solve the issue, right?
> 

You know that only patch can solve the issue. :-)

I simply try to collect enough details about the issue for deep understanding of it. There are two ways of issue investigation: (1) reproducing on my side; (2) preparing special debug output that can give more details about the issue on your side. So, I feel necessity to prepare a patch with additional debug output that can be enabled in NILFS2 in the case of special needs.

Currently, I have such understanding of the situation. As I can see from the error output and preliminary code analysis, it occurs situation when allocation code discovers that memory page is uptodate before reading. It can means that we have synchronization issue. But it needs to investigate situation more deeply.

 Now I am preparing patch for another issue with very long mount time. So, when I end this activity then I begin more deep investigation of the issue with readonly fs.

Thanks,
Vyacheslav Dubeyko.

> Piotr Szymaniak.
> -- 
> Natezyl sluch. Nic poza odglosami owadow z dziedzinca i naszeptywaniem
> spryskiwaczy trawnika. Nic poza miarowym kapaniem z kranu w lazience.
> To bylo niczym zycie, skapujace kropla po kropli. K a p, k a p, k a p,
> wszystko w kanal.
>  -- Graham Masterton, "The Burning"

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
       [not found]                 ` <C2F4A5DE-538B-4468-9A6A-BA619EDDA462-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-01-05 16:47                   ` Piotr Szymaniak
  2013-01-08 12:52                     ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2013-01-05 16:47 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1209 bytes --]

On Sat, Jan 05, 2013 at 02:40:19PM +0300, Vyacheslav Dubeyko wrote:
> I simply try to collect enough details about the issue for deep
> understanding of it. There are two ways of issue investigation: (1)
> reproducing on my side; (2) preparing special debug output that can
> give more details about the issue on your side. So, I feel necessity
> to prepare a patch with additional debug output that can be enabled in
> NILFS2 in the case of special needs.
> 
> Currently, I have such understanding of the situation. As I can see
> from the error output and preliminary code analysis, it occurs
> situation when allocation code discovers that memory page is uptodate
> before reading. It can means that we have synchronization issue. But
> it needs to investigate situation more deeply.
> 
> Now I am preparing patch for another issue with very long mount time.
> So, when I end this activity then I begin more deep investigation of
> the issue with readonly fs.

Ok, I'll wait. Good thing it's just some KDE cache file. (:


Piotr Szymaniak.
-- 
  Czujesz  ostatni oddech opuszczajacy ich ciala.  Patrzysz w ich oczy.
Osoba w takiej sytuacji jest Bogiem!
  -- Theodore Robert Bundy

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-05 16:47                   ` Piotr Szymaniak
@ 2013-01-08 12:52                     ` Vyacheslav Dubeyko
       [not found]                       ` <504C0292-D0C6-426C-BDFD-1CD50FB32617-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-08 12:52 UTC (permalink / raw)
  To: Piotr Szymaniak, Elmer Zhang,
	Сергей
	Александров
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi guys,

I am trying to reproduce the issue last three days but without success. I tried different workloads and different environments. As I know all of you have the issue in reproduced state. So I have additional questions.

1. All of you have such messages:

Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): nilfs_bmap_lookup_contig: broken bmap (inode number=102230)

As I understand, you still have message for concrete block number (for example, blocknr=26229286) during remount. But you haven't the message for this block number (for example, blocknr=26229286) after umount and mount again. But you can get error messages for another block number after it. Am I correct?

2. As I understand, you have corrupted file on your volume after such error message (for example, for inode number=102230).

在 2013-1-6,12:46,Elmer Zhang <freeboy6716@gmail.com> 写道:

> I have found the corrupted file using inode number:
> [root@yf237 data0]# cat mysql6003/app_wyxgrab/weibo_rank.MYI > /dev/null 
> cat: mysql6003/app_wyxgrab/weibo_rank.MYI: Input/output error

Could you share strace output for "cat" command for such corrupted file? Maybe syslog can contain some interesting details during execution of "cat" command. Could you check syslog for interesting error messages during such try?

3. Could you share configuration file of your kernel (.config)? I suspect that you can have some special configuration of your environment that I haven't.

4. Could you share content of nilfs_cleanerd.conf file for NILFS2 partition that has such issue? Sorry, if I ask about it again.

5. Did you have any sudden power-off before you encounter the issue firstly?

6. I understand that it can be not so easy. But, anyway, could you share details of your system log for the case of first case of the issue occurrence? I need only details about how live system before the issue.

7. I analyzed the raw dump of segment that I received from Elmer Zhang. Currently, I have such feeling that it takes place situation when driver tries to take block that was filled by GC yet. But it needs to investigate the issue more deeply. And, currently, I don't understand how the issue can be achieved. Successful reproducing of the issue is a half of the success.

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
       [not found]                       ` <504C0292-D0C6-426C-BDFD-1CD50FB32617-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-01-09  6:20                         ` Elmer Zhang
       [not found]                           ` <30D452FB-F1A5-425C-97ED-1A12CA4DCFD9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-01-09 21:09                         ` Piotr Szymaniak
  1 sibling, 1 reply; 18+ messages in thread
From: Elmer Zhang @ 2013-01-09  6:20 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: Piotr Szymaniak,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA


在 2013-1-8,20:52,Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 写道:

> Hi guys,
> 
> I am trying to reproduce the issue last three days but without success. I tried different workloads and different environments. As I know all of you have the issue in reproduced state. So I have additional questions.
> 
> 1. All of you have such messages:
> 
> Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> 
> As I understand, you still have message for concrete block number (for example, blocknr=26229286) during remount. But you haven't the message for this block number (for example, blocknr=26229286) after umount and mount again. But you can get error messages for another block number after it. Am I correct?

I got error messages for the same block after remount. That is to say the bad block is always bad.

> 2. As I understand, you have corrupted file on your volume after such error message (for example, for inode number=102230).
> 

Yes. Once I open the corrupted file and read, kernel will report bad btree node and remount filesystem read-only.

> 在 2013-1-6,12:46,Elmer Zhang <freeboy6716-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 写道:
> 
>> I have found the corrupted file using inode number:
>> [root@yf237 data0]# cat mysql6003/app_wyxgrab/weibo_rank.MYI > /dev/null 
>> cat: mysql6003/app_wyxgrab/weibo_rank.MYI: Input/output error
> 
> Could you share strace output for "cat" command for such corrupted file? Maybe syslog can contain some interesting details during execution of "cat" command. Could you check syslog for interesting error messages during such try?
> 

output of strace for cat: http://d.pr/n/Qboc
error messages during cat: http://d.pr/n/snOt

> 3. Could you share configuration file of your kernel (.config)? I suspect that you can have some special configuration of your environment that I haven't.
> 

content of /boot/config-2.6.32-220.13.1.el6.x86_64 : http://d.pr/n/qTQk

> 4. Could you share content of nilfs_cleanerd.conf file for NILFS2 partition that has such issue? Sorry, if I ask about it again.
> 

content of nilfs_cleanerd.conf: http://d.pr/n/YIwj

> 5. Did you have any sudden power-off before you encounter the issue firstly?
> 

No.

> 6. I understand that it can be not so easy. But, anyway, could you share details of your system log for the case of first case of the issue occurrence? I need only details about how live system before the issue.
> 

I found some backtrace in syslog: http://d.pr/n/ddZd

> 7. I analyzed the raw dump of segment that I received from Elmer Zhang. Currently, I have such feeling that it takes place situation when driver tries to take block that was filled by GC yet. But it needs to investigate the issue more deeply. And, currently, I don't understand how the issue can be achieved. Successful reproducing of the issue is a half of the success.
> 
> Thanks,
> Vyacheslav Dubeyko.
> 

---
Elmer Zhang

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
       [not found]                           ` <30D452FB-F1A5-425C-97ED-1A12CA4DCFD9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-01-09 12:10                             ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-09 12:10 UTC (permalink / raw)
  To: Elmer Zhang
  Cc: Piotr Szymaniak,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-01-09 at 14:20 +0800, Elmer Zhang wrote:
[snip]
> 
> > 6. I understand that it can be not so easy. But, anyway, could you share details of your system log for the case of first case of the issue occurrence? I need only details about how live system before the issue.
> > 
> 
> I found some backtrace in syslog: http://d.pr/n/ddZd
> 

Thank you for additional details. It is very helpful for me.

So, I have a picture of the situation on your system after analyzing
your system log.

First of all, as I can see, you have many regular error messages about
page allocation failure:

Dec 24 12:24:11 yf237 kernel: swapper: page allocation failure. order:1, mode:0x20

As I understand, you have incorrect memory subsystem configuration or
issue with NIC driver. Do you have e1000 NIC? As I can see after surfing
in the internet the reason of it can be a NIC's driver issue or
configuration of vm.min_free_kbytes. It recommends to enhance the value
of vm.min_free_kbytes = 65536. Could you share details of your virtual
memory configuration? You can find different details of your
configuration in /proc/sys/vm (especially, I need to know
vm.min_free_kbytes). I suspect that I need to decrease this value on my
system for issue reproducing.

Secondly, as I can see you have regular troubles with page allocation
failures from:

Dec 24 12:24:11 yf237 kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940

till:

Dec 27 04:06:11 yf237 kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940

Then your system was suddenly power-off. I think so because I can't see
message about nilfs_cleanerd shutdown before new system start that was
in:

Dec 27 11:16:29 yf237 kernel: Linux version 2.6.32-220.13.1.el6.x86_64 (mockbuild-ktcNWDX7ZkcnYXo6ltff2qQZW8Oz2YPi@public.gmane.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Tue Apr 17 23:56:34 BST 2012

And then we have:

Dec 27 11:32:34 yf237 kernel: NILFS warning: mounting unchecked fs
Dec 27 11:32:34 yf237 kernel: NILFS: recovery complete.
Dec 27 11:32:34 yf237 kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 27 11:32:34 yf237 kernel: NILFS warning: mounting fs with errors

This means that we mounted filesystem that was not unmounted cleanly.

Finally, I can see such messages:

Dec 27 14:03:50 yf237 kernel: INFO: task nilfs_cleanerd:5046 blocked for more than 120 seconds.
Dec 27 14:03:50 yf237 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[snip]
Dec 27 14:03:50 yf237 kernel: INFO: task mysqld:4379 blocked for more than 120 seconds.
Dec 27 14:03:50 yf237 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[snip]
Dec 27 14:03:50 yf237 kernel: INFO: task mysqld:4380 blocked for more than 120 seconds.
Dec 27 14:03:50 yf237 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[etc...]

These messages means that you have encounter issue with flush kernel
thread. This issue was reported firstly before the report about
discussed here issue. I achieved such messages after deleting some big
file and executing sync command when I can see flush kernel thread that
eats 100% CPU time.

So, currently, I think that the reason of the issue can be issue with
flush kernel thread with adding some incorrect virtual memory subsystem
configuration (that lead into page allocation failure). These two
reasons can summarize in corruption metadata in memory pages that
flushed on the volume as a result or not flushed properly.

As a resume, I need to fix issue with flush kernel thread (I can
reproduce in easily). And, anyway, I need to try to reproduce this issue
also.

With the best regards,
Vyacheslav Dubeyko.

> > 7. I analyzed the raw dump of segment that I received from Elmer Zhang. Currently, I have such feeling that it takes place situation when driver tries to take block that was filled by GC yet. But it needs to investigate the issue more deeply. And, currently, I don't understand how the issue can be achieved. Successful reproducing of the issue is a half of the success.
> > 
> > Thanks,
> > Vyacheslav Dubeyko.
> > 
> 
> ---
> Elmer Zhang
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
       [not found]                       ` <504C0292-D0C6-426C-BDFD-1CD50FB32617-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  2013-01-09  6:20                         ` Elmer Zhang
@ 2013-01-09 21:09                         ` Piotr Szymaniak
  2013-01-10  7:12                           ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2013-01-09 21:09 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: Elmer Zhang,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 2418 bytes --]

On Tue, Jan 08, 2013 at 03:52:30PM +0300, Vyacheslav Dubeyko wrote:
> Hi guys,
> 
> I am trying to reproduce the issue last three days but without
> success. I tried different workloads and different environments. As I
> know all of you have the issue in reproduced state. So I have
> additional questions.
> 
> 1. All of you have such messages:
> 
> Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): nilfs_bmap_lookup_contig: broken bmap (inode number=102230)

I noticed already, that before I had:
Jan 04 19:40:29 [kernel] [  801.937864] NILFS: bad btree node (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40

And now it is:
[34101.963275] NILFS: bad btree node (blocknr=26229286): level = 84, flags = 0x89, nchildren = 13348

So different level, flags and nchildren, but blocknr is the same.


> As I understand, you still have message for concrete block number (for
> example, blocknr=26229286) during remount. But you haven't the message
> for this block number (for example, blocknr=26229286) after umount and
> mount again. But you can get error messages for another block number
> after it. Am I correct?

Looking at my logs I only have an error for the mentioned blocknr.


> Could you share strace output for "cat" command for such corrupted
> file? Maybe syslog can contain some interesting details during
> execution of "cat" command. Could you check syslog for interesting
> error messages during such try?

Attached strace output.


> 3. Could you share configuration file of your kernel (.config)? I
> suspect that you can have some special configuration of your
> environment that I haven't.

Attached.


> 4. Could you share content of nilfs_cleanerd.conf file for NILFS2
> partition that has such issue? Sorry, if I ask about it again.

Attached.



> 5. Did you have any sudden power-off before you encounter the issue
> firstly?

No, I think not.


Piotr Szymaniak.
-- 
 Odpowiedz, nie zastanawiając się: co jest najgroźniejsze dla firmowych
danych? Pomyślałeś o sabotażu, hakerach albo szpiegostwie przemysłowym?
Owszem,  wszystko to prawda,  ale największym zagrożeniem dla firmowych
danych są żółte samoprzylepne karteczki.
  -- Ben Rothke, "Computer security"

[-- Attachment #1.2: config.bz2 --]
[-- Type: application/x-bzip2, Size: 14282 bytes --]

[-- Attachment #1.3: dmesg_cat.log.bz2 --]
[-- Type: application/x-bzip2, Size: 996 bytes --]

[-- Attachment #1.4: nilfs.conf --]
[-- Type: text/plain, Size: 285 bytes --]

protection_period	3600
min_clean_segments	15%
max_clean_segments	25%
clean_check_interval	10
selection_policy	timestamp	# timestamp in ascend order
nsegments_per_clean	2
mc_nsegments_per_clean	4
cleaning_interval	5
mc_cleaning_interval	1
retry_interval		60
use_mmap
log_priority		info

[-- Attachment #1.5: strace_cat.log --]
[-- Type: text/plain, Size: 4245 bytes --]

execve("/bin/cat", ["cat", "/var/tmp/kdecache-malpa/plasma_t"...], [/* 45 vars */]) = 0
brk(0)                                  = 0x8ba0000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb772a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=105158, ...}) = 0
mmap2(NULL, 105158, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7710000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\315\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1729888, ...}) = 0
mmap2(NULL, 1743644, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7566000
mprotect(0xb7709000, 4096, PROT_NONE)   = 0
mmap2(0xb770a000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb770a000
mmap2(0xb770d000, 11036, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb770d000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7565000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb75656c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb770a000, 8192, PROT_READ)   = 0
mprotect(0x8052000, 4096, PROT_READ)    = 0
mprotect(0xb774c000, 4096, PROT_READ)   = 0
munmap(0xb7710000, 105158)              = 0
brk(0)                                  = 0x8ba0000
brk(0x8bc1000)                          = 0x8bc1000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=1716432, ...}) = 0
mmap2(NULL, 1716432, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb73c1000
close(3)                                = 0
fstat64(1, {st_mode=S_IFREG|0664, st_size=1904, ...}) = 0
open("/var/tmp/kdecache-malpa/plasma_theme_Transparent-sima84.kcache", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0664, st_size=84213856, ...}) = 0
fadvise64_64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, 0x8ba1000, 65536)               = -1 EIO (Input/output error)
write(2, "cat: ", 5cat: )                    = 5
write(2, "/var/tmp/kdecache-malpa/plasma_t"..., 62/var/tmp/kdecache-malpa/plasma_theme_Transparent-sima84.kcache) = 62
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
fcntl64(4, F_GETFD)                     = 0x1 (flags FD_CLOEXEC)
fstat64(4, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7729000
read(4, "# Locale name alias data base.\n#"..., 4096) = 2502
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0xb7729000, 4096)                = 0
open("/usr/share/locale/pl_PL.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/pl_PL.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/pl_PL/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/pl.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/pl.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/pl/LC_MESSAGES/libc.mo", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=143799, ...}) = 0
mmap2(NULL, 143799, PROT_READ, MAP_PRIVATE, 4, 0) = 0xb739d000
close(4)                                = 0
open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=26242, ...}) = 0
mmap2(NULL, 26242, PROT_READ, MAP_SHARED, 4, 0) = 0xb7396000
close(4)                                = 0
write(2, ": B\305\202\304\205d wej\305\233cia/wyj\305\233cia", 26: Błąd wejścia/wyjścia) = 26
write(2, "\n", 1
)                       = 1
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-09 21:09                         ` Piotr Szymaniak
@ 2013-01-10  7:12                           ` Vyacheslav Dubeyko
  2013-01-12 21:24                             ` Piotr Szymaniak
  0 siblings, 1 reply; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-10  7:12 UTC (permalink / raw)
  To: Piotr Szymaniak
  Cc: Elmer Zhang,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-01-09 at 22:09 +0100, Piotr Szymaniak wrote:
> On Tue, Jan 08, 2013 at 03:52:30PM +0300, Vyacheslav Dubeyko wrote:
> > 
[snip]

Thank you for additional details.

As I remember, you reported about issue with flush kernel thread firstly
and, then, about this issue. So, I think that you can have some similar
environment peculiarities as Elmer Zhang.

Could you share details of your virtual memory configuration
(especially, vm.min_free_kbytes)?

Could you share more detailed output of your system log? Especially, I
interested what was in the first case of the issue occurrence. But it
can be interested also part of system log that begins from system start
and ends with messages about remounting in RO state.

Thanks,
Vyacheslav Dubeyko.

> 
> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-10  7:12                           ` Vyacheslav Dubeyko
@ 2013-01-12 21:24                             ` Piotr Szymaniak
  2013-01-14 19:32                               ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 18+ messages in thread
From: Piotr Szymaniak @ 2013-01-12 21:24 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: Elmer Zhang,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1577 bytes --]

On Thu, Jan 10, 2013 at 11:12:05AM +0400, Vyacheslav Dubeyko wrote:
> On Wed, 2013-01-09 at 22:09 +0100, Piotr Szymaniak wrote:
> > On Tue, Jan 08, 2013 at 03:52:30PM +0300, Vyacheslav Dubeyko wrote:
> > > 
> [snip]
> 
> Thank you for additional details.
> 
> As I remember, you reported about issue with flush kernel thread firstly
> and, then, about this issue. So, I think that you can have some similar
> environment peculiarities as Elmer Zhang.
> 
> Could you share details of your virtual memory configuration
> (especially, vm.min_free_kbytes)?

vm.min_free_kbytes = 3806


> Could you share more detailed output of your system log? Especially, I
> interested what was in the first case of the issue occurrence. But it
> can be interested also part of system log that begins from system start
> and ends with messages about remounting in RO state.

I can connect the broken file with other event. User repoted that his
KDE session wont load or loads very (and it's like few minutes+) long
time. Now I think the reason for this could be the broken file in
/var/tmp/kde-foo. User moved his "broken" ~/.kde and everything was back
to normal state (well, except that he was missing his session and DE
setting).

I'm just writing it as this was some time ago (looking at the broken
file it was ~2 months ago) and the logs are somewhere in /dev/null now
and I think that those issues are connected somehow.


Piotr Szymaniak.
-- 
Stajesz sie odpowiedzialny na zawsze za to, co oswoiles.
  -- Antoine De Saint-Exupery, "Le Petit Prince"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: A lot of NILFS: bad btree node messages (readonly fs)
  2013-01-12 21:24                             ` Piotr Szymaniak
@ 2013-01-14 19:32                               ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 18+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-14 19:32 UTC (permalink / raw)
  To: Piotr Szymaniak
  Cc: Elmer Zhang,
	Сергей
	Александров,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Jan 13, 2013, at 12:24 AM, Piotr Szymaniak wrote:

> On Thu, Jan 10, 2013 at 11:12:05AM +0400, Vyacheslav Dubeyko wrote:
>> On Wed, 2013-01-09 at 22:09 +0100, Piotr Szymaniak wrote:
>>> On Tue, Jan 08, 2013 at 03:52:30PM +0300, Vyacheslav Dubeyko wrote:
>>>> 
>> [snip]
>> 
>> Thank you for additional details.
>> 
>> As I remember, you reported about issue with flush kernel thread firstly
>> and, then, about this issue. So, I think that you can have some similar
>> environment peculiarities as Elmer Zhang.
>> 
>> Could you share details of your virtual memory configuration
>> (especially, vm.min_free_kbytes)?
> 
> vm.min_free_kbytes = 3806
> 

I think that it is a very important detail.

> 
>> Could you share more detailed output of your system log? Especially, I
>> interested what was in the first case of the issue occurrence. But it
>> can be interested also part of system log that begins from system start
>> and ends with messages about remounting in RO state.
> 
> I can connect the broken file with other event. User repoted that his
> KDE session wont load or loads very (and it's like few minutes+) long
> time. Now I think the reason for this could be the broken file in
> /var/tmp/kde-foo. User moved his "broken" ~/.kde and everything was back
> to normal state (well, except that he was missing his session and DE
> setting).
> 
> I'm just writing it as this was some time ago (looking at the broken
> file it was ~2 months ago) and the logs are somewhere in /dev/null now
> and I think that those issues are connected somehow.
> 

Thank you for additional info.

Now I am fixing the issue with flush kernel thread. I think that the issue with flush kernel thread is a part of the issue with bad btree node messages.  

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.
> -- 
> Stajesz sie odpowiedzialny na zawsze za to, co oswoiles.
>  -- Antoine De Saint-Exupery, "Le Petit Prince"

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-01-14 19:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-29 21:06 A lot of NILFS: bad btree node messages (readonly fs) Piotr Szymaniak
2012-11-30  6:09 ` Vyacheslav Dubeyko
2012-11-30  7:32   ` Piotr Szymaniak
2012-12-25  6:05     ` Vyacheslav Dubeyko
2012-12-25 20:54       ` Piotr Szymaniak
2013-01-03 21:49       ` Piotr Szymaniak
2013-01-04 12:46         ` Vyacheslav Dubeyko
     [not found]           ` <28ABE57F-66DD-491E-9FDB-A6CE3CF773FB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-01-04 19:16             ` Piotr Szymaniak
2013-01-05 11:40               ` Vyacheslav Dubeyko
     [not found]                 ` <C2F4A5DE-538B-4468-9A6A-BA619EDDA462-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-01-05 16:47                   ` Piotr Szymaniak
2013-01-08 12:52                     ` Vyacheslav Dubeyko
     [not found]                       ` <504C0292-D0C6-426C-BDFD-1CD50FB32617-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-01-09  6:20                         ` Elmer Zhang
     [not found]                           ` <30D452FB-F1A5-425C-97ED-1A12CA4DCFD9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-01-09 12:10                             ` Vyacheslav Dubeyko
2013-01-09 21:09                         ` Piotr Szymaniak
2013-01-10  7:12                           ` Vyacheslav Dubeyko
2013-01-12 21:24                             ` Piotr Szymaniak
2013-01-14 19:32                               ` Vyacheslav Dubeyko
2012-11-30  7:46   ` ARAI Shun-ichi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.