Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour

* Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
@ 2012-01-26 13:52 Paul Fertser
       [not found] ` <20120126135203.GM2267-0MSThuzptbI@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Fertser @ 2012-01-26 13:52 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I'm using nilfs2 for the root filesystem on an ARM-based netbook
(Toshiba ac100) with Debian hardfloat. Custom kernel is based on 3.0.8
and nilfs-tools is 2.1.0-1 from the Debian repository.

I wanted to try the threaded i/o test from the Phoronix test suite and
somehow it happened that during the test the garbage collecting daemon
failed and never came back. So i got the filesystem 100% full and
after i noticed it i tried running the daemon manually. It didn't
start even after reboot. Surprisingly, the eMMC error went away on its
own after fully powering off the whole device, and after that the
daemon started to work properly.

I'm not sure what conclusion can be drawn from this, but i'd still
appreciate any comments, especially suggestions on what to do if
the error hadn't "recovered".

The relevant dmesg excerpts (full might be available from
http://paulfertser.info/files/failing_emmc.txt ):

[    2.837036] mmc0: new high speed MMC card at address 0001
[    2.847637] mmcblk0: mmc0:0001 MMC32G 29.8 GiB 
...
[ 5668.706475] mmcblk0: retrying using single block read
[ 5671.580366] mmcblk0: error -110 transferring data, sector 15563278, nr 122, card status 0x200900
[ 5671.603701] end_request: I/O error, dev mmcblk0, sector 15563278
[ 5674.421016] mmcblk0: error -110 transferring data, sector 15563279, nr 121, card status 0x200900
[ 5674.445322] end_request: I/O error, dev mmcblk0, sector 15563279
[ 5674.466988] NILFS: GC failed during preparation: cannot read source blocks: err=-5
...
[ 7121.711242] WARNING: at fs/nilfs2/ioctl.c:449 nilfs_ioctl_clean_segments.clone.7+0x4f4/0x634 [nilfs2]()
...
[ 7121.834580] [<bf01d578>] (nilfs_ioctl_clean_segments.clone.7+0x4f4/0x634 [nilfs2]) from [<bf01de10>] (nilfs_ioctl+0x584/0x85c [nilfs2])
[ 7121.852173] [<bf01de10>] (nilfs_ioctl+0x584/0x85c [nilfs2]) from [<c014b518>] (do_vfs_ioctl+0x51c/0x590)
[ 7121.866961] [<c014b518>] (do_vfs_ioctl+0x51c/0x590) from [<c014b5ec>] (sys_ioctl+0x60/0x84)
[ 7121.880665] [<c014b5ec>] (sys_ioctl+0x60/0x84) from [<c0055020>] (ret_fast_syscall+0x0/0x30)
[ 7121.894273] ---[ end trace 7f37788dc3302b00 ]---
[ 7121.904330] NILFS: GC failed during preparation: cannot read source blocks: err=-17
[ 7124.499469] mmcblk0: retrying using single block read
[ 7127.319607] mmcblk0: error -110 transferring data, sector 15563278, nr 2, card status 0x200900
[ 7127.342323] end_request: I/O error, dev mmcblk0, sector 15563278
...

Then any attempt to manually start GC led to:

[39133.541126] NILFS: GC failed during preparation: cannot read source blocks: err=-17
[39136.196519] mmcblk0: retrying using single block read
[39139.003276] mmcblk0: error -110 transferring data, sector 15563278, nr 2, card status 0x200900
[39139.012406] end_request: I/O error, dev mmcblk0, sector 15563278
[39141.815571] mmcblk0: error -110 transferring data, sector 15563279, nr 1, card status 0x200900
[39141.824837] end_request: I/O error, dev mmcblk0, sector 15563279
...
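
For anyone decoding the numbers in the excerpts above: the kernel prints errors as negated errno values. Here is a minimal sketch in Python, assuming Linux errno numbering. The table is hardcoded so the result does not depend on the host platform, and `describe` is a hypothetical helper name, not part of any tool.

```python
# Linux errno values that appear in the dmesg excerpts above.
# Hardcoded on purpose: errno numbering differs between operating systems.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    17: ("EEXIST", "File exists"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe(kernel_err: int) -> str:
    """Translate a negative kernel error code into 'NAME (message)'."""
    name, message = LINUX_ERRNO.get(-kernel_err, ("unknown", "not in table"))
    return f"{name} ({message})"


for code in (-110, -5, -17):
    print(code, describe(code))
```

Read this way, the mmcblk0 transfers timed out (-110), the first GC failure was a plain I/O error (-5), and the later GC failures report EEXIST (-17).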

TIA and best of luck!
-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found] ` <20120126135203.GM2267-0MSThuzptbI@public.gmane.org>
@ 2012-01-27 16:19   ` Christian Smith
       [not found]     ` <20120127161921.GL750-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
  2012-01-28 12:53   ` Martin Steigerwald
  2013-01-10 13:16   ` New experience with the odd " Paul Fertser
  2 siblings, 1 reply; 13+ messages in thread
From: Christian Smith @ 2012-01-27 16:19 UTC (permalink / raw)
  To: Paul Fertser; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, Jan 26, 2012 at 05:52:03PM +0400, Paul Fertser wrote:
> Hi,
> 
> I'm using nilfs2 for the root filesystem on an ARM-based netbook
> (Toshiba ac100) with Debian hardfloat. Custom kernel is based on 3.0.8
> and nilfs-tools is 2.1.0-1 from the Debian repository.
> 
> I wanted to try the threaded i/o test from the Phoronix test suite and
> somehow it happened that during the test the garbage collecting daemon
> failed and never came back. So i got the filesystem 100% full and
> after i noticed it i tried running the daemon manually. It didn't
> start even after reboot. Surprisingly, the eMMC error went away on its
> own after fully powering off the whole device, and after that the
> daemon started to work properly.
> 
> I'm not sure what conclusion might be made from this but i'd still
> appreciate any comments, especially the suggestions on what to do if
> the error didn't "recover".

Remember, SDCards contain their own embedded controller to do the
block mapping between LBA and FLASH blocks. There may even be an ARM
based controller in the SDCard. Under the stress of a benchmark, the
firmware probably just got itself in a bit of a state and needed a
hard reset to recover.

What brand of SD Card is it? Most SD Cards are designed for low
stress low speed IO in devices such as cameras. Perhaps try a
different brand.

Christian

* Re: Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]     ` <20120127161921.GL750-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
@ 2012-01-27 16:29       ` Gordan Bobic
  0 siblings, 0 replies; 13+ messages in thread
From: Gordan Bobic @ 2012-01-27 16:29 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Christian Smith wrote:
> On Thu, Jan 26, 2012 at 05:52:03PM +0400, Paul Fertser wrote:
>> Hi,
>>
>> I'm using nilfs2 for the root filesystem on an ARM-based netbook
>> (Toshiba ac100) with Debian hardfloat. Custom kernel is based on 3.0.8
>> and nilfs-tools is 2.1.0-1 from the Debian repository.
>>
>> I wanted to try the threaded i/o test from the Phoronix test suite and
>> somehow it happened that during the test the garbage collecting daemon
>> failed and never came back. So i got the filesystem 100% full and
>> after i noticed it i tried running the daemon manually. It didn't
>> start even after reboot. Surprisingly, the eMMC error went away on its
>> own after fully powering off the whole device, and after that the
>> daemon started to work properly.
>>
>> I'm not sure what conclusion might be made from this but i'd still
>> appreciate any comments, especially the suggestions on what to do if
>> the error didn't "recover".
> 
> Remember, SDCards contain their own embedded controller to do the
> block mapping between LBA and FLASH blocks. There may even be an ARM
> based controller in the SDCard. Under the stress of a benchmark, the
> firmware probably just got itself in a bit of a state and needed a
> hard reset to recover.
> 
> What brand of SD Card is it? Most SD Cards are designed for low
> stress low speed IO in devices such as cameras. Perhaps try a
> different brand.

I believe Paul was referring to the internal eMMC (not an SD card) on 
the Toshiba AC100. Not something that is easily replaceable. :(

I should also point out that having benchmarked many SD cards, I have 
yet to find any that offer decent performance on random-writes, no 
matter how good they may be at linear writes - hence the interest in nilfs2.

Gordan

* Re: Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found] ` <20120126135203.GM2267-0MSThuzptbI@public.gmane.org>
  2012-01-27 16:19   ` Christian Smith
@ 2012-01-28 12:53   ` Martin Steigerwald
       [not found]     ` <201201281353.00537.Martin-3kZCPVa5dk2azgQtNeiOUg@public.gmane.org>
  2013-01-10 13:16   ` New experience with the odd " Paul Fertser
  2 siblings, 1 reply; 13+ messages in thread
From: Martin Steigerwald @ 2012-01-28 12:53 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thursday, 26 January 2012, you wrote:
> Hi,
> 
> I'm using nilfs2 for the root filesystem on an ARM-based netbook
> (Toshiba ac100) with Debian hardfloat. Custom kernel is based on 3.0.8
> and nilfs-tools is 2.1.0-1 from the Debian repository.
> 
> I wanted to try the threaded i/o test from the Phoronix test suite and
> somehow it happened that during the test the garbage collecting daemon
> failed and never came back. So i got the filesystem 100% full and
> after i noticed it i tried running the daemon manually. It didn't
> start even after reboot. Surprisingly, the eMMC error went away on its
> own after fully powering off the whole device, and after that the
> daemon started to work properly.
> 
> I'm not sure what conclusion might be made from this but i'd still
> appreciate any comments, especially the suggestions on what to do if
> the error didn't "recover".
> 
> The relevant dmesg excerpts (full might be available from
> http://paulfertser.info/files/failing_emmc.txt ):
> 
> [    2.837036] mmc0: new high speed MMC card at address 0001
> [    2.847637] mmcblk0: mmc0:0001 MMC32G 29.8 GiB
> ...
> [ 5668.706475] mmcblk0: retrying using single block read
> [ 5671.580366] mmcblk0: error -110 transferring data, sector 15563278, nr 122, card status 0x200900
> [ 5671.603701] end_request: I/O error, dev mmcblk0, sector 15563278
> [ 5674.421016] mmcblk0: error -110 transferring data, sector 15563279, nr 121, card status 0x200900
> [ 5674.445322] end_request: I/O error, dev mmcblk0, sector 15563279
> [ 5674.466988] NILFS: GC failed during preparation: cannot read source blocks: err=-5
> ...

Well, I think that's clear: that's an I/O error when trying to access the MMC
flash. I think the cleaner shouldn't hang on it, but aside from that I do
not see a NILFS issue here.

As to the exact nature of the MMC error, I have no idea. Maybe searching
for parts of the dmesg message will help.

I have I/O errors sometimes with the internal card reader on my ThinkPad 
T520 with Kingston Ultimate SD cards which work perfectly well with an 
external USB card reader. I reported these to bugzilla.kernel.org after I 
noticed that some similar sounding issue was reported there as well.

So I recommend checking whether this might be a known issue with the mmc 
driver in the Linux kernel.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

* Re: Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]     ` <201201281353.00537.Martin-3kZCPVa5dk2azgQtNeiOUg@public.gmane.org>
@ 2012-02-01  6:06       ` Paul Fertser
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Fertser @ 2012-02-01  6:06 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Sat, Jan 28, 2012 at 01:53:00PM +0100, Martin Steigerwald wrote:
> > [ 5671.580366] mmcblk0: error -110 transferring data, sector 15563278, nr 122, card status 0x200900
> > [ 5671.603701] end_request: I/O error, dev mmcblk0, sector 15563278
> 
> Well, I think that's clear: that's an I/O error when trying to access the MMC
> flash. I think the cleaner shouldn't hang on it, but aside from that I do
> not see a NILFS issue here.

Well, please consider my message a bug report against the cleaner,
i.e. "defect/feature request: if the cleaner daemon stops due to I/O
errors, output an informational message suggesting to 1. fully power
off the media and reboot; 2. in case that doesn't help, try another
controller."

I'm not sure whether a facility to mark bad blocks and skip them is
needed on modern devices, so it's not clear whether it's reasonable to
expect that from NILFS.

As to the fsck, it would be very nice to have at least some very basic
utility, probably even doing only diagnostic output. And with Debian
on NILFS rootfs i had to substitute it with /bin/true because Debian
expects a usable fsck for its root.

> As to the exact nature of the MMC error I have no idea. Maybe searching 
> for parts of the dmesg message does help.

Probably some glitch in the embedded controller, dmesg had no
interesting related data.

> I have I/O errors sometimes with the internal card reader on my ThinkPad 
> T520 with Kingston Ultimate SD cards which work perfectly well with an 
> external USB card reader. I reported these to bugzilla.kernel.org after I 
> noticed that some similar sounding issue was reported there as well.

I wonder if that might be related to the power supply voltage. I think
external card readers always use 5V for the card but the internal one can
vary it, and if the card's descriptor permits lower voltages, it goes
for the lowest; similar reasoning applies to the frequency.

Also, some cards are more prone to RF interference than others. At
OpenMoko we had reports of I/O errors occurring simultaneously with
GSM activity. I guess a laptop's insides have enough sources of
interference too.

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

* New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found] ` <20120126135203.GM2267-0MSThuzptbI@public.gmane.org>
  2012-01-27 16:19   ` Christian Smith
  2012-01-28 12:53   ` Martin Steigerwald
@ 2013-01-10 13:16   ` Paul Fertser
       [not found]     ` <20130110131659.GA29689-0MSThuzptbI@public.gmane.org>
  2 siblings, 1 reply; 13+ messages in thread
From: Paul Fertser @ 2013-01-10 13:16 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I've reported this issue earlier and back then it "resolved itself"
after a power-cycle. The hardware in question is the same ac100
netbook with an internal 32GB eMMC.

On nilfs_cleanerd start i was consistently getting these messages:
[   46.122096] mmcblk0: error -110 transferring data, sector 26671630, nr 10, card status 0x200900
[   48.934623] mmcblk0: error -110 transferring data, sector 26671631, nr 9, card status 0x200900
(and similar output for two other sectors)

However, this time was different: several full power cycles didn't
help, the read was still failing and cleanerd refused to start.

So i resorted to brute force, trying to write to the sectors in
question. 512 and 2048 byte writes were failing with the same error
message, so i tried 4096 and it took about a second but succeeded:

dd if=/dev/zero of=/dev/mmcblk0 bs=4096 count=1 seek=3333953

The same has to be done with the other failing sector.
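
The seek value follows from the failing sector number: the mmcblk0 messages count 512-byte sectors, while dd with bs=4096 counts 4096-byte blocks, so the sector number is divided by 8 and rounded down. A small sketch in Python (the function name is made up for illustration):

```python
SECTOR_SIZE = 512   # unit used in the mmcblk0 error messages
BLOCK_SIZE = 4096   # the bs= value that succeeded with dd


def dd_seek_for_sector(sector: int) -> int:
    """Index of the 4096-byte block that contains the given 512-byte sector."""
    return sector * SECTOR_SIZE // BLOCK_SIZE


for sector in (26671630, 26671631):
    print(sector, "->", dd_seek_for_sector(sector))
```

Both sectors from the messages above fall into block 3333953, the seek value passed to dd.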

And only after that i was able to start nilfs_cleanerd and my system
seems to be running fine again.

If this happened with e.g. ext4, i would be unable to read a certain
file, and after several attempts i would simply delete it; the blocks
in question would eventually get overwritten without any impact on
system usability. With nilfs, however, had i not figured out the
"dd" trick, i would have had to overwrite the whole filesystem (and in
fact that's the main and only partition on my netbook). The described mmc
card behaviour (i.e. remapping bad blocks only on write attempts)
seems to be kind of reasonable, and i would expect the same problem to
happen to someone else one day.

HTH
-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]     ` <20130110131659.GA29689-0MSThuzptbI@public.gmane.org>
@ 2013-01-10 13:34       ` Vyacheslav Dubeyko
  2013-01-10 13:49         ` Paul Fertser
  2013-06-30  7:44         ` Paul Fertser
  0 siblings, 2 replies; 13+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-10 13:34 UTC (permalink / raw)
  To: Paul Fertser; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, 2013-01-10 at 17:16 +0400, Paul Fertser wrote:
> Hi,
> 
> I've reported this issue earlier and back then it "resolved itself"
> after a power-cycle. The hardware in question is the same ac100
> netbook with an internal 32GB eMMC.
> 
> On nilfs_cleanerd start i was consistently getting these messages:
> [   46.122096] mmcblk0: error -110 transferring data, sector 26671630, nr 10, card status 0x200900
> [   48.934623] mmcblk0: error -110 transferring data, sector 26671631, nr 9, card status 0x200900
> (and similar output for two other sectors)
> 
> However, this time was different, several full power cycles didn't
> help and the read was still failing and cleanerd refused to start.
> 

Could you share the system log and an strace of nilfs_cleanerd trying
to start when the issue occurs? That is needed to begin understanding
the issue. Please set the debug level in nilfs_cleanerd.conf.

Could you make a raw dump of the bad sectors? Could you share the dumpseg
output for the segments that contain the bad sectors?

So, I need to think about how to investigate your issue without having
the eMMC available. :-)

Thanks,
Vyacheslav Dubeyko.

> So i resorted to brute force, trying to write to the sectors in
> question. 512 and 2048 byte writes were failing with the same error
> message, so i tried 4096 and it took about a second but succeeded:
> 
> dd if=/dev/zero of=/dev/mmcblk0 bs=4096 count=1 seek=3333953
> 
> Same has to be done with the other failing sector.
> 
> And only after that i was able to start nilfs_cleanerd and my system
> seems to be running fine again.
> 
> If it happened with e.g. ext4, i would be unable to read a certain
> file and after several attempts would simply delete it and the blocks
> in question would get overwritten eventually without any impact for
> the system usability. However, with nilfs had i not figured out the
> "dd" trick i would have to overwrite the whole filesystem (and in fact
> that's the main and only partition on my netbook). The described mmc
> card behaviour (i.e. remapping bad blocks only on write attempts)
> seems to be kind of reasonable and i would expect the same problem to
> happen with someone else one day.
> 
> HTH



* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
  2013-01-10 13:34       ` Vyacheslav Dubeyko
@ 2013-01-10 13:49         ` Paul Fertser
       [not found]           ` <20130110134907.GB29689-0MSThuzptbI@public.gmane.org>
  2013-06-30  7:44         ` Paul Fertser
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Fertser @ 2013-01-10 13:49 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, Jan 10, 2013 at 05:34:22PM +0400, Vyacheslav Dubeyko wrote:
> On Thu, 2013-01-10 at 17:16 +0400, Paul Fertser wrote:
> > I've reported this issue earlier and back then it "resolved itself"
> > after a power-cycle. The hardware in question is the same ac100
> > netbook with an internal 32GB eMMC.
> > 
> > On nilfs_cleanerd start i was consistently getting these messages:
> > [   46.122096] mmcblk0: error -110 transferring data, sector 26671630, nr 10, card status 0x200900
> > [   48.934623] mmcblk0: error -110 transferring data, sector 26671631, nr 9, card status 0x200900
> > (and similar output for two other sectors)
> > 
> > However, this time was different, several full power cycles didn't
> > help and the read was still failing and cleanerd refused to start.
> > 
> 
> Could you share system log content and strace of nilfs_cleanerd's trying
> to start in the case of the issue? It needs for the beginning of the
> issue understanding. Please, set in the nilfs_cleanerd.conf debug level.

Eh, sorry, i was too involved in thinking about how to fix the issue
and so haven't saved any extra logs :(

> Could you made raw dump of bad sectors? Could you share dumpseg output
> for segments which contains the bad sectors?

The bad sectors in question were unreadable with dd: it just hung
for a moment and an additional error -110 was output to dmesg.

> So, I need to think how to investigate your issue without availability
> of eMMC. :-)

I think you can reproduce something similar with loopback mounting if
you modify the driver to return read error for some particular sectors
that cleanerd wants to access when it starts cleaning.

Sorry again for not having saved enough details.

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]           ` <20130110134907.GB29689-0MSThuzptbI@public.gmane.org>
@ 2013-01-10 14:00             ` Vyacheslav Dubeyko
  2013-01-10 14:12               ` Paul Fertser
  0 siblings, 1 reply; 13+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-10 14:00 UTC (permalink / raw)
  To: Paul Fertser; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2013-01-10 at 17:49 +0400, Paul Fertser wrote:
> Hi,
> 
> On Thu, Jan 10, 2013 at 05:34:22PM +0400, Vyacheslav Dubeyko wrote:
> > On Thu, 2013-01-10 at 17:16 +0400, Paul Fertser wrote:
> > > I've reported this issue earlier and back then it "resolved itself"
> > > after a power-cycle. The hardware in question is the same ac100
> > > netbook with an internal 32GB eMMC.
> > > 
> > > On nilfs_cleanerd start i was consistently getting these messages:
> > > [   46.122096] mmcblk0: error -110 transferring data, sector 26671630, nr 10, card status 0x200900
> > > [   48.934623] mmcblk0: error -110 transferring data, sector 26671631, nr 9, card status 0x200900
> > > (and similar output for two other sectors)
> > > 
> > > However, this time was different, several full power cycles didn't
> > > help and the read was still failing and cleanerd refused to start.
> > > 
> > 
> > Could you share system log content and strace of nilfs_cleanerd's trying
> > to start in the case of the issue? It needs for the beginning of the
> > issue understanding. Please, set in the nilfs_cleanerd.conf debug level.
> 
> Eh, sorry, i was too involved in thinking about how to fix the issue
> and so haven't saved any extra logs :(
> 

Anyway, thank you for the report.

> > Could you made raw dump of bad sectors? Could you share dumpseg output
> > for segments which contains the bad sectors?
> 
> The bad sectors in question were unreadable with dd, it just hanged
> for a moment and an additional error -110 was output to dmesg.
> 
> > So, I need to think how to investigate your issue without availability
> > of eMMC. :-)
> 
> I think you can reproduce something similar with loopback mounting if
> you modify the driver to return read error for some particular sectors
> that cleanerd wants to access when it starts cleaning.
> 

Yes, it is a good idea. I'll try it. My only worry is that I might end
up investigating a different issue from the one you reported.

Thanks,
Vyacheslav Dubeyko.

> Sorry again for not having saved enough details.
> 



* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
  2013-01-10 14:00             ` Vyacheslav Dubeyko
@ 2013-01-10 14:12               ` Paul Fertser
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Fertser @ 2013-01-10 14:12 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, Jan 10, 2013 at 06:00:59PM +0400, Vyacheslav Dubeyko wrote:
> Anyway, thank you for the report.

Big thanks to you for caring about the project!

> > I think you can reproduce something similar with loopback mounting if
> > you modify the driver to return read error for some particular sectors
> > that cleanerd wants to access when it starts cleaning.
> 
> Yes, it is a good idea. I'll try it. But I worry only that it is
> possible to investigate not the issue that was reported by you.

I think what's important is to keep in mind the following failure
scenario: a misbehaving medium suddenly starts returning read errors
consistently for a number of sectors until the whole erase block is
written to.

I also found two nice options to simulate my situation:
http://stackoverflow.com/questions/1870696/simulate-a-faulty-block-device-with-read-errors
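
To make the failure scenario concrete, here is a toy in-memory model in Python. It is only a sketch of the behaviour described in this thread, not a replacement for the kernel-level options behind the link. `FlakyDevice` and `REMAP_UNIT` are hypothetical names, and the 4096-byte unit is an assumption taken from the write size that worked with dd; the real erase block size is unknown.

```python
import errno

SECTOR = 512
REMAP_UNIT = 4096  # ASSUMPTION: smallest aligned write that clears a bad sector
PER_UNIT = REMAP_UNIT // SECTOR


class FlakyDevice:
    """Bad sectors fail on read until their whole unit is rewritten."""

    def __init__(self, nsectors, bad_sectors):
        self.data = bytearray(nsectors * SECTOR)
        self.bad = set(bad_sectors)

    def read(self, sector, count=1):
        if any(s in self.bad for s in range(sector, sector + count)):
            raise OSError(errno.EIO, "simulated read error")
        return bytes(self.data[sector * SECTOR:(sector + count) * SECTOR])

    def write(self, sector, payload):
        if len(payload) % SECTOR:
            raise ValueError("payload must be a whole number of sectors")
        count = len(payload) // SECTOR
        end = sector + count
        for s in range(sector, end):
            if s in self.bad:
                unit_start = s - s % PER_UNIT
                # A write that touches a bad sector fails unless it covers
                # the whole unit that the bad sector belongs to.
                if not (sector <= unit_start and unit_start + PER_UNIT <= end):
                    raise OSError(errno.EIO, "simulated write error")
        self.data[sector * SECTOR:end * SECTOR] = payload
        self.bad.difference_update(range(sector, end))


dev = FlakyDevice(16, bad_sectors={6, 7})
try:
    dev.read(6)
except OSError as exc:
    print("read failed:", exc)
dev.write(0, bytes(REMAP_UNIT))        # whole-unit write succeeds
print("read ok:", len(dev.read(6)), "bytes")
```

A cleaner test against such a model would need reads of the affected sectors to keep failing across restarts, which is the part that made the real device so awkward.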

Good luck and happy hacking.
Spasibo!
-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
  2013-01-10 13:34       ` Vyacheslav Dubeyko
  2013-01-10 13:49         ` Paul Fertser
@ 2013-06-30  7:44         ` Paul Fertser
       [not found]           ` <20130630074404.GI22224-0MSThuzptbI@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Fertser @ 2013-06-30  7:44 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2318 bytes --]

On Thu, Jan 10, 2013 at 05:34:22PM +0400, Vyacheslav Dubeyko wrote:
> > On nilfs_cleanerd start i was consistently getting these messages:
> > [   46.122096] mmcblk0: error -110 transferring data, sector 26671630, nr 10, card status 0x200900
> > [   48.934623] mmcblk0: error -110 transferring data, sector 26671631, nr 9, card status 0x200900
> > (and similar output for two other sectors)
> > 
> > However, this time was different, several full power cycles didn't
> > help and the read was still failing and cleanerd refused to start.
> > 
> 
> Could you share system log content and strace of nilfs_cleanerd's trying
> to start in the case of the issue? It needs for the beginning of the
> issue understanding. Please, set in the nilfs_cleanerd.conf debug level.

Now I've taken the time to gather all the info I could (and it wasn't
that easy to calculate the offsets, as I have nilfs2 on an LVM volume on
a GPT disk :). The LVM partition offset is 34816 sectors, the 1st PE is
384 sectors inside the partition, and the NILFS volume starts on the
256th PE.

This command produced error messages so I think the offset is right:
sudo dd if=/dev/mapper/emmc--lvm-rootfs of=rawdump_2713.bin bs=$((4096*2048)) count=1 skip=2713

[ 5142.730071] mmcblk0: retrying using single block read
[ 5145.599122] mmcblk0: error -110 transferring data, sector 46594382, nr 434, card status 0x200900
[ 5145.624308] end_request: I/O error, dev mmcblk0, sector 46594382
[ 5148.442955] mmcblk0: error -110 transferring data, sector 46594383, nr 433, card status 0x200900
[ 5148.467911] end_request: I/O error, dev mmcblk0, sector 46594383
[ 5148.489628] Buffer I/O error on device dm-0, logical block 5557753
[ 5151.436927] mmcblk0: error -110 transferring data, sector 46594574, nr 242, card status 0x200900
[ 5151.462374] end_request: I/O error, dev mmcblk0, sector 46594574
[ 5154.286162] mmcblk0: error -110 transferring data, sector 46594575, nr 241, card status 0x200900
[ 5154.311555] end_request: I/O error, dev mmcblk0, sector 46594575
[ 5154.333388] Buffer I/O error on device dm-0, logical block 5557777
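
As a cross-check of the offset arithmetic, here is a short Python sketch that maps an mmcblk0 sector from the messages above back to a dm-0 logical block and a NILFS segment. Two values are assumptions, since they are not stated here: a physical extent size of 4 MiB (8192 sectors), and "starts on the 256th PE" meaning that 256 full extents precede the volume. The function name `locate` is made up for illustration.

```python
SECTORS_PER_BLOCK = 8        # 4096-byte block / 512-byte sector
BLOCKS_PER_SEGMENT = 2048    # matches bs=$((4096*2048)) in the dd command

PARTITION_OFFSET = 34816     # LVM partition start on mmcblk0, in sectors
FIRST_PE_OFFSET = 384        # first PE inside the partition, in sectors
PE_SECTORS = 8192            # ASSUMPTION: 4 MiB physical extents
VOLUME_FIRST_PE = 256        # ASSUMPTION: 256 extents precede the volume

VOLUME_START = PARTITION_OFFSET + FIRST_PE_OFFSET + VOLUME_FIRST_PE * PE_SECTORS


def locate(mmc_sector: int):
    """Return (dm-0 logical block, NILFS segment) for an mmcblk0 sector."""
    volume_sector = mmc_sector - VOLUME_START
    block = volume_sector // SECTORS_PER_BLOCK
    return block, block // BLOCKS_PER_SEGMENT


for sector in (46594382, 46594574):
    print(sector, "->", locate(sector))
```

Under those assumptions the sketch yields logical blocks 5557753 and 5557777, the same numbers as in the "Buffer I/O error on device dm-0" lines, and both fall into segment 2713.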

Attaching dumpseg output for segment 2713, strace for nilfs_cleanerd,
etc, HTH.

Thank you in advance!
-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

[-- Attachment #2: dumpseg_2713.txt.bz2 --]
[-- Type: application/octet-stream, Size: 12943 bytes --]

[-- Attachment #3: lssu.txt.bz2 --]
[-- Type: application/octet-stream, Size: 13342 bytes --]

[-- Attachment #4: nilfs_tune_l.txt --]
[-- Type: text/plain, Size: 1030 bytes --]

nilfs-tune 2.1.0
Filesystem volume name:	  (none)
Filesystem UUID:	  822e1c89-0714-42e7-9f5c-2a11234cec85
Filesystem magic number:  0x3434
Filesystem revision #:	  2.0
Filesystem features:      (none)
Filesystem state:	  invalid or mounted
Filesystem OS type:	  Linux
Block size:		  4096
Filesystem created:	  Sat Jan 14 12:43:52 2012
Last mount time:	  Sun Jun 30 10:08:09 2013
Last write time:	  Sun Jun 30 11:01:35 2013
Mount count:		  403
Maximum mount count:	  50
Reserve blocks uid:	  0 (user root)
Reserve blocks gid:	  0 (group root)
First inode:		  11
Inode size:		  128
DAT entry size:		  32
Checkpoint size:	  192
Segment usage size:	  16
Number of segments:	  3685
Device size:		  30916214784
First data block:	  1
# of blocks per segment:  2048
Reserved segments %:	  5
Last checkpoint #:	  470550
Last block address:	  4689935
Last sequence #:	  35454
Free blocks count:	  862208
Commit interval:	  0
# of blks to create seg:  0
CRC seed:		  0x08347184
CRC check sum:		  0x6969faa9
CRC check data size:	  0x00000118

[-- Attachment #5: syslog.bz2 --]
[-- Type: application/octet-stream, Size: 11175 bytes --]

[-- Attachment #6: cleanerd_debug_log.txt.bz2 --]
[-- Type: application/octet-stream, Size: 3454 bytes --]

[-- Attachment #7: daemon_debug.log --]
[-- Type: text/plain, Size: 1727 bytes --]

Jun 30 10:22:33 ac100-debian wpa_supplicant[1149]: WPA: Group rekeying completed with 00:27:19:ce:3b:ce [GTK=TKIP]
Jun 30 10:32:33 ac100-debian wpa_supplicant[1149]: WPA: Group rekeying completed with 00:27:19:ce:3b:ce [GTK=TKIP]
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: start
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: pause (clean check)
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: resume (clean check)
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: ncleansegs = 428
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: 4 segments selected to be cleaned
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: cannot clean segments: File exists
Jun 30 10:34:56 ac100-debian nilfs_cleanerd[1908]: shutdown
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: start
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: pause (clean check)
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: resume (clean check)
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: ncleansegs = 428
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: 4 segments selected to be cleaned
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: cannot clean segments: File exists
Jun 30 10:35:34 ac100-debian nilfs_cleanerd[1914]: shutdown
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: start
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: pause (clean check)
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: resume (clean check)
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: ncleansegs = 428
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: 4 segments selected to be cleaned
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: cannot clean segments: File exists
Jun 30 10:38:37 ac100-debian nilfs_cleanerd[2011]: shutdown


* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]           ` <20130630074404.GI22224-0MSThuzptbI@public.gmane.org>
@ 2013-06-30  7:56             ` Paul Fertser
       [not found]               ` <20130630075626.GJ22224-0MSThuzptbI@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Fertser @ 2013-06-30  7:56 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, Jun 30, 2013 at 11:44:04AM +0400, Paul Fertser wrote:
> Attaching dumpseg output for segment 2713, strace for nilfs_cleanerd,
> etc, HTH.

But now zeroing out the eMMC sectors that lead to errors didn't help,
I get

[   81.903509] nilfs_cpfile_delete_checkpoints: invalid range of checkpoint numbers: [0, 436896)
[   81.912342] NILFS: GC failed during preparation: cannot delete checkpoints: err=-22

Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: start
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: pause (clean check)
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: resume (clean check)
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: ncleansegs = 464
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: 4 segments selected to be cleaned
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: cannot clean segments: Invalid argument
Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: shutdown

I've no backups for this volume which is a rootfs on my netbook, not
that it has something very important but I'd like to keep it
functional... Any ideas how to proceed?

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: New experience with the odd problem starting nilfs_cleanerd due to an eMMC misbehaviour
       [not found]               ` <20130630075626.GJ22224-0MSThuzptbI@public.gmane.org>
@ 2013-06-30 11:49                 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 13+ messages in thread
From: Vyacheslav Dubeyko @ 2013-06-30 11:49 UTC (permalink / raw)
  To: Paul Fertser; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Paul,

On Jun 30, 2013, at 11:56 AM, Paul Fertser wrote:

> On Sun, Jun 30, 2013 at 11:44:04AM +0400, Paul Fertser wrote:
>> Attaching dumpseg output for segment 2713, strace for nilfs_cleanerd,
>> etc, HTH.
> 
> But now zeroing out the eMMC sectors that lead to errors didn't help,
> I get
> 
> [   81.903509] nilfs_cpfile_delete_checkpoints: invalid range of checkpoint numbers: [0, 436896)
> [   81.912342] NILFS: GC failed during preparation: cannot delete checkpoints: err=-22
> 
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: start
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: pause (clean check)
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: resume (clean check)
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: ncleansegs = 464
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: 4 segments selected to be cleaned
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: cannot clean segments: Invalid argument
> Jun 30 11:51:43 ac100-debian nilfs_cleanerd[1553]: shutdown
> 

A similar error (NILFS: GC failed during preparation: cannot read source blocks: err=-17)
was reported by another user too. I am currently waiting for debugging output from him. I suspect
that this issue has been reported by many users with slightly different symptoms. But, of course,
these could be different issues.
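
The numeric codes in these kernel messages are negated errno values, so they can be matched against the text that nilfs_cleanerd prints (err=-22 next to "Invalid argument", err=-17 next to "File exists"). A minimal sketch for decoding them; the helper name `decode_kernel_err` is only illustrative, and the message text returned by `os.strerror` depends on the platform's C library:

```python
import errno
import os


def decode_kernel_err(code: int) -> str:
    """Return 'NAME (message)' for a kernel-style negative errno value."""
    number = abs(code)
    # errno.errorcode maps numbers to symbolic names on the running platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # Codes that appear in this thread's dmesg excerpts.
    for code in (-5, -17, -22, -110):
        print(f"err={code}: {decode_kernel_err(code)}")
```

The symbolic names for 5, 17 and 22 are the same on common POSIX systems; the name for 110 is Linux-specific, so run the sketch on the affected machine.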

It is a pity that you tried zeroing the blocks, because now we cannot investigate the issue in its
initial state. But as far as I can see, the issue has not vanished, so we can still investigate it.

As I said earlier, I think this is the issue that has been reported by many users. So it would be
great if you agreed to spend some time investigating it.

> I've no backups for this volume which is a rootfs on my netbook, not
> that it has something very important but I'd like to keep it
> functional... Any ideas how to proceed?
> 

I need additional details about the issue:
1. Please share with me the dumpseg output for all segments.

2. You shared the content of the primary superblock, but I also need the content of the secondary
superblock. Usually, the secondary superblock is placed in the last block of the volume, so you could
share a raw dump of the last block with me. But I think it would be easier for you and more informative
for me to share the debug output of the fsck utility (http://dubeyko.com/development/FileSystems/NILFS/nilfs-utils-fsck-v.0.04-under-development.tar.gz):

fsck.nilfs2 -v debug <device> 2> output_file.txt

This output will be large, so it makes sense to share it only with me.

3. Please share with me the debug output produced while reproducing the issue. I am sending you an
archive with the patch set by private e-mail. You need to apply this patch set to your kernel source
tree. Then configure the kernel with the following options enabled:
CONFIG_NILFS2_DEBUG_SHOW_ERRORS, CONFIG_NILFS2_DEBUG_DUMP_STACK,
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS, CONFIG_NILFS2_DEBUG_MDT_FILES,
CONFIG_NILFS2_DEBUG_GC_SUBSYSTEM, CONFIG_NILFS2_DEBUG_RECOVERY_SUBSYSTEM,
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING, CONFIG_NILFS2_DEBUG_HEXDUMP.
Rebuild your kernel after configuring it, boot the new kernel, and reproduce the issue. If all steps
complete successfully, you will have detailed debug output in the system log. You only need to share
it with me.
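
Before rebuilding, it may help to confirm that every option listed in item 3 actually ended up enabled in the kernel's .config. A rough sketch, assuming the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` format; the option names are copied from the list above (they come from the privately sent patch set), and the function name `missing_options` is only illustrative:

```python
REQUIRED_OPTIONS = [
    "CONFIG_NILFS2_DEBUG_SHOW_ERRORS",
    "CONFIG_NILFS2_DEBUG_DUMP_STACK",
    "CONFIG_NILFS2_DEBUG_BASE_OPERATIONS",
    "CONFIG_NILFS2_DEBUG_MDT_FILES",
    "CONFIG_NILFS2_DEBUG_GC_SUBSYSTEM",
    "CONFIG_NILFS2_DEBUG_RECOVERY_SUBSYSTEM",
    "CONFIG_NILFS2_DEBUG_BLOCK_MAPPING",
    "CONFIG_NILFS2_DEBUG_HEXDUMP",
]


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' in a .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as '# CONFIG_FOO is not set' and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    # The path is an assumption: run this from the top of the kernel tree.
    with open(".config") as config_file:
        for option in missing_options(config_file.read()):
            print("not enabled:", option)
```

An empty result means all eight options are set to `y`.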

Once I have all these additional details, I can investigate the issue more deeply. I hope
that they will be enough for a quick fix of the issue.
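
For the raw dump of the last block mentioned in item 2, a sketch along these lines might do. It assumes that the secondary superblock sits in the last 4096-byte-aligned block of the device; that offset rule is an assumption about the on-disk layout, not something stated in this thread, and the function names and the device path are hypothetical:

```python
import os

SB2_BLOCK_SIZE = 4096  # assumed size and alignment of the last block


def secondary_sb_offset(device_size: int) -> int:
    """Byte offset of the last full 4096-byte block of the device."""
    return (device_size // SB2_BLOCK_SIZE - 1) * SB2_BLOCK_SIZE


def read_last_block(path: str) -> bytes:
    """Read the last full 4096-byte block of a device or image file."""
    with open(path, "rb") as dev:
        # Seeking to the end also reports the size of a block device.
        size = dev.seek(0, os.SEEK_END)
        dev.seek(secondary_sb_offset(size))
        return dev.read(SB2_BLOCK_SIZE)


if __name__ == "__main__":
    # Hypothetical partition name; substitute the real NILFS2 device (needs root).
    block = read_last_block("/dev/mmcblk0pN")
    with open("sb2_dump.bin", "wb") as out:
        out.write(block)
```

With the "Device size" value of 30916214784 from the nilfs-tune output, this rule gives an offset of 30916210688, that is, exactly one block before the end of the device.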

Thanks,
Vyacheslav Dubeyko.

> -- 
> Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
> mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org



Thread overview: 13+ messages
2012-01-26 13:52 Odd problem starting nilfs_cleanerd due to an eMMC misbehaviour Paul Fertser
     [not found] ` <20120126135203.GM2267-0MSThuzptbI@public.gmane.org>
2012-01-27 16:19   ` Christian Smith
     [not found]     ` <20120127161921.GL750-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
2012-01-27 16:29       ` Gordan Bobic
2012-01-28 12:53   ` Martin Steigerwald
     [not found]     ` <201201281353.00537.Martin-3kZCPVa5dk2azgQtNeiOUg@public.gmane.org>
2012-02-01  6:06       ` Paul Fertser
2013-01-10 13:16   ` New experience with the odd " Paul Fertser
     [not found]     ` <20130110131659.GA29689-0MSThuzptbI@public.gmane.org>
2013-01-10 13:34       ` Vyacheslav Dubeyko
2013-01-10 13:49         ` Paul Fertser
     [not found]           ` <20130110134907.GB29689-0MSThuzptbI@public.gmane.org>
2013-01-10 14:00             ` Vyacheslav Dubeyko
2013-01-10 14:12               ` Paul Fertser
2013-06-30  7:44         ` Paul Fertser
     [not found]           ` <20130630074404.GI22224-0MSThuzptbI@public.gmane.org>
2013-06-30  7:56             ` Paul Fertser
     [not found]               ` <20130630075626.GJ22224-0MSThuzptbI@public.gmane.org>
2013-06-30 11:49                 ` Vyacheslav Dubeyko
