From: Anders Olofsson <pingu@mazeda.se>
To: Richard Weinberger <richard@nod.at>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: UBI: Race between fastmap_write and wear_leveling_worker
Date: Thu, 25 Aug 2016 10:32:09 +0200
Message-ID: <d05b2f1d-8fb7-d660-2855-642add16ef91@mazeda.se>
In-Reply-To: <34d645bd-3c72-5d93-dd26-3215db9f92b9@nod.at>

On 2016-08-25 09:38, Richard Weinberger wrote:
> On 25.08.2016 08:52, Anders Olofsson wrote:
>> On 2016-08-24 17:04, Richard Weinberger wrote:
>>> Anders,
>>>
>>> On Wed, Aug 24, 2016 at 1:37 PM, Anders Olofsson <pingu@mazeda.se> wrote:
>>>> After enabling fastmap I sometimes get the following warning at boot:
>>>>
>>>
>>> Hehe, you're lucky I've recently fixed an issue in this area, can you
>>> please try:
>>> http://lists.infradead.org/pipermail/linux-mtd/2016-August/068919.html
>>>
>>> I did these fixes on top of an rather old customer kernel and started
>>> upstreaming
>>> them.
>>>
>>
>> Tested it and from what I can tell it solves my problem as well. I've run a bunch of reboots and the wear leveling worker no longer runs while the fastmap is being updated.
>>
>> Good work and thanks a lot for solving it so quickly.
>
> How do you test? I wonder how you can trigger this so easily.
> The said patch emerged while a customer did excessive Fastmap testing
> and the race appeared only once. I found it while staring at the code.

I don't know what I'm doing that makes my system special. I can only
guess it's related to the size of the UBI partition, since it only
happens on the smaller of the two partitions we use (160 PEBs vs. 1830
in the larger partition, where I've never seen this happen).

With only 160 PEBs, the WL pool consists of only 4 PEBs, in case that
is a clue to the behavior I describe below.
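
A rough sketch of where the 4 may come from. This assumes the fastmap
pool is sized as 5% of the PEB count (integer math), clamped to a
minimum of 8 and a maximum of 256, and that the WL pool is half of
that. The constants and function names below are illustrative
assumptions, not copied from the driver:

```c
#include <assert.h>

/* Assumed limits for the fastmap pool; illustrative values only. */
#define FM_MIN_POOL_SIZE 8
#define FM_MAX_POOL_SIZE 256

/* Fastmap pool size: 5% of the PEB count (integer math), then clamped. */
static int fm_pool_size(int peb_count)
{
	int size;

	assert(peb_count > 0);
	size = (peb_count / 100) * 5;
	if (size > FM_MAX_POOL_SIZE)
		size = FM_MAX_POOL_SIZE;
	if (size < FM_MIN_POOL_SIZE)
		size = FM_MIN_POOL_SIZE;
	return size;
}

/* The WL pool is assumed to be half of the fastmap pool. */
static int wl_pool_size(int peb_count)
{
	return fm_pool_size(peb_count) / 2;
}
```

Under these assumptions the 160 PEB partition hits the minimum clamp,
so its WL pool is far smaller than the one on the 1830 PEB partition.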

If size is the key, then the setup is a 20MB partition with an 8MB UBIFS
volume in it, and the only thing I need to do to trigger this is to
attach the partition and mount the filesystem. I think my system may
also do some small writes to a file in the filesystem, but it is mostly
just reading. A clean reboot and a power cycle seem to trigger the fault
equally well.

What I have seen is that at every boot, the wear leveling worker wants
to relocate one PEB and always fails. The source PEB varies, but the
target PEB is always the first one from the WL pool. The relocation
fails either because the source PEB is unused or because it is locked.
In both cases the worker erases the destination PEB, and that erase was
happening while the fastmap was being updated.

This by itself sounds like a bug somewhere: there should be no need to
erase the destination PEB when the wear leveling was aborted before
anything was written. Since it is always the same PEB, the result is
that this PEB gets a much higher erase count than the other PEBs in the
partition.
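
To illustrate the point about the erase count, here is a minimal model
of the two policies. It is only a sketch: struct peb and every function
name are hypothetical, and whether the real worker can safely skip the
erase is an open question, not something shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PEB; not the driver's data structure. */
struct peb {
	unsigned int erase_count;
	bool written;	/* true once data has been copied into it */
};

static void erase_peb(struct peb *p)
{
	p->erase_count++;
	p->written = false;
}

/* Behavior described above: an aborted move always erases the target. */
static void abort_move_always_erase(struct peb *target)
{
	erase_peb(target);
}

/* Suggested alternative: erase only if something was written. */
static void abort_move_erase_if_written(struct peb *target)
{
	if (target->written)
		erase_peb(target);
}

/* Simulate a number of boots, each aborting one move before any write. */
static unsigned int erases_after_boots(int boots, bool always_erase)
{
	struct peb target = { 0, false };

	assert(boots >= 0);
	for (int i = 0; i < boots; i++) {
		if (always_erase)
			abort_move_always_erase(&target);
		else
			abort_move_erase_if_written(&target);
	}
	return target.erase_count;
}
```

In this model the always-erase policy costs the same target PEB one
erase per boot, while the erase-if-written policy costs it none.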

The wear leveling always seems to happen right after attaching, and the
fastmap is also always rewritten at that time. From what I've understood
of the fastmap logic so far, I don't see why it needs to update the map
at every boot, but it does on my partition. Since both of these happen
at the same time, the race occurs often enough to be visible as more
than just a small glitch.

This behavior is of course the same with your patch. The only difference
is that the wear leveling worker isn't allowed to run until the fastmap
update has completed.
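
The ordering I'm describing can be modelled roughly as below. This is
not how the patch is implemented (I have not reproduced its locking
here); it is a user-space sketch with C11 atomics, and all names are
hypothetical. One try-lock is shared by the fastmap update and the
worker, so the worker defers its work while an update holds the lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Single lock shared by the fastmap update and the WL worker. */
static atomic_flag fm_wl_lock = ATOMIC_FLAG_INIT;

static bool fm_wl_trylock(void)
{
	/* test_and_set returns the old value: false means we got the lock. */
	return !atomic_flag_test_and_set_explicit(&fm_wl_lock,
						  memory_order_acquire);
}

static void fm_wl_unlock(void)
{
	atomic_flag_clear_explicit(&fm_wl_lock, memory_order_release);
}

/* Returns true if the fastmap update may proceed. */
static bool fastmap_update_begin(void)
{
	return fm_wl_trylock();
}

static void fastmap_update_end(void)
{
	fm_wl_unlock();
}

/* Returns true if the WL work ran, false if it was deferred. */
static bool wl_worker_run_once(int *moves_done)
{
	assert(moves_done != NULL);
	if (!fm_wl_trylock())
		return false;	/* fastmap update in progress, retry later */
	(*moves_done)++;
	fm_wl_unlock();
	return true;
}
```

A worker that gets false back would simply be rescheduled, which
matches what I see: the same relocation attempt, only later.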

I did notice the fault happening more easily while I was debugging, so
having a lot of debug prints in the code made the race window larger.
Even before adding any prints, though, I still got this in at least 1
out of 10 boots on the multi-core systems.

> But it is good to see that finally after years embedded Folks start
> using Fastmap and non-obvious issues can get sorted out.

I'm working on an embedded system where boot times are becoming more and
more important. Using fastmap removes a whole second from our total boot
time (half in boot loader and half in kernel) so this was definitely a
good feature for us.

/Anders