All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: a UBIFS image makes task pdflush blocked > 120 seconds
       [not found] <4B012D0D.4080500@aimvalley.nl>
@ 2009-11-16 12:12 ` Joakim Tjernlund
  2009-11-17  8:25 ` Artem Bityutskiy
  1 sibling, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-16 12:12 UTC (permalink / raw)
  To: N. van Bolhuis; +Cc: linux-mtd, dedekind1



"N. van Bolhuis" <nvbolhuis@aimvalley.nl> wrote on 16/11/2009 11:44:29:
>
>
> > Fair enough. However, this is not UBI/UBIFS which blocks everything.
> > This MTD chip lock that locks everything and is contended at. The right
> > fix would be improve MTD/CFI and teach them to be asynchronous and drop
> > the chip lock while the eraseing is in progress, allowing other tasks to
> > do I/O on other blocks. However, I do not know is it possible at all.
> >
> > I mean, how UBI/UBIFS can help when:
> >
> > 1. Erase takes long time (how long? 0.2-0.5 sec?)
> > 2. There are huge number of eraseblocks to erase in the queue.
> > 3. All I/O is contended on one locks, and everything block while one
> >    eraseblock is erased.
> >
> > How can UBI/UBIFS make things really fly fast?
> >
>
> true, UBI/UBIFS can't really fix this.
> Btw. the sector erase time for our NOR flash (S29GL01GP) is
> 0.7-0.8 seconds.
>
> > The only way to help from UBI side I see is to lessen the UBI background
> > thread's priority - you could try this. Or for the background
> > formatting, have a 'yield()' call in the UBI background thread.
> >
> > Or we could have a 'nor_erase_quirk()' in the UBI background thread,
> > which would call yield() if the previous operation was "erase", and the
> > next one is also "erase". You could try this.
> >
> > Do you have any other suggestions?
>
> No. Let's try the 'nor_erase_quirk()' solution.
> yield() will send the current process to expired list
> which means the other processes will be activated for sure, right ?
> I think this will work fine.
> It sounds like you already have an idea how to implement this,
> can you send a patch ? I will test it.
>
>
> The best solution is to use erase-suspend (if your chip supports it).
> Our chip supports read+write erase-suspend (cfip->EraseSuspend = 0x2).
> But as Joakim already mentioned cfi_cmdset_0002.c disables erase-suspend
> for writing. Maybe this is because of line 710 in the below code:
>
>   703                 if (mode == FL_WRITING) /* FIXME:
> Erase-suspend-program appears broken. */
>   704                         goto sleep;
>   705
>   706                 if (!(   mode == FL_READY
>   707                       || mode == FL_POINT
>   708                       || !cfip
>   709                       || (mode == FL_WRITING &&
> (cfip->EraseSuspend & 0x2))
>   710                       || (mode == FL_WRITING &&
> (cfip->EraseSuspend & 0x1)
>   711                     )))
>   712                         goto sleep;
>   713
>   714                 /* We could check to see if we're trying to access
> the sector
>
>
> As has been mentioned earlier line 710 should probably be removed.
> Erase-suspend shouldn't be done if read-only erase-suspend is supported
> (cfip->EraseSuspend = 0x1).
>
> I can remove line 703-705 + 710 and test this.
> I wonder though, if this works, is this enough "solid ground" to enable
> erase-suspend for cmdset_002 ?

The test on line 706 is broken, I posted a semi complete patch
correcting this:
if (!cfip || !(cfip->EraseSuspend & (0x1|0x2)) ||
    !(mode == FL_READY || mode == FL_POINT ||
     (mode == FL_WRITING && (cfip->EraseSuspend & 0x2))))

Then remove the
  if (mode == FL_WRITING) /* FIXME: Erase-suspend-program appears broken. */
       goto sleep;
to enable Erase-suspend-program

Find in the archives heading "don't suspend erase for erase in cfi_cmdset_0002"
for details.

       Jocke

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
       [not found] <4B012D0D.4080500@aimvalley.nl>
  2009-11-16 12:12 ` a UBIFS image makes task pdflush blocked > 120 seconds Joakim Tjernlund
@ 2009-11-17  8:25 ` Artem Bityutskiy
  2009-11-17 16:25   ` Norbert van Bolhuis
  1 sibling, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-17  8:25 UTC (permalink / raw)
  To: N. van Bolhuis; +Cc: linux-mtd, joakim.tjernlund

On Mon, 2009-11-16 at 11:44 +0100, N. van Bolhuis wrote:
> > The only way to help from UBI side I see is to lessen the UBI background
> > thread's priority - you could try this. Or for the background
> > formatting, have a 'yield()' call in the UBI background thread.
> > 
> > Or we could have a 'nor_erase_quirk()' in the UBI background thread,
> > which would call yield() if the previous operation was "erase", and the
> > next one is also "erase". You could try this.
> > 
> > Do you have any other suggestions?
> 
> No. Let's try the 'nor_erase_quirk()' solution.

Thinking about this more, I do not like using 'yield()'. JFFS2 had it in
the background thread, and there were many discussions about this, it
hurt one group of people, and helps another group. But it is hacky,
because depending on kernel version, 'yield()' may have different
effects. I do not like it.

Could you just try lowering the UBI bgt priority, and see if it helps?
I would not want to go for any yielding unless you report it does not
help.

Just renice the BGT and see what happens, and report back please.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-17  8:25 ` Artem Bityutskiy
@ 2009-11-17 16:25   ` Norbert van Bolhuis
  2009-11-18  8:28     ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-17 16:25 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, joakim.tjernlund



I tested all mentioned solutions. It turns out only the "erase-suspend option"
helps.


lowering the UBI BG task priority somehow causes the application to block until
the UBI BG task finishes erasing *all* NOR blocks. I guess the application is blocked
in get_chip and never gets activated by the scheduler.
I do not understand why this happens. How can a lower UBI BG prio make the application
block for so long ?
Maybe I implemented it wrong, I added line 1359.
1354 int ubi_thread(void *u)
1355 {
1356         int failures = 0;
1357         struct ubi_device *ubi = u;
1358
1359         set_user_nice(current, 19);
1360
1361         ubi_msg("background thread \"%s\" started, PID %d",
1362                 ubi->bgt_name, task_pid_nr(current));
I did see the ubi_bg prio = 139 (lowest possible prio), it used to be 115.
I also tried 135, same result.
Anyway, this doesn't seem to work.


Using yield() in the UBI BG task doesn't seem to do much. It for sure doesn't cause
the application to become active after a (couple of) erase operations. The log shows:
LE:0197, TIME=0026.276, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0198, TIME=0026.868, pid=1237, M(xy=04)=commit_semR, make_reservation, Req DOWN 1
LE:0199, TIME=0026.868, pid=1237, M(xy=05)=commit_semR, make_reservation, DOWN 1
LE:0200, TIME=0026.868, pid=1237, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
LE:0201, TIME=0026.868, pid=1237, M(xy=03)=io_mutex, reserve_space, LOCK 1
LE:0202, TIME=0027.104, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0203, TIME=0027.104, pid=0918, M(xy=0c)=yield
LE:0204, TIME=0027.104, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0205, TIME=0027.876, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0206, TIME=0027.876, pid=0918, M(xy=0c)=yield
LE:0207, TIME=0027.876, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0208, TIME=0028.672, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0209, TIME=0028.672, pid=0918, M(xy=0c)=yield
   <omitted many do_erase_oneblock END/START + yield entries>
LE:0294, TIME=0051.132, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0295, TIME=0051.944, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0296, TIME=0051.944, pid=0918, M(xy=0c)=yield
LE:0297, TIME=0051.944, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0298, TIME=0052.080, pid=0108, M(xy=04)=commit_semR, make_reservation, Req DOWN 1
LE:0299, TIME=0052.080, pid=0108, M(xy=05)=commit_semR, make_reservation, DOWN 1
LE:0300, TIME=0052.080, pid=0108, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
   <omitted many do_erase_oneblock END/START + yield entries>
LE:0405, TIME=0080.652, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0406, TIME=0081.536, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0407, TIME=0081.536, pid=0918, M(xy=0c)=yield
LE:0408, TIME=0081.536, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0409, TIME=0082.332, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0410, TIME=0082.332, pid=0918, M(xy=0c)=yield
LE:0411, TIME=0082.332, pid=1237, M(xy=06)=io_mutex, release_head, Req UNLOCK
LE:0412, TIME=0082.332, pid=0108, M(xy=03)=io_mutex, reserve_space, LOCK 1
LE:0413, TIME=0082.332, pid=0108, M(xy=06)=io_mutex, release_head, Req UNLOCK
LE:0414, TIME=0082.332, pid=0108, M(xy=07)=io_mutex, release_head, UNLOCK
LE:0415, TIME=0082.332, pid=0108, M(xy=08)=commit_semR, finish_reservation, Req UP
LE:0416, TIME=0082.332, pid=0108, M(xy=09)=commit_semR, finish_reservation, UP
As can be seen it still takes ~ 55 seconds before the application (pid=1237) and pdflush
(pid=0108) can continue.


Using the erase-suspend feature works great. The application creates and writes
files without any delay. pdflush isn't delayed either.
The difference is that the UBI BG thread erased only 200 blocks after our application
is completely up and running while this used to be ~ 1000. This is as expected and
a logical consequence of the erase-suspend.


So as far as I'm concerned this problem has been solved. I'm gonna use the
erase-suspend feature in the way Joakim suggested.
I will do a few additional erase-suspend related tests (since it looks like I'm one of the
first users for cmdset_002) and report problems to this list (as usual).

thanks for all help.

Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-17 16:25   ` Norbert van Bolhuis
@ 2009-11-18  8:28     ` Artem Bityutskiy
  2009-11-18  9:26       ` Norbert van Bolhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-18  8:28 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd, joakim.tjernlund

On Tue, 2009-11-17 at 17:25 +0100, Norbert van Bolhuis wrote:
> 
> I tested all mentioned solutions. It turns out only the "erase-suspend option"
> helps.
> 
> 
> lowering the UBI BG task priority somehow causes the application to block until
> the UBI BG task finishes erasing *all* NOR blocks. I guess the application is blocked
> in get_chip and never gets activated by the scheduler.
> I do not understand why this happens. How can a lower UBI BG prio make the application
> block for so long ?
> Maybe I implemented it wrong, I added line 1359.
> 1354 int ubi_thread(void *u)
> 1355 {
> 1356         int failures = 0;
> 1357         struct ubi_device *ubi = u;
> 1358
> 1359         set_user_nice(current, 19);
> 1360
> 1361         ubi_msg("background thread \"%s\" started, PID %d",
> 1362                 ubi->bgt_name, task_pid_nr(current));
> I did see the ubi_bg prio = 139 (lowest possible prio), it used to be 115.
> I also tried 135, same result.
> Anyway, this doesn't seem to work.
> 
> 
> Using yield() in the UBI BG task doesn't seem to do much. It for sure doesn't cause
> the application to become active after a (couple of) erase operations. The log shows:
> LE:0197, TIME=0026.276, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0198, TIME=0026.868, pid=1237, M(xy=04)=commit_semR, make_reservation, Req DOWN 1
> LE:0199, TIME=0026.868, pid=1237, M(xy=05)=commit_semR, make_reservation, DOWN 1
> LE:0200, TIME=0026.868, pid=1237, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
> LE:0201, TIME=0026.868, pid=1237, M(xy=03)=io_mutex, reserve_space, LOCK 1
> LE:0202, TIME=0027.104, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0203, TIME=0027.104, pid=0918, M(xy=0c)=yield
> LE:0204, TIME=0027.104, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0205, TIME=0027.876, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0206, TIME=0027.876, pid=0918, M(xy=0c)=yield
> LE:0207, TIME=0027.876, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0208, TIME=0028.672, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0209, TIME=0028.672, pid=0918, M(xy=0c)=yield
>    <omitted many do_erase_oneblock END/START + yield entries>
> LE:0294, TIME=0051.132, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0295, TIME=0051.944, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0296, TIME=0051.944, pid=0918, M(xy=0c)=yield
> LE:0297, TIME=0051.944, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0298, TIME=0052.080, pid=0108, M(xy=04)=commit_semR, make_reservation, Req DOWN 1
> LE:0299, TIME=0052.080, pid=0108, M(xy=05)=commit_semR, make_reservation, DOWN 1
> LE:0300, TIME=0052.080, pid=0108, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
>    <omitted many do_erase_oneblock END/START + yield entries>
> LE:0405, TIME=0080.652, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0406, TIME=0081.536, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0407, TIME=0081.536, pid=0918, M(xy=0c)=yield
> LE:0408, TIME=0081.536, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0409, TIME=0082.332, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0410, TIME=0082.332, pid=0918, M(xy=0c)=yield
> LE:0411, TIME=0082.332, pid=1237, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0412, TIME=0082.332, pid=0108, M(xy=03)=io_mutex, reserve_space, LOCK 1
> LE:0413, TIME=0082.332, pid=0108, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0414, TIME=0082.332, pid=0108, M(xy=07)=io_mutex, release_head, UNLOCK
> LE:0415, TIME=0082.332, pid=0108, M(xy=08)=commit_semR, finish_reservation, Req UP
> LE:0416, TIME=0082.332, pid=0108, M(xy=09)=commit_semR, finish_reservation, UP
> As can be seen it still takes ~ 55 seconds before the application (pid=1237) and pdflush
> (pid=0108) can continue.
> 
> 
> Using the erase-suspend feature works great. The application creates and writes
> files without any delay. pdflush isn't delayed either.
> The difference is that the UBI BG thread erased only 200 blocks after our application
> is completely up and running while this used to be ~ 1000. This is as expected and
> a logical consequence of the erase-suspend.
> 
> 
> So as far as I'm concerned this problem has been solved. I'm gonna use the
> erase-suspend feature in the way Joakim suggested.
> I will do a few additional erase-suspend related tests (since it looks like I'm one of the
> first users for cmdset_002) and report problems to this list (as usual).
> 
> thanks for all help.

The best way to say thanks is:

1. Add a small documentation section about your case to UBIFS howto, so
others could benefit. You could send a patch against the MTD web-site:
http://git.infradead.org/mtd-www.git

2. Make your solution work in upstream out-of-the box.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18  8:28     ` Artem Bityutskiy
@ 2009-11-18  9:26       ` Norbert van Bolhuis
  2009-11-18  9:40         ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-18  9:26 UTC (permalink / raw)
  To: dedekind1, joakim.tjernlund; +Cc: linux-mtd

Artem Bityutskiy wrote:

> 
> The best way to say thanks is:
> 
> 1. Add a small documentation section about your case to UBIFS howto, so
> others could benefit. You could send a patch against the MTD web-site:
> http://git.infradead.org/mtd-www.git
> 

Ok, I'll do this.

> 2. Make your solution work in upstream out-of-the box.
> 

The solution is to enable erase-suspend for writing for which
Joakim already sent earlier patches.
So if it's OK with him I will send the patch for cmdset_0002.

thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18  9:26       ` Norbert van Bolhuis
@ 2009-11-18  9:40         ` Artem Bityutskiy
  2009-11-18 10:38           ` Joakim Tjernlund
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-18  9:40 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd, joakim.tjernlund

On Wed, 2009-11-18 at 10:26 +0100, Norbert van Bolhuis wrote:
> Artem Bityutskiy wrote:
> 
> > 
> > The best way to say thanks is:
> > 
> > 1. Add a small documentation section about your case to UBIFS howto, so
> > others could benefit. You could send a patch against the MTD web-site:
> > http://git.infradead.org/mtd-www.git
> > 
> 
> Ok, I'll do this.

Cool. I guess a section to this page would work well:
http://www.linux-mtd.infradead.org/faq/ubifs.html

> > 2. Make your solution work in upstream out-of-the box.
> > 
> 
> The solution is to enable erase-suspend for writing for which
> Joakim already sent earlier patches.
> So if it's OK with him I will send the patch for cmdset_0002.

Why his patch is not in upstream yet? I think he mentioned some issues
with modules.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18  9:40         ` Artem Bityutskiy
@ 2009-11-18 10:38           ` Joakim Tjernlund
  2009-11-18 10:54             ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-18 10:38 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Norbert van Bolhuis

Artem Bityutskiy <dedekind1@gmail.com> wrote on 18/11/2009 10:40:24:
>
> On Wed, 2009-11-18 at 10:26 +0100, Norbert van Bolhuis wrote:
> > Artem Bityutskiy wrote:
> >
> > >
> > > The best way to say thanks is:
> > >
> > > 1. Add a small documentation section about your case to UBIFS howto, so
> > > others could benefit. You could send a patch against the MTD web-site:
> > > http://git.infradead.org/mtd-www.git
> > >
> >
> > Ok, I'll do this.
>
> Cool. I guess a section to this page would work well:
> http://www.linux-mtd.infradead.org/faq/ubifs.html
>
> > > 2. Make your solution work in upstream out-of-the box.
> > >
> >
> > The solution is to enable erase-suspend for writing for which
> > Joakim already sent earlier patches.
> > So if it's OK with him I will send the patch for cmdset_0002.

I am fine with that, I sent them earlier because I wanted someone
with cmdset_0002 to pick them up, test and submit.

>
> Why his patch is not in upstream yet? I think he mentioned some issues
> with modules.

Nope, that was a another JFFS2 issue. Not related to this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18 10:38           ` Joakim Tjernlund
@ 2009-11-18 10:54             ` Artem Bityutskiy
  2009-11-18 10:59               ` Norbert van Bolhuis
  2009-11-18 11:01               ` Joakim Tjernlund
  0 siblings, 2 replies; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-18 10:54 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linux-mtd, Norbert van Bolhuis

On Wed, 2009-11-18 at 11:38 +0100, Joakim Tjernlund wrote:
> Artem Bityutskiy <dedekind1@gmail.com> wrote on 18/11/2009 10:40:24:
> >
> > On Wed, 2009-11-18 at 10:26 +0100, Norbert van Bolhuis wrote:
> > > Artem Bityutskiy wrote:
> > >
> > > >
> > > > The best way to say thanks is:
> > > >
> > > > 1. Add a small documentation section about your case to UBIFS howto, so
> > > > others could benefit. You could send a patch against the MTD web-site:
> > > > http://git.infradead.org/mtd-www.git
> > > >
> > >
> > > Ok, I'll do this.
> >
> > Cool. I guess a section to this page would work well:
> > http://www.linux-mtd.infradead.org/faq/ubifs.html
> >
> > > > 2. Make your solution work in upstream out-of-the box.
> > > >
> > >
> > > The solution is to enable erase-suspend for writing for which
> > > Joakim already sent earlier patches.
> > > So if it's OK with him I will send the patch for cmdset_0002.
> 
> I am fine with that, I sent them earlier because I wanted someone
> with cmdset_0002 to pick them up, test and submit.
> 
> >
> > Why his patch is not in upstream yet? I think he mentioned some issues
> > with modules.
> 
> Nope, that was a another JFFS2 issue. Not related to this.

I'm do not know CFI, but if you or Norbet re-submit the patch with CC to
Nicolas, put some good description which problems it solves, tell that
you tested it, I can help merging it by putting it to my l2-mtd-2.6.git.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18 10:54             ` Artem Bityutskiy
@ 2009-11-18 10:59               ` Norbert van Bolhuis
  2009-11-18 11:01               ` Joakim Tjernlund
  1 sibling, 0 replies; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-18 10:59 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Joakim Tjernlund

Artem Bityutskiy wrote:

>>>>>
>>>> The solution is to enable erase-suspend for writing for which
>>>> Joakim already sent earlier patches.
>>>> So if it's OK with him I will send the patch for cmdset_0002.
>> I am fine with that, I sent them earlier because I wanted someone
>> with cmdset_0002 to pick them up, test and submit.
>>
>>> Why his patch is not in upstream yet? I think he mentioned some issues
>>> with modules.
>> Nope, that was a another JFFS2 issue. Not related to this.
> 
> I'm do not know CFI, but if you or Norbet re-submit the patch with CC to
> Nicolas, put some good description which problems it solves, tell that
> you tested it, I can help merging it by putting it to my l2-mtd-2.6.git.
> 

OK, I will do that.

thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-18 10:54             ` Artem Bityutskiy
  2009-11-18 10:59               ` Norbert van Bolhuis
@ 2009-11-18 11:01               ` Joakim Tjernlund
  1 sibling, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-18 11:01 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Norbert van Bolhuis

Artem Bityutskiy <dedekind1@gmail.com> wrote on 18/11/2009 11:54:37:
>
> On Wed, 2009-11-18 at 11:38 +0100, Joakim Tjernlund wrote:
> > Artem Bityutskiy <dedekind1@gmail.com> wrote on 18/11/2009 10:40:24:
> > >
> > > On Wed, 2009-11-18 at 10:26 +0100, Norbert van Bolhuis wrote:
> > > > Artem Bityutskiy wrote:
> > > >
> > > > >
> > > > > The best way to say thanks is:
> > > > >
> > > > > 1. Add a small documentation section about your case to UBIFS howto, so
> > > > > others could benefit. You could send a patch against the MTD web-site:
> > > > > http://git.infradead.org/mtd-www.git
> > > > >
> > > >
> > > > Ok, I'll do this.
> > >
> > > Cool. I guess a section to this page would work well:
> > > http://www.linux-mtd.infradead.org/faq/ubifs.html
> > >
> > > > > 2. Make your solution work in upstream out-of-the box.
> > > > >
> > > >
> > > > The solution is to enable erase-suspend for writing for which
> > > > Joakim already sent earlier patches.
> > > > So if it's OK with him I will send the patch for cmdset_0002.
> >
> > I am fine with that, I sent them earlier because I wanted someone
> > with cmdset_0002 to pick them up, test and submit.
> >
> > >
> > > Why his patch is not in upstream yet? I think he mentioned some issues
> > > with modules.
> >
> > Nope, that was a another JFFS2 issue. Not related to this.
>
> I'm do not know CFI, but if you or Norbet re-submit the patch with CC to
> Nicolas, put some good description which problems it solves, tell that
> you tested it, I can help merging it by putting it to my l2-mtd-2.6.git.

That would be Norberts task, I only pointed out and suggested a fix for a bug
I found in cmdset_0002, I don't use cmdset_0002 so I have no intention
do any more work in this area. I merely tried to help out

       Jocke

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-16 23:22     ` Jamie Lokier
  2009-11-17  8:31       ` Joakim Tjernlund
@ 2009-11-17 10:45       ` Norbert van Bolhuis
  1 sibling, 0 replies; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-17 10:45 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: nvbolhuis, linux-mtd, Joakim Tjernlund, dedekind1

Jamie Lokier wrote:

> According to datasheets, the flash I'm using with cmdset_0002 supports
> multi-block erase, although I don't think the driver does at the moment.
> 
> Multi-block erase works by giving the chip a list of blocks to erase,
> or even the whole flash, and then it does them simultaneously.
> 
> Could this solve your problem?
> 
> -- Jamie
> 


I doubt it.

Afaik multi-block erasure doesn't speed up things. So for n blocks you'd still have
to wait n x block-erase-time (in this case erase-suspend whould be even more important).
Besides, currently MTD/CFI do not support this.

thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-16 23:22     ` Jamie Lokier
@ 2009-11-17  8:31       ` Joakim Tjernlund
  2009-11-17 10:45       ` Norbert van Bolhuis
  1 sibling, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-17  8:31 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: nvbolhuis, linux-mtd, dedekind1

Jamie Lokier <jamie@shareable.org> wrote on 17/11/2009 00:22:28:
>
> Joakim Tjernlund wrote:
> > I only noticed a problem with deleting big files and then a reboot.
> > if you don't reboot, the system will work. Perhaps this is due
> > to the actual CFI driver? I am using cmdset_0001 and it is a bit more
> > evolved than the others. My NOR flash also supports erase suspend.
> > If you are using cmdset_0002 you should consider fixing it.
> > As I recall, erase suspend is disabled. There was a thread about it
> > a while ago were I noticed a bug in the driver which may be the reason
> > for erase suspend being disabled. I even posted some
> > fixes for this but I don't think anyone bothered to test them.
>
> According to datasheets, the flash I'm using with cmdset_0002 supports
> multi-block erase, although I don't think the driver does at the moment.
>
> Multi-block erase works by giving the chip a list of blocks to erase,
> or even the whole flash, and then it does them simultaneously.
>
> Could this solve your problem?

That is for Norbert to say, but I think the main feature that is needed
is the possibility to suspend an erase to do a write.

   Jocke

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-16  8:53   ` Joakim Tjernlund
@ 2009-11-16 23:22     ` Jamie Lokier
  2009-11-17  8:31       ` Joakim Tjernlund
  2009-11-17 10:45       ` Norbert van Bolhuis
  0 siblings, 2 replies; 27+ messages in thread
From: Jamie Lokier @ 2009-11-16 23:22 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: nvbolhuis, linux-mtd, dedekind1

Joakim Tjernlund wrote:
> I only noticed a problem with deleting big files and then a reboot.
> if you don't reboot, the system will work. Perhaps this is due
> to the actual CFI driver? I am using cmdset_0001 and it is a bit more
> evolved than the others. My NOR flash also supports erase suspend.
> If you are using cmdset_0002 you should consider fixing it.
> As I recall, erase suspend is disabled. There was a thread about it
> a while ago were I noticed a bug in the driver which may be the reason
> for erase suspend being disabled. I even posted some
> fixes for this but I don't think anyone bothered to test them.

According to datasheets, the flash I'm using with cmdset_0002 supports
multi-block erase, although I don't think the driver does at the moment.

Multi-block erase works by giving the chip a list of blocks to erase,
or even the whole flash, and then it does them simultaneously.

Could this solve your problem?

-- Jamie

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-16  8:13 ` Artem Bityutskiy
@ 2009-11-16  8:53   ` Joakim Tjernlund
  2009-11-16 23:22     ` Jamie Lokier
  0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-16  8:53 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, nvbolhuis

>
> On Sat, 2009-11-14 at 13:38 +0100, nvbolhuis@aimsys.nl wrote:
> > > The only reason I see why it would do this is because you attach an
> > > empty flash to UBI. In this case, UBI has to format it. And it is doing
> > > the formatting asynchronously, in the background thread.
> > >
> > > The idea was that we can allow using the UBI volume even if it is not
> > > fully formatted, and format in parallel. For NAND with its fast erase it
> > > works fine. With NOR it appears to be not so goot.
> > >
> > > Please, attach UBI, then wait until the UBI background thread is done,
> > > and then mount and start using UBIFS.
> > >
> > > The other option is to format the flash _properly_ before attaching it
> > > to UBI. In this case the background thread will not have to do that job.
> > > Use ubiformat for this:
> > >
> > >
> >
> > The slow NOR erase is exactly my point. I don't want to ubiformat
> > the entire NOR flash first. This takes 15-20 mins (for our 156 MB
> > NOR partition).
> > Time is very costly when our system becomes operational for the
> > first time. This is because it happens at manufacturing
> > (which performs some system wide tests). I don't want
> > manufacturing to wait for the entire NOR flash to be fully
> > formatted for each system to be produced.
> > I know UBI will format in parallel and I consider it a feature.
> > I just didn't expect it to block user-space applications.
>
> Fair enough. However, this is not UBI/UBIFS which blocks everything.
> This MTD chip lock that locks everything and is contended at. The right
> fix would be improve MTD/CFI and teach them to be asynchronous and drop
> the chip lock while the eraseing is in progress, allowing other tasks to
> do I/O on other blocks. However, I do not know is it possible at all.
>
> I mean, how UBI/UBIFS can help when:
>
> 1. Erase takes long time (how long? 0.2-0.5 sec?)
> 2. There are huge number of eraseblocks to erase in the queue.
> 3. All I/O is contended on one locks, and everything block while one
>    eraseblock is erased.
>
> How can UBI/UBIFS make things really fly fast?
>
> The only way to help from UBI side I see is to lessen the UBI background
> thread's priority - you could try this. Or for the background
> formatting, have a 'yield()' call in the UBI background thread.
>
> Or we could have a 'nor_erase_quirk()' in the UBI background thread,
> which would call yield() if the previous operation was "erase", and the
> next one is also "erase". You could try this.
>
> Do you have any other suggestions?
>
> > Won't user-space applications also be blocked whenever UBI needs
> > to erase several NOR PEBs for other reasons than the initial
> > parallel format. I'm thinking about replacing large files when UBIFS
> > is almost fully occupied.
>
> True. As Joakim pointed, this can happen if you delete many files.

I only noticed a problem with deleting big files and then a reboot.
if you don't reboot, the system will work. Perhaps this is due
to the actual CFI driver? I am using cmdset_0001 and it is a bit more
evolved than the others. My NOR flash also supports erase suspend.
If you are using cmdset_0002 you should consider fixing it.
As I recall, erase suspend is disabled. There was a thread about it
a while ago were I noticed a bug in the driver which may be the reason
for erase suspend being disabled. I even posted some
fixes for this but I don't think anyone bothered to test them.

     Jocke

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
       [not found] <34637.10.10.0.184.1258202287.squirrel@intranet.aimsys.nl>
@ 2009-11-16  8:13 ` Artem Bityutskiy
  2009-11-16  8:53   ` Joakim Tjernlund
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-16  8:13 UTC (permalink / raw)
  To: nvbolhuis; +Cc: linux-mtd

On Sat, 2009-11-14 at 13:38 +0100, nvbolhuis@aimsys.nl wrote:
> > The only reason I see why it would do this is because you attach an
> > empty flash to UBI. In this case, UBI has to format it. And it is doing
> > the formatting asynchronously, in the background thread.
> >
> > The idea was that we can allow using the UBI volume even if it is not
> > fully formatted, and format in parallel. For NAND with its fast erase it
> > works fine. With NOR it appears to be not so goot.
> >
> > Please, attach UBI, then wait until the UBI background thread is done,
> > and then mount and start using UBIFS.
> >
> > The other option is to format the flash _properly_ before attaching it
> > to UBI. In this case the background thread will not have to do that job.
> > Use ubiformat for this:
> >
> >
> 
> The slow NOR erase is exactly my point. I don't want to ubiformat
> the entire NOR flash first. This takes 15-20 mins (for our 156 MB
> NOR partition).
> Time is very costly when our system becomes operational for the
> first time. This is because it happens at manufacturing
> (which performs some system wide tests). I don't want
> manufacturing to wait for the entire NOR flash to be fully
> formatted for each system to be produced.
> I know UBI will format in parallel and I consider it a feature.
> I just didn't expect it to block user-space applications.

Fair enough. However, this is not UBI/UBIFS which blocks everything.
This MTD chip lock that locks everything and is contended at. The right
fix would be improve MTD/CFI and teach them to be asynchronous and drop
the chip lock while the eraseing is in progress, allowing other tasks to
do I/O on other blocks. However, I do not know is it possible at all.

I mean, how UBI/UBIFS can help when:

1. Erase takes long time (how long? 0.2-0.5 sec?)
2. There are huge number of eraseblocks to erase in the queue.
3. All I/O is contended on one locks, and everything block while one
   eraseblock is erased.

How can UBI/UBIFS make things really fly fast?

The only way to help from UBI side I see is to lessen the UBI background
thread's priority - you could try this. Or for the background
formatting, have a 'yield()' call in the UBI background thread.

Or we could have a 'nor_erase_quirk()' in the UBI background thread,
which would call yield() if the previous operation was "erase", and the
next one is also "erase". You could try this.

Do you have any other suggestions?

> Won't user-space applications also be blocked whenever UBI needs
> to erase several NOR PEBs for other reasons than the initial
> parallel format. I'm thinking about replacing large files when UBIFS
> is almost fully occupied.

True. As Joakim pointed, this can happen if you delete many files.

> I don't see why the "parallel format" (for NOR) needs to be done so
> actively so soon at current high priority.

Well, we did not put any extra code to be "active". We just have a
thread and a queue of works to be done, and the thread simply performs
the works one-by-one. Try to lessen the thread's priority. Find a way
which works for you, and then we can add a 'nor_priority_quirk()' to
UBI.

>  If the system reboots
> (soft or hard) UBI will continue parallel formatting where it left,
> right ?

Correct.

> Btw. does UBI really need to erase the already empty=erased NOR PEBs.
> Can't it just somehow detect empty PEBs and then only program it with
> the UBI headers.

The reason is that it is not easy to find out whether an eraseblock is
erased or not. We would need to read all eraseblocks and compare the
contents with 0xFF. And even this would be unreliable because unclean
reboots often lead to bitflips, so you may read an eraseblock as all
0xFF at first, and then when you read it the second time, you can find
that on or few bits are set to 0 there. This is why we need to re-erase
non-formatted eraseblocks (non-formatted = no erase counter).

But if you need, you can extend the mtd UBI_IOCATT ioctl with a flag
that tells UBI that the flash is for sure empty and does not require
re-erasing. Then if you are mistaken, you'll end up with nasty subtle
problems.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-13 15:52             ` Artem Bityutskiy
  2009-11-13 15:56               ` Artem Bityutskiy
@ 2009-11-13 16:28               ` Joakim Tjernlund
  1 sibling, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2009-11-13 16:28 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Norbert van Bolhuis

>
> On Fri, 2009-11-13 at 16:09 +0100, Norbert van Bolhuis wrote:
> > OK, so now I understand commit_sem is a RW semaphore which can be acquired multiple
> > times by readers but only once for writers. I failed to see that initially.
> > I think this problem is not related to the commit_sem though.
>
> OK.
>
> > I changed the logging a bit and added the PID info.
> > The problem is indeed caused by a third task holding the "MTD chip lock". This
> > blocks the application which is blocking pdflush.
>
> OK.
>
> > This 3rd task is the UBI background thread. It starts to erase many NOR PEBsvery soon.
>
> The only reason I see why it would do this is because you attach an
> empty flash to UBI. In this case, UBI has to format it. And it is doing
> the formatting asynchronously, in the background thread.
>
> The idea was that we can allow using the UBI volume even if it is not
> fully formatted, and format in parallel. For NAND with its fast erase it
> works fine. With NOR it appears to be not so goot.
>
> Please, attach UBI, then wait until the UBI background thread is done,
> and then mount and start using UBIFS.
>
> The other option is to format the flash _properly_ before attaching it
> to UBI. In this case the background thread will not have to do that job.

Erasing blocks should never block other stuff as long as there is free
space available.
I had a similar problem with JFFS2 and rebooting during erase.
Turns out that if you remove some big files
and then reboot, it will hang during remount RO/umount until all
flash sectors has been erased. I fixed the problem with the following
patch, but it was never accepted as it is racy w.r.t module unload.
Fortunately I don't use modules so I got no issues with it.

 Jocke

>From 3184883eab1f0703dedc0ed3aa23e1bd94693601 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 9 Nov 2007 13:48:08 +0100
Subject: [PATCH] [JFFS2] Stop erasing blocks when rebooting.

Rebooting shortly after deleting lots of big files
makes the reboot hang until all blocks has been erased,
which can take minutes if there are lots of blocks to erase.
This addresses the issue by moving the erasing to
pdflush_operation context and testing for the superblock
flags MS_RDONLY and MS_ACTIVE while erasing.
---
 fs/jffs2/erase.c |    8 ++++++++
 fs/jffs2/fs.c    |   11 ++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/fs/jffs2/erase.c b/fs/jffs2/erase.c
index a1db918..ec95a28 100644
--- a/fs/jffs2/erase.c
+++ b/fs/jffs2/erase.c
@@ -106,6 +106,7 @@ static void jffs2_erase_block(struct jffs2_sb_info *c,
 void jffs2_erase_pending_blocks(struct jffs2_sb_info *c, int count)
 {
 	struct jffs2_eraseblock *jeb;
+	struct super_block *sb = OFNI_BS_2SFFJ(c);

 	down(&c->erase_free_sem);

@@ -114,6 +115,13 @@ void jffs2_erase_pending_blocks(struct jffs2_sb_info *c, int count)
 	while (!list_empty(&c->erase_complete_list) ||
 	       !list_empty(&c->erase_pending_list)) {

+		if ((sb->s_flags & MS_RDONLY) || !(sb->s_flags & MS_ACTIVE)) {
+			spin_unlock(&c->erase_completion_lock);
+			up(&c->erase_free_sem);
+			D1(printk(KERN_DEBUG "FS readonly/inactive. "
+				  "jffs2_erase_pending_blocks leaving\n"));
+			goto done;
+		}
 		if (!list_empty(&c->erase_complete_list)) {
 			jeb = list_entry(c->erase_complete_list.next, struct jffs2_eraseblock, list);
 			list_del(&jeb->list);
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index ed85f9a..f102607 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/fs.h>
+#include <linux/writeback.h>	/* for erasing blocks */
 #include <linux/list.h>
 #include <linux/mtd/mtd.h>
 #include <linux/pagemap.h>
@@ -385,6 +386,14 @@ int jffs2_remount_fs (struct super_block *sb, int *flags, char *data)
 	return 0;
 }

+void do_start_erase(unsigned long sb_arg)
+{
+	struct super_block *sb = (struct super_block *) sb_arg;
+	struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
+
+	jffs2_erase_pending_blocks(c, 0);
+}
+
 void jffs2_write_super (struct super_block *sb)
 {
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
@@ -395,7 +404,7 @@ void jffs2_write_super (struct super_block *sb)

 	D1(printk(KERN_DEBUG "jffs2_write_super()\n"));
 	jffs2_garbage_collect_trigger(c);
-	jffs2_erase_pending_blocks(c, 0);
+	pdflush_operation(do_start_erase, (unsigned long)sb);
 	jffs2_flush_wbuf_gc(c, 0);
 }

--
1.6.4.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-13 15:52             ` Artem Bityutskiy
@ 2009-11-13 15:56               ` Artem Bityutskiy
  2009-11-13 16:28               ` Joakim Tjernlund
  1 sibling, 0 replies; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-13 15:56 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Fri, 2009-11-13 at 17:52 +0200, Artem Bityutskiy wrote:
> On Fri, 2009-11-13 at 16:09 +0100, Norbert van Bolhuis wrote:
> > OK, so now I understand commit_sem is a RW semaphore which can be acquired multiple
> > times by readers but only once for writers. I failed to see that initially.
> > I think this problem is not related to the commit_sem though.
> 
> OK.
> 
> > I changed the logging a bit and added the PID info.
> > The problem is indeed caused by a third task holding the "MTD chip lock". This
> > blocks the application which is blocking pdflush.
> 
> OK.
> 
> > This 3rd task is the UBI background thread. It starts to erase many NOR PEBs very soon.
> 
> The only reason I see why it would do this is because you attach an
> empty flash to UBI. In this case, UBI has to format it. And it is doing
> the formatting asynchronously, in the background thread.

And of course, it is trivial to implement a way for userspace "flush"
UBI work queue, which means you can open the UBI device, call 'fsync()',
and be blocked until it is formatted. It is not implemented, but it is
very easy to do.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-13 15:09           ` Norbert van Bolhuis
@ 2009-11-13 15:52             ` Artem Bityutskiy
  2009-11-13 15:56               ` Artem Bityutskiy
  2009-11-13 16:28               ` Joakim Tjernlund
  0 siblings, 2 replies; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-13 15:52 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Fri, 2009-11-13 at 16:09 +0100, Norbert van Bolhuis wrote:
> OK, so now I understand commit_sem is a RW semaphore which can be acquired multiple
> times by readers but only once for writers. I failed to see that initially.
> I think this problem is not related to the commit_sem though.

OK.

> I changed the logging a bit and added the PID info.
> The problem is indeed caused by a third task holding the "MTD chip lock". This
> blocks the application which is blocking pdflush.

OK.

> This 3rd task is the UBI background thread. It starts to erase many NOR PEBs very soon.

The only reason I see why it would do this is because you attach an
empty flash to UBI. In this case, UBI has to format it. And it is doing
the formatting asynchronously, in the background thread.

The idea was that we can allow using the UBI volume even if it is not
fully formatted, and format in parallel. For NAND with its fast erase it
works fine. With NOR it appears to be not so goot.

Please, attach UBI, then wait until the UBI background thread is done,
and then mount and start using UBIFS.

The other option is to format the flash _properly_ before attaching it
to UBI. In this case the background thread will not have to do that job.
Use ubiformat for this:

http://www.linux-mtd.infradead.org/faq/ubi.html#L_format_mtd

Please, confirm that you do not have this problem in that case.

> To be absolutely certain I also added log entries for the do_erase_oneblock
> function.
> To better reproduce this problem I decreased the warning threshold from 120 sec to 60 secs.
> 
> Here's the detailed info.
> 
> log entries (LE:xyz is the log entry number):
> ---------------------------------------------
> LE:0235, TIME=3984.244, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0236, TIME=3984.836, pid=1236, M(xy=04)=commit_sem, make_reservation, Req DOWN 1	<- application creating a file
> LE:0237, TIME=3984.836, pid=1236, M(xy=05)=commit_sem, make_reservation, DOWN 1
> LE:0238, TIME=3984.836, pid=1236, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
> LE:0239, TIME=3984.836, pid=1236, M(xy=03)=io_mutex, reserve_space, LOCK 1
> LE:0240, TIME=3985.100, pid=0918, M(xy=01)=do_erase_oneblock, END			<- ubi_bgt0d continuing erasing LEBs
> LE:0241, TIME=3985.100, pid=0918, M(xy=00)=do_erase_oneblock, START
>   <omitted many do_erase_oneblock END/START entries>
> LE:0275, TIME=3998.608, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0276, TIME=3999.256, pid=0108, M(xy=04)=commit_sem, make_reservation, Req DOWN 1
> LE:0277, TIME=3999.256, pid=0108, M(xy=05)=commit_sem, make_reservation, DOWN 1
> LE:0278, TIME=3999.256, pid=0108, M(xy=02)=io_mutex, reserve_space, Req LOCK 1		<- pdflush blocks here
> LE:0279, TIME=3999.444, pid=0918, M(xy=01)=do_erase_oneblock, END
>   <omitted many do_erase_oneblock END/START entries>
> LE:0433, TIME=4062.400, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0434, TIME=4062.400, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0435, TIME=4063.124, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0436, TIME=4063.124, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0437, TIME=4063.308, pid=0004, M(xy=0a)=pdflush error report
> LE:0438, TIME=4066.120, pid=0918, M(xy=01)=do_erase_oneblock, END
> LE:0439, TIME=4066.120, pid=0918, M(xy=00)=do_erase_oneblock, START
> LE:0440, TIME=4066.876, pid=1236, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0441, TIME=4066.876, pid=1236, M(xy=07)=io_mutex, release_head, UNLOCK		<- application unlocks io_mutex
> LE:0442, TIME=4066.876, pid=1236, M(xy=08)=commit_sem, finish_reservation, Req UP
> LE:0443, TIME=4066.876, pid=1236, M(xy=09)=commit_sem, finish_reservation, UP
> LE:0444, TIME=4066.880, pid=0108, M(xy=03)=io_mutex, reserve_space, LOCK 1		<- pdflush can continue
> LE:0445, TIME=4066.880, pid=0108, M(xy=06)=io_mutex, release_head, Req UNLOCK
> LE:0446, TIME=4066.880, pid=0108, M(xy=07)=io_mutex, release_head, UNLOCK
> LE:0447, TIME=4066.880, pid=0108, M(xy=08)=commit_sem, finish_reservation, Req UP
> LE:0448, TIME=4066.880, pid=0108, M(xy=09)=commit_sem, finish_reservation, UP
> 
> 
> 
> relevant processes
> ------------------
> pdflush       D 00000000     0   108      2
> Call Trace:
> [ceb51b50] [7f64db78] 0x7f64db78 (unreliable)
> [ceb51c10] [c0008b24] 0xc0008b24        __switch_to
> [ceb51c30] [c0291e40] 0xc0291e40        schedule
> [ceb51c80] [c0292d60] 0xc0292d60        __mutex_lock_slowpath
> [ceb51cb0] [c013d5fc] 0xc013d5fc        make_reservation
> [ceb51d20] [c013ddf8] 0xc013ddf8        ubifs_jnl_write_inode
> [ceb51d60] [c014531c] 0xc014531c        ubifs_write_inode
> [ceb51d80] [c01410ec] 0xc01410ec        ubifs_writepage
> [ceb51db0] [c005b124] 0xc005b124        __writepage
> [ceb51dc0] [c005b948] 0xc005b948        write_cache_pages
> [ceb51e60] [c005babc] 0xc005babc        do_writepages
> [ceb51e70] [c00a3638] 0xc00a3638        __writeback_single_inode
> [ceb51ec0] [c00a3bfc] 0xc00a3bfc        generic_sync_sb_inodes
> [ceb51f00] [c00a439c] 0xc00a439c        writeback_inodes
> [ceb51f20] [c005c7e0] 0xc005c7e0        wb_kupdate
> [ceb51f80] [c005cfdc] 0xc005cfdc        pdflush
> [ceb51fd0] [c003bd88] 0xc003bd88        kthread
> [ceb51ff0] [c00114a0] 0xc00114a0        kernel_thread
> 
> 
> ubi_bgt0d     R running      0   918      2
> Call Trace:
> [ce6e1b80] [00001032] 0x001032 (unreliable)
> [ce6e1c40] [c0008b24] 0xc0008b24        __switch_to
> [ce6e1c60] [c0291e40] 0xc0291e40        schedule
> [ce6e1cb0] [c02926d4] 0xc02926d4        schedule_timeout
> [ce6e1cf0] [c002f684] 0xc002f684        msleep
> [ce6e1d00] [c01ee900] 0xc01ee900        do_erase_oneblock
> [ce6e1d60] [c01e98a0] 0xc01e98a0        cfi_varsize_frob
> [ce6e1da0] [c01ecdc4] 0xc01ecdc4        cfi_amdstd_erase_varsize
> [ce6e1db0] [c01e30b8] 0xc01e30b8        part_erase
> [ce6e1dc0] [c01e5800] 0xc01e5800        concat_erase
> [ce6e1e20] [c01e30b8] 0xc01e30b8        part_erase
> [ce6e1e30] [c01f8134] 0xc01f8134        do_sync_erase
> [ce6e1ec0] [c01f83f0] 0xc01f83f0        ubi_io_sync_erase
> [ce6e1f40] [c01f9658] 0xc01f9658        erase_worker
> [ce6e1f70] [c01f8cc0] 0xc01f8cc0        do_work
> [ce6e1f90] [c01fa078] 0xc01fa078        ubi_thread
> [ce6e1fd0] [c003bd88] 0xc003bd88        kthread
> [ce6e1ff0] [c00114a0] 0xc00114a0        kernel_thread
> 
> mainctrl      D 0fbd3f00     0  1236   1215
> Call Trace:
> [ce7fd9f0] [00000001] 0x000001 (unreliable)
> [ce7fdab0] [c0008b24] 0xc0008b24        __switch_to
> [ce7fdad0] [c0291e40] 0xc0291e40        schedule
> [ce7fdb20] [c01ea090] 0xc01ea090        get_chip
> [ce7fdb90] [c01eb8dc] 0xc01eb8dc        cfi_amdstd_write_buffers
> [ce7fdc20] [c01e2da8] 0xc01e2da8        part_write
> [ce7fdc30] [c01e4128] 0xc01e4128        concat_write
> [ce7fdc70] [c01e2da8] 0xc01e2da8        part_write
> [ce7fdc80] [c01f7854] 0xc01f7854        ubi_io_write
> [ce7fdcb0] [c01f63d4] 0xc01f63d4        ubi_eba_write_leb
> [ce7fdd10] [c01f546c] 0xc01f546c        ubi_leb_write
> [ce7fdd30] [c0148c6c] 0xc0148c6c        ubifs_wbuf_write_nolock
> [ce7fdd70] [c013dd0c] 0xc013dd0c        write_head
> [ce7fdd90] [c013e9e0] 0xc013e9e0        ubifs_jnl_update
> [ce7fde00] [c0143030] 0xc0143030        ubifs_create
> [ce7fde50] [c008caf8] 0xc008caf8        vfs_create
> [ce7fde70] [c0090a20] 0xc0090a20        do_filp_open
> [ce7fdf20] [c008155c] 0xc008155c        do_sys_open
> [ce7fdf40] [c001167c] 0xc001167c        ret_from_syscall
> 
> 
> relevant process (priorities)
> -----------------------------
> F S UID        PID  SPID  PPID  C PRI  NI ADDR SZ  WCHAN STIME TTY          TIME CMD
> 1 S root       108   108     2  0  80   0 -     0  5cf64 22:16 ?        00:00:00 [pdflush]
> 1 S root       918   918     2  0  75  -5 -     0 1fa2dc 22:16 ?        00:00:00 [ubi_bgt0d]
> 0 S root      1236  1236  1221  0  90  10 - 12617  49354 22:17 ttyS1    00:00:00 mainctrl
> 
> 
> 
> So the problem is that the application holding the io_mutex is blocked by the ubi bg task
> because of the "MTD chip lock".

Right, and the task is formatting your flash. Formatting means that for
each eraseblock it
   1. erases (very slow on NOR!)
   2. writes the UBI headers.

Your write task (pid=1236) is then blocked in CFI. If it is unlucky, it
may even wait for several block erasures, and then be able to take the
lock.

> I'm happy I understand the problem now, I guess it's not *that* serious.
> It does cause unnecessary delays however which I'd like to get rid of.

Try the formatting.

> I guess it wouldn't make sense to somehow release the per-journalhead io_mutex whenever
> blocked by the "MTD chip lock" ?

> How can I prevent the UBI background thread to erase so many NOR LEBs so soon ?
> (I hope the only option is not by creating a 156MB UBIFS image (preprogrammed by manufacturing) for
>   our 156 MB NOR partition).

No. The option is to use ubiformat. You can use this utility for
formatting, and also for flashing images, in which case it will flash
data, and the rest of the eraseblocks will be formatted.

> Would it make sense to lower the UBI background thread task prio ?

That would be a hack. You can even add a 'yield()' there, that would
also be a hack.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-13  8:20         ` Artem Bityutskiy
@ 2009-11-13 15:09           ` Norbert van Bolhuis
  2009-11-13 15:52             ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-13 15:09 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

> 
> Try to see deeper at what the other task is doing, and store process ids
> as well.
> 
> HTH.
> 

first of all, thanks for your help with this.

please forget about "M=(t= ,xy= )", it's a internal log code which is not interesting.

OK, so now I understand commit_sem is a RW semaphore which can be acquired multiple
times by readers but only once for writers. I failed to see that initially.
I think this problem is not related to the commit_sem though.

I changed the logging a bit and added the PID info.
The problem is indeed caused by a third task holding the "MTD chip lock". This
blocks the application which is blocking pdflush.
This 3rd task is the UBI background thread. It starts to erase many NOR PEBs very soon.
To be absolutely certain I also added log entries for the do_erase_oneblock
function.
To better reproduce this problem I decreased the warning threshold from 120 sec to 60 secs.

Here's the detailed info.

log entries (LE:xyz is the log entry number):
---------------------------------------------
LE:0235, TIME=3984.244, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0236, TIME=3984.836, pid=1236, M(xy=04)=commit_sem, make_reservation, Req DOWN 1	<- application creating a file
LE:0237, TIME=3984.836, pid=1236, M(xy=05)=commit_sem, make_reservation, DOWN 1
LE:0238, TIME=3984.836, pid=1236, M(xy=02)=io_mutex, reserve_space, Req LOCK 1
LE:0239, TIME=3984.836, pid=1236, M(xy=03)=io_mutex, reserve_space, LOCK 1
LE:0240, TIME=3985.100, pid=0918, M(xy=01)=do_erase_oneblock, END			<- ubi_bgt0d continuing erasing LEBs
LE:0241, TIME=3985.100, pid=0918, M(xy=00)=do_erase_oneblock, START
  <omitted many do_erase_oneblock END/START entries>
LE:0275, TIME=3998.608, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0276, TIME=3999.256, pid=0108, M(xy=04)=commit_sem, make_reservation, Req DOWN 1
LE:0277, TIME=3999.256, pid=0108, M(xy=05)=commit_sem, make_reservation, DOWN 1
LE:0278, TIME=3999.256, pid=0108, M(xy=02)=io_mutex, reserve_space, Req LOCK 1		<- pdflush blocks here
LE:0279, TIME=3999.444, pid=0918, M(xy=01)=do_erase_oneblock, END
  <omitted many do_erase_oneblock END/START entries>
LE:0433, TIME=4062.400, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0434, TIME=4062.400, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0435, TIME=4063.124, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0436, TIME=4063.124, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0437, TIME=4063.308, pid=0004, M(xy=0a)=pdflush error report
LE:0438, TIME=4066.120, pid=0918, M(xy=01)=do_erase_oneblock, END
LE:0439, TIME=4066.120, pid=0918, M(xy=00)=do_erase_oneblock, START
LE:0440, TIME=4066.876, pid=1236, M(xy=06)=io_mutex, release_head, Req UNLOCK
LE:0441, TIME=4066.876, pid=1236, M(xy=07)=io_mutex, release_head, UNLOCK		<- application unlocks io_mutex
LE:0442, TIME=4066.876, pid=1236, M(xy=08)=commit_sem, finish_reservation, Req UP
LE:0443, TIME=4066.876, pid=1236, M(xy=09)=commit_sem, finish_reservation, UP
LE:0444, TIME=4066.880, pid=0108, M(xy=03)=io_mutex, reserve_space, LOCK 1		<- pdflush can continue
LE:0445, TIME=4066.880, pid=0108, M(xy=06)=io_mutex, release_head, Req UNLOCK
LE:0446, TIME=4066.880, pid=0108, M(xy=07)=io_mutex, release_head, UNLOCK
LE:0447, TIME=4066.880, pid=0108, M(xy=08)=commit_sem, finish_reservation, Req UP
LE:0448, TIME=4066.880, pid=0108, M(xy=09)=commit_sem, finish_reservation, UP



relevant processes
------------------
pdflush       D 00000000     0   108      2
Call Trace:
[ceb51b50] [7f64db78] 0x7f64db78 (unreliable)
[ceb51c10] [c0008b24] 0xc0008b24        __switch_to
[ceb51c30] [c0291e40] 0xc0291e40        schedule
[ceb51c80] [c0292d60] 0xc0292d60        __mutex_lock_slowpath
[ceb51cb0] [c013d5fc] 0xc013d5fc        make_reservation
[ceb51d20] [c013ddf8] 0xc013ddf8        ubifs_jnl_write_inode
[ceb51d60] [c014531c] 0xc014531c        ubifs_write_inode
[ceb51d80] [c01410ec] 0xc01410ec        ubifs_writepage
[ceb51db0] [c005b124] 0xc005b124        __writepage
[ceb51dc0] [c005b948] 0xc005b948        write_cache_pages
[ceb51e60] [c005babc] 0xc005babc        do_writepages
[ceb51e70] [c00a3638] 0xc00a3638        __writeback_single_inode
[ceb51ec0] [c00a3bfc] 0xc00a3bfc        generic_sync_sb_inodes
[ceb51f00] [c00a439c] 0xc00a439c        writeback_inodes
[ceb51f20] [c005c7e0] 0xc005c7e0        wb_kupdate
[ceb51f80] [c005cfdc] 0xc005cfdc        pdflush
[ceb51fd0] [c003bd88] 0xc003bd88        kthread
[ceb51ff0] [c00114a0] 0xc00114a0        kernel_thread


ubi_bgt0d     R running      0   918      2
Call Trace:
[ce6e1b80] [00001032] 0x001032 (unreliable)
[ce6e1c40] [c0008b24] 0xc0008b24        __switch_to
[ce6e1c60] [c0291e40] 0xc0291e40        schedule
[ce6e1cb0] [c02926d4] 0xc02926d4        schedule_timeout
[ce6e1cf0] [c002f684] 0xc002f684        msleep
[ce6e1d00] [c01ee900] 0xc01ee900        do_erase_oneblock
[ce6e1d60] [c01e98a0] 0xc01e98a0        cfi_varsize_frob
[ce6e1da0] [c01ecdc4] 0xc01ecdc4        cfi_amdstd_erase_varsize
[ce6e1db0] [c01e30b8] 0xc01e30b8        part_erase
[ce6e1dc0] [c01e5800] 0xc01e5800        concat_erase
[ce6e1e20] [c01e30b8] 0xc01e30b8        part_erase
[ce6e1e30] [c01f8134] 0xc01f8134        do_sync_erase
[ce6e1ec0] [c01f83f0] 0xc01f83f0        ubi_io_sync_erase
[ce6e1f40] [c01f9658] 0xc01f9658        erase_worker
[ce6e1f70] [c01f8cc0] 0xc01f8cc0        do_work
[ce6e1f90] [c01fa078] 0xc01fa078        ubi_thread
[ce6e1fd0] [c003bd88] 0xc003bd88        kthread
[ce6e1ff0] [c00114a0] 0xc00114a0        kernel_thread

mainctrl      D 0fbd3f00     0  1236   1215
Call Trace:
[ce7fd9f0] [00000001] 0x000001 (unreliable)
[ce7fdab0] [c0008b24] 0xc0008b24        __switch_to
[ce7fdad0] [c0291e40] 0xc0291e40        schedule
[ce7fdb20] [c01ea090] 0xc01ea090        get_chip
[ce7fdb90] [c01eb8dc] 0xc01eb8dc        cfi_amdstd_write_buffers
[ce7fdc20] [c01e2da8] 0xc01e2da8        part_write
[ce7fdc30] [c01e4128] 0xc01e4128        concat_write
[ce7fdc70] [c01e2da8] 0xc01e2da8        part_write
[ce7fdc80] [c01f7854] 0xc01f7854        ubi_io_write
[ce7fdcb0] [c01f63d4] 0xc01f63d4        ubi_eba_write_leb
[ce7fdd10] [c01f546c] 0xc01f546c        ubi_leb_write
[ce7fdd30] [c0148c6c] 0xc0148c6c        ubifs_wbuf_write_nolock
[ce7fdd70] [c013dd0c] 0xc013dd0c        write_head
[ce7fdd90] [c013e9e0] 0xc013e9e0        ubifs_jnl_update
[ce7fde00] [c0143030] 0xc0143030        ubifs_create
[ce7fde50] [c008caf8] 0xc008caf8        vfs_create
[ce7fde70] [c0090a20] 0xc0090a20        do_filp_open
[ce7fdf20] [c008155c] 0xc008155c        do_sys_open
[ce7fdf40] [c001167c] 0xc001167c        ret_from_syscall


relevant process (priorities)
-----------------------------
F S UID        PID  SPID  PPID  C PRI  NI ADDR SZ  WCHAN STIME TTY          TIME CMD
1 S root       108   108     2  0  80   0 -     0  5cf64 22:16 ?        00:00:00 [pdflush]
1 S root       918   918     2  0  75  -5 -     0 1fa2dc 22:16 ?        00:00:00 [ubi_bgt0d]
0 S root      1236  1236  1221  0  90  10 - 12617  49354 22:17 ttyS1    00:00:00 mainctrl



So the problem is that the application holding the io_mutex is blocked by the ubi bg task
because of the "MTD chip lock".

I'm happy I understand the problem now, I guess it's not *that* serious.
It does cause unnecessary delays however which I'd like to get rid of.

I guess it wouldn't make sense to somehow release the per-journalhead io_mutex whenever
blocked by the "MTD chip lock" ?

How can I prevent the UBI background thread to erase so many NOR LEBs so soon ?
(I hope the only option is not by creating a 156MB UBIFS image (preprogrammed by manufacturing) for
  our 156 MB NOR partition).

Would it make sense to lower the UBI background thread task prio ?


thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-11-11 15:54       ` Norbert van Bolhuis
@ 2009-11-13  8:20         ` Artem Bityutskiy
  2009-11-13 15:09           ` Norbert van Bolhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-11-13  8:20 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

Hi,

On Wed, 2009-11-11 at 16:54 +0100, Norbert van Bolhuis wrote:
> >> ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect().
> >> I'll report back.
> >>
> >> Btw. stackdump is the same (2 out of 2 times).
> > 
> > OK. I really do not have any idea off the top of my head now, sorry.
> > Try to investigate this a beet deeper.
> > 
> 
> 
> I investigated this. It turn out it's some kind of race condition between the pdflush
> deamon and other tasks (e.g. events). It has everything to do the the commit_sem
> and io_mutex. It's not related to my preprogrammed UBIFS image (i've seen it also after
> a big file was written to UBIFS and a sync was done "simultaneously") nor is it related
> to ubifs_garbage_collect().
> 
> Here's what I did and how I conclude the above.
> 
> I logged (in memory) all commit_sem, io_mutex and ubifs_garbage_collect actions
> and parse the results to human readable text afterwards.
> "Req" means mutex/semaphore requested. If there's no "Req" the operation
> has been done.
> 
> 
> The race condition occurs for the below UBIFS pseaudo code (which is done by many
> UBIFS functions):
> 
> 
>   err = make_reservation () {
>    down_read(&c->commit_sem)				/*** commit_sem, make_reservation, DOWN 1 ***/
>    err = reserve_space(c, jhead, len) {
>     ...
>   again:
>     mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);	/*** io_mutex, reserve_space, LOCK 1 ***/
>     ...
>     avail = c->leb_size - wbuf->offs - wbuf->used;
>     if (wbuf->lnum != -1 && avail >= len)
>      return 0;
>    }
>    if (!err)
>   	return 0;
>   }
>   /* the journal head stays locked and later has to be unlocked using release_head() */
>   if (err)
>    goto out_free;
>   ...
>   err = write_head(c, BASEHD, dent, len, &lnum, &dent_offs, sync);
>   if (err)
>    goto out_release;
>   ...
>   release_head(c, BASEHD) {
>    mutex_unlock(&c->jheads[jhead].wbuf.io_mutex);	/*** io_mutex, release_head, UNLOCK ***/
>   }
>   ...
>   finish_reservation(c) {
>    up_read(&c->commit_sem);				/*** commit_sem, finish_reservation, UP ***/
>   }
> 
> 
> 
> 
> Normally (for one "update" I guess) the logs shows:
> 
> TIME=0000.000, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
> TIME=0000.000, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
> TIME=0000.000, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
> TIME=0000.000, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
> TIME=0000.480, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
> TIME=0000.480, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
> TIME=0000.480, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
> TIME=0000.480, M(t=0,xy=88)=commit_sem, finish_reservation, UP

What are M(t= ,xy= ) ?
Would be very useful to save process ID (current()->pid) as well.

> 
> 
> 
> when "another task" is also updating the log shows:
> 
> TIME=0304.252, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
> TIME=0304.252, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
> TIME=0304.252, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
> TIME=0304.252, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
> TIME=0304.292, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
> TIME=0304.292, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?

Right, this is RW semaphore, and in journal reservation we lock it for
reading (down_read()), and there may be many readers at a time.

The only place this semaphore is locked for writing is the commit code,
where we prohibit all journal accesses for the first phase of commit.
May be there is some interference with down_write(commit_sem) ?

> TIME=0304.292, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
> TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
> TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
> TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
> TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP
> TIME=0304.736, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
> TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
> TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
> TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
> TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP
> 
> I really don't understand the commit_sem here. Is there a 2nd copy of commit_sem ?

See above.

> the very first time pdflush interfers the pdflush error report shows
> (for which there's also a log entry):
> 
> TIME=0021.644, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
> TIME=0021.644, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
> TIME=0021.644, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
> TIME=0021.644, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
> TIME=0034.316, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
> TIME=0034.316, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?
> TIME=0034.316, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1		<- pdflush blocks here
> TIME=0168.868, M(t=3,xy=00)=pdflush error report
> TIME=0302.012, M(t=3,xy=00)=pdflush error report

Please, save PIDs as well, it is very useful. So now we need to figure
out why it cannot take the mutex fro so long time. You should look
deeper to the first task, which took the mutex and blocks pdflush. What
that task is doing. For sure it writes something to the flash, which
goes down to UBI, then to MTD, then to your flash driver. And probably
somewhere there the task is blocked for long time?

> TIME=0303.280, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
> TIME=0303.280, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK		<- pdflush can continue
> TIME=0303.280, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
> TIME=0303.280, M(t=0,xy=88)=commit_sem, finish_reservation, UP
> TIME=0303.284, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1		<- pdflush continues here
> TIME=0303.284, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
> TIME=0303.284, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
> TIME=0303.284, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
> TIME=0303.284, M(t=0,xy=88)=commit_sem, finish_reservation, UP
> 
> 
> 
> I must admit I dont't see the root cause. I don't understand why the commit_sem
> doesn't seem to work.

Because it is R/W semaphore. In your investigations you looked at we
locks: the commit_sem (which is an R/W sem.) and io_mutex (which is a
normal mutex). Let's discuss what they protect.

1. commit_sem serializes all FS writers vs. the commit process. While
UBIFS has space in journals, tasks can just write. But once the journal
is full, we have to commit. When we commit, we need to stop all writers
for a short period of time (not for whole commit time). Thus, what we do
is that all journal writers take the commit_sem for reading, and the
commit function takes it for writing.

2. In UBIFS wa have several journal heads. We put different data to
different heads. And the io_mutex is per-head mutex. 2 tasks are allowed
to write at the same time to different heads. But if they want to write
to the same head, they are serialized by the io_mutex.

>  Nor do I understand why the "other task" needs to hold the
> io_mutex for more than 280 seconds.

You need to look deeper to that task. See where it goes next, and how it
spends some that time.

It might happen to be that by the time the other task releases io_mutex,
there is a third task which takes it, and by the time the third task
releases the mutex, there is a fourth task which takes it. So pdflush
waits very long. IOW, I think mutexes are not very fair.

However, I think that most likely the other task is just blocked for
long time somewhere in UBI or MTD level, for some reasons.

> If you see the reason or have any idea please let me know, otherwise I'll
> investigate this more thorough.

Try to see deeper at what the other task is doing, and store process ids
as well.

HTH.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-14  8:56     ` Artem Bityutskiy
@ 2009-11-11 15:54       ` Norbert van Bolhuis
  2009-11-13  8:20         ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-11-11 15:54 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd


>> ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect().
>> I'll report back.
>>
>> Btw. stackdump is the same (2 out of 2 times).
> 
> OK. I really do not have any idea off the top of my head now, sorry.
> Try to investigate this a beet deeper.
> 


I investigated this. It turn out it's some kind of race condition between the pdflush
deamon and other tasks (e.g. events). It has everything to do the the commit_sem
and io_mutex. It's not related to my preprogrammed UBIFS image (i've seen it also after
a big file was written to UBIFS and a sync was done "simultaneously") nor is it related
to ubifs_garbage_collect().

Here's what I did and how I conclude the above.

I logged (in memory) all commit_sem, io_mutex and ubifs_garbage_collect actions
and parse the results to human readable text afterwards.
"Req" means mutex/semaphore requested. If there's no "Req" the operation
has been done.


The race condition occurs for the below UBIFS pseaudo code (which is done by many
UBIFS functions):


  err = make_reservation () {
   down_read(&c->commit_sem)				/*** commit_sem, make_reservation, DOWN 1 ***/
   err = reserve_space(c, jhead, len) {
    ...
  again:
    mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);	/*** io_mutex, reserve_space, LOCK 1 ***/
    ...
    avail = c->leb_size - wbuf->offs - wbuf->used;
    if (wbuf->lnum != -1 && avail >= len)
     return 0;
   }
   if (!err)
  	return 0;
  }
  /* the journal head stays locked and later has to be unlocked using release_head() */
  if (err)
   goto out_free;
  ...
  err = write_head(c, BASEHD, dent, len, &lnum, &dent_offs, sync);
  if (err)
   goto out_release;
  ...
  release_head(c, BASEHD) {
   mutex_unlock(&c->jheads[jhead].wbuf.io_mutex);	/*** io_mutex, release_head, UNLOCK ***/
  }
  ...
  finish_reservation(c) {
   up_read(&c->commit_sem);				/*** commit_sem, finish_reservation, UP ***/
  }




Normally (for one "update" I guess) the logs shows:

TIME=0000.000, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0000.000, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0000.000, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0000.000, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0000.480, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0000.480, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0000.480, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0000.480, M(t=0,xy=88)=commit_sem, finish_reservation, UP



when "another task" is also updating the log shows:

TIME=0304.252, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0304.252, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0304.252, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0304.252, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0304.292, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0304.292, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?
TIME=0304.292, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP
TIME=0304.736, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP

I really don't understand the commit_sem here. Is there a 2nd copy of commit_sem ?



the very first time pdflush interfers the pdflush error report shows
(for which there's also a log entry):

TIME=0021.644, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0021.644, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0021.644, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0021.644, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0034.316, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0034.316, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?
TIME=0034.316, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1		<- pdflush blocks here
TIME=0168.868, M(t=3,xy=00)=pdflush error report
TIME=0302.012, M(t=3,xy=00)=pdflush error report
TIME=0303.280, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0303.280, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK		<- pdflush can continue
TIME=0303.280, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0303.280, M(t=0,xy=88)=commit_sem, finish_reservation, UP
TIME=0303.284, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1		<- pdflush continues here
TIME=0303.284, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0303.284, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0303.284, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0303.284, M(t=0,xy=88)=commit_sem, finish_reservation, UP



I must admit I dont't see the root cause. I don't understand why the commit_sem
doesn't seem to work. Nor do I understand why the "other task" needs to hold the
io_mutex for more than 280 seconds.

If you see the reason or have any idea please let me know, otherwise I'll
investigate this more thorough.

thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-12 10:09   ` Norbert van Bolhuis
@ 2009-10-14  8:56     ` Artem Bityutskiy
  2009-11-11 15:54       ` Norbert van Bolhuis
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-10-14  8:56 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Mon, 2009-10-12 at 12:09 +0200, Norbert van Bolhuis wrote:
> Artem Bityutskiy wrote:
> > On Fri, 2009-10-09 at 15:02 +0200, Norbert van Bolhuis wrote:
> >> We're using a 19MB UBIFS image (preprogrammed by manufacturing)
> >> for a 156 MB NOR flash partition.
> > 
> > Wow, really huge NOR. Keep in mind that we have not tested UBIFS
> > for NOR well enough.
> It's a MTD_CONCATENATED 128MB NOR flash + 28MB NOR flash partition.
> ok, that's good to realize.
> Are there any (more) thorough UBIFS tests scheduled for NOR flash ?

Not by us. But you may do this. Although, I did test power-cut recovery
for NOR recently. This was on the "Kilauea" board.

> >> This message repeats once. Apart from the message everything is
> >> functioning OK.
> >> so it's the UBIFS commit_sem that's causing this.
> > 
> > Not the semaphore itself, but it is locked for too long time and we
> > cannot acquire it for too long. Since make_reservation needs it only
> > in read mode, it is obvious that the it is is locked somewhere in
> > the commit code, which is strange.
> > 
> > How full if the FS when this happens? What is your eraseblock size?
> > 
> 
> The UBIFS image contains only 1 file (of 18MB) and a few directories. Several
> application processes create (small) files and make modifications. I guess that
> the FS contains only 20MB.
> PEB-size = 131072, LEB-size = 130944

Hmm, then GC should have almost no work to do.

> >> We're using linux-2.6.28. The linux-next backport for 2.6.28
> >> (from git://git.infradead.org/~dedekind/ubifs-v2.6.28.git) changes
> >> are in.
> >>
> >> I guess that, initially, there's a lot of work to be done
> >> for UBI. I'm thinking about scan entire 156MB, add UBI VID/EC headers
> >> for the empty 137MB, make LEB mappings, etc..
> >>
> >> I don't understand why this would block UBIFS/pdflush.
> > 
> > It shouldn't. UBI spends time for scanning when you attach your flash.
> > You pay the price at the very beginning, and then it should be fast.
> > 
> 
> but, this "problem" does occur (~ 1 minute after) the very beginning (when power is
> applied for the 1st time to the board).

OK.

> > What I suggest you is to inject some code to UBIFS which measures for
> > how long the 'commit_sem' is locked in "write" mode, and find the times.
> > Then try to investigate why this actually happens. I cannot tell why
> > this could be, off the top of my head.
> > 
> 
> ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect().
> I'll report back.
> 
> Btw. stackdump is the same (2 out of 2 times).

OK. I really do not have any idea off the top of my head now, sorry.
Try to investigate this a beet deeper.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-11 13:52 ` Artem Bityutskiy
@ 2009-10-12 10:09   ` Norbert van Bolhuis
  2009-10-14  8:56     ` Artem Bityutskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-10-12 10:09 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Artem Bityutskiy wrote:
> On Fri, 2009-10-09 at 15:02 +0200, Norbert van Bolhuis wrote:
>> We're using a 19MB UBIFS image (preprogrammed by manufacturing)
>> for a 156 MB NOR flash partition.
> 
> Wow, really huge NOR. Keep in mind that we have not tested UBIFS
> for NOR well enough.
> 

It's a MTD_CONCATENATED 128MB NOR flash + 28MB NOR flash partition.
ok, that's good to realize.
Are there any (more) thorough UBIFS tests scheduled for NOR flash ?

>>
>> This message repeats once. Apart from the message everything is
>> functioning OK.
>> so it's the UBIFS commit_sem that's causing this.
> 
> Not the semaphore itself, but it is locked for too long time and we
> cannot acquire it for too long. Since make_reservation needs it only
> in read mode, it is obvious that the it is is locked somewhere in
> the commit code, which is strange.
> 
> How full if the FS when this happens? What is your eraseblock size?
> 

The UBIFS image contains only 1 file (of 18MB) and a few directories. Several
application processes create (small) files and make modifications. I guess that
the FS contains only 20MB.
PEB-size = 131072, LEB-size = 130944

>> We're using linux-2.6.28. The linux-next backport for 2.6.28
>> (from git://git.infradead.org/~dedekind/ubifs-v2.6.28.git) changes
>> are in.
>>
>> I guess that, initially, there's a lot of work to be done
>> for UBI. I'm thinking about scan entire 156MB, add UBI VID/EC headers
>> for the empty 137MB, make LEB mappings, etc..
>>
>> I don't understand why this would block UBIFS/pdflush.
> 
> It shouldn't. UBI spends time for scanning when you attach your flash.
> You pay the price at the very beginning, and then it should be fast.
> 

but, this "problem" does occur (~ 1 minute after) the very beginning (when power is
applied for the 1st time to the board).

> What I suggest you is to inject some code to UBIFS which measures for
> how long the 'commit_sem' is locked in "write" mode, and find the times.
> Then try to investigate why this actually happens. I cannot tell why
> this could be, off the top of my head.
> 

ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect().
I'll report back.

Btw. stackdump is the same (2 out of 2 times).

thanks,
Norbert.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-11 14:04 ` Artem Bityutskiy
@ 2009-10-11 14:09   ` Artem Bityutskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Artem Bityutskiy @ 2009-10-11 14:09 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Sun, 2009-10-11 at 17:04 +0300, Artem Bityutskiy wrote:
> On Fri, 2009-10-09 at 15:02 +0200, Norbert van Bolhuis wrote:
> > We're using a 19MB UBIFS image (preprogrammed by manufacturing)
> > for a 156 MB NOR flash partition.
> > 
> > As soon as some of our application processes start reading/writing
> > to the UBIFS the below message occurs:
> > 
> > INFO: task pdflush:110 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > pdflush       D 00000000     0   110      2
> > 
> > the below call trace belongs to the message:
> > 
> > [ceb6fc10] [c0008b24] 0xc0008b24
> > [ceb6fc30] [c029b9a8] 0xc029b9a8 schedule
> > [ceb6fc80] [c029c8c8] 0xc029c8c8 __mutex_lock_slowpath
> > [ceb6fcb0] [c014211c] 0xc014211c make_reservation
> > [ceb6fd20] [c014291c] 0xc014291c ubifs_jnl_write_inode
> > [ceb6fd60] [c0149cc8] 0xc0149cc8 ubifs_write_inode
> > [ceb6fd80] [c0145bbc] 0xc0145bbc ubifs_writepage
> > [ceb6fdb0] [c005b0b8] 0xc005b0b8 __writepage
> > [ceb6fdc0] [c005b8dc] 0xc005b8dc write_cache_pages
> > [ceb6fe60] [c005ba50] 0xc005ba50 do_writepages
> > [ceb6fe70] [c00a35cc] 0xc00a35cc __writeback_single_inode
> > [ceb6fec0] [c00a3b90] 0xc00a3b90 generic_sync_sb_inodes
> > [ceb6ff00] [c00a4330] 0xc00a4330 writeback_inodes
> > [ceb6ff20] [c005c774] 0xc005c774 wb_kupdate
> > [ceb6ff80] [c005cf70] 0xc005cf70 pdflush
> > [ceb6ffd0] [c003bd30] 0xc003bd30
> > [ceb6fff0] [c0011480] 0xc0011480
> 
> Does the stackdump always look like this?
> 
> Also, it is probably not commit_sem, because __mutex_lock_slowpath
> is about mutexes, so this is about
> mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);

I guess slow GC may have this effect. Try to see how much time is
spent in 'ubifs_garbage_collect()'.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-09 13:02 Norbert van Bolhuis
  2009-10-11 13:52 ` Artem Bityutskiy
@ 2009-10-11 14:04 ` Artem Bityutskiy
  2009-10-11 14:09   ` Artem Bityutskiy
  1 sibling, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-10-11 14:04 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Fri, 2009-10-09 at 15:02 +0200, Norbert van Bolhuis wrote:
> We're using a 19MB UBIFS image (preprogrammed by manufacturing)
> for a 156 MB NOR flash partition.
> 
> As soon as some of our application processes start reading/writing
> to the UBIFS the below message occurs:
> 
> INFO: task pdflush:110 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pdflush       D 00000000     0   110      2
> 
> the below call trace belongs to the message:
> 
> [ceb6fc10] [c0008b24] 0xc0008b24
> [ceb6fc30] [c029b9a8] 0xc029b9a8 schedule
> [ceb6fc80] [c029c8c8] 0xc029c8c8 __mutex_lock_slowpath
> [ceb6fcb0] [c014211c] 0xc014211c make_reservation
> [ceb6fd20] [c014291c] 0xc014291c ubifs_jnl_write_inode
> [ceb6fd60] [c0149cc8] 0xc0149cc8 ubifs_write_inode
> [ceb6fd80] [c0145bbc] 0xc0145bbc ubifs_writepage
> [ceb6fdb0] [c005b0b8] 0xc005b0b8 __writepage
> [ceb6fdc0] [c005b8dc] 0xc005b8dc write_cache_pages
> [ceb6fe60] [c005ba50] 0xc005ba50 do_writepages
> [ceb6fe70] [c00a35cc] 0xc00a35cc __writeback_single_inode
> [ceb6fec0] [c00a3b90] 0xc00a3b90 generic_sync_sb_inodes
> [ceb6ff00] [c00a4330] 0xc00a4330 writeback_inodes
> [ceb6ff20] [c005c774] 0xc005c774 wb_kupdate
> [ceb6ff80] [c005cf70] 0xc005cf70 pdflush
> [ceb6ffd0] [c003bd30] 0xc003bd30
> [ceb6fff0] [c0011480] 0xc0011480

Does the stackdump always look like this?

Also, it is probably not commit_sem, because __mutex_lock_slowpath
is about mutexes, so this is about
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: a UBIFS image makes task pdflush blocked > 120 seconds
  2009-10-09 13:02 Norbert van Bolhuis
@ 2009-10-11 13:52 ` Artem Bityutskiy
  2009-10-12 10:09   ` Norbert van Bolhuis
  2009-10-11 14:04 ` Artem Bityutskiy
  1 sibling, 1 reply; 27+ messages in thread
From: Artem Bityutskiy @ 2009-10-11 13:52 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linux-mtd

On Fri, 2009-10-09 at 15:02 +0200, Norbert van Bolhuis wrote:
> We're using a 19MB UBIFS image (preprogrammed by manufacturing)
> for a 156 MB NOR flash partition.

Wow, really huge NOR. Keep in mind that we have not tested UBIFS
for NOR well enough.

> As soon as some of our application processes start reading/writing
> to the UBIFS the below message occurs:
> 
> INFO: task pdflush:110 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pdflush       D 00000000     0   110      2
> 
> the below call trace belongs to the message:
> 
> [ceb6fc10] [c0008b24] 0xc0008b24
> [ceb6fc30] [c029b9a8] 0xc029b9a8 schedule
> [ceb6fc80] [c029c8c8] 0xc029c8c8 __mutex_lock_slowpath
> [ceb6fcb0] [c014211c] 0xc014211c make_reservation
> [ceb6fd20] [c014291c] 0xc014291c ubifs_jnl_write_inode
> [ceb6fd60] [c0149cc8] 0xc0149cc8 ubifs_write_inode
> [ceb6fd80] [c0145bbc] 0xc0145bbc ubifs_writepage
> [ceb6fdb0] [c005b0b8] 0xc005b0b8 __writepage
> [ceb6fdc0] [c005b8dc] 0xc005b8dc write_cache_pages
> [ceb6fe60] [c005ba50] 0xc005ba50 do_writepages
> [ceb6fe70] [c00a35cc] 0xc00a35cc __writeback_single_inode
> [ceb6fec0] [c00a3b90] 0xc00a3b90 generic_sync_sb_inodes
> [ceb6ff00] [c00a4330] 0xc00a4330 writeback_inodes
> [ceb6ff20] [c005c774] 0xc005c774 wb_kupdate
> [ceb6ff80] [c005cf70] 0xc005cf70 pdflush
> [ceb6ffd0] [c003bd30] 0xc003bd30
> [ceb6fff0] [c0011480] 0xc0011480
> 
> This message repeats once. Apart from the message everything is
> functioning OK.
> so it's the UBIFS commit_sem that's causing this.

Not the semaphore itself, but it is locked for too long time and we
cannot acquire it for too long. Since make_reservation needs it only
in read mode, it is obvious that the it is is locked somewhere in
the commit code, which is strange.

How full if the FS when this happens? What is your eraseblock size?

> We're using linux-2.6.28. The linux-next backport for 2.6.28
> (from git://git.infradead.org/~dedekind/ubifs-v2.6.28.git) changes
> are in.
> 
> I guess that, initially, there's a lot of work to be done
> for UBI. I'm thinking about scan entire 156MB, add UBI VID/EC headers
> for the empty 137MB, make LEB mappings, etc..
> 
> I don't understand why this would block UBIFS/pdflush.

It shouldn't. UBI spends time for scanning when you attach your flash.
You pay the price at the very beginning, and then it should be fast.

What I suggest you is to inject some code to UBIFS which measures for
how long the 'commit_sem' is locked in "write" mode, and find the times.
Then try to investigate why this actually happens. I cannot tell why
this could be, off the top of my head.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* a UBIFS image makes task pdflush blocked > 120 seconds
@ 2009-10-09 13:02 Norbert van Bolhuis
  2009-10-11 13:52 ` Artem Bityutskiy
  2009-10-11 14:04 ` Artem Bityutskiy
  0 siblings, 2 replies; 27+ messages in thread
From: Norbert van Bolhuis @ 2009-10-09 13:02 UTC (permalink / raw)
  To: linux-mtd


We're using a 19MB UBIFS image (preprogrammed by manufacturing)
for a 156 MB NOR flash partition.

As soon as some of our application processes start reading/writing
to the UBIFS the below message occurs:

INFO: task pdflush:110 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pdflush       D 00000000     0   110      2

the below call trace belongs to the message:

[ceb6fc10] [c0008b24] 0xc0008b24
[ceb6fc30] [c029b9a8] 0xc029b9a8 schedule
[ceb6fc80] [c029c8c8] 0xc029c8c8 __mutex_lock_slowpath
[ceb6fcb0] [c014211c] 0xc014211c make_reservation
[ceb6fd20] [c014291c] 0xc014291c ubifs_jnl_write_inode
[ceb6fd60] [c0149cc8] 0xc0149cc8 ubifs_write_inode
[ceb6fd80] [c0145bbc] 0xc0145bbc ubifs_writepage
[ceb6fdb0] [c005b0b8] 0xc005b0b8 __writepage
[ceb6fdc0] [c005b8dc] 0xc005b8dc write_cache_pages
[ceb6fe60] [c005ba50] 0xc005ba50 do_writepages
[ceb6fe70] [c00a35cc] 0xc00a35cc __writeback_single_inode
[ceb6fec0] [c00a3b90] 0xc00a3b90 generic_sync_sb_inodes
[ceb6ff00] [c00a4330] 0xc00a4330 writeback_inodes
[ceb6ff20] [c005c774] 0xc005c774 wb_kupdate
[ceb6ff80] [c005cf70] 0xc005cf70 pdflush
[ceb6ffd0] [c003bd30] 0xc003bd30
[ceb6fff0] [c0011480] 0xc0011480

This message repeats once. Apart from the message everything is
functioning OK.

so it's the UBIFS commit_sem that's causing this.

We're using linux-2.6.28. The linux-next backport for 2.6.28
(from git://git.infradead.org/~dedekind/ubifs-v2.6.28.git) changes
are in.

I guess that, initially, there's a lot of work to be done
for UBI. I'm thinking about scan entire 156MB, add UBI VID/EC headers
for the empty 137MB, make LEB mappings, etc..

I don't understand why this would block UBIFS/pdflush.

I'm hoping someone can explain what's going on here.

Is there a way to avoid the situation ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-11-18 11:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4B012D0D.4080500@aimvalley.nl>
2009-11-16 12:12 ` a UBIFS image makes task pdflush blocked > 120 seconds Joakim Tjernlund
2009-11-17  8:25 ` Artem Bityutskiy
2009-11-17 16:25   ` Norbert van Bolhuis
2009-11-18  8:28     ` Artem Bityutskiy
2009-11-18  9:26       ` Norbert van Bolhuis
2009-11-18  9:40         ` Artem Bityutskiy
2009-11-18 10:38           ` Joakim Tjernlund
2009-11-18 10:54             ` Artem Bityutskiy
2009-11-18 10:59               ` Norbert van Bolhuis
2009-11-18 11:01               ` Joakim Tjernlund
     [not found] <34637.10.10.0.184.1258202287.squirrel@intranet.aimsys.nl>
2009-11-16  8:13 ` Artem Bityutskiy
2009-11-16  8:53   ` Joakim Tjernlund
2009-11-16 23:22     ` Jamie Lokier
2009-11-17  8:31       ` Joakim Tjernlund
2009-11-17 10:45       ` Norbert van Bolhuis
2009-10-09 13:02 Norbert van Bolhuis
2009-10-11 13:52 ` Artem Bityutskiy
2009-10-12 10:09   ` Norbert van Bolhuis
2009-10-14  8:56     ` Artem Bityutskiy
2009-11-11 15:54       ` Norbert van Bolhuis
2009-11-13  8:20         ` Artem Bityutskiy
2009-11-13 15:09           ` Norbert van Bolhuis
2009-11-13 15:52             ` Artem Bityutskiy
2009-11-13 15:56               ` Artem Bityutskiy
2009-11-13 16:28               ` Joakim Tjernlund
2009-10-11 14:04 ` Artem Bityutskiy
2009-10-11 14:09   ` Artem Bityutskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.