* an infinite loop in ext4 in 3.14
@ 2014-04-17 19:23 Mikulas Patocka
  2014-04-17 21:16 ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2014-04-17 19:23 UTC
  To: linux-ext4, Theodore Ts'o; +Cc: linux-kernel

Hi

I hit a bug in ext4 - jbd2 was stuck in an infinite loop when remounting 
the root filesystem read-only during shutdown.

The filesystem is ext3, but it uses the ext4 driver with the following 
options: rw,relatime,discard,errors=remount-ro,data=ordered

The machine was stuck, but it was possible to obtain a stacktrace with
Alt-SysRq-P. I put several stacktraces here:

http://people.redhat.com/~mpatocka/crashes/ext4/

The stacktraces vary between jbd2_log_do_checkpoint and
jbd2_cleanup_journal_tail. jbd2_log_do_checkpoint doesn't call
jbd2_cleanup_journal_tail from a loop, so the probable location of the
infinite loop is this piece of code in jbd2_journal_flush:

while (!err && journal->j_checkpoint_transactions != NULL) {

Mikulas


* Re: an infinite loop in ext4 in 3.14
  2014-04-17 19:23 an infinite loop in ext4 in 3.14 Mikulas Patocka
@ 2014-04-17 21:16 ` Theodore Ts'o
  2014-04-17 22:36   ` Mikulas Patocka
  2014-04-19 20:33   ` Mikulas Patocka
  0 siblings, 2 replies; 4+ messages in thread
From: Theodore Ts'o @ 2014-04-17 21:16 UTC
  To: Mikulas Patocka; +Cc: linux-ext4, linux-kernel

On Thu, Apr 17, 2014 at 03:23:13PM -0400, Mikulas Patocka wrote:
> 
> I hit a bug in ext4 - jbd2 was stuck in an infinite loop when remounting 
> the root filesystem read-only during shutdown.

Is this at all repeatable?  I suspect what happened is that we're not
checking the error return from jbd2_log_do_checkpoint(), and if it ran
into an error doing the jbd2_log_do_checkpoint --- for example, if it
wasn't able to write to the journal --- say, because __wait_cp_io()
returned -EIO, we might be spinning in the while loop in jbd2_journal_flush:

> while (!err && journal->j_checkpoint_transactions != NULL) {


(as you suspected).

I can add some error checking, but it would be interesting to know if
you can easily reproduce the problem so we can confirm if that's what
was really going on.

Regards,

						- Ted


* Re: an infinite loop in ext4 in 3.14
  2014-04-17 21:16 ` Theodore Ts'o
@ 2014-04-17 22:36   ` Mikulas Patocka
  2014-04-19 20:33   ` Mikulas Patocka
  1 sibling, 0 replies; 4+ messages in thread
From: Mikulas Patocka @ 2014-04-17 22:36 UTC
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel



On Thu, 17 Apr 2014, Theodore Ts'o wrote:

> On Thu, Apr 17, 2014 at 03:23:13PM -0400, Mikulas Patocka wrote:
> > 
> > I hit a bug in ext4 - jbd2 was stuck in an infinite loop when remounting 
> > the root filesystem read-only during shutdown.
> 
> Is this at all repeatable?

No - it happened just once.

> I suspect what happened is that we're not
> checking the error return from jbd2_log_do_checkpoint(), and if it ran
> into an error doing the jbd2_log_do_checkpoint --- for example, if it

There were no I/O errors on the console when the lockup happened.

> wasn't able to write to the journal --- say, because __wait_cp_io()
> returned -EIO, we might be spinning in the while loop in jbd2_journal_flush:
> 
> > while (!err && journal->j_checkpoint_transactions != NULL) {
> 
> 
> (as you suspected).
> 
> I can add some error checking, but it would be interesting to know if
> you can easily reproduce the problem so we can confirm if that's what
> was really going on.

I can write a script that reboots the machine and run it overnight...

> Regards,
> 
> 						- Ted

Mikulas


* Re: an infinite loop in ext4 in 3.14
  2014-04-17 21:16 ` Theodore Ts'o
  2014-04-17 22:36   ` Mikulas Patocka
@ 2014-04-19 20:33   ` Mikulas Patocka
  1 sibling, 0 replies; 4+ messages in thread
From: Mikulas Patocka @ 2014-04-19 20:33 UTC
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel



On Thu, 17 Apr 2014, Theodore Ts'o wrote:

> On Thu, Apr 17, 2014 at 03:23:13PM -0400, Mikulas Patocka wrote:
> > 
> > I hit a bug in ext4 - jbd2 was stuck in an infinite loop when remounting 
> > the root filesystem read-only during shutdown.
> 
> Is this at all repeatable?  I suspect what happened is that we're not
> checking the error return from jbd2_log_do_checkpoint(), and if it ran
> into an error doing the jbd2_log_do_checkpoint --- for example, if it
> wasn't able to write to the journal --- say, because __wait_cp_io()
> returned -EIO, we might be spinning in the while loop in jbd2_journal_flush:
> 
> > while (!err && journal->j_checkpoint_transactions != NULL) {
> 
> 
> (as you suspected).
> 
> I can add some error checking, but it would be interesting to know if
> you can easily reproduce the problem so we can confirm if that's what
> was really going on.
> 
> Regards,
> 
> 						- Ted

Hi

It turned out that the computer had bad memory; there were other stability
issues as well, so you can ignore this bug report.

Mikulas
