From: Jaehoon Chung <jh80.chung@samsung.com>
To: Eric Whitney <enwlinux@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Matteo Croce <technoboy85@gmail.com>,
	David Jander <david@protonic.nl>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	linux-ext4@vger.kernel.org, Azat Khuzhin <a3at.mail@gmail.com>
Subject: Re: ext4: journal has aborted
Date: Fri, 11 Jul 2014 17:50:34 +0900
Message-ID: <53BFA55A.8000201@samsung.com>
In-Reply-To: <20140711004507.GB26636@wallace>

On 07/11/2014 09:45 AM, Eric Whitney wrote:
> * Darrick J. Wong <darrick.wong@oracle.com>:
>> On Thu, Jul 10, 2014 at 06:32:45PM -0400, Theodore Ts'o wrote:
>>> To be clear, what you would need to do is to revert commit
>>> 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption.
>>> Darrick's patch is one that tries to fix the problem addressed by that
>>> commit in a different fashion.
>>>
>>> Quite frankly, reverting the commit, which is causing real damage, is
>>> far more important to me right now than what to do in order to allow
>>> CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something
>>> that file system developers need, and to be honest I can't remember
>>> the last time I've used said config option).  But if we know that
>>> Darrick's fix works, I'm willing to push that to Linus at the same
>>> time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba
>>
>> Reverting the 007649375... patch doesn't seem to create any obvious regressions
>> on my test box (though again, I was never able to reproduce it as consistently
>> as Eric W.).
>>
>> Tossing in the [1] patch also fixes the crash when CONFIG_EXT4_DEBUG=y on
>> 3.16-rc4.  I'd say it's safe to send both to Linus and stable.
>>
>> If anyone experiences problems that I'm not seeing, please yell loudly and
>> soon!
>>
> 
> Reverting the suspect patch - 007649375f - on 3.16-rc3 and running on the
> Panda yielded 10 successive "successful" generic/068 failures (no block
> bitmap trouble on reboot).  So, it looks like that patch is all of it.
In my case, after reverting it, I no longer see the bitmap corruption problem on my Exynos board.
Before the revert, the problem occurred on almost every reboot.
(Kernel version is 3.16-rc4; an eMMC 5.0 card is used.)
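
For anyone who wants to repeat this check, here is a minimal sketch of the
revert step. Only the commit ID comes from this thread; the function name,
the clean-tree guard, and the tree path in the usage comment are illustrative
assumptions, not the exact commands anyone here ran.

```shell
#!/bin/sh
# Sketch only: revert the suspect ext4 commit in a kernel git tree.
# Requires Git 1.8.5 or newer for "git -C".

# Commit ID quoted in this thread.
SUSPECT=007649375f6af242d5b1df2c15996949714303ba

# revert_suspect TREE [COMMIT]
# Reverts COMMIT (default: $SUSPECT) on the branch checked out in TREE.
# The function name and the clean-tree check are hypothetical conveniences.
revert_suspect() {
    tree=$1
    commit=${2:-$SUSPECT}

    # Refuse to touch a tree with modified tracked files, so that the
    # revert is the only difference from the kernel under test.
    if [ -n "$(git -C "$tree" status --porcelain --untracked-files=no)" ]; then
        echo "working tree is not clean: $tree" >&2
        return 1
    fi

    # --no-edit keeps the default 'Revert "..."' commit message.
    git -C "$tree" revert --no-edit "$commit"
}

# Usage (the path is hypothetical):
#   revert_suspect ~/src/linux
```

After the revert, rebuild and boot the kernel as usual, then force the
reboot that previously triggered the bitmap problem.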

Best Regards,
Jaehoon Chung
> 
> Running the same test scenario on Darrick's patch (CONFIG_EXT4FS_DEBUG =>
> CONFIG_EXT4_DEBUG) applied to 3.16-rc3 led to exactly the same result.
> No panics, BUGS, or other misbehavior whether generic/068 completed 
> successfully or failed (and that test used here simply because it was
> convenient) and no trouble on boot, etc.
> 
> Let me know if anything else is needed.
> 
> Eric
> 
>> --D
>>
>> [1] http://www.spinics.net/lists/linux-ext4/msg43287.html
>>>
>>> Cheers,
>>>
>>> 						- Ted
>>>
>>> On Thu, Jul 10, 2014 at 11:31:14PM +0200, Matteo Croce wrote:
>>>> Will do, thanks!
>>>>
>>>> 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
>>>>> On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote:
>>>>>> * Theodore Ts'o <tytso@mit.edu>:
>>>>>>> On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote:
>>>>>>>> An update from today's ext4 concall.  Eric Whitney can fairly reliably
>>>>>>>> reproduce this on his Panda board with 3.15, and definitely not on
>>>>>>>> 3.14.  So at this point there seems to be at least some kind of 3.15
>>>>>>>> regression going on here, regardless of whether it's in the eMMC
>>>>>>>> driver or the ext4 code.  (It also means that the bug fix I found is
>>>>>>>> irrelevant for the purposes of working this issue, since that one is much
>>>>>>>> harder to hit, and that bug has been around since long before 3.14.)
>>>>>>>>
>>>>>>>> The problem in terms of narrowing it down any further is that the
>>>>>>>> Pandaboard is running into RCU bugs which makes it hard to test the
>>>>>>>> early 3.15-rcX kernels.....
>>>>>>>
>>>>>>> In the hopes of making it easy to bisect, I've created a kernel branch
>>>>>>> which starts with 3.14, and then adds on all of the ext4-related
>>>>>>> commits since then.   You can find it at:
>>>>>>>
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure
>>>>>>>
>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure
>>>>>>>
>>>>>>> Eric, can you see if you can repro the failure on your Panda Board?
>>>>>>> If you can, try doing a bisection search on these series:
>>>>>>>
>>>>>>> git bisect start
>>>>>>> git bisect good v3.14
>>>>>>> git bisect bad test-mb_generate_buddy-failure
>>>>>>>
>>>>>>> Hopefully if it is caused by one of the commits in this series, we'll
>>>>>>> be able to pinpoint it this way.
>>>>>>
>>>>>> First, the good news (with luck):
>>>>>>
>>>>>> My testing currently suggests that the patch causing this regression was
>>>>>> pulled into 3.15-rc3 -
>>>>>>
>>>>>> 007649375f6af242d5b1df2c15996949714303ba
>>>>>> ext4: initialize multi-block allocator before checking block descriptors
>>>>>>
>>>>>> Bisection by selectively reverting ext4 commits in -rc3 identified this patch
>>>>>> while running on the Pandaboard.  I'm still using generic/068 as my reproducer.
>>>>>> It occasionally yields a false negative, but it has passed 10 consecutive
>>>>>> trials on my revert/bisect kernel derived from 3.15-rc3.  Given the frequency
>>>>>> of false negatives I've seen, I'm reasonably confident in that result.  I'm
>>>>>> going to run another series with just that patch reverted on 3.16-rc3.
>>>>>>
>>>>>> Looking at the patch, the call to ext4_mb_init() was hoisted above the code
>>>>>> performing journal recovery in ext4_fill_super().  The regression occurs only
>>>>>> after journal recovery on the root filesystem.
>>>>>
>>>>> Thanks for finding the culprit! :)
>>>>>
>>>>> Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an
>>>>> FS will mount without crashing?  This was the cruddy patch I sent in (and later
>>>>> killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly
>>>>> way.  Maybe it's appropriate now.
>>>>> http://www.spinics.net/lists/linux-ext4/msg43287.html
>>>>>
>>>>> --D
>>>>>
>>>>>>
>>>>>> Secondly:
>>>>>>
>>>>>> Thanks for that git tree!  However, I discovered that the same "RCU bug" I
>>>>>> thought I was seeing on the Panda was also visible on the x86_64 KVM, and
>>>>>> it was actually just RCU noticing stalls.  These also occurred when using
>>>>>> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during
>>>>>> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of
>>>>>> masking the regression on the root filesystem.  The test system would lock up
>>>>>> completely - no console response - and made it impossible to force the reboot
>>>>>> which was required to set up the failure.  Hence the reversion approach, since
>>>>>> RCU does not report stalls in 3.15-rc3 (final).
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks!!
>>>>>>>
>>>>>>>                                             - Ted
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> -- 
>>>> Matteo Croce
>>>> OpenWrt Developer
> 


Thread overview: 50+ messages
2014-06-30 21:30 ext4: journal has aborted Matteo Croce
2014-07-01  6:26 ` David Jander
2014-07-01  8:00   ` Matteo Croce
2014-07-01  8:42   ` Darrick J. Wong
2014-07-01  8:55     ` Matteo Croce
2014-07-02 13:49       ` Dmitry Monakhov
2014-07-03 13:43       ` Theodore Ts'o
2014-07-03 14:15         ` David Jander
2014-07-03 14:46           ` Theodore Ts'o
2014-07-03 14:57           ` Dmitry Monakhov
2014-07-03 14:58           ` Dmitry Monakhov
2014-07-04  9:40             ` David Jander
2014-07-04 10:17               ` Dmitry Monakhov
2014-07-04 11:28                 ` David Jander
2014-07-04 12:20                   ` Theodore Ts'o
2014-07-04 12:38                     ` Dmitry Monakhov
2014-07-04 13:45                     ` David Jander
2014-07-04 18:45                       ` Theodore Ts'o
2014-07-04 22:46                         ` Dave Chinner
2014-07-05  2:30                         ` Dmitry Monakhov
2014-07-05 20:36                         ` Theodore Ts'o
2014-07-07 12:17                         ` David Jander
2014-07-07 15:53                           ` Theodore Ts'o
2014-07-07 22:31                             ` Darrick J. Wong
2014-07-07 22:56                             ` Theodore Ts'o
2014-07-10 18:57                               ` Eric Whitney
2014-07-10 20:01                                 ` Darrick J. Wong
2014-07-10 21:31                                   ` Matteo Croce
2014-07-10 22:32                                     ` Theodore Ts'o
2014-07-11  0:13                                       ` Darrick J. Wong
2014-07-11  0:45                                         ` Eric Whitney
2014-07-11  8:50                                           ` Jaehoon Chung [this message]
2014-07-11 11:43                                           ` Theodore Ts'o
2014-07-15  6:31                                           ` David Jander
2014-07-10 23:29                                 ` Azat Khuzhin
2014-07-04 11:04               ` Jaehoon Chung
2014-07-04 11:32                 ` David Jander
2014-07-01 12:07     ` Jaehoon Chung
2014-07-01 13:50       ` David Jander
2014-07-01 15:58       ` Theodore Ts'o
2014-07-01 16:14         ` Lukáš Czerner
2014-07-01 16:36         ` Eric Whitney
2014-07-02  8:34           ` Matteo Croce
2014-07-02 10:17           ` David Jander
2014-07-02 10:19             ` Matteo Croce
2014-07-03 17:14               ` Eric Whitney
2014-07-03 23:17                 ` Theodore Ts'o
2014-07-04 20:48                   ` Eric Whitney
2014-07-02  9:44         ` David Jander
2014-07-01  9:02   ` Darrick J. Wong
