linux-kernel.vger.kernel.org archive mirror
* Re: Remounting filesystem read-only
       [not found] <366cf3ac534bbadaaa61714a43006ac7@codeaurora.org>
@ 2018-07-27 19:26 ` Sodagudi Prasad
  2018-07-27 19:52   ` Theodore Y. Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Sodagudi Prasad @ 2018-07-27 19:26 UTC (permalink / raw)
  To: tytso, adilger.kernel, wen.xu; +Cc: linux-ext4, linux-kernel

On 2018-07-26 18:04, Sodagudi Prasad wrote:
> Hi All,
> 

+linux-kernel@vger.kernel.org list.

Hi All,

I am observing the following issue with one of the partitions on an
Android device running the 4.14.56 kernel.  When I try to remount this
partition with the command "mount -o rw,remount /vendor/dsp", it is
remounted read-only instead.

[  191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: 
comm mount: Inode table for bg 0 marked as needing zeroing
[  191.364762] Aborting journal on device sde9-8.
[  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
[  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere

If I revert commit [1], "ext4: only look at the bg_flags field if it is
valid", the issue is not observed; only the following warning is
printed.
[1] - https://patchwork.ozlabs.org/patch/929239/

[  123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running 
e2fsck is recommended
[  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered

Can you provide some input on what the issue with this partition could be?

> Observing the following issue with one of the partition on android
> device with 4.14.56 kernel.  When I try to remount this partition
> using the command - mount -o rw,remount /vendor/dsp, it is remounting
> as read-only.
> 
> [  191.364358] EXT4-fs error (device sde9):
> ext4_has_uninit_itable:3108: comm mount: Inode table for bg 0 marked
> as needing zeroing
> [  191.364762] Aborting journal on device sde9-8.
> [  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
> [  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere
> 
> If I revert this commit [1] -"ext4: only look at the bg_flags field if
> it is valid", the issue is not observed. It is just giving following
> warning message.
> [1] - https://patchwork.ozlabs.org/patch/929239/.
> 
> [  123.373456] EXT4-fs (sde9): warning: mounting fs with errors,
> running e2fsck is recommended
> [  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered
> 
> Can you provide some inputs what could be the issue with this 
> partition?
> 
> -Thanks, Prasad

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project


* Re: Remounting filesystem read-only
  2018-07-27 19:26 ` Remounting filesystem read-only Sodagudi Prasad
@ 2018-07-27 19:52   ` Theodore Y. Ts'o
  2018-07-27 20:34     ` Sodagudi Prasad
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Y. Ts'o @ 2018-07-27 19:52 UTC (permalink / raw)
  To: Sodagudi Prasad; +Cc: adilger.kernel, wen.xu, linux-ext4, linux-kernel

On Fri, Jul 27, 2018 at 12:26:25PM -0700, Sodagudi Prasad wrote:
> On 2018-07-26 18:04, Sodagudi Prasad wrote:
> > Hi All,
> > 
> 
> +linux-kernel@vger.kernel.org list.
> 
> Hi All,
> 
> Observing the following issue with one of the partition on android device
> with 4.14.56 kernel.  When I try to remount this partition using the command
> - mount -o rw,remount /vendor/dsp, it is remounting as read-only.
> 
> [  191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108:
> comm mount: Inode table for bg 0 marked as needing zeroing
> [  191.364762] Aborting journal on device sde9-8.
> [  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
> [  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere
> 
> If I revert this commit [1] -"ext4: only look at the bg_flags field if it is
> valid", the issue is not observed. It is just giving following warning
> message.
> [1] - https://patchwork.ozlabs.org/patch/929239/.
> 
> [  123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running
> e2fsck is recommended
> [  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered
> 
> Can you provide some inputs what could be the issue with this partition?

The error should be pretty clear: "Inode table for bg 0 marked as
needing zeroing".  That should never happen.  The warning "mounting fs
with errors" means the file system was corrupted.  The commit is now
telling you more about how the file system was corrupted, and is
protecting you from further data loss.  (You should never, ever, ever,
allow a file system to be mounted read/write with corruptions.  The
file system should have been checked using fsck.ext4, aka e2fsck,
before the file system was allowed to be mounted.)
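
For anyone triaging logs like the ones quoted above, here is a minimal
Python sketch that pulls the device, function, line number, and message
out of an "EXT4-fs error" line.  The regular expression is an assumption
based only on the line format seen in this thread, not on any format the
kernel guarantees, and parse_ext4_error is a hypothetical helper name.

```python
import re

# Matches lines shaped like the one reported in this thread:
#   EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: comm mount: <message>
# The "comm <name>: " part is treated as optional (an assumption).
EXT4_ERROR_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"(?:comm (?P<comm>.+?): )?"
    r"(?P<message>.*)"
)


def parse_ext4_error(log_line):
    """Return a dict of fields from an EXT4-fs error line, or None if no match."""
    match = EXT4_ERROR_RE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


# The error line from this thread, with the mail line wrap removed.
sample = (
    "[  191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: "
    "comm mount: Inode table for bg 0 marked as needing zeroing"
)
info = parse_ext4_error(sample)
```

Lines that are not ext4 errors, such as the "Remounting filesystem
read-only" notice, do not match and yield None.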

       	   	       	   	      - Ted


* Re: Remounting filesystem read-only
  2018-07-27 19:52   ` Theodore Y. Ts'o
@ 2018-07-27 20:34     ` Sodagudi Prasad
  2018-07-28  0:18       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Sodagudi Prasad @ 2018-07-27 20:34 UTC (permalink / raw)
  To: Sodagudi Prasad; +Cc: adilger.kernel, wen.xu, linux-ext4, linux-kernel

On 2018-07-27 12:52, Theodore Y. Ts'o wrote:
> On Fri, Jul 27, 2018 at 12:26:25PM -0700, Sodagudi Prasad wrote:
>> On 2018-07-26 18:04, Sodagudi Prasad wrote:
>> > Hi All,
>> >
>> 
>> +linux-kernel@vger.kernel.org list.
>> 
>> Hi All,
>> 
>> Observing the following issue with one of the partition on android 
>> device
>> with 4.14.56 kernel.  When I try to remount this partition using the 
>> command
>> - mount -o rw,remount /vendor/dsp, it is remounting as read-only.
>> 
>> [  191.364358] EXT4-fs error (device sde9): 
>> ext4_has_uninit_itable:3108:
>> comm mount: Inode table for bg 0 marked as needing zeroing
>> [  191.364762] Aborting journal on device sde9-8.
>> [  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
>> [  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere
>> 
>> If I revert this commit [1] -"ext4: only look at the bg_flags field if 
>> it is
>> valid", the issue is not observed. It is just giving following warning
>> message.
>> [1] - https://patchwork.ozlabs.org/patch/929239/.
>> 
>> [  123.373456] EXT4-fs (sde9): warning: mounting fs with errors, 
>> running
>> e2fsck is recommended
>> [  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered
>> 
>> Can you provide some inputs what could be the issue with this 
>> partition?
> 
> The error should be pretty clear: "Inode table for bg 0 marked as
> needing zeroing".  That should never happen.

Hi Ted,

Can you provide a debug patch to detect when this corruption is
happening?  What is the source of this corruption, and how is this
partition getting corrupted?  Or which file system operation leads to
this corruption?

I am digging into the code around this warning to understand more.

Thanks in advance for your help.

-Thanks, Prasad

> The warning "mounting fs
> with errors" means the file system was corrupted.  The commit is now
> telling more about how the file system was corrupted, and is
> protecting you from further data loss.  (You should never, ever, ever,
> allow a file system to be mounted read/write with corruptions.  The
> file system should have been checked using fsck.ext4, aka e2fsck,
> before the file system was allowed to be mounted.)
> 
>        	   	       	   	      - Ted



* Re: Remounting filesystem read-only
  2018-07-27 20:34     ` Sodagudi Prasad
@ 2018-07-28  0:18       ` Theodore Y. Ts'o
  2018-07-28  7:47         ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Y. Ts'o @ 2018-07-28  0:18 UTC (permalink / raw)
  To: Sodagudi Prasad; +Cc: adilger.kernel, wen.xu, linux-ext4, linux-kernel

On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > The error should be pretty clear: "Inode table for bg 0 marked as
> > needing zeroing".  That should never happen.
> 
> Can you provide a debug patch to detect when this corruption is happening?
> What is the source of this corruption, and how is this partition getting
> corrupted?  Or which file system operation leads to this corruption?

Do you have a reliable repro?  If it's a one-off, it can be caused by
*anything*.  Crappy hardware, a bug in some proprietary, binary-only
GPU driver dereferencing some wild pointer that corrupts kernel
memory, etc.

Asking for a debug patch is like asking for "can you create technology
that can detect when a cockroach enters my house?"

So if you have a reliable repro, then we know what operations might be
triggering the corruption, and then you work on creating a minimal
repro, and only *then*, when we have a restricted set of possibilities
that might be the cause, does a debug patch make sense.  (For example,
if removing a GPU call makes the problem go away, then the patch would
need to be in the proprietary GPU driver....)

> I am digging code a bit around this warning to understand more.

The warning means that a flag in block group descriptor #0 is set
that should never be set.  How did the flag get set?  There is any
number of things that could cause that.

You might want to look at the block group descriptor via dumpe2fs or
debugfs, to see if it's just a single bit getting flipped, or if the
entire block group descriptor is garbage.  Note that under normal code
paths, the flag *never* gets set by ext4 kernel code.  The flag will
get set on non-block group 0 block group descriptors by ext4, and the
ext4 kernel code will only clear the flag.
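
As an illustration of the single-bit-versus-garbage check, here is a
rough Python sketch.  The three flag values are the ones defined for
bg_flags in fs/ext4/ext4.h; decode_bg_flags, classify_difference, and
the "expected" value passed in are hypothetical and only for
illustration.  The raw flags value has to be read off the descriptor by
hand (for example from dumpe2fs or debugfs output); the sketch does not
parse either tool's output.

```python
# Flag bits of the 16-bit bg_flags field in an ext4 block group descriptor.
EXT4_BG_INODE_UNINIT = 0x0001  # inode table/bitmap not in use
EXT4_BG_BLOCK_UNINIT = 0x0002  # block bitmap not in use
EXT4_BG_INODE_ZEROED = 0x0004  # on-disk inode table initialized to zero

FLAG_NAMES = {
    EXT4_BG_INODE_UNINIT: "EXT4_BG_INODE_UNINIT",
    EXT4_BG_BLOCK_UNINIT: "EXT4_BG_BLOCK_UNINIT",
    EXT4_BG_INODE_ZEROED: "EXT4_BG_INODE_ZEROED",
}
KNOWN_MASK = 0x0007


def decode_bg_flags(flags):
    """Return (names of known flags that are set, mask of undefined bits set)."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    undefined = flags & ~KNOWN_MASK & 0xFFFF
    return names, undefined


def classify_difference(observed, expected):
    """Say whether observed differs from expected by zero, one, or many bits.

    'expected' is whatever the caller believes the field should contain;
    that belief is an assumption, not something this sketch can verify.
    """
    differing_bits = bin((observed ^ expected) & 0xFFFF).count("1")
    if differing_bits == 0:
        return "match"
    if differing_bits == 1:
        return "single bit differs"
    return "multiple bits differ"
```

A single differing bit is at least consistent with a bit flip, while
many differing bits (or undefined bits set) suggest that the whole
descriptor was overwritten.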

Of course, if there is a bug in some driver that dereferences a
pointer wildly, all bets are off.

					- Ted


* Re: Remounting filesystem read-only
  2018-07-28  0:18       ` Theodore Y. Ts'o
@ 2018-07-28  7:47         ` Darrick J. Wong
  2018-08-02  2:23           ` Sodagudi Prasad
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2018-07-28  7:47 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Sodagudi Prasad, adilger.kernel, wen.xu,
	linux-ext4, linux-kernel

On Fri, Jul 27, 2018 at 08:18:23PM -0400, Theodore Y. Ts'o wrote:
> On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > > The error should be pretty clear: "Inode table for bg 0 marked as
> > > needing zeroing".  That should never happen.
> > 
> > Can you provide a debug patch to detect when this corruption is happening?
> > What is the source of this corruption, and how is this partition getting
> > corrupted?  Or which file system operation leads to this corruption?
> 
> Do you have a reliable repro?  If it's a one-off, it can be caused by
> *anything*.  Crappy hardware, a bug in some proprietary, binary-only
> GPU driver dereferencing some wild pointer that corrupts kernel
> memory, etc.
> 
> Asking for a debug patch is like asking for "can you create technology
> that can detect when a cockroach enters my house?"

Well, ext4 *could* add metadata read and write verifiers to complain
loudly in dmesg about stuff that shouldn't be there, so at least we'd
know when we're writing cockroaches into the house... :)
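
To make the idea concrete, here is a toy Python sketch of what read and
write verifiers could look like.  This is not ext4 code and not an
existing kernel interface: ToyMetadataStore, verify_group_desc,
VerifierError, and the single rule being checked (no undefined bits set
in a 16-bit bg_flags value) are all assumptions for illustration.

```python
import logging

log = logging.getLogger("toy-verifier")

KNOWN_BG_FLAGS = 0x0007  # the three flag bits defined for bg_flags


class VerifierError(Exception):
    """Raised when a write verifier refuses to store bad metadata."""


def verify_group_desc(group, flags):
    """Return a list of complaints about a descriptor's flags (empty if fine)."""
    problems = []
    undefined = flags & ~KNOWN_BG_FLAGS & 0xFFFF
    if undefined:
        problems.append(f"bg {group}: undefined bg_flags bits {undefined:#06x}")
    return problems


class ToyMetadataStore:
    """In-memory stand-in for on-disk group descriptors (hypothetical)."""

    def __init__(self):
        self._descs = {}

    def write_desc(self, group, flags):
        # Write verifier: complain loudly and refuse to write garbage.
        problems = verify_group_desc(group, flags)
        for problem in problems:
            log.error("write verifier: %s", problem)
        if problems:
            raise VerifierError("; ".join(problems))
        self._descs[group] = flags

    def read_desc(self, group):
        # Read verifier: complain about whatever was already stored.
        flags = self._descs[group]
        for problem in verify_group_desc(group, flags):
            log.error("read verifier: %s", problem)
        return flags
```

The write-side check is the part that would tell you when the
filesystem code itself is about to put bad metadata on disk; the
read-side check only tells you that something, somewhere, already did.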

--D

> So if you have a reliable repro, then we know what operations might be
> triggering the corruption, and then you work on creating a minimal
> repro, and only *then* when we have a restricted set of possibilities
> that might be the cause (for example, if removing a GPU call makes the
> problem go away, then the patch would need to be in the proprietary
> GPU driver....)
> 
> > I am digging code a bit around this warning to understand more.
> 
> The warning means that a flag in block group descriptor #0 is set
> that should never be set.  How did the flag get set?  There is any
> number of things that could cause that.
> 
> You might want to look at the block group descriptor via dumpe2fs or
> debugfs, to see if it's just a single bit getting flipped, or if the
> entire block group descriptor is garbage.  Note that under normal code
> paths, the flag *never* gets set by ext4 kernel code.  The flag will
> get set on non-block group 0 block group descriptors by ext4, and the
> ext4 kernel code will only clear the flag.
> 
> Of course, if there is a bug in some driver that dereferences a
> pointer wildly, all bets are off.
> 
> 					- Ted


* Re: Remounting filesystem read-only
  2018-07-28  7:47         ` Darrick J. Wong
@ 2018-08-02  2:23           ` Sodagudi Prasad
  0 siblings, 0 replies; 6+ messages in thread
From: Sodagudi Prasad @ 2018-08-02  2:23 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Y. Ts'o, adilger.kernel, wen.xu, linux-ext4, linux-kernel

On 2018-07-28 00:47, Darrick J. Wong wrote:
> On Fri, Jul 27, 2018 at 08:18:23PM -0400, Theodore Y. Ts'o wrote:
>> On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
>> > > The error should be pretty clear: "Inode table for bg 0 marked as
>> > > needing zeroing".  That should never happen.
>> >
>> > Can you provide a debug patch to detect when this corruption is happening?
>> > What is the source of this corruption, and how is this partition getting
>> > corrupted?  Or which file system operation leads to this corruption?
>> 
>> Do you have a reliable repro?  If it's a one-off, it can be caused by
>> *anything*.  Crappy hardware, a bug in some proprietary, binary-only
>> GPU driver dereferencing some wild pointer that corrupts kernel
>> memory, etc.
>> 
>> Asking for a debug patch is like asking for "can you create technology
>> that can detect when a cockroach enters my house?"
> 
> Well, ext4 *could* add metadata read and write verifiers to complain
> loudly in dmesg about stuff that shouldn't be there, so at least we'd
> know when we're writing cockroaches into the house... :)
> 
> --D
> 
Hi Ted,

The change below fixed this issue. Thanks for your support.

https://github.com/torvalds/linux/commit/5012284700775a4e6e3fbe7eac4c543c4874b559

"ext4: fix check to prevent initializing reserved inodes"

-Thanks, Prasad

>> So if you have a reliable repro, then we know what operations might be
>> triggering the corruption, and then you work on creating a minimal
>> repro, and only *then* when we have a restricted set of possibilities
>> that might be the cause (for example, if removing a GPU call makes the
>> problem go away, then the patch would need to be in the proprietary
>> GPU driver....)
>> 
>> > I am digging code a bit around this warning to understand more.
>> 
>> The warning means that a flag in block group descriptor #0 is set
>> that should never be set.  How did the flag get set?  There is any
>> number of things that could cause that.
>> 
>> You might want to look at the block group descriptor via dumpe2fs or
>> debugfs, to see if it's just a single bit getting flipped, or if the
>> entire block group descriptor is garbage.  Note that under normal code
>> paths, the flag *never* gets set by ext4 kernel code.  The flag will
>> get set on non-block group 0 block group descriptors by ext4, and the
>> ext4 kernel code will only clear the flag.
>> 
>> Of course, if there is a bug in some driver that dereferences a
>> pointer wildly, all bets are off.
>> 
>> 					- Ted

