From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Warren <swarren@wwwdotorg.org>
Date: Wed, 25 Mar 2020 14:01:21 -0600
Subject: ext4: invalid extent block on imx7
In-Reply-To: <2a2fd68d-9050-f681-1105-71d2d2efa886@siemens.com>
References: <bb48896d-62b8-dc82-61e6-17875dae3a60@siemens.com>
 <20200320182109.GD5793@bill-the-cat>
 <174f73c8-e821-2de1-4949-30ffb4e02f5c@siemens.com>
 <20200325150043.GR5793@bill-the-cat>
 <2a2fd68d-9050-f681-1105-71d2d2efa886@siemens.com>
Message-ID: <4cf145a0-aef4-c22d-4f91-5e3d6928fc85@wwwdotorg.org>
List-Id: <u-boot.lists.denx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 3/25/20 1:11 PM, Jan Kiszka wrote:
> On 25.03.20 16:00, Tom Rini wrote:
>> On Wed, Mar 25, 2020 at 07:32:30AM +0100, Jan Kiszka wrote:
>>> On 20.03.20 19:21, Tom Rini wrote:
>>>> On Mon, Mar 16, 2020 at 08:09:53PM +0100, Jan Kiszka wrote:
>>>>> Hi all,
>>>>>
>>>>> => ls mmc 0:1 /usr/lib/linux-image-4.9.11-1.3.0-dirty
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> CACHE: Misaligned operation at range [bdfff998, bdfffd98]
>>>>> invalid extent block
>>>>>
>>>>> I'm using master (50be9f0e1ccc) on the MCIMX7SABRE, defconfig.
>>>>>
>>>>> What could this be? The filesystem is fine from Linux POV.
>>>>
>>>> Use tune2fs -l and see if there's any new'ish features enabled that we
>>>> need some sort of check-and-reject for would be my first guess.
>>>>
>>>
>>> Here are the reported feature flags:
>>>
>>> has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg
>>> sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
>>
>> Of that, only metadata_csum means that you can't write to that image,
>> but you're just trying to read and that should be fine. Can you go back
>> in time a little and see if this problem persists or if it's been
>> introduced of late? Or recreate it on other platforms/SoCs? Thanks!
>>
> 
> Bisected, regression of d5aee659f217 ("fs: ext4: cache extent data").
> Reverting this commit over master resolves the issue.
> 
> Any idea what could be wrong? What I noticed is that the extent has a
> zeroed magic when things go wrong, so maybe it is falsely considered to
> be cached?

This is puzzling. I took another look at that patch and I don't see
anything wrong with it. My guesses would be:

- Some unrelated memory corruption bug was exposed simply because this
patch uses dynamic memory or stack slightly differently than before.

- Something writes to the cached block, whereas the cache code assumes
the buffer is read-only.

The cache metadata lives on the stack and so only lasts for the
duration of read_allocated_block() or ext4fs_read_file(), so there's no
issue with re-using the cache across different devices, or with it
persisting across an ext4 write operation or anything like that. Is this
easy to reproduce? Is there a small disk image that shows the problem?
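
One way to test the second guess (something writing to the cached
block) would be to checksum the block when it is cached and re-check
it on every hit. The following is only a minimal standalone sketch of
that idea, not the U-Boot code: struct block_cache, cache_store(),
cache_lookup(), block_sum() and BLOCK_SIZE are all hypothetical names
made up for illustration, and the real cache in the patch is laid out
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* hypothetical; real ext4 blocks are larger */

/* Hypothetical cache descriptor; NOT the structure used in U-Boot. */
struct block_cache {
	int valid;			/* non-zero once a block is stored */
	uint64_t block_nr;		/* which block 'data' holds */
	uint32_t sum;			/* checksum taken at store time */
	uint8_t data[BLOCK_SIZE];	/* cached copy, assumed read-only */
};

/* Simple polynomial checksum; enough to notice stray writes. */
static uint32_t block_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + buf[i];
	return sum;
}

/* Copy a freshly read block into the cache and record its checksum. */
static void cache_store(struct block_cache *c, uint64_t nr,
			const uint8_t *src)
{
	assert(c && src);
	memcpy(c->data, src, BLOCK_SIZE);
	c->block_nr = nr;
	c->sum = block_sum(c->data, BLOCK_SIZE);
	c->valid = 1;
}

/*
 * Return the cached data on a clean hit. Return NULL on a miss, or if
 * the cached copy no longer matches its checksum (someone wrote to
 * it); in that case the entry is also invalidated.
 */
static const uint8_t *cache_lookup(struct block_cache *c, uint64_t nr)
{
	assert(c);
	if (!c->valid || c->block_nr != nr)
		return NULL;
	if (block_sum(c->data, BLOCK_SIZE) != c->sum) {
		c->valid = 0;
		return NULL;
	}
	return c->data;
}
```

With something like this wired into the lookup path, a zeroed extent
magic in the cached copy would show up as a failed lookup at the point
of the next hit rather than as "invalid extent block" later on, which
might help narrow down who is scribbling on the buffer.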