linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ahmad Fatoum <a.fatoum@pengutronix.de>
To: Tokunori Ikegami <ikegami.t@gmail.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	linux-mtd@lists.infradead.org, Joakim.Tjernlund@infinera.com,
	miquel.raynal@bootlin.com, vigneshr@ti.com, richard@nod.at,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>,
	Brian Norris <computersforpeace@gmail.com>,
	David Woodhouse <dwmw2@infradead.org>,
	marek.vasut@gmail.com, cyrille.pitchen@wedev4u.fr,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Pengutronix Kernel Team <kernel@pengutronix.de>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
Date: Mon, 7 Feb 2022 15:28:10 +0100	[thread overview]
Message-ID: <b231b498-c8d2-28af-ce66-db8c168047f7@pengutronix.de> (raw)
In-Reply-To: <0f2cfcac-83ca-51a9-f92c-ff6495dca1d7@gmail.com>

Hello Tokunori-san,

On 29.01.22 19:01, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Thanks for your investigation.
> 
>> The issue is still there with #define FORCE_WORD_WRITE 1:
>>
>>    jffs2: Write clean marker to block at 0x000a0000 failed: -5
>>    MTD do_write_oneword_once(): software timeout
> Which kernel version has been tested about this?

I last tested with v5.10.30, but I had briefly tried v5.16-rc as well
when first debugging this issue.

I have rebased onto v5.17-rc2 now and will use that for further tests.
The same issue with word write forcing is reproducible there as well.

> Since the buffered writes disabled by 7e4404113686 for S29GL256N and tested on kernel 5.10.16.
> So I would like to confirm if the issue depended on the CPU or kernel version, etc.
> Note: The chips S29GL064N and S29GL256N seem different the flash Mb size basically.

I see. To be extra sure, I have replaced 0x2201 with 0x0c01 to hit
the same code paths, but no improvement.

>> Doesn't seem to be a buffered write issue here though as the writes
>> did work fine before dfeae1073583. Any other ideas?
> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.

What seems to work for me is checking if chip_good or chip_ready
and map_word is equal to 0xFF. I can't justify why this is ok though.
(Worst case bus is floating at this point of time and Hi-Z is read
as 0xff on CPU data lines...)

> By the way could you please let me know the chip information for more detail? (For example model number, cycle and device ID, etc.)

I can't read it off the chip, but vendor uses S29GL064N90FFI02 or S29GL964N11FFI02.
Kernel reports it with:
ff800000.flash: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x000001 Chip ID 0x000c01

I am not sure what you mean with cycle. If you tell me what
command to run, I can paste the output.

Thanks,
Ahmad



> 
> Regards,
> Ikegami
> 
> 
> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
> 
>>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>>> from a few templates paragraphs some of you might have seen already.]
>>>>
>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>
>>>> Top-posting for once, to make this easy accessible to everyone.
>>>>
>>>> Thanks for the report.
>>>>
>>>> Adding the regression mailing list to the list of recipients, as it
>>>> should be in the loop for all regressions, as explained here:
>>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>>
>>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>>
>>>> #regzbot ^introduced dfeae1073583
>>>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>>>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>>>> #regzbot ignore-activity
>>>>
>>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>>> to the report (the parent of this mail), then regzbot will automatically
>>>> mark the regression as resolved once the fix lands in the appropriate
>>>> tree. For more details about regzbot see footer.
>>>>
>>>> Sending this to everyone that got the initial report, to make all aware
>>>> of the tracking. I also hope that messages like this motivate people to
>>>> directly get at least the regression mailing list and ideally even
>>>> regzbot involved when dealing with regressions, as messages like this
>>>> wouldn't be needed then.
>>>>
>>>> Don't worry, I'll send further messages wrt to this regression just to
>>>> the lists (with a tag in the subject so people can filter them away), as
>>>> long as they are intended just for regzbot. With a bit of luck no such
>>>> messages will be needed anyway.
>>>>
>>>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>>>
>>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>>> on my table. I can only look briefly into most of them. Unfortunately
>>>> therefore I sometimes will get things wrong or miss something important.
>>>> I hope that's not the case here; if you think it is, don't hesitate to
>>>> tell me about it in a public reply. That's in everyone's interest, as
>>>> what I wrote above might be misleading to everyone reading this; any
>>>> suggestion I gave thus might sent someone reading this down the wrong
>>>> rabbit hole, which none of us wants.
>>>>
>>>> BTW, I have no personal interest in this issue, which is tracked using
>>>> regzbot, my Linux kernel regression tracking bot
>>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>>> this mail to get things rolling again and hence don't need to be CC on
>>>> all further activities wrt to this regression.
>>>>
>>>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>>>> Hi,
>>>>>
>>>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>>>> which is represented as a memory-mapped cfi-flash.
>>>>>
>>>>> The regression began in v4.17-rc1 with
>>>>>
>>>>>     dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>>>
>>>>> and causes all flash write accesses on the hardware to fail. Example output
>>>>> after v5.1-rc2[1]:
>>>>>
>>>>>     root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>>>     MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>>>     jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>>>
>>>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>>>> hardware happy.
>>>>>
>>>>> What Tokunori's patch did is that it strengthened the success condition
>>>>> for flash writes:
>>>>>
>>>>>    - Prior to the patch, DQ polling was done until bits
>>>>>      stopped toggling. This was taken as an indicator that the write succeeded
>>>>>      and was reported up the stack. i.e. success condition is chip_ready()
>>>>>
>>>>>    - After the patch, polling continues until the just written data is
>>>>>      actually read back, i.e. success condition is chip_good()
>>>>>
>>>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>>>> never the just written data. The data is still written and can be read back
>>>>> on subsequent reads, just not at that point of time in the poll loop.
>>>>>
>>>>> We haven't had write issues for the years predating that patch. As the
>>>>> regression has been mainline for a while, I am wondering what about my setup
>>>>> that makes it pop up here, but not elsewhere?
>>>>>
>>>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>>>
>>>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>>>> times, however, before giving up, so there should be enough CS# toggles.
>>>>>
>>>>>
>>>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>>>
>>>>> I am investigating this some more, probably with a logic trace, but I wanted
>>>>> to report this in case someone has pointers and in case other people run into
>>>>> the same issue.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>>>       first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>>>       That's fixed, but the writes still fail.
>>>>>
>>>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>>>
>>>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>>>        Note that "true data" means valid data here, not all bits one.
>>>>>
>>
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

  reply	other threads:[~2022-02-07 14:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-13 13:24 [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1 Ahmad Fatoum
2021-12-14  7:23 ` Thorsten Leemhuis
2021-12-15 17:34   ` Tokunori Ikegami
2022-01-20 13:00     ` Thorsten Leemhuis
2022-01-28 12:55     ` Ahmad Fatoum
2022-01-29 18:01       ` Tokunori Ikegami
2022-02-07 14:28         ` Ahmad Fatoum [this message]
2022-02-13 16:47           ` Tokunori Ikegami
2022-02-14 16:22             ` Ahmad Fatoum
2022-02-14 18:46               ` Tokunori Ikegami
2022-02-20 12:22                 ` Tokunori Ikegami
2022-03-04 11:11                   ` Ahmad Fatoum
2022-03-06 15:49                     ` Tokunori Ikegami
2022-03-08  9:44                       ` Ahmad Fatoum
2022-03-08 16:13                         ` Tokunori Ikegami
2022-03-08 16:23                           ` Ahmad Fatoum
2022-03-08 16:40                             ` Tokunori Ikegami

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b231b498-c8d2-28af-ce66-db8c168047f7@pengutronix.de \
    --to=a.fatoum@pengutronix.de \
    --cc=Joakim.Tjernlund@infinera.com \
    --cc=chris.packham@alliedtelesis.co.nz \
    --cc=computersforpeace@gmail.com \
    --cc=cyrille.pitchen@wedev4u.fr \
    --cc=dwmw2@infradead.org \
    --cc=ikegami.t@gmail.com \
    --cc=kernel@pengutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=marek.vasut@gmail.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=richard@nod.at \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).