regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
       [not found] <b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de>
@ 2021-12-14  7:23 ` Thorsten Leemhuis
  2021-12-15 17:34   ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Thorsten Leemhuis @ 2021-12-14  7:23 UTC (permalink / raw)
  To: Ahmad Fatoum, linux-mtd, ikegami, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, ikegami.t, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev, Shaohui.Xie

[TLDR: adding this regression to regzbot; most of this mail is compiled
from a few templates paragraphs some of you might have seen already.]

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easy accessible to everyone.

Thanks for the report.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced dfeae1073583
#regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail), then regzbot will automatically
mark the regression as resolved once the fix lands in the appropriate
tree. For more details about regzbot see footer.

Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.

Don't worry, I'll send further messages wrt to this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave thus might sent someone reading this down the wrong
rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.

On 13.12.21 14:24, Ahmad Fatoum wrote:
> Hi,
> 
> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
> which is represented as a memory-mapped cfi-flash.
> 
> The regression began in v4.17-rc1 with
> 
>   dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
> 
> and causes all flash write accesses on the hardware to fail. Example output
> after v5.1-rc2[1]:
> 
>   root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>   MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>   jffs2: Write clean marker to block at 0x000c0000 failed: -5
> 
> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
> it, but I am still looking for a change that keeps both Tokunori's and my
> hardware happy.
> 
> What Tokunori's patch did is that it strengthened the success condition
> for flash writes:
> 
>  - Prior to the patch, DQ polling was done until bits
>    stopped toggling. This was taken as an indicator that the write succeeded
>    and was reported up the stack. i.e. success condition is chip_ready()
> 
>  - After the patch, polling continues until the just written data is
>    actually read back, i.e. success condition is chip_good()
> 
> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
> never the just written data. The data is still written and can be read back
> on subsequent reads, just not at that point of time in the poll loop.
> 
> We haven't had write issues for the years predating that patch. As the
> regression has been mainline for a while, I am wondering what about my setup
> that makes it pop up here, but not elsewhere?
> 
> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
> during embedded algorithms. DQ switches from status output to "True" (I assume
> True == all bits set == 0xFF) until CS# is reasserted. 
> 
> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
> such an intermittent "True" state. In any case, the driver polls a few hundred
> times, however, before giving up, so there should be enough CS# toggles.
> 
> 
> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
> condition may be appropriate, but I don't yet have the rationale to back it up.
> 
> I am investigating this some more, probably with a logic trace, but I wanted
> to report this in case someone has pointers and in case other people run into
> the same issue.
> 
> 
> Cheers,
> Ahmad
> 
> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer") 
>     first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>     That's fixed, but the writes still fail.
> 
> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
> 
> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>      Note that "true data" means valid data here, not all bits one.
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2021-12-14  7:23 ` [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1 Thorsten Leemhuis
@ 2021-12-15 17:34   ` Tokunori Ikegami
  2022-01-20 13:00     ` Thorsten Leemhuis
  2022-01-28 12:55     ` Ahmad Fatoum
  0 siblings, 2 replies; 16+ messages in thread
From: Tokunori Ikegami @ 2021-12-15 17:34 UTC (permalink / raw)
  To: Thorsten Leemhuis, Ahmad Fatoum, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev, Shaohui.Xie

Hi Ahmad-san,

Sorry for the regression issue by the change: dfeae1073583.
To make sure could you please try with the word write instead of the 
buffered writes?

FYI: There are some changes to disable the buffered writes as below.
   1. 
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ar71xx/patches-4.9/411-mtd-cfi_cmdset_0002-force-word-write.patch;h=ddd69f17e1ac16e8fc3a694c56231fee1e2ef149;hb=fec8fe806963c96a6506c2aebc3572d3a11f285f
   2. 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/chips/cfi_cmdset_0002.c?h=v5.16-rc5&id=7e4404113686868858a34210c28ae122e967aa64

Note:
   Currently I am not able to investigate the issue on the product for 
the change before.

   By the way in the past I had investigated the similar issue on 
Buffalo WZR-HP-G300NH using the S29GL256N.
   It was not able to find the root cause by the investigation since not 
required actually at that time.
   Also actually the buffered writes were disabled on the OpenWrt 
firmware as the change [1] above.
   But I am not sure the reason detail to disable the buffered writes on 
the OpenWrt firmware.
   I thought the issue not caused by the change: dfeae1073583 since the 
issue happened without the change.

   So I am not sure why the above change [2] needed to disable the 
buffered writes on Buffalo WZR-HP-G300NH.
   Probably seems needed to disable the buffered writes on the other 
firmware also but not OpenWrt firmware.

   Anyway there are difference with your regression issue as below.
     1. Flash device: S29GL064N (Your regression issue), S29GL256N 
(WZR-HP-G300NH)
     2. Regression issue: Yes (Your regression issue), No (WZR-HP-G300NH 
as I investigated before)

Regards,
Ikegami

On 2021/12/14 16:23, Thorsten Leemhuis wrote:
> [TLDR: adding this regression to regzbot; most of this mail is compiled
> from a few templates paragraphs some of you might have seen already.]
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> Top-posting for once, to make this easy accessible to everyone.
>
> Thanks for the report.
>
> Adding the regression mailing list to the list of recipients, as it
> should be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced dfeae1073583
> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
> #regzbot ignore-activity
>
> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> to the report (the parent of this mail), then regzbot will automatically
> mark the regression as resolved once the fix lands in the appropriate
> tree. For more details about regzbot see footer.
>
> Sending this to everyone that got the initial report, to make all aware
> of the tracking. I also hope that messages like this motivate people to
> directly get at least the regression mailing list and ideally even
> regzbot involved when dealing with regressions, as messages like this
> wouldn't be needed then.
>
> Don't worry, I'll send further messages wrt to this regression just to
> the lists (with a tag in the subject so people can filter them away), as
> long as they are intended just for regzbot. With a bit of luck no such
> messages will be needed anyway.
>
> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply. That's in everyone's interest, as
> what I wrote above might be misleading to everyone reading this; any
> suggestion I gave thus might sent someone reading this down the wrong
> rabbit hole, which none of us wants.
>
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt to this regression.
>
> On 13.12.21 14:24, Ahmad Fatoum wrote:
>> Hi,
>>
>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>> which is represented as a memory-mapped cfi-flash.
>>
>> The regression began in v4.17-rc1 with
>>
>>    dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>
>> and causes all flash write accesses on the hardware to fail. Example output
>> after v5.1-rc2[1]:
>>
>>    root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>    MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>    jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>
>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>> it, but I am still looking for a change that keeps both Tokunori's and my
>> hardware happy.
>>
>> What Tokunori's patch did is that it strengthened the success condition
>> for flash writes:
>>
>>   - Prior to the patch, DQ polling was done until bits
>>     stopped toggling. This was taken as an indicator that the write succeeded
>>     and was reported up the stack. i.e. success condition is chip_ready()
>>
>>   - After the patch, polling continues until the just written data is
>>     actually read back, i.e. success condition is chip_good()
>>
>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>> never the just written data. The data is still written and can be read back
>> on subsequent reads, just not at that point of time in the poll loop.
>>
>> We haven't had write issues for the years predating that patch. As the
>> regression has been mainline for a while, I am wondering what about my setup
>> that makes it pop up here, but not elsewhere?
>>
>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>> during embedded algorithms. DQ switches from status output to "True" (I assume
>> True == all bits set == 0xFF) until CS# is reasserted.
>>
>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>> such an intermittent "True" state. In any case, the driver polls a few hundred
>> times, however, before giving up, so there should be enough CS# toggles.
>>
>>
>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>
>> I am investigating this some more, probably with a logic trace, but I wanted
>> to report this in case someone has pointers and in case other people run into
>> the same issue.
>>
>>
>> Cheers,
>> Ahmad
>>
>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>      first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>      That's fixed, but the writes still fail.
>>
>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>
>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>       Note that "true data" means valid data here, not all bits one.
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2021-12-15 17:34   ` Tokunori Ikegami
@ 2022-01-20 13:00     ` Thorsten Leemhuis
  2022-01-28 12:55     ` Ahmad Fatoum
  1 sibling, 0 replies; 16+ messages in thread
From: Thorsten Leemhuis @ 2022-01-20 13:00 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, Ahmad Fatoum, linux-mtd,
	Joakim.Tjernlund, miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev, Shaohui.Xie

Hi, this is your Linux kernel regression tracker speaking.

On 15.12.21 18:34, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Sorry for the regression issue by the change: dfeae1073583.
> To make sure could you please try with the word write instead of the
> buffered writes?

Ahmad, did you try what Tokunori asked? Was any progress made to get
this regression fixed? To me it looks like it fell through the cracks.
Can anyone provide a status update please?

Ciao, Thorsten

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.

#regzbot poke

> FYI: There are some changes to disable the buffered writes as below.
>   1.
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ar71xx/patches-4.9/411-mtd-cfi_cmdset_0002-force-word-write.patch;h=ddd69f17e1ac16e8fc3a694c56231fee1e2ef149;hb=fec8fe806963c96a6506c2aebc3572d3a11f285f
> 
>   2.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/chips/cfi_cmdset_0002.c?h=v5.16-rc5&id=7e4404113686868858a34210c28ae122e967aa64
> 
> 
> Note:
>   Currently I am not able to investigate the issue on the product for
> the change before.
> 
>   By the way in the past I had investigated the similar issue on Buffalo
> WZR-HP-G300NH using the S29GL256N.
>   It was not able to find the root cause by the investigation since not
> required actually at that time.
>   Also actually the buffered writes were disabled on the OpenWrt
> firmware as the change [1] above.
>   But I am not sure the reason detail to disable the buffered writes on
> the OpenWrt firmware.
>   I thought the issue not caused by the change: dfeae1073583 since the
> issue happened without the change.
> 
>   So I am not sure why the above change [2] needed to disable the
> buffered writes on Buffalo WZR-HP-G300NH.
>   Probably seems needed to disable the buffered writes on the other
> firmware also but not OpenWrt firmware.
> 
>   Anyway there are difference with your regression issue as below.
>     1. Flash device: S29GL064N (Your regression issue), S29GL256N
> (WZR-HP-G300NH)
>     2. Regression issue: Yes (Your regression issue), No (WZR-HP-G300NH
> as I investigated before)
> 
> Regards,
> Ikegami
> 
> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>> from a few templates paragraphs some of you might have seen already.]
>>
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> Top-posting for once, to make this easy accessible to everyone.
>>
>> Thanks for the report.
>>
>> Adding the regression mailing list to the list of recipients, as it
>> should be in the loop for all regressions, as explained here:
>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>
>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>> adding it to regzbot, my Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced dfeae1073583
>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>> #regzbot ignore-activity
>>
>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>> to the report (the parent of this mail), then regzbot will automatically
>> mark the regression as resolved once the fix lands in the appropriate
>> tree. For more details about regzbot see footer.
>>
>> Sending this to everyone that got the initial report, to make all aware
>> of the tracking. I also hope that messages like this motivate people to
>> directly get at least the regression mailing list and ideally even
>> regzbot involved when dealing with regressions, as messages like this
>> wouldn't be needed then.
>>
>> Don't worry, I'll send further messages wrt to this regression just to
>> the lists (with a tag in the subject so people can filter them away), as
>> long as they are intended just for regzbot. With a bit of luck no such
>> messages will be needed anyway.
>>
>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>
>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>> on my table. I can only look briefly into most of them. Unfortunately
>> therefore I sometimes will get things wrong or miss something important.
>> I hope that's not the case here; if you think it is, don't hesitate to
>> tell me about it in a public reply. That's in everyone's interest, as
>> what I wrote above might be misleading to everyone reading this; any
>> suggestion I gave thus might sent someone reading this down the wrong
>> rabbit hole, which none of us wants.
>>
>> BTW, I have no personal interest in this issue, which is tracked using
>> regzbot, my Linux kernel regression tracking bot
>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>> this mail to get things rolling again and hence don't need to be CC on
>> all further activities wrt to this regression.
>>
>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>> Hi,
>>>
>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is
>>> connected
>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N
>>> flash,
>>> which is represented as a memory-mapped cfi-flash.
>>>
>>> The regression began in v4.17-rc1 with
>>>
>>>    dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check
>>> correct value")
>>>
>>> and causes all flash write accesses on the hardware to fail. Example
>>> output
>>> after v5.1-rc2[1]:
>>>
>>>    root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>    MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>    jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>
>>> This issue still persists with v5.16-rc. Reverting aforementioned
>>> patch fixes
>>> it, but I am still looking for a change that keeps both Tokunori's
>>> and my
>>> hardware happy.
>>>
>>> What Tokunori's patch did is that it strengthened the success condition
>>> for flash writes:
>>>
>>>   - Prior to the patch, DQ polling was done until bits
>>>     stopped toggling. This was taken as an indicator that the write
>>> succeeded
>>>     and was reported up the stack. i.e. success condition is
>>> chip_ready()
>>>
>>>   - After the patch, polling continues until the just written data is
>>>     actually read back, i.e. success condition is chip_good()
>>>
>>> This new condition never holds for me, when DQ stabilizes, it reads
>>> 0xFF,
>>> never the just written data. The data is still written and can be
>>> read back
>>> on subsequent reads, just not at that point of time in the poll loop.
>>>
>>> We haven't had write issues for the years predating that patch. As the
>>> regression has been mainline for a while, I am wondering what about
>>> my setup
>>> that makes it pop up here, but not elsewhere?
>>>
>>> I consulted the data sheet[2] and found Figure 27, which describes DQ
>>> polling
>>> during embedded algorithms. DQ switches from status output to "True"
>>> (I assume
>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>
>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't
>>> describe
>>> such an intermittent "True" state. In any case, the driver polls a
>>> few hundred
>>> times, however, before giving up, so there should be enough CS# toggles.
>>>
>>>
>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a
>>> success
>>> condition may be appropriate, but I don't yet have the rationale to
>>> back it up.
>>>
>>> I am investigating this some more, probably with a logic trace, but I
>>> wanted
>>> to report this in case someone has pointers and in case other people
>>> run into
>>> the same issue.
>>>
>>>
>>> Cheers,
>>> Ahmad
>>>
>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in
>>> cfi_cmdset_0002.c do_write_buffer")
>>>      first included with v5.1-rc2, failing writes just hung
>>> indefinitely in kernel space.
>>>      That's fixed, but the writes still fail.
>>>
>>> [2]: 001-98525 Rev. *B,
>>> https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>
>>>
>>> [3]:
>>> https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>
>>>       Note that "true data" means valid data here, not all bits one.
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2021-12-15 17:34   ` Tokunori Ikegami
  2022-01-20 13:00     ` Thorsten Leemhuis
@ 2022-01-28 12:55     ` Ahmad Fatoum
  2022-01-29 18:01       ` Tokunori Ikegami
  1 sibling, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-01-28 12:55 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev, Shaohui.Xie

Hello Tokunori-san,

On 15.12.21 18:34, Tokunori Ikegami wrote:
> Hi Ahmad-san,

Thanks for your reply (and Thorsten for the reminder) and sorry for
the delay. I had a lot of backlog after my time off.

> Sorry for the regression issue by the change: dfeae1073583.
> To make sure could you please try with the word write instead of the buffered writes?

The issue is still there with #define FORCE_WORD_WRITE 1:

  jffs2: Write clean marker to block at 0x000a0000 failed: -5
  MTD do_write_oneword_once(): software timeout

> FYI: There are some changes to disable the buffered writes as below.
>   1. https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ar71xx/patches-4.9/411-mtd-cfi_cmdset_0002-force-word-write.patch;h=ddd69f17e1ac16e8fc3a694c56231fee1e2ef149;hb=fec8fe806963c96a6506c2aebc3572d3a11f285f
>   2. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/mtd/chips/cfi_cmdset_0002.c?h=v5.16-rc5&id=7e4404113686868858a34210c28ae122e967aa64
> 
> Note:
>   Currently I am not able to investigate the issue on the product for the change before.
> 
>   By the way in the past I had investigated the similar issue on Buffalo WZR-HP-G300NH using the S29GL256N.
>   It was not able to find the root cause by the investigation since not required actually at that time.
>   Also actually the buffered writes were disabled on the OpenWrt firmware as the change [1] above.
>   But I am not sure the reason detail to disable the buffered writes on the OpenWrt firmware.
>   I thought the issue not caused by the change: dfeae1073583 since the issue happened without the change.
> 
>   So I am not sure why the above change [2] needed to disable the buffered writes on Buffalo WZR-HP-G300NH.
>   Probably seems needed to disable the buffered writes on the other firmware also but not OpenWrt firmware.
> 
>   Anyway there are difference with your regression issue as below.
>     1. Flash device: S29GL064N (Your regression issue), S29GL256N (WZR-HP-G300NH)
>     2. Regression issue: Yes (Your regression issue), No (WZR-HP-G300NH as I investigated before)

Doesn't seem to be a buffered write issue here though as the writes
did work fine before dfeae1073583. Any other ideas?

Cheers,
Ahmad

> 
> Regards,
> Ikegami
> 
> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>> from a few templates paragraphs some of you might have seen already.]
>>
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> Top-posting for once, to make this easy accessible to everyone.
>>
>> Thanks for the report.
>>
>> Adding the regression mailing list to the list of recipients, as it
>> should be in the loop for all regressions, as explained here:
>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>
>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>> adding it to regzbot, my Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced dfeae1073583
>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>> #regzbot ignore-activity
>>
>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>> to the report (the parent of this mail), then regzbot will automatically
>> mark the regression as resolved once the fix lands in the appropriate
>> tree. For more details about regzbot see footer.
>>
>> Sending this to everyone that got the initial report, to make all aware
>> of the tracking. I also hope that messages like this motivate people to
>> directly get at least the regression mailing list and ideally even
>> regzbot involved when dealing with regressions, as messages like this
>> wouldn't be needed then.
>>
>> Don't worry, I'll send further messages wrt to this regression just to
>> the lists (with a tag in the subject so people can filter them away), as
>> long as they are intended just for regzbot. With a bit of luck no such
>> messages will be needed anyway.
>>
>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>
>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>> on my table. I can only look briefly into most of them. Unfortunately
>> therefore I sometimes will get things wrong or miss something important.
>> I hope that's not the case here; if you think it is, don't hesitate to
>> tell me about it in a public reply. That's in everyone's interest, as
>> what I wrote above might be misleading to everyone reading this; any
>> suggestion I gave thus might sent someone reading this down the wrong
>> rabbit hole, which none of us wants.
>>
>> BTW, I have no personal interest in this issue, which is tracked using
>> regzbot, my Linux kernel regression tracking bot
>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>> this mail to get things rolling again and hence don't need to be CC on
>> all further activities wrt to this regression.
>>
>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>> Hi,
>>>
>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>> which is represented as a memory-mapped cfi-flash.
>>>
>>> The regression began in v4.17-rc1 with
>>>
>>>    dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>
>>> and causes all flash write accesses on the hardware to fail. Example output
>>> after v5.1-rc2[1]:
>>>
>>>    root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>    MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>    jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>
>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>> hardware happy.
>>>
>>> What Tokunori's patch did is that it strengthened the success condition
>>> for flash writes:
>>>
>>>   - Prior to the patch, DQ polling was done until bits
>>>     stopped toggling. This was taken as an indicator that the write succeeded
>>>     and was reported up the stack. i.e. success condition is chip_ready()
>>>
>>>   - After the patch, polling continues until the just written data is
>>>     actually read back, i.e. success condition is chip_good()
>>>
>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>> never the just written data. The data is still written and can be read back
>>> on subsequent reads, just not at that point of time in the poll loop.
>>>
>>> We haven't had write issues for the years predating that patch. As the
>>> regression has been mainline for a while, I am wondering what about my setup
>>> that makes it pop up here, but not elsewhere?
>>>
>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>
>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>> times, however, before giving up, so there should be enough CS# toggles.
>>>
>>>
>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>
>>> I am investigating this some more, probably with a logic trace, but I wanted
>>> to report this in case someone has pointers and in case other people run into
>>> the same issue.
>>>
>>>
>>> Cheers,
>>> Ahmad
>>>
>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>      first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>      That's fixed, but the writes still fail.
>>>
>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>
>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>       Note that "true data" means valid data here, not all bits one.
>>>
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-01-28 12:55     ` Ahmad Fatoum
@ 2022-01-29 18:01       ` Tokunori Ikegami
  2022-02-07 14:28         ` Ahmad Fatoum
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-01-29 18:01 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hi Ahmad-san,

Thanks for your investigation.

> The issue is still there with #define FORCE_WORD_WRITE 1:
>
>    jffs2: Write clean marker to block at 0x000a0000 failed: -5
>    MTD do_write_oneword_once(): software timeout
Which kernel version has been tested about this?
Since the buffered writes disabled by 7e4404113686 for S29GL256N and 
tested on kernel 5.10.16.
So I would like to confirm if the issue depended on the CPU or kernel 
version, etc.
Note: The chips S29GL064N and S29GL256N seem different the flash Mb size 
basically.
> Doesn't seem to be a buffered write issue here though as the writes
> did work fine before dfeae1073583. Any other ideas?
At first I thought the issue is possible to be resolved by using the 
word write instead of the buffered writes.
Now I am thinking to disable the changes dfeae1073583 partially with any 
condition if possible.
By the way could you please let me know the chip information for more 
detail? (For example model number, cycle and device ID, etc.)

Regards,
Ikegami


On 2021/12/14 16:23, Thorsten Leemhuis wrote:

>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>> from a few templates paragraphs some of you might have seen already.]
>>>
>>> Hi, this is your Linux kernel regression tracker speaking.
>>>
>>> Top-posting for once, to make this easy accessible to everyone.
>>>
>>> Thanks for the report.
>>>
>>> Adding the regression mailing list to the list of recipients, as it
>>> should be in the loop for all regressions, as explained here:
>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>
>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>
>>> #regzbot ^introduced dfeae1073583
>>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>>> #regzbot ignore-activity
>>>
>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>> to the report (the parent of this mail), then regzbot will automatically
>>> mark the regression as resolved once the fix lands in the appropriate
>>> tree. For more details about regzbot see footer.
>>>
>>> Sending this to everyone that got the initial report, to make all aware
>>> of the tracking. I also hope that messages like this motivate people to
>>> directly get at least the regression mailing list and ideally even
>>> regzbot involved when dealing with regressions, as messages like this
>>> wouldn't be needed then.
>>>
>>> Don't worry, I'll send further messages wrt to this regression just to
>>> the lists (with a tag in the subject so people can filter them away), as
>>> long as they are intended just for regzbot. With a bit of luck no such
>>> messages will be needed anyway.
>>>
>>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>>
>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>> on my table. I can only look briefly into most of them. Unfortunately
>>> therefore I sometimes will get things wrong or miss something important.
>>> I hope that's not the case here; if you think it is, don't hesitate to
>>> tell me about it in a public reply. That's in everyone's interest, as
>>> what I wrote above might be misleading to everyone reading this; any
>>> suggestion I gave thus might sent someone reading this down the wrong
>>> rabbit hole, which none of us wants.
>>>
>>> BTW, I have no personal interest in this issue, which is tracked using
>>> regzbot, my Linux kernel regression tracking bot
>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>> this mail to get things rolling again and hence don't need to be CC on
>>> all further activities wrt to this regression.
>>>
>>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>>> Hi,
>>>>
>>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>>> which is represented as a memory-mapped cfi-flash.
>>>>
>>>> The regression began in v4.17-rc1 with
>>>>
>>>>     dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>>
>>>> and causes all flash write accesses on the hardware to fail. Example output
>>>> after v5.1-rc2[1]:
>>>>
>>>>     root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>>     MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>>     jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>>
>>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>>> hardware happy.
>>>>
>>>> What Tokunori's patch did is that it strengthened the success condition
>>>> for flash writes:
>>>>
>>>>    - Prior to the patch, DQ polling was done until bits
>>>>      stopped toggling. This was taken as an indicator that the write succeeded
>>>>      and was reported up the stack. i.e. success condition is chip_ready()
>>>>
>>>>    - After the patch, polling continues until the just written data is
>>>>      actually read back, i.e. success condition is chip_good()
>>>>
>>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>>> never the just written data. The data is still written and can be read back
>>>> on subsequent reads, just not at that point of time in the poll loop.
>>>>
>>>> We haven't had write issues for the years predating that patch. As the
>>>> regression has been mainline for a while, I am wondering what about my setup
>>>> that makes it pop up here, but not elsewhere?
>>>>
>>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>>
>>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>>> times, however, before giving up, so there should be enough CS# toggles.
>>>>
>>>>
>>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>>
>>>> I am investigating this some more, probably with a logic trace, but I wanted
>>>> to report this in case someone has pointers and in case other people run into
>>>> the same issue.
>>>>
>>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>>       first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>>       That's fixed, but the writes still fail.
>>>>
>>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>>
>>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>>        Note that "true data" means valid data here, not all bits one.
>>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-01-29 18:01       ` Tokunori Ikegami
@ 2022-02-07 14:28         ` Ahmad Fatoum
  2022-02-13 16:47           ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-02-07 14:28 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hello Tokunori-san,

On 29.01.22 19:01, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Thanks for your investigation.
> 
>> The issue is still there with #define FORCE_WORD_WRITE 1:
>>
>>    jffs2: Write clean marker to block at 0x000a0000 failed: -5
>>    MTD do_write_oneword_once(): software timeout
> Which kernel version has been tested about this?

I last tested with v5.10.30, but I had briefly tried v5.16-rc as well
when first debugging this issue.

I have rebased onto v5.17-rc2 now and will use that for further tests.
The same issue with word write forcing is reproducible there as well.

> Since the buffered writes disabled by 7e4404113686 for S29GL256N and tested on kernel 5.10.16.
> So I would like to confirm if the issue depended on the CPU or kernel version, etc.
> Note: The chips S29GL064N and S29GL256N seem different the flash Mb size basically.

I see. To be extra sure, I have replaced 0x2201 with 0x0c01 to hit
the same code paths, but no improvement.

>> Doesn't seem to be a buffered write issue here though as the writes
>> did work fine before dfeae1073583. Any other ideas?
> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.

What seems to work for me is checking if chip_good or chip_ready
and map_word is equal to 0xFF. I can't justify why this is ok though.
(Worst case bus is floating at this point of time and Hi-Z is read
as 0xff on CPU data lines...)

> By the way could you please let me know the chip information for more detail? (For example model number, cycle and device ID, etc.)

I can't read it off the chip, but vendor uses S29GL064N90FFI02 or S29GL964N11FFI02.
Kernel reports it with:
ff800000.flash: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x000001 Chip ID 0x000c01

I am not sure what you mean with cycle. If you tell me what
command to run, I can paste the output.

Thanks,
Ahmad



> 
> Regards,
> Ikegami
> 
> 
> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
> 
>>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>>> from a few templates paragraphs some of you might have seen already.]
>>>>
>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>
>>>> Top-posting for once, to make this easy accessible to everyone.
>>>>
>>>> Thanks for the report.
>>>>
>>>> Adding the regression mailing list to the list of recipients, as it
>>>> should be in the loop for all regressions, as explained here:
>>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>>
>>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>>
>>>> #regzbot ^introduced dfeae1073583
>>>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>>>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>>>> #regzbot ignore-activity
>>>>
>>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>>> to the report (the parent of this mail), then regzbot will automatically
>>>> mark the regression as resolved once the fix lands in the appropriate
>>>> tree. For more details about regzbot see footer.
>>>>
>>>> Sending this to everyone that got the initial report, to make all aware
>>>> of the tracking. I also hope that messages like this motivate people to
>>>> directly get at least the regression mailing list and ideally even
>>>> regzbot involved when dealing with regressions, as messages like this
>>>> wouldn't be needed then.
>>>>
>>>> Don't worry, I'll send further messages wrt to this regression just to
>>>> the lists (with a tag in the subject so people can filter them away), as
>>>> long as they are intended just for regzbot. With a bit of luck no such
>>>> messages will be needed anyway.
>>>>
>>>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>>>
>>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>>> on my table. I can only look briefly into most of them. Unfortunately
>>>> therefore I sometimes will get things wrong or miss something important.
>>>> I hope that's not the case here; if you think it is, don't hesitate to
>>>> tell me about it in a public reply. That's in everyone's interest, as
>>>> what I wrote above might be misleading to everyone reading this; any
>>>> suggestion I gave thus might sent someone reading this down the wrong
>>>> rabbit hole, which none of us wants.
>>>>
>>>> BTW, I have no personal interest in this issue, which is tracked using
>>>> regzbot, my Linux kernel regression tracking bot
>>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>>> this mail to get things rolling again and hence don't need to be CC on
>>>> all further activities wrt to this regression.
>>>>
>>>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>>>> Hi,
>>>>>
>>>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>>>> which is represented as a memory-mapped cfi-flash.
>>>>>
>>>>> The regression began in v4.17-rc1 with
>>>>>
>>>>>     dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>>>
>>>>> and causes all flash write accesses on the hardware to fail. Example output
>>>>> after v5.1-rc2[1]:
>>>>>
>>>>>     root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>>>     MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>>>     jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>>>
>>>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>>>> hardware happy.
>>>>>
>>>>> What Tokunori's patch did is that it strengthened the success condition
>>>>> for flash writes:
>>>>>
>>>>>    - Prior to the patch, DQ polling was done until bits
>>>>>      stopped toggling. This was taken as an indicator that the write succeeded
>>>>>      and was reported up the stack. i.e. success condition is chip_ready()
>>>>>
>>>>>    - After the patch, polling continues until the just written data is
>>>>>      actually read back, i.e. success condition is chip_good()
>>>>>
>>>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>>>> never the just written data. The data is still written and can be read back
>>>>> on subsequent reads, just not at that point of time in the poll loop.
>>>>>
>>>>> We haven't had write issues for the years predating that patch. As the
>>>>> regression has been mainline for a while, I am wondering what about my setup
>>>>> that makes it pop up here, but not elsewhere?
>>>>>
>>>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>>>
>>>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>>>> times, however, before giving up, so there should be enough CS# toggles.
>>>>>
>>>>>
>>>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>>>
>>>>> I am investigating this some more, probably with a logic trace, but I wanted
>>>>> to report this in case someone has pointers and in case other people run into
>>>>> the same issue.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>>>       first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>>>       That's fixed, but the writes still fail.
>>>>>
>>>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>>>
>>>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>>>        Note that "true data" means valid data here, not all bits one.
>>>>>
>>
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-02-07 14:28         ` Ahmad Fatoum
@ 2022-02-13 16:47           ` Tokunori Ikegami
  2022-02-14 16:22             ` Ahmad Fatoum
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-02-13 16:47 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 9399 bytes --]

Hi Ahmad-san,

Thanks for your confirmations. Sorry for late to reply.

Could you please try the patch attached to disable the chip_good() 
change as before?
I think this should work for S29GL964N since the chip_ready() is used 
and works as mentioned.

On 2022/02/07 23:28, Ahmad Fatoum wrote:
> Hello Tokunori-san,
>
> On 29.01.22 19:01, Tokunori Ikegami wrote:
>> Hi Ahmad-san,
>>
>> Thanks for your investigation.
>>
>>> The issue is still there with #define FORCE_WORD_WRITE 1:
>>>
>>>     jffs2: Write clean marker to block at 0x000a0000 failed: -5
>>>     MTD do_write_oneword_once(): software timeout
>> Which kernel version has been tested about this?
> I last tested with v5.10.30, but I had briefly tried v5.16-rc as well
> when first debugging this issue.
>
> I have rebased onto v5.17-rc2 now and will use that for further tests.
> The same issue with word write forcing is reproducible there as well.
Noted about these.
>
>> Since the buffered writes disabled by 7e4404113686 for S29GL256N and tested on kernel 5.10.16.
>> So I would like to confirm if the issue depended on the CPU or kernel version, etc.
>> Note: The chips S29GL064N and S29GL256N seem different the flash Mb size basically.
> I see. To be extra sure, I have replaced 0x2201 with 0x0c01 to hit
> the same code paths, but no improvement.
I see and check the data sheet as described.
>
>>> Doesn't seem to be a buffered write issue here though as the writes
>>> did work fine before dfeae1073583. Any other ideas?
>> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
>> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.
> What seems to work for me is checking if chip_good or chip_ready
> and map_word is equal to 0xFF. I can't justify why this is ok though.
> (Worst case bus is floating at this point of time and Hi-Z is read
> as 0xff on CPU data lines...)

Sorry I am not sure about this.
I thought the chip_ready() itself is correct as implemented as the data 
sheet in the past.
But it did not work correctly so changed to use chip_good() instead as 
it is also correct.

>
>> By the way could you please let me know the chip information for more detail? (For example model number, cycle and device ID, etc.)
> I can't read it off the chip, but vendor uses S29GL064N90FFI02 or S29GL964N11FFI02.
> Kernel reports it with:
> ff800000.flash: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x000001 Chip ID 0x000c01
The change attached checks the device ID 0x0c01 and use chip_ready() 
instead on chip_good().
>
> I am not sure what you mean with cycle. If you tell me what
> command to run, I can paste the output.
Sorry my understanding was not correct about the data sheet description 
device ID and cycle.

Regards,
Ikegami

>
> Thanks,
> Ahmad
>
>
>
>> Regards,
>> Ikegami
>>
>>
>> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
>>
>>>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>>>> from a few templates paragraphs some of you might have seen already.]
>>>>>
>>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>>
>>>>> Top-posting for once, to make this easy accessible to everyone.
>>>>>
>>>>> Thanks for the report.
>>>>>
>>>>> Adding the regression mailing list to the list of recipients, as it
>>>>> should be in the loop for all regressions, as explained here:
>>>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>>>
>>>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>>>
>>>>> #regzbot ^introduced dfeae1073583
>>>>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>>>>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>>>>> #regzbot ignore-activity
>>>>>
>>>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>>>> to the report (the parent of this mail), then regzbot will automatically
>>>>> mark the regression as resolved once the fix lands in the appropriate
>>>>> tree. For more details about regzbot see footer.
>>>>>
>>>>> Sending this to everyone that got the initial report, to make all aware
>>>>> of the tracking. I also hope that messages like this motivate people to
>>>>> directly get at least the regression mailing list and ideally even
>>>>> regzbot involved when dealing with regressions, as messages like this
>>>>> wouldn't be needed then.
>>>>>
>>>>> Don't worry, I'll send further messages wrt to this regression just to
>>>>> the lists (with a tag in the subject so people can filter them away), as
>>>>> long as they are intended just for regzbot. With a bit of luck no such
>>>>> messages will be needed anyway.
>>>>>
>>>>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>>>>
>>>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>>>> on my table. I can only look briefly into most of them. Unfortunately
>>>>> therefore I sometimes will get things wrong or miss something important.
>>>>> I hope that's not the case here; if you think it is, don't hesitate to
>>>>> tell me about it in a public reply. That's in everyone's interest, as
>>>>> what I wrote above might be misleading to everyone reading this; any
>>>>> suggestion I gave thus might sent someone reading this down the wrong
>>>>> rabbit hole, which none of us wants.
>>>>>
>>>>> BTW, I have no personal interest in this issue, which is tracked using
>>>>> regzbot, my Linux kernel regression tracking bot
>>>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>>>> this mail to get things rolling again and hence don't need to be CC on
>>>>> all further activities wrt to this regression.
>>>>>
>>>>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>>>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>>>>> which is represented as a memory-mapped cfi-flash.
>>>>>>
>>>>>> The regression began in v4.17-rc1 with
>>>>>>
>>>>>>      dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>>>>
>>>>>> and causes all flash write accesses on the hardware to fail. Example output
>>>>>> after v5.1-rc2[1]:
>>>>>>
>>>>>>      root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>>>>      MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>>>>      jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>>>>
>>>>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>>>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>>>>> hardware happy.
>>>>>>
>>>>>> What Tokunori's patch did is that it strengthened the success condition
>>>>>> for flash writes:
>>>>>>
>>>>>>     - Prior to the patch, DQ polling was done until bits
>>>>>>       stopped toggling. This was taken as an indicator that the write succeeded
>>>>>>       and was reported up the stack. i.e. success condition is chip_ready()
>>>>>>
>>>>>>     - After the patch, polling continues until the just written data is
>>>>>>       actually read back, i.e. success condition is chip_good()
>>>>>>
>>>>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>>>>> never the just written data. The data is still written and can be read back
>>>>>> on subsequent reads, just not at that point of time in the poll loop.
>>>>>>
>>>>>> We haven't had write issues for the years predating that patch. As the
>>>>>> regression has been mainline for a while, I am wondering what about my setup
>>>>>> that makes it pop up here, but not elsewhere?
>>>>>>
>>>>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>>>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>>>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>>>>
>>>>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>>>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>>>>> times, however, before giving up, so there should be enough CS# toggles.
>>>>>>
>>>>>>
>>>>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>>>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>>>>
>>>>>> I am investigating this some more, probably with a logic trace, but I wanted
>>>>>> to report this in case someone has pointers and in case other people run into
>>>>>> the same issue.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Ahmad
>>>>>>
>>>>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>>>>        first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>>>>        That's fixed, but the writes still fail.
>>>>>>
>>>>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>>>>
>>>>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>>>>         Note that "true data" means valid data here, not all bits one.
>>>>>>
>

[-- Attachment #2: 0001-mtd-cfi_cmdset_0002-Use-chip_ready-for-write-on-S29G.patch --]
[-- Type: text/plain, Size: 7494 bytes --]

From 59b1e946931202d7058eec12c2bcda7fc65acbba Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t@gmail.com>
Date: Mon, 14 Feb 2022 01:08:02 +0900
Subject: [PATCH] mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N

The regression issue has been caused on S29GL064N and reported it.
Also the change mentioned is to use chip_good() for buffered write.
So disable the change on S29GL064N and use chip_ready() as before.

Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Cc: linux-mtd@lists.infradead.org
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 105 ++++++++++++++++------------
 1 file changed, 59 insertions(+), 46 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..a0dfc8ace899 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -48,6 +48,7 @@
 #define SST49LF040B		0x0050
 #define SST49LF008A		0x005a
 #define AT49BV6416		0x00d6
+#define S29GL064N_MN12		0x0c01
 
 /*
  * Status Register bit description. Used by flash devices that don't
@@ -109,6 +110,8 @@ static struct mtd_chip_driver cfi_amdstd_chipdrv = {
 	.module		= THIS_MODULE
 };
 
+static bool use_chip_good_for_write;
+
 /*
  * Use status register to poll for Erase/write completion when DQ is not
  * supported. This is indicated by Bit[1:0] of SoftwareFeatures field in
@@ -283,6 +286,17 @@ static void fixup_use_write_buffers(struct mtd_info *mtd)
 }
 #endif /* !FORCE_WORD_WRITE */
 
+static void fixup_use_chip_good_for_write(struct mtd_info *mtd)
+{
+	struct map_info *map = mtd->priv;
+	struct cfi_private *cfi = map->fldrv_priv;
+
+	if (cfi->mfr == CFI_MFR_AMD && cfi->id == S29GL064N_MN12)
+		return;
+
+	use_chip_good_for_write = true;
+}
+
 /* Atmel chips don't use the same PRI format as AMD chips */
 static void fixup_convert_atmel_pri(struct mtd_info *mtd)
 {
@@ -462,7 +476,7 @@ static struct cfi_fixup cfi_fixup_table[] = {
 	{ CFI_MFR_AMD, 0x0056, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005C, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005F, fixup_use_secsi },
-	{ CFI_MFR_AMD, 0x0c01, fixup_s29gl064n_sectors },
+	{ CFI_MFR_AMD, S29GL064N_MN12, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1301, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1a00, fixup_s29gl032n_sectors },
 	{ CFI_MFR_AMD, 0x1a01, fixup_s29gl032n_sectors },
@@ -474,6 +488,7 @@ static struct cfi_fixup cfi_fixup_table[] = {
 #if !FORCE_WORD_WRITE
 	{ CFI_MFR_ANY, CFI_ID_ANY, fixup_use_write_buffers },
 #endif
+	{ CFI_MFR_ANY, CFI_ID_ANY, fixup_use_chip_good_for_write },
 	{ 0, 0, NULL }
 };
 static struct cfi_fixup jedec_fixup_table[] = {
@@ -801,42 +816,61 @@ static struct mtd_info *cfi_amdstd_setup(struct mtd_info *mtd)
 	return NULL;
 }
 
-/*
- * Return true if the chip is ready.
- *
- * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
- * non-suspended sector) and is indicated by no toggle bits toggling.
- *
- * Note that anything more complicated than checking if no bits are toggling
- * (including checking DQ5 for an error status) is tricky to get working
- * correctly and is therefore not done	(particularly with interleaved chips
- * as each chip must be checked independently of the others).
- */
-static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
-			       unsigned long addr)
+static int __xipram chip_check(struct map_info *map, struct flchip *chip,
+			       unsigned long addr, map_word *expected)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
-	map_word d, t;
+	map_word oldd, curd;
+	int ret;
 
 	if (cfi_use_status_reg(cfi)) {
 		map_word ready = CMD(CFI_SR_DRB);
+
 		/*
 		 * For chips that support status register, check device
 		 * ready bit
 		 */
 		cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
 				 cfi->device_type, NULL);
-		d = map_read(map, addr);
+		curd = map_read(map, addr);
 
-		return map_word_andequal(map, d, ready, ready);
+		return map_word_andequal(map, curd, ready, ready);
 	}
 
-	d = map_read(map, addr);
-	t = map_read(map, addr);
+	oldd = map_read(map, addr);
+	curd = map_read(map, addr);
+
+	ret = map_word_equal(map, oldd, curd);
+
+	if (!ret || !expected)
+		return ret;
+
+	return map_word_equal(map, curd, *expected);
+}
+
+static int __xipram chip_good_for_write(struct map_info *map,
+					struct flchip *chip, unsigned long addr,
+					map_word expected)
+{
+	if (use_chip_good_for_write)
+		return chip_check(map, chip, addr, &expected);
 
-	return map_word_equal(map, d, t);
+	return chip_check(map, chip, addr, NULL);
 }
 
+/*
+ * Return true if the chip is ready.
+ *
+ * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
+ * non-suspended sector) and is indicated by no toggle bits toggling.
+ *
+ * Note that anything more complicated than checking if no bits are toggling
+ * (including checking DQ5 for an error status) is tricky to get working
+ * correctly and is therefore not done	(particularly with interleaved chips
+ * as each chip must be checked independently of the others).
+ */
+#define chip_ready(map, chip, addr) chip_check(map, chip, addr, NULL)
+
 /*
  * Return true if the chip is ready and has the correct value.
  *
@@ -855,28 +889,7 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
 static int __xipram chip_good(struct map_info *map, struct flchip *chip,
 			      unsigned long addr, map_word expected)
 {
-	struct cfi_private *cfi = map->fldrv_priv;
-	map_word oldd, curd;
-
-	if (cfi_use_status_reg(cfi)) {
-		map_word ready = CMD(CFI_SR_DRB);
-
-		/*
-		 * For chips that support status register, check device
-		 * ready bit
-		 */
-		cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
-				 cfi->device_type, NULL);
-		curd = map_read(map, addr);
-
-		return map_word_andequal(map, curd, ready, ready);
-	}
-
-	oldd = map_read(map, addr);
-	curd = map_read(map, addr);
-
-	return	map_word_equal(map, oldd, curd) &&
-		map_word_equal(map, curd, expected);
+	return chip_check(map, chip, addr, &expected);
 }
 
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -1699,7 +1712,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good_for_write(map, chip, adr, datum)) {
 			xip_enable(map, chip, adr);
 			printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
 			xip_disable(map, chip, adr);
@@ -1707,7 +1720,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good_for_write(map, chip, adr, datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -1979,14 +1992,14 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good_for_write(map, chip, adr, datum)) {
 			pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
 			       __func__, adr);
 			ret = -EIO;
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good_for_write(map, chip, adr, datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-02-13 16:47           ` Tokunori Ikegami
@ 2022-02-14 16:22             ` Ahmad Fatoum
  2022-02-14 18:46               ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-02-14 16:22 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hello Tokunori-san,

On 13.02.22 17:47, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Thanks for your confirmations. Sorry for late to reply.

No worries. I appreciate you taking the time.

> Could you please try the patch attached to disable the chip_good() change as before?
> I think this should work for S29GL964N since the chip_ready() is used and works as mentioned.

yes, this resolves my issue:
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

>>>> Doesn't seem to be a buffered write issue here though as the writes
>>>> did work fine before dfeae1073583. Any other ideas?
>>> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
>>> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.
>> What seems to work for me is checking if chip_good or chip_ready
>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>> (Worst case bus is floating at this point of time and Hi-Z is read
>> as 0xff on CPU data lines...)
> 
> Sorry I am not sure about this.
> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
> But it did not work correctly so changed to use chip_good() instead as it is also correct.

What exactly in the datasheet makes you believe chip_good is not appropriate?

Cheers,
Ahmad


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-02-14 16:22             ` Ahmad Fatoum
@ 2022-02-14 18:46               ` Tokunori Ikegami
  2022-02-20 12:22                 ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-02-14 18:46 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hi Ahmad-san,

On 2022/02/15 1:22, Ahmad Fatoum wrote:
> Hello Tokunori-san,
>
> On 13.02.22 17:47, Tokunori Ikegami wrote:
>> Hi Ahmad-san,
>>
>> Thanks for your confirmations. Sorry for late to reply.
> No worries. I appreciate you taking the time.
>
>> Could you please try the patch attached to disable the chip_good() change as before?
>> I think this should work for S29GL964N since the chip_ready() is used and works as mentioned.
> yes, this resolves my issue:
> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Thanks for your testing. I have just sent the patch to review.
>
>>>>> Doesn't seem to be a buffered write issue here though as the writes
>>>>> did work fine before dfeae1073583. Any other ideas?
>>>> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
>>>> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.
>>> What seems to work for me is checking if chip_good or chip_ready
>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>> as 0xff on CPU data lines...)
>> Sorry I am not sure about this.
>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
> What exactly in the datasheet makes you believe chip_good is not appropriate?
I just mentioned about the actual issue behaviors as not worked 
chip_good() on S29GL964N and not worked chip_ready() on 
MX29GL512FHT2I-11G before etc.
Anyway let me recheck the data sheet details as just checked it again 
quickly but needed more investigation to understand.

Regards,
Ikegami

>
> Cheers,
> Ahmad
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-02-14 18:46               ` Tokunori Ikegami
@ 2022-02-20 12:22                 ` Tokunori Ikegami
  2022-03-04 11:11                   ` Ahmad Fatoum
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-02-20 12:22 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2780 bytes --]

Hi Ahmad-san,

Could you please try the version 2 patch attached for the error case?
This version is to check the DQ true data 0xFF by chip_good().
But I am not sure if this works or not since the error is possible to be 
caused by Hi-Z 0xff on floating bus or etc.

On 2022/02/15 3:46, Tokunori Ikegami wrote:
> Hi Ahmad-san,
>
> On 2022/02/15 1:22, Ahmad Fatoum wrote:
>> Hello Tokunori-san,
>>
>> On 13.02.22 17:47, Tokunori Ikegami wrote:
>>> Hi Ahmad-san,
>>>
>>> Thanks for your confirmations. Sorry for late to reply.
>> No worries. I appreciate you taking the time.
>>
>>> Could you please try the patch attached to disable the chip_good() 
>>> change as before?
>>> I think this should work for S29GL964N since the chip_ready() is 
>>> used and works as mentioned.
>> yes, this resolves my issue:
>> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Thanks for your testing. I have just sent the patch to review.
>>
>>>>>> Doesn't seem to be a buffered write issue here though as the writes
>>>>>> did work fine before dfeae1073583. Any other ideas?
>>>>> At first I thought the issue is possible to be resolved by using 
>>>>> the word write instead of the buffered writes.
>>>>> Now I am thinking to disable the changes dfeae1073583 partially 
>>>>> with any condition if possible.
>>>> What seems to work for me is checking if chip_good or chip_ready
>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>> as 0xff on CPU data lines...)
>>> Sorry I am not sure about this.
>>> I thought the chip_ready() itself is correct as implemented as the 
>>> data sheet in the past.
>>> But it did not work correctly so changed to use chip_good() instead 
>>> as it is also correct.
>> What exactly in the datasheet makes you believe chip_good is not 
>> appropriate?
> I just mentioned about the actual issue behaviors as not worked 
> chip_good() on S29GL964N and not worked chip_ready() on 
> MX29GL512FHT2I-11G before etc.
> Anyway let me recheck the data sheet details as just checked it again 
> quickly but needed more investigation to understand.

As far as I checked still both chip_good() and chip_ready() seem correct 
but still the root cause is unknown.
If as you mentioned the issue was cased by the DQ true data 0xFF I am 
not sure why the read work without any error after the write operation.
Also if the error was caused by the Hi-Z 0xff on floating bus as 
mentioned I am not sure why the read work without any error after the 
write operation with chip_ready().
Sorry anyway the root cause is also unknown when the write operation was 
changed to use chip_good() instead of chip_ready().

Regards,
Ikegami

>
> Regards,
> Ikegami
>
>>
>> Cheers,
>> Ahmad
>>
>>

[-- Attachment #2: v2-0001-mtd-cfi_cmdset_0002-Change-chip_good-to-check-DQ-.patch --]
[-- Type: text/x-patch, Size: 4289 bytes --]

From 2bef02bee8fa74273cfc764e288b6f92b8646bb7 Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t@gmail.com>
Date: Sat, 19 Feb 2022 19:39:32 +0900
Subject: [PATCH v2] mtd: cfi_cmdset_0002: Change chip_good() to check DQ true
 data 0xFF

The regression issue has been caused on S29GL064N and reported it.
The change mentioned for regression is to use chip_good() for buffered write.
Also it seems that the 0xFF value is read on the error case.
It is possible to be caused by DQ true data described by S29GL064N datasheet.
So change chip_good() to check DQ true data 0xFF additionally for the error.

Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Cc: linux-mtd@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-mtd/cedb1604-e024-2738-5b33-15703a653803@gmail.com/
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..079f69e5400d 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -853,7 +853,7 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
  *
  */
 static int __xipram chip_good(struct map_info *map, struct flchip *chip,
-			      unsigned long addr, map_word expected)
+			      unsigned long addr, map_word *expected)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
 	map_word oldd, curd;
@@ -875,8 +875,13 @@ static int __xipram chip_good(struct map_info *map, struct flchip *chip,
 	oldd = map_read(map, addr);
 	curd = map_read(map, addr);
 
-	return	map_word_equal(map, oldd, curd) &&
-		map_word_equal(map, curd, expected);
+	if (!map_word_equal(map, oldd, curd))
+		return 0;
+
+	if (expected && map_word_equal(map, curd, *expected))
+		return 1;
+
+	return map_word_equal(map, oldd, map_word_ff(map));
 }
 
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -1699,7 +1704,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good(map, chip, adr, &datum)) {
 			xip_enable(map, chip, adr);
 			printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
 			xip_disable(map, chip, adr);
@@ -1707,7 +1712,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good(map, chip, adr, &datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -1979,14 +1984,14 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good(map, chip, adr, &datum)) {
 			pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
 			       __func__, adr);
 			ret = -EIO;
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good(map, chip, adr, &datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -2282,7 +2287,7 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip,
 		udelay(1);
 	}
 
-	if (!chip_good(map, chip, adr, datum) ||
+	if (!chip_good(map, chip, adr, &datum) ||
 	    cfi_check_err_status(map, chip, adr)) {
 		/* reset on all failures. */
 		map_write(map, CMD(0xF0), chip->start);
@@ -2478,7 +2483,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
 			chip->erase_suspended = 0;
 		}
 
-		if (chip_good(map, chip, adr, map_word_ff(map))) {
+		if (chip_good(map, chip, adr, NULL)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -2577,7 +2582,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
 			chip->erase_suspended = 0;
 		}
 
-		if (chip_good(map, chip, adr, map_word_ff(map))) {
+		if (chip_good(map, chip, adr, NULL)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-02-20 12:22                 ` Tokunori Ikegami
@ 2022-03-04 11:11                   ` Ahmad Fatoum
  2022-03-06 15:49                     ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-03-04 11:11 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hello Tokunori-san,

On 20.02.22 13:22, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> Could you please try the version 2 patch attached for the error case?
> This version is to check the DQ true data 0xFF by chip_good().

I had a similar patch locally as well at first. I just tested yours
and I can't reproduce the issue.

> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.

That it works for me could be because of Hi-Z 0xff, which is why
decided against it.

>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>> as 0xff on CPU data lines...)
>>>> Sorry I am not sure about this.
>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
> 
> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().

I've be ok with v1 then. Restores working behavior for me and shouldn't break others.

Cheers and thanks again,
Ahmad

> 
> Regards,
> Ikegami
> 
>>
>> Regards,
>> Ikegami
>>
>>>
>>> Cheers,
>>> Ahmad
>>>
>>>


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-03-04 11:11                   ` Ahmad Fatoum
@ 2022-03-06 15:49                     ` Tokunori Ikegami
  2022-03-08  9:44                       ` Ahmad Fatoum
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-03-06 15:49 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2554 bytes --]

Hi,

On 2022/03/04 20:11, Ahmad Fatoum wrote:
> Hello Tokunori-san,
>
> On 20.02.22 13:22, Tokunori Ikegami wrote:
>> Hi Ahmad-san,
>>
>> Could you please try the version 2 patch attached for the error case?
>> This version is to check the DQ true data 0xFF by chip_good().
> I had a similar patch locally as well at first. I just tested yours
> and I can't reproduce the issue.
Thanks for your support.
Sorry if possible could you please retest the attached the patch again 
since this fixed the version 1 patch maintainer review comments?
>
>> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.
> That it works for me could be because of Hi-Z 0xff, which is why
> decided against it.
I see.
>
>>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>>> as 0xff on CPU data lines...)
>>>>> Sorry I am not sure about this.
>>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
>> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
>> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
>> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
>> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().
> I've be ok with v1 then. Restores working behavior for me and shouldn't break others.

Noted but still I am thinking the version 2 patch to check 0xff seems 
better than to use chip_ready() so let me consider this again later.

Regards,
Ikegami

>
> Cheers and thanks again,
> Ahmad
>
>> Regards,
>> Ikegami
>>
>>> Regards,
>>> Ikegami
>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>>
>

[-- Attachment #2: v2-0001-mtd-cfi_cmdset_0002-Use-chip_ready-for-write-on-S.patch --]
[-- Type: text/x-patch, Size: 6844 bytes --]

From 306f7266cb2b6d07bbc5882b3b977264483ad128 Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t@gmail.com>
Date: Mon, 14 Feb 2022 01:08:02 +0900
Subject: [PATCH v2] mtd: cfi_cmdset_0002: Use chip_ready() for write on
 S29GL064N

The regression issue has been caused on S29GL064N and reported it.
Also the change mentioned is to use chip_good() for buffered write.
So disable the change on S29GL064N and use chip_ready() as before.

Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: linux-mtd@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de/
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 89 +++++++++++++++--------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..5e14b60e8638 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -48,6 +48,7 @@
 #define SST49LF040B		0x0050
 #define SST49LF008A		0x005a
 #define AT49BV6416		0x00d6
+#define S29GL064N_MN12		0x0c01
 
 /*
  * Status Register bit description. Used by flash devices that don't
@@ -462,7 +463,7 @@ static struct cfi_fixup cfi_fixup_table[] = {
 	{ CFI_MFR_AMD, 0x0056, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005C, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005F, fixup_use_secsi },
-	{ CFI_MFR_AMD, 0x0c01, fixup_s29gl064n_sectors },
+	{ CFI_MFR_AMD, S29GL064N_MN12, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1301, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1a00, fixup_s29gl032n_sectors },
 	{ CFI_MFR_AMD, 0x1a01, fixup_s29gl032n_sectors },
@@ -801,22 +802,12 @@ static struct mtd_info *cfi_amdstd_setup(struct mtd_info *mtd)
 	return NULL;
 }
 
-/*
- * Return true if the chip is ready.
- *
- * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
- * non-suspended sector) and is indicated by no toggle bits toggling.
- *
- * Note that anything more complicated than checking if no bits are toggling
- * (including checking DQ5 for an error status) is tricky to get working
- * correctly and is therefore not done	(particularly with interleaved chips
- * as each chip must be checked independently of the others).
- */
-static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
-			       unsigned long addr)
+static int __xipram chip_check(struct map_info *map, struct flchip *chip,
+			       unsigned long addr, map_word *expected)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
-	map_word d, t;
+	map_word oldd, curd;
+	int ret;
 
 	if (cfi_use_status_reg(cfi)) {
 		map_word ready = CMD(CFI_SR_DRB);
@@ -826,17 +817,35 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
 		 */
 		cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
 				 cfi->device_type, NULL);
-		d = map_read(map, addr);
+		curd = map_read(map, addr);
 
-		return map_word_andequal(map, d, ready, ready);
+		return map_word_andequal(map, curd, ready, ready);
 	}
 
-	d = map_read(map, addr);
-	t = map_read(map, addr);
+	oldd = map_read(map, addr);
+	curd = map_read(map, addr);
+
+	ret = map_word_equal(map, oldd, curd);
 
-	return map_word_equal(map, d, t);
+	if (!ret || !expected)
+		return ret;
+
+	return map_word_equal(map, curd, *expected);
 }
 
+/*
+ * Return true if the chip is ready.
+ *
+ * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
+ * non-suspended sector) and is indicated by no toggle bits toggling.
+ *
+ * Note that anything more complicated than checking if no bits are toggling
+ * (including checking DQ5 for an error status) is tricky to get working
+ * correctly and is therefore not done	(particularly with interleaved chips
+ * as each chip must be checked independently of the others).
+ */
+#define chip_ready(map, chip, addr) chip_check(map, chip, addr, NULL)
+
 /*
  * Return true if the chip is ready and has the correct value.
  *
@@ -855,28 +864,24 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
 static int __xipram chip_good(struct map_info *map, struct flchip *chip,
 			      unsigned long addr, map_word expected)
 {
-	struct cfi_private *cfi = map->fldrv_priv;
-	map_word oldd, curd;
-
-	if (cfi_use_status_reg(cfi)) {
-		map_word ready = CMD(CFI_SR_DRB);
+	return chip_check(map, chip, addr, &expected);
+}
 
-		/*
-		 * For chips that support status register, check device
-		 * ready bit
-		 */
-		cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
-				 cfi->device_type, NULL);
-		curd = map_read(map, addr);
+static bool cfi_use_chip_ready_for_write(struct map_info *map)
+{
+	struct cfi_private *cfi = map->fldrv_priv;
 
-		return map_word_andequal(map, curd, ready, ready);
-	}
+	return cfi->mfr == CFI_MFR_AMD && cfi->id == S29GL064N_MN12;
+}
 
-	oldd = map_read(map, addr);
-	curd = map_read(map, addr);
+static int __xipram chip_good_for_write(struct map_info *map,
+					struct flchip *chip, unsigned long addr,
+					map_word expected)
+{
+	if (cfi_use_chip_ready_for_write(map))
+		return chip_ready(map, chip, addr);
 
-	return	map_word_equal(map, oldd, curd) &&
-		map_word_equal(map, curd, expected);
+	return chip_good(map, chip, addr, expected);
 }
 
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -1699,7 +1704,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good_for_write(map, chip, adr, datum)) {
 			xip_enable(map, chip, adr);
 			printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
 			xip_disable(map, chip, adr);
@@ -1707,7 +1712,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good_for_write(map, chip, adr, datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -1979,14 +1984,14 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good_for_write(map, chip, adr, datum)) {
 			pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
 			       __func__, adr);
 			ret = -EIO;
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good_for_write(map, chip, adr, datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-03-06 15:49                     ` Tokunori Ikegami
@ 2022-03-08  9:44                       ` Ahmad Fatoum
  2022-03-08 16:13                         ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-03-08  9:44 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hello Tokunori,

On 06.03.22 16:49, Tokunori Ikegami wrote:
> Hi,
> 
> On 2022/03/04 20:11, Ahmad Fatoum wrote:
>> Hello Tokunori-san,
>>
>> On 20.02.22 13:22, Tokunori Ikegami wrote:
>>> Hi Ahmad-san,
>>>
>>> Could you please try the version 2 patch attached for the error case?
>>> This version is to check the DQ true data 0xFF by chip_good().
>> I had a similar patch locally as well at first. I just tested yours
>> and I can't reproduce the issue.
> Thanks for your support.
> Sorry if possible could you please retest the attached the patch again since this fixed the version 1 patch maintainer review comments?

Works good.

Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

>>> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.
>> That it works for me could be because of Hi-Z 0xff, which is why
>> decided against it.
> I see.
>>
>>>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>>>> as 0xff on CPU data lines...)
>>>>>> Sorry I am not sure about this.
>>>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>>>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>>>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
>>> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
>>> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
>>> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
>>> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().
>> I've be ok with v1 then. Restores working behavior for me and shouldn't break others.
> 
> Noted but still I am thinking the version 2 patch to check 0xff seems better than to use chip_ready() so let me consider this again later.

The original version has less room for surprise as it restores previously
working behavior. Assuming 0xFF to be good without backing from documentation
is more risky IMO.

Thanks for your continued support,
Ahmad

> 
> Regards,
> Ikegami
> 
>>
>> Cheers and thanks again,
>> Ahmad
>>
>>> Regards,
>>> Ikegami
>>>
>>>> Regards,
>>>> Ikegami
>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>>
>>


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-03-08  9:44                       ` Ahmad Fatoum
@ 2022-03-08 16:13                         ` Tokunori Ikegami
  2022-03-08 16:23                           ` Ahmad Fatoum
  0 siblings, 1 reply; 16+ messages in thread
From: Tokunori Ikegami @ 2022-03-08 16:13 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 3339 bytes --]

Hi Ahmad-san,

On 2022/03/08 18:44, Ahmad Fatoum wrote:
> Hello Tokunori,
>
> On 06.03.22 16:49, Tokunori Ikegami wrote:
>> Hi,
>>
>> On 2022/03/04 20:11, Ahmad Fatoum wrote:
>>> Hello Tokunori-san,
>>>
>>> On 20.02.22 13:22, Tokunori Ikegami wrote:
>>>> Hi Ahmad-san,
>>>>
>>>> Could you please try the version 2 patch attached for the error case?
>>>> This version is to check the DQ true data 0xFF by chip_good().
>>> I had a similar patch locally as well at first. I just tested yours
>>> and I can't reproduce the issue.
>> Thanks for your support.
>> Sorry if possible could you please retest the attached the patch again since this fixed the version 1 patch maintainer review comments?
> Works good.
>
> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Thank you so much for your test.
>
>>>> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.
>>> That it works for me could be because of Hi-Z 0xff, which is why
>>> decided against it.
>> I see.
>>>>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>>>>> as 0xff on CPU data lines...)
>>>>>>> Sorry I am not sure about this.
>>>>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>>>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>>>>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>>>>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
>>>> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
>>>> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
>>>> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
>>>> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().
>>> I've be ok with v1 then. Restores working behavior for me and shouldn't break others.
>> Noted but still I am thinking the version 2 patch to check 0xff seems better than to use chip_ready() so let me consider this again later.
> The original version has less room for surprise as it restores previously
> working behavior. Assuming 0xFF to be good without backing from documentation
> is more risky IMO.
The change to check 0xFF can be limited for the S29GL064N chip do you 
have any comment about this?
Just attached the patch changed as so and thinking to send the patch as 
version 3 to the maintainer if you are okay.

Regards,
Ikegami

>
> Thanks for your continued support,
> Ahmad
>
>> Regards,
>> Ikegami
>>
>>> Cheers and thanks again,
>>> Ahmad
>>>
>>>> Regards,
>>>> Ikegami
>>>>
>>>>> Regards,
>>>>> Ikegami
>>>>>
>>>>>> Cheers,
>>>>>> Ahmad
>>>>>>
>>>>>>
>

[-- Attachment #2: v3-0001-mtd-cfi_cmdset_0002-Change-chip_good-to-check-DQ-.patch --]
[-- Type: text/plain, Size: 5504 bytes --]

From f4e767b4c9b2d5139387175f0c57afd81f0b62de Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t@gmail.com>
Date: Sat, 19 Feb 2022 19:39:32 +0900
Subject: [PATCH v3] mtd: cfi_cmdset_0002: Change chip_good() to check DQ true
 data 0xFF on S29GL064N

The regression issue has been caused on S29GL064N and reported it.
The change mentioned for regression is to use chip_good() for buffered write.
Also it seems that the 0xFF value is read on the error case.
It is possible to be caused by DQ true data described by S29GL064N datasheet.
So change chip_good() to check DQ true data 0xFF additionally for the error.

Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: linux-mtd@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de/
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 36 ++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..99c1c6741b69 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -48,6 +48,7 @@
 #define SST49LF040B		0x0050
 #define SST49LF008A		0x005a
 #define AT49BV6416		0x00d6
+#define S29GL064N_MN12		0x0c01
 
 /*
  * Status Register bit description. Used by flash devices that don't
@@ -462,7 +463,7 @@ static struct cfi_fixup cfi_fixup_table[] = {
 	{ CFI_MFR_AMD, 0x0056, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005C, fixup_use_secsi },
 	{ CFI_MFR_AMD, 0x005F, fixup_use_secsi },
-	{ CFI_MFR_AMD, 0x0c01, fixup_s29gl064n_sectors },
+	{ CFI_MFR_AMD, S29GL064N_MN12, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1301, fixup_s29gl064n_sectors },
 	{ CFI_MFR_AMD, 0x1a00, fixup_s29gl032n_sectors },
 	{ CFI_MFR_AMD, 0x1a01, fixup_s29gl032n_sectors },
@@ -837,6 +838,11 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
 	return map_word_equal(map, d, t);
 }
 
+static bool __xipram cfi_use_dq_true_data(struct cfi_private *cfi)
+{
+	return cfi->mfr == CFI_MFR_AMD && cfi->id == S29GL064N_MN12;
+}
+
 /*
  * Return true if the chip is ready and has the correct value.
  *
@@ -853,7 +859,7 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
  *
  */
 static int __xipram chip_good(struct map_info *map, struct flchip *chip,
-			      unsigned long addr, map_word expected)
+			      unsigned long addr, map_word *expected)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
 	map_word oldd, curd;
@@ -875,8 +881,16 @@ static int __xipram chip_good(struct map_info *map, struct flchip *chip,
 	oldd = map_read(map, addr);
 	curd = map_read(map, addr);
 
-	return	map_word_equal(map, oldd, curd) &&
-		map_word_equal(map, curd, expected);
+	if (!map_word_equal(map, oldd, curd))
+		return 0;
+
+	if (expected && map_word_equal(map, curd, *expected))
+		return 1;
+
+	if (cfi_use_dq_true_data(cfi))
+		return map_word_equal(map, oldd, map_word_ff(map));
+
+	return 0;
 }
 
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -1699,7 +1713,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good(map, chip, adr, &datum)) {
 			xip_enable(map, chip, adr);
 			printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
 			xip_disable(map, chip, adr);
@@ -1707,7 +1721,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good(map, chip, adr, &datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -1979,14 +1993,14 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
 		 * "chip_good" to avoid the failure due to scheduling.
 		 */
 		if (time_after(jiffies, timeo) &&
-		    !chip_good(map, chip, adr, datum)) {
+		    !chip_good(map, chip, adr, &datum)) {
 			pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
 			       __func__, adr);
 			ret = -EIO;
 			break;
 		}
 
-		if (chip_good(map, chip, adr, datum)) {
+		if (chip_good(map, chip, adr, &datum)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -2282,7 +2296,7 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip,
 		udelay(1);
 	}
 
-	if (!chip_good(map, chip, adr, datum) ||
+	if (!chip_good(map, chip, adr, &datum) ||
 	    cfi_check_err_status(map, chip, adr)) {
 		/* reset on all failures. */
 		map_write(map, CMD(0xF0), chip->start);
@@ -2478,7 +2492,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
 			chip->erase_suspended = 0;
 		}
 
-		if (chip_good(map, chip, adr, map_word_ff(map))) {
+		if (chip_good(map, chip, adr, NULL)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
@@ -2577,7 +2591,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
 			chip->erase_suspended = 0;
 		}
 
-		if (chip_good(map, chip, adr, map_word_ff(map))) {
+		if (chip_good(map, chip, adr, NULL)) {
 			if (cfi_check_err_status(map, chip, adr))
 				ret = -EIO;
 			break;
-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-03-08 16:13                         ` Tokunori Ikegami
@ 2022-03-08 16:23                           ` Ahmad Fatoum
  2022-03-08 16:40                             ` Tokunori Ikegami
  0 siblings, 1 reply; 16+ messages in thread
From: Ahmad Fatoum @ 2022-03-08 16:23 UTC (permalink / raw)
  To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hello Tokunori-san,

On 08.03.22 17:13, Tokunori Ikegami wrote:
> Hi Ahmad-san,
> 
> On 2022/03/08 18:44, Ahmad Fatoum wrote:
>> Hello Tokunori,
>>
>> On 06.03.22 16:49, Tokunori Ikegami wrote:
>>> Hi,
>>>
>>> On 2022/03/04 20:11, Ahmad Fatoum wrote:
>>>> Hello Tokunori-san,
>>>>
>>>> On 20.02.22 13:22, Tokunori Ikegami wrote:
>>>>> Hi Ahmad-san,
>>>>>
>>>>> Could you please try the version 2 patch attached for the error case?
>>>>> This version is to check the DQ true data 0xFF by chip_good().
>>>> I had a similar patch locally as well at first. I just tested yours
>>>> and I can't reproduce the issue.
>>> Thanks for your support.
>>> Sorry if possible could you please retest the attached the patch again since this fixed the version 1 patch maintainer review comments?
>> Works good.
>>
>> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Thank you so much for your test.
>>
>>>>> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.
>>>> That it works for me could be because of Hi-Z 0xff, which is why
>>>> decided against it.
>>> I see.
>>>>>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>>>>>> as 0xff on CPU data lines...)
>>>>>>>> Sorry I am not sure about this.
>>>>>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>>>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>>>>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>>>>>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>>>>>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
>>>>> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
>>>>> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
>>>>> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
>>>>> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().
>>>> I've be ok with v1 then. Restores working behavior for me and shouldn't break others.
>>> Noted but still I am thinking the version 2 patch to check 0xff seems better than to use chip_ready() so let me consider this again later.
>> The original version has less room for surprise as it restores previously
>> working behavior. Assuming 0xFF to be good without backing from documentation
>> is more risky IMO.
> The change to check 0xFF can be limited for the S29GL064N chip do you have any comment about this?

I see that, but I am not sure it's the correct thing to do on the S29GL064N,
even if it seems to work. In absence of definitive information from the vendor,
I'd prefer we just restore behavior as it was before, i.e. using chip_ready
instead of chip_good for S29GL064N. This is the way of least surprise.

> Just attached the patch changed as so and thinking to send the patch as version 3 to the maintainer if you are okay.
> 
> Regards,
> Ikegami
> 
>>
>> Thanks for your continued support,
>> Ahmad
>>
>>> Regards,
>>> Ikegami
>>>
>>>> Cheers and thanks again,
>>>> Ahmad
>>>>
>>>>> Regards,
>>>>> Ikegami
>>>>>
>>>>>> Regards,
>>>>>> Ikegami
>>>>>>
>>>>>>> Cheers,
>>>>>>> Ahmad
>>>>>>>
>>>>>>>
>>


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
  2022-03-08 16:23                           ` Ahmad Fatoum
@ 2022-03-08 16:40                             ` Tokunori Ikegami
  0 siblings, 0 replies; 16+ messages in thread
From: Tokunori Ikegami @ 2022-03-08 16:40 UTC (permalink / raw)
  To: Ahmad Fatoum, Thorsten Leemhuis, linux-mtd, Joakim.Tjernlund,
	miquel.raynal, vigneshr, richard, regressions
  Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut,
	cyrille.pitchen, linux-kernel, Pengutronix Kernel Team,
	linuxppc-dev

Hi,

On 2022/03/09 1:23, Ahmad Fatoum wrote:
> Hello Tokunori-san,
>
> On 08.03.22 17:13, Tokunori Ikegami wrote:
>> Hi Ahmad-san,
>>
>> On 2022/03/08 18:44, Ahmad Fatoum wrote:
>>> Hello Tokunori,
>>>
>>> On 06.03.22 16:49, Tokunori Ikegami wrote:
>>>> Hi,
>>>>
>>>> On 2022/03/04 20:11, Ahmad Fatoum wrote:
>>>>> Hello Tokunori-san,
>>>>>
>>>>> On 20.02.22 13:22, Tokunori Ikegami wrote:
>>>>>> Hi Ahmad-san,
>>>>>>
>>>>>> Could you please try the version 2 patch attached for the error case?
>>>>>> This version is to check the DQ true data 0xFF by chip_good().
>>>>> I had a similar patch locally as well at first. I just tested yours
>>>>> and I can't reproduce the issue.
>>>> Thanks for your support.
>>>> Sorry if possible could you please retest the attached the patch again since this fixed the version 1 patch maintainer review comments?
>>> Works good.
>>>
>>> Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> Thank you so much for your test.
>>>>>> But I am not sure if this works or not since the error is possible to be caused by Hi-Z 0xff on floating bus or etc.
>>>>> That it works for me could be because of Hi-Z 0xff, which is why
>>>>> decided against it.
>>>> I see.
>>>>>>>>>> What seems to work for me is checking if chip_good or chip_ready
>>>>>>>>>> and map_word is equal to 0xFF. I can't justify why this is ok though.
>>>>>>>>>> (Worst case bus is floating at this point of time and Hi-Z is read
>>>>>>>>>> as 0xff on CPU data lines...)
>>>>>>>>> Sorry I am not sure about this.
>>>>>>>>> I thought the chip_ready() itself is correct as implemented as the data sheet in the past.
>>>>>>>>> But it did not work correctly so changed to use chip_good() instead as it is also correct.
>>>>>>>> What exactly in the datasheet makes you believe chip_good is not appropriate?
>>>>>>> I just mentioned about the actual issue behaviors as not worked chip_good() on S29GL964N and not worked chip_ready() on MX29GL512FHT2I-11G before etc.
>>>>>>> Anyway let me recheck the data sheet details as just checked it again quickly but needed more investigation to understand.
>>>>>> As far as I checked still both chip_good() and chip_ready() seem correct but still the root cause is unknown.
>>>>>> If as you mentioned the issue was cased by the DQ true data 0xFF I am not sure why the read work without any error after the write operation.
>>>>>> Also if the error was caused by the Hi-Z 0xff on floating bus as mentioned I am not sure why the read work without any error after the write operation with chip_ready().
>>>>>> Sorry anyway the root cause is also unknown when the write operation was changed to use chip_good() instead of chip_ready().
>>>>> I've be ok with v1 then. Restores working behavior for me and shouldn't break others.
>>>> Noted but still I am thinking the version 2 patch to check 0xff seems better than to use chip_ready() so let me consider this again later.
>>> The original version has less room for surprise as it restores previously
>>> working behavior. Assuming 0xFF to be good without backing from documentation
>>> is more risky IMO.
>> The change to check 0xFF can be limited for the S29GL064N chip do you have any comment about this?
> I see that, but I am not sure it's the correct thing to do on the S29GL064N,
> even if it seems to work. In absence of definitive information from the vendor,
> I'd prefer we just restore behavior as it was before, i.e. using chip_ready
> instead of chip_good for S29GL064N. This is the way of least surprise.

Thanks for your comment. I see okay I will keep the version patch 2 
reverting to use chip_ready() for S29GL064N under the review without the 
change to check 0xFF.

Regards,
Ikegami

>
>> Just attached the patch changed as so and thinking to send the patch as version 3 to the maintainer if you are okay.
>>
>> Regards,
>> Ikegami
>>
>>> Thanks for your continued support,
>>> Ahmad
>>>
>>>> Regards,
>>>> Ikegami
>>>>
>>>>> Cheers and thanks again,
>>>>> Ahmad
>>>>>
>>>>>> Regards,
>>>>>> Ikegami
>>>>>>
>>>>>>> Regards,
>>>>>>> Ikegami
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Ahmad
>>>>>>>>
>>>>>>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-03-08 16:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de>
2021-12-14  7:23 ` [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1 Thorsten Leemhuis
2021-12-15 17:34   ` Tokunori Ikegami
2022-01-20 13:00     ` Thorsten Leemhuis
2022-01-28 12:55     ` Ahmad Fatoum
2022-01-29 18:01       ` Tokunori Ikegami
2022-02-07 14:28         ` Ahmad Fatoum
2022-02-13 16:47           ` Tokunori Ikegami
2022-02-14 16:22             ` Ahmad Fatoum
2022-02-14 18:46               ` Tokunori Ikegami
2022-02-20 12:22                 ` Tokunori Ikegami
2022-03-04 11:11                   ` Ahmad Fatoum
2022-03-06 15:49                     ` Tokunori Ikegami
2022-03-08  9:44                       ` Ahmad Fatoum
2022-03-08 16:13                         ` Tokunori Ikegami
2022-03-08 16:23                           ` Ahmad Fatoum
2022-03-08 16:40                             ` Tokunori Ikegami

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).