* Re: cfi_cmdset_0002: do_write_buffer timeouts
       [not found] <CAN8TOE8dVYxBbb8MtozFio8dS-ypq14U8RuKTo38QcAtXM5Qrw@mail.gmail.com>
@ 2013-04-11  9:00 ` Brian Norris
  2013-04-11  9:21   ` Huang Shijie
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Brian Norris @ 2013-04-11  9:00 UTC (permalink / raw)
  To: linux-mtd; +Cc: Huang Shijie, Kevin Cernekee, David Woodhouse, Artem Bityutskiy

[Sorry for the repeat email for some; Gmail switched me back to
HTML-mode, so my previous email couldn't be delivered to the MTD list]

Hi all,

I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:

MTD do_write_buffer(): software timeout

I'm using a 64Mbyte Spansion S29GL512 NOR flash:

physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
Manufacturer ID 0x000001 Chip ID 0x002301

I can reproduce the timeout approximately 0.5% of the time on a simple
reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
timeout comes out to just 1 jiffy. I have to increase this timeout to
at least 3 ticks to avoid the timeouts. (I've been running reboot
tests successfully for several days with the timeout as 3 jiffies.)

So my question is: what is the "best" way to decide these timeouts?
I'm inclined to just increase the timeout (and to use the proper
msecs_to_jiffies() macro, as a cleanup). But according to the
datasheets (which agree with the comments in the code), the max time
should be less than a millisecond. So simply increasing the timeout
may in fact just be masking some other bug.

Huang,

I noticed you recently sent a patch that adjusts the timeout print
message in do_write_buffer(). Have you had problems with this code
recently?

Any thoughts from any interested (or uninterested) party would be useful.

Thanks,
Brian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-11  9:00 ` cfi_cmdset_0002: do_write_buffer timeouts Brian Norris
@ 2013-04-11  9:21   ` Huang Shijie
  2013-04-11 19:37     ` Brian Norris
  2013-04-12  6:23   ` Norbert van Bolhuis
  2013-04-12  6:34   ` Stefan Roese
  2 siblings, 1 reply; 9+ messages in thread
From: Huang Shijie @ 2013-04-11  9:21 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, Kevin Cernekee, linux-mtd, Artem Bityutskiy

On 2013-04-11 17:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>
> Hi all,
>
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>
> MTD do_write_buffer(): software timeout
>
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
>
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
>
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
>
> Huang,
>
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
>
Yes. I am fighting with the timeout now. :(

My chip is M29W256GL7AN6E.
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
Manufacturer ID 0x000020 Chip ID 0x00227e

When I run bonnie++ on UBIFS on the NOR, I occasionally get a timeout.
Sometimes it passes the bonnie++/UBIFS test, and sometimes it does not.
The timeout occurs at certain fixed addresses, such as 0x4e0000 and
0x520000. I tried extending the buffer-write timeout in do_write_buffer()
from 1ms to 10ms, but the bug still occurs.

(I also tested other NOR chips, the Spansion S29GL256P10 and the Micron
JS28F256M29EWL, and did not hit the timeout with either of them.)

thanks
Huang Shijie

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-11  9:21   ` Huang Shijie
@ 2013-04-11 19:37     ` Brian Norris
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Norris @ 2013-04-11 19:37 UTC (permalink / raw)
  To: Huang Shijie; +Cc: David Woodhouse, Kevin Cernekee, linux-mtd, Artem Bityutskiy

On Thu, Apr 11, 2013 at 2:21 AM, Huang Shijie <b32955@freescale.com> wrote:
> On 2013-04-11 17:00, Brian Norris wrote:
>> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>>
>> MTD do_write_buffer(): software timeout
>>
>> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>>
>> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
>> Manufacturer ID 0x000001 Chip ID 0x002301
>>
>> I can reproduce the timeout approximately 0.5% of the time on a simple
>> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
>> timeout comes out to just 1 jiffy. I have to increase this timeout to
>> at least 3 ticks to avoid the timeouts. (I've been running reboot
>> tests successfully for several days with the timeout as 3 jiffies.)
>>
>> So my question is: what is the "best" way to decide these timeouts?
>> I'm inclined to just increase the timeout (and to use the proper
>> msecs_to_jiffies() macro, as a cleanup). But according to the
>> datasheets (which agree with the comments in the code), the max time
>> should be less than a millisecond. So simply increasing the timeout
>> may in fact just be masking some other bug.
>>
>> Huang,
>>
>> I noticed you recently sent a patch that adjusts the timeout print
>> message in do_write_buffer(). Have you had problems with this code
>> recently?
>>
> Yes. I am fighting with the timeout now. :(
>
> My chip is M29W256GL7AN6E.
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000020 Chip ID 0x00227e
>
> When I run bonnie++ on UBIFS on the NOR, I occasionally get a timeout.
> Sometimes it passes the bonnie++/UBIFS test, and sometimes it does not.
> The timeout occurs at certain fixed addresses, such as 0x4e0000 and
> 0x520000. I tried extending the buffer-write timeout in do_write_buffer()
> from 1ms to 10ms, but the bug still occurs.

Well, our timeouts are a little different then. A larger timeout
solves all my problems. And my timeouts aren't at consistent
addresses. Here's a sampling of mine over the last few hours.

MTD do_write_buffer(): software timeout @ address 0x240b87e
MTD do_write_buffer(): software timeout @ address 0x248b6be
MTD do_write_buffer(): software timeout @ address 0x132067e
MTD do_write_buffer(): software timeout @ address 0x31712fe
MTD do_write_buffer(): software timeout @ address 0x3c0e0fe
MTD do_write_buffer(): software timeout @ address 0xd2037e
MTD do_write_buffer(): software timeout @ address 0x318043e
MTD do_write_buffer(): software timeout @ address 0x2a201fe
MTD do_write_buffer(): software timeout @ address 0x2a4f47e
MTD do_write_buffer(): software timeout @ address 0x2a3ef7e

> (I also tested other NOR chips, the Spansion S29GL256P10 and the Micron
> JS28F256M29EWL, and did not hit the timeout with either of them.)

Brian

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-11  9:00 ` cfi_cmdset_0002: do_write_buffer timeouts Brian Norris
  2013-04-11  9:21   ` Huang Shijie
@ 2013-04-12  6:23   ` Norbert van Bolhuis
  2013-04-13  2:59     ` Brian Norris
  2013-04-12  6:34   ` Stefan Roese
  2 siblings, 1 reply; 9+ messages in thread
From: Norbert van Bolhuis @ 2013-04-12  6:23 UTC (permalink / raw)
  To: Brian Norris
  Cc: Huang Shijie, Kevin Cernekee, linux-mtd, David Woodhouse,
	Artem Bityutskiy

On 04/11/13 11:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>
> Hi all,
>
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>
> MTD do_write_buffer(): software timeout
>
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
>
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
>
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
>
> Huang,
>
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
>
> Any thoughts from any interested (or uninterested) party would be useful.
>
> Thanks,
> Brian
>


This may be your problem:

http://lkml.org/lkml/2009/9/3/84

Try disabling CONFIG_NO_HZ and you will know for sure.

---
Norbert

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-11  9:00 ` cfi_cmdset_0002: do_write_buffer timeouts Brian Norris
  2013-04-11  9:21   ` Huang Shijie
  2013-04-12  6:23   ` Norbert van Bolhuis
@ 2013-04-12  6:34   ` Stefan Roese
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Roese @ 2013-04-12  6:34 UTC (permalink / raw)
  To: Brian Norris
  Cc: Huang Shijie, Kevin Cernekee, linux-mtd, David Woodhouse,
	Artem Bityutskiy

On 11.04.2013 11:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
> 
> Hi all,
> 
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
> 
> MTD do_write_buffer(): software timeout
> 
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
> 
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
> 
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
> 
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
> 
> Huang,
> 
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
> 
> Any thoughts from any interested (or uninterested) party would be useful.

Without looking into the cmdset_0002 code, I remember fixing a
similar issue for cmdset_0001 a few months ago:

git id: 7be1f6b9a1ae3476a424380b52aad7c14c3273ab
Author: Stefan Roese <sr@denx.de>  2012-08-28 11:34:13
Committer: David Woodhouse <David.Woodhouse@intel.com>  2012-09-29 16:29:08
Follows: v3.6-rc2
Precedes: v3.7-rc1

    mtd: cfi_cmdset_0001: Fix problem with unlocking timeout
    
    Unlocking may take up to 1.4 seconds on some Intel flashes. So
    lets use a max. of 1.5 seconds (1500ms) as timeout.
    
    See "Clear Block Lock-Bits Time" on page 40 in
    "3 Volt Intel StrataFlash Memory" 28F128J3,28F640J3,28F320J3 manual
    from February 2003
    
    This patch also fixes some other problems with this timeout:
    
    - Don't use HZ in timeout "calculation"!
      While testing we noticed that an unlocking timeout occured with
      HZ=1000 and didn't occur with HZ=300. This was because the
      timeout parameter was calculated differently depending on the
      HZ value. Now a fixed value of 1500ms is used.
    
    - The last parameter of WAIT_TIMEOUT (defined to
      inval_cache_and_wait_for_operation) has to be passed in
      micro-seconds. So multiply the ms value with 1000 and not 100
      to calculate this value.
    
    - Use variable name "mdelay" instead of misleading "udelay".
    

One main issue there was that the resulting timeout depended on HZ,
so the behavior differed depending on the HZ configuration.

The current issue might be related, though I'm not sure.

Thanks,
Stefan

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-12  6:23   ` Norbert van Bolhuis
@ 2013-04-13  2:59     ` Brian Norris
  2013-04-15  7:55       ` Huang Shijie
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Norris @ 2013-04-13  2:59 UTC (permalink / raw)
  To: Norbert van Bolhuis
  Cc: Huang Shijie, Kevin Cernekee, linux-mtd, David Woodhouse,
	Artem Bityutskiy

On Thu, Apr 11, 2013 at 11:23 PM, Norbert van Bolhuis
<nvbolhuis@aimvalley.nl> wrote:
> On 04/11/13 11:00, Brian Norris wrote:
>>
>> [Sorry for the repeat email for some; Gmail switched me back to
>> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>>
>> Hi all,
>>
>> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>>
>> MTD do_write_buffer(): software timeout
>>
>> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>>
>> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
>> Manufacturer ID 0x000001 Chip ID 0x002301
>>
>> I can reproduce the timeout approximately 0.5% of the time on a simple
>> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
>> timeout comes out to just 1 jiffy. I have to increase this timeout to
>> at least 3 ticks to avoid the timeouts. (I've been running reboot
>> tests successfully for several days with the timeout as 3 jiffies.)
>>
>> So my question is: what is the "best" way to decide these timeouts?
>> I'm inclined to just increase the timeout (and to use the proper
>> msecs_to_jiffies() macro, as a cleanup). But according to the
>> datasheets (which agree with the comments in the code), the max time
>> should be less than a millisecond. So simply increasing the timeout
>> may in fact just be masking some other bug.
>>
>> Huang,
>>
>> I noticed you recently sent a patch that adjusts the timeout print
>> message in do_write_buffer(). Have you had problems with this code
>> recently?
>>
>> Any thoughts from any interested (or uninterested) party would be useful.
>>
>> Thanks,
>> Brian
>>
>
>
> This may be your problem:
>
> http://lkml.org/lkml/2009/9/3/84
>
> Try disabling CONFIG_NO_HZ and you will know for sure.

Disabling CONFIG_NO_HZ doesn't fix my problem.

Brian

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-13  2:59     ` Brian Norris
@ 2013-04-15  7:55       ` Huang Shijie
  2013-04-17 21:45         ` Brian Norris
  0 siblings, 1 reply; 9+ messages in thread
From: Huang Shijie @ 2013-04-15  7:55 UTC (permalink / raw)
  To: Brian Norris
  Cc: David Woodhouse, Kevin Cernekee, linux-mtd, Norbert van Bolhuis,
	Artem Bityutskiy

On 2013-04-13 10:59, Brian Norris wrote:
> Disabling CONFIG_NO_HZ doesn't fix my problem.
I also disabled CONFIG_NO_HZ, and it did not fix my problem either.

But after I removed the mutex_unlock/mutex_lock pair in
UDELAY/INVALIDATE_CACHE_UDELAY, my problem disappeared. I have run for
three days and no timeout has occurred. (I do not enable
CONFIG_MTD_XIP.)


--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1043,17 +1043,13 @@ static void __xipram xip_udelay(struct map_info *map, struct flchip *chip,

#define UDELAY(map, chip, adr, usec) \
do { \
- mutex_unlock(&chip->mutex); \
cfi_udelay(usec); \
- mutex_lock(&chip->mutex); \
} while (0)

#define INVALIDATE_CACHE_UDELAY(map, chip, adr, len, usec) \
do { \
- mutex_unlock(&chip->mutex); \
INVALIDATE_CACHED_RANGE(map, adr, len); \
cfi_udelay(usec); \
- mutex_lock(&chip->mutex); \
} while (0)

#endif
-- 




thanks
Huang Shijie

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-15  7:55       ` Huang Shijie
@ 2013-04-17 21:45         ` Brian Norris
  2013-04-18  2:09           ` Huang Shijie
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Norris @ 2013-04-17 21:45 UTC (permalink / raw)
  To: Huang Shijie
  Cc: David Woodhouse, Kevin Cernekee, linux-mtd, Norbert van Bolhuis,
	Artem Bityutskiy

On Mon, Apr 15, 2013 at 12:55 AM, Huang Shijie <b32955@freescale.com> wrote:
> On 2013-04-13 10:59, Brian Norris wrote:
>
>> Disabling CONFIG_NO_HZ doesn't fix my problem.
>
> I also disabled CONFIG_NO_HZ, and it did not fix my problem either.
>
> But after I removed the mutex_unlock/mutex_lock pair in
> UDELAY/INVALIDATE_CACHE_UDELAY, my problem disappeared. I have run for
> three days and no timeout has occurred. (I do not enable
> CONFIG_MTD_XIP.)
>
>
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1043,17 +1043,13 @@ static void __xipram xip_udelay(struct map_info *map, struct flchip *chip,
>
> #define UDELAY(map, chip, adr, usec) \
> do { \
> - mutex_unlock(&chip->mutex); \
> cfi_udelay(usec); \
> - mutex_lock(&chip->mutex); \
> } while (0)
>
> #define INVALIDATE_CACHE_UDELAY(map, chip, adr, len, usec) \
> do { \
> - mutex_unlock(&chip->mutex); \
> INVALIDATE_CACHED_RANGE(map, adr, len); \
> cfi_udelay(usec); \
> - mutex_lock(&chip->mutex); \
> } while (0)
>
> #endif

This patch doesn't solve my problem, so it seems that Huang and I
probably are seeing different root causes for these timeouts.

I tried applying this patch and then measuring the actual delay that
has elapsed by the time we "time out" (by reading the CPU count
register directly). The delay is always very close to 4ms, which is
expected with my kernel: HZ=250, so one jiffy is 4ms. So my system
seems to wait plenty long according to the flash part's specification,
but if I wait even longer, the operation does complete successfully.

I'll continue to look at this issue, but I thought I'd post my results so far.

Brian

* Re: cfi_cmdset_0002: do_write_buffer timeouts
  2013-04-17 21:45         ` Brian Norris
@ 2013-04-18  2:09           ` Huang Shijie
  0 siblings, 0 replies; 9+ messages in thread
From: Huang Shijie @ 2013-04-18  2:09 UTC (permalink / raw)
  To: Brian Norris
  Cc: David Woodhouse, Kevin Cernekee, linux-mtd, Norbert van Bolhuis,
	Artem Bityutskiy

On 2013-04-18 05:45, Brian Norris wrote:
> This patch doesn't solve my problem, so it seems that Huang and I
> probably are seeing different root causes for these timeouts.
>
Yes. My timeout may be caused by an error in erase-suspend/erase-resume.

thanks
Huang Shijie

end of thread

Thread overview: 9+ messages
     [not found] <CAN8TOE8dVYxBbb8MtozFio8dS-ypq14U8RuKTo38QcAtXM5Qrw@mail.gmail.com>
2013-04-11  9:00 ` cfi_cmdset_0002: do_write_buffer timeouts Brian Norris
2013-04-11  9:21   ` Huang Shijie
2013-04-11 19:37     ` Brian Norris
2013-04-12  6:23   ` Norbert van Bolhuis
2013-04-13  2:59     ` Brian Norris
2013-04-15  7:55       ` Huang Shijie
2013-04-17 21:45         ` Brian Norris
2013-04-18  2:09           ` Huang Shijie
2013-04-12  6:34   ` Stefan Roese
