* [PATCH RFC 0/2] Micron (formerly Numonyx) M29EW NOR flash issues
@ 2012-06-18  7:24 Gerlando Falauto
  2012-06-18  7:24 ` [PATCH RFC 1/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend Hang Ups" Gerlando Falauto
  2012-06-18  7:24 ` [PATCH RFC 2/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Resolving the Delay After Resume Issue" Gerlando Falauto
  0 siblings, 2 replies; 7+ messages in thread
From: Gerlando Falauto @ 2012-06-18  7:24 UTC (permalink / raw)
  To: linux-mtd; +Cc: Holger Brunck, Leo, Stefan Bigler, Gerlando Falauto

Hi everyone,

we have been experiencing some problems with the above NOR flash.
Please find our analysis and the patches we applied to 3.0.8.
Patches of course are not meant for mainlining "as is"; we'd rather appreciate
your suggestions as to how to make them suitable for inclusion.
It should be a fairly common flash part, but it sounds like no one has run into
this issue so far except
http://lists.infradead.org/pipermail/linux-mtd/2011-April/034867.html

Thank you very much,
Gerlando Falauto

PROBLEM ANALYSIS:

This issue only appears when performing concurrent operations like simultaneous
UBI volume creation/deletion, but rarely under normal conditions.
The problem seems to happen rather soon though when the unit is put in a
Climate Chamber at high temperatures (say 60°C).

In our experience the most probable root cause is the delay needed after an
erase resume, before a new erase suspend can be issued again [PATCH 2/2].

This is documented on page 22 of the technical note TN-13-07 from Micron:
http://www.micron.com/~/media/Documents/Products/Technical%20Note/NOR%20Flash/tn1307_patching_linux_kernel_for_m29.ashx

[NOTE: TN-13-07 explicitly refers to "some revisions of the M29EW (for example,
A1 and A2 step revisions)", even though our boards are equipped with silicon
revision 12 = 0xC]

Adding this delay with a value of 500 us seems to fix the problem even at high
temperatures. This is also incidentally the typical value for the "Erase to
suspend" parameter as specified in the datasheet:
Erase to suspend is the typical time between an initial BLOCK ERASE or
ERASE RESUME command and a subsequent ERASE SUSPEND command. Violating the
specification repeatedly during any particular block erase may cause erase
failures.

Also, [PATCH 1/2], described on page 20 (Correcting Erase Suspend Hang Ups),
was added first, although it does not appear to have any impact on the issue.

SIDE NOTE:

The flash stressing test used to reproduce this issue has shown in some cases
the unforeseen side effect of inexplicably damaging sector 0 (which is where
u-boot code resides). When this happened, sector 0 could not be erased anymore,
not even through JTAG. A couple of times, further attempts at reprogramming the
sector mysteriously led it to be erasable again.
One particular board however (incidentally brought into that condition after a
test in the climate chamber) showed unstable values for some bits of sector 0
among successive reads. All other sectors seemed to be immune to this problem.
For this board I could not find any way to erase sector 0.
This is currently an open issue with wrong software operations causing hardware
to break.

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Stefan Bigler <stefan.bigler@keymile.com>
Cc: Holger Brunck <holger.brunck@keymile.com>
Cc: Leo <leo.costa77@gmail.com>

Gerlando Falauto (2):
  mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend
    Hang Ups"
  mtd: cfi_cmdset_0002: Micron M29EW bugfix "Resolving the Delay After
    Resume Issue"

 drivers/mtd/chips/cfi_cmdset_0002.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)


* [PATCH RFC 1/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend Hang Ups"
  2012-06-18  7:24 [PATCH RFC 0/2] Micron (formerly Numonyx) M29EW NOR flash issues Gerlando Falauto
@ 2012-06-18  7:24 ` Gerlando Falauto
  2012-06-27 10:27   ` Artem Bityutskiy
  2012-06-18  7:24 ` [PATCH RFC 2/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Resolving the Delay After Resume Issue" Gerlando Falauto
  1 sibling, 1 reply; 7+ messages in thread
From: Gerlando Falauto @ 2012-06-18  7:24 UTC (permalink / raw)
  To: linux-mtd; +Cc: Holger Brunck, Leo, Stefan Bigler, Gerlando Falauto

From TN-13-07: Patching the Linux Kernel and U-Boot for M29 Flash, page 20:

Some revisions of the M29EW suffer from erase suspend hang ups. In particular, it can
occur when the sequence Erase Confirm -> Suspend -> Program -> Resume
causes a lockup due to internal timing issues. The consequence is that the erase cannot
be resumed without inserting a dummy command after programming and prior to
resuming. [...] The work-around is to issue a dummy write cycle that
writes an F0 command code before the RESUME command.

Signed-off-by: Stefan Bigler <stefan.bigler@keymile.com>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Holger Brunck <holger.brunck@keymile.com>
Cc: Leo <leo.costa77@gmail.com>

---
 drivers/mtd/chips/cfi_cmdset_0002.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 23175ed..72f6164 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -761,6 +761,11 @@ static void put_chip(struct map_info *map, struct flchip *chip, unsigned long ad
 
 	switch(chip->oldstate) {
 	case FL_ERASING:
+		/* before resume, insert a dummy 0xF0 cycle for Micron M29EW devices */
+		if ( (cfi->mfr == 0x0089) &&
+		   (((cfi->device_type == CFI_DEVICETYPE_X8) && ((cfi->id & 0xff)== 0x7e))
+			  || ((cfi->device_type == CFI_DEVICETYPE_X16) && (cfi->id == 0x227e))) )
+			map_write(map, CMD(0xF0), chip->in_progress_block_addr);
 		map_write(map, cfi->sector_erase_cmd, chip->in_progress_block_addr);
 		chip->oldstate = FL_READY;
 		chip->state = FL_ERASING;
@@ -904,6 +909,12 @@ static void __xipram xip_udelay(struct map_info *map, struct flchip *chip,
 			local_irq_disable();
 
 			/* Resume the write or erase operation */
+			/* before resume, insert a dummy 0xF0 cycle for Micron M29EW devices */
+			if ( (cfi->mfr == 0x0089) &&
+				 (((cfi->device_type == CFI_DEVICETYPE_X8) && ((cfi->id & 0xff)== 0x7e))
+				  || ((cfi->device_type == CFI_DEVICETYPE_X16) && (cfi->id == 0x227e))) )
+					map_write(map, CMD(0xF0), adr);
+
 			map_write(map, cfi->sector_erase_cmd, adr);
 			chip->state = oldstate;
 			start = xip_currtime();
-- 
1.7.1
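
For readers following the command ordering in the hunks above, here is a
minimal userspace sketch of the workaround: a dummy 0xF0 write goes out just
before the resume command, and only on M29EW parts. The names mock_chip,
mock_write, m29ew_erase_suspend_quirk and resume_erase are hypothetical
stand-ins, not kernel APIs, and the resume opcode is assumed to be 0x30 (the
patch itself writes cfi->sector_erase_cmd).

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed opcodes for illustration; the driver uses CMD(0xF0) and
 * cfi->sector_erase_cmd rather than these constants. */
enum { CMD_DUMMY_F0 = 0xF0, CMD_ERASE_RESUME = 0x30 };

/* Hypothetical stand-in for the chip plus its bus: records each write. */
struct mock_chip {
	int is_m29ew;      /* outcome of the manufacturer/device ID check */
	uint8_t log[8];    /* command bytes written, in order */
	size_t n;          /* number of entries used in log[] */
};

static void mock_write(struct mock_chip *c, uint8_t cmd)
{
	if (c->n < sizeof(c->log))
		c->log[c->n++] = cmd;
}

/* Quirk helper: returns without touching the bus on other chips. */
static void m29ew_erase_suspend_quirk(struct mock_chip *c)
{
	if (!c->is_m29ew)
		return;
	mock_write(c, CMD_DUMMY_F0);
}

/* Resume path: the quirk runs first, then the resume command is written. */
static void resume_erase(struct mock_chip *c)
{
	m29ew_erase_suspend_quirk(c);
	mock_write(c, CMD_ERASE_RESUME);
}
```

On a mock M29EW the log ends up holding 0xF0 followed by 0x30; on any other
part only 0x30 is written, so unaffected chips see no extra bus cycle.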


* [PATCH RFC 2/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Resolving the Delay After Resume Issue"
  2012-06-18  7:24 [PATCH RFC 0/2] Micron (formerly Numonyx) M29EW NOR flash issues Gerlando Falauto
  2012-06-18  7:24 ` [PATCH RFC 1/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend Hang Ups" Gerlando Falauto
@ 2012-06-18  7:24 ` Gerlando Falauto
  1 sibling, 0 replies; 7+ messages in thread
From: Gerlando Falauto @ 2012-06-18  7:24 UTC (permalink / raw)
  To: linux-mtd; +Cc: Holger Brunck, Leo, Stefan Bigler, Gerlando Falauto

From TN-13-07: Patching the Linux Kernel and U-Boot for M29 Flash, page 22:

Some revisions of the M29EW (for example, A1 and A2 step revisions) are affected by a
problem that could cause a hang up when an ERASE SUSPEND command is issued after
an ERASE RESUME operation without waiting for a minimum delay. The result is that
once the ERASE seems to be completed (no bits are toggling), the contents of the Flash
memory block on which the erase was ongoing could be inconsistent with the expected
values (typically, the array value is stuck to the 0xC0, 0xC4, 0x80, or 0x84 values), causing
a consequent failure of the ERASE operation.
The occurrence of this issue could be high, especially when file system operations on the
Flash are intensive. As a result, it is recommended that a patch be applied. Intensive file
system operations can cause many calls to the garbage routine to free Flash space (also
by erasing physical Flash blocks) and as a result, many consecutive SUSPEND and
RESUME commands can occur.
The problem disappears when a delay is inserted after the RESUME command by using
the udelay () function available in Linux.

The DELAY value must be tuned based on the customer’s platform. The maximum value
that fixes the problem in all cases is 500us. But, in our experience, a delay of 30μs to 50μs
is sufficient in most cases.
We have chosen 500us because this latency is acceptable.

Signed-off-by: Stefan Bigler <stefan.bigler@keymile.com>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Holger Brunck <holger.brunck@keymile.com>
Cc: Leo <leo.costa77@gmail.com>

---
 drivers/mtd/chips/cfi_cmdset_0002.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 72f6164..7fb24dc 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -767,6 +767,10 @@ static void put_chip(struct map_info *map, struct flchip *chip, unsigned long ad
 			  || ((cfi->device_type == CFI_DEVICETYPE_X16) && (cfi->id == 0x227e))) )
 			map_write(map, CMD(0xF0), chip->in_progress_block_addr);
 		map_write(map, cfi->sector_erase_cmd, chip->in_progress_block_addr);
+		/* Resolving the Delay After Resume Issue see Micron TN-13-07 */
+		/* Worstcase delay must be 500us but 30-50us should be ok as well
+		   nbigls has chosen 500us because this latency is acceptable */
+		udelay(500);
 		chip->oldstate = FL_READY;
 		chip->state = FL_ERASING;
 		break;
-- 
1.7.1


* Re: [PATCH RFC 1/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend Hang Ups"
  2012-06-18  7:24 ` [PATCH RFC 1/2] mtd: cfi_cmdset_0002: Micron M29EW bugfix "Correcting Erase Suspend Hang Ups" Gerlando Falauto
@ 2012-06-27 10:27   ` Artem Bityutskiy
  2012-07-03  7:09     ` [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07 Gerlando Falauto
  0 siblings, 1 reply; 7+ messages in thread
From: Artem Bityutskiy @ 2012-06-27 10:27 UTC (permalink / raw)
  To: Gerlando Falauto; +Cc: Holger Brunck, Leo, linux-mtd, Stefan Bigler

On Mon, 2012-06-18 at 09:24 +0200, Gerlando Falauto wrote:
> +		/* before resume, insert a dummy 0xF0 cycle for Micron M29EW devices */
> +		if ( (cfi->mfr == 0x0089) &&
> +		   (((cfi->device_type == CFI_DEVICETYPE_X8) && ((cfi->id & 0xff)== 0x7e))
> +			  || ((cfi->device_type == CFI_DEVICETYPE_X16) && (cfi->id == 0x227e))) )
> +			map_write(map, CMD(0xF0), chip->in_progress_block_addr);

Please, separate the M29-specific quirks out to functions, do not inject
them to the main code.

Each quirk should be in a separate function, e.g.
'micron_m29_erase_quirk()'. The functions should have a descriptive
comment similar to what you added to the commit message. The function
just returns if the chip ID is not M29. This will be cleaner.

-- 
Best Regards,
Artem Bityutskiy


* [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07
  2012-06-27 10:27   ` Artem Bityutskiy
@ 2012-07-03  7:09     ` Gerlando Falauto
  2012-07-16 14:29       ` Artem Bityutskiy
  2013-02-12 14:50       ` David Woodhouse
  0 siblings, 2 replies; 7+ messages in thread
From: Gerlando Falauto @ 2012-07-03  7:09 UTC (permalink / raw)
  To: linux-mtd
  Cc: Holger Brunck, Stefan Bigler, Gerlando Falauto, Artem Bityutskiy

Fix the following issues with Micron's (formerly Numonyx)
M29EW NOR flash chips, as documented on TN-13-07:
- Correcting Erase Suspend Hang Ups (page 20)
- Resolving the Delay After Resume Issue (page 22)

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Stefan Bigler <stefan.bigler@keymile.com>
Cc: Holger Brunck <holger.brunck@keymile.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
---
 drivers/mtd/chips/cfi_cmdset_0002.c |   69 +++++++++++++++++++++++++++++++++++
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 23175ed..7f23248 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -417,6 +417,70 @@ static void cfi_fixup_major_minor(struct cfi_private *cfi,
 	}
 }
 
+static int is_m29ew(struct cfi_private *cfi)
+{
+	if (cfi->mfr == CFI_MFR_INTEL)
+		if ((cfi->device_type == CFI_DEVICETYPE_X8  &&
+			(cfi->id & 0xff)	== 0x7e) ||
+		    (cfi->device_type == CFI_DEVICETYPE_X16 &&
+			cfi->id			== 0x227e))
+			return 1;
+	return 0;
+}
+
+/*
+ * From TN-13-07: Patching the Linux Kernel and U-Boot for M29 Flash, page 20:
+ * Some revisions of the M29EW suffer from erase suspend hang ups. In
+ * particular, it can occur when the sequence
+ * Erase Confirm -> Suspend -> Program -> Resume
+ * causes a lockup due to internal timing issues. The consequence is that the
+ * erase cannot be resumed without inserting a dummy command after programming
+ * and prior to resuming. [...] The work-around is to issue a dummy write cycle
+ * that writes an F0 command code before the RESUME command.
+ */
+static void cfi_fixup_m29ew_erase_suspend(struct map_info *map,
+					  unsigned long adr)
+{
+	struct cfi_private *cfi = map->fldrv_priv;
+	/* before resume, insert a dummy 0xF0 cycle for Micron M29EW devices */
+	if (is_m29ew(cfi))
+		map_write(map, CMD(0xF0), adr);
+}
+
+/*
+ * From TN-13-07: Patching the Linux Kernel and U-Boot for M29 Flash, page 22:
+ *
+ * Some revisions of the M29EW (for example, A1 and A2 step revisions)
+ * are affected by a problem that could cause a hang up when an ERASE SUSPEND
+ * command is issued after an ERASE RESUME operation without waiting for a
+ * minimum delay.  The result is that once the ERASE seems to be completed
+ * (no bits are toggling), the contents of the Flash memory block on which
+ * the erase was ongoing could be inconsistent with the expected values
+ * (typically, the array value is stuck to the 0xC0, 0xC4, 0x80, or 0x84
+ * values), causing a consequent failure of the ERASE operation.
+ * The occurrence of this issue could be high, especially when file system
+ * operations on the Flash are intensive.  As a result, it is recommended
+ * that a patch be applied.  Intensive file system operations can cause many
+ * calls to the garbage routine to free Flash space (also by erasing physical
+ * Flash blocks) and as a result, many consecutive SUSPEND and RESUME
+ * commands can occur.  The problem disappears when a delay is inserted after
+ * the RESUME command by using the udelay() function available in Linux.
+ * The DELAY value must be tuned based on the customer's platform.
+ * The maximum value that fixes the problem in all cases is 500us.
+ * But, in our experience, a delay of 30 us to 50 us is sufficient
+ * in most cases.
+ * We have chosen 500us because this latency is acceptable.
+ */
+static void cfi_fixup_m29ew_delay_after_resume(struct cfi_private *cfi)
+{
+	/*
+	 * Resolving the Delay After Resume Issue see Micron TN-13-07
+	 * Worstcase delay must be 500us but 30-50us should be ok as well
+	 */
+	if (is_m29ew(cfi))
+		cfi_udelay(500);
+}
+
 struct mtd_info *cfi_cmdset_0002(struct map_info *map, int primary)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
@@ -761,7 +825,10 @@ static void put_chip(struct map_info *map, struct flchip *chip, unsigned long ad
 
 	switch(chip->oldstate) {
 	case FL_ERASING:
+		cfi_fixup_m29ew_erase_suspend(map,
+			chip->in_progress_block_addr);
 		map_write(map, cfi->sector_erase_cmd, chip->in_progress_block_addr);
+		cfi_fixup_m29ew_delay_after_resume(cfi);
 		chip->oldstate = FL_READY;
 		chip->state = FL_ERASING;
 		break;
@@ -903,6 +970,8 @@ static void __xipram xip_udelay(struct map_info *map, struct flchip *chip,
 			/* Disallow XIP again */
 			local_irq_disable();
 
+			/* Correct Erase Suspend Hangups for M29EW */
+			cfi_fixup_m29ew_erase_suspend(map, adr);
 			/* Resume the write or erase operation */
 			map_write(map, cfi->sector_erase_cmd, adr);
 			chip->state = oldstate;
-- 
1.7.1


* Re: [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07
  2012-07-03  7:09     ` [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07 Gerlando Falauto
@ 2012-07-16 14:29       ` Artem Bityutskiy
  2013-02-12 14:50       ` David Woodhouse
  1 sibling, 0 replies; 7+ messages in thread
From: Artem Bityutskiy @ 2012-07-16 14:29 UTC (permalink / raw)
  To: Gerlando Falauto; +Cc: Stefan Bigler, Holger Brunck, linux-mtd

On Tue, 2012-07-03 at 09:09 +0200, Gerlando Falauto wrote:
> Fix the following issues with Micron's (formerly Numonyx)
> M29EW NOR flash chips, as documented on TN-13-07:
> - Correcting Erase Suspend Hang Ups (page 20)
> - Resolving the Delay After Resume Issue (page 22)

Pushed to l2-mtd.git, thanks!

-- 
Best Regards,
Artem Bityutskiy


* Re: [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07
  2012-07-03  7:09     ` [PATCH] mtd: cfi_cmdset_0002: Micron M29EW bugfixes as per TN-13-07 Gerlando Falauto
  2012-07-16 14:29       ` Artem Bityutskiy
@ 2013-02-12 14:50       ` David Woodhouse
  1 sibling, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2013-02-12 14:50 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Holger Brunck, Erwan Velu, linux-mtd, Stefan Bigler, Artem Bityutskiy

On Tue, 2012-07-03 at 09:09 +0200, Gerlando Falauto wrote:
> +/*
> + * From TN-13-07: Patching the Linux Kernel and U-Boot for M29 Flash, page 22:
> + *
> + * Some revisions of the M29EW (for example, A1 and A2 step revisions)
> + * are affected by a problem that could cause a hang up when an ERASE SUSPEND
> + * command is issued after an ERASE RESUME operation without waiting for a
> + * minimum delay.  The result is that once the ERASE seems to be completed
> + * (no bits are toggling), the contents of the Flash memory block on which
> + * the erase was ongoing could be inconsistent with the expected values
> + * (typically, the array value is stuck to the 0xC0, 0xC4, 0x80, or 0x84
> + * values), causing a consequent failure of the ERASE operation.
> + * The occurrence of this issue could be high, especially when file system
> + * operations on the Flash are intensive.  As a result, it is recommended
> + * that a patch be applied.  Intensive file system operations can cause many
> + * calls to the garbage routine to free Flash space (also by erasing physical
> + * Flash blocks) and as a result, many consecutive SUSPEND and RESUME
> + * commands can occur.  The problem disappears when a delay is inserted after
> + * the RESUME command by using the udelay() function available in Linux.
> + * The DELAY value must be tuned based on the customer's platform.
> + * The maximum value that fixes the problem in all cases is 500us.
> + * But, in our experience, a delay of 30 us to 50 us is sufficient
> + * in most cases.
> + * We have chosen 500us because this latency is acceptable.
> + */
> +static void cfi_fixup_m29ew_delay_after_resume(struct cfi_private *cfi)
> +{
> +       /*
> +        * Resolving the Delay After Resume Issue see Micron TN-13-07
> +        * Worstcase delay must be 500us but 30-50us should be ok as well
> +        */
> +       if (is_m29ew(cfi))
> +               cfi_udelay(500);
> +}

Hm, this would be better off done without a hard delay right there, but
instead just note the timestamp. Then use your existing hook in erase
suspend to check that it's been long enough since the last resume, and
have a *conditional* delay if not.

This assumes you have a timer with enough precision, of course.

-- 
dwmw2
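
The timestamp approach suggested in this reply can be sketched in userspace C
as follows. This is only an illustration under stated assumptions: struct
resume_stamp, note_erase_resume and suspend_wait_us are hypothetical names
(nothing here is a field of struct flchip or an existing MTD helper), time is
passed in as a plain microsecond counter, and 500 us is the worst-case value
quoted from TN-13-07. The resume path records the time and returns at once;
the suspend path asks how much of the window is still outstanding and would
delay only for that remainder.

```c
#include <stdint.h>

#define M29EW_RESUME_TO_SUSPEND_US 500u  /* worst case quoted from TN-13-07 */

/* Hypothetical per-chip bookkeeping; not a field of struct flchip. */
struct resume_stamp {
	uint64_t last_resume_us;  /* time of the last ERASE RESUME */
	int valid;                /* 0 until a resume has been recorded */
};

/* Call right after ERASE RESUME is written: record the time, do not wait. */
static void note_erase_resume(struct resume_stamp *s, uint64_t now_us)
{
	s->last_resume_us = now_us;
	s->valid = 1;
}

/* Call before ERASE SUSPEND: microseconds still to wait, 0 if none. */
static uint32_t suspend_wait_us(const struct resume_stamp *s, uint64_t now_us)
{
	uint64_t elapsed;

	if (!s->valid)
		return 0;
	elapsed = now_us - s->last_resume_us;
	if (elapsed >= M29EW_RESUME_TO_SUSPEND_US)
		return 0;
	return (uint32_t)(M29EW_RESUME_TO_SUSPEND_US - elapsed);
}
```

In a driver, the suspend path would pass the returned value to a delay routine
only when it is non-zero, so a suspend arriving long after the last resume
pays no latency at all. As noted in the reply, this relies on a time source
with microsecond-level precision.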

