All of lore.kernel.org
 help / color / mirror / Atom feed
* mtd locking
@ 2016-05-29 13:38 Matthias Auchmann
  2016-05-29 14:23 ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Matthias Auchmann @ 2016-05-29 13:38 UTC (permalink / raw)
  To: linux-mtd

Hi all!

I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies.

My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe?

I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue.

I then noticed that quite frequently in the MTD code, ecc_stats are saved, then some reads are performed, and then ecc_stats are compared. No mutex or spinlock or whatever protection is used. Given that the ECC errors can result in -EUCLEAN or a positive bitflip count (then resulting in scrubbing for UBI for example)... I started to feel a little uncomfortable about the whole MTD locking issue.

This is where the question comes from... could anyone shed light on this?

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mtd locking
  2016-05-29 13:38 mtd locking Matthias Auchmann
@ 2016-05-29 14:23 ` Richard Weinberger
  2016-05-29 15:53   ` Matthias Auchmann
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2016-05-29 14:23 UTC (permalink / raw)
  To: Matthias Auchmann; +Cc: linux-mtd

On Sun, May 29, 2016 at 3:38 PM, Matthias Auchmann <m.auchmann@artech.at> wrote:
> Hi all!
>
> I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies.
>
> My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe?
>
> I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue.

IIRC ECC error counting is not thread safe at all.
So, yes the behavior is indented.
...at least nobody cared enough to make it better.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: mtd locking
  2016-05-29 14:23 ` Richard Weinberger
@ 2016-05-29 15:53   ` Matthias Auchmann
  2016-05-29 16:17     ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Matthias Auchmann @ 2016-05-29 15:53 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

By intended you mean it's wrong and we know, but it's still wrong, right?

What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that?

I'm willing to contribute and fix the ECC part, but I first need to understand what other mechanisms or thread safety issues there might still be...

Thanks!

BR,

Matthias


-----Original Message-----
From: Richard Weinberger [mailto:richard.weinberger@gmail.com] 
Sent: Sunday, May 29, 2016 4:24 PM
To: Matthias Auchmann
Cc: linux-mtd@lists.infradead.org
Subject: Re: mtd locking

On Sun, May 29, 2016 at 3:38 PM, Matthias Auchmann <m.auchmann@artech.at> wrote:
> Hi all!
>
> I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies.
>
> My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe?
>
> I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue.

IIRC ECC error counting is not thread safe at all.
So, yes the behavior is indented.
...at least nobody cared enough to make it better.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mtd locking
  2016-05-29 15:53   ` Matthias Auchmann
@ 2016-05-29 16:17     ` Richard Weinberger
  2016-05-30  9:24       ` Boris Brezillon
  2016-05-31 18:11       ` Brian Norris
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Weinberger @ 2016-05-29 16:17 UTC (permalink / raw)
  To: Matthias Auchmann; +Cc: linux-mtd

Hi!

Am 29.05.2016 um 17:53 schrieb Matthias Auchmann:
> By intended you mean it's wrong and we know, but it's still wrong, right?

I'd say lazy. Sometimes counters don't need to be exact.
Having exact counting is expensive. See network stack.

I'm not sure what the exact reasons in the ECC error counting
case are.
Maybe Brian or Boris can tell more.

> What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that?

NAND chip access is strictly serialized. :-)
See nand_get_device().

Thanks,
//richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mtd locking
  2016-05-29 16:17     ` Richard Weinberger
@ 2016-05-30  9:24       ` Boris Brezillon
  2016-05-31 18:11       ` Brian Norris
  1 sibling, 0 replies; 6+ messages in thread
From: Boris Brezillon @ 2016-05-30  9:24 UTC (permalink / raw)
  To: Matthias Auchmann
  Cc: Richard Weinberger, linux-mtd, Brian Norris, Sascha Hauer

+Brian and Sasha

Hi Matthias,

On Sun, 29 May 2016 18:17:02 +0200
Richard Weinberger <richard@nod.at> wrote:

> Hi!
> 
> Am 29.05.2016 um 17:53 schrieb Matthias Auchmann:
> > By intended you mean it's wrong and we know, but it's still wrong, right?  
> 
> I'd say lazy. Sometimes counters don't need to be exact.
> Having exact counting is expensive. See network stack.
> 
> I'm not sure what the exact reasons in the ECC error counting
> case are.
> Maybe Brian or Boris can tell more.
> 
> > What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that?  
> 
> NAND chip access is strictly serialized. :-)
> See nand_get_device().

Yep, read/write accesses are correctly protected. The statistics update
are also protected, but it does not prevent tools like nanddump to
retrieve incorrect information. Actually, nanddump uses the following
sequence:

1/ get stats
2/ read stuff
3/ get new stats and compare with information retrieved in #1

Since we can't lock access to the MTD device for the whole sequence,
the statistics might change because of other read accesses on the same
MTD device.

I recently suggested the addition of a new ioctl to address that [1].

Regards,

Boris

[1]http://thread.gmane.org/gmane.linux.drivers.mtd/66834

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mtd locking
  2016-05-29 16:17     ` Richard Weinberger
  2016-05-30  9:24       ` Boris Brezillon
@ 2016-05-31 18:11       ` Brian Norris
  1 sibling, 0 replies; 6+ messages in thread
From: Brian Norris @ 2016-05-31 18:11 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Matthias Auchmann, linux-mtd, Sascha Hauer, Boris Brezillon

Hi,

On Sun, May 29, 2016 at 06:17:02PM +0200, Richard Weinberger wrote:
> Am 29.05.2016 um 17:53 schrieb Matthias Auchmann:
> > By intended you mean it's wrong and we know, but it's still wrong, right?
> 
> I'd say lazy. Sometimes counters don't need to be exact.
> Having exact counting is expensive. See network stack.
> 
> I'm not sure what the exact reasons in the ECC error counting
> case are.
> Maybe Brian or Boris can tell more.

Boris correctly explained things. And several people have noticed, but
few solutions have been attempted.

The only thing I'd add: I suspect the MTD char device APIs haven't
gotten as much love because they aren't as useful. Most users have found
better utility by working through JFFS2, UBI/UBIFS, etc., since there's
just so much that a raw MTD doesn't really solve for you, especially
when you start talking about the type of device where you care about
bitflips (i.e., NAND).

You're more likely to get away fine with raw MTD on NOR flash, but then
you don't care about ECCGETSTATS there anyway.

I think Boris has found use cases where he just wants to talk raw
NAND/MTD, so maybe some of this dynamic is changing.

(And BTW, I realized this is a potential impediment to moving the MTD
tests entirely to user space. They can still work fine in
single-threaded mode, but once you start running multiple instances,
you'll start getting inaccurate stats reports, which will make some
tests think they've failed.)

Brian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-05-31 18:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-29 13:38 mtd locking Matthias Auchmann
2016-05-29 14:23 ` Richard Weinberger
2016-05-29 15:53   ` Matthias Auchmann
2016-05-29 16:17     ` Richard Weinberger
2016-05-30  9:24       ` Boris Brezillon
2016-05-31 18:11       ` Brian Norris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.