All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Sean Nyekjaer <sean@geanix.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Boris Brezillon <bbrezillon@kernel.org>,
	linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Date: Tue, 5 Oct 2021 10:58:36 +0200	[thread overview]
Message-ID: <20211005105836.6c300f25@collabora.com> (raw)
In-Reply-To: <20211005084938.jcbw24umhehoiirs@skn-laptop>

On Tue, 5 Oct 2021 10:49:38 +0200
Sean Nyekjaer <sean@geanix.com> wrote:

> On Tue, Oct 05, 2021 at 10:23:00AM +0200, Boris Brezillon wrote:
> > On Tue, 5 Oct 2021 09:09:30 +0200
> > Sean Nyekjaer <sean@geanix.com> wrote:  
> 
> [ ... ]
> 
> > > 
> > > Have you seen the reproducer script?  
> > 
> > How would I know about this script or your previous attempt (mentioned
> > at the end of this email) given I was not Cc-ed on the previous
> > discussion, and nothing mentions it in this RFC...
> >   
> 
> That's why I shared it here ;)
> Initially I thought this was a bug introduced by exec_op.
> 
> > > ---
> > > root@iwg26-v1:/data/root# cat /data/crash.sh
> > > #!/bin/sh -x
> > > 
> > > echo enabled > /sys/devices/platform/soc/2100000.bus/21f4000.serial/tty/ttymxc4/power/wakeup
> > > 
> > > rm /data/test50M
> > > dd if=/dev/urandom of=/tmp/test50M bs=1M count=50
> > > cp /tmp/test50M /data/ &
> > > sleep 1
> > > echo mem > /sys/power/state
> > > ---
> > > 
> > > As seen in the log above disk is synced before suspend.
> > > cp is continuing to copy data to ubifs.
> > > And then user space processes are frozen.
> > > At this point the kernel thread would have unwritten data.
> > > 
> > > We tried to solve this with:
> > > https://lkml.org/lkml/2021/9/1/280  
> > 
> > I see. It's still unclear to me when the write happens. Is it in the
> > suspend path (before the system is actually suspended), or in the
> > resume path (when the system is being resumed).
> > 
> > Anyway, let's admit writing to a storage device while it's suspended is
> > a valid use case and requires the storage layer to put this request on
> > old. This wait should not, IMHO, be handled at the NAND level, but at
> > the MTD level (using a waitqueue, and an atomic to make
> > suspended/resumed transitions safe). And abusing a mutex to implement
> > that is certainly not a good idea.  
> 
> I did't say this was the right solution ;) I actually asked in the RFC:
> "Should we introduce a new mutex? Or maybe a spin_lock?"
> 
> What are you proposing, a waitqueue in mtd_info? That gets checked in
> mtd_write()/mtd_read()?

Yes, and replacing the suspended state by an atomic, and providing a
helper to wait on the device readiness. Helper you will call in every
path involving a communication with the HW, not just mtd_read/write()
(you're missing erase at least, and I fear there are other hooks that
might lead to commands being issued to the device). But before we get
there, I think it's important to understand what the kernel expects.
IOW, if and when threads can do a request on a suspended device, and
when it's acceptable to wait (vs returning -EBUSY), otherwise I fear
we'll end up with deadlocks in the suspend/resume path.

WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Sean Nyekjaer <sean@geanix.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Boris Brezillon <bbrezillon@kernel.org>,
	linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Date: Tue, 5 Oct 2021 10:58:36 +0200	[thread overview]
Message-ID: <20211005105836.6c300f25@collabora.com> (raw)
In-Reply-To: <20211005084938.jcbw24umhehoiirs@skn-laptop>

On Tue, 5 Oct 2021 10:49:38 +0200
Sean Nyekjaer <sean@geanix.com> wrote:

> On Tue, Oct 05, 2021 at 10:23:00AM +0200, Boris Brezillon wrote:
> > On Tue, 5 Oct 2021 09:09:30 +0200
> > Sean Nyekjaer <sean@geanix.com> wrote:  
> 
> [ ... ]
> 
> > > 
> > > Have you seen the reproducer script?  
> > 
> > How would I know about this script or your previous attempt (mentioned
> > at the end of this email) given I was not Cc-ed on the previous
> > discussion, and nothing mentions it in this RFC...
> >   
> 
> That's why I shared it here ;)
> Initially I thought this was a bug introduced by exec_op.
> 
> > > ---
> > > root@iwg26-v1:/data/root# cat /data/crash.sh
> > > #!/bin/sh -x
> > > 
> > > echo enabled > /sys/devices/platform/soc/2100000.bus/21f4000.serial/tty/ttymxc4/power/wakeup
> > > 
> > > rm /data/test50M
> > > dd if=/dev/urandom of=/tmp/test50M bs=1M count=50
> > > cp /tmp/test50M /data/ &
> > > sleep 1
> > > echo mem > /sys/power/state
> > > ---
> > > 
> > > As seen in the log above disk is synced before suspend.
> > > cp is continuing to copy data to ubifs.
> > > And then user space processes are frozen.
> > > At this point the kernel thread would have unwritten data.
> > > 
> > > We tried to solve this with:
> > > https://lkml.org/lkml/2021/9/1/280  
> > 
> > I see. It's still unclear to me when the write happens. Is it in the
> > suspend path (before the system is actually suspended), or in the
> > resume path (when the system is being resumed).
> > 
> > Anyway, let's admit writing to a storage device while it's suspended is
> > a valid use case and requires the storage layer to put this request on
> > old. This wait should not, IMHO, be handled at the NAND level, but at
> > the MTD level (using a waitqueue, and an atomic to make
> > suspended/resumed transitions safe). And abusing a mutex to implement
> > that is certainly not a good idea.  
> 
> I did't say this was the right solution ;) I actually asked in the RFC:
> "Should we introduce a new mutex? Or maybe a spin_lock?"
> 
> What are you proposing, a waitqueue in mtd_info? That gets checked in
> mtd_write()/mtd_read()?

Yes, and replacing the suspended state by an atomic, and providing a
helper to wait on the device readiness. Helper you will call in every
path involving a communication with the HW, not just mtd_read/write()
(you're missing erase at least, and I fear there are other hooks that
might lead to commands being issued to the device). But before we get
there, I think it's important to understand what the kernel expects.
IOW, if and when threads can do a request on a suspended device, and
when it's acceptable to wait (vs returning -EBUSY), otherwise I fear
we'll end up with deadlocks in the suspend/resume path.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

  reply	other threads:[~2021-10-05  8:58 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-04  6:56 [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend Sean Nyekjaer
2021-10-04  6:56 ` Sean Nyekjaer
2021-10-04  8:41 ` Boris Brezillon
2021-10-04  8:41   ` Boris Brezillon
2021-10-04  8:55   ` Sean Nyekjaer
2021-10-04  8:55     ` Sean Nyekjaer
2021-10-04  9:58     ` Boris Brezillon
2021-10-04  9:58       ` Boris Brezillon
2021-10-04 10:12       ` Sean Nyekjaer
2021-10-04 10:12         ` Sean Nyekjaer
2021-10-04 11:47         ` Boris Brezillon
2021-10-04 11:47           ` Boris Brezillon
2021-10-05  7:09           ` Sean Nyekjaer
2021-10-05  7:09             ` Sean Nyekjaer
2021-10-05  8:23             ` Boris Brezillon
2021-10-05  8:23               ` Boris Brezillon
2021-10-05  8:49               ` Sean Nyekjaer
2021-10-05  8:49                 ` Sean Nyekjaer
2021-10-05  8:58                 ` Boris Brezillon [this message]
2021-10-05  8:58                   ` Boris Brezillon
2021-10-07 11:43                   ` Sean Nyekjaer
2021-10-07 11:43                     ` Sean Nyekjaer
2021-10-07 12:18                     ` Boris Brezillon
2021-10-07 12:18                       ` Boris Brezillon
2021-10-07 12:39                       ` Sean Nyekjaer
2021-10-07 12:39                         ` Sean Nyekjaer
2021-10-07 13:14                         ` Boris Brezillon
2021-10-07 13:14                           ` Boris Brezillon
2021-10-08 10:04                           ` Sean Nyekjaer
2021-10-08 10:04                             ` Sean Nyekjaer
2021-10-08 11:20                             ` Boris Brezillon
2021-10-08 11:20                               ` Boris Brezillon
2021-10-08 11:54                               ` Sean Nyekjaer
2021-10-08 11:54                                 ` Sean Nyekjaer
2021-10-08 12:15                                 ` Boris Brezillon
2021-10-08 12:15                                   ` Boris Brezillon
2021-10-08 14:38                                   ` [RFC PATCH 1/2] mtd: core: protect access to mtd devices " Sean Nyekjaer
2021-10-08 14:38                                     ` Sean Nyekjaer
2021-10-08 15:30                                     ` Boris Brezillon
2021-10-08 15:30                                       ` Boris Brezillon
2021-10-08 17:31                                       ` Sean Nyekjaer
2021-10-08 17:31                                         ` Sean Nyekjaer
2021-10-08 15:35                                     ` Miquel Raynal
2021-10-08 15:35                                       ` Miquel Raynal
2021-10-08 16:08                                       ` Boris Brezillon
2021-10-08 16:08                                         ` Boris Brezillon
2021-10-08 17:50                                         ` Sean Nyekjaer
2021-10-08 17:50                                           ` Sean Nyekjaer
2021-10-08 14:38                                   ` [RFC PATCH 2/2] mtd: rawnand: remove suspended check Sean Nyekjaer
2021-10-08 14:38                                     ` Sean Nyekjaer
2021-10-08 22:05                                     ` kernel test robot
2021-10-08 22:05                                       ` kernel test robot
2021-10-08 22:47                                     ` kernel test robot
2021-10-08 22:47                                       ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211005105836.6c300f25@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=bbrezillon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=miquel.raynal@bootlin.com \
    --cc=richard@nod.at \
    --cc=sean@geanix.com \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.