From: Sean Nyekjaer <sean@geanix.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Boris Brezillon <bbrezillon@kernel.org>,
	linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend
Date: Thu, 7 Oct 2021 14:39:16 +0200
Message-ID: <20211007123916.w4oaooxfbawe6yw3@skn-laptop>
In-Reply-To: <20211007141858.314533f2@collabora.com>

On Thu, Oct 07, 2021 at 02:18:58PM +0200, Boris Brezillon wrote:
> On Thu, 7 Oct 2021 13:43:51 +0200
> Sean Nyekjaer <sean@geanix.com> wrote:
> 

[ ... ]

> > 
> > I have a proposal [0] and yes I have ended up in many deadlocks during
> > testing. The hardest part is the locking when going into suspend.
> > I'm not sure the wait_queue is initialized in the right place :)
> > And I'm kinda abusing the nand_get_device() for this...
> > 
> > Who do you think we should add to the discussion?
> > 
> > /Sean
> > 
> > [0]:
> > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> > index 3d6c6e880520..735dfff18143 100644
> > --- a/drivers/mtd/nand/raw/nand_base.c
> > +++ b/drivers/mtd/nand/raw/nand_base.c
> 
> As I said previously, I think this should be handled at the MTD level
> (drivers/mtd/mtdcore.c), not in the raw NAND framework.
> 
> > @@ -337,11 +337,10 @@ static int nand_isbad_bbm(struct nand_chip *chip, loff_t ofs)
> >   */
> >  static int nand_get_device(struct nand_chip *chip)
> >  {
> > +       struct mtd_info *mtd = nand_to_mtd(chip);
> > +
> > +       wait_event(mtd->wait_queue, atomic_read(&chip->suspended) == 0);
> >         mutex_lock(&chip->lock);
> > -       if (chip->suspended) {
> > -               mutex_unlock(&chip->lock);
> > -               return -EBUSY;
> > -       }
> 
> There's a race here: the device might enter suspend again before you're
> able to acquire the lock.
> 

Thought so :)

> >         mutex_lock(&chip->controller->lock);
> > 
> >         return 0;
> > @@ -4562,11 +4561,15 @@ static int nand_suspend(struct mtd_info *mtd)
> >         struct nand_chip *chip = mtd_to_nand(mtd);
> >         int ret = 0;
> > 
> > +       atomic_inc(&chip->suspended);
> >         mutex_lock(&chip->lock);
> 
> And it's racy here as well: you mark the device as suspended before you
> even acquired the lock.
> 
> >         if (chip->ops.suspend)
> >                 ret = chip->ops.suspend(chip);
> > -       if (!ret)
> > -               chip->suspended = 1;
> > +       if (ret) {
> > +               /* Wake things up again if suspend fails */
> > +               atomic_dec(&chip->suspended);
> > +               wake_up(&mtd->wait_queue);
> > +       }
> >         mutex_unlock(&chip->lock);
> > 
> >         return ret;
> > @@ -4581,10 +4584,12 @@ static void nand_resume(struct mtd_info *mtd)
> >         struct nand_chip *chip = mtd_to_nand(mtd);
> > 
> >         mutex_lock(&chip->lock);
> > -       if (chip->suspended) {
> > +       if (atomic_read(&chip->suspended)) {
> >                 if (chip->ops.resume)
> >                         chip->ops.resume(chip);
> > -               chip->suspended = 0;
> > +
> > +               atomic_dec(&chip->suspended);
> > +               wake_up(&mtd->wait_queue);
> >         } else {
> >                 pr_err("%s called for a chip which is not in suspended state\n",
> >                         __func__);
> > @@ -5099,6 +5104,9 @@ static int nand_detect(struct nand_chip *chip, struct nand_flash_dev *type)
> >         pr_info("%d MiB, %s, erase size: %d KiB, page size: %d, OOB size: %d\n",
> >                 (int)(targetsize >> 20), nand_is_slc(chip) ? "SLC" : "MLC",
> >                 mtd->erasesize >> 10, mtd->writesize, mtd->oobsize);
> > +
> > +       init_waitqueue_head(&mtd->wait_queue);
> > +
> 
> It's an MTD field. It should be initialized somewhere in mtdcore.c.
> 
> >         return 0;
> > 
> >  free_detect_allocation:
> > @@ -6264,6 +6272,8 @@ static int nand_scan_tail(struct nand_chip *chip)
> >         if (chip->options & NAND_SKIP_BBTSCAN)
> >                 return 0;
> > 
> > +       atomic_set(&chip->suspended, 0);
> > +
> >         /* Build bad block table */
> >         ret = nand_create_bbt(chip);
> >         if (ret)
> > diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> > index 88227044fc86..f7dcbc336170 100644
> > --- a/include/linux/mtd/mtd.h
> > +++ b/include/linux/mtd/mtd.h
> > @@ -360,6 +360,8 @@ struct mtd_info {
> >         int (*_get_device) (struct mtd_info *mtd);
> >         void (*_put_device) (struct mtd_info *mtd);
> > 
> > +       wait_queue_head_t wait_queue;
> > +
> 
> wait_queue doesn't really describe what this waitqueue is used for
> (maybe resume_wq), and the suspended state should be here as well
> (actually, there's one already).

I'll rename it to something meaningful.
> 
> Actually, what we need is a way to prevent the device from being
> suspended while accesses are still in progress, and new accesses from
> being queued if a suspend is pending. So, I think you need a read/write
> lock here:
> 
> * take the lock in read mode for all IO accesses, check the
>   mtd->suspended value
>   - if true, release the lock, and wait (retry on wakeup)
>   - if false, just do the IO
> 
> * take the lock in write mode when you want to suspend/resume the
>   device and update the suspended field. Call wake_up_all() in the
>   resume path

Could we use the chip->lock mutex for this? It does kinda what you
described above.
If we introduce a new lock, do we really need suspended to be an
atomic?

I will test with some wait and retry added to nand_get_device().
