Re: [PATCH v7 1/4] mmc: Add MMC host software queue support

From: "(Exiting) Baolin Wang" <baolin.wang@linaro.org>
To: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Baolin Wang <baolin.wang7@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Asutosh Das <asutoshd@codeaurora.org>,
	Orson Zhai <orsonzhai@gmail.com>,
	Chunyan Zhang <zhang.lyra@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Linus Walleij <linus.walleij@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 1/4] mmc: Add MMC host software queue support
Date: Mon, 27 Jan 2020 18:23:33 +0800	[thread overview]
Message-ID: <CAMz4ku+c62b5ij5m8Ui0PmgLmQU32GqiQWQHXz6RFhzaYNqtYQ@mail.gmail.com> (raw)
In-Reply-To: <CAPDyKFpm0p+0zr686u70AMEYz6Ne7X5zb3sBvvjTFHfWuvX0jQ@mail.gmail.com>

Hi Ulf,

(Sorry for the late reply due to my holidays).

On Tue, 21 Jan 2020 at 19:52, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 18 Nov 2019 at 11:43, Baolin Wang <baolin.wang7@gmail.com> wrote:
> >
> > From: Baolin Wang <baolin.wang@linaro.org>
> >
> > Now the MMC read/write stack will always wait for previous request is
> > completed by mmc_blk_rw_wait(), before sending a new request to hardware,
> > or queue a work to complete request, that will bring context switching
> > overhead, especially for high I/O per second rates, to affect the IO
> > performance.
> >
> > Thus this patch introduces MMC software queue interface based on the
> > hardware command queue engine's interfaces, which is similar with the
> > hardware command queue engine's idea, that can remove the context
> > switching. Moreover we set the default queue depth as 32 for software
> > queue, which allows more requests to be prepared, merged and inserted
> > into IO scheduler to improve performance, but we only allow 2 requests
> > in flight, that is enough to let the irq handler always trigger the
> > next request without a context switch, as well as avoiding a long latency.
> >
> > From the fio testing data in cover letter, we can see the software
> > queue can improve some performance with 4K block size, increasing
> > about 16% for random read, increasing about 90% for random write,
> > though no obvious improvement for sequential read and write.
> >
> > Moreover we can expand the software queue interface to support MMC
> > packed request or packed command in future.
> >
> > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
> > ---
> >  drivers/mmc/core/block.c   |   61 ++++++++
> >  drivers/mmc/core/mmc.c     |   13 +-
> >  drivers/mmc/core/queue.c   |   33 ++++-
> >  drivers/mmc/host/Kconfig   |    7 +
> >  drivers/mmc/host/Makefile  |    1 +
> >  drivers/mmc/host/mmc_hsq.c |  344 ++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/mmc/host/mmc_hsq.h |   30 ++++
> >  include/linux/mmc/host.h   |    3 +
> >  8 files changed, 482 insertions(+), 10 deletions(-)
> >  create mode 100644 drivers/mmc/host/mmc_hsq.c
> >  create mode 100644 drivers/mmc/host/mmc_hsq.h
> >
> > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> > index 2c71a43..870462c 100644
> > --- a/drivers/mmc/core/block.c
> > +++ b/drivers/mmc/core/block.c
> > @@ -168,6 +168,11 @@ struct mmc_rpmb_data {
> >
> >  static inline int mmc_blk_part_switch(struct mmc_card *card,
> >                                       unsigned int part_type);
> > +static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
> > +                              struct mmc_card *card,
> > +                              int disable_multi,
> > +                              struct mmc_queue *mq);
> > +static void mmc_blk_swq_req_done(struct mmc_request *mrq);
>
> We have debated whether swq ("software queue") is a good name - and
> just to confirm, I also don't have a great better suggestion.
>
> However, I do like the name of new host interface though, host
> software queue, "hsq". That makes sense to me.
>
> One option, to possibly make the core code more aligned to the hsq
> interface, could be to stick with the "hsq" acronym for the core layer
> as well.

Sure.

>
> In other words, the above function (and its new friends introduced in
> the series) could be renamed to mmc_blk_hsq_req_done(). What do you
> think about that?

Yes, totally agree with you. Will change in the next version.

> >
> >  static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
> >  {
> > @@ -1569,9 +1574,30 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
> >         return mmc_blk_cqe_start_req(mq->card->host, mrq);
> >  }
> >
> > +static int mmc_blk_swq_issue_rw_rq(struct mmc_queue *mq, struct request *req)
> > +{
> > +       struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> > +       struct mmc_host *host = mq->card->host;
> > +       int err;
> > +
> > +       mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
> > +       mqrq->brq.mrq.done = mmc_blk_swq_req_done;
> > +       mmc_pre_req(host, &mqrq->brq.mrq);
> > +
> > +       err = mmc_cqe_start_req(host, &mqrq->brq.mrq);
> > +       if (err)
> > +               mmc_post_req(host, &mqrq->brq.mrq, err);
> > +
> > +       return err;
> > +}
> > +
> >  static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)
> >  {
> >         struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> > +       struct mmc_host *host = mq->card->host;
> > +
> > +       if (host->swq_enabled)
>
> If we switch to use "hsq", this would then be "hsq_enabled".

Sure.

>
> > +               return mmc_blk_swq_issue_rw_rq(mq, req);
> >
> >         mmc_blk_data_prep(mq, mqrq, 0, NULL, NULL);
> >
> > @@ -1957,6 +1983,41 @@ static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
> >                 mmc_run_bkops(mq->card);
> >  }
> >
> > +static void mmc_blk_swq_req_done(struct mmc_request *mrq)
> > +{
> > +       struct mmc_queue_req *mqrq =
> > +               container_of(mrq, struct mmc_queue_req, brq.mrq);
> > +       struct request *req = mmc_queue_req_to_req(mqrq);
> > +       struct request_queue *q = req->q;
> > +       struct mmc_queue *mq = q->queuedata;
> > +       struct mmc_host *host = mq->card->host;
> > +       unsigned long flags;
> > +
> > +       if (mmc_blk_rq_error(&mqrq->brq) ||
> > +           mmc_blk_urgent_bkops_needed(mq, mqrq)) {
> > +               spin_lock_irqsave(&mq->lock, flags);
> > +               mq->recovery_needed = true;
> > +               mq->recovery_req = req;
> > +               spin_unlock_irqrestore(&mq->lock, flags);
> > +
> > +               host->cqe_ops->cqe_recovery_start(host);
> > +
> > +               schedule_work(&mq->recovery_work);
> > +               return;
> > +       }
> > +
> > +       mmc_blk_rw_reset_success(mq, req);
> > +
> > +       /*
> > +        * Block layer timeouts race with completions which means the normal
> > +        * completion path cannot be used during recovery.
> > +        */
> > +       if (mq->in_recovery)
> > +               mmc_blk_cqe_complete_rq(mq, req);
> > +       else
> > +               blk_mq_complete_request(req);
> > +}
> > +
> >  void mmc_blk_mq_complete(struct request *req)
> >  {
> >         struct mmc_queue *mq = req->q->queuedata;
> > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> > index c880489..8eac1a2 100644
> > --- a/drivers/mmc/core/mmc.c
> > +++ b/drivers/mmc/core/mmc.c
> > @@ -1852,15 +1852,22 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
> >          */
> >         card->reenable_cmdq = card->ext_csd.cmdq_en;
> >
> > -       if (card->ext_csd.cmdq_en && !host->cqe_enabled) {
> > +       if (host->cqe_ops && !host->cqe_enabled) {
>
> The doesn't looks entirely correct to me, as it means enabling the CQE
> hardware for hosts with MMC_CAP2_CQE set, but no matter of whether the
> eMMC card really supports CMDQ (or if we failed to enabled CMDQ for
> the card). More comments below.
>
> >                 err = host->cqe_ops->cqe_enable(host, card);
> >                 if (err) {
> >                         pr_err("%s: Failed to enable CQE, error %d\n",
> >                                 mmc_hostname(host), err);
> >                 } else {
> >                         host->cqe_enabled = true;
> > -                       pr_info("%s: Command Queue Engine enabled\n",
> > -                               mmc_hostname(host));
> > +
> > +                       if (card->ext_csd.cmdq_en) {
> > +                               pr_info("%s: Command Queue Engine enabled\n",
> > +                                       mmc_hostname(host));
> > +                       } else {
> > +                               host->swq_enabled = true;
> > +                               pr_info("%s: Software Queue enabled\n",
> > +                                       mmc_hostname(host));
>
> A few questions around the above code.
>
> 1.
> Let's assume the host supports MMC_CAP2_CQE, but the eMMC card doesn't
> support CMDQ.
>
> In this case, we still want to allow the host to use the software
> variant (the hsq) of the interface. In principle that is what the code
> above already tries to implement, but then you also need to update the
> support in drivers/mmc/host/cqhci.[ch] to support that dynamically.
> For example, the ->cqe_enable() callback should check
> "card->ext_csd.cmdq_en" and adjust its behavior accordingly, depending
> if the flag has been set or not.

Right. I will add "card->ext_csd.cmdq_en" checking in
drivers/mmc/host/cqhci.c to fix this issue.

> 2.
> I also notice that you are enabling the use of the hsq path, solely
> for eMMC cards. I am guessing hsq is beneficial to use for SD cards as
> well, don't you think?

Right.

>
> Of course, I am fine by enabling that in a step-by-step approach, so
> no need to add that as a part of $subject patch. Although, at least
> make it a part of the series.

Yes, that's what I thought. The SD card related patch was already in
my local tree, and I will post it if this patch set was accepted.

>
> > +                       }
> >                 }
> >         }
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 9edc086..d9086c1 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -62,7 +62,7 @@ enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
> >  {
> >         struct mmc_host *host = mq->card->host;
> >
> > -       if (mq->use_cqe)
> > +       if (mq->use_cqe && !host->swq_enabled)
> >                 return mmc_cqe_issue_type(host, req);
> >
> >         if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
> > @@ -124,12 +124,14 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
> >  {
> >         struct request_queue *q = req->q;
> >         struct mmc_queue *mq = q->queuedata;
> > +       struct mmc_card *card = mq->card;
> > +       struct mmc_host *host = card->host;
> >         unsigned long flags;
> >         int ret;
> >
> >         spin_lock_irqsave(&mq->lock, flags);
> >
> > -       if (mq->recovery_needed || !mq->use_cqe)
> > +       if (mq->recovery_needed || !mq->use_cqe || host->swq_enabled)
> >                 ret = BLK_EH_RESET_TIMER;
> >         else
> >                 ret = mmc_cqe_timed_out(req);
> > @@ -144,12 +146,13 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
> >         struct mmc_queue *mq = container_of(work, struct mmc_queue,
> >                                             recovery_work);
> >         struct request_queue *q = mq->queue;
> > +       struct mmc_host *host = mq->card->host;
> >
> >         mmc_get_card(mq->card, &mq->ctx);
> >
> >         mq->in_recovery = true;
> >
> > -       if (mq->use_cqe)
> > +       if (mq->use_cqe && !host->swq_enabled)
> >                 mmc_blk_cqe_recovery(mq);
> >         else
> >                 mmc_blk_mq_recovery(mq);
> > @@ -160,6 +163,9 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
> >         mq->recovery_needed = false;
> >         spin_unlock_irq(&mq->lock);
> >
> > +       if (host->swq_enabled)
> > +               host->cqe_ops->cqe_recovery_finish(host);
> > +
> >         mmc_put_card(mq->card, &mq->ctx);
> >
> >         blk_mq_run_hw_queues(q, true);
> > @@ -279,6 +285,14 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> >                 }
> >                 break;
> >         case MMC_ISSUE_ASYNC:
> > +               /*
> > +                * For MMC host software queue, we only allow 2 requests in
> > +                * flight to avoid a long latency.
> > +                */
> > +               if (host->swq_enabled && mq->in_flight[issue_type] > 2) {
> > +                       spin_unlock_irq(&mq->lock);
> > +                       return BLK_STS_RESOURCE;
> > +               }
> >                 break;
> >         default:
> >                 /*
> > @@ -430,11 +444,16 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
> >          * The queue depth for CQE must match the hardware because the request
> >          * tag is used to index the hardware queue.
> >          */
> > -       if (mq->use_cqe)
> > -               mq->tag_set.queue_depth =
> > -                       min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
> > -       else
> > +       if (mq->use_cqe) {
> > +               if (host->swq_enabled)
> > +                       mq->tag_set.queue_depth = host->cqe_qdepth;
>
> I don't think we need to treat the hsq as special case in regards to
> the .queue_depth.
>
> It should be fine to use the default MMC_QUEUE_DEPTH (64), don't you think?

Yes, I agree.

> > +               else
> > +                       mq->tag_set.queue_depth =
> > +                               min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
> > +       } else {
> >                 mq->tag_set.queue_depth = MMC_QUEUE_DEPTH;
> > +       }
> > +
> >         mq->tag_set.numa_node = NUMA_NO_NODE;
> >         mq->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
> >         mq->tag_set.nr_hw_queues = 1;
> > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> > index 49ea02c..efa4019 100644
> > --- a/drivers/mmc/host/Kconfig
> > +++ b/drivers/mmc/host/Kconfig
> > @@ -936,6 +936,13 @@ config MMC_CQHCI
> >
> >           If unsure, say N.
> >
> > +config MMC_HSQ
> > +       tristate "MMC Host Software Queue support"
> > +       help
> > +         This selects the Software Queue support.
> > +
> > +         If unsure, say N.
> > +
> >  config MMC_TOSHIBA_PCI
> >         tristate "Toshiba Type A SD/MMC Card Interface Driver"
> >         depends on PCI
> > diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> > index 11c4598..c14b439 100644
> > --- a/drivers/mmc/host/Makefile
> > +++ b/drivers/mmc/host/Makefile
> > @@ -98,6 +98,7 @@ obj-$(CONFIG_MMC_SDHCI_BRCMSTB)               += sdhci-brcmstb.o
> >  obj-$(CONFIG_MMC_SDHCI_OMAP)           += sdhci-omap.o
> >  obj-$(CONFIG_MMC_SDHCI_SPRD)           += sdhci-sprd.o
> >  obj-$(CONFIG_MMC_CQHCI)                        += cqhci.o
> > +obj-$(CONFIG_MMC_HSQ)                  += mmc_hsq.o
> >
> >  ifeq ($(CONFIG_CB710_DEBUG),y)
> >         CFLAGS-cb710-mmc        += -DDEBUG
> > diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
> > new file mode 100644
> > index 0000000..f5a4f93
> > --- /dev/null
> > +++ b/drivers/mmc/host/mmc_hsq.c
> > @@ -0,0 +1,344 @@
>
> Other than the above, this looks indeed very promising! I have no
> further comment for the rest of the patches in the series.

Very appreciated for your good suggestion. I will send out the next
version after my holidays with addressing your comments. Thanks.

-- 
Baolin Wang
Best Regards