On 08/02/2018 07:23 AM, Adrian Hunter wrote:
> On 02/08/18 14:09, Ming Lei wrote:
>> On Thu, Aug 02, 2018 at 01:47:29PM +0300, Adrian Hunter wrote:
>>> On 02/08/18 13:33, Ming Lei wrote:
>>>> On Thu, Aug 02, 2018 at 01:09:31PM +0300, Adrian Hunter wrote:
>>>>> On 31/07/18 19:25, Ming Lei wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 08:47:45AM -0400, Peter Geis wrote:
>>>>>>> Good Morning,
>>>>>>>
>>>>>>> On 07/30/2018 09:38 PM, Ming Lei wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> Thanks for collecting the log.
>>>>>>>>
>>>>>>>> On Mon, Jul 30, 2018 at 02:55:42PM -0400, Peter Geis wrote:
>>>>>>>>>
>>>>>>>>> On 07/28/2018 09:37 AM, Ming Lei wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> [ 10.887209] systemd--112 0.n.1 2411122us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.890274] kworker/-98 0...1 2411506us : blk_mq_free_request: complete: rq -1
>>>>>>>>> [ 10.893313] systemd--107 0...1 2412025us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.896354] systemd--107 0.... 2412323us : mmc_mq_queue_rq: queue rq -1, 0
>>>>>>>>> [ 10.899388] systemd--107 0.... 2412327us : blk_mq_try_issue_list_directly: issue direct: rq -1, ret 0
>>>>>>>>> [ 10.902463] (direxec-111 1...1 2413829us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.905513] systemd--114 1...1 2415159us : blk_mq_make_request: make rq -1
>>>>>>>>
>>>>>>>> The above is the most interesting part of the log. MMC sets the hw queue
>>>>>>>> depth to 1, and you are using the none scheduler, which means the max
>>>>>>>> number of in-flight requests should be one, but the above log shows that
>>>>>>>> there may be 3 in-flight requests.
>>>>>>>
>>>>>>> That's odd, I have CFQ set as the default; is something changing this
>>>>>>> during boot?
>>>>>>> CONFIG_CFQ_GROUP_IOSCHED=y
>>>>>>> CONFIG_DEFAULT_IOSCHED="cfq"
>>>>>>
>>>>>> No, mmc has now been converted to blk-mq, and the default mq io scheduler
>>>>>> should be mq-deadline for mmc, but I am not sure why it is none in your
>>>>>> case.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> That seems really weird, but it shouldn't be related to my two patches,
>>>>>>>> which don't change the tag allocation behaviour at all. However, what
>>>>>>>> matters may be that the patch speeds up request dispatch. Maybe there is
>>>>>>>> a bug in lib/sbitmap.c or block/blk-mq-tag.c.
>>>>>>>>
>>>>>>>> Unfortunately rq->tag wasn't shown in the log because I forgot to dump
>>>>>>>> it in the debug patch, so could you apply the following new debug patch
>>>>>>>> and provide us the log again? BTW, please attach the ftrace log in the
>>>>>>>> reply mail directly, then it can be parsed/read easily.
>>>>>>>
>>>>>>> I have resynced to the latest linux-next and applied your new patch.
>>>>>>> The log is attached.
>>>>>>>
>>>>>>> Of note, it took several boots this time before it would progress to
>>>>>>> where I could grab the log.
>>>>>>> Instead it was blocking the moment RW was requested.
>>>>>>> Also of note, it seems only the eMMC is affected; the SD card (mmcblk2)
>>>>>>> does not trigger any errors.
>>>>>>
>>>>>> From the log you captured, it seems there are three requests (33, 34, 60)
>>>>>> blocked, which should have been inside mmc, but we still need to confirm.
>>>>>> Could you apply the following one-line patch against the last debug patch
>>>>>> and collect the log again?
>>>>>>
>>>>>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>>>>>> index d35f265cd5e0..ac2ffc5a8ed4 100644
>>>>>> --- a/drivers/mmc/core/queue.c
>>>>>> +++ b/drivers/mmc/core/queue.c
>>>>>> @@ -321,6 +321,8 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>  			!host->hold_retune;
>>>>>>  	}
>>>>>>
>>>>>> +	if (blk_queue_debug(q))
>>>>>> +		trace_printk("mmc before issue rq %d %d\n", req->internal_tag, req->tag);
>>>>>>  	blk_mq_start_request(req);
>>>>>>
>>>>>>  	issued = mmc_blk_mq_issue_rq(mq, req);
>>>>>>
>>>>>>
>>>>>> Also see the comment in mmc_mq_queue_rq():
>>>>>>
>>>>>> /*
>>>>>>  * We use BLK_MQ_F_BLOCKING and have only 1 hardware queue, which means requests
>>>>>>  * will not be dispatched in parallel.
>>>>>>  */
>>>>>>
>>>>>> which isn't correct, given that blk-mq has multiple sw queues and mmc sets
>>>>>> the queue depth as > 1, so requests may be dispatched to the single hw
>>>>>> queue in parallel.
>>>>>>
>>>>>> Adrian, Ulf Hansson and anyone else, could you take a look at the warning
>>>>>> of WARN_ON(host->cmd) in sdhci_send_command()? It seems you only allow one
>>>>>> command to be queued, but I'm not sure how you guarantee that.
>>>>>
>>>>> We didn't guarantee it, but it didn't happen before "blk-mq: issue directly
>>>>> if hw queue isn't busy in case of 'none'".
>>>>
>>>> OK, thanks for clarifying that; as I mentioned, what matters is the timing
>>>> change.
>>>>
>>>>> We did consider adding a mutex, refer to
>>>>> https://lore.kernel.org/lkml/CAPDyKFr8tiJXSL-weQjGJ3DfRrfv8ZAFY8=ZECLNgSe_43S8Rw@mail.gmail.com/
>>>>>
>>>>> However the following might do, what do you think?
>>>>
>>>> If dispatch in parallel isn't supported, I'm just wondering why not set the
>>>> hw queue depth to 1? That would be a simple way to fix this issue.
>>>
>>> First, it isn't 1. It is 2 for devices with no command queue, because we
>>> prepare a request while the previous one completes. Otherwise it is the
>>> command queue depth.
>>
>> Could you share where the prepare function is?
>> What does the prepare function do?
>
> The prepare function is mmc_pre_req().
>
> The prepare function lets the host controller DMA-map the request. On
> some architectures that is sufficiently slow that there is a significant
> performance benefit to doing it in advance.
>
>>
>> If the device has no command queue, I understand there is only one
>> command that can be queued/issued to the controller. If that is true, the
>> queue depth should be 1.
>>
>>>
>>> Secondly, we expect an elevator to be used, and the elevator needs a decent
>>> number of requests to work with, and the default number of requests is
>>> currently tied to the queue depth.
>>
>> There are two queue depths. We are talking about the hw queue depth, which
>> is how many commands the controller can queue at most. The other one is the
>> scheduler queue depth, which is twice the hw queue depth by default.
>>
>> We may improve this case a bit: currently, if the hw queue depth is 1, the
>> scheduler queue depth defaults to 2, but that default isn't good, since the
>> legacy path sets the scheduler queue depth to 128, with a minimum of 4.
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>>>>> index 648eb6743ed5..6edffeed9953 100644
>>>>> --- a/drivers/mmc/core/queue.c
>>>>> +++ b/drivers/mmc/core/queue.c
>>>>> @@ -238,10 +238,6 @@ static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>>>>>  	mmc_exit_request(mq->queue, req);
>>>>>  }
>>>>>
>>>>> -/*
>>>>> - * We use BLK_MQ_F_BLOCKING and have only 1 hardware queue, which means requests
>>>>> - * will not be dispatched in parallel.
>>>>> - */
>>>>>  static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  				    const struct blk_mq_queue_data *bd)
>>>>>  {
>>>>> @@ -264,7 +260,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>
>>>>>  	spin_lock_irq(q->queue_lock);
>>>>>
>>>>> -	if (mq->recovery_needed) {
>>>>> +	if (mq->recovery_needed || mq->busy) {
>>>>>  		spin_unlock_irq(q->queue_lock);
>>>>>  		return BLK_STS_RESOURCE;
>>>>>  	}
>>>>> @@ -291,6 +287,9 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  		break;
>>>>>  	}
>>>>>
>>>>> +	/* Parallel dispatch of requests is not supported at the moment */
>>>>> +	mq->busy = true;
>>>>> +
>>>>>  	mq->in_flight[issue_type] += 1;
>>>>>  	get_card = (mmc_tot_in_flight(mq) == 1);
>>>>>  	cqe_retune_ok = (mmc_cqe_qcnt(mq) == 1);
>>>>> @@ -333,9 +332,12 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  		mq->in_flight[issue_type] -= 1;
>>>>>  		if (mmc_tot_in_flight(mq) == 0)
>>>>>  			put_card = true;
>>>>> +		mq->busy = false;
>>>>>  		spin_unlock_irq(q->queue_lock);
>>>>>  		if (put_card)
>>>>>  			mmc_put_card(card, &mq->ctx);
>>>>> +	} else {
>>>>> +		WRITE_ONCE(mq->busy, false);
>>>>>  	}
>>>>>
>>>>>  	return ret;
>>>>> diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
>>>>> index 17e59d50b496..9bf3c9245075 100644
>>>>> --- a/drivers/mmc/core/queue.h
>>>>> +++ b/drivers/mmc/core/queue.h
>>>>> @@ -81,6 +81,7 @@ struct mmc_queue {
>>>>>  	unsigned int		cqe_busy;
>>>>>  #define MMC_CQE_DCMD_BUSY	BIT(0)
>>>>>  #define MMC_CQE_QUEUE_FULL	BIT(1)
>>>>> +	bool			busy;
>>>>>  	bool			use_cqe;
>>>>>  	bool			recovery_needed;
>>>>>  	bool			in_recovery;
>>>>
>>>> Sorry, I am not familiar with the mmc code, so I can't comment on the above
>>>> patch.
>>>
>>> Right, but if we return BLK_STS_RESOURCE to avoid parallel dispatch, do we
>>> need to worry about ensuring the queue gets run later?
>>
>> Yeah, blk-mq can cover that.
>
> Peter, could you test if the diff I sent also fixes your original regression?
Good Morning,

I apologize for the delay in returning the log with the latest debug patch,
but I encountered a new bug with the io scheduler enabled that I was trying
to work around.

Apparently with the io scheduler enabled, while it got rid of the warnings,
I was still getting cache corruption after a period of time.
This manifested itself as ENOMEM errors whenever accessing files that had
been recently written, and those files showed up as ??????? when ls -al was
run.
After a reboot those files were available without issue, until written again.

I have attached the log from the latest test.

Break.

Adrian,

Your patch appears to have corrected the errors, and it boots without issue.
I will have to run an extended test to ensure the bug I just mentioned above
does not manifest either.

Thanks everyone!