On 08/02/2018 07:23 AM, Adrian Hunter wrote:
> On 02/08/18 14:09, Ming Lei wrote:
>> On Thu, Aug 02, 2018 at 01:47:29PM +0300, Adrian Hunter wrote:
>>> On 02/08/18 13:33, Ming Lei wrote:
>>>> On Thu, Aug 02, 2018 at 01:09:31PM +0300, Adrian Hunter wrote:
>>>>> On 31/07/18 19:25, Ming Lei wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 08:47:45AM -0400, Peter Geis wrote:
>>>>>>> Good Morning,
>>>>>>>
>>>>>>> On 07/30/2018 09:38 PM, Ming Lei wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> Thanks for collecting the log.
>>>>>>>>
>>>>>>>> On Mon, Jul 30, 2018 at 02:55:42PM -0400, Peter Geis wrote:
>>>>>>>>>
>>>>>>>>> On 07/28/2018 09:37 AM, Ming Lei wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> [ 10.887209] systemd--112 0.n.1 2411122us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.890274] kworker/-98 0...1 2411506us : blk_mq_free_request: complete: rq -1
>>>>>>>>> [ 10.893313] systemd--107 0...1 2412025us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.896354] systemd--107 0.... 2412323us : mmc_mq_queue_rq: queue rq -1, 0
>>>>>>>>> [ 10.899388] systemd--107 0.... 2412327us : blk_mq_try_issue_list_directly: issue direct: rq -1, ret 0
>>>>>>>>> [ 10.902463] (direxec-111 1...1 2413829us : blk_mq_make_request: make rq -1
>>>>>>>>> [ 10.905513] systemd--114 1...1 2415159us : blk_mq_make_request: make rq -1
>>>>>>>>
>>>>>>>> The above is the most interesting part of the log. MMC sets the hw queue
>>>>>>>> depth to 1, and you are using the none scheduler, which means the max
>>>>>>>> number of in-flight requests should be one, but the above log shows that
>>>>>>>> there may be 3 in-flight requests.
>>>>>>>
>>>>>>> That's odd, I have CFQ set as the default; is something changing this
>>>>>>> during boot?
>>>>>>> CONFIG_CFQ_GROUP_IOSCHED=y
>>>>>>> CONFIG_DEFAULT_IOSCHED="cfq"
>>>>>>
>>>>>> No, mmc has now been converted to blk-mq, and the default mq io scheduler
>>>>>> should be mq-deadline for mmc, but I am not sure why it is none in your
>>>>>> case.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> That seems really weird, but it shouldn't be related to my two patches,
>>>>>>>> which don't change the tag allocation behaviour at all. However, what
>>>>>>>> matters may be that the patch speeds up request dispatch. Maybe there is
>>>>>>>> a bug in lib/sbitmap.c or block/blk-mq-tag.c.
>>>>>>>>
>>>>>>>> Unfortunately rq->tag wasn't shown in the log because I forgot to dump
>>>>>>>> it in the debug patch, so could you apply the following new debug patch
>>>>>>>> and provide us the log again? BTW, please attach the ftrace log in the
>>>>>>>> reply mail directly, then it can be parsed/read easily.
>>>>>>>
>>>>>>> I have resynced to the latest linux-next and applied your new patch.
>>>>>>> The log is attached.
>>>>>>>
>>>>>>> Of note, it took several boots this time before it would progress to
>>>>>>> where I could grab the log.
>>>>>>> Instead it was blocking the moment RW was requested.
>>>>>>> Also of note, it seems only the eMMC is affected; the SD card (mmcblk2)
>>>>>>> does not trigger any errors.
>>>>>>
>>>>>> From the log you captured, it seems there are three requests (33, 34, 60)
>>>>>> blocked, which should have been inside mmc, but we still need to confirm.
>>>>>> Could you apply the following one-line patch against the last debug patch
>>>>>> and collect the log again?
>>>>>>
>>>>>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>>>>>> index d35f265cd5e0..ac2ffc5a8ed4 100644
>>>>>> --- a/drivers/mmc/core/queue.c
>>>>>> +++ b/drivers/mmc/core/queue.c
>>>>>> @@ -321,6 +321,8 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>  			!host->hold_retune;
>>>>>>  	}
>>>>>>
>>>>>> +	if (blk_queue_debug(q))
>>>>>> +		trace_printk("mmc before issue rq %d %d\n", req->internal_tag, req->tag);
>>>>>>  	blk_mq_start_request(req);
>>>>>>
>>>>>>  	issued = mmc_blk_mq_issue_rq(mq, req);
>>>>>>
>>>>>>
>>>>>> Also see the comment in mmc_mq_queue_rq():
>>>>>>
>>>>>> /*
>>>>>>  * We use BLK_MQ_F_BLOCKING and have only 1 hardware queue, which means requests
>>>>>>  * will not be dispatched in parallel.
>>>>>>  */
>>>>>>
>>>>>> which isn't correct, given that blk-mq has multiple sw queues and mmc sets
>>>>>> the queue depth as > 1, so requests may be dispatched to the single hw
>>>>>> queue in parallel.
>>>>>>
>>>>>> Adrian, Ulf Hansson and anyone else, could you take a look at the warning
>>>>>> of WARN_ON(host->cmd) in sdhci_send_command()? It seems you only allow one
>>>>>> command to be queued, but I'm not sure how you guarantee that.
>>>>>
>>>>> We didn't guarantee it, but it didn't happen before "blk-mq: issue directly
>>>>> if hw queue isn't busy in case of 'none'".
>>>>
>>>> OK, thanks for clarifying that; as I mentioned, what matters is the timing
>>>> change.
>>>>
>>>>> We did consider adding a mutex, refer to
>>>>> https://lore.kernel.org/lkml/CAPDyKFr8tiJXSL-weQjGJ3DfRrfv8ZAFY8=ZECLNgSe_43S8Rw@mail.gmail.com/
>>>>>
>>>>> However the following might do, what do you think?
>>>>
>>>> If dispatch in parallel isn't supported, I'm just wondering why not set the
>>>> hw queue depth to 1? That would be a simple way to fix this issue.
>>>
>>> First, it isn't 1. It is 2 for devices with no command queue, because we
>>> prepare a request while the previous one completes. Otherwise it is the
>>> command queue depth.
>>
>> Could you share where the prepare function is?
>> What does the prepare function do?
>
> The prepare function is mmc_pre_req().
>
> The prepare function lets the host controller DMA-map the request. On
> some architectures that is sufficiently slow that there is a significant
> performance benefit to doing it in advance.
>
>>
>> If the device has no command queue, I understand there is only one
>> command that can be queued/issued to the controller. If that is true, the
>> queue depth should be 1.
>>
>>>
>>> Secondly, we expect an elevator to be used, and the elevator needs a decent
>>> number of requests to work with, and the default number of requests is
>>> currently tied to the queue depth.
>>
>> There are two queue depths. We are talking about the hw queue depth, which
>> is how many commands the controller can queue at most. The other one is the
>> scheduler queue depth, which is twice the hw queue depth by default.
>>
>> We may improve this case a bit: currently, if the hw queue depth is 1, the
>> scheduler queue depth defaults to 2, but that default isn't good, since the
>> legacy path sets the scheduler queue depth to 128, with a minimum of 4.
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
>>>>> index 648eb6743ed5..6edffeed9953 100644
>>>>> --- a/drivers/mmc/core/queue.c
>>>>> +++ b/drivers/mmc/core/queue.c
>>>>> @@ -238,10 +238,6 @@ static void mmc_mq_exit_request(struct blk_mq_tag_set *set, struct request *req,
>>>>>  	mmc_exit_request(mq->queue, req);
>>>>>  }
>>>>>
>>>>> -/*
>>>>> - * We use BLK_MQ_F_BLOCKING and have only 1 hardware queue, which means requests
>>>>> - * will not be dispatched in parallel.
>>>>> - */
>>>>>  static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  				    const struct blk_mq_queue_data *bd)
>>>>>  {
>>>>> @@ -264,7 +260,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>
>>>>>  	spin_lock_irq(q->queue_lock);
>>>>>
>>>>> -	if (mq->recovery_needed) {
>>>>> +	if (mq->recovery_needed || mq->busy) {
>>>>>  		spin_unlock_irq(q->queue_lock);
>>>>>  		return BLK_STS_RESOURCE;
>>>>>  	}
>>>>> @@ -291,6 +287,9 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  		break;
>>>>>  	}
>>>>>
>>>>> +	/* Parallel dispatch of requests is not supported at the moment */
>>>>> +	mq->busy = true;
>>>>> +
>>>>>  	mq->in_flight[issue_type] += 1;
>>>>>  	get_card = (mmc_tot_in_flight(mq) == 1);
>>>>>  	cqe_retune_ok = (mmc_cqe_qcnt(mq) == 1);
>>>>> @@ -333,9 +332,12 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  		mq->in_flight[issue_type] -= 1;
>>>>>  		if (mmc_tot_in_flight(mq) == 0)
>>>>>  			put_card = true;
>>>>> +		mq->busy = false;
>>>>>  		spin_unlock_irq(q->queue_lock);
>>>>>  		if (put_card)
>>>>>  			mmc_put_card(card, &mq->ctx);
>>>>> +	} else {
>>>>> +		WRITE_ONCE(mq->busy, false);
>>>>>  	}
>>>>>
>>>>>  	return ret;
>>>>> diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
>>>>> index 17e59d50b496..9bf3c9245075 100644
>>>>> --- a/drivers/mmc/core/queue.h
>>>>> +++ b/drivers/mmc/core/queue.h
>>>>> @@ -81,6 +81,7 @@ struct mmc_queue {
>>>>>  	unsigned int		cqe_busy;
>>>>>  #define MMC_CQE_DCMD_BUSY	BIT(0)
>>>>>  #define MMC_CQE_QUEUE_FULL	BIT(1)
>>>>> +	bool			busy;
>>>>>  	bool			use_cqe;
>>>>>  	bool			recovery_needed;
>>>>>  	bool			in_recovery;
>>>>
>>>> Sorry, I am not familiar with the mmc code, so I can't comment on the above
>>>> patch.
>>>
>>> Right, but if we return BLK_STS_RESOURCE to avoid parallel dispatch, do we
>>> need to worry about ensuring the queue gets run later?
>>
>> Yeah, blk-mq can cover that.
>
> Peter, could you test if the diff I sent also fixes your original regression?
Good Morning,

I apologize for the delay in returning the log with the latest debug patch,
but I encountered a new bug with the io scheduler enabled that I was trying
to work around.

Apparently with the io scheduler enabled, while it got rid of the warnings,
I was still getting cache corruption after a period of time.
This manifested itself as ENOMEM errors whenever accessing files that had
been recently written, and those files showed up as ??????? when ls -al was
run.
After a reboot those files were available without issue, until written again.

I have attached the log from the latest test.

Break.

Adrian,

Your patch appears to have corrected the errors, and it boots without issue.
I will have to run an extended test to ensure the bug I just mentioned above
does not manifest either.

Thanks everyone!