From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1422924AbcFHPfq (ORCPT <rfc822;w@1wt.eu>);
	Wed, 8 Jun 2016 11:35:46 -0400
Received: from mail-wm0-f68.google.com ([74.125.82.68]:33903 "EHLO
	mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1422770AbcFHPfk (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 8 Jun 2016 11:35:40 -0400
Subject: Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver
To: Horng-Shyang Liao <hs.liao@mediatek.com>
References: <1464578397-29743-1-git-send-email-hs.liao@mediatek.com>
 <1464578397-29743-3-git-send-email-hs.liao@mediatek.com>
 <574C5CBF.7060002@gmail.com> <1464683762.14604.59.camel@mtksdaap41>
 <574DEE40.9010008@gmail.com> <1464775020.11122.40.camel@mtksdaap41>
 <574FF264.7050209@gmail.com> <1464934356.15175.31.camel@mtksdaap41>
 <57516774.5080008@gmail.com> <1464956037.16029.8.camel@mtksdaap41>
 <575181E5.6090603@gmail.com> <5756FD73.3050607@gmail.com>
 <1465364427.9963.13.camel@mtksdaap41> <5757F762.4020908@gmail.com>
 <1465388727.21326.8.camel@mtksdaap41>
Cc: Rob Herring <robh+dt@kernel.org>, Daniel Kurtz <djkurtz@chromium.org>,
        Sascha Hauer <s.hauer@pengutronix.de>, devicetree@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
        linux-mediatek@lists.infradead.org, srv_heupstream@mediatek.com,
        Sascha Hauer <kernel@pengutronix.de>,
        Philipp Zabel <p.zabel@pengutronix.de>,
        Nicolas Boichat <drinkcat@chromium.org>, CK HU <ck.hu@mediatek.com>,
        cawa cheng <cawa.cheng@mediatek.com>,
        Bibby Hsieh <bibby.hsieh@mediatek.com>, YT Shen <yt.shen@mediatek.com>,
        Daoyuan Huang <daoyuan.huang@mediatek.com>,
        Damon Chu <damon.chu@mediatek.com>,
        Josh-YC Liu <josh-yc.liu@mediatek.com>,
        Glory Hung <glory.hung@mediatek.com>,
        Jiaguang Zhang <jiaguang.zhang@mediatek.com>,
        Dennis-YC Hsieh <dennis-yc.hsieh@mediatek.com>,
        Monica Wang <monica.wang@mediatek.com>, jassisinghbrar@gmail.com,
        jaswinder.singh@linaro.org
From: Matthias Brugger <matthias.bgg@gmail.com>
Message-ID: <57583B45.2080504@gmail.com>
Date: Wed, 8 Jun 2016 17:35:33 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.7.0
MIME-Version: 1.0
In-Reply-To: <1465388727.21326.8.camel@mtksdaap41>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 08/06/16 14:25, Horng-Shyang Liao wrote:
> Hi Matthias,
>
> On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote:
>>
>> On 08/06/16 07:40, Horng-Shyang Liao wrote:
>>> Hi Matthias,
>>>
>>> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
>>>>
>>>> On 03/06/16 15:11, Matthias Brugger wrote:
>>>>>
>>>>>
>>>> [...]
>>>>
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +            smp_mb(); /* modify jump before enable thread */
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        cmdq_thread_writel(thread, task->pa_base +
>>>>>>>>>>>>>> task->command_size,
>>>>>>>>>>>>>> +                   CMDQ_THR_END_ADDR);
>>>>>>>>>>>>>> +        cmdq_thread_resume(thread);
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    list_move_tail(&task->list_entry, &thread->task_busy_list);
>>>>>>>>>>>>>> +    spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
>>>>>>>>>>>>>> +                   struct cmdq_thread *thread, u32 irq_flag)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    struct cmdq_task *task, *tmp, *curr_task = NULL;
>>>>>>>>>>>>>> +    u32 curr_pa;
>>>>>>>>>>>>>> +    struct cmdq_cb_data cmdq_cb_data;
>>>>>>>>>>>>>> +    bool err;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if (irq_flag & CMDQ_THR_IRQ_ERROR)
>>>>>>>>>>>>>> +        err = true;
>>>>>>>>>>>>>> +    else if (irq_flag & CMDQ_THR_IRQ_DONE)
>>>>>>>>>>>>>> +        err = false;
>>>>>>>>>>>>>> +    else
>>>>>>>>>>>>>> +        return;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
>>>>>>>>>>>>>> +                 list_entry) {
>>>>>>>>>>>>>> +        if (curr_pa >= task->pa_base &&
>>>>>>>>>>>>>> +            curr_pa < (task->pa_base + task->command_size))
>>>>>>>>>>>>>
>>>>>>>>>>>>> What are you checking here? It seems as if you make some implicit
>>>>>>>>>>>>> assumptions about pa_base and the order of execution of commands
>>>>>>>>>>>>> in the thread. Is it safe to do so? Does dma_alloc_coherent give
>>>>>>>>>>>>> any guarantees about dma_handle?
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Check what is the current running task in this GCE thread.
>>>>>>>>>>>> 2. Yes.
>>>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes, physical addresses might be continuous, but AFAIK there is no
>>>>>>>>>>> guarantee that the dma_handle address is steadily growing when
>>>>>>>>>>> calling dma_alloc_coherent. And if I understand the code correctly,
>>>>>>>>>>> you use this assumption to decide if the task picked from
>>>>>>>>>>> task_busy_list is currently executing. So I think this mechanism
>>>>>>>>>>> is not working.
>>>>>>>>>>
>>>>>>>>>> I don't use dma_handle address, and just use physical addresses.
>>>>>>>>>>      From CPU's point of view, tasks are linked by the busy list.
>>>>>>>>>>      From GCE's point of view, tasks are linked by the JUMP command.
>>>>>>>>>>
>>>>>>>>>>> In which cases does the HW thread raise an interrupt?
>>>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
>>>>>>>>>>
>>>>>>>>>> GCE will raise an interrupt when any task is done or hits an error.
>>>>>>>>>> However, GCE is fast, so the CPU may find multiple done tasks
>>>>>>>>>> while it is running the ISR.
>>>>>>>>>>
>>>>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
>>>>>>>>>> So, CPU may get multiple done tasks and one error task.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
>>>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
>>>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
>>>>>>>>> information from the handler to thread_fn, but that shouldn't be an
>>>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
>>>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
>>>>>>>>> we need to do it in a thread context.
>>>>>>>>
>>>>>>>> Actually, our initial implementation is similar to your suggestion,
>>>>>>>> but display needs CMDQ to return callback function very precisely,
>>>>>>>> else display will drop frame.
>>>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
>>>>>>>> and CMDQ needs to call callback function in ISR.
>>>>>>>> If we defer the callback to a workqueue, the time interval may
>>>>>>>> sometimes be larger than 32 ms.
>>>>>>>>
>>>>>>>
>>>>>>> I think the problem is that you implemented the workqueue as an ordered
>>>>>>> workqueue, so there is no parallel processing. I'm still not sure why
>>>>>>> you need the workqueue to be ordered. Can you please explain?
>>>>>>
>>>>>> The order should be kept.
>>>>>> Let me use mouse cursor as an example.
>>>>>> If task 1 means move mouse cursor to point A, task 2 means point B,
>>>>>> and task 3 means point C, our expected result is A -> B -> C.
>>>>>> If the order is not kept, the result could become A -> C -> B.
>>>>>>
>>>>>
>>>>> Got it, thanks for the clarification.
>>>>>
>>>>
>>>> I think a way to get rid of the workqueue is to use a timer, which gets
>>>> programmed to the time at which the first task in the busy list would
>>>> time out. Every time we update the busy list (e.g. because a task
>>>> got finished by the thread), we update the timer. When the timer
>>>> triggers, which hopefully won't happen too often, we return timeout on
>>>> the busy list elements until we reach one whose timeout is still in
>>>> the future.
>>>>
>>>> At least with this we can reduce the data structures in this driver and
>>>> make it more lightweight.
>>>
>>>   From my understanding, your proposed method can handle timeout case.
>>>
>>> However, the workqueue is also in charge of releasing tasks.
>>> Do you take releasing tasks into consideration by using the proposed
>>> timer method?
>>> Furthermore, I think the code will become more complex if we also use
>>> timer to implement releasing tasks.
>>>
>>
>> Can't we call
>>           clk_disable_unprepare(cmdq->clock);
>>           cmdq_task_release(task);
>> after invoking the callback?
>
> Do you mean just call these two functions in ISR?
> My major concern is dma_free_coherent() and kfree() in
> cmdq_task_release(task).

Why do we need the dma calls at all? Can't we just calculate the 
physical address using __pa(x)?

> Therefore, your suggestion is to use GFP_ATOMIC for both
> dma_alloc_coherent() and kzalloc(). Right?

I don't think we need GFP_ATOMIC; the critical path will only free the
memory.

> If so, I can try to implement timeout by timer, and discuss with you
> if I have further questions.
>

Sounds good :)
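
To make the timer idea a bit more concrete, here is a rough userspace
sketch, not driver code. It only models the bookkeeping: the timer is
always armed for the deadline of the oldest busy task, it is re-armed
whenever the busy list changes, and when it fires we time out every
task whose deadline has passed. All names (struct task, struct
busy_list, busy_list_add(), busy_list_complete(),
busy_list_timer_fired()) are made up for illustration, and the fixed
size FIFO array merely stands in for task_busy_list. Locking and the
real kernel timer API are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUSY 8

/* Hypothetical stand-in for a task on thread->task_busy_list. */
struct task {
	uint64_t deadline;	/* absolute time at which the task times out */
	bool timed_out;
};

/* Busy list modelled as a FIFO ring; the oldest task sits at 'head'. */
struct busy_list {
	struct task *tasks[MAX_BUSY];
	size_t head, count;
	bool timer_armed;
	uint64_t timer_expires;	/* only valid while timer_armed is true */
};

/* Arm the timer for the oldest task, or disarm it if the list is empty. */
static void rearm_timer(struct busy_list *bl)
{
	assert(bl->count <= MAX_BUSY);
	bl->timer_armed = bl->count > 0;
	if (bl->timer_armed)
		bl->timer_expires = bl->tasks[bl->head]->deadline;
}

/* Queue a task at the tail; returns false if the ring is full. */
static bool busy_list_add(struct busy_list *bl, struct task *t)
{
	if (bl->count == MAX_BUSY)
		return false;
	bl->tasks[(bl->head + bl->count) % MAX_BUSY] = t;
	bl->count++;
	rearm_timer(bl);
	return true;
}

/* The hardware finished the oldest task: remove it and update the timer. */
static struct task *busy_list_complete(struct busy_list *bl)
{
	struct task *t;

	if (!bl->count)
		return NULL;
	t = bl->tasks[bl->head];
	bl->head = (bl->head + 1) % MAX_BUSY;
	bl->count--;
	rearm_timer(bl);
	return t;
}

/* Timer callback: time out tasks in order until one is still in the future. */
static size_t busy_list_timer_fired(struct busy_list *bl, uint64_t now)
{
	size_t n = 0;

	while (bl->count && bl->tasks[bl->head]->deadline <= now) {
		struct task *t = busy_list_complete(bl);

		t->timed_out = true;
		n++;
	}
	return n;
}
```

In the driver the same three update points would be task submission,
the done/error path in the ISR, and the timer callback itself.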

Thanks,
Matthias

>> Regarding the clock, wouldn't it be easier to handle the clock
>> enable/disable depending on the state of task_busy_list? I suppose we
>> can't as we would need to check the task_busy_list of all threads, right?
>>
>> Regards,
>> Matthias
>
> Thanks,
> HS
>