From: Horng-Shyang Liao <hs.liao@mediatek.com>
To: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>,
	Daniel Kurtz <djkurtz@chromium.org>,
	Sascha Hauer <s.hauer@pengutronix.de>,
	<devicetree@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-mediatek@lists.infradead.org>,
	<srv_heupstream@mediatek.com>,
	"Sascha Hauer" <kernel@pengutronix.de>,
	Philipp Zabel <p.zabel@pengutronix.de>,
	Nicolas Boichat <drinkcat@chromium.org>,
	CK HU <ck.hu@mediatek.com>,
	"cawa cheng" <cawa.cheng@mediatek.com>,
	Bibby Hsieh <bibby.hsieh@mediatek.com>,
	"YT Shen" <yt.shen@mediatek.com>,
	Daoyuan Huang <daoyuan.huang@mediatek.com>,
	Damon Chu <damon.chu@mediatek.com>,
	Josh-YC Liu <josh-yc.liu@mediatek.com>,
	Glory Hung <glory.hung@mediatek.com>,
	Jiaguang Zhang <jiaguang.zhang@mediatek.com>,
	Dennis-YC Hsieh <dennis-yc.hsieh@mediatek.com>,
	Monica Wang <monica.wang@mediatek.com>,
	<jassisinghbrar@gmail.com>, <jaswinder.singh@linaro.org>,
	<hs.liao@mediatek.com>
Subject: Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver
Date: Fri, 17 Jun 2016 16:28:27 +0800
Message-ID: <1466152107.11184.14.camel@mtksdaap41>
In-Reply-To: <1465906063.20796.20.camel@mtksdaap41>

Hi Matthias,

On Tue, 2016-06-14 at 20:07 +0800, Horng-Shyang Liao wrote:
> Hi Matthias,
> 
> On Tue, 2016-06-14 at 12:17 +0200, Matthias Brugger wrote:
> > 
> > On 14/06/16 09:44, Horng-Shyang Liao wrote:
> > > Hi Matthias,
> > >
> > > On Wed, 2016-06-08 at 17:35 +0200, Matthias Brugger wrote:
> > >>
> > >> On 08/06/16 14:25, Horng-Shyang Liao wrote:
> > >>> Hi Matthias,
> > >>>
> > >>> On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote:
> > >>>>
> > >>>> On 08/06/16 07:40, Horng-Shyang Liao wrote:
> > >>>>> Hi Matthias,
> > >>>>>
> > >>>>> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
> > >>>>>>
> > >>>>>> On 03/06/16 15:11, Matthias Brugger wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>> [...]
> > >>>>>>
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +            smp_mb(); /* modify jump before enable thread */
> > >>>>>>>>>>>>>>>> +        }
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +        cmdq_thread_writel(thread, task->pa_base +
> > >>>>>>>>>>>>>>>> task->command_size,
> > >>>>>>>>>>>>>>>> +                   CMDQ_THR_END_ADDR);
> > >>>>>>>>>>>>>>>> +        cmdq_thread_resume(thread);
> > >>>>>>>>>>>>>>>> +    }
> > >>>>>>>>>>>>>>>> +    list_move_tail(&task->list_entry, &thread->task_busy_list);
> > >>>>>>>>>>>>>>>> +    spin_unlock_irqrestore(&cmdq->exec_lock, flags);
> > >>>>>>>>>>>>>>>> +}
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
> > >>>>>>>>>>>>>>>> +                   struct cmdq_thread *thread, u32 irq_flag)
> > >>>>>>>>>>>>>>>> +{
> > >>>>>>>>>>>>>>>> +    struct cmdq_task *task, *tmp, *curr_task = NULL;
> > >>>>>>>>>>>>>>>> +    u32 curr_pa;
> > >>>>>>>>>>>>>>>> +    struct cmdq_cb_data cmdq_cb_data;
> > >>>>>>>>>>>>>>>> +    bool err;
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +    if (irq_flag & CMDQ_THR_IRQ_ERROR)
> > >>>>>>>>>>>>>>>> +        err = true;
> > >>>>>>>>>>>>>>>> +    else if (irq_flag & CMDQ_THR_IRQ_DONE)
> > >>>>>>>>>>>>>>>> +        err = false;
> > >>>>>>>>>>>>>>>> +    else
> > >>>>>>>>>>>>>>>> +        return;
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +    curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> > >>>>>>>>>>>>>>>> +
> > >>>>>>>>>>>>>>>> +    list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> > >>>>>>>>>>>>>>>> +                 list_entry) {
> > >>>>>>>>>>>>>>>> +        if (curr_pa >= task->pa_base &&
> > >>>>>>>>>>>>>>>> +            curr_pa < (task->pa_base + task->command_size))
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What are you checking here? It seems as if you make some implcit
> > >>>>>>>>>>>>>>> assumptions about pa_base and the order of execution of
> > >>>>>>>>>>>>>>> commands in the
> > >>>>>>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any
> > >>>>>>>>>>>>>>> guarantees
> > >>>>>>>>>>>>>>> about dma_handle?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 1. Check what is the current running task in this GCE thread.
> > >>>>>>>>>>>>>> 2. Yes.
> > >>>>>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
> > >>>>>>>>>>>>> guarantee that the dma_handle address is steadily growing, when
> > >>>>>>>>>>>>> calling
> > >>>>>>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you
> > >>>>>>>>>>>>> use this
> > >>>>>>>>>>>>> assumption to decide if the task picked from task_busy_list is
> > >>>>>>>>>>>>> currently
> > >>>>>>>>>>>>> executing. So I think this mecanism is not working.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I don't use dma_handle address, and just use physical addresses.
> > >>>>>>>>>>>>       From CPU's point of view, tasks are linked by the busy list.
> > >>>>>>>>>>>>       From GCE's point of view, tasks are linked by the JUMP command.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> In which cases does the HW thread raise an interrupt.
> > >>>>>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> GCE will raise interrupt if any task is done or error.
> > >>>>>>>>>>>> However, GCE is fast, so CPU may get multiple done tasks
> > >>>>>>>>>>>> when it is running ISR.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
> > >>>>>>>>>>>> So, CPU may get multiple done tasks and one error task.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
> > >>>>>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
> > >>>>>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
> > >>>>>>>>>>> information from the handler to thread_fn, but that shouldn't be an
> > >>>>>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
> > >>>>>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
> > >>>>>>>>>>> we need to do it in a thread context.
> > >>>>>>>>>>
> > >>>>>>>>>> Actually, our initial implementation is similar to your suggestion,
> > >>>>>>>>>> but display needs CMDQ to return callback function very precisely,
> > >>>>>>>>>> else display will drop frame.
> > >>>>>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
> > >>>>>>>>>> and CMDQ needs to call callback function in ISR.
> > >>>>>>>>>> If we defer callback to workqueue, the time interval may be larger than
> > >>>>>>>>>> 32 ms.sometimes.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I think the problem is, that you implemented the workqueue as a ordered
> > >>>>>>>>> workqueue, so there is no parallel processing. I'm still not sure why
> > >>>>>>>>> you need the workqueue to be ordered. Can you please explain.
> > >>>>>>>>
> > >>>>>>>> The order should be kept.
> > >>>>>>>> Let me use mouse cursor as an example.
> > >>>>>>>> If task 1 means move mouse cursor to point A, task 2 means point B,
> > >>>>>>>> and task 3 means point C, our expected result is A -> B -> C.
> > >>>>>>>> If the order is not kept, the result could become A -> C -> B.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> Got it, thanks for the clarification.
> > >>>>>>>
> > >>>>>>
> > >>>>>> I think a way to get rid of the workqueue is to use a timer, which gets
> > >>>>>> programmed to the time a timeout in the first task in the busy list
> > >>>>>> would happen. Everytime we update the busy list (e.g. because of task
> > >>>>>> got finished by the thread), we update the timer. When the timer
> > >>>>>> triggers, which hopefully won't happen too often, we return timeout on
> > >>>>>> the busy list elements, until the time is lower then the actual time.
> > >>>>>>
> > >>>>>> At least with this we can reduce the data structures in this driver and
> > >>>>>> make it more lightweight.
> > >>>>>
> > >>>>>    From my understanding, your proposed method can handle timeout case.
> > >>>>>
> > >>>>> However, the workqueue is also in charge of releasing tasks.
> > >>>>> Do you take releasing tasks into consideration by using the proposed
> > >>>>> timer method?
> > >>>>> Furthermore, I think the code will become more complex if we also use
> > >>>>> timer to implement releasing tasks.
> > >>>>>
> > >>>>
> > >>>> Can't we call
> > >>>>            clk_disable_unprepare(cmdq->clock);
> > >>>>            cmdq_task_release(task);
> > >>>> after invoking the callback?

After I put clk_disable_unprepare(cmdq->clock) into the ISR, I hit
another BUG.

(Quoting some Linux 4.7 source code from drivers/clk/clk.c.)

void clk_unprepare(struct clk *clk)
{
        if (IS_ERR_OR_NULL(clk))
                return;

        clk_prepare_lock();                      // <-- Here
        clk_core_unprepare(clk->core);
        clk_prepare_unlock();
}
EXPORT_SYMBOL_GPL(clk_unprepare);

static void clk_prepare_lock(void)
{
        if (!mutex_trylock(&prepare_lock)) {     // <-- Here
                if (prepare_owner == current) {
                        prepare_refcnt++;
                        return;
                }
                mutex_lock(&prepare_lock);
        }
        WARN_ON_ONCE(prepare_owner != NULL);
        WARN_ON_ONCE(prepare_refcnt != 0);
        prepare_owner = current;
        prepare_refcnt = 1;
}

So clk_unprepare() takes the prepare_lock mutex, which means it can
sleep and cannot be called from the ISR.
I also tried calling it from a timer, but the error is the same,
since timer callbacks are executed in softirq context.

We still need clk_disable_unprepare(), since it saves power when the
hardware is idle.
Therefore, I plan to
(1) move releasing the buffer and task into the ISR,
(2) move timeout handling into a timer, and
(3) keep the workqueue for clk_disable_unprepare().
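
For illustration only, the split in (1)-(3) could be modeled in
userspace C roughly as below. This is a sketch, not the driver code:
every name ending in _model is hypothetical, the in_atomic_ctx flag
merely stands in for in_interrupt(), and the pending counter stands in
for work queued on a workqueue.

```c
/* Userspace model only: hypothetical names, not the CMDQ driver code. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool in_atomic_ctx;     /* stands in for in_interrupt() */
static int clk_prepare_count;  /* stands in for the clock prepare/enable count */
static int pending_clk_puts;   /* stands in for queued work items */

struct task_model {
	int id;
};

/* Models clk_disable_unprepare(): may sleep, so never in atomic context. */
static void clk_disable_unprepare_model(void)
{
	assert(!in_atomic_ctx);
	clk_prepare_count--;
}

/* Models task submission in process context: one clock reference per task. */
static struct task_model *task_submit_model(int id)
{
	struct task_model *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->id = id;
	clk_prepare_count++;
	return t;
}

/* (1) ISR: release the task here, but only queue the clock release. */
static void isr_task_done_model(struct task_model *t)
{
	in_atomic_ctx = true;
	free(t);             /* stands in for releasing the task and buffer */
	pending_clk_puts++;  /* stands in for queueing work for later */
	in_atomic_ctx = false;
}

/* (2) Timer callback: softirq context, so the same rule as the ISR applies. */
static void timeout_timer_model(struct task_model *t)
{
	isr_task_done_model(t);
}

/* (3) Worker: process context, so the sleeping unprepare is allowed here. */
static void clk_worker_model(void)
{
	while (pending_clk_puts > 0) {
		pending_clk_puts--;
		clk_disable_unprepare_model();
	}
}
```

The point of the model is only that the clock count drops after the
worker runs, never inside the ISR or timer path; locking and ordering
between threads are deliberately left out.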

What do you think?

Thanks,
HS

> > >>>
> > >>> Do you mean just call these two functions in ISR?
> > >>> My major concern is dma_free_coherent() and kfree() in
> > >>> cmdq_task_release(task).
> > >>
> > >> Why do we need the dma calls at all? Can't we just calculate the
> > >> physical address using __pa(x)?
> > >
> > > I prefer to use dma_map_single/dma_unmap_single.
> > >
> > 
> > Can you please elaborate why you need this. We don't do dma, so we 
> > should not use dma memory for this.
> 
> We need a buffer to share between CPU and GCE, so we do need DMA.
> CPU is in charge of writing GCE commands into this buffer.
> GCE is in charge of reading and running GCE commands from this buffer.
> When we chain CMDQ tasks, we also need to modify GCE JUMP command.
> Therefore, I prefer to use dma_alloc_coherent and dma_free_coherent.
> 
> However, if we want to use timer to handle timeout, we need to release
> memory in ISR.
> In this case, using kmalloc/kfree + dma_map_single/dma_unmap_single
> instead of dma_alloc_coherent/dma_free_coherent is an alternative
> solution, but taking care the synchronization between cache and memory
> is the expected overhead.
> 
> > >>> Therefore, your suggestion is to use GFP_ATOMIC for both
> > >>> dma_alloc_coherent() and kzalloc(). Right?
> > >>
> > >> I don't think we need GFP_ATOMIC, the critical path will just free the
> > >> memory.
> > >
> > > I tested these two functions, and kfree was safe.
> > > However, dma_free_coherent raised BUG.
> > > BUG: failure at
> > > /mnt/host/source/src/third_party/kernel/v3.18/mm/vmalloc.c:1514/vunmap()!
> > 
> > Just a general hint. Please try to evaluate on a recent kernel. It looks 
> > like as if you tried this on a v3.18 based one.
> 
> This driver should be backward compatible to v3.18 for a MTK project.
> 
> > Best regards,
> > Matthias
> 
> Thanks,
> HS
> 
> > > 1512 void vunmap(const void *addr)
> > > 1513 {
> > > 1514         BUG_ON(in_interrupt());		// <-- here
> > > 1515         might_sleep();
> > > 1516         if (addr)
> > > 1517                 __vunmap(addr, 0);
> > > 1518 }
> > > 1519 EXPORT_SYMBOL(vunmap);
> > >
> > > Therefore, I plan to use kmalloc + dma_map_single instead of
> > > dma_alloc_coherent, and dma_unmap_single + kfree instead of
> > > dma_free_coherent.
> > >
> > > What do you think about the function replacement?
> > >
> > >>> If so, I can try to implement timeout by timer, and discuss with you
> > >>> if I have further questions.
> > >>>
> > >>
> > >> Sounds good :)
> > >>
> > >> Thanks,
> > >> Matthias
> > >
> > > Thanks,
> > > HS
> > >
> > >>>> Regrading the clock, wouldn't it be easier to handle the clock
> > >>>> enable/disable depending on the state of task_busy_list? I suppose we
> > >>>> can't as we would need to check the task_busy_list of all threads, right?
> > >>>>
> > >>>> Regards,
> > >>>> Matthias
> > >>>
> > >>> Thanks,
> > >>> HS
> > >>>
> > >
> > >
> 
> 
