From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751880AbcFNHoi (ORCPT ); Tue, 14 Jun 2016 03:44:38 -0400
Received: from mailgw01.mediatek.com ([210.61.82.183]:54295 "EHLO
	mailgw01.mediatek.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751596AbcFNHof (ORCPT ); Tue, 14 Jun 2016 03:44:35 -0400
Message-ID: <1465890268.7191.13.camel@mtksdaap41>
Subject: Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver
From: Horng-Shyang Liao
To: Matthias Brugger
CC: Rob Herring, Daniel Kurtz, Sascha Hauer, "Sascha Hauer",
	Philipp Zabel, Nicolas Boichat, CK HU, "cawa cheng", Bibby Hsieh,
	"YT Shen", Daoyuan Huang, Damon Chu, Josh-YC Liu, Glory Hung,
	Jiaguang Zhang, Dennis-YC Hsieh, Monica Wang
Date: Tue, 14 Jun 2016 15:44:28 +0800
In-Reply-To: <57583B45.2080504@gmail.com>
References: <1464578397-29743-1-git-send-email-hs.liao@mediatek.com>
	<1464578397-29743-3-git-send-email-hs.liao@mediatek.com>
	<574C5CBF.7060002@gmail.com> <1464683762.14604.59.camel@mtksdaap41>
	<574DEE40.9010008@gmail.com> <1464775020.11122.40.camel@mtksdaap41>
	<574FF264.7050209@gmail.com> <1464934356.15175.31.camel@mtksdaap41>
	<57516774.5080008@gmail.com> <1464956037.16029.8.camel@mtksdaap41>
	<575181E5.6090603@gmail.com> <5756FD73.3050607@gmail.com>
	<1465364427.9963.13.camel@mtksdaap41> <5757F762.4020908@gmail.com>
	<1465388727.21326.8.camel@mtksdaap41> <57583B45.2080504@gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
X-MTK: N
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Matthias,

On Wed, 2016-06-08 at 17:35 +0200, Matthias Brugger wrote:
>
> On 08/06/16 14:25, Horng-Shyang Liao wrote:
> > Hi Matthias,
> >
> > On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote:
> >>
> >> On 08/06/16 07:40, Horng-Shyang Liao wrote:
> >>> Hi Matthias,
> >>>
> >>> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
> >>>>
> >>>> On 03/06/16 15:11, Matthias Brugger wrote:
> >>>>>
> >>>>>
> >>>> [...]
> >>>>
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +			smp_mb(); /* modify jump before enable thread */
> >>>>>>>>>>>>>> +		}
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +		cmdq_thread_writel(thread, task->pa_base +
> >>>>>>>>>>>>>> task->command_size,
> >>>>>>>>>>>>>> +				   CMDQ_THR_END_ADDR);
> >>>>>>>>>>>>>> +		cmdq_thread_resume(thread);
> >>>>>>>>>>>>>> +	}
> >>>>>>>>>>>>>> +	list_move_tail(&task->list_entry, &thread->task_busy_list);
> >>>>>>>>>>>>>> +	spin_unlock_irqrestore(&cmdq->exec_lock, flags);
> >>>>>>>>>>>>>> +}
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
> >>>>>>>>>>>>>> +				   struct cmdq_thread *thread, u32 irq_flag)
> >>>>>>>>>>>>>> +{
> >>>>>>>>>>>>>> +	struct cmdq_task *task, *tmp, *curr_task = NULL;
> >>>>>>>>>>>>>> +	u32 curr_pa;
> >>>>>>>>>>>>>> +	struct cmdq_cb_data cmdq_cb_data;
> >>>>>>>>>>>>>> +	bool err;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +	if (irq_flag & CMDQ_THR_IRQ_ERROR)
> >>>>>>>>>>>>>> +		err = true;
> >>>>>>>>>>>>>> +	else if (irq_flag & CMDQ_THR_IRQ_DONE)
> >>>>>>>>>>>>>> +		err = false;
> >>>>>>>>>>>>>> +	else
> >>>>>>>>>>>>>> +		return;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +	curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +	list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> >>>>>>>>>>>>>> +				 list_entry) {
> >>>>>>>>>>>>>> +		if (curr_pa >= task->pa_base &&
> >>>>>>>>>>>>>> +		    curr_pa < (task->pa_base + task->command_size))
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What are you checking here? It seems as if you make some implcit
> >>>>>>>>>>>>> assumptions about pa_base and the order of execution of
> >>>>>>>>>>>>> commands in the
> >>>>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any
> >>>>>>>>>>>>> guarantees
> >>>>>>>>>>>>> about dma_handle?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Check what is the current running task in this GCE thread.
> >>>>>>>>>>>> 2. Yes.
> >>>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
> >>>>>>>>>>> guarantee that the dma_handle address is steadily growing, when
> >>>>>>>>>>> calling
> >>>>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you
> >>>>>>>>>>> use this
> >>>>>>>>>>> assumption to decide if the task picked from task_busy_list is
> >>>>>>>>>>> currently
> >>>>>>>>>>> executing. So I think this mecanism is not working.
> >>>>>>>>>>
> >>>>>>>>>> I don't use dma_handle address, and just use physical addresses.
> >>>>>>>>>> From CPU's point of view, tasks are linked by the busy list.
> >>>>>>>>>> From GCE's point of view, tasks are linked by the JUMP command.
> >>>>>>>>>>
> >>>>>>>>>>> In which cases does the HW thread raise an interrupt.
> >>>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
> >>>>>>>>>>
> >>>>>>>>>> GCE will raise interrupt if any task is done or error.
> >>>>>>>>>> However, GCE is fast, so CPU may get multiple done tasks
> >>>>>>>>>> when it is running ISR.
> >>>>>>>>>>
> >>>>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
> >>>>>>>>>> So, CPU may get multiple done tasks and one error task.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
> >>>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
> >>>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
> >>>>>>>>> information from the handler to thread_fn, but that shouldn't be an
> >>>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
> >>>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
> >>>>>>>>> we need to do it in a thread context.
> >>>>>>>>
> >>>>>>>> Actually, our initial implementation is similar to your suggestion,
> >>>>>>>> but display needs CMDQ to return callback function very precisely,
> >>>>>>>> else display will drop frame.
> >>>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
> >>>>>>>> and CMDQ needs to call callback function in ISR.
> >>>>>>>> If we defer callback to workqueue, the time interval may be larger than
> >>>>>>>> 32 ms.sometimes.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think the problem is, that you implemented the workqueue as a ordered
> >>>>>>> workqueue, so there is no parallel processing. I'm still not sure why
> >>>>>>> you need the workqueue to be ordered. Can you please explain.
> >>>>>>
> >>>>>> The order should be kept.
> >>>>>> Let me use mouse cursor as an example.
> >>>>>> If task 1 means move mouse cursor to point A, task 2 means point B,
> >>>>>> and task 3 means point C, our expected result is A -> B -> C.
> >>>>>> If the order is not kept, the result could become A -> C -> B.
> >>>>>>
> >>>>>
> >>>>> Got it, thanks for the clarification.
> >>>>>
> >>>>
> >>>> I think a way to get rid of the workqueue is to use a timer, which gets
> >>>> programmed to the time a timeout in the first task in the busy list
> >>>> would happen. Everytime we update the busy list (e.g. because of task
> >>>> got finished by the thread), we update the timer. When the timer
> >>>> triggers, which hopefully won't happen too often, we return timeout on
> >>>> the busy list elements, until the time is lower then the actual time.
> >>>>
> >>>> At least with this we can reduce the data structures in this driver and
> >>>> make it more lightweight.
> >>>
> >>> From my understanding, your proposed method can handle timeout case.
> >>>
> >>> However, the workqueue is also in charge of releasing tasks.
> >>> Do you take releasing tasks into consideration by using the proposed
> >>> timer method?
> >>> Furthermore, I think the code will become more complex if we also use
> >>> timer to implement releasing tasks.
> >>>
> >>
> >> Can't we call
> >> clk_disable_unprepare(cmdq->clock);
> >> cmdq_task_release(task);
> >> after invoking the callback?
> >
> > Do you mean just call these two functions in ISR?
> > My major concern is dma_free_coherent() and kfree() in
> > cmdq_task_release(task).
>
> Why do we need the dma calls at all? Can't we just calculate the
> physical address using __pa(x)?

I prefer to use dma_map_single/dma_unmap_single.

> > Therefore, your suggestion is to use GFP_ATOMIC for both
> > dma_alloc_coherent() and kzalloc(). Right?
>
> I don't think we need GFP_ATOMIC, the critical path will just free the
> memory.

I tested these two functions, and kfree was safe.
However, dma_free_coherent raised BUG.

BUG: failure at /mnt/host/source/src/third_party/kernel/v3.18/mm/vmalloc.c:1514/vunmap()!

1512 void vunmap(const void *addr)
1513 {
1514 	BUG_ON(in_interrupt());  // <-- here
1515 	might_sleep();
1516 	if (addr)
1517 		__vunmap(addr, 0);
1518 }
1519 EXPORT_SYMBOL(vunmap);

Therefore, I plan to use kmalloc + dma_map_single instead of
dma_alloc_coherent, and dma_unmap_single + kfree instead of
dma_free_coherent.

What do you think about the function replacement?

> > If so, I can try to implement timeout by timer, and discuss with you
> > if I have further questions.
> >
>
> Sounds good :)
>
> Thanks,
> Matthias

Thanks,
HS

> >> Regrading the clock, wouldn't it be easier to handle the clock
> >> enable/disable depending on the state of task_busy_list? I suppose we
> >> can't as we would need to check the task_busy_list of all threads, right?
> >>
> >> Regards,
> >> Matthias
> >
> > Thanks,
> > HS
> >
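
For reference, the timer-based timeout idea discussed in this thread could be
modeled in plain userspace C roughly as below. This is only a sketch to make
the bookkeeping concrete, not driver code: the names model_task, model_expire
and model_next_deadline are hypothetical and do not exist in the patch, and it
assumes the deadlines are non-decreasing in busy-list order (tasks are queued
in order with the same timeout). A real implementation would use the kernel
timer API and hold the driver's lock while touching the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the driver. */
struct model_task {
	uint64_t deadline;	/* absolute time at which this task times out */
	bool timed_out;		/* set once a timeout has been reported */
};

/*
 * Called when the (modeled) timer fires at time "now".
 * Walks the busy list from the head (oldest task first) and reports a
 * timeout for every task whose deadline has passed, stopping at the first
 * task whose deadline is still in the future so that order is preserved.
 * Returns the index of the first task that is still pending (n if none).
 */
static size_t model_expire(struct model_task *busy, size_t n, uint64_t now)
{
	size_t i = 0;

	assert(busy != NULL || n == 0);
	while (i < n && busy[i].deadline <= now) {
		busy[i].timed_out = true;
		i++;
	}
	return i;
}

/*
 * Deadline the timer should be re-armed with after the busy list changed:
 * the deadline of the first pending task, or 0 when nothing is pending
 * (0 stands for "timer disabled" in this model).
 */
static uint64_t model_next_deadline(const struct model_task *busy, size_t n,
				    size_t head)
{
	return head < n ? busy[head].deadline : 0;
}
```

With three tasks whose deadlines are 100, 116 and 132, a timer firing at
time 120 would mark the first two as timed out and leave the timer armed
for 132.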