From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752830AbcFBIqk (ORCPT ); Thu, 2 Jun 2016 04:46:40 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:35012 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752023AbcFBIqe (ORCPT ); Thu, 2 Jun 2016 04:46:34 -0400 Subject: Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver To: Horng-Shyang Liao References: <1464578397-29743-1-git-send-email-hs.liao@mediatek.com> <1464578397-29743-3-git-send-email-hs.liao@mediatek.com> <574C5CBF.7060002@gmail.com> <1464683762.14604.59.camel@mtksdaap41> <574DEE40.9010008@gmail.com> <1464775020.11122.40.camel@mtksdaap41> Cc: Rob Herring , Daniel Kurtz , Sascha Hauer , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, srv_heupstream@mediatek.com, Sascha Hauer , Philipp Zabel , Nicolas Boichat , CK HU , cawa cheng , Bibby Hsieh , YT Shen , Daoyuan Huang , Damon Chu , Josh-YC Liu , Glory Hung , Jiaguang Zhang , Dennis-YC Hsieh From: Matthias Brugger Message-ID: <574FF264.7050209@gmail.com> Date: Thu, 2 Jun 2016 10:46:28 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0 MIME-Version: 1.0 In-Reply-To: <1464775020.11122.40.camel@mtksdaap41> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/06/16 11:57, Horng-Shyang Liao wrote: > Hi Mathias, > > Please see my inline reply. > > On Tue, 2016-05-31 at 22:04 +0200, Matthias Brugger wrote: >> >> On 31/05/16 10:36, Horng-Shyang Liao wrote: >>> Hi Mathias, >>> >>> Please see my inline reply. >>> >>> On Mon, 2016-05-30 at 17:31 +0200, Matthias Brugger wrote: >>>> >>>> On 30/05/16 05:19, HS Liao wrote: >>>>> This patch is first version of Mediatek Command Queue(CMDQ) driver. The >>>>> CMDQ is used to help read/write registers with critical time limitation, >>>>> such as updating display configuration during the vblank. It controls >>>>> Global Command Engine (GCE) hardware to achieve this requirement. >>>>> Currently, CMDQ only supports display related hardwares, but we expect >>>>> it can be extended to other hardwares for future requirements. >>>>> >>>>> Signed-off-by: HS Liao >>>>> Signed-off-by: CK Hu >>>>> --- >>>>> drivers/soc/mediatek/Kconfig | 10 + >>>>> drivers/soc/mediatek/Makefile | 1 + >>>>> drivers/soc/mediatek/mtk-cmdq.c | 943 ++++++++++++++++++++++++++++++++++++++++ >>>>> include/soc/mediatek/cmdq.h | 197 +++++++++ >>>>> 4 files changed, 1151 insertions(+) >>>>> create mode 100644 drivers/soc/mediatek/mtk-cmdq.c >>>>> create mode 100644 include/soc/mediatek/cmdq.h >>>>> >>>>> diff --git a/drivers/soc/mediatek/Kconfig b/drivers/soc/mediatek/Kconfig >>>>> index 0a4ea80..c4ad75c 100644 >>>>> --- a/drivers/soc/mediatek/Kconfig >>>>> +++ b/drivers/soc/mediatek/Kconfig >>>>> @@ -1,6 +1,16 @@ >>>>> # >>>>> # MediaTek SoC drivers >>>>> # >>>>> +config MTK_CMDQ >>>>> + bool "MediaTek CMDQ Support" >>>>> + depends on ARCH_MEDIATEK || COMPILE_TEST >> >> depends on ARM64 ? > > Will add ARM64. > >>>>> + select MTK_INFRACFG >>>>> + help >>>>> + Say yes here to add support for the MediaTek Command Queue (CMDQ) >>>>> + driver. The CMDQ is used to help read/write registers with critical >>>>> + time limitation, such as updating display configuration during the >>>>> + vblank. 
>>>>> + >>>>> config MTK_INFRACFG >>>>> bool "MediaTek INFRACFG Support" >>>>> depends on ARCH_MEDIATEK || COMPILE_TEST >>>>> diff --git a/drivers/soc/mediatek/Makefile b/drivers/soc/mediatek/Makefile >>>>> index 12998b0..f7397ef 100644 >>>>> --- a/drivers/soc/mediatek/Makefile >>>>> +++ b/drivers/soc/mediatek/Makefile >>>>> @@ -1,3 +1,4 @@ >>>>> +obj-$(CONFIG_MTK_CMDQ) += mtk-cmdq.o >>>>> obj-$(CONFIG_MTK_INFRACFG) += mtk-infracfg.o >>>>> obj-$(CONFIG_MTK_PMIC_WRAP) += mtk-pmic-wrap.o >>>>> obj-$(CONFIG_MTK_SCPSYS) += mtk-scpsys.o >>>>> diff --git a/drivers/soc/mediatek/mtk-cmdq.c b/drivers/soc/mediatek/mtk-cmdq.c >>>>> new file mode 100644 >>>>> index 0000000..e9d6e1c >>>>> --- /dev/null >>>>> +++ b/drivers/soc/mediatek/mtk-cmdq.c >>>>> @@ -0,0 +1,943 @@ >>>>> +/* >>>>> + * Copyright (c) 2015 MediaTek Inc. >>>>> + * >>>>> + * This program is free software; you can redistribute it and/or modify >>>>> + * it under the terms of the GNU General Public License version 2 as >>>>> + * published by the Free Software Foundation. >>>>> + * >>>>> + * This program is distributed in the hope that it will be useful, >>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>>>> + * GNU General Public License for more details. >>>>> + */ >>>>> + >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> +#include >>>>> + >>>>> +#define CMDQ_INITIAL_CMD_BLOCK_SIZE PAGE_SIZE >>>>> +#define CMDQ_INST_SIZE 8 /* instruction is 64-bit */ >>>>> +#define CMDQ_TIMEOUT_MS 1000 >>>>> +#define CMDQ_IRQ_MASK 0xffff >>>>> +#define CMDQ_DRIVER_DEVICE_NAME "mtk_cmdq" >>>>> +#define CMDQ_CLK_NAME "gce" >>>> >>>> We can put the names in directly to un-bloat the defines. >>> >>> I will use the names directly and remove defines. >>> >>>>> + >>>>> +#define CMDQ_CURR_IRQ_STATUS 0x010 >>>>> +#define CMDQ_CURR_LOADED_THR 0x018 >>>>> +#define CMDQ_THR_SLOT_CYCLES 0x030 >>>>> + >>>>> +#define CMDQ_THR_BASE 0x100 >>>>> +#define CMDQ_THR_SHIFT 0x080 >>>> >>>> Wouldn't be CMDQ_THR_SIZE more accurate? >>> >>> Will rename it. >>> >>>>> +#define CMDQ_THR_WARM_RESET 0x00 >>>>> +#define CMDQ_THR_ENABLE_TASK 0x04 >>>>> +#define CMDQ_THR_SUSPEND_TASK 0x08 >>>>> +#define CMDQ_THR_CURR_STATUS 0x0c >>>>> +#define CMDQ_THR_IRQ_STATUS 0x10 >>>>> +#define CMDQ_THR_IRQ_ENABLE 0x14 >>>>> +#define CMDQ_THR_CURR_ADDR 0x20 >>>>> +#define CMDQ_THR_END_ADDR 0x24 >>>>> +#define CMDQ_THR_CFG 0x40 >>>>> + >>>>> +#define CMDQ_THR_ENABLED 0x1 >>>>> +#define CMDQ_THR_DISABLED 0x0 >>>>> +#define CMDQ_THR_SUSPEND 0x1 >>>>> +#define CMDQ_THR_RESUME 0x0 >>>>> +#define CMDQ_THR_STATUS_SUSPENDED BIT(1) >>>>> +#define CMDQ_THR_DO_WARM_RESET BIT(0) >>>>> +#define CMDQ_THR_ACTIVE_SLOT_CYCLES 0x3200 >>>>> +#define CMDQ_THR_PRIORITY 3 >>>>> +#define CMDQ_THR_IRQ_DONE 0x1 >>>>> +#define CMDQ_THR_IRQ_ERROR 0x12 >>>>> +#define CMDQ_THR_IRQ_EN 0x13 /* done + error */ >>>> >>>> #define CMDQ_THR_IRQ_EN (CMDQ_THR_IRQ_ERROR | CMDQ_THR_IRQ_DONE) >>> >>> Will do. >>> >>>>> +#define CMDQ_THR_IRQ_MASK 0x13 >>>> >>>> never used. >>> >>> Will remove. 
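Just to summarize how the cleaned-up block could end up looking (only a suggestion, pick whatever names you prefer):

#define CMDQ_THR_IRQ_DONE		0x1
#define CMDQ_THR_IRQ_ERROR		0x12
#define CMDQ_THR_IRQ_EN			(CMDQ_THR_IRQ_ERROR | CMDQ_THR_IRQ_DONE)

with CMDQ_THR_IRQ_MASK dropped entirely, since nothing uses it.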
>>> >>>>> +#define CMDQ_THR_EXECUTING BIT(31) >>>>> + >>>>> +#define CMDQ_ARG_A_WRITE_MASK 0xffff >>>>> +#define CMDQ_SUBSYS_MASK 0x1f >>>>> +#define CMDQ_OP_CODE_MASK 0xff000000 >>>>> + >>>>> +#define CMDQ_OP_CODE_SHIFT 24 >>>> >>>> Couldn't we connect the mask with the shift, or aren't they related? >>>> >>>> #define CMDQ_OP_CODE_MASK (0xff << CMDQ_OP_CODE_SHIFT) >>> >>> Will do. >>> >>>>> +#define CMDQ_SUBSYS_SHIFT 16 >>>>> + >>>>> +#define CMDQ_WRITE_ENABLE_MASK BIT(0) >>>>> +#define CMDQ_JUMP_BY_OFFSET 0x10000000 >>>>> +#define CMDQ_JUMP_BY_PA 0x10000001 >>>>> +#define CMDQ_JUMP_PASS CMDQ_INST_SIZE >>>>> +#define CMDQ_WFE_UPDATE BIT(31) >>>>> +#define CMDQ_WFE_WAIT BIT(15) >>>>> +#define CMDQ_WFE_WAIT_VALUE 0x1 >>>>> +#define CMDQ_EOC_IRQ_EN BIT(0) >>>>> + >>>>> +enum cmdq_thread_index { >>>>> + CMDQ_THR_DISP_MAIN_IDX, /* main */ >>>>> + CMDQ_THR_DISP_SUB_IDX, /* sub */ >>>>> + CMDQ_THR_DISP_MISC_IDX, /* misc */ >>>>> + CMDQ_THR_MAX_COUNT, /* max */ >>>>> +}; >>>>> + >>>>> +/* >>>>> + * CMDQ_CODE_MOVE: >>>>> + * move value into internal register as mask >>>>> + * format: op mask >>>>> + * CMDQ_CODE_WRITE: >>>>> + * write value into target register >>>>> + * format: op subsys address value >>>>> + * CMDQ_CODE_JUMP: >>>>> + * jump by offset >>>>> + * format: op offset >>>>> + * CMDQ_CODE_WFE: >>>>> + * wait for event and clear >>>>> + * it is just clear if no wait >>>>> + * format: [wait] op event update:1 to_wait:1 wait:1 >>>>> + * [clear] op event update:1 to_wait:0 wait:0 >>>>> + * CMDQ_CODE_EOC: >>>>> + * end of command >>>>> + * format: op irq_flag >>>>> + */ >>>> >>>> I think we need more documentation of how this command queue engine is >>>> working. If not, I think it will be really complicated to understand how >>>> to use this. >>>> >>>>> +enum cmdq_code { >>>>> + CMDQ_CODE_MOVE = 0x02, >>>>> + CMDQ_CODE_WRITE = 0x04, >>>>> + CMDQ_CODE_JUMP = 0x10, >>>>> + CMDQ_CODE_WFE = 0x20, >>>>> + CMDQ_CODE_EOC = 0x40, >>>>> +}; >>>>> + >>>>> +enum cmdq_task_state { >>>>> + TASK_STATE_BUSY, /* running on a GCE thread */ >>>>> + TASK_STATE_ERROR, >>>>> + TASK_STATE_DONE, >>>>> +}; >>>>> + >>>>> +struct cmdq_task_cb { >>>>> + cmdq_async_flush_cb cb; >>>>> + void *data; >>>>> +}; >>>>> + >>>>> +struct cmdq_thread { >>>>> + void __iomem *base; >>>>> + struct list_head task_busy_list; >>>>> + wait_queue_head_t wait_task_done; >>>>> +}; >>>>> + >>>>> +struct cmdq_task { >>>>> + struct cmdq *cmdq; >>>>> + struct list_head list_entry; >>>>> + enum cmdq_task_state task_state; >>>>> + void *va_base; >>>>> + dma_addr_t pa_base; >>>>> + u64 engine_flag; >>>>> + size_t command_size; >>>>> + u32 num_cmd; >>>> >>>> num_cmd is directly connected to command_size. I prefer to just keep >>>> num_cmd and calculate the command_size where necessary. >>> >>> After I trace code, I prefer to keep command_size and calculate num_cmd >>> where necessary. What do you think? >>> >> >> I suppose you prefer this, as you are writing to the GCE depending on >> the command_size. I think it is worth to create a macro for the >> calculation of the number of commands, to make the code more readable. >> Would be nice if you would just pass cmdq_task to it and it would return >> the number. Just as an idea. > > Will add macro. 
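I was thinking of something as simple as this (just a sketch, the name is only a suggestion):

#define CMDQ_NUM_CMD(task)	((task)->command_size / CMDQ_INST_SIZE)

so the division by CMDQ_INST_SIZE is not open-coded all over the place.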
> >>>>> + struct cmdq_thread *thread; >>>>> + struct cmdq_task_cb cb; >>>>> + struct work_struct release_work; >>>>> +}; >>>>> + >>>>> +struct cmdq { >>>>> + struct device *dev; >>>>> + void __iomem *base; >>>>> + u32 irq; >>>>> + struct workqueue_struct *task_release_wq; >>>>> + struct cmdq_thread thread[CMDQ_THR_MAX_COUNT]; >>>>> + struct mutex task_mutex; /* for task */ >>>>> + spinlock_t exec_lock; /* for exec */ >>>>> + struct clk *clock; >>>>> + bool suspended; >>>>> +}; >>>>> + >>>>> +struct cmdq_subsys { >>>>> + u32 base; >>>>> + int id; >>>>> +}; >>>>> + >>>>> +static const struct cmdq_subsys gce_subsys[] = { >>>>> + {0x1400, 1}, >>>>> + {0x1401, 2}, >>>>> + {0x1402, 3}, >>>>> +}; >>>>> + >>>>> +static int cmdq_subsys_base_to_id(u32 base) >>>>> +{ >>>>> + int i; >>>>> + >>>>> + for (i = 0; i < ARRAY_SIZE(gce_subsys); i++) >>>>> + if (gce_subsys[i].base == base) >>>>> + return gce_subsys[i].id; >>>>> + return -EFAULT; >>>>> +} >>>>> + >>>>> +static int cmdq_eng_get_thread(u64 flag) >>>>> +{ >>>>> + if (flag & BIT_ULL(CMDQ_ENG_DISP_DSI0)) >>>>> + return CMDQ_THR_DISP_MAIN_IDX; >>>>> + else if (flag & BIT_ULL(CMDQ_ENG_DISP_DPI0)) >>>>> + return CMDQ_THR_DISP_SUB_IDX; >>>>> + else >>>>> + return CMDQ_THR_DISP_MISC_IDX; >>>>> +} >>>>> + >>>>> +static void cmdq_task_release(struct cmdq_task *task) >>>>> +{ >>>>> + struct cmdq *cmdq = task->cmdq; >>>>> + >>>>> + dma_free_coherent(cmdq->dev, task->command_size, task->va_base, >>>>> + task->pa_base); >>>>> + kfree(task); >>>>> +} >>>>> + >>>>> +static struct cmdq_task *cmdq_task_acquire(struct cmdq_rec *rec, >>>>> + struct cmdq_task_cb cb) >>>>> +{ >>>>> + struct cmdq *cmdq = rec->cmdq; >>>>> + struct device *dev = cmdq->dev; >>>>> + struct cmdq_task *task; >>>>> + >>>>> + task = kzalloc(sizeof(*task), GFP_KERNEL); >>>>> + INIT_LIST_HEAD(&task->list_entry); >>>>> + task->va_base = dma_alloc_coherent(dev, rec->command_size, >>>>> + &task->pa_base, GFP_KERNEL); >>>>> + if (!task->va_base) { >>>>> + dev_err(dev, "allocate command buffer failed\n"); >>>>> + kfree(task); >>>>> + return NULL; >>>>> + } >>>>> + >>>>> + task->cmdq = cmdq; >>>>> + task->command_size = rec->command_size; >>>>> + task->engine_flag = rec->engine_flag; >>>>> + task->task_state = TASK_STATE_BUSY; >>>>> + task->cb = cb; >>>>> + memcpy(task->va_base, rec->buf, rec->command_size); >>>>> + task->num_cmd = task->command_size / CMDQ_INST_SIZE; >>>>> + return task; >>>>> +} >>>>> + >>>>> +static void cmdq_thread_writel(struct cmdq_thread *thread, u32 value, >>>>> + u32 offset) >>>>> +{ >>>>> + writel(value, thread->base + offset); >>>>> +} >>>>> + >>>>> +static u32 cmdq_thread_readl(struct cmdq_thread *thread, u32 offset) >>>>> +{ >>>>> + return readl(thread->base + offset); >>>>> +} >>>> >>>> We can get rid of cmdq_thread_readl/writel. >>> >>> Will do. >>> >>>>> + >>>>> +static int cmdq_thread_suspend(struct cmdq *cmdq, struct cmdq_thread *thread) >>>>> +{ >>>>> + u32 status; >>>>> + >>>>> + cmdq_thread_writel(thread, CMDQ_THR_SUSPEND, CMDQ_THR_SUSPEND_TASK); >>>>> + >>>>> + /* If already disabled, treat as suspended successful. 
*/ >>>>> + if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) & >>>>> + CMDQ_THR_ENABLED)) >>>>> + return 0; >>>>> + >>>>> + if (readl_poll_timeout_atomic(thread->base + CMDQ_THR_CURR_STATUS, >>>>> + status, status & CMDQ_THR_STATUS_SUSPENDED, 0, 10)) { >>>>> + dev_err(cmdq->dev, "suspend GCE thread 0x%x failed\n", >>>>> + (u32)(thread->base - cmdq->base)); >>>>> + return -EFAULT; >>>>> + } >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static void cmdq_thread_resume(struct cmdq_thread *thread) >>>>> +{ >>>>> + cmdq_thread_writel(thread, CMDQ_THR_RESUME, CMDQ_THR_SUSPEND_TASK); >>>>> +} >>>>> + >>>>> +static int cmdq_thread_reset(struct cmdq *cmdq, struct cmdq_thread *thread) >>>>> +{ >>>>> + u32 warm_reset; >>>>> + >>>>> + cmdq_thread_writel(thread, CMDQ_THR_DO_WARM_RESET, CMDQ_THR_WARM_RESET); >>>>> + if (readl_poll_timeout_atomic(thread->base + CMDQ_THR_WARM_RESET, >>>>> + warm_reset, !(warm_reset & CMDQ_THR_DO_WARM_RESET), >>>>> + 0, 10)) { >>>>> + dev_err(cmdq->dev, "reset GCE thread 0x%x failed\n", >>>>> + (u32)(thread->base - cmdq->base)); >>>>> + return -EFAULT; >>>>> + } >>>>> + writel(CMDQ_THR_ACTIVE_SLOT_CYCLES, cmdq->base + CMDQ_THR_SLOT_CYCLES); >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static void cmdq_thread_disable(struct cmdq *cmdq, struct cmdq_thread *thread) >>>>> +{ >>>>> + cmdq_thread_reset(cmdq, thread); >>>>> + cmdq_thread_writel(thread, CMDQ_THR_DISABLED, CMDQ_THR_ENABLE_TASK); >>>>> +} >>>>> + >>>>> +/* notify GCE to re-fetch commands by setting GCE thread PC */ >>>>> +static void cmdq_thread_invalidate_fetched_data(struct cmdq_thread *thread) >>>>> +{ >>>>> + cmdq_thread_writel(thread, >>>>> + cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR), >>>>> + CMDQ_THR_CURR_ADDR); >>>>> +} >>>>> + >>>>> +static void cmdq_task_insert_into_thread(struct cmdq_task *task) >>>>> +{ >>>>> + struct cmdq_thread *thread = task->thread; >>>>> + struct cmdq_task *prev_task = list_last_entry( >>>>> + &thread->task_busy_list, typeof(*task), list_entry); >>>>> + u64 *prev_task_base = prev_task->va_base; >>>>> + >>>>> + /* let previous task jump to this task */ >>>>> + prev_task_base[prev_task->num_cmd - 1] = (u64)CMDQ_JUMP_BY_PA << 32 | >>>>> + task->pa_base; >>>>> + >>>>> + cmdq_thread_invalidate_fetched_data(thread); >>>>> +} >>>>> + >>>>> +/* we assume tasks in the same display GCE thread are waiting the same event. */ >>>>> +static void cmdq_task_remove_wfe(struct cmdq_task *task) >>>>> +{ >>>>> + u32 wfe_option = CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE; >>>>> + u32 wfe_op = CMDQ_CODE_WFE << CMDQ_OP_CODE_SHIFT; >>>>> + u32 *base = task->va_base; >>>>> + u32 num_cmd = task->num_cmd << 1; >>>>> + int i; >>>>> + >>>>> + for (i = 0; i < num_cmd; i += 2) >>>>> + if (base[i] == wfe_option && >>>>> + (base[i + 1] & CMDQ_OP_CODE_MASK) == wfe_op) { >>>>> + base[i] = CMDQ_JUMP_PASS; >>>>> + base[i + 1] = CMDQ_JUMP_BY_OFFSET; >>>>> + } >>>> >>>> After using the command buffer as a void pointer a u64 pointer, we now >>>> reference to it as u32. I would prefer to explain here, how the command >>>> looks like we are searching for and use a for loop passing task->num_cmd >>>> instead. >>> >>> Will use u64* to rewrite the above code. 
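For reference, the instruction being searched for is "wait for event and clear" (op CMDQ_CODE_WFE in the upper 32 bits, the update/wait/value bits in the lower 32 bits), and it gets patched into a "jump over the next instruction". As a 64-bit match it could look roughly like this (untested sketch, based on the defines above):

	u64 *cmd = task->va_base;
	u64 wfe_mask = (u64)CMDQ_OP_CODE_MASK << 32 | 0xffffffff;
	u64 wfe_inst = (u64)(CMDQ_CODE_WFE << CMDQ_OP_CODE_SHIFT) << 32 |
		       CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE;
	u64 jump_inst = (u64)CMDQ_JUMP_BY_OFFSET << 32 | CMDQ_JUMP_PASS;
	int i;

	/* patch every "wait for event" into a jump to the next instruction */
	for (i = 0; i < task->num_cmd; i++)
		if ((cmd[i] & wfe_mask) == wfe_inst)
			cmd[i] = jump_inst;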
>>> >>>>> +} >>>>> + >>>>> +static void cmdq_task_exec(struct cmdq_task *task, struct cmdq_thread *thread) >>>>> +{ >>>>> + struct cmdq *cmdq = task->cmdq; >>>>> + unsigned long flags; >>>>> + unsigned long curr_pa, end_pa; >>>>> + >>>>> + WARN_ON(clk_prepare_enable(cmdq->clock) < 0); >>>>> + spin_lock_irqsave(&cmdq->exec_lock, flags); >>>> >>>> cmdq_task_exec is called with cmdq->task_mutex held, so why do we need >>>> the spin_lock here? Can't we just use one of the two? >>> >>> We can drop task_mutex, but we will get some side effects. >>> 1. exec_lock needs to include more code, but I think it is not good for >>> spinlock. >>> 2. In cmdq_rec_flush_async(), task_mutex needs to protect >>> (1) cmdq->suspended, (2) cmdq_task_exec(), and >>> (3) cmdq_task_wait_release_schedule(). >>> If we drop task_mutex, we have to put cmdq->suspended if condition >>> just before cmdq_task_exec() and inside exec_lock, and we have to >>> release task and its command buffer if error. This will let flow >>> become more complex and enlarge code size. >>> >>> What do you think? >> >> Why do you need to protect cmdq_task_wait_release_schedule? We don't >> care about the order of the workqueue elements, do we? > > According to CK's comment, we have to ensure order to remove previous > task. > http://lists.infradead.org/pipermail/linux-mediatek/2016-May/005547.html > >> As far as I understand you would need to protect cmdq_task_acquire as >> well, to "ensure" continously growing pa_base. More on that below. > > We need to ensure continuous physical addresses in a task, but we don't > need to ensure continuous physical addresses between tasks. > So, I think it is unnecessary to protect by mutex or spinlock. > True, I didn't get that. >>> >>>>> + task->thread = thread; >>>>> + task->task_state = TASK_STATE_BUSY; >>>> >>>> That was already set in cmdq_task_acquire, why do we need to set it here >>>> again? >>> >>> Will drop it. >>> >> >> Yeah, but I think it makes more sense to drop it in cmdq_task_acquire >> instead. > > Will drop it in cmdq_task_acquire instead. > >>>>> + if (list_empty(&thread->task_busy_list)) { >>>>> + WARN_ON(cmdq_thread_reset(cmdq, thread) < 0); >>>>> + >>>>> + cmdq_thread_writel(thread, task->pa_base, CMDQ_THR_CURR_ADDR); >>>>> + cmdq_thread_writel(thread, task->pa_base + task->command_size, >>>>> + CMDQ_THR_END_ADDR); >>>>> + cmdq_thread_writel(thread, CMDQ_THR_PRIORITY, CMDQ_THR_CFG); >>>>> + cmdq_thread_writel(thread, CMDQ_THR_IRQ_EN, >>>>> + CMDQ_THR_IRQ_ENABLE); >>>>> + >>>>> + cmdq_thread_writel(thread, CMDQ_THR_ENABLED, >>>>> + CMDQ_THR_ENABLE_TASK); >>>>> + } else { >>>>> + WARN_ON(cmdq_thread_suspend(cmdq, thread) < 0); >>>>> + >>>>> + /* >>>>> + * check boundary condition >>>>> + * PC = END - 8, EOC is executed >>>>> + * PC = END, all CMDs are executed >>>>> + */ >>>>> + curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR); >>>>> + end_pa = cmdq_thread_readl(thread, CMDQ_THR_END_ADDR); >>>>> + if (curr_pa == end_pa - 8 || curr_pa == end_pa) { >>>> >>>> 8 refers to CMDQ_INST_SIZE, right? >>> >>> Yes, I will use CMDQ_INST_SIZE. >>> >>>>> + /* set to this task directly */ >>>>> + cmdq_thread_writel(thread, task->pa_base, >>>>> + CMDQ_THR_CURR_ADDR); >>>>> + } else { >>>>> + cmdq_task_insert_into_thread(task); >>>>> + >>>>> + if (thread == &cmdq->thread[CMDQ_THR_DISP_MAIN_IDX] || >>>>> + thread == &cmdq->thread[CMDQ_THR_DISP_SUB_IDX]) >>>>> + cmdq_task_remove_wfe(task); >>>> >>>> We could do this check using the task->engine_flag, I suppose that's >>>> easier to undestand then. 
>>> >>> Will use task->engine_flag. >>> >>>>> + >>>>> + smp_mb(); /* modify jump before enable thread */ >>>>> + } >>>>> + >>>>> + cmdq_thread_writel(thread, task->pa_base + task->command_size, >>>>> + CMDQ_THR_END_ADDR); >>>>> + cmdq_thread_resume(thread); >>>>> + } >>>>> + list_move_tail(&task->list_entry, &thread->task_busy_list); >>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags); >>>>> +} >>>>> + >>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq, >>>>> + struct cmdq_thread *thread, u32 irq_flag) >>>>> +{ >>>>> + struct cmdq_task *task, *tmp, *curr_task = NULL; >>>>> + u32 curr_pa; >>>>> + struct cmdq_cb_data cmdq_cb_data; >>>>> + bool err; >>>>> + >>>>> + if (irq_flag & CMDQ_THR_IRQ_ERROR) >>>>> + err = true; >>>>> + else if (irq_flag & CMDQ_THR_IRQ_DONE) >>>>> + err = false; >>>>> + else >>>>> + return; >>>>> + >>>>> + curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR); >>>>> + >>>>> + list_for_each_entry_safe(task, tmp, &thread->task_busy_list, >>>>> + list_entry) { >>>>> + if (curr_pa >= task->pa_base && >>>>> + curr_pa < (task->pa_base + task->command_size)) >>>> >>>> What are you checking here? It seems as if you make some implcit >>>> assumptions about pa_base and the order of execution of commands in the >>>> thread. Is it save to do so? Does dma_alloc_coherent give any guarantees >>>> about dma_handle? >>> >>> 1. Check what is the current running task in this GCE thread. >>> 2. Yes. >>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous. >>> >> >> Yes, physical addresses might be continous, but AFAIK there is no >> guarantee that the dma_handle address is steadily growing, when calling >> dma_alloc_coherent. And if I understand the code correctly, you use this >> assumption to decide if the task picked from task_busy_list is currently >> executing. So I think this mecanism is not working. > > I don't use dma_handle address, and just use physical addresses. > From CPU's point of view, tasks are linked by the busy list. > From GCE's point of view, tasks are linked by the JUMP command. > >> In which cases does the HW thread raise an interrupt. >> In case of error. When does CMDQ_THR_IRQ_DONE get raised? > > GCE will raise interrupt if any task is done or error. > However, GCE is fast, so CPU may get multiple done tasks > when it is running ISR. > > In case of error, that GCE thread will pause and raise interrupt. > So, CPU may get multiple done tasks and one error task. > I think we should reimplement the ISR mechanism. Can't we just read CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave cmdq_handle_error_done to the thread_fn? You will need to pass information from the handler to thread_fn, but that shouldn't be an issue. AFAIK interrupts are disabled in the handler, so we should stay there as short as possible. Traversing task_busy_list is expensive, so we need to do it in a thread context. I keep thinking about how to get rid of the two data structures, task_busy_list and the task_release_wq. We need the latter for the only sake of getting a timeout. Did you have a look on how the mailbox framework handles this? By the way, what is the reason to not implement the whole driver as a mailbox controller? For me, this driver looks like a good fit. >>>>> + curr_task = task; >>>>> + if (task->cb.cb) { >>>>> + cmdq_cb_data.err = curr_task ? err : false; >>>>> + cmdq_cb_data.data = task->cb.data; >>>>> + task->cb.cb(cmdq_cb_data); >>>>> + } >>>>> + task->task_state = (curr_task && err) ? 
TASK_STATE_ERROR : >>>>> + TASK_STATE_DONE; >>>>> + list_del(&task->list_entry); >>>>> + if (curr_task) >>>>> + break; >>>>> + } >>>>> + >>>>> + wake_up(&thread->wait_task_done); >>>>> +} >>>>> + >>>>> +static void cmdq_thread_irq_handler(struct cmdq *cmdq, int tid) >>>>> +{ >>>>> + struct cmdq_thread *thread = &cmdq->thread[tid]; >>>>> + unsigned long flags = 0L; >>>>> + u32 irq_flag; >>>>> + >>>>> + spin_lock_irqsave(&cmdq->exec_lock, flags); >>>>> + >>>>> + irq_flag = cmdq_thread_readl(thread, CMDQ_THR_IRQ_STATUS); >>>>> + cmdq_thread_writel(thread, ~irq_flag, CMDQ_THR_IRQ_STATUS); >>>>> + >>>>> + /* >>>>> + * Another CPU core could run "release task" right before we acquire >>>>> + * the spin lock, and thus reset / disable this GCE thread, so we >>>>> + * need to check the enable bit of this GCE thread. >>>>> + */ >>>>> + if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) & >>>>> + CMDQ_THR_ENABLED)) >>>>> + irq_flag = 0; >>>> >>>> cmdq_handle_error_done just retuns in this case. Programming this way >>>> just makes things confusing. What about: >>>> >>>> if (cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) & >>>> CMDQ_THR_ENABLED) >>>> cmdq_handle_error_done(cmdq, thread, irq_flag); >>>> else >>>> irq_flag = 0; >>>> >>>> spin_unlock_irqrestore(&cmdq->exec_lock, flags); >>> >>> We still need to clear irq_flag if GCE thread is disabled. >>> So, I think we can just return here. >>> >>> if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) & >>> CMDQ_THR_ENABLED)) >>> return; >>> >>> What do you think? >>> >> >> No, you can't just return, you need to unlock the spinlock. >> Anyway I would prefer it the other way round, as I put it in my last >> mail. Just delete the else branch, we don't need to set irq_flag to zero. > > Sorry for my previous wrong reply since I merge your comment > and CK's comment. > http://lists.infradead.org/pipermail/linux-mediatek/2016-May/005547.html > So, I will put this if condition into cmdq_task_handle_error_result() > and then just return it if GCE thread is disabled. > You mean in cmdq_task_handle_done We should rename this functions to not create confusion. Regards, Matthias >>>>> + >>>>> + cmdq_handle_error_done(cmdq, thread, irq_flag); >>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags); >>>>> +} >>>>> + >>>>> +static irqreturn_t cmdq_irq_handler(int irq, void *dev) >>>>> +{ >>>>> + struct cmdq *cmdq = dev; >>>>> + u32 irq_status; >>>>> + int i; >>>>> + >>>>> + irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS); >>>>> + irq_status &= CMDQ_IRQ_MASK; >>>>> + irq_status ^= CMDQ_IRQ_MASK; >>>> >>>> irq_status can be much bigger then 3, which is the number of threads in >>>> the system (CMDQ_THR_MAX_COUNT). So why we use this mask here isn't >>>> clear to me. >>> >>> Our GCE hardware has 16 threads, but we only use 3 threads currently. >>> >> >> Ok, but please use bitops here. > > Will use bitops. > >>>>> + >>>>> + if (!irq_status) >>>>> + return IRQ_NONE; >>>>> + >>>>> + while (irq_status) { >>>>> + i = ffs(irq_status) - 1; >>>>> + irq_status &= ~BIT(i); >>>>> + cmdq_thread_irq_handler(cmdq, i); >>>>> + } >>>> >>>> Can you explain how the irq status register looks like, that would it >>>> make much easier to understand what happens here. >>> >>> Bit 0 ~ 15 of irq status register represents GCE thread 0 ~ 15 >>> interrupt. 0 means asserting interrupt; 1 means no interrupt. >>> >> >> Thanks, that helped. 
:) >> >>>>> + >>>>> + return IRQ_HANDLED; >>>>> +} >>>>> + >>>>> +static int cmdq_task_handle_error_result(struct cmdq_task *task) >>>> >>>> We never check the return values, why do we have them? >>> >>> Will drop return value. >>> >>>>> +{ >>>>> + struct cmdq *cmdq = task->cmdq; >>>>> + struct device *dev = cmdq->dev; >>>>> + struct cmdq_thread *thread = task->thread; >>>>> + struct cmdq_task *next_task, *prev_task; >>>>> + u32 irq_flag; >>>>> + >>>>> + /* suspend GCE thread to ensure consistency */ >>>>> + WARN_ON(cmdq_thread_suspend(cmdq, thread) < 0); >>>>> + >>>>> + /* ISR has handled this error task */ >>>>> + if (task->task_state == TASK_STATE_ERROR) { >>>>> + next_task = list_first_entry_or_null(&thread->task_busy_list, >>>>> + struct cmdq_task, list_entry); >>>>> + if (next_task) /* move to next task */ >>>>> + cmdq_thread_writel(thread, next_task->pa_base, >>>>> + CMDQ_THR_CURR_ADDR); >>>> >>>> We have to do this, as we suppose that the thread did not reach the jump >>>> instruction we put into it's command queue, right? >>> >>> Yes. >>> >> >> So this should then go into it's own function. In wait_release_work, >> something like this: >> >> if(task->task_state == TASK_STATE_ERROR) >> cmdq_task_handle_error(task) > > OK. > I will write new function cmdq_task_handle_error() to handle error case. > >>>>> + cmdq_thread_resume(thread); >>>>> + return -ECANCELED; >>>>> + } >>>>> + >>>> >>>> if task_state != ERROR and != DONE. This means that the timeout of >>>> task_release_wq has timed out, right? >>> >>> Yes. >>> >>>>> + /* >>>>> + * Save next_task and prev_task in advance >>>>> + * since cmdq_handle_error_done will remove list_entry. >>>>> + */ >>>>> + next_task = prev_task = NULL; >>>>> + if (task->list_entry.next != &thread->task_busy_list) >>>>> + next_task = list_next_entry(task, list_entry); >>>>> + if (task->list_entry.prev != &thread->task_busy_list) >>>>> + prev_task = list_prev_entry(task, list_entry); >>>>> + >>>>> + /* >>>>> + * Although IRQ is disabled, GCE continues to execute. >>>>> + * It may have pending IRQ before GCE thread is suspended, >>>>> + * so check this condition again. >>>>> + */ >>>> >>>> The first thing we did in this function was suspending the thread. Why >>>> do we need this then? >>> >>> Because timeout is CPU timeout not GCE timeout, GCE could just finish >>> this task before the GCE thread is suspended. >>> >> >> What are the reasons for a timeout? An error has happend, or the task is >> still executing. > > From GCE's point of view, this task is still executing. > But, it could be an error of client. > For example, task may never get event if display turn off hardware > before waiting for task to finish its work. > >>>> To be honest this whole functions looks really like a design error. We >>>> have to sperate the states much clearer so that it is possible to >>>> understand what is happening in the GCE. Isn't it for example posible to >>>> have worker queues for timed out tasks and tasks with an error? I'm not >>>> sure how to do this, actually I'm not sure if I really understood how >>>> this is supposed to work. >>> >>> GCE doesn't have timeout. The timeout is decided and controlled by CPU, >>> so we check timeout in release work. >>> For error and done, they are easy to check by register, and we have >>> already created release work for timeout. So, I don't think we need to >>> create work queue for each case. >>> >>> What do you think? 
>>> >> >> I think, if we find in here, that the irq_flag is set, then the the >> interrupt handler was triggered and is spinning the spinlock. If this is >> not the case, we have a timeout and we handle this. > > I will write another function to handle error, and handle timeout here. > >>>> How much do we win, when we patch the thread command queue for every >>>> task we add, instead of just taking one task after another from the >>>> task_busy_list? >>> >>> GCE is used to help read/write registers with critical time limitation. >>> Sometimes, client may ask to process multiple tasks in a short period >>> of time, e.g. display flush multiple tasks for next vblank. So, CMDQ >>> shouldn't limit to process one task after another from the >>> task_busy_list. Currently, when interrupt or timeout, we will check >>> how many tasks are done, and which one is error or timeout. >>> >> >> So I suppose the device driver who use this are interested in throughput >> and not in latency. The callback of every >> >>>>> + irq_flag = cmdq_thread_readl(thread, CMDQ_THR_IRQ_STATUS); >>>>> + cmdq_handle_error_done(cmdq, thread, irq_flag); >>>>> + cmdq_thread_writel(thread, ~irq_flag, CMDQ_THR_IRQ_STATUS); >>>>> + >>>>> + if (task->task_state == TASK_STATE_DONE) { >>>>> + cmdq_thread_resume(thread); >>>>> + return 0; >>>>> + } >>>>> + >>>>> + if (task->task_state == TASK_STATE_ERROR) { >>>>> + dev_err(dev, "task 0x%p error\n", task); >>>>> + if (next_task) /* move to next task */ >>>>> + cmdq_thread_writel(thread, next_task->pa_base, >>>>> + CMDQ_THR_CURR_ADDR); >>>>> + cmdq_thread_resume(thread); >>>>> + return -ECANCELED; >>>>> + } >>>>> + >>>>> + /* Task is running, so we force to remove it. */ >>>>> + dev_err(dev, "task 0x%p timeout or killed\n", task); >>>>> + task->task_state = TASK_STATE_ERROR; >>>>> + >>>>> + if (prev_task) { >>>>> + u64 *prev_va = prev_task->va_base; >>>>> + u64 *curr_va = task->va_base; >>>>> + >>>>> + /* copy JUMP instruction */ >>>>> + prev_va[prev_task->num_cmd - 1] = curr_va[task->num_cmd - 1]; >>>>> + >>>>> + cmdq_thread_invalidate_fetched_data(thread); >>>>> + } else if (next_task) { /* move to next task */ >>>>> + cmdq_thread_writel(thread, next_task->pa_base, >>>>> + CMDQ_THR_CURR_ADDR); >>>>> + } >>>>> + >>>>> + list_del(&task->list_entry); >>>>> + cmdq_thread_resume(thread); >>>>> + >>>>> + /* call cb here to prevent lock */ >>>>> + if (task->cb.cb) { >>>>> + struct cmdq_cb_data cmdq_cb_data; >>>>> + >>>>> + cmdq_cb_data.err = true; >>>>> + cmdq_cb_data.data = task->cb.data; >>>>> + task->cb.cb(cmdq_cb_data); >>>>> + } >>>>> + >>>>> + return -ECANCELED; >>>>> +} >>>>> + >>>>> +static void cmdq_task_wait_release_work(struct work_struct *work_item) >>>>> +{ >>>>> + struct cmdq_task *task = container_of(work_item, struct cmdq_task, >>>>> + release_work); >>>>> + struct cmdq *cmdq = task->cmdq; >>>>> + struct cmdq_thread *thread = task->thread; >>>>> + unsigned long flags; >>>>> + >>>>> + wait_event_timeout(thread->wait_task_done, >>>>> + task->task_state != TASK_STATE_BUSY, >>>>> + msecs_to_jiffies(CMDQ_TIMEOUT_MS)); >>>>> + >>>>> + spin_lock_irqsave(&cmdq->exec_lock, flags); >>>>> + if (task->task_state != TASK_STATE_DONE) >>>>> + cmdq_task_handle_error_result(task); >>>>> + if (list_empty(&thread->task_busy_list)) >>>>> + cmdq_thread_disable(cmdq, thread); >>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags); >>>>> + >>>>> + /* release regardless of success or not */ >>>>> + clk_disable_unprepare(cmdq->clock); >>>>> + cmdq_task_release(task); >>>>> +} >>>>> + >>>>> +static 
void cmdq_task_wait_release_schedule(struct cmdq_task *task) >>>>> +{ >>>>> + struct cmdq *cmdq = task->cmdq; >>>>> + >>>>> + INIT_WORK(&task->release_work, cmdq_task_wait_release_work); >>>>> + queue_work(cmdq->task_release_wq, &task->release_work); >>>>> +} >>>>> + >>>>> +static int cmdq_rec_realloc_cmd_buffer(struct cmdq_rec *rec, size_t size) >>>>> +{ >>>>> + void *new_buf; >>>>> + >>>>> + new_buf = krealloc(rec->buf, size, GFP_KERNEL | __GFP_ZERO); >>>>> + if (!new_buf) >>>>> + return -ENOMEM; >>>>> + rec->buf = new_buf; >>>>> + rec->buf_size = size; >>>>> + return 0; >>>>> +} >>>>> + >>>>> +struct cmdq_base *cmdq_register_device(struct device *dev) >>>>> +{ >>>>> + struct cmdq_base *cmdq_base; >>>>> + struct resource res; >>>>> + int subsys; >>>>> + u32 base; >>>>> + >>>>> + if (of_address_to_resource(dev->of_node, 0, &res)) >>>>> + return NULL; >>>>> + base = (u32)res.start; >>>>> + >>>>> + subsys = cmdq_subsys_base_to_id(base >> 16); >>>>> + if (subsys < 0) >>>>> + return NULL; >>>>> + >>>>> + cmdq_base = devm_kmalloc(dev, sizeof(*cmdq_base), GFP_KERNEL); >>>>> + if (!cmdq_base) >>>>> + return NULL; >>>>> + cmdq_base->subsys = subsys; >>>>> + cmdq_base->base = base; >>>>> + >>>>> + return cmdq_base; >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_register_device); >>>>> + >>>>> +int cmdq_rec_create(struct device *dev, u64 engine_flag, >>>>> + struct cmdq_rec **rec_ptr) >>>>> +{ >>>>> + struct cmdq_rec *rec; >>>>> + int err; >>>>> + >>>>> + rec = kzalloc(sizeof(*rec), GFP_KERNEL); >>>>> + if (!rec) >>>>> + return -ENOMEM; >>>>> + rec->cmdq = dev_get_drvdata(dev); >>>>> + rec->engine_flag = engine_flag; >>>>> + err = cmdq_rec_realloc_cmd_buffer(rec, CMDQ_INITIAL_CMD_BLOCK_SIZE); >>>> >>>> Just pass PAGE_SIZE here, no need for CMDQ_INITIAL_CMD_BLOCK_SIZE. >>> >>> Will do. >>> >>>>> + if (err < 0) { >>>>> + kfree(rec); >>>>> + return err; >>>>> + } >>>>> + *rec_ptr = rec; >>>>> + return 0; >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_create); >>>>> + >>>>> +static int cmdq_rec_append_command(struct cmdq_rec *rec, enum cmdq_code code, >>>>> + u32 arg_a, u32 arg_b) >>>>> +{ >>>>> + u64 *cmd_ptr; >>>>> + int err; >>>>> + >>>>> + if (WARN_ON(rec->finalized)) >>>>> + return -EBUSY; >>>>> + if (code < CMDQ_CODE_MOVE || code > CMDQ_CODE_EOC) >>>>> + return -EINVAL; >>>> >>>> cmdq_rec_append_command is just called from inside this driver and code >>>> is a enum. We can expect it to be correct, no need for this check. >>> >>> Will drop this check. >>> >>>>> + if (unlikely(rec->command_size + CMDQ_INST_SIZE > rec->buf_size)) { >>>> >>>> command_size is the offset into the buffer to which a new command is >>>> written, so this name is highly confusing. I wonder if this would be >>>> easier to understand if we redefine command_size to something like the >>>> number of commands and divide/multiply CMDQ_INST_SIZE where this is needed. >>> >>> I can rename command_size to cmd_buf_size and calculate num_cmd by >>> dividing CMDQ_INST_SIZE. >>> What do you think? 
>>> >>>>> + err = cmdq_rec_realloc_cmd_buffer(rec, rec->buf_size * 2); >>>>> + if (err < 0) >>>>> + return err; >>>>> + } >>>>> + cmd_ptr = rec->buf + rec->command_size; >>>>> + (*cmd_ptr) = (u64)((code << CMDQ_OP_CODE_SHIFT) | arg_a) << 32 | arg_b; >>>>> + rec->command_size += CMDQ_INST_SIZE; >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int cmdq_rec_write(struct cmdq_rec *rec, u32 value, struct cmdq_base *base, >>>>> + u32 offset) >>>>> +{ >>>>> + u32 arg_a = ((base->base + offset) & CMDQ_ARG_A_WRITE_MASK) | >>>>> + ((base->subsys & CMDQ_SUBSYS_MASK) << CMDQ_SUBSYS_SHIFT); >>>> >>>> base->subsys is the id from gce_sybsys, so we can expect it to be >>>> correct, no need to mask with CMDQ_SUBSYS_MASK. >>> >>> Will drop it. >>> >>>>> + return cmdq_rec_append_command(rec, CMDQ_CODE_WRITE, arg_a, value); >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_write); >>>>> + >>>>> +int cmdq_rec_write_mask(struct cmdq_rec *rec, u32 value, >>>>> + struct cmdq_base *base, u32 offset, u32 mask) >>>>> +{ >>>>> + u32 offset_mask = offset; >>>>> + int err; >>>>> + >>>>> + if (mask != 0xffffffff) { >>>>> + err = cmdq_rec_append_command(rec, CMDQ_CODE_MOVE, 0, ~mask); >>>>> + if (err < 0) >>>>> + return err; >>>>> + offset_mask |= CMDQ_WRITE_ENABLE_MASK; >>>>> + } >>>>> + return cmdq_rec_write(rec, value, base, offset_mask); >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_write_mask); >>>>> + >>>>> +int cmdq_rec_wfe(struct cmdq_rec *rec, enum cmdq_event event) >>>>> +{ >>>>> + u32 arg_b; >>>>> + >>>>> + if (event >= CMDQ_MAX_HW_EVENT_COUNT || event < 0) >>>>> + return -EINVAL; >>>>> + >>>>> + /* >>>>> + * bit 0-11: wait value >>>>> + * bit 15: 1 - wait, 0 - no wait >>>>> + * bit 16-27: update value >>>>> + * bit 31: 1 - update, 0 - no update >>>>> + */ >>>> >>>> I don't understand this comments. What are they for? >>> >>> This is for WFE command. I will comment it. >>> >>>>> + arg_b = CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE; >>>>> + return cmdq_rec_append_command(rec, CMDQ_CODE_WFE, event, arg_b); >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_wfe); >>>>> + >>>>> +int cmdq_rec_clear_event(struct cmdq_rec *rec, enum cmdq_event event) >>>>> +{ >>>>> + if (event >= CMDQ_MAX_HW_EVENT_COUNT || event < 0) >>>>> + return -EINVAL; >>>>> + >>>>> + return cmdq_rec_append_command(rec, CMDQ_CODE_WFE, event, >>>>> + CMDQ_WFE_UPDATE); >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_clear_event); >>>>> + >>>>> +static int cmdq_rec_finalize(struct cmdq_rec *rec) >>>>> +{ >>>>> + int err; >>>>> + >>>>> + if (rec->finalized) >>>>> + return 0; >>>>> + >>>>> + /* insert EOC and generate IRQ for each command iteration */ >>>>> + err = cmdq_rec_append_command(rec, CMDQ_CODE_EOC, 0, CMDQ_EOC_IRQ_EN); >>>>> + if (err < 0) >>>>> + return err; >>>>> + >>>>> + /* JUMP to end */ >>>>> + err = cmdq_rec_append_command(rec, CMDQ_CODE_JUMP, 0, CMDQ_JUMP_PASS); >>>>> + if (err < 0) >>>>> + return err; >>>>> + >>>> >>>> Does this need to be atomic? >>>> What happens if after CODE_EOC and before CODE_JUMP some >>>> write/read/event gets added? >>>> What happens if more commands get added to the queue after CODE_JUMP, >>>> but before finalized is set to true. Why don't you use atomic functions >>>> to access finalized? >>> >>> Since cmdq_rec doesn't guarantee thread safe, mutex is needed when >>> client uses cmdq_rec. >>> >> >> Well I think that rec->finalized tries to implement this, but might >> fail, if two kernel threads work on the same cmdq_rec. 
>> >>>>> + rec->finalized = true; >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int cmdq_rec_flush_async(struct cmdq_rec *rec, cmdq_async_flush_cb cb, >>>>> + void *data) >>>>> +{ >>>>> + struct cmdq *cmdq = rec->cmdq; >>>>> + struct cmdq_task *task; >>>>> + struct cmdq_task_cb task_cb; >>>>> + struct cmdq_thread *thread; >>>>> + int err; >>>>> + >>>>> + mutex_lock(&cmdq->task_mutex); >>>>> + if (cmdq->suspended) { >>>>> + dev_err(cmdq->dev, "%s is called after suspended\n", __func__); >>>>> + mutex_unlock(&cmdq->task_mutex); >>>>> + return -EPERM; >>>>> + } >>>>> + >>>>> + err = cmdq_rec_finalize(rec); >>>>> + if (err < 0) { >>>>> + mutex_unlock(&cmdq->task_mutex); >>>>> + return err; >>>>> + } >>>>> + >>>>> + task_cb.cb = cb; >>>>> + task_cb.data = data; >>>>> + task = cmdq_task_acquire(rec, task_cb); >>>>> + if (!task) { >>>>> + mutex_unlock(&cmdq->task_mutex); >>>>> + return -EFAULT; >>>>> + } >>>>> + >>>>> + thread = &cmdq->thread[cmdq_eng_get_thread(task->engine_flag)]; >>>>> + cmdq_task_exec(task, thread); >>>>> + cmdq_task_wait_release_schedule(task); >>>>> + mutex_unlock(&cmdq->task_mutex); >>>>> + return 0; >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_flush_async); >>>>> + >>>>> +struct cmdq_flush_completion { >>>>> + struct completion cmplt; >>>>> + bool err; >>>>> +}; >>>>> + >>>>> +static int cmdq_rec_flush_cb(struct cmdq_cb_data data) >>>>> +{ >>>>> + struct cmdq_flush_completion *cmplt = data.data; >>>>> + >>>>> + cmplt->err = data.err; >>>>> + complete(&cmplt->cmplt); >>>>> + return 0; >>>>> +} >>>>> + >>>>> +int cmdq_rec_flush(struct cmdq_rec *rec) >>>>> +{ >>>>> + struct cmdq_flush_completion cmplt; >>>>> + int err; >>>>> + >>>>> + init_completion(&cmplt.cmplt); >>>>> + err = cmdq_rec_flush_async(rec, cmdq_rec_flush_cb, &cmplt); >>>>> + if (err < 0) >>>>> + return err; >>>>> + wait_for_completion(&cmplt.cmplt); >>>>> + return cmplt.err ? 
-EFAULT : 0; >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_flush); >>>>> + >>>>> +void cmdq_rec_destroy(struct cmdq_rec *rec) >>>>> +{ >>>>> + kfree(rec->buf); >>>>> + kfree(rec); >>>>> +} >>>>> +EXPORT_SYMBOL(cmdq_rec_destroy); >>>>> + >>>>> +static bool cmdq_task_is_empty(struct cmdq *cmdq) >>>>> +{ >>>>> + struct cmdq_thread *thread; >>>>> + int i; >>>>> + >>>>> + for (i = 0; i < ARRAY_SIZE(cmdq->thread); i++) { >>>>> + thread = &cmdq->thread[i]; >>>>> + if (!list_empty(&thread->task_busy_list)) >>>>> + return false; >>>>> + } >>>>> + return true; >>>>> +} >>>>> + >>>>> +static int cmdq_suspend(struct device *dev) >>>>> +{ >>>>> + struct cmdq *cmdq = dev_get_drvdata(dev); >>>>> + u32 exec_threads; >>>>> + >>>>> + mutex_lock(&cmdq->task_mutex); >>>>> + cmdq->suspended = true; >>>>> + mutex_unlock(&cmdq->task_mutex); >>>>> + >>>>> + exec_threads = readl(cmdq->base + CMDQ_CURR_LOADED_THR); >>>>> + if ((exec_threads & CMDQ_THR_EXECUTING) && !cmdq_task_is_empty(cmdq)) { >>>>> + dev_err(dev, "wait active tasks timeout.\n"); >>>>> + flush_workqueue(cmdq->task_release_wq); >>>>> + } >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static int cmdq_resume(struct device *dev) >>>>> +{ >>>>> + struct cmdq *cmdq = dev_get_drvdata(dev); >>>>> + >>>>> + cmdq->suspended = false; >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static int cmdq_remove(struct platform_device *pdev) >>>>> +{ >>>>> + struct cmdq *cmdq = platform_get_drvdata(pdev); >>>>> + >>>>> + destroy_workqueue(cmdq->task_release_wq); >>>>> + cmdq->task_release_wq = NULL; >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static int cmdq_probe(struct platform_device *pdev) >>>>> +{ >>>>> + struct device *dev = &pdev->dev; >>>>> + struct device_node *node = dev->of_node; >>>>> + struct resource *res; >>>>> + struct cmdq *cmdq; >>>>> + int err, i; >>>>> + >>>>> + cmdq = devm_kzalloc(dev, sizeof(*cmdq), GFP_KERNEL); >>>>> + if (!cmdq) >>>>> + return -ENOMEM; >>>>> + cmdq->dev = dev; >>>>> + >>>>> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); >>>>> + cmdq->base = devm_ioremap_resource(dev, res); >>>>> + if (IS_ERR(cmdq->base)) { >>>>> + dev_err(dev, "failed to ioremap gce\n"); >>>>> + return PTR_ERR(cmdq->base); >>>>> + } >>>>> + >>>>> + cmdq->irq = irq_of_parse_and_map(node, 0); >>>>> + if (!cmdq->irq) { >>>>> + dev_err(dev, "failed to get irq\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + dev_dbg(dev, "cmdq device: addr:0x%p, va:0x%p, irq:%d\n", >>>>> + dev, cmdq->base, cmdq->irq); >>>>> + >>>>> + mutex_init(&cmdq->task_mutex); >>>>> + spin_lock_init(&cmdq->exec_lock); >>>>> + cmdq->task_release_wq = alloc_ordered_workqueue( >>>>> + "%s", WQ_MEM_RECLAIM | WQ_HIGHPRI, >>>>> + "cmdq_task_wait_release"); >>>>> + >>>>> + for (i = 0; i < ARRAY_SIZE(cmdq->thread); i++) { >>>>> + cmdq->thread[i].base = cmdq->base + CMDQ_THR_BASE + >>>>> + CMDQ_THR_SHIFT * i; >>>>> + init_waitqueue_head(&cmdq->thread[i].wait_task_done); >>>>> + INIT_LIST_HEAD(&cmdq->thread[i].task_busy_list); >>>>> + } >>>>> + >>>>> + platform_set_drvdata(pdev, cmdq); >>>>> + >>>>> + err = devm_request_irq(dev, cmdq->irq, cmdq_irq_handler, IRQF_SHARED, >>>>> + CMDQ_DRIVER_DEVICE_NAME, cmdq); >>>>> + if (err < 0) { >>>>> + dev_err(dev, "failed to register ISR (%d)\n", err); >>>>> + goto fail; >>>>> + } >>>>> + >>>>> + cmdq->clock = devm_clk_get(dev, CMDQ_CLK_NAME); >>>>> + if (IS_ERR(cmdq->clock)) { >>>>> + dev_err(dev, "failed to get clk:%s\n", CMDQ_CLK_NAME); >>>>> + err = PTR_ERR(cmdq->clock); >>>>> + goto fail; >>>>> + } >>>>> + return 0; >>>>> + >>>>> +fail: >>>>> + cmdq_remove(pdev); 
>>>>> + return err; >>>>> +} >>>>> + >>>>> +static const struct dev_pm_ops cmdq_pm_ops = { >>>>> + .suspend = cmdq_suspend, >>>>> + .resume = cmdq_resume, >>>>> +}; >>>>> + >>>>> +static const struct of_device_id cmdq_of_ids[] = { >>>>> + {.compatible = "mediatek,mt8173-gce",}, >>>>> + {} >>>>> +}; >>>>> + >>>>> +static struct platform_driver cmdq_drv = { >>>>> + .probe = cmdq_probe, >>>>> + .remove = cmdq_remove, >>>>> + .driver = { >>>>> + .name = CMDQ_DRIVER_DEVICE_NAME, >>>>> + .owner = THIS_MODULE, >>>>> + .pm = &cmdq_pm_ops, >>>>> + .of_match_table = cmdq_of_ids, >>>>> + } >>>>> +}; >>>>> + >>>>> +builtin_platform_driver(cmdq_drv); >>>>> diff --git a/include/soc/mediatek/cmdq.h b/include/soc/mediatek/cmdq.h >>>>> new file mode 100644 >>>>> index 0000000..60eef3d >>>>> --- /dev/null >>>>> +++ b/include/soc/mediatek/cmdq.h >>>>> @@ -0,0 +1,197 @@ >>>>> +/* >>>>> + * Copyright (c) 2015 MediaTek Inc. >>>>> + * >>>>> + * This program is free software; you can redistribute it and/or modify >>>>> + * it under the terms of the GNU General Public License version 2 as >>>>> + * published by the Free Software Foundation. >>>>> + * >>>>> + * This program is distributed in the hope that it will be useful, >>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>>>> + * GNU General Public License for more details. >>>>> + */ >>>>> + >>>>> +#ifndef __MTK_CMDQ_H__ >>>>> +#define __MTK_CMDQ_H__ >>>>> + >>>>> +#include >>>>> +#include >>>>> + >>>>> +enum cmdq_eng { >>>>> + CMDQ_ENG_DISP_AAL, >>>>> + CMDQ_ENG_DISP_COLOR0, >>>>> + CMDQ_ENG_DISP_COLOR1, >>>>> + CMDQ_ENG_DISP_DPI0, >>>>> + CMDQ_ENG_DISP_DSI0, >>>>> + CMDQ_ENG_DISP_DSI1, >>>>> + CMDQ_ENG_DISP_GAMMA, >>>>> + CMDQ_ENG_DISP_OD, >>>>> + CMDQ_ENG_DISP_OVL0, >>>>> + CMDQ_ENG_DISP_OVL1, >>>>> + CMDQ_ENG_DISP_PWM0, >>>>> + CMDQ_ENG_DISP_PWM1, >>>>> + CMDQ_ENG_DISP_RDMA0, >>>>> + CMDQ_ENG_DISP_RDMA1, >>>>> + CMDQ_ENG_DISP_RDMA2, >>>>> + CMDQ_ENG_DISP_UFOE, >>>>> + CMDQ_ENG_DISP_WDMA0, >>>>> + CMDQ_ENG_DISP_WDMA1, >>>>> + CMDQ_ENG_MAX, >>>>> +}; >>>>> + >>>>> +/* events for CMDQ and display */ >>>>> +enum cmdq_event { >>>>> + /* Display start of frame(SOF) events */ >>>>> + CMDQ_EVENT_DISP_OVL0_SOF = 11, >>>>> + CMDQ_EVENT_DISP_OVL1_SOF = 12, >>>>> + CMDQ_EVENT_DISP_RDMA0_SOF = 13, >>>>> + CMDQ_EVENT_DISP_RDMA1_SOF = 14, >>>>> + CMDQ_EVENT_DISP_RDMA2_SOF = 15, >>>>> + CMDQ_EVENT_DISP_WDMA0_SOF = 16, >>>>> + CMDQ_EVENT_DISP_WDMA1_SOF = 17, >>>>> + /* Display end of frame(EOF) events */ >>>>> + CMDQ_EVENT_DISP_OVL0_EOF = 39, >>>>> + CMDQ_EVENT_DISP_OVL1_EOF = 40, >>>>> + CMDQ_EVENT_DISP_RDMA0_EOF = 41, >>>>> + CMDQ_EVENT_DISP_RDMA1_EOF = 42, >>>>> + CMDQ_EVENT_DISP_RDMA2_EOF = 43, >>>>> + CMDQ_EVENT_DISP_WDMA0_EOF = 44, >>>>> + CMDQ_EVENT_DISP_WDMA1_EOF = 45, >>>>> + /* Mutex end of frame(EOF) events */ >>>>> + CMDQ_EVENT_MUTEX0_STREAM_EOF = 53, >>>>> + CMDQ_EVENT_MUTEX1_STREAM_EOF = 54, >>>>> + CMDQ_EVENT_MUTEX2_STREAM_EOF = 55, >>>>> + CMDQ_EVENT_MUTEX3_STREAM_EOF = 56, >>>>> + CMDQ_EVENT_MUTEX4_STREAM_EOF = 57, >>>>> + /* Display underrun events */ >>>>> + CMDQ_EVENT_DISP_RDMA0_UNDERRUN = 63, >>>>> + CMDQ_EVENT_DISP_RDMA1_UNDERRUN = 64, >>>>> + CMDQ_EVENT_DISP_RDMA2_UNDERRUN = 65, >>>>> + /* Keep this at the end of HW events */ >>>>> + CMDQ_MAX_HW_EVENT_COUNT = 260, >>>>> +}; >>>>> + >>>>> +struct cmdq_cb_data { >>>>> + bool err; >>>>> + void *data; >>>>> +}; >>>>> + >>>>> +typedef int (*cmdq_async_flush_cb)(struct cmdq_cb_data data); >>>>> + >>>>> +struct 
cmdq_task; >>>>> +struct cmdq; >>>>> + >>>>> +struct cmdq_rec { >>>>> + struct cmdq *cmdq; >>>>> + u64 engine_flag; >>>>> + size_t command_size; >>>>> + void *buf; >>>>> + size_t buf_size; >>>>> + bool finalized; >>>>> +}; >> >> Why do we need cmdq_rec at all? Can't we just use the cmdq_task for that >> and this way make the driver less complex? > > There are two main reasons for cmdq_rec. > 1. It is slow to access dma too frequently. > So, we append commands to cacheable memory, and then flush to dma. > 2. cmdq_rec is not thread safe, but cmdq_task needs thread safe. > If we merge them, we need to take care of some synchronization > issues. > >>>>> + >>>>> +struct cmdq_base { >>>>> + int subsys; >>>>> + u32 base; >>>> >>>> subsys can always be calculated via cmdq_subsys_base_to_id(base >> 16) >>>> so we can get rid of the struct, right? >>> >>> Current subsys method is based on previous comment from Daniel Kurtz. >>> Please take a look of our previous discussion. >>> http://lists.infradead.org/pipermail/linux-mediatek/2016-February/004483.html >>> Thanks. >>> >> >> I have to look deeper into this, but from what I read, the proposal from >> Daniel [1] seems good to me. >> >> [1] https://patchwork.kernel.org/patch/8068311/ > > The initial proposal has some problem, so please see the follow-up > discussions. Thanks. > http://lists.infradead.org/pipermail/linux-mediatek/2016-February/003972.html > http://lists.infradead.org/pipermail/linux-mediatek/2016-February/004483.html > > >>>>> +}; >>>>> + >>>>> +/** >>>>> + * cmdq_register_device() - register device which needs CMDQ >>>>> + * @dev: device >>>>> + * >>>>> + * Return: cmdq_base pointer or NULL for failed >>>>> + */ >>>>> +struct cmdq_base *cmdq_register_device(struct device *dev); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_create() - create command queue record >>>>> + * @dev: device >>>>> + * @engine_flag: command queue engine flag >>>>> + * @rec_ptr: command queue record pointer to retrieve cmdq_rec >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + */ >>>>> +int cmdq_rec_create(struct device *dev, u64 engine_flag, >>>>> + struct cmdq_rec **rec_ptr); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_write() - append write command to the command queue record >>>>> + * @rec: the command queue record >>>>> + * @value: the specified target register value >>>>> + * @base: the command queue base >>>>> + * @offset: register offset from module base >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + */ >>>>> +int cmdq_rec_write(struct cmdq_rec *rec, u32 value, >>>>> + struct cmdq_base *base, u32 offset); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_write_mask() - append write command with mask to the command >>>>> + * queue record >>>>> + * @rec: the command queue record >>>>> + * @value: the specified target register value >>>>> + * @base: the command queue base >>>>> + * @offset: register offset from module base >>>>> + * @mask: the specified target register mask >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + */ >>>>> +int cmdq_rec_write_mask(struct cmdq_rec *rec, u32 value, >>>>> + struct cmdq_base *base, u32 offset, u32 mask); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_wfe() - append wait for event command to the command queue reco rd >>>> >>>> reco rd -> record >>> >>> Will fix it. 
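Coming back to my earlier point about documentation: it would help to spell out the intended client usage somewhere. If I read the header right, the flow is roughly the following (device names and values are only illustrative, error handling omitted):

	/* dev: the client HW module; gce_dev: the GCE device that holds the cmdq drvdata */
	struct cmdq_base *base = cmdq_register_device(dev);
	struct cmdq_rec *rec;

	cmdq_rec_create(gce_dev, BIT_ULL(CMDQ_ENG_DISP_DSI0), &rec);
	cmdq_rec_wfe(rec, CMDQ_EVENT_MUTEX0_STREAM_EOF);
	cmdq_rec_write_mask(rec, value, base, offset, mask);
	cmdq_rec_flush(rec);		/* or cmdq_rec_flush_async(rec, cb, data) */
	cmdq_rec_destroy(rec);

Having something like this in the header or in Documentation/ would make it much easier to understand how a client is supposed to use the GCE.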
>>> >>>> Regards, >>>> Matthias >>>> >>>>> + * @rec: the command queue record >>>>> + * @event: the desired event type to "wait and CLEAR" >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + */ >>>>> +int cmdq_rec_wfe(struct cmdq_rec *rec, enum cmdq_event event); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_clear_event() - append clear event command to the command queue >>>>> + * record >>>>> + * @rec: the command queue record >>>>> + * @event: the desired event to be cleared >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + */ >>>>> +int cmdq_rec_clear_event(struct cmdq_rec *rec, enum cmdq_event event); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_flush() - trigger CMDQ to execute the recorded commands >>>>> + * @rec: the command queue record >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + * >>>>> + * Trigger CMDQ to execute the recorded commands. Note that this is a >>>>> + * synchronous flush function. When the function returned, the recorded >>>>> + * commands have been done. >>>>> + */ >>>>> +int cmdq_rec_flush(struct cmdq_rec *rec); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_flush_async() - trigger CMDQ to asynchronously execute the recorded >>>>> + * commands and call back after ISR is finished >>>>> + * @rec: the command queue record >>>>> + * @cb: called in the end of CMDQ ISR >>>>> + * @data: this data will pass back to cb >>>>> + * >>>>> + * Return: 0 for success; else the error code is returned >>>>> + * >>>>> + * Trigger CMDQ to asynchronously execute the recorded commands and call back >>>>> + * after ISR is finished. Note that this is an ASYNC function. When the function >>>>> + * returned, it may or may not be finished. The ISR callback function is called >>>>> + * in the end of ISR. >> >> "The callback is called from the ISR." >> >> Regards, >> Matthias >> >>>>> + */ >>>>> +int cmdq_rec_flush_async(struct cmdq_rec *rec, cmdq_async_flush_cb cb, >>>>> + void *data); >>>>> + >>>>> +/** >>>>> + * cmdq_rec_destroy() - destroy command queue record >>>>> + * @rec: the command queue record >>>>> + */ >>>>> +void cmdq_rec_destroy(struct cmdq_rec *rec); >>>>> + >>>>> +#endif /* __MTK_CMDQ_H__ */ >>>>> >>> >>> Thanks, >>> HS >>> > > Thanks, > HS >