From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96870C4743C for ; Wed, 23 Jun 2021 11:46:58 +0000 (UTC) Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by mail.kernel.org (Postfix) with ESMTP id 0094460249 for ; Wed, 23 Jun 2021 11:46:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0094460249 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DC6984003F; Wed, 23 Jun 2021 13:46:56 +0200 (CEST) Received: from mail-io1-f50.google.com (mail-io1-f50.google.com [209.85.166.50]) by mails.dpdk.org (Postfix) with ESMTP id E4A934003E for ; Wed, 23 Jun 2021 13:46:55 +0200 (CEST) Received: by mail-io1-f50.google.com with SMTP id b7so2942511ioq.12 for ; Wed, 23 Jun 2021 04:46:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=yhGPZNTV1TitIY+69xC+WFeg1cG12cprp/YhvoJeh4c=; b=ugmyNN/LUC3N8LkDryOQoY1J58VyknV1uu1jqn5JwNSc419OLFXu8389wbU1hBJ46y W7/R6Chr4jWh/A3Q/syedVhq+tNnbnVERkAfnrRWs59V+tPoCWfCEv87zgZOrBoAjGRE /lbsGKM1xOMpOBPZhly13QCSaiZ2Ndc/ufVN0hK6ApsbC/QGv8wLnT/xvB4wA8qcpY0P HGII7HoS9m7NigVRNovrsgB9hR9ZhPxss5k4muaUp+NVf1Ppz0sOg8Dj1ZRxdKG5RDCf Z8JlTr1OaRAP0YbcgGklVNs2J7FX1hBPqTAgihDOIU0CMEj78sbNNi6v4Yu9EGcMAO74 2ARA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=yhGPZNTV1TitIY+69xC+WFeg1cG12cprp/YhvoJeh4c=; b=jKDcnGfl2Kfd3E4No9TDpnfLhKwgUVLQt0P3aVlZooNqwEq9AFaAe2jVnszrjCRSBs T+2uBbVaD1IC3mR1NEyTB/tHhYFHZ3di5FEeUxVCcULw6PJj0qavtUU9KN8Lu1bQDIU4 AkEOqgIoKUbohjLIxn+skPyDw+O9VcUTA7uQgNzUUUyYaqkFVZxq0S4239rSSOPL/ObQ pdwANuYT4C17fV0hsEYlrIf0e21RNxi+kXFRkoz1YpPGGP074Y6ti1ApJtj1HWPoqGhK DwjYoQYPFFpJlm0M6+0wU6SKvtuonH+hvEV6FjCRW4MF/KD9VgJihsC/OQtrd3xxkqS2 b8sw== X-Gm-Message-State: AOAM533xq4A0cGk3JU2/118KGtdyN6CKHZkFBQt9a1Q80Ax1YoRyfUUA 3P1COf1Vg9GK48V+WIRhlejxoXN3ol4Il36ptv8= X-Google-Smtp-Source: ABdhPJwprlGnxUodknQaZL+4jhu2BfQ3pfbhzCak20c7IahoWirBsi8xvtwipab/mr6C/kpyyTcdolCvmkBwkE7r/C4= X-Received: by 2002:a05:6638:12cd:: with SMTP id v13mr8605574jas.104.1624448815098; Wed, 23 Jun 2021 04:46:55 -0700 (PDT) MIME-Version: 1.0 References: <1623763327-30987-1-git-send-email-fengchengwen@huawei.com> <98CBD80474FA8B44BF855DF32C47DC35C61860@smartserver.smartshare.dk> <3cb0bd01-2b0d-cf96-d173-920947466041@huawei.com> In-Reply-To: From: Jerin Jacob Date: Wed, 23 Jun 2021 17:16:28 +0530 Message-ID: To: Bruce Richardson Cc: fengchengwen , =?UTF-8?Q?Morten_Br=C3=B8rup?= , Thomas Monjalon , Ferruh Yigit , dpdk-dev , Nipun Gupta , Hemant Agrawal , Maxime Coquelin , Honnappa Nagarahalli , Jerin Jacob , David Marchand , Satananda Burla , Prasun Kapoor Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Wed, Jun 23, 2021 at 3:11 PM Bruce Richardson wrote: > > On Tue, Jun 22, 2021 at 10:55:24PM +0530, Jerin Jacob wrote: > > On Fri, Jun 18, 2021 at 3:11 PM fengchengwen = wrote: > > > > > > On 2021/6/18 13:52, Jerin Jacob wrote: > > > > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson > > > > wrote: > > > >> > > > >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote: > > > >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen wrote: > > > >>>> > > > >>>> On 2021/6/16 15:09, Morten Br=C3=B8rup wrote: > > > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Ric= hardson > > > >>>>>> Sent: Tuesday, 15 June 2021 18.39 > > > >>>>>> > > > >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote: > > > >>>>>>> This patch introduces 'dmadevice' which is a generic type of = DMA > > > >>>>>>> device. > > > >>>>>>> > > > >>>>>>> The APIs of dmadev library exposes some generic operations wh= ich can > > > >>>>>>> enable configuration and I/O with the DMA devices. > > > >>>>>>> > > > >>>>>>> Signed-off-by: Chengwen Feng > > > >>>>>>> --- > > > >>>>>> Thanks for sending this. > > > >>>>>> > > > >>>>>> Of most interest to me right now are the key data-plane APIs. = While we > > > >>>>>> are > > > >>>>>> still in the prototyping phase, below is a draft of what we ar= e > > > >>>>>> thinking > > > >>>>>> for the key enqueue/perform_ops/completed_ops APIs. > > > >>>>>> > > > >>>>>> Some key differences I note in below vs your original RFC: > > > >>>>>> * Use of void pointers rather than iova addresses. While using= iova's > > > >>>>>> makes > > > >>>>>> sense in the general case when using hardware, in that it ca= n work > > > >>>>>> with > > > >>>>>> both physical addresses and virtual addresses, if we change = the APIs > > > >>>>>> to use > > > >>>>>> void pointers instead it will still work for DPDK in VA mode= , while > > > >>>>>> at the > > > >>>>>> same time allow use of software fallbacks in error cases, an= d also a > > > >>>>>> stub > > > >>>>>> driver than uses memcpy in the background. Finally, using io= va's > > > >>>>>> makes the > > > >>>>>> APIs a lot more awkward to use with anything but mbufs or si= milar > > > >>>>>> buffers > > > >>>>>> where we already have a pre-computed physical address. > > > >>>>>> * Use of id values rather than user-provided handles. Allowing= the > > > >>>>>> user/app > > > >>>>>> to manage the amount of data stored per operation is a bette= r > > > >>>>>> solution, I > > > >>>>>> feel than proscribing a certain about of in-driver tracking.= Some > > > >>>>>> apps may > > > >>>>>> not care about anything other than a job being completed, wh= ile other > > > >>>>>> apps > > > >>>>>> may have significant metadata to be tracked. Taking the user= -context > > > >>>>>> handles out of the API also makes the driver code simpler. > > > >>>>>> * I've kept a single combined API for completions, which diffe= rs from > > > >>>>>> the > > > >>>>>> separate error handling completion API you propose. I need t= o give > > > >>>>>> the > > > >>>>>> two function approach a bit of thought, but likely both coul= d work. > > > >>>>>> If we > > > >>>>>> (likely) never expect failed ops, then the specifics of erro= r > > > >>>>>> handling > > > >>>>>> should not matter that much. > > > >>>>>> > > > >>>>>> For the rest, the control / setup APIs are likely to be rather > > > >>>>>> uncontroversial, I suspect. However, I think that rather than = xstats > > > >>>>>> APIs, > > > >>>>>> the library should first provide a set of standardized stats l= ike > > > >>>>>> ethdev > > > >>>>>> does. If driver-specific stats are needed, we can add xstats l= ater to > > > >>>>>> the > > > >>>>>> API. > > > >>>>>> > > > >>>>>> Appreciate your further thoughts on this, thanks. > > > >>>>>> > > > >>>>>> Regards, > > > >>>>>> /Bruce > > > >>>>> > > > >>>>> I generally agree with Bruce's points above. > > > >>>>> > > > >>>>> I would like to share a couple of ideas for further discussion: > > > >>> > > > >>> > > > >>> I believe some of the other requirements and comments for generic= DMA will be > > > >>> > > > >>> 1) Support for the _channel_, Each channel may have different > > > >>> capabilities and functionalities. > > > >>> Typical cases are, each channel have separate source and destinat= ion > > > >>> devices like > > > >>> DMA between PCIe EP to Host memory, Host memory to Host memory, P= CIe > > > >>> EP to PCIe EP. > > > >>> So we need some notion of the channel in the specification. > > > >>> > > > >> > > > >> Can you share a bit more detail on what constitutes a channel in t= his case? > > > >> Is it equivalent to a device queue (which we are flattening to ind= ividual > > > >> devices in this API), or to a specific configuration on a queue? > > > > > > > > It not a queue. It is one of the attributes for transfer. > > > > I.e in the same queue, for a given transfer it can specify the > > > > different "source" and "destination" device. > > > > Like CPU to Sound card, CPU to network card etc. > > > > > > > > > > > >> > > > >>> 2) I assume current data plane APIs are not thread-safe. Is it ri= ght? > > > >>> > > > >> Yes. > > > >> > > > >>> > > > >>> 3) Cookie scheme outlined earlier looks good to me. Instead of ha= ving > > > >>> generic dequeue() API > > > >>> > > > >>> 4) Can split the rte_dmadev_enqueue_copy(uint16_t dev_id, void * = src, > > > >>> void * dst, unsigned int length); > > > >>> to two stage API like, Where one will be used in fastpath and oth= er > > > >>> one will use used in slowpath. > > > >>> > > > >>> - slowpath API will for take channel and take other attributes fo= r transfer > > > >>> > > > >>> Example syantx will be: > > > >>> > > > >>> struct rte_dmadev_desc { > > > >>> channel id; > > > >>> ops ; // copy, xor, fill etc > > > >>> other arguments specific to dma transfer // it can be s= et > > > >>> based on capability. > > > >>> > > > >>> }; > > > >>> > > > >>> rte_dmadev_desc_t rte_dmadev_preprare(uint16_t dev_id, struct > > > >>> rte_dmadev_desc *dec); > > > >>> > > > >>> - Fastpath takes arguments that need to change per transfer along= with > > > >>> slow-path handle. > > > >>> > > > >>> rte_dmadev_enqueue(uint16_t dev_id, void * src, void * dst, unsig= ned > > > >>> int length, rte_dmadev_desc_t desc) > > > >>> > > > >>> This will help to driver to > > > >>> -Former API form the device-specific descriptors in slow path fo= r a > > > >>> given channel and fixed attributes per transfer > > > >>> -Later API blend "variable" arguments such as src, dest address w= ith > > > >>> slow-path created descriptors > > > >>> > > > >> > > > >> This seems like an API for a context-aware device, where the chann= el is the > > > >> config data/context that is preserved across operations - is that = correct? > > > >> At least from the Intel DMA accelerators side, we have no concept = of this > > > >> context, and each operation is completely self-described. The loca= tion or > > > >> type of memory for copies is irrelevant, you just pass the src/dst > > > >> addresses to reference. > > > > > > > > it is not context-aware device. Each HW JOB is self-described. > > > > You can view it different attributes of transfer. > > > > > > > > > > > >> > > > >>> The above will give better performance and is the best trade-off = c > > > >>> between performance and per transfer variables. > > > >> > > > >> We may need to have different APIs for context-aware and context-u= naware > > > >> processing, with which to use determined by the capabilities disco= very. > > > >> Given that for these DMA devices the offload cost is critical, mor= e so than > > > >> any other dev class I've looked at before, I'd like to avoid havin= g APIs > > > >> with extra parameters than need to be passed about since that just= adds > > > >> extra CPU cycles to the offload. > > > > > > > > If driver does not support additional attributes and/or the > > > > application does not need it, rte_dmadev_desc_t can be NULL. > > > > So that it won't have any cost in the datapath. I think, we can go = to > > > > different API > > > > cases if we can not abstract problems without performance impact. > > > > Otherwise, it will be too much > > > > pain for applications. > > > > > > Yes, currently we plan to use different API for different case, e.g. > > > rte_dmadev_memcpy() -- deal with local to local memcopy > > > rte_dmadev_memset() -- deal with fill with local memory with patte= rn > > > maybe: > > > rte_dmadev_imm_data() --deal with copy very little data > > > rte_dmadev_p2pcopy() --deal with peer-to-peer copy of diffenet PC= IE addr > > > > > > These API capabilities will be reflected in the device capability set= so that > > > application could know by standard API. > > > > > > There will be a lot of combination of that it will be like M x N cross > > base case, It won't scale. > > > > What are the various cases that are so significantly different? Using the > examples above, the "imm_data" and "p2p_copy" operations are still copy > ops, and the fact of it being a small copy or a p2p one can be expressed > just using flags? [Also, you are not likely to want to offload a small > copy, are you?] I meant, p2p version can have memcpy, memset, _imm_data. So it is gone to 4 to 6 now, If we add one more op, it becomes 8 function. IMO, a separate function is good if driver need to do radically different thing. In our hardware, it is about updating the descriptor field differently, Is it so with other HW? If so, _prep() makes life easy. > > > > > > > > > > > > Just to understand, I think, we need to HW capabilities and how to > > > > have a common API. > > > > I assume HW will have some HW JOB descriptors which will be filled = in > > > > SW and submitted to HW. > > > > In our HW, Job descriptor has the following main elements > > > > > > > > - Channel // We don't expect the application to change per transf= er > > > > - Source address - It can be scatter-gather too - Will be changed p= er transfer > > > > - Destination address - It can be scatter-gather too - Will be chan= ged > > > > per transfer > > > > - Transfer Length - - It can be scatter-gather too - Will be change= d > > > > per transfer > > > > - IOVA address where HW post Job completion status PER Job descript= or > > > > - Will be changed per transfer > > > > - Another sideband information related to channel // We don't expe= ct > > > > the application to change per transfer > > > > - As an option, Job completion can be posted as an event to > > > > rte_event_queue too // We don't expect the application to change p= er > > > > transfer > > > > > > The 'option' field looks like a software interface field, but not HW = descriptor. > > > > It is in HW descriptor. > > > > > > > > > > > > > @Richardson, Bruce @fengchengwen @Hemant Agrawal > > > > > > > > Could you share the options for your HW descriptors which you are > > > > planning to expose through API like above so that we can easily > > > > converge on fastpath API > > > > > > > > > > Kunpeng HW descriptor is self-describing, and don't need refer contex= t info. > > > > > > Maybe the fields which was fix with some transfer type could setup by= driver, and > > > don't expose to application. > > > > Yes. I agree.I think, that reason why I though to have > > rte_dmadev_prep() call to convert DPDK DMA transfer attributes to HW > > specific descriptors > > What are all these attributes? Do you have a reference link for them? >