Date: Fri, 18 Jun 2021 10:55:23 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: fengchengwen, Morten Brørup, Thomas Monjalon, Ferruh Yigit, dpdk-dev,
 Nipun Gupta, Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
 Jerin Jacob, David Marchand, Satananda Burla, Prasun Kapoor
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

On Fri, Jun 18, 2021 at 11:22:28AM +0530, Jerin Jacob wrote:
> On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson wrote:
> >
> > On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > > On Wed, Jun 16, 2021 at 3:47 PM fengchengwen wrote:
> > > >
> > > > On 2021/6/16 15:09, Morten Brørup wrote:
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > > >> Sent: Tuesday, 15 June 2021 18.39
> > > > >>
> > > > >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > >>> This patch introduces 'dmadevice' which is a generic type of DMA
> > > > >>> device.
> > > > >>>
> > > > >>> The APIs of the dmadev library expose some generic operations which can
> > > > >>> enable configuration and I/O with the DMA devices.
> > > > >>>
> > > > >>> Signed-off-by: Chengwen Feng
> > > > >>> ---
> > > > >> Thanks for sending this.
> > > > >>
> > > > >> Of most interest to me right now are the key data-plane APIs. While we are
> > > > >> still in the prototyping phase, below is a draft of what we are thinking
> > > > >> for the key enqueue/perform_ops/completed_ops APIs.
> > > > >>
> > > > >> Some key differences I note below vs your original RFC:
> > > > >> * Use of void pointers rather than iova addresses. While using iovas makes
> > > > >>   sense in the general case when using hardware, in that it can work with
> > > > >>   both physical addresses and virtual addresses, if we change the APIs to
> > > > >>   use void pointers instead it will still work for DPDK in VA mode, while
> > > > >>   at the same time allowing use of software fallbacks in error cases, and
> > > > >>   also a stub driver that uses memcpy in the background. Finally, using
> > > > >>   iovas makes the APIs a lot more awkward to use with anything but mbufs
> > > > >>   or similar buffers where we already have a pre-computed physical address.
> > > > >> * Use of id values rather than user-provided handles. Allowing the user/app
> > > > >>   to manage the amount of data stored per operation is a better solution,
> > > > >>   I feel, than prescribing a certain amount of in-driver tracking. Some
> > > > >>   apps may not care about anything other than a job being completed, while
> > > > >>   other apps may have significant metadata to be tracked. Taking the
> > > > >>   user-context handles out of the API also makes the driver code simpler.
> > > > >> * I've kept a single combined API for completions, which differs from the
> > > > >>   separate error-handling completion API you propose. I need to give the
> > > > >>   two-function approach a bit of thought, but likely both could work. If
> > > > >>   we (likely) never expect failed ops, then the specifics of error
> > > > >>   handling should not matter that much.
> > > > >>
> > > > >> For the rest, the control / setup APIs are likely to be rather
> > > > >> uncontroversial, I suspect. However, I think that rather than xstats APIs,
> > > > >> the library should first provide a set of standardized stats like ethdev
> > > > >> does. If driver-specific stats are needed, we can add xstats later to the
> > > > >> API.
> > > > >>
> > > > >> Appreciate your further thoughts on this, thanks.
> > > > >>
> > > > >> Regards,
> > > > >> /Bruce
> > > > >
> > > > > I generally agree with Bruce's points above.
> > > > >
> > > > > I would like to share a couple of ideas for further discussion:
> > > >
> > > I believe some of the other requirements and comments for generic DMA will be:
> > >
> > > 1) Support for _channels_. Each channel may have different capabilities and
> > > functionalities. Typical cases are that each channel has a separate source
> > > and destination device, e.g. DMA between PCIe EP and host memory, host
> > > memory and host memory, or PCIe EP and PCIe EP. So we need some notion of
> > > the channel in the specification.
> > >
> >
> > Can you share a bit more detail on what constitutes a channel in this case?
> > Is it equivalent to a device queue (which we are flattening to individual
> > devices in this API), or to a specific configuration on a queue?
>
> It is not a queue. It is one of the attributes of a transfer, i.e. in the
> same queue, a given transfer can specify different "source" and
> "destination" devices, like CPU to sound card, CPU to network card etc.
>
Ok. Thanks for clarifying. Do you think it's best given as a device-specific
parameter to the various functions, and NULL for hardware that doesn't need
it?
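
For concreteness, the rough shape I have in mind for the three data-plane
calls is sketched below. The names echo the enqueue/perform_ops/completed_ops
split mentioned above, but the exact signatures are illustrative only and not
a final proposal; the trailing 'ctx' argument is the optional device-specific
channel handle, passed as NULL where the hardware needs none.

#include <stdint.h>

/* Illustrative prototypes only, not a final API. */

/* Enqueue a copy of 'length' bytes from 'src' to 'dst'. Returns a
 * monotonically increasing id for the operation, or negative on error.
 * 'ctx' is the optional device-specific channel/context handle; pass
 * NULL where the hardware has no such concept. */
int rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst,
		unsigned int length, void *ctx);

/* Make all operations enqueued since the last call visible to the
 * hardware, e.g. by ringing a doorbell. */
void rte_dmadev_perform_ops(uint16_t dev_id);

/* Return the number of operations completed since the last call and
 * report the id of the most recently completed one in '*last_id'. */
int rte_dmadev_completed_ops(uint16_t dev_id, uint32_t *last_id);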
> > >
> > > 2) I assume the current data-plane APIs are not thread-safe. Is that right?
> > >
>
> Yes.
> >
> > > 3) The cookie scheme outlined earlier looks good to me, instead of having a
> > > generic dequeue() API.
> > >
> > > 4) Can we split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
> > > void *dst, unsigned int length);
> > > into a two-stage API, where one part will be used in the fast path and the
> > > other in the slow path?
> > >
> > > - The slow-path API takes the channel and the other attributes of the transfer.
> > >
> > > Example syntax would be:
> > >
> > > struct rte_dmadev_desc {
> > >         channel id;
> > >         ops;    // copy, xor, fill etc
> > >         // other arguments specific to the dma transfer; can be set
> > >         // based on capability.
> > > };
> > >
> > > rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id,
> > >                                      struct rte_dmadev_desc *desc);
> > >
> > > - The fast path takes the arguments that change per transfer, along with the
> > > slow-path handle.
> > >
> > > rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
> > >                    unsigned int length, rte_dmadev_desc_t desc);
> > >
> > > This will help the driver to:
> > > - form the device-specific descriptors in the slow path (former API) for a
> > >   given channel and the attributes that are fixed per transfer;
> > > - blend the "variable" arguments such as src and dest address with the
> > >   slow-path-created descriptors (latter API).
> > >
> >
> > This seems like an API for a context-aware device, where the channel is the
> > config data/context that is preserved across operations - is that correct?
> > At least from the Intel DMA accelerators side, we have no concept of this
> > context, and each operation is completely self-described. The location or
> > type of memory for copies is irrelevant, you just pass the src/dst
> > addresses to reference.
>
> It is not a context-aware device. Each HW job is self-described.
> You can view it as different attributes of the transfer.
>
> >
> > > The above will give better performance and is the best trade-off
> > > between performance and per-transfer variables.
> >
> > We may need to have different APIs for context-aware and context-unaware
> > processing, with which to use determined by the capabilities discovery.
> > Given that for these DMA devices the offload cost is critical, more so than
> > for any other dev class I've looked at before, I'd like to avoid having APIs
> > with extra parameters that need to be passed about, since that just adds
> > extra CPU cycles to the offload.
>
> If the driver does not support additional attributes and/or the application
> does not need them, rte_dmadev_desc_t can be NULL, so that it won't have any
> cost in the datapath. I think we can go to different API cases only if we
> cannot abstract the problem without a performance impact; otherwise it will
> be too much pain for applications.

Ok. Having one extra parameter ignored by some drivers should not be that
big of a deal. [With all these, we'll only really know for sure when
implemented and the offload cost measured.]
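
To make sure we are reading the two-stage proposal the same way, a short
usage sketch is below. The types and prototypes are placeholders echoing the
example syntax quoted above, not an agreed API; in particular
rte_dmadev_prepare() and the opaque rte_dmadev_desc_t handle are only
illustrative.

#include <stdint.h>

struct rte_dmadev_desc;              /* attributes fixed across transfers */
typedef void *rte_dmadev_desc_t;     /* placeholder opaque handle */

/* Slow path: build any device-specific descriptor template once. */
rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id,
		struct rte_dmadev_desc *desc);

/* Fast path: only the per-transfer fields, plus the prepared handle. */
int rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
		unsigned int length, rte_dmadev_desc_t desc);

/* Fast-path loop: 'chan' is prepared once in the slow path; drivers and
 * applications that need no extra attributes pass NULL instead, so the
 * extra argument costs nothing in that case. */
static void
copy_burst(uint16_t dev_id, rte_dmadev_desc_t chan, void *srcs[],
		void *dsts[], unsigned int lens[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		rte_dmadev_enqueue(dev_id, srcs[i], dsts[i], lens[i], chan);
}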
>
> Just to understand: I think we need to look at the HW capabilities and how
> to have a common API. I assume the HW will have some HW job descriptors
> which will be filled in by SW and submitted to the HW.
> In our HW, the job descriptor has the following main elements:
>
> - Channel  // We don't expect the application to change this per transfer
> - Source address - can be scatter-gather too - will change per transfer
> - Destination address - can be scatter-gather too - will change per transfer
> - Transfer length - can be scatter-gather too - will change per transfer
> - IOVA address where the HW posts the job completion status, per job
>   descriptor - will change per transfer
> - Other sideband information related to the channel  // We don't expect the
>   application to change this per transfer
> - As an option, job completion can be posted as an event to an
>   rte_event_queue too  // We don't expect the application to change this
>   per transfer
>
> @Richardson, Bruce @fengchengwen @Hemant Agrawal
>
> Could you share the options for your HW descriptors which you are planning
> to expose through an API like the above, so that we can easily converge on
> the fastpath API?
>
Taking the case of a simple copy op, the parameters we need are:

* src
* dst
* length

Depending on the specific hardware there will also be a completion address
passed in the descriptor, but we plan for these cases to always have the
completions written back to a set location, so that we have essentially
ring-writeback, as with the hardware which doesn't explicitly have a
separate completion address. Beyond that, I believe the only descriptor
fields we will use are just the flags field indicating the op type etc.
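
Put another way, per copy op the only fields the fast-path call would have
to populate are roughly the following. The layout and field names below are
purely illustrative (not any real device's descriptor format), just to show
how little varies per op once completions go to a fixed ring location:

#include <stdint.h>

/* Illustrative contents of a per-op descriptor for a simple copy. */
struct dma_copy_desc {
	uint32_t flags;    /* op type (copy/fill/...) plus option bits */
	uint64_t src;      /* source address */
	uint64_t dst;      /* destination address */
	uint32_t length;   /* number of bytes to copy */
	/*
	 * No per-op completion address: status is written back in order to
	 * a fixed ring location shared by all ops (ring-writeback), even on
	 * hardware that could take a separate completion address per
	 * descriptor.
	 */
};

/Bruce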