From: Jerin Jacob
Date: Wed, 23 Jun 2021 17:10:22 +0530
To: Bruce Richardson
Cc: fengchengwen, Morten Brørup, Thomas Monjalon, Ferruh Yigit, dpdk-dev, Nipun Gupta, Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli, Jerin Jacob, David Marchand, Satananda Burla, Prasun Kapoor
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

On Wed, Jun 23, 2021 at 3:07 PM Bruce Richardson wrote:
>
> On Wed, Jun 23, 2021 at 12:51:07PM +0530, Jerin Jacob wrote:
> > On Wed, Jun 23, 2021 at 9:00 AM fengchengwen wrote:
> > >
> > >
> > > Currently, it is hard to define generic dma
> > > descriptor; I think the well-defined APIs are feasible.
> >
> > I would like to understand why it is not feasible if we move the
> > preparation to the slow path.
> >
> > i.e.
> >
> > struct rte_dmadev_desc defines all the "attributes" of all DMA devices available
> > using capability. I believe with this scheme, we can scale and
> > incorporate all features of all DMA HW without any performance impact.
> >
> > Something like:
> >
> > struct rte_dmadev_desc {
> >         /* Attributes of all DMA transfers available for all HW, under capability. */
> >         channel or port;
> >         ops; // copy, fill, etc.
> >         /* Implementation-opaque memory as a zero-length array;
> >          * rte_dmadev_desc_prep() updates this memory with HW-specific information.
> >          */
> >         uint8_t impl_opq[];
> > };
> >
> > // Allocate the memory for a DMA descriptor
> > struct rte_dmadev_desc *rte_dmadev_desc_alloc(devid);
> > // Convert DPDK-specific descriptors to HW-specific descriptors in the slow path
> > rte_dmadev_desc_prep(devid, struct rte_dmadev_desc *desc);
> > // Free DMA descriptor memory
> > rte_dmadev_desc_free(devid, struct rte_dmadev_desc *desc);
> >
> > The above calls are in the slow path.
> >
> > Only the call below is in the fast path.
> > // Here desc can be NULL (in case you don't need any specific attribute
> > // attached to the transfer); if needed, it can be an object that has gone
> > // through rte_dmadev_desc_prep().
> > rte_dmadev_enq(devid, struct rte_dmadev_desc *desc, void *src, void *dest,
> >                unsigned int len, cookie);
> >
>
> The trouble here is the performance penalty due to building up and tearing
> down structures and passing those structures into functions via function
> pointer. With the APIs for enqueue/dequeue that have been discussed here,
> all parameters will be passed in registers, and then each driver can do a
> write of the actual hardware descriptor straight to cache/memory from
> registers.
> With the scheme you propose above, the register contains a
> pointer to the data which must then be loaded into the CPU before being
> written out again. This increases our offload cost. See below.
>
> However, assuming that the desc_prep call is just for slowpath or
> initialization time, I'd be ok to have the functions take an extra
> hw-specific parameter for each call prepared with tx_prep. It would still
> allow all other parameters to be passed in registers. How much data are you
> looking to store in this desc struct? It can't all be represented as flags,
> for example?

There are around 128 bits of metadata for octeontx2. New HW may have
completely different metadata.

http://code.dpdk.org/dpdk/v21.05/source/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.h#L149

I see the following issues with the flags scheme:
- The application needs to start populating the flags in the fast path, and
  since they depend on capabilities, it needs different versions of its
  fast-path code.
- It is not future-proof: it is not easy to add other attributes when new HW
  comes with new transfer attributes.

>
> As for the individual APIs, we could do a generic "enqueue" API, which
> takes the op as a parameter, I prefer having each operation as a separate
> function, in order to increase the readability of the code and to reduce

The only issue I see is that every application then needs two paths for doing
the same thing, one with _prep() and one with the separate per-op functions,
and drivers need to support both.

> the number of parameters needed per function i.e. thereby saving registers
> needing to be used and potentially making the function calls and offload

My worry is that struct rte_dmadev can hold function pointers for at most 8
fast-path functions in a 64B cache line.

When you say a new op, say fill, needs a new function, what changes from the
HW driver's point of view? Is it just updating the HW descriptor with op
_fill_ vs. _copy_, or something beyond that?

If it is only a HW descriptor update, then _prep() can do all the work, and
the driver just needs to copy the desc to HW.
I believe up to 6 integer/pointer arguments are passed in registers on x86-64
(it is 8 on arm64). If so, the desc pointer (already populated in HW descriptor
format by _prep()) is in a register, and the driver's enq() does a simple
64-bit/128-bit copy from the desc pointer to HW memory. I don't see any
overhead in that. On the other hand, if we keep adding arguments, they will
spill onto the stack.

> cost cheaper. Perhaps we can have the "common" ops such as copy, fill, have
> their own functions, and have a generic "enqueue" function for the
> less-commonly used or supported ops?
>
> /Bruce