From: Olivier Matz
Subject: Re: [PATCH 0/6] get status of Rx and Tx descriptors
Date: Fri, 3 Mar 2017 17:45:01 +0100
Message-ID: <20170303174501.7dbfbf10@platinum>
In-Reply-To: <1FD9B82B8BF2CF418D9A1000154491D97F0072FD@ORSMSX102.amr.corp.intel.com>
To: "Venkatesan, Venky"
Cc: "Richardson, Bruce", "dev@dpdk.org", "thomas.monjalon@6wind.com", "Ananyev, Konstantin", "Lu, Wenzhuo", "Zhang, Helin", "Wu, Jingjing", "adrien.mazarguil@6wind.com", "nelio.laranjeiro@6wind.com", "Yigit, Ferruh"

Hi Venky,

On Fri, 3 Mar 2017 16:18:52 +0000, "Venkatesan, Venky" wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, March 2, 2017 8:15 AM
> > To: Richardson, Bruce
> > Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Ananyev, Konstantin;
> > Lu, Wenzhuo; Zhang, Helin; Wu, Jingjing; adrien.mazarguil@6wind.com;
> > nelio.laranjeiro@6wind.com; Yigit, Ferruh
> > Subject: Re: [dpdk-dev] [PATCH 0/6] get status of Rx and Tx descriptors
> >
> > On Thu, 2 Mar 2017 15:32:15 +0000, Bruce Richardson wrote:
> > > On Wed, Mar 01, 2017 at 06:19:06PM +0100, Olivier Matz wrote:
> > > > This patchset introduces a new ethdev API:
> > > > - rte_eth_rx_descriptor_status()
> > > > - rte_eth_tx_descriptor_status()
> > > >
> > > > The Rx API aims to replace rte_eth_rx_descriptor_done(), which
> > > > does almost the same thing, but does not differentiate the case
> > > > of a descriptor used by the driver (not returned to the hw).
> > > >
> > > > These functions can be used to:
> > > > - on Rx, anticipate that the cpu is not fast enough to process
> > > >   all incoming packets, and take steps to solve the problem
> > > >   (add more cpus, drop specific packets, ...)
> > > > - on Tx, detect that the link is overloaded, and take steps to
> > > >   solve the problem (notify flow control, drop specific packets)
> > > >
> > > Looking at it from a slightly higher level, are these APIs really
> > > going to help in these situations? If something is overloaded, doing
> > > more work by querying the ring status only makes things worse. I
> > > suspect that in most cases better results can be got by just looking
> > > at the results of RX and TX burst functions. For example, if RX burst
> > > is always returning a full set of packets, chances are you are
> > > overloaded, or at least have no scope for an unexpected burst or event.
> > >
> > > Are these really needed for real applications? I suspect our trivial
> > > l3fwd power example can be made to work ok without them.
> >
> > The l3fwd example uses the rte_eth_rx_descriptor_done() API, which is very
> > similar to what I'm adding here.
> > The differences are:
> >
> > - the new lib provides a Tx counterpart
> > - it differentiates done/avail/hold descriptors
> >
> > The alternative was to update the descriptor_done API, but I think we
> > agreed to do that in this thread:
> > http://www.dpdk.org/ml/archives/dev/2017-January/054947.html
> >
> > About the usefulness of the API, I confirm it is useful: for instance, you can
> > detect that your ring is more than half-full, and take steps to increase
> > your processing power or select the packets you want to drop first.
>
> For either of those cases, you could still implement this in your
> application without any of these APIs. Simply keep reading rx_burst()
> till it returns zero. You now have all the packets that you want - look
> at how many and increase your processing power, or drop them.

In my use case, I may have several thresholds, so it gives quick
information about the ring status.

Continuing to call rx_burst() until it returns 0 will not work if the
packet rate is higher than (or close to) what the cpu is able to process.

> The issue I have with this newer instantiation of the API is that it is
> essentially to pick up a descriptor at a specified offset. In most
> cases, if you plan to read far enough ahead with the API (let's say 16
> or 32 ahead, or even more), you are almost always guaranteed an L1/L2
> miss - essentially making it a costly API call. In cases that don't
> have something like Data Direct I/O (DDIO), you are now going to hit
> memory and stall the CPU for a long time. In any case, the API becomes
> pretty useless unless you want to stay within a smaller look-ahead
> offset. The rx_burst() methodology simply works better in most
> circumstances, and allows application level control.
>
> So, NAK. My suggestion would be to go back to the older API.

I don't understand the reason for your NACK. The old API is there (for
Rx it works the same), and it is illustrated in an example.
Since your arguments also apply to the old API, why are you saying we
should keep the older API?

For Tx, I want to know if I have enough room to send my packets before
doing it. There is no API yet to do that.

And yes, this could trigger cache misses, but in some situations it's
preferable to be a bit slower (not every test is test-iofwd) and be able
to anticipate that the ring is getting full.

Regards,
Olivier