From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Roger B. Melton" Subject: Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode Date: Fri, 6 Oct 2017 10:06:02 -0400 Message-ID: <4c446975-687e-1a84-e39b-00d43c909205@cisco.com> References: <20171005191111.27557-1-rmelton@cisco.com> <20171006085403.GA24124@bricha3-MOBL3.ger.corp.intel.com> <20171006135152.GA31796@bricha3-MOBL3.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: jingjing.wu@intel.com, dev@dpdk.org To: Bruce Richardson Return-path: Received: from rcdn-iport-2.cisco.com (rcdn-iport-2.cisco.com [173.37.86.73]) by dpdk.org (Postfix) with ESMTP id B42171B1A4 for ; Fri, 6 Oct 2017 16:06:11 +0200 (CEST) In-Reply-To: <20171006135152.GA31796@bricha3-MOBL3.ger.corp.intel.com> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 10/6/17 9:51 AM, Bruce Richardson wrote: > On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote: >> On 10/6/17 4:54 AM, Bruce Richardson wrote: >>> On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote: >>>> --- >>>> >>>> i40evf tx vector logic frees mbufs, but it does not remove the >>>> mbufs from software rings which leads to double frees. This change >>>> corrects that oversight. We've validated this fix within our application. >>>> >>> Hi Roger, >>> >>> I'm a little concerned here by this driver fix, since if we are getting >>> double frees of mbufs there must be another bug somewhere in tracking >>> the free ring elements. Clearing the entries to NULL introduces extra >>> writes to the vector code path which will likely have a performance >>> impact, but also should be unnecessary, given proper tracking of the >>> ring status. >>> >>> Can you provide us with some details as to how to reproduce the issue, >>> especially with a public sample app? I'd really like to look more into >>> why this is happening, and if other fixes may work. >>> >>> Thanks, >>> /Bruce >>> >>> . >>> >> Hey Bruce, >> >> I've not attempted to reproduce the issue using sample apps.  It was >> initially difficult for us to reproduce with our application until we >> stumbled on a recipe.  The symptoms of the double free are two fold: >> >> * Application crashes: Corrupted packet headers, walking a chain of rx >> segments and loosing one in the middle,... >> * i40e adminq lockup - Meaning VF sends an OP and PF does not process it >> >> The former has been directly correlated to tx vector mode.  We still don't >> understand how this could lead to adminq lockup, but we have not observed >> the issue applied the patch I submitted.  We have questions out to Intel >> i40e team on this. >> >> Here's a high level view of the scenario under which the issues are observed >> and how we concluded that there were issues with tx vector free. We provided >> this information to the Intel i40e DPDK team, they reviewed the tx vector >> logic and suggested changes.  With the changes suggested by Intel (the patch >> I submitted) we have re-enabled tx vector when MTU is < MBUF size and >> observed no crashes. >> >> Below that you will find additional detail on the procedure within our >> application for changing MTU, including DPDK API calls. >> >> Let me know if you have additional questions. >> >> Regards, >> Roger >> >> o Scenario: >> >> + Intel i40e-da4 (4x10G) or Cisco branded equivalent >> + KVM or ESXi 6.5 host >> + In a single VM we have at least 1 VF from all 4 PFs. >> + We are running traffic on all interfaces. >> + We stop traffic >> + We stop a device (VF) >> + We start the device >> + We start traffic >> + At any point after we start the device we can crash >> >> o Experiment (Jumping over some of the work to get to the point >> where we believed that i40e driver was doing double frees): >> >> + Our application attaches userdata to mbufs and we put that >> userdata on a linked list. >> + We observed that when we processed the userdata it had been >> corrupted which lead to crashes. >> + This gave us a hint that the mbuf was on multiple lists. >> + We reviewed our application code and could not find a cause. >> + We began to suspect a double free in the i40evf PMD. >> >> # We disabled rx free logic and observed crashes >> (intentionally leaking mbufs in search of the double free). >> # We disabled tx free logic and observed no crashes >> # This gave us a hint that the double frees were coming >> from the i40evf PMD tx logic. >> >> + We had also observed that if we forced MTU to large always >> that there were no crashes >> >> # A side effect of forcing large MTU is that multi-segment >> is enabled. >> # This gave us a hint that enabling multi-segment was >> somehow avoiding the double free. >> >> + We forced multi-segments regardless of MTU and permitted MTU >> changes and observed no crashes. >> + We reviewed the i40evf mbuf free logic to see the effect of >> enabling multi-segment and observed that when multi-segment >> is enabled, rx vector was enabled but tx vector was not. >> + This lead us to examine RX vector mode free logic vs TX >> vector mode free logic. >> >> # RX free logic has special handling for vector mode free >> # TX free logic does not have any special handling for >> vector free >> >> o By enabling multiple segments always (even if MTU does not >> require multiple segments), we effectively disabled tx vector >> mode and this avoided the double free. >> o Our application no longer crashed, but it could not take >> advantage of tx vector optimizations. >> >> >> >> >> >> CP == Control Plane >> DP == Data Plane >> >> * CP sends admin down to DP >> * DP disables RX/TX >> o Block all future tx burst calls >> o rte_eth_dev_set_link_down() invoked >> o Block all future rx burst calls >> * DP notifies CP admin down action is complete >> * CP sends MTU change >> * DP processes MTU change >> o For each rxq: >> + rte_eth_rx_queue_info_get() >> + if not rxq_deferred_start rte_eth_dev_rx_queue_stop() >> o For each txq: >> + rte_eth_tx_queue_info_get() >> + if not txq_deferred_start rte_eth_dev_tx_queue_stop() >> o rte_eth_dev_stop() >> o Re-configure the port: (Note this is original code, not new code >> which is forcing multisegs always) >> + Set rx_mode.jumbo_frame if MTU > 1518 >> + Set rx_mode.enable_scatter if MTU > 2048 >> + txq_flags = ETH_TXQ_NOOFLOADS >> + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS >> + rte_eth_promiscuous_get() >> + rte_eth_dev_info_get() >> + rte_eth_dev_configure() >> # Init tx_vec_allowed and rx_vec_allowed to TRUE. >> + rte_eth_dev_info_get() >> + For each txq: rte_eth_tx_queue_setup() >> # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in >> txq_flags & tx_vec_allowed will be cleared. >> + For each rxq: rte_eth_rx_queue_setup() >> o rte_eth_dev_set_mtu() >> o rte_eth_dev_start() >> o rte_eth_dev_info_get() >> o For each rxq: >> + if not rxq_deferred_start rte_eth_dev_rx_queue_start() >> o For each txq: >> + if not txq_deferred_start rte_eth_dev_tx_queue_start() >> * DP notifies CP MTU change applied. >> * CP sends admin up to DP >> * DP enables RX/TX >> o Enable all future tx burst calls >> o rte_eth_dev_set_link_up() invoked >> o Enable all future rx burst calls >> * DP notifies CP admin up action is complete >> >> >> > Thanks for the explanation. > > I'll follow-up with our i40e driver team to see if they can give me more > detail on their investigation. I'd like to know more details about the > ultimate root cause of this double free - presumably it's something to > do with the behaviour when stopping and starting the queue or port - and > find a solution that does not involve more writes on the fast-path. If > we don't come up with a better solution, this fix will do, though. > > Regards, > /Bruce > . > Thanks Bruce,  I'll go ahead and submit V2 properly signed off with same content. Regards, Roger