All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
@ 2017-10-05 19:11 Roger B Melton
  2017-10-05 23:42 ` Roger B. Melton
  2017-10-06  8:54 ` Bruce Richardson
  0 siblings, 2 replies; 7+ messages in thread
From: Roger B Melton @ 2017-10-05 19:11 UTC (permalink / raw)
  To: jingjing.wu; +Cc: dev, Roger B Melton

---

i40evf tx vector logic frees mbufs, but it does not remove the
mbufs from software rings which leads to double frees.  This change
 corrects that oversight.  We've validated this fix within our application.

 drivers/net/i40e/i40e_rxtx_vec_common.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 39a6da0..fdc6fce 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -127,6 +127,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 	if (likely(m != NULL)) {
 		free[0] = m;
 		nb_free = 1;
+		txep[0].mbuf = NULL;
 		for (i = 1; i < n; i++) {
 			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
 			if (likely(m != NULL)) {
@@ -139,14 +140,17 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 					free[0] = m;
 					nb_free = 1;
 				}
+				txep[i].mbuf = NULL;
 			}
 		}
 		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
 	} else {
+		txep[0].mbuf = NULL;
 		for (i = 1; i < n; i++) {
 			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
 			if (m != NULL)
 				rte_mempool_put(m->pool, m);
+			txep[i].mbuf = NULL;
 		}
 	}
 
-- 
2.10.3.dirty

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-05 19:11 [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode Roger B Melton
@ 2017-10-05 23:42 ` Roger B. Melton
  2017-10-06  0:33   ` Ferruh Yigit
  2017-10-06  8:54 ` Bruce Richardson
  1 sibling, 1 reply; 7+ messages in thread
From: Roger B. Melton @ 2017-10-05 23:42 UTC (permalink / raw)
  To: jingjing.wu; +Cc: dev

Hi Everyone,

As soon as I submitted the patch, I realized I neglected to signoff.  
What's the recommended procedure, resubmit the original signed off, or 
bump to v1?

Thanks,
Roger


On 10/5/17 3:11 PM, Roger B Melton wrote:
> ---
>
> i40evf tx vector logic frees mbufs, but it does not remove the
> mbufs from software rings which leads to double frees.  This change
>   corrects that oversight.  We've validated this fix within our application.
>
>   drivers/net/i40e/i40e_rxtx_vec_common.h | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index 39a6da0..fdc6fce 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -127,6 +127,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
>   	if (likely(m != NULL)) {
>   		free[0] = m;
>   		nb_free = 1;
> +		txep[0].mbuf = NULL;
>   		for (i = 1; i < n; i++) {
>   			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
>   			if (likely(m != NULL)) {
> @@ -139,14 +140,17 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
>   					free[0] = m;
>   					nb_free = 1;
>   				}
> +				txep[i].mbuf = NULL;
>   			}
>   		}
>   		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
>   	} else {
> +		txep[0].mbuf = NULL;
>   		for (i = 1; i < n; i++) {
>   			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
>   			if (m != NULL)
>   				rte_mempool_put(m->pool, m);
> +			txep[i].mbuf = NULL;
>   		}
>   	}
>   

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-05 23:42 ` Roger B. Melton
@ 2017-10-06  0:33   ` Ferruh Yigit
  0 siblings, 0 replies; 7+ messages in thread
From: Ferruh Yigit @ 2017-10-06  0:33 UTC (permalink / raw)
  To: Roger B. Melton, jingjing.wu; +Cc: dev

On 10/6/2017 12:42 AM, Roger B. Melton wrote:
> Hi Everyone,
> 
> As soon as I submitted the patch, I realized I neglected to signoff.  
> What's the recommended procedure, resubmit the original signed off, or 
> bump to v1?

Please send a new one increasing version, since this was v1 already next
will be v2.

> 
> Thanks,
> Roger

<...>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-05 19:11 [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode Roger B Melton
  2017-10-05 23:42 ` Roger B. Melton
@ 2017-10-06  8:54 ` Bruce Richardson
  2017-10-06 12:38   ` Roger B. Melton
  1 sibling, 1 reply; 7+ messages in thread
From: Bruce Richardson @ 2017-10-06  8:54 UTC (permalink / raw)
  To: Roger B Melton; +Cc: jingjing.wu, dev

On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
> ---
> 
> i40evf tx vector logic frees mbufs, but it does not remove the
> mbufs from software rings which leads to double frees.  This change
>  corrects that oversight.  We've validated this fix within our application.
> 

Hi Roger,

I'm a little concerned here by this driver fix, since if we are getting
double frees of mbufs there must be another bug somewhere in tracking
the free ring elements. Clearing the entries to NULL introduces extra
writes to the vector code path which will likely have a performance
impact, but also should be unnecessary, given proper tracking of the
ring status.

Can you provide us with some details as to how to reproduce the issue,
especially with a public sample app? I'd really like to look more into
why this is happening, and if other fixes may work.

Thanks,
/Bruce

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-06  8:54 ` Bruce Richardson
@ 2017-10-06 12:38   ` Roger B. Melton
  2017-10-06 13:51     ` Bruce Richardson
  0 siblings, 1 reply; 7+ messages in thread
From: Roger B. Melton @ 2017-10-06 12:38 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: jingjing.wu, dev

On 10/6/17 4:54 AM, Bruce Richardson wrote:
> On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
>> ---
>>
>> i40evf tx vector logic frees mbufs, but it does not remove the
>> mbufs from software rings which leads to double frees.  This change
>>   corrects that oversight.  We've validated this fix within our application.
>>
> Hi Roger,
>
> I'm a little concerned here by this driver fix, since if we are getting
> double frees of mbufs there must be another bug somewhere in tracking
> the free ring elements. Clearing the entries to NULL introduces extra
> writes to the vector code path which will likely have a performance
> impact, but also should be unnecessary, given proper tracking of the
> ring status.
>
> Can you provide us with some details as to how to reproduce the issue,
> especially with a public sample app? I'd really like to look more into
> why this is happening, and if other fixes may work.
>
> Thanks,
> /Bruce
>
> .
>
Hey Bruce,

I've not attempted to reproduce the issue using sample apps.  It was 
initially difficult for us to reproduce with our application until we 
stumbled on a recipe.  The symptoms of the double free are two fold:

  * Application crashes: Corrupted packet headers, walking a chain of rx
    segments and loosing one in the middle,...
  * i40e adminq lockup - Meaning VF sends an OP and PF does not process it

The former has been directly correlated to tx vector mode.  We still 
don't understand how this could lead to adminq lockup, but we have not 
observed the issue applied the patch I submitted.  We have questions out 
to Intel i40e team on this.

Here's a high level view of the scenario under which the issues are 
observed and how we concluded that there were issues with tx vector 
free. We provided this information to the Intel i40e DPDK team, they 
reviewed the tx vector logic and suggested changes.  With the changes 
suggested by Intel (the patch I submitted) we have re-enabled tx vector 
when MTU is < MBUF size and observed no crashes.

Below that you will find additional detail on the procedure within our 
application for changing MTU, including DPDK API calls.

Let me know if you have additional questions.

Regards,
Roger

      o Scenario:

          + Intel i40e-da4 (4x10G) or Cisco branded equivalent
          + KVM or ESXi 6.5 host
          + In a single VM we have at least 1 VF from all 4 PFs.
          + We are running traffic on all interfaces.
          + We stop traffic
          + We stop a device (VF)
          + We start the device
          + We start traffic
          + At any point after we start the device we can crash

      o Experiment (Jumping over some of the work to get to the point
        where we believed that i40e driver was doing double frees):

          + Our application attaches userdata to mbufs and we put that
            userdata on a linked list.
          + We observed that when we processed the userdata it had been
            corrupted which lead to crashes.
          + This gave us a hint that the mbuf was on multiple lists.
          + We reviewed our application code and could not find a cause.
          + We began to suspect a double free in the i40evf PMD.

              # We disabled rx free logic and observed crashes
                (intentionally leaking mbufs in search of the double free).
              # We disabled tx free logic and observed no crashes
              # This gave us a hint that the double frees were coming
                from the i40evf PMD tx logic.

          + We had also observed that if we forced MTU to large always
            that there were no crashes

              # A side effect of forcing large MTU is that multi-segment
                is enabled.
              # This gave us a hint that enabling multi-segment was
                somehow avoiding the double free.

          + We forced multi-segments regardless of MTU and permitted MTU
            changes and observed no crashes.
          + We reviewed the i40evf mbuf free logic to see the effect of
            enabling multi-segment and observed that when multi-segment
            is enabled, rx vector was enabled but tx vector was not.
          + This lead us to examine RX vector mode free logic vs TX
            vector mode free logic.

              # RX free logic has special handling for vector mode free
              # TX free logic does not have any special handling for
                vector free

      o By enabling multiple segments always (even if MTU does not
        require multiple segments), we effectively disabled tx vector
        mode and this avoided the double free.
      o Our application no longer crashed, but it could not take
        advantage of tx vector optimizations.





CP == Control Plane
DP == Data Plane

  * CP sends admin down to DP
  * DP disables RX/TX
      o Block all future tx burst calls
      o rte_eth_dev_set_link_down() invoked
      o Block all future rx burst calls
  * DP notifies CP admin down action is complete
  * CP sends MTU change
  * DP processes MTU change
      o For each rxq:
          + rte_eth_rx_queue_info_get()
          + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
      o For each txq:
          + rte_eth_tx_queue_info_get()
          + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
      o rte_eth_dev_stop()
      o Re-configure the port: (Note this is original code, not new code
        which is forcing multisegs always)
          + Set rx_mode.jumbo_frame if MTU > 1518
          + Set rx_mode.enable_scatter if MTU > 2048
          + txq_flags = ETH_TXQ_NOOFLOADS
          + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
          + rte_eth_promiscuous_get()
          + rte_eth_dev_info_get()
          + rte_eth_dev_configure()
              # Init tx_vec_allowed and rx_vec_allowed to TRUE.
          + rte_eth_dev_info_get()
          + For each txq: rte_eth_tx_queue_setup()
              # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
                txq_flags & tx_vec_allowed will be cleared.
          + For each rxq: rte_eth_rx_queue_setup()
      o rte_eth_dev_set_mtu()
      o rte_eth_dev_start()
      o rte_eth_dev_info_get()
      o For each rxq:
          + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
      o For each txq:
          + if not txq_deferred_start rte_eth_dev_tx_queue_start()
  * DP notifies CP MTU change applied.
  * CP sends admin up to DP
  * DP enables RX/TX
      o Enable all future tx burst calls
      o rte_eth_dev_set_link_up() invoked
      o Enable all future rx burst calls
  * DP notifies CP admin up action is complete



-- 
  ____________________________________________________________________
|Roger B. Melton                |          |      Cisco Systems      |
|CPP Software                  :|:        :|:     7100 Kit Creek Rd  |
|+1.919.476.2332 phone        :|||:      :|||:    RTP, NC 27709-4987 |
|+1.919.392.1094 fax       .:|||||||:..:|||||||:. rmelton@cisco.com  |
|                                                                    |
| This email may contain confidential and privileged material for the|
| sole use of the intended recipient. Any review, use, distribution  |
| or disclosure by others is strictly prohibited. If you are not the |
| intended recipient (or authorized to receive for the recipient),   |
| please contact the sender by reply email and delete all copies of  |
| this message.                                                      |
|                                                                    |
| For corporate legal information go to:                             |
| http://www.cisco.com/web/about/doing_business/legal/cri/index.html |
|__________________________ http://www.cisco.com ____________________|

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-06 12:38   ` Roger B. Melton
@ 2017-10-06 13:51     ` Bruce Richardson
  2017-10-06 14:06       ` Roger B. Melton
  0 siblings, 1 reply; 7+ messages in thread
From: Bruce Richardson @ 2017-10-06 13:51 UTC (permalink / raw)
  To: Roger B. Melton; +Cc: jingjing.wu, dev

On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote:
> On 10/6/17 4:54 AM, Bruce Richardson wrote:
> > On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
> > > ---
> > > 
> > > i40evf tx vector logic frees mbufs, but it does not remove the
> > > mbufs from software rings which leads to double frees.  This change
> > >   corrects that oversight.  We've validated this fix within our application.
> > > 
> > Hi Roger,
> > 
> > I'm a little concerned here by this driver fix, since if we are getting
> > double frees of mbufs there must be another bug somewhere in tracking
> > the free ring elements. Clearing the entries to NULL introduces extra
> > writes to the vector code path which will likely have a performance
> > impact, but also should be unnecessary, given proper tracking of the
> > ring status.
> > 
> > Can you provide us with some details as to how to reproduce the issue,
> > especially with a public sample app? I'd really like to look more into
> > why this is happening, and if other fixes may work.
> > 
> > Thanks,
> > /Bruce
> > 
> > .
> > 
> Hey Bruce,
> 
> I've not attempted to reproduce the issue using sample apps.  It was
> initially difficult for us to reproduce with our application until we
> stumbled on a recipe.  The symptoms of the double free are two fold:
> 
>  * Application crashes: Corrupted packet headers, walking a chain of rx
>    segments and loosing one in the middle,...
>  * i40e adminq lockup - Meaning VF sends an OP and PF does not process it
> 
> The former has been directly correlated to tx vector mode.  We still don't
> understand how this could lead to adminq lockup, but we have not observed
> the issue applied the patch I submitted.  We have questions out to Intel
> i40e team on this.
> 
> Here's a high level view of the scenario under which the issues are observed
> and how we concluded that there were issues with tx vector free. We provided
> this information to the Intel i40e DPDK team, they reviewed the tx vector
> logic and suggested changes.  With the changes suggested by Intel (the patch
> I submitted) we have re-enabled tx vector when MTU is < MBUF size and
> observed no crashes.
> 
> Below that you will find additional detail on the procedure within our
> application for changing MTU, including DPDK API calls.
> 
> Let me know if you have additional questions.
> 
> Regards,
> Roger
> 
>      o Scenario:
> 
>          + Intel i40e-da4 (4x10G) or Cisco branded equivalent
>          + KVM or ESXi 6.5 host
>          + In a single VM we have at least 1 VF from all 4 PFs.
>          + We are running traffic on all interfaces.
>          + We stop traffic
>          + We stop a device (VF)
>          + We start the device
>          + We start traffic
>          + At any point after we start the device we can crash
> 
>      o Experiment (Jumping over some of the work to get to the point
>        where we believed that i40e driver was doing double frees):
> 
>          + Our application attaches userdata to mbufs and we put that
>            userdata on a linked list.
>          + We observed that when we processed the userdata it had been
>            corrupted which lead to crashes.
>          + This gave us a hint that the mbuf was on multiple lists.
>          + We reviewed our application code and could not find a cause.
>          + We began to suspect a double free in the i40evf PMD.
> 
>              # We disabled rx free logic and observed crashes
>                (intentionally leaking mbufs in search of the double free).
>              # We disabled tx free logic and observed no crashes
>              # This gave us a hint that the double frees were coming
>                from the i40evf PMD tx logic.
> 
>          + We had also observed that if we forced MTU to large always
>            that there were no crashes
> 
>              # A side effect of forcing large MTU is that multi-segment
>                is enabled.
>              # This gave us a hint that enabling multi-segment was
>                somehow avoiding the double free.
> 
>          + We forced multi-segments regardless of MTU and permitted MTU
>            changes and observed no crashes.
>          + We reviewed the i40evf mbuf free logic to see the effect of
>            enabling multi-segment and observed that when multi-segment
>            is enabled, rx vector was enabled but tx vector was not.
>          + This lead us to examine RX vector mode free logic vs TX
>            vector mode free logic.
> 
>              # RX free logic has special handling for vector mode free
>              # TX free logic does not have any special handling for
>                vector free
> 
>      o By enabling multiple segments always (even if MTU does not
>        require multiple segments), we effectively disabled tx vector
>        mode and this avoided the double free.
>      o Our application no longer crashed, but it could not take
>        advantage of tx vector optimizations.
> 
> 
> 
> 
> 
> CP == Control Plane
> DP == Data Plane
> 
>  * CP sends admin down to DP
>  * DP disables RX/TX
>      o Block all future tx burst calls
>      o rte_eth_dev_set_link_down() invoked
>      o Block all future rx burst calls
>  * DP notifies CP admin down action is complete
>  * CP sends MTU change
>  * DP processes MTU change
>      o For each rxq:
>          + rte_eth_rx_queue_info_get()
>          + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
>      o For each txq:
>          + rte_eth_tx_queue_info_get()
>          + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
>      o rte_eth_dev_stop()
>      o Re-configure the port: (Note this is original code, not new code
>        which is forcing multisegs always)
>          + Set rx_mode.jumbo_frame if MTU > 1518
>          + Set rx_mode.enable_scatter if MTU > 2048
>          + txq_flags = ETH_TXQ_NOOFLOADS
>          + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
>          + rte_eth_promiscuous_get()
>          + rte_eth_dev_info_get()
>          + rte_eth_dev_configure()
>              # Init tx_vec_allowed and rx_vec_allowed to TRUE.
>          + rte_eth_dev_info_get()
>          + For each txq: rte_eth_tx_queue_setup()
>              # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
>                txq_flags & tx_vec_allowed will be cleared.
>          + For each rxq: rte_eth_rx_queue_setup()
>      o rte_eth_dev_set_mtu()
>      o rte_eth_dev_start()
>      o rte_eth_dev_info_get()
>      o For each rxq:
>          + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
>      o For each txq:
>          + if not txq_deferred_start rte_eth_dev_tx_queue_start()
>  * DP notifies CP MTU change applied.
>  * CP sends admin up to DP
>  * DP enables RX/TX
>      o Enable all future tx burst calls
>      o rte_eth_dev_set_link_up() invoked
>      o Enable all future rx burst calls
>  * DP notifies CP admin up action is complete
> 
> 
> 
Thanks for the explanation.

I'll follow-up with our i40e driver team to see if they can give me more
detail on their investigation. I'd like to know more details about the
ultimate root cause of this double free - presumably it's something to
do with the behaviour when stopping and starting the queue or port - and
find a solution that does not involve more writes on the fast-path. If
we don't come up with a better solution, this fix will do, though.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
  2017-10-06 13:51     ` Bruce Richardson
@ 2017-10-06 14:06       ` Roger B. Melton
  0 siblings, 0 replies; 7+ messages in thread
From: Roger B. Melton @ 2017-10-06 14:06 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: jingjing.wu, dev

On 10/6/17 9:51 AM, Bruce Richardson wrote:
> On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote:
>> On 10/6/17 4:54 AM, Bruce Richardson wrote:
>>> On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
>>>> ---
>>>>
>>>> i40evf tx vector logic frees mbufs, but it does not remove the
>>>> mbufs from software rings which leads to double frees.  This change
>>>>    corrects that oversight.  We've validated this fix within our application.
>>>>
>>> Hi Roger,
>>>
>>> I'm a little concerned here by this driver fix, since if we are getting
>>> double frees of mbufs there must be another bug somewhere in tracking
>>> the free ring elements. Clearing the entries to NULL introduces extra
>>> writes to the vector code path which will likely have a performance
>>> impact, but also should be unnecessary, given proper tracking of the
>>> ring status.
>>>
>>> Can you provide us with some details as to how to reproduce the issue,
>>> especially with a public sample app? I'd really like to look more into
>>> why this is happening, and if other fixes may work.
>>>
>>> Thanks,
>>> /Bruce
>>>
>>> .
>>>
>> Hey Bruce,
>>
>> I've not attempted to reproduce the issue using sample apps.  It was
>> initially difficult for us to reproduce with our application until we
>> stumbled on a recipe.  The symptoms of the double free are two fold:
>>
>>   * Application crashes: Corrupted packet headers, walking a chain of rx
>>     segments and loosing one in the middle,...
>>   * i40e adminq lockup - Meaning VF sends an OP and PF does not process it
>>
>> The former has been directly correlated to tx vector mode.  We still don't
>> understand how this could lead to adminq lockup, but we have not observed
>> the issue applied the patch I submitted.  We have questions out to Intel
>> i40e team on this.
>>
>> Here's a high level view of the scenario under which the issues are observed
>> and how we concluded that there were issues with tx vector free. We provided
>> this information to the Intel i40e DPDK team, they reviewed the tx vector
>> logic and suggested changes.  With the changes suggested by Intel (the patch
>> I submitted) we have re-enabled tx vector when MTU is < MBUF size and
>> observed no crashes.
>>
>> Below that you will find additional detail on the procedure within our
>> application for changing MTU, including DPDK API calls.
>>
>> Let me know if you have additional questions.
>>
>> Regards,
>> Roger
>>
>>       o Scenario:
>>
>>           + Intel i40e-da4 (4x10G) or Cisco branded equivalent
>>           + KVM or ESXi 6.5 host
>>           + In a single VM we have at least 1 VF from all 4 PFs.
>>           + We are running traffic on all interfaces.
>>           + We stop traffic
>>           + We stop a device (VF)
>>           + We start the device
>>           + We start traffic
>>           + At any point after we start the device we can crash
>>
>>       o Experiment (Jumping over some of the work to get to the point
>>         where we believed that i40e driver was doing double frees):
>>
>>           + Our application attaches userdata to mbufs and we put that
>>             userdata on a linked list.
>>           + We observed that when we processed the userdata it had been
>>             corrupted which lead to crashes.
>>           + This gave us a hint that the mbuf was on multiple lists.
>>           + We reviewed our application code and could not find a cause.
>>           + We began to suspect a double free in the i40evf PMD.
>>
>>               # We disabled rx free logic and observed crashes
>>                 (intentionally leaking mbufs in search of the double free).
>>               # We disabled tx free logic and observed no crashes
>>               # This gave us a hint that the double frees were coming
>>                 from the i40evf PMD tx logic.
>>
>>           + We had also observed that if we forced MTU to large always
>>             that there were no crashes
>>
>>               # A side effect of forcing large MTU is that multi-segment
>>                 is enabled.
>>               # This gave us a hint that enabling multi-segment was
>>                 somehow avoiding the double free.
>>
>>           + We forced multi-segments regardless of MTU and permitted MTU
>>             changes and observed no crashes.
>>           + We reviewed the i40evf mbuf free logic to see the effect of
>>             enabling multi-segment and observed that when multi-segment
>>             is enabled, rx vector was enabled but tx vector was not.
>>           + This lead us to examine RX vector mode free logic vs TX
>>             vector mode free logic.
>>
>>               # RX free logic has special handling for vector mode free
>>               # TX free logic does not have any special handling for
>>                 vector free
>>
>>       o By enabling multiple segments always (even if MTU does not
>>         require multiple segments), we effectively disabled tx vector
>>         mode and this avoided the double free.
>>       o Our application no longer crashed, but it could not take
>>         advantage of tx vector optimizations.
>>
>>
>>
>>
>>
>> CP == Control Plane
>> DP == Data Plane
>>
>>   * CP sends admin down to DP
>>   * DP disables RX/TX
>>       o Block all future tx burst calls
>>       o rte_eth_dev_set_link_down() invoked
>>       o Block all future rx burst calls
>>   * DP notifies CP admin down action is complete
>>   * CP sends MTU change
>>   * DP processes MTU change
>>       o For each rxq:
>>           + rte_eth_rx_queue_info_get()
>>           + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
>>       o For each txq:
>>           + rte_eth_tx_queue_info_get()
>>           + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
>>       o rte_eth_dev_stop()
>>       o Re-configure the port: (Note this is original code, not new code
>>         which is forcing multisegs always)
>>           + Set rx_mode.jumbo_frame if MTU > 1518
>>           + Set rx_mode.enable_scatter if MTU > 2048
>>           + txq_flags = ETH_TXQ_NOOFLOADS
>>           + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
>>           + rte_eth_promiscuous_get()
>>           + rte_eth_dev_info_get()
>>           + rte_eth_dev_configure()
>>               # Init tx_vec_allowed and rx_vec_allowed to TRUE.
>>           + rte_eth_dev_info_get()
>>           + For each txq: rte_eth_tx_queue_setup()
>>               # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
>>                 txq_flags & tx_vec_allowed will be cleared.
>>           + For each rxq: rte_eth_rx_queue_setup()
>>       o rte_eth_dev_set_mtu()
>>       o rte_eth_dev_start()
>>       o rte_eth_dev_info_get()
>>       o For each rxq:
>>           + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
>>       o For each txq:
>>           + if not txq_deferred_start rte_eth_dev_tx_queue_start()
>>   * DP notifies CP MTU change applied.
>>   * CP sends admin up to DP
>>   * DP enables RX/TX
>>       o Enable all future tx burst calls
>>       o rte_eth_dev_set_link_up() invoked
>>       o Enable all future rx burst calls
>>   * DP notifies CP admin up action is complete
>>
>>
>>
> Thanks for the explanation.
>
> I'll follow-up with our i40e driver team to see if they can give me more
> detail on their investigation. I'd like to know more details about the
> ultimate root cause of this double free - presumably it's something to
> do with the behaviour when stopping and starting the queue or port - and
> find a solution that does not involve more writes on the fast-path. If
> we don't come up with a better solution, this fix will do, though.
>
> Regards,
> /Bruce
> .
>

Thanks Bruce,  I'll go ahead and submit V2 properly signed off with same 
content.

Regards,
Roger

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-10-06 14:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-05 19:11 [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode Roger B Melton
2017-10-05 23:42 ` Roger B. Melton
2017-10-06  0:33   ` Ferruh Yigit
2017-10-06  8:54 ` Bruce Richardson
2017-10-06 12:38   ` Roger B. Melton
2017-10-06 13:51     ` Bruce Richardson
2017-10-06 14:06       ` Roger B. Melton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.