* mlx5 flow create/destroy behaviour
From: Legacy, Allain @ 2017-03-28 12:42 UTC
  To: Nelio Laranjeiro (nelio.laranjeiro@6wind.com),
	Adrien Mazarguil (adrien.mazarguil@6wind.com)
  Cc: dev, Peters, Matt

Hi,
I am setting up an experiment to gauge the usability of the flow API and the flow marking behavior of the CX4.  I am working from v17.02.  I am seeing some unpredictable behavior and am unsure of the cause.

This is the layout of the test:
   
   2 x CX4 (15b3:1015) 
      + 1 port used on each
   A test application with 1 core, and 1 queue/port
   Traffic generator attached to each port
      + 500 unique src+dst MAC address combinations sent from each port
      + All traffic is VLAN tagged (1 VLAN per port)

The test application examines packets as they are received on each port.  It sets up flow rules and calls rte_flow_create() for each new layer2 flow that it observes.  The flow patterns are of the form ETH+VLAN+END where ETH matches src+dst+type=vlan, VLAN matches the port's VLAN ID.  The flow actions are of the form MARK+QUEUE+END where MARK assigns a unique integer to each flow, and QUEUE assigns the flow to queue_id=0 (since the test app only has 1 queue per port).
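
For reference, a rough sketch of how one such rule could be built with the 17.02 rte_flow C API (install_l2_flow is a hypothetical helper, not our actual code; error handling trimmed):

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_flow.h>

static struct rte_flow *
install_l2_flow(uint8_t port_id, const struct ether_addr *src,
		const struct ether_addr *dst, uint16_t vid, uint32_t mark_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_eth eth_spec = {
		.type = rte_cpu_to_be_16(ETHER_TYPE_VLAN), /* 0x8100 */
	};
	struct rte_flow_item_vlan vlan_spec = {
		.tci = rte_cpu_to_be_16(vid & 0x0fff),
	};
	struct rte_flow_item_vlan vlan_mask = {
		.tci = rte_cpu_to_be_16(0x0fff), /* match the VID bits only */
	};
	struct rte_flow_action_mark mark = { .id = mark_id };
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_error err;

	ether_addr_copy(src, &eth_spec.src);
	ether_addr_copy(dst, &eth_spec.dst);

	/* A NULL mask selects the default mask (rte_flow_item_eth_mask). */
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_spec },
		{ .type = RTE_FLOW_ITEM_TYPE_VLAN,
		  .spec = &vlan_spec, .mask = &vlan_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, &err);
}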

Once the flows are set up, the application then checks that ingress packets are properly marked with the intended unique integer specified in the MARK action.
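
The check reads the mark back from the mbuf, roughly (flag and field names as of 17.02):

#include <stdint.h>
#include <rte_mbuf.h>

/* Return the MARK id carried by a received packet, or UINT32_MAX if the
 * hardware did not mark it.  With the MARK action, the id is reported in
 * m->hash.fdir.hi and PKT_RX_FDIR_ID is set in ol_flags. */
static uint32_t
pkt_mark_id(const struct rte_mbuf *m)
{
	if (m->ol_flags & PKT_RX_FDIR_ID)
		return m->hash.fdir.hi;
	return UINT32_MAX;
}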

The traffic is run for a short period of time and then stopped.  Once the traffic is stopped the application removes the flow rules by calling rte_flow_destroy().    There is no guarantee that the order of the destroys resembles in any way the order of the creates.   (I mention this because of this warning in rte_flow.h:  "This function is only guaranteed to succeed if handles are destroyed in reverse order of their creation.").   All of the calls to rte_flow_destroy() succeed. 
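
The teardown itself is just a loop over the stored handles (destroy_all_flows is a hypothetical sketch):

#include <stdio.h>
#include <rte_flow.h>

/* flows[] holds the handles returned by rte_flow_create(); the order here
 * is whatever order the table happens to be in, i.e. not necessarily the
 * reverse of the creation order. */
static void
destroy_all_flows(uint8_t port_id, struct rte_flow **flows, unsigned int n)
{
	struct rte_flow_error err;
	unsigned int i;

	for (i = 0; i != n; ++i)
		if (rte_flow_destroy(port_id, flows[i], &err) != 0)
			printf("flow %u destroy failed: %s\n", i,
			       err.message ? err.message : "(no message)");
}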

When I run this test after the NIC has been reset there are no issues.  All calls to rte_flow_create()/rte_flow_destroy() succeed and all packets have a valid mark ID that corresponds to the unique integer assigned to that src+dst+vlan grouping.    

The problem happens when I run this test for a second or third time without first resetting the NIC.  On subsequent test runs I still see no errors in create/destroy API calls but packets are no longer marked by the hardware.  In some test runs none of the flows have valid mark id values, and other test runs have some percentage of flows with valid mark id values while others do not.   The behavior seems inconsistent but if I reset the NIC the behavior goes back to working for 1 test run and then starts behaving incorrectly again on subsequent runs.

I should note that in subsequent test runs the MAC addresses are the same as in previous runs, but the mapping from unique integer to src+dst+vlan is different each time.

Is this behavior consistent with your experience using the device and/or API?


Regards,
Allain


Allain Legacy, Software Developer
direct 613.270.2279  fax 613.492.7870 skype allain.legacy
 


* Re: mlx5 flow create/destroy behaviour
From: Nélio Laranjeiro @ 2017-03-28 15:36 UTC
  To: Legacy, Allain
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

Hi Allain,

My attempt to reproduce it failed; maybe I missed something.  Please see
below.

On Tue, Mar 28, 2017 at 12:42:05PM +0000, Legacy, Allain wrote:
> Hi,
> I am setting up an experiment to gauge the usability of the flow API
> and the flow marking behavior of the CX4.   I am working from v17.02.
> I am seeing some unpredictable behavior and am unsure of the cause.
> 
> This is the layout of the test:
>    
>    2 x CX4 (15b3:1015) 
>       + 1 port used on each
>    A test application with 1 core, and 1 queue/port
>    Traffic generator attached to each port
>       + 500 unique src+dst MAC address combinations sent from each port
>       + All traffic is VLAN tagged (1 VLAN per port)
> 
> The test application examines packets as they are received on each
> port.  It sets up flow rules and calls rte_flow_create() for each new
> layer2 flow that it observes.    The flow patterns are of the form
> ETH+VLAN+END where ETH matches src+dst+type=vlan, VLAN matches the
> port's VLAN ID.  The flow actions are of the form MARK+QUEUE+END where
> MARK assigns a unique integer to each flow, and QUEUE assigns the
> flow to queue_id=0 (since the test app only has 1 queue per port).

If I understand correctly, your application is adding 500 rules like:

 flow create 0 ingress pattern eth src is <smac> dst is <dmac> / vlan vid is <vid> / end action mark id is <id> / queue index 0 / end

> Once the flows are set up, the application then checks that ingress
> packets are properly marked with the intended unique integer specified
> in the MARK action.

Is it sending packets to verify this?

> The traffic is run for a short period of time and then stopped.  Once
> the traffic is stopped the application removes the flow rules by
> calling rte_flow_destroy().    There is no guarantee that the order of
> the destroys resembles in any way the order of the creates.   (I
> mention this because of this warning in rte_flow.h:  "This function is
> only guaranteed to succeed if handles are destroyed in reverse order
> of their creation.").   All of the calls to rte_flow_destroy()
> succeed. 
> 
> When I run this test after the NIC has been reset there are no issues.

What do you mean by "reset"?

> All calls to rte_flow_create()/rte_flow_destroy() succeed and all
> packets have a valid mark ID that corresponds to the unique integer
> assigned to that src+dst+vlan grouping.

In the mlx5 PMD, rte_flow_destroy() always returns success, as the
destruction should never fail.
Can you compile in debug mode (by setting CONFIG_RTE_LIBRTE_MLX5_DEBUG
to "y")?  Then you should see as many prints for created rules as for
destroyed ones.

> The problem happens when I run this test for a second or third time
> without first resetting the NIC.  On subsequent test runs I still see
> no errors in create/destroy API calls but packets are no longer marked
> by the hardware.  In some test runs none of the flows have valid mark
> id values, and other test runs have some percentage of flows with
> valid mark id values while others do not.   The behavior seems
> inconsistent but if I reset the NIC the behavior goes back to working
> for 1 test run and then starts behaving incorrectly again on
> subsequent runs.
> 
> I should note that in subsequent test runs the MAC addresses are the
> same as in previous runs, but the mapping from unique integer to
> src+dst+vlan is different each time.
> 
> Is this behavior consistent with your experience using the device
> and/or API?

No, I did not face such an issue; the behavior was consistent, but I never
tried to create so many rules in the past.


Thanks,

-- 
Nélio Laranjeiro
6WIND


* Re: mlx5 flow create/destroy behaviour
From: Legacy, Allain @ 2017-03-28 16:16 UTC
  To: Nélio Laranjeiro
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Tuesday, March 28, 2017 11:36 AM
<..> 
> If I understand correctly, your application is adding 500 rules like:
> 
>  flow create 0 ingress pattern eth src is <smac> dst is <dmac> / vlan vid is
> <vid> / end action mark id is <id> / queue index 0 / end
>

Almost... the only difference is that the ETH pattern also checks for type=0x8100

> > Once the flows are set up, the application then checks that ingress
> > packets are properly marked with the intended unique integer specified
> > in the MARK action.
> 
> Is it sending packets to verify this?

The traffic generator continues to send packets during the test.   Once all flow rules have been created the application expects further ingress packets will be marked with the unique ID.

 
> > When I run this test after the NIC has been reset there are no issues.
> 
> What do you mean by "reset"?

The DPDK test application is quit and restarted; therefore the NIC is re-probed, configured, started, etc.  It seems like this cleans up whatever problem is resident in the NIC that is causing the new flow rules to not work properly.

 
> In the mlx5 PMD, rte_flow_destroy() always returns success, as the
> destruction should never fail.
> Can you compile in debug mode (by setting
> CONFIG_RTE_LIBRTE_MLX5_DEBUG to "y")?  Then you should see as many
> prints for created rules as for destroyed ones.
 
I can give that a try.  


* Re: mlx5 flow create/destroy behaviour
From: Nélio Laranjeiro @ 2017-03-29  9:45 UTC
  To: Legacy, Allain
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

Hi Allain,

Please see below.

On Tue, Mar 28, 2017 at 04:16:08PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Tuesday, March 28, 2017 11:36 AM
> <..> 
> > If I understand correctly, your application is adding 500 rules like:
> > 
> >  flow create 0 ingress pattern eth src is <smac> dst is <dmac> / vlan vid is
> > <vid> / end action mark id is <id> / queue index 0 / end
> >
> 
> Almost... the only difference is that the ETH pattern also checks for type=0x8100

Ethernet type matching was not supported in DPDK 17.02; it was submitted
later, in March [1].  Did you embed the patch in your test?

> > > Once the flows are set up, the application then checks that ingress
> > > packets are properly marked with the intended unique integer specified
> > > in the MARK action.
> > 
> > Is it sending packets to verify this?
> 
> The traffic generator continues to send packets during the test.
> Once all flow rules have been created the application expects further
> ingress packets will be marked with the unique ID.
>  
> > > When I run this test after the NIC has been reset there are no issues.
> > 
> > What do you mean by "reset"?
> 
> The DPDK test application is quit and restarted; therefore the NIC is
> re-probed, configured, started, etc.  It seems like this cleans up
> whatever problem is resident in the NIC that is causing the new flow
> rules to not work properly.
> 
>  
> > In the mlx5 PMD, rte_flow_destroy() always returns success, as the
> > destruction should never fail.
> > Can you compile in debug mode (by setting
> > CONFIG_RTE_LIBRTE_MLX5_DEBUG to "y")?  Then you should see as many
> > prints for created rules as for destroyed ones.
>  
> I can give that a try.

Thanks,

[1] http://dpdk.org/ml/archives/dev/2017-March/058722.html

-- 
Nélio Laranjeiro
6WIND


* Re: mlx5 flow create/destroy behaviour
From: Legacy, Allain @ 2017-03-29 12:29 UTC
  To: Nélio Laranjeiro
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Wednesday, March 29, 2017 5:45 AM

<...>
> > Almost... the only difference is that the ETH pattern also checks for
> type=0x8100
> 
> Ethernet type matching was not supported in DPDK 17.02; it was submitted
> later, in March [1].  Did you embed the patch in your test?

No, but I am using the default eth mask (rte_flow_item_eth_mask), so it looks like any ether type is accepted even though I set the vlan type along with the src+dst.
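
For reference, the default mask covers both MAC addresses but leaves the type field zeroed, roughly (sketched from the 17.02 header, from memory):

static const struct rte_flow_item_eth rte_flow_item_eth_mask = {
	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
	.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
	.type = 0x0000, /* ether type is not matched by default */
};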


> > > Can you compile in debug mode (by setting
> > > CONFIG_RTE_LIBRTE_MLX5_DEBUG to "y")?  Then you should see as many
> > > prints for created rules as for destroyed ones.
> >
> > I can give that a try.

I ran with debug logs enabled and there are no logs coming from the PMD that indicate an error.  All create and destroy calls report a successful result. 

I modified my test slightly yesterday to try to determine what is happening.  What I found is that if I use a smaller number of flows the problem does not happen, but as soon as I use 256 flows or greater the problem manifests itself.  What I mean is:

test 1:
   1) start 16 flows (16 unique src MAC addresses sending to 16 unique dst MAC addresses)
   2) create flow rules
   3) check that all subsequent packets are marked correctly
   4) stop traffic
   5) destroy all flow rules
   6) wait 15 seconds
   7) repeat from (1) for 4 iterations.

test 2:
   same as test1 but with 32 flows

test 3:
   same as test1 but with 64 flows

test 4:
   same as test1 but with 128 flows

test 5:
   same as test1 but with 256 flows (this is where the problem starts happening)... it could very well be somewhere closer to 128 but I am stepping up by powers of 2 so this is the first occurrence. 


I also modified my test to destroy flow rules in the opposite order that I created them just in case ordering is an issue but that had no effect. 

Regards,
Allain

Allain Legacy, Software Developer, Wind River
direct 613.270.2279 fax: 613.492.7870 skype: allain.legacy
350 Terry Fox Drive, Suite 200, Ottawa, Ontario, K2K 2W5


* Re: mlx5 flow create/destroy behaviour
From: Nélio Laranjeiro @ 2017-03-30 13:03 UTC
  To: Legacy, Allain
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

Hi Allain,

On Wed, Mar 29, 2017 at 12:29:59PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Wednesday, March 29, 2017 5:45 AM
> 
> <...>
> > > Almost... the only difference is that the ETH pattern also checks for
> > type=0x8100
> > 
> > Ethernet type matching was not supported in DPDK 17.02; it was submitted
> > later, in March [1].  Did you embed the patch in your test?
> 
> No, but I am using the default eth mask (rte_flow_item_eth_mask), so it
> looks like any ether type is accepted even though I set the vlan type
> along with the src+dst.

Right,

> > > > Can you compile in debug mode (by setting
> > > > CONFIG_RTE_LIBRTE_MLX5_DEBUG to "y")?  Then you should see as many
> > > > prints for created rules as for destroyed ones.
> > >
> > > I can give that a try.
> 
> I ran with debug logs enabled and there are no logs coming from the
> PMD that indicate an error.  All create and destroy calls report a
> successful result. 
> 
> I modified my test slightly yesterday to try to determine what is
> happening.  What I found is that if I use a smaller number of flows the
> problem does not happen, but as soon as I use 256 flows or greater the
> problem manifests itself.   What I mean is:
> 
> test 1:
>    1) start 16 flows (16 unique src MAC addresses sending to 16 unique dst MAC addresses)
>    2) create flow rules
>    3) check that all subsequent packets are marked correctly
>    4) stop traffic
>    5) destroy all flow rules
>    6) wait 15 seconds
>    7) repeat from (1) for 4 iterations.
> 
> test 2:
>    same as test1 but with 32 flows
> 
> test 3:
>    same as test1 but with 64 flows
> 
> test 4:
>    same as test1 but with 128 flows
> 
> test 5:
>    same as test1 but with 256 flows (this is where the problem starts
>    happening)... it could very well be somewhere closer to 128 but I
>    am stepping up by powers of 2 so this is the first occurrence. 
>
> I also modified my test to destroy flow rules in the opposite order
> that I created them just in case ordering is an issue but that had no
> effect. 

I found an issue in the id retrieval while receiving a high rate of the
same flow [1].  You may be facing the same issue.  Can you verify with the
patch?

Thanks,

[1] http://dpdk.org/dev/patchwork/patch/22897/

-- 
Nélio Laranjeiro
6WIND


* Re: mlx5 flow create/destroy behaviour
From: Legacy, Allain @ 2017-03-30 16:53 UTC
  To: Nélio Laranjeiro
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, March 30, 2017 9:03 AM
<...> 
> I found an issue in the id retrieval while receiving a high rate of the
> same flow [1].  You may be facing the same issue.  Can you verify with
> the patch?
> 
> Thanks,
> 
> [1] http://dpdk.org/dev/patchwork/patch/22897/

I had some difficulty applying that patch onto v17.02 so I took all of the patches to the mlx5 driver that are in dpdk-next-net just to be sure I had all other outstanding fixes. 

The behavior did not change.  I still see flows that are not marked even after a flow rule has been created to match on that particular flow.  It seems like it works in batches... 10-20 flows will work, and then the next 10-20 flows won't work, and then the next 10-20 flows will work.  But, in all cases I have logs that show that the flow rules were created properly for all flows, and destroyed properly at the end of each test.  It seems pretty consistent that the first test after a NIC reset always works on all flows, but then subsequent tests see variable results.  Every so often I get another test run that has no issues, but then the failure pattern resumes on the next attempt.


* Re: mlx5 flow create/destroy behaviour
From: Nélio Laranjeiro @ 2017-03-31  8:34 UTC
  To: Legacy, Allain, Olga Shern
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

On Thu, Mar 30, 2017 at 04:53:47PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Thursday, March 30, 2017 9:03 AM
> <...> 
> > I found an issue in the id retrieval while receiving a high rate of the
> > same flow [1].  You may be facing the same issue.  Can you verify with
> > the patch?
> > 
> > Thanks,
> > 
> > [1] http://dpdk.org/dev/patchwork/patch/22897/
> 
> I had some difficulty applying that patch onto v17.02 so I took all of
> the patches to the mlx5 driver that are in dpdk-next-net just to be
> sure I had all other outstanding fixes. 
> 
> The behavior did not change.  I still see flows that are not marked
> even after a flow rule has been created to match on that particular
> flow.    It seems like it works in batches... 10-20 flows will work,
> and then the next 10-20 flows won't work, and then the next 10-20
> flows will work.   But, in all cases I have logs that show that the
> flow rules were created properly for all flows, and destroyed properly
> at the end of each test.    It seems pretty consistent that the first
> test after a NIC reset always works on all flows, but then subsequent
> tests see variable results.  Every so often I get another test run
> that has no issues, but then the failure pattern resumes on the next
> attempt.

+ Olga Shern,

Allain,

Thanks for all these tests.  Given this last point, it seems to be a
firmware or hardware issue; I don't have any way to help in that case.

I suggest you contact Mellanox directly for support on this.

Thanks,

-- 
Nélio Laranjeiro
6WIND


* Re: mlx5 flow create/destroy behaviour
From: Legacy, Allain @ 2017-03-31 13:16 UTC
  To: Nélio Laranjeiro, Olga Shern
  Cc: Adrien Mazarguil (adrien.mazarguil@6wind.com), dev, Peters, Matt

> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Friday, March 31, 2017 4:35 AM
<...>
> + Olga Shern,
> 
> Allain,
> 
> Thanks for all these tests.  Given this last point, it seems to be a
> firmware or hardware issue; I don't have any way to help in that case.
> 
> I suggest you contact Mellanox directly for support on this.
Ok, we will do that. 

I was able to reproduce this using testpmd rather than our own application.  If I send you the commands (off list) that I used and the steps to reproduce, would you be willing to confirm that our exact setup works for you?

Regards,
Allain


* Re: mlx5 flow create/destroy behaviour
From: Nélio Laranjeiro @ 2017-03-31 13:34 UTC
  To: Legacy, Allain
  Cc: Olga Shern, Adrien Mazarguil (adrien.mazarguil@6wind.com),
	dev, Peters, Matt

On Fri, Mar 31, 2017 at 01:16:51PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Friday, March 31, 2017 4:35 AM
> <...>
> > + Olga Shern,
> > 
> > Allain,
> > 
> > Thanks for all these tests.  Given this last point, it seems to be a
> > firmware or hardware issue; I don't have any way to help in that case.
> > 
> > I suggest you contact Mellanox directly for support on this.
> Ok, we will do that. 
> 
> I was able to reproduce this using testpmd rather than our own
> application.  If I send you the commands (off list) that I used and
> the steps to reproduce, would you be willing to confirm that our exact
> setup works for you?

Yes, please send it.

Thanks,

-- 
Nélio Laranjeiro
6WIND

