All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Viktorin <viktorin@cesnet.cz>
To: Ori Kam <orika@nvidia.com>
Cc: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Havlík Martin" <xhavli56@stud.fit.vutbr.cz>,
	"Min Hu (Connor)" <humin29@huawei.com>,
	dev@dpdk.org, "Matan Azrad" <matan@nvidia.com>,
	Chas3 <chas3@att.com>, "Shahaf Shuler" <shahafs@nvidia.com>,
	"Slava Ovsiienko" <viacheslavo@nvidia.com>,
	"Ferruh Yigit" <ferruh.yigit@intel.com>,
	"Thomas Monjalon" <thomas@monjalon.net>
Subject: Re: [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow
Date: Tue, 13 Jul 2021 13:06:09 +0200	[thread overview]
Message-ID: <20210713130609.00475428@coaster.localdomain> (raw)
In-Reply-To: <f1ef05bd-0d07-5598-6d8e-55b30b811b6c@oktetlabs.ru>

On Tue, 13 Jul 2021 12:26:35 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:

> On 7/13/21 11:18 AM, Havlík Martin wrote:
> > Dne 2021-07-12 15:07, Ori Kam napsal:  
> >> Hi Jan,
> >>  
> >>> -----Original Message-----
> >>> From: Jan Viktorin <viktorin@cesnet.cz>
> >>> Sent: Monday, July 12, 2021 12:46 AM
> >>>
> >>> On Sun, 11 Jul 2021 08:49:18 +0000
> >>> Ori Kam <orika@nvidia.com> wrote:
> >>>  
> >>> > Hi Jan,  
> >>>
> >>> Hi Ori,
> >>>  
> >>> >
> >>> >  
> >>> > > -----Original Message-----
> >>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> >>> > >
> >>> > > On Sun, 4 Jul 2021 15:18:01 +0000
> >>> > > Matan Azrad <matan@nvidia.com> wrote:
> >>> > >  
> >>> > > > From: Havlík Martin  
> >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> >>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to
> >>> > > > > >> install RTE Flows if the underlying ethdev is not
> >>> > > > > >> started: bond_ethdev_8023ad_flow_set(267) -  
> >>> > > bond_ethdev_8023ad_flow_set:  
> >>> > > > > port  
> >>> > > > > >> not started (slave_port=0 queue_id=1)
> >>> > > > > >>  
> >>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
> >>> > > > > > I noticed other PMDs did not has that reliance.
> >>> > > > > >  
> >>> > > > > After looking into it, I really can't answer this mlx5
> >>> > > > > centered question. Closest related info we found so far
> >>> > > > > is the 5th point in
> >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> >>> > > > > but it only specifies it's the application's
> >>> > > > > responsibility and that flow rules are assumed invalid
> >>> > > > > after port stop/close/restart but doesn't say anything
> >>> > > > > about <stop - flow rule create - start> vs <stop - start
> >>> > > > > - flow rule create> where the former is the point of
> >>> > > > > failure on mlx5. I'm addressing the maintainers for mlx5,
> >>> > > > > who might know a bit more on the topic.  
> >>> > > >  
> >>> > >
> >>> > > Hello Matan,
> >>> > >  
> >>> > > > From rte_ethdev.h
> >>> > > >
> >>> > > > * Please note that some configuration is not stored between
> >>> > > > calls to
> >>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> >>> > > > configuration will
> >>> > > >  * be retained:
> >>> > > >  *
> >>> > > >  *     - MTU
> >>> > > >  *     - flow control settings
> >>> > > >  *     - receive mode configuration (promiscuous mode,  
> >>> all-multicast  
> >>> > > > mode,
> >>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> >>> > > >  *     - VLAN filtering configuration
> >>> > > >  *     - default MAC address
> >>> > > >  *     - MAC addresses supplied to MAC address array
> >>> > > >  *     - flow director filtering mode (but not filtering
> >>> > > >rules)
> >>> > > >  *     - NIC queue statistics mappings  
> >>> > >
> >>> > > just after this section, you can find the following statement:
> >>> > >
> >>> > >  * Any other configuration will not be stored and will need
> >>> > >to be
> >>> > > re-entered
> >>> > >  * before a call to rte_eth_dev_start().
> >>> > >
> >>> > > It is not very clear how is this exactly related to flows
> >>> > > (and this applies for all the quoted section, I think) but at
> >>> > > least it can  
> >>> be used as a
> >>> counter argument.  
> >>> > >  
> >>> > I agree the doc is not clear, as I see it flows are not part of
> >>> > configuration, at least not when we are talking about rte_flow.
> >>> >  
> >>>
> >>> Agree.
> >>>  
> >>> >  
> >>> > > >
> >>> > > >
> >>> > > > Mlx5 assumes flows are allowed to be configured only after
> >>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is
> >>> > > > valid anymore.  
> >>> > >
> >>> > > I believe that this discussion is not about validity of
> >>> > > flows. Let the flows be invalid after calling to
> >>> > > rte_eth_dev_stop(). This is OK, flows must be recreated and
> >>> > > the bonding driver works this way. But why not *before
> >>> > > start*?  
> >>> > Think about it this way by changing the configuration you may
> >>> > create invalid flows, for example, you can only change the
> >>> > number of queues after port stop, so if you create a flow with
> >>> > jump to queue 3 and then you remove queue 3 then, the flow that
> >>> > is cached is not valid anymore. This goes for other
> >>> > configuration that may affect the validity of a  
> >>> flow.
> >>>
> >>> This is a valid argument, of course. The thing is whether it is a
> >>> responsibility
> >>> of the PMD to take care of those corner cases or if this is up to
> >>> the application developer. If we respect the fact that calling to
> >>> stop invalidates all
> >>> flows then when you do:
> >>>  
> >>>  > port stop 0
> >>>  > flow create 0 ingress pattern ... / end actions queue 6 / end
> >>> > port config
> >>> rxq 3  > port start 0
> >>>
> >>> it's clear that something is really wrong with the caller/user. I
> >>> would say that
> >>> this is an application bug. I would expect that you first
> >>> reconfigure port and
> >>> after that you modify flows. It seems quite logical and intuitive,
> >>> doesn't it?
> >>>  
> >>
> >> I agree the use case I described is bad programming, but the same
> >> logic can be
> >> applied on stopping the port why flows should be deleted?

I just took into account what Matan has written. Certainly, it is not
necessary to delete flows on stop.

> >> And even more important on some HW and some flows can only be
> >> inserted after
> >> start since there is no real traffic gate, basically the start
> >> means adding the flows
> >> if a flow is in the HW it is working, adding a gate will result in
> >> PPS degradation.

What do you mean by term "gate", here? Some FPGA logic? Or something
else? Why "adding a gate" would result in PPS degradation? Can you
make those ideas a bit clearer?

> >>  
> >>> Anyway, the PMD can possibly catch this rxq inconsistency but I
> >>> can imagine
> >>> that there are more complicated sitations than just changing
> >>> count of queues. Any idea for a more complex real case?
> >>>  
> >>
> >> Some of the resource may be created only after start,
> >> But I don't have a very good example now.

Please, try to find some reasonable example otherwise your argumentation
does not make much sense...

> >>  
> >>> Martin Havlik, can you please check how ixgbe and i40e behave in
> >>> the case
> >>> above with "rxq change after flow created"?
> >>>  
> > Both PMDs do not raise any errors in the mentioned case. The packets
> > that should go to the non-existent queue are dropped at some point
> > in the processing (from what I could see and understand).

Thanks for clarification. It would be nice if any other PMD can be
tested for this behaviour. Can sombody help us with other NICs?

> > 
> > I'm glad for the discussion that is taking place but I feel we have
> > strayed from the original reason of it. That being the inability to
> > enable dedicated queues for bonding mode 8023AD on mlx5. We have
> > concluded that flow creation has to be done after port/device
> > start. So moving the function call to create the flow after device
> > start should solve the issue
> > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> > testing, flow create after start is not a problem on Intel PMDs.  
> 
> It would be good to hear what net/bonding maintainers think
> about it. I see no conclusion.
> 
> If net/mlx5 behaviour is fixed, will it fix the initial
> problem?

I believe that if mlx5 allows to create flows before calling to
rte_eth_dev_start() then the bonding would work as expected.

> 
> > 
> > I'm turning to you, Connor, as you have made the original question
> > on this. Is the patch I presented applicable?
> > 
> > Martin  
> >>> >  
> >>> > > Does somebody know how other drivers behaves in this
> >>> > > situation? (We know and can check for Intel, there it does
> >>> > > not seem to be an issue.)
> >>> > >
> >>> > > By the way, the mlx5 behaviour opens a (probably short) time
> >>> > > window between starting of a port and configuation of
> >>> > > filtering flows. You may want to start the port with
> >>> > > thousands of flows that apply just when the port starts (not
> >>> > > after, that's late). This may introduce glitches in filtering
> >>> > > and measuring of traffic (well, it is a  
> >>> question how
> >>> serious issue could it be...).  
> >>> > >  
> >>> > Agree but this is always true nothing is done is zero time and
> >>> > even if it was the insertion is not in zero time, (assuming
> >>> > that even if the flows are stored by the PMD until start they
> >>> > still will not all be inserted in the same time)  
> >>>
> >>> Sorry, I probably do not understand well. Yes, creation of a flow
> >>> would never
> >>> be zero time, agree. But if I create flows before the port starts,
> >>> than I do not
> >>> have to care too much about how long the creation takes. Because
> >>> the card is
> >>> expected to be configured already before pushing the "start
> >>> button".
> >>>
> >>> Maybe, you assume that the created flows would first be cached
> >>> inside the
> >>> PMD and when the port is finally started, it than transparently
> >>> write the
> >>> queues to the HW. But this is not what I was talking about, that's
> >>> maybe even
> >>> worse because you are hiding such behaviour from users...
> >>>
> >>> (I do not know the exact mlx5 implementation details, so my
> >>> answer can miss something, sorry for that.)
> >>>  
> >>
> >> You are correct this is HW/PMD implementation issue, in case of
> >> Nvidia we must

By "must" you probably mean "would have to"? Is that correct?

> >> cache the flows and only insert it after start.
> >> Since RTE flow is generic and HW is not generic sometimes the PMD
> >> needs to
> >> translate this difference and in this gray area there is a game
> >> between best API best performance understanding what the user

Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is
a difficult beast.

Jan

> >> really wants. This is why the doc should be very clear on what is
> >> the expected. but this is also the reason why such document is
> >> very hard to agree on.  
> 
> Do I understand correctly that net/mlx5 maintainers will follow
> up?
> 
> >>
> >> Ori  
> >>> >  
> >>> > > This matters for the bonding case as well, doesn't it?. It is
> >>> > > not desirable to accidently omit a packet that was received
> >>> > > by primary ingress logic instead of being redirected into the
> >>> > > dedicated queue.
> >>> > >
> >>> > > Are there any chances that for mlx5 it would be possible to
> >>> > > insert flow rules before calling rte_eth_dev_start? Anyway,
> >>> > > the behaviour should be specified and documented in DPDK more
> >>> > > precisely to avoid such uncertainty in the future.
> >>> > >  
> >>> > I agree the documentation should be fixed.  
> >>>
> >>> +1  
> 
> Cc Thomas and Ferruh since ethdev documentation should be
> clarified.
> 
> It looks like there is no consensus if the patch is a right
> direction or wrong. For me it looks wrong taking all above
> arguments in to account (mainly necessity to be able to
> insert flows before pushing start button which enables the
> traffic if HW supports it).
> 
> So, I'm applying first two patches and hold on this one.


  reply	other threads:[~2021-07-13 11:06 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-22  9:25 [dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5 Martin Havlik
2021-06-22  9:25 ` [dpdk-dev] [PATCH 1/3] net/bonding: fix proper return value check and correct log message Martin Havlik
2021-07-08 12:33   ` Min Hu (Connor)
2021-07-13  9:26     ` Andrew Rybchenko
2021-06-22  9:25 ` [dpdk-dev] [PATCH 2/3] net/bonding: fix not checked return value Martin Havlik
2021-07-08 12:43   ` Min Hu (Connor)
2021-07-08 13:20     ` Havlík Martin
2021-07-09  0:09       ` Min Hu (Connor)
2021-07-13  9:26         ` Andrew Rybchenko
2021-06-22  9:25 ` [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow Martin Havlik
2021-06-23  7:04   ` Min Hu (Connor)
2021-06-24 11:57     ` Havlík Martin
2021-07-04 15:18       ` Matan Azrad
2021-07-07 15:54         ` Jan Viktorin
2021-07-11  8:49           ` Ori Kam
2021-07-11 21:45             ` Jan Viktorin
2021-07-12 13:07               ` Ori Kam
2021-07-13  8:18                 ` Havlík Martin
2021-07-13  9:26                   ` Andrew Rybchenko
2021-07-13 11:06                     ` Jan Viktorin [this message]
2021-07-13 17:17                       ` Ori Kam
2021-07-14 15:00                         ` Jan Viktorin
2021-07-15 13:58                           ` Thomas Monjalon
2021-08-24 13:18                             ` Ferruh Yigit
2021-08-26 10:15                               ` Jan Viktorin
2021-07-04 15:03 ` [dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5 Andrew Rybchenko
2021-09-25  6:15 [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow Havlík Martin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210713130609.00475428@coaster.localdomain \
    --to=viktorin@cesnet.cz \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=humin29@huawei.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    --cc=xhavli56@stud.fit.vutbr.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.