From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Hanoch Haim (hhaim)"
Subject: Re: mlx5 reta size is dynamic
Date: Thu, 22 Mar 2018 12:33:51 +0000
Message-ID: <6d30b3680a6d4e12b43a8e50b29bbd90@XCH-RTP-017.cisco.com>
References: <1b6a9384a5604f15948162766cde90a9@XCH-RTP-017.cisco.com>
 <20180321214749.GA53128@yongseok-MBP.local>
 <20180322085441.a3o2eyvols7jkzxo@laranjeiro-vm.dev.6wind.com>
 <92a7d23b9df748b6af83f7dda88672e4@XCH-RTP-017.cisco.com>
 <20180322092734.6iulb7yxfkbdsi3h@laranjeiro-vm.dev.6wind.com>
 <20180322104531.ivfs3hdqezobcxjn@laranjeiro-vm.dev.6wind.com>
 <7ec498986339404ba89851ef2536ece8@XCH-RTP-017.cisco.com>
 <20180322122927.gfevzmmdkdzq4n66@laranjeiro-vm.dev.6wind.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: Yongseok Koh , "dev@dpdk.org"
To: Nélio Laranjeiro
Received: from alln-iport-1.cisco.com (alln-iport-1.cisco.com [173.37.142.88]) by dpdk.org (Postfix) with ESMTP id B103B5F17 for ; Thu, 22 Mar 2018 13:33:53 +0100 (CET)
In-Reply-To: <20180322122927.gfevzmmdkdzq4n66@laranjeiro-vm.dev.6wind.com>
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Regarding #2

For some reason the "rte_eth_dev_rss_reta_update" API didn't make a change for Intel NICs if it was called *before* start (weird, I agree).
Moving it after the start API solves the issue for all the drivers.

Thanks,
Hanoh

-----Original Message-----
From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
Sent: Thursday, March 22, 2018 2:29 PM
To: Hanoch Haim (hhaim)
Cc: Yongseok Koh; dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 reta size is dynamic

Hi,

On Thu, Mar 22, 2018 at 10:59:36AM +0000, Hanoch Haim (hhaim) wrote:
> Hi,
>
> 1) Regarding this sentence,
> "Your need is to have a fixed size returned by the
> rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it
> won't modify your spreading."
>
> I'm fine with that as long as:
>
> 1. rte_eth_dev_info_get will expose the same *size*
> 2. rte_eth_dev_rss_reta_update will behave as if there are reta_size
> entries for *any* random input (will enlarge the table internally to
> maximum size)
> In other words, from the user perspective you will have a static
> reta_size.

Good, the requirement is clear enough for me, i.e. a static RETA table size for the user and spreading accordingly.

> 2) "In such situation, changing the RETA means stopping the traffic,
> destroying every single flow, hash Rx queue, indirection table to
> remake everything with the new configuration.
> Until then, we always recommended to any application to restart the
> port on this device after a RETA update to apply this new
> configuration."
>
> From an experiment I did, you *can* change it under traffic and it works without issue.
> Drivers tested are: igbe/i40e/mlx5

Hmm, it is certainly calling a devop which will end by calling mlx5_traffic_start().

Thanks,

> Thanks,
> Hanoh
>
>
> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> Sent: Thursday, March 22, 2018 12:46 PM
> To: Hanoch Haim (hhaim)
> Cc: Yongseok Koh; dev@dpdk.org
> Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
>
> Hi Hanoch,
>
> On Thu, Mar 22, 2018 at 10:00:45AM +0000, Hanoch Haim (hhaim) wrote:
> > Hi Nelio,
> >
> > Let me provide more background.
> > The context is TRex running in Advanced Stateful (ASTf) mode using multi-core.
> > In this case the flows are distributed using RSS. New flows (c->s)
> > need to have a tuple that will match the generated core. For this
> > calculation there is a need to know the *RETA table size*.
> >
> >
> > Code:
> >
> > /* 1. verify that driver can support RSS */
> > rte_eth_dev_info_get(m_repid, &dev_info);
> > save_reta_size = dev_info.reta_size;
> > save_hash_key = dev_info.hash_key_size;
> > printf("RETA_SIZE : %d \n", save_reta_size);
> > printf("HASH_SIZE : %d \n", save_hash_key);
> >
> > /* 2. configure queues */
> > ret = rte_eth_dev_configure(m_repid,
> >                             nb_rx_queue,
> >                             nb_tx_queue,
> >                             eth_conf);
> > ..
> >
> > /* 3. reading the RETA again */
> > rte_eth_dev_info_get(m_repid, &dev_info);
> > save_reta_size = dev_info.reta_size;  /* << */
> > save_hash_key = dev_info.hash_key_size;
> > printf("RETA_SIZE1 : %d \n", save_reta_size);
> >
> >
> > /* 4. update the RETA table */
> > rte_eth_dev_rss_reta_update(m_repid, &reta_conf[0],
> >                             dev_info.reta_size);
> >
> >
> > 2. /* Output in case of Intel i40e */
> >
> > RETA_SIZE : 512
> > HASH_SIZE : 52
> >
> > RETA_SIZE1 : 512
> >
> > 3. /* Output in case of Mlx5 */
> >
> > RETA_SIZE : 512
> > HASH_SIZE : 0
> >
> > RETA_SIZE1 : 4 << not a multiple of 64, depends on the number of
> > rx queues
>
> Your need is to have a fixed size returned by the rte_eth_dev_info_get(), the PMD can have an internal dynamic size, it won't modify your spreading.
>
> For information: you are reading the hash key size but, according to the documentation of struct rte_eth_rss_conf, only the i40e can have a key length different from 40 bytes; others should just ignore the field [1].
>
> Regards,
>
> [1]
> https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n380
>
> > Hanoh
> >
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Thursday, March 22, 2018 11:28 AM
> > To: Hanoch Haim (hhaim)
> > Cc: Yongseok Koh; dev@dpdk.org
> > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> >
> > Hi Hanoch,
> >
> > On Thu, Mar 22, 2018 at 09:02:19AM +0000, Hanoch Haim (hhaim) wrote:
> > > Hi Nelio,
> > > I think you didn't understand me. I suggest keeping the RETA table
> > > size constant (maximum 512 in your case) and not changing it based
> > > on the number of configured Rx queues.
> >
> > It is even simpler, we can return the maximum size or a multiple of
> > RTE_RETA_GROUP_SIZE according to the number of Rx queues being used,
> > in the devop->dev_infos_get() as it is what the
> > rte_eth_dev_rss_reta_update() implementation will expect.
> >
> > > This will make the DPDK API consistent. As a user I need to do
> > > tricks (allocate an odd/prime number of rx-queues) to get the RETA
> > > size constant at 512
> >
> > I understand this issue; what I don't fully understand is your need.
> >
> > > I'm not talking about changing the values in the RETA table which
> > > can be done while there is traffic.
> >
> > On MLX5 changing the entries of the RETA table doesn't affect the current traffic, it needs a port restart to affect it and only for "default"
> > flows, any flow created through the public flow API is not impacted by the RETA table.
> >
> >
> > From my understanding, you wish to have a size returned by
> > devop->dev_infos_get() usable directly by rte_eth_dev_rss_reta_update().
> > This is why you are asking for a fixed size? So, if internally the PMD starts with a smaller RETA table, it does not really matter, as long as the RETA API works without any trick from the application side. Is this correct?
> >
> > Thanks,
> >
> > > Thanks,
> > > Hanoh
> > >
> > >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > > Sent: Thursday, March 22, 2018 10:55 AM
> > > To: Hanoch Haim (hhaim)
> > > Cc: Yongseok Koh; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > >
> > > On Thu, Mar 22, 2018 at 06:52:53AM +0000, Hanoch Haim (hhaim) wrote:
> > > > Hi Yongseok,
> > > >
> > > >
> > > > RSS has a DPDK API; the application can ask for the RETA table size
> > > > and configure it. In your case you are assuming a specific use
> > > > case and changing the size dynamically, which solves 90% of the
> > > > use cases but breaks the other 10%.
> > > > Instead, you could provide the application a consistent API and
> > > > with that 100% of the applications can work with no issue. This
> > > > is what happens with Intel (ixgbe/i40e).
> > > > Another minor issue: the rss_key_size is returned as zero but
> > > > internally it is 40 bytes.
> > >
> > > Hi Hanoch,
> > >
> > > Legacy DPDK API has always considered there is only a single indirection table aka. RETA whereas this is not true [1][2] on this device.
> > >
> > > On MLX5 there is an indirection table per Hash Rx queue according to the list of queues making part of it.
> > > The Hash Rx queue is configured to make the hash with configured
> > > information:
> > > - Algorithm,
> > > - key
> > > - hash field (Verbs hash field)
> > > - Indirection table
> > > A Hash Rx queue cannot handle multiple RSS configurations, we have a Hash Rx queue per protocol and thus a full configuration per protocol.
> > >
> > > In such situation, changing the RETA means stopping the traffic, destroying every single flow, hash Rx queue, indirection table to remake everything with the new configuration.
> > > Until then, we always recommended to any application to restart the port on this device after a RETA update to apply this new configuration.
> > >
> > > Since the flow API is the new way to configure flows, applications should move to this new one instead of using the old API for such behavior.
> > > We should also remove such devop from the PMD to avoid any confusion.
> > >
> > > Regards,
> > >
> > > > Thanks,
> > > > Hanoh
> > > >
> > > > -----Original Message-----
> > > > From: Yongseok Koh [mailto:yskoh@mellanox.com]
> > > > Sent: Wednesday, March 21, 2018 11:48 PM
> > > > To: Hanoch Haim (hhaim)
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] mlx5 reta size is dynamic
> > > >
> > > > On Wed, Mar 21, 2018 at 06:56:33PM +0000, Hanoch Haim (hhaim) wrote:
> > > > > Hi mlx5 driver expert,
> > > > >
> > > > > DPDK: 17.11
> > > > > Any reason the mlx5 driver changes the RETA table size dynamically
> > > > > based on the number of Rx queues?
> > > >
> > > > The device only supports 2^n-sized indirection tables. For example, if the number of Rx queues is 6, the device can't have a 1-1 mapping but the size of the ind tbl could be 8, 16, 32 and so on. If we configure it as 8 for example, 2 out of 6 queues will each have 1/4 of the traffic while the remaining 4 queues each receive 1/8. We thought it was too much disparity and preferred setting the max size in order to mitigate the imbalance.
> > > >
> > > > > There is a hidden assumption that the user wants to distribute
> > > > > the packets evenly which is not always correct.
> > > >
> > > > But it is mostly correct because RSS is used for uniform distribution. The decision wasn't made based on our speculation but on many requests from multiple customers.
> > > >
> > > > > /* If the requested number of RX queues is not a power of two, use the
> > > > >  * maximum indirection table size for better balancing.
> > > > >  * The result is always rounded to the next power of two. */
> > > > > reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
> > > > >                              priv->ind_table_max_size :
> > > > >                              rxqs_n));
> > > >
> > > > Thanks,
> > > > Yongseok
> > >
> > > [1] https://dpdk.org/ml/archives/dev/2015-October/024668.html
> > > [2] https://dpdk.org/ml/archives/dev/2015-October/024669.html
> > >
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
> >
> > --
> > Nélio Laranjeiro
> > 6WIND
>
> --
> Nélio Laranjeiro
> 6WIND

--
Nélio Laranjeiro
6WIND
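
For readers following the thread, here is a minimal sketch of the sizing rule in the quoted reta_idx_n expression and of the imbalance Yongseok describes. It is an illustration under assumptions, not the mlx5 source: log2_ceil() is a hypothetical stand-in for the driver's log2above(), reta_size() and entries_for_queue() are names made up for this note, and the round-robin fill (entry i points at queue i % rxqs_n) is assumed rather than taken from the driver.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's log2above():
 * smallest l such that (1 << l) >= v. */
static unsigned int log2_ceil(unsigned int v)
{
	unsigned int l = 0;

	assert(v > 0 && v <= (1u << 31));
	while ((1u << l) < v)
		++l;
	return l;
}

/* RETA size following the quoted expression: a power-of-two queue count
 * keeps its own size, any other count falls back to the maximum size.
 * The result is rounded up to a power of two. */
static unsigned int reta_size(unsigned int rxqs_n,
			      unsigned int ind_table_max_size)
{
	unsigned int n;

	assert(rxqs_n > 0);
	n = (rxqs_n & (rxqs_n - 1)) ? ind_table_max_size : rxqs_n;
	return 1u << log2_ceil(n);
}

/* Number of RETA entries that point at `queue`, ASSUMING the table is
 * filled round-robin (entry i -> queue i % rxqs_n). */
static unsigned int entries_for_queue(unsigned int size,
				      unsigned int rxqs_n,
				      unsigned int queue)
{
	assert(rxqs_n > 0 && queue < rxqs_n);
	return size / rxqs_n + (queue < size % rxqs_n ? 1u : 0u);
}
```

Under these assumptions, 4 Rx queues with a maximum of 512 give a table of 4 entries (the RETA_SIZE1 value reported above), while 6 Rx queues give 512. With 6 queues, an 8-entry table would give two queues 2 entries each and four queues 1 entry each (1/4 versus 1/8 of the traffic); a 512-entry table gives two queues 86 entries and four queues 85.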