All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feifei Wang <Feifei.Wang2@arm.com>
To: Slava Ovsiienko <viacheslavo@nvidia.com>,
	Matan Azrad <matan@nvidia.com>,
	 Shahaf Shuler <shahafs@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>, nd <nd@arm.com>,
	nd <nd@arm.com>
Subject: [dpdk-dev] 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
Date: Thu, 13 May 2021 05:49:01 +0000	[thread overview]
Message-ID: <DB9PR08MB69235E8750052FB08E15721BC8519@DB9PR08MB6923.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <DM6PR12MB375370B85D291BE7DC8D2A45DF529@DM6PR12MB3753.namprd12.prod.outlook.com>

Hi, Slava

Please see below.

> -----邮件原件-----
> 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> 发送时间: 2021年5月12日 19:08
> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> 
> Hi, Feifei
> 
> ..ship..
> >
> > If I understand correctly, your meaning is that if without wmb, maybe
> > other agents observe changed "dev_gen", but they also observe
> > unchanged "global" cache.
> > This can be defined as  memory inconsistent state.
> >
> > 					Fig1
> > ----------------------------------------------------------------------
> > ------------------------
> > -------
> > Timeslot        	      agent_1               		               agent_2
> > 1		     take_lock
> > 2          		update dev_gen
> > 3                       					     observe changed dev_gen
> > 4						         clear local cache
> >
> > 5            		rebuild global cache		               wait_lock
> > 6		      free_lock
> > 7		         wmb			                take_lock
> > 8							get(new MR)
> > 9                           				             	   free_lock
> > ----------------------------------------------------------------------
> > ------------------------
> > -------
> 
> Yes, something like that.
> 
> 
> > 1. However, in out-of-order platform, though adding a 'wmb at last',
> > 'dev_gen' maybe updated before global cache rebuild, and then other
> > agents can observe changed 'dev_ge'
> > before rebuilding global cache.
> >
> > Thus, though add a 'wmb at last', It is still unable to prevent other
> > agents from observing some inconsistent state. As a result, 'wmb at
> > last' fails to keep consistence.
> >
> > 2. On the other hand, due to lock, agent_2 will wait to take a lock
> > until global cache rebuilt by agent_1, and this ensures agent_2 can
> > get a correct new MR and update new local cache correctly.
> >
> > In summary, 'wmb at last' cannot guarantee other agents to observe the
> > consistent state.
> > But lock can fix this error. So, the existence of wmb at last is
> > redundant and we can remove it.
> 
> If dev_gen change is committed and cache's one is not yet - the agent_2
> might see inconsistent state even inside the lock-protected section. Hence,
> we must commit all writes before leaving the locked section in agent_1.
> 
> Let’s suppose there is no wmb in agent_1 at all, and dev_gen is arbitrary
> committed by CPU and MR cache data change is not. We leave the locked
> section in agent_1, agent_2 sees dev_gen changed, takes the lock and sees
> inconsistent MR-cache state due to not all changes made in agent_1 are
> committed. With wmb we have now in the existing code - there is no issue
> like that.

I can understand you worry that if there is no 'wmb at last', when agent_1 leave
the locked section, agent_2 still may observe unchanged global cache.

However, when agent_2 take a lock and get(new MR) in time slot 7(Fig.1), it means
agent_1 has finished updating global cache and lock is freed. 
Besides, if agent_2 can take a lock, it also shows agent_2 has observed the changed
global cache.

This is because there is a store- release in rte_rwlock_read_unlock,  store-release
ensures all store operation before 'release' can be committed  if store operation after
'release' is observed by other agents.

Thus, in our case, if agent_2 can observe updated 'rwl->cnt'(lock) and take a lock, it also can
observe updated 'dev_gen' and 'global cache'.

As a result, wmb can be removed due to store-release in R/W unlock.

Best Regards
Feifei     
 
> 
> With best regards,
> Slava
> 
> >
> > Best Regards
> > Feifei
> >
> > > With best regards,
> > > Slava
> > >
> > >
> > > >
> > > > Best Regards
> > > > Feifei
> > > >
> > > > > -----邮件原件-----
> > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > 发送时间: 2021年5月7日 18:15
> > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng
> > > Wang
> > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory
> > > > > Region cache
> > > > >
> > > > > Hi, Feifei
> > > > >
> > > > > We should consider the locks in your scenario - it is crucial
> > > > > for the complete model description:
> > > > >
> > > > > How agent_1 (in your terms) rebuilds global cache:
> > > > >
> > > > > 1a) lock()
> > > > > 1b) rebuild(global cache)
> > > > > 1c) update(dev_gen)
> > > > > 1d) wmb()
> > > > > 1e) unlock()
> > > > >
> > > > > How agent_2 checks:
> > > > >
> > > > > 2a) check(dev_gen) (assume positive - changed)
> > > > > 2b) clear(local_cache)
> > > > > 2c) miss(on empty local_cache) - > eventually it goes to
> > > > > mr_lookup_caches()
> > > > > 2d) lock()
> > > > > 2e) get(new MR)
> > > > > 2f) unlock()
> > > > > 2g) update(local cache with obtained new MR)
> > > > >
> > > > > Hence, even if 1c) becomes visible in 2a) before 1b) committed
> > > > > (say, due to out-of-order Arch) - the agent 2 would be blocked
> > > > > on
> > > > > 2d) and scenario depicted on your Fig2 would not happen (agent_2
> > > > > will wait before step 3 till agent 1 unlocks after its step 5).
> > > > >
> > > > > With best regards,
> > > > > Slava
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > > Sent: Friday, May 7, 2021 9:36> To: Slava Ovsiienko
> > > > > > <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> > Shahaf
> > > > > > Shuler <shahafs@nvidia.com>
> > > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng
> > > Wang
> > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > Memory Region cache
> > > > > >
> > > > > > Hi, Slava
> > > > > >
> > > > > > Thanks very much for your reply.
> > > > > >
> > > > > > > -----邮件原件-----
> > > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > > 发送时间: 2021年5月6日 19:22
> > > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > Ruifeng
> > > > > Wang
> > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory
> > > > > > > Region cache
> > > > > > >
> > > > > > > Hi, Feifei
> > > > > > >
> > > > > > > Sorry, I do not follow why we should get rid of the last
> > > > > > > (after dev_gen update) wmb.
> > > > > > > We've rebuilt the global cache, we should notify other
> > > > > > > agents it's happened and they should flush local caches. So,
> > > > > > > dev_gen change should be made visible to other agents to
> > > > > > > trigger this activity and the second wmb is here to ensure this.
> > > > > >
> > > > > > 1. For the first problem why we should get rid of the last wmb
> > > > > > and move it before dev_gen updated, I think our attention is
> > > > > > how the wmb implements the synchronization between multiple
> agents.
> > > > > > 					Fig1
> > > > > > --------------------------------------------------------------
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > ------------------------
> > > > > > -------
> > > > > > Timeslot        		agent_1               		   agent_2
> > > > > > 1          		rebuild global cache
> > > > > > 2                       		wmb
> > > > > > 3            		     update dev_gen ----------------------- load
> changed
> > > > > > dev_gen
> > > > > > 4                                  			        	           rebuild local
> > cache
> > > > > > --------------------------------------------------------------
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > ------------------------
> > > > > > -------
> > > > > >
> > > > > > First, wmb is only for local thread to keep the order between
> > > > > > local
> > > > > > write- write :
> > > > > > Based on the picture above, for agent_1, wmb keeps the order
> > > > > > that rebuilding global cache is always before updating dev_gen.
> > > > > >
> > > > > > Second, agent_1 communicates with agent_2 by the global
> > > > > > variable "dev_gen" :
> > > > > > If agent_1 updates dev_gen, agent_2 will load it and then it
> > > > > > knows it should rebuild local cache
> > > > > >
> > > > > > Finally, agent_2 rebuilds local cache according to whether
> > > > > > agent_1 has rebuilt global cache, and agent_2 knows this
> > > > > > information by the variable
> > > > > "dev_gen".
> > > > > > 					Fig2
> > > > > > --------------------------------------------------------------
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > ------------------------
> > > > > > -------
> > > > > > Timeslot        		agent_1               		   agent_2
> > > > > > 1		        update dev_gen
> > > > > > 2						      load changed
> > dev_gen
> > > > > > 3						          rebuild local
> > cache
> > > > > > 4        		    rebuild global cache
> > > > > > 5			 wmb
> > > > > > --------------------------------------------------------------
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > --
> > > > > > ------------------------
> > > > > > -------
> > > > > >
> > > > > > However, in arm platform, if wmb is after dev_gen updated,
> > "dev_gen"
> > > > > > may be updated before agent_1 rebuilding global cache, then
> > > > > > agent_2 maybe receive error message and rebuild its local
> > > > > > cache in
> > > advance.
> > > > > >
> > > > > > To summarize, it is not important which time other agents can
> > > > > > see the changed global variable "dev_gen".
> > > > > > (Actually, wmb after "dev_gen" cannot ensure changed "dev_gen"
> > > > > > is committed to the global).
> > > > > > It is more important that if other agents see the changed
> > > > > > "dev_gen", they also can know global cache has been updated.
> > > > > >
> > > > > > > One more point, due to registering new/destroying existing
> > > > > > > MR involves FW (via kernel) calls, it takes so many CPU
> > > > > > > cycles that we could neglect wmb overhead at all.
> > > > > >
> > > > > > We just move the last wmb into the right place, and not delete
> > > > > > it for performance.
> > > > > >
> > > > > > >
> > > > > > > Also, regarding this:
> > > > > > >
> > > > > > >  > > Another question suddenly occurred to me, in order to
> > > > > > > keep the
> > > > > > > >
> > > > > > > > order that rebuilding global cache before updating
> > > > > > > > ”dev_gen“, the
> > > > > > > > > wmb should be before updating "dev_gen" rather than after it.
> > > > > > >  > > Otherwise, in the out-of-order platforms, current order
> > > > > > > cannot be
> > > > > > kept.
> > > > > > >
> > > > > > > it is not clear why ordering is important - global cache
> > > > > > > update and dev_gen change happen under spinlock protection,
> > > > > > > so only the last wmb is meaningful.
> > > > > > >
> > > > > >
> > > > > > 2. The second function of wmb before "dev_gen" updated is for
> > > > > > performance according to our previous discussion.
> > > > > > According to Fig2, if there is no wmb between "global cache
> updated"
> > > > > > and "dev_gen updated", "dev_gen" may update before global
> > > > > > cache
> > > > > updated.
> > > > > >
> > > > > > Then agent_2 may see the changed "dev_gen" and flush entire
> > > > > > local cache in advance.
> > > > > >
> > > > > > This entire flush can degrade the performance:
> > > > > > "the local cache is getting empty and can't provide
> > > > > > translation for other valid (not being removed) MRs, and the
> > > > > > translation has to look up in the global cache, that is locked
> > > > > > now for rebuilding, this causes the delays in data path on
> > > > > > acquiring global
> > cache lock."
> > > > > >
> > > > > > Furthermore, spinlock is just for global cache, not for
> > > > > > dev_gen and local cache.
> > > > > >
> > > > > > > To summarize, in my opinion:
> > > > > > > - if you see some issue with ordering of global cache
> > > > > > > update/dev_gen signalling,
> > > > > > >   could you, please, elaborate? I'm not sure we should
> > > > > > > maintain an order (due to spinlock protection)
> > > > > > > - the last rte_smp_wmb() after dev_gen incrementing should
> > > > > > > be kept intact
> > > > > > >
> > > > > >
> > > > > > At last, for my view, there are two functions that moving wmb
> > > > > > before "dev_gen"
> > > > > > for the write-write order:
> > > > > > --------------------------------
> > > > > > a) rebuild global cache;
> > > > > > b) rte_smp_wmb();
> > > > > > c) updating dev_gen
> > > > > > -------------------------------- 1. Achieve synchronization
> > > > > > between multiple threads in the right way 2.
> > > > > > Prevent other agents from flushing local cache early to ensure
> > > > > > performance
> > > > > >
> > > > > > Best Regards
> > > > > > Feifei
> > > > > >
> > > > > > > With best regards,
> > > > > > > Slava
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > > > > Sent: Thursday, May 6, 2021 5:52
> > > > > > > > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> > > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > > > > > Ruifeng
> > > > > Wang
> > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > Memory Region cache
> > > > > > > >
> > > > > > > > Hi, Slava
> > > > > > > >
> > > > > > > > Would you have more comments about this patch?
> > > > > > > > For my sight, only one wmb before "dev_gen" updating is
> > > > > > > > enough to synchronize.
> > > > > > > >
> > > > > > > > Thanks very much for your attention.
> > > > > > > >
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Feifei
> > > > > > > >
> > > > > > > > > -----邮件原件-----
> > > > > > > > > 发件人: Feifei Wang
> > > > > > > > > 发送时间: 2021年4月20日 16:42
> > > > > > > > > 收件人: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan
> > Azrad
> > > > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > Ruifeng
> > > > > > > Wang
> > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > Memory
> > > > > > Region
> > > > > > > > > cache
> > > > > > > > >
> > > > > > > > > Hi, Slava
> > > > > > > > >
> > > > > > > > > I think the second wmb can be removed.
> > > > > > > > > As I know, wmb is just a barrier to keep the order
> > > > > > > > > between write and
> > > > > > > write.
> > > > > > > > > and it cannot tell the CPU when it should commit the changes.
> > > > > > > > >
> > > > > > > > > It is usually used before guard variable to keep the
> > > > > > > > > order that updating guard variable after some changes,
> > > > > > > > > which you want to release,
> > > > > > > > have been done.
> > > > > > > > >
> > > > > > > > > For example, for the wmb  after global cache
> > > > > > > > > update/before altering dev_gen, it can ensure the order
> > > > > > > > > that updating global cache before altering
> > > > > > > > > dev_gen:
> > > > > > > > > 1)If other agent load the changed "dev_gen", it can know
> > > > > > > > > the global cache has been updated.
> > > > > > > > > 2)If other agents load the unchanged, "dev_gen", it
> > > > > > > > > means the global cache has not been updated, and the
> > > > > > > > > local cache will not be
> > > > > > flushed.
> > > > > > > > >
> > > > > > > > > As a result, we use  wmb and guard variable "dev_gen" to
> > > > > > > > > ensure the global cache updating is "visible".
> > > > > > > > > The "visible" means when updating guard variable "dev_gen"
> > > > > > > > > is known by other agents, they also can confirm global
> > > > > > > > > cache has been updated in the meanwhile. Thus, just one
> > > > > > > > > wmb before altering dev_gen can ensure
> > > > > > > > this.
> > > > > > > > >
> > > > > > > > > Best Regards
> > > > > > > > > Feifei
> > > > > > > > >
> > > > > > > > > > -----邮件原件-----
> > > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > > > > > 发送时间: 2021年4月20日 15:54
> > > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan
> Azrad
> > > > > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > > Ruifeng
> > > > > > > > Wang
> > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > > <nd@arm.com>;
> > > > > > > nd
> > > > > > > > > > <nd@arm.com>
> > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > > Memory Region cache
> > > > > > > > > >
> > > > > > > > > > Hi, Feifei
> > > > > > > > > >
> > > > > > > > > > In my opinion, there should be 2 barriers:
> > > > > > > > > >  - after global cache update/before altering dev_gen,
> > > > > > > > > > to ensure the correct order
> > > > > > > > > >  - after altering dev_gen to make this change visible
> > > > > > > > > > for other agents and to trigger local cache update
> > > > > > > > > >
> > > > > > > > > > With best regards,
> > > > > > > > > > Slava
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > > > > > > > Sent: Tuesday, April 20, 2021 10:30
> > > > > > > > > > > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan
> > > > > > > > > > > Azrad <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > > > > > > > > Ruifeng
> > > > > > > > Wang
> > > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > > > <nd@arm.com>;
> > > > > > > > nd
> > > > > > > > > > > <nd@arm.com>
> > > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild
> > > > > > > > > > > bug for Memory Region cache
> > > > > > > > > > >
> > > > > > > > > > > Hi, Slava
> > > > > > > > > > >
> > > > > > > > > > > Another question suddenly occurred to me, in order
> > > > > > > > > > > to keep the order that rebuilding global cache
> > > > > > > > > > > before updating ”dev_gen“, the wmb should be before
> > > > > > > > > > > updating
> > > "dev_gen"
> > > > > > > > > > > rather
> > > > > than after it.
> > > > > > > > > > > Otherwise, in the out-of-order platforms, current
> > > > > > > > > > > order cannot be
> > > > > > > kept.
> > > > > > > > > > >
> > > > > > > > > > > Thus, we should change the code as:
> > > > > > > > > > > a) rebuild global cache;
> > > > > > > > > > > b) rte_smp_wmb();
> > > > > > > > > > > c) updating dev_gen
> > > > > > > > > > >
> > > > > > > > > > > Best Regards
> > > > > > > > > > > Feifei
> > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > 发件人: Feifei Wang
> > > > > > > > > > > > 发送时间: 2021年4月20日 13:54
> > > > > > > > > > > > 收件人: Slava Ovsiienko <viacheslavo@nvidia.com>;
> > Matan
> > > > > Azrad
> > > > > > > > > > > > <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>;
> > stable@dpdk.org;
> > > > > > > Ruifeng
> > > > > > > > > > Wang
> > > > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > > > <nd@arm.com>
> > > > > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > > > for Memory
> > > > > > > > > Region
> > > > > > > > > > > > cache
> > > > > > > > > > > >
> > > > > > > > > > > > Hi, Slava
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks very much for your explanation.
> > > > > > > > > > > >
> > > > > > > > > > > > I can understand the app can wait all mbufs are
> > > > > > > > > > > > returned to the memory pool, and then it can free
> > > > > > > > > > > > this mbufs, I agree with
> > > > > > this.
> > > > > > > > > > > >
> > > > > > > > > > > > As a result, I will remove the bug fix patch from
> > > > > > > > > > > > this series and just replace the smp barrier with
> > > > > > > > > > > > C11 thread fence. Thanks very much for your
> > > > > > > > > > > > patient explanation
> > > again.
> > > > > > > > > > > >
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > > Feifei
> > > > > > > > > > > >
> > > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > > > > > > > > 发送时间: 2021年4月20日 2:51
> > > > > > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan
> > > > Azrad
> > > > > > > > > > > > > <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>;
> > > stable@dpdk.org;
> > > > > > > > Ruifeng
> > > > > > > > > > > Wang
> > > > > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > > > > for Memory Region cache
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please, see below
> > > > > > > > > > > > >
> > > > > > > > > > > > > ....
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Sorry, I do not follow what this patch fixes.
> > > > > > > > > > > > > > > Do we have some issue/bug with MR cache in
> > practice?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch fixes the bug which is based on
> > > > > > > > > > > > > > logical deduction, and it doesn't actually happen.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Each Tx queue has its own dedicated "local"
> > > > > > > > > > > > > > > cache for MRs to convert buffer address in
> > > > > > > > > > > > > > > mbufs being transmitted to LKeys (HW-related
> > > > > > > > > > > > > > > entity
> > > > > > > > > > > > > > > handle) and the "global" cache for all MR
> > > > > > > > > > > > > > > registered on the
> > > > > > > > device.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > AFAIK, how conversion happens in datapath:
> > > > > > > > > > > > > > > - check the local queue cache flush request
> > > > > > > > > > > > > > > - lookup in local cache
> > > > > > > > > > > > > > > - if not found:
> > > > > > > > > > > > > > > - acquire lock for global cache read access
> > > > > > > > > > > > > > > - lookup in global cache
> > > > > > > > > > > > > > > - release lock for global cache
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > How cache update on memory
> > > > > > > > > > > > > > > freeing/unregistering
> > > > > > happens:
> > > > > > > > > > > > > > > - acquire lock for global cache write access
> > > > > > > > > > > > > > > - [a] remove relevant MRs from the global
> > > > > > > > > > > > > > > cache
> > > > > > > > > > > > > > > - [b] set local caches flush request
> > > > > > > > > > > > > > > - free global cache lock
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If I understand correctly, your patch swaps
> > > > > > > > > > > > > > > [a] and [b], and local caches flush is requested
> earlier.
> > > > > > > > > > > > > > > What problem does it
> > > > > > > > solve?
> > > > > > > > > > > > > > > It is not supposed there are in datapath
> > > > > > > > > > > > > > > some mbufs referencing to the memory being
> freed.
> > > > > > > > > > > > > > > Application must ensure this and must not
> > > > > > > > > > > > > > > allocate new mbufs from this memory regions
> > > > > > > > > > being freed.
> > > > > > > > > > > > > > > Hence, the lookups for these MRs in caches
> > > > > > > > > > > > > > > should not
> > > > > > occur.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For your first point that, application can
> > > > > > > > > > > > > > take charge of preventing MR freed memory
> > > > > > > > > > > > > > being allocated to
> > > > data path.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Does it means that If there is an emergency of
> > > > > > > > > > > > > > MR fragment, such as hotplug, the application
> > > > > > > > > > > > > > must inform thedata path in advance, and this
> > > > > > > > > > > > > > memory will not be allocated, and then the
> > > > > > > > > > > > > > control path will free this memory? If
> > > > > > > > > > > > > > application  can do like this, I agree that
> > > > > > > > > > > > > > this bug
> > > > > > > > > > > > cannot happen.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Actually,  this is the only correct way for
> > > > > > > > > > > > > application to
> > > > operate.
> > > > > > > > > > > > > Let's suppose we have some memory area that
> > > > > > > > > > > > > application wants to
> > > > > > > > > > free.
> > > > > > > > > > > > > ALL references to this area must be removed. If
> > > > > > > > > > > > > we have some mbufs allocated from this area, it
> > > > > > > > > > > > > means that we have memory pool created
> > > > > > > > > > > there.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What application should do:
> > > > > > > > > > > > > - notify all its components/agents the memory
> > > > > > > > > > > > > area is going to be freed
> > > > > > > > > > > > > - all components/agents free the mbufs they
> > > > > > > > > > > > > might own
> > > > > > > > > > > > > - PMD might not support freeing for some mbufs
> > > > > > > > > > > > > (for example being sent and awaiting for
> > > > > > > > > > > > > completion), so app should just wait
> > > > > > > > > > > > > - wait till all mbufs are returned to the memory
> > > > > > > > > > > > > pool (by monitoring available obj == pool size)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Otherwise - it is dangerous to free the memory.
> > > > > > > > > > > > > There are just some mbufs still allocated, it is
> > > > > > > > > > > > > regardless to buf address to MR translation. We
> > > > > > > > > > > > > just can't free the memory - the mapping will be
> > > > > > > > > > > > > destroyed and might cause the segmentation fault
> > > > > > > > > > > > > by SW or some HW issues on DMA access to
> > > > > > > > > > > > > unmapped memory.  It is very generic safety
> > > > > > > > > > > > > approach - do not free the memory that is still
> > > > > > > > > > > > > in
> > > > > > > > use.
> > > > > > > > > > > > > Hence, at the moment of freeing and
> > > > > > > > > > > > > unregistering the MR, there MUST BE NO any
> > > > > > > > > > > > mbufs in flight referencing to the addresses being freed.
> > > > > > > > > > > > > No translation to MR being invalidated can happen.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For other side, the cache flush has negative
> > > > > > > > > > > > > > > effect
> > > > > > > > > > > > > > > - the local cache is getting empty and can't
> > > > > > > > > > > > > > > provide translation for other valid (not
> > > > > > > > > > > > > > > being
> > > > > > > > > > > > > > > removed) MRs, and the translation has to
> > > > > > > > > > > > > > > look up in the global cache, that is locked
> > > > > > > > > > > > > > > now for rebuilding, this causes the delays
> > > > > > > > > > > > > > > in datapatch
> > > > > > > > > > > > > > on acquiring global cache lock.
> > > > > > > > > > > > > > > So, I see some potential performance impact.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If above assumption is true, we can go to your
> > > > > > > > > > > > > > second
> > > > point.
> > > > > > > > > > > > > > I think this is a problem of the tradeoff
> > > > > > > > > > > > > > between cache coherence and
> > > > > > > > > > > > > performance.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can understand your meaning that though
> > > > > > > > > > > > > > global cache has been changed, we should keep
> > > > > > > > > > > > > > the valid MR in local cache as long as
> > > > > > > > > > > > > > possible to ensure the fast
> > > > searching speed.
> > > > > > > > > > > > > > In the meanwhile, the local cache can be
> > > > > > > > > > > > > > rebuilt later to reduce its waiting time for
> > > > > > > > > > > > > > acquiring the global
> > > > cache lock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However,  this mechanism just ensures the
> > > > > > > > > > > > > > performance unchanged for the first few mbufs.
> > > > > > > > > > > > > > During the next mbufs lkey searching after 'dev_gen'
> > > > > > > > > > > > > > updated, it is still necessary to update the local cache.
> > > > > > > > > > > > > > And the performance can firstly reduce and
> > > > > > > > > > > > > > then
> > returns.
> > > > > > > > > > > > > > Thus, no matter whether there is this patch or
> > > > > > > > > > > > > > not, the performance will jitter in a certain
> > > > > > > > > > > > > > period of
> > > > > > > > > > > time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Local cache should be updated to remove MRs no
> > > > > > > > > > > > > longer
> > > > valid.
> > > > > > > > > > > > > But we just flush the entire cache.
> > > > > > > > > > > > > Let's suppose we have valid MR0, MR1, and not
> > > > > > > > > > > > > valid MRX in local
> > > > > > > > > cache.
> > > > > > > > > > > > > And there are traffic in the datapath for MR0
> > > > > > > > > > > > > and MR1, and no traffic for MRX anymore.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) If we do as you propose:
> > > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > > b) request flush local cache first - all MR0,
> > > > > > > > > > > > > MR1, MRX will be removed on translation in
> > > > > > > > > > > > > datapath
> > > > > > > > > > > > > c) update global cache,
> > > > > > > > > > > > > d) free lock
> > > > > > > > > > > > > All the traffic for valid MR0, MR1 ALWAYS will
> > > > > > > > > > > > > be blocked on lock taken for cache update since
> > > > > > > > > > > > > point
> > > > > > > > > > > > > b) till point
> > > > > d).
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) If we do as it is implemented now:
> > > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > > b) update global cache
> > > > > > > > > > > > > c) request flush local cache
> > > > > > > > > > > > > d) free lock
> > > > > > > > > > > > > The traffic MIGHT be locked ONLY for MRs
> > > > > > > > > > > > > non-existing in local cache (not happens for MR0
> > > > > > > > > > > > > and MR1, must not happen for MRX), and
> > > > > > > > > > > > > probability should be minor. And lock might
> > > > > > > > > > > > > happen since
> > > > > > > > > > > > > c) till
> > > > > > > > > > > > > d)
> > > > > > > > > > > > > - quite short period of time
> > > > > > > > > > > > >
> > > > > > > > > > > > > Summary, the difference between 1) and 2)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lock probability:
> > > > > > > > > > > > > - 1) lock ALWAYS happen for ANY MR translation after
> b),
> > > > > > > > > > > > >   2) lock MIGHT happen, for cache miss ONLY,
> > > > > > > > > > > > > after
> > > > > > > > > > > > > c)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lock duration:
> > > > > > > > > > > > > - 1) lock since b) till d),
> > > > > > > > > > > > >   2) lock since c) till d), that seems to be  much shorter.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Finally, in conclusion, I tend to think that
> > > > > > > > > > > > > > the bottom layer can do more things to ensure
> > > > > > > > > > > > > > the correct execution of the program, which
> > > > > > > > > > > > > > may have a negative impact on the performance
> > > > > > > > > > > > > > in a short time, but in the long run, the
> > > > > > > > > > > > > > performance will eventually
> > > > > > > > > > come back.
> > > > > > > > > > > > > > Furthermore, maybe we should pay attention to
> > > > > > > > > > > > > > the performance in the stable period, and try
> > > > > > > > > > > > > > our best to ensure the correctness of the
> > > > > > > > > > > > > > program in case of
> > > > > > > > > > > > > emergencies.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If we have some mbufs still allocated in memory
> > > > > > > > > > > > > being freed
> > > > > > > > > > > > > - there is nothing to say about correctness, it
> > > > > > > > > > > > > is totally incorrect. In my opinion, we should
> > > > > > > > > > > > > not think how to mitigate this incorrect
> > > > > > > > > > > > > behavior, we should not encourage application
> > > > > > > > > > > > > developers to follow the wrong
> > > > > > > > > > > > approaches.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With best regards, Slava
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > Feifei
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With best regards, Slava

  reply	other threads:[~2021-05-13  5:49 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-18  7:18 [dpdk-dev] [PATCH v1 0/4] refactor SMP barriers for net/mlx Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 1/4] net/mlx4: fix rebuild bug for Memory Region cache Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 2/4] net/mlx4: replace SMP barrier with C11 barriers Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache Feifei Wang
2021-04-12  8:27   ` Slava Ovsiienko
2021-04-13  5:20     ` [dpdk-dev] 回复: " Feifei Wang
2021-04-19 18:50       ` [dpdk-dev] " Slava Ovsiienko
2021-04-20  5:53         ` [dpdk-dev] 回复: " Feifei Wang
2021-04-20  7:29           ` Feifei Wang
2021-04-20  7:53             ` [dpdk-dev] " Slava Ovsiienko
2021-04-20  8:42               ` [dpdk-dev] 回复: " Feifei Wang
2021-05-06  2:52                 ` Feifei Wang
2021-05-06 11:21                   ` [dpdk-dev] " Slava Ovsiienko
2021-05-07  6:36                     ` [dpdk-dev] 回复: " Feifei Wang
2021-05-07 10:14                       ` [dpdk-dev] " Slava Ovsiienko
2021-05-08  3:13                         ` [dpdk-dev] 回复: " Feifei Wang
2021-05-11  8:18                           ` [dpdk-dev] " Slava Ovsiienko
2021-05-12  5:34                             ` [dpdk-dev] 回复: " Feifei Wang
2021-05-12 11:07                               ` [dpdk-dev] " Slava Ovsiienko
2021-05-13  5:49                                 ` Feifei Wang [this message]
2021-05-13 10:49                                   ` Slava Ovsiienko
2021-05-14  5:18                                     ` [dpdk-dev] 回复: " Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 4/4] net/mlx5: replace SMP barriers with C11 barriers Feifei Wang
2021-04-07  1:45 ` [dpdk-dev] [PATCH v1 0/4] refactor SMP barriers for net/mlx Alexander Kozyrev
2021-05-17 10:00 ` [dpdk-dev] [PATCH v2 0/2] remove wmb " Feifei Wang
2021-05-17 10:00   ` [dpdk-dev] [PATCH v2 1/2] net/mlx4: remove unnecessary wmb for Memory Region cache Feifei Wang
2021-05-17 10:00   ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: " Feifei Wang
2021-05-17 14:15     ` Slava Ovsiienko
2021-05-18  8:52       ` [dpdk-dev] 回复: " Feifei Wang
2021-05-18  8:50 ` [dpdk-dev] [PATCH v3 0/2] remove wmb for net/mlx Feifei Wang
2021-05-18  8:50   ` [dpdk-dev] [PATCH v3 1/2] net/mlx4: remove unnecessary wmb for Memory Region cache Feifei Wang
2021-05-18 12:13     ` Slava Ovsiienko
2021-05-18  8:50   ` [dpdk-dev] [PATCH v3 2/2] net/mlx5: " Feifei Wang
2021-05-18 10:17     ` Slava Ovsiienko
2021-05-19  1:54       ` [dpdk-dev] 回复: " Feifei Wang
2021-05-27  8:37   ` [dpdk-dev] [PATCH v3 0/2] remove wmb for net/mlx Raslan Darawsheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DB9PR08MB69235E8750052FB08E15721BC8519@DB9PR08MB6923.eurprd08.prod.outlook.com \
    --to=feifei.wang2@arm.com \
    --cc=Ruifeng.Wang@arm.com \
    --cc=dev@dpdk.org \
    --cc=matan@nvidia.com \
    --cc=nd@arm.com \
    --cc=shahafs@nvidia.com \
    --cc=stable@dpdk.org \
    --cc=viacheslavo@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.