linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Praveen Kannoju <praveen.kannoju@oracle.com>
To: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>,
	"David S . Miller" <davem@davemloft.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"rds-devel@oss.oracle.com" <rds-devel@oss.oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Rama Nichanamatlu <rama.nichanamatlu@oracle.com>,
	Rajesh Sivaramasubramaniom 
	<rajesh.sivaramasubramaniom@oracle.com>
Subject: RE: [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool
Date: Thu, 20 Jan 2022 08:00:54 +0000	[thread overview]
Message-ID: <PH0PR10MB55156B918F0D519B8A935C378C5A9@PH0PR10MB5515.namprd10.prod.outlook.com> (raw)
In-Reply-To: <Yegmm4ksXfWiOMME@unreal>

-----Original Message-----
From: Leon Romanovsky [mailto:leon@kernel.org] 
Sent: 19 January 2022 08:26 PM
To: Praveen Kannoju <praveen.kannoju@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>; Jason Gunthorpe <jgg@ziepe.ca>; David S . Miller <davem@davemloft.net>; kuba@kernel.org; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; rds-devel@oss.oracle.com; linux-kernel@vger.kernel.org; Rama Nichanamatlu <rama.nichanamatlu@oracle.com>; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>
Subject: Re: [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool

On Wed, Jan 19, 2022 at 11:46:16AM +0000, Praveen Kannoju wrote:
> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@kernel.org]
> Sent: 19 January 2022 12:29 PM
> To: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Praveen Kannoju 
> <praveen.kannoju@oracle.com>; David S . Miller <davem@davemloft.net>; 
> kuba@kernel.org; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; 
> rds-devel@oss.oracle.com; linux-kernel@vger.kernel.org; Rama 
> Nichanamatlu <rama.nichanamatlu@oracle.com>; Rajesh 
> Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>
> Subject: Re: [PATCH RFC] rds: ib: Reduce the contention caused by the 
> asynchronous workers to flush the mr pool
> 
> On Tue, Jan 18, 2022 at 07:42:54PM +0000, Santosh Shilimkar wrote:
> > On Jan 18, 2022, at 11:17 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > 
> > > On Tue, Jan 18, 2022 at 04:48:43PM +0000, Santosh Shilimkar wrote:
> > >> 
> > >>> On Jan 18, 2022, at 6:47 AM, Praveen Kannoju <praveen.kannoju@oracle.com> wrote:
> > >>> 
> > >>> This patch aims to reduce the number of asynchronous workers 
> > >>> being spawned to execute the function "rds_ib_flush_mr_pool" 
> > >>> during the high I/O situations. Synchronous call path's to this function "rds_ib_flush_mr_pool"
> > >>> will be executed without being disturbed. By reducing the number 
> > >>> of processes contending to flush the mr pool, the total number 
> > >>> of D state processes waiting to acquire the mutex lock will be 
> > >>> greatly reduced, which otherwise were causing DB instance crash 
> > >>> as the corresponding processes were not progressing while waiting to acquire the mutex lock.
> > >>> 
> > >>> Signed-off-by: Praveen Kumar Kannoju 
> > >>> <praveen.kannoju@oracle.com> —
> > >>> 
> > >> […]
> > >> 
> > >>> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c index
> > >>> 8f070ee..6b640b5 100644
> > >>> +++ b/net/rds/ib_rdma.c
> > >>> @@ -393,6 +393,8 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
> > >>> 	 */
> > >>> 	dirty_to_clean = llist_append_to_list(&pool->drop_list, &unmap_list);
> > >>> 	dirty_to_clean += llist_append_to_list(&pool->free_list,
> > >>> &unmap_list);
> > >>> +	WRITE_ONCE(pool->flush_ongoing, true);
> > >>> +	smp_wmb();
> > >>> 	if (free_all) {
> > >>> 		unsigned long flags;
> > >>> 
> > >>> @@ -430,6 +432,8 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
> > >>> 	atomic_sub(nfreed, &pool->item_count);
> > >>> 
> > >>> out:
> > >>> +	WRITE_ONCE(pool->flush_ongoing, false);
> > >>> +	smp_wmb();
> > >>> 	mutex_unlock(&pool->flush_lock);
> > >>> 	if (waitqueue_active(&pool->flush_wait))
> > >>> 		wake_up(&pool->flush_wait);
> > >>> @@ -507,8 +511,17 @@ void rds_ib_free_mr(void *trans_private, 
> > >>> int
> > >>> invalidate)
> > >>> 
> > >>> 	/* If we've pinned too many pages, request a flush */
> > >>> 	if (atomic_read(&pool->free_pinned) >= pool->max_free_pinned ||
> > >>> -	    atomic_read(&pool->dirty_count) >= pool->max_items / 5)
> > >>> -		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
> > >>> +	    atomic_read(&pool->dirty_count) >= pool->max_items / 5) {
> > >>> +		smp_rmb();
> > >> You won’t need these explicit barriers since above atomic and 
> > >> write once already issue them.
> > > 
> > > No, they don't. Use smp_store_release() and smp_load_acquire if 
> > > you want to do something like this, but I still can't quite figure 
> > > out if this usage of unlocked memory accesses makes any sense at all.
> > > 
> > Indeed, I see that now, thanks. Yeah, these multi variable checks 
> > can indeed be racy but they are under lock at least for this code path.
> > But there are few hot path places where single variable states are 
> > evaluated atomically instead of heavy lock.
> 
> At least pool->dirty_count is not locked in rds_ib_free_mr() at all.
> 
> Thanks
> 
> > 
> > Regards,
> > Santosh
> > 
> 
> Thank you Santosh, Leon and Jason for reviewing the Patch.
> 
> 1. Leon, the bool variable "flush_ongoing " introduced through the patch has to be accessed only after acquiring the mutex lock. Hence it is well protected.

I don't see any lock in rds_ib_free_mr() function where your perform "if (!READ_ONCE(pool->flush_ongoing)) { ...".

> 
> 2. As the commit message already conveys the reason, the check being made in the function "rds_ib_free_mr" is only to avoid the redundant asynchronous workers from being spawned.
> 
> 3. The synchronous flush path's through the function "rds_free_mr" with either cookie=0 or "invalidate" flag being set, have to be honoured as per the semantics and hence these paths have to execute the function "rds_ib_flush_mr_pool" unconditionally.
> 
> 4. It complicates the patch to identify, where from the function "rds_ib_flush_mr_pool", has been called. And hence, this patch uses the state of the bool variable to stop the asynchronous workers.
> 
> 5. We knew that "queue_delayed_work" will ensures only one work is running, but avoiding these async workers during high load situations, made way for the allocation and synchronous callers which would otherwise be waiting long for the flush to happen. Great reduction in the number of calls to the function "rds_ib_flush_mr_pool" has been observed with this patch. 

So if you understand that there is only one work in progress, why do you say workerS?

Thanks

> 
> 6. Jason, the only function "rds_ib_free_mr" which accesses the introduced bool variable "flush_ongoing" to spawn a flush worker does not crucially impact the availability of MR's, because the flush happens from allocation path as well when necessary.   Hence the Load-store ordering is not essentially needed here, because of which we chose smp_rmb() and smp_wmb() over smp_load_acquire() and smp_store_release().
> 
> Regards,
> Praveen.


Jason,

	The relaxed ordering primitives smp_rmb() and smp_wmb() ensure to provide
	guaranteed atomic memory operations READ_ONCE and WRITE_ONCE, used in the
	functions "rds_ib_free_mr" and "rds_ib_flush_mr_pool" correspondingly.

	Yes, the memory barrier primitives smp_load_acquire()and smp_store_release() 
	are even better. But, because of the simplicity of the use of memory barrier
	in the patch, smp_rmb() and smp_wmb() are chosen. 

	Please let me know if you want me to switch to smp_load_acquire() and
	smp_store_release().

Leon,

	Avoiding the asynchronous worker from being spawned during the high load situations,
	make way for both synchronous and allocation path to flush the mr pool and grab the
	mr without waiting.

	Please let me know if you still have any queries with this respect or any modifications
	are needed.

Regards,
Praveen.

  reply	other threads:[~2022-01-20  8:01 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-18 14:47 [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool Praveen Kumar Kannoju
2022-01-18 16:48 ` Santosh Shilimkar
2022-01-18 18:00   ` Leon Romanovsky
2022-01-18 19:17   ` Jason Gunthorpe
2022-01-18 19:42     ` Santosh Shilimkar
2022-01-19  6:59       ` Leon Romanovsky
2022-01-19 11:46         ` Praveen Kannoju
2022-01-19 13:04           ` Jason Gunthorpe
2022-01-19 13:12             ` Praveen Kannoju
2022-01-19 13:17               ` Jason Gunthorpe
2022-01-19 14:08                 ` Praveen Kannoju
2022-01-19 14:49                   ` Jason Gunthorpe
2022-01-19 14:56           ` Leon Romanovsky
2022-01-20  8:00             ` Praveen Kannoju [this message]
2022-01-20 11:11               ` Leon Romanovsky
2022-01-20 11:57                 ` Praveen Kannoju
2022-01-20 12:21                   ` Leon Romanovsky
2022-01-20 12:27                     ` Praveen Kannoju
  -- strict thread matches above, loose matches on Subject: below --
2022-01-18 13:10 Praveen Kumar Kannoju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH0PR10MB55156B918F0D519B8A935C378C5A9@PH0PR10MB5515.namprd10.prod.outlook.com \
    --to=praveen.kannoju@oracle.com \
    --cc=davem@davemloft.net \
    --cc=jgg@ziepe.ca \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rajesh.sivaramasubramaniom@oracle.com \
    --cc=rama.nichanamatlu@oracle.com \
    --cc=rds-devel@oss.oracle.com \
    --cc=santosh.shilimkar@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).