From: SeongJae Park <sjpark@amazon.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: SeongJae Park <sjpark@amazon.com>, <davem@davemloft.net>,
	SeongJae Park <sjpark@amazon.de>, <kuba@kernel.org>,
	<kuznet@ms2.inr.ac.ru>, <paulmck@kernel.org>,
	<netdev@vger.kernel.org>, <rcu@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works
Date: Thu, 10 Dec 2020 08:27:07 +0100	[thread overview]
Message-ID: <20201210072707.16079-1-sjpark@amazon.com> (raw)
In-Reply-To: <6d3e32f6-c2df-a1a6-3568-b7387cd0c933@gmail.com>

On Thu, 10 Dec 2020 01:17:58 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 
> 
> On 12/8/20 10:45 AM, SeongJae Park wrote:
> > From: SeongJae Park <sjpark@amazon.de>
> > 
> > In 'fqdir_exit()', a work item for destruction of the 'fqdir' is
> > enqueued.  The work function, 'fqdir_work_fn()', calls
> > 'rcu_barrier()'.  In case of intensive 'fqdir_exit()' (e.g., frequent
> > 'unshare(CLONE_NEWNET)' system calls), the contention on
> > 'rcu_barrier()' can result in unacceptably high latency.  This commit
> > avoids such contention by doing the destruction in a batched manner,
> > similar to that of 'cleanup_net()'.
> 
> Any numbers to share ? I have never seen an issue.

On our machine with 40 CPU cores and 70GB of DRAM, 15GB of available
memory was consumed within 2 minutes while my artificial reproducer ran.
The reproducer (sketched below) merely calls 'unshare(CLONE_NEWNET)' in
a loop, 50,000 times.  Though the reproducer is artificial, it resembles
the behavior of our real workloads.  While the reproducer ran,
'cleanup_net()' was called only 4 times.  The first two calls finished
quickly, but the third call took about 30 seconds, and the fourth call
hadn't finished by the time the reproducer completed.  We also confirmed
that the third and fourth calls were just waiting for 'rcu_barrier()'.
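
For reference, the core of the reproducer is essentially the below; this
is a minimal sketch, and the internal version differs in details:

	#define _GNU_SOURCE
	#include <sched.h>

	int main(void)
	{
		int i;

		/* Each unshare() moves the task into a fresh netns and
		 * drops the reference on the previous one; the discarded
		 * namespaces are later torn down by cleanup_net(), which
		 * calls fqdir_exit().  Needs CAP_SYS_ADMIN.
		 */
		for (i = 0; i < 50000; i++)
			unshare(CLONE_NEWNET);

		return 0;
	}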

I think you haven't seen this issue before because we are doing very
intensive 'unshare()' calls.  Also, this is not reproducible on all
hardware.  On my 6 CPU machine, the problem didn't reproduce.

> 
> > 
> > Signed-off-by: SeongJae Park <sjpark@amazon.de>
> > ---
> >  include/net/inet_frag.h  |  2 +-
> >  net/ipv4/inet_fragment.c | 28 ++++++++++++++++++++--------
> >  2 files changed, 21 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> > index bac79e817776..558893d8810c 100644
> > --- a/include/net/inet_frag.h
> > +++ b/include/net/inet_frag.h
> > @@ -20,7 +20,7 @@ struct fqdir {
> >  
> >  	/* Keep atomic mem on separate cachelines in structs that include it */
> >  	atomic_long_t		mem ____cacheline_aligned_in_smp;
> > -	struct work_struct	destroy_work;
> > +	struct llist_node	destroy_list;
> >  };
> >  
> >  /**
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 10d31733297d..796b559137c5 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -145,12 +145,19 @@ static void inet_frags_free_cb(void *ptr, void *arg)
> >  		inet_frag_destroy(fq);
> >  }
> >  
> > +static LLIST_HEAD(destroy_list);
> > +
> >  static void fqdir_work_fn(struct work_struct *work)
> >  {
> > -	struct fqdir *fqdir = container_of(work, struct fqdir, destroy_work);
> > -	struct inet_frags *f = fqdir->f;
> > +	struct llist_node *kill_list;
> > +	struct fqdir *fqdir;
> > +	struct inet_frags *f;
> > +
> > +	/* Atomically snapshot the list of fqdirs to destroy */
> > +	kill_list = llist_del_all(&destroy_list);
> >  
> > -	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > +	llist_for_each_entry(fqdir, kill_list, destroy_list)
> > +		rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
> > 
> 
> 
> OK, it seems rhashtable_free_and_destroy() has cond_resched() so we are not going
> to hold this cpu for long periods.
>  
> >  	/* We need to make sure all ongoing call_rcu(..., inet_frag_destroy_rcu)
> >  	 * have completed, since they need to dereference fqdir.
> > @@ -158,10 +165,13 @@ static void fqdir_work_fn(struct work_struct *work)
> >  	 */
> >  	rcu_barrier();
> >  
> > -	if (refcount_dec_and_test(&f->refcnt))
> > -		complete(&f->completion);
> > +	llist_for_each_entry(fqdir, kill_list, destroy_list) {
> 
> Don't we need the llist_for_each_entry_safe() variant here ???

Oh, indeed.  I will do so in the next version.
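
To be concrete, I'm thinking of something like below (an untested
sketch; 'tmp' is a new temporary needed for the safe traversal):

	struct fqdir *fqdir, *tmp;
	struct inet_frags *f;

	llist_for_each_entry_safe(fqdir, tmp, kill_list, destroy_list) {
		f = fqdir->f;
		if (refcount_dec_and_test(&f->refcnt))
			complete(&f->completion);

		/* 'tmp' already holds the next node, so freeing 'fqdir'
		 * here is safe.
		 */
		kfree(fqdir);
	}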

> 
> > +		f = fqdir->f;
> > +		if (refcount_dec_and_test(&f->refcnt))
> > +			complete(&f->completion);
> >  
> > -	kfree(fqdir);
> > +		kfree(fqdir);
> > +	}
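
For context, the enqueue side in 'fqdir_exit()' (the hunk is not quoted
above) boils down to the following idea; this is a simplified sketch
rather than the exact patch code:

	static DECLARE_WORK(destroy_work, fqdir_work_fn);

	void fqdir_exit(struct fqdir *fqdir)
	{
		/* llist_add() returns true only when the list was empty,
		 * so a burst of concurrent exits kicks the worker just
		 * once, and all of them share a single rcu_barrier().
		 */
		if (llist_add(&fqdir->destroy_list, &destroy_list))
			schedule_work(&destroy_work);
	}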


Thanks,
SeongJae Park

Thread overview: 6+ messages
2020-12-08  9:45 [PATCH 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)' SeongJae Park
2020-12-08  9:45 ` [PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works SeongJae Park
2020-12-09 23:16   ` Jakub Kicinski
2020-12-10  6:43     ` SeongJae Park
2020-12-10  0:17   ` Eric Dumazet
2020-12-10  7:27     ` SeongJae Park [this message]
