From: Junxiao Bi <junxiao.bi@oracle.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <matthew.wilcox@oracle.com>,
	Srinivas Eeda <SRINIVAS.EEDA@oracle.com>,
	"joe.jin@oracle.com" <joe.jin@oracle.com>,
	Wengang Wang <wen.gang.wang@oracle.com>
Subject: Re: [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries
Date: Fri, 19 Jun 2020 08:56:21 -0700
Message-ID: <caa9adf6-e1bb-167b-6f59-d17fd587d4fa@oracle.com>
In-Reply-To: <87bllf87ve.fsf_-_@x220.int.ebiederm.org>

Hi Eric,

The patch didn't reduce the lock contention.

    PerfTop:   48925 irqs/sec  kernel:95.6%  exact: 100.0%  lost: 0/0  drop: 0/0  [4000Hz cycles],  (all, 104 CPUs)
-------------------------------------------------------------------------------------------------------------------

     69.66%  [kernel]  [k] native_queued_spin_lock_slowpath
      1.93%  [kernel]  [k] _raw_spin_lock
      1.24%  [kernel]  [k] page_counter_cancel
      0.70%  [kernel]  [k] do_syscall_64
      0.62%  [kernel]  [k] find_idlest_group.isra.96
      0.57%  [kernel]  [k] queued_write_lock_slowpath
      0.56%  [kernel]  [k] d_walk
      0.45%  [kernel]  [k] clear_page_erms
      0.44%  [kernel]  [k] syscall_return_via_sysret
      0.40%  [kernel]  [k] entry_SYSCALL_64
      0.38%  [kernel]  [k] refcount_dec_not_one
      0.37%  [kernel]  [k] propagate_protected_usage
      0.33%  [kernel]  [k] unmap_page_range
      0.33%  [kernel]  [k] select_collect
      0.32%  [kernel]  [k] memcpy_erms
      0.30%  [kernel]  [k] proc_task_readdir
      0.27%  [kernel]  [k] _raw_spin_lock_irqsave

Thanks,

Junxiao.

On 6/19/20 7:09 AM, ebiederm@xmission.com wrote:
> Junxiao Bi <junxiao.bi@oracle.com> reported:
>> While debugging a performance issue, I found that thousands of threads
>> exiting around the same time can cause severe spin lock contention on
>> the proc dentry "/proc/$parent_process_pid/task/", because each thread
>> needs to clean up its pid file from that directory when it exits.
> Matthew Wilcox <willy@infradead.org> reported:
>> We've looked at a few different ways of fixing this problem.
> The flushing of the proc dentries from the dcache is an optimization,
> and is not necessary for correctness.  Eventually cache pressure will
> cause the dentries to be freed even if no flushing happens.  Some
> light testing when I refactored the proc flushing [1] indicated that at
> least the effect on the memory footprint is easily measurable.
>
> An optimization that causes a performance problem due to a thundering
> herd of threads is no real optimization.
>
> Modify the code to only flush the /proc/<tgid>/ directory when all
> threads in a process are killed at once.  This continues to flush
> practically everything when the process is reaped, as the threads live
> under /proc/<tgid>/task/<tid>.
>
> There is a rare possibility that a debugger will access /proc/<tid>/,
> which this change will no longer flush, but I believe such accesses
> are sufficiently rare to not be observed in practice.
>
> [1] 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
> Link: https://lkml.kernel.org/r/54091fc0-ca46-2186-97a8-d1f3c4f3877b@oracle.com
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>
> I am still waiting for word on how this affects performance, but this is
> a clean version that should avoid the thundering herd problem in
> general.
>
>
>   kernel/exit.c | 19 +++++++++++++++----
>   1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index cebae77a9664..567354550d62 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -151,8 +151,8 @@ void put_task_struct_rcu_user(struct task_struct *task)
>   
>   void release_task(struct task_struct *p)
>   {
> +	struct pid *flush_pid = NULL;
>   	struct task_struct *leader;
> -	struct pid *thread_pid;
>   	int zap_leader;
>   repeat:
>   	/* don't need to get the RCU readlock here - the process is dead and
> @@ -165,7 +165,16 @@ void release_task(struct task_struct *p)
>   
>   	write_lock_irq(&tasklist_lock);
>   	ptrace_release_task(p);
> -	thread_pid = get_pid(p->thread_pid);
> +
> +	/*
> +	 * When all of the threads are exiting wait until the end
> +	 * and flush everything.
> +	 */
> +	if (thread_group_leader(p))
> +		flush_pid = get_pid(task_tgid(p));
> +	else if (!(p->signal->flags & SIGNAL_GROUP_EXIT))
> +		flush_pid = get_pid(task_pid(p));
> +
>   	__exit_signal(p);
>   
>   	/*
> @@ -188,8 +197,10 @@ void release_task(struct task_struct *p)
>   	}
>   
>   	write_unlock_irq(&tasklist_lock);
> -	proc_flush_pid(thread_pid);
> -	put_pid(thread_pid);
> +	if (flush_pid) {
> +		proc_flush_pid(flush_pid);
> +		put_pid(flush_pid);
> +	}
>   	release_thread(p);
>   	put_task_struct_rcu_user(p);
>   
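
In short, the patched release_task() now chooses what to flush as follows (a
condensed, comment-annotated restatement of the hunks above; the comments are
editorial interpretation of the commit message, and the snippet assumes kernel
context rather than being standalone-compilable):

	struct pid *flush_pid = NULL;

	if (thread_group_leader(p))
		/* The leader is released only after every other thread is
		 * gone, so flushing /proc/<tgid>/ here also covers each
		 * /proc/<tgid>/task/<tid>: one flush instead of thousands. */
		flush_pid = get_pid(task_tgid(p));
	else if (!(p->signal->flags & SIGNAL_GROUP_EXIT))
		/* A lone thread exiting while the group lives on flushes
		 * just its own /proc/<tid>/ entries. */
		flush_pid = get_pid(task_pid(p));
	/* else: one thread of a group-wide exit; skip the flush and let
	 * the leader's flush (or cache pressure) clean up later. */

	...

	if (flush_pid) {
		proc_flush_pid(flush_pid);
		put_pid(flush_pid);
	}

So for a group-wide exit only the leader ends up calling proc_flush_pid(),
which removes the per-thread thundering herd on the /proc/<tgid>/task/ dentry.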

Thread overview: 20+ messages
2020-06-18 22:17 severe proc dentry lock contention Junxiao Bi
2020-06-18 23:39 ` Matthew Wilcox
2020-06-19  0:02   ` Eric W. Biederman
2020-06-19  0:27     ` Junxiao Bi
2020-06-19  3:30       ` Eric W. Biederman
2020-06-19 14:09       ` [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries Eric W. Biederman
2020-06-19 15:56         ` Junxiao Bi [this message]
2020-06-19 17:24           ` Eric W. Biederman
2020-06-19 21:56             ` Junxiao Bi
2020-06-19 22:42               ` Eric W. Biederman
2020-06-20 16:27                 ` Matthew Wilcox
2020-06-22  5:15                   ` Junxiao Bi
2020-06-22 15:20                     ` Eric W. Biederman
2020-06-22 15:48                       ` willy
2020-08-17 12:19                         ` Eric W. Biederman
2020-06-22 17:16                       ` Junxiao Bi
2020-06-23  0:47                     ` Matthew Wilcox
2020-06-25 22:11                       ` Junxiao Bi
2020-06-22  5:33         ` Masahiro Yamada
2020-06-22 15:13           ` Eric W. Biederman
