All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ikent@redhat.com, onestero@redhat.com
Subject: [PATCH 2/3] pid: use idr tag to hint pids associated with group leader tasks
Date: Tue, 14 Jun 2022 14:09:48 -0400	[thread overview]
Message-ID: <20220614180949.102914-3-bfoster@redhat.com> (raw)
In-Reply-To: <20220614180949.102914-1-bfoster@redhat.com>

Searching the pid_namespace for group leader tasks is a fairly
inefficient operation. Listing the root directory of a procfs mount
performs a linear walk of allocated pids, checking each for an
associated PIDTYPE_TGID task to determine whether to populate a
directory entry. This can cause a significant increase in readdir()
syscall latency when run in runtime environments that might have one
or more processes with significant thread counts.

To facilitate improved TGID pid searches, define a new IDR
radix-tree tag for struct pid entries that are likely to have an
associated PIDTYPE_TGID task. To keep the code simple and avoid
having to maintain synchronization between tag state and post-fork
pid-task association changes, the tag is applied to all pids
initially allocated for tasks that are cloned without CLONE_THREAD.
The semantics of the tag are thus that false positives are possible
(i.e. tagged pids without PIDTYPE_TGID tasks), but false negatives
(i.e. untagged pids without PIDTYPE_TGID tasks) are not allowed.

For example, a userspace task that does a setsid() followed by a
fork() and exit() in the initial task will leave the initial pid
tagged (as it remains allocated for the PIDTYPE_SID association)
while the group leader task associates with the pid allocated for
the child fork. Once set, the tag persists for the lifetime of the
pid and is cleared when the pid is freed and associated entry
removed from the idr tree.

This is an effective optimization because false negatives are
relatively uncommon, essentially don't add any overhead that doesn't
already exist (i.e. having to check pid_task(..., PIDTYPE_TGID), but
still allows filtering out large numbers of thread pids that are
guaranteed to not have TGID task association.

Define the new IDR_TGID radix tree tag and provide a couple helpers
to set and check tag state. Set the tag in the allocation path when
the caller specifies that the pid is expected to track a group
leader. Since false negatives are not allowed, warn in the event
that a PIDTYPE_TGID task is ever attached to an untagged pid.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 include/linux/idr.h | 11 +++++++++++
 include/linux/pid.h |  2 +-
 kernel/fork.c       |  2 +-
 kernel/pid.c        |  9 ++++++++-
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index a0dce14090a9..11e0ccedfc92 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -27,6 +27,7 @@ struct idr {
  * to users.  Use tag 0 to track whether a node has free space below it.
  */
 #define IDR_FREE	0
+#define IDR_TGID	1
 
 /* Set the IDR flag and the IDR_FREE tag */
 #define IDR_RT_MARKER	(ROOT_IS_IDR | (__force gfp_t)			\
@@ -174,6 +175,16 @@ static inline void idr_preload_end(void)
 	local_unlock(&radix_tree_preloads.lock);
 }
 
+static inline void idr_set_group_lead(struct idr *idr, unsigned long id)
+{
+	radix_tree_tag_set(&idr->idr_rt, id, IDR_TGID);
+}
+
+static inline bool idr_is_group_lead(struct idr *idr, unsigned long id)
+{
+	return radix_tree_tag_get(&idr->idr_rt, id, IDR_TGID);
+}
+
 /**
  * idr_for_each_entry() - Iterate over an IDR's elements of a given type.
  * @idr: IDR handle.
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 343abf22092e..31f3cf765cee 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -134,7 +134,7 @@ extern struct pid *find_get_pid(int nr);
 extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 
 extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
-			     size_t set_tid_size);
+			     size_t set_tid_size, bool group_leader);
 extern void free_pid(struct pid *pid);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..3c52f45ec93e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2254,7 +2254,7 @@ static __latent_entropy struct task_struct *copy_process(
 
 	if (pid != &init_struct_pid) {
 		pid = alloc_pid(p->nsproxy->pid_ns_for_children, args->set_tid,
-				args->set_tid_size);
+				args->set_tid_size, !(clone_flags & CLONE_THREAD));
 		if (IS_ERR(pid)) {
 			retval = PTR_ERR(pid);
 			goto bad_fork_cleanup_thread;
diff --git a/kernel/pid.c b/kernel/pid.c
index 2fc0a16ec77b..5a745c35475e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -157,7 +157,7 @@ void free_pid(struct pid *pid)
 }
 
 struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
-		      size_t set_tid_size)
+		      size_t set_tid_size, bool group_leader)
 {
 	struct pid *pid;
 	enum pid_type type;
@@ -272,6 +272,8 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	for ( ; upid >= pid->numbers; --upid) {
 		/* Make the PID visible to find_pid_ns. */
 		idr_replace(&upid->ns->idr, pid, upid->nr);
+		if (group_leader)
+			idr_set_group_lead(&upid->ns->idr, upid->nr);
 		upid->ns->pid_allocated++;
 	}
 	spin_unlock_irq(&pidmap_lock);
@@ -331,6 +333,11 @@ static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
 void attach_pid(struct task_struct *task, enum pid_type type)
 {
 	struct pid *pid = *task_pid_ptr(task, type);
+	struct pid_namespace *pid_ns = ns_of_pid(pid);
+	pid_t pid_nr = pid_nr_ns(pid, pid_ns);
+
+	WARN_ON(type == PIDTYPE_TGID &&
+		!idr_is_group_lead(&pid_ns->idr, pid_nr));
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-- 
2.34.1


  parent reply	other threads:[~2022-06-14 18:10 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 18:09 [PATCH 0/3] proc: improve root readdir latency with many threads Brian Foster
2022-06-14 18:09 ` [PATCH 1/3] radix-tree: propagate all tags in idr tree Brian Foster
2022-06-15 11:12   ` Christoph Hellwig
2022-06-15 12:57     ` Brian Foster
2022-06-15 13:40       ` Matthew Wilcox
2022-06-15 14:43         ` Brian Foster
2022-06-15 16:34           ` Matthew Wilcox
2022-06-28 12:55             ` Christian Brauner
2022-06-28 14:53               ` Brian Foster
2022-06-29 19:13                 ` Brian Foster
2022-07-11 20:24                 ` Matthew Wilcox
2022-06-15 13:33     ` Ian Kent
2022-06-14 18:09 ` Brian Foster [this message]
2022-06-14 18:09 ` [PATCH 3/3] proc: use idr tgid tag hint to iterate pids in readdir Brian Foster
2022-06-15 13:44   ` Matthew Wilcox
2022-06-15 14:44     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220614180949.102914-3-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=ikent@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=onestero@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.