* [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
@ 2021-03-12 17:38 Jim Newsome
  2021-03-12 18:22 ` Andrew Morton
  2021-03-12 20:29 ` Eric W. Biederman
  0 siblings, 2 replies; 10+ messages in thread
From: Jim Newsome @ 2021-03-12 17:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner,
	linux-kernel, Jim Newsome

do_wait is an internal function used to implement waitpid, waitid,
wait4, etc. To handle the general case, it does an O(n) linear scan of
the thread group's children and tracees.

This patch adds a special-case when waiting on a pid to skip these scans
and instead do an O(1) lookup. This improves performance when waiting on
a pid from a thread group with many children and/or tracees.

Signed-off-by: Jim Newsome <jnewsome@torproject.org>
---

v4: https://lkml.org/lkml/2021/3/11/1333

* Added back missing target->ptrace check.
* Added explicit thread_group_leader check instead of double-lookup.

 kernel/exit.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e35e69a..65c862c604a7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1439,9 +1439,50 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 			   TASK_INTERRUPTIBLE, p);
 }
 
+static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
+				 struct task_struct *target)
+{
+	struct task_struct *parent =
+		!ptrace ? target->real_parent : target->parent;
+
+	return current == parent || (!(wo->wo_flags & __WNOTHREAD) &&
+				     same_thread_group(current, parent));
+}
+
+/*
+ * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
+ * and tracee lists to find the target task.
+ */
+static int do_wait_pid(struct wait_opts *wo)
+{
+	bool ptrace;
+	struct task_struct *target;
+	int retval;
+
+	target = pid_task(wo->wo_pid, PIDTYPE_PID);
+	if (!target)
+		return 0;
+
+	ptrace = false;
+	if (thread_group_leader(target) &&
+	    is_effectively_child(wo, ptrace, target)) {
+		retval = wait_consider_task(wo, ptrace, target);
+		if (retval)
+			return retval;
+	}
+
+	ptrace = true;
+	if (target->ptrace && is_effectively_child(wo, ptrace, target)) {
+		retval = wait_consider_task(wo, ptrace, target);
+		if (retval)
+			return retval;
+	}
+
+	return 0;
+}
+
 static long do_wait(struct wait_opts *wo)
 {
-	struct task_struct *tsk;
 	int retval;
 
 	trace_sched_process_wait(wo->wo_pid);
@@ -1463,19 +1504,27 @@ static long do_wait(struct wait_opts *wo)
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	read_lock(&tasklist_lock);
-	tsk = current;
-	do {
-		retval = do_wait_thread(wo, tsk);
-		if (retval)
-			goto end;
 
-		retval = ptrace_do_wait(wo, tsk);
+	if (wo->wo_type == PIDTYPE_PID) {
+		retval = do_wait_pid(wo);
 		if (retval)
 			goto end;
+	} else {
+		struct task_struct *tsk = current;
+
+		do {
+			retval = do_wait_thread(wo, tsk);
+			if (retval)
+				goto end;
 
-		if (wo->wo_flags & __WNOTHREAD)
-			break;
-	} while_each_thread(current, tsk);
+			retval = ptrace_do_wait(wo, tsk);
+			if (retval)
+				goto end;
+
+			if (wo->wo_flags & __WNOTHREAD)
+				break;
+		} while_each_thread(current, tsk);
+	}
 	read_unlock(&tasklist_lock);
 
 notask:
-- 
2.30.1



* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 17:38 [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
@ 2021-03-12 18:22 ` Andrew Morton
  2021-03-12 18:39   ` Jim Newsome
  2021-03-12 20:29 ` Eric W. Biederman
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2021-03-12 18:22 UTC (permalink / raw)
  To: Jim Newsome
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner, linux-kernel

On Fri, 12 Mar 2021 11:38:55 -0600 Jim Newsome <jnewsome@torproject.org> wrote:

> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
> 
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.

Could we please see some performance testing results to permit us to
evaluate the value of this change?

Thanks.



* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 18:22 ` Andrew Morton
@ 2021-03-12 18:39   ` Jim Newsome
  2021-03-12 18:47     ` Andrew Morton
  2021-03-12 20:01     ` Jim Newsome
  0 siblings, 2 replies; 10+ messages in thread
From: Jim Newsome @ 2021-03-12 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner, linux-kernel

On 3/12/21 12:22, Andrew Morton wrote:
> 
> Could we please see some performance testing results to permit us to
> evaluate the value of this change?

Sure. I've been doing some ad-hoc measurements with the code below. It
forks 8k children and then waits for them in reverse order (forcing a
full list traversal each time). I'll need to reboot a couple times to
get apples-to-apples measurements on bare metal, though. I plan to run
with NUMCHILDREN from 0 to 8000, in steps of 100.

Does this look like it'd be sufficient, or is there more you'd like to
see? The current form doesn't use ptrace, but I expect the results to be
similar (maybe more pronounced when tracing threaded children, since
every thread is in the tracee list instead of just the group leaders).

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUMCHILDREN 8000

void fork_and_wait() {
    pid_t children[NUMCHILDREN];
    // Fork NUMCHILDREN children, each of which exits immediately.
    for (int i = 0; i < NUMCHILDREN; ++i) {
        pid_t forkrv = fork();
        if (forkrv < 0) {
            perror("fork");
            exit(1);
        }
        if (forkrv == 0) {
            // child
            exit(0);
        }
        // parent
        children[i] = forkrv;
    }
    // Reap in reverse creation order, forcing a full traversal of the
    // child list on every waitpid call.
    for (int i = 0; i < NUMCHILDREN; ++i) {
        int wstatus;
        if (waitpid(children[NUMCHILDREN - i - 1], &wstatus, 0) < 0) {
            perror("waitpid");
            exit(1);
        }
    }
}


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 18:39   ` Jim Newsome
@ 2021-03-12 18:47     ` Andrew Morton
  2021-03-13  2:42       ` Jim Newsome
  2021-03-12 20:01     ` Jim Newsome
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2021-03-12 18:47 UTC (permalink / raw)
  To: Jim Newsome
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner, linux-kernel

On Fri, 12 Mar 2021 12:39:12 -0600 Jim Newsome <jnewsome@torproject.org> wrote:

> On 3/12/21 12:22, Andrew Morton wrote:
> > 
> > Could we please see some performance testing results to permit us to
> > evaluate the value of this change?
> 
> Sure. I've been doing some ad-hoc measurements with the code below. It
> forks 8k children and then waits for them in reverse order (forcing a
> full list traversal each time). I'll need to reboot a couple times to
> get apples-to-apples measurements on bare metal, though. I plan to run
> with NUMCHILDREN from 0 to 8000, in steps of 100.
> 
> Does this look like it'd be sufficient, or is there more you'd like to
> see? The current form doesn't use ptrace, but I expect the results to be
> similar (maybe more pronounced when tracing threaded children, since
> every thread is in the tracee list instead of just the group leaders).

A very specific microbenchmark which tickles a particular corner case
is useful, as long as there's some realistic chance that someone's
workload is adequately modeled by that test.

Also very useful would be some words which describe what led you to do
this work (presumably some real-world workload was being impacted) and a
description of how the patch improves that workload (or is expected to
improve it).

IOW, please spend a bit of time selling the patch!  What is the case
for including it in Linux?  What benefit does it provide our users?


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 18:39   ` Jim Newsome
  2021-03-12 18:47     ` Andrew Morton
@ 2021-03-12 20:01     ` Jim Newsome
  1 sibling, 0 replies; 10+ messages in thread
From: Jim Newsome @ 2021-03-12 20:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner, linux-kernel

(Re-sent without html part, which the list rejected)

On 3/12/21 12:47, Andrew Morton wrote:

> IOW, please spend a bit of time selling the patch!  What is the case
> for including it in Linux?  What benefit does it provide our users?

Ah yes - I'd included some context when I first reached out to Oleg, but
that never made it to the list :).

I'm helping develop a new ptrace-based version of Shadow [1] - a tool
for simulating a (potentially large) network. Shadow runs the network
user-space applications in an emulated environment, and routes network
traffic through a model of the network accounting for latency,
bandwidth, etc. The Tor Project plans to make increasing use of Shadow
both for focused evaluation of specific proposed software and parameter
changes, attacks, and defenses, and as a regular automated performance
evaluation prior to deployment of new versions.  Today Shadow is already
actively used in the research community for applications including tor
and bitcoin.

We're interested in running simulations including at least tens of
thousands of processes, with a stretch goal of being able to handle 1M
processes. Since each process is being ptraced, calling an O(n) waitpid
has a huge performance penalty at this scale, and results in simulation
performance growing ~quadratically with the size of the simulation.

We do have a workaround where we use a "fork proxy" thread to actually
fork all the processes, and we stop and detach inactive processes. (The
number of "active" processes is roughly fixed to the number of worker
threads, which is generally the # of CPUs available.) I.e. this keeps
the number of children and tracees small and fixed, allowing us to scale
linearly. However, having to detach and reattach tracees adds a
significant linear overhead factor. This kernel patch would allow us to
get rid of the complexity and extra overhead of this workaround, and
would benefit other applications that haven't implemented such a
workaround.

We have some details and analysis of this issue on GitHub [2]. I haven't
added results with the patch yet, but plan to do so.

This should also help other applications that have a large number of
children or tracees. Ptrace-based emulation tools are probably the most
likely to benefit. e.g. I suspect it'll help User Mode Linux [3], which
IIUC uses a single tracer thread to ptrace every process running on its
kernel. Likewise it could help DetTrace [4], which uses ptrace for
deterministic software builds.

[1]: https://shadow.github.io/
[2]: https://github.com/shadow/shadow/issues/1134
[3]: https://en.wikipedia.org/wiki/User-mode_Linux
[4]: https://github.com/dettrace/dettrace


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 17:38 [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
  2021-03-12 18:22 ` Andrew Morton
@ 2021-03-12 20:29 ` Eric W. Biederman
  2021-03-12 21:05   ` Jim Newsome
  2021-03-13 17:25   ` Oleg Nesterov
  1 sibling, 2 replies; 10+ messages in thread
From: Eric W. Biederman @ 2021-03-12 20:29 UTC (permalink / raw)
  To: Jim Newsome; +Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

Jim Newsome <jnewsome@torproject.org> writes:

> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.

I am going to kibitz just a little bit more.

When I looked at this a second time it became apparent that using
pid_task twice should actually be faster as it removes a dependent load
caused by thread_group_leader, and replaces it by accessing two adjacent
pointers in the same cache line.

I know the algorithmic improvement is the main advantage, but removing
60ns or so for a dependent load can't hurt.

Plus I think using the two pid types really makes it clear that one
is always a process and the other is always potentially a thread.

/*
 * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
 * and tracee lists to find the target task.
 */
static int do_wait_pid(struct wait_opts *wo)
{
	bool ptrace;
	struct task_struct *target;
	int retval;

	ptrace = false;
	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
	if (target && is_effectively_child(wo, ptrace, target)) {
		retval = wait_consider_task(wo, ptrace, target);
		if (retval)
			return retval;
	}

	ptrace = true;
	target = pid_task(wo->wo_pid, PIDTYPE_PID);
	if (target && target->ptrace &&
            is_effectively_child(wo, ptrace, target)) {
		retval = wait_consider_task(wo, ptrace, target);
		if (retval)
			return retval;
	}

	return 0;
}

Since the patch probably needs to be respun to include the improved
description, can we look at my micro performance improvement?

Eric


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 20:29 ` Eric W. Biederman
@ 2021-03-12 21:05   ` Jim Newsome
  2021-03-12 21:25     ` Eric W. Biederman
  2021-03-13 17:25   ` Oleg Nesterov
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Newsome @ 2021-03-12 21:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

On 3/12/21 14:29, Eric W. Biederman wrote:
> When I looked at this a second time it became apparent that using
> pid_task twice should actually be faster as it removes a dependent load
> caused by thread_group_leader, and replaces it by accessing two adjacent
> pointers in the same cache line.
> 
> I know the algorithmic improvement is the main advantage, but removing
> 60ns or so for a dependent load can't hurt.
> 
> Plus I think using the two pid types really makes it clear that one
> is always a process and the other is always potentially a thread.
> 
> /*
>  * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
>  * and tracee lists to find the target task.
>  */
> static int do_wait_pid(struct wait_opts *wo)
> {
> 	bool ptrace;
> 	struct task_struct *target;
> 	int retval;
> 
> 	ptrace = false;
> 	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
> 	if (target && is_effectively_child(wo, ptrace, target)) {
> 		retval = wait_consider_task(wo, ptrace, target);
> 		if (retval)
> 			return retval;
> 	}
> 
> 	ptrace = true;
> 	target = pid_task(wo->wo_pid, PIDTYPE_PID);
> 	if (target && target->ptrace &&
>             is_effectively_child(wo, ptrace, target)) {
> 		retval = wait_consider_task(wo, ptrace, target);
> 		if (retval)
> 			return retval;
> 	}
> 
> 	return 0;
> }

I'm fine with either way.

Part of what made my earlier version with the double-lookup a bit
awkward was only doing the second lookup if the first lookup failed. I'm
happy to take your word though that making the second lookup conditional
is unnecessary or even detrimental :). It did cross my mind that it
might not be a very consistent branch for a branch-predictor, but I also
figured pid_task's synchronization might outweigh that.
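
For the archives, that earlier shape was roughly the following - a
from-memory sketch, not the exact code from v3/v4:

static int do_wait_pid(struct wait_opts *wo)
{
	bool ptrace = false;
	struct task_struct *target;

	/* Try the thread-group-leader lookup first... */
	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
	if (!target) {
		/* ...and only fall back to the tracee lookup on failure. */
		ptrace = true;
		target = pid_task(wo->wo_pid, PIDTYPE_PID);
		if (target && !target->ptrace)
			target = NULL;
	}

	if (target && is_effectively_child(wo, ptrace, target))
		return wait_consider_task(wo, ptrace, target);

	return 0;
}

Note how this never considers the tracee case when the first lookup
succeeds, which was part of the awkwardness.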


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 21:05   ` Jim Newsome
@ 2021-03-12 21:25     ` Eric W. Biederman
  0 siblings, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2021-03-12 21:25 UTC (permalink / raw)
  To: Jim Newsome; +Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel

Jim Newsome <jnewsome@torproject.org> writes:

> On 3/12/21 14:29, Eric W. Biederman wrote:
>> When I looked at this a second time it became apparent that using
>> pid_task twice should actually be faster as it removes a dependent load
>> caused by thread_group_leader, and replaces it by accessing two adjacent
>> pointers in the same cache line.
>> 
>> I know the algorithmic improvement is the main advantage, but removing
>> 60ns or so for a dependent load can't hurt.
>> 
>> Plus I think using the two pid types really makes it clear that one
>> is always a process and the other is always potentially a thread.
>> 
>> /*
>>  * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
>>  * and tracee lists to find the target task.
>>  */
>> static int do_wait_pid(struct wait_opts *wo)
>> {
>> 	bool ptrace;
>> 	struct task_struct *target;
>> 	int retval;
>> 
>> 	ptrace = false;
>> 	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
>> 	if (target && is_effectively_child(wo, ptrace, target)) {
>> 		retval = wait_consider_task(wo, ptrace, target);
>> 		if (retval)
>> 			return retval;
>> 	}
>> 
>> 	ptrace = true;
>> 	target = pid_task(wo->wo_pid, PIDTYPE_PID);
>> 	if (target && target->ptrace &&
>>             is_effectively_child(wo, ptrace, target)) {
>> 		retval = wait_consider_task(wo, ptrace, target);
>> 		if (retval)
>> 			return retval;
>> 	}
>> 
>> 	return 0;
>> }
>
> I'm fine with either way.
>
> Part of what made my earlier version with the double-lookup a bit
> awkward was only doing the second lookup if the first lookup failed. I'm
> happy to take your word though that making the second lookup conditional
> is unnecessary or even detrimental :).

Oh absolutely.  The two lookups are independent.

> It did cross my mind that it
> might not be a very consistent branch for a branch-predictor, but I also
> figured pid_task's synchronization might outweigh that.

pid_task has a lot of verbiage but it is only reading a pointer,
verifying the pointer is not NULL and calling container_of on the result
of the pointer read.
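
Stripped of the RCU and lockdep annotations it is more or less just
this (a paraphrase of kernel/pid.c, not the verbatim source):

struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{
	struct task_struct *result = NULL;

	if (pid) {
		/* One load of the list head, then container_of. */
		struct hlist_node *first =
			rcu_dereference(hlist_first_rcu(&pid->tasks[type]));
		if (first)
			result = hlist_entry(first, struct task_struct,
					     pid_links[type]);
	}
	return result;
}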

Eric



* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 18:47     ` Andrew Morton
@ 2021-03-13  2:42       ` Jim Newsome
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Newsome @ 2021-03-13  2:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner, linux-kernel

Here are the micro-benchmark results. I ended up reworking it to use
google's benchmark tool [1]. For each N, I timed how long it took to fork
a new child and then immediately wait on it, while already having N
other children. (Initially I tried to vfork, but that didn't play nicely
with the benchmark tool.)

[1] https://github.com/google/benchmark

Without the patch:

./bench_waitpid
2021-03-12T20:03:57-06:00
Running ./bench_waitpid
Run on (4 X 2693.88 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 4096 KiB (x4)
  L3 Unified 16384 KiB (x4)
Load Average: 0.48, 0.16, 0.06
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_WaitPid/1        313354 ns        74754 ns         8575
BM_WaitPid/10       305954 ns        72424 ns         7977
BM_WaitPid/100      323126 ns        83368 ns         7861
BM_WaitPid/500      324479 ns        99071 ns         6541
BM_WaitPid/1000     377143 ns       178590 ns         3260
BM_WaitPid/5000     816597 ns       803919 ns          849
BM_WaitPid/8000    1282339 ns      1268179 ns          548

For example, this means that for a process that already has 5000
children, forking and then immediately calling waitpid on the 5001st
took an average of 804 microsecs. The O(n) implementation starts getting
noticeably slower with around 100 children.

With the patch:

$ ./bench_waitpid
2021-03-12T20:19:52-06:00
Running ./bench_waitpid
Run on (4 X 2693.88 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 4096 KiB (x4)
  L3 Unified 16384 KiB (x4)
Load Average: 1.39, 0.49, 0.18
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_WaitPid/1        305501 ns        74028 ns         9261
BM_WaitPid/10       309644 ns        74916 ns         8838
BM_WaitPid/100      319457 ns        77193 ns         8717
BM_WaitPid/500      306929 ns        73863 ns         8327
BM_WaitPid/1000     310849 ns        74848 ns         8458
BM_WaitPid/5000     329324 ns        79386 ns         9123
BM_WaitPid/8000     317991 ns        77889 ns         7526

As expected, the cost is about the same as the baseline for a small
number of children, and stays about the same as the # of children
increases.

Source:

#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

#include <vector>

#include "benchmark/benchmark.h"

static void BM_WaitPid(benchmark::State& state) {
    int num_children = state.range(0);

    // Create num_children
    std::vector<pid_t> children;
    children.reserve(num_children);
    for (int i = 0; i < num_children; ++i) {
        pid_t forkrv = fork();
        if (forkrv < 0) {
            perror("fork");
            exit(1);
        }
        if (forkrv == 0) {
            // child
            exit(0);
        }
        children.push_back(forkrv);
    }

    // The body of this loop is what gets timed.
    for (auto _ : state) {
        pid_t forkrv = fork();
        if (forkrv < 0) {
            perror("fork");
            exit(1);
        }
        if (forkrv == 0) {
            // child
            exit(0);
        }
        if (waitpid(forkrv, NULL, 0) < 0) {
            perror("waitpid");
            exit(1);
        }
    }

    // Tear down the other children
    for (pid_t child : children) {
        if (waitpid(child, NULL, 0) < 0) {
            perror("waitpid");
            exit(1);
        }
    }
}
// Register the function as a benchmark
BENCHMARK(BM_WaitPid)->Arg(1)->Arg(10)->Arg(100)->Arg(500)->Arg(1000)->Arg(5000)->Arg(8000);
// Run the benchmark
BENCHMARK_MAIN();
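
(For reference, I built this with something like "g++ -O2
bench_waitpid.cc -lbenchmark -lpthread -o bench_waitpid"; adjust paths
for wherever the benchmark library is installed on your system.)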


* Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 20:29 ` Eric W. Biederman
  2021-03-12 21:05   ` Jim Newsome
@ 2021-03-13 17:25   ` Oleg Nesterov
  1 sibling, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2021-03-13 17:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jim Newsome, Andrew Morton, Christian Brauner, linux-kernel

On 03/12, Eric W. Biederman wrote:
>
> I am going to kibitz just a little bit more.
>
> When I looked at this a second time it became apparent that using
> pid_task twice should actually be faster as it removes a dependent load
> caused by thread_group_leader, and replaces it by accessing two adjacent
> pointers in the same cache line.

Heh, I think this code should be optimized for the reader in the first place ;)

But as I said I won't argue, readability is very subjective, I am fine with your
version too.

Oleg.


