From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
Alexey Gladkov <legion@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>,
Alexey Gladkov <gladkov.alexey@gmail.com>
Subject: Re: [PATCH v2 2/2] proc: Ensure we see the exit of each process tid exactly
Date: Thu, 23 Apr 2020 22:33:06 -0500 [thread overview]
Message-ID: <87tv19tv65.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CAHk-=wgXEJdkgGzZQzBDGk7ijjVdAVXe=G-mkFSVng_Hpwd4tQ@mail.gmail.com> (Linus Torvalds's message of "Thu, 23 Apr 2020 13:28:18 -0700")
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, Apr 23, 2020 at 12:42 PM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> +void exchange_tids(struct task_struct *ntask, struct task_struct *otask)
>> +{
>> + /* pid_links[PIDTYPE_PID].next is always NULL */
>> + struct pid *npid = READ_ONCE(ntask->thread_pid);
>> + struct pid *opid = READ_ONCE(otask->thread_pid);
>> +
>> + rcu_assign_pointer(opid->tasks[PIDTYPE_PID].first, &ntask->pid_links[PIDTYPE_PID]);
>> + rcu_assign_pointer(npid->tasks[PIDTYPE_PID].first, &otask->pid_links[PIDTYPE_PID]);
>> + rcu_assign_pointer(ntask->thread_pid, opid);
>> + rcu_assign_pointer(otask->thread_pid, npid);
>> + WRITE_ONCE(ntask->pid_links[PIDTYPE_PID].pprev, &opid->tasks[PIDTYPE_PID].first);
>> + WRITE_ONCE(otask->pid_links[PIDTYPE_PID].pprev, &npid->tasks[PIDTYPE_PID].first);
>> + WRITE_ONCE(ntask->pid, pid_nr(opid));
>> + WRITE_ONCE(otask->pid, pid_nr(npid));
>> +}
>
> This function is _very_ hard to read as written.
>
> It really wants a helper function to do the swapping per hlist_head
> and hlist_node, I think. And "opid/npid" is very hard to see, and the
> naming doesn't make much sense (if it's an "exchange", then why is it
> "old/new" - they're symmetric).
>
> At least something like
>
> struct hlist_head *old_pid_hlist = opid->tasks + PIDTYPE_PID;
> struct hlist_head *new_pid_hlist = npid->tasks + PIDTYPE_PID;
> struct hlist_node *old_pid_node = otask->pid_links + PIDTYPE_PID;
> struct hlist_node *new_pid_node = ntask->pid_links + PIDTYPE_PID;
>
> struct hlist_node *old_first_node = old_pid_hlist->first;
> struct hlist_node *new_first_node = new_pid_hlist->first;
>
> and then trying to group up the first/pprev/thread_pid/pid accesses
> so that you them together, and using a helper function that does the
> whole switch, so that you'd have
>
> /* Move new node to old hlist, and update thread_pid/pid fields */
> insert_pid_pointers(old_pid_hlist, new_pid_node, new_first_node);
> rcu_assign_pointer(ntask->thread_pid, opid);
> WRITE_ONCE(ntask->pid, pid_nr(opid));
>
> /* Move old new to new hlist, and update thread_pid/pid fields */
> insert_pid_pointers(new_pid_hlist, old_pid_node, old_first_node);
> rcu_assign_pointer(otask->thread_pid, npid);
> WRITE_ONCE(otask->pid, pid_nr(npid));
>
> or something roughly like that.
>
> (And the above still uses "old/new", which as mentioned sounds wrong
> to me. Maybe it should just be "a_xyz" and "b_xyz"? Also note that I
> did this in my MUA, so I could have gotten the names and types wrong
> etc).
>
> I think that would make it look at least _slightly_ less like random
> line noise and easier to follow.
>
> But maybe even a rcu_hlist_swap() helper? We have one for regular
> lists. Do we really have to do it all written out, not do it with a
> "remove and reinsert" model?
At one point my brain I had forgetten that xchg can not take two memory
arguments and had hoped to be able to provide stronger guarnatees than I
can. Which is where I think the structure of exchange_pids came from.
I do agree the clearer we can write things, the easier it is for
someone else to come along and follow.
We can not use a remove and reinser model because that does break rcu
accesses, and complicates everything else. With a swap model we have
the struct pids pointer at either of the tasks that are swapped but
never at nothing. With a remove/reinsert model we have to deal the
addittional possibility of the pids not pointing at a thread at all
which can result in things like signals not being delivered at all.
I played with it a bit and the best I have been able to come up is:
void hlist_swap_before_rcu(struct hlist_node *left, struct hlist_node *right)
{
struct hlist_node **lpprev = left->pprev;
struct hlist_node **rpprev = right->pprev;
rcu_assign_pointer(*lpprev, right);
rcu_assign_pointer(*rpprev, left);
WRITE_ONCE(left->pprev, rpprev);
WRITE_ONCE(right->pprev, lpprev);
}
void exchange_tids(struct task_struct *left, struct task_struct *right)
{
struct hlist_node *lnode = &left->pid_links[PIDTYPE_PID];
struct hlist_node *rnode = &right->pid_links[PIDTYPE_PID];
struct pid *lpid, *rpid;
/* Replace the single entry tid lists with each other */
hlist_swap_before_rcu(lnode, rnode);
/* Swap thread_pid */
rpid = left->thread_pid;
lpid = right->thread_pid;
rcu_assign_pointer(left->thread_pid, lpid);
rcu_assign_pointer(right->thread_pid, rpid);
/* Swap the cached pid value */
WRITE_ONCE(left->pid, pid_nr(lpid));
WRITE_ONCE(right->pid, pid_nr(rpid));
}
hlists because they are not doubly linked can legitimately swap their
beginnings or their tails. Something that regular lists can not,
and I think that is exactly the general purpose semantic I want.
Does that look a little more readable?
Eric
next prev parent reply other threads:[~2020-04-24 3:36 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-19 14:10 [PATCH v12 0/7] proc: modernize proc to support multiple private instances Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 1/7] proc: rename struct proc_fs_info to proc_fs_opts Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 2/7] proc: allow to mount many instances of proc in one pid namespace Alexey Gladkov
2020-04-23 11:28 ` [PATCH v13 " Alexey Gladkov
2020-04-23 12:16 ` Eric W. Biederman
2020-04-23 20:01 ` Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 3/7] proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 4/7] proc: add option to mount only a pids subset Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 5/7] docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 6/7] proc: use human-readable values for hidepid Alexey Gladkov
2020-04-19 14:10 ` [PATCH v12 7/7] proc: use named enums for better readability Alexey Gladkov
[not found] ` <87ftcv1nqe.fsf@x220.int.ebiederm.org>
2020-04-23 17:54 ` [PATCH v2 0/2] proc: Calling proc_flush_task exactly once per task Oleg Nesterov
2020-04-23 19:38 ` Eric W. Biederman
2020-04-23 19:39 ` [PATCH v2 1/2] proc: Use PIDTYPE_TGID in next_tgid Eric W. Biederman
2020-04-24 17:29 ` Oleg Nesterov
2020-04-23 19:39 ` [PATCH v2 2/2] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman
2020-04-23 20:28 ` Linus Torvalds
2020-04-24 3:33 ` Eric W. Biederman [this message]
2020-04-24 18:02 ` Linus Torvalds
2020-04-24 18:46 ` Linus Torvalds
2020-04-24 19:51 ` Eric W. Biederman
2020-04-24 20:10 ` Linus Torvalds
2020-04-24 17:39 ` Oleg Nesterov
2020-04-24 18:10 ` Eric W. Biederman
2020-04-24 20:50 ` [PATCH] proc: Put thread_pid in release_task not proc_flush_pid Eric W. Biederman
[not found] ` <87mu6ymkea.fsf_-_@x220.int.ebiederm.org>
[not found] ` <87blnemj5t.fsf_-_@x220.int.ebiederm.org>
2020-04-26 17:22 ` [PATCH v3 2/6] posix-cpu-timers: Use PIDTYPE_TGID to simplify the logic in lookup_task Oleg Nesterov
2020-04-27 11:51 ` Eric W. Biederman
2020-04-28 18:03 ` Oleg Nesterov
2020-04-27 10:32 ` Thomas Gleixner
2020-04-27 19:46 ` Eric W. Biederman
[not found] ` <875zdmmj4y.fsf_-_@x220.int.ebiederm.org>
2020-04-26 17:40 ` [PATCH v3 3/6] rculist: Add hlist_swap_before_rcu Linus Torvalds
2020-04-27 14:28 ` Eric W. Biederman
2020-04-27 20:27 ` Linus Torvalds
2020-04-28 12:16 ` [PATCH v4 0/2] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman
2020-04-28 12:18 ` [PATCH v4 1/2] rculist: Add hlists_swap_heads_rcu Eric W. Biederman
2020-04-28 12:19 ` [PATCH v4 2/2] proc: Ensure we see the exit of each process tid exactly once Eric W. Biederman
2020-04-28 16:53 ` [PATCH v4 0/2] proc: Ensure we see the exit of each process tid exactly Linus Torvalds
2020-04-28 17:55 ` Eric W. Biederman
2020-04-28 18:55 ` Eric W. Biederman
2020-04-28 19:36 ` Linus Torvalds
2020-04-28 18:05 ` Oleg Nesterov
2020-04-28 18:54 ` Eric W. Biederman
2020-04-28 21:39 ` [PATCH v1 0/4] signal: Removing has_group_leader_pid Eric W. Biederman
2020-04-28 21:45 ` [PATCH v1 1/4] posix-cpu-timer: Tidy up group_leader logic in lookup_task Eric W. Biederman
2020-04-28 21:48 ` [PATCH 2/4] posix-cpu-timer: Unify the now redundant code " Eric W. Biederman
2020-04-28 21:53 ` [PATCH v1 3/4] exec: Remove BUG_ON(has_group_leader_pid) Eric W. Biederman
2020-04-28 21:56 ` [PATCH v4 4/4] signal: Remove has_group_leader_pid Eric W. Biederman
2020-04-30 11:54 ` [PATCH v1 0/3] posix-cpu-timers: Use pids not tasks in lookup Eric W. Biederman
2020-04-30 11:55 ` [PATCH v1 1/3] posix-cpu-timers: Extend rcu_read_lock removing task_struct references Eric W. Biederman
2020-04-30 11:56 ` [PATCH v1 2/3] posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type Eric W. Biederman
2020-04-30 11:56 ` [PATCH v1 3/3] posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock Eric W. Biederman
[not found] ` <87h7x6mj6h.fsf_-_@x220.int.ebiederm.org>
2020-04-27 9:43 ` [PATCH v3 1/6] posix-cpu-timers: Always call __get_task_for_clock holding rcu_read_lock Thomas Gleixner
2020-04-27 11:53 ` Eric W. Biederman
[not found] ` <87r1w8ete7.fsf@x220.int.ebiederm.org>
2020-04-27 20:23 ` [PATCH v3] proc: Ensure we see the exit of each process tid exactly Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tv19tv65.fsf@x220.int.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=gladkov.alexey@gmail.com \
--cc=legion@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).