From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 136A3C55186 for ; Fri, 24 Apr 2020 03:36:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB588206D7 for ; Fri, 24 Apr 2020 03:36:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726444AbgDXDgS (ORCPT ); Thu, 23 Apr 2020 23:36:18 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:37994 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725884AbgDXDgR (ORCPT ); Thu, 23 Apr 2020 23:36:17 -0400 Received: from in01.mta.xmission.com ([166.70.13.51]) by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jRp8S-0003fD-Ei; Thu, 23 Apr 2020 21:36:16 -0600 Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95] helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.87) (envelope-from ) id 1jRp8R-0007ZJ-Fb; Thu, 23 Apr 2020 21:36:16 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Linus Torvalds Cc: LKML , Linux FS Devel , Alexey Dobriyan , Alexey Gladkov , Andrew Morton , Oleg Nesterov , Alexey Gladkov References: <20200419141057.621356-1-gladkov.alexey@gmail.com> <87ftcv1nqe.fsf@x220.int.ebiederm.org> <87wo66vvnm.fsf_-_@x220.int.ebiederm.org> Date: Thu, 23 Apr 2020 22:33:06 -0500 In-Reply-To: (Linus Torvalds's message of "Thu, 23 Apr 2020 13:28:18 -0700") Message-ID: <87tv19tv65.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1jRp8R-0007ZJ-Fb;;;mid=<87tv19tv65.fsf@x220.int.ebiederm.org>;;;hst=in01.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1+K74FabDAcvn7I1S0sCNkQEUS9knirSzk= X-SA-Exim-Connect-IP: 68.227.160.95 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH v2 2/2] proc: Ensure we see the exit of each process tid exactly X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Linus Torvalds writes: > On Thu, Apr 23, 2020 at 12:42 PM Eric W. Biederman > wrote: >> >> +void exchange_tids(struct task_struct *ntask, struct task_struct *otask) >> +{ >> + /* pid_links[PIDTYPE_PID].next is always NULL */ >> + struct pid *npid = READ_ONCE(ntask->thread_pid); >> + struct pid *opid = READ_ONCE(otask->thread_pid); >> + >> + rcu_assign_pointer(opid->tasks[PIDTYPE_PID].first, &ntask->pid_links[PIDTYPE_PID]); >> + rcu_assign_pointer(npid->tasks[PIDTYPE_PID].first, &otask->pid_links[PIDTYPE_PID]); >> + rcu_assign_pointer(ntask->thread_pid, opid); >> + rcu_assign_pointer(otask->thread_pid, npid); >> + WRITE_ONCE(ntask->pid_links[PIDTYPE_PID].pprev, &opid->tasks[PIDTYPE_PID].first); >> + WRITE_ONCE(otask->pid_links[PIDTYPE_PID].pprev, &npid->tasks[PIDTYPE_PID].first); >> + WRITE_ONCE(ntask->pid, pid_nr(opid)); >> + WRITE_ONCE(otask->pid, pid_nr(npid)); >> +} > > This function is _very_ hard to read as written. > > It really wants a helper function to do the swapping per hlist_head > and hlist_node, I think. And "opid/npid" is very hard to see, and the > naming doesn't make much sense (if it's an "exchange", then why is it > "old/new" - they're symmetric). > > At least something like > > struct hlist_head *old_pid_hlist = opid->tasks + PIDTYPE_PID; > struct hlist_head *new_pid_hlist = npid->tasks + PIDTYPE_PID; > struct hlist_node *old_pid_node = otask->pid_links + PIDTYPE_PID; > struct hlist_node *new_pid_node = ntask->pid_links + PIDTYPE_PID; > > struct hlist_node *old_first_node = old_pid_hlist->first; > struct hlist_node *new_first_node = new_pid_hlist->first; > > and then trying to group up the first/pprev/thread_pid/pid accesses > so that you them together, and using a helper function that does the > whole switch, so that you'd have > > /* Move new node to old hlist, and update thread_pid/pid fields */ > insert_pid_pointers(old_pid_hlist, new_pid_node, new_first_node); > rcu_assign_pointer(ntask->thread_pid, opid); > WRITE_ONCE(ntask->pid, pid_nr(opid)); > > /* Move old new to new hlist, and update thread_pid/pid fields */ > insert_pid_pointers(new_pid_hlist, old_pid_node, old_first_node); > rcu_assign_pointer(otask->thread_pid, npid); > WRITE_ONCE(otask->pid, pid_nr(npid)); > > or something roughly like that. > > (And the above still uses "old/new", which as mentioned sounds wrong > to me. Maybe it should just be "a_xyz" and "b_xyz"? Also note that I > did this in my MUA, so I could have gotten the names and types wrong > etc). > > I think that would make it look at least _slightly_ less like random > line noise and easier to follow. > > But maybe even a rcu_hlist_swap() helper? We have one for regular > lists. Do we really have to do it all written out, not do it with a > "remove and reinsert" model? At one point my brain I had forgetten that xchg can not take two memory arguments and had hoped to be able to provide stronger guarnatees than I can. Which is where I think the structure of exchange_pids came from. I do agree the clearer we can write things, the easier it is for someone else to come along and follow. We can not use a remove and reinser model because that does break rcu accesses, and complicates everything else. With a swap model we have the struct pids pointer at either of the tasks that are swapped but never at nothing. With a remove/reinsert model we have to deal the addittional possibility of the pids not pointing at a thread at all which can result in things like signals not being delivered at all. I played with it a bit and the best I have been able to come up is: void hlist_swap_before_rcu(struct hlist_node *left, struct hlist_node *right) { struct hlist_node **lpprev = left->pprev; struct hlist_node **rpprev = right->pprev; rcu_assign_pointer(*lpprev, right); rcu_assign_pointer(*rpprev, left); WRITE_ONCE(left->pprev, rpprev); WRITE_ONCE(right->pprev, lpprev); } void exchange_tids(struct task_struct *left, struct task_struct *right) { struct hlist_node *lnode = &left->pid_links[PIDTYPE_PID]; struct hlist_node *rnode = &right->pid_links[PIDTYPE_PID]; struct pid *lpid, *rpid; /* Replace the single entry tid lists with each other */ hlist_swap_before_rcu(lnode, rnode); /* Swap thread_pid */ rpid = left->thread_pid; lpid = right->thread_pid; rcu_assign_pointer(left->thread_pid, lpid); rcu_assign_pointer(right->thread_pid, rpid); /* Swap the cached pid value */ WRITE_ONCE(left->pid, pid_nr(lpid)); WRITE_ONCE(right->pid, pid_nr(rpid)); } hlists because they are not doubly linked can legitimately swap their beginnings or their tails. Something that regular lists can not, and I think that is exactly the general purpose semantic I want. Does that look a little more readable? Eric