Date: Wed, 27 Jul 2022 13:40:57 -0600
From: Tycho Andersen
To: Oleg Nesterov
Cc: "Serge E. Hallyn", "Eric W . Biederman", Miklos Szeredi,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING
References: <20220713175305.1327649-1-tycho@tycho.pizza>
	<20220720150328.GA30749@mail.hallyn.com>
	<20220721015459.GA4297@mail.hallyn.com>
	<20220727175538.GC18822@redhat.com>
	<20220727191949.GD18822@redhat.com>
In-Reply-To: <20220727191949.GD18822@redhat.com>

On Wed, Jul 27, 2022 at 09:19:50PM +0200, Oleg Nesterov wrote:
> On 07/27, Tycho Andersen wrote:
> >
> > On Wed, Jul 27, 2022 at 07:55:39PM +0200, Oleg Nesterov wrote:
> > > On 07/27, Tycho Andersen wrote:
> > > >
> > > > Hi all,
> > > >
> > > > On Wed, Jul 20, 2022 at 08:54:59PM -0500, Serge E. Hallyn wrote:
> > > > > Oh - I didn't either - checking the sigkill in shared signals *seems*
> > > > > legit if they can be put there - but since you posted the new patch I
> > > > > assumed his reasoning was clear to you. I know Eric's busy, cc:ing Oleg
> > > > > for his interpretation too.
> > > >
> > > > Any thoughts on this?
> > >
> > > Cough... I don't know what can I say except I personally dislike this
> > > patch no matter what ;)
> > >
> > > And I do not understand how can this patch help. OK, a single-threaded
> > > PF_EXITING task sleeps in TASK_KILLABLE. send_signal_locked() won't
> > > wake it up anyway?
> > >
> > > I must have missed something.
> >
> > What do you think of the patch in
> > https://lore.kernel.org/all/YsyHMVLuT5U6mm+I@netflix/ ? Hopefully that
> > has an explanation that makes more sense.
>
> Sorry, I still do not follow. Again, I can easily miss something. But how
> can ANY change in __fatal_signal_pending() ensure that SIGKILL will wakeup
> a PF_EXITING task which already sleeps in TASK_KILLABLE state? or even set
> TIF_SIGPENDING as the changelog states?

__fatal_signal_pending() just checks the non-shared set:

    sigismember(&p->pending.signal, SIGKILL)

When init in a pid namespace dies, it calls zap_pid_ns_processes(),
which does:

    group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);

that eventually gets to __send_signal_locked() which does:

    pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;

i.e. it decides to put the signal in the shared set, instead of the
individual set. If we change __fatal_signal_pending() to look in the
shared set too, it will exit all the wait code in this case.

Maybe it should be fixed somehow by complete_signal(), but that
doesn't work if the thread is already PF_EXITING, because
wants_signal() will cause it to ignore the task, so it remains stuck
forever.

Does that make sense? Maybe it's me who is missing something. I have a
reproducer here:
https://github.com/tych0/kernel-utils/tree/master/fuse2

Tycho
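
To make the mismatch described above concrete, here is a rough user-space
toy model of the two pending sets. It is not kernel code: every toy_* name
is hypothetical and only loosely mirrors the kernel structures, the sets
are plain 64-bit masks, and the complete_signal()/wants_signal() step is
left out on the assumption that the target is already PF_EXITING and so is
never picked.

```c
/* User-space toy model only; all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SIGKILL 9

enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID, TOY_PIDTYPE_MAX };

/* Stand-in for the per-thread-group state holding the shared set. */
struct toy_signal_struct {
	uint64_t shared_pending;
};

/* Stand-in for a task: its own pending set plus a pointer to the shared one. */
struct toy_task {
	uint64_t pending;
	struct toy_signal_struct *signal;
};

/* Bit for signal number sig (1-based), as in a conventional sigset. */
uint64_t toy_sigmask(int sig)
{
	assert(sig >= 1 && sig <= 64);
	return (uint64_t)1 << (sig - 1);
}

/* Same set selection as the line quoted from __send_signal_locked(). */
void toy_send_signal(struct toy_task *t, int sig, enum toy_pid_type type)
{
	uint64_t *pending = (type != TOY_PIDTYPE_PID)
		? &t->signal->shared_pending
		: &t->pending;

	*pending |= toy_sigmask(sig);
}

/* The check as described in the mail: per-thread set only. */
bool toy_fatal_signal_pending(const struct toy_task *t)
{
	return (t->pending & toy_sigmask(TOY_SIGKILL)) != 0;
}

/* The variant under discussion: consult the shared set as well. */
bool toy_fatal_signal_pending_shared(const struct toy_task *t)
{
	uint64_t both = t->pending | t->signal->shared_pending;

	return (both & toy_sigmask(TOY_SIGKILL)) != 0;
}
```

In this model, sending TOY_SIGKILL with TOY_PIDTYPE_MAX sets only the
shared bit, so toy_fatal_signal_pending() stays false while the shared
variant returns true. Whether the real check should change is of course
the open question in this thread; the model only shows where the bit
lands.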