From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 047A0C433F5 for ; Mon, 28 Mar 2022 22:14:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229683AbiC1WP7 (ORCPT ); Mon, 28 Mar 2022 18:15:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229822AbiC1WPb (ORCPT ); Mon, 28 Mar 2022 18:15:31 -0400 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04FF42921EF for ; Mon, 28 Mar 2022 15:07:37 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:46094) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nYxWQ-00B9Xg-Hb; Mon, 28 Mar 2022 16:07:34 -0600 Received: from ip68-227-174-4.om.om.cox.net ([68.227.174.4]:41488 helo=email.froward.int.ebiederm.org.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nYxWP-004pXC-8f; Mon, 28 Mar 2022 16:07:34 -0600 From: "Eric W. Biederman" To: Sebastian Andrzej Siewior Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Oleg Nesterov , "H. Peter Anvin" , Andy Lutomirski , Ben Segall , Borislav Petkov , Daniel Bristot de Oliveira , Dave Hansen , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Mel Gorman , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot References: <8735j2xigt.fsf@email.froward.int.ebiederm.org> Date: Mon, 28 Mar 2022 17:07:25 -0500 In-Reply-To: (Sebastian Andrzej Siewior's message of "Mon, 28 Mar 2022 18:17:38 +0200") Message-ID: <87zgl9pw82.fsf@email.froward.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1nYxWP-004pXC-8f;;;mid=<87zgl9pw82.fsf@email.froward.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.227.174.4;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX18iy3ypwwCXhedTnPt+e/Gtvc0bWk4nhFg= X-SA-Exim-Connect-IP: 68.227.174.4 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH] signal/x86: Delay calling signals in atomic X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Sebastian Andrzej Siewior writes: > On 2022-03-28 09:25:06 [-0500], Eric W. Biederman wrote: >> Sebastian Andrzej Siewior writes: >> >> Folks I really would have appreciated being copied on a signal handling >> patch like this. > > Sorry for that. For the whole ptrace/signal part is no maintainer listed > and I got the feeling that Oleg knows these bits. Oleg and I maybe a couple of others have been doing the work. I happen to know the fine differences between TIF_SIGPENDING and TIF_NOTIFY_RESUME because I have just been doing some cleanup work in that area. Plus cleanup in the force_sig_info_to_task area. >> It is too late to nack, but I think this buggy patch deserved one. Can >> we please fix PREEMPT_RT instead? > > Sure. > >> As far as I can tell this violates all of rules from >> implementing/maintaining the RT kernel. Instead of coming up with new >> abstractions that makes sense and can use by everyone this introduces >> a hack only for PREEMPT_RT and a pretty horrible one at that. >> >> This talks about int3, but the code looks for in_atomic(). Which means >> that essentially every call of force_sig will take this path as they >> almost all come from exception handlers. It is the nature of signals >> that report on faults. An exception is raised and the kernel reports it >> to userspace with a fault signal (aka force_sig_xxx). > > The int3 is invoked with disabled interrupts. There are also a few > others path which are explicit with disabled interrupts or with a > raw_spinlock_t which lead to an atomic section on PREEMPT_RT. Call > chains with spinlock_t or a rwlock_t don't lead to a atomic section on > PREEMPT_RT. Therefore I don't think this is "essentially every call of > force_sig" that is going to use that. I was assuming that exception handlers would be like int3 and are in general cases that would be places that are atomic and don't allow scheduling. I admit I don't know the PREEMPT_RT rules though. >> Further this code is buggy. TIF_NOTIFY_RESUME is not the correct >> flag to set to enter into exit_to_usermode_loop. TIF_NOTIFY_RESUME is >> about that happens after signal handling. This very much needs to be >> TIF_SIGPENDING with recalc_sigpending and friends updated to know about >> "task->force_info". >> >> Does someone own this problem? Can that person please fix this >> properly? > > Sure. Instead setting TIF_NOTIFY_RESUME you want the code updated to use > recalc_sigpending() only. Or do you have other suggestions regarding > fixing this properly? The rule with TIF_SIGPENDING (which should be used if this technique is used at all), is that recalc_sigpending needs to know how to test if the signal is still pending. >> I really don't think it is going to be maintainable for PREEMPT_RT to >> maintain a separate signal delivery path for faults from the rest of >> linux. > > Okay. We have a few other cases where we deliver signals from interrupts. Off the top of my head there is SAK and magic sysrq, but I think there are more. So I am also not convinced that all signals you care about will go through force_sig_info_to_task. What I really don't know and this is essentially a PREEMPT_RT question is what makes int3 special? Why don't other faults have this problem? I remember there was a change where we had threaded interrupt handlers to get around this for interrupt service routines. I wonder if we need to do something similar with faults. Have a fast part and a threaded part that runs in a schedulable way. Although given that for a fault you need to be fundamentally bound to the task/thread you faulted from it probably just means having a way to switch to a kernel stack that you can schedule from, and not use a reserved per cpu stack. The task_struct would certainly need to stay the same for all of the pieces. Or maybe for PREMPT_RT you pick the i386 mechanism. How does PREEMPT_RT deal with page faults, or general protection faults? This is my long winded way of saying that I rather expect that if PREEMPT_RT is going to call code it has modified to be sleeping that it would also make it safe for that code to sleep. Further (and this is probably my ignorance) I just don't see what makes any of this specific to just int3. Why aren't other faults affected? Eric