From: Thomas Gleixner
To: Andy Lutomirski
Subject: Re: ptrace_syscall_32 is failing
References: <87k0xdjbtt.fsf@nanos.tec.linutronix.de> <87blioinub.fsf@nanos.tec.linutronix.de>
Date: Fri, 04 Sep 2020 12:13:15 +0200
Message-ID: <87mu254zpg.fsf@nanos.tec.linutronix.de>
Cc: linux-s390, linuxppc-dev, Vasily Gorbik, Brian Gerst, Heiko Carstens, X86 ML, LKML, Christian Borntraeger, Paul Mackerras, Catalin Marinas, Andy Lutomirski, Will Deacon, linux-arm-kernel

Andy,

On Wed, Sep 02 2020 at 09:49, Andy Lutomirski wrote:
> On Wed, Sep 2, 2020 at 1:29 AM Thomas Gleixner wrote:
>>
>> But you might tell me where exactly you want to inject the SIGTRAP in
>> the syscall exit code flow.
>
> It would be a bit complicated. Definitely after any signals from the
> syscall are delivered. Right now, I think that we don't deliver a
> SIGTRAP on the instruction boundary after SYSCALL while
> single-stepping. (I think we used to, but only sometimes, and now we
> are at least consistent.) This is because IRET will not trap if it
> starts with TF clear and ends up setting it. (I asked Intel to
> document this, and I think they finally did, although I haven't gotten
> around to reading the new docs. Certainly the old docs as of a year
> or two ago had no description whatsoever of how TF changes worked.)
>
> Deciding exactly *when* a trap should occur would be nontrivial -- we
> can't trap on sigreturn() from a SIGTRAP, for example.
>
> So this isn't fully worked out.

Oh well.

>> >> I don't think we want that in general. The current variant is perfectly
>> >> fine for everything except the 32bit fast syscall nonsense. Also
>> >> irqentry_entry/exit is not equivalent to the syscall_enter/exit
>> >> counterparts.
>> >
>> > If there are any architectures in which actual work is needed to
>> > figure out whether something is a syscall in the first place, they'll
>> > want to do the usual kernel entry work before the syscall entry work.
>>
>> That's low level entry code which does not require RCU, lockdep, tracing
>> or whatever muck we setup before actual work can be done.
>>
>> arch_asm_entry()
>>   ...
>>   arch_c_entry(cause) {
>>     switch(cause) {
>>       case EXCEPTION: arch_c_exception(...);
>>       case SYSCALL: arch_c_syscall(...);
>>       ...
>>     }
>
> You're assuming that figuring out the cause doesn't need the kernel
> entry code to run first. In the case of the 32-bit vDSO fast
> syscalls, we arguably don't know whether an entry is a syscall until
> we have done a user memory access. Logically, we're doing:
>
> if (get_user() < 0) {
>   /* Not a syscall. This is actually a silly operation that sets AX =
> -EFAULT and returns. Do not audit or invoke ptrace. */
> } else {
>   /* This actually is a syscall. */
> }

Yes, that's what I've addressed by providing split interfaces.

>> You really want to differentiate between exception and syscall
>> entry/exit.
>>
>
> Why do we want to distinguish between exception and syscall
> entry/exit? For the enter part, AFAICS the exception case boils down
> to enter_from_user_mode() and the syscall case is:
>
>         enter_from_user_mode(regs);
>         instrumentation_begin();
>
>         local_irq_enable();
>         ti_work = READ_ONCE(current_thread_info()->flags);
>         if (ti_work & SYSCALL_ENTER_WORK)
>                 syscall = syscall_trace_enter(regs, syscall, ti_work);
>         instrumentation_end();
>
> Which would decompose quite nicely as a regular (non-syscall) entry
> plus the syscall part later.

There is a difference between syscall entry and exception entry, at
least in my view:

syscall:
        enter_from_user_mode(regs);
        local_irq_enable();

exception:
        enter_from_user_mode(regs);

>> we'd have:
>>
>>   arch_c_entry()
>>      irqentry_enter();
>>      local_irq_enable();
>>      nr = syscall_enter_from_user_mode_work();
>>      ...
>>
>> which enforces two calls for sane entries and more code in arch/....
>
> This is why I still like my:
>
> arch_c_entry()
>    irqentry_enter_from_user_mode();
>    generic_syscall();
>    exit...
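The get_user() case quoted above, handled with split interfaces, can be modeled in user space roughly as follows. This is a hedged sketch, not kernel code: struct pt_regs, model_get_user(), fast_syscall_entry() and the two counters are hypothetical, and only the names syscall_enter_from_user_mode_prepare() and syscall_enter_from_user_mode_work() come from this thread; their bodies here are stubs. The common prepare step runs first, and the syscall-only work (ptrace, audit) runs only if the user memory access succeeds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's register frame. */
struct pt_regs {
	long ax;	/* syscall number on entry, return value on exit */
	long bp;	/* value this model reads from user memory */
};

static bool irqs_enabled;	/* models local_irq_enable()/disable() */
static int trace_calls;		/* counts ptrace/audit entry work */

/* Stub of the common part: user-mode entry plus enabling interrupts. */
static void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
{
	(void)regs;
	irqs_enabled = true;
}

/* Stub of the syscall-only part: ptrace, audit, seccomp, tracing. */
static long syscall_enter_from_user_mode_work(struct pt_regs *regs, long nr)
{
	(void)regs;
	trace_calls++;
	return nr;
}

/* Hypothetical get_user(): a NULL pointer stands in for a fault. */
static int model_get_user(long *dst, const long *uptr)
{
	if (!uptr)
		return -EFAULT;
	*dst = *uptr;
	return 0;
}

/*
 * Returns true if entry produced a syscall number in *nr_out, false if
 * the user memory access failed and AX was set to -EFAULT instead.
 */
static bool fast_syscall_entry(struct pt_regs *regs, const long *user_stack,
			       long *nr_out)
{
	syscall_enter_from_user_mode_prepare(regs);

	if (model_get_user(&regs->bp, user_stack) < 0) {
		/* Not a syscall: no audit, no ptrace, just return -EFAULT. */
		regs->ax = -EFAULT;
		irqs_enabled = false;	/* model: IRQs off before exit */
		return false;
	}

	*nr_out = syscall_enter_from_user_mode_work(regs, regs->ax);
	return true;
}
```

The point of the split shows up in the failure branch: AX is set to -EFAULT and the function returns before syscall_enter_from_user_mode_work() would run, so in this model nothing is audited or reported to ptrace.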
So what we have now (with my patch applied) is either:

 1) arch_c_entry()
        nr = syscall_enter_from_user_mode();
        arch_handle_syscall(nr);
        syscall_exit_to_user_mode();

or for that extra 32bit fast syscall thing:

 2) arch_c_entry()
        syscall_enter_from_user_mode_prepare();
        arch_do_stuff();
        nr = syscall_enter_from_user_mode_work();
        arch_handle_syscall(nr);
        syscall_exit_to_user_mode();

So for sane cases you just use #1.

Ideally we'd not need arch_handle_syscall(nr) at all, but that does not
work when multiple ABIs are supported, i.e. the compat muck.

The only way we could make that work is to have:

   syscall_enter_exit(regs, mode)
        nr = syscall_enter_from_user_mode();
        arch_handle_syscall(mode, nr);
        syscall_exit_to_user_mode();

and then arch_c_entry() becomes:

   syscall_enter_exit(regs, mode);

which means that arch_handle_syscall() would have to evaluate the mode
and choose the appropriate syscall table.

Not sure whether that's a win.

Thanks,

        tglx
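The syscall_enter_exit(regs, mode) variant sketched above can be illustrated with a small, hypothetical user-space model in which arch_handle_syscall(mode, nr) evaluates the mode to pick a table. The tables, the sys_*_model handlers and the stubbed entry/exit work are invented for illustration and do not reflect any real architecture's syscall numbering; an unknown number returns -ENOSYS in this model.

```c
#include <errno.h>
#include <stddef.h>

enum syscall_mode { MODE_NATIVE, MODE_COMPAT };

/* Hypothetical register frame for the model. */
struct pt_regs {
	long nr;	/* syscall number requested by user space */
	long ret;	/* value handed back to user space */
};

typedef long (*syscall_fn)(void);

static long sys_a_model(void) { return 1234; }
static long sys_b_model(void) { return 1000; }

/* Invented tables: the same handlers sit at different numbers per ABI. */
static const syscall_fn native_table[] = { sys_a_model, sys_b_model };
static const syscall_fn compat_table[] = { sys_b_model, NULL, sys_a_model };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stubs for the generic entry/exit work (tracing, audit, signals). */
static long syscall_enter_from_user_mode(struct pt_regs *regs)
{
	return regs->nr;
}

static void syscall_exit_to_user_mode(struct pt_regs *regs)
{
	(void)regs;
}

/* The arch hook has to evaluate the mode before it can pick a table. */
static long arch_handle_syscall(enum syscall_mode mode, long nr)
{
	const syscall_fn *table = native_table;
	size_t size = ARRAY_SIZE(native_table);

	if (mode == MODE_COMPAT) {
		table = compat_table;
		size = ARRAY_SIZE(compat_table);
	}
	if (nr < 0 || (size_t)nr >= size || !table[nr])
		return -ENOSYS;
	return table[nr]();
}

/* Single generic entry point; arch_c_entry() would only call this. */
static void syscall_enter_exit(struct pt_regs *regs, enum syscall_mode mode)
{
	long nr = syscall_enter_from_user_mode(regs);

	regs->ret = arch_handle_syscall(mode, nr);
	syscall_exit_to_user_mode(regs);
}
```

In this model the mode branch inside arch_handle_syscall() is the extra cost the mail weighs against having a single generic entry point; with variant #1, each ABI's arch_c_entry() would call its own arch_handle_syscall(nr) directly.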