* Is s390's new generic-using syscall code actually correct? @ 2021-03-21 3:48 Andy Lutomirski 2021-03-24 17:38 ` Vasily Gorbik 2021-03-29 11:49 ` Sven Schnelle 0 siblings, 2 replies; 6+ messages in thread From: Andy Lutomirski @ 2021-03-21 3:48 UTC (permalink / raw) To: Sven Schnelle, Vasily Gorbik, X86 ML, linux-arch, Mark Rutland, LKML Hi all- I'm working on my kentry patchset, and I encountered: commit 56e62a73702836017564eaacd5212e4d0fa1c01d Author: Sven Schnelle <svens@linux.ibm.com> Date: Sat Nov 21 11:14:56 2020 +0100 s390: convert to generic entry As part of this work, I was cleaning up the generic syscall helpers, and I encountered the goodies in do_syscall() and __do_syscall(). I'm trying to wrap my head around the current code, and I'm rather confused. 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just the syscall exit work. So a do_syscall() that gets called twice will do the loopy part of the exit work (e.g. signal handling) twice. Is this intentional? If so, why? 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed to work. Looking at the code in Linus' tree, if a signal is pending and a syscall returns -ERESTARTSYS, the syscall will return back to do_syscall(). The work (as in (1)) gets run, calling do_signal(), which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. Presumably it will also push the signal frame onto the stack and aim the return address at the svc instruction mentioned in the commit message from "s390: convert to generic entry". Then __do_syscall() will turn interrupts back on and loop right back into do_syscall(). That seems incorrect. Can you enlighten me? My WIP tree is here: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry Here are my changes to s390, and I don't think they're really correct: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is s390's new generic-using syscall code actually correct? 2021-03-21 3:48 Is s390's new generic-using syscall code actually correct? Andy Lutomirski @ 2021-03-24 17:38 ` Vasily Gorbik 2021-03-24 18:06 ` Andy Lutomirski 2021-03-29 11:49 ` Sven Schnelle 1 sibling, 1 reply; 6+ messages in thread From: Vasily Gorbik @ 2021-03-24 17:38 UTC (permalink / raw) To: Andy Lutomirski Cc: Sven Schnelle, X86 ML, linux-arch, Mark Rutland, LKML, Heiko Carstens, Christian Borntraeger Hi Andy, On Sat, Mar 20, 2021 at 08:48:34PM -0700, Andy Lutomirski wrote: > Hi all- > > I'm working on my kentry patchset, and I encountered: > > commit 56e62a73702836017564eaacd5212e4d0fa1c01d > Author: Sven Schnelle <svens@linux.ibm.com> > Date: Sat Nov 21 11:14:56 2020 +0100 > > s390: convert to generic entry > > As part of this work, I was cleaning up the generic syscall helpers, > and I encountered the goodies in do_syscall() and __do_syscall(). > > I'm trying to wrap my head around the current code, and I'm rather confused. > > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just > the syscall exit work. So a do_syscall() that gets called twice will > do the loopy part of the exit work (e.g. signal handling) twice. Is > this intentional? If so, why? > > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed > to work. Looking at the code in Linus' tree, if a signal is pending > and a syscall returns -ERESTARTSYS, the syscall will return back to > do_syscall(). The work (as in (1)) gets run, calling do_signal(), > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. > Presumably it will also push the signal frame onto the stack and aim > the return address at the svc instruction mentioned in the commit > message from "s390: convert to generic entry". Then __do_syscall() > will turn interrupts back on and loop right back into do_syscall(). > That seems incorrect. > > Can you enlighten me? My WIP tree is here: > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry > For all the details to that change we'd have to wait for Sven, who is back next week. > Here are my changes to s390, and I don't think they're really correct: > > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c Couple of things: syscall_exit_to_user_mode_prepare is static, and there is another code path in arch/s390/kernel/traps.c using enter_from_user_mode/exit_to_user_mode. Anyhow I gave your branch a spin and got few new failures on strace test suite, in particular on restart_syscall test. I'll try to find time to look into details. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is s390's new generic-using syscall code actually correct? 2021-03-24 17:38 ` Vasily Gorbik @ 2021-03-24 18:06 ` Andy Lutomirski 2021-03-30 8:13 ` Sven Schnelle 0 siblings, 1 reply; 6+ messages in thread From: Andy Lutomirski @ 2021-03-24 18:06 UTC (permalink / raw) To: Vasily Gorbik Cc: Andy Lutomirski, Sven Schnelle, X86 ML, linux-arch, Mark Rutland, LKML, Heiko Carstens, Christian Borntraeger On Wed, Mar 24, 2021 at 10:39 AM Vasily Gorbik <gor@linux.ibm.com> wrote: > > Hi Andy, > > On Sat, Mar 20, 2021 at 08:48:34PM -0700, Andy Lutomirski wrote: > > Hi all- > > > > I'm working on my kentry patchset, and I encountered: > > > > commit 56e62a73702836017564eaacd5212e4d0fa1c01d > > Author: Sven Schnelle <svens@linux.ibm.com> > > Date: Sat Nov 21 11:14:56 2020 +0100 > > > > s390: convert to generic entry > > > > As part of this work, I was cleaning up the generic syscall helpers, > > and I encountered the goodies in do_syscall() and __do_syscall(). > > > > I'm trying to wrap my head around the current code, and I'm rather confused. > > > > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just > > the syscall exit work. So a do_syscall() that gets called twice will > > do the loopy part of the exit work (e.g. signal handling) twice. Is > > this intentional? If so, why? > > > > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed > > to work. Looking at the code in Linus' tree, if a signal is pending > > and a syscall returns -ERESTARTSYS, the syscall will return back to > > do_syscall(). The work (as in (1)) gets run, calling do_signal(), > > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. > > Presumably it will also push the signal frame onto the stack and aim > > the return address at the svc instruction mentioned in the commit > > message from "s390: convert to generic entry". Then __do_syscall() > > will turn interrupts back on and loop right back into do_syscall(). > > That seems incorrect. > > > > Can you enlighten me? My WIP tree is here: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry > > > > For all the details to that change we'd have to wait for Sven, who is back > next week. > > > Here are my changes to s390, and I don't think they're really correct: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c > > Couple of things: syscall_exit_to_user_mode_prepare is static, > and there is another code path in arch/s390/kernel/traps.c using > enter_from_user_mode/exit_to_user_mode. > > Anyhow I gave your branch a spin and got few new failures on strace test > suite, in particular on restart_syscall test. I'll try to find time to > look into details. I refreshed the branch, but I confess I haven't compile tested it. :) I would guess that the new test case failures are a result of the buggy syscall restart logic. I think that all of the "restart" cases except execve() should just be removed. Without my patch, I suspect that signal delivery with -ERESTARTSYS would create the signal frame, do an accidental "restarted" syscall that was a no-op, and then deliver the signal. With my patch, it may simply repeat the original interrupted signal forever. --Andy ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is s390's new generic-using syscall code actually correct? 2021-03-24 18:06 ` Andy Lutomirski @ 2021-03-30 8:13 ` Sven Schnelle 2021-03-30 8:37 ` Sven Schnelle 0 siblings, 1 reply; 6+ messages in thread From: Sven Schnelle @ 2021-03-30 8:13 UTC (permalink / raw) To: Andy Lutomirski Cc: Vasily Gorbik, X86 ML, linux-arch, Mark Rutland, LKML, Heiko Carstens, Christian Borntraeger Hi Andy, Andy Lutomirski <luto@kernel.org> writes: > On Wed, Mar 24, 2021 at 10:39 AM Vasily Gorbik <gor@linux.ibm.com> wrote: >> >> Hi Andy, >> >> On Sat, Mar 20, 2021 at 08:48:34PM -0700, Andy Lutomirski wrote: >> > Hi all- >> > >> > I'm working on my kentry patchset, and I encountered: >> > >> > commit 56e62a73702836017564eaacd5212e4d0fa1c01d >> > Author: Sven Schnelle <svens@linux.ibm.com> >> > Date: Sat Nov 21 11:14:56 2020 +0100 >> > >> > s390: convert to generic entry >> > >> > As part of this work, I was cleaning up the generic syscall helpers, >> > and I encountered the goodies in do_syscall() and __do_syscall(). >> > >> > I'm trying to wrap my head around the current code, and I'm rather confused. >> > >> > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just >> > the syscall exit work. So a do_syscall() that gets called twice will >> > do the loopy part of the exit work (e.g. signal handling) twice. Is >> > this intentional? If so, why? >> > >> > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed >> > to work. Looking at the code in Linus' tree, if a signal is pending >> > and a syscall returns -ERESTARTSYS, the syscall will return back to >> > do_syscall(). The work (as in (1)) gets run, calling do_signal(), >> > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. >> > Presumably it will also push the signal frame onto the stack and aim >> > the return address at the svc instruction mentioned in the commit >> > message from "s390: convert to generic entry". Then __do_syscall() >> > will turn interrupts back on and loop right back into do_syscall(). >> > That seems incorrect. >> > >> > Can you enlighten me? My WIP tree is here: >> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry >> > >> >> For all the details to that change we'd have to wait for Sven, who is back >> next week. >> >> > Here are my changes to s390, and I don't think they're really correct: >> > >> > >> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c >> >> Couple of things: syscall_exit_to_user_mode_prepare is static, >> and there is another code path in arch/s390/kernel/traps.c using >> enter_from_user_mode/exit_to_user_mode. >> >> Anyhow I gave your branch a spin and got few new failures on strace test >> suite, in particular on restart_syscall test. I'll try to find time to >> look into details. > > I refreshed the branch, but I confess I haven't compile tested it. :) > > I would guess that the new test case failures are a result of the > buggy syscall restart logic. I think that all of the "restart" cases > except execve() should just be removed. Without my patch, I suspect > that signal delivery with -ERESTARTSYS would create the signal frame, > do an accidental "restarted" syscall that was a no-op, and then > deliver the signal. With my patch, it may simply repeat the original > interrupted signal forever. PIF_SYSCALL_RESTART is set in arch_do_signal_or_restart(), but only if there's no signal handler registered. In that case we don't need a signal frame, so that should be fine. The problem why your branch is not working is that arch_do_signal_or_restart() gets called from exit_to_user_mode_prepare(), and that is after the check whether PIF_SYSCALL_RESTART is set in __do_syscall(). So i'm wondering how to fix that. x86 simply rewinds the pc, so the system call instruction gets re-executed when returning to user space. For s390 that doesn't work, as the s390 svc instruction might have the syscall number encoded. If we would have to restart a system call with restart_systemcall(), we need to change the syscall number to __NR_restart_syscall. As we cannot change the hardcoded system call number, we somehow need to handle that inside of the kernel. So i wonder whether we should implement the PIF_SYSCALL_RESTART check in entry.S after all the return to user space entry code was run but before doing the real swtch back to user space. If PIF_SYSCALL_RESTART is set we would then just jump back to the entry code and pretend we came from user space. That would have the benefit that the entry C code looks the same like other architectures and that amount of code to add in entry.S shouldn't be much. Any thoughts? Regards Sven ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is s390's new generic-using syscall code actually correct? 2021-03-30 8:13 ` Sven Schnelle @ 2021-03-30 8:37 ` Sven Schnelle 0 siblings, 0 replies; 6+ messages in thread From: Sven Schnelle @ 2021-03-30 8:37 UTC (permalink / raw) To: Andy Lutomirski Cc: Vasily Gorbik, X86 ML, linux-arch, Mark Rutland, LKML, Heiko Carstens, Christian Borntraeger Sven Schnelle <svens@linux.ibm.com> writes: > Hi Andy, > > Andy Lutomirski <luto@kernel.org> writes: > >> On Wed, Mar 24, 2021 at 10:39 AM Vasily Gorbik <gor@linux.ibm.com> wrote: >>> >>> Hi Andy, >>> >>> On Sat, Mar 20, 2021 at 08:48:34PM -0700, Andy Lutomirski wrote: >>> > Hi all- >>> > >>> > I'm working on my kentry patchset, and I encountered: >>> > >>> > commit 56e62a73702836017564eaacd5212e4d0fa1c01d >>> > Author: Sven Schnelle <svens@linux.ibm.com> >>> > Date: Sat Nov 21 11:14:56 2020 +0100 >>> > >>> > s390: convert to generic entry >>> > >>> > As part of this work, I was cleaning up the generic syscall helpers, >>> > and I encountered the goodies in do_syscall() and __do_syscall(). >>> > >>> > I'm trying to wrap my head around the current code, and I'm rather confused. >>> > >>> > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just >>> > the syscall exit work. So a do_syscall() that gets called twice will >>> > do the loopy part of the exit work (e.g. signal handling) twice. Is >>> > this intentional? If so, why? >>> > >>> > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed >>> > to work. Looking at the code in Linus' tree, if a signal is pending >>> > and a syscall returns -ERESTARTSYS, the syscall will return back to >>> > do_syscall(). The work (as in (1)) gets run, calling do_signal(), >>> > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. >>> > Presumably it will also push the signal frame onto the stack and aim >>> > the return address at the svc instruction mentioned in the commit >>> > message from "s390: convert to generic entry". Then __do_syscall() >>> > will turn interrupts back on and loop right back into do_syscall(). >>> > That seems incorrect. >>> > >>> > Can you enlighten me? My WIP tree is here: >>> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/kentry >>> > >>> >>> For all the details to that change we'd have to wait for Sven, who is back >>> next week. >>> >>> > Here are my changes to s390, and I don't think they're really correct: >>> > >>> > >>> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/diff/arch/s390/kernel/syscall.c?h=x86/kentry&id=58a459922be0fb8e0f17aeaebcb0ac8d0575a62c >>> >>> Couple of things: syscall_exit_to_user_mode_prepare is static, >>> and there is another code path in arch/s390/kernel/traps.c using >>> enter_from_user_mode/exit_to_user_mode. >>> >>> Anyhow I gave your branch a spin and got few new failures on strace test >>> suite, in particular on restart_syscall test. I'll try to find time to >>> look into details. >> >> I refreshed the branch, but I confess I haven't compile tested it. :) >> >> I would guess that the new test case failures are a result of the >> buggy syscall restart logic. I think that all of the "restart" cases >> except execve() should just be removed. Without my patch, I suspect >> that signal delivery with -ERESTARTSYS would create the signal frame, >> do an accidental "restarted" syscall that was a no-op, and then >> deliver the signal. With my patch, it may simply repeat the original >> interrupted signal forever. > > PIF_SYSCALL_RESTART is set in arch_do_signal_or_restart(), but only if > there's no signal handler registered. In that case we don't need a > signal frame, so that should be fine. > > The problem why your branch is not working is that arch_do_signal_or_restart() > gets called from exit_to_user_mode_prepare(), and that is after the > check whether PIF_SYSCALL_RESTART is set in __do_syscall(). > > So i'm wondering how to fix that. x86 simply rewinds the pc, so the > system call instruction gets re-executed when returning to user > space. For s390 that doesn't work, as the s390 svc instruction might > have the syscall number encoded. If we would have to restart a system > call with restart_systemcall(), we need to change the syscall number to > __NR_restart_syscall. As we cannot change the hardcoded system call > number, we somehow need to handle that inside of the kernel. > > So i wonder whether we should implement the PIF_SYSCALL_RESTART check in > entry.S after all the return to user space entry code was run but before > doing the real swtch back to user space. If PIF_SYSCALL_RESTART is set > we would then just jump back to the entry code and pretend we came from > user space. > > That would have the benefit that the entry C code looks the same like > other architectures and that amount of code to add in entry.S shouldn't > be much. Thinking about this again i guess this idea won't work because the exit loop might have scheduled the old process away already... ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is s390's new generic-using syscall code actually correct? 2021-03-21 3:48 Is s390's new generic-using syscall code actually correct? Andy Lutomirski 2021-03-24 17:38 ` Vasily Gorbik @ 2021-03-29 11:49 ` Sven Schnelle 1 sibling, 0 replies; 6+ messages in thread From: Sven Schnelle @ 2021-03-29 11:49 UTC (permalink / raw) To: Andy Lutomirski; +Cc: Vasily Gorbik, X86 ML, linux-arch, Mark Rutland, LKML Hi Andy, sorry for the late reply, i was away from kernel development for a few weeks. Andy Lutomirski <luto@kernel.org> writes: > Hi all- > > I'm working on my kentry patchset, and I encountered: > > commit 56e62a73702836017564eaacd5212e4d0fa1c01d > Author: Sven Schnelle <svens@linux.ibm.com> > Date: Sat Nov 21 11:14:56 2020 +0100 > > s390: convert to generic entry > > As part of this work, I was cleaning up the generic syscall helpers, > and I encountered the goodies in do_syscall() and __do_syscall(). > > I'm trying to wrap my head around the current code, and I'm rather confused. > > 1. syscall_exit_to_user_mode_work() does *all* the exit work, not just > the syscall exit work. So a do_syscall() that gets called twice will > do the loopy part of the exit work (e.g. signal handling) twice. Is > this intentional? If so, why? Not really intentional, but i decided to accept the overhead for now and fix that later by splitting up the generic entry code. Otherwise the patch would have had even more dependencies. I have not looked yet into your kentry branch, but will do that later. Maybe there is already a better way to do it or we can work something out. > 2. I don't understand how this PIF_SYSCALL_RESTART thing is supposed > to work. Looking at the code in Linus' tree, if a signal is pending > and a syscall returns -ERESTARTSYS, the syscall will return back to > do_syscall(). The work (as in (1)) gets run, calling do_signal(), > which will notice -ERESTARTSYS and set PIF_SYSCALL_RESTART. > Presumably it will also push the signal frame onto the stack and aim > the return address at the svc instruction mentioned in the commit > message from "s390: convert to generic entry". Then __do_syscall() > will turn interrupts back on and loop right back into do_syscall(). > That seems incorrect. That sounds indeed broken. My understanding is that x86 is always rewinding the pc in the restart case, and is exiting to user mode. That would than also work with signal frames. However, on s390 we cannot simply rewind the pc as the syscall number might be encoded in the system call instruction. If a user would have rewritten the system call number (i.e. with seccomp) it would still execute the original syscall number. That makes me wonder why i have not seen problems with signals and system call restarting so far. Have to read the code again. Regards Sven ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-03-30 8:38 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-21 3:48 Is s390's new generic-using syscall code actually correct? Andy Lutomirski 2021-03-24 17:38 ` Vasily Gorbik 2021-03-24 18:06 ` Andy Lutomirski 2021-03-30 8:13 ` Sven Schnelle 2021-03-30 8:37 ` Sven Schnelle 2021-03-29 11:49 ` Sven Schnelle
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.