From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753530AbbAVBYe (ORCPT ); Wed, 21 Jan 2015 20:24:34 -0500 Received: from mail-wg0-f43.google.com ([74.125.82.43]:51574 "EHLO mail-wg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751759AbbAVBYY (ORCPT ); Wed, 21 Jan 2015 20:24:24 -0500 MIME-Version: 1.0 In-Reply-To: References: <20150116155923.GK12302@n2100.arm.linux.org.uk> <20150116161752.GL12302@n2100.arm.linux.org.uk> <20150119092002.GA32131@arm.com> <20150120224518.GL26493@n2100.arm.linux.org.uk> <20150120230458.GM26493@n2100.arm.linux.org.uk> Date: Thu, 22 Jan 2015 10:24:22 +0900 Message-ID: Subject: Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall From: Roman Peniaev To: Kees Cook Cc: Russell King - ARM Linux , "linux-arm-kernel@lists.infradead.org" , Stefano Stabellini , Marc Zyngier , Catalin Marinas , Will Deacon , "linux-kernel@vger.kernel.org" , Sekhar Nori , "stable@vger.kernel.org" , Christoffer Dall Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook wrote: > On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux > wrote: >> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote: >>> Well, the whole question is this: is restarting a system call like >>> usleep() really a separate system call, or is it a kernel implementation >>> detail? >>> >>> If you wanted seccomp to see this, what would be the use case? Why >>> would seccomp want to block this syscall? Does it make sense for >>> seccomp to block this syscall when it doesn't block something like >>> usleep() and then have usleep() fail just because the thread received >>> a signal? >>> >>> I personally regard the whole restart system call thing as a purely >>> kernel internal thing which should not be exposed to userland. If >>> we decide that it should be exposed to userland, then it becomes part >>> of the user ABI, and it /could/ become difficult if we needed to >>> change it in future - and I'd rather not get into the "oh shit, we >>> can't change it because that would break app X" crap. >> >> Here's a scenario where it could become a problem: >> >> Let's say that we want to use seccomp to secure some code which issues >> system calls. We determine that the app uses system calls which don't >> result in the restart system call being issued, so we decide to ask >> seccomp to block the restart system call. Some of these system calls >> that the app was using are restartable system calls. >> >> When these system calls are restarted, what we see via ptrace etc is >> that the system call simply gets re-issued as its own system call. >> >> In a future kernel version, we decide that we could really do with one >> of those system calls using the restart block feature, so we arrange >> for it to set up the restart block, and return -ERESTART_BLOCK. That's >> fine for most applications, but this app now breaks. >> >> The side effect of that breakage is that we have to revert that kernel >> change - because we've broken userland, and that's simply not allowed. >> >> Now look at the alternative: we don't make the restart syscall visible. >> This means that we hide that detail, and we actually reflect the >> behaviour that we've had for the other system call restart mechanisms, >> and we don't have to fear userspace breakage as a result of switching >> from one restart mechanism to another. >> >> I am very much of the opinion that we should be trying to limit the >> exposure of inappropriate kernel internal details to userland, because >> userland has a habbit of becoming reliant on them, and when it does, >> it makes kernel maintanence unnecessarily harder. > > I totally agree with you. :) My question here is more about what we > should do with what we currently have since we have some unexpected > combinations. > > There is already an __NR_restart_syscall syscall and it seems like > it's already part of the userspace ABI: > - it is possible to call it from userspace directly > - seccomp "sees" it > - ptrace doesn't see it > > Native ARM64 hides the restart from both seccomp and ptrace, and this > seems like the right idea, except that restart_syscall is still > callable from userspace. I don't think there's a way to make that > vanish, which means we'll always have an exposed syscall. If anything > goes wrong with it (which we've been quite close to recently[1]), > there would be no way to have seccomp filter it. > > So, at the least, I'd like arm64 to NOT hide restart_syscall from > seccomp, and at best I'd like both arm and arm64 to (somehow) entirely > remove restart_syscall from the userspace ABI so it wouldn't need to > be filtered, and wouldn't become a weird ABI hiccup as you've > described. > > I fail to imagine a way to remove restart_syscall from userspace, so > I'm left with wanting parity of behavior between ARM and ARM64 (and > x86). What's the right way forward? My sufferings are an opposite of what seccompt expects: currently I do not see any possible way (from userspace) to get syscall number which was restarted, because at some given time userspace checks the procfs syscall file and sees NR_restart there, without any chance to understand what exactly was restarted (I am talking about some kind of debugging tool which does dead-lock analysis of stuck tasks). I totally agree with Russell not to provide kernel guts to userspace, but it is already done. Too late. So probably there is a need to split syscall on two numbers: real and effective. Real number we have right now on x86. And this should be done for both ptrace and procfs syscall file. (am I right that only for ARM we already have PTRACE_SET_SYSCALL? seems we can add also real/effective getter) -- Roman From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: References: <20150116155923.GK12302@n2100.arm.linux.org.uk> <20150116161752.GL12302@n2100.arm.linux.org.uk> <20150119092002.GA32131@arm.com> <20150120224518.GL26493@n2100.arm.linux.org.uk> <20150120230458.GM26493@n2100.arm.linux.org.uk> Date: Thu, 22 Jan 2015 10:24:22 +0900 Message-ID: Subject: Re: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall From: Roman Peniaev To: Kees Cook Cc: Russell King - ARM Linux , "linux-arm-kernel@lists.infradead.org" , Stefano Stabellini , Marc Zyngier , Catalin Marinas , Will Deacon , "linux-kernel@vger.kernel.org" , Sekhar Nori , "stable@vger.kernel.org" , Christoffer Dall Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook wrote: > On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux > wrote: >> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote: >>> Well, the whole question is this: is restarting a system call like >>> usleep() really a separate system call, or is it a kernel implementation >>> detail? >>> >>> If you wanted seccomp to see this, what would be the use case? Why >>> would seccomp want to block this syscall? Does it make sense for >>> seccomp to block this syscall when it doesn't block something like >>> usleep() and then have usleep() fail just because the thread received >>> a signal? >>> >>> I personally regard the whole restart system call thing as a purely >>> kernel internal thing which should not be exposed to userland. If >>> we decide that it should be exposed to userland, then it becomes part >>> of the user ABI, and it /could/ become difficult if we needed to >>> change it in future - and I'd rather not get into the "oh shit, we >>> can't change it because that would break app X" crap. >> >> Here's a scenario where it could become a problem: >> >> Let's say that we want to use seccomp to secure some code which issues >> system calls. We determine that the app uses system calls which don't >> result in the restart system call being issued, so we decide to ask >> seccomp to block the restart system call. Some of these system calls >> that the app was using are restartable system calls. >> >> When these system calls are restarted, what we see via ptrace etc is >> that the system call simply gets re-issued as its own system call. >> >> In a future kernel version, we decide that we could really do with one >> of those system calls using the restart block feature, so we arrange >> for it to set up the restart block, and return -ERESTART_BLOCK. That's >> fine for most applications, but this app now breaks. >> >> The side effect of that breakage is that we have to revert that kernel >> change - because we've broken userland, and that's simply not allowed. >> >> Now look at the alternative: we don't make the restart syscall visible. >> This means that we hide that detail, and we actually reflect the >> behaviour that we've had for the other system call restart mechanisms, >> and we don't have to fear userspace breakage as a result of switching >> from one restart mechanism to another. >> >> I am very much of the opinion that we should be trying to limit the >> exposure of inappropriate kernel internal details to userland, because >> userland has a habbit of becoming reliant on them, and when it does, >> it makes kernel maintanence unnecessarily harder. > > I totally agree with you. :) My question here is more about what we > should do with what we currently have since we have some unexpected > combinations. > > There is already an __NR_restart_syscall syscall and it seems like > it's already part of the userspace ABI: > - it is possible to call it from userspace directly > - seccomp "sees" it > - ptrace doesn't see it > > Native ARM64 hides the restart from both seccomp and ptrace, and this > seems like the right idea, except that restart_syscall is still > callable from userspace. I don't think there's a way to make that > vanish, which means we'll always have an exposed syscall. If anything > goes wrong with it (which we've been quite close to recently[1]), > there would be no way to have seccomp filter it. > > So, at the least, I'd like arm64 to NOT hide restart_syscall from > seccomp, and at best I'd like both arm and arm64 to (somehow) entirely > remove restart_syscall from the userspace ABI so it wouldn't need to > be filtered, and wouldn't become a weird ABI hiccup as you've > described. > > I fail to imagine a way to remove restart_syscall from userspace, so > I'm left with wanting parity of behavior between ARM and ARM64 (and > x86). What's the right way forward? My sufferings are an opposite of what seccompt expects: currently I do not see any possible way (from userspace) to get syscall number which was restarted, because at some given time userspace checks the procfs syscall file and sees NR_restart there, without any chance to understand what exactly was restarted (I am talking about some kind of debugging tool which does dead-lock analysis of stuck tasks). I totally agree with Russell not to provide kernel guts to userspace, but it is already done. Too late. So probably there is a need to split syscall on two numbers: real and effective. Real number we have right now on x86. And this should be done for both ptrace and procfs syscall file. (am I right that only for ARM we already have PTRACE_SET_SYSCALL? seems we can add also real/effective getter) -- Roman From mboxrd@z Thu Jan 1 00:00:00 1970 From: r.peniaev@gmail.com (Roman Peniaev) Date: Thu, 22 Jan 2015 10:24:22 +0900 Subject: [PATCH 1/2] ARM: entry-common: fix forgotten set of thread_info->syscall In-Reply-To: References: <20150116155923.GK12302@n2100.arm.linux.org.uk> <20150116161752.GL12302@n2100.arm.linux.org.uk> <20150119092002.GA32131@arm.com> <20150120224518.GL26493@n2100.arm.linux.org.uk> <20150120230458.GM26493@n2100.arm.linux.org.uk> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Thu, Jan 22, 2015 at 8:32 AM, Kees Cook wrote: > On Tue, Jan 20, 2015 at 3:04 PM, Russell King - ARM Linux > wrote: >> On Tue, Jan 20, 2015 at 10:45:19PM +0000, Russell King - ARM Linux wrote: >>> Well, the whole question is this: is restarting a system call like >>> usleep() really a separate system call, or is it a kernel implementation >>> detail? >>> >>> If you wanted seccomp to see this, what would be the use case? Why >>> would seccomp want to block this syscall? Does it make sense for >>> seccomp to block this syscall when it doesn't block something like >>> usleep() and then have usleep() fail just because the thread received >>> a signal? >>> >>> I personally regard the whole restart system call thing as a purely >>> kernel internal thing which should not be exposed to userland. If >>> we decide that it should be exposed to userland, then it becomes part >>> of the user ABI, and it /could/ become difficult if we needed to >>> change it in future - and I'd rather not get into the "oh shit, we >>> can't change it because that would break app X" crap. >> >> Here's a scenario where it could become a problem: >> >> Let's say that we want to use seccomp to secure some code which issues >> system calls. We determine that the app uses system calls which don't >> result in the restart system call being issued, so we decide to ask >> seccomp to block the restart system call. Some of these system calls >> that the app was using are restartable system calls. >> >> When these system calls are restarted, what we see via ptrace etc is >> that the system call simply gets re-issued as its own system call. >> >> In a future kernel version, we decide that we could really do with one >> of those system calls using the restart block feature, so we arrange >> for it to set up the restart block, and return -ERESTART_BLOCK. That's >> fine for most applications, but this app now breaks. >> >> The side effect of that breakage is that we have to revert that kernel >> change - because we've broken userland, and that's simply not allowed. >> >> Now look at the alternative: we don't make the restart syscall visible. >> This means that we hide that detail, and we actually reflect the >> behaviour that we've had for the other system call restart mechanisms, >> and we don't have to fear userspace breakage as a result of switching >> from one restart mechanism to another. >> >> I am very much of the opinion that we should be trying to limit the >> exposure of inappropriate kernel internal details to userland, because >> userland has a habbit of becoming reliant on them, and when it does, >> it makes kernel maintanence unnecessarily harder. > > I totally agree with you. :) My question here is more about what we > should do with what we currently have since we have some unexpected > combinations. > > There is already an __NR_restart_syscall syscall and it seems like > it's already part of the userspace ABI: > - it is possible to call it from userspace directly > - seccomp "sees" it > - ptrace doesn't see it > > Native ARM64 hides the restart from both seccomp and ptrace, and this > seems like the right idea, except that restart_syscall is still > callable from userspace. I don't think there's a way to make that > vanish, which means we'll always have an exposed syscall. If anything > goes wrong with it (which we've been quite close to recently[1]), > there would be no way to have seccomp filter it. > > So, at the least, I'd like arm64 to NOT hide restart_syscall from > seccomp, and at best I'd like both arm and arm64 to (somehow) entirely > remove restart_syscall from the userspace ABI so it wouldn't need to > be filtered, and wouldn't become a weird ABI hiccup as you've > described. > > I fail to imagine a way to remove restart_syscall from userspace, so > I'm left with wanting parity of behavior between ARM and ARM64 (and > x86). What's the right way forward? My sufferings are an opposite of what seccompt expects: currently I do not see any possible way (from userspace) to get syscall number which was restarted, because at some given time userspace checks the procfs syscall file and sees NR_restart there, without any chance to understand what exactly was restarted (I am talking about some kind of debugging tool which does dead-lock analysis of stuck tasks). I totally agree with Russell not to provide kernel guts to userspace, but it is already done. Too late. So probably there is a need to split syscall on two numbers: real and effective. Real number we have right now on x86. And this should be done for both ptrace and procfs syscall file. (am I right that only for ARM we already have PTRACE_SET_SYSCALL? seems we can add also real/effective getter) -- Roman