All of lore.kernel.org
 help / color / mirror / Atom feed
* seccomp and ptrace. what is the correct order?
@ 2012-05-21 18:21 ` Eric Paris
  0 siblings, 0 replies; 77+ messages in thread
From: Eric Paris @ 2012-05-21 18:21 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, linux-security-module, kernel-hardening, hpa,
	mingo, oleg, peterz, rdunlap, mcgrathr, tglx, luto, eparis,
	serge.hallyn, indan, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

Viro ask me a question today and I didn't have a good answer.

Lets assume I set a seccomp filter that will allow read and will
deny/kill ioctl.  If something else is tracing me I could call read.
The read will pass the seccomp hook and move onto the ptrace hook.
The tracer could then change the syscall number to ioctl and I would
then actually perform an ioctl.

Is that what we want?  Do we want to do the permission check based on
what a process ask at syscall enter or do we want to do the permission
check based on what the kernel is actually going to do on behalf of
the process?

Does the question make sense?

-Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] seccomp and ptrace. what is the correct order?
@ 2012-05-21 18:21 ` Eric Paris
  0 siblings, 0 replies; 77+ messages in thread
From: Eric Paris @ 2012-05-21 18:21 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, linux-security-module, kernel-hardening, hpa,
	mingo, oleg, peterz, rdunlap, mcgrathr, tglx, luto, eparis,
	serge.hallyn, indan, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

Viro ask me a question today and I didn't have a good answer.

Lets assume I set a seccomp filter that will allow read and will
deny/kill ioctl.  If something else is tracing me I could call read.
The read will pass the seccomp hook and move onto the ptrace hook.
The tracer could then change the syscall number to ioctl and I would
then actually perform an ioctl.

Is that what we want?  Do we want to do the permission check based on
what a process ask at syscall enter or do we want to do the permission
check based on what the kernel is actually going to do on behalf of
the process?

Does the question make sense?

-Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-21 18:21 ` [kernel-hardening] " Eric Paris
@ 2012-05-21 18:25   ` Roland McGrath
  -1 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-21 18:25 UTC (permalink / raw)
  To: Eric Paris
  Cc: Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, indan, pmoore, akpm, corbet, eric.dumazet,
	markus, coreyb, keescook

>From a security perspective I think the natural expectation would be that
the seccomp check is on the values that will actually be used, without an
intervening opportunity to change anything.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-21 18:25   ` Roland McGrath
  0 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-21 18:25 UTC (permalink / raw)
  To: Eric Paris
  Cc: Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, indan, pmoore, akpm, corbet, eric.dumazet,
	markus, coreyb, keescook

>From a security perspective I think the natural expectation would be that
the seccomp check is on the values that will actually be used, without an
intervening opportunity to change anything.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
  2012-05-21 18:25   ` [kernel-hardening] " Roland McGrath
  (?)
@ 2012-05-21 18:40   ` Andrew Lutomirski
  -1 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-21 18:40 UTC (permalink / raw)
  To: Roland McGrath
  Cc: keescook, serge.hallyn, Will Drewry, indan, Eric Paris, hpa,
	rdunlap, linux-security-module, kernel-hardening, markus, mingo,
	eric.dumazet, tglx, pmoore, peterz, oleg, coreyb, corbet,
	linux-kernel, akpm, eparis

[-- Attachment #1: Type: text/plain, Size: 410 bytes --]

On May 21, 2012 8:25 PM, "Roland McGrath" <mcgrathr@google.com> wrote:
>
> From a security perspective I think the natural expectation would be that
> the seccomp check is on the values that will actually be used, without an
> intervening opportunity to change anything.

Certainly if the seccomp sandbox allows ptrace, this is important.

[Apologies if the Android app uses some evil mime type here.]

--Andy

[-- Attachment #2: Type: text/html, Size: 525 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-21 18:21 ` [kernel-hardening] " Eric Paris
@ 2012-05-21 18:47   ` richard -rw- weinberger
  -1 siblings, 0 replies; 77+ messages in thread
From: richard -rw- weinberger @ 2012-05-21 18:47 UTC (permalink / raw)
  To: Eric Paris
  Cc: Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, mcgrathr,
	tglx, luto, eparis, serge.hallyn, indan, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On Mon, May 21, 2012 at 8:21 PM, Eric Paris <netdev@parisplace.org> wrote:
> Is that what we want?  Do we want to do the permission check based on
> what a process ask at syscall enter or do we want to do the permission
> check based on what the kernel is actually going to do on behalf of
> the process?

I think we want the latter.
A system call emulator like UserModeLinux would benefit from that.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-21 18:47   ` richard -rw- weinberger
  0 siblings, 0 replies; 77+ messages in thread
From: richard -rw- weinberger @ 2012-05-21 18:47 UTC (permalink / raw)
  To: Eric Paris
  Cc: Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, mcgrathr,
	tglx, luto, eparis, serge.hallyn, indan, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On Mon, May 21, 2012 at 8:21 PM, Eric Paris <netdev@parisplace.org> wrote:
> Is that what we want?  Do we want to do the permission check based on
> what a process ask at syscall enter or do we want to do the permission
> check based on what the kernel is actually going to do on behalf of
> the process?

I think we want the latter.
A system call emulator like UserModeLinux would benefit from that.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-21 18:47   ` [kernel-hardening] " richard -rw- weinberger
@ 2012-05-21 19:13     ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-21 19:13 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Eric Paris, Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, mcgrathr, tglx,
	luto, eparis, serge.hallyn, indan, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On 05/21/2012 11:47 AM, richard -rw- weinberger wrote:
> On Mon, May 21, 2012 at 8:21 PM, Eric Paris <netdev@parisplace.org> wrote:
>> Is that what we want?  Do we want to do the permission check based on
>> what a process ask at syscall enter or do we want to do the permission
>> check based on what the kernel is actually going to do on behalf of
>> the process?
> 
> I think we want the latter.
> A system call emulator like UserModeLinux would benefit from that.
> 

Are you sure?  This would mean that a seccomp program used by the
process to intercept its own system calls via SIGSYS would give
completely different results under UML than under native...

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-21 19:13     ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-21 19:13 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Eric Paris, Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, mcgrathr, tglx,
	luto, eparis, serge.hallyn, indan, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On 05/21/2012 11:47 AM, richard -rw- weinberger wrote:
> On Mon, May 21, 2012 at 8:21 PM, Eric Paris <netdev@parisplace.org> wrote:
>> Is that what we want?  Do we want to do the permission check based on
>> what a process ask at syscall enter or do we want to do the permission
>> check based on what the kernel is actually going to do on behalf of
>> the process?
> 
> I think we want the latter.
> A system call emulator like UserModeLinux would benefit from that.
> 

Are you sure?  This would mean that a seccomp program used by the
process to intercept its own system calls via SIGSYS would give
completely different results under UML than under native...

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-21 18:25   ` [kernel-hardening] " Roland McGrath
@ 2012-05-21 19:20     ` Indan Zupancic
  -1 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-21 19:20 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Eric Paris, Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Mon, May 21, 2012 20:25, Roland McGrath wrote:
> From a security perspective I think the natural expectation would be that
> the seccomp check is on the values that will actually be used, without an
> intervening opportunity to change anything.

Actually, considering a tracer has full control over a traced process,
it would make most sense from a security perspective to check both the
traced task's seccomp filter, as well as the one for the ptracer for
modified system calls (calls where any register poking at all was done).
Otherwise a task could bypass its own seccomp filter by ptracing a hapless
victim.

I mentioned this before, but I forgot why this option was dismissed.
Probably because ptrace shouldn't have been allowed by the filter in
the first place.

The current patch does the seccomp check first and ignores any changes
made via ptrace, just like the old seccomp did. So in that sense nothing
changed.

Originally the seccomp filter check was in the fast path, so doing it
after ptrace was tricky. But now it has been moved to the slow tracehook
path it can easily be checked after the ptrace notification. That would
change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
as it can be argued that that was a security hole anyway (except if
ptracing a seccomped task was disallowed, in which case moving it to
the end doesn't change anything anyway).

Another argument for moving it to the end is that it makes debugging
seccomped tasks a lot easier, because the debugger sees the denied
system call. With the current patch the tasks would silently die.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-21 19:20     ` Indan Zupancic
  0 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-21 19:20 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Eric Paris, Will Drewry, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Mon, May 21, 2012 20:25, Roland McGrath wrote:
> From a security perspective I think the natural expectation would be that
> the seccomp check is on the values that will actually be used, without an
> intervening opportunity to change anything.

Actually, considering a tracer has full control over a traced process,
it would make most sense from a security perspective to check both the
traced task's seccomp filter, as well as the one for the ptracer for
modified system calls (calls where any register poking at all was done).
Otherwise a task could bypass its own seccomp filter by ptracing a hapless
victim.

I mentioned this before, but I forgot why this option was dismissed.
Probably because ptrace shouldn't have been allowed by the filter in
the first place.

The current patch does the seccomp check first and ignores any changes
made via ptrace, just like the old seccomp did. So in that sense nothing
changed.

Originally the seccomp filter check was in the fast path, so doing it
after ptrace was tricky. But now it has been moved to the slow tracehook
path it can easily be checked after the ptrace notification. That would
change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
as it can be argued that that was a security hole anyway (except if
ptracing a seccomped task was disallowed, in which case moving it to
the end doesn't change anything anyway).

Another argument for moving it to the end is that it makes debugging
seccomped tasks a lot easier, because the debugger sees the denied
system call. With the current patch the tasks would silently die.

Greetings,

Indan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-21 19:20     ` [kernel-hardening] " Indan Zupancic
@ 2012-05-22 16:23       ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 16:23 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Eric Paris, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <indan@nul.nu> wrote:
> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>> From a security perspective I think the natural expectation would be that
>> the seccomp check is on the values that will actually be used, without an
>> intervening opportunity to change anything.
>
> Actually, considering a tracer has full control over a traced process,
> it would make most sense from a security perspective to check both the
> traced task's seccomp filter, as well as the one for the ptracer for
> modified system calls (calls where any register poking at all was done).
> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
> victim.
>
> I mentioned this before, but I forgot why this option was dismissed.
> Probably because ptrace shouldn't have been allowed by the filter in
> the first place.
>
> The current patch does the seccomp check first and ignores any changes
> made via ptrace, just like the old seccomp did. So in that sense nothing
> changed.
>
> Originally the seccomp filter check was in the fast path, so doing it
> after ptrace was tricky. But now it has been moved to the slow tracehook
> path it can easily be checked after the ptrace notification. That would
> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
> as it can be argued that that was a security hole anyway (except if
> ptracing a seccomped task was disallowed, in which case moving it to
> the end doesn't change anything anyway).
>
> Another argument for moving it to the end is that it makes debugging
> seccomped tasks a lot easier, because the debugger sees the denied
> system call. With the current patch the tasks would silently die.

I don't see this as a strict bypass because an successful attack would require:
- the ability to fork/clone _and_ call ptrace (which would otherwise
be blocked by the BPF if the user of the bpf cares)
- the ability to compromise a process with ptrace abilities for a
sandboxed process -- which means that a privilege escalation (in a
word) has already occurred.

However(!), if we did move secure_computing() to after ptrace _and_
added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
ensure the system call was not changed by the tracer.  This would give
strong assurances that whatever system call is executed was explicitly
allowed by seccomp policy is the one that was executed.

I'm open to either leaving things alone (since it isn't horrible) or
making the change to tighten things up. Is the mode=1 behavior change
acceptable?  I assumed it wouldn't be, but perhaps I shouldn't have
made that assumption.

Regardless, I will go ahead and make patches and test them for both,
so they are on-hand regardless.

thanks!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 16:23       ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 16:23 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Eric Paris, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <indan@nul.nu> wrote:
> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>> From a security perspective I think the natural expectation would be that
>> the seccomp check is on the values that will actually be used, without an
>> intervening opportunity to change anything.
>
> Actually, considering a tracer has full control over a traced process,
> it would make most sense from a security perspective to check both the
> traced task's seccomp filter, as well as the one for the ptracer for
> modified system calls (calls where any register poking at all was done).
> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
> victim.
>
> I mentioned this before, but I forgot why this option was dismissed.
> Probably because ptrace shouldn't have been allowed by the filter in
> the first place.
>
> The current patch does the seccomp check first and ignores any changes
> made via ptrace, just like the old seccomp did. So in that sense nothing
> changed.
>
> Originally the seccomp filter check was in the fast path, so doing it
> after ptrace was tricky. But now it has been moved to the slow tracehook
> path it can easily be checked after the ptrace notification. That would
> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
> as it can be argued that that was a security hole anyway (except if
> ptracing a seccomped task was disallowed, in which case moving it to
> the end doesn't change anything anyway).
>
> Another argument for moving it to the end is that it makes debugging
> seccomped tasks a lot easier, because the debugger sees the denied
> system call. With the current patch the tasks would silently die.

I don't see this as a strict bypass because an successful attack would require:
- the ability to fork/clone _and_ call ptrace (which would otherwise
be blocked by the BPF if the user of the bpf cares)
- the ability to compromise a process with ptrace abilities for a
sandboxed process -- which means that a privilege escalation (in a
word) has already occurred.

However(!), if we did move secure_computing() to after ptrace _and_
added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
ensure the system call was not changed by the tracer.  This would give
strong assurances that whatever system call is executed was explicitly
allowed by seccomp policy is the one that was executed.

I'm open to either leaving things alone (since it isn't horrible) or
making the change to tighten things up. Is the mode=1 behavior change
acceptable?  I assumed it wouldn't be, but perhaps I shouldn't have
made that assumption.

Regardless, I will go ahead and make patches and test them for both,
so they are on-hand regardless.

thanks!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 16:23       ` [kernel-hardening] " Will Drewry
@ 2012-05-22 16:26         ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 16:26 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Eric Paris, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Tue, May 22, 2012 at 11:23 AM, Will Drewry <wad@chromium.org> wrote:
> On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <indan@nul.nu> wrote:
>> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>>> From a security perspective I think the natural expectation would be that
>>> the seccomp check is on the values that will actually be used, without an
>>> intervening opportunity to change anything.
>>
>> Actually, considering a tracer has full control over a traced process,
>> it would make most sense from a security perspective to check both the
>> traced task's seccomp filter, as well as the one for the ptracer for
>> modified system calls (calls where any register poking at all was done).
>> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
>> victim.
>>
>> I mentioned this before, but I forgot why this option was dismissed.
>> Probably because ptrace shouldn't have been allowed by the filter in
>> the first place.
>>
>> The current patch does the seccomp check first and ignores any changes
>> made via ptrace, just like the old seccomp did. So in that sense nothing
>> changed.
>>
>> Originally the seccomp filter check was in the fast path, so doing it
>> after ptrace was tricky. But now it has been moved to the slow tracehook
>> path it can easily be checked after the ptrace notification. That would
>> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
>> as it can be argued that that was a security hole anyway (except if
>> ptracing a seccomped task was disallowed, in which case moving it to
>> the end doesn't change anything anyway).
>>
>> Another argument for moving it to the end is that it makes debugging
>> seccomped tasks a lot easier, because the debugger sees the denied
>> system call. With the current patch the tasks would silently die.
>
> I don't see this as a strict bypass because an successful attack would require:
> - the ability to fork/clone _and_ call ptrace (which would otherwise
> be blocked by the BPF if the user of the bpf cares)
> - the ability to compromise a process with ptrace abilities for a
> sandboxed process -- which means that a privilege escalation (in a
> word) has already occurred.
>
> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer.  This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.
>
> I'm open to either leaving things alone (since it isn't horrible) or

Let me rephrase :)  The current behavior makes sense for an inwardly
focused system. seccomp mode=1 and mode=2 are about the behavior of
the task they apply to.  In that regard, both modes block (or can
block) ptrace, which means that there is no way for them to exploit
this -- they are operating correctly.

The discussion we're having now is if that is the desirable behavior.
Do we want seccomp to be a system call contract between userspace and
the kernel for the task no matter what, or do we want it to be a
system call contract between the specific task's code and the kernel
(thereby allowing a tracer to make things act totally differently).

> making the change to tighten things up. Is the mode=1 behavior change
> acceptable?  I assumed it wouldn't be, but perhaps I shouldn't have
> made that assumption.
>
> Regardless, I will go ahead and make patches and test them for both,
> so they are on-hand regardless.
>
> thanks!
> will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 16:26         ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 16:26 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Eric Paris, linux-kernel, linux-security-module,
	kernel-hardening, hpa, mingo, oleg, peterz, rdunlap, tglx, luto,
	eparis, serge.hallyn, pmoore, akpm, corbet, eric.dumazet, markus,
	coreyb, keescook

On Tue, May 22, 2012 at 11:23 AM, Will Drewry <wad@chromium.org> wrote:
> On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <indan@nul.nu> wrote:
>> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>>> From a security perspective I think the natural expectation would be that
>>> the seccomp check is on the values that will actually be used, without an
>>> intervening opportunity to change anything.
>>
>> Actually, considering a tracer has full control over a traced process,
>> it would make most sense from a security perspective to check both the
>> traced task's seccomp filter, as well as the one for the ptracer for
>> modified system calls (calls where any register poking at all was done).
>> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
>> victim.
>>
>> I mentioned this before, but I forgot why this option was dismissed.
>> Probably because ptrace shouldn't have been allowed by the filter in
>> the first place.
>>
>> The current patch does the seccomp check first and ignores any changes
>> made via ptrace, just like the old seccomp did. So in that sense nothing
>> changed.
>>
>> Originally the seccomp filter check was in the fast path, so doing it
>> after ptrace was tricky. But now it has been moved to the slow tracehook
>> path it can easily be checked after the ptrace notification. That would
>> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
>> as it can be argued that that was a security hole anyway (except if
>> ptracing a seccomped task was disallowed, in which case moving it to
>> the end doesn't change anything anyway).
>>
>> Another argument for moving it to the end is that it makes debugging
>> seccomped tasks a lot easier, because the debugger sees the denied
>> system call. With the current patch the tasks would silently die.
>
> I don't see this as a strict bypass because an successful attack would require:
> - the ability to fork/clone _and_ call ptrace (which would otherwise
> be blocked by the BPF if the user of the bpf cares)
> - the ability to compromise a process with ptrace abilities for a
> sandboxed process -- which means that a privilege escalation (in a
> word) has already occurred.
>
> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer.  This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.
>
> I'm open to either leaving things alone (since it isn't horrible) or

Let me rephrase :)  The current behavior makes sense for an inwardly
focused system. seccomp mode=1 and mode=2 are about the behavior of
the task they apply to.  In that regard, both modes block (or can
block) ptrace, which means that there is no way for them to exploit
this -- they are operating correctly.

The discussion we're having now is if that is the desirable behavior.
Do we want seccomp to be a system call contract between userspace and
the kernel for the task no matter what, or do we want it to be a
system call contract between the specific task's code and the kernel
(thereby allowing a tracer to make things act totally differently).

> making the change to tighten things up. Is the mode=1 behavior change
> acceptable?  I assumed it wouldn't be, but perhaps I shouldn't have
> made that assumption.
>
> Regardless, I will go ahead and make patches and test them for both,
> so they are on-hand regardless.
>
> thanks!
> will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 16:23       ` [kernel-hardening] " Will Drewry
@ 2012-05-22 17:39         ` Al Viro
  -1 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 17:39 UTC (permalink / raw)
  To: Will Drewry
  Cc: Indan Zupancic, Roland McGrath, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, hpa, mingo, oleg,
	peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm,
	corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:

> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer.  This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.

BTW, after grepping around a bit, I have to say that some callers of those
hooks make very little sense

Exhibit A: sh32 has in do_syscall_trace_enter(regs)
        secure_computing(regs->regs[0]);
Syscall number in r0, right?
	[usual PTRACE_SYSCALL bits]
        if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
                trace_sys_enter(regs, regs->regs[0]);
Ditto
        audit_syscall_entry(audit_arch(), regs->regs[3],
                            regs->regs[4], regs->regs[5],
                            regs->regs[6], regs->regs[7]);
Oops - that one says syscall number in r3, first 4 arguments in r4..r7
        return ret ?: regs->regs[0];

and the caller of that sucker is
syscall_trace_entry:
        !                       Yes it is traced.
        mov     r15, r4
        mov.l   7f, r11         ! Call do_syscall_trace_enter which notifies
        jsr     @r11            ! superior (will chomp R[0-7])
         nop
        mov.l   r0, @(OFF_R0,r15)       ! Save return value
        !                       Reload R0-R4 from kernel stack, where the
        !                       parent may have modified them using
        !                       ptrace(POKEUSR).  (Note that R0-R2 are
        !                       used by the system call handler directly
        !                       from the kernel stack anyway, so don't need
        !                       to be reloaded here.)  This allows the parent
        !                       to rewrite system calls and args on the fly.
        mov.l   @(OFF_R4,r15), r4   ! arg0
        mov.l   @(OFF_R5,r15), r5  
        mov.l   @(OFF_R6,r15), r6
        mov.l   @(OFF_R7,r15), r7   ! arg3
        mov.l   @(OFF_R3,r15), r3   ! syscall_nr
        !
        mov.l   2f, r10                 ! Number of syscalls
        cmp/hs  r10, r3
        bf      syscall_call
[...]
7:      .long   do_syscall_trace_enter

... and syscall_call very clearly picks an index in syscall table from r3.
Incidentally, r0 is the fifth syscall argument...  So what we have is
	* b0rken hookups for seccomp and tracepoint
	* b0rken cancelling of syscalls by ptrace (replacing syscall number
with -1 would've worked; doing that to the 5th argument - not really)

Exhibit B: sh64; makes even less sense, but there the assembler glue had
been too dense for me to get through.  At the very least, seccomp and
tracepoint are assuming that syscall number is in r9, while audit is
assuming that it's in r1.  I'm not too inclined to trust audit in this
case, though.  The _really_ interesting part is that by the look of
their pt_regs syscall number is stored separately - not in regs->regs[]
at all.  And the caller
	* shoves the return value of do_syscall_trace_enter() into regs->regs[2]
	* picks syscall number from regs->syscall_nr and uses that as index
in sys_call_table
	* seems to imply that arguments are in regs->regs[2..7]
	* code around the (presumable) path leading there seems to imply
that syscall number comes from the trap number and isn't in regs->regs[]
at all.  But I might be misreading that assembler.  Easily.

Exhibit C:
mips is consistent these days, but it has no tracepoint hookup *and* it does
open-code tracehook_report_syscall_entry(), except for its return value...
Used to pass the wrong register to seccomp, IIRC.

We really ought to look into merging those suckers.  It's a source of PITA
that keeps coming back; what we need is
	regs_syscall_number(struct pt_regs *)
	regs_syscall_arg1(struct pt_regs *)
	...
	regs_syscall_arg6(struct pt_regs *)
in addition to existing
	regs_return_value(struct pt_regs *)
added on all platforms and made mandatory for new ones.  With that we
could go a long way towards merging these implementations...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 17:39         ` Al Viro
  0 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 17:39 UTC (permalink / raw)
  To: Will Drewry
  Cc: Indan Zupancic, Roland McGrath, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, hpa, mingo, oleg,
	peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm,
	corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:

> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer.  This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.

BTW, after grepping around a bit, I have to say that some callers of those
hooks make very little sense

Exhibit A: sh32 has in do_syscall_trace_enter(regs)
        secure_computing(regs->regs[0]);
Syscall number in r0, right?
	[usual PTRACE_SYSCALL bits]
        if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
                trace_sys_enter(regs, regs->regs[0]);
Ditto
        audit_syscall_entry(audit_arch(), regs->regs[3],
                            regs->regs[4], regs->regs[5],
                            regs->regs[6], regs->regs[7]);
Oops - that one says syscall number in r3, first 4 arguments in r4..r7
        return ret ?: regs->regs[0];

and the caller of that sucker is
syscall_trace_entry:
        !                       Yes it is traced.
        mov     r15, r4
        mov.l   7f, r11         ! Call do_syscall_trace_enter which notifies
        jsr     @r11            ! superior (will chomp R[0-7])
         nop
        mov.l   r0, @(OFF_R0,r15)       ! Save return value
        !                       Reload R0-R4 from kernel stack, where the
        !                       parent may have modified them using
        !                       ptrace(POKEUSR).  (Note that R0-R2 are
        !                       used by the system call handler directly
        !                       from the kernel stack anyway, so don't need
        !                       to be reloaded here.)  This allows the parent
        !                       to rewrite system calls and args on the fly.
        mov.l   @(OFF_R4,r15), r4   ! arg0
        mov.l   @(OFF_R5,r15), r5  
        mov.l   @(OFF_R6,r15), r6
        mov.l   @(OFF_R7,r15), r7   ! arg3
        mov.l   @(OFF_R3,r15), r3   ! syscall_nr
        !
        mov.l   2f, r10                 ! Number of syscalls
        cmp/hs  r10, r3
        bf      syscall_call
[...]
7:      .long   do_syscall_trace_enter

... and syscall_call very clearly picks an index in syscall table from r3.
Incidentally, r0 is the fifth syscall argument...  So what we have is
	* b0rken hookups for seccomp and tracepoint
	* b0rken cancelling of syscalls by ptrace (replacing syscall number
with -1 would've worked; doing that to the 5th argument - not really)

Exhibit B: sh64; makes even less sense, but there the assembler glue had
been too dense for me to get through.  At the very least, seccomp and
tracepoint are assuming that syscall number is in r9, while audit is
assuming that it's in r1.  I'm not too inclined to trust audit in this
case, though.  The _really_ interesting part is that by the look of
their pt_regs syscall number is stored separately - not in regs->regs[]
at all.  And the caller
	* shoves the return value of do_syscall_trace_enter() into regs->regs[2]
	* picks syscall number from regs->syscall_nr and uses that as index
in sys_call_table
	* seems to imply that arguments are in regs->regs[2..7]
	* code around the (presumable) path leading there seems to imply
that syscall number comes from the trap number and isn't in regs->regs[]
at all.  But I might be misreading that assembler.  Easily.

Exhibit C:
mips is consistent these days, but it has no tracepoint hookup *and* it does
open-code tracehook_report_syscall_entry(), except for its return value...
Used to pass the wrong register to seccomp, IIRC.

We really ought to look into merging those suckers.  It's a source of PITA
that keeps coming back; what we need is
	regs_syscall_number(struct pt_regs *)
	regs_syscall_arg1(struct pt_regs *)
	...
	regs_syscall_arg6(struct pt_regs *)
in addition to existing
	regs_return_value(struct pt_regs *)
added on all platforms and made mandatory for new ones.  With that we
could go a long way towards merging these implementations...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 17:39         ` [kernel-hardening] " Al Viro
@ 2012-05-22 20:26           ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 20:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Indan Zupancic, Roland McGrath, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, hpa, mingo, oleg,
	peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm,
	corbet, eric.dumazet, markus, coreyb, keescook

[-- Attachment #1: Type: text/plain, Size: 5496 bytes --]

On Tue, May 22, 2012 at 12:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:
>
>> However(!), if we did move secure_computing() to after ptrace _and_
>> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
>> ensure the system call was not changed by the tracer.  This would give
>> strong assurances that whatever system call is executed was explicitly
>> allowed by seccomp policy is the one that was executed.
>
> BTW, after grepping around a bit, I have to say that some callers of those
> hooks make very little sense

Yeah - the arch specific seccomp and ptrace code is in dire need of
attention on a number of platforms.

> Exhibit A: sh32 has in do_syscall_trace_enter(regs)
>        secure_computing(regs->regs[0]);
> Syscall number in r0, right?
>        [usual PTRACE_SYSCALL bits]
>        if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
>                trace_sys_enter(regs, regs->regs[0]);
> Ditto
>        audit_syscall_entry(audit_arch(), regs->regs[3],
>                            regs->regs[4], regs->regs[5],
>                            regs->regs[6], regs->regs[7]);
> Oops - that one says syscall number in r3, first 4 arguments in r4..r7
>        return ret ?: regs->regs[0];
>
> and the caller of that sucker is
> syscall_trace_entry:
>        !                       Yes it is traced.
>        mov     r15, r4
>        mov.l   7f, r11         ! Call do_syscall_trace_enter which notifies
>        jsr     @r11            ! superior (will chomp R[0-7])
>         nop
>        mov.l   r0, @(OFF_R0,r15)       ! Save return value
>        !                       Reload R0-R4 from kernel stack, where the
>        !                       parent may have modified them using
>        !                       ptrace(POKEUSR).  (Note that R0-R2 are
>        !                       used by the system call handler directly
>        !                       from the kernel stack anyway, so don't need
>        !                       to be reloaded here.)  This allows the parent
>        !                       to rewrite system calls and args on the fly.
>        mov.l   @(OFF_R4,r15), r4   ! arg0
>        mov.l   @(OFF_R5,r15), r5
>        mov.l   @(OFF_R6,r15), r6
>        mov.l   @(OFF_R7,r15), r7   ! arg3
>        mov.l   @(OFF_R3,r15), r3   ! syscall_nr
>        !
>        mov.l   2f, r10                 ! Number of syscalls
>        cmp/hs  r10, r3
>        bf      syscall_call
> [...]
> 7:      .long   do_syscall_trace_enter
>
> ... and syscall_call very clearly picks an index in syscall table from r3.
> Incidentally, r0 is the fifth syscall argument...  So what we have is
>        * b0rken hookups for seccomp and tracepoint
>        * b0rken cancelling of syscalls by ptrace (replacing syscall number
> with -1 would've worked; doing that to the 5th argument - not really)
>
> Exhibit B: sh64; makes even less sense, but there the assembler glue had
> been too dense for me to get through.  At the very least, seccomp and
> tracepoint are assuming that syscall number is in r9, while audit is
> assuming that it's in r1.  I'm not too inclined to trust audit in this
> case, though.  The _really_ interesting part is that by the look of
> their pt_regs syscall number is stored separately - not in regs->regs[]
> at all.  And the caller
>        * shoves the return value of do_syscall_trace_enter() into regs->regs[2]
>        * picks syscall number from regs->syscall_nr and uses that as index
> in sys_call_table
>        * seems to imply that arguments are in regs->regs[2..7]
>        * code around the (presumable) path leading there seems to imply
> that syscall number comes from the trap number and isn't in regs->regs[]
> at all.  But I might be misreading that assembler.  Easily.
>
> Exhibit C:
> mips is consistent these days, but it has no tracepoint hookup *and* it does
> open-code tracehook_report_syscall_entry(), except for its return value...
> Used to pass the wrong register to seccomp, IIRC.
>
> We really ought to look into merging those suckers.  It's a source of PITA
> that keeps coming back; what we need is
>        regs_syscall_number(struct pt_regs *)
>        regs_syscall_arg1(struct pt_regs *)
>        ...
>        regs_syscall_arg6(struct pt_regs *)
> in addition to existing
>        regs_return_value(struct pt_regs *)
> added on all platforms and made mandatory for new ones.  With that we
> could go a long way towards merging these implementations...

For fun, I took a stab at that to see how it'd play out for x86.
(Attached instead of inline since it's just a first cut.)

I do think much of the ptrace/seccomp could be merged, but I don't
know how quickly given all the changes that each arch will need.

What makes sense for the current seccomp work?  I'm not displeased
with the in-tree behavior, but the alternative approach would change
the ptrace+seccomp interactions (minimally) just by switching the
ordering.  It's not hard to change all the existing arches that call
secure_computing calls (I have a tentative patch set), but it wouldn't
fix their general brokenness.

Anyway, I'd like to see seccomp work properly on all arches, but if
there are strong opinions about the correct expectations between
seccomp and ptrace, I'm happy to fix that /now/ for whatever arches it
makes sense for.

thanks!
will

[-- Attachment #2: 0001-arch-x86-asm-generic-add-syscall_regs.h.patch.txt --]
[-- Type: text/plain, Size: 8873 bytes --]

From f281e194e7dee18134266fa4af926d1f1e5ca208 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Tue, 22 May 2012 14:54:16 -0500
Subject: [PATCH] arch/x86,asm-generic: add syscall_regs.h

asm/syscall.h provides a reliable layer of abstraction for accessing
system call-related values across arches.  It doesn't come without
costs, and using asm/syscall.h values in place of direct struct pt_regs
access adds non-trivial overhead.

While asm/syscall.h has its place encoding tricky logic, like
CONFIG_COMPAT behavior, there are many places where arch-specific code
could be moved into a shared location if there were arch-agnostic
accessors for the system call register properties.

asm/syscall_regs.h provides inline functions which provide direct
access to the pt_regs members for the cases where asm/syscall.h
doesn't make sense or is too costly.

As an example arch/x86/kernel/ptrace.c has been converted to use the
macros.  The resulting ptrace.o file is the exact same size and contains
the exact same output as the pre-patch version.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/x86/include/asm/syscall_regs.h |   91 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c            |   23 +++++----
 include/asm-generic/syscall_regs.h  |   76 +++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/include/asm/syscall_regs.h
 create mode 100644 include/asm-generic/syscall_regs.h

diff --git a/arch/x86/include/asm/syscall_regs.h b/arch/x86/include/asm/syscall_regs.h
new file mode 100644
index 0000000..68aacf4
--- /dev/null
+++ b/arch/x86/include/asm/syscall_regs.h
@@ -0,0 +1,91 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * See asm-generic/syscall_regs.h for descriptions of what we must do here.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+#include <asm/unistd.h>
+
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs)
+{
+	return &regs->orig_ax;
+}
+
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs)
+{
+	/* Note! This does not handle compat mode sign extension, etc */
+	return &regs->ax;
+}
+
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs)
+{
+		return &regs->bx;
+}
+
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs)
+{
+		return &regs->cx;
+}
+
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs)
+{
+		return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs)
+{
+		return &regs->si;
+}
+
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs)
+{
+		return &regs->di;
+}
+
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs)
+{
+		return &regs->bp;
+}
+
+#ifdef CONFIG_X86_64
+static inline unsigned long *regs_syscall64_arg1(struct pt_regs *regs)
+{
+		return &regs->di;
+}
+
+static inline unsigned long *regs_syscall64_arg2(struct pt_regs *regs)
+{
+		return &regs->si;
+}
+
+static inline unsigned long *regs_syscall64_arg3(struct pt_regs *regs)
+{
+		return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall64_arg4(struct pt_regs *regs)
+{
+		return &regs->r10;
+}
+
+static inline unsigned long *regs_syscall64_arg5(struct pt_regs *regs)
+{
+		return &regs->r8;
+}
+
+static inline unsigned long *regs_syscall64_arg6(struct pt_regs *regs)
+{
+		return &regs->r9;
+}
+#endif	/* CONFIG_X86_64 */
+
+#endif	/* _ASM_X86_SYSCALL_REGS_H */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13474d0..a02ac2a 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -34,6 +34,7 @@
 #include <asm/proto.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
+#include <asm/syscall_regs.h>
 
 #include "tls.h"
 
@@ -1487,29 +1488,33 @@ long syscall_trace_enter(struct pt_regs *regs)
 		ret = -1L;
 
 	/* do the secure computing after userspace can't change the syscall. */
-	if (!ret && secure_computing(regs->orig_ax)) {
+	if (!ret && secure_computing(*regs_syscall_nr(regs))) {
 		ret = -1L;
 		goto out;
 	}
 
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_enter(regs, regs->orig_ax);
+		trace_sys_enter(regs, *regs_syscall_nr(regs));
 
 	if (IS_IA32)
 		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
+				    *regs_syscall_nr(regs),
+				    *regs_syscall32_arg1(regs),
+				    *regs_syscall32_arg2(regs),
+				    *regs_syscall32_arg3(regs),
+				    *regs_syscall32_arg4(regs));
 #ifdef CONFIG_X86_64
 	else
 		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
+				    *regs_syscall_nr(regs),
+				    *regs_syscall64_arg1(regs),
+				    *regs_syscall64_arg2(regs),
+				    *regs_syscall64_arg3(regs),
+				    *regs_syscall64_arg4(regs));
 #endif
 
 out:
-	return ret ?: regs->orig_ax;
+	return ret ?: syscall_get_nr(current, regs);
 }
 
 void syscall_trace_leave(struct pt_regs *regs)
diff --git a/include/asm-generic/syscall_regs.h b/include/asm-generic/syscall_regs.h
new file mode 100644
index 0000000..7b3fb5b
--- /dev/null
+++ b/include/asm-generic/syscall_regs.h
@@ -0,0 +1,76 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * This file is a stub providing documentation for what functions
+ * asm-ARCH/syscall_regs.h files need to define. All arch definitions
+ * should be simple inlines.
+ *
+ * All functions are meant to be cross-arch accessors into pt_regs.
+ * This means that no locks are expected and a populated pt_regs is
+ * expected.  It also means that any per-arch expectations, like
+ * runtime CONFIG_COMPAT behavior, must be handled by the caller.
+ *
+ * See asm/syscall.h for implementations that handle higher level
+ * runtime logic.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+struct pt_regs;
+
+/**
+ * regs_syscall_nr - returns a pointer to the system call number entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs);
+
+/**
+ * regs_syscall_return - returns a pointer to the system call return value entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs);
+
+/**
+ * regs_syscall32_arg1...6 - returns a pointer to the system call arg entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.  The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+/**
+ * regs_syscall64_arg1...6 - returns a pointer to the system call arg entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.  The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+#endif	/* _ASM_X86_SYSCALL_REGS_H */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 20:26           ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 20:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Indan Zupancic, Roland McGrath, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, hpa, mingo, oleg,
	peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm,
	corbet, eric.dumazet, markus, coreyb, keescook

[-- Attachment #1: Type: text/plain, Size: 5496 bytes --]

On Tue, May 22, 2012 at 12:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:
>
>> However(!), if we did move secure_computing() to after ptrace _and_
>> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
>> ensure the system call was not changed by the tracer.  This would give
>> strong assurances that whatever system call is executed was explicitly
>> allowed by seccomp policy is the one that was executed.
>
> BTW, after grepping around a bit, I have to say that some callers of those
> hooks make very little sense

Yeah - the arch specific seccomp and ptrace code is in dire need of
attention on a number of platforms.

> Exhibit A: sh32 has in do_syscall_trace_enter(regs)
>        secure_computing(regs->regs[0]);
> Syscall number in r0, right?
>        [usual PTRACE_SYSCALL bits]
>        if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
>                trace_sys_enter(regs, regs->regs[0]);
> Ditto
>        audit_syscall_entry(audit_arch(), regs->regs[3],
>                            regs->regs[4], regs->regs[5],
>                            regs->regs[6], regs->regs[7]);
> Oops - that one says syscall number in r3, first 4 arguments in r4..r7
>        return ret ?: regs->regs[0];
>
> and the caller of that sucker is
> syscall_trace_entry:
>        !                       Yes it is traced.
>        mov     r15, r4
>        mov.l   7f, r11         ! Call do_syscall_trace_enter which notifies
>        jsr     @r11            ! superior (will chomp R[0-7])
>         nop
>        mov.l   r0, @(OFF_R0,r15)       ! Save return value
>        !                       Reload R0-R4 from kernel stack, where the
>        !                       parent may have modified them using
>        !                       ptrace(POKEUSR).  (Note that R0-R2 are
>        !                       used by the system call handler directly
>        !                       from the kernel stack anyway, so don't need
>        !                       to be reloaded here.)  This allows the parent
>        !                       to rewrite system calls and args on the fly.
>        mov.l   @(OFF_R4,r15), r4   ! arg0
>        mov.l   @(OFF_R5,r15), r5
>        mov.l   @(OFF_R6,r15), r6
>        mov.l   @(OFF_R7,r15), r7   ! arg3
>        mov.l   @(OFF_R3,r15), r3   ! syscall_nr
>        !
>        mov.l   2f, r10                 ! Number of syscalls
>        cmp/hs  r10, r3
>        bf      syscall_call
> [...]
> 7:      .long   do_syscall_trace_enter
>
> ... and syscall_call very clearly picks an index in syscall table from r3.
> Incidentally, r0 is the fifth syscall argument...  So what we have is
>        * b0rken hookups for seccomp and tracepoint
>        * b0rken cancelling of syscalls by ptrace (replacing syscall number
> with -1 would've worked; doing that to the 5th argument - not really)
>
> Exhibit B: sh64; makes even less sense, but there the assembler glue had
> been too dense for me to get through.  At the very least, seccomp and
> tracepoint are assuming that syscall number is in r9, while audit is
> assuming that it's in r1.  I'm not too inclined to trust audit in this
> case, though.  The _really_ interesting part is that by the look of
> their pt_regs syscall number is stored separately - not in regs->regs[]
> at all.  And the caller
>        * shoves the return value of do_syscall_trace_enter() into regs->regs[2]
>        * picks syscall number from regs->syscall_nr and uses that as index
> in sys_call_table
>        * seems to imply that arguments are in regs->regs[2..7]
>        * code around the (presumable) path leading there seems to imply
> that syscall number comes from the trap number and isn't in regs->regs[]
> at all.  But I might be misreading that assembler.  Easily.
>
> Exhibit C:
> mips is consistent these days, but it has no tracepoint hookup *and* it does
> open-code tracehook_report_syscall_entry(), except for its return value...
> Used to pass the wrong register to seccomp, IIRC.
>
> We really ought to look into merging those suckers.  It's a source of PITA
> that keeps coming back; what we need is
>        regs_syscall_number(struct pt_regs *)
>        regs_syscall_arg1(struct pt_regs *)
>        ...
>        regs_syscall_arg6(struct pt_regs *)
> in addition to existing
>        regs_return_value(struct pt_regs *)
> added on all platforms and made mandatory for new ones.  With that we
> could go a long way towards merging these implementations...

For fun, I took a stab at that to see how it'd play out for x86.
(Attached instead of inline since it's just a first cut.)

I do think much of the ptrace/seccomp could be merged, but I don't
know how quickly given all the changes that each arch will need.

What makes sense for the current seccomp work?  I'm not displeased
with the in-tree behavior, but the alternative approach would change
the ptrace+seccomp interactions (minimally) just by switching the
ordering.  It's not hard to change all the existing arches that call
secure_computing calls (I have a tentative patch set), but it wouldn't
fix their general brokenness.

Anyway, I'd like to see seccomp work properly on all arches, but if
there are strong opinions about the correct expectations between
seccomp and ptrace, I'm happy to fix that /now/ for whatever arches it
makes sense for.

thanks!
will

[-- Attachment #2: 0001-arch-x86-asm-generic-add-syscall_regs.h.patch.txt --]
[-- Type: text/plain, Size: 8873 bytes --]

From f281e194e7dee18134266fa4af926d1f1e5ca208 Mon Sep 17 00:00:00 2001
From: Will Drewry <wad@chromium.org>
Date: Tue, 22 May 2012 14:54:16 -0500
Subject: [PATCH] arch/x86,asm-generic: add syscall_regs.h

asm/syscall.h provides a reliable layer of abstraction for accessing
system call-related values across arches.  It doesn't come without
costs, and using asm/syscall.h values in place of direct struct pt_regs
access adds non-trivial overhead.

While asm/syscall.h has its place encoding tricky logic, like
CONFIG_COMPAT behavior, there are many places where arch-specific code
could be moved into a shared location if there were arch-agnostic
accessors for the system call register properties.

asm/syscall_regs.h provides inline functions which provide direct
access to the pt_regs members for the cases where asm/syscall.h
doesn't make sense or is too costly.

As an example arch/x86/kernel/ptrace.c has been converted to use the
macros.  The resulting ptrace.o file is the exact same size and contains
the exact same output as the pre-patch version.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/x86/include/asm/syscall_regs.h |   91 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c            |   23 +++++----
 include/asm-generic/syscall_regs.h  |   76 +++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/include/asm/syscall_regs.h
 create mode 100644 include/asm-generic/syscall_regs.h

diff --git a/arch/x86/include/asm/syscall_regs.h b/arch/x86/include/asm/syscall_regs.h
new file mode 100644
index 0000000..68aacf4
--- /dev/null
+++ b/arch/x86/include/asm/syscall_regs.h
@@ -0,0 +1,91 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * See asm-generic/syscall_regs.h for descriptions of what we must do here.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+#include <asm/unistd.h>
+
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs)
+{
+	return &regs->orig_ax;
+}
+
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs)
+{
+	/* Note! This does not handle compat mode sign extension, etc */
+	return &regs->ax;
+}
+
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs)
+{
+		return &regs->bx;
+}
+
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs)
+{
+		return &regs->cx;
+}
+
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs)
+{
+		return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs)
+{
+		return &regs->si;
+}
+
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs)
+{
+		return &regs->di;
+}
+
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs)
+{
+		return &regs->bp;
+}
+
+#ifdef CONFIG_X86_64
+static inline unsigned long *regs_syscall64_arg1(struct pt_regs *regs)
+{
+		return &regs->di;
+}
+
+static inline unsigned long *regs_syscall64_arg2(struct pt_regs *regs)
+{
+		return &regs->si;
+}
+
+static inline unsigned long *regs_syscall64_arg3(struct pt_regs *regs)
+{
+		return &regs->dx;
+}
+
+static inline unsigned long *regs_syscall64_arg4(struct pt_regs *regs)
+{
+		return &regs->r10;
+}
+
+static inline unsigned long *regs_syscall64_arg5(struct pt_regs *regs)
+{
+		return &regs->r8;
+}
+
+static inline unsigned long *regs_syscall64_arg6(struct pt_regs *regs)
+{
+		return &regs->r9;
+}
+#endif	/* CONFIG_X86_64 */
+
+#endif	/* _ASM_X86_SYSCALL_REGS_H */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13474d0..a02ac2a 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -34,6 +34,7 @@
 #include <asm/proto.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/traps.h>
+#include <asm/syscall_regs.h>
 
 #include "tls.h"
 
@@ -1487,29 +1488,33 @@ long syscall_trace_enter(struct pt_regs *regs)
 		ret = -1L;
 
 	/* do the secure computing after userspace can't change the syscall. */
-	if (!ret && secure_computing(regs->orig_ax)) {
+	if (!ret && secure_computing(*regs_syscall_nr(regs))) {
 		ret = -1L;
 		goto out;
 	}
 
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_enter(regs, regs->orig_ax);
+		trace_sys_enter(regs, *regs_syscall_nr(regs));
 
 	if (IS_IA32)
 		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
+				    *regs_syscall_nr(regs),
+				    *regs_syscall32_arg1(regs),
+				    *regs_syscall32_arg2(regs),
+				    *regs_syscall32_arg3(regs),
+				    *regs_syscall32_arg4(regs));
 #ifdef CONFIG_X86_64
 	else
 		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
+				    *regs_syscall_nr(regs),
+				    *regs_syscall64_arg1(regs),
+				    *regs_syscall64_arg2(regs),
+				    *regs_syscall64_arg3(regs),
+				    *regs_syscall64_arg4(regs));
 #endif
 
 out:
-	return ret ?: regs->orig_ax;
+	return ret ?: syscall_get_nr(current, regs);
 }
 
 void syscall_trace_leave(struct pt_regs *regs)
diff --git a/include/asm-generic/syscall_regs.h b/include/asm-generic/syscall_regs.h
new file mode 100644
index 0000000..7b3fb5b
--- /dev/null
+++ b/include/asm-generic/syscall_regs.h
@@ -0,0 +1,76 @@
+/*
+ * Efficient, uniform access to system call-related registers
+ *
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * This file is a stub providing documentation for what functions
+ * asm-ARCH/syscall_regs.h files need to define. All arch definitions
+ * should be simple inlines.
+ *
+ * All functions are meant to be cross-arch accessors into pt_regs.
+ * This means that no locks are expected and a populated pt_regs is
+ * expected.  It also means that any per-arch expectations, like
+ * runtime CONFIG_COMPAT behavior, must be handled by the caller.
+ *
+ * See asm/syscall.h for implementations that handle higher level
+ * runtime logic.
+ */
+
+#ifndef _ASM_X86_SYSCALL_REGS_H
+#define _ASM_X86_SYSCALL_REGS_H
+
+struct pt_regs;
+
+/**
+ * regs_syscall_nr - returns a pointer to the system call number entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_nr(struct pt_regs *regs);
+
+/**
+ * regs_syscall_return - returns a pointer to the system call return value entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.
+ */
+static inline unsigned long *regs_syscall_return(struct pt_regs *regs);
+
+/**
+ * regs_syscall32_arg1...6 - returns a pointer to the system call arg entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.  The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+/**
+ * regs_syscall64_arg1...6 - returns a pointer to the system call arg entry
+ * @regs:	task_pt_regs() for the targeted task
+ *
+ * It's only valid to call this when @regs is known to be populated and
+ * the caller is the only reader/writer.  The behavior is undefined if
+ * these functions are called on non-32-bit system call argument registers.
+ */
+static inline unsigned long *regs_syscall32_arg1(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg2(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg3(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg4(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg5(struct pt_regs *regs);
+static inline unsigned long *regs_syscall32_arg6(struct pt_regs *regs);
+
+#endif	/* _ASM_X86_SYSCALL_REGS_H */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 20:26           ` [kernel-hardening] " Will Drewry
@ 2012-05-22 20:34             ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 20:34 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

The proposed patch seems to duplicate the functionality in
<asm/syscall.h>.  Those macros also try to do the right thing in the
presence of compat.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 20:34             ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 20:34 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

The proposed patch seems to duplicate the functionality in
<asm/syscall.h>.  Those macros also try to do the right thing in the
presence of compat.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 20:34             ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-22 20:48               ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 20:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> The proposed patch seems to duplicate the functionality in
> <asm/syscall.h>.  Those macros also try to do the right thing in the
> presence of compat.

That was my first thought too, so I ran a few simple tests.  gcc isn't
smart enough to not add ~344 bytes of code to get the number and
arguments for the x86/kernel/ptrace.c case I included (in the
naive-est of integrations).  But I don't know that it justifies the
extra patchwork or enforcing shared code across arches.

Regardless, the syscall entry + trace code can use some attention
across the architectures. I don't know that
one-more-layer-of-abstraction is the right answer (rather than just
fixing the code). The biggest benefit would be having one-true
syscall_trace_entry slow path. That said, the fast paths will be
forever divergent so the opportunity for bugs like the ones pointed
out will still be there.

cheers!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 20:48               ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 20:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> The proposed patch seems to duplicate the functionality in
> <asm/syscall.h>.  Those macros also try to do the right thing in the
> presence of compat.

That was my first thought too, so I ran a few simple tests.  gcc isn't
smart enough to not add ~344 bytes of code to get the number and
arguments for the x86/kernel/ptrace.c case I included (in the
naive-est of integrations).  But I don't know that it justifies the
extra patchwork or enforcing shared code across arches.

Regardless, the syscall entry + trace code can use some attention
across the architectures. I don't know that
one-more-layer-of-abstraction is the right answer (rather than just
fixing the code). The biggest benefit would be having one-true
syscall_trace_entry slow path. That said, the fast paths will be
forever divergent so the opportunity for bugs like the ones pointed
out will still be there.

cheers!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 20:48               ` [kernel-hardening] " Will Drewry
@ 2012-05-22 21:07                 ` Al Viro
  -1 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 21:07 UTC (permalink / raw)
  To: Will Drewry
  Cc: H. Peter Anvin, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 03:48:40PM -0500, Will Drewry wrote:
> On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > The proposed patch seems to duplicate the functionality in
> > <asm/syscall.h>. ?Those macros also try to do the right thing in the
> > presence of compat.
> 
> That was my first thought too, so I ran a few simple tests.  gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations).  But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
> 
> Regardless, the syscall entry + trace code can use some attention
> across the architectures. I don't know that
> one-more-layer-of-abstraction is the right answer (rather than just
> fixing the code). The biggest benefit would be having one-true
> syscall_trace_entry slow path. That said, the fast paths will be
> forever divergent so the opportunity for bugs like the ones pointed
> out will still be there.

FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
with
        context->arch       = syscall_get_arch(current, regs);
        context->major      = syscall_get_nr(current, regs);
	syscall_get_arguments(current, regs, 0, 4, context->argv);
done instead of initializations from arguments we are doing there now.
I seriously doubt that it would lead to worse code than what we currently
have.  If nothing else, we won't be passing that pile of arguments around.

And yes, asm/syscall.h stuff is probably the right approach here.  For
biarch ones syscall_get_arguments() is saner than doing them one by one...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:07                 ` Al Viro
  0 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 21:07 UTC (permalink / raw)
  To: Will Drewry
  Cc: H. Peter Anvin, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 03:48:40PM -0500, Will Drewry wrote:
> On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > The proposed patch seems to duplicate the functionality in
> > <asm/syscall.h>. ?Those macros also try to do the right thing in the
> > presence of compat.
> 
> That was my first thought too, so I ran a few simple tests.  gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations).  But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
> 
> Regardless, the syscall entry + trace code can use some attention
> across the architectures. I don't know that
> one-more-layer-of-abstraction is the right answer (rather than just
> fixing the code). The biggest benefit would be having one-true
> syscall_trace_entry slow path. That said, the fast paths will be
> forever divergent so the opportunity for bugs like the ones pointed
> out will still be there.

FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
with
        context->arch       = syscall_get_arch(current, regs);
        context->major      = syscall_get_nr(current, regs);
	syscall_get_arguments(current, regs, 0, 4, context->argv);
done instead of initializations from arguments we are doing there now.
I seriously doubt that it would lead to worse code than what we currently
have.  If nothing else, we won't be passing that pile of arguments around.

And yes, asm/syscall.h stuff is probably the right approach here.  For
biarch ones syscall_get_arguments() is saner than doing them one by one...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 20:48               ` [kernel-hardening] " Will Drewry
@ 2012-05-22 21:09                 ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:09 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On 05/22/2012 01:48 PM, Will Drewry wrote:
> 
> That was my first thought too, so I ran a few simple tests.  gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations).  But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
> 

I suspect the construction of those inlines can be improved.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:09                 ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:09 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On 05/22/2012 01:48 PM, Will Drewry wrote:
> 
> That was my first thought too, so I ran a few simple tests.  gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations).  But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
> 

I suspect the construction of those inlines can be improved.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 21:09                 ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-22 21:14                   ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 21:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 4:09 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/22/2012 01:48 PM, Will Drewry wrote:
>>
>> That was my first thought too, so I ran a few simple tests.  gcc isn't
>> smart enough to not add ~344 bytes of code to get the number and
>> arguments for the x86/kernel/ptrace.c case I included (in the
>> naive-est of integrations).  But I don't know that it justifies the
>> extra patchwork or enforcing shared code across arches.
>>
>
> I suspect the construction of those inlines can be improved.

Seems likely - or just my use of them :)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:14                   ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-22 21:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 4:09 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/22/2012 01:48 PM, Will Drewry wrote:
>>
>> That was my first thought too, so I ran a few simple tests.  gcc isn't
>> smart enough to not add ~344 bytes of code to get the number and
>> arguments for the x86/kernel/ptrace.c case I included (in the
>> naive-est of integrations).  But I don't know that it justifies the
>> extra patchwork or enforcing shared code across arches.
>>
>
> I suspect the construction of those inlines can be improved.

Seems likely - or just my use of them :)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 21:07                 ` [kernel-hardening] " Al Viro
@ 2012-05-22 21:17                   ` Roland McGrath
  -1 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-22 21:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Will Drewry, H. Peter Anvin, Indan Zupancic, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 2:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
> with
>        context->arch       = syscall_get_arch(current, regs);
>        context->major      = syscall_get_nr(current, regs);
>        syscall_get_arguments(current, regs, 0, 4, context->argv);
> done instead of initializations from arguments we are doing there now.
> I seriously doubt that it would lead to worse code than what we currently
> have.  If nothing else, we won't be passing that pile of arguments around.

I always felt the same way about the audit code.  (As a bonus, if the audit
folks ever decide they want all six syscall arguments instead of just four,
they wouldn't have to touch every arch.)  But it will certainly produce
drastically worse code for ia64.  (Not that anybody cares about ia64.)


Thanks,
Roland

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:17                   ` Roland McGrath
  0 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-22 21:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Will Drewry, H. Peter Anvin, Indan Zupancic, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 2:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
> with
>        context->arch       = syscall_get_arch(current, regs);
>        context->major      = syscall_get_nr(current, regs);
>        syscall_get_arguments(current, regs, 0, 4, context->argv);
> done instead of initializations from arguments we are doing there now.
> I seriously doubt that it would lead to worse code than what we currently
> have.  If nothing else, we won't be passing that pile of arguments around.

I always felt the same way about the audit code.  (As a bonus, if the audit
folks ever decide they want all six syscall arguments instead of just four,
they wouldn't have to touch every arch.)  But it will certainly produce
drastically worse code for ia64.  (Not that anybody cares about ia64.)


Thanks,
Roland

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 21:17                   ` [kernel-hardening] " Roland McGrath
@ 2012-05-22 21:18                     ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:18 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Al Viro, Will Drewry, Indan Zupancic, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On 05/22/2012 02:17 PM, Roland McGrath wrote:
> (As a bonus, if the audit folks ever decide they want all six syscall
> arguments instead of just four

Is there anything about audit which *isn't* a gigantic facepalm?

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:18                     ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:18 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Al Viro, Will Drewry, Indan Zupancic, Eric Paris, linux-kernel,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, eparis, serge.hallyn, pmoore, akpm, corbet,
	eric.dumazet, markus, coreyb, keescook

On 05/22/2012 02:17 PM, Roland McGrath wrote:
> (As a bonus, if the audit folks ever decide they want all six syscall
> arguments instead of just four

Is there anything about audit which *isn't* a gigantic facepalm?

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 21:14                   ` [kernel-hardening] " Will Drewry
@ 2012-05-22 21:37                     ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:37 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On 05/22/2012 02:14 PM, Will Drewry wrote:
>>
>> I suspect the construction of those inlines can be improved.
> 
> Seems likely - or just my use of them :)
> 

One thing that could make the code worse is if you are in a flow where
the state of the TS_COMPAT flag is known but gcc doesn't know that.  I
don't have an easy answer for that.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 21:37                     ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-22 21:37 UTC (permalink / raw)
  To: Will Drewry
  Cc: Al Viro, Indan Zupancic, Roland McGrath, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On 05/22/2012 02:14 PM, Will Drewry wrote:
>>
>> I suspect the construction of those inlines can be improved.
> 
> Seems likely - or just my use of them :)
> 

One thing that could make the code worse is if you are in a flow where
the state of the TS_COMPAT flag is known but gcc doesn't know that.  I
don't have an easy answer for that.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: seccomp and ptrace. what is the correct order?
  2012-05-22 21:17                   ` [kernel-hardening] " Roland McGrath
@ 2012-05-22 22:20                     ` Al Viro
  -1 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 22:20 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Will Drewry, H. Peter Anvin, Indan Zupancic, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 02:17:03PM -0700, Roland McGrath wrote:
> I always felt the same way about the audit code.  (As a bonus, if the audit
> folks ever decide they want all six syscall arguments instead of just four,
> they wouldn't have to touch every arch.)  But it will certainly produce
> drastically worse code for ia64.  (Not that anybody cares about ia64.)

FWIW, here's more or less what I had in mind; it's completely untested,
definitely breaks build on mips (no asm/syscall.h there) and I'm very
sceptical about ia32 compat syscall glue on x86 (I remember that there
had been some very evil games played with what's in which register at
which point, but I can't recall the details and I would rather leave that
to somebody whose eyes are not bleeding from staring at asm glue for more
than a month, TYVM...

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index c334a23..c31d82e 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -8,6 +8,8 @@
 #define _ASM_ARM_SYSCALL_H
 
 #include <linux/err.h>
+#include <asm/elf.h>
+#include <linux/audit.h>
 
 extern const unsigned long sys_call_table[];
 
@@ -90,4 +92,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return AUDIT_ARCH_ARM;
+}
+
 #endif /* _ASM_ARM_SYSCALL_H */
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..2e39596 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -911,17 +911,16 @@ asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
 {
 	unsigned long ip;
 
+	current_thread_info()->syscall = scno;
+
 	if (why)
 		audit_syscall_exit(regs);
 	else
-		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
-				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+		audit_syscall_entry(regs);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
 		return scno;
 
-	current_thread_info()->syscall = scno;
-
 	/*
 	 * IP is used to denote syscall entry/exit:
 	 * IP = 0 -> entry, =1 -> exit
diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
index a7ff1c6..12f36fb 100644
--- a/arch/ia64/include/asm/syscall.h
+++ b/arch/ia64/include/asm/syscall.h
@@ -14,6 +14,7 @@
 #define _ASM_SYSCALL_H	1
 
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <linux/err.h>
 
 static inline long syscall_get_nr(struct task_struct *task,
@@ -79,4 +80,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
 
 	ia64_syscall_get_set_arguments(task, regs, i, n, args, 1);
 }
+
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return AUDIT_ARCH_IA64;
+}
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index 4265ff6..319d001 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -1246,7 +1246,7 @@ syscall_trace_enter (long arg0, long arg1, long arg2, long arg3,
 		ia64_sync_krbs();
 
 
-	audit_syscall_entry(AUDIT_ARCH_IA64, regs.r15, arg0, arg1, arg2, arg3);
+	audit_syscall_entry(&regs);
 
 	return 0;
 }
diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
index 9bc4317..d8d963b 100644
--- a/arch/microblaze/include/asm/syscall.h
+++ b/arch/microblaze/include/asm/syscall.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R12 */
@@ -96,6 +97,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		microblaze_set_syscall_arg(regs, i++, *args++);
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return EM_MICROBLAZE;
+}
+
 asmlinkage long do_syscall_trace_enter(struct pt_regs *regs);
 asmlinkage void do_syscall_trace_leave(struct pt_regs *regs);
 
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..4953058 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -147,8 +147,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
-	audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
-			    regs->r7, regs->r8);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->r12;
 }
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index b54b2ad..3a6d01f 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,6 +14,7 @@
 #define _ASM_SYSCALL_H	1
 
 #include <linux/sched.h>
+#include <linux/audit.h>
 
 /* ftrace syscalls requires exporting the sys_call_table */
 #ifdef CONFIG_FTRACE_SYSCALLS
@@ -86,4 +87,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->gpr[3 + i], args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(task, TIF_32BIT) ? 
+					AUDIT_ARCH_PPC :
+					AUDIT_ARCH_PPC64;
+}
+
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..3f959f4 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1724,20 +1724,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gpr[0]);
 
-#ifdef CONFIG_PPC64
-	if (!is_32bit_task())
-		audit_syscall_entry(AUDIT_ARCH_PPC64,
-				    regs->gpr[0],
-				    regs->gpr[3], regs->gpr[4],
-				    regs->gpr[5], regs->gpr[6]);
-	else
-#endif
-		audit_syscall_entry(AUDIT_ARCH_PPC,
-				    regs->gpr[0],
-				    regs->gpr[3] & 0xffffffff,
-				    regs->gpr[4] & 0xffffffff,
-				    regs->gpr[5] & 0xffffffff,
-				    regs->gpr[6] & 0xffffffff);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->gpr[0];
 }
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index fb214dd..b481b11 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -14,6 +14,7 @@
 
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /*
@@ -87,4 +88,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		regs->orig_gpr2 = args[0];
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(task, TIF_31BIT) ? 
+					AUDIT_ARCH_S390 :
+					AUDIT_ARCH_S390X;
+}
+
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..4222adb 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -740,11 +740,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gprs[2]);
 
-	audit_syscall_entry(is_compat_task() ?
-				AUDIT_ARCH_S390 : AUDIT_ARCH_S390X,
-			    regs->gprs[2], regs->orig_gpr2,
-			    regs->gprs[3], regs->gprs[4],
-			    regs->gprs[5]);
+	audit_syscall_entry(regs);
 	return ret ?: regs->gprs[2];
 }
 
diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index 7d80df4..db1eb5d 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R3 */
@@ -93,4 +94,16 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	}
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	int arch = EM_SH;
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+	arch |= __AUDIT_ARCH_LE;
+#endif
+
+	return arch;
+}
+
 #endif /* __ASM_SH_SYSCALL_32_H */
diff --git a/arch/sh/include/asm/syscall_64.h b/arch/sh/include/asm/syscall_64.h
index c3561ca..f3a5913 100644
--- a/arch/sh/include/asm/syscall_64.h
+++ b/arch/sh/include/asm/syscall_64.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R9 */
@@ -61,4 +62,19 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->regs[2 + i], args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	int arch = EM_SH;
+
+#ifdef CONFIG_64BIT
+	arch |= __AUDIT_ARCH_64BIT;
+#endif
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+	arch |= __AUDIT_ARCH_LE;
+#endif
+
+	return arch;
+}
+
 #endif /* __ASM_SH_SYSCALL_64_H */
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..0173bfd 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -488,17 +488,6 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ret;
 }
 
-static inline int audit_arch(void)
-{
-	int arch = EM_SH;
-
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
-	arch |= __AUDIT_ARCH_LE;
-#endif
-
-	return arch;
-}
-
 asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
@@ -517,9 +506,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[0]);
 
-	audit_syscall_entry(audit_arch(), regs->regs[3],
-			    regs->regs[4], regs->regs[5],
-			    regs->regs[6], regs->regs[7]);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->regs[0];
 }
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..5e21c79 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -504,20 +504,6 @@ asmlinkage int sh64_ptrace(long request, long pid,
 	return sys_ptrace(request, pid, addr, data);
 }
 
-static inline int audit_arch(void)
-{
-	int arch = EM_SH;
-
-#ifdef CONFIG_64BIT
-	arch |= __AUDIT_ARCH_64BIT;
-#endif
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
-	arch |= __AUDIT_ARCH_LE;
-#endif
-
-	return arch;
-}
-
 asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
@@ -536,10 +522,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[9]);
 
-	audit_syscall_entry(audit_arch(), regs->regs[1],
-			    regs->regs[2], regs->regs[3],
-			    regs->regs[4], regs->regs[5]);
-
+	audit_syscall_entry(regs);
 	return ret ?: regs->regs[9];
 }
 
diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 025a02a..dc94ae8 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /*
@@ -124,4 +125,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		regs->u_regs[UREG_I0 + i + j] = args[j];
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(TIF_32BIT) ?
+			     AUDIT_ARCH_SPARC :
+			     AUDIT_ARCH_SPARC64;
+}
+
 #endif /* __ASM_SPARC_SYSCALL_H */
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..7deee07 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1070,15 +1070,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->u_regs[UREG_G1]);
 
-	audit_syscall_entry((test_thread_flag(TIF_32BIT) ?
-			     AUDIT_ARCH_SPARC :
-			     AUDIT_ARCH_SPARC64),
-			    regs->u_regs[UREG_G1],
-			    regs->u_regs[UREG_I0],
-			    regs->u_regs[UREG_I1],
-			    regs->u_regs[UREG_I2],
-			    regs->u_regs[UREG_I3]);
-
+	audit_syscall_entry(regs);
 	return ret;
 }
 
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index e3e7340..aa5d1ff 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -184,17 +184,11 @@ sysexit_from_sys_call:
 
 #ifdef CONFIG_AUDITSYSCALL
 	.macro auditsys_entry_common
-	movl %esi,%r9d			/* 6th arg: 4th syscall arg */
-	movl %edx,%r8d			/* 5th arg: 3rd syscall arg */
-	/* (already in %ecx)		   4th arg: 2nd syscall arg */
-	movl %ebx,%edx			/* 3rd arg: 1st syscall arg */
-	movl %eax,%esi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_I386,%edi	/* 1st arg: audit arch */
+	movq %rsp,%rdi
 	call __audit_syscall_entry
 	movl RAX-ARGOFFSET(%rsp),%eax	/* reload syscall number */
 	cmpq $(IA32_NR_syscalls-1),%rax
 	ja ia32_badsys
-	movl %ebx,%edi			/* reload 1st syscall arg */
 	movl RCX-ARGOFFSET(%rsp),%esi	/* reload 2nd syscall arg */
 	movl RDX-ARGOFFSET(%rsp),%edx	/* reload 3rd syscall arg */
 	movl RSI-ARGOFFSET(%rsp),%ecx	/* reload 4th syscall arg */
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 1ace47b..7d69c71 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -13,8 +13,8 @@
 #ifndef _ASM_X86_SYSCALL_H
 #define _ASM_X86_SYSCALL_H
 
-#include <linux/audit.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <linux/err.h>
 #include <asm/asm-offsets.h>	/* For NR_syscalls */
 #include <asm/thread_info.h>	/* for TS_COMPAT */
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 7b784f4..42169b9 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -449,16 +449,8 @@ sysenter_exit:
 sysenter_audit:
 	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
 	jnz syscall_trace_entry
-	addl $4,%esp
-	CFI_ADJUST_CFA_OFFSET -4
-	/* %esi already in 8(%esp)	   6th arg: 4th syscall arg */
-	/* %edx already in 4(%esp)	   5th arg: 3rd syscall arg */
-	/* %ecx already in 0(%esp)	   4th arg: 2nd syscall arg */
-	movl %ebx,%ecx			/* 3rd arg: 1st syscall arg */
-	movl %eax,%edx			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_I386,%eax	/* 1st arg: audit arch */
+	movl %esp, %eax
 	call __audit_syscall_entry
-	pushl_cfi %ebx
 	movl PT_EAX(%esp),%eax		/* reload syscall number */
 	jmp sysenter_do_call
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index cdc79b5..38fc889 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -557,12 +557,7 @@ badsys:
 	 * jump back to the normal fast path.
 	 */
 auditsys:
-	movq %r10,%r9			/* 6th arg: 4th syscall arg */
-	movq %rdx,%r8			/* 5th arg: 3rd syscall arg */
-	movq %rsi,%rcx			/* 4th arg: 2nd syscall arg */
-	movq %rdi,%rdx			/* 3rd arg: 1st syscall arg */
-	movq %rax,%rsi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_X86_64,%edi	/* 1st arg: audit arch */
+	movq %rsp,%rdi
 	call __audit_syscall_entry
 	LOAD_ARGS 0		/* reload call-clobbered registers */
 	jmp system_call_fastpath
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..916abfd 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1496,19 +1496,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
-		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
-#ifdef CONFIG_X86_64
-	else
-		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
-#endif
-
+	audit_syscall_entry(regs);
 out:
 	return ret ?: regs->orig_ax;
 }
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22f292a..3d1f963 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -454,9 +454,7 @@ extern int audit_classify_arch(int arch);
 				/* Public API */
 extern int  audit_alloc(struct task_struct *task);
 extern void __audit_free(struct task_struct *task);
-extern void __audit_syscall_entry(int arch,
-				  int major, unsigned long a0, unsigned long a1,
-				  unsigned long a2, unsigned long a3);
+extern void __audit_syscall_entry(struct pt_regs *regs);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
 extern void __audit_getname(const char *name);
 extern void audit_putname(const char *name);
@@ -476,14 +474,12 @@ static inline void audit_free(struct task_struct *task)
 	if (unlikely(task->audit_context))
 		__audit_free(task);
 }
-static inline void audit_syscall_entry(int arch, int major, unsigned long a0,
-				       unsigned long a1, unsigned long a2,
-				       unsigned long a3)
+static inline void audit_syscall_entry(struct pt_regs *regs)
 {
 	if (unlikely(!audit_dummy_context()))
-		__audit_syscall_entry(arch, major, a0, a1, a2, a3);
+		__audit_syscall_entry(regs);
 }
-static inline void audit_syscall_exit(void *pt_regs)
+static inline void audit_syscall_exit(struct pt_regs *pt_regs)
 {
 	if (unlikely(current->audit_context)) {
 		int success = is_syscall_success(pt_regs);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4b96415..9075c27 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -68,6 +68,7 @@
 #include <linux/capability.h>
 #include <linux/fs_struct.h>
 #include <linux/compat.h>
+#include <asm/syscall.h>
 
 #include "audit.h"
 
@@ -1787,13 +1788,12 @@ void __audit_free(struct task_struct *tsk)
  * will only be written if another part of the kernel requests that it
  * be written).
  */
-void __audit_syscall_entry(int arch, int major,
-			 unsigned long a1, unsigned long a2,
-			 unsigned long a3, unsigned long a4)
+void __audit_syscall_entry(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
 	struct audit_context *context = tsk->audit_context;
 	enum audit_state     state;
+	int major;
 
 	if (!context)
 		return;
@@ -1812,6 +1812,7 @@ void __audit_syscall_entry(int arch, int major,
 	 * This also happens with vm86 emulation in a non-nested manner
 	 * (entries without exits), so this case must be caught.
 	 */
+	major = syscall_get_nr(tsk, regs);
 	if (context->in_syscall) {
 		struct audit_context *newctx;
 
@@ -1839,12 +1840,9 @@ void __audit_syscall_entry(int arch, int major,
 	if (!audit_enabled)
 		return;
 
-	context->arch	    = arch;
+	context->arch	    = syscall_get_arch(tsk, regs);
 	context->major      = major;
-	context->argv[0]    = a1;
-	context->argv[1]    = a2;
-	context->argv[2]    = a3;
-	context->argv[3]    = a4;
+	syscall_get_arguments(tsk, regs, 0, 4, context->argv);
 
 	state = context->state;
 	context->dummy = !audit_n_rules;

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: seccomp and ptrace. what is the correct order?
@ 2012-05-22 22:20                     ` Al Viro
  0 siblings, 0 replies; 77+ messages in thread
From: Al Viro @ 2012-05-22 22:20 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Will Drewry, H. Peter Anvin, Indan Zupancic, Eric Paris,
	linux-kernel, linux-security-module, kernel-hardening, mingo,
	oleg, peterz, rdunlap, tglx, luto, eparis, serge.hallyn, pmoore,
	akpm, corbet, eric.dumazet, markus, coreyb, keescook

On Tue, May 22, 2012 at 02:17:03PM -0700, Roland McGrath wrote:
> I always felt the same way about the audit code.  (As a bonus, if the audit
> folks ever decide they want all six syscall arguments instead of just four,
> they wouldn't have to touch every arch.)  But it will certainly produce
> drastically worse code for ia64.  (Not that anybody cares about ia64.)

FWIW, here's more or less what I had in mind; it's completely untested,
definitely breaks build on mips (no asm/syscall.h there) and I'm very
sceptical about ia32 compat syscall glue on x86 (I remember that there
had been some very evil games played with what's in which register at
which point, but I can't recall the details and I would rather leave that
to somebody whose eyes are not bleeding from staring at asm glue for more
than a month, TYVM...

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index c334a23..c31d82e 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -8,6 +8,8 @@
 #define _ASM_ARM_SYSCALL_H
 
 #include <linux/err.h>
+#include <asm/elf.h>
+#include <linux/audit.h>
 
 extern const unsigned long sys_call_table[];
 
@@ -90,4 +92,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return AUDIT_ARCH_ARM;
+}
+
 #endif /* _ASM_ARM_SYSCALL_H */
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..2e39596 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -911,17 +911,16 @@ asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
 {
 	unsigned long ip;
 
+	current_thread_info()->syscall = scno;
+
 	if (why)
 		audit_syscall_exit(regs);
 	else
-		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
-				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+		audit_syscall_entry(regs);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
 		return scno;
 
-	current_thread_info()->syscall = scno;
-
 	/*
 	 * IP is used to denote syscall entry/exit:
 	 * IP = 0 -> entry, =1 -> exit
diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
index a7ff1c6..12f36fb 100644
--- a/arch/ia64/include/asm/syscall.h
+++ b/arch/ia64/include/asm/syscall.h
@@ -14,6 +14,7 @@
 #define _ASM_SYSCALL_H	1
 
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <linux/err.h>
 
 static inline long syscall_get_nr(struct task_struct *task,
@@ -79,4 +80,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
 
 	ia64_syscall_get_set_arguments(task, regs, i, n, args, 1);
 }
+
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return AUDIT_ARCH_IA64;
+}
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index 4265ff6..319d001 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -1246,7 +1246,7 @@ syscall_trace_enter (long arg0, long arg1, long arg2, long arg3,
 		ia64_sync_krbs();
 
 
-	audit_syscall_entry(AUDIT_ARCH_IA64, regs.r15, arg0, arg1, arg2, arg3);
+	audit_syscall_entry(&regs);
 
 	return 0;
 }
diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
index 9bc4317..d8d963b 100644
--- a/arch/microblaze/include/asm/syscall.h
+++ b/arch/microblaze/include/asm/syscall.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R12 */
@@ -96,6 +97,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		microblaze_set_syscall_arg(regs, i++, *args++);
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return EM_MICROBLAZE;
+}
+
 asmlinkage long do_syscall_trace_enter(struct pt_regs *regs);
 asmlinkage void do_syscall_trace_leave(struct pt_regs *regs);
 
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..4953058 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -147,8 +147,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
-	audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
-			    regs->r7, regs->r8);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->r12;
 }
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index b54b2ad..3a6d01f 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,6 +14,7 @@
 #define _ASM_SYSCALL_H	1
 
 #include <linux/sched.h>
+#include <linux/audit.h>
 
 /* ftrace syscalls requires exporting the sys_call_table */
 #ifdef CONFIG_FTRACE_SYSCALLS
@@ -86,4 +87,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->gpr[3 + i], args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(task, TIF_32BIT) ? 
+					AUDIT_ARCH_PPC :
+					AUDIT_ARCH_PPC64;
+}
+
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..3f959f4 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1724,20 +1724,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gpr[0]);
 
-#ifdef CONFIG_PPC64
-	if (!is_32bit_task())
-		audit_syscall_entry(AUDIT_ARCH_PPC64,
-				    regs->gpr[0],
-				    regs->gpr[3], regs->gpr[4],
-				    regs->gpr[5], regs->gpr[6]);
-	else
-#endif
-		audit_syscall_entry(AUDIT_ARCH_PPC,
-				    regs->gpr[0],
-				    regs->gpr[3] & 0xffffffff,
-				    regs->gpr[4] & 0xffffffff,
-				    regs->gpr[5] & 0xffffffff,
-				    regs->gpr[6] & 0xffffffff);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->gpr[0];
 }
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index fb214dd..b481b11 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -14,6 +14,7 @@
 
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /*
@@ -87,4 +88,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		regs->orig_gpr2 = args[0];
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(task, TIF_31BIT) ? 
+					AUDIT_ARCH_S390 :
+					AUDIT_ARCH_S390X;
+}
+
 #endif	/* _ASM_SYSCALL_H */
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..4222adb 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -740,11 +740,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gprs[2]);
 
-	audit_syscall_entry(is_compat_task() ?
-				AUDIT_ARCH_S390 : AUDIT_ARCH_S390X,
-			    regs->gprs[2], regs->orig_gpr2,
-			    regs->gprs[3], regs->gprs[4],
-			    regs->gprs[5]);
+	audit_syscall_entry(regs);
 	return ret ?: regs->gprs[2];
 }
 
diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index 7d80df4..db1eb5d 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R3 */
@@ -93,4 +94,16 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	}
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	int arch = EM_SH;
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+	arch |= __AUDIT_ARCH_LE;
+#endif
+
+	return arch;
+}
+
 #endif /* __ASM_SH_SYSCALL_32_H */
diff --git a/arch/sh/include/asm/syscall_64.h b/arch/sh/include/asm/syscall_64.h
index c3561ca..f3a5913 100644
--- a/arch/sh/include/asm/syscall_64.h
+++ b/arch/sh/include/asm/syscall_64.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /* The system call number is given by the user in R9 */
@@ -61,4 +62,19 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs->regs[2 + i], args, n * sizeof(args[0]));
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	int arch = EM_SH;
+
+#ifdef CONFIG_64BIT
+	arch |= __AUDIT_ARCH_64BIT;
+#endif
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+	arch |= __AUDIT_ARCH_LE;
+#endif
+
+	return arch;
+}
+
 #endif /* __ASM_SH_SYSCALL_64_H */
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..0173bfd 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -488,17 +488,6 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ret;
 }
 
-static inline int audit_arch(void)
-{
-	int arch = EM_SH;
-
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
-	arch |= __AUDIT_ARCH_LE;
-#endif
-
-	return arch;
-}
-
 asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
@@ -517,9 +506,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[0]);
 
-	audit_syscall_entry(audit_arch(), regs->regs[3],
-			    regs->regs[4], regs->regs[5],
-			    regs->regs[6], regs->regs[7]);
+	audit_syscall_entry(regs);
 
 	return ret ?: regs->regs[0];
 }
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..5e21c79 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -504,20 +504,6 @@ asmlinkage int sh64_ptrace(long request, long pid,
 	return sys_ptrace(request, pid, addr, data);
 }
 
-static inline int audit_arch(void)
-{
-	int arch = EM_SH;
-
-#ifdef CONFIG_64BIT
-	arch |= __AUDIT_ARCH_64BIT;
-#endif
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
-	arch |= __AUDIT_ARCH_LE;
-#endif
-
-	return arch;
-}
-
 asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
@@ -536,10 +522,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[9]);
 
-	audit_syscall_entry(audit_arch(), regs->regs[1],
-			    regs->regs[2], regs->regs[3],
-			    regs->regs[4], regs->regs[5]);
-
+	audit_syscall_entry(regs);
 	return ret ?: regs->regs[9];
 }
 
diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 025a02a..dc94ae8 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <asm/ptrace.h>
 
 /*
@@ -124,4 +125,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
 		regs->u_regs[UREG_I0 + i + j] = args[j];
 }
 
+static inline int syscall_get_arch(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	return test_tsk_thread_flag(TIF_32BIT) ?
+			     AUDIT_ARCH_SPARC :
+			     AUDIT_ARCH_SPARC64;
+}
+
 #endif /* __ASM_SPARC_SYSCALL_H */
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..7deee07 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1070,15 +1070,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->u_regs[UREG_G1]);
 
-	audit_syscall_entry((test_thread_flag(TIF_32BIT) ?
-			     AUDIT_ARCH_SPARC :
-			     AUDIT_ARCH_SPARC64),
-			    regs->u_regs[UREG_G1],
-			    regs->u_regs[UREG_I0],
-			    regs->u_regs[UREG_I1],
-			    regs->u_regs[UREG_I2],
-			    regs->u_regs[UREG_I3]);
-
+	audit_syscall_entry(regs);
 	return ret;
 }
 
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index e3e7340..aa5d1ff 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -184,17 +184,11 @@ sysexit_from_sys_call:
 
 #ifdef CONFIG_AUDITSYSCALL
 	.macro auditsys_entry_common
-	movl %esi,%r9d			/* 6th arg: 4th syscall arg */
-	movl %edx,%r8d			/* 5th arg: 3rd syscall arg */
-	/* (already in %ecx)		   4th arg: 2nd syscall arg */
-	movl %ebx,%edx			/* 3rd arg: 1st syscall arg */
-	movl %eax,%esi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_I386,%edi	/* 1st arg: audit arch */
+	movq %rsp,%rdi
 	call __audit_syscall_entry
 	movl RAX-ARGOFFSET(%rsp),%eax	/* reload syscall number */
 	cmpq $(IA32_NR_syscalls-1),%rax
 	ja ia32_badsys
-	movl %ebx,%edi			/* reload 1st syscall arg */
 	movl RCX-ARGOFFSET(%rsp),%esi	/* reload 2nd syscall arg */
 	movl RDX-ARGOFFSET(%rsp),%edx	/* reload 3rd syscall arg */
 	movl RSI-ARGOFFSET(%rsp),%ecx	/* reload 4th syscall arg */
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 1ace47b..7d69c71 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -13,8 +13,8 @@
 #ifndef _ASM_X86_SYSCALL_H
 #define _ASM_X86_SYSCALL_H
 
-#include <linux/audit.h>
 #include <linux/sched.h>
+#include <linux/audit.h>
 #include <linux/err.h>
 #include <asm/asm-offsets.h>	/* For NR_syscalls */
 #include <asm/thread_info.h>	/* for TS_COMPAT */
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 7b784f4..42169b9 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -449,16 +449,8 @@ sysenter_exit:
 sysenter_audit:
 	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
 	jnz syscall_trace_entry
-	addl $4,%esp
-	CFI_ADJUST_CFA_OFFSET -4
-	/* %esi already in 8(%esp)	   6th arg: 4th syscall arg */
-	/* %edx already in 4(%esp)	   5th arg: 3rd syscall arg */
-	/* %ecx already in 0(%esp)	   4th arg: 2nd syscall arg */
-	movl %ebx,%ecx			/* 3rd arg: 1st syscall arg */
-	movl %eax,%edx			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_I386,%eax	/* 1st arg: audit arch */
+	movl %esp, %eax
 	call __audit_syscall_entry
-	pushl_cfi %ebx
 	movl PT_EAX(%esp),%eax		/* reload syscall number */
 	jmp sysenter_do_call
 
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index cdc79b5..38fc889 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -557,12 +557,7 @@ badsys:
 	 * jump back to the normal fast path.
 	 */
 auditsys:
-	movq %r10,%r9			/* 6th arg: 4th syscall arg */
-	movq %rdx,%r8			/* 5th arg: 3rd syscall arg */
-	movq %rsi,%rcx			/* 4th arg: 2nd syscall arg */
-	movq %rdi,%rdx			/* 3rd arg: 1st syscall arg */
-	movq %rax,%rsi			/* 2nd arg: syscall number */
-	movl $AUDIT_ARCH_X86_64,%edi	/* 1st arg: audit arch */
+	movq %rsp,%rdi
 	call __audit_syscall_entry
 	LOAD_ARGS 0		/* reload call-clobbered registers */
 	jmp system_call_fastpath
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..916abfd 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1496,19 +1496,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
-		audit_syscall_entry(AUDIT_ARCH_I386,
-				    regs->orig_ax,
-				    regs->bx, regs->cx,
-				    regs->dx, regs->si);
-#ifdef CONFIG_X86_64
-	else
-		audit_syscall_entry(AUDIT_ARCH_X86_64,
-				    regs->orig_ax,
-				    regs->di, regs->si,
-				    regs->dx, regs->r10);
-#endif
-
+	audit_syscall_entry(regs);
 out:
 	return ret ?: regs->orig_ax;
 }
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22f292a..3d1f963 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -454,9 +454,7 @@ extern int audit_classify_arch(int arch);
 				/* Public API */
 extern int  audit_alloc(struct task_struct *task);
 extern void __audit_free(struct task_struct *task);
-extern void __audit_syscall_entry(int arch,
-				  int major, unsigned long a0, unsigned long a1,
-				  unsigned long a2, unsigned long a3);
+extern void __audit_syscall_entry(struct pt_regs *regs);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
 extern void __audit_getname(const char *name);
 extern void audit_putname(const char *name);
@@ -476,14 +474,12 @@ static inline void audit_free(struct task_struct *task)
 	if (unlikely(task->audit_context))
 		__audit_free(task);
 }
-static inline void audit_syscall_entry(int arch, int major, unsigned long a0,
-				       unsigned long a1, unsigned long a2,
-				       unsigned long a3)
+static inline void audit_syscall_entry(struct pt_regs *regs)
 {
 	if (unlikely(!audit_dummy_context()))
-		__audit_syscall_entry(arch, major, a0, a1, a2, a3);
+		__audit_syscall_entry(regs);
 }
-static inline void audit_syscall_exit(void *pt_regs)
+static inline void audit_syscall_exit(struct pt_regs *pt_regs)
 {
 	if (unlikely(current->audit_context)) {
 		int success = is_syscall_success(pt_regs);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4b96415..9075c27 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -68,6 +68,7 @@
 #include <linux/capability.h>
 #include <linux/fs_struct.h>
 #include <linux/compat.h>
+#include <asm/syscall.h>
 
 #include "audit.h"
 
@@ -1787,13 +1788,12 @@ void __audit_free(struct task_struct *tsk)
  * will only be written if another part of the kernel requests that it
  * be written).
  */
-void __audit_syscall_entry(int arch, int major,
-			 unsigned long a1, unsigned long a2,
-			 unsigned long a3, unsigned long a4)
+void __audit_syscall_entry(struct pt_regs *regs)
 {
 	struct task_struct *tsk = current;
 	struct audit_context *context = tsk->audit_context;
 	enum audit_state     state;
+	int major;
 
 	if (!context)
 		return;
@@ -1812,6 +1812,7 @@ void __audit_syscall_entry(int arch, int major,
 	 * This also happens with vm86 emulation in a non-nested manner
 	 * (entries without exits), so this case must be caught.
 	 */
+	major = syscall_get_nr(tsk, regs);
 	if (context->in_syscall) {
 		struct audit_context *newctx;
 
@@ -1839,12 +1840,9 @@ void __audit_syscall_entry(int arch, int major,
 	if (!audit_enabled)
 		return;
 
-	context->arch	    = arch;
+	context->arch	    = syscall_get_arch(tsk, regs);
 	context->major      = major;
-	context->argv[0]    = a1;
-	context->argv[1]    = a2;
-	context->argv[2]    = a3;
-	context->argv[3]    = a4;
+	syscall_get_arguments(tsk, regs, 0, 4, context->argv);
 
 	state = context->state;
 	context->dummy = !audit_n_rules;

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC PATCH 0/3] move the secure_computing call
  2012-05-22 17:39         ` [kernel-hardening] " Al Viro
@ 2012-05-24 16:07           ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

This is an RFC based on the comments from Al Viro and Eric Paris
regarding ptrace()rs being able to change the system call the kernel
sees after the seccomp enforcement has occurred (for mode 1 or 2).

With this series applied, a (p)tracer of a process with seccomp enabled
will be unable to change the tracee's system call number after the
secure computing check has been performed.

The x86 change is tested, as is the seccomp.c change.  For other arches,
it is not (RFC :).  Given that there are other inconsistencies in this
code across architectures, I'm not sure if it makes sense to attempt to
fix them all at once or to roll through as I attempt to add seccomp
filter support.

As is, the biggest benefit of this change is just setting consistent
expectations in what the ptrace/seccomp interactions should be.  The
current ability for ptrace to "bypass" secure computing (by remapping
allowed system calls) is not necessarily a problem, but it is not
necessarily intuitive behavior.

Thoughts, comments, flames will be appreciated!


Will Drewry (3):
  seccomp: Don't allow tracers to abuse RET_TRACE
  arch/x86: move secure_computing after ptrace
  arch/*: move secure_computing after trace

 arch/arm/kernel/entry-common.S  |    7 +------
 arch/arm/kernel/ptrace.c        |   42 +++++++++++++++++++--------------------
 arch/microblaze/kernel/ptrace.c |    4 ++--
 arch/mips/kernel/ptrace.c       |   16 ++++++---------
 arch/powerpc/kernel/ptrace.c    |    5 +++--
 arch/s390/kernel/ptrace.c       |    6 +++---
 arch/sh/kernel/ptrace_32.c      |    5 +++--
 arch/sh/kernel/ptrace_64.c      |    5 +++--
 arch/sparc/kernel/ptrace_64.c   |    7 ++++---
 arch/x86/kernel/ptrace.c        |   13 ++++++------
 kernel/seccomp.c                |    4 ++++
 11 files changed, 56 insertions(+), 58 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 16:07           ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

This is an RFC based on the comments from Al Viro and Eric Paris
regarding ptrace()rs being able to change the system call the kernel
sees after the seccomp enforcement has occurred (for mode 1 or 2).

With this series applied, a (p)tracer of a process with seccomp enabled
will be unable to change the tracee's system call number after the
secure computing check has been performed.

The x86 change is tested, as is the seccomp.c change.  For other arches,
it is not (RFC :).  Given that there are other inconsistencies in this
code across architectures, I'm not sure if it makes sense to attempt to
fix them all at once or to roll through as I attempt to add seccomp
filter support.

As is, the biggest benefit of this change is just setting consistent
expectations in what the ptrace/seccomp interactions should be.  The
current ability for ptrace to "bypass" secure computing (by remapping
allowed system calls) is not necessarily a problem, but it is not
necessarily intuitive behavior.

Thoughts, comments, flames will be appreciated!


Will Drewry (3):
  seccomp: Don't allow tracers to abuse RET_TRACE
  arch/x86: move secure_computing after ptrace
  arch/*: move secure_computing after trace

 arch/arm/kernel/entry-common.S  |    7 +------
 arch/arm/kernel/ptrace.c        |   42 +++++++++++++++++++--------------------
 arch/microblaze/kernel/ptrace.c |    4 ++--
 arch/mips/kernel/ptrace.c       |   16 ++++++---------
 arch/powerpc/kernel/ptrace.c    |    5 +++--
 arch/s390/kernel/ptrace.c       |    6 +++---
 arch/sh/kernel/ptrace_32.c      |    5 +++--
 arch/sh/kernel/ptrace_64.c      |    5 +++--
 arch/sparc/kernel/ptrace_64.c   |    7 ++++---
 arch/x86/kernel/ptrace.c        |   13 ++++++------
 kernel/seccomp.c                |    4 ++++
 11 files changed, 56 insertions(+), 58 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 16:07             ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
cannot change the system call number for the traced task
without it resulting in the system call being skipped.

Traditionally, tracers will set the system call number to
-1 to skip the system call. This behavior will work as expected
but the tracer will be unable to remap the system call to a valid
system call after the seccomp policy has been evaluated.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 kernel/seccomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ee376be..33f0ad6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
 			 */
 			if (fatal_signal_pending(current))
 				break;
+			/* Skip the system call if the tracer changed it. */
+			if (this_syscall !=
+			    syscall_get_nr(current, task_pt_regs(current)))
+				goto skip;
 			return 0;
 		case SECCOMP_RET_ALLOW:
 			return 0;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [kernel-hardening] [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
@ 2012-05-24 16:07             ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
cannot change the system call number for the traced task
without it resulting in the system call being skipped.

Traditionally, tracers will set the system call number to
-1 to skip the system call. This behavior will work as expected
but the tracer will be unable to remap the system call to a valid
system call after the seccomp policy has been evaluated.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 kernel/seccomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ee376be..33f0ad6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
 			 */
 			if (fatal_signal_pending(current))
 				break;
+			/* Skip the system call if the tracer changed it. */
+			if (this_syscall !=
+			    syscall_get_nr(current, task_pt_regs(current)))
+				goto skip;
 			return 0;
 		case SECCOMP_RET_ALLOW:
 			return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC PATCH 2/3] arch/x86: move secure_computing after ptrace
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 16:08             ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

At present, seccomp modes 1 and 2 may have their
behavior changed by a ptrace()ing task.  The ptracer
cannot change blocked/disallowed system calls, but it can
change allowed system calls to calls that would otherwise
not be allowed by the seccomp policy.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/x86/kernel/ptrace.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..ad649a6 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1479,13 +1479,6 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SINGLESTEP))
 		regs->flags |= X86_EFLAGS_TF;
 
-	/* do the secure computing check first */
-	if (secure_computing(regs->orig_ax)) {
-		/* seccomp failures shouldn't expose any additional code. */
-		ret = -1L;
-		goto out;
-	}
-
 	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
 		ret = -1L;
 
@@ -1493,6 +1486,12 @@ long syscall_trace_enter(struct pt_regs *regs)
 	    tracehook_report_syscall_entry(regs))
 		ret = -1L;
 
+	/* check secure computing after userspace can't change the syscall. */
+	if (!ret && secure_computing(regs->orig_ax)) {
+		ret = -1L;
+		goto out;
+	}
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [kernel-hardening] [RFC PATCH 2/3] arch/x86: move secure_computing after ptrace
@ 2012-05-24 16:08             ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

At present, seccomp modes 1 and 2 may have their
behavior changed by a ptrace()ing task.  The ptracer
cannot change blocked/disallowed system calls, but it can
change allowed system calls to calls that would otherwise
not be allowed by the seccomp policy.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/x86/kernel/ptrace.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..ad649a6 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1479,13 +1479,6 @@ long syscall_trace_enter(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SINGLESTEP))
 		regs->flags |= X86_EFLAGS_TF;
 
-	/* do the secure computing check first */
-	if (secure_computing(regs->orig_ax)) {
-		/* seccomp failures shouldn't expose any additional code. */
-		ret = -1L;
-		goto out;
-	}
-
 	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
 		ret = -1L;
 
@@ -1493,6 +1486,12 @@ long syscall_trace_enter(struct pt_regs *regs)
 	    tracehook_report_syscall_entry(regs))
 		ret = -1L;
 
+	/* check secure computing after userspace can't change the syscall. */
+	if (!ret && secure_computing(regs->orig_ax)) {
+		ret = -1L;
+		goto out;
+	}
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [RFC PATCH 3/3] arch/*: move secure_computing after trace
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 16:08             ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

At present, any ptracer can bypass the seccomp system call
policy by changing the system call after secure_computing has
been called.  This change ensures that secure_computing
occurs after all other userspace applications may have changed the
system call value.

(Note, ARM was the most affected in this patch as the call was moved
 into the syscall_trace path and reordering after ptrace was
 non-trivial.)

This change is wildly untested.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/arm/kernel/entry-common.S  |    7 +------
 arch/arm/kernel/ptrace.c        |   42 +++++++++++++++++++--------------------
 arch/microblaze/kernel/ptrace.c |    4 ++--
 arch/mips/kernel/ptrace.c       |   16 ++++++---------
 arch/powerpc/kernel/ptrace.c    |    5 +++--
 arch/s390/kernel/ptrace.c       |    6 +++---
 arch/sh/kernel/ptrace_32.c      |    5 +++--
 arch/sh/kernel/ptrace_64.c      |    5 +++--
 arch/sparc/kernel/ptrace_64.c   |    7 ++++---
 9 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7bd2d3c..57a5bde 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -416,12 +416,7 @@ ENTRY(vector_swi)
 
 #ifdef CONFIG_SECCOMP
 	tst	r10, #_TIF_SECCOMP
-	beq	1f
-	mov	r0, scno
-	bl	__secure_computing	
-	add	r0, sp, #S_R0 + S_OFF		@ pointer to regs
-	ldmia	r0, {r0 - r3}			@ have to reload r0 - r3
-1:
+	bne	__sys_trace			@ are we in seccomp mode?
 #endif
 
 	tst	r10, #_TIF_SYSCALL_WORK		@ are we tracing syscalls?
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..1654071 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -909,32 +909,32 @@ long arch_ptrace(struct task_struct *child, long request,
 
 asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
 {
-	unsigned long ip;
-
-	if (why)
-		audit_syscall_exit(regs);
-	else
-		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
-				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
-
-	if (!test_thread_flag(TIF_SYSCALL_TRACE))
-		return scno;
-
-	current_thread_info()->syscall = scno;
-
 	/*
 	 * IP is used to denote syscall entry/exit:
 	 * IP = 0 -> entry, =1 -> exit
 	 */
-	ip = regs->ARM_ip;
-	regs->ARM_ip = why;
+	unsigned long ip = regs->ARM_ip;
 
-	if (why)
-		tracehook_report_syscall_exit(regs, 0);
-	else if (tracehook_report_syscall_entry(regs))
-		current_thread_info()->syscall = -1;
-
-	regs->ARM_ip = ip;
+	current_thread_info()->syscall = scno;
+	if (why) {	/* exit */
+		if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+			regs->ARM_ip = why;
+			tracehook_report_syscall_exit(regs, 0);
+			regs->ARM_ip = ip;
+		}
+		audit_syscall_exit(regs);
+	} else {	/* entry */
+		if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+			regs->ARM_ip = why;
+			if (tracehook_report_syscall_entry(regs))
+				current_thread_info()->syscall = -1;
+			regs->ARM_ip = ip;
+		}
+		/* Call secure_computing after any userspace changes are done */
+		secure_computing_strict(current_thread_info()->syscall);
+		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
+				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+	}
 
 	return current_thread_info()->syscall;
 }
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..9a917a7 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -136,8 +136,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->r12);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -147,6 +145,8 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	secure_computing_strict(regs->r12);
+
 	audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
 			    regs->r7, regs->r8);
 
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 4812c6d..2b1f5f3 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -534,19 +534,16 @@ static inline int audit_arch(void)
  */
 asmlinkage void syscall_trace_enter(struct pt_regs *regs)
 {
-	/* do the secure computing check first */
-	secure_computing_strict(regs->regs[2]);
-
-	if (!(current->ptrace & PT_PTRACED))
-		goto out;
-
-	if (!test_thread_flag(TIF_SYSCALL_TRACE))
-		goto out;
-
+	if ((current->ptrace & PT_PTRACED) &&
+	    test_thread_flag(TIF_SYSCALL_TRACE)) {
 	/* The 0x80 provides a way for the tracing parent to distinguish
 	   between a syscall stop and SIGTRAP delivery */
 	ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ?
 	                         0x80 : 0));
+	}
+
+	/* do the secure computing check after the system calls are final. */
+	secure_computing_strict(regs->regs[2]);
 
 	/*
 	 * this isn't the same as continuing with a signal, but it will do
@@ -558,7 +555,6 @@ asmlinkage void syscall_trace_enter(struct pt_regs *regs)
 		current->exit_code = 0;
 	}
 
-out:
 	audit_syscall_entry(audit_arch(), regs->regs[2],
 			    regs->regs[4], regs->regs[5],
 			    regs->regs[6], regs->regs[7]);
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..4b725ed 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1710,8 +1710,6 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->gpr[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -1721,6 +1719,9 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	if (!ret)
+		secure_computing_strict(regs->gpr[0]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gpr[0]);
 
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..b95c0ac 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -718,9 +718,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	/* Do the secure computing check first. */
-	secure_computing_strict(regs->gprs[2]);
-
 	/*
 	 * The sysc_tracesys code in entry.S stored the system
 	 * call number to gprs[2].
@@ -737,6 +734,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		ret = -1;
 	}
 
+	/* Do the secure computing check with a final syscall set. */
+	secure_computing_strict(regs->gprs[2]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gprs[2]);
 
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..2500ce1 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -503,8 +503,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->regs[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -514,6 +512,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	if (!ret)
+		secure_computing_strict(regs->regs[0]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[0]);
 
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..cc030e0 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -522,8 +522,6 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
 
-	secure_computing_strict(regs->regs[9]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -533,6 +531,9 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1LL;
 
+	if (!ret)
+		secure_computing_strict(regs->regs[9]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[9]);
 
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..31fff51 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1061,12 +1061,13 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
 	int ret = 0;
 
-	/* do the secure computing check first */
-	secure_computing_strict(regs->u_regs[UREG_G1]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		ret = tracehook_report_syscall_entry(regs);
 
+	/* do the secure computing check with the final syscall */
+	if (!ret)
+		secure_computing_strict(regs->u_regs[UREG_G1]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->u_regs[UREG_G1]);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [kernel-hardening] [RFC PATCH 3/3] arch/*: move secure_computing after trace
@ 2012-05-24 16:08             ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, hpa, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

At present, any ptracer can bypass the seccomp system call
policy by changing the system call after secure_computing has
been called.  This change ensures that secure_computing
occurs after all other userspace applications may have changed the
system call value.

(Note, ARM was the most affected in this patch as the call was moved
 into the syscall_trace path and reordering after ptrace was
 non-trivial.)

This change is wildly untested.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 arch/arm/kernel/entry-common.S  |    7 +------
 arch/arm/kernel/ptrace.c        |   42 +++++++++++++++++++--------------------
 arch/microblaze/kernel/ptrace.c |    4 ++--
 arch/mips/kernel/ptrace.c       |   16 ++++++---------
 arch/powerpc/kernel/ptrace.c    |    5 +++--
 arch/s390/kernel/ptrace.c       |    6 +++---
 arch/sh/kernel/ptrace_32.c      |    5 +++--
 arch/sh/kernel/ptrace_64.c      |    5 +++--
 arch/sparc/kernel/ptrace_64.c   |    7 ++++---
 9 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7bd2d3c..57a5bde 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -416,12 +416,7 @@ ENTRY(vector_swi)
 
 #ifdef CONFIG_SECCOMP
 	tst	r10, #_TIF_SECCOMP
-	beq	1f
-	mov	r0, scno
-	bl	__secure_computing	
-	add	r0, sp, #S_R0 + S_OFF		@ pointer to regs
-	ldmia	r0, {r0 - r3}			@ have to reload r0 - r3
-1:
+	bne	__sys_trace			@ are we in seccomp mode?
 #endif
 
 	tst	r10, #_TIF_SYSCALL_WORK		@ are we tracing syscalls?
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..1654071 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -909,32 +909,32 @@ long arch_ptrace(struct task_struct *child, long request,
 
 asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
 {
-	unsigned long ip;
-
-	if (why)
-		audit_syscall_exit(regs);
-	else
-		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
-				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
-
-	if (!test_thread_flag(TIF_SYSCALL_TRACE))
-		return scno;
-
-	current_thread_info()->syscall = scno;
-
 	/*
 	 * IP is used to denote syscall entry/exit:
 	 * IP = 0 -> entry, =1 -> exit
 	 */
-	ip = regs->ARM_ip;
-	regs->ARM_ip = why;
+	unsigned long ip = regs->ARM_ip;
 
-	if (why)
-		tracehook_report_syscall_exit(regs, 0);
-	else if (tracehook_report_syscall_entry(regs))
-		current_thread_info()->syscall = -1;
-
-	regs->ARM_ip = ip;
+	current_thread_info()->syscall = scno;
+	if (why) {	/* exit */
+		if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+			regs->ARM_ip = why;
+			tracehook_report_syscall_exit(regs, 0);
+			regs->ARM_ip = ip;
+		}
+		audit_syscall_exit(regs);
+	} else {	/* entry */
+		if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+			regs->ARM_ip = why;
+			if (tracehook_report_syscall_entry(regs))
+				current_thread_info()->syscall = -1;
+			regs->ARM_ip = ip;
+		}
+		/* Call secure_computing after any userspace changes are done */
+		secure_computing_strict(current_thread_info()->syscall);
+		audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
+				    regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+	}
 
 	return current_thread_info()->syscall;
 }
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..9a917a7 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -136,8 +136,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->r12);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -147,6 +145,8 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	secure_computing_strict(regs->r12);
+
 	audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
 			    regs->r7, regs->r8);
 
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 4812c6d..2b1f5f3 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -534,19 +534,16 @@ static inline int audit_arch(void)
  */
 asmlinkage void syscall_trace_enter(struct pt_regs *regs)
 {
-	/* do the secure computing check first */
-	secure_computing_strict(regs->regs[2]);
-
-	if (!(current->ptrace & PT_PTRACED))
-		goto out;
-
-	if (!test_thread_flag(TIF_SYSCALL_TRACE))
-		goto out;
-
+	if ((current->ptrace & PT_PTRACED) &&
+	    test_thread_flag(TIF_SYSCALL_TRACE)) {
 	/* The 0x80 provides a way for the tracing parent to distinguish
 	   between a syscall stop and SIGTRAP delivery */
 	ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ?
 	                         0x80 : 0));
+	}
+
+	/* do the secure computing check after the system calls are final. */
+	secure_computing_strict(regs->regs[2]);
 
 	/*
 	 * this isn't the same as continuing with a signal, but it will do
@@ -558,7 +555,6 @@ asmlinkage void syscall_trace_enter(struct pt_regs *regs)
 		current->exit_code = 0;
 	}
 
-out:
 	audit_syscall_entry(audit_arch(), regs->regs[2],
 			    regs->regs[4], regs->regs[5],
 			    regs->regs[6], regs->regs[7]);
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..4b725ed 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1710,8 +1710,6 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->gpr[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -1721,6 +1719,9 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	if (!ret)
+		secure_computing_strict(regs->gpr[0]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gpr[0]);
 
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..b95c0ac 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -718,9 +718,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	/* Do the secure computing check first. */
-	secure_computing_strict(regs->gprs[2]);
-
 	/*
 	 * The sysc_tracesys code in entry.S stored the system
 	 * call number to gprs[2].
@@ -737,6 +734,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		ret = -1;
 	}
 
+	/* Do the secure computing check with a final syscall set. */
+	secure_computing_strict(regs->gprs[2]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->gprs[2]);
 
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..2500ce1 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -503,8 +503,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing_strict(regs->regs[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -514,6 +512,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1L;
 
+	if (!ret)
+		secure_computing_strict(regs->regs[0]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[0]);
 
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..cc030e0 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -522,8 +522,6 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
 
-	secure_computing_strict(regs->regs[9]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
@@ -533,6 +531,9 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 		 */
 		ret = -1LL;
 
+	if (!ret)
+		secure_computing_strict(regs->regs[9]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->regs[9]);
 
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..31fff51 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1061,12 +1061,13 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
 	int ret = 0;
 
-	/* do the secure computing check first */
-	secure_computing_strict(regs->u_regs[UREG_G1]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		ret = tracehook_report_syscall_entry(regs);
 
+	/* do the secure computing check with the final syscall */
+	if (!ret)
+		secure_computing_strict(regs->u_regs[UREG_G1]);
+
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->u_regs[UREG_G1]);
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 16:13             ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 16:13 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On 05/24/2012 09:07 AM, Will Drewry wrote:
> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).
> 
> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
> 
> The x86 change is tested, as is the seccomp.c change.  For other arches,
> it is not (RFC :).  Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
> 
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
> 
> Thoughts, comments, flames will be appreciated!

I think this really screws with using seccomp for self-interception.  I
wouldn't inherently be opposed to the following flow:

	seccomp -> ptrace -> seccomp

... i.e. if ptrace is enabled and we enable something, run it through
seccomp again, but there are bunch of use cases (mostly involving
SIGSYS) where doing ptrace before seccomp is just bizarre.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 16:13             ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 16:13 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On 05/24/2012 09:07 AM, Will Drewry wrote:
> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).
> 
> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
> 
> The x86 change is tested, as is the seccomp.c change.  For other arches,
> it is not (RFC :).  Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
> 
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
> 
> Thoughts, comments, flames will be appreciated!

I think this really screws with using seccomp for self-interception.  I
wouldn't inherently be opposed to the following flow:

	seccomp -> ptrace -> seccomp

... i.e. if ptrace is enabled and we enable something, run it through
seccomp again, but there are bunch of use cases (mostly involving
SIGSYS) where doing ptrace before seccomp is just bizarre.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
  2012-05-24 16:07             ` [kernel-hardening] " Will Drewry
@ 2012-05-24 17:54               ` Indan Zupancic
  -1 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 17:54 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris, Will Drewry

On Thu, May 24, 2012 18:07, Will Drewry wrote:
> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
> cannot change the system call number for the traced task
> without it resulting in the system call being skipped.
>
> Traditionally, tracers will set the system call number to
> -1 to skip the system call. This behavior will work as expected
> but the tracer will be unable to remap the system call to a valid
> system call after the seccomp policy has been evaluated.
>
> Signed-off-by: Will Drewry <wad@chromium.org>
> ---
>  kernel/seccomp.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ee376be..33f0ad6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
>  			 */
>  			if (fatal_signal_pending(current))
>  				break;
> +			/* Skip the system call if the tracer changed it. */
> +			if (this_syscall !=
> +			    syscall_get_nr(current, task_pt_regs(current)))
> +				goto skip;
>  			return 0;
>  		case SECCOMP_RET_ALLOW:
>  			return 0;
> --

This patch doesn't make any sense whatsoever. You can't know why a system
call was blocked by a seccomp filter, assuming it's always because of the
system call number is wrong.

Also, you don't check if an allowed system call is changed into a denied
one, so this doesn't protect against ptracers bypassing seccomp filters.

And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
useful for cases that can't be handled or decided by the seccomp filter.
Then taking away the ability to change the syscall number makes it a lot
less useful.

Either do the seccomp test before or after ptrace, or both, but please
don't introduce ad hoc checks like this.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
@ 2012-05-24 17:54               ` Indan Zupancic
  0 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 17:54 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 18:07, Will Drewry wrote:
> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
> cannot change the system call number for the traced task
> without it resulting in the system call being skipped.
>
> Traditionally, tracers will set the system call number to
> -1 to skip the system call. This behavior will work as expected
> but the tracer will be unable to remap the system call to a valid
> system call after the seccomp policy has been evaluated.
>
> Signed-off-by: Will Drewry <wad@chromium.org>
> ---
>  kernel/seccomp.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ee376be..33f0ad6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
>  			 */
>  			if (fatal_signal_pending(current))
>  				break;
> +			/* Skip the system call if the tracer changed it. */
> +			if (this_syscall !=
> +			    syscall_get_nr(current, task_pt_regs(current)))
> +				goto skip;
>  			return 0;
>  		case SECCOMP_RET_ALLOW:
>  			return 0;
> --

This patch doesn't make any sense whatsoever. You can't know why a system
call was blocked by a seccomp filter, assuming it's always because of the
system call number is wrong.

Also, you don't check if an allowed system call is changed into a denied
one, so this doesn't protect against ptracers bypassing seccomp filters.

And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
useful for cases that can't be handled or decided by the seccomp filter.
Then taking away the ability to change the syscall number makes it a lot
less useful.

Either do the seccomp test before or after ptrace, or both, but please
don't introduce ad hoc checks like this.

Greetings,

Indan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 16:13             ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-24 18:07               ` Roland McGrath
  -1 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-24 18:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Will Drewry, linux-kernel, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> I think this really screws with using seccomp for self-interception.  I
> wouldn't inherently be opposed to the following flow:
>
>        seccomp -> ptrace -> seccomp
>
> ... i.e. if ptrace is enabled and we enable something, run it through
> seccomp again, but there are bunch of use cases (mostly involving
> SIGSYS) where doing ptrace before seccomp is just bizarre.

Are you sure?  This is ptrace syscall tracing going first.
If seccomp generates a SIGSYS, then ptrace will still get its opportunity
to intercept the signal and change the register state however it likes.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 18:07               ` Roland McGrath
  0 siblings, 0 replies; 77+ messages in thread
From: Roland McGrath @ 2012-05-24 18:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Will Drewry, linux-kernel, indan, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> I think this really screws with using seccomp for self-interception.  I
> wouldn't inherently be opposed to the following flow:
>
>        seccomp -> ptrace -> seccomp
>
> ... i.e. if ptrace is enabled and we enable something, run it through
> seccomp again, but there are bunch of use cases (mostly involving
> SIGSYS) where doing ptrace before seccomp is just bizarre.

Are you sure?  This is ptrace syscall tracing going first.
If seccomp generates a SIGSYS, then ptrace will still get its opportunity
to intercept the signal and change the register state however it likes.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
  2012-05-24 17:54               ` [kernel-hardening] " Indan Zupancic
@ 2012-05-24 18:24                 ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 18:24 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <indan@nul.nu> wrote:
> On Thu, May 24, 2012 18:07, Will Drewry wrote:
>> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
>> cannot change the system call number for the traced task
>> without it resulting in the system call being skipped.
>>
>> Traditionally, tracers will set the system call number to
>> -1 to skip the system call. This behavior will work as expected
>> but the tracer will be unable to remap the system call to a valid
>> system call after the seccomp policy has been evaluated.
>>
>> Signed-off-by: Will Drewry <wad@chromium.org>
>> ---
>>  kernel/seccomp.c |    4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index ee376be..33f0ad6 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
>>                        */
>>                       if (fatal_signal_pending(current))
>>                               break;
>> +                     /* Skip the system call if the tracer changed it. */
>> +                     if (this_syscall !=
>> +                         syscall_get_nr(current, task_pt_regs(current)))
>> +                             goto skip;
>>                       return 0;
>>               case SECCOMP_RET_ALLOW:
>>                       return 0;
>> --
>
> This patch doesn't make any sense whatsoever. You can't know why a system
> call was blocked by a seccomp filter, assuming it's always because of the
> system call number is wrong.

All this does is assert that the tracer can't change the syscall
number without it skipping the call.  If seccomp returned
SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
everything is fine.

> Also, you don't check if an allowed system call is changed into a denied
> one, so this doesn't protect against ptracers bypassing seccomp filters.

This enforces that the system call that is going to be executed is the
one that triggered SECCOMP_RET_TRACE.  That means seccomp is
delegating the go/no-go decision to the tracer.  I don't understand
your assertion here.  This code doesn't affect the PTRACE_SYSCALL
case.

> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
> useful for cases that can't be handled or decided by the seccomp filter.
> Then taking away the ability to change the syscall number makes it a lot
> less useful.

Do you have a valid case where you want to remap one system call to
another without the ability to also handle the syscall exit path and
do any fixups?  I've mostly just seen skip, allow, update arguments -
not swapping the entire syscall.  That said, it's possible.  you could
do all sorts of weird things with ptrace if you want :)

> Either do the seccomp test before or after ptrace, or both, but please
> don't introduce ad hoc checks like this.

I don't feel strongly about this RFC, but I don't believe that
expectations are being changed dramatically.

thanks!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
@ 2012-05-24 18:24                 ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-24 18:24 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <indan@nul.nu> wrote:
> On Thu, May 24, 2012 18:07, Will Drewry wrote:
>> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
>> cannot change the system call number for the traced task
>> without it resulting in the system call being skipped.
>>
>> Traditionally, tracers will set the system call number to
>> -1 to skip the system call. This behavior will work as expected
>> but the tracer will be unable to remap the system call to a valid
>> system call after the seccomp policy has been evaluated.
>>
>> Signed-off-by: Will Drewry <wad@chromium.org>
>> ---
>>  kernel/seccomp.c |    4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index ee376be..33f0ad6 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
>>                        */
>>                       if (fatal_signal_pending(current))
>>                               break;
>> +                     /* Skip the system call if the tracer changed it. */
>> +                     if (this_syscall !=
>> +                         syscall_get_nr(current, task_pt_regs(current)))
>> +                             goto skip;
>>                       return 0;
>>               case SECCOMP_RET_ALLOW:
>>                       return 0;
>> --
>
> This patch doesn't make any sense whatsoever. You can't know why a system
> call was blocked by a seccomp filter, assuming it's always because of the
> system call number is wrong.

All this does is assert that the tracer can't change the syscall
number without it skipping the call.  If seccomp returned
SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
everything is fine.

> Also, you don't check if an allowed system call is changed into a denied
> one, so this doesn't protect against ptracers bypassing seccomp filters.

This enforces that the system call that is going to be executed is the
one that triggered SECCOMP_RET_TRACE.  That means seccomp is
delegating the go/no-go decision to the tracer.  I don't understand
your assertion here.  This code doesn't affect the PTRACE_SYSCALL
case.

> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
> useful for cases that can't be handled or decided by the seccomp filter.
> Then taking away the ability to change the syscall number makes it a lot
> less useful.

Do you have a valid case where you want to remap one system call to
another without the ability to also handle the syscall exit path and
do any fixups?  I've mostly just seen skip, allow, update arguments -
not swapping the entire syscall.  That said, it's possible.  you could
do all sorts of weird things with ptrace if you want :)

> Either do the seccomp test before or after ptrace, or both, but please
> don't introduce ad hoc checks like this.

I don't feel strongly about this RFC, but I don't believe that
expectations are being changed dramatically.

thanks!
will

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 18:07               ` [kernel-hardening] " Roland McGrath
@ 2012-05-24 18:27                 ` Indan Zupancic
  -1 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 18:27 UTC (permalink / raw)
  To: Roland McGrath
  Cc: H. Peter Anvin, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 20:07, Roland McGrath wrote:
> On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> I think this really screws with using seccomp for self-interception.  I
>> wouldn't inherently be opposed to the following flow:
>>
>> 	seccomp -> ptrace -> seccomp
>>
>> ... i.e. if ptrace is enabled and we enable something, run it through
>> seccomp again, but there are bunch of use cases (mostly involving
>> SIGSYS) where doing ptrace before seccomp is just bizarre.
>
> Are you sure?  This is ptrace syscall tracing going first.
> If seccomp generates a SIGSYS, then ptrace will still get its opportunity
> to intercept the signal and change the register state however it likes.

If so, then the seccomp check needs to be redone after any ptrace
changes, or we should give up and just do the seccomp check first,
instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
same problem.

If a seccomp filtered task can do ptrace(), it can easily bypass
the seccomp filter by ptracing any task not under the same filter
but from the same user. And then it can puppeteer the victim into
doing anything it wishes. So pretending seccomp can make a ptracer
secure is futile, I think. Perhaps it's better to keep it simple and
always do the seccomp test first and ignore ptrace changes, however
sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
so it shouldn't try to do it afterwards either.

It's a bit fuzzy though, only reason why doing seccomp first is more
convenient is because seccomp can generate ptrace events. I don't
think it will make a difference in practice because ptrace(2) won't
be allowed by seccomp filters anyway, so it's a bit of a theoretical
problem.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 18:27                 ` Indan Zupancic
  0 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 18:27 UTC (permalink / raw)
  To: Roland McGrath
  Cc: H. Peter Anvin, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 20:07, Roland McGrath wrote:
> On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> I think this really screws with using seccomp for self-interception.  I
>> wouldn't inherently be opposed to the following flow:
>>
>> 	seccomp -> ptrace -> seccomp
>>
>> ... i.e. if ptrace is enabled and we enable something, run it through
>> seccomp again, but there are bunch of use cases (mostly involving
>> SIGSYS) where doing ptrace before seccomp is just bizarre.
>
> Are you sure?  This is ptrace syscall tracing going first.
> If seccomp generates a SIGSYS, then ptrace will still get its opportunity
> to intercept the signal and change the register state however it likes.

If so, then the seccomp check needs to be redone after any ptrace
changes, or we should give up and just do the seccomp check first,
instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
same problem.

If a seccomp filtered task can do ptrace(), it can easily bypass
the seccomp filter by ptracing any task not under the same filter
but from the same user. And then it can puppeteer the victim into
doing anything it wishes. So pretending seccomp can make a ptracer
secure is futile, I think. Perhaps it's better to keep it simple and
always do the seccomp test first and ignore ptrace changes, however
sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
so it shouldn't try to do it afterwards either.

It's a bit fuzzy though, only reason why doing seccomp first is more
convenient is because seccomp can generate ptrace events. I don't
think it will make a difference in practice because ptrace(2) won't
be allowed by seccomp filters anyway, so it's a bit of a theoretical
problem.

Greetings,

Indan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 18:27                 ` [kernel-hardening] " Indan Zupancic
@ 2012-05-24 18:45                   ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 18:45 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On 05/24/2012 11:27 AM, Indan Zupancic wrote:
> 
> If so, then the seccomp check needs to be redone after any ptrace
> changes, or we should give up and just do the seccomp check first,
> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
> same problem.
> 
> If a seccomp filtered task can do ptrace(), it can easily bypass
> the seccomp filter by ptracing any task not under the same filter
> but from the same user. And then it can puppeteer the victim into
> doing anything it wishes. So pretending seccomp can make a ptracer
> secure is futile, I think. Perhaps it's better to keep it simple and
> always do the seccomp test first and ignore ptrace changes, however
> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
> so it shouldn't try to do it afterwards either.
> 
> It's a bit fuzzy though, only reason why doing seccomp first is more
> convenient is because seccomp can generate ptrace events. I don't
> think it will make a difference in practice because ptrace(2) won't
> be allowed by seccomp filters anyway, so it's a bit of a theoretical
> problem.
> 

No, that's not the reason to do seccomp first.  The reason to do seccomp
first is that a seccomp filter can be part of the process execution and
can completely transform the system call picture.

Consider UML, for example.  It uses ptrace to capture system calls and
execute them on the behalf of the process.  It needs to know what system
calls *actually* are done by the virtual process.

(Note: that being said, UML might very well be better done using seccomp
filters *instead* of ptrace, but that's another matter.)

I agree with you, if the process is traceable it is rather questionable
to claim any kind of security; more likely consider that a debugging
mode and tell people to lock out ptrace for real sandboxing.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 18:45                   ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 18:45 UTC (permalink / raw)
  To: Indan Zupancic
  Cc: Roland McGrath, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On 05/24/2012 11:27 AM, Indan Zupancic wrote:
> 
> If so, then the seccomp check needs to be redone after any ptrace
> changes, or we should give up and just do the seccomp check first,
> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
> same problem.
> 
> If a seccomp filtered task can do ptrace(), it can easily bypass
> the seccomp filter by ptracing any task not under the same filter
> but from the same user. And then it can puppeteer the victim into
> doing anything it wishes. So pretending seccomp can make a ptracer
> secure is futile, I think. Perhaps it's better to keep it simple and
> always do the seccomp test first and ignore ptrace changes, however
> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
> so it shouldn't try to do it afterwards either.
> 
> It's a bit fuzzy though, only reason why doing seccomp first is more
> convenient is because seccomp can generate ptrace events. I don't
> think it will make a difference in practice because ptrace(2) won't
> be allowed by seccomp filters anyway, so it's a bit of a theoretical
> problem.
> 

No, that's not the reason to do seccomp first.  The reason to do seccomp
first is that a seccomp filter can be part of the process execution and
can completely transform the system call picture.

Consider UML, for example.  It uses ptrace to capture system calls and
execute them on the behalf of the process.  It needs to know what system
calls *actually* are done by the virtual process.

(Note: that being said, UML might very well be better done using seccomp
filters *instead* of ptrace, but that's another matter.)

I agree with you, if the process is traceable it is rather questionable
to claim any kind of security; more likely consider that a debugging
mode and tell people to lock out ptrace for real sandboxing.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 18:45                   ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-24 19:39                     ` Indan Zupancic
  -1 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 19:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Roland McGrath, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 20:45, H. Peter Anvin wrote:
> On 05/24/2012 11:27 AM, Indan Zupancic wrote:
>>
>> If so, then the seccomp check needs to be redone after any ptrace
>> changes, or we should give up and just do the seccomp check first,
>> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
>> same problem.
>>
>> If a seccomp filtered task can do ptrace(), it can easily bypass
>> the seccomp filter by ptracing any task not under the same filter
>> but from the same user. And then it can puppeteer the victim into
>> doing anything it wishes. So pretending seccomp can make a ptracer
>> secure is futile, I think. Perhaps it's better to keep it simple and
>> always do the seccomp test first and ignore ptrace changes, however
>> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
>> so it shouldn't try to do it afterwards either.
>>
>> It's a bit fuzzy though, only reason why doing seccomp first is more
>> convenient is because seccomp can generate ptrace events. I don't
>> think it will make a difference in practice because ptrace(2) won't
>> be allowed by seccomp filters anyway, so it's a bit of a theoretical
>> problem.
>>
>
> No, that's not the reason to do seccomp first.  The reason to do seccomp
> first is that a seccomp filter can be part of the process execution and
> can completely transform the system call picture.

How? All that seccomp can do is deny system calls and kill the task.
What you describe sounds more like ptrace.

>
> Consider UML, for example.  It uses ptrace to capture system calls and
> execute them on the behalf of the process.  It needs to know what system
> calls *actually* are done by the virtual process.

Seccomp can't change system calls, only prevent them, so I miss how it
can change anything for UML in that way.

>
> (Note: that being said, UML might very well be better done using seccomp
> filters *instead* of ptrace, but that's another matter.)

Well, obviously it will use seccomp filters instead of ptrace when possible
and do the more complicated stuff via PTRACE_SECCOMP_EVENT when seccomp isn't
sufficient. But mainly for performance reasons.

>
> I agree with you, if the process is traceable it is rather questionable
> to claim any kind of security; more likely consider that a debugging
> mode and tell people to lock out ptrace for real sandboxing.

"process is traceable" should be "process is able to trace".

Greetings,

Indan



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 19:39                     ` Indan Zupancic
  0 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 19:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Roland McGrath, Will Drewry, linux-kernel, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 20:45, H. Peter Anvin wrote:
> On 05/24/2012 11:27 AM, Indan Zupancic wrote:
>>
>> If so, then the seccomp check needs to be redone after any ptrace
>> changes, or we should give up and just do the seccomp check first,
>> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
>> same problem.
>>
>> If a seccomp filtered task can do ptrace(), it can easily bypass
>> the seccomp filter by ptracing any task not under the same filter
>> but from the same user. And then it can puppeteer the victim into
>> doing anything it wishes. So pretending seccomp can make a ptracer
>> secure is futile, I think. Perhaps it's better to keep it simple and
>> always do the seccomp test first and ignore ptrace changes, however
>> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
>> so it shouldn't try to do it afterwards either.
>>
>> It's a bit fuzzy though, only reason why doing seccomp first is more
>> convenient is because seccomp can generate ptrace events. I don't
>> think it will make a difference in practice because ptrace(2) won't
>> be allowed by seccomp filters anyway, so it's a bit of a theoretical
>> problem.
>>
>
> No, that's not the reason to do seccomp first.  The reason to do seccomp
> first is that a seccomp filter can be part of the process execution and
> can completely transform the system call picture.

How? All that seccomp can do is deny system calls and kill the task.
What you describe sounds more like ptrace.

>
> Consider UML, for example.  It uses ptrace to capture system calls and
> execute them on the behalf of the process.  It needs to know what system
> calls *actually* are done by the virtual process.

Seccomp can't change system calls, only prevent them, so I miss how it
can change anything for UML in that way.

>
> (Note: that being said, UML might very well be better done using seccomp
> filters *instead* of ptrace, but that's another matter.)

Well, obviously it will use seccomp filters instead of ptrace when possible
and do the more complicated stuff via PTRACE_SECCOMP_EVENT when seccomp isn't
sufficient. But mainly for performance reasons.

>
> I agree with you, if the process is traceable it is rather questionable
> to claim any kind of security; more likely consider that a debugging
> mode and tell people to lock out ptrace for real sandboxing.

"process is traceable" should be "process is able to trace".

Greetings,

Indan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
  2012-05-24 18:24                 ` [kernel-hardening] " Will Drewry
@ 2012-05-24 20:17                   ` Indan Zupancic
  -1 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 20:17 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 20:24, Will Drewry wrote:
> On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <indan@nul.nu> wrote:
>> This patch doesn't make any sense whatsoever. You can't know why a system
>> call was blocked by a seccomp filter, assuming it's always because of the
>> system call number is wrong.
>
> All this does is assert that the tracer can't change the syscall
> number without it skipping the call.

Why wouldn't it be allowed to change the system call number?

And try answering that question in a way that doesn't apply to syscall
argument values too.

> If seccomp returned
> SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
> everything is fine.

No it's not fine, because it's inconsistent and arbitrary.

>
>> Also, you don't check if an allowed system call is changed into a denied
>> one, so this doesn't protect against ptracers bypassing seccomp filters.
>
> This enforces that the system call that is going to be executed is the
> one that triggered SECCOMP_RET_TRACE.  That means seccomp is
> delegating the go/no-go decision to the tracer.  I don't understand
> your assertion here.  This code doesn't affect the PTRACE_SYSCALL
> case.

It still gives normal ptracers the ability to bypass seccomp by changing
allowed system calls into system calls that would have been denied. So
considering ptrace can still be used to execute arbitrary system calls,
why add this special case restriction to SECCOMP_RET_TRACE?

>
>> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
>> useful for cases that can't be handled or decided by the seccomp filter.
>> Then taking away the ability to change the syscall number makes it a lot
>> less useful.
>
> Do you have a valid case where you want to remap one system call to
> another without the ability to also handle the syscall exit path and
> do any fixups?  I've mostly just seen skip, allow, update arguments -
> not swapping the entire syscall.  That said, it's possible.  you could
> do all sorts of weird things with ptrace if you want :)

One case is for system call injection where an arbitrary syscall is
hijacked for the jailer's purposes. But that needs the exit path too.
Another one is replacing fork() with clone(), but that's only necessary
on 2.4. Another one is to let the process call exit() to get rid of it.

But it's easy to get the exit notification by resuming with PTRACE_SYSCALL
after getting a PTRACE_EVENT_SECCOMP event, so I don't see how it matters
if we're interested in the exit path or not.

In the jailer the first system call after an execve is replaced with a
call to mmap2() to get a read-only shared mapping which is used to avoid
file path modifications and similar races. And when using seccomp all
events are PTRACE_EVENT_SECCOMP ones, and those are exactly the ones
where we need the shared read-only mapping. We could work around your
ad hoc restriction, but the main thing is if seccomp filters is just
about syscall numbers anyway, then why use BPF instead of a bitmask?

The whole point of PTRACE_EVENT_SECCOMP is to delegate to ptrace, it
means "we can't decide in the filter, ask the ptracer". This implies
that the ptracer is trusted, so doing any checks afterwards is a bit
pointless. But if you want to do it anyway, at least do it properly
and re-run the filter after ptrace. To avoid loops you need to allow
the syscall if you get another PTRACE_EVENT_SECCOMP.

>
>> Either do the seccomp test before or after ptrace, or both, but please
>> don't introduce ad hoc checks like this.
>
> I don't feel strongly about this RFC, but I don't believe that
> expectations are being changed dramatically.

As seccomp can generate ptrace events, the only thing that makes sense is
to either do seccomp first and ignore ptrace changes, or rerun filters after
any ptrace changes. But as I said in my other mail, if a process can call
ptrace(), it will pretty much avoid all secomp filters anyway, seccomp
filters which allow ptrace() are pretty much guaranteed to be insecure.
We're talking about a non-seccomped process ptracing a seccomped process,
both with the same UID. I don't think it matters in practice what you do
in this case, from a security point of view.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE
@ 2012-05-24 20:17                   ` Indan Zupancic
  0 siblings, 0 replies; 77+ messages in thread
From: Indan Zupancic @ 2012-05-24 20:17 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, netdev, linux-security-module,
	kernel-hardening, mingo, oleg, peterz, rdunlap, tglx, luto,
	serge.hallyn, pmoore, akpm, corbet, markus, coreyb, keescook,
	viro, jmorris

On Thu, May 24, 2012 20:24, Will Drewry wrote:
> On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <indan@nul.nu> wrote:
>> This patch doesn't make any sense whatsoever. You can't know why a system
>> call was blocked by a seccomp filter, assuming it's always because of the
>> system call number is wrong.
>
> All this does is assert that the tracer can't change the syscall
> number without it skipping the call.

Why wouldn't it be allowed to change the system call number?

And try answering that question in a way that doesn't apply to syscall
argument values too.

> If seccomp returned
> SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
> everything is fine.

No it's not fine, because it's inconsistent and arbitrary.

>
>> Also, you don't check if an allowed system call is changed into a denied
>> one, so this doesn't protect against ptracers bypassing seccomp filters.
>
> This enforces that the system call that is going to be executed is the
> one that triggered SECCOMP_RET_TRACE.  That means seccomp is
> delegating the go/no-go decision to the tracer.  I don't understand
> your assertion here.  This code doesn't affect the PTRACE_SYSCALL
> case.

It still gives normal ptracers the ability to bypass seccomp by changing
allowed system calls into system calls that would have been denied. So
considering ptrace can still be used to execute arbitrary system calls,
why add this special case restriction to SECCOMP_RET_TRACE?

>
>> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
>> useful for cases that can't be handled or decided by the seccomp filter.
>> Then taking away the ability to change the syscall number makes it a lot
>> less useful.
>
> Do you have a valid case where you want to remap one system call to
> another without the ability to also handle the syscall exit path and
> do any fixups?  I've mostly just seen skip, allow, update arguments -
> not swapping the entire syscall.  That said, it's possible.  you could
> do all sorts of weird things with ptrace if you want :)

One case is for system call injection where an arbitrary syscall is
hijacked for the jailer's purposes. But that needs the exit path too.
Another one is replacing fork() with clone(), but that's only necessary
on 2.4. Another one is to let the process call exit() to get rid of it.

But it's easy to get the exit notification by resuming with PTRACE_SYSCALL
after getting a PTRACE_EVENT_SECCOMP event, so I don't see how it matters
if we're interested in the exit path or not.

In the jailer the first system call after an execve is replaced with a
call to mmap2() to get a read-only shared mapping which is used to avoid
file path modifications and similar races. And when using seccomp all
events are PTRACE_EVENT_SECCOMP ones, and those are exactly the ones
where we need the shared read-only mapping. We could work around your
ad hoc restriction, but the main thing is if seccomp filters is just
about syscall numbers anyway, then why use BPF instead of a bitmask?

The whole point of PTRACE_EVENT_SECCOMP is to delegate to ptrace, it
means "we can't decide in the filter, ask the ptracer". This implies
that the ptracer is trusted, so doing any checks afterwards is a bit
pointless. But if you want to do it anyway, at least do it properly
and re-run the filter after ptrace. To avoid loops you need to allow
the syscall if you get another PTRACE_EVENT_SECCOMP.

>
>> Either do the seccomp test before or after ptrace, or both, but please
>> don't introduce ad hoc checks like this.
>
> I don't feel strongly about this RFC, but I don't believe that
> expectations are being changed dramatically.

As seccomp can generate ptrace events, the only thing that makes sense is
to either do seccomp first and ignore ptrace changes, or rerun filters after
any ptrace changes. But as I said in my other mail, if a process can call
ptrace(), it will pretty much avoid all secomp filters anyway, seccomp
filters which allow ptrace() are pretty much guaranteed to be insecure.
We're talking about a non-seccomped process ptracing a seccomped process,
both with the same UID. I don't think it matters in practice what you do
in this case, from a security point of view.

Greetings,

Indan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 22:00             ` Andrew Morton
  -1 siblings, 0 replies; 77+ messages in thread
From: Andrew Morton @ 2012-05-24 22:00 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, 24 May 2012 11:07:58 -0500
Will Drewry <wad@chromium.org> wrote:

> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).

Perhaps you could repeat those comments in this changelog.

> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
> 
> The x86 change is tested, as is the seccomp.c change.  For other arches,
> it is not (RFC :).  Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
> 
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
> 

Because my take on the above reasoning is "why did you bother writing
these patches"!


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 22:00             ` Andrew Morton
  0 siblings, 0 replies; 77+ messages in thread
From: Andrew Morton @ 2012-05-24 22:00 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, 24 May 2012 11:07:58 -0500
Will Drewry <wad@chromium.org> wrote:

> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).

Perhaps you could repeat those comments in this changelog.

> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
> 
> The x86 change is tested, as is the seccomp.c change.  For other arches,
> it is not (RFC :).  Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
> 
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
> 

Because my take on the above reasoning is "why did you bother writing
these patches"!

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
@ 2012-05-24 23:40             ` James Morris
  -1 siblings, 0 replies; 77+ messages in thread
From: James Morris @ 2012-05-24 23:40 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, 24 May 2012, Will Drewry wrote:

> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.

Indeed -- while the purpose of seccomp is to reduce the attack surface of 
the syscall interface, if a user allows ptrace, attackers will definitely 
see that as an attack vector, if it allows them to increase that attack 
surface.

It at least needs to be well-documented.

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 23:40             ` James Morris
  0 siblings, 0 replies; 77+ messages in thread
From: James Morris @ 2012-05-24 23:40 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, 24 May 2012, Will Drewry wrote:

> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be.  The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.

Indeed -- while the purpose of seccomp is to reduce the attack surface of 
the syscall interface, if a user allows ptrace, attackers will definitely 
see that as an attack vector, if it allows them to increase that attack 
surface.

It at least needs to be well-documented.

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 23:40             ` [kernel-hardening] " James Morris
@ 2012-05-24 23:43               ` Andrew Lutomirski
  -1 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-24 23:43 UTC (permalink / raw)
  To: James Morris
  Cc: Will Drewry, linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 4:40 PM, James Morris <jmorris@namei.org> wrote:
> On Thu, 24 May 2012, Will Drewry wrote:
>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be.  The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>
> Indeed -- while the purpose of seccomp is to reduce the attack surface of
> the syscall interface, if a user allows ptrace, attackers will definitely
> see that as an attack vector, if it allows them to increase that attack
> surface.
>
> It at least needs to be well-documented.

IMO the behavior should change.  Alternatively, a post-ptrace syscall
should have to pass the *tracer's* seccomp filter, but that seems
overcomplicated and confusing.

OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
-- if you can ptrace something outside the sandbox, you've escaped.

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 23:43               ` Andrew Lutomirski
  0 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-24 23:43 UTC (permalink / raw)
  To: James Morris
  Cc: Will Drewry, linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 4:40 PM, James Morris <jmorris@namei.org> wrote:
> On Thu, 24 May 2012, Will Drewry wrote:
>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be.  The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>
> Indeed -- while the purpose of seccomp is to reduce the attack surface of
> the syscall interface, if a user allows ptrace, attackers will definitely
> see that as an attack vector, if it allows them to increase that attack
> surface.
>
> It at least needs to be well-documented.

IMO the behavior should change.  Alternatively, a post-ptrace syscall
should have to pass the *tracer's* seccomp filter, but that seems
overcomplicated and confusing.

OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
-- if you can ptrace something outside the sandbox, you've escaped.

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 23:43               ` [kernel-hardening] " Andrew Lutomirski
@ 2012-05-24 23:56                 ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 23:56 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
> 
> IMO the behavior should change.  Alternatively, a post-ptrace syscall
> should have to pass the *tracer's* seccomp filter, but that seems
> overcomplicated and confusing.
> 
> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
> -- if you can ptrace something outside the sandbox, you've escaped.
> 

This is my suggestion: if there is demand, make it possible to install a
*second* seccomp filter program which is run on the result of the
ptrace.  I.e.:

Untraced:	process -> seccomp1 -> kernel

Traced:		process -> seccomp1 -> ptrace -> seccomp2 -> kernel

This is something we could add later if there is demand.

	-hpa



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-24 23:56                 ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-24 23:56 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
> 
> IMO the behavior should change.  Alternatively, a post-ptrace syscall
> should have to pass the *tracer's* seccomp filter, but that seems
> overcomplicated and confusing.
> 
> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
> -- if you can ptrace something outside the sandbox, you've escaped.
> 

This is my suggestion: if there is demand, make it possible to install a
*second* seccomp filter program which is run on the result of the
ptrace.  I.e.:

Untraced:	process -> seccomp1 -> kernel

Traced:		process -> seccomp1 -> ptrace -> seccomp2 -> kernel

This is something we could add later if there is demand.

	-hpa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 23:56                 ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-25  0:26                   ` Andrew Lutomirski
  -1 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-25  0:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 4:56 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
>>
>> IMO the behavior should change.  Alternatively, a post-ptrace syscall
>> should have to pass the *tracer's* seccomp filter, but that seems
>> overcomplicated and confusing.
>>
>> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
>> -- if you can ptrace something outside the sandbox, you've escaped.
>>
>
> This is my suggestion: if there is demand, make it possible to install a
> *second* seccomp filter program which is run on the result of the
> ptrace.  I.e.:
>
> Untraced:       process -> seccomp1 -> kernel
>
> Traced:         process -> seccomp1 -> ptrace -> seccomp2 -> kernel

Just to clarify: are you suggesting that, for now, the traced behavior
should be:

process -> seccomp -> ptrace -> kernel?

If so, I think the man page or something should have a big fat warning
that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
unless they fully understand the issue.

In any case, I think that the UML interaction is missing the point.
UML will *emulate* the seccomp filter.  If it chooses to use host
seccomp filters for some business, that's its business.

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-25  0:26                   ` Andrew Lutomirski
  0 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-25  0:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 4:56 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
>>
>> IMO the behavior should change.  Alternatively, a post-ptrace syscall
>> should have to pass the *tracer's* seccomp filter, but that seems
>> overcomplicated and confusing.
>>
>> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
>> -- if you can ptrace something outside the sandbox, you've escaped.
>>
>
> This is my suggestion: if there is demand, make it possible to install a
> *second* seccomp filter program which is run on the result of the
> ptrace.  I.e.:
>
> Untraced:       process -> seccomp1 -> kernel
>
> Traced:         process -> seccomp1 -> ptrace -> seccomp2 -> kernel

Just to clarify: are you suggesting that, for now, the traced behavior
should be:

process -> seccomp -> ptrace -> kernel?

If so, I think the man page or something should have a big fat warning
that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
unless they fully understand the issue.

In any case, I think that the UML interaction is missing the point.
UML will *emulate* the seccomp filter.  If it chooses to use host
seccomp filters for some business, that's its business.

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-25  0:26                   ` [kernel-hardening] " Andrew Lutomirski
@ 2012-05-25  0:38                     ` H. Peter Anvin
  -1 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-25  0:38 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
> 
> Just to clarify: are you suggesting that, for now, the traced behavior
> should be:
> 
> process -> seccomp -> ptrace -> kernel?
> 
> If so, I think the man page or something should have a big fat warning
> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
> unless they fully understand the issue.
> 

Yes, and yes.

> In any case, I think that the UML interaction is missing the point.
> UML will *emulate* the seccomp filter.  If it chooses to use host
> seccomp filters for some business, that's its business.

I don't see why UML should have to emulate the seccomp filter.  With the
proposed order, then it can simply use the seccomp filter provided by
the host.  Furthermore, with this sequencing UML can actually *use*
seccomp to provide the simulation.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-25  0:38                     ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2012-05-25  0:38 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
> 
> Just to clarify: are you suggesting that, for now, the traced behavior
> should be:
> 
> process -> seccomp -> ptrace -> kernel?
> 
> If so, I think the man page or something should have a big fat warning
> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
> unless they fully understand the issue.
> 

Yes, and yes.

> In any case, I think that the UML interaction is missing the point.
> UML will *emulate* the seccomp filter.  If it chooses to use host
> seccomp filters for some business, that's its business.

I don't see why UML should have to emulate the seccomp filter.  With the
proposed order, then it can simply use the seccomp filter provided by
the host.  Furthermore, with this sequencing UML can actually *use*
seccomp to provide the simulation.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-25  0:38                     ` [kernel-hardening] " H. Peter Anvin
@ 2012-05-25  0:55                       ` Andrew Lutomirski
  -1 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-25  0:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 5:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
>>
>> Just to clarify: are you suggesting that, for now, the traced behavior
>> should be:
>>
>> process -> seccomp -> ptrace -> kernel?
>>
>> If so, I think the man page or something should have a big fat warning
>> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
>> unless they fully understand the issue.
>>
>
> Yes, and yes.
>
>> In any case, I think that the UML interaction is missing the point.
>> UML will *emulate* the seccomp filter.  If it chooses to use host
>> seccomp filters for some business, that's its business.
>
> I don't see why UML should have to emulate the seccomp filter.  With the
> proposed order, then it can simply use the seccomp filter provided by
> the host.  Furthermore, with this sequencing UML can actually *use*
> seccomp to provide the simulation.

Hmm.  I guess I agree.  I'll shut up now :)

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-25  0:55                       ` Andrew Lutomirski
  0 siblings, 0 replies; 77+ messages in thread
From: Andrew Lutomirski @ 2012-05-25  0:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: James Morris, Will Drewry, linux-kernel, mcgrathr, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, serge.hallyn, pmoore, akpm, corbet, markus,
	coreyb, keescook, viro

On Thu, May 24, 2012 at 5:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
>>
>> Just to clarify: are you suggesting that, for now, the traced behavior
>> should be:
>>
>> process -> seccomp -> ptrace -> kernel?
>>
>> If so, I think the man page or something should have a big fat warning
>> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
>> unless they fully understand the issue.
>>
>
> Yes, and yes.
>
>> In any case, I think that the UML interaction is missing the point.
>> UML will *emulate* the seccomp filter.  If it chooses to use host
>> seccomp filters for some business, that's its business.
>
> I don't see why UML should have to emulate the seccomp filter.  With the
> proposed order, then it can simply use the seccomp filter provided by
> the host.  Furthermore, with this sequencing UML can actually *use*
> seccomp to provide the simulation.

Hmm.  I guess I agree.  I'll shut up now :)

--Andy

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [RFC PATCH 0/3] move the secure_computing call
  2012-05-24 22:00             ` [kernel-hardening] " Andrew Morton
@ 2012-05-25  1:55               ` Will Drewry
  -1 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-25  1:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 at 5:00 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 24 May 2012 11:07:58 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> This is an RFC based on the comments from Al Viro and Eric Paris
>> regarding ptrace()rs being able to change the system call the kernel
>> sees after the seccomp enforcement has occurred (for mode 1 or 2).
>
> Perhaps you could repeat those comments in this changelog.

Oops :)  Here's the context -- https://lkml.org/lkml/2012/5/21/326

I doubt there's need for a repost though.

>> With this series applied, a (p)tracer of a process with seccomp enabled
>> will be unable to change the tracee's system call number after the
>> secure computing check has been performed.
>>
>> The x86 change is tested, as is the seccomp.c change.  For other arches,
>> it is not (RFC :).  Given that there are other inconsistencies in this
>> code across architectures, I'm not sure if it makes sense to attempt to
>> fix them all at once or to roll through as I attempt to add seccomp
>> filter support.
>>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be.  The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>>
>
> Because my take on the above reasoning is "why did you bother writing
> these patches"!

Just to be thorough -- I wanted to make sure the discussion was framed
against actual code just in case I was missing something.  Otherwise,
I'd be happy to see these patches disappear into the annals of the
wayback machine.

thanks!

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [kernel-hardening] Re: [RFC PATCH 0/3] move the secure_computing call
@ 2012-05-25  1:55               ` Will Drewry
  0 siblings, 0 replies; 77+ messages in thread
From: Will Drewry @ 2012-05-25  1:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mcgrathr, hpa, indan, netdev,
	linux-security-module, kernel-hardening, mingo, oleg, peterz,
	rdunlap, tglx, luto, serge.hallyn, pmoore, corbet, markus,
	coreyb, keescook, viro, jmorris

On Thu, May 24, 2012 at 5:00 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 24 May 2012 11:07:58 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> This is an RFC based on the comments from Al Viro and Eric Paris
>> regarding ptrace()rs being able to change the system call the kernel
>> sees after the seccomp enforcement has occurred (for mode 1 or 2).
>
> Perhaps you could repeat those comments in this changelog.

Oops :)  Here's the context -- https://lkml.org/lkml/2012/5/21/326

I doubt there's need for a repost though.

>> With this series applied, a (p)tracer of a process with seccomp enabled
>> will be unable to change the tracee's system call number after the
>> secure computing check has been performed.
>>
>> The x86 change is tested, as is the seccomp.c change.  For other arches,
>> it is not (RFC :).  Given that there are other inconsistencies in this
>> code across architectures, I'm not sure if it makes sense to attempt to
>> fix them all at once or to roll through as I attempt to add seccomp
>> filter support.
>>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be.  The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>>
>
> Because my take on the above reasoning is "why did you bother writing
> these patches"!

Just to be thorough -- I wanted to make sure the discussion was framed
against actual code just in case I was missing something.  Otherwise,
I'd be happy to see these patches disappear into the annals of the
wayback machine.

thanks!

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2012-05-25  1:55 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-21 18:21 seccomp and ptrace. what is the correct order? Eric Paris
2012-05-21 18:21 ` [kernel-hardening] " Eric Paris
2012-05-21 18:25 ` Roland McGrath
2012-05-21 18:25   ` [kernel-hardening] " Roland McGrath
2012-05-21 18:40   ` Andrew Lutomirski
2012-05-21 19:20   ` Indan Zupancic
2012-05-21 19:20     ` [kernel-hardening] " Indan Zupancic
2012-05-22 16:23     ` Will Drewry
2012-05-22 16:23       ` [kernel-hardening] " Will Drewry
2012-05-22 16:26       ` Will Drewry
2012-05-22 16:26         ` [kernel-hardening] " Will Drewry
2012-05-22 17:39       ` Al Viro
2012-05-22 17:39         ` [kernel-hardening] " Al Viro
2012-05-22 20:26         ` Will Drewry
2012-05-22 20:26           ` [kernel-hardening] " Will Drewry
2012-05-22 20:34           ` H. Peter Anvin
2012-05-22 20:34             ` [kernel-hardening] " H. Peter Anvin
2012-05-22 20:48             ` Will Drewry
2012-05-22 20:48               ` [kernel-hardening] " Will Drewry
2012-05-22 21:07               ` Al Viro
2012-05-22 21:07                 ` [kernel-hardening] " Al Viro
2012-05-22 21:17                 ` Roland McGrath
2012-05-22 21:17                   ` [kernel-hardening] " Roland McGrath
2012-05-22 21:18                   ` H. Peter Anvin
2012-05-22 21:18                     ` [kernel-hardening] " H. Peter Anvin
2012-05-22 22:20                   ` Al Viro
2012-05-22 22:20                     ` [kernel-hardening] " Al Viro
2012-05-22 21:09               ` H. Peter Anvin
2012-05-22 21:09                 ` [kernel-hardening] " H. Peter Anvin
2012-05-22 21:14                 ` Will Drewry
2012-05-22 21:14                   ` [kernel-hardening] " Will Drewry
2012-05-22 21:37                   ` H. Peter Anvin
2012-05-22 21:37                     ` [kernel-hardening] " H. Peter Anvin
2012-05-24 16:07         ` [RFC PATCH 0/3] move the secure_computing call Will Drewry
2012-05-24 16:07           ` [kernel-hardening] " Will Drewry
2012-05-24 16:07           ` [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE Will Drewry
2012-05-24 16:07             ` [kernel-hardening] " Will Drewry
2012-05-24 17:54             ` Indan Zupancic
2012-05-24 17:54               ` [kernel-hardening] " Indan Zupancic
2012-05-24 18:24               ` Will Drewry
2012-05-24 18:24                 ` [kernel-hardening] " Will Drewry
2012-05-24 20:17                 ` Indan Zupancic
2012-05-24 20:17                   ` [kernel-hardening] " Indan Zupancic
2012-05-24 16:08           ` [RFC PATCH 2/3] arch/x86: move secure_computing after ptrace Will Drewry
2012-05-24 16:08             ` [kernel-hardening] " Will Drewry
2012-05-24 16:08           ` [RFC PATCH 3/3] arch/*: move secure_computing after trace Will Drewry
2012-05-24 16:08             ` [kernel-hardening] " Will Drewry
2012-05-24 16:13           ` [RFC PATCH 0/3] move the secure_computing call H. Peter Anvin
2012-05-24 16:13             ` [kernel-hardening] " H. Peter Anvin
2012-05-24 18:07             ` Roland McGrath
2012-05-24 18:07               ` [kernel-hardening] " Roland McGrath
2012-05-24 18:27               ` Indan Zupancic
2012-05-24 18:27                 ` [kernel-hardening] " Indan Zupancic
2012-05-24 18:45                 ` H. Peter Anvin
2012-05-24 18:45                   ` [kernel-hardening] " H. Peter Anvin
2012-05-24 19:39                   ` Indan Zupancic
2012-05-24 19:39                     ` [kernel-hardening] " Indan Zupancic
2012-05-24 22:00           ` Andrew Morton
2012-05-24 22:00             ` [kernel-hardening] " Andrew Morton
2012-05-25  1:55             ` Will Drewry
2012-05-25  1:55               ` [kernel-hardening] " Will Drewry
2012-05-24 23:40           ` James Morris
2012-05-24 23:40             ` [kernel-hardening] " James Morris
2012-05-24 23:43             ` Andrew Lutomirski
2012-05-24 23:43               ` [kernel-hardening] " Andrew Lutomirski
2012-05-24 23:56               ` H. Peter Anvin
2012-05-24 23:56                 ` [kernel-hardening] " H. Peter Anvin
2012-05-25  0:26                 ` Andrew Lutomirski
2012-05-25  0:26                   ` [kernel-hardening] " Andrew Lutomirski
2012-05-25  0:38                   ` H. Peter Anvin
2012-05-25  0:38                     ` [kernel-hardening] " H. Peter Anvin
2012-05-25  0:55                     ` Andrew Lutomirski
2012-05-25  0:55                       ` [kernel-hardening] " Andrew Lutomirski
2012-05-21 18:47 ` seccomp and ptrace. what is the correct order? richard -rw- weinberger
2012-05-21 18:47   ` [kernel-hardening] " richard -rw- weinberger
2012-05-21 19:13   ` H. Peter Anvin
2012-05-21 19:13     ` [kernel-hardening] " H. Peter Anvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.