linux-kernel.vger.kernel.org archive mirror
From: Denys Vlasenko <dvlasenk@redhat.com>
To: Kees Cook <keescook@chromium.org>, David Drysdale <drysdale@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Will Drewry <wad@chromium.org>, Ingo Molnar <mingo@kernel.org>,
	Alok Kataria <akataria@vmware.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Oleg Nesterov <oleg@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>, X86 ML <x86@kernel.org>
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?
Date: Thu, 13 Aug 2015 23:35:40 +0200
Message-ID: <55CD0DAC.9080809@redhat.com>
In-Reply-To: <CAGXu5j+gXShOAdK93KuVide93VYA_ObyjbK-zb7CwgOLc2JCnQ@mail.gmail.com>

On 08/13/2015 08:47 PM, Kees Cook wrote:
> On Thu, Aug 13, 2015 at 10:39 AM, David Drysdale <drysdale@google.com> wrote:
>> On Thu, Aug 13, 2015 at 6:15 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Thu, Aug 13, 2015 at 9:28 AM, David Drysdale <drysdale@google.com> wrote:
>>>> On Thu, Aug 13, 2015 at 4:17 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>>> On 08/13/2015 10:30 AM, David Drysdale wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
>>>>>> else could reproduce it.
>>>>>>
>>>>>> The problem occurs with a seccomp-bpf filter program that's set up to return
>>>>>> an errno value -- an errno of 1 is always returned instead of what's in the
>>>>>> filter, plus other oddities (selftest output below).
>>>>>>
>>>>>> The problem seems to need a combination of circumstances to occur:
>>>>>>
>>>>>>  - The seccomp-bpf userspace program needs to be 32-bit, running against a
>>>>>>    64-bit kernel -- I'm testing with seccomp_bpf from
>>>>>>    tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.
>>>>>
>>>>> Does it work correctly when built as 64-bit program?
>>>>
>>>> Yep, 64-bit works fine (both at v4.2-rc6 and at commit 3f5159).
>>>>
>>>>>>
>>>>>>  - The kernel needs to be running as a VM guest -- it occurs inside my
>>>>>>    VMware Fusion host, but not if I run on bare metal.  Kees tells me he
>>>>>>    cannot repro with a kvm guest though.
>>>>>>
>>>>>> Bisecting indicates that the commit that induces the problem is
>>>>>> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
>>>>>> 64-bit logic", included in all the v4.2-rc* candidates.
>>>>>>
>>>>>> Apologies if I've just got something odd with my local setup, but the
>>>>>> bisection was unequivocal enough that I thought it worth reporting...
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> seccomp_bpf failure outputs:
>>>>
>>>> [snip]
>>>>
>>>>> End result should be:
>>>>> pt_regs->ax = -E2BIG (via syscall_set_return_value())
>>>>> pt_regs->orig_ax = -1 ("skip syscall")
>>>>> and syscall_trace_enter_phase1() usually returns with 0,
>>>>> meaning "re-execute syscall at once, no phase2 needed".
>>>>>
>>>>> This, in turn, is called from .S files, and when it returns there,
>>>>> execution loops back to syscall dispatch.
>>>>>
>>>>> Because of orig_ax = -1, syscall dispatch should skip calling syscall.
>>>>> So -E2BIG should survive and be returned...
>>>>
>>>> So I was just about to send:
>>>>
>>>>  That makes sense, and given that exactly the same 32-bit binary
>>>>  runs fine on a different machine, there's presumably something up
>>>>  with my local setup.  The failing machine is a VMware guest, but
>>>>  maybe that's not the relevant interaction -- particularly if no-one
>>>>  else can repro.
>>>>
>>>> But then I noticed some odd audit entries in the main log:
>>>>
>>>> Aug 13 16:52:56 ubuntu kernel: [   20.687249] audit: type=1326
>>>> audit(1439481176.034:62): auid=4294967295 uid=1000 gid=1000
>>>> ses=4294967295 pid=2621 comm="secccomp_bpf.ke"
>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=9 arch=40000003 syscall=172
>>>> compat=1 ip=0xf773cc90 code=0x0
>>>> Aug 13 16:52:56 ubuntu kernel: [   20.691157] audit: type=1326
>>>> audit(1439481176.038:63): auid=4294967295 uid=1000 gid=1000
>>>> ses=4294967295 pid=2631 comm="secccomp_bpf.ke"
>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=31 arch=40000003 syscall=20
>>>> compat=1 ip=0xf773cc90 code=0x10000000
>>>> ...
>>>>
>>>> I didn't think I had any audit stuff turned on, and indeed:
>>>>   # auditctl -l
>>>>   No rules
>>>>
>>>> But as soon as I'd run that auditctl command, the 32-bit
>>>> seccomp_bpf binary started running fine!
>>>>
>>>> So now I'm confused, and I can no longer reproduce the
>>>> problem.  Which probably means this was a false alarm, in
>>>> which case, my apologies.
>>>
>>> You might have triggered TIF_AUDIT or whatever it's called, which
>>> causes a whole different path through the asm tangle, so you might
>>> really have a problem.
>>>
>>> Try auditctl -a task,never.  If that doesn't change anything, try
>>> rebooting the guest.
>>
>> Aha, that seems to re-instate the problem -- with that auditctl setup
>> I get the 32-bit seccomp failures on two different machines (one VM,
>> one bare).  So can anyone else repro?
>>
>> I guess the relevant steps are thus:
>>   - sudo auditctl -a task,never
>>   - cd tools/testing/selftests/seccomp
>>   - CFLAGS=-m32 make clean run_tests
> 
> That was it! I can reproduce this now on kvm (after adding the auditctl rule).

I suspect this change:

        .macro auditsys_entry_common
...
        movl %ebx,%esi                  /* 2nd arg: 1st syscall arg */
        movl %eax,%edi                  /* 1st arg: syscall number */
        call __audit_syscall_entry
-       movl RAX(%rsp),%eax     /* reload syscall number */
-       cmpq $(IA32_NR_syscalls-1),%rax
-       ja ia32_badsys
+       movl ORIG_RAX(%rsp),%eax        /* reload syscall number */
        movl %ebx,%edi                  /* reload 1st syscall arg */
        movl RCX(%rsp),%esi     /* reload 2nd syscall arg */
        movl RDX(%rsp),%edx     /* reload 3rd syscall arg */

Before the patch, the syscall number was reloaded from pt_regs->ax.

After the patch, pt_regs->ax no longer holds the syscall number on entry;
it contains -ENOSYS instead. That is why the change shown above
reloads it from pt_regs->orig_ax.

Well. This still should work... in fact it is "more correct"
than it was before...
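
To make the ax/orig_ax point concrete, here is a small userspace sketch in C.
It is only a model: struct model_regs and the three helpers are hypothetical
names invented for illustration, not kernel code. It assumes nothing beyond
what is stated above, namely that on entry ax holds -ENOSYS and orig_ax holds
the syscall number.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two pt_regs slots discussed above. */
struct model_regs {
	long ax;	/* return-value slot */
	long orig_ax;	/* syscall number as passed by userspace */
};

/* Model of syscall entry after the patch: ax is preloaded with -ENOSYS. */
struct model_regs model_enter(long nr)
{
	struct model_regs r;

	assert(nr >= 0);	/* only valid syscall numbers in this model */
	r.ax = -ENOSYS;
	r.orig_ax = nr;
	return r;
}

/* Old reload: read the syscall number back from the ax slot. */
long model_reload_old(const struct model_regs *r)
{
	return r->ax;
}

/* New reload: read the syscall number from orig_ax. */
long model_reload_new(const struct model_regs *r)
{
	return r->orig_ax;
}
```

In this model, after model_enter(20) the old reload yields -ENOSYS while the
new reload yields 20, which is the reason the reload had to move to
ORIG_RAX once ax started carrying -ENOSYS.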

The 64-bit code has no call to __audit_syscall_entry; it uses the
syscall_trace_enter_phase1/phase2 mechanism instead of the
"only audit" shortcut. If the bug is here (though I don't see it),
that would explain why the 64-bit binary works.
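
For reference, the expected phase1 outcome quoted earlier in this message
(ax = -E2BIG, orig_ax = -1, dispatch skips the call) can be modelled in a few
lines of C. This is a sketch under stated assumptions, not the kernel's
implementation: every name below is hypothetical, MODEL_NR_SYSCALLS is a
placeholder table size, and the range check in model_dispatch() is an assumed
way of expressing "skip when orig_ax is -1".

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NR_SYSCALLS 400	/* placeholder, not the kernel's value */

/* Hypothetical stand-in for the two pt_regs slots involved. */
struct model_skip_regs {
	long ax;	/* return-value slot */
	long orig_ax;	/* syscall number, or -1 for "skip syscall" */
};

/* Model of a seccomp "return errno" verdict as described in the quote. */
void model_seccomp_errno(struct model_skip_regs *r, int err)
{
	assert(err > 0);
	r->ax = -err;		/* value the quote attributes to syscall_set_return_value() */
	r->orig_ax = -1;	/* "skip syscall" marker */
}

/* Hypothetical syscall body; it must not run when the syscall is skipped. */
static long model_syscall_body(long nr)
{
	(void)nr;
	return 0;
}

/* Model of dispatch: out-of-range numbers, including -1, are not called. */
long model_dispatch(struct model_skip_regs *r)
{
	if ((unsigned long)r->orig_ax < MODEL_NR_SYSCALLS)
		r->ax = model_syscall_body(r->orig_ax);
	return r->ax;
}
```

With this model a filtered call returns -E2BIG untouched, while an
unfiltered call runs the body and returns its result. The failing selftests
suggest the real 32-bit path deviates from this somewhere.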


Now, how do we reach this bit of code?

ia32_sysenter_target:
...
        testl   $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jnz  sysenter_tracesys
...
sysenter_tracesys:
        testl   $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jz      sysenter_auditsys
...
sysenter_auditsys:
        auditsys_entry_common     <== OUR MACRO
        movl %ebp,%r9d                  /* reload 6th syscall arg */
        jmp sysenter_dispatch


ia32_cstar_target:
...
        testl   $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jnz   cstar_tracesys
...
cstar_tracesys:
        testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jz cstar_auditsys
...
cstar_auditsys:
        movl %r9d,R9(%rsp)      /* register to be clobbered by call */
        auditsys_entry_common  <== OUR MACRO
        movl R9(%rsp),%r9d      /* reload 6th syscall arg */
        jmp cstar_dispatch
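
The two testl/jump pairs above amount to a three-way decision on the
thread-info flags. A minimal C sketch of that decision follows. The MODEL_*
bit values are illustrative assumptions rather than the kernel's real bit
positions, and the sketch assumes the seccomp flag is part of the
work-on-entry mask.

```c
#include <assert.h>

/* Illustrative flag bits; the real ones are defined by the kernel. */
#define MODEL_TIF_TRACE		(1u << 0)
#define MODEL_TIF_AUDIT		(1u << 1)
#define MODEL_TIF_SECCOMP	(1u << 2)
#define MODEL_WORK_SYSCALL_ENTRY \
	(MODEL_TIF_TRACE | MODEL_TIF_AUDIT | MODEL_TIF_SECCOMP)

enum model_path { MODEL_FAST, MODEL_AUDITSYS, MODEL_TRACESYS };

/* Mirror of the branch structure in the sysenter/cstar excerpts above. */
enum model_path model_pick_path(unsigned int ti_flags)
{
	/* testl $_TIF_WORK_SYSCALL_ENTRY, ...; jnz *_tracesys */
	if (!(ti_flags & MODEL_WORK_SYSCALL_ENTRY))
		return MODEL_FAST;

	/* testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ...; jz *_auditsys */
	if (!(ti_flags & (MODEL_WORK_SYSCALL_ENTRY & ~MODEL_TIF_AUDIT)))
		return MODEL_AUDITSYS;

	return MODEL_TRACESYS;
}

/* Usage example: the audit-only shortcut needs audit set and nothing else. */
void model_self_check(void)
{
	assert(model_pick_path(MODEL_TIF_AUDIT) == MODEL_AUDITSYS);
	assert(model_pick_path(MODEL_TIF_AUDIT | MODEL_TIF_SECCOMP) == MODEL_TRACESYS);
}
```

Under these assumptions, the macro is reached only when audit is the sole
entry-work flag set. A task with any other entry-work flag also set goes on
through the rest of *_tracesys instead.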

