* pseudo: host user contamination
@ 2018-03-23 15:33 Enrico Scholz
  2018-03-23 15:43 ` Enrico Scholz
  2018-03-23 16:06 ` Burton, Ross
  0 siblings, 2 replies; 68+ messages in thread
From: Enrico Scholz @ 2018-03-23 15:33 UTC (permalink / raw)
  To: openembedded-core

Hello,

in some packages (e.g. ncurses or glibc) I get package-qa warnings due
to host user contamination; e.g.

| WARNING: ncurses-6.0+20170715-r0 do_package_qa: QA Issue: ncurses: /ncurses-terminfo-base/etc/terminfo/a/ansi is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination
| ncurses: /ncurses-terminfo-base/etc/terminfo/d/dumb is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination
| ...
| ncurses: /ncurses-terminfo-base/etc/terminfo/x/xterm-256color is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]


This is 100% reproducible by 'bitbake ncurses -c cleansstate && bitbake
ncurses'.


Pseudo log contains

| debug_logfile: fd 2
| pid 16096 [parent 16095], doing new pid setup and server start
| Setup complete, sending SIGUSR1 to pid 16095.
| path mismatch [2 links]: ino 382716451 db '/dev/shm/sem.mp-6eoy3v9n' req '/dev/shm/8JXYzj'.
| inode mismatch: '..../ncurses/6.0+20170715-r0/image/usr/share/terminfo/a/ansi' ino 39261519 in db, 39262517 in request.
| symlink mismatch: '..../ncurses/6.0+20170715-r0/image/usr/share/terminfo/a/ansi' [39261519] db mode 0100644, header mode 0120777 (unlinking db)

for all the reported files, and a lot of messages like

| path mismatch [91 links]: ino 1605070 db ...

for unrelated files (e.g. sysroot components).


Seen on recent rocko and master (as of 1-2 weeks ago).

The system is Fedora 27 in a docker container; the rootfs is on an overlayfs,
the recipes are on a bind-mounted btrfs fs, and the build (TMPDIR) happens on
an ext4 fs.  SELinux is active.



Is this issue known, and does a solution exist?



Enrico
-- 
SIGMA Chemnitz GmbH       Registergericht:   Amtsgericht Chemnitz HRB 1750
Am Erlenwald 13           Geschaeftsfuehrer: Grit Freitag, Frank Pyritz
09128 Chemnitz



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 15:33 pseudo: host user contamination Enrico Scholz
@ 2018-03-23 15:43 ` Enrico Scholz
  2018-03-23 16:05   ` Burton, Ross
  2018-03-23 16:06 ` Burton, Ross
  1 sibling, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-23 15:43 UTC (permalink / raw)
  To: openembedded-core

Enrico Scholz
<enrico.scholz-wttK6gPy29v+Hn7q9Vec/7NAH6kLmebB@public.gmane.org>
writes:

> in some packages (e.g. ncurses or glibc) I get package-qa warnings due
> to host user contamination; e.g.

This can be reproduced with the following recipe:

---
LICENSE = "closed"

do_compile() {
	echo foo > bar
}

do_install() {
	install -D -p -m 0644 bar ${D}/bin/bar
	install -d -m 0755 ${D}/usr/bin
	mv ${D}/bin/bar ${D}/usr/bin/bar
	ln -s /usr/bin/bar ${D}/bin/bar
}

FILES_${PN} = "/bin/* /usr/bin/*"
---

WARNING: foo-1.0-r0 do_package_qa: QA Issue: foo: /foo/usr/bin/bar is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]



Enrico
-- 
SIGMA Chemnitz GmbH       Registergericht:   Amtsgericht Chemnitz HRB 1750
Am Erlenwald 13           Geschaeftsfuehrer: Grit Freitag, Frank Pyritz
09128 Chemnitz


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 15:43 ` Enrico Scholz
@ 2018-03-23 16:05   ` Burton, Ross
  2018-03-23 16:10     ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Burton, Ross @ 2018-03-23 16:05 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: OE-core

> Enrico Scholz
> <enrico.scholz-wttK6gPy29v+Hn7q9Vec/7NAH6kLmebB@public.gmane.org>
> writes:
>
>> in some packages (e.g. ncurses or glibc) I get package-qa warnings due
>> to host user contamination; e.g.
>
> can be reproduced by
>
> ---
> LICENSE = "closed"
>
> do_compile() {
>         echo foo > bar
> }
>
> do_install() {
>         install -D -p -m 0644 bar ${D}/bin/bar
>         install -d -m 0755 ${D}/usr/bin
>         mv ${D}/bin/bar ${D}/usr/bin/bar
>         ln -s /usr/bin/bar ${D}/bin/bar
> }
>
> FILES_${PN} = "/bin/* /usr/bin/*"
> ---
>
> WARNING: foo-1.0-r0 do_package_qa: QA Issue: foo: /foo/usr/bin/bar is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]

Works for me: install correctly changes owner to root:root as it should.

$ dpkg -c foo_1.0-r0_corei7-64.ipk
drwxrwxrwx root/root         0 2018-03-23 16:02 ./
drwxr-xr-x root/root         0 2018-03-23 16:02 ./bin/
lrwxrwxrwx root/root         0 2018-03-23 16:02 ./bin/bar -> /usr/bin/bar
drwxr-xr-x root/root         0 2018-03-23 16:02 ./usr/
drwxr-xr-x root/root         0 2018-03-23 16:02 ./usr/bin/
-rw-r--r-- root/root         4 2018-03-23 16:02 ./usr/bin/bar

Obviously the install/mv pair is pointless, and it's not conventional
to pass -p to preserve timestamps, but none of that should matter.

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 15:33 pseudo: host user contamination Enrico Scholz
  2018-03-23 15:43 ` Enrico Scholz
@ 2018-03-23 16:06 ` Burton, Ross
  1 sibling, 0 replies; 68+ messages in thread
From: Burton, Ross @ 2018-03-23 16:06 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: OE-core

Ah, I just read this other message of yours, which includes the fact
that you're on F27.

Yes, something in F27 has just broken pseudo quite dramatically.  No
idea what, so far.

Ross

On 23 March 2018 at 15:33, Enrico Scholz
<enrico.scholz@sigma-chemnitz.de> wrote:
> Hello,
>
> in some packages (e.g. ncurses or glibc) I get package-qa warnings due
> to host user contamination; e.g.
>
> | WARNING: ncurses-6.0+20170715-r0 do_package_qa: QA Issue: ncurses: /ncurses-terminfo-base/etc/terminfo/a/ansi is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination
> | ncurses: /ncurses-terminfo-base/etc/terminfo/d/dumb is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination
> | ...
> | ncurses: /ncurses-terminfo-base/etc/terminfo/x/xterm-256color is owned by uid 505, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]
>
>
> This is 100% reproducible by 'bitbake ncurses -c cleansstate && bitbake
> ncurses'.
>
>
> Pseudo log contains
>
> | debug_logfile: fd 2
> | pid 16096 [parent 16095], doing new pid setup and server start
> | Setup complete, sending SIGUSR1 to pid 16095.
> | path mismatch [2 links]: ino 382716451 db '/dev/shm/sem.mp-6eoy3v9n' req '/dev/shm/8JXYzj'.
> | inode mismatch: '..../ncurses/6.0+20170715-r0/image/usr/share/terminfo/a/ansi' ino 39261519 in db, 39262517 in request.
> | symlink mismatch: '..../ncurses/6.0+20170715-r0/image/usr/share/terminfo/a/ansi' [39261519] db mode 0100644, header mode 0120777 (unlinking db)
>
> for all the reported files.and a lot of
>
> | path mismatch [91 links]: ino 1605070 db ...
>
> like messages for unrelated files (e.g  sysroot components).
>
>
> Seen on recent rocko and master (1-2 weeks before).
>
> System is Fedora 27 in a docker container; rootfs is on an overlaysfs,
> recipes on a bind-mounted btrfs fs and build (TMPDIR) happens in an ext4
> fs.  SELinux is active.
>
>
>
> Is this issue known resp. does a solution exist?
>
>
>
> Enrico
> --
> SIGMA Chemnitz GmbH       Registergericht:   Amtsgericht Chemnitz HRB 1750
> Am Erlenwald 13           Geschaeftsfuehrer: Grit Freitag, Frank Pyritz
> 09128 Chemnitz
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:05   ` Burton, Ross
@ 2018-03-23 16:10     ` Enrico Scholz
  2018-03-23 16:17       ` Burton, Ross
  2018-03-23 16:28       ` Seebs
  0 siblings, 2 replies; 68+ messages in thread
From: Enrico Scholz @ 2018-03-23 16:10 UTC (permalink / raw)
  To: Burton, Ross; +Cc: OE-core

"Burton, Ross" <ross.burton@intel.com> writes:

>> do_install() {
>>         install -D -p -m 0644 bar ${D}/bin/bar
>>         install -d -m 0755 ${D}/usr/bin
>>         mv ${D}/bin/bar ${D}/usr/bin/bar
>>         ln -s /usr/bin/bar ${D}/bin/bar
>> }
>>
>> FILES_${PN} = "/bin/* /usr/bin/*"
>> ---
>>
>> WARNING: foo-1.0-r0 do_package_qa: QA Issue: foo: /foo/usr/bin/bar
>> is owned by uid 505, which is the same as the user running
>> bitbake. This may be due to host contamination
>> [host-user-contaminated]
>
> Works for me: install correctly changes owner to root:root as it
> should.

I think 'mv' is the culprit.  It calls 'renameat2()' directly via
'syscall()':

| $ ltrace mv foo bar
| ...
| syscall(316, 0xffffff9c, 0x7fff1564a341, 0xffffff9c)                                            = 0


Perhaps 'pseudo' does not catch this?
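
For readers decoding the ltrace line above: assuming the trace was taken on
x86-64, 316 is the number of the renameat2 system call, and 0xffffff9c is
-100, i.e. AT_FDCWD, printed as an unsigned 32-bit value. A minimal C sketch
of that reading follows; the helper name is_renameat2_on_cwd is made up for
illustration and is not part of ltrace, coreutils, or pseudo.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* AT_FDCWD */
#include <sys/syscall.h> /* SYS_renameat2 */

/*
 * Hypothetical helper: given the first two values ltrace printed for
 * syscall(), report whether they describe renameat2() relative to the
 * current working directory.
 */
int is_renameat2_on_cwd(long number, unsigned int dirfd_bits)
{
    /* ltrace printed the dirfd as raw 32-bit hex; reinterpret it as int. */
    int dirfd = (int)dirfd_bits;

    return number == SYS_renameat2 && dirfd == AT_FDCWD;
}
```

On x86-64, is_renameat2_on_cwd(316, 0xffffff9c) should be true; other
architectures use a different number for the same call.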


Enrico


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:10     ` Enrico Scholz
@ 2018-03-23 16:17       ` Burton, Ross
  2018-03-23 16:28       ` Seebs
  1 sibling, 0 replies; 68+ messages in thread
From: Burton, Ross @ 2018-03-23 16:17 UTC (permalink / raw)
  To: Enrico Scholz, Seebs; +Cc: OE-core

On 23 March 2018 at 16:10, Enrico Scholz
<enrico.scholz@sigma-chemnitz.de> wrote:
> "Burton, Ross" <ross.burton@intel.com> writes:
>
>>> do_install() {
>>>         install -D -p -m 0644 bar ${D}/bin/bar
>>>         install -d -m 0755 ${D}/usr/bin
>>>         mv ${D}/bin/bar ${D}/usr/bin/bar
>>>         ln -s /usr/bin/bar ${D}/bin/bar
>>> }
>>>
>>> FILES_${PN} = "/bin/* /usr/bin/*"
>>> ---
>>>
>>> WARNING: foo-1.0-r0 do_package_qa: QA Issue: foo: /foo/usr/bin/bar
>>> is owned by uid 505, which is the same as the user running
>>> bitbake. This may be due to host contamination
>>> [host-user-contaminated]
>>
>> Works for me: install correctly changes owner to root:root as it
>> should.
>
> I think, 'mv' is the culprit.  It calls 'renameat2()' directly over
> 'syscall()':
>
> | $ ltrace mv foo bar
> | ...
> | syscall(316, 0xffffff9c, 0x7fff1564a341, 0xffffff9c)                                            = 0
>
>
> Perhaps, 'pseudo' does not catch this?

I suspect that would do it as pseudo is basically a glorified LD_PRELOAD.  Sigh.
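
To illustrate why that matters, here is a minimal C sketch of the
interposition idea. It is not pseudo's code: the counter seen_renames and the
helper rename_bypassing_wrapper are invented for this example, and the wrapper
forwards to renameat() only to keep the sketch short, where a real preload
library would typically look up the next rename() with dlsym(). A rename()
defined this way sees calls that go through the libc function, but a call made
through syscall() never reaches it.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* AT_FDCWD */
#include <stdio.h>       /* rename(), renameat() declarations */
#include <sys/syscall.h> /* SYS_renameat2 */
#include <unistd.h>      /* syscall() */

/* Hypothetical bookkeeping: how many renames the wrapper has observed. */
int seen_renames = 0;

/*
 * Takes the place of libc's rename() for this program (or, when built as a
 * shared object and listed in LD_PRELOAD, for any dynamically linked one).
 * A fakeroot-like tool would update its ownership database here.
 */
int rename(const char *oldpath, const char *newpath)
{
    seen_renames++;
    return renameat(AT_FDCWD, oldpath, AT_FDCWD, newpath);
}

/*
 * The path taken by the newer mv: the request goes to the kernel through
 * the generic syscall() entry point, so the rename() wrapper above is
 * never called and never learns about the move.
 */
int rename_bypassing_wrapper(const char *oldpath, const char *newpath)
{
    return (int)syscall(SYS_renameat2, AT_FDCWD, oldpath,
                        AT_FDCWD, newpath, 0u);
}
```

After one rename() and one rename_bypassing_wrapper() call, seen_renames is 1,
not 2: the second move happened on disk but stayed invisible to the wrapper.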

CCing Peter, pseudo author.

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:10     ` Enrico Scholz
  2018-03-23 16:17       ` Burton, Ross
@ 2018-03-23 16:28       ` Seebs
  2018-03-23 16:30         ` Burton, Ross
  2018-03-27 14:42         ` Enrico Scholz
  1 sibling, 2 replies; 68+ messages in thread
From: Seebs @ 2018-03-23 16:28 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: OE-core

On Fri, 23 Mar 2018 17:10:35 +0100
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> I think, 'mv' is the culprit.  It calls 'renameat2()' directly over
> 'syscall()':
> 
> | $ ltrace mv foo bar
> | ...
> | syscall(316, 0xffffff9c, 0x7fff1564a341, 0xffffff9c)                                            = 0
> 
> 
> Perhaps, 'pseudo' does not catch this?

Yeah.

And so far as I know, it's not actually *possible* to in the general
case. I really don't think it's safe to try to catch syscall().

I was afraid someone would do this. (It also breaks most Go programs,
for similar reasons; no libc calls.)

I have no idea why they're doing that; it seems distinctly unsafe.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:28       ` Seebs
@ 2018-03-23 16:30         ` Burton, Ross
  2018-03-23 16:49           ` Seebs
  2018-03-27 14:42         ` Enrico Scholz
  1 sibling, 1 reply; 68+ messages in thread
From: Burton, Ross @ 2018-03-23 16:30 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On 23 March 2018 at 16:28, Seebs <seebs@seebs.net> wrote:
> On Fri, 23 Mar 2018 17:10:35 +0100
> Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:
>
>> I think, 'mv' is the culprit.  It calls 'renameat2()' directly over
>> 'syscall()':
>>
>> | $ ltrace mv foo bar
>> | ...
>> | syscall(316, 0xffffff9c, 0x7fff1564a341, 0xffffff9c)                                            = 0
>>
>>
>> Perhaps, 'pseudo' does not catch this?
>
> Yeah.
>
> And so far as I know, it's not actually *possible* to in the general
> case. I really don't think it's safe to try to catch syscall().
>
> I was afraid someone would do this. (It also breaks most Go programs,
> for similar reasons; no libc calls.)
>
> I have no idea why they're doing that; it seems distinctly unsafe.

Because in GNU's infinite wisdom they're using renameat2() to do
atomic renames in the mv command, and as renameat2 isn't in the
headers for F27 it just does a syscall directly. This is in upstream
coreutils so once they make a release, everyone gets it.
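
As a rough illustration of the pattern described here (not the actual
coreutils or gnulib source), a program can ask for a no-replace rename through
syscall() and fall back to older calls when the kernel or filesystem refuses.
The function name move_noreplace and the fallback policy are assumptions made
for this sketch; RENAME_NOREPLACE is the real Linux flag, defined locally only
when the headers do not provide it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stdio.h>       /* renameat() */
#include <sys/stat.h>    /* lstat() */
#include <sys/syscall.h> /* SYS_renameat2 */
#include <unistd.h>      /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0) /* same value as in the Linux UAPI headers */
#endif

/*
 * Hypothetical helper: move src to dst but refuse to overwrite dst.
 * Returns 0 on success, -1 with errno set on failure.
 */
int move_noreplace(const char *src, const char *dst)
{
    struct stat st;

    /* Preferred path: the kernel checks and renames in one atomic step. */
    if (syscall(SYS_renameat2, AT_FDCWD, src, AT_FDCWD, dst,
                RENAME_NOREPLACE) == 0)
        return 0;

    /* Any error other than "not supported here" is a real failure. */
    if (errno != ENOSYS && errno != EINVAL && errno != ENOTSUP)
        return -1;

    /* Fallback: check, then rename. This is racy, unlike the path above. */
    if (lstat(dst, &st) == 0) {
        errno = EEXIST;
        return -1;
    }
    if (errno != ENOENT)
        return -1;
    return renameat(AT_FDCWD, src, AT_FDCWD, dst);
}
```

Neither path goes through rename(), so a wrapper that only intercepts the
classic libc rename functions does not see the first one at all.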

This is really ruining my day.

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:30         ` Burton, Ross
@ 2018-03-23 16:49           ` Seebs
  2018-03-23 16:56             ` Burton, Ross
  2018-03-23 23:47             ` Richard Purdie
  0 siblings, 2 replies; 68+ messages in thread
From: Seebs @ 2018-03-23 16:49 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 16:30:55 +0000
"Burton, Ross" <ross.burton@intel.com> wrote:

> Because in GNU's infinite wisdom they're using renameat2() to do
> atomic renames in the mv command, and as renameat2 isn't in the
> headers for F27 it just does a syscall directly. This is in upstream
> coreutils so once they make a release, everyone gets it.

UGH.

I... am really unsure whether it's possible to catch that, because
I really, really, don't want to try to intercept raw syscall() calls.
I don't think that ends well.

I wonder if they can be persuaded to, you know, NOT use a syscall
directly when it's not in the system headers, on the grounds that the
system headers define the exported interface, and bypassing them is
almost certainly a very bad idea.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:49           ` Seebs
@ 2018-03-23 16:56             ` Burton, Ross
  2018-03-23 17:23               ` Seebs
  2018-03-23 23:47             ` Richard Purdie
  1 sibling, 1 reply; 68+ messages in thread
From: Burton, Ross @ 2018-03-23 16:56 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On 23 March 2018 at 16:49, Seebs <seebs@seebs.net> wrote:
> On Fri, 23 Mar 2018 16:30:55 +0000
> "Burton, Ross" <ross.burton@intel.com> wrote:
>
>> Because in GNU's infinite wisdom they're using renameat2() to do
>> atomic renames in the mv command, and as renameat2 isn't in the
>> headers for F27 it just does a syscall directly. This is in upstream
>> coreutils so once they make a release, everyone gets it.
>
> UGH.
>
> I... am really unsure whether it's possible to catch that, because
> I really, really, don't want to try to intercept raw syscall() calls.
> I don't think that ends well.
>
> I wonder if they can be persuaded to, you know, NOT use a syscall
> directly when it's not in the system headers, on the grounds that the
> system headers define the exported interface, and bypassing them is
> almost certainly a very bad idea.

Just chatting to the fakeroot maintainer now, as this is presumably
going to break the entire Debian build infrastructure when they get
the coreutils upgrade.  He isn't massively thrilled either.  They have
the option of just reverting these changes to coreutils though.

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:56             ` Burton, Ross
@ 2018-03-23 17:23               ` Seebs
  0 siblings, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-23 17:23 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 16:56:16 +0000
"Burton, Ross" <ross.burton@intel.com> wrote:

> On 23 March 2018 at 16:49, Seebs <seebs@seebs.net> wrote:
> > On Fri, 23 Mar 2018 16:30:55 +0000
> > "Burton, Ross" <ross.burton@intel.com> wrote:
> >
> >> Because in GNU's infinite wisdom they're using renameat2() to do
> >> atomic renames in the mv command, and as renameat2 isn't in the
> >> headers for F27 it just does a syscall directly. This is in
> >> upstream coreutils so once they make a release, everyone gets it.
> >
> > UGH.
> >
> > I... am really unsure whether it's possible to catch that, because
> > I really, really, don't want to try to intercept raw syscall()
> > calls. I don't think that ends well.
> >
> > I wonder if they can be persuaded to, you know, NOT use a syscall
> > directly when it's not in the system headers, on the grounds that
> > the system headers define the exported interface, and bypassing
> > them is almost certainly a very bad idea.
> 
> Just chatting to the fakeroot maintainer now, as this is presumably
> going to break the entire Debian build infrastructure when they get
> the coreutils upgrade.  He isn't massively thrilled either.  They have
> the option of just reverting these changes to coreutils though.

It's *possible* that there's a workaround, but I think realistically
the right answer is probably "yell at coreutils not to do that".

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 16:49           ` Seebs
  2018-03-23 16:56             ` Burton, Ross
@ 2018-03-23 23:47             ` Richard Purdie
  2018-03-23 23:56               ` Seebs
  1 sibling, 1 reply; 68+ messages in thread
From: Richard Purdie @ 2018-03-23 23:47 UTC (permalink / raw)
  To: Seebs, Burton, Ross; +Cc: Enrico Scholz, OE-core

On Fri, 2018-03-23 at 11:49 -0500, Seebs wrote:
> On Fri, 23 Mar 2018 16:30:55 +0000
> "Burton, Ross" <ross.burton@intel.com> wrote:
> 
> > 
> > Because in GNU's infinite wisdom they're using renameat2() to do
> > atomic renames in the mv command, and as renameat2 isn't in the
> > headers for F27 it just does a syscall directly. This is in
> > upstream coreutils so once they make a release, everyone gets it.
> UGH.
> 
> I... am really unsure whether it's possible to catch that, because
> I really, really, don't want to try to intercept raw syscall() calls.
> I don't think that ends well.

Just out of interest for my education, why is that a really bad idea?
Loops, e.g. with memory allocation issues?

Cheers,

Richard


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 23:47             ` Richard Purdie
@ 2018-03-23 23:56               ` Seebs
  2018-03-24  0:22                 ` Enrico Scholz
                                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Seebs @ 2018-03-23 23:56 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 23:47:30 +0000
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> On Fri, 2018-03-23 at 11:49 -0500, Seebs wrote:
> > On Fri, 23 Mar 2018 16:30:55 +0000
> > "Burton, Ross" <ross.burton@intel.com> wrote:
> > 
> > > 
> > > Because in GNU's infinite wisdom they're using renameat2() to do
> > > atomic renames in the mv command, and as renameat2 isn't in the
> > > headers for F27 it just does a syscall directly. This is in
> > > upstream coreutils so once they make a release, everyone gets it.
> > UGH.
> > 
> > I... am really unsure whether it's possible to catch that, because
> > I really, really, don't want to try to intercept raw syscall()
> > calls. I don't think that ends well.
> 
> Just out of interest for my education, why is that a really bad idea?
> Loops, e.g. with memory allocation issues?

Potentially. We rely pretty heavily on the assumption that an *actual*
syscall can go through.

Although... Actually, I don't even know if this is an actual syscall.
This could be an actual glibc wrapper around the syscall interface,
just like all the others, which is not the *actual* raw syscall or
whatever, and... I have no idea how often that is or isn't hit.

It's totally possible it would work, but basically, I have a pretty
good intuition of when something sounds brittle and error-prone, and
trying to trap syscall() sounds brittle and error-prone and might work
today but not next week...

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 23:56               ` Seebs
@ 2018-03-24  0:22                 ` Enrico Scholz
  2018-03-24  0:33                 ` Andre McCurdy
  2018-03-24 12:36                 ` Richard Purdie
  2 siblings, 0 replies; 68+ messages in thread
From: Enrico Scholz @ 2018-03-24  0:22 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

Seebs <seebs@seebs.net> writes:

>> > I... am really unsure whether it's possible to catch that, because
>> > I really, really, don't want to try to intercept raw syscall()
>> > calls. I don't think that ends well.
>
> Potentially. We rely pretty heavily on the assumption that an *actual*
> syscall can go through.

I think this would end up requiring architecture-dependent
assembly code.  E.g. for ARM you can write

----
syscall:
        cmp     r0, #__NR_renameat2
        beq     renameat2
        ldr     r12, _orig_syscall_addr
        ldr     pc, [r9, r12]

_orig_syscall_addr: .word       orig_syscall_addr
----

(Untested; the last three lines are probably wrong. They try to load the
original syscall() address from the variable in which it has been
stored.)


> Although... Actually, I don't even know if this is an actual syscall.
> This could be an actual glibc wrapper around the syscall interface,
> just like all the others, which is not the *actual* raw syscall or
> whatever, and... I have no idea how often that is or isn't hit.

'ltrace' catches it.



Enrico


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 23:56               ` Seebs
  2018-03-24  0:22                 ` Enrico Scholz
@ 2018-03-24  0:33                 ` Andre McCurdy
  2018-03-24  0:36                   ` Seebs
  2018-03-24 12:36                 ` Richard Purdie
  2 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24  0:33 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Fri, Mar 23, 2018 at 4:56 PM, Seebs <seebs@seebs.net> wrote:
> On Fri, 23 Mar 2018 23:47:30 +0000
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>> On Fri, 2018-03-23 at 11:49 -0500, Seebs wrote:
>> > On Fri, 23 Mar 2018 16:30:55 +0000
>> > "Burton, Ross" <ross.burton@intel.com> wrote:
>> > > Because in GNU's infinite wisdom they're using renameat2() to do
>> > > atomic renames in the mv command, and as renameat2 isn't in the
>> > > headers for F27 it just does a syscall directly. This is in
>> > > upstream coreutils so once they make a release, everyone gets it.
>> > UGH.
>> >
>> > I... am really unsure whether it's possible to catch that, because
>> > I really, really, don't want to try to intercept raw syscall()
>> > calls. I don't think that ends well.
>>
>> Just out of interest for my education, why is that a really bad idea?
>> Loops, e.g. with memory allocation issues?
>
> Potentially. We rely pretty heavily on the assumption that an *actual*
> syscall can go through.
>
> Although... Actually, I don't even know if this is an actual syscall.
> This could be an actual glibc wrapper around the syscall interface,
> just like all the others, which is not the *actual* raw syscall or
> whatever,

It looks like coreutils calls into gnulib, which calls the libc's
syscall wrapper:

  http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/renameat2.c;h=a295ec33f33dfe14e1d29cfae5d2c36e82d01ef4;hb=HEAD#l74

Interposing the libc syscall wrapper doesn't seem too scary if you can
transparently pass on everything apart from syscall(SYS_renameat2,
...) ?

> and... I have no idea how often that is or isn't hit.
>
> It's totally possible it would work, but basically, I have a pretty
> good intuition of when something sounds brittle and error-prone, and
> trying to trap syscall() sounds brittle and error-prone and might work
> today but not next week...


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24  0:33                 ` Andre McCurdy
@ 2018-03-24  0:36                   ` Seebs
  2018-03-24  1:10                     ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-24  0:36 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 17:33:28 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> Interposing the libc syscall wrapper doesn't seem too scary if you can
> transparently pass on everything apart from syscall(SYS_renameat2,
> ...) ?

I don't think I can in the general case; I don't know how many
arguments every possible syscall takes.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24  0:36                   ` Seebs
@ 2018-03-24  1:10                     ` Andre McCurdy
  2018-03-24  1:17                       ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24  1:10 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Fri, Mar 23, 2018 at 5:36 PM, Seebs <seebs@seebs.net> wrote:
> On Fri, 23 Mar 2018 17:33:28 -0700
> Andre McCurdy <armccurdy@gmail.com> wrote:
>
>> Interposing the libc syscall wrapper doesn't seem too scary if you can
>> transparently pass on everything apart from syscall(SYS_renameat2,
>> ...) ?
>
> I don't think I can in the general case; I don't know how many
> arguments every possible syscall takes.

The syscall wrapper in musl handles 6 additional arguments -
unconditionally. Based on that you might not need to interpret
anything - just extract 6 arguments and pass them on?

long syscall(long n, ...)
{
    va_list ap;
    syscall_arg_t a,b,c,d,e,f;
    va_start(ap, n);
    a=va_arg(ap, syscall_arg_t);
    b=va_arg(ap, syscall_arg_t);
    c=va_arg(ap, syscall_arg_t);
    d=va_arg(ap, syscall_arg_t);
    e=va_arg(ap, syscall_arg_t);
    f=va_arg(ap, syscall_arg_t);
    va_end(ap);
    return __syscall_ret(__syscall(n,a,b,c,d,e,f));
}
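
A hedged sketch of that idea in C: pull six long-sized arguments off the
va_list, divert only SYS_renameat2, and hand everything else to libc's
syscall() unchanged. The names forwarding_syscall, handle_renameat2 and
renameat2_calls_seen are hypothetical, and the function is deliberately not
called syscall so that it can be tried without LD_PRELOAD. Reading more
variadic arguments than the caller passed is undefined behaviour in ISO C, and
treating pointer arguments as long assumes an LP64-style ABI, so this relies
on the platform's calling convention rather than on the language.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* AT_FDCWD, open() */
#include <stdarg.h>
#include <sys/syscall.h> /* SYS_renameat2 */
#include <unistd.h>      /* syscall() */

/* Hypothetical counter so the effect of the hook can be observed. */
int renameat2_calls_seen = 0;

/*
 * Hypothetical hook for the one call we care about. A fakeroot-like tool
 * would update its file database here before or after the real rename.
 */
long handle_renameat2(long olddirfd, long oldpath,
                      long newdirfd, long newpath, long flags)
{
    renameat2_calls_seen++;
    return syscall(SYS_renameat2, olddirfd, oldpath,
                   newdirfd, newpath, flags);
}

/* Same shape as the musl wrapper: always extract six arguments. */
long forwarding_syscall(long number, ...)
{
    va_list ap;
    long a, b, c, d, e, f;

    va_start(ap, number);
    a = va_arg(ap, long);
    b = va_arg(ap, long);
    c = va_arg(ap, long);
    d = va_arg(ap, long);
    e = va_arg(ap, long);
    f = va_arg(ap, long);
    va_end(ap);

    if (number == SYS_renameat2)
        return handle_renameat2(a, b, c, d, e);

    /* Everything else is passed through without interpretation. */
    return syscall(number, a, b, c, d, e, f);
}
```

Passing all six arguments explicitly, padded with zeros, as in
forwarding_syscall(SYS_getpid, 0L, 0L, 0L, 0L, 0L, 0L), avoids the undefined
behaviour at the call site, but a drop-in replacement for syscall() cannot
demand that of its callers.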


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24  1:10                     ` Andre McCurdy
@ 2018-03-24  1:17                       ` Seebs
  2018-03-24  1:43                         ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-24  1:17 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 18:10:21 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> The syscall wrapper in musl handles 6 additional arguments -
> unconditionally. Based on that you might not need to interpret
> anything - just extract 6 arguments and pass them on?
> 
> long syscall(long n, ...)
> {
>     va_list ap;
>     syscall_arg_t a,b,c,d,e,f;
>     va_start(ap, n);
>     a=va_arg(ap, syscall_arg_t);
>     b=va_arg(ap, syscall_arg_t);
>     c=va_arg(ap, syscall_arg_t);
>     d=va_arg(ap, syscall_arg_t);
>     e=va_arg(ap, syscall_arg_t);
>     f=va_arg(ap, syscall_arg_t);
>     va_end(ap);
>     return __syscall_ret(__syscall(n,a,b,c,d,e,f));
> }

That is the sort of thing which *might* work, but which is potentially
subject to arch-specific calling conventions or strangeness.

It's worth a try, I guess? But I also think it may be worth just having
all the people maintaining stuff that expects this go yell at coreutils
about bad implementation choices, like "bypass libc to make raw
syscalls when you are not, in fact, implementing libc".

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24  1:17                       ` Seebs
@ 2018-03-24  1:43                         ` Andre McCurdy
  2018-03-24  2:44                           ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24  1:43 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Fri, Mar 23, 2018 at 6:17 PM, Seebs <seebs@seebs.net> wrote:
> On Fri, 23 Mar 2018 18:10:21 -0700
> Andre McCurdy <armccurdy@gmail.com> wrote:
>
>> The syscall wrapper in musl handles 6 additional arguments -
>> unconditionally. Based on that you might not need to interpret
>> anything - just extract 6 arguments and pass them on?
>>
>> long syscall(long n, ...)
>> {
>>     va_list ap;
>>     syscall_arg_t a,b,c,d,e,f;
>>     va_start(ap, n);
>>     a=va_arg(ap, syscall_arg_t);
>>     b=va_arg(ap, syscall_arg_t);
>>     c=va_arg(ap, syscall_arg_t);
>>     d=va_arg(ap, syscall_arg_t);
>>     e=va_arg(ap, syscall_arg_t);
>>     f=va_arg(ap, syscall_arg_t);
>>     va_end(ap);
>>     return __syscall_ret(__syscall(n,a,b,c,d,e,f));
>> }
>
> That is the sort of thing which *might* work, but which is potentially
> subject to arch-specific calling conventions or strangeness.
>
> It's worth a try, I guess? But I also think it may be worth just having
> all the people maintaining stuff that expects this go yell at coreutils
> about bad implementation choices, like "bypass libc to make raw
> syscalls when you are not, in fact, implementing libc".

Since glibc doesn't provide a wrapper for renameat2, making the
syscall via the libc syscall() API is exactly what coreutils (actually
gnulib) should be doing. There would certainly be grounds to complain
if user space code were making a syscall directly, but that's not
what's happening here - the syscall is still being made from within
libc.

Some more background here:

  https://lwn.net/Articles/655028/


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24  1:43                         ` Andre McCurdy
@ 2018-03-24  2:44                           ` Seebs
  0 siblings, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-24  2:44 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Fri, 23 Mar 2018 18:43:12 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> Since glibc doesn't provide a wrapper for renameat2, making the
> syscall via the libc syscall() API is exactly what coreutils (actually
> gnulib) should be doing. There would certainly be grounds to complain
> if user space code were making a syscall directly, but that's not
> what's happening here - the syscall is still being made from within
> libc.

Ahh, okay, that clears it up some.

I thought coreutils was making a direct invocation of a syscall rather
than using the generic syscall wrapper. I guess my thought is that if
glibc isn't providing a wrapper for a syscall, it's probably best to
avoid that syscall unless it's impossible to run without it. And since I
*think* someone may have implemented mv(1) in the past without
renameat2(), it seems to me that switching it to use a new syscall that
libc doesn't have a wrapper for yet is perhaps premature.

We've run into a lot of issues that were caused by
coreutils or something near it being very excited about switching to a
new, and preferably larger and more complicated, API. Usually to do
something that was already being done successfully and without problems;
for instance, the use of POSIX ACL xattrs to encode standard permission
bits even when no other ACL functionality is in use. I am not positively
impressed by this; if what you're doing can be done entirely with a
stable interface that hasn't changed in the last decade or two,
switching to a newer interface seems sort of counterproductive.

I have some concerns about the API for syscall(), and there's a lot of
behavior here that's potentially more-undefined-than-usual; for
instance, using va_arg to get more arguments than were actually passed.
But it may be we don't really have a way around it.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-23 23:56               ` Seebs
  2018-03-24  0:22                 ` Enrico Scholz
  2018-03-24  0:33                 ` Andre McCurdy
@ 2018-03-24 12:36                 ` Richard Purdie
  2018-03-24 15:12                   ` Seebs
  2018-03-24 17:10                   ` Burton, Ross
  2 siblings, 2 replies; 68+ messages in thread
From: Richard Purdie @ 2018-03-24 12:36 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Fri, 2018-03-23 at 18:56 -0500, Seebs wrote:
> On Fri, 23 Mar 2018 23:47:30 +0000
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> > > I... am really unsure whether it's possible to catch that,
> > > because
> > > I really, really, don't want to try to intercept raw syscall()
> > > calls. I don't think that ends well.
> > Just out of interest for my education, why is that a really bad
> > idea?
> > Loops, e.g. with memory allocation issues?
>
> Potentially. We rely pretty heavily on the assumption that an
> *actual* syscall can go through.
> 
> Although... Actually, I don't even know if this is an actual syscall.
> This could be an actual glibc wrapper around the syscall interface,
> just like all the others, which is not the *actual* raw syscall or
> whatever, and... I have no idea how often that is or isn't hit.
> 
> It's totally possible it would work, but basically, I have a pretty
> good intuition of when something sounds brittle and error-prone, and
> trying to trap syscall() sounds brittle and error-prone and might
> work today but not next week...

I do totally agree that this is into dangerous territory. That said, I
did want to understand what they've done here.

Checking on a f27 machine:

[rpurdie@fedora27 ~]$ objdump -T /bin/mv | grep sys
0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 syscall

and a quick look at the glibc source says there is a syscall()
function:

long syscall (syscall_number, arg1, arg2, arg3, arg4, arg5, arg6)

which whilst written in assembler, is a standard library function which
I believe coreutils is using.

I think, at least in principle, pseudo could wrap that and intercept
this particular syscall, check syscall_number (the numbering having its
own set of issues) and then only handle the specific problem case we
have.
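
As a rough illustration of that idea, here is a minimal sketch. It assumes
it is acceptable to fetch six long-sized variadic values and hand them
unchanged to libc's syscall(). The name wrapped_syscall is hypothetical and
this is not pseudo's code; a real LD_PRELOAD shim would define syscall()
itself and find the libc version via dlsym(RTLD_NEXT, "syscall").

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: refuse renameat2, forward everything else. */
long wrapped_syscall(long number, ...)
{
    va_list ap;
    long a, b, c, d, e, f;

#ifdef SYS_renameat2
    if (number == SYS_renameat2) {
        /* Pretend the kernel lacks renameat2 so the caller falls back. */
        errno = ENOSYS;
        return -1;
    }
#endif

    /*
     * Fetch six register-sized values whether or not the caller passed
     * that many. Strictly speaking this is undefined behaviour; it only
     * relies on how the platform's varargs convention happens to work.
     */
    va_start(ap, number);
    a = va_arg(ap, long);
    b = va_arg(ap, long);
    c = va_arg(ap, long);
    d = va_arg(ap, long);
    e = va_arg(ap, long);
    f = va_arg(ap, long);
    va_end(ap);

    /* Pass the values on in the same order, without interpreting them. */
    return syscall(number, a, b, c, d, e, f);
}
```

Whether this is safe on every architecture is a separate question; the
sketch only shows the shape of the check.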

The unfortunate reality is we will have to figure out some solution to
this as f27 is in the wild now. Explaining why this causes problems for
debian/yocto to the upstream is also obviously a good idea.

Cheers,

Richard



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 12:36                 ` Richard Purdie
@ 2018-03-24 15:12                   ` Seebs
  2018-03-24 17:10                   ` Burton, Ross
  1 sibling, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-24 15:12 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 12:36:28 +0000
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> I think, at least in principle, pseudo could wrap that and intercept
> this particular syscall, check syscall_number (the numbering having
> its own set of issues) and then only handle the specific problem case
> we have.

I think the problem is the lack of a generic mechanism for "oops
nevermind just pass the arguments along to the child".

There's actually a neat post in Go land pointing out a mechanism
for a somewhat-similar circumstance in which their compiler just
plain cheats, and does not actually do the full function call setup,
just leaves the stack in a place that works *as if* the parent had
called something else.
 
> The unfortunate reality is we will have to figure out some solution to
> this as f27 is in the wild now. Explaining why this causes problems
> for debian/yocto to the upstream is also obviously a good idea.

I would like to put in a (weak, and I know it's impractical) vote for
just labeling this a host system requirement, "don't use versions of
coreutils that jumped to a not-yet-fully-supported new system call."

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 12:36                 ` Richard Purdie
  2018-03-24 15:12                   ` Seebs
@ 2018-03-24 17:10                   ` Burton, Ross
  2018-03-24 17:23                     ` Seebs
  1 sibling, 1 reply; 68+ messages in thread
From: Burton, Ross @ 2018-03-24 17:10 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Enrico Scholz, OE-core

On 24 March 2018 at 12:36, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> I think, at least in principle, pseudo could wrap that and intercept
> this particular syscall, check syscall_number (the numbering having its
> own set of issues) and then only handle the specific problem case we
> have.

And to make things easier I think we could even just ENOTSUPP renameat2
in the short term (i.e. for 2.5), before looking at a more
comprehensive interception which could solve the Go issue.

> The unfortunate reality is we will have to figure out some solution to
> this as f27 is in the wild now. Explaining why this causes problems for
> debian/yocto to the upstream is also obviously a good idea.

I filed a bug with coreutils yesterday.  "Just intercept syscall()" they said.

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 17:10                   ` Burton, Ross
@ 2018-03-24 17:23                     ` Seebs
  2018-03-24 18:12                       ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-24 17:23 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 17:10:47 +0000
"Burton, Ross" <ross.burton@intel.com> wrote:

> On 24 March 2018 at 12:36, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > I think, at least in principle, pseudo could wrap that and intercept
> > this particular syscall, check syscall_number (the numbering having
> > its own set of issues) and then only handle the specific problem
> > case we have.
> 
> And to make things easier I think we could even just ENOTSUPP
> renameat2 in the short term (i.e. for 2.5), before looking at a more
> comprehensive interception which could solve the Go issue.

In the Go case, we would basically have to do something more like
debugger traps. They're not using libc *at all*, and unless something's
built with cgo or requires C-type libraries, it's not even going to be
dynamically linked. No dynamic linker => LD_PRELOAD is irrelevant.

> I filed a bug with coreutils yesterday.  "Just intercept syscall()"
> they said.

If they can describe a mechanism for intercepting syscall that they can
guarantee will work across all Linux architectures including possible
future architectures not yet in use, I'd love to know what it is.

See syscall(2) for some examples of the kinds of things that could be
concerns, such as the EABI calling convention. We can sort of hope for
the best if we just treat everything as a chain of unsigned longs, but
that's really *not* safe, and it should not be expected to work
reliably across architectures.

Honestly, reading it more closely, I don't think we can actually
produce behavior that precisely mimics the behavior of syscall() for
generic cases on architectures we currently run on. There's magic like
setting values in other registers, clobbering registers, and so on,
because *this function does not obey general architecture calling
conventions*. And if the wrapper does, the wrapper will break at least
some of the expected behaviors, by not behaving the same way.

Basically: I don't think we can promise that we will correctly pass
through both parameters to syscall() and returns from it on the existing
architectures we're actually running on today, for the whole set of
possible syscalls. So if we intercept syscall(), at least some
previously-valid programs break.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 17:23                     ` Seebs
@ 2018-03-24 18:12                       ` Andre McCurdy
  2018-03-24 18:22                         ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24 18:12 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 10:23 AM, Seebs <seebs@seebs.net> wrote:
> On Sat, 24 Mar 2018 17:10:47 +0000
> "Burton, Ross" <ross.burton@intel.com> wrote:
>> On 24 March 2018 at 12:36, Richard Purdie
>> <richard.purdie@linuxfoundation.org> wrote:
>> > I think, at least in principle, pseudo could wrap that and intercept
>> > this particular syscall, check syscall_number (the numbering having
>> > its own set of issues) and then only handle the specific problem
>> > case we have.
>>
>> And to make things easier I think we could even just ENOTSUPP
>> renameat2 in the short term (i.e. for 2.5)

If you can successfully intercept the libc syscall() API and return
ENOTSUPP for the one specific case of renameat2 but pass on all other
callers transparently then haven't you already solved the bulk of
the problem (for the non-Go case)?

Or are you suggesting unconditionally returning ENOTSUPP for every
syscall called via the libc syscall() API?

>>, before looking at a more comprehensive interception
>> which could solve the Go issue.
>
> In the Go case, we would basically have to do something more like
> debugger traps. They're not using libc *at all*, and unless something's
> built with cgo or requires C-type libraries, it's not even going to be
> dynamically linked. No dynamic linker => LD_PRELOAD is irrelevant.

Right, Go (and statically linked libc apps) are a completely different
problem and need a different solution.

>> I filed a bug with coreutils yesterday.  "Just intercept syscall()"
>> they said.
>
> If they can describe a mechanism for intercepting syscall that they can
> guarantee will work across all Linux architectures including possible
> future architectures not yet in use, I'd love to know what it is.

It's basically exactly what the musl syscall() wrapper does, i.e. fetch
6 register-sized varargs values from the caller and pass them on in the
same order to the next syscall().

  http://git.musl-libc.org/cgit/musl/tree/src/misc/syscall.c

> See syscall(2) for some examples of the kinds of things that could be
> concerns, such as the EABI calling convention. We can sort of hope for
> the best if we just treat everything as a chain of unsigned longs, but
> that's really *not* safe, and it should not be expected to work
> reliably across architectures.

None of that matters if you don't need to interpret the arguments -
you just need to pass them on in the same order you received them.

> Honestly, reading it more closely, I don't think we can actually
> produce behavior that precisely mimics the behavior of syscall() for
> generic cases on architectures we currently run on. There's magic like
> setting values in other registers, clobbering registers, and so on,
> because *this function does not obey general architecture calling
> conventions*. And if the wrapper does, the wrapper will break at least
> some of the expected behaviors, by not behaving the same way.

I don't see any evidence to support all this doom and gloom, but if
there is a corner case which fails then it will also fail when running
on musl - so at least you won't be debugging it on your own :-)

> Basically: I don't think we can promise that we will correctly pass
> through both parameters to syscall() and returns from it on the existing
> architectures we're actually running on today, for the whole set of
> possible syscalls. So if we intercept syscall(), at least some
> previously-valid programs break.

Since the libc syscall() API is only going to be used for syscalls
which don't have a libc wrapper, it's unlikely to get used very much.
The LWN article has a list of the potential syscalls which are likely
to come via this API and it's not a long list. In practice, getrandom
and renameat2 may be the only syscalls which currently get called that
way - and getrandom now has a wrapper in libc so over time that will
migrate away from using syscall().

  https://lwn.net/Articles/655028/


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 18:12                       ` Andre McCurdy
@ 2018-03-24 18:22                         ` Seebs
  2018-03-24 18:59                           ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-24 18:22 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 11:12:20 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> If you can successfully intercept the libc syscall() API and return
> ENOTSUPP for the one specific case of renameat2 but pass on all other
> callers transparently then haven't you already solved the bulk of
> the problem (for the non-Go case)?

> Or are you suggesting unconditionally returning ENOTSUPP for every
> syscall called via the libc syscall() API?

I have no idea what happens if we try to ENOTSUPP things. I would be
wary of it.

> It's basically exactly what the musl syscall() wrapper does, i.e. fetch
> 6 register-sized varargs values from the caller and pass them on in the
> same order to the next syscall().

Yes. That might well work in most cases?

> None of that matters if you don't need to interpret the arguments -
> you just need to pass them on in the same order you received them.

That is not necessarily true in every ABI.

But I'm more concerned about the comments about *returns* and special
register usage.
 
> I don't see any evidence to support all this doom and gloom, but if
> there is a corner case which fails then it will also fail when running
> on musl - so at least you won't be debugging it on your own :-)

Possibly? I've never encountered musl, and don't know how broad their
coverage is.

But I'm put in mind of a quote from some years back:

"This guy I know says you can't just carry the ball in basketball, but
I got a basketball and tried it, and it worked fine."

I'm not saying this will necessarily fail immediately. I'm saying
there's nothing even *like* a guarantee that it will work, or that it
will keep working. And I am concerned about the fairly unbounded
possible risk cases.

Every other function we wrap *has* a meaningful prototype, with
arguments in a knowable order. But look at the EABI example; if we want
to actually *process* renameat2(), it's not enough to pass arguments on
blindly, we have to be sure we know exactly which arguments are which.
If the pointers can be 64-bit pointers, though, that may mean that
we have to expect the incoming arguments to be (olddirfd, 0, oldpathhi,
oldpathlo, newdirfd, 0, newpathhi, newpathlo, flags), but only on some
architectures, where others would use (olddirfd, oldpath, newdirfd,
newpath, flags).

Do we have any 64-bit pointer EABI? Heck if I know, I don't track that,
because this is the only time I can think of where I'd have had to.

> Since the libc syscall() API is only going to be used for syscalls
> which don't have a libc wrapper,

I'm not sure I'd bet on that. If it exists, people will sometimes use
it, and sometimes use it poorly.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 18:22                         ` Seebs
@ 2018-03-24 18:59                           ` Andre McCurdy
  2018-03-24 19:24                             ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24 18:59 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 11:22 AM, Seebs <seebs@seebs.net> wrote:
>
> Every other function we wrap *has* a meaningful prototype, with
> arguments in a knowable order. But look at the EABI example; if we want
> to actually *process* renameat2(), it's not enough to pass arguments on
> blindly, we have to be sure we know exactly which arguments are which.
> If the pointers can be 64-bit pointers, though, that may mean that
> we have to expect the incoming arguments to be (olddirfd, 0, oldpathhi,
> oldpathlo, newdirfd, 0, newpathhi, newpathlo, flags), but only on some
> architectures, where others would use (olddirfd, oldpath, newdirfd,
> newpath, flags).

The EABI example applies to 64bit values on 32bit architectures. Since
pointers are 32bit values on 32bit architectures the example doesn't
apply to renameat2 (which only passes ints and pointers - nothing
which would be a 64bit value on a 32bit architecture), i.e. there is
never any "oldpathhi, oldpathlo", only "oldpath", etc.
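
To make the distinction concrete, here is a small sketch of the splitting
that only 64-bit arguments need. The struct and function names are
hypothetical; the readahead form in the comment is how I recall the
example in syscall(2) for little-endian ARM EABI, so check the man page
before relying on it.

```c
#include <stdint.h>

/* Two 32-bit halves of a 64-bit value, as they would occupy a register pair. */
struct reg_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Split a 64-bit value (e.g. an off64_t offset) into its two halves. */
static struct reg_pair split_u64(uint64_t value)
{
    struct reg_pair p;

    p.lo = (uint32_t)(value & 0xFFFFFFFFu);
    p.hi = (uint32_t)(value >> 32);
    return p;
}

/*
 * On little-endian ARM EABI the man page's readahead example has the form:
 *
 *     syscall(SYS_readahead, fd, 0, p.lo, p.hi, count);
 *
 * The literal 0 is padding so the 64-bit offset starts in an even
 * register. Pointers and ints are a single register wide on such a
 * system, so renameat2's arguments need no such treatment.
 */
```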

If there _were_ some architecture dependent ordering of the renameat2
arguments passed into the syscall() ABI then it would have to be
reflected in the code which originally calls syscall(), e.g. the code
in gnulib.

  http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/renameat2.c;h=a295ec33f33dfe14e1d29cfae5d2c36e82d01ef4;hb=HEAD#l74
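
For reference, a caller of this kind looks roughly like the sketch below.
It is not the gnulib code: rename_with_fallback is a hypothetical helper,
and the set of errno values that trigger the fallback is an assumption.
The point is only that the caller fixes the argument order at the
syscall() call site.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>        /* renameat() */
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: try renameat2 via syscall(), else use renameat. */
int rename_with_fallback(const char *oldpath, const char *newpath)
{
#ifdef SYS_renameat2
    /* Arguments go in the kernel's documented order; flags is 0 here. */
    long rc = syscall(SYS_renameat2, AT_FDCWD, oldpath,
                      AT_FDCWD, newpath, 0u);

    /* Anything other than "not available" is the final answer. */
    if (rc == 0 || (errno != ENOSYS && errno != EINVAL))
        return (int)rc;
#endif
    /* Plain renameat() has had a libc wrapper for a long time. */
    return renameat(AT_FDCWD, oldpath, AT_FDCWD, newpath);
}
```

A wrapper that answers ENOSYS for renameat2 would push a caller written
this way onto the renameat() path.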


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 18:59                           ` Andre McCurdy
@ 2018-03-24 19:24                             ` Seebs
  2018-03-24 19:42                               ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-24 19:24 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 11:59:27 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> The EABI example applies to 64bit values on 32bit architectures. Since
> pointers are 32bit values on 32bit architectures the example doesn't
> apply to renameat2 (which only passes ints and pointers - nothing
> which would be a 64bit value on a 32bit architecture), i.e. there is
> never any "oldpathhi, oldpathlo", only "oldpath", etc.

I didn't see a qualifier about it being only on a 32-bit architecture,
it just says "EABI".

But in general, this is the reason that musl's ability to work doesn't
buy us guarantees; musl doesn't have to *interpret* the arguments. So
for instance, they could just pass "the same arguments" for
SYS_readahead, we couldn't. (If we needed it, which I don't think we
do.)

Similarly, they don't have to do Fancy Complicated Fixups around their
system calls which can break weird register conventions. Consider:

> >        On a few architectures, a register is used to indicate
> > simple boolean failure of the system call:  ia64
> > uses r10 for this purpose, and mips uses a3.

I have no evidence that these registers are being reliably saved
through all the *other* code pseudo has in and around wrapper
functions. By the time we get into wrap_foo(), we've already done a ton
of other things, including made system calls, and we make *more* system
calls on the way back out.

So to handle that reliably, we'd need an extra special fancy wrapper
function which bypasses everything for all the non-renameat2 cases,
and even then, what would we do for renameat2? I can't write C code
which stashes r10-on-ia64-or-a3-on-mips on its way out of the wrapper,
then restores them after everything else is done.

I can try writing the code, and it might well work, but I want to make
it clear that this is a case where there's no guarantees at *all* that
any code that can be written in remotely-sane C without assembly can
actually do the thing correctly. The musl code is not trying to deal
with a case where there's multiple other syscalls after the real
syscall has been called, but before returning.

This is not "we definitely can't", this is warning that there's reason
to think it may not work, or may require a great deal more magic than
anything else does.

> If there _were_ some architecture dependent ordering of the renameat2
> arguments passed into the syscall() ABI then it would have to be
> reflected in the code which originally calls syscall(), e.g. the code
> in gnulib.

Yeah. But about all my crystal ball will tell me about possible future
changes or syscalls or ABI choices is that there might be some.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 19:24                             ` Seebs
@ 2018-03-24 19:42                               ` Andre McCurdy
  2018-03-24 19:50                                 ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24 19:42 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 12:24 PM, Seebs <seebs@seebs.net> wrote:
>
> I didn't see a qualifier about it being only on a 32-bit architecture,
> it just says "EABI".
>
> But in general, this is the reason that musl's ability to work doesn't
> buy us guarantees; musl doesn't have to *interpret* the arguments. So
> for instance, they could just pass "the same arguments" for
> SYS_readahead, we couldn't. (If we needed it, which I don't think we
> do.)

Right. The musl example is to show how it's possible to transparently
intercept and pass on any call to the syscall() ABI without
interpreting anything.

> Similarly, they don't have to do Fancy Complicated Fixups around their
> system calls which can break weird register conventions. Consider:
>
>> >        On a few architectures, a register is used to indicate
>> > simple boolean failure of the system call:  ia64
>> > uses r10 for this purpose, and mips uses a3.

Those details are all taken care of within the libc implementation of
syscall(). It's not something we need to care about at all in a
wrapper for it.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 19:42                               ` Andre McCurdy
@ 2018-03-24 19:50                                 ` Seebs
  2018-03-24 20:12                                   ` Victor Kamensky
                                                     ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Seebs @ 2018-03-24 19:50 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 12:42:45 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> Right. The musl example is to show how it's possible to transparently
> intercept and pass on any call to the syscall() ABI without
> interpreting anything.

Yes, if you don't need to interpret things, and aren't making
additional other unrelated system calls after doing so.

> Those details are all taken care of within the libc implementation of
> syscall(). It's not something we need to care about at all in a
> wrapper for it.

I don't think that's correct.

musl's call sequence:
	real_syscall() // sets a3
	return

pseudo's call sequence:
	various_setup()
	real_syscall() // sets a3
	other system calls // also set a3
	return

In the case where pseudo is actually *disabled*, we just return
right away after the real call. In every other case, we're making
other calls some of which imply system calls, and those system calls
could potentially overwrite things that the libc implementation of
syscall took care of. (Mutex and signal mask operations.)

So for that to work, I would in principle have to stash the value
stored in, for instance, "a3", wait until after the other system calls,
and then restore it. Unless *only* syscall() itself actually sets
that register, and other system calls don't, and nothing else is
using it either.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 19:50                                 ` Seebs
@ 2018-03-24 20:12                                   ` Victor Kamensky
  2018-03-24 23:04                                     ` Burton, Ross
  2018-03-24 20:22                                   ` Joshua Watt
  2018-03-24 20:27                                   ` Andre McCurdy
  2 siblings, 1 reply; 68+ messages in thread
From: Victor Kamensky @ 2018-03-24 20:12 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

Here is another crazy idea for how to deal with it, just
brainstorming what options are on the table: disable
renameat2 with the help of seccomp and force coreutils to
use other calls. Something along the lines of what was
suggested with intercepting the syscall() function call, but
letting the kernel do the interception work.

Here is a tiny example based on what I learned today about
seccomp and BPF filters; it shows how, on my FC27, filtering out
renameat2 forces coreutils mv to use other calls to do the job.

[kamensky@coreos-lnx2 bpf]$ cat filterout_renameat2.c
#include <stddef.h>
#include <linux/unistd.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <sys/prctl.h>
#include <errno.h>

#define syscall_nr (offsetof(struct seccomp_data, nr))

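struct sock_filter filterout_renameat2[] = {
     /* Load the syscall number from struct seccomp_data. */
     BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
     /* renameat2: fall through; anything else: skip one instruction. */
     BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_renameat2, 0, 1),
     /* Fail renameat2 with ENOSYS instead of running it. */
     BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ERRNO + ENOSYS),
     /* Allow every other syscall. */
     BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
};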

struct sock_fprog filterout_renameat2_prog = {
     .len = (unsigned short)(sizeof(filterout_renameat2) /
                             sizeof(filterout_renameat2[0])),
     .filter = filterout_renameat2,
};

int disable_renameat2_syscall (void)
{
     int err;
     err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
     if (!err) {
         err = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER,
                     &filterout_renameat2_prog);
     }

     return err;
}
[kamensky@coreos-lnx2 bpf]$ cat norenameat2.c
#include <unistd.h>
#include <stdio.h>

int disable_renameat2_syscall (void);

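int main(int argc, char **argv)
{
     int err = 0;

     /* Need at least one argument: the command to run under the filter. */
     if (argc < 2) {
         fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
         return 2;
     }

     err = disable_renameat2_syscall();
     if (err) {
         perror("disable_renameat2_syscall");
     }

     /* The seccomp filter stays in place across execve(). */
     execvp(argv[1], &argv[1]);
     /* Only reached if execvp() itself failed. */
     perror("execvp");
     return 1;
}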
[kamensky@coreos-lnx2 bpf]$ gcc -o norenameat2 norenameat2.c filterout_renameat2.c 
[kamensky@coreos-lnx2 bpf]$ mkdir foo
[kamensky@coreos-lnx2 bpf]$ strace -o ./trace.mv.txt -f mv foo bar
[kamensky@coreos-lnx2 bpf]$ grep rename ./trace.mv.txt
2218  renameat2(AT_FDCWD, "foo", AT_FDCWD, "bar", 0) = 0
[kamensky@coreos-lnx2 bpf]$ rm -r -f bar
[kamensky@coreos-lnx2 bpf]$ mkdir foo
[kamensky@coreos-lnx2 bpf]$ strace -o ./trace.norenameat2.mv.txt -f ./norenameat2 mv foo bar
[kamensky@coreos-lnx2 bpf]$ grep rename ./trace.norenameat2.mv.txt
2228  execve("./norenameat2", ["./norenameat2", "mv", "foo", "bar"], 0x7ffd16d930e0 /* 37 vars */) = 0
2228  renameat2(AT_FDCWD, "foo", AT_FDCWD, "bar", 0) = -1 ENOSYS (Function not implemented)
2228  renameat(AT_FDCWD, "foo", AT_FDCWD, "bar") = 0
[kamensky@coreos-lnx2 bpf]$

Thanks,
Victor



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 19:50                                 ` Seebs
  2018-03-24 20:12                                   ` Victor Kamensky
@ 2018-03-24 20:22                                   ` Joshua Watt
  2018-03-24 21:01                                     ` Seebs
  2018-03-24 20:27                                   ` Andre McCurdy
  2 siblings, 1 reply; 68+ messages in thread
From: Joshua Watt @ 2018-03-24 20:22 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 2:50 PM, Seebs <seebs@seebs.net> wrote:
> On Sat, 24 Mar 2018 12:42:45 -0700
> Andre McCurdy <armccurdy@gmail.com> wrote:
>
>> Right. The musl example is to show how it's possible to transparently
>> intercept and pass on any call to the syscall() ABI without
>> interpreting anything.
>
> Yes, if you don't need to interpret things, and aren't making
> additional other unrelated system calls after doing so.
>
>> Those details are all taken care of within the libc implementation of
>> syscall(). It's not something we need to care about at all in a
>> wrapper for it.
>
> I don't think that's correct.
>
> musl's call sequence:
>         real_syscall() // sets a3
>         return
>
> pseudo's call sequence:
>         various_setup()
>         real_syscall() // sets a3
>         other system calls // also set a3
>         return
>
> In the case where pseudo is actually *disabled*, we just return
> right away after the real call. In every other case, we're making
> other calls some of which imply system calls, and those system calls
> could potentially overwrite things that the libc implementation of
> syscall took care of. (Mutex and signal mask operations.)
>
> So for that to work, I would in principle have to stash the value
> stored in, for instance, "a3", wait until after the other system calls,
> and then restore it. Unless *only* syscall() itself actually sets
> that register, and other system calls don't, and nothing else is
> using it either.

I don't think that is true. libc's syscall() must conform to the *C*
ABI for the system... if the kernel does things that aren't in line
with the C ABI (like return things in registers that aren't expected,
fail to preserve registers that require preservation, or whatever),
wouldn't the libc syscall() be *required* to paper over it so that it
looks like a valid C call? Otherwise, it could never be safely called
from C code.

So as long as pseudo's replacement of syscall() conformed to the C
ABI, and pseudo calls the libc syscall() (which conforms to the C ABI)
as the real syscall, I think everything should be OK.

That of course doesn't deal with the reentrancy, signal masks, mutexes,
etc. IMHO, the number of syscalls we would actually consider doing
this for is necessarily pretty limited, so perhaps it would just need
some careful evaluation?



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 19:50                                 ` Seebs
  2018-03-24 20:12                                   ` Victor Kamensky
  2018-03-24 20:22                                   ` Joshua Watt
@ 2018-03-24 20:27                                   ` Andre McCurdy
  2 siblings, 0 replies; 68+ messages in thread
From: Andre McCurdy @ 2018-03-24 20:27 UTC (permalink / raw)
  To: Seebs; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 12:50 PM, Seebs <seebs@seebs.net> wrote:
>
> pseudo's call sequence:
>         various_setup()
>         real_syscall() // sets a3
>         other system calls // also set a3
>         return

You don't need to know that the kernel returns a result in any
particular register. The libc syscall() internal implementation will
take care of collecting the result from the kernel and returning it
via the normal function call ABI. Therefore pseudo's call sequence
becomes something like:

        various_setup()
        temp = real_syscall()
        other system calls
        return temp
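
A minimal C sketch of that sequence, assuming the follow-up work may
overwrite errno. The names wrapped_call() and other_system_calls() are
hypothetical stand-ins and not pseudo's actual code. Only the C-level
result (return value plus errno) is preserved here; any
architecture-specific register state is not.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for the wrapper's later bookkeeping. The access()
 * call fails on purpose, so errno is overwritten with ENOENT. */
static void other_system_calls(void)
{
    (void)access("/nonexistent/path/used/only/for/this/demo", F_OK);
}

/* Perform the real call, remember its C-level result, do the extra work,
 * then hand the remembered result back to the caller. */
long wrapped_call(long number, long arg1)
{
    long temp = syscall(number, arg1);   /* temp = real_syscall() */
    int saved_errno = errno;             /* errno is part of the result */

    other_system_calls();                /* may overwrite errno */

    errno = saved_errno;                 /* restore before returning */
    return temp;                         /* return temp */
}
```

With this shape, wrapped_call(SYS_close, -1) still reports EBADF to its
caller even though the intermediate access() failed with ENOENT.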


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 20:22                                   ` Joshua Watt
@ 2018-03-24 21:01                                     ` Seebs
  0 siblings, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-24 21:01 UTC (permalink / raw)
  To: Joshua Watt; +Cc: Enrico Scholz, OE-core

On Sat, 24 Mar 2018 15:22:47 -0500
Joshua Watt <jpewhacker@gmail.com> wrote:

> I don't think that is true. libc's syscall() must conform to the *C*
> ABI for the system... if the kernel does things that aren't in line
> with the C ABI (like return things in registers that aren't expected,
> fail to preserve registers that require preservation, or whatever),
> wouldn't the libc syscall() be *required* to paper over it so that it
> looks like a valid C call? Otherwise, it could never be safely called
> from C code.

I think this is only partially true. There are extra warnings in
syscall(2) about weird kinds of non-conformance with the usual ABI,
like the magic for 64-bit values on EABI (or possibly 32-bit EABI), and
I think the point about the extra registers or possible
register-smashing is just "at this point, you get the behavior of the
actual syscall, which may violate the ABI."

And yeah, it returns correctly, but if code's written to interact with
what syscall() is *supposed* to do, and we trample that in some way,
that's potentially bad.

I honestly have no idea how much scope there is for weird problems,
or whether I'm reading too much into the man page. But there are comments
in the man page that seem like very strange things to say if libc's syscall
really is just hiding all this complexity. Why on earth would the man
page need to mention those things? What is their relevance, if the
implementation covers all of that?

Practical answer: I'm probably going to attempt the thing, with
the first pass being (1) implement a wrapper for renameat2(), (2)
implement a wrapper for syscall, (3) try to change syscall's behavior
only in the case where the call is renameat2, (4) make this available
and let people try it on a variety of architectures and hope for the
best.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 20:12                                   ` Victor Kamensky
@ 2018-03-24 23:04                                     ` Burton, Ross
  2018-03-25  0:09                                       ` Victor Kamensky
  0 siblings, 1 reply; 68+ messages in thread
From: Burton, Ross @ 2018-03-24 23:04 UTC (permalink / raw)
  To: Victor Kamensky; +Cc: Enrico Scholz, OE-core

On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
> Here is another crazy idea how to deal with it, just
> brainstorming what options are on the table: disable
> renameat2 with help of seccomp and force coreutils to
> use other calls. Something along the lines that were
> suggested with intercept of syscall function call, but
> let kernel to do interception work.

Wow, that's impressively magic.  Does this depend on kernel options or
specific recent versions?

Ross


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-24 23:04                                     ` Burton, Ross
@ 2018-03-25  0:09                                       ` Victor Kamensky
  2018-03-25  2:43                                         ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Victor Kamensky @ 2018-03-25  0:09 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Enrico Scholz, OE-core



On Sat, 24 Mar 2018, Burton, Ross wrote:

> On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
>> Here is another crazy idea how to deal with it, just
>> brainstorming what options are on the table: disable
>> renameat2 with help of seccomp and force coreutils to
>> use other calls. Something along the lines that were
>> suggested with intercept of syscall function call, but
>> let kernel to do interception work.
>
> Wow, that's impressively magic.  Does this depend on kernel options or
> specific recent versions?

Not very recent, but relatively modern. As far as I can tell from the
kernel code, seccomp syscall BPF filtering [1] was introduced in 2012,
in the 3.5 kernel, by the Chromium project guys.

It is controlled by CONFIG_SECCOMP_FILTER which depends on
HAVE_ARCH_SECCOMP_FILTER that all major CPU architectures
do support by now. And I think CONFIG_SECCOMP_FILTER should
be set for all major cases - AFAIK chrome browser uses it
as one of its sandboxing mechanisms.

But you are right, if any code would use it, it needs to
check whether usable seccomp syscall filtering is present
on the system.
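
A rough C sketch of the idea, for illustration only: a seccomp BPF
filter that makes renameat2 fail with ENOSYS and allows everything
else, tried out in a forked child. The function names
deny_renameat2() and renameat2_blocked_in_child() are hypothetical.
The sketch assumes an unprivileged process and skips the
seccomp_data.arch check that a real filter should have. A -1 return
covers the "seccomp filtering is not usable here" case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Install a filter that fails renameat2 with ENOSYS and allows the rest.
 * Returns 0 on success, -1 if the filter could not be installed.
 * Simplification: a production filter should also check seccomp_data.arch,
 * because syscall numbers differ between ABIs. */
int deny_renameat2(void)
{
    struct sock_filter insns[] = {
        /* A = syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* if (A == renameat2) run the next insn, else skip one insn */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_renameat2, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Needed so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}

/* Try the filter in a child so the calling process stays unfiltered.
 * Returns 1 if renameat2 failed with ENOSYS under the filter, 0 if it did
 * not, and -1 if the filter was unavailable or the child did not run. */
int renameat2_blocked_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (deny_renameat2() != 0)
            _exit(2);
        long r = syscall(SYS_renameat2, (long)AT_FDCWD, "demo-src",
                         (long)AT_FDCWD, "demo-dst", 0L);
        _exit((r == -1 && errno == ENOSYS) ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Without the filter the same call on nonexistent paths would fail with
ENOENT, so ENOSYS in the child shows that the filter answered the call.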

[1] https://github.com/torvalds/linux/blob/master/Documentation/userspace-api/seccomp_filter.rst

Thanks,
Victor

> Ross
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-25  0:09                                       ` Victor Kamensky
@ 2018-03-25  2:43                                         ` Andre McCurdy
  2018-03-25  5:37                                           ` Victor Kamensky
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-25  2:43 UTC (permalink / raw)
  To: Victor Kamensky; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 5:09 PM, Victor Kamensky <kamensky@cisco.com> wrote:
> On Sat, 24 Mar 2018, Burton, Ross wrote:
>> On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
>>>
>>> Here is another crazy idea how to deal with it, just
>>> brainstorming what options are on the table: disable
>>> renameat2 with help of seccomp and force coreutils to
>>> use other calls. Something along the lines that were
>>> suggested with intercept of syscall function call, but
>>> let kernel to do interception work.
>>
>> Wow, that's impressively magic.  Does this depend on kernel options or
>> specific recent versions?

Yeah, it's impressive but perhaps overkill for this situation.

Having the kernel run a BPF script on every syscall is going to have a
much bigger performance impact than intercepting one specific libc
function in user space.

Also, AFAIK, seccomp can't be nested - so building within an
environment which has already been secured with seccomp (e.g. recent
versions of docker?) might be a problem if pseudo starts to rely on
seccomp too.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-25  2:43                                         ` Andre McCurdy
@ 2018-03-25  5:37                                           ` Victor Kamensky
  2018-03-25  7:05                                             ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Victor Kamensky @ 2018-03-25  5:37 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Enrico Scholz, OE-core



On Sat, 24 Mar 2018, Andre McCurdy wrote:

> On Sat, Mar 24, 2018 at 5:09 PM, Victor Kamensky <kamensky@cisco.com> wrote:
>> On Sat, 24 Mar 2018, Burton, Ross wrote:
>>> On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
>>>>
>>>> Here is another crazy idea how to deal with it, just
>>>> brainstorming what options are on the table: disable
>>>> renameat2 with help of seccomp and force coreutils to
>>>> use other calls. Something along the lines that were
>>>> suggested with intercept of syscall function call, but
>>>> let kernel to do interception work.
>>>
>>> Wow, that's impressively magic.  Does this depend on kernel options or
>>> specific recent versions?
>
> Yeah, it's impressive but perhaps overkill for this situation.
>
> Having the kernel run a BPF script on every syscall is going to have a
> much bigger performance impact than intercepting one specific libc
> function in user space.

I don't think we should worry about overhead in pseudo case.

> Also, AFAIK, seccomp can't be nested - so building within an
> environment which has already been secured with seccomp (e.g. recent
> versions of docker?) might be a problem if pseudo starts to rely on
> seccomp too.

Above is true. It was on my mind.

Note that I have no problem whatsoever if you can intercept the syscall
function correctly. Intercepting functions is definitely more
aligned with what pseudo does. I was just listing other
possible options.

But please note that the syscall function takes a variable number of
arguments, and in general a variadic function cannot hand its
arguments on to another variadic function (here, the real syscall
implementation). One would need a complementary vsyscall function
taking a va_list, i.e. like printf and vprintf.

Please see http://c-faq.com/varargs/handoff.html

But maybe something special can be done for syscall case.
Disclaimer: I did not read full thread, maybe you already
discussed this.

Thanks,
Victor


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-25  5:37                                           ` Victor Kamensky
@ 2018-03-25  7:05                                             ` Andre McCurdy
  2018-03-26 18:49                                               ` Andreas Müller
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-25  7:05 UTC (permalink / raw)
  To: Victor Kamensky; +Cc: Enrico Scholz, OE-core

On Sat, Mar 24, 2018 at 10:37 PM, Victor Kamensky <kamensky@cisco.com> wrote:
> On Sat, 24 Mar 2018, Andre McCurdy wrote:
>> On Sat, Mar 24, 2018 at 5:09 PM, Victor Kamensky <kamensky@cisco.com>
>> wrote:
>>> On Sat, 24 Mar 2018, Burton, Ross wrote:
>>>> On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
>>>>> Here is another crazy idea how to deal with it, just
>>>>> brainstorming what options are on the table: disable
>>>>> renameat2 with help of seccomp and force coreutils to
>>>>> use other calls. Something along the lines that were
>>>>> suggested with intercept of syscall function call, but
>>>>> let kernel to do interception work.
>>>>
>>>> Wow, that's impressively magic.  Does this depend on kernel options or
>>>> specific recent versions?
>>
>> Yeah, it's impressive but perhaps overkill for this situation.
>>
>> Having the kernel run a BPF script on every syscall is going to have a
>> much bigger performance impact than intercepting one specific libc
>> function in user space.
>
> I don't think we should worry about overhead in pseudo case.
>
>> Also, AFAIK, seccomp can't be nested - so building within an
>> environment which has already been secured with seccomp (e.g. recent
>> versions of docker?) might be a problem if pseudo starts to rely on
>> seccomp too.
>
> Above is true. It was on my mind.
>
> Note I have no problem whatsoever if you can intercept syscall
> function correctly. Function intercepting way is definitely more
> aligned with what pseudo does. I was just listing other
> possible options.
>
> But please note syscall function takes a
> variable number of arguments and call another variable
> number of argument function, real syscall implementation, in
> general, cannot be done. One would need to have complimentary
> vsyscall function taking va_list. I.e like printf and vprintf.
>
> Please see http://c-faq.com/varargs/handoff.html
>
> But maybe something special can be done for syscall case.
> Disclaimer: I did not read full thread, maybe you already
> discussed this.

Yes, I think it's already been covered in the thread. Although the
libc syscall() function takes a variable number of arguments, it's
known that there are a maximum of 6 of them and they are all of a data
type which fits into the register size of the target architecture (ie
"long" for most 32bit and 64bit targets, "long long" for x32 etc).
Therefore it's possible to extract them from the va_args created by
the caller into 6 temporary variables and then pass those variables on
when calling the real libc syscall() function. ie we don't actually
need to pass the original caller's va_args on to the real syscall()
function - we just need to pass on all the arguments.

There's some concern that unconditionally extracting 6 arguments when
the caller may have supplied fewer than that could be problematic.
However, there's code in both glibc and musl which does exactly that,
so I'm inclined to think it's OK in practice. The worst that can
happen would seem to be passing some extra junk values to a syscall in
the kernel which is going to ignore them.
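
As a sketch of that approach in C (forward_syscall() is a hypothetical
name, and the code assumes "long" is as wide as a register, which does
not hold on x32): reading more variadic arguments than the caller
passed is formally undefined behaviour in ISO C, so this relies on the
"OK in practice" argument above rather than on the standard.

```c
#define _GNU_SOURCE
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pull six register-sized values out of the caller's variadic arguments
 * and pass all six on to the real libc syscall(). Slots the caller did
 * not fill hold junk that the kernel is expected to ignore. */
long forward_syscall(long number, ...)
{
    va_list ap;
    long a[6];

    va_start(ap, number);
    for (int i = 0; i < 6; i++)
        a[i] = va_arg(ap, long);
    va_end(ap);

    return syscall(number, a[0], a[1], a[2], a[3], a[4], a[5]);
}
```

A call such as forward_syscall(SYS_getpid, 0L, 0L, 0L, 0L, 0L, 0L)
stays within defined behaviour because all six slots are supplied.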


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-25  7:05                                             ` Andre McCurdy
@ 2018-03-26 18:49                                               ` Andreas Müller
  2018-03-26 19:31                                                 ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andreas Müller @ 2018-03-26 18:49 UTC (permalink / raw)
  To: OE-core

On Sun, Mar 25, 2018 at 9:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
> On Sat, Mar 24, 2018 at 10:37 PM, Victor Kamensky <kamensky@cisco.com> wrote:
>> On Sat, 24 Mar 2018, Andre McCurdy wrote:
>>> On Sat, Mar 24, 2018 at 5:09 PM, Victor Kamensky <kamensky@cisco.com>
>>> wrote:
>>>> On Sat, 24 Mar 2018, Burton, Ross wrote:
>>>>> On 24 March 2018 at 20:12, Victor Kamensky <kamensky@cisco.com> wrote:
>>>>>> Here is another crazy idea how to deal with it, just
>>>>>> brainstorming what options are on the table: disable
>>>>>> renameat2 with help of seccomp and force coreutils to
>>>>>> use other calls. Something along the lines that were
>>>>>> suggested with intercept of syscall function call, but
>>>>>> let kernel to do interception work.
>>>>>
>>>>> Wow, that's impressively magic.  Does this depend on kernel options or
>>>>> specific recent versions?
>>>
>>> Yeah, it's impressive but perhaps overkill for this situation.
>>>
>>> Having the kernel run a BPF script on every syscall is going to have a
>>> much bigger performance impact than intercepting one specific libc
>>> function in user space.
>>
>> I don't think we should worry about overhead in pseudo case.
>>
>>> Also, AFAIK, seccomp can't be nested - so building within an
>>> environment which has already been secured with seccomp (e.g. recent
>>> versions of docker?) might be a problem if pseudo starts to rely on
>>> seccomp too.
>>
>> Above is true. It was on my mind.
>>
>> Note I have no problem whatsoever if you can intercept syscall
>> function correctly. Function intercepting way is definitely more
>> aligned with what pseudo does. I was just listing other
>> possible options.
>>
>> But please note syscall function takes a
>> variable number of arguments and call another variable
>> number of argument function, real syscall implementation, in
>> general, cannot be done. One would need to have complimentary
>> vsyscall function taking va_list. I.e like printf and vprintf.
>>
>> Please see http://c-faq.com/varargs/handoff.html
>>
>> But maybe something special can be done for syscall case.
>> Disclaimer: I did not read full thread, maybe you already
>> discussed this.
>
> Yes, I think it's already been covered in the thread. Although the
> libc syscall() function takes a variable number of arguments, it's
> known that there are a maximum of 6 of them and they are all of a data
> type which fits into the register size of the target architecture (ie
> "long" for most 32bit and 64bit targets, "long long" for x32 etc).
> Therefore it's possible to extract them from the va_args created by
> the caller into 6 temporary variables and then pass those variables on
> when calling the real libc syscall() function. ie we don't actually
> need to pass the original caller's va_args on to the real syscall()
> function - we just need to pass on all the arguments.
>
> There's some concern that unconditionally extracting 6 arguments when
> the caller may have supplied less than that could be problematic.
> However, there's code in both glibc and musl which does exactly that,
> so I'm inclined to think it's OK in practice. The worst that can
> happen would seem to be passing some extra junk values to a syscall in
> the kernel which is going to ignore them.
> --
FWIW: All my build machines are affected by this issue. As temporary
workaround I downgraded coreutils-8.27-20.fc27 by

dnf install coreutils-8.27-16.fc27

Now images seem to build again without floods of host contamination.
Have no idea for how long downgrade is possible...

Interesting background: mv/renameat2 change seemed so important for
Fedora that they backported the changes into 8.27.

Andreas


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-26 18:49                                               ` Andreas Müller
@ 2018-03-26 19:31                                                 ` Seebs
  2018-03-26 20:12                                                   ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-26 19:31 UTC (permalink / raw)
  To: Andreas Müller; +Cc: OE-core

On Mon, 26 Mar 2018 20:49:30 +0200
Andreas Müller <schnitzeltony@gmail.com> wrote:

> Interesting background: mv/renameat2 change seemed so important for
> Fedora that they backported the changes into 8.27.

It looks like the reason for this is the RENAME_NOREPLACE flag, which
avoids a possible race condition.

FWIW, I've traded a couple of emails with the coreutils people, and I
think at this point I'm going to try a custom wrapper for syscall that
just yields ENOTSUPP, because any attempt to do something fancier
seems like it's going to be potentially error-prone.
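
A minimal sketch of such a wrapper body, with assumptions stated up
front: filtered_syscall6() is a hypothetical name, the six arguments
are taken as already extracted, and the sketch uses ENOSYS rather than
the errno named above, on the assumption that callers treat "syscall
not implemented" as the cue to fall back to older calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Refuse renameat2 outright and pass every other call through untouched.
 * Assumption: ENOSYS is what makes the caller fall back to rename(). */
long filtered_syscall6(long number, long a1, long a2, long a3,
                       long a4, long a5, long a6)
{
#ifdef SYS_renameat2
    if (number == SYS_renameat2) {
        errno = ENOSYS;
        return -1;
    }
#endif
    return syscall(number, a1, a2, a3, a4, a5, a6);
}
```

Nothing runs after the real call in the pass-through case, so there is
no later work that could disturb its result.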

Since the man page gave the ia64 example, I went and checked, and it
is indeed the case that calls other than syscall(2) will clobber r10
after system calls, so it's actually not possible for a C wrapper to
do what we want on an intercepted syscall.

Luckily for everyone, no one actually cares about ia64, aka Itanic.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-26 19:31                                                 ` Seebs
@ 2018-03-26 20:12                                                   ` Andre McCurdy
  2018-03-26 21:07                                                     ` Seebs
  2018-03-27 13:06                                                     ` Enrico Scholz
  0 siblings, 2 replies; 68+ messages in thread
From: Andre McCurdy @ 2018-03-26 20:12 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Mon, Mar 26, 2018 at 12:31 PM, Seebs <seebs@seebs.net> wrote:
> On Mon, 26 Mar 2018 20:49:30 +0200
> Andreas Müller <schnitzeltony@gmail.com> wrote:
>
>> Interesting background: mv/renameat2 change seemed so important for
>> Fedora that they backported the changes into 8.27.
>
> It looks like the reason for this is the RENAME_NOREPLACE flag, which
> avoids a possible race condition.
>
> FWIW, I've traded a couple of emails with the coreutils people, and I
> think at this point I'm going to try a custom wrapper for syscall that
> just yields ENOTSUPP, because any attempt to do something fancier
> seems like it's going to be potentially error-prone.
>
> Since the man page gave the ia64 example, I went and checked, and it
> is indeed the case that calls other than syscall(2) will clobber r10
> after system calls, so it's actually not possible for a C wrapper to
> do what we want on an intercepted syscall.

That's based on your assumption that a C wrapper needs to care about
results in architecture specific registers, which I contend is not a
correct interpretation of the syscall manpage.

Did you find any evidence to support your interpretation? e.g. Did you
find any examples of callers to the libc syscall() API which use
architecture specific assembler to examine the result of the syscall?

The gnulib code calling syscall(SYS_renameat2, ...) certainly doesn't
do that - it just checks the C function return value and errno. Since
there's no architecture specific code to examine the syscall() result,
do you expect coreutils mv to now incorrectly detect errors on ia64?
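
For illustration, a caller of that kind might look like the sketch
below. This is not the gnulib code. rename_noreplace() is a
hypothetical helper and the fallback path is a deliberately simple,
racy check-then-rename. The point is only that the caller looks at
nothing beyond the C return value and errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)   /* value from the Linux UAPI headers */
#endif

/* Rename src to dst without replacing an existing dst.
 * Returns 0 on success, -1 with errno set on failure. */
int rename_noreplace(const char *src, const char *dst)
{
    struct stat st;

#ifdef SYS_renameat2
    long r = syscall(SYS_renameat2, (long)AT_FDCWD, src,
                     (long)AT_FDCWD, dst, (long)RENAME_NOREPLACE);
    /* Only the return value and errno are examined. */
    if (r == 0 || (errno != ENOSYS && errno != EINVAL))
        return (int)r;
#endif

    /* Fallback when the call or the flag is unsupported. This is racy:
     * dst could appear between the lstat() and the rename(). */
    if (lstat(dst, &st) == 0) {
        errno = EEXIST;
        return -1;
    }
    if (errno != ENOENT)
        return -1;
    return rename(src, dst);
}
```

If an interposed syscall() answers ENOSYS for renameat2, this caller
simply takes the fallback path.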


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-26 20:12                                                   ` Andre McCurdy
@ 2018-03-26 21:07                                                     ` Seebs
  2018-03-27  1:10                                                       ` Andre McCurdy
  2018-03-27 13:06                                                     ` Enrico Scholz
  1 sibling, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-26 21:07 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Mon, 26 Mar 2018 13:12:44 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> That's based on your assumption that a C wrapper needs to care about
> results in architecture specific registers, which I contend is not a
> correct interpretation of the syscall manpage.

My observation is: If this doesn't matter, why is glibc doing it? It
seems really weird to mention this thing, and bother doing it, if it
*never* matters. So possibly it does matter. Sometimes. When?

I don't feel comfortable assuming I understand the code if it's doing
something like that, I can't see when it would affect anything, and the
code hasn't been removed to improve performance. I'd be a lot more
comfortable disregarding the weird return values and register
specifications if I could look at real-world examples of how that
information is used.

> Did you find any evidence to support your interpretation? e.g. Did you
> find any examples of callers to the libc syscall() API which use
> architecture specific assembler to examine the result of the syscall?

I have seen exactly one use of syscall() in the wild at all, that being
the recent addition to coreutils.

The evidence for my interpretation that you *could* need to know about
arch-specific behavior is the EABI example, which clearly indicates the
*possibility* that code in C has to care about architectural variance
in non-obvious ways. I don't know what ways those might be, and this
call is so rarely used that I'm not sure it would be reasonable to
generalize about it. (I also don't know whether it's still true on
64-bit ARM, and whether it also applies to pointer values or only to
integer values, or...)

> The gnulib code calling syscall(SYS_renameat2, ...) certainly doesn't
> do that - it just checks the C function return value and errno. Since
> there's no architecture specific code to examine the syscall() result,
> do you expect coreutils mv to now incorrectly detect errors on ia64?

No. But I don't know whether *anyone* is using syscall(), other than
this one single example I've seen identified. I also don't know how
widely tested the code in question is, or on what architectures.

Testing on non-x86 architectures has often been sporadic in the open
source community, and I think that's improved, but I don't know how
much. If I'm looking at something that's almost never used, and I don't
have specific information that the existing usage is being fully tested
on "obscure" targets (such as mips, arm, etc), I am going to be at
least a little distrustful.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-26 21:07                                                     ` Seebs
@ 2018-03-27  1:10                                                       ` Andre McCurdy
  2018-03-27  1:32                                                         ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27  1:10 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

I've verified that the QA warnings on FC27 can be fixed by preloading
a wrapper for libc syscall(). Just a proof of concept but if anyone
would like to reproduce it my steps are below.


$ cat ../meta/recipes-core/mvtest/mvtest.bb
LICENSE = "CLOSED"

do_compile() {
        echo foo > bar
}

do_install() {
        install -D -p -m 0644 bar ${D}/bin/bar
        install -d -m 0755 ${D}/usr/bin
        mv ${D}/bin/bar ${D}/usr/bin/bar
        ln -s /usr/bin/bar ${D}/bin/bar
}

FILES_${PN} = "/bin/* /usr/bin/*"

$ bitbake mvtest
WARNING: Host distribution "fedora-27" has not been validated with this version of the build system; you may possibly experience unexpected failures. It is recommended that you use a tested distribution.
Parsing recipes: 100% |#######################################################| Time: 0:00:49
Parsing of 815 .bb files complete (0 cached, 815 parsed). 1282 targets, 45 skipped, 0 masked, 0 errors.
NOTE: Resolving any missing task queue dependencies

Build Configuration:
BB_VERSION           = "1.37.0"
BUILD_SYS            = "x86_64-linux"
NATIVELSBSTRING      = "universal"
TARGET_SYS           = "i586-poky-linux"
MACHINE              = "qemux86"
DISTRO               = "poky"
DISTRO_VERSION       = "2.4+snapshot-20180327"
TUNE_FEATURES        = "m32 i586"
TARGET_FPU           = ""
meta
meta-poky
meta-yocto-bsp       = "master:80c7ca2c28959d08a59d960d318d8360392bd488"

Initialising tasks: 100% |####################################################| Time: 0:00:00
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
WARNING: mvtest-1.0-r0 do_package_qa: QA Issue: mvtest: /mvtest/usr/bin/bar is owned by uid 1000, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]
NOTE: Tasks Summary: Attempted 470 tasks of which 456 didn't need to be rerun and all succeeded.

Summary: There were 2 WARNING messages shown.

$ git clone https://github.com/armcc/interposers.git
Cloning into 'interposers'...
remote: Counting objects: 12, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 12 (delta 0), reused 5 (delta 0), pack-reused 6
Unpacking objects: 100% (12/12), done.
$ make -C interposers/
make: Entering directory '/home/vagrant/poky/build/interposers'
gcc -O2 -Wl,-O1 -Wl,--hash-style=gnu -fPIC -Wall -Werror -std=c99 -shared -o libinterpose_memcpy.so memcpy.c -ldl
gcc -O2 -Wl,-O1 -Wl,--hash-style=gnu -fPIC -Wall -Werror -std=c99 -shared -o libinterpose_syscall.so syscall.c -ldl
make: Leaving directory '/home/vagrant/poky/build/interposers'

$ sudo mkdir /usr/lib/interposers
$ sudo cp interposers/libinterpose_syscall.so /usr/lib/interposers
$ sudo cp interposers/interpose_syscall /usr/local/bin/
$ sudo cp interposers/mv_wrapper /usr/local/bin/mv
$ ln -sf /usr/local/bin/interpose_syscall tmp/hosttools/interpose_syscall
$ ln -sf /usr/local/bin/mv tmp/hosttools/mv

$ bitbake mvtest -c cleanall

$ bitbake mvtest
WARNING: Host distribution "fedora-27" has not been validated with this version of the build system; you may possibly experience unexpected failures. It is recommended that you use a tested distribution.
Loading cache: 100% |#########################################################| Time: 0:00:00
Loaded 1282 entries from dependency cache.
NOTE: Resolving any missing task queue dependencies

Build Configuration:
BB_VERSION           = "1.37.0"
BUILD_SYS            = "x86_64-linux"
NATIVELSBSTRING      = "universal"
TARGET_SYS           = "i586-poky-linux"
MACHINE              = "qemux86"
DISTRO               = "poky"
DISTRO_VERSION       = "2.4+snapshot-20180327"
TUNE_FEATURES        = "m32 i586"
TARGET_FPU           = ""
meta
meta-poky
meta-yocto-bsp       = "master:80c7ca2c28959d08a59d960d318d8360392bd488"

Initialising tasks: 100% |####################################################| Time: 0:00:00
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
NOTE: Tasks Summary: Attempted 470 tasks of which 456 didn't need to be rerun and all succeeded.

Summary: There was 1 WARNING message shown.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-27  1:10                                                       ` Andre McCurdy
@ 2018-03-27  1:32                                                         ` Seebs
  2018-03-27  1:34                                                           ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27  1:32 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Mon, 26 Mar 2018 18:10:07 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> I've verified that the QA warnings on FC27 can be fixed by preloading
> a wrapper for libc syscall(). Just a proof of concept but if anyone
> would like to reproduce it my steps are below.

Yes, I think we were expecting it would work on x86, where the ABI is
trivial and friendly?

I remain interested in why the glibc implementation does all these
weird things on some architectures if none of those things matter.

-s


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-27  1:32                                                         ` Seebs
@ 2018-03-27  1:34                                                           ` Andre McCurdy
  2018-03-27  2:07                                                             ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27  1:34 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Mon, Mar 26, 2018 at 6:32 PM, Seebs <seebs@seebs.net> wrote:
> On Mon, 26 Mar 2018 18:10:07 -0700
> Andre McCurdy <armccurdy@gmail.com> wrote:
>
>> I've verified that the QA warnings on FC27 can be fixed by preloading
>> a wrapper for libc syscall(). Just a proof of concept but if anyone
>> would like to reproduce it my steps are below.
>
> Yes, I think we were expecting it would work on x86, where the ABI is
> trivial and friendly?
>
> I remain interested in why the glibc implementation does all these
> weird things on some architectures if none of those things matter.

Which glibc implementation? I'll take a look.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: pseudo: host user contamination
  2018-03-27  1:34                                                           ` Andre McCurdy
@ 2018-03-27  2:07                                                             ` Seebs
  2018-03-27  2:59                                                               ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27  2:07 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Mon, 26 Mar 2018 18:34:10 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> > I remain interested in why the glibc implementation does all these
> > weird things on some architectures if none of those things matter.
> 
> Which glibc implementation? I'll take a look.

syscall(2) for various architectures, which is actually implementing
all this fancy ABI stuff. If that doesn't matter, why's it there?

I think we may be talking past each other. I'm not looking for "I tried
this once on one architecture and it worked." I'm looking for a good
enough understanding of *why* all these things are in the man page, and
when they might matter, that I can reasonably predict whether this will
work on lots of other platforms, and continue to work in the future.

Pseudo is already way off in the weeds, but it mostly works, and the
reason it mostly works is that I try to find out why things are the way
they are rather than disregarding them. (And I'm thinking I should
possibly add an is-syscall flag to wrappers, and then have those
wrappers check returns and recreate the success/fail state right before
actually returning.)

-s



* Re: pseudo: host user contamination
  2018-03-27  2:07                                                             ` Seebs
@ 2018-03-27  2:59                                                               ` Andre McCurdy
  2018-03-27  4:41                                                                 ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27  2:59 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Mon, Mar 26, 2018 at 7:07 PM, Seebs <seebs@seebs.net> wrote:
>
>>> I remain interested in why the glibc implementation does all these
>>> weird things on some architectures if none of those things matter.
>>
>> Which glibc implementation? I'll take a look.
>
> syscall(2) for various architectures, which is actually implementing
> all this fancy ABI stuff. If that doesn't matter, why's it there?

The syscall manpage is from the kernel manpages, not glibc.

  http://man7.org/linux/man-pages/man2/syscall.2.html

I agree it's a bit weird that it contains a description of the
kernel's syscall calling conventions, but perhaps that's a historical
leftover from an original document which described kernel internals?
Either way I think it's useful background information.

> I think we may be talking past each other.

Well, the good news is I'm almost done talking :-)

> I'm not looking for "I tried
> this once on one architecture and it worked." I'm looking for a good
> enough understanding of *why* all these things are in the man page, and
> when they might matter, that I can reasonably predict whether this will
> work on lots of other platforms, and continue to work in the future.

I'm not sure I can help you with your understanding.

Personally, I've read the manpage, I've read code in glibc and musl,
I've straced coreutils mv and various little test programs on 32bit
ARM plus 32bit and 64bit x86 and written a wrapper for libc syscall()
which either intercepts or passes through syscalls. Everything I've
found seems to be consistent to the point that I've satisfied myself
that I have a pretty clear understanding of how libc syscall() works,
including why ARM EABI sometimes needs an extra argument to offset
64bit values - and when it matters for a wrapper and when it doesn't.
I don't think there's much more I can do.



* Re: pseudo: host user contamination
  2018-03-27  2:59                                                               ` Andre McCurdy
@ 2018-03-27  4:41                                                                 ` Seebs
  2018-03-27 19:11                                                                   ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27  4:41 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Mon, 26 Mar 2018 19:59:09 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> The syscall manpage is from the kernel manpages, not glibc.

>   http://man7.org/linux/man-pages/man2/syscall.2.html

And yet! glibc is setting those registers in its code. Why? If that's a
kernel thing and libc doesn't need to do it, why is libc doing it?

If it's "useful background information", what exactly is it "useful"
for?

> Personally, I've read the manpage, I've read code in glibc and musl,
> I've straced coreutils mv and various little test programs on 32bit
> ARM plus 32bit and 64bit x86 and written a wrapper for libc syscall()
> which either intercepts or passes through syscalls.

Okay, you've read the code in glibc and understand it. So, why does the
glibc code have that register-setting assembly, if that register-setting
assembly doesn't matter?

You've told me several times that we don't need to think about the
register-setting code. So why did glibc include it?

> Everything I've
> found seems to be consistent to the point that I've satisfied myself
> that I have a pretty clear understanding of how libc syscall() works,
> including why ARM EABI sometimes needs an extra argument to offset
> 64bit values - and when it matters for a wrapper and when it doesn't.
> I don't think there's much more I can do.

Okay, you say you understand why ARM EABI "sometimes" needs an argument
to offset things. What are the circumstances? Is it specific to 32-bit
targets? On a target with 64-bit pointers, would it apply also to
64-bit pointers, or is it exclusively for 64-bit integers?

Because it seems to me that on a 64-bit target, renameat2() would in
fact be passing a 64-bit object as the second argument. And if there's
a reason that this doesn't count as a 64-bit argument passed after an
odd number of 32-bit arguments, I'd like to know specifically what that
reason is before I go relying on it to stay true forever.

-s



* Re: pseudo: host user contamination
  2018-03-26 20:12                                                   ` Andre McCurdy
  2018-03-26 21:07                                                     ` Seebs
@ 2018-03-27 13:06                                                     ` Enrico Scholz
  2018-03-27 15:50                                                       ` Seebs
  1 sibling, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 13:06 UTC (permalink / raw)
  To: openembedded-core, Seebs

Andre McCurdy <armccurdy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
writes:

>> Since the man page gave the ia64 example, I went and checked, and it
>> is indeed the case that calls other than syscall(2) will clobber r10
>> after system calls,

I think you are misinterpreting the man-page.  In "Architecture
calling conventions" it documents the calling convention into the
kernel.  syscall(2) itself is an ordinary function which has to
follow the userspace ABI; after jumping into the kernel and setting
'errno' in error case, it restores registers as needed.

Some ABIs allow functions to clobber registers (they are not restored
after leaving the function and do not carry a return value); e.g. on
ARM, these are r0-r3 and r12.  That's probably the case for r10 in ia64
too.



Enrico



* Re: pseudo: host user contamination
  2018-03-23 16:28       ` Seebs
  2018-03-23 16:30         ` Burton, Ross
@ 2018-03-27 14:42         ` Enrico Scholz
  2018-03-27 15:55           ` Seebs
  1 sibling, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 14:42 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

Seebs <seebs@seebs.net> writes:

> And so far as I know, it's not actually *possible* to in the general
> case. I really don't think it's safe to try to catch syscall().

I think something like

----
static void (*orig_syscall)();
long syscall(long number, ...)
{
	switch (number) {
	case __NR_renameat2: return _renameat2_syscall(.......);
	}

	void *res = __builtin_apply(orig_syscall, __builtin_apply_args(),
				    sizeof(uintmax_t) * 7);

	__builtin_return(res);
}
----

will work to wrap syscall(2).  Params for _renameat2_syscall() can be
extracted by va_args.


The code generated for the above is very inefficient; perhaps you can
create specialized assembly instructions which just jump into orig_syscall.


Enrico



* Re: pseudo: host user contamination
  2018-03-27 13:06                                                     ` Enrico Scholz
@ 2018-03-27 15:50                                                       ` Seebs
  2018-03-27 16:26                                                         ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 15:50 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: openembedded-core

On Tue, 27 Mar 2018 15:06:40 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> Andre McCurdy <armccurdy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> writes:
> 
> >> Since the man page gave the ia64 example, I went and checked, and
> >> it is indeed the case that calls other than syscall(2) will
> >> clobber r10 after system calls,
> 
> I think you are misinterpreting the man-page.  In "Architecture
> calling conventions" it documents the calling convention into the
> kernel.  syscall(2) itself is an ordinary function which has to
> follow the userspace ABI; after jumping into the kernel and setting
> 'errno' in error case, it restores registers as needed.

I don't think this is what it's talking about.

> Some ABIs allow functions to clobber registers (they are not restored
> after leaving the function and do not carry a return value); e.g. on
> ARM, these are r0-r3 and r12.  That's probably the case for r10 in
> ia64 too.

Maybe you missed the previous message where I pointed out that this
behavior is, at least on MIPS, an explicit step taken by glibc's
syscall implementation (and many other system calls).

So, no matter what the kernel's internal syscall behavior does, *after*
the syscall has returned, glibc is checking whether a syscall returned
-1, and setting a register based on that. This isn't a generic clobber;
this is an explicitly specified value that the register shall have
after the completion of the call, which glibc is implementing in code.

And we don't actually know why, because as Andre has pointed out, if
you don't do that, nothing obvious breaks in the test cases we've
tried. (Admittedly, I don't think we've tried on any of the
architectures where such a convention exists.)

-s



* Re: pseudo: host user contamination
  2018-03-27 14:42         ` Enrico Scholz
@ 2018-03-27 15:55           ` Seebs
  2018-03-27 16:35             ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 15:55 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: OE-core

On Tue, 27 Mar 2018 16:42:03 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> will work to wrap syscall(2).  Params for _renameat2_syscall() can be
> extracted by va_args.

Does anyone have access to an actual 64-bit EABI ARM system to verify
the argument passing for renameat2 there?
 
> The code generated for the above is very inefficient; perhaps you can
> create specialized assembly instructions which just jump into orig_syscall.

I do not think I want to start adding assembly to pseudo, because I do
not feel like learning assembly for all the architectures we currently
run on. (Pseudo is, I believe, known to work across x86, PPC, MIPS, and
ARM, in both 32-bit and 64-bit variants, also weird stuff like the x32
variant of x86.)

-s



* Re: pseudo: host user contamination
  2018-03-27 15:50                                                       ` Seebs
@ 2018-03-27 16:26                                                         ` Enrico Scholz
  2018-03-27 16:46                                                           ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 16:26 UTC (permalink / raw)
  To: Seebs; +Cc: openembedded-core

Seebs <seebs@seebs.net> writes:

>> >> Since the man page gave the ia64 example, I went and checked, and
>> >> it is indeed the case that calls other than syscall(2) will
>> >> clobber r10 after system calls,
>> 
>> I think you are misinterpreting the man-page.  In "Architecture
>> calling conventions" it documents the calling convention into the
>> kernel.  syscall(2) itself is an ordinary function which has to
>> follow the userspace ABI; after jumping into the kernel and setting
>> 'errno' in error case, it restores registers as needed.
>
> I don't think this is what it's talking about.

Perhaps we have different man pages but e.g. [1] mentions only registers
in the context of the kernel interface but not when entering/leaving
syscall(2) itself.


>> Some ABIs allow functions to clobber registers (they are not restored
>> after leaving the function and do not carry a return value); e.g. on
>> ARM, these are r0-r3 and r12.  That's probably the case for r10 in
>> ia64 too.
>
> Maybe you missed the previous message where I pointed out that this
> behavior is, at least on MIPS, an explicit step taken by glibc's
> syscall implementation (and many other system calls).

Well, then this is completely undocumented and a glibc-only thing.
Other implementations[2] follow the behavior described in the man page
and do not set some magic registers on return.

I have not found the glibc syscall implementation for MIPS yet.



Enrico

Footnotes: 
[1]  http://man7.org/linux/man-pages/man2/syscall.2.html

[2]  https://android.googlesource.com/platform/bionic/+/ae5c3dd73844e6a9e1a14dbf893eab5142902f18/libc/arch-mips/syscalls/syscall.S
     https://github.com/ops-class/os161/blob/master/userland/lib/libc/arch/mips/syscalls-mips.S
     https://github.com/m-labs/uclibc-lm32/blob/master/libc/sysdeps/linux/mips/syscall.S



* Re: pseudo: host user contamination
  2018-03-27 15:55           ` Seebs
@ 2018-03-27 16:35             ` Enrico Scholz
  2018-03-27 16:40               ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 16:35 UTC (permalink / raw)
  To: openembedded-core

Seebs <seebs-59Mtl4G6ZbFeoWH0uzbU5w@public.gmane.org> writes:

> On Tue, 27 Mar 2018 16:42:03 +0200
> Enrico Scholz <enrico.scholz-wttK6gPy29v+Hn7q9Vec/7NAH6kLmebB@public.gmane.org> wrote:
>
>> will work to wrap syscall(2).  Params for _renameat2_syscall() can be
>> extracted by va_args.
>
> Does anyone have access to an actual 64-bit EABI ARM system to verify
> the argument passing for renameat2 there?

Does this really matter here?  Because the caller has to set them
according to the ABI, you can extract the arguments by

	int olddirfd        = va_arg(ap, int);
	char const *oldpath = va_arg(ap, char const *);
	int newdirfd        = va_arg(ap, int);
	char const *newpath = va_arg(ap, char const *);
	unsigned int flags  = va_arg(ap, unsigned int);

There are no 64 bit arguments (on 32 bit platforms) which might require a
special treatment as described in [1] "Architecture-specific requirements".
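
As a minimal sketch of that extraction (the struct and function names
below are made up for illustration and are not part of pseudo), a variadic
function can pull the five renameat2() arguments out with va_arg, in
declaration order and with the same types the caller used:

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Hypothetical container for the decoded arguments; not a pseudo type. */
struct renameat2_args {
	int olddirfd;
	const char *oldpath;
	int newdirfd;
	const char *newpath;
	unsigned int flags;
};

/*
 * Hypothetical stand-in for a syscall() wrapper: 'number' plays the role
 * of the syscall number and the variadic tail is decoded with va_arg.
 */
static struct renameat2_args decode_renameat2(long number, ...)
{
	struct renameat2_args a;
	va_list ap;

	(void)number;
	va_start(ap, number);
	a.olddirfd = va_arg(ap, int);
	a.oldpath  = va_arg(ap, const char *);
	a.newdirfd = va_arg(ap, int);
	a.newpath  = va_arg(ap, const char *);
	a.flags    = va_arg(ap, unsigned int);
	va_end(ap);
	return a;
}

/* Self-check: the decoded values match what the caller passed. */
static void example_decode(void)
{
	struct renameat2_args a = decode_renameat2(0L, 3, "old", 4, "new", 1u);

	assert(a.olddirfd == 3 && a.newdirfd == 4 && a.flags == 1u);
	assert(strcmp(a.oldpath, "old") == 0);
	assert(strcmp(a.newpath, "new") == 0);
}
```

This only holds because every renameat2() argument is an int or a
pointer; a syscall with a 64 bit argument on a 32 bit platform would need
its own decoding.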



Enrico

Footnotes: 
[1]  http://man7.org/linux/man-pages/man2/syscall.2.html



* Re: pseudo: host user contamination
  2018-03-27 16:35             ` Enrico Scholz
@ 2018-03-27 16:40               ` Seebs
  2018-03-27 19:20                 ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 16:40 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: openembedded-core

On Tue, 27 Mar 2018 18:35:32 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> Does this really matter here?  Because the caller has to set them
> according to the ABI, you can extract the arguments by
> 
> 	int olddirfd        = va_arg(ap, int);
> 	char const *oldpath = va_arg(ap, char const *);
> 	int newdirfd        = va_arg(ap, int);
> 	char const *newpath = va_arg(ap, char const *);
> 	unsigned int flags  = va_arg(ap, unsigned int);
> 
> There are no 64 bit arguments (on 32 bit platforms) which might
> require a special treatment as described in [1]
> "Architecture-specific requirements".

Okay, ignore the pointer case, and pretend it's the 64-bit value case,
since we have specific-ish documentation for that.

Look at the example for SYS_readahead, stating that the caller must
pass an extra value.

At that point, if you have a series of va_arg calls corresponding to
the values that would have been arguments had they not passed the extra
value, I don't think you get the expected arguments. So far as I can
tell, if the caller actually wrote
	varargsfunc(SYS_readahead, 0, uint64_t_value, ...)
and the function did
	va_arg(ap, uint64_t);
they would not get the value passed as the third argument, because the
calls to va_arg don't match the arguments passed.

If you could just ignore this, the SYS_readahead example wouldn't have
to exist; you could just follow the ABI and provide a 64-bit value.

-s



* Re: pseudo: host user contamination
  2018-03-27 16:26                                                         ` Enrico Scholz
@ 2018-03-27 16:46                                                           ` Seebs
  0 siblings, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-27 16:46 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: openembedded-core

On Tue, 27 Mar 2018 18:26:05 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> Perhaps we have different man pages but e.g. [1] mentions only
> registers in the context of the kernel interface but not when
> entering/leaving syscall(2) itself.

And yet, it's syscall(2) doing the thing... or possibly it's not,
because it turns out I don't read IA64 assembly.

> Well, then this is completely undocumented and a glibc-only thing.
> Other implementations[2] follow the behavior described in the man page
> and do not set some magic registers on return.

> I have not found the glibc syscall implementation for MIPS yet.

Hmm. In MIPS, it does appear that glibc's syscall is *using* the return
register, rather than writing to it.

Yeah, I misread the IA64 code.

I'm gonna go ahead and try to implement a wrapper. Until I've got test
cases available, it's gonna be a custom wrapper, that will ENOTSUP for
renameat2, and try to pass other things on naively.
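
A rough sketch of what such a wrapper could look like, assuming a Linux
host. The name wrapped_syscall() is hypothetical; a real LD_PRELOAD
wrapper would itself be named syscall() and would look up the original
with dlsym(RTLD_NEXT, "syscall"). This is not the pseudo implementation:

```c
#define _GNU_SOURCE /* for the syscall() declaration */
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: refuse renameat2, pass everything else through. */
static long wrapped_syscall(long number, ...)
{
	long a, b, c, d, e, f;
	va_list ap;

#ifdef SYS_renameat2
	if (number == SYS_renameat2) {
		/* Refuse the call instead of letting it bypass the wrapper. */
		errno = ENOTSUP;
		return -1;
	}
#endif
	/*
	 * Fetch six word-sized arguments whether or not the caller passed
	 * that many, the same way musl's syscall() does. For missing
	 * arguments this reads indeterminate values, which the kernel
	 * ignores for syscalls that take fewer arguments.
	 */
	va_start(ap, number);
	a = va_arg(ap, long);
	b = va_arg(ap, long);
	c = va_arg(ap, long);
	d = va_arg(ap, long);
	e = va_arg(ap, long);
	f = va_arg(ap, long);
	va_end(ap);
	return syscall(number, a, b, c, d, e, f);
}

/* Self-check: pass-through works and renameat2 is refused. */
static void example_wrapper(void)
{
	assert(wrapped_syscall(SYS_getpid) == (long)getpid());
#ifdef SYS_renameat2
	errno = 0;
	assert(wrapped_syscall(SYS_renameat2, -100, "a", -100, "b", 0) == -1);
	assert(errno == ENOTSUP);
#endif
}
```

The naive pass-through is only safe as long as the wrapper does not need
to interpret the arguments it forwards.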

But I think Andre and Enrico are right that the code in glibc is not
doing what I interpreted it as doing.

-s



* Re: pseudo: host user contamination
  2018-03-27  4:41                                                                 ` Seebs
@ 2018-03-27 19:11                                                                   ` Andre McCurdy
  2018-03-27 19:22                                                                     ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27 19:11 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Mon, Mar 26, 2018 at 9:41 PM, Seebs <seebs@seebs.net> wrote:
>
>> The syscall manpage is from the kernel manpages, not glibc.
>
>>   http://man7.org/linux/man-pages/man2/syscall.2.html
>
> And yet! glibc is setting those registers in its code. Why? If that's a
> kernel thing and libc doesn't need to do it, why is libc doing it?

Of course libc syscall is setting those registers WITHIN its code.
The job of the syscall() function is to translate from a C callable
API into a kernel syscall - so it must read arguments passed in from
the C caller (via the normal C function call rules, e.g. the first few
arguments passed via registers, the rest on the stack, etc) and use
them to setup a kernel syscall (via the kernel's syscall interface, ie
maximum of 6 arguments, all passed via registers). After the kernel
syscall has returned, the implementation of libc syscall() needs to
collect the result from whichever register the kernel leaves it in and
return it via the normal C function call rules (plus take care of some
extra housekeeping, ie setting errno).

Whatever happens within syscall() is not important. The key point is
that it's a C callable function and follows standard C function call
rules.
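
A small sketch of that point, assuming a Linux host (the function name is
made up for illustration): from the caller's side, syscall() behaves like
any other C function, returning the result normally and reporting errors
as -1 plus errno rather than as a raw negative kernel value.

```c
#define _GNU_SOURCE /* for the syscall() declaration */
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical self-check, not taken from glibc or pseudo. */
static void example_syscall_is_plain_c(void)
{
	/* Same result as the dedicated libc wrapper for the same call. */
	assert(syscall(SYS_getpid) == (long)getpid());

	/* Failure is reported the C way: -1 and errno, not a raw -EBADF. */
	errno = 0;
	assert(syscall(SYS_close, -1) == -1);
	assert(errno == EBADF);
}
```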

> Okay, you've read the code in glibc and understand it. So, why does the
> glibc code have that register-setting assembly, if that register-setting
> assembly doesn't matter?

If you are asking why does glibc implement syscall() in assembler when
it could be implemented in completely generic C code (as musl does)
then the answer is I don't know. Historical I guess.

Looking at the glibc 32bit ARM syscall() assembler. After stripping
away the cfi_XXX annotations (ie stuff related to debug, not actual
opcodes) the assembler is:

ENTRY (syscall)
    mov    ip, sp
    push    {r4, r5, r6, r7}
    mov    r7, r0
    mov    r0, r1
    mov    r1, r2
    mov    r2, r3
    ldmfd    ip, {r3, r4, r5, r6}
    swi    0x0
    pop    {r4, r5, r6, r7}
    cmn    r0, #4096
    it    cc
    RETINSTR(cc, lr)
    b    PLTJMP(syscall_error)
PSEUDO_END (syscall)

ie it's pushing the original contents of r4, r5, r6 and r7 to the
stack, shuffling the first 4 arguments from C into the kernel's
syscall registers (the syscall number in r0 -> r7, the first argument
in r1 -> r0, etc), loading the next 4 arguments from C into registers
(cunningly, it loads 4 arguments directly from the stack into the
registers used for the next 4 arguments for the kernel syscall).
Interestingly, it's taking a total of 8 arguments from the C caller -
the first is the syscall number, then 7 additional arguments (one more
than required if the maximum is 6). It then invokes the syscall,
restores the caller's original r4, r5, r6 and r7 values from the stack
and returns via a helper to set errno if the result from the kernel
indicated an error.

Now, looking at the C code implementation of syscall() in musl:

long syscall(long n, ...)
{
    va_list ap;
    syscall_arg_t a,b,c,d,e,f;
    va_start(ap, n);
    a=va_arg(ap, syscall_arg_t);
    b=va_arg(ap, syscall_arg_t);
    c=va_arg(ap, syscall_arg_t);
    d=va_arg(ap, syscall_arg_t);
    e=va_arg(ap, syscall_arg_t);
    f=va_arg(ap, syscall_arg_t);
    va_end(ap);
    return __syscall_ret(__syscall(n,a,b,c,d,e,f));
}

It fetches 6 va_args arguments from the caller, using standard C
function calling rules, and passes them on to the architecture
specific __syscall() macro, which will put the arguments in the
registers used for the kernel syscall and then invoke the syscall.
Note that since this is pure generic C code, you can insert debug,
call other functions etc wherever you like (the only thing that
needs special attention is that __syscall_ret() sets errno).

Compiling the musl C code for 32bit ARM gives the following assembler:

00000000 <syscall>:
   0:    e92d000f     push    {r0, r1, r2, r3}
   4:    e92d48b0     push    {r4, r5, r7, fp, lr}
   8:    e28db010     add    fp, sp, #16
   c:    e28b0008     add    r0, fp, #8
  10:    e24dd00c     sub    sp, sp, #12
  14:    e28bc008     add    ip, fp, #8
  18:    e59b7004     ldr    r7, [fp, #4]
  1c:    e50bc018     str    ip, [fp, #-24]    ; 0xffffffe8
  20:    e890000f     ldm    r0, {r0, r1, r2, r3}
  24:    e59b4018     ldr    r4, [fp, #24]
  28:    e59b501c     ldr    r5, [fp, #28]
  2c:    ef000000     svc    0x00000000
  30:    ebfffffe     bl    0 <__syscall_ret>
  34:    e24bd010     sub    sp, fp, #16
  38:    e8bd48b0     pop    {r4, r5, r7, fp, lr}
  3c:    e28dd010     add    sp, sp, #16
  40:    e12fff1e     bx    lr

Although this is a bit of a mess (gcc obviously isn't good at
optimising va_args as it needlessly saves the first 4 arguments to the
stack and then loads them back again...) the basic shuffling of
arguments from a C function call into the registers used for the
kernel syscall is the same as the glibc assembler! (Apart from the
fact it only handles 6 syscall arguments, not 7 as the glibc assembler
does, so nothing is setup in r6).

ie the glibc assembler isn't some mysterious function with a non
standard calling convention - it's just an optimised implementation of
a standard C function.

> Okay, you say you understand why ARM EABI "sometimes" needs an argument
> to offset things. What are the circumstances?

The background to this is that in ARM 32bit EABI, 64bit values in
registers need to be kept in an even/odd register pair, which then
allows "double word" load and store instructions (ie single
instructions, first added in ARMv5, which can load or store 64bit
values from an even/odd register pair) to be used to read and write
them to/from memory. Since the ARM 32bit EABI kernel syscall interface
uses registers r0,r1,r2,r3, etc to pass the syscall arguments, a
padding argument is required if the first word of a 64bit value passed
to the kernel would not naturally be placed into an even numbered
register. In the readahead example, the first syscall argument is the
32bit file descriptor (which will be passed to the kernel in r0),
therefore a padding argument is required to fill r1 and ensure that
the first word of the 64bit offset gets passed in r2.
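
To make the readahead case concrete, here is a sketch of the manual split
(the struct and helper names are hypothetical). The call form in the
comment follows the example in the syscall(2) man page and assumes a
little-endian 32bit ARM EABI target:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit halves of a 64-bit offset. */
struct u64_halves {
	uint32_t lo;
	uint32_t hi;
};

static struct u64_halves split_u64(uint64_t v)
{
	struct u64_halves h;

	h.lo = (uint32_t)(v & 0xFFFFFFFFu);
	h.hi = (uint32_t)(v >> 32);
	return h;
}

/*
 * On 32-bit ARM EABI the call would then look like this, with 0 as the
 * padding argument that fills r1:
 *
 *	struct u64_halves h = split_u64(offset);
 *	syscall(SYS_readahead, fd, 0, h.lo, h.hi, count);
 *
 * so the kernel sees fd in r0, padding in r1, and the offset in the
 * even/odd pair r2/r3.
 */
static void example_split(void)
{
	struct u64_halves h = split_u64(0x0000000100000002ULL);

	assert(h.hi == 1u && h.lo == 2u);
}
```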

> Is it specific to 32-bit
> targets?

The above is completely specific to ARM 32bit EABI. I guess *similar*
issues may apply to some other 32bit architectures (as suggested in
the manpage). It's certainly not an issue which is generic to all 32bit
targets though.

> On a target with 64-bit pointers, would it apply also to
> 64-bit pointers, or is it exclusively for 64-bit integers?

Since 64bit architectures can, by definition, read and write 64bit
values to memory using single load and store instructions, no 64bit
architecture would have an ABI which places a restriction that 64bit
values need to be held in any particular register - so no padding
arguments would ever be required to accommodate that.

> Because it seems to me that on a 64-bit target, renameat2() would in
> fact be passing a 64-bit object as the second argument. And if there's
> a reason that this doesn't count as a 64-bit argument passed after an
> odd number of 32-bit arguments, I'd like to know specifically what that
> reason is before I go relying on it to stay true forever.

For a 64bit architecture, the distinction between a 32bit argument and
a 64bit argument is only in how you interpret that data. In all cases
the data is passed as a 64bit value.

The code calling libc syscall() and the code within the kernel which
interprets the syscall arguments must agree on the format of the data,
but for a libc syscall() implementation which just passes the
arguments along it can treat everything as 64bit values. It doesn't
matter if an argument is actually int, long, or pointer. See the musl
syscall() implementation - all va_args values are extracted from the
caller as long.

If syscall(), or a wrapper for it, *does* need to interpret the
arguments for a particular syscall then the syscall() implementation
would have to also agree with the interpretation of the data defined
by the kernel.



* Re: pseudo: host user contamination
  2018-03-27 16:40               ` Seebs
@ 2018-03-27 19:20                 ` Enrico Scholz
  2018-03-27 19:24                   ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 19:20 UTC (permalink / raw)
  To: Seebs; +Cc: openembedded-core

Seebs <seebs@seebs.net> writes:

>> There are no 64 bit arguments (on 32 bit platforms) which might
>> require a special treatment as described in [1]
>> "Architecture-specific requirements".
>
> Okay, ignore the pointer case, and pretend it's the 64-bit value case,
> since we have specific-ish documentation for that.
>
> Look at the example for SYS_readahead, stating that the caller must
> pass an extra value.

SYS_readahead is one of a few syscalls which pass 64 bit arguments on 32
bit architectures.  Without the manual splitting, the ABI will cause the
compiler to insert a dummy argument so that registers are aligned for 64
bit values.

The caller of syscall(2) has to split such arguments. When the wrapper
does not handle the syscall itself, it just needs to pass the arguments
as-is.


> At that point, if you have a series of va_arg calls corresponding to
> the values that would have been arguments had they not passed the extra
> value, I don't think you get the expected arguments. So far as I can
> tell, if the caller actually wrote
> 	varargsfunc(SYS_readahead, 0, uint64_t_value, ...)
> and the function did
> 	va_arg(ap, uint64_t);

That's true and must be checked when writing a wrapper for a syscall
which takes a 64 bit argument on a 32 bit architecture.

But for renameat2() it does not matter; each of its arguments fits into
a single register.



Enrico



* Re: pseudo: host user contamination
  2018-03-27 19:11                                                                   ` Andre McCurdy
@ 2018-03-27 19:22                                                                     ` Seebs
  2018-03-27 20:12                                                                       ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 19:22 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Tue, 27 Mar 2018 12:11:22 -0700
Andre McCurdy <armccurdy@gmail.com> wrote: 
> In the readahead example, the first syscall argument is the
> 32bit file descriptor (which will be passed to the kernel in r0),
> therefore a padding argument is required to fill r1 and ensure that
> the first word of the 64bit offset gets passed in r2.

Yes.

> The above is completely specific to ARM 32bit EABI. I guess *similar*
> issues may apply to some other 32bit architectures (as suggested in
> the manpage). It's certainly not an issue which is generic to all 32bit
> targets though.

I was wondering about 64-bit EABI. The man page didn't say "32-bit
EABI", it said "EABI". The information that you don't need to do that
on at least some ARM EABI arguably makes this *worse*, rather than
*better*, from the standpoint of "how do I write correct code for
this". So this appears to be at least partially a documentation error,
although it's quite possible that the text predates the question having
come up.

But it does also mean that it should be harmless to us in this case.

> If syscall(), or a wrapper for it, *does* need to interpret the
> arguments for a particular syscall then the syscall() implementation
> would have to also agree with the interpretation of the data defined
> by the kernel.

Yes.

My basic concern is that I don't think I have enough information to
produce a Provably Correct handling for syscall arguments in the
presence of at least one architecture where argument order can change
for at least one syscall.

... That said, an actual *correct* wrapper for renameat2 turns out
to be surprisingly hard, mostly because EXCHANGE is impossible to do
with pseudo's current IPC data structure.

-s



* Re: pseudo: host user contamination
  2018-03-27 19:20                 ` Enrico Scholz
@ 2018-03-27 19:24                   ` Seebs
  2018-03-27 20:06                     ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 19:24 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: openembedded-core

On Tue, 27 Mar 2018 21:20:24 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:

> SYS_readahead is one of a few syscalls which pass 64 bit arguments on
> 32 bit architectures.  Without the manual splitting, the ABI will
> cause the compiler to insert a dummy argument so that registers are
> aligned for 64 bit values.

I'm now even more confused. This sounds like the compiler *would*
insert the argument without being told to, because the ABI "will cause"
that, in which case the manual splitting wouldn't be necessary?

-s



* Re: pseudo: host user contamination
  2018-03-27 19:24                   ` Seebs
@ 2018-03-27 20:06                     ` Enrico Scholz
  0 siblings, 0 replies; 68+ messages in thread
From: Enrico Scholz @ 2018-03-27 20:06 UTC (permalink / raw)
  To: Seebs; +Cc: openembedded-core

Seebs <seebs@seebs.net> writes:

> On Tue, 27 Mar 2018 21:20:24 +0200
> Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:
>
>> SYS_readahead is one of a few syscalls which pass 64 bit arguments on
>> 32 bit architectures.  Without the manual splitting, the ABI will
>> cause the compiler to insert a dummy argument so that registers are
>> aligned for 64 bit values.
>
> I'm now even more confused. This sounds like the compiler *would*
> insert the argument without being told to, because the ABI "will cause"
> that, in which case the manual splitting wouldn't be necessary?

For example, under the ARM EABI, function arguments are passed in r0, r1,
r2, r3.  But 64 bit values must be aligned to even registers.

So, assuming code like

| void foo(int a, unsigned long long b)
| {
| }
| 
| void bar(void)
| {
| 	foo(1, 2);
| }

The compiler generates

        mov     r2, #2
        mov     r3, #0
        mov     r0, #1
        bl      foo

i.e. it skips 'r1'.


When you use the variadic syscall(2) function, you pass an extra argument
at front (the syscall number)

  syscall(__NR_readahead, fd, offset_64bit, count);

  --> when doing the 'svc', fd goes into 'r0' and offset_64bit
      into 'r1' + 'r2'

The in-kernel function does not have the syscall number and is

  sys_readahead(int fd, loff_t offset, size_t count)

  --> 'fd' is expected in 'r0', offset in 'r2' + 'r3' due to the ABI
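
A toy model of the slot assignment described above may make the
mismatch easier to see.  It is only a sketch: assign_slots(),
kernel_offset_slot() and shifted_offset_slot() are hypothetical helper
names, slots 0 to 3 stand in for r0 to r3, and the only rule modelled
is that 8-byte arguments start on an even slot.  It is not a full
implementation of the ARM EABI.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not a real ABI implementation: record for each argument the
 * index of the first 32-bit slot it occupies.  Slots 0..3 stand for r0..r3,
 * higher slots for the stack.  An 8-byte argument must start on an even
 * slot, so an odd slot in front of it is skipped.
 */
static void assign_slots(const size_t *sizes, size_t n, size_t *first_slot)
{
    size_t next = 0;

    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && (next % 2) != 0)
            next++;                 /* padding slot, e.g. the skipped r1 */
        first_slot[i] = next;
        next += sizes[i] / 4;
    }
}

/* First slot of the offset as the kernel expects it:
 * sys_readahead(fd, offset, count). */
size_t kernel_offset_slot(void)
{
    const size_t sizes[] = { 4, 8, 4 };      /* fd, offset, count */
    size_t slot[3];

    assign_slots(sizes, 3, slot);
    return slot[1];
}

/* First slot of the offset after syscall(nr, fd, offset, count) has dropped
 * the leading syscall number and moved everything down by one slot. */
size_t shifted_offset_slot(void)
{
    const size_t sizes[] = { 4, 4, 8, 4 };   /* nr, fd, offset, count */
    size_t slot[4];

    assign_slots(sizes, 4, slot);
    return slot[2] - 1;
}
```

Under these assumptions kernel_offset_slot() yields slot 2 (r2 + r3)
while shifted_offset_slot() yields slot 1 (r1 + r2), which is the
disagreement that the manual splitting and padding works around.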


Enrico


* Re: pseudo: host user contamination
  2018-03-27 19:22                                                                     ` Seebs
@ 2018-03-27 20:12                                                                       ` Andre McCurdy
  2018-03-27 20:20                                                                         ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27 20:12 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Tue, Mar 27, 2018 at 12:22 PM, Seebs <seebs@seebs.net> wrote:
>
> I was wondering about 64-bit EABI. The man page didn't say "32-bit
> EABI", it said "EABI".

EABI is just a name (Embedded ABI) which on ARM is used to distinguish
between the original ABI (now referred to as OABI, or Old ABI) and the
current one.

As far as I know, only one ABI exists for 64bit ARM and it's just
called the AArch64 ABI, not 64bit EABI etc.

Where the manpage says "EABI" it means ARM 32bit EABI, nothing else.

> The information that you don't need to do that
> on at least some ARM EABI arguably makes this *worse*, rather than
> *better*, from the standpoint of "how do I write correct code for
> this". So this appears to be at least partially a documentation error,
> although it's quite possible that the text predates the question having
> come up.
>
> But it does also mean that it should be harmless to us in this case.

You've lost me here...

> My basic concern is that I don't think I have enough information to
> produce a Provably Correct handling for syscall arguments in the
> presence of at least one architecture where argument order can change
> for at least one syscall.

Not sure what you mean by "handling of syscall arguments".

If you mean forwarding arguments through a wrapper without
interpreting them then I don't know what your concern is. Forwarding
arguments can be handled completely generically - for any architecture
and any syscall. See the musl implementation.
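
A minimal sketch of such generic forwarding, for illustration only:
the wrapper pulls a fixed number of long-sized arguments and passes
them all on without interpreting them.  forward_syscall() is a
hypothetical name and six arguments is an assumption.  Reading
arguments that the caller did not pass is formally undefined behavior
in C, so this relies on the platform's calling convention rather than
on the language standard.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical forwarding wrapper: fetch six long-sized arguments and
 * hand them to the real syscall() untouched. */
long forward_syscall(long nr, ...)
{
    va_list ap;
    long a[6];

    assert(nr >= 0);                /* syscall numbers are non-negative */

    va_start(ap, nr);
    for (int i = 0; i < 6; i++)
        a[i] = va_arg(ap, long);    /* no interpretation of the values */
    va_end(ap);

    return syscall(nr, a[0], a[1], a[2], a[3], a[4], a[5]);
}
```

A caller that wants to stay within defined behavior can pass all six
arguments explicitly, e.g. forward_syscall(SYS_getpid, 0L, 0L, 0L, 0L,
0L, 0L).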

If you mean interpreting the arguments of a particular syscall then
yes - for each particular syscall, the C code calling libc syscall()
(and any code within libc syscall() which needs to interpret the
arguments) must be aware of the argument order for that particular
syscall as defined by the kernel for each architecture/ABI.

However the good news is that code in a syscall() wrapper doesn't need
to be any *more* aware of argument ordering than the C code calling
syscall(). In this particular case, if the code in gnulib calling
syscall(SYS_renameat2, ...) doesn't do anything architecture specific
then either it's not needed (and therefore also not needed in a
syscall() wrapper which wants to interpret renameat2 syscalls) or there's
a portability bug in gnulib. i.e. there is no case where architecture
specific awareness is required in a syscall() wrapper but not in the
original C code which calls syscall().


* Re: pseudo: host user contamination
  2018-03-27 20:12                                                                       ` Andre McCurdy
@ 2018-03-27 20:20                                                                         ` Seebs
  2018-03-27 20:52                                                                           ` Andre McCurdy
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 20:20 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Tue, 27 Mar 2018 13:12:19 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> If you mean forwarding arguments through a wrapper without
> interpreting them then I don't know what your concern is. Forwarding
> arguments can be handled completely generically - for any architecture
> and any syscall. See the musl implementation.

My concern is that, strictly speaking, this is nearly all undefined
behavior, and that reading more arguments than you were passed *does*
explode on some C implementations. Possibly none of the ones musl is
targeting.

I'm trying to minimize assumptions that *could in principle* affect
portability, such as "it's safe to grab an arbitrary pool of arguments
with va_arg", or "it's safe to grab arguments with va_arg using
different parameter types than were used to store them". Because
assumptions like those periodically break when, for some inexplicable
reason, someone ports to an architecture that isn't a VAX 11/780.

We're already stuck with "duplicating library functions" as a risk.
But so far, I don't think I have any code which is manipulating
arguments in a way that violates the spec. Adding such code creates
an additional risk, however small that risk may be in practice right
now.

> However the good news is that code in a syscall() wrapper doesn't need
> to be any *more* aware of argument ordering than the C code calling
> syscall(). In this particular case, if the code in gnulib calling
> syscall(SYS_renameat2, ...) doesn't do anything architecture specific
> then either it's not needed (and therefore also not needed in a
> syscall() wrapper which wants interpret renameat2 syscalls) or there's
> a portability bug in gnulib. ie there is no case where architecture
> specific awareness is required in a syscall() wrapper but not in the
> original C code which calls syscall().

Yes.

Right now, I think my inclination is to make a renameat2() wrapper
which fails. We did that for renameat() originally, and it was years
before it actually came up, and I think it's premature to attempt the
wrapper at a time when I *can't* write test code which compares it to
the behavior of libc.
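
As an illustration of the "wrapper which fails" approach, a stub could
simply refuse the call.  This is a sketch only: stub_renameat2() is a
hypothetical name, and the choice of ENOSYS as the error is an
assumption, not something stated in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical failing wrapper: ignore every argument and report that
 * the operation is not available. */
int stub_renameat2(int olddirfd, const char *oldpath,
                   int newdirfd, const char *newpath, unsigned int flags)
{
    (void)olddirfd;
    (void)oldpath;
    (void)newdirfd;
    (void)newpath;
    (void)flags;

    errno = ENOSYS;     /* assumed error code for "not implemented" */
    return -1;
}
```

A caller that checks for this error can then presumably fall back to
another rename path, much as it would on a kernel without the syscall.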

-s


* Re: pseudo: host user contamination
  2018-03-27 20:20                                                                         ` Seebs
@ 2018-03-27 20:52                                                                           ` Andre McCurdy
  2018-03-27 21:10                                                                             ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Andre McCurdy @ 2018-03-27 20:52 UTC (permalink / raw)
  To: Seebs; +Cc: OE-core

On Tue, Mar 27, 2018 at 1:20 PM, Seebs <seebs@seebs.net> wrote:
>
> My concern is that, strictly speaking, this is nearly all undefined
> behavior, and that reading more arguments than you were passed *does*
> explode on some C implementations.

Can you give some examples?

For every architecture I'm aware of that supports Linux, reading more
arguments is going to mean reading more data out of the stack. It's
not going to "explode" until you read far enough to reach beyond the
start of the stack. What other failure modes are there?

> I'm trying to minimize assumptions that *could in principle* affect
> portability, such as "it's safe to grab an arbitrary pool of arguments
> with va_arg", or "it's safe to grab arguments with va_arg using
> different parameter types than were used to store them".

ALL of the implementations of libc syscall() I've looked at in both
glibc and musl do BOTH of these things - either explicitly in C code
or effectively the same thing in assembler.

By trying to avoid them in a wrapper, you are holding yourself to a
higher standard than any of the underlying syscall() implementations.


* Re: pseudo: host user contamination
  2018-03-27 20:52                                                                           ` Andre McCurdy
@ 2018-03-27 21:10                                                                             ` Seebs
  2018-03-29 12:04                                                                               ` Enrico Scholz
  0 siblings, 1 reply; 68+ messages in thread
From: Seebs @ 2018-03-27 21:10 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core

On Tue, 27 Mar 2018 13:52:28 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> On Tue, Mar 27, 2018 at 1:20 PM, Seebs <seebs@seebs.net> wrote:
> > My concern is that, strictly speaking, this is nearly all undefined
> > behavior, and that reading more arguments than you were passed
> > *does* explode on some C implementations.

> Can you give some examples?

Not specific ones off the top of my head, no.

> For every architecture I'm aware of that supports Linux, reading more
> arguments is going to mean reading more data out of the stack. It's
> not going "explode" until you read far enough to reach beyond the
> start of the stack. What other failure modes are there?

There are weird calling conventions out there. For instance, "pass
floating point values in registers, but integers on stack", or "pass
first N arguments in registers", and so on. I don't know if any of them
are active in stuff Linux supports, but I'm aware that this is an area
where you can get really strange behaviors.

It's undefined behavior for a reason.

> ALL of the implementations of libc syscall() I've looked at in both
> glibc and musl do BOTH of these things - either explicitly in C code
> or effectively the same thing in assembler.

Yes.
 
> By trying to avoid them in a wrapper, you are holding yourself to a
> higher standard than any of the underlying syscall() implementations.

Well, yes. They're part of the implementation and can make assumptions
about architecture because they're in a position to define
architecture. I'm not part of the implementation, and I don't want to
take on the workload of trying to track every possible architecture if
there's any possible way I can avoid it.

-s


* Re: pseudo: host user contamination
  2018-03-27 21:10                                                                             ` Seebs
@ 2018-03-29 12:04                                                                               ` Enrico Scholz
  2018-03-29 14:06                                                                                 ` Seebs
  0 siblings, 1 reply; 68+ messages in thread
From: Enrico Scholz @ 2018-03-29 12:04 UTC (permalink / raw)
  To: openembedded-core

Seebs <seebs-59Mtl4G6ZbFeoWH0uzbU5w@public.gmane.org> writes:

> There are weird calling conventions out there. For instance, "pass
> floating point values in registers, but integers on stack", or "pass
> first N arguments in registers", and so on. I don't know if any of
> them are active in stuff Linux supports, but I'm aware that this is an
> area where you can get really strange behaviors.

__builtin_apply() should deal with it.  If you are really paranoid, assume
a huge stack size (e.g. 1024).  But according to the syscall(2) man page,
no more than 7 arguments are to be expected for syscalls.  So, define the
size for known architectures and a fallback of '7 * sizeof(uintmax_t)' or
so.


Enrico


* Re: pseudo: host user contamination
  2018-03-29 12:04                                                                               ` Enrico Scholz
@ 2018-03-29 14:06                                                                                 ` Seebs
  0 siblings, 0 replies; 68+ messages in thread
From: Seebs @ 2018-03-29 14:06 UTC (permalink / raw)
  To: Enrico Scholz; +Cc: openembedded-core

On Thu, 29 Mar 2018 14:04:00 +0200
Enrico Scholz <enrico.scholz@sigma-chemnitz.de> wrote:
> __builtin_apply() should deal with it.

The documentation hints at possible problems when dealing with other
functions, but doesn't mention any architecture difficulties. It's
possible gcc simply never uses any of the weird conventions; it's also
possible that __builtin_apply() just doesn't work reliably on such
architectures. The gcc docs do state that gcc's calling conventions
never depend on whether a function has a fixed or variadic argument
list, which suggests that it's probably safe-ish. (Some compilers use
very different calling conventions for variadic functions.)

-s


Thread overview: 68+ messages
2018-03-23 15:33 pseudo: host user contamination Enrico Scholz
2018-03-23 15:43 ` Enrico Scholz
2018-03-23 16:05   ` Burton, Ross
2018-03-23 16:10     ` Enrico Scholz
2018-03-23 16:17       ` Burton, Ross
2018-03-23 16:28       ` Seebs
2018-03-23 16:30         ` Burton, Ross
2018-03-23 16:49           ` Seebs
2018-03-23 16:56             ` Burton, Ross
2018-03-23 17:23               ` Seebs
2018-03-23 23:47             ` Richard Purdie
2018-03-23 23:56               ` Seebs
2018-03-24  0:22                 ` Enrico Scholz
2018-03-24  0:33                 ` Andre McCurdy
2018-03-24  0:36                   ` Seebs
2018-03-24  1:10                     ` Andre McCurdy
2018-03-24  1:17                       ` Seebs
2018-03-24  1:43                         ` Andre McCurdy
2018-03-24  2:44                           ` Seebs
2018-03-24 12:36                 ` Richard Purdie
2018-03-24 15:12                   ` Seebs
2018-03-24 17:10                   ` Burton, Ross
2018-03-24 17:23                     ` Seebs
2018-03-24 18:12                       ` Andre McCurdy
2018-03-24 18:22                         ` Seebs
2018-03-24 18:59                           ` Andre McCurdy
2018-03-24 19:24                             ` Seebs
2018-03-24 19:42                               ` Andre McCurdy
2018-03-24 19:50                                 ` Seebs
2018-03-24 20:12                                   ` Victor Kamensky
2018-03-24 23:04                                     ` Burton, Ross
2018-03-25  0:09                                       ` Victor Kamensky
2018-03-25  2:43                                         ` Andre McCurdy
2018-03-25  5:37                                           ` Victor Kamensky
2018-03-25  7:05                                             ` Andre McCurdy
2018-03-26 18:49                                               ` Andreas Müller
2018-03-26 19:31                                                 ` Seebs
2018-03-26 20:12                                                   ` Andre McCurdy
2018-03-26 21:07                                                     ` Seebs
2018-03-27  1:10                                                       ` Andre McCurdy
2018-03-27  1:32                                                         ` Seebs
2018-03-27  1:34                                                           ` Andre McCurdy
2018-03-27  2:07                                                             ` Seebs
2018-03-27  2:59                                                               ` Andre McCurdy
2018-03-27  4:41                                                                 ` Seebs
2018-03-27 19:11                                                                   ` Andre McCurdy
2018-03-27 19:22                                                                     ` Seebs
2018-03-27 20:12                                                                       ` Andre McCurdy
2018-03-27 20:20                                                                         ` Seebs
2018-03-27 20:52                                                                           ` Andre McCurdy
2018-03-27 21:10                                                                             ` Seebs
2018-03-29 12:04                                                                               ` Enrico Scholz
2018-03-29 14:06                                                                                 ` Seebs
2018-03-27 13:06                                                     ` Enrico Scholz
2018-03-27 15:50                                                       ` Seebs
2018-03-27 16:26                                                         ` Enrico Scholz
2018-03-27 16:46                                                           ` Seebs
2018-03-24 20:22                                   ` Joshua Watt
2018-03-24 21:01                                     ` Seebs
2018-03-24 20:27                                   ` Andre McCurdy
2018-03-27 14:42         ` Enrico Scholz
2018-03-27 15:55           ` Seebs
2018-03-27 16:35             ` Enrico Scholz
2018-03-27 16:40               ` Seebs
2018-03-27 19:20                 ` Enrico Scholz
2018-03-27 19:24                   ` Seebs
2018-03-27 20:06                     ` Enrico Scholz
2018-03-23 16:06 ` Burton, Ross
