* [PATCH] man2 : syscall.2 : add notes
@ 2013-03-27 5:11 ch0.han-Hm3cg6mZ9cc
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: ch0.han-Hm3cg6mZ9cc @ 2013-03-27 5:11 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han
From: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Add notes that caution users when passing arguments to syscall.2.
Signed-off-by: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
man2/syscall.2 | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..2c823b6 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,15 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+
+On some architecture arguments should be passed with an appropriate way.
+The glibc wrapper function, described in
+.BR syscalls (2),
+copies arguments to the right registers denpend on the architecture but
+.BR syscall (2)
+needs arguments following ABI, which its architecture describes, to be passed manually by a user.
+For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
.SH EXAMPLE
.nf
#define _GNU_SOURCE
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* (unknown),
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
@ 2013-03-27 7:53 ` Changhee Han
2013-03-27 8:25 ` [PATCH v2] man2 : syscall.2 : add notes Changhee Han
` (2 subsequent siblings)
3 siblings, 0 replies; 41+ messages in thread
From: Changhee Han @ 2013-03-27 7:53 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc
v1 contained incorrect information: the author and the Signed-off-by email address.
I am resending the patch, so please ignore the previous one.
* [PATCH v2] man2 : syscall.2 : add notes
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
2013-03-27 7:53 ` (unknown), Changhee Han
@ 2013-03-27 8:25 ` Changhee Han
2013-03-28 9:37 ` [PATCH] " Michael Kerrisk (man-pages)
2013-04-01 5:33 ` Changhee Han
3 siblings, 0 replies; 41+ messages in thread
From: Changhee Han @ 2013-03-27 8:25 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han
Add notes that caution users when passing arguments to syscall.2.
Signed-off-by: Changhee Han <ch0.han-Hm3cg6mZ9cc@public.gmane.org>
---
The previous v1 had an incorrect author and Signed-off-by email address, so I am resending the corrected patch. Please ignore the v1 patch. Thanks.
man2/syscall.2 | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..2c823b6 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,15 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+
+On some architecture arguments should be passed with an appropriate way.
+The glibc wrapper function, described in
+.BR syscalls (2),
+copies arguments to the right registers denpend on the architecture but
+.BR syscall (2)
+needs arguments following ABI, which its architecture describes, to be passed manually by a user.
+For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
.SH EXAMPLE
.nf
#define _GNU_SOURCE
--
1.7.9.5
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
2013-03-27 7:53 ` (unknown), Changhee Han
2013-03-27 8:25 ` [PATCH v2] man2 : syscall.2 : add notes Changhee Han
@ 2013-03-28 9:37 ` Michael Kerrisk (man-pages)
2013-04-01 5:33 ` Changhee Han
3 siblings, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-03-28 9:37 UTC (permalink / raw)
To: ch0.han-Hm3cg6mZ9cc
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han
On Wed, Mar 27, 2013 at 6:11 AM, <ch0.han-Hm3cg6mZ9cc@public.gmane.org> wrote:
> From: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>
> Add notes that caution users when passing arguments to syscall.2.
>
> Signed-off-by: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> man2/syscall.2 | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/man2/syscall.2 b/man2/syscall.2
> index 0675943..2c823b6 100644
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -79,6 +79,15 @@ and an error code is stored in
> .BR syscall ()
> first appeared in
> 4BSD.
> +
> +On some architecture arguments should be passed with an appropriate way.
> +The glibc wrapper function, described in
> +.BR syscalls (2),
> +copies arguments to the right registers denpend on the architecture but
> +.BR syscall (2)
> +needs arguments following ABI, which its architecture describes, to be passed manually by a user.
> +For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
Changhee,
I think this is a very worthwhile patch, but needs a bit of
clarification. How would text such as this be:
[[
Each architecture ABI has its own requirements on how system call
arguments are passed to the kernel.
For system calls that have a glibc wrapper (i.e., most system calls)
glibc handles the details of copying arguments to the right registers
in a manner suitable for the architecture.
However, when using
.BR syscall ()
to make a system call,
the caller may need to handle architecture-dependent details.
For example, on the ARM architecture, a
.I "long long"
argument is considered to be 8-byte aligned
and to be split into two 4-byte arguments.
]]
Would that text be okay?
And then, in addition to that it would be good to have an example of
how one uses syscall() on ARM to invoke a system call with an 8-byte
aligned argument. Could you provide an example?
Cheers,
Michael
* [PATCH] man2 : syscall.2 : add notes
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
` (2 preceding siblings ...)
2013-03-28 9:37 ` [PATCH] " Michael Kerrisk (man-pages)
@ 2013-04-01 5:33 ` Changhee Han
[not found] ` <1364794429-20477-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
3 siblings, 1 reply; 41+ messages in thread
From: Changhee Han @ 2013-04-01 5:33 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han
Add notes that caution users when passing arguments to syscall.2.
Signed-off-by: Changhee Han <ch0.han-Hm3cg6mZ9cc@public.gmane.org>
---
Modified the notes as you suggested and added an example that shows how to handle a 64-bit argument.
man2/syscall.2 | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..180a0e4 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,36 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+
+Each architecture ABI has its own requirements on how system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (i.g., most system calls) glibc handles the details of copy arguments to the right registers in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller may need to handle architecture-dependent details.
+For example, on ARM architecture, a
+.I "long long"
+argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
+.BR readahead ()
+system call could be called like below in ARM architecture.
+
+syscall(__NR_readahead, fd,
+.I 0
+, (unsigned int)(
+.I offset
+>> 32), (unsigned int)(
+.I offset
+& 0xFFFFFFFF), count)
+
+.I offset
+is 64 bit and should be 8-byte aligned.
+Thus, a padding is inserted before
+.I offset
+and
+.I offset
+is split into two 32 bit arguments.
+
.SH EXAMPLE
.nf
#define _GNU_SOURCE
--
1.7.9.5
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <1364794429-20477-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
@ 2013-04-01 6:13 ` Mike Frysinger
[not found] ` <201304010213.06056.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 6:13 UTC (permalink / raw)
To: Changhee Han
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 01:33:49 Changhee Han wrote:
> +Each architecture ABI has its own requirements on how system call
> arguments are passed to the kernel. +For system calls that have a glibc
> wrapper (i.g., most system calls) glibc handles the details of copy
> arguments to the right registers in a manner suitable for the
> architecture.
these lines need to be wrapped
"i.g." is incorrect ... you mean "i.e."
> +However, when using
> +.BR syscall ()
> +to make a system call,
> +the caller may need to handle architecture-dependent details.
> +For example, on ARM architecture, a
> +.I "long long"
> +argument is considered to be 8-byte aligned and to be split into two
> 4-byte arguments. +
> +.BR readahead ()
> +system call could be called like below in ARM architecture.
this has nothing to do with alignment. syscalls pass args via registers, and
in the 32bit ARM port, registers are 32bits wide. so in order to pass a 64bit
value, you have to manually split it up.
-mike
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010213.06056.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-01 6:22 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAki_8bOsuKTJLx3iMLeSvVXHo0bZf8zSUQ08RR7+D33xgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 7:05 ` Fw : Re : " 한창희
1 sibling, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-01 6:22 UTC (permalink / raw)
To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
Mike,
On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 01:33:49 Changhee Han wrote:
>> +Each architecture ABI has its own requirements on how system call
>> arguments are passed to the kernel. +For system calls that have a glibc
>> wrapper (i.g., most system calls) glibc handles the details of copy
>> arguments to the right registers in a manner suitable for the
>> architecture.
>
> these lines need to be wrapped
>
> "i.g." is incorrect ... you mean "i.e."
>
>> +However, when using
>> +.BR syscall ()
>> +to make a system call,
>> +the caller may need to handle architecture-dependent details.
>> +For example, on ARM architecture, a
>> +.I "long long"
>> +argument is considered to be 8-byte aligned and to be split into two
>> 4-byte arguments. +
>> +.BR readahead ()
>> +system call could be called like below in ARM architecture.
>
> this has nothing to do with alignment. syscalls pass args via registers, and
> in the 32bit ARM port, registers are 32bits wide. so in order to pass a 64bit
> value, you have to manually split it up.
So, I'm not familiar with all the details here. What is the purpose of
the '0' argument that precedes 'offset' then?
Cheers,
Michael
* Fw : Re : Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010213.06056.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 6:22 ` Michael Kerrisk (man-pages)
@ 2013-04-01 7:05 ` 한창희
1 sibling, 0 replies; 41+ messages in thread
From: 한창희 @ 2013-04-01 7:05 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Mike Frysinger, linux-man, 이건호
Kerrisk,
Arguments are passed via registers when syscall() is called,
but each register is only 32 bits wide.
So, when a 64-bit argument is passed, it is split across two registers.
According to the ARM ABI, however, a 64-bit argument must start in an even-numbered register (e.g., r0 or r2),
so in the example 'offset' has to start in an even-numbered register, as shown below:
r0 : fd
r1 : 0 (dummy value)
r2 : offset high
r3 : offset low
If the '0' is omitted, the system call handler misinterprets the registers and 'offset' ends up holding a garbage value.
I will correct the notes and submit the patch again soon.
Mike, do you have any suggestions for correcting my wording (for example, whether to mention alignment)?
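
The register layout described above can be sketched in C. This is only an illustration under stated assumptions: layout_readahead_regs() and the regs[] array are hypothetical names that model r0..r3, not anything provided by glibc or the kernel. The high-word-first order inside the r2/r3 pair is copied from this message as-is; as far as I know, the word order within a pair follows the platform's endianness, so it may need to be swapped on little-endian systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill four 32-bit slots that stand in for r0..r3,
 * following the layout given in the message for readahead() on ARM EABI. */
static void layout_readahead_regs(int fd, uint64_t offset, uint32_t regs[4])
{
    assert(regs != NULL);
    regs[0] = (uint32_t) fd;                      /* r0: fd */
    regs[1] = 0;                                  /* r1: dummy padding value */
    regs[2] = (uint32_t) (offset >> 32);          /* r2: offset, high 32 bits */
    regs[3] = (uint32_t) (offset & 0xFFFFFFFFu);  /* r3: offset, low 32 bits */
}
```

Without the padding slot, the high word would land in r1 and the pair would no longer start in an even-numbered register, which is the garbage-offset failure described above.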
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <CAKgNAki_8bOsuKTJLx3iMLeSvVXHo0bZf8zSUQ08RR7+D33xgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 7:19 ` Mike Frysinger
[not found] ` <201304010319.45019.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 7:19 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 02:22:45 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger wrote:
> > On Monday 01 April 2013 01:33:49 Changhee Han wrote:
> >> +However, when using
> >> +.BR syscall ()
> >> +to make a system call,
> >> +the caller may need to handle architecture-dependent details.
> >> +For example, on ARM architecture, a
> >> +.I "long long"
> >> +argument is considered to be 8-byte aligned and to be split into two
> >> 4-byte arguments. +
> >> +.BR readahead ()
> >> +system call could be called like below in ARM architecture.
> >
> > this has nothing to do with alignment. syscalls pass args via registers,
> > and in the 32bit ARM port, registers are 32bits wide. so in order to
> > pass a 64bit value, you have to manually split it up.
>
> So, I'm not familiar with all the details here. What is the purpose of
> the '0' argument that precedes 'offset' then?
ok, so the answer is more nuanced, and the reasoning above is incorrect (or at
the very least, poorly phrased).
for ARM OABI, there is no such padding, and the proposed example is wrong and
will not work.
for ARM EABI, the ABI requires that 64bit values be passed in register pairs.
since the kernel people wanted to avoid an assembly trampoline to unpack the
64bit value with EABI, you have to call it as proposed:
syscall(readahead, fd, _pad, high32, low32)
for MIPS, only the O32 ABI has this behavior.
for PPC, only the 32bit ABI has this behavior.
otherwise, i don't believe anyone else does this -- they just pass things
along in registers w/out padding.
since the current list of syscalls which are impacted is small, it might be
useful to explicitly enumerate them. they are:
fadvise64_64
ftruncate64
pread64
pwrite64
readahead
truncate64
-mike
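
The per-ABI rule in the message above can be approximated at compile time. A minimal sketch, assuming GCC/Clang-style predefined macros (__ARM_EABI__, _MIPS_SIM with _ABIO32, __powerpc__ and __powerpc64__); the function name is hypothetical, and the macro tests are an assumption about how these ABIs can be detected, not something stated in the thread.

```c
/* Hypothetical helper: returns 1 when the target ABI is one of those the
 * message lists as aligning 64-bit syscall arguments to register pairs
 * (ARM EABI, MIPS O32, 32-bit PowerPC), and 0 otherwise. */
static int abi_pads_64bit_syscall_args(void)
{
#if defined(__arm__) && defined(__ARM_EABI__)
    return 1;   /* ARM EABI; ARM OABI does not pad */
#elif defined(__mips__) && defined(_MIPS_SIM) && defined(_ABIO32)
# if _MIPS_SIM == _ABIO32
    return 1;   /* MIPS O32 only */
# else
    return 0;   /* other MIPS ABIs */
# endif
#elif defined(__powerpc__) && !defined(__powerpc64__)
    return 1;   /* 32-bit PowerPC only */
#else
    return 0;   /* everything else: no padding, per the message */
#endif
}
```

A caller of syscall() could use such a check to decide whether to insert the dummy argument before a split 64-bit value.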
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010319.45019.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-01 7:36 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkhBASGvXGfdBSjpGaMuxoJofcQvZQrX3a=uxbcKQnXOAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 8:37 ` [PATCH] man2 : syscall.2 : add notes Mike Frysinger
1 sibling, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-01 7:36 UTC (permalink / raw)
To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
Mike,
On Mon, Apr 1, 2013 at 9:19 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 02:22:45 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger wrote:
>> > On Monday 01 April 2013 01:33:49 Changhee Han wrote:
>> >> +However, when using
>> >> +.BR syscall ()
>> >> +to make a system call,
>> >> +the caller may need to handle architecture-dependent details.
>> >> +For example, on ARM architecture, a
>> >> +.I "long long"
>> >> +argument is considered to be 8-byte aligned and to be split into two
>> >> 4-byte arguments. +
>> >> +.BR readahead ()
>> >> +system call could be called like below in ARM architecture.
>> >
>> > this has nothing to do with alignment. syscalls pass args via registers,
>> > and in the 32bit ARM port, registers are 32bits wide. so in order to
>> > pass a 64bit value, you have to manually split it up.
>>
>> So, I'm not familiar with all the details here. What is the purpose of
>> the '0' argument that precedes 'offset' then?
>
> ok, so the answer is more nuanced, and the reasoning above is incorrect (or at
> the very least, poorly phrased).
>
> for ARM OABI, there is no such padding, and the proposed example is wrong and
> will not work.
>
> for ARM EABI, the ABI requires that 64bit values be passed in register pairs.
> since the kernel people wanted to avoid an assembly trampoline to unpack the
> 64bit value with EABI, you have to call it as proposed:
> syscall(readahead, fd, _pad, high32, low32)
>
> for MIPS, only the O32 ABI has this behavior.
>
> for PPC, only the 32bit ABI has this behavior.
>
> otherwise, i don't believe anyone else does this -- they just pass things
> along in registers w/out padding.
>
> since the current list of syscalls which are impacted is small, it might be
> useful to explicitly enumerate them. they are:
> fadvise64_64
> ftruncate64
> pread64
> pwrite64
> readahead
> truncate64
> -mike
So, in summary, is the following patch okay? (If not, could you
suggest specific rewordings.)
Thanks,
Michael
diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..cad1f20 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,48 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+
+Each architecture ABI has its own requirements on how
+system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (e.g., most system calls),
+glibc handles the details of copyiing arguments to the right registers
+in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller may need to handle architecture-dependent details.
+For example, on the ARM architecture Embbeded ABI (EABI), a
+.I "long long"
+argument is considered to be 8-byte aligned and to be split
+into two 4-byte arguments.
+
+For example, the
+.BR readahead ()
+system call would be invoked as follows on the ARM architecture with the EABI:
+
+.in +4n
+.nf
+syscall(__NR_readahead, fd, 0, (unsigned int)(offset >> 32),
+ (unsigned int)(offset & 0xFFFFFFFF), count);
+.fi
+.in
+.PP
+.I offset
+is 64 bit and should be 8-byte aligned.
+Thus, a padding is inserted before
+.I offset
+and
+.I offset
+is split into two 32-bit arguments.
+Similar issues can occur on MIPS with the O32 ABI and
+on PowerPC with the 32-bit ABI.
+.BR fadvise64_64 (2)
+.BR ftruncate64 (2)
+.BR pread64 (2)
+.BR pwrite64 (2)
+.BR readahead (2)
+and
+.BR truncate64 (2).
.SH EXAMPLE
.nf
#define _GNU_SOURCE
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <CAKgNAkhBASGvXGfdBSjpGaMuxoJofcQvZQrX3a=uxbcKQnXOAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 8:29 ` Mike Frysinger
[not found] ` <201304010429.45737.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 8:29 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 03:36:40 Michael Kerrisk (man-pages) wrote:
> +glibc handles the details of copyiing arguments to the right registers
copying
> +the caller may need to handle architecture-dependent details.
may->might
> +For example, on the ARM architecture Embbeded ABI (EABI), a
Embedded
> +.I "long long"
> +argument is considered to be 8-byte aligned and to be split
> +into two 4-byte arguments.
i would rewrite to:
64 bit value (e.g. "long long") must be aligned to an even register pair.
> +.I offset
> +is 64 bit and should be 8-byte aligned.
> +Thus, a padding is inserted before
> +.I offset
> +and
> +.I offset
> +is split into two 32-bit arguments.
i would rewrite to:
Since the offset argument is 64 bits, and the first argument (fd) is passed in
r0, we need to manually split & align the 64 bit value ourselves so that it is
passed in the r2/r3 register pair. That means inserting a dummy value into r1
(the 2nd argument of 0).
> +Similar issues can occur on MIPS with the O32 ABI and
> +on PowerPC with the 32-bit ABI.
> +.BR fadvise64_64 (2)
> +.BR ftruncate64 (2)
> +.BR pread64 (2)
> +.BR pwrite64 (2)
> +.BR readahead (2)
> +and
> +.BR truncate64 (2).
the style here is messed up. i'm guessing you meant to make a new paragraph
starting at "Similar", and you meant to add some text before the function
list. also add to the list: sync_file_range and posix_fadvise.
not sure if it's worth mentioning, but this issue ends up forcing MIPS' O32 to
take 7 arguments to syscall() :). on ARM/PPC, they avoid this by reordering
the arguments.
i see that the existing sync_file_range and posix_fadvise pages explicitly call
out this issue. i'd suggest updating those (as well as the other funcs that
are affected) to point back to syscall(2) for more details rather than getting
into too much detail.
on a related topic, would it be useful to document the exact calling
convention for architecture system calls ? from time to time, i need to
reference this, and i inevitably turn to a variety of sources to dig up the
answer (the kernel itself, or strace, or qemu, or glibc, or uClibc, or lss, or
other random places). i would find it handy to have all of these in a single
location.
-mike
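
The remark about MIPS O32 needing 7 arguments, and ARM/PPC avoiding that by reordering, can be checked with a small slot-counting model. This is a sketch under assumptions: count_arg_slots() is a hypothetical name, each argument is modelled only by its width (32 or 64 bits), and the argument orders used in the note below, (fd, offset, len, advice) for posix_fadvise and (fd, advice, offset, len) for the reordered variant, are my reading of the posix_fadvise page rather than something spelled out in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: count the 32-bit argument slots needed when every
 * 64-bit argument must start in an even-numbered slot and occupies two. */
static size_t count_arg_slots(const int *widths, size_t n)
{
    size_t slot = 0;

    for (size_t i = 0; i < n; i++) {
        assert(widths[i] == 32 || widths[i] == 64);
        if (widths[i] == 64) {
            if (slot % 2 != 0)
                slot++;     /* skip one slot as padding */
            slot += 2;      /* the 64-bit value takes a full pair */
        } else {
            slot += 1;
        }
    }
    return slot;
}
```

With widths {32, 64, 64, 32} the model needs 7 slots (one of them padding), while the reordered {32, 32, 64, 64} fits in 6.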
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010319.45019.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 7:36 ` Michael Kerrisk (man-pages)
@ 2013-04-01 8:37 ` Mike Frysinger
[not found] ` <201304010437.52901.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
1 sibling, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 8:37 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
> for ARM OABI, there is no such padding, and the proposed example is wrong
> and will not work.
>
> for ARM EABI, the ABI requires that 64bit values be passed in register
> pairs. since the kernel people wanted to avoid an assembly trampoline to
> unpack the 64bit value with EABI, you have to call it as proposed:
> syscall(readahead, fd, _pad, high32, low32)
>
> for MIPS, only the O32 ABI has this behavior.
>
> for PPC, only the 32bit ABI has this behavior.
>
> otherwise, i don't believe anyone else does this -- they just pass things
> along in registers w/out padding.
in random grepping of code bases (uClibc), i believe the xtensa arch also does
64bit register pair aligning. a cursory scan of the kernel seems to back this
up.
-mike
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010429.45737.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-01 9:29 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkij3zDwakWvcRkRbknmV2Hpt4HWfH4uVqmxp+7gQek-2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-01 9:29 UTC (permalink / raw)
To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 03:36:40 Michael Kerrisk (man-pages) wrote:
[Typo corrections incorporated]
>> +.I "long long"
>> +argument is considered to be 8-byte aligned and to be split
>> +into two 4-byte arguments.
>
> i would rewrite to:
> 64 bit value (e.g. "long long") must be aligned to an even register pair.
Done.
>> +.I offset
>> +is 64 bit and should be 8-byte aligned.
>> +Thus, a padding is inserted before
>> +.I offset
>> +and
>> +.I offset
>> +is split into two 32-bit arguments.
>
> i would rewrite to:
> Since the offset argument is 64 bits, and the first argument (fd) is passed in
> r0, we need to manually split & align the 64 bit value ourselves so that it is
> passed in the r2/r3 register pair. That means inserting a dummy value into r1
> (the 2nd argument of 0).
Done.
>> +Similar issues can occur on MIPS with the O32 ABI and
>> +on PowerPC with the 32-bit ABI.
>> +.BR fadvise64_64 (2)
>> +.BR ftruncate64 (2)
>> +.BR pread64 (2)
>> +.BR pwrite64 (2)
>> +.BR readahead (2)
>> +and
>> +.BR truncate64 (2).
>
> the style here is messed up. i'm guessing you meant to make a new paragraph
> starting at "Similar", and you meant to add some text before the function
> list. also add to the list: sync_file_range and posix_fadvise.
Yes, fixed.
> not sure if it's worth mentioning, but this issue ends up forcing MIPS' O32 to
> take 7 arguments to syscall() :). on ARM/PPC, they avoid this by reordering
> the arguments.
I'm not sure that we need quite this level of detail, so I'll leave for now.
> i see that the existing sync_file_range and posix_fadvise pages explicitly call
> out this issue. i'd suggest updating those (as well as the other funcs that
> are affected) to point back to syscall(2) for more details rather than getting
> into too much detail.
Seems reasonable to me.
> on a related topic, would it be useful to document the exact calling
> convention for architecture system calls ? from time to time, i need to
> reference this, and i inevitably turn to a variety of sources to dig up the
> answer (the kernel itself, or strace, or qemu, or glibc, or uClibc, or lss, or
> other random places). i would find it handy to have all of these in a single
> location.
Sounds like it would be useful to have that documented. Would you have
a chance to write patches for that?
Revised patches below.
Cheers,
Michael
diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..75c4ad8 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -37,7 +37,7 @@
.\" 2002-03-20 Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
.\" - adopted for Linux
.\"
-.TH SYSCALL 2 2012-08-14 "Linux" "Linux Programmer's Manual"
+.TH SYSCALL 2 2013-04-01 "Linux" "Linux Programmer's Manual"
.SH NAME
syscall \- indirect system call
.SH SYNOPSIS
@@ -79,6 +79,56 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+
+Each architecture ABI has its own requirements on how
+system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (e.g., most system calls),
+glibc handles the details of copying arguments to the right registers
+in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller might need to handle architecture-dependent details.
+For example, on the ARM architecture Embedded ABI (EABI), a
+64-bit value (e.g.,
+.IR "long long" ) must be aligned to an even register pair.
+
+For example, the
+.BR readahead ()
+system call would be invoked as follows on the ARM architecture with the EABI:
+
+.in +4n
+.nf
+syscall(SYS_readahead, fd, 0,
+ (unsigned int) (offset >> 32),
+ (unsigned int) (offset & 0xFFFFFFFF),
+ count);
+.fi
+.in
+.PP
+Since the offset argument is 64 bits, and the first argument
+.RI ( fd )
+is passed in
+.IR r0 ,
+we need to manually split and align the 64-bit value ourselves so that it is
+passed in the
+.IR r2 / r3
+register pair.
+That means inserting a dummy value into
+.I r1
+(the second argument of 0).
+Similar issues can occur on MIPS with the O32 ABI and
+on PowerPC with the 32-bit ABI.
+The affected system calls are
+.BR fadvise64_64 (2),
+.BR ftruncate64 (2),
+.BR posix_fadvise (2),
+.BR pread64 (2),
+.BR pwrite64 (2),
+.BR readahead (2),
+.BR sync_file_range (2),
+and
+.BR truncate64 (2).
.SH EXAMPLE
.nf
#define _GNU_SOURCE
=====================
diff --git a/man2/posix_fadvise.2 b/man2/posix_fadvise.2
index d644641..90ac8e9 100644
--- a/man2/posix_fadvise.2
+++ b/man2/posix_fadvise.2
@@ -153,7 +153,10 @@ or
first.
.SS arm_fadvise()
The ARM architecture
-needs 64-bit arguments to be aligned in a suitable pair of registers.
+needs 64-bit arguments to be aligned in a suitable pair of registers
+(see
+.BR syscall (2)
+for further detail).
On this architecture, the call signature of
.BR posix_fadvise ()
is flawed, since it forces a register to be wasted as padding between the
diff --git a/man2/pread.2 b/man2/pread.2
index 42e79b7..1d648b1 100644
--- a/man2/pread.2
+++ b/man2/pread.2
@@ -130,6 +130,11 @@ The glibc
and
.BR pwrite ()
wrapper functions transparently deal with the change.
+
+On some 32-bit architectures,
+the calling signature for these system calls differs,
+for the reasons described in
+.BR syscall (2).
.SH BUGS
POSIX requires that opening a file with the
.BR O_APPEND
diff --git a/man2/readahead.2 b/man2/readahead.2
index 08c2fe2..605fa5e 100644
--- a/man2/readahead.2
+++ b/man2/readahead.2
@@ -89,6 +89,11 @@ The
.BR readahead ()
system call is Linux-specific, and its use should be avoided
in portable applications.
+.SH NOTES
+On some 32-bit architectures,
+the calling signature for this system call differs,
+for the reasons described in
+.BR syscall (2).
.SH SEE ALSO
.BR lseek (2),
.BR madvise (2),
diff --git a/man2/sync_file_range.2 b/man2/sync_file_range.2
index c55184a..6adf15d 100644
--- a/man2/sync_file_range.2
+++ b/man2/sync_file_range.2
@@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted as padding between the
and
.I offset
arguments.
+(See
+.BR syscall (2)
+for details.)
Therefore, these architectures define a different
system call that orders the arguments suitably:
.PP
diff --git a/man2/truncate.2 b/man2/truncate.2
index 4d12683..64b8288 100644
--- a/man2/truncate.2
+++ b/man2/truncate.2
@@ -240,6 +240,11 @@ system calls that handle large files.
However, these details can be ignored by applications using glibc, whose
wrapper functions transparently employ the more recent system calls
where they are available.
+
+On some 32-bit architectures,
+the calling signature for these system calls differs,
+for the reasons described in
+.BR syscall (2).
.SH BUGS
A header file bug in glibc 2.12 meant that the minimum value of
.\" http://sourceware.org/bugzilla/show_bug.cgi?id=12037
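The readahead() example in the syscall.2 hunk above splits a 64-bit offset
into two 32-bit halves by hand. A minimal sketch of that step follows. The
struct and function names are hypothetical and are not taken from glibc or
the kernel. The sketch assumes that the two halves occupy the register pair
in the same order they would have in memory, so that a little-endian target
passes the low half first and a big-endian target passes the high half
first; check the ABI of the target before relying on this.

```c
#include <stdint.h>

/* Hypothetical type: the two 32-bit halves of a 64-bit system call
 * argument, in the order they are handed to syscall(). */
struct reg_pair {
    uint32_t first;     /* goes in the even register, e.g. r2 */
    uint32_t second;    /* goes in the following register, e.g. r3 */
};

/* Split a 64-bit value for an aligned register pair.  Assumption: the
 * halves are ordered as they would be in memory, so the caller states
 * whether the target is big-endian. */
static struct reg_pair split64(uint64_t value, int big_endian)
{
    uint32_t lo = (uint32_t) (value & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t) (value >> 32);
    struct reg_pair p;

    if (big_endian) {
        p.first = hi;
        p.second = lo;
    } else {
        p.first = lo;
        p.second = hi;
    }
    return p;
}
```

With such a helper, the ARM EABI call would take the form
syscall(SYS_readahead, fd, 0, p.first, p.second, count), where the 0 is the
padding value that fills r1.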
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010437.52901.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-01 9:30 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkit-qRPErHDzGEJ_yedA+O97bFxDsqWJMZOhCZ9DPvOtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-01 9:30 UTC (permalink / raw)
To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Mon, Apr 1, 2013 at 10:37 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
>> for ARM OABI, there is no such padding, and the proposed example is wrong
>> and will not work.
>>
>> for ARM EABI, the ABI requires that 64bit values be passed in register
>> pairs. since the kernel people wanted to avoid an assembly trampoline to
>> unpack the 64bit value with EABI, you have to call it as proposed:
>> syscall(readahead, fd, _pad, high32, low32)
>>
>> for MIPS, only the O32 ABI has this behavior.
>>
>> for PPC, only the 32bit ABI has this behavior.
>>
>> otherwise, i don't believe anyone else does this -- they just pass things
>> along in registers w/out padding.
>
> in random grepping of code bases (uClibc), i believe the xtensa arch also does
> 64bit register pair aligning. a cursory scan of the kernel seems to back this
> up.
Also SuperH?
For my own education: which part of the kernel sources backed this up?
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <CAKgNAkit-qRPErHDzGEJ_yedA+O97bFxDsqWJMZOhCZ9DPvOtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 10:09 ` Mike Frysinger
0 siblings, 0 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 10:09 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 05:30:06 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 10:37 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> > On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
> >> for ARM OABI, there is no such padding, and the proposed example is
> >> wrong and will not work.
> >>
> >> for ARM EABI, the ABI requires that 64bit values be passed in register
> >> pairs. since the kernel people wanted to avoid an assembly trampoline to
> >>
> >> unpack the 64bit value with EABI, you have to call it as proposed:
> >> syscall(readahead, fd, _pad, high32, low32)
> >>
> >> for MIPS, only the O32 ABI has this behavior.
> >>
> >> for PPC, only the 32bit ABI has this behavior.
> >>
> >> otherwise, i don't believe anyone else does this -- they just pass
> >> things along in registers w/out padding.
> >
> > in random grepping of code bases (uClibc), i believe the xtensa arch also
> > does 64bit register pair aligning. a cursory scan of the kernel seems
> > to back this up.
>
> Also SuperH?
i don't think so ... the pread/pwrite is indeed funky for SuperH, but i'm
pretty sure that's a wart they accidentally copied from another arch (ppc
maybe?) when they implemented the syscall rather than needing to do 64bit
register alignment. i say that because qemu/strace/glibc don't do the
register realigning for any other function.
> For my own education: which part of the kernel sources backed this up?
for xtensa, this part:
arch/xtensa/include/uapi/asm/unistd.h:
__SYSCALL(260, sys_readahead, 5)
that says readahead takes 5 args, but that's only true for 32bit arches if
you're re-aligning the value. the other 64bit syscalls have the same property
(+1 to the normal # of args).
additionally, the arch/xtensa/kernel/syscall.c file has a custom fadvise64_64
syscall with re-ordered arguments (with "advice" moved from last to 2nd) so that
the shifting of args doesn't end up requiring 7 slots (ala mips/o32).
-mike
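
The argument counts mentioned in this message (5 for readahead, 7 for
fadvise64_64 on mips/o32) follow from the pair alignment rule. Below is a
rough sketch of the arithmetic. It assumes 32-bit argument slots in which
every 8-byte argument must start on an even-numbered slot; the function name
is made up for illustration and does not come from the kernel.

```c
#include <stddef.h>

/* Count the 32-bit argument slots a system call occupies when each
 * 8-byte argument must start on an even slot (hypothetical helper).
 * arg_bytes[i] is the size of argument i: 4 or 8. */
static size_t count_arg_slots(const size_t *arg_bytes, size_t nargs)
{
    size_t slot = 0;

    for (size_t i = 0; i < nargs; i++) {
        if (arg_bytes[i] == 8) {
            slot += slot & 1;   /* insert one padding slot if odd */
            slot += 2;          /* the 64-bit value takes a pair */
        } else {
            slot += 1;
        }
    }
    return slot;
}
```

For readahead(fd, offset, count) the sizes {4, 8, 4} give 5 slots. For
fadvise64_64(fd, offset, len, advice) the sizes {4, 8, 8, 4} give 7, while
moving advice to second place, {4, 4, 8, 8}, gives 6.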
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <CAKgNAkij3zDwakWvcRkRbknmV2Hpt4HWfH4uVqmxp+7gQek-2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 10:32 ` Mike Frysinger
[not found] ` <201304010632.41520.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 10:32 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> > not sure if it's worth mentioning, but this issue ends up forcing MIPS'
> > O32 to take 7 arguments to syscall() :). on ARM/PPC, they avoid this by
> > reordering the arguments.
>
> I'm not sure that we need quite this level of detail, so I'll leave for
> now.
i only mention it because 7 args to syscall is unusual ... i think mips/o32
is the only arch that does this.
> > on a related topic, would it be useful to document the exact calling
> > convention for architecture system calls ? from time to time, i need to
> > reference this, and i inevitably turn to a variety of sources to dig up
> > the answer (the kernel itself, or strace, or qemu, or glibc, or uClibc,
> > or lss, or other random places). i would find it handy to have all of
> > these in a single location.
>
> Sounds like it would be useful to have that documented. Would you have
> a chance to write patches for that?
should we do it in syscall(2) ? or a dedicated man page ?
if the former, create a dedicated section, or do it under NOTES ?
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
>
> +64-bit value (e.g.,
> +.IR "long long" ) must be aligned to an even register pair.
this renders incorrectly. the rest of the sentence should be pulled onto a
new line.
> +That means inserting a dummy value into
> +.I r1
> +(the second argument of 0).
> +Similar issues can occur on MIPS with the O32 ABI and
> +on PowerPC with the 32-bit ABI.
> +The affected system calls are
> +.BR fadvise64_64 (2),
> +.BR ftruncate64 (2),
> +.BR posix_fadvise (2),
> +.BR pread64 (2),
> +.BR pwrite64 (2),
> +.BR readahead (2),
> +.BR sync_file_range (2),
> +and
> +.BR truncate64 (2).
i'm on the fence whether this reads better if there's a new paragraph starting
with "Similar issues", and whether the list of syscalls should be a flat list
(one syscall per line).
> --- a/man2/posix_fadvise.2
> +++ b/man2/posix_fadvise.2
> @@ -153,7 +153,10 @@ or
> first.
> .SS arm_fadvise()
> The ARM architecture
> -needs 64-bit arguments to be aligned in a suitable pair of registers.
> +needs 64-bit arguments to be aligned in a suitable pair of registers
> +(see
> +.BR syscall (2)
> +for further detail).
probably want to scrub the arm references altogether and say "some 32-bit
arches". this signature is used on other arches as well (ppc and xtensa at
least).
would also stop describing it as "flawed". there are tradeoffs when the ABI
imposes these kinds of requirements, and i'm not sure one is really better
than the other.
> --- a/man2/sync_file_range.2
> +++ b/man2/sync_file_range.2
> @@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted
> as padding between the
> and
> .I offset
> arguments.
> +(See
> +.BR syscall (2)
> +for details.)
> Therefore, these architectures define a different
> system call that orders the arguments suitably:
also in this man page, i would stop describing it as "flawed".
-mike
* Re: [PATCH] man2 : syscall.2 : add notes
[not found] ` <201304010632.41520.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-02 6:54 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-02 6:54 UTC (permalink / raw)
To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc
On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
>> > not sure if it's worth mentioning, but this issue ends up forcing MIPS'
>> > O32 to take 7 arguments to syscall() :). on ARM/PPC, they avoid this by
>> > reordering the arguments.
>>
>> I'm not sure that we need quite this level of detail, so I'll leave for
>> now.
>
> i only mention it because 7 args to syscall is unusual ... i think mips/o32
> is the only arch that does this.
I'll add a note in the page source. Maybe I'll revisit this later.
>> > on a related topic, would it be useful to document the exact calling
>> > convention for architecture system calls ? from time to time, i need to
>> > reference this, and i inevitably turn to a variety of sources to dig up
>> > the answer (the kernel itself, or strace, or qemu, or glibc, or uClibc,
>> > or lss, or other random places). i would find it handy to have all of
>> > these in a single location.
>>
>> Sounds like it would be useful to have that documented. Would you have
>> a chance to write patches for that?
>
> should we do it in syscall(2) ? or a dedicated man page ?
It's a little hard to say until I see the shape of what comes. Can you
provide a rough per-syscall example or two of what you expect to
document? (Don't write too concrete a patch yet, until I can get a
handle on what you intend.)
> if the former, create a dedicated section, or do it under NOTES ?
*If* the former, then I'd say a subsection under NOTES. But maybe this
is better per-syscall. Not sure yet.
>> --- a/man2/syscall.2
>> +++ b/man2/syscall.2
>>
>> +64-bit value (e.g.,
>> +.IR "long long" ) must be aligned to an even register pair.
>
> this renders incorrectly. the rest of the sentence should be pulled onto a
> new line.
fixed.
>> +That means inserting a dummy value into
>> +.I r1
>> +(the second argument of 0).
>> +Similar issues can occur on MIPS with the O32 ABI and
>> +on PowerPC with the 32-bit ABI.
>> +The affected system calls are
>> +.BR fadvise64_64 (2),
>> +.BR ftruncate64 (2),
>> +.BR posix_fadvise (2),
>> +.BR pread64 (2),
>> +.BR pwrite64 (2),
>> +.BR readahead (2),
>> +.BR sync_file_range (2),
>> +and
>> +.BR truncate64 (2).
>
> i'm on the fence whether this reads better if there's a new paragraph starting
> with "Similar issues", and whether the list of syscalls should be a flat list
> (one syscall per line).
Tweaked.
>> --- a/man2/posix_fadvise.2
>> +++ b/man2/posix_fadvise.2
>> @@ -153,7 +153,10 @@ or
>> first.
>> .SS arm_fadvise()
>> The ARM architecture
>> -needs 64-bit arguments to be aligned in a suitable pair of registers.
>> +needs 64-bit arguments to be aligned in a suitable pair of registers
>> +(see
>> +.BR syscall (2)
>> +for further detail).
>
> probably want to scrub the arm references altogether and say "some 32-bit
> arches". this signature is used on other arches as well (ppc and xtensa at
> least).
Tweaked.
> would also stop describing it as "flawed". there are tradeoffs when the ABI
> imposes these kinds of requirements, and i'm not sure one is really better
> than the other.
Removed "flawed"
>> --- a/man2/sync_file_range.2
>> +++ b/man2/sync_file_range.2
>> @@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted
>> as padding between the
>> and
>> .I offset
>> arguments.
>> +(See
>> +.BR syscall (2)
>> +for details.)
>> Therefore, these architectures define a different
>> system call that orders the arguments suitably:
>
> also in this man page, i would stop describing it as "flawed".
Done.
Changes have been pushed to Git now.
Cheers,
Michael
* [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-02 23:17 ` Mike Frysinger
2013-04-07 10:00 ` Michael Kerrisk (man-pages)
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-02 23:17 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man
On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> > on a related topic, would it be useful to document the exact calling
> >> > convention for architecture system calls ? from time to time, i need
> >> > to reference this, and i inevitably turn to a variety of sources to
> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
> >> > or uClibc, or lss, or other random places). i would find it handy to
> >> > have all of these in a single location.
> >>
> >> Sounds like it would be useful to have that documented. Would you have
> >> a chance to write patches for that?
> >
> > should we do it in syscall(2) ? or a dedicated man page ?
>
> It's a little hard to say until I see the shape of what comes. Can you
> provide a rough per-syscall example or two of what you expect to
> document? (Don't write too concrete a patch yet, until I can get a
> handle on what you intend.)
this renders nicely i think. it shows most of the stuff i'm interested in.
might be useful to add a dedicated section covering the clobbers in the
future.
-mike
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,35 @@ and an error code is stored in
.BR syscall ()
first appeared in
4BSD.
+.SS Architecture calling conventions
+Every architecture has its own way of invoking & passing arguments to the
+kernel.
+Note that the instruction listed below might not be the fastest or best way to
+transition to the kernel, so you might have to refer to the VDSO.
+Also note that this doesn't cover the entire calling convention -- some
+architectures may indiscriminately clobber other registers not listed here.
+.if t \{\
+.ft CW
+\}
+.TS
+l l l l l l l l l l l.
+arch/ABI	insn	NR	ret	arg1	arg2	arg3	arg4	arg5	arg6	arg7
+_
+arm/OABI	swi NR;	-	a1	a1	a2	a3	a4	v1	v2	v3
+arm/EABI	swi 0x0;	r7	r0	r0	r1	r2	r3	r4	r5	r6
+bfin	excpt 0x0;	P0	R0	R0	R1	R2	R3	R4	R5	-
+i386	int $0x80;	eax	eax	ebx	ecx	edx	esi	edi	ebp	-
+ia64	break 0x100000;	r15	r10/r8	r11	r9	r10	r14	r15	r13	-
+.\" not sure about insn or NR
+.\" parisc	ble 0x100(%%sr2, %%r0);	-	r28	r26	r25	r24	r23	r22	r21	-
+sparc/32	t 0x10;	g1	o0	o0	o1	o2	o3	o4	o5	-
+sparc/64	t 0x6d;	g1	o0	o0	o1	o2	o3	o4	o5	-
+x86_64	syscall;	rax	rax	rdi	rsi	rdx	r10	r8	r9	-
+.TE
+.if t \{\
+.in
+.ft P
+\}
.SS Architecture-specific requirements
Each architecture ABI has its own requirements on how
system call arguments are passed to the kernel.
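
As a concrete reading of the x86_64 row in the table above, here is a
minimal sketch of a three-argument raw system call written with GCC-style
inline assembly. The name raw_syscall3 is hypothetical. On any other
architecture the sketch simply falls back to syscall(). The rcx and r11
clobbers are the registers that the syscall instruction itself overwrites.
Note that the raw instruction returns a negative error number on failure,
whereas syscall() returns -1 and sets errno.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call with up to three arguments.
 * On x86_64 the number goes in rax and the arguments in rdi, rsi, rdx;
 * the result comes back in rax. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    long ret;

    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (a1), "S" (a2), "d" (a3)
                      : "rcx", "r11", "memory");
    return ret;         /* negative errno value on failure */
#else
    /* Other architectures: let the C library place the arguments. */
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value
as getpid(); the unused argument registers are ignored by the kernel.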
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-02 23:17 ` [PATCH] man2 : syscall.2 : document syscall calling conventions Mike Frysinger
@ 2013-04-07 10:00 ` Michael Kerrisk (man-pages)
2013-04-07 13:55 ` Kyle McMartin
[not found] ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 2 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-07 10:00 UTC (permalink / raw)
To: Mike Frysinger
Cc: linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
[Adding a few people to CC who may be able to help with Mike's doubts
on PA-RISC; folks, if any of you could have a quick look at the parisc
piece below, that would be helpful]
Mike,
On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
>> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
>> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
>> >> > on a related topic, would it be useful to document the exact calling
>> >> > convention for architecture system calls ? from time to time, i need
>> >> > to reference this, and i inevitably turn to a variety of sources to
>> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
>> >> > or uClibc, or lss, or other random places). i would find it handy to
>> >> > have all of these in a single location.
>> >>
>> >> Sounds like it would be useful to have that documented. Would you have
>> >> a chance to write patches for that?
>> >
>> > should we do it in syscall(2) ? or a dedicated man page ?
>>
>> It's a little hard to say until I see the shape of what comes. Can you
>> provide a rough per-syscall example or two of what you expect to
>> document? (Don't write too concrete a patch yet, until I can get a
>> handle on what you intend.)
>
> this renders nicely i think. it shows most of the stuff i'm interested in.
> might be useful to add a dedicated section covering the clobbers in the
> future.
Thanks for that. It looks good to me, and I have applied. But it
renders too wide (wherever possible, I try to ensure that everything
renders inside 80 columns), so I have split into tables, one with
"instruction, NR, ret" and another with the arguments (arg1 to arg7).
Now, just to make 100% sure of your intention, the NR column would be
better named "syscall #" (or similar), right? (I've made that change.)
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -79,6 +79,35 @@ and an error code is stored in
> .BR syscall ()
> first appeared in
> 4BSD.
> +.SS Architecture calling conventions
> +Every architecture has its own way of invoking & passing arguments to the
> +kernel.
> +Note that the instruction listed below might not be the fastest or best way to
> +transition to the kernel, so you might have to refer to the VDSO.
Mike, any chance that I could interest you in writing a vdso(7) man
page? I've felt the lack of such a page for a while (it need not be
too long), but am not deep enough into the details to write it easily
(I am not sure if you are).
> +Also note that this doesn't cover the entire calling convention -- some
> +architectures may indiscriminately clobber other registers not listed here.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l l l l l l l l l l.
> +arch/ABI insn NR ret arg1 arg2 arg3 arg4 arg5 arg6 arg7
> +_
> +arm/OABI swi NR; - a1 a1 a2 a3 a4 v1 v2 v3
> +arm/EABI swi 0x0; r7 r0 r0 r1 r2 r3 r4 r5 r6
> +bfin excpt 0x0; P0 R0 R0 R1 R2 R3 R4 R5 -
> +i386 int $0x80; eax eax ebx ecx edx esi edi ebp -
> +ia64 break 0x100000; r15 r10/r8 r11 r9 r10 r14 r15 r13 -
> +.\" not sure about insn or NR
> +.\" parisc ble 0x100(%%sr2, %%r0); - r28 r26 r25 r24 r23 r22 r21 -
PA-RISC folks, are you able to confirm/correct the above?
> +sparc/32 t 0x10; g1 o0 o0 o1 o2 o3 o4 o5 -
> +sparc/64 t 0x6d; g1 o0 o0 o1 o2 o3 o4 o5 -
> +x86_64 syscall; rax rax rdi rsi rdx r10 r8 r9 -
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> .SS Architecture-specific requirements
> Each architecture ABI has its own requirements on how
> system call arguments are passed to the kernel.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-07 10:00 ` Michael Kerrisk (man-pages)
@ 2013-04-07 13:55 ` Kyle McMartin
2013-04-07 14:56 ` James Bottomley
[not found] ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
[not found] ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 2 replies; 41+ messages in thread
From: Kyle McMartin @ 2013-04-07 13:55 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
James E.J. Bottomley, linux-parisc
On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC who may be able to help with Mike's doubts
> on PA-RISC; folks, if any of you could have a quick look at the parisc
> piece below, that would be helpful]
>
The syscall number is in %r20, everything else looks correct. The
returned value is in %r28 and the args are %r26 through %r21.
--Kyle
> Mike,
>
> On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> >> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> >> > on a related topic, would it be useful to document the exact calling
> >> >> > convention for architecture system calls ? from time to time, i need
> >> >> > to reference this, and i inevitably turn to a variety of sources to
> >> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
> >> >> > or uClibc, or lss, or other random places). i would find it handy to
> >> >> > have all of these in a single location.
> >> >>
> >> >> Sounds like it would be useful to have that documented. Would you have
> >> >> a chance to write patches for that?
> >> >
> >> > should we do it in syscall(2) ? or a dedicated man page ?
> >>
> >> It's a little hard to say until I see the shape of what comes. Can you
> >> provide a rough per-syscall example or two of what you expect to
> >> document? (Don't write too concrete a patch yet, until I can get a
> >> handle on what you intend.)
> >
> > this renders nicely i think. it shows most of the stuff i'm interested in.
> > might be useful to add a dedicated section covering the clobbers in the
> > future.
>
> Thanks for that. It looks good to me, and I have applied. But it
> renders too wide (wherever possible, I try to ensure that everything
> renders inside 80 columns), so I have split into tables, one with
> "instruction, NR, ret" and another with the arguments (arg1 to arg7).
>
> Now, just to make 100% sure of your intention, the NR column would be
> better named "syscall #" (or similar), right? (I've made that change.)
>
> > --- a/man2/syscall.2
> > +++ b/man2/syscall.2
> > @@ -79,6 +79,35 @@ and an error code is stored in
> > .BR syscall ()
> > first appeared in
> > 4BSD.
> > +.SS Architecture calling conventions
> > +Every architecture has its own way of invoking & passing arguments to the
> > +kernel.
> > +Note that the instruction listed below might not be the fastest or best way to
> > +transition to the kernel, so you might have to refer to the VDSO.
>
> Mike, any chance that I could interest you in writing a vdso(7) man
> page? I've felt the lack of such a page for a while (it need not be
> too long), but am not deep enough into the details to write it easily
> (I am not sure if you are).
>
> > +Also note that this doesn't cover the entire calling convention -- some
> > +architectures may indiscriminately clobber other registers not listed here.
> > +.if t \{\
> > +.ft CW
> > +\}
> > +.TS
> > +l l l l l l l l l l l.
> > +arch/ABI insn NR ret arg1 arg2 arg3 arg4 arg5 arg6 arg7
> > +_
> > +arm/OABI swi NR; - a1 a1 a2 a3 a4 v1 v2 v3
> > +arm/EABI swi 0x0; r7 r0 r0 r1 r2 r3 r4 r5 r6
> > +bfin excpt 0x0; P0 R0 R0 R1 R2 R3 R4 R5 -
> > +i386 int $0x80; eax eax ebx ecx edx esi edi ebp -
> > +ia64 break 0x100000; r15 r10/r8 r11 r9 r10 r14 r15 r13 -
> > +.\" not sure about insn or NR
> > +.\" parisc ble 0x100(%%sr2, %%r0); - r28 r26 r25 r24 r23 r22 r21 -
>
> PA-RISC folks, are you able to confirm/correct the above?
>
> > +sparc/32 t 0x10; g1 o0 o0 o1 o2 o3 o4 o5 -
> > +sparc/64 t 0x6d; g1 o0 o0 o1 o2 o3 o4 o5 -
> > +x86_64 syscall; rax rax rdi rsi rdx r10 r8 r9 -
> > +.TE
> > +.if t \{\
> > +.in
> > +.ft P
> > +\}
> > .SS Architecture-specific requirements
> > Each architecture ABI has its own requirements on how
> > system call arguments are passed to the kernel.
>
> Cheers,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Author of "The Linux Programming Interface"; http://man7.org/tlpi/
>
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-07 13:55 ` Kyle McMartin
@ 2013-04-07 14:56 ` James Bottomley
2013-04-07 15:11 ` Kyle McMartin
[not found] ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
1 sibling, 1 reply; 41+ messages in thread
From: James Bottomley @ 2013-04-07 14:56 UTC (permalink / raw)
To: Kyle McMartin
Cc: Michael Kerrisk (man-pages),
Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
James E.J. Bottomley, linux-parisc
On Sun, 2013-04-07 at 09:55 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
> >
>
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.
Actually, that's not quite correct. On 64 bits, arg1-8 are %r26-%r19,
but on 32 the convention is that arg1-arg4 are %r26-%r23 and the rest go
on the stack. We can also do register pair combining on 32 bits for a
64 bit argument.
Our register use is documented in
Documentation/parisc/registers
James
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-07 14:56 ` James Bottomley
@ 2013-04-07 15:11 ` Kyle McMartin
[not found] ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Kyle McMartin @ 2013-04-07 15:11 UTC (permalink / raw)
To: James Bottomley
Cc: Michael Kerrisk (man-pages),
Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
James E.J. Bottomley, linux-parisc
On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> Actually, that's not quite correct. On 64 bits, arg1-8 are %r26-%r19,
> but on 32 the convention is that arg1-arg4 are %r26-%r23 and the rest go
> on the stack. We can also do register pair combining on 32 bits for a
> 64 bit argument.
I guess the confusion is whether you're writing this from the kernel
side or the userspace side. The syscall instruction is called with six
arg registers, but we fix it on entry to the kernel when we call into C.
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 15:38 ` James Bottomley
2013-04-08 9:18 ` Michael Kerrisk (man-pages)
1 sibling, 0 replies; 41+ messages in thread
From: James Bottomley @ 2013-04-07 15:38 UTC (permalink / raw)
To: Kyle McMartin
Cc: Michael Kerrisk (man-pages),
Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
James E.J. Bottomley, linux-parisc-u79uwXL29TY76Z2rM5mHXA
On Sun, 2013-04-07 at 11:11 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> > Actually, that's not quite correct. On 64 bits, arg1-8 are %r26-%r19,
> > but on 32 the convention is that arg1-arg4 are %r26-%r23 and the rest go
> > on the stack. We can also do register pair combining on 32 bits for a
> > 64 bit argument.
>
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.
Oh, right, syscall arguments, sorry didn't manage to extract the content
from all the quotes. I was just thinking general ABI.
The syscall arguments are all in
arch/parisc/include/asm/unistd.h
As Kyle says, we override the calling convention and define in-register
arguments even on 32 bit (so %r26-%r21). We actually don't define
_syscall6() yet, but we're ready for it.
James
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 18:39 ` Mike Frysinger
2013-04-07 18:48 ` John David Anglin
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-07 18:39 UTC (permalink / raw)
To: Kyle McMartin
Cc: Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc-u79uwXL29TY76Z2rM5mHXA
On Sunday 07 April 2013 09:55:14 Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
>
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.
just to be clear, the only insn you need is:
ble 0x100(%sr2, %r0);
the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is a
known offset into that). the docs don't mention r0 that i can see, so i'm
guessing it's one of those "always 0" registers ?
the sysdep code has an ldi call in the branch delay slot (i think), but all
that seems to do is load r20 with the syscall nr.
-mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-07 18:43 ` Mike Frysinger
0 siblings, 0 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-07 18:43 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man
On Sunday 07 April 2013 06:00:50 Michael Kerrisk (man-pages) wrote:
> On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger wrote:
> > On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> >> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> >> > on a related topic, would it be useful to document the exact
> >> >> > calling convention for architecture system calls ? from time to
> >> >> > time, i need to reference this, and i inevitably turn to a variety
> >> >> > of sources to dig up the answer (the kernel itself, or strace, or
> >> >> > qemu, or glibc, or uClibc, or lss, or other random places). i
> >> >> > would find it handy to have all of these in a single location.
> >> >>
> >> >> Sounds like it would be useful to have that documented. Would you
> >> >> have a chance to write patches for that?
> >> >
> >> > should we do it in syscall(2) ? or a dedicated man page ?
> >>
> >> It's a little hard to say until I see the shape of what comes. Can you
> >> provide a rough per-syscall example or two of what you expect to
> >> document? (Don't write too concrete a patch yet, until I can get a
> >> handle on what you intend.)
> >
> > this renders nicely i think. it shows most of the stuff i'm interested
> > in. might be useful to add a dedicated section covering the clobbers in
> > the future.
>
> Thanks for that. It looks good to me, and I have applied. But it
> renders too wide (wherever possible, I try to ensure that everything
> renders inside 80 columns), so I have split into tables, one with
> "instruction, NR, ret" and another with the arguments (arg1 to arg7).
>
> Now, just to make 100% sure of your intention, the NR column would be
> better named "syscall #" (or similar), right? (I've made that change.)
i called it "nr" because that's the common convention (__NR_xxx/etc...) in
code bases, and because it does a nice job of not pushing the table too wide.
if you've split up the table though, that should no longer be a problem.
> > --- a/man2/syscall.2
> > +++ b/man2/syscall.2
> > @@ -79,6 +79,35 @@ and an error code is stored in
> > .BR syscall ()
> > first appeared in
> > 4BSD.
> > +.SS Architecture calling conventions
> > +Every architecture has its own way of invoking & passing arguments to the
> > +kernel.
> > +Note that the instruction listed below might not be the fastest or best way to
> > +transition to the kernel, so you might have to refer to the VDSO.
>
> Mike, any chance that I could interest you in writing a vdso(7) man
> page? I've felt the lack of such a page for a while (it need not be
> too long), but am not deep enough into the details to write it easily
> (I am not sure if you are).
i might take a stab at it. it's annoying to constantly have to refer to the
kernel source when looking something up.
in order to be useful, i think there will have to be arch-specific sections
which document the funcs each port provides.
-mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-07 18:39 ` Mike Frysinger
@ 2013-04-07 18:48 ` John David Anglin
[not found] ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
2013-04-12 1:55 ` Mike Frysinger
0 siblings, 2 replies; 41+ messages in thread
From: John David Anglin @ 2013-04-07 18:48 UTC (permalink / raw)
To: Mike Frysinger
Cc: Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> just to be clear, the only insn you need is:
> ble 0x100(%sr2, %r0);
>
> the kernel docs say sr2 holds the kernel gateway page (so i guess
> 0x100 is a
> known offset into that). the docs don't mention r0 that i can see,
> so i'm
> guessing it's one of those "always 0" registers ?
Yes. There is also an entry at offset 0xb0 for light-weight-syscalls.
Currently, this implements an atomic CAS operation used for pthread support.
Dave
--
John David Anglin dave.anglin@bell.net
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2013-04-07 15:38 ` James Bottomley
@ 2013-04-08 9:18 ` Michael Kerrisk (man-pages)
1 sibling, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08 9:18 UTC (permalink / raw)
To: Kyle McMartin
Cc: James Bottomley, Mike Frysinger, linux-man, Kyle McMartin,
Helge Deller, James E.J. Bottomley,
linux-parisc-u79uwXL29TY76Z2rM5mHXA
On Sun, Apr 7, 2013 at 5:11 PM, Kyle McMartin <kyle-pfcGkIkfWfAsA/PxXw9srA@public.gmane.org> wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
>> Actually, that's not quite correct. on 64 bits it's arg1-8 are
>> %r26-%r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
>> rest on stack. We can also do register pair combining on 32 bits for a
>> 64 bit argument.
>
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.
Thanks, Kyle.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
@ 2013-04-08 9:20 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08 9:20 UTC (permalink / raw)
To: Mike Frysinger, Kyle McMartin
Cc: John David Anglin, linux-man, Helge Deller, James E.J. Bottomley,
linux-parisc-u79uwXL29TY76Z2rM5mHXA
On Sun, Apr 7, 2013 at 8:48 PM, John David Anglin <dave.anglin-CzeTG9NwML0@public.gmane.org> wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>
>> just to be clear, the only insn you need is:
>> ble 0x100(%sr2, %r0);
>>
>> the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is a
>> known offset into that). the docs don't mention r0 that i can see, so i'm
>> guessing it's one of those "always 0" registers ?
>
> Yes. There is also an entry at offset 0xb0 for light-weight-syscalls.
> Currently, this implements an atomic CAS operation used for pthread support.
Mike (and Kyle),
For review, here are the tables as they now stand:
=====
Architecture calling conventions
Every architecture has its own way of invoking and passing arguments to
the kernel. The details for various architectures are listed in the
two tables below.
The first table lists the instruction used to transition to kernel
mode (which might not be the fastest or best way to transition to the
kernel, so you might have to refer to the VDSO), the register used to
indicate the system call number, and the register used to return the
system call result.
arch/ABI   instruction            syscall #   retval   Notes
──────────────────────────────────────────────────────────────────────
arm/OABI   swi NR                 -           a1       NR is syscall #
arm/EABI   swi 0x0                r7          r1
blackfin   excpt 0x0              P0          R0
i386       int $0x80              eax         eax
ia64       break 0x100000         r15         r10/r8C
parisc     ble 0x100(%sr2, %r0)   r20         r28
sparc/32   t 0x10                 g1          o0
sparc/64   t 0x6d                 g1          o0
x86_64     syscall                rax         rax
The second table shows the registers used to pass the system call
arguments.
arch/ABI   arg1   arg2   arg3   arg4   arg5   arg6   arg7
──────────────────────────────────────────────────────────
arm/OABI   a1     a2     a3     a4     v1     v2     v3
arm/EABI   r1     r2     r3     r4     r5     r6     r7
blackfin   R0     R1     R2     R3     R4     R5     -
i386       ebx    ecx    edx    esi    edi    ebp    -
ia64       r11    r9     r10    r14    r15    r13    -
parisc     r26    r25    r24    r23    r22    r21    -
sparc/32   o0     o1     o2     o3     o4     o5     -
sparc/64   o0     o1     o2     o3     o4     o5     -
x86_64     rdi    rsi    rdx    r10    r8     r9     -
Note that these tables don't cover the entire calling convention—some
architectures may indiscriminately clobber other registers not listed
here.
=====
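To make the x86_64 row of the two tables concrete, here is a minimal sketch
in C, assuming GCC or Clang inline assembly on Linux. The helper names
raw_syscall3 and raw_syscall3_demo are made up for illustration and are not
part of any library. On anything other than 64-bit x86 the sketch falls back
to syscall(2). The rcx and r11 clobbers are there because the syscall
instruction overwrites those registers, which is an example of the clobbers
the tables do not list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library: make a system call with up
 * to three arguments using the x86_64 registers from the tables above. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__LP64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)                 /* retval: rax */
                      : "0" (nr),                  /* syscall #: rax */
                        "D" (a1),                  /* arg1: rdi */
                        "S" (a2),                  /* arg2: rsi */
                        "d" (a3)                   /* arg3: rdx */
                      : "rcx", "r11", "memory");   /* overwritten by syscall */
    return ret;
#else
    /* Other architectures: let the C library pick the registers. */
    return syscall(nr, a1, a2, a3);
#endif
}

/* Usage: getpid via the raw path must agree with the libc wrapper. */
void raw_syscall3_demo(void)
{
    assert(raw_syscall3(SYS_getpid, 0, 0, 0) == (long)getpid());
}
```

One difference to keep in mind: on the raw path a failing call returns a
negative errno value in rax, whereas syscall(2) returns -1 and sets errno.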
Cheers,
Michael
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <CAKgNAkhv6tovvnucoofDR-eOe4H7xeFZDam9+iaVVndEqbuoXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-12 1:40 ` Mike Frysinger
[not found] ` <201304112140.18506.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12 1:40 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man
On Monday 08 April 2013 05:20:07 Michael Kerrisk (man-pages) wrote:
> arch/ABI instruction syscall # retval Notes
i was thinking it also might be useful to mention the register where the
syscall # is kept for syscall_restart, but we can do that in a follow up
> ia64 break 0x100000 r15 r10/r8C
looks like you added a typo :). it's "r8", not "r8C".
-mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-07 18:48 ` John David Anglin
[not found] ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
@ 2013-04-12 1:55 ` Mike Frysinger
[not found] ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-12 14:01 ` Kyle McMartin
1 sibling, 2 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12 1:55 UTC (permalink / raw)
To: John David Anglin
Cc: Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > just to be clear, the only insn you need is:
> > ble 0x100(%sr2, %r0);
> >
> > the kernel docs say sr2 holds the kernel gateway page (so i guess
> > 0x100 is a
> > known offset into that). the docs don't mention r0 that i can see,
> > so i'm
> > guessing it's one of those "always 0" registers ?
>
> Yes. There is also an entry at offset 0xb0 for light-weight-
> syscalls. Currently,
> this implements an atomic CAS operation used for pthread support.
interesting. sounds like a poor man's vDSO. i'll document this in the new
vdso(7) man page.
-mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-12 2:34 ` John David Anglin
2013-04-12 3:38 ` Mike Frysinger
0 siblings, 1 reply; 41+ messages in thread
From: John David Anglin @ 2013-04-12 2:34 UTC (permalink / raw)
To: Mike Frysinger
Cc: Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc-u79uwXL29TY76Z2rM5mHXA
On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>> just to be clear, the only insn you need is:
>>> ble 0x100(%sr2, %r0);
>>>
>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>> 0x100 is a
>>> known offset into that). the docs don't mention r0 that i can see,
>>> so i'm
>>> guessing it's one of those "always 0" registers ?
>>
>> Yes. There is also an entry at offset 0xb0 for light-weight-
>> syscalls. Currently,
>> this implements an atomic CAS operation used for pthread support.
>
> interesting. sounds like a poor man's vDSO. i'll document this the
> new
> vdso(7) man page.
Not exactly, the code runs on the gateway page which is in kernel space.
The main reason for doing the operation in kernel space is to prevent
processes from being preempted while executing in the lock region. In
general, parisc processes are not preempted on the gateway page. There are
some subtleties regarding fault handling.
There is support in glibc and libgcc for these calls. The libgcc
implementation in linux-atomic.c is very similar to that on arm.
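As a rough illustration of how userspace typically builds on a
compare-and-swap primitive like this one, here is a sketch in C of a
fetch-and-add written as a CAS retry loop. The function
kernel_cmpxchg_sketch is a made-up stand-in: it uses the compiler's
__atomic builtin so the example runs anywhere, whereas on parisc the real
operation would branch to the light-weight-syscall entry on the gateway
page. This is not the libgcc code, only the general pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-assisted CAS: returns true if *ptr
 * held 'expected' and was replaced by 'desired'.  Implemented here with a
 * compiler builtin so the sketch is portable. */
bool kernel_cmpxchg_sketch(int *ptr, int expected, int desired)
{
    return __atomic_compare_exchange_n(ptr, &expected, desired,
                                       false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST);
}

/* Atomically add 'val' to *ptr and return the previous value, retrying
 * whenever another thread changed *ptr between the load and the CAS. */
int fetch_and_add_sketch(int *ptr, int val)
{
    int old;
    do {
        old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    } while (!kernel_cmpxchg_sketch(ptr, old, old + val));
    return old;
}

/* Usage: a single-threaded walk through both helpers. */
void cas_demo(void)
{
    int counter = 5;
    assert(fetch_and_add_sketch(&counter, 3) == 5);
    assert(counter == 8);
}
```

Other read-modify-write operations (or, and, exchange) follow the same loop
shape with a different expression for the new value.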
Dave
--
John David Anglin dave.anglin-CzeTG9NwML0@public.gmane.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 2:34 ` John David Anglin
@ 2013-04-12 3:38 ` Mike Frysinger
2013-04-12 4:45 ` James Bottomley
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12 3:38 UTC (permalink / raw)
To: John David Anglin
Cc: Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> >>> just to be clear, the only insn you need is:
> >>> ble 0x100(%sr2, %r0);
> >>>
> >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> >>> 0x100 is a
> >>> known offset into that). the docs don't mention r0 that i can see,
> >>> so i'm
> >>> guessing it's one of those "always 0" registers ?
> >>
> >> Yes. There is also an entry at offset 0xb0 for light-weight-
> >> syscalls. Currently,
> >> this implements an atomic CAS operation used for pthread support.
> >
> > interesting. sounds like a poor man's vDSO. i'll document this the
> > new
> > vdso(7) man page.
>
> Not exactly, the code runs on the gateway page which is in kernel space.
> The main reason for doing the operation in kernel space is to prevent
> processes from being preempted while executing in the lock region. In
> general,
> parisc processes are not preempted on the gateway page. There are
> some subtleties regarding fault handling.
sure ... the Blackfin arch does a similar thing for providing fast atomic
primitives to userspace since the ISA can't.
what do you think of this section for vdso(7) ? i might have to split the
"real" vdso arches from these others since there's a couple now (arm, bfin,
parisc), and i think there might be more down the line (microblaze).
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions.
Rather than use the normal ELF aux vector approach, it passes the address of
the page to the process via the SR2 register.
This is done to match the way HP-UX works.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction, e.g.:
.br
ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset function
_
00b0 lws_entry
00e0 set_thread_pointer
0100 linux_gateway_entry (syscall)
0268 syscall_nosys
0274 tracesys
0324 tracesys_next
0368 tracesys_exit
03a0 tracesys_sigexit
03b8 lws_start
03dc lws_exit_nosys
03e0 lws_exit
03e4 lws_compare_and_swap64
03e8 lws_compare_and_swap
0404 cas_wouldblock
0410 cas_action
.TE
.if t \{\
.in
.ft P
\}
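Because there is no ELF symbol table to consult, a caller has to carry the
offsets around itself. A minimal sketch in C of what that amounts to, using
three of the offsets from the draft table above: the struct and function
names are invented for illustration, and the offsets are whatever
syscall.S lays out at the time, so hardcoding them like this is an
assumption rather than a stable interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: entry points on the parisc gateway page, keyed by
 * name.  Offsets are copied from the draft above and may differ between
 * kernel versions. */
struct gateway_entry {
    const char *name;
    unsigned int offset;
};

const struct gateway_entry gateway_entries[] = {
    { "lws_entry",           0x00b0 },
    { "set_thread_pointer",  0x00e0 },
    { "linux_gateway_entry", 0x0100 },
};

/* Return the offset for 'name', or -1 if the caller does not know it. */
long gateway_offset(const char *name)
{
    size_t n = sizeof(gateway_entries) / sizeof(gateway_entries[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(gateway_entries[i].name, name) == 0)
            return (long)gateway_entries[i].offset;
    }
    return -1;
}

/* Usage: the syscall entry is the 0x100 used in the ble instruction. */
void gateway_demo(void)
{
    assert(gateway_offset("linux_gateway_entry") == 0x100);
}
```

A vDSO avoids this kind of table entirely, since the dynamic linker resolves
the entry points by symbol name.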
> There is support in glibc and libgcc for these calls. The libgcc
> implementation
> in linux-atomic.c is very similar to that on arm.
interesting. another arch to add :).
-mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 3:38 ` Mike Frysinger
@ 2013-04-12 4:45 ` James Bottomley
2013-04-12 12:17 ` John David Anglin
2013-04-12 18:45 ` Mike Frysinger
0 siblings, 2 replies; 41+ messages in thread
From: James Bottomley @ 2013-04-12 4:45 UTC (permalink / raw)
To: Mike Frysinger
Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > >>> just to be clear, the only insn you need is:
> > >>> ble 0x100(%sr2, %r0);
> > >>>
> > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > >>> 0x100 is a
> > >>> known offset into that). the docs don't mention r0 that i can see,
> > >>> so i'm
> > >>> guessing it's one of those "always 0" registers ?
> > >>
> > >> Yes. There is also an entry at offset 0xb0 for light-weight-
> > >> syscalls. Currently,
> > >> this implements an atomic CAS operation used for pthread support.
> > >
> > > interesting. sounds like a poor man's vDSO. i'll document this the
> > > new
> > > vdso(7) man page.
> >
> > Not exactly, the code runs on the gateway page which is in kernel space.
> > The main reason for doing the operation in kernel space is to prevent
> > processes from being preempted while executing in the lock region. In
> > general,
> > parisc processes are not preempted on the gateway page. There are
> > some subtleties regarding fault handling.
>
> sure ... the Blackfin arch does a similar thing for providing fast atomic
> primitives to userspace since the ISA can't.
>
> what do you think of this section for vdso(7) ? i might have to split the
> "real" vdso arches from these others since there's a couple now (arm, bfin,
> parisc), and i think there might be more down the line (microblaze).
I've got to say, I really don't think this can be classified as a vdso.
For a vdso, the kernel exports an ELF object that can be linked
dynamically into any elf binary requiring it. The ELF section
information provides full details and so vdso entries can be called by
symbol.
In the parisc gateway page implementation, we have a set of "hidden"
primitives which the executable must know how to call (no self
description like a vdso). This mechanism is identical to the original
intent of the x86 int <n> instruction (an instruction that traps into
the kernel and performs some primitive action but to use it, you have to
know which function corresponds to which value of <n>).
James
> .SS parisc (hppa) functions
> .\" See linux/arch/parisc/kernel/syscall.S
> .\" See linux/Documentation/parisc/registers
> The parisc port has a code page full of utility functions.
> Rather than use the normal ELF aux vector approach, it passes the address of
> the page to the process via the SR2 register.
> This is done to match the way HP-UX works.
>
> Since it's just a raw page of code, there is no ELF information for doing
> symbol lookups or versioning.
> Simply call into the appropriate offset via the branch instruction, e.g.:
> .br
> ble <offset>(%sr2, %r0)
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> offset function
> _
> 00b0 lws_entry
> 00e0 set_thread_pointer
> 0100 linux_gateway_entry (syscall)
> 0268 syscall_nosys
> 0274 tracesys
> 0324 tracesys_next
> 0368 tracesys_exit
> 03a0 tracesys_sigexit
> 03b8 lws_start
> 03dc lws_exit_nosys
> 03e0 lws_exit
> 03e4 lws_compare_and_swap64
> 03e8 lws_compare_and_swap
> 0404 cas_wouldblock
> 0410 cas_action
> .TE
> .if t \{\
> .in
> .ft P
> \}
>
> > There is support in glibc and libgcc for these calls. The libgcc
> > implementation
> > in linux-atomic.c is very similar to that on arm.
>
> interesting. another arch to add :).
> -mike
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 4:45 ` James Bottomley
@ 2013-04-12 12:17 ` John David Anglin
2013-04-12 18:45 ` Mike Frysinger
1 sibling, 0 replies; 41+ messages in thread
From: John David Anglin @ 2013-04-12 12:17 UTC (permalink / raw)
To: James Bottomley
Cc: Mike Frysinger, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On 12-Apr-13, at 12:45 AM, James Bottomley wrote:
> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
>> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
>>> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
>>>> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>>>>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>>>>> just to be clear, the only insn you need is:
>>>>>> ble 0x100(%sr2, %r0);
>>>>>>
>>>>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>>>>> 0x100 is a
>>>>>> known offset into that). the docs don't mention r0 that i can
>>>>>> see,
>>>>>> so i'm
>>>>>> guessing it's one of those "always 0" registers ?
>>>>>
>>>>> Yes. There is also an entry at offset 0xb0 for light-weight-
>>>>> syscalls. Currently,
>>>>> this implements an atomic CAS operation used for pthread support.
>>>>
>>>> interesting. sounds like a poor man's vDSO. i'll document this
>>>> the
>>>> new
>>>> vdso(7) man page.
>>>
>>> Not exactly, the code runs on the gateway page which is in kernel
>>> space.
>>> The main reason for doing the operation in kernel space is to
>>> prevent
>>> processes from being preempted while executing in the lock
>>> region. In
>>> general,
>>> parisc processes are not preempted on the gateway page. There are
>>> some subtleties regarding fault handling.
>>
>> sure ... the Blackfin arch does a similar thing for providing fast
>> atomic
>> primitives to userspace since the ISA can't.
>>
>> what do you think of this section for vdso(7) ? i might have to
>> split the
>> "real" vdso arches from these others since there's a couple now
>> (arm, bfin,
>> parisc), and i think there might be more down the line (microblaze).
>
> I've got to say, I really don't think this can be classified as a
> vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it. The ELF section
> information provides full details and so vdso entries can be called by
> symbol.
>
> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso). This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you
> have to
> know which function corresponds to which value of <n>).
I agree with James. There is no ELF object exported to userspace. The
content of the gateway page is hidden. The data structures used for
the locks are in the kernel itself. Access is via a special branch
instruction rather than a break/trap instruction.
>
> James
>
>
>> .SS parisc (hppa) functions
>> .\" See linux/arch/parisc/kernel/syscall.S
>> .\" See linux/Documentation/parisc/registers
>> The parisc port has a code page full of utility functions.
>> Rather than use the normal ELF aux vector approach, it passes the
>> address of
>> the page to the process via the SR2 register.
>> This is done to match the way HP-UX works.
>>
>> Since it's just a raw page of code, there is no ELF information for
>> doing
>> symbol lookups or versioning.
>> Simply call into the appropriate offset via the branch instruction,
>> e.g.:
>> .br
>> ble <offset>(%sr2, %r0)
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> offset function
>> _
>> 00b0 lws_entry
>> 00e0 set_thread_pointer
>> 0100 linux_gateway_entry (syscall)
>> 0268 syscall_nosys
>> 0274 tracesys
>> 0324 tracesys_next
>> 0368 tracesys_exit
>> 03a0 tracesys_sigexit
>> 03b8 lws_start
>> 03dc lws_exit_nosys
>> 03e0 lws_exit
>> 03e4 lws_compare_and_swap64
>> 03e8 lws_compare_and_swap
>> 0404 cas_wouldblock
>> 0410 cas_action
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>>
>>> There is support in glibc and libgcc for these calls. The libgcc
>>> implementation
>>> in linux-atomic.c is very similar to that on arm.
>>
>> interesting. another arch to add :).
>> -mike
>
>
>
--
John David Anglin dave.anglin@bell.net
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 1:55 ` Mike Frysinger
[not found] ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-12 14:01 ` Kyle McMartin
1 sibling, 0 replies; 41+ messages in thread
From: Kyle McMartin @ 2013-04-12 14:01 UTC (permalink / raw)
To: Mike Frysinger
Cc: John David Anglin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Thu, Apr 11, 2013 at 09:55:43PM -0400, Mike Frysinger wrote:
> interesting. sounds like a poor man's vDSO. i'll document this the new
> vdso(7) man page.
> -mike
fwiw ia64 does basically the same thing for a subset of syscalls
(fsys.c)
--Kyle
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 4:45 ` James Bottomley
2013-04-12 12:17 ` John David Anglin
@ 2013-04-12 18:45 ` Mike Frysinger
2013-04-12 19:14 ` James Bottomley
1 sibling, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12 18:45 UTC (permalink / raw)
To: James Bottomley
Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > > >>> just to be clear, the only insn you need is:
> > > >>> ble 0x100(%sr2, %r0);
> > > >>>
> > > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > > >>> 0x100 is a known offset into that). the docs don't mention r0 that
> > > >>> i can see, so i'm guessing it's one of those "always 0" registers ?
> > > >>
> > > >> Yes. There is also an entry at offset 0xb0 for light-weight-syscalls.
> > > >> Currently, this implements an atomic CAS operation used for pthread
> > > >> support.
> > > >
> > > > interesting. sounds like a poor man's vDSO. i'll document this in the
> > > > new vdso(7) man page.
> > >
> > > Not exactly, the code runs on the gateway page which is in kernel
> > > space. The main reason for doing the operation in kernel space is to
> > > prevent processes from being preempted while executing in the lock
> > > region. In general, parisc processes are not preempted on the gateway
> > > page. There are some subtleties regarding fault handling.
> >
> > sure ... the Blackfin arch does a similar thing for providing fast atomic
> > primitives to userspace since the ISA can't.
> >
> > what do you think of this section for vdso(7) ? i might have to split
> > the "real" vdso arches from these others since there's a couple now
> > (arm, bfin, parisc), and i think there might be more down the line
> > (microblaze).
>
> I've got to say, I really don't think this can be classified as a vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it. The ELF section
> information provides full details and so vdso entries can be called by
> symbol.
strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
acronym is literally "virtual dynamic shared object"). however, i see the
vdso as being a bit more of a flexible concept -- it's a place of shared code
that the kernel manages and exports for all userspace processes.
fundamentally, the point of the vDSO is to provide services to greatly speed
up userspace. in that regard, these mapped pages are exactly like vDSOs.
thus i think it's appropriate to document these "fixed code" regions that many
arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man
page as the vdso. especially since (currently) arches do one or the other,
but not both.
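
For the strict reading (a vDSO is an ELF image the kernel maps into the process), the check can be sketched in C. This is only an illustration, not code from the thread: the helper names `vdso_base` and `looks_like_elf` are made up, and it assumes a Linux C library that provides getauxval(3). On arches that export a fixed code page instead of a vDSO, the auxiliary vector entry is expected to be absent and `vdso_base()` returns NULL.

```c
#include <elf.h>       /* ELFMAG, SELFMAG */
#include <stdint.h>    /* uintptr_t */
#include <string.h>    /* memcmp() */
#include <sys/auxv.h>  /* getauxval(), AT_SYSINFO_EHDR */

/* Return the address of the kernel-supplied vDSO image, or NULL when the
 * kernel did not pass AT_SYSINFO_EHDR in the auxiliary vector.
 * (Hypothetical helper name, for illustration only.) */
const void *vdso_base(void)
{
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if the image starts with the ELF magic bytes, 0 otherwise.
 * (Hypothetical helper name, for illustration only.) */
int looks_like_elf(const void *image)
{
    return image != NULL && memcmp(image, ELFMAG, SELFMAG) == 0;
}
```

When `vdso_base()` is non-NULL, `looks_like_elf()` should accept it, and the usual ELF symbol tables are then available for lookup by name. A raw code page offers nothing comparable, which is why callers have to know its offsets in advance.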
> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso). This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you have to
> know which function corresponds to which value of <n>).
would it be useful to document all of them ? or just the ones that userspace
actively uses (like syscall/cas) ? or should all of this be recorded in the
kernel's Documentation/parisc/ subdir and just have the man page refer people
there (like it does for ARM & Blackfin currently) ?
-mike
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 18:45 ` Mike Frysinger
@ 2013-04-12 19:14 ` James Bottomley
2013-04-12 19:46 ` Mike Frysinger
0 siblings, 1 reply; 41+ messages in thread
From: James Bottomley @ 2013-04-12 19:14 UTC (permalink / raw)
To: Mike Frysinger
Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > what do you think of this section for vdso(7) ? i might have to split
> > > the "real" vdso arches from these others since there's a couple now
> > > (arm, bfin, parisc), and i think there might be more down the line
> > > (microblaze).
> >
> > I've got to say, I really don't think this can be classified as a vdso.
> > For a vdso, the kernel exports an ELF object that can be linked
> > dynamically into any elf binary requiring it. The ELF section
> > information provides full details and so vdso entries can be called by
> > symbol.
>
> strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> acronym is literally "virtual dynamic shared object"). however, i see the
> vdso as being a bit more of a flexible concept -- it's a place of shared code
> that the kernel manages and exports for all userspace processes.
> fundamentally, the point of the vDSO is to provide services to greatly speed
> up userspace. in that regard, these mapped pages are exactly like vDSOs.
I don't entirely understand this classification. If the kernel<->user
gateway becomes classified as a vdso, that covers our syscall interface
on every architecture. There's now no distinction between a vdso (which
may not even move to kernel mode) and a syscall.
I think the difference is that a syscall is a specific call to a known
kernel routine by number and it involves a transition to kernel mode. A
vdso is an exported link object containing certain functions which may
or may not cause a trap to kernel mode when executed. The distinction
is how you do the call. For syscalls, you have to know the number and
the arguments. For vdso you just have to know the symbol (and
obviously, the prototype for C code) and the kernel supplies the
implementation direct to the userspace binary.
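
The two calling styles can be put side by side in C. This is a minimal sketch under stated assumptions, not code from the thread: the function names `now_by_number` and `now_by_symbol` are made up, and it assumes a Linux target whose headers define SYS_clock_gettime and whose struct timespec has the layout that syscall expects (the case on common 64-bit targets).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_clock_gettime */
#include <time.h>         /* clock_gettime(), struct timespec */
#include <unistd.h>       /* syscall() */

/* Call by number: the caller has to know the syscall number and the exact
 * argument list, and the call always enters the kernel.
 * (Hypothetical helper name, for illustration only.) */
int now_by_number(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}

/* Call by symbol: the caller only needs the name and prototype. The C
 * library may satisfy this from a vDSO, if the architecture has one,
 * without trapping into the kernel.
 * (Hypothetical helper name, for illustration only.) */
int now_by_symbol(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}
```

Both functions read the same clock. Whether the second one ever traps depends on the architecture and the C library, and the caller cannot tell from the prototype.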
> thus i think it's appropriate to document these "fixed code" regions that many
> arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man
> page as the vdso. especially since (currently) arches do one or the other,
> but not both.
I really see these as a type of lightweight syscall. You use the
syscall prototype (call by number with known arguments) but the call may
not necessarily transition to kernel mode proper to handle the function.
> > In the parisc gateway page implementation, we have a set of "hidden"
> > primitives which the executable must know how to call (no self
> > description like a vdso). This mechanism is identical to the original
> > intent of the x86 int <n> instruction (an instruction that traps into
> > the kernel and performs some primitive action but to use it, you have to
> > know which function corresponds to which value of <n>).
>
> would it be useful to document all of them ? or just the ones that userspace
> actively uses (like syscall/cas) ? or should all of this be recorded in the
> kernel's Documentation/parisc/ subdir and just have the man page refer people
> there (like it does for ARM & Blackfin currently) ?
I'm not sure. For x86 they're in include/asm/traps.h. I think the only
ones we really use are int3 for breakpoint, int4 for overflow and int80
for legacy syscall.
James
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 19:14 ` James Bottomley
@ 2013-04-12 19:46 ` Mike Frysinger
2013-04-12 20:25 ` James Bottomley
0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12 19:46 UTC (permalink / raw)
To: James Bottomley
Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > what do you think of this section for vdso(7) ? i might have to
> > > > split the "real" vdso arches from these others since there's a
> > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > the line (microblaze).
> > >
> > > I've got to say, I really don't think this can be classified as a vdso.
> > > For a vdso, the kernel exports an ELF object that can be linked
> > > dynamically into any elf binary requiring it. The ELF section
> > > information provides full details and so vdso entries can be called by
> > > symbol.
> >
> > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > acronym is literally "virtual dynamic shared object"). however, i see
> > the vdso as being a bit more of a flexible concept -- it's a place of
> > shared code that the kernel manages and exports for all userspace
> > processes. fundamentally, the point of the vDSO is to provide services
> > to greatly speed up userspace. in that regard, these mapped pages are
> > exactly like vDSOs.
>
> I don't entirely understand this classification. If the kernel<->user
> gateway becomes classified as a vdso, that covers our syscall interface
> on every architecture. There's now no distinction between a vdso (which
> may not even move to kernel mode) and a syscall.
>
> I think the difference is that a syscall is a specific call to a known
> kernel routine by number and it involves a transition to kernel mode. A
> vdso is an exported link object containing certain functions which may
> or may not cause a trap to kernel mode when executed. The distinction
> is how you do the call. For syscalls, you have to know the number and
> the arguments. For vdso you just have to know the symbol (and
> obviously, the prototype for C code) and the kernel supplies the
> implementation direct to the userspace binary.
i'm not fully versed in the parisc linux gateway page or how the architecture
is handling things, so i could be completely off here. from reading the source
code, it *looked* like it was just a page of utility funcs that userspace
branches to without changing privilege modes or going through the full syscall
routines.
so i'm saying the gateway page itself can be thought of in the same vein as a
vDSO. it's a black box with entry points that provide lightweight services
to userspace. sometimes it ends up triggering a full syscall, sometimes it
doesn't (just like a vDSO).
> > thus i think it's appropriate to document these "fixed code" regions that
> > many arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the
> > same man page as the vdso. especially since (currently) arches do one
> > or the other, but not both.
>
> I really see these as a type of lightweight syscall. You use the
> syscall prototype (call by number with known arguments) but the call may
> not necessarily transition to kernel mode proper to handle the function.
if you think of the vdso in a very strict light (it's exactly an ELF that the
kernel automatically maps into every process's address space), then i guess
you can only classify these as lightweight syscalls (where the address/offset
is the "syscall #").
i see vdso as being a more flexible concept than that -- if it's code mapped
into a process's address space and provides useful lightweight services that
are meant to be used specifically in lieu of syscall(), then it's vdso-like and
should be in the vdso(7) man page. it has a lot more in common imo with a
vdso than it does with an actual syscall. i certainly think vdso(7) is more
appropriate for these regions than syscall(2) or syscalls(2).
> > > In the parisc gateway page implementation, we have a set of "hidden"
> > > primitives which the executable must know how to call (no self
> > > description like a vdso). This mechanism is identical to the original
> > > intent of the x86 int <n> instruction (an instruction that traps into
> > > the kernel and performs some primitive action but to use it, you have
> > > to know which function corresponds to which value of <n>).
> >
> > would it be useful to document all of them ? or just the ones that
> > userspace actively uses (like syscall/cas) ? or should all of this be
> > recorded in the kernel's Documentation/parisc/ subdir and just have the
> > man page refer people there (like it does for ARM & Blackfin currently)
> > ?
>
> I'm not sure. For x86 they're in include/asm/traps.h. I think the only
> ones we really use are int3 for breakpoint, int4 for overflow and int80
> for legacy syscall.
hmm, i wasn't even considering the other arch-specific services offered by e.g.
software interrupts. i don't think those belong in vdso(7) as they don't
confer any of the lightweight advantages the vdso is designed to bring, but it
might be useful to document these somewhere. they're also not as common for
people to encounter as a vdso ...
-mike
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
2013-04-12 19:46 ` Mike Frysinger
@ 2013-04-12 20:25 ` James Bottomley
0 siblings, 0 replies; 41+ messages in thread
From: James Bottomley @ 2013-04-12 20:25 UTC (permalink / raw)
To: Mike Frysinger
Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
linux-parisc
On Fri, 2013-04-12 at 15:46 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> > On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > > what do you think of this section for vdso(7) ? i might have to
> > > > > split the "real" vdso arches from these others since there's a
> > > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > > the line (microblaze).
> > > >
> > > > I've got to say, I really don't think this can be classified as a vdso.
> > > > For a vdso, the kernel exports an ELF object that can be linked
> > > > dynamically into any elf binary requiring it. The ELF section
> > > > information provides full details and so vdso entries can be called by
> > > > symbol.
> > >
> > > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > > acronym is literally "virtual dynamic shared object"). however, i see
> > > the vdso as being a bit more of a flexible concept -- it's a place of
> > > shared code that the kernel manages and exports for all userspace
> > > processes. fundamentally, the point of the vDSO is to provide services
> > > to greatly speed up userspace. in that regard, these mapped pages are
> > > exactly like vDSOs.
> >
> > I don't entirely understand this classification. If the kernel<->user
> > gateway becomes classified as a vdso, that covers our syscall interface
> > on every architecture. There's now no distinction between a vdso (which
> > may not even move to kernel mode) and a syscall.
> >
> > I think the difference is that a syscall is a specific call to a known
> > kernel routine by number and it involves a transition to kernel mode. A
> > vdso is an exported link object containing certain functions which may
> > or may not cause a trap to kernel mode when executed. The distinction
> > is how you do the call. For syscalls, you have to know the number and
> > the arguments. For vdso you just have to know the symbol (and
> > obviously, the prototype for C code) and the kernel supplies the
> > implementation direct to the userspace binary.
>
> i'm not fully versed in the parisc linux gateway page or how the architecture
> is handling things, so i could be completely off here. from reading the source
> code, it *looked* like it was just a page of utility funcs that userspace
> branches to without changing privilege modes or going through the full syscall
> routines.
Oh, if that's the misunderstanding, then the gateway page is "special".
It actually has PAGE_GATEWAY bits set (this is linux terminology; in
parisc terminology it's Execute, promote to PL0) in the page map. So
anything executing on this page executes with kernel level privilege
(there's more to it than that: to have this happen, you also have to use
a branch with a ,gate completer to activate the privilege promotion).
The upshot is that everything that runs on the gateway page runs at
kernel privilege but with the current user process address space
(although you have access to kernel space via %sr2). For the 0x100
syscall entry, we redo the space registers to point to the kernel
address space (preserving the user address space in %sr3), move to wide
mode if required, save the user registers and branch into the kernel
syscall entry point. For all the other functions, we execute at kernel
privilege but don't flip address spaces. The basic upshot of this is
that these code snippets are executed atomically (because the kernel
can't be pre-empted) and they may perform architecturally forbidden (to
PL3) operations (like setting control registers).
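
The split between the 0x100 entry and everything else on the page can be written down as a small C model. It is only a sketch of the description above, not kernel code: the enum and function names are made up, and the offsets are the ones from the table quoted earlier in this thread.

```c
/* Offsets into the parisc gateway page, as listed in the table quoted
 * earlier in this thread. The identifiers are hypothetical. */
enum gateway_offset {
    GW_LWS_ENTRY          = 0x0b0, /* light-weight syscalls (atomic CAS) */
    GW_SET_THREAD_POINTER = 0x0e0,
    GW_SYSCALL_ENTRY      = 0x100  /* linux_gateway_entry (syscall) */
};

/* Per the description above, only the 0x100 syscall entry repoints the
 * space registers at the kernel address space. Every other entry runs at
 * kernel privilege but stays in the user address space. */
int gateway_switches_address_space(unsigned int offset)
{
    return offset == GW_SYSCALL_ENTRY;
}
```

On real hardware the entry is reached with a branch such as the `ble 0x100(%sr2, %r0)` instruction discussed above; the model only records which offsets behave like a full syscall.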
James
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
[not found] ` <201304112140.18506.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-16 6:01 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-16 6:01 UTC (permalink / raw)
To: Mike Frysinger; +Cc: linux-man
On Fri, Apr 12, 2013 at 3:40 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 08 April 2013 05:20:07 Michael Kerrisk (man-pages) wrote:
>> arch/ABI instruction syscall # retval Notes
>
> i was thinking it also might be useful to mention the register where the
> syscall # is kept for syscall_restart, but we can do that in a follow up
That sounds like a useful addition! You can see the current version of
the page in Git. Further patches welcome...
>> ia64 break 0x100000 r15 r10/r8C
>
> looks like you added a typo :). it's "r8", not "r8C".
Fixed.
Cheers,
Michael
Thread overview: 41+ messages
2013-03-27 5:11 [PATCH] man2 : syscall.2 : add notes ch0.han-Hm3cg6mZ9cc
[not found] ` <1364361092-5948-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
2013-03-27 7:53 ` (unknown), Changhee Han
2013-03-27 8:25 ` [PATCH v2] man2 : syscall.2 : add notes Changhee Han
2013-03-28 9:37 ` [PATCH] " Michael Kerrisk (man-pages)
2013-04-01 5:33 ` Changhee Han
[not found] ` <1364794429-20477-1-git-send-email-ch0.han-Hm3cg6mZ9cc@public.gmane.org>
2013-04-01 6:13 ` Mike Frysinger
[not found] ` <201304010213.06056.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 6:22 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAki_8bOsuKTJLx3iMLeSvVXHo0bZf8zSUQ08RR7+D33xgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 7:19 ` Mike Frysinger
[not found] ` <201304010319.45019.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 7:36 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkhBASGvXGfdBSjpGaMuxoJofcQvZQrX3a=uxbcKQnXOAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 8:29 ` Mike Frysinger
[not found] ` <201304010429.45737.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 9:29 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkij3zDwakWvcRkRbknmV2Hpt4HWfH4uVqmxp+7gQek-2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 10:32 ` Mike Frysinger
[not found] ` <201304010632.41520.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-02 6:54 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-02 23:17 ` [PATCH] man2 : syscall.2 : document syscall calling conventions Mike Frysinger
2013-04-07 10:00 ` Michael Kerrisk (man-pages)
2013-04-07 13:55 ` Kyle McMartin
2013-04-07 14:56 ` James Bottomley
2013-04-07 15:11 ` Kyle McMartin
[not found] ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2013-04-07 15:38 ` James Bottomley
2013-04-08 9:18 ` Michael Kerrisk (man-pages)
[not found] ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2013-04-07 18:39 ` Mike Frysinger
2013-04-07 18:48 ` John David Anglin
[not found] ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
2013-04-08 9:20 ` Michael Kerrisk (man-pages)
2013-04-08 9:20 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkhv6tovvnucoofDR-eOe4H7xeFZDam9+iaVVndEqbuoXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-12 1:40 ` Mike Frysinger
[not found] ` <201304112140.18506.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-16 6:01 ` Michael Kerrisk (man-pages)
2013-04-12 1:55 ` Mike Frysinger
[not found] ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-12 2:34 ` John David Anglin
2013-04-12 3:38 ` Mike Frysinger
2013-04-12 4:45 ` James Bottomley
2013-04-12 12:17 ` John David Anglin
2013-04-12 18:45 ` Mike Frysinger
2013-04-12 19:14 ` James Bottomley
2013-04-12 19:46 ` Mike Frysinger
2013-04-12 20:25 ` James Bottomley
2013-04-12 14:01 ` Kyle McMartin
[not found] ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-07 18:43 ` Mike Frysinger
2013-04-01 8:37 ` [PATCH] man2 : syscall.2 : add notes Mike Frysinger
[not found] ` <201304010437.52901.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-01 9:30 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkit-qRPErHDzGEJ_yedA+O97bFxDsqWJMZOhCZ9DPvOtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-01 10:09 ` Mike Frysinger
2013-04-01 7:05 ` Fw : Re : " 한창희