* [PATCH] man2 : syscall.2 : add notes
From: ch0.han-Hm3cg6mZ9cc @ 2013-03-27  5:11 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han

From: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Add notes that caution users when passing arguments to syscall.2.

Signed-off-by: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 man2/syscall.2 |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..2c823b6 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,15 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+
+On some architecture arguments should be passed with an appropriate way.
+The glibc wrapper function, described in
+.BR syscalls (2),
+copies arguments to the right registers denpend on the architecture but
+.BR syscall (2)
+needs arguments following ABI, which its architecture describes, to be passed manually by a user.
+For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
 .SH EXAMPLE
 .nf
 #define _GNU_SOURCE
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* (unknown), 
From: Changhee Han @ 2013-03-27  7:53 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc


In v1, the author and Signed-off-by email addresses were incorrect.
So I am resending the patch; please ignore the previous one.

* [PATCH v2] man2 : syscall.2 : add notes
From: Changhee Han @ 2013-03-27  8:25 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han

Add notes that caution users when passing arguments to syscall.2.

Signed-off-by: Changhee Han <ch0.han-Hm3cg6mZ9cc@public.gmane.org>
---
The previous v1 patch had incorrect author and Signed-off-by email addresses, so I am resending the corrected patch. Please ignore v1. Thanks.
 man2/syscall.2 |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..2c823b6 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,15 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+
+On some architecture arguments should be passed with an appropriate way.
+The glibc wrapper function, described in
+.BR syscalls (2),
+copies arguments to the right registers denpend on the architecture but
+.BR syscall (2)
+needs arguments following ABI, which its architecture describes, to be passed manually by a user.
+For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
 .SH EXAMPLE
 .nf
 #define _GNU_SOURCE
-- 
1.7.9.5

* Re: [PATCH] man2 : syscall.2 : add notes
From: Michael Kerrisk (man-pages) @ 2013-03-28  9:37 UTC (permalink / raw)
  To: ch0.han-Hm3cg6mZ9cc
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han

On Wed, Mar 27, 2013 at 6:11 AM,  <ch0.han-Hm3cg6mZ9cc@public.gmane.org> wrote:
> From: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>
> Add notes that caution users when passing arguments to syscall.2.
>
> Signed-off-by: Changhee Han <gyulkkajo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  man2/syscall.2 |    9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/man2/syscall.2 b/man2/syscall.2
> index 0675943..2c823b6 100644
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -79,6 +79,15 @@ and an error code is stored in
>  .BR syscall ()
>  first appeared in
>  4BSD.
> +
> +On some architecture arguments should be passed with an appropriate way.
> +The glibc wrapper function, described in
> +.BR syscalls (2),
> +copies arguments to the right registers denpend on the architecture but
> +.BR syscall (2)
> +needs arguments following ABI, which its architecture describes, to be passed manually by a user.
> +For example, on ARM architecture, a long long type of argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.

Changhee,

I think this is a very worthwhile patch, but needs a bit of
clarification. How would text such as this be:

[[
Each architecture ABI has its own requirements on how system call
arguments are passed to the kernel.
For system calls that have a glibc wrapper (i.e., most system calls)
glibc handles the details of copy arguments to the right registers
in a manner suitable for the architecture.
However, when using
.BR syscall ()
to make a system call,
the caller may need to handle architecture-dependent details.
For example, on the ARM architecture, a
.I "long long"
argument is considered to be 8-byte aligned
and to be split into two 4-byte arguments.
]]

Would that text be okay?

And then, in addition to that it would be good to have an example of
how one uses syscall() on ARM to invoke a system call with an 8-byte
aligned argument. Could you provide an example?

Cheers,

Michael

* [PATCH] man2 : syscall.2 : add notes
From: Changhee Han @ 2013-04-01  5:33 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc, Changhee Han

Add notes that caution users when passing arguments to syscall.2.

Signed-off-by: Changhee Han <ch0.han-Hm3cg6mZ9cc@public.gmane.org>
---
Modified the notes as you suggested and added an example that shows how to handle a 64-bit argument.
 man2/syscall.2 |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..180a0e4 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,36 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+
+Each architecture ABI has its own requirements on how system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (i.g., most system calls) glibc handles the details of copy arguments to the right registers in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller may need to handle architecture-dependent details.
+For example, on ARM architecture, a
+.I "long long"
+argument is considered to be 8-byte aligned and to be split into two 4-byte arguments.
+
+.BR readahead ()
+system call could be called like below in ARM architecture.
+
+syscall(__NR_readahead, fd, 
+.I 0
+, (unsigned int)(
+.I offset
+>> 32), (unsigned int)(
+.I offset
+& 0xFFFFFFFF), count)
+
+.I offset
+is 64 bit and should be 8-byte aligned.
+Thus, a padding is inserted before
+.I offset
+and
+.I offset
+is split into two 32 bit arguments.
+
 .SH EXAMPLE
 .nf
 #define _GNU_SOURCE
-- 
1.7.9.5

* Re: [PATCH] man2 : syscall.2 : add notes
From: Mike Frysinger @ 2013-04-01  6:13 UTC (permalink / raw)
  To: Changhee Han
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA, gunho.lee-Hm3cg6mZ9cc

On Monday 01 April 2013 01:33:49 Changhee Han wrote:
> +Each architecture ABI has its own requirements on how system call
> arguments are passed to the kernel. +For system calls that have a glibc
> wrapper (i.g., most system calls) glibc handles the details of copy
> arguments to the right registers in a manner suitable for the
> architecture.

these lines need to be wrapped

"i.g." is incorrect ... you mean "i.e."

> +However, when using
> +.BR syscall ()
> +to make a system call,
> +the caller may need to handle architecture-dependent details.
> +For example, on ARM architecture, a
> +.I "long long"
> +argument is considered to be 8-byte aligned and to be split into two
> 4-byte arguments. +
> +.BR readahead ()
> +system call could be called like below in ARM architecture.

this has nothing to do with alignment.  syscalls pass args via registers, and 
in the 32bit ARM port, registers are 32bits wide.  so in order to pass a 64bit 
value, you have to manually split it up.
-mike

* Re: [PATCH] man2 : syscall.2 : add notes
From: Michael Kerrisk (man-pages) @ 2013-04-01  6:22 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

Mike,

On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 01:33:49 Changhee Han wrote:
>> +Each architecture ABI has its own requirements on how system call
>> arguments are passed to the kernel. +For system calls that have a glibc
>> wrapper (i.g., most system calls) glibc handles the details of copy
>> arguments to the right registers in a manner suitable for the
>> architecture.
>
> these lines need to be wrapped
>
> "i.g." is incorrect ... you mean "i.e."
>
>> +However, when using
>> +.BR syscall ()
>> +to make a system call,
>> +the caller may need to handle architecture-dependent details.
>> +For example, on ARM architecture, a
>> +.I "long long"
>> +argument is considered to be 8-byte aligned and to be split into two
>> 4-byte arguments. +
>> +.BR readahead ()
>> +system call could be called like below in ARM architecture.
>
> this has nothing to do with alignment.  syscalls pass args via registers, and
> in the 32bit ARM port, registers are 32bits wide.  so in order to pass a 64bit
> value, you have to manually split it up.

So, I'm not familiar with all the details here. What is the purpose of
the '0' argument that precedes 'offset' then?

Cheers,

Michael

* Fw : Re : Re: [PATCH] man2 : syscall.2 : add notes
From: 한창희 (Changhee Han) @ 2013-04-01  7:05 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Mike Frysinger, linux-man, 이건호 (Gunho Lee)


Kerrisk,

Arguments are passed in registers when syscall() is called,
but each register is only 32 bits wide.
So a 64-bit argument has to be split across two registers.

However, according to the ARM ABI, a 64-bit argument must start in an even-numbered register (e.g., r0, r2),
so in the example, 'offset' must start in an even-numbered register, as below:

r0 : fd
r1 : 0 (dummy value)
r2 : offset high
r3 : offset low

If the '0' is omitted, the system call handler misinterprets the registers and 'offset' receives a garbage value.
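To make the register layout above concrete, here is a minimal C sketch that models the rule; it does not make a system call. It assumes, for illustration only, that system call arguments occupy r0 through r6 and that every 64-bit argument must start in an even-numbered register. The helper name assign_regs() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 7  /* assumption for this sketch: syscall args use r0..r6 */

/* Assign each argument (4 or 8 bytes wide) to 32-bit registers and
 * store the index of the first register it uses in first_reg[].
 * A 64-bit argument must start in an even-numbered register, so an
 * odd register is skipped (left as padding) when necessary.
 * Returns the number of registers consumed, padding included. */
static int assign_regs(const size_t *sizes, int nargs, int *first_reg)
{
    int next = 0;  /* index of the next free register */

    for (int i = 0; i < nargs; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && next % 2 != 0)
            next++;                    /* odd register becomes padding */
        first_reg[i] = next;
        next += (int) (sizes[i] / 4);  /* one register per 32 bits */
        assert(next <= NREGS);
    }
    return next;
}
```

With the argument sizes of readahead(fd, offset, count), i.e. {4, 8, 4}, the sketch puts fd in r0, skips r1, starts offset at r2 (filling r2/r3) and puts count in r4. The skipped r1 is exactly the slot that the explicit 0 fills when the call is made through syscall().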


I will correct the notes and resubmit the patch soon.

Mike, do you have any suggestions for correcting my wording (including the part about alignment)?



---------- Original Message ----------

From : "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To : Mike Frysinger <vapier@gentoo.org>
Cc : 한창희 (Changhee Han), Researcher (ch0.han), linux-man <linux-man@vger.kernel.org>, 이건호 (Gunho Lee), Principal Researcher (gunho.lee)
Date : 13/4/1 15:23:08
Subject : Re: [PATCH] man2 : syscall.2 : add notes



Mike,

On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger wrote:
> On Monday 01 April 2013 01:33:49 Changhee Han wrote:
>> +Each architecture ABI has its own requirements on how system call
>> arguments are passed to the kernel. +For system calls that have a glibc
>> wrapper (i.g., most system calls) glibc handles the details of copy
>> arguments to the right registers in a manner suitable for the
>> architecture.
>
> these lines need to be wrapped
>
> "i.g." is incorrect ... you mean "i.e."
>
>> +However, when using
>> +.BR syscall ()
>> +to make a system call,
>> +the caller may need to handle architecture-dependent details.
>> +For example, on ARM architecture, a
>> +.I "long long"
>> +argument is considered to be 8-byte aligned and to be split into two
>> 4-byte arguments. +
>> +.BR readahead ()
>> +system call could be called like below in ARM architecture.
>
> this has nothing to do with alignment.  syscalls pass args via registers, and
> in the 32bit ARM port, registers are 32bits wide.  so in order to pass a 64bit
> value, you have to manually split it up.

So, I'm not familiar with all the details here. What is the purpose of
the '0' argument that precedes 'offset' then?

Cheers,

Michael

* Re: [PATCH] man2 : syscall.2 : add notes
From: Mike Frysinger @ 2013-04-01  7:19 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Monday 01 April 2013 02:22:45 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger wrote:
> > On Monday 01 April 2013 01:33:49 Changhee Han wrote:
> >> +However, when using
> >> +.BR syscall ()
> >> +to make a system call,
> >> +the caller may need to handle architecture-dependent details.
> >> +For example, on ARM architecture, a
> >> +.I "long long"
> >> +argument is considered to be 8-byte aligned and to be split into two
> >> 4-byte arguments. +
> >> +.BR readahead ()
> >> +system call could be called like below in ARM architecture.
> > 
> > this has nothing to do with alignment.  syscalls pass args via registers,
> > and in the 32bit ARM port, registers are 32bits wide.  so in order to
> > pass a 64bit value, you have to manually split it up.
> 
> So, I'm not familiar with all the details here. What is the purpose of
> the '0' argument that precedes 'offset' then?

ok, so the answer is more nuanced, and the reasoning above is incorrect (or at 
the very least, poorly phrased).

for ARM OABI, there is no such padding, and the proposed example is wrong and 
will not work.

for ARM EABI, the ABI requires that 64bit values be passed in register pairs.  
since the kernel people wanted to avoid an assembly trampoline to unpack the 
64bit value with EABI, you have to call it as proposed:
	syscall(readahead, fd, _pad, high32, low32)
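As a sketch of the call described above, the following C fragment issues readahead(2) through syscall() only on a little-endian ARM EABI build and falls back to the glibc readahead() wrapper everywhere else. Assumptions: readahead_raw() is a hypothetical name; the target is detected with GCC's __arm__, __ARM_EABI__ and __ARMEB__ macros; and, following the AAPCS, a 64-bit value occupies its register pair in memory order, so on a little-endian target the low 32 bits go in r2 (the high32/low32 order written above would match a big-endian build).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* readahead() */
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_readahead */
#include <unistd.h>       /* syscall() */

/* Hypothetical wrapper: ask the kernel to read COUNT bytes of FD,
 * starting at OFFSET, into the page cache.
 * Returns 0 on success, or -1 with errno set. */
static long readahead_raw(int fd, int64_t offset, size_t count)
{
    assert(offset >= 0);  /* caller-side sanity check for this sketch */
#if defined(__arm__) && defined(__ARM_EABI__) && !defined(__ARMEB__)
    /* fd goes in r0; the 64-bit offset must start in an even register,
     * so r1 carries a dummy 0 and the offset fills r2/r3, low word
     * first on a little-endian target. */
    return syscall(SYS_readahead, fd, 0,
                   (unsigned int) ((uint64_t) offset & 0xFFFFFFFF),
                   (unsigned int) ((uint64_t) offset >> 32),
                   count);
#else
    /* Other targets: the glibc wrapper takes care of the ABI details. */
    return readahead(fd, offset, count);
#endif
}
```

For example, readahead_raw(fd, 1LL << 32, 65536) would place 0 in r2 and 1 in r3 on a little-endian EABI target.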

for MIPS, only the O32 ABI has this behavior.

for PPC, only the 32bit ABI has this behavior.

otherwise, i don't believe anyone else does this -- they just pass things 
along in registers w/out padding.

since the current list of syscalls which are impacted is small, it might be 
useful to explicitly enumerate them.  they are:
	fadvise64_64
	ftruncate64
	pread64
	pwrite64
	readahead
	truncate64
-mike

* Re: [PATCH] man2 : syscall.2 : add notes
From: Michael Kerrisk (man-pages) @ 2013-04-01  7:36 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

Mike,

On Mon, Apr 1, 2013 at 9:19 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 02:22:45 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 8:13 AM, Mike Frysinger wrote:
>> > On Monday 01 April 2013 01:33:49 Changhee Han wrote:
>> >> +However, when using
>> >> +.BR syscall ()
>> >> +to make a system call,
>> >> +the caller may need to handle architecture-dependent details.
>> >> +For example, on ARM architecture, a
>> >> +.I "long long"
>> >> +argument is considered to be 8-byte aligned and to be split into two
>> >> 4-byte arguments. +
>> >> +.BR readahead ()
>> >> +system call could be called like below in ARM architecture.
>> >
>> > this has nothing to do with alignment.  syscalls pass args via registers,
>> > and in the 32bit ARM port, registers are 32bits wide.  so in order to
>> > pass a 64bit value, you have to manually split it up.
>>
>> So, I'm not familiar with all the details here. What is the purpose of
>> the '0' argument that precedes 'offset' then?
>
> ok, so the answer is more nuanced, and the reasoning above is incorrect (or at
> the very least, poorly phrased).
>
> for ARM OABI, there is no such padding, and the proposed example is wrong and
> will not work.
>
> for ARM EABI, the ABI requires that 64bit values be passed in register pairs.
> since the kernel people wanted to avoid an assembly trampoline to unpack the
> 64bit value with EABI, you have to call it as proposed:
>         syscall(readahead, fd, _pad, high32, low32)
>
> for MIPS, only the O32 ABI has this behavior.
>
> for PPC, only the 32bit ABI has this behavior.
>
> otherwise, i don't believe anyone else does this -- they just pass things
> along in registers w/out padding.
>
> since the current list of syscalls which are impacted is small, it might be
> useful to explicitly enumerate them.  they are:
>         fadvise64_64
>         ftruncate64
>         pread64
>         pwrite64
>         readahead
>         truncate64
> -mike

So, in summary, is the following patch okay? (If not, could you
suggest specific rewordings.)

Thanks,

Michael


diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..cad1f20 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,48 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+
+Each architecture ABI has its own requirements on how
+system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (e.g., most system calls),
+glibc handles the details of copyiing arguments to the right registers
+in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller may need to handle architecture-dependent details.
+For example, on the ARM architecture Embbeded ABI (EABI), a
+.I "long long"
+argument is considered to be 8-byte aligned and to be split
+into two 4-byte arguments.
+
+For example, the
+.BR readahead ()
+system call would be invoked as follows on the ARM architecture with the EABI:
+
+.in +4n
+.nf
+syscall(__NR_readahead, fd, 0, (unsigned int)(offset >> 32),
+        (unsigned int)(offset & 0xFFFFFFFF), count);
+.fi
+.in
+.PP
+.I offset
+is 64 bit and should be 8-byte aligned.
+Thus, a padding is inserted before
+.I offset
+and
+.I offset
+is split into two 32-bit arguments.
+Similar issues can occur on MIPS with the O32 ABI and
+on PowerPC with the 32-bit ABI.
+.BR fadvise64_64 (2)
+.BR ftruncate64 (2)
+.BR pread64 (2)
+.BR pwrite64 (2)
+.BR readahead (2)
+and
+.BR truncate64 (2).
 .SH EXAMPLE
 .nf
 #define _GNU_SOURCE

* Re: [PATCH] man2 : syscall.2 : add notes
From: Mike Frysinger @ 2013-04-01  8:29 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Monday 01 April 2013 03:36:40 Michael Kerrisk (man-pages) wrote:
> +glibc handles the details of copyiing arguments to the right registers

copying

> +the caller may need to handle architecture-dependent details.

may->might

> +For example, on the ARM architecture Embbeded ABI (EABI), a

Embedded

> +.I "long long"
> +argument is considered to be 8-byte aligned and to be split
> +into two 4-byte arguments.

i would rewrite to:
64 bit value (e.g. "long long") must be aligned to an even register pair.

> +.I offset
> +is 64 bit and should be 8-byte aligned.
> +Thus, a padding is inserted before
> +.I offset
> +and
> +.I offset
> +is split into two 32-bit arguments.

i would rewrite to:
Since the offset argument is 64 bits, and the first argument (fd) is passed in 
r0, we need to manually split & align the 64 bit value ourselves so that it is 
passed in the r2/r3 register pair.  That means inserting a dummy value into r1 
(the 2nd argument of 0).

> +Similar issues can occur on MIPS with the O32 ABI and
> +on PowerPC with the 32-bit ABI.
> +.BR fadvise64_64 (2)
> +.BR ftruncate64 (2)
> +.BR pread64 (2)
> +.BR pwrite64 (2)
> +.BR readahead (2)
> +and
> +.BR truncate64 (2).

the style here is messed up.  i'm guessing you meant to make a new paragraph 
starting at "Similar", and you meant to add some text before the function 
list.  also add to the list: sync_file_range and posix_fadvise.

not sure if it's worth mentioning, but this issue ends up forcing MIPS' O32 to 
take 7 arguments to syscall() :).  on ARM/PPC, they avoid this by reordering 
the arguments.

i see that the existing sync_file_range and posix_fadvise pages explicitly call 
out this issue.  i'd suggest updating those (as well as the other funcs that 
are affected) to point back to syscall(2) for more details rather than getting 
into too much detail.

on a related topic, would it be useful to document the exact calling 
convention for architecture system calls ?  from time to time, i need to 
reference this, and i inevitably turn to a variety of sources to dig up the 
answer (the kernel itself, or strace, or qemu, or glibc, or uClibc, or lss, or 
other random places).  i would find it handy to have all of these in a single 
location.
-mike

* Re: [PATCH] man2 : syscall.2 : add notes
From: Mike Frysinger @ 2013-04-01  8:37 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
> for ARM OABI, there is no such padding, and the proposed example is wrong
> and will not work.
> 
> for ARM EABI, the ABI requires that 64bit values be passed in register
> pairs. since the kernel people wanted to avoid an assembly trampoline to
> unpack the 64bit value with EABI, you have to call it as proposed:
> 	syscall(readahead, fd, _pad, high32, low32)
> 
> for MIPS, only the O32 ABI has this behavior.
> 
> for PPC, only the 32bit ABI has this behavior.
> 
> otherwise, i don't believe anyone else does this -- they just pass things
> along in registers w/out padding.

in random grepping of code bases (uClibc), i believe the xtensa arch also does 
64bit register pair aligning.  a cursory scan of the kernel seems to back this 
up.
-mike

* Re: [PATCH] man2 : syscall.2 : add notes
From: Michael Kerrisk (man-pages) @ 2013-04-01  9:29 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 03:36:40 Michael Kerrisk (man-pages) wrote:

[Typo corrections incorporated]

>> +.I "long long"
>> +argument is considered to be 8-byte aligned and to be split
>> +into two 4-byte arguments.
>
> i would rewrite to:
> 64 bit value (e.g. "long long") must be aligned to an even register pair.

Done.

>> +.I offset
>> +is 64 bit and should be 8-byte aligned.
>> +Thus, a padding is inserted before
>> +.I offset
>> +and
>> +.I offset
>> +is split into two 32-bit arguments.
>
> i would rewrite to:
> Since the offset argument is 64 bits, and the first argument (fd) is passed in
> r0, we need to manually split & align the 64 bit value ourselves so that it is
> passed in the r2/r3 register pair.  That means inserting a dummy value into r1
> (the 2nd argument of 0).

Done.

>> +Similar issues can occur on MIPS with the O32 ABI and
>> +on PowerPC with the 32-bit ABI.
>> +.BR fadvise64_64 (2)
>> +.BR ftruncate64 (2)
>> +.BR pread64 (2)
>> +.BR pwrite64 (2)
>> +.BR readahead (2)
>> +and
>> +.BR truncate64 (2).
>
> the style here is messed up.  i'm guessing you meant to make a new paragraph
> starting at "Similar", and you meant to add some text before the function
> list.  also add to the list: sync_file_range and posix_fadvise.

Yes, fixed.

> not sure if it's worth mentioning, but this issue ends up forcing MIPS' O32 to
> take 7 arguments to syscall() :).  on ARM/PPC, they avoid this by reordering
> the arguments.

I'm not sure that we need quite this level of detail, so I'll leave for now.

> i see that the existing sync_file_range and posix_fadvise pages explicitly call
> out this issue.  i'd suggest updating those (as well as the other funcs that
> are affected) to point back to syscall(2) for more details rather than getting
> into too much detail.

Seems reasonable to me.

> on a related topic, would it be useful to document the exact calling
> convention for architecture system calls ?  from time to time, i need to
> reference this, and i inevitably turn to a variety of sources to dig up the
> answer (the kernel itself, or strace, or qemu, or glibc, or uClibc, or lss, or
> other random places).  i would find it handy to have all of these in a single
> location.

Sounds like it would be useful to have that documented. Would you have
a chance to write patches for that?

Revised patches below.

Cheers,

Michael

diff --git a/man2/syscall.2 b/man2/syscall.2
index 0675943..75c4ad8 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -37,7 +37,7 @@
 .\" 2002-03-20  Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
 .\"    - adopted for Linux
 .\"
-.TH SYSCALL 2 2012-08-14 "Linux" "Linux Programmer's Manual"
+.TH SYSCALL 2 2013-04-01 "Linux" "Linux Programmer's Manual"
 .SH NAME
 syscall \- indirect system call
 .SH SYNOPSIS
@@ -79,6 +79,56 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+
+Each architecture ABI has its own requirements on how
+system call arguments are passed to the kernel.
+For system calls that have a glibc wrapper (e.g., most system calls),
+glibc handles the details of copying arguments to the right registers
+in a manner suitable for the architecture.
+However, when using
+.BR syscall ()
+to make a system call,
+the caller might need to handle architecture-dependent details.
+For example, on the ARM architecture Embedded ABI (EABI), a
+64-bit value (e.g.,
+.IR "long long" ) must be aligned to an even register pair.
+
+For example, the
+.BR readahead ()
+system call would be invoked as follows on the ARM architecture with the EABI:
+
+.in +4n
+.nf
+syscall(SYS_readahead, fd, 0,
+        (unsigned int) (offset >> 32),
+        (unsigned int) (offset & 0xFFFFFFFF),
+        count);
+.fi
+.in
+.PP
+Since the offset argument is 64 bits, and the first argument
+.RI ( fd )
+is passed in
+.IR r0 ,
we need to split and align the 64-bit value manually so that it is
+passed in the
+.IR r2 / r3
+register pair.
+That means inserting a dummy value into
+.I r1
+(the second argument of 0).
+Similar issues can occur on MIPS with the O32 ABI and
+on PowerPC with the 32-bit ABI.
+The affected system calls are
+.BR fadvise64_64 (2),
+.BR ftruncate64 (2),
+.BR posix_fadvise (2),
+.BR pread64 (2),
+.BR pwrite64 (2),
+.BR readahead (2),
+.BR sync_file_range (2),
+and
+.BR truncate64 (2).
 .SH EXAMPLE
 .nf
 #define _GNU_SOURCE
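A minimal C sketch of the register-pair rule described in the patch above. It is not taken from the man page or from glibc, and split_u64() and my_readahead() are hypothetical names. It assumes that a 32-bit ARM EABI target can be detected with GCC's __arm__ and __ARM_EABI__ macros. It also assumes that on little-endian ARM the low 32 bits of the offset go in the lower-numbered register of the pair (r2) and the high 32 bits in r3, and that big-endian (__ARMEB__) swaps them. On every other target the sketch assumes a 64-bit ABI in which the whole offset fits in one register.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* A 64-bit value split into two 32-bit halves. */
struct u64_halves {
    uint32_t lo;
    uint32_t hi;
};

/* Hypothetical helper: split a 64-bit value into its low and high words. */
static struct u64_halves split_u64(uint64_t v)
{
    struct u64_halves h;
    h.lo = (uint32_t) (v & 0xFFFFFFFFu);
    h.hi = (uint32_t) (v >> 32);
    return h;
}

/* Hypothetical wrapper around the raw readahead system call. */
static long my_readahead(int fd, int64_t offset, size_t count)
{
#if defined(__arm__) && defined(__ARM_EABI__)
    struct u64_halves h = split_u64((uint64_t) offset);
    /* fd in r0, a dummy 0 in r1 so that the 64-bit offset starts on the
       even register r2 (pair r2/r3), then count in r4. */
# if defined(__ARMEB__)
    return syscall(SYS_readahead, fd, 0, h.hi, h.lo, count);
# else
    return syscall(SYS_readahead, fd, 0, h.lo, h.hi, count);
# endif
#else
    /* Assumed: a 64-bit ABI where the offset fits in a single register. */
    return syscall(SYS_readahead, fd, offset, count);
#endif
}
```

On a typical x86_64 build only the #else branch is compiled; the ARM EABI branch is illustrative and has not been exercised here.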





=====================

diff --git a/man2/posix_fadvise.2 b/man2/posix_fadvise.2
index d644641..90ac8e9 100644
--- a/man2/posix_fadvise.2
+++ b/man2/posix_fadvise.2
@@ -153,7 +153,10 @@ or
 first.
 .SS arm_fadvise()
 The ARM architecture
-needs 64-bit arguments to be aligned in a suitable pair of registers.
+needs 64-bit arguments to be aligned in a suitable pair of registers
+(see
+.BR syscall (2)
+for further detail).
 On this architecture, the call signature of
 .BR posix_fadvise ()
 is flawed, since it forces a register to be wasted as padding between the
diff --git a/man2/pread.2 b/man2/pread.2
index 42e79b7..1d648b1 100644
--- a/man2/pread.2
+++ b/man2/pread.2
@@ -130,6 +130,11 @@ The glibc
 and
 .BR pwrite ()
 wrapper functions transparently deal with the change.
+
+On some 32-bit architectures,
+the calling signature for these system calls differs,
+for the reasons described in
+.BR syscall (2).
 .SH BUGS
 POSIX requires that opening a file with the
 .BR O_APPEND
diff --git a/man2/readahead.2 b/man2/readahead.2
index 08c2fe2..605fa5e 100644
--- a/man2/readahead.2
+++ b/man2/readahead.2
@@ -89,6 +89,11 @@ The
 .BR readahead ()
 system call is Linux-specific, and its use should be avoided
 in portable applications.
+.SH NOTES
+On some 32-bit architectures,
+the calling signature for this system call differs,
+for the reasons described in
+.BR syscall (2).
 .SH SEE ALSO
 .BR lseek (2),
 .BR madvise (2),
diff --git a/man2/sync_file_range.2 b/man2/sync_file_range.2
index c55184a..6adf15d 100644
--- a/man2/sync_file_range.2
+++ b/man2/sync_file_range.2
@@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted as padding between the
 and
 .I offset
 arguments.
+(See
+.BR syscall (2)
+for details.)
 Therefore, these architectures define a different
 system call that orders the arguments suitably:
 .PP
diff --git a/man2/truncate.2 b/man2/truncate.2
index 4d12683..64b8288 100644
--- a/man2/truncate.2
+++ b/man2/truncate.2
@@ -240,6 +240,11 @@ system calls that handle large files.
 However, these details can be ignored by applications using glibc, whose
 wrapper functions transparently employ the more recent system calls
 where they are available.
+
+On some 32-bit architectures,
+the calling signature for these system calls differs,
+for the reasons described in
+.BR syscall (2).
 .SH BUGS
 A header file bug in glibc 2.12 meant that the minimum value of
 .\" http://sourceware.org/bugzilla/show_bug.cgi?id=12037


* Re: [PATCH] man2 : syscall.2 : add notes
       [not found]                     ` <201304010437.52901.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-01  9:30                       ` Michael Kerrisk (man-pages)
       [not found]                         ` <CAKgNAkit-qRPErHDzGEJ_yedA+O97bFxDsqWJMZOhCZ9DPvOtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-01  9:30 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Mon, Apr 1, 2013 at 10:37 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
>> for ARM OABI, there is no such padding, and the proposed example is wrong
>> and will not work.
>>
>> for ARM EABI, the ABI requires that 64bit values be passed in register
>> pairs. since the kernel people wanted to avoid an assembly trampoline to
>> unpack the 64bit value with EABI, you have to call it as proposed:
>>       syscall(readahead, fd, _pad, high32, low32)
>>
>> for MIPS, only the O32 ABI has this behavior.
>>
>> for PPC, only the 32bit ABI has this behavior.
>>
>> otherwise, i don't believe anyone else does this -- they just pass things
>> along in registers w/out padding.
>
> in random grepping of code bases (uClibc), i believe the xtensa arch also does
> 64bit register pair aligning.  a cursory scan of the kernel seems to back this
> up.

Also SuperH?

For my own education: which part of the kernel sources backed this up?


* Re: [PATCH] man2 : syscall.2 : add notes
       [not found]                         ` <CAKgNAkit-qRPErHDzGEJ_yedA+O97bFxDsqWJMZOhCZ9DPvOtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 10:09                           ` Mike Frysinger
  0 siblings, 0 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 10:09 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

[-- Attachment #1: Type: Text/Plain, Size: 2045 bytes --]

On Monday 01 April 2013 05:30:06 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 10:37 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> > On Monday 01 April 2013 03:19:43 Mike Frysinger wrote:
> >> for ARM OABI, there is no such padding, and the proposed example is
> >> wrong and will not work.
> >> 
> >> for ARM EABI, the ABI requires that 64bit values be passed in register
> >> pairs. since the kernel people wanted to avoid an assembly trampoline to
> >> unpack the 64bit value with EABI, you have to call it as proposed:
> >>       syscall(readahead, fd, _pad, high32, low32)
> >> 
> >> for MIPS, only the O32 ABI has this behavior.
> >> 
> >> for PPC, only the 32bit ABI has this behavior.
> >> 
> >> otherwise, i don't believe anyone else does this -- they just pass
> >> things along in registers w/out padding.
> > 
> > in random grepping of code bases (uClibc), i believe the xtensa arch also
> > does 64bit register pair aligning.  a cursory scan of the kernel seems
> > to back this up.
> 
> Also SuperH?

i don't think so ... the pread/pwrite is indeed funky for SuperH, but i'm 
pretty sure that's a wart they accidentally copied from another arch (ppc 
maybe?) when they implemented the syscall rather than needing to do 64bit 
register alignment.  i say that because qemu/strace/glibc don't do the 
register realigning for any other function.

> For my own education: which part of the kernel sources backed this up?

for xtensa, this part:
	arch/xtensa/include/uapi/asm/unistd.h:
	__SYSCALL(260, sys_readahead, 5)
that says readahead takes 5 args, but that's only true for 32bit arches if 
you're re-aligning the value.  the other 64bit syscalls have the same property 
(+1 to the normal # of args).

additionally, the arch/xtensa/kernel/syscall.c file has a custom fadvise64_64
syscall with reordered arguments (with "advice" moved from last to 2nd) so that
the shifting of args doesn't end up requiring 7 slots (à la mips/o32).
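The argument counts discussed here (5 for readahead on xtensa instead of 4, 7 for fadvise64_64 on MIPS O32, fewer after reordering) can be checked with a small slot-counting sketch. It uses a purely hypothetical model: every argument occupies one or two 32-bit slots, and when align_pairs is set a 64-bit argument must start at an even slot. It ignores whether a slot is a register or a stack word, and count_slots() is an illustrative name, not a kernel or libc function.

```c
#include <stddef.h>

/* Hypothetical model: sizes[i] is 4 for a 32-bit argument or 8 for a
   64-bit one.  Returns how many 32-bit argument slots the call uses.
   When align_pairs is non-zero, a 64-bit argument must start on an even
   slot, so an odd slot in front of it is skipped as padding. */
static int count_slots(const int *sizes, size_t n, int align_pairs)
{
    int slot = 0;

    for (size_t i = 0; i < n; i++) {
        if (sizes[i] == 8) {
            if (align_pairs && slot % 2 != 0)
                slot++;         /* wasted padding slot */
            slot += 2;          /* the value occupies a register pair */
        } else {
            slot += 1;
        }
    }
    return slot;
}
```

For readahead(fd, offset, count) the sizes are {4, 8, 4}: 4 slots without alignment, 5 with it. For fadvise64_64(fd, offset, len, advice), {4, 8, 8, 4} needs 7 aligned slots, while the reordered (fd, advice, offset, len) layout, {4, 4, 8, 8}, needs only 6.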
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [PATCH] man2 : syscall.2 : add notes
       [not found]                             ` <CAKgNAkij3zDwakWvcRkRbknmV2Hpt4HWfH4uVqmxp+7gQek-2g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-01 10:32                               ` Mike Frysinger
       [not found]                                 ` <201304010632.41520.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-01 10:32 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

[-- Attachment #1: Type: Text/Plain, Size: 3188 bytes --]

On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> > not sure if it's worth mentioning, but this issue ends up forcing MIPS'
> > O32 to take 7 arguments to syscall() :).  on ARM/PPC, they avoid this by
> > reordering the arguments.
> 
> I'm not sure that we need quite this level of detail, so I'll leave for
> now.

i only mention it because 7 args to syscall is unusual ... i think mips/o32
is the only arch that does this.

> > on a related topic, would it be useful to document the exact calling
> > convention for architecture system calls ?  from time to time, i need to
> > reference this, and i inevitably turn to a variety of sources to dig up
> > the answer (the kernel itself, or strace, or qemu, or glibc, or uClibc,
> > or lss, or other random places).  i would find it handy to have all of
> > these in a single location.
> 
> Sounds like it would be useful to have that documented. Would you have
> a chance to write patches for that?

should we do it in syscall(2) ?  or a dedicated man page ?

if the former, create a dedicated section, or do it under NOTES ?

> --- a/man2/syscall.2
> +++ b/man2/syscall.2
>
> +64-bit value (e.g.,
> +.IR "long long" ) must be aligned to an even register pair.

this renders incorrectly.  the rest of the sentence should be pulled onto a
new line.

> +That means inserting a dummy value into
> +.I r1
> +(the second argument of 0).
> +Similar issues can occur on MIPS with the O32 ABI and
> +on PowerPC with the 32-bit ABI.
> +The affected system calls are
> +.BR fadvise64_64 (2),
> +.BR ftruncate64 (2),
> +.BR posix_fadvise (2),
> +.BR pread64 (2),
> +.BR pwrite64 (2),
> +.BR readahead (2),
> +.BR sync_file_range (2),
> +and
> +.BR truncate64 (2).

i'm on the fence whether this reads better if there's a new paragraph starting 
with "Similar issues", and whether the list of syscalls should be a flat list 
(one syscall per line).

> --- a/man2/posix_fadvise.2
> +++ b/man2/posix_fadvise.2
> @@ -153,7 +153,10 @@ or
>  first.
>  .SS arm_fadvise()
>  The ARM architecture
> -needs 64-bit arguments to be aligned in a suitable pair of registers.
> +needs 64-bit arguments to be aligned in a suitable pair of registers
> +(see
> +.BR syscall (2)
> +for further detail).

probably want to scrub the arm references altogether and say "some 32-bit 
arches".  this signature is used on other arches as well (ppc and xtensa at 
least).

would also stop describing it as "flawed".  there are tradeoffs when the ABI 
imposes these kinds of requirements, and i'm not sure one is really better 
than the other.

> --- a/man2/sync_file_range.2
> +++ b/man2/sync_file_range.2
> @@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted as padding between the
>  and
>  .I offset
>  arguments.
> +(See
> +.BR syscall (2)
> +for details.)
>  Therefore, these architectures define a different
>  system call that orders the arguments suitably:

also in this man page, i would stop describing it as "flawed".
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [PATCH] man2 : syscall.2 : add notes
       [not found]                                 ` <201304010632.41520.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-02  6:54                                   ` Michael Kerrisk (man-pages)
       [not found]                                     ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-02  6:54 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Changhee Han, linux-man, gunho.lee-Hm3cg6mZ9cc

On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
>> > not sure if it's worth mentioning, but this issue ends up forcing MIPS'
>> > O32 to take 7 arguments to syscall() :).  on ARM/PPC, they avoid this by
>> > reordering the arguments.
>>
>> I'm not sure that we need quite this level of detail, so I'll leave for
>> now.
>
> i only mention it because 7 args to syscall is unusual ... i think mips/o32
> is the only arch that does this.

I'll add a note in the page source. Maybe I'll revisit this later.

>> > on a related topic, would it be useful to document the exact calling
>> > convention for architecture system calls ?  from time to time, i need to
>> > reference this, and i inevitably turn to a variety of sources to dig up
>> > the answer (the kernel itself, or strace, or qemu, or glibc, or uClibc,
>> > or lss, or other random places).  i would find it handy to have all of
>> > these in a single location.
>>
>> Sounds like it would be useful to have that documented. Would you have
>> a chance to write patches for that?
>
> should we do it in syscall(2) ?  or a dedicated man page ?

It's a little hard to say until I see the shape of what comes. Can you
provide a rough per-syscall example or two of what you expect to
document? (Don't write too concrete a patch yet, until I can get a
handle on what you intend.)

> if the former, create a dedicated section, or do it under NOTES ?

*If* the former, then I'd say a subsection under NOTES. But maybe this
is better per-syscall. Not sure yet.

>> --- a/man2/syscall.2
>> +++ b/man2/syscall.2
>>
>> +64-bit value (e.g.,
>> +.IR "long long" ) must be aligned to an even register pair.
>
> this renders incorrectly.  the rest of the sentence should be pulled onto a
> new line.

fixed.

>> +That means inserting a dummy value into
>> +.I r1
>> +(the second argument of 0).
>> +Similar issues can occur on MIPS with the O32 ABI and
>> +on PowerPC with the 32-bit ABI.
>> +The affected system calls are
>> +.BR fadvise64_64 (2),
>> +.BR ftruncate64 (2),
>> +.BR posix_fadvise (2),
>> +.BR pread64 (2),
>> +.BR pwrite64 (2),
>> +.BR readahead (2),
>> +.BR sync_file_range (2),
>> +and
>> +.BR truncate64 (2).
>
> i'm on the fence whether this reads better if there's a new paragraph starting
> with "Similar issues", and whether the list of syscalls should be a flat list
> (one syscall per line).

Tweaked.

>> --- a/man2/posix_fadvise.2
>> +++ b/man2/posix_fadvise.2
>> @@ -153,7 +153,10 @@ or
>>  first.
>>  .SS arm_fadvise()
>>  The ARM architecture
>> -needs 64-bit arguments to be aligned in a suitable pair of registers.
>> +needs 64-bit arguments to be aligned in a suitable pair of registers
>> +(see
>> +.BR syscall (2)
>> +for further detail).
>
> probably want to scrub the arm references altogether and say "some 32-bit
> arches".  this signature is used on other arches as well (ppc and xtensa at
> least).

Tweaked.

> would also stop describing it as "flawed".  there are tradeoffs when the ABI
> imposes these kinds of requirements, and i'm not sure one is really better
> than the other.

Removed "flawed"

>> --- a/man2/sync_file_range.2
>> +++ b/man2/sync_file_range.2
>> @@ -191,6 +191,9 @@ is flawed, since it forces a register to be wasted as padding between the
>>  and
>>  .I offset
>>  arguments.
>> +(See
>> +.BR syscall (2)
>> +for details.)
>>  Therefore, these architectures define a different
>>  system call that orders the arguments suitably:
>
> also in this man page, i would stop describing it as "flawed".

Done.

Changes have been pushed to Git now.

Cheers,

Michael


* [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                     ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-02 23:17                                       ` Mike Frysinger
  2013-04-07 10:00                                         ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-02 23:17 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man

[-- Attachment #1: Type: Text/Plain, Size: 2665 bytes --]

On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> > on a related topic, would it be useful to document the exact calling
> >> > convention for architecture system calls ?  from time to time, i need
> >> > to reference this, and i inevitably turn to a variety of sources to
> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
> >> > or uClibc, or lss, or other random places).  i would find it handy to
> >> > have all of these in a single location.
> >> 
> >> Sounds like it would be useful to have that documented. Would you have
> >> a chance to write patches for that?
> > 
> > should we do it in syscall(2) ?  or a dedicated man page ?
> 
> It's a little hard to say until I see the shape of what comes. Can you
> provide a rough per-syscall example or two of what you expect to
> document? (Don't write too concrete a patch yet, until I can get a
> handle on what you intend.)

this renders nicely i think.  it shows most of the stuff i'm interested in.
might be useful to add a dedicated section covering the clobbers in the
future.
-mike

--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -79,6 +79,35 @@ and an error code is stored in
 .BR syscall ()
 first appeared in
 4BSD.
+.SS Architecture calling conventions
+Every architecture has its own way of invoking & passing arguments to the
+kernel.
+Note that the instruction listed below might not be the fastest or best way to
+transition to the kernel, so you might have to refer to the VDSO.
+Also note that this doesn't cover the entire calling convention -- some
+architectures may indiscriminately clobber other registers not listed here.
+.if t \{\
+.ft CW
+\}
+.TS
+l l l l l l l l l l l.
+arch/ABI	insn	NR	ret	arg1	arg2	arg3	arg4	arg5	arg6	arg7
+_
+arm/OABI	swi NR;	-	a1	a1	a2	a3	a4	v1	v2	v3
+arm/EABI	swi 0x0;	r7	r1	r1	r2	r3	r4	r5	r6	r7
+bfin	excpt 0x0;	P0	R0	R0	R1	R2	R3	R4	R5	-
+i386	int $0x80;	eax	eax	ebx	ecx	edx	esi	edi	ebp	-
+ia64	break 0x100000;	r15	r10/r8	r11	r9	r10	r14	r15	r13	-
+.\" not sure about insn or NR
+.\" parisc	ble 0x100(%%sr2, %%r0);	-	r28	r26	r25	r24	r23	r22	r21	-
+sparc/32	t 0x10;	g1	o0	o0	o1	o2	o3	o4	o5	-
+sparc/64	t 0x6d;	g1	o0	o0	o1	o2	o3	o4	o5	-
+x86_64	syscall;	rax	rax	rdi	rsi	rdx	r10	r8	r9	-
+.TE
+.if t \{\
+.in
+.ft P
+\}
 .SS Architecture-specific requirements
 Each architecture ABI has its own requirements on how
 system call arguments are passed to the kernel.
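To illustrate how one row of the table is used, here is a hedged sketch of the x86_64 row: the system call number goes in rax, the first three arguments in rdi, rsi and rdx, and the result comes back in rax. It assumes GCC/Clang extended inline asm and that the syscall instruction overwrites rcx and r11 (as the x86_64 psABI states for Linux). raw_syscall3() is a hypothetical name. A fourth argument would go in r10, which needs a register variable and is left out here. On other architectures the sketch falls back to the glibc syscall() wrapper and converts its -1/errno result to the kernel's negative-errno convention, so both branches return the same kind of value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke a system call with up to three arguments and
   return the raw kernel result (a negative errno value on failure). */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    long ret;
    /* NR in rax, args in rdi/rsi/rdx, result in rax; the syscall
       instruction overwrites rcx and r11, so they are listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (a1), "S" (a2), "d" (a3)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback: glibc's wrapper returns -1 and sets errno instead. */
    long ret = syscall(nr, a1, a2, a3);
    return ret == -1 ? -errno : ret;
#endif
}
```

Note that, unlike syscall(), the inline-asm path never touches errno; a failing call such as close(-1) comes back as -EBADF.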

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-02 23:17                                       ` [PATCH] man2 : syscall.2 : document syscall calling conventions Mike Frysinger
@ 2013-04-07 10:00                                         ` Michael Kerrisk (man-pages)
  2013-04-07 13:55                                           ` Kyle McMartin
       [not found]                                           ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-07 10:00 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

[Adding a few people to CC who may be able to help with Mike's doubts
on PA-RISC; folks, if any of you could have a quick look at the parisc
piece below, that would be helpful]

Mike,

On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
>> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
>> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
>> >> > on a related topic, would it be useful to document the exact calling
>> >> > convention for architecture system calls ?  from time to time, i need
>> >> > to reference this, and i inevitably turn to a variety of sources to
>> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
>> >> > or uClibc, or lss, or other random places).  i would find it handy to
>> >> > have all of these in a single location.
>> >>
>> >> Sounds like it would be useful to have that documented. Would you have
>> >> a chance to write patches for that?
>> >
>> > should we do it in syscall(2) ?  or a dedicated man page ?
>>
>> It's a little hard to say until I see the shape of what comes. Can you
>> provide a rough per-syscall example or two of what you expect to
>> document? (Don't write too concrete a patch yet, until I can get a
>> handle on what you intend.)
>
> this renders nicely i think.  it shows most of the stuff i'm interested in.
> might be useful to add a dedicated section covering the clobbers in the
> future.

Thanks for that. It looks good to me, and I have applied. But it
renders too wide (wherever possible, I try to ensure that everything
renders inside 80 columns), so I have split it into two tables, one with
"instruction, NR, ret" and another with the arguments (arg1 to arg7).

Now, just to make 100% sure of your intention, the NR column would be
better named "syscall #" (or similar), right? (I've made that change.)

> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -79,6 +79,35 @@ and an error code is stored in
>  .BR syscall ()
>  first appeared in
>  4BSD.
> +.SS Architecture calling conventions
> +Every architecture has its own way of invoking & passing arguments to the
> +kernel.
> +Note that the instruction listed below might not be the fastest or best way to
> +transition to the kernel, so you might have to refer to the VDSO.

Mike, any chance that I could interest you in writing a vdso(7) man
page? I've felt the lack of such a page for a while (it need not be
too long), but am not deep enough into the details to write it easily
(I am not sure if you are).

> +Also note that this doesn't cover the entire calling convention -- some
> +architectures may indiscriminately clobber other registers not listed here.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l l l l l l l l l l.
> +arch/ABI       insn    NR      ret     arg1    arg2    arg3    arg4    arg5    arg6    arg7
> +_
> +arm/OABI       swi NR; -       a1      a1      a2      a3      a4      v1      v2      v3
> +arm/EABI       swi 0x0;        r7      r1      r1      r2      r3      r4      r5      r6      r7
> +bfin   excpt 0x0;      P0      R0      R0      R1      R2      R3      R4      R5      -
> +i386   int $0x80;      eax     eax     ebx     ecx     edx     esi     edi     ebp     -
> +ia64   break 0x100000; r15     r10/r8  r11     r9      r10     r14     r15     r13     -
> +.\" not sure about insn or NR
> +.\" parisc     ble 0x100(%%sr2, %%r0); -       r28     r26     r25     r24     r23     r22     r21     -

PA-RISC folks, are you able to confirm/correct the above?

> +sparc/32       t 0x10; g1      o0      o0      o1      o2      o3      o4      o5      -
> +sparc/64       t 0x6d; g1      o0      o0      o1      o2      o3      o4      o5      -
> +x86_64 syscall;        rax     rax     rdi     rsi     rdx     r10     r8      r9      -
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
>  .SS Architecture-specific requirements
>  Each architecture ABI has its own requirements on how
>  system call arguments are passed to the kernel.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 10:00                                         ` Michael Kerrisk (man-pages)
@ 2013-04-07 13:55                                           ` Kyle McMartin
  2013-04-07 14:56                                             ` James Bottomley
       [not found]                                             ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
       [not found]                                           ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 2 replies; 41+ messages in thread
From: Kyle McMartin @ 2013-04-07 13:55 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, linux-parisc

On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC who may be able to help with Mike's doubts
> on PA-RISC; folks, if any of you could have a quick look at the parisc
> piece below, that would be helpful]
> 

The syscall number is in %r20, everything else looks correct. The
returned value is in %r28 and the args are %r26 through %r21.

--Kyle

> Mike,
> 
> On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> >> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> >> > on a related topic, would it be useful to document the exact calling
> >> >> > convention for architecture system calls ?  from time to time, i need
> >> >> > to reference this, and i inevitably turn to a variety of sources to
> >> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
> >> >> > or uClibc, or lss, or other random places).  i would find it handy to
> >> >> > have all of these in a single location.
> >> >>
> >> >> Sounds like it would be useful to have that documented. Would you have
> >> >> a chance to write patches for that?
> >> >
> >> > should we do it in syscall(2) ?  or a dedicated man page ?
> >>
> >> It's a little hard to say until I see the shape of what comes. Can you
> >> provide a rough per-syscall example or two of what you expect to
> >> document? (Don't write too concrete a patch yet, until I can get a
> >> handle on what you intend.)
> >
> > this renders nicely i think.  it shows most of the stuff i'm interested in.
> > might be useful to add a dedicated section covering the clobbers in the
> > future.
> 
> Thanks for that. It looks good to me, and I have applied. But it
> renders too wide (wherever possible, I try to ensure that everything
> renders inside 80 columns), so I have split it into two tables, one with
> "instruction, NR, ret" and another with the arguments (arg1 to arg7).
> 
> Now, just to make 100% sure of your intention, the NR column would be
> better named "syscall #" (or similar), right? (I've made that change.)
> 
> > --- a/man2/syscall.2
> > +++ b/man2/syscall.2
> > @@ -79,6 +79,35 @@ and an error code is stored in
> >  .BR syscall ()
> >  first appeared in
> >  4BSD.
> > +.SS Architecture calling conventions
> > +Every architecture has its own way of invoking & passing arguments to the
> > +kernel.
> > +Note that the instruction listed below might not be the fastest or best way to
> > +transition to the kernel, so you might have to refer to the VDSO.
> 
> Mike, any chance that I could interest you in writing a vdso(7) man
> page? I've felt the lack of such a page for a while (it need not be
> too long), but am not deep enough into the details to write it easily
> (I am not sure if you are).
> 
> > +Also note that this doesn't cover the entire calling convention -- some
> > +architectures may indiscriminately clobber other registers not listed here.
> > +.if t \{\
> > +.ft CW
> > +\}
> > +.TS
> > +l l l l l l l l l l l.
> > +arch/ABI       insn    NR      ret     arg1    arg2    arg3    arg4    arg5    arg6    arg7
> > +_
> > +arm/OABI       swi NR; -       a1      a1      a2      a3      a4      v1      v2      v3
> > +arm/EABI       swi 0x0;        r7      r1      r1      r2      r3      r4      r5      r6      r7
> > +bfin   excpt 0x0;      P0      R0      R0      R1      R2      R3      R4      R5      -
> > +i386   int $0x80;      eax     eax     ebx     ecx     edx     esi     edi     ebp     -
> > +ia64   break 0x100000; r15     r10/r8  r11     r9      r10     r14     r15     r13     -
> > +.\" not sure about insn or NR
> > +.\" parisc     ble 0x100(%%sr2, %%r0); -       r28     r26     r25     r24     r23     r22     r21     -
> 
> PA-RISC folks, are you able to confirm/correct the above?
> 
> > +sparc/32       t 0x10; g1      o0      o0      o1      o2      o3      o4      o5      -
> > +sparc/64       t 0x6d; g1      o0      o0      o1      o2      o3      o4      o5      -
> > +x86_64 syscall;        rax     rax     rdi     rsi     rdx     r10     r8      r9      -
> > +.TE
> > +.if t \{\
> > +.in
> > +.ft P
> > +\}
> >  .SS Architecture-specific requirements
> >  Each architecture ABI has its own requirements on how
> >  system call arguments are passed to the kernel.
> 
> Cheers,
> 
> Michael
> 
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Author of "The Linux Programming Interface"; http://man7.org/tlpi/
> 


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 13:55                                           ` Kyle McMartin
@ 2013-04-07 14:56                                             ` James Bottomley
  2013-04-07 15:11                                               ` Kyle McMartin
       [not found]                                             ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  1 sibling, 1 reply; 41+ messages in thread
From: James Bottomley @ 2013-04-07 14:56 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages),
	Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, linux-parisc

On Sun, 2013-04-07 at 09:55 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
> > 
> 
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.

Actually, that's not quite correct.  On 64 bits, arg1-arg8 are %r26-%r19,
but on 32 bits the convention is that arg1-arg4 are %r26-%r23 and the
rest are on the stack.  We can also do register pair combining on 32 bits
for a 64-bit argument.

Our register use is documented in 

Documentation/parisc/registers

James



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 14:56                                             ` James Bottomley
@ 2013-04-07 15:11                                               ` Kyle McMartin
       [not found]                                                 ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Kyle McMartin @ 2013-04-07 15:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michael Kerrisk (man-pages),
	Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, linux-parisc

On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> Actually, that's not quite correct.  on 64 bits it's arg1-8 are
> %r26-%r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
> rest on stack.  We can also do register pair combining on 32 bits for a
> 64 bit argument.

I guess the confusion is whether you're writing this from the kernel
side or the userspace side. The syscall instruction is called with six
arg registers, but we fix it on entry to the kernel when we call into C.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                                 ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 15:38                                                   ` James Bottomley
  2013-04-08  9:18                                                   ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 41+ messages in thread
From: James Bottomley @ 2013-04-07 15:38 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages),
	Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, 2013-04-07 at 11:11 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> > Actually, that's not quite correct.  on 64 bits it's arg1-8 are
> > %r26-%r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
> > rest on stack.  We can also do register pair combining on 32 bits for a
> > 64 bit argument.
> 
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.

Oh, right, syscall arguments; sorry, I didn't manage to extract the content
from all the quotes.  I was just thinking of the general ABI.

The syscall arguments are all in

arch/parisc/include/asm/unistd.h

As Kyle says, we override the calling convention and define in-register
arguments even on 32 bit (so %r26-%r21).  We actually don't define
_syscall6() yet, but we're ready for it.

James


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                             ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 18:39                                               ` Mike Frysinger
  2013-04-07 18:48                                                 ` John David Anglin
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-07 18:39 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sunday 07 April 2013 09:55:14 Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
> 
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.

just to be clear, the only insn you need is:
	ble 0x100(%sr2, %r0);

the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is a 
known offset into that).  the docs don't mention r0 that i can see, so i'm 
guessing it's one of those "always 0" registers ?

the sysdep code has an ldi call in the branch delay slot (i think), but all 
that seems to do is load r20 with the syscall nr.
-mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                           ` <CAKgNAkgODPSWSeA8ZymiAjFBqSAZQMtQe9GW84Y6QHdFEc9S-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-07 18:43                                             ` Mike Frysinger
  0 siblings, 0 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-07 18:43 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man

On Sunday 07 April 2013 06:00:50 Michael Kerrisk (man-pages) wrote:
> On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger wrote:
> > On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
> >> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
> >> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
> >> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
> >> >> > on a related topic, would it be useful to document the exact
> >> >> > calling convention for architecture system calls ?  from time to
> >> >> > time, i need to reference this, and i inevitably turn to a variety
> >> >> > of sources to dig up the answer (the kernel itself, or strace, or
> >> >> > qemu, or glibc, or uClibc, or lss, or other random places).  i
> >> >> > would find it handy to have all of these in a single location.
> >> >> 
> >> >> Sounds like it would be useful to have that documented. Would you
> >> >> have a chance to write patches for that?
> >> > 
> >> > should we do it in syscall(2) ?  or a dedicated man page ?
> >> 
> >> It's a little hard to say until I see the shape of what comes. Can you
> >> provide a rough per-syscall example or two of what you expect to
> >> document? (Don't write too concrete a patch yet, until I can get a
> >> handle on what you intend.)
> > 
> > this renders nicely i think.  it shows most of the stuff i'm interested
> > in. might be useful to add a dedicated section covering the clobbers in
> > the future.
> 
> Thanks for that. It looks good to me, and I have applied. But it
> renders too wide (wherever possible, I try to ensure that everything
> renders inside 80 columns), so I have split into tables, one with
> "instruction, NR, ret" and another with the arguments (arg1 to arg7).
> 
> Now, just to make 100% sure of your intention, the NR column would be
> better named "syscall #" (or similar), right? (I've made that change.)

i called it "nr" because that's the common convention (__NR_xxx/etc...) in 
code bases, and because it does a nice job of not pushing the table too wide.

if you've split up the table though, that should no longer be a problem.

> > --- a/man2/syscall.2
> > +++ b/man2/syscall.2
> > @@ -79,6 +79,35 @@ and an error code is stored in
> >  .BR syscall ()
> >  first appeared in
> >  4BSD.
> > +.SS Architecture calling conventions
> > +Every architecture has its own way of invoking & passing arguments to the
> > +kernel.
> > +Note that the instruction listed below might not be the fastest or best way to
> > +transition to the kernel, so you might have to refer to the VDSO.
> 
> Mike, any chance that I could interest you in writing a vdso(7) man
> page? I've felt the lack of such a page for a while (it need not be
> too long), but am not deep enough into the details to write it easily
> (I am not sure if you are).

i might take a stab at it.  it's annoying to constantly have to refer to the 
kernel source when looking something up.

in order to be useful, i think there will have to be arch-specific sections 
which document the funcs each port provides.
-mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 18:39                                               ` Mike Frysinger
@ 2013-04-07 18:48                                                 ` John David Anglin
       [not found]                                                   ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
  2013-04-12  1:55                                                   ` Mike Frysinger
  0 siblings, 2 replies; 41+ messages in thread
From: John David Anglin @ 2013-04-07 18:48 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:

> just to be clear, the only insn you need is:
> 	ble 0x100(%sr2, %r0);
>
> the kernel docs say sr2 holds the kernel gateway page (so i guess
> 0x100 is a known offset into that).  the docs don't mention r0 that i
> can see, so i'm guessing it's one of those "always 0" registers ?

Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
Currently, this implements an atomic CAS operation used for pthread support.

Dave
--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                                 ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  2013-04-07 15:38                                                   ` James Bottomley
@ 2013-04-08  9:18                                                   ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08  9:18 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: James Bottomley, Mike Frysinger, linux-man, Kyle McMartin,
	Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, Apr 7, 2013 at 5:11 PM, Kyle McMartin <kyle-pfcGkIkfWfAsA/PxXw9srA@public.gmane.org> wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
>> Actually, that's not quite correct.  on 64 bits it's arg1-8 are
>> %r26-%r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
>> rest on stack.  We can also do register pair combining on 32 bits for a
>> 64 bit argument.
>
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.

Thanks, Kyle.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
@ 2013-04-08  9:20                                                       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08  9:20 UTC (permalink / raw)
  To: Mike Frysinger, Kyle McMartin
  Cc: John David Anglin, linux-man, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, Apr 7, 2013 at 8:48 PM, John David Anglin <dave.anglin-CzeTG9NwML0@public.gmane.org> wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>
>> just to be clear, the only insn you need is:
>>         ble 0x100(%sr2, %r0);
>>
>> the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is
>> a known offset into that).  the docs don't mention r0 that i can see, so
>> i'm guessing it's one of those "always 0" registers ?
>
>
> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
> Currently, this implements an atomic CAS operation used for pthread support.

Mike (and Kyle),

For review, here are the tables as they now stand:

=====
   Architecture calling conventions
       Every architecture has its own way of invoking and passing arguments to
       the kernel.  The details for various architectures are  listed  in  the
       two tables below.

       The  first  table  lists  the  instruction used to transition to kernel
       mode (which might not be the fastest or best way to transition to  the
       kernel,  so  you might have to refer to the VDSO), the register used to
       indicate the system call number, and the register used  to  return  the
       system call result.

       arch/ABI   instruction          syscall #   retval  Notes
       ────────────────────────────────────────────────────────────────────
       arm/OABI   swi NR               -           a1      NR is syscall #
       arm/EABI   swi 0x0              r7          r1
       blackfin   excpt 0x0            P0          R0
       i386       int $0x80            eax         eax
       ia64       break 0x100000       r15         r10/r8C
       parisc     ble 0x100(%sr2, %r0) r20         r28
       sparc/32   t 0x10               g1          o0
       sparc/64   t 0x6d               g1          o0
       x86_64     syscall              rax         rax

       The second table shows the registers used to pass the system call
       arguments.

       arch/ABI   arg1   arg2   arg3   arg4   arg5   arg6   arg7
       ──────────────────────────────────────────────────────────
       arm/OABI   a1     a2     a3     a4     v1     v2     v3
       arm/EABI   r1     r2     r3     r4     r5     r6     r7
       blackfin   R0     R1     R2     R3     R4     R5     -
       i386       ebx    ecx    edx    esi    edi    ebp    -
       ia64       r11    r9     r10    r14    r15    r13    -
       parisc     r26    r25    r24    r23    r22    r21    -
       sparc/32   o0     o1     o2     o3     o4     o5     -
       sparc/64   o0     o1     o2     o3     o4     o5     -
       x86_64     rdi    rsi    rdx    r10    r8     r9     -

       Note that these tables don't cover the entire  calling  convention—some
       architectures  may  indiscriminately clobber other registers not listed
       here.
=====

Cheers,

Michael

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                                       ` <CAKgNAkhv6tovvnucoofDR-eOe4H7xeFZDam9+iaVVndEqbuoXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-12  1:40                                                         ` Mike Frysinger
       [not found]                                                           ` <201304112140.18506.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12  1:40 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man

On Monday 08 April 2013 05:20:07 Michael Kerrisk (man-pages) wrote:
>        arch/ABI   instruction          syscall #   retval  Notes

i was thinking it also might be useful to mention the register where the 
syscall # is kept for syscall_restart, but we can do that in a follow-up.

>        ia64       break 0x100000       r15         r10/r8C

looks like you added a typo :).  it's "r8", not "r8C".
-mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 18:48                                                 ` John David Anglin
       [not found]                                                   ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
@ 2013-04-12  1:55                                                   ` Mike Frysinger
       [not found]                                                     ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2013-04-12 14:01                                                     ` Kyle McMartin
  1 sibling, 2 replies; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12  1:55 UTC (permalink / raw)
  To: John David Anglin
  Cc: Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > just to be clear, the only insn you need is:
> > 	ble 0x100(%sr2, %r0);
> > 
> > the kernel docs say sr2 holds the kernel gateway page (so i guess
> > 0x100 is a known offset into that).  the docs don't mention r0 that i
> > can see, so i'm guessing it's one of those "always 0" registers ?
> 
> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
> Currently, this implements an atomic CAS operation used for pthread support.

interesting.  sounds like a poor man's vDSO.  i'll document this in the new
vdso(7) man page.
-mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                                                     ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-12  2:34                                                       ` John David Anglin
  2013-04-12  3:38                                                         ` Mike Frysinger
  0 siblings, 1 reply; 41+ messages in thread
From: John David Anglin @ 2013-04-12  2:34 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:

> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>> just to be clear, the only insn you need is:
>>> 	ble 0x100(%sr2, %r0);
>>>
>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>> 0x100 is a known offset into that).  the docs don't mention r0 that i
>>> can see, so i'm guessing it's one of those "always 0" registers ?
>>
>> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
>> Currently, this implements an atomic CAS operation used for pthread support.
>
> interesting.  sounds like a poor man's vDSO.  i'll document this the new
> vdso(7) man page.

Not exactly; the code runs on the gateway page, which is in kernel space.
The main reason for doing the operation in kernel space is to prevent
processes from being preempted while executing in the lock region.  In
general, parisc processes are not preempted on the gateway page.  There are
some subtleties regarding fault handling.

There is support in glibc and libgcc for these calls.  The libgcc
implementation in linux-atomic.c is very similar to that on arm.

Dave
--
John David Anglin	dave.anglin-CzeTG9NwML0@public.gmane.org



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  2:34                                                       ` John David Anglin
@ 2013-04-12  3:38                                                         ` Mike Frysinger
  2013-04-12  4:45                                                           ` James Bottomley
  0 siblings, 1 reply; 41+ messages in thread
From: Mike Frysinger @ 2013-04-12  3:38 UTC (permalink / raw)
  To: John David Anglin
  Cc: Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> >>> just to be clear, the only insn you need is:
> >>> 	ble 0x100(%sr2, %r0);
> >>> 
> >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> >>> 0x100 is a known offset into that).  the docs don't mention r0 that i
> >>> can see, so i'm guessing it's one of those "always 0" registers ?
> >> 
> >> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
> >> Currently, this implements an atomic CAS operation used for pthread support.
> > 
> > interesting.  sounds like a poor man's vDSO.  i'll document this the new
> > vdso(7) man page.
> 
> Not exactly, the code runs on the gateway page which is in kernel space.
> The main reason for doing the operation in kernel space is to prevent
> processes from being preempted while executing in the lock region.  In
> general, parisc processes are not preempted on the gateway page.  There are
> some subtleties regarding fault handling.

sure ... the Blackfin arch does a similar thing to provide fast atomic
primitives to userspace, since the ISA lacks them.

what do you think of this section for vdso(7) ?  i might have to split the 
"real" vdso arches from these others since there are a couple now (arm, bfin,
parisc), and i think there might be more down the line (microblaze).

.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions.
Rather than use the normal ELF aux vector approach, it passes the address of
the page to the process via the SR2 register.
This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction, e.g.:
.br
ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}

> There is support in glibc and libgcc for these calls.  The libgcc
> implementation in linux-atomic.c is very similar to that on arm.

interesting.  another arch to add :).
-mike

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  3:38                                                         ` Mike Frysinger
@ 2013-04-12  4:45                                                           ` James Bottomley
  2013-04-12 12:17                                                             ` John David Anglin
  2013-04-12 18:45                                                             ` Mike Frysinger
  0 siblings, 2 replies; 41+ messages in thread
From: James Bottomley @ 2013-04-12  4:45 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > >>> just to be clear, the only insn you need is:
> > >>>   ble 0x100(%sr2, %r0);
> > >>> 
> > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > >>> 0x100 is a known offset into that).  the docs don't mention r0 that i
> > >>> can see, so i'm guessing it's one of those "always 0" registers ?
> > >> 
> > >> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
> > >> Currently, this implements an atomic CAS operation used for pthread support.
> > > 
> > > interesting.  sounds like a poor man's vDSO.  i'll document this the new
> > > vdso(7) man page.
> > 
> > Not exactly, the code runs on the gateway page which is in kernel space.
> > The main reason for doing the operation in kernel space is to prevent
> > processes from being preempted while executing in the lock region.  In
> > general, parisc processes are not preempted on the gateway page.  There are
> > some subtleties regarding fault handling.
> 
> sure ... the Blackfin arch does a similar thing for providing fast atomic 
> primitives to userspace since the ISA can't.
> 
> what do you think of this section for vdso(7) ?  i might have to split the 
> "real" vdso arches from these others since there's a couple now (arm, bfin, 
> parisc), and i think there might be more down the line (microblaze).

I've got to say, I really don't think this can be classified as a vdso.
For a vdso, the kernel exports an ELF object that can be linked
dynamically into any elf binary requiring it.  The ELF section
information provides full details and so vdso entries can be called by
symbol.

In the parisc gateway page implementation, we have a set of "hidden"
primitives which the executable must know how to call (no self
description like a vdso).  This mechanism is identical to the original
intent of the x86 int <n> instruction (an instruction that traps into
the kernel and performs some primitive action but to use it, you have to
know which function corresponds to which value of <n>).
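For contrast with the gateway page, here is a minimal sketch of how a process can find the vDSO that the kernel maps for it. It assumes Linux with glibc 2.16 or later, for getauxval(3) and AT_SYSINFO_EHDR. The aux vector passes the address of an ordinary ELF shared object, and that is what makes a vDSO self-describing. The helper name vdso_is_elf() is hypothetical.

```c
#include <elf.h>       /* ELFMAG, SELFMAG, ET_DYN, AT_SYSINFO_EHDR */
#include <link.h>      /* ElfW() */
#include <string.h>    /* memcmp() */
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper.  Returns 1 if the kernel passed a vDSO address in
 * the aux vector and the mapping starts with an ELF header for a shared
 * object (ET_DYN), 0 if no vDSO address was supplied, and -1 if the
 * mapping does not look like an ELF shared object. */
static int vdso_is_elf(void)
{
    /* AT_SYSINFO_EHDR: address of the vDSO's ELF header, or 0 if absent */
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return 0;

    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    return eh->e_type == ET_DYN ? 1 : -1;
}
```

A gateway page carries no such header. A caller can only branch to offsets it already knows.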

James


> .SS parisc (hppa) functions
> .\" See linux/arch/parisc/kernel/syscall.S
> .\" See linux/Documentation/parisc/registers
> The parisc port has a code page full of utility functions.
> Rather than use the normal ELF aux vector approach, it passes the address of
> the page to the process via the SR2 register.
> This is done to match the way HP-UX works.
> 
> Since it's just a raw page of code, there is no ELF information for doing
> symbol lookups or versioning.
> Simply call into the appropriate offset via the branch instruction, e.g.:
> .br
> ble <offset>(%sr2, %r0)
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> offset  function
> _
> 00b0    lws_entry
> 00e0    set_thread_pointer
> 0100    linux_gateway_entry (syscall)
> 0268    syscall_nosys
> 0274    tracesys
> 0324    tracesys_next
> 0368    tracesys_exit
> 03a0    tracesys_sigexit
> 03b8    lws_start
> 03dc    lws_exit_nosys
> 03e0    lws_exit
> 03e4    lws_compare_and_swap64
> 03e8    lws_compare_and_swap
> 0404    cas_wouldblock
> 0410    cas_action
> .TE
> .if t \{\
> .in
> .ft P
> \}
> 
> > There is support in glibc and libgcc for these calls.  The libgcc
> > implementation
> > in linux-atomic.c is very similar to that on arm.
> 
> interesting.  another arch to add :).
> -mike

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: John David Anglin @ 2013-04-12 12:17 UTC
  To: James Bottomley
  Cc: Mike Frysinger, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On 12-Apr-13, at 12:45 AM, James Bottomley wrote:

> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
>> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
>>> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
>>>> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>>>>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>>>>> just to be clear, the only insn you need is:
>>>>>>  ble 0x100(%sr2, %r0);
>>>>>>
>>>>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>>>>> 0x100 is a known offset into that).  the docs don't mention r0 that
>>>>>> i can see, so i'm guessing it's one of those "always 0" registers ?
>>>>>
>>>>> Yes.  There is also an entry at offset 0xb0 for light-weight
>>>>> syscalls.  Currently, this implements an atomic CAS operation used
>>>>> for pthread support.
>>>>
>>>> interesting.  sounds like a poor man's vDSO.  i'll document this in
>>>> the new vdso(7) man page.
>>>
>>> Not exactly, the code runs on the gateway page which is in kernel
>>> space.  The main reason for doing the operation in kernel space is to
>>> prevent processes from being preempted while executing in the lock
>>> region.  In general, parisc processes are not preempted on the gateway
>>> page.  There are some subtleties regarding fault handling.
>>
>> sure ... the Blackfin arch does a similar thing for providing fast
>> atomic primitives to userspace since the ISA can't.
>>
>> what do you think of this section for vdso(7) ?  i might have to split
>> the "real" vdso arches from these others since there's a couple now
>> (arm, bfin, parisc), and i think there might be more down the line
>> (microblaze).
>
> I've got to say, I really don't think this can be classified as a vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it.  The ELF section
> information provides full details and so vdso entries can be called by
> symbol.
>
> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso).  This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you have
> to know which function corresponds to which value of <n>).

I agree with James.  There is no ELF object exported to userspace.  The
content of the gateway page is hidden.  The data structures used for
the locks are in the kernel itself.  Access is via a special branch
instruction rather than a break/trap instruction.
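The thread notes that glibc and libgcc support these calls. Below is a minimal sketch of the general pattern a library can use on top of a kernel-provided compare-and-swap: read, compute, attempt the CAS, and retry if another thread got there first. It is not a copy of libgcc's linux-atomic.c. The helper kernel_cmpxchg() is hypothetical, as are its "0 means success" convention and argument order. In this sketch it is backed by a GCC/Clang __atomic builtin only so that the code runs on any host. On parisc, the real operation would be reached by branching into the gateway page.

```c
/* Hypothetical stand-in for the gateway-page CAS.  Returns 0 if *mem held
 * oldval and now holds newval, nonzero (and *mem unchanged) otherwise. */
static int kernel_cmpxchg(int oldval, int newval, int *mem)
{
    int expected = oldval;
    return !__atomic_compare_exchange_n(mem, &expected, newval, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Fetch-and-add built only from the CAS primitive: keep retrying until
 * no other thread changed *mem between our read and our CAS. */
static int fetch_and_add(int *mem, int inc)
{
    int old;
    do {
        old = __atomic_load_n(mem, __ATOMIC_RELAXED);
    } while (kernel_cmpxchg(old, old + inc, mem) != 0);
    return old;  /* value seen just before our successful update */
}
```

Other read-modify-write operations (and, or, exchange) follow the same loop with a different "compute" step.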


--
John David Anglin	dave.anglin@bell.net

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: Kyle McMartin @ 2013-04-12 14:01 UTC
  To: Mike Frysinger
  Cc: John David Anglin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Thu, Apr 11, 2013 at 09:55:43PM -0400, Mike Frysinger wrote:
> interesting.  sounds like a poor man's vDSO.  i'll document this in the new
> vdso(7) man page.
> -mike

fwiw ia64 does basically the same thing for a subset of syscalls
(fsys.c)

--Kyle

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: Mike Frysinger @ 2013-04-12 18:45 UTC
  To: James Bottomley
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > > >>> just to be clear, the only insn you need is:
> > > >>>   ble 0x100(%sr2, %r0);
> > > >>> 
> > > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > > >>> 0x100 is a known offset into that).  the docs don't mention r0 that
> > > >>> i can see, so i'm guessing it's one of those "always 0" registers ?
> > > >> 
> > > >> Yes.  There is also an entry at offset 0xb0 for light-weight
> > > >> syscalls.  Currently, this implements an atomic CAS operation used
> > > >> for pthread support.
> > > > 
> > > > interesting.  sounds like a poor man's vDSO.  i'll document this in
> > > > the new vdso(7) man page.
> > > 
> > > Not exactly, the code runs on the gateway page which is in kernel
> > > space. The main reason for doing the operation in kernel space is to
> > > prevent processes from being preempted while executing in the lock
> > > region.  In general, parisc processes are not preempted on the gateway
> > > page.  There are some subtleties regarding fault handling.
> > 
> > sure ... the Blackfin arch does a similar thing for providing fast atomic
> > primitives to userspace since the ISA can't.
> > 
> > what do you think of this section for vdso(7) ?  i might have to split
> > the "real" vdso arches from these others since there's a couple now
> > (arm, bfin, parisc), and i think there might be more down the line
> > (microblaze).
> 
> I've got to say, I really don't think this can be classified as a vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it.  The ELF section
> information provides full details and so vdso entries can be called by
> symbol.

strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the 
acronym is literally "virtual dynamic shared object").  however, i see the 
vdso as being a bit more of a flexible concept -- it's a place of shared code 
that the kernel manages and exports for all userspace processes.  
fundamentally, the point of the vDSO is to provide services to greatly speed 
up userspace.  in that regard, these mapped pages are exactly like vDSOs.

thus i think it's appropriate to document these "fixed code" regions that many 
arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man 
page as the vdso.  especially since (currently) arches do one or the other, 
but not both.

> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso).  This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you have to
> know which function corresponds to which value of <n>).

would it be useful to document all of them ?  or just the ones that userspace 
actively uses (like syscall/cas) ?  or should all of this be recorded in the 
kernel's Documentation/parisc/ subdir and just have the man page refer people 
there (like it does for ARM & Blackfin currently) ?
-mike

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: James Bottomley @ 2013-04-12 19:14 UTC
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > what do you think of this section for vdso(7) ?  i might have to split
> > > the "real" vdso arches from these others since there's a couple now
> > > (arm, bfin, parisc), and i think there might be more down the line
> > > (microblaze).
> > 
> > I've got to say, I really don't think this can be classified as a vdso.
> > For a vdso, the kernel exports an ELF object that can be linked
> > dynamically into any elf binary requiring it.  The ELF section
> > information provides full details and so vdso entries can be called by
> > symbol.
> 
> strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the 
> acronym is literally "virtual dynamic shared object").  however, i see the 
> vdso as being a bit more of a flexible concept -- it's a place of shared code 
> that the kernel manages and exports for all userspace processes.  
> fundamentally, the point of the vDSO is to provide services to greatly speed 
> up userspace.  in that regard, these mapped pages are exactly like vDSOs.

I don't entirely understand this classification.  If the kernel<->user
gateway becomes classified as a vdso, that covers our syscall interface
on every architecture.  There's now no distinction between a vdso (which
may not even move to kernel mode) and a syscall.

I think the difference is that a syscall is a specific call to a known
kernel routine by number and it involves a transition to kernel mode.  A
vdso is an exported link object containing certain functions which may
or may not cause a trap to kernel mode when executed.  The distinction
is how you do the call.  For syscalls, you have to know the number and
the arguments.  For vdso you just have to know the symbol (and
obviously, the prototype for C code) and the kernel supplies the
implementation direct to the userspace binary.
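To make the "call by symbol" versus "call by number" distinction concrete, here is a minimal sketch that reads the same clock both ways. It assumes 64-bit Linux with glibc, where SYS_clock_gettime exists and struct timespec matches the kernel's layout. On some 32-bit targets neither assumption holds. glibc may satisfy clock_gettime() from the vDSO without entering the kernel, but whether it does so depends on the architecture and clock. syscall(2) always traps. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_clock_gettime */
#include <time.h>         /* clock_gettime(), struct timespec */
#include <unistd.h>       /* syscall() */

/* Reads CLOCK_MONOTONIC twice.  Returns 0 if both reads succeed. */
static int read_monotonic_both_ways(struct timespec *by_symbol,
                                    struct timespec *by_number)
{
    /* By symbol: the caller needs only the name and the C prototype. */
    if (clock_gettime(CLOCK_MONOTONIC, by_symbol) != 0)
        return -1;

    /* By number: the caller must know SYS_clock_gettime and the argument
     * order the kernel expects. */
    if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, by_number) != 0)
        return -1;

    return 0;
}
```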

> thus i think it's appropriate to document these "fixed code" regions that many 
> arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man 
> page as the vdso.  especially since (currently) arches do one or the other, 
> but not both.

I really see these as a type of lightweight syscall.  You use the
syscall prototype (call by number with known arguments) but the call may
not necessarily transition to kernel mode proper to handle the function.

> > In the parisc gateway page implementation, we have a set of "hidden"
> > primitives which the executable must know how to call (no self
> > description like a vdso).  This mechanism is identical to the original
> > intent of the x86 int <n> instruction (an instruction that traps into
> > the kernel and performs some primitive action but to use it, you have to
> > know which function corresponds to which value of <n>).
> 
> would it be useful to document all of them ?  or just the ones that userspace 
> actively uses (like syscall/cas) ?  or should all of this be recorded in the 
> kernel's Documentation/parisc/ subdir and just have the man page refer people 
> there (like it does for ARM & Blackfin currently) ?

I'm not sure.  For x86 they're in include/asm/traps.h.  I think the only
ones we really use are int3 for breakpoint, int4 for overflow and int80
for legacy syscall.
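Here is a minimal sketch of the "trap by vector number" model, using vector 3 (int3, the breakpoint trap). It assumes Linux and GCC/Clang inline asm on x86. On Linux the kernel turns the trap into SIGTRAP for the process. Because int3 is a trap rather than a fault, execution resumes after the instruction once the handler returns. On other architectures the sketch falls back to raise(SIGTRAP), so there it only shows the signal side. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t traps_seen;

static void on_sigtrap(int sig)
{
    (void)sig;
    traps_seen++;
}

/* Installs a SIGTRAP handler, executes one breakpoint trap, and returns
 * the number of SIGTRAPs delivered (or -1 if the handler can't be set). */
static int count_breakpoint_traps(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, NULL) != 0)
        return -1;

    traps_seen = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("int3");   /* vector 3: breakpoint */
#else
    raise(SIGTRAP);             /* no portable breakpoint instruction */
#endif
    return traps_seen;
}
```

As with the gateway page, nothing in the instruction stream says what vector 3 does. The caller has to know it.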

James

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: Mike Frysinger @ 2013-04-12 19:46 UTC
  To: James Bottomley
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > what do you think of this section for vdso(7) ?  i might have to
> > > > split the "real" vdso arches from these others since there's a
> > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > the line (microblaze).
> > > 
> > > I've got to say, I really don't think this can be classified as a vdso.
> > > For a vdso, the kernel exports an ELF object that can be linked
> > > dynamically into any elf binary requiring it.  The ELF section
> > > information provides full details and so vdso entries can be called by
> > > symbol.
> > 
> > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > acronym is literally "virtual dynamic shared object").  however, i see
> > the vdso as being a bit more of a flexible concept -- it's a place of
> > shared code that the kernel manages and exports for all userspace
> > processes. fundamentally, the point of the vDSO is to provide services
> > to greatly speed up userspace.  in that regard, these mapped pages are
> > exactly like vDSOs.
> 
> I don't entirely understand this classification.  If the kernel<->user
> gateway becomes classified as a vdso, that covers our syscall interface
> on every architecture.  There's now no distinction between a vdso (which
> may not even move to kernel mode) and a syscall.
> 
> I think the difference is that a syscall is a specific call to a known
> kernel routine by number and it involves a transition to kernel mode.  A
> vdso is an exported link object containing certain functions which may
> or may not cause a trap to kernel mode when executed.  The distinction
> is how you do the call.  For syscalls, you have to know the number and
> the arguments.  For vdso you just have to know the symbol (and
> obviously, the prototype for C code) and the kernel supplies the
> implementation direct to the userspace binary.

i'm not fully versed in the parisc linux gateway page or how the architecture 
is handling things, so i could be completely off here.  from reading the source 
code, it *looked* like it was just a page of utility funcs that userspace 
branches to without changing privilege modes or going through the full syscall 
routines.

so i'm saying the gateway page itself can be thought of in the same vein as a 
vDSO.  it's a black box with entry points that provide light weight services 
to userspace.  sometimes it ends up triggering a full syscall, sometimes it 
doesn't (just like a vDSO).

> > thus i think it's appropriate to document these "fixed code" regions that
> > many arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the
> > same man page as the vdso.  especially since (currently) arches do one
> > or the other, but not both.
> 
> I really see these as a type of lightweight syscall.  You use the
> syscall prototype (call by number with known arguments) but the call may
> not necessarily transition to kernel mode proper to handle the function.

if you think of the vdso in a very strict light (it's exactly an ELF that the 
kernel automatically maps into every process's address space), then i guess 
you can only classify these as lightweight syscalls (where the address/offset 
is the "syscall #").
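Here is a toy model of "the address/offset is the syscall #". It assumes nothing about the real kernel code: the page is treated as a set of fixed entry offsets with no symbol table. The caller can only go from an offset it already knows to a behaviour, never from a name to an offset. The offsets are taken from the parisc table quoted earlier in this thread. The enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical names for three entry offsets from the quoted table. */
enum gateway_offset {
    GW_LWS_ENTRY          = 0x0b0,
    GW_SET_THREAD_POINTER = 0x0e0,
    GW_LINUX_GATEWAY      = 0x100,  /* syscall entry: ble 0x100(%sr2, %r0) */
};

/* Maps an entry offset to a description, the way a syscall table maps a
 * number to a handler.  Unknown offsets have no meaning (NULL): there is
 * no self-description to fall back on. */
static const char *gateway_target(unsigned offset)
{
    switch (offset) {
    case GW_LWS_ENTRY:          return "lws_entry";
    case GW_SET_THREAD_POINTER: return "set_thread_pointer";
    case GW_LINUX_GATEWAY:      return "linux_gateway_entry";
    default:                    return NULL;
    }
}
```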

i see vdso as being a more flexible concept than that -- if it's code mapped 
into a process's address space and provides useful lightweight services that 
are meant to be used specifically in lieu of syscall(), then it's vdso-like and 
should be in the vdso(7) man page.  it has a lot more in common imo with a 
vdso than it does with an actual syscall.  i certainly think vdso(7) is more 
appropriate for these regions than syscall(2) or syscalls(2).

> > > In the parisc gateway page implementation, we have a set of "hidden"
> > > primitives which the executable must know how to call (no self
> > > description like a vdso).  This mechanism is identical to the original
> > > intent of the x86 int <n> instruction (an instruction that traps into
> > > the kernel and performs some primitive action but to use it, you have
> > > to know which function corresponds to which value of <n>).
> > 
> > would it be useful to document all of them ?  or just the ones that
> > userspace actively uses (like syscall/cas) ?  or should all of this be
> > recorded in the kernel's Documentation/parisc/ subdir and just have the
> > man page refer people there (like it does for ARM & Blackfin
> > currently) ?
> 
> I'm not sure.  For x86 they're in include/asm/traps.h.  I think the only
> ones we really use are int3 for breakpoint, int4 for overflow and int80
> for legacy syscall.

hmm, i wasn't even considering the other arch-specific services offered by e.g. 
software interrupts.  i don't think those belong in vdso(7) as they don't 
confer any of the lightweight advantages the vdso is designed to bring, but it 
might be useful to document these somewhere.  they're also not as common for 
people to encounter as a vdso ...
-mike

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: James Bottomley @ 2013-04-12 20:25 UTC
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Fri, 2013-04-12 at 15:46 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> > On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > > what do you think of this section for vdso(7) ?  i might have to
> > > > > split the "real" vdso arches from these others since there's a
> > > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > > the line (microblaze).
> > > > 
> > > > I've got to say, I really don't think this can be classified as a vdso.
> > > > For a vdso, the kernel exports an ELF object that can be linked
> > > > dynamically into any elf binary requiring it.  The ELF section
> > > > information provides full details and so vdso entries can be called by
> > > > symbol.
> > > 
> > > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > > acronym is literally "virtual dynamic shared object").  however, i see
> > > the vdso as being a bit more of a flexible concept -- it's a place of
> > > shared code that the kernel manages and exports for all userspace
> > > processes. fundamentally, the point of the vDSO is to provide services
> > > to greatly speed up userspace.  in that regard, these mapped pages are
> > > exactly like vDSOs.
> > 
> > I don't entirely understand this classification.  If the kernel<->user
> > gateway becomes classified as a vdso, that covers our syscall interface
> > on every architecture.  There's now no distinction between a vdso (which
> > may not even move to kernel mode) and a syscall.
> > 
> > I think the difference is that a syscall is a specific call to a known
> > kernel routine by number and it involves a transition to kernel mode.  A
> > vdso is an exported link object containing certain functions which may
> > or may not cause a trap to kernel mode when executed.  The distinction
> > is how you do the call.  For syscalls, you have to know the number and
> > the arguments.  For vdso you just have to know the symbol (and
> > obviously, the prototype for C code) and the kernel supplies the
> > implementation direct to the userspace binary.
> 
> i'm not fully versed in the parisc linux gateway page or how the architecture 
> is handling things, so i could be completely off here.  from reading the source 
> code, it *looked* like it was just a page of utility funcs that userspace 
> branches to without changing privilege modes or going through the full syscall 
> routines.

Oh, if that's the misunderstanding, then the gateway page is "special".
It actually has PAGE_GATEWAY bits set (this is linux terminology; in
parisc terminology it's Execute, promote to PL0) in the page map.  So
anything executing on this page executes with kernel level privilege
(there's more to it than that: to have this happen, you also have to use
a branch with a ,gate completer to activate the privilege promotion).
The upshot is that everything that runs on the gateway page runs at
kernel privilege but with the current user process address space
(although you have access to kernel space via %sr2).  For the 0x100
syscall entry, we redo the space registers to point to the kernel
address space (preserving the user address space in %sr3), move to wide
mode if required, save the user registers and branch into the kernel
syscall entry point.  For all the other functions, we execute at kernel
privilege but don't flip address spaces.  The basic upshot of this is
that these code snippets are executed atomically (because the kernel
can't be pre-empted) and they may perform architecturally forbidden (to
PL3) operations (like setting control registers).
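Here is a user-space model of "atomic because nothing else can interleave". It is not the parisc implementation. On a single CPU, running without preemption is enough. On SMP, other CPUs keep running, so a real implementation also needs some form of locking. This sketch makes that explicit by serializing each compare-and-swap on a lock chosen by hashing the target address. The table size, the hash and all names are assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CAS_LOCKS 16  /* assumed table size (power of two) */

/* Static atomic objects are zero-initialized, so every lock starts free. */
static atomic_int cas_locks[CAS_LOCKS];

static atomic_int *lock_for(const void *addr)
{
    /* Hash the address: drop the low bits, keep log2(CAS_LOCKS) bits. */
    uintptr_t h = ((uintptr_t)addr >> 4) & (CAS_LOCKS - 1);
    return &cas_locks[h];
}

static void lock_acquire(atomic_int *l)
{
    /* Spin until this caller is the one that changed 0 to 1. */
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
        ;
}

static void lock_release(atomic_int *l)
{
    atomic_store_explicit(l, 0, memory_order_release);
}

/* Plain loads and stores, made atomic with respect to other callers of
 * locked_cas() on the same word because they all take the same lock.
 * Returns 0 on success, 1 if *mem != oldval (then *mem is unchanged). */
static int locked_cas(int *mem, int oldval, int newval)
{
    atomic_int *l = lock_for(mem);
    int failed;

    lock_acquire(l);
    failed = (*mem != oldval);
    if (!failed)
        *mem = newval;
    lock_release(l);
    return failed;
}
```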

James

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
From: Michael Kerrisk (man-pages) @ 2013-04-16  6:01 UTC
  To: Mike Frysinger; +Cc: linux-man

On Fri, Apr 12, 2013 at 3:40 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Monday 08 April 2013 05:20:07 Michael Kerrisk (man-pages) wrote:
>>        arch/ABI   instruction          syscall #   retval  Notes
>
> i was thinking it also might be useful to mention the register where the
> syscall # is kept for syscall_restart, but we can do that in a follow up

That sounds like a useful addition! You can see the current version of
the page in Git. Further patches welcome...
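The following sketch shows one row of the calling-conventions table quoted above (instruction, syscall # register, return value register) in practice. It assumes x86-64 Linux and GCC/Clang inline asm. There, the syscall instruction takes the number in rax, returns the result in rax, and clobbers rcx and r11. On other architectures the sketch falls back to syscall(2). The name raw_getpid() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* syscall(), getpid() */

/* Issues getpid by number. */
static long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* Number in rax; result comes back in rax; rcx and r11 are clobbered
     * by the syscall instruction itself. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_getpid);
#endif
}
```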

>>        ia64       break 0x100000       r15         r10/r8C
>
> looks like you added a typo :).  it's "r8", not "r8C".

Fixed.

Cheers,

Michael