linux-kernel.vger.kernel.org archive mirror
* [PATCH] lseek.2: SYNOPSIS: Use correct types
@ 2020-11-21 17:30 Alejandro Colomar
  2020-11-21 17:45 ` Alejandro Colomar (man-pages)
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alejandro Colomar @ 2020-11-21 17:30 UTC
  To: mtk.manpages; +Cc: Alejandro Colomar, linux-man, linux-kernel

The Linux kernel uses 'unsigned int' instead of 'int'
for 'fd' and 'whence'.
As glibc provides no wrapper, use the same types the kernel uses.

src/linux$ grep -rn "SYSCALL_DEFINE.*lseek"
fs/read_write.c:322:SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
fs/read_write.c:328:COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence)
fs/read_write.c:336:SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
arch/mips/kernel/linux32.c:65:SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,

src/linux$ sed -n 322,325p fs/read_write.c
SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
{
	return ksys_lseek(fd, offset, whence);
}

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
---
 man2/lseek.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/lseek.2 b/man2/lseek.2
index e35e410a6..2ff878ffa 100644
--- a/man2/lseek.2
+++ b/man2/lseek.2
@@ -51,7 +51,7 @@ lseek \- reposition read/write file offset
 .br
 .B #include <unistd.h>
 .PP
-.BI "off_t lseek(int " fd ", off_t " offset ", int " whence );
+.BI "off_t lseek(unsigned int " fd ", off_t " offset ", unsigned int " whence );
 .SH DESCRIPTION
 .BR lseek ()
 repositions the file offset of the open file description
-- 
2.29.2



* Re: [PATCH] lseek.2: SYNOPSIS: Use correct types
  2020-11-21 17:30 [PATCH] lseek.2: SYNOPSIS: Use correct types Alejandro Colomar
@ 2020-11-21 17:45 ` Alejandro Colomar (man-pages)
  2020-11-22 22:37   ` Michael Kerrisk (man-pages)
  2020-11-22 12:43 ` Florian Weimer
  2020-11-22 22:32 ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-11-21 17:45 UTC
  To: mtk.manpages; +Cc: linux-man, linux-kernel

Hi Michael,

I'm a bit lost in all the *lseek* pages.
You had a good read some months ago, so you may know it better.
I don't know which of those functions come from the kernel,
and which come from glibc (if any).
In the kernel I only found the lseek, llseek, and 32_llseek
(as you can see in the patch).
So if any other prototype needs to be updated, please do so.
Especially, have a look at lseek64(3),
which I suspect needs the same changes I propose in that patch.

Thanks,

Alex

On 11/21/20 6:30 PM, Alejandro Colomar wrote:
> The Linux kernel uses 'unsigned int' instead of 'int'
> for 'fd' and 'whence'.
> As glibc provides no wrapper, use the same types the kernel uses.
> 
> src/linux$ grep -rn "SYSCALL_DEFINE.*lseek"
> fs/read_write.c:322:SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
> fs/read_write.c:328:COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence)
> fs/read_write.c:336:SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
> arch/mips/kernel/linux32.c:65:SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,
> 
> src/linux$ sed -n 322,325p fs/read_write.c
> SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
> {
> 	return ksys_lseek(fd, offset, whence);
> }
> 
> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
> ---
>  man2/lseek.2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man2/lseek.2 b/man2/lseek.2
> index e35e410a6..2ff878ffa 100644
> --- a/man2/lseek.2
> +++ b/man2/lseek.2
> @@ -51,7 +51,7 @@ lseek \- reposition read/write file offset
>  .br
>  .B #include <unistd.h>
>  .PP
> -.BI "off_t lseek(int " fd ", off_t " offset ", int " whence );
> +.BI "off_t lseek(unsigned int " fd ", off_t " offset ", unsigned int " whence );
>  .SH DESCRIPTION
>  .BR lseek ()
>  repositions the file offset of the open file description
> 


* Re: [PATCH] lseek.2: SYNOPSIS: Use correct types
  2020-11-21 17:30 [PATCH] lseek.2: SYNOPSIS: Use correct types Alejandro Colomar
  2020-11-21 17:45 ` Alejandro Colomar (man-pages)
@ 2020-11-22 12:43 ` Florian Weimer
  2020-11-22 13:14   ` Alejandro Colomar (man-pages)
  2020-11-22 22:32 ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 6+ messages in thread
From: Florian Weimer @ 2020-11-22 12:43 UTC
  To: Alejandro Colomar; +Cc: mtk.manpages, linux-man, linux-kernel

* Alejandro Colomar:

> The Linux kernel uses 'unsigned int' instead of 'int' for 'fd' and
> 'whence'.  As glibc provides no wrapper, use the same types the
> kernel uses.

lseek is a POSIX interface, and glibc provides it.  POSIX uses int for
file descriptors (and the whence parameter in case of lseek).

The llseek system call is a different matter, that's indeed
Linux-specific.
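
A minimal sketch of the point above, assuming a C11 compiler and a POSIX
system: the static_assert only builds if <unistd.h> declares lseek() with
'int' for both 'fd' and 'whence'. The helper name file_size() is
hypothetical and used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Build-time check: the declaration of lseek() in <unistd.h> must have
   the POSIX signature, with int (not unsigned int) for fd and whence. */
static_assert(_Generic(&lseek, off_t (*)(int, off_t, int): 1, default: 0),
              "lseek() should have the POSIX signature");

/* Hypothetical helper: return the size of an open, seekable file by
   seeking to its end, then restore the previous offset.
   Returns (off_t)-1 if any lseek() call fails. */
off_t file_size(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return (off_t)-1;

    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return (off_t)-1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return (off_t)-1;

    return end;
}
```

The call goes through the C library wrapper, so the types that matter to
the caller are the ones in the header, not the ones in SYSCALL_DEFINE3().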


* Re: [PATCH] lseek.2: SYNOPSIS: Use correct types
  2020-11-22 12:43 ` Florian Weimer
@ 2020-11-22 13:14   ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2020-11-22 13:14 UTC
  To: Florian Weimer; +Cc: mtk.manpages, linux-man, linux-kernel

Hi Florian,

On 11/22/20 1:43 PM, Florian Weimer wrote:
> * Alejandro Colomar:
> 
>> The Linux kernel uses 'unsigned int' instead of 'int' for 'fd' and
>> 'whence'.  As glibc provides no wrapper, use the same types the
>> kernel uses.
> 
> lseek is a POSIX interface, and glibc provides it.  POSIX uses int for
> file descriptors (and the whence parameter in case of lseek).
> 
> The llseek system call is a different matter, that's indeed
> Linux-specific.
> 

Ahhh, true.  So many similar functions... :p

Thanks,

Alex


* Re: [PATCH] lseek.2: SYNOPSIS: Use correct types
  2020-11-21 17:30 [PATCH] lseek.2: SYNOPSIS: Use correct types Alejandro Colomar
  2020-11-21 17:45 ` Alejandro Colomar (man-pages)
  2020-11-22 12:43 ` Florian Weimer
@ 2020-11-22 22:32 ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-11-22 22:32 UTC
  To: Alejandro Colomar; +Cc: linux-man, lkml, libc-alpha, Florian Weimer

[Adding libc-alpha@ here, so someone might correct me if I make a misstep]

Hello Alex,

On Sat, 21 Nov 2020 at 18:34, Alejandro Colomar <alx.manpages@gmail.com> wrote:
>
> The Linux kernel uses 'unsigned int' instead of 'int'
> for 'fd' and 'whence'.
> As glibc provides no wrapper, use the same types the kernel uses.

I see Florian already replied, but just to add a detail or two...

In general, the manual pages explicitly note the APIs that have no
glibc wrapper. (If not, that's a bug in the page, but I don't expect
there are many such bugs.)

Looking in <unistd.h>, we have:

[[
#ifndef __USE_FILE_OFFSET64
extern __off_t lseek (int __fd, __off_t __offset, int __whence) __THROW;
#else
# ifdef __REDIRECT_NTH
extern __off64_t __REDIRECT_NTH (lseek,
                                 (int __fd, __off64_t __offset, int __whence),
                                 lseek64);
# else
#  define lseek lseek64
# endif
#endif
#ifdef __USE_LARGEFILE64
extern __off64_t lseek64 (int __fd, __off64_t __offset, int __whence)
     __THROW;
#endif
]]

It looks to me like there's a prototype hiding in there. (And yes, I
don't find it so funny to decode the macro logic either.)

Thanks,

Michael

PS By the way, be aware that the code of many wrapper functions is
autogenerated from "syscalls.list" files in the glibc source, for
example, sysdeps/unix/sysv/linux/syscalls.list. This isn't the case
for lseek(), though, as far as I can see; I think the wrapper function
is defined in sysdeps/unix/sysv/linux/lseek.c.



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH] lseek.2: SYNOPSIS: Use correct types
  2020-11-21 17:45 ` Alejandro Colomar (man-pages)
@ 2020-11-22 22:37   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-11-22 22:37 UTC
  To: Alejandro Colomar (man-pages); +Cc: linux-man, lkml, libc-alpha, Florian Weimer

Hi Alex,

On Sat, 21 Nov 2020 at 18:45, Alejandro Colomar (man-pages)
<alx.manpages@gmail.com> wrote:
>
> Hi Michael,
>
> I'm a bit lost in all the *lseek* pages.
>
> You had a good read some months ago, so you may know it better.
> I don't know which of those functions come from the kernel,
> and which come from glibc (if any).

It always takes me too long to remind myself of the details here :-(.

This time, I'll try to write what I (re)learned.

Inside the kernel (5.9 sources), in fs/read_write.c, we have:

[[
SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
{
        return ksys_lseek(fd, offset, whence);
}

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset,
                       unsigned int, whence)
{
        return ksys_lseek(fd, offset, whence);
}
#endif

#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
        defined(__ARCH_WANT_SYS_LLSEEK)
SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
                unsigned long, offset_low, loff_t __user *, result,
                unsigned int, whence)
{
...
}
#endif
]]

The main pieces of interest here are the first and last
SYSCALL_DEFINEn. The first is the "standard" lseek() system call that
exists on 64-bit and 32-bit architectures.

The problem on 32-bit architectures is that the off_t type is a 32-bit
type, but files can be bigger than 2GB (a signed 32-bit off_t can hold
offsets only up to 2**31-1). That's why 32-bit kernels also provide
the llseek() system call. It receives the new offset in two 32-bit
pieces (offset_high, offset_low), and returns the new offset via a
64-bit loff_t argument (result). (I forget the reason why there are
32-bit and 64-bit "offset" args in the syscall.)
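
A minimal sketch of the two-piece offset described above. The helper
names are hypothetical (they come from neither the kernel nor glibc),
and because the body of the system call is elided in the excerpt above,
the way the halves are joined here is an assumption about the intent,
not a copy of the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit offset: what would be passed as offset_high. */
uint32_t offset_high(int64_t offset)
{
    return (uint32_t)((uint64_t)offset >> 32);
}

/* Lower 32 bits of a 64-bit offset: what would be passed as offset_low. */
uint32_t offset_low(int64_t offset)
{
    return (uint32_t)((uint64_t)offset & 0xffffffffu);
}

/* Assumed recombination of the two halves into one 64-bit offset. */
int64_t offset_join(uint32_t high, uint32_t low)
{
    return (int64_t)(((uint64_t)high << 32) | low);
}

/* Sanity check: splitting and rejoining must give back the same offset. */
void check_round_trip(int64_t offset)
{
    assert(offset_join(offset_high(offset), offset_low(offset)) == offset);
}
```

For example, an offset of 5 GiB (0x140000000) splits into a high piece
of 1 and a low piece of 0x40000000.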

One more thing... In arch/x86/entry/syscalls/syscall_32.tbl,
we see the following line:

[[
140     i386    _llseek                 sys_llseek
]]

This is essentially telling us that 'sys_llseek' (the name generated
by SYSCALL_DEFINE5(llseek...)) is exposed to user-space as system call
number 140, and that system call number will (IIUC) be exposed in
autogenerated headers with the name "__NR__llseek" (i.e., "_llseek").
The "i386" is telling us that this happens in i386 (32-bit Intel).
There is nothing equivalent on x86-64, because 64-bit systems don't
need an _llseek system call.
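
A small sketch of how that difference shows up in user space, assuming
glibc's <sys/syscall.h>, which defines a SYS__llseek macro only when the
kernel headers provide __NR__llseek. The function name
llseek_syscall_number() is hypothetical.

```c
#include <assert.h>
#include <sys/syscall.h>

/* Return the _llseek system call number on this architecture, or -1
   when the headers do not define one (expected on x86-64). */
long llseek_syscall_number(void)
{
#ifdef SYS__llseek
    return SYS__llseek;
#else
    return -1;
#endif
}

/* On i386 the number should match the syscall_32.tbl entry quoted above. */
#if defined(__i386__) && defined(SYS__llseek)
static_assert(SYS__llseek == 140, "i386 entry in syscall_32.tbl");
#endif
```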

Now, in ancient times (let's say Linux 2.2), there was a more
transparent situation (but the effect was the same):

#define __NR__llseek            140

and that system call number was tied to the implementation by this
definition in linux-2.2.26/arch/i386/kernel/entry.S:

.long SYMBOL_NAME(sys_llseek)           /* 140 */

==

lseek64() is a C library function.  It takes and returns a 64-bit
offset. It exists to support seeking in large (>2GB) files. Its
implementation is in the glibc source file
sysdeps/unix/sysv/linux/lseek64.c, where it calls _llseek(2).

Returning to the <unistd.h> header file, we have:

[[
#ifndef __USE_FILE_OFFSET64
extern __off_t lseek (int __fd, __off_t __offset, int __whence) __THROW;
#else
# ifdef __REDIRECT_NTH
extern __off64_t __REDIRECT_NTH (lseek,
                                 (int __fd, __off64_t __offset, int __whence),
                                 lseek64);
# else
#  define lseek lseek64
# endif
#endif
#ifdef __USE_LARGEFILE64
extern __off64_t lseek64 (int __fd, __off64_t __offset, int __whence)
     __THROW;
#endif
]]

The name "lseek64" is exposed if _LARGEFILE64_SOURCE (which triggers
__USE_LARGEFILE64) is defined. That name was part of the so-called
Transitional Large File Systems (LFS) API (see page 105 in my book),
which existed to support the use of 64-bit file offsets on 32-bit
systems. It provided a set of interfaces with names of the form
"xxxxx64()" (e.g., "lseek64"), which provided for 64-bit offsets;
those names coexisted with the traditional 32-bit APIs (e.g.,
"lseek").

Alternatively, the LFS specified a macro, _FILE_OFFSET_BITS=64 (which
triggers __USE_FILE_OFFSET64) as another way of exposing 64-bit-offset
functionality on 32-bit systems. In this case, the traditional API
names (e.g., "lseek") are redirected to the 64-bit implementations
(e.g., "lseek64").
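
A minimal sketch of the _FILE_OFFSET_BITS approach, assuming glibc and a
C11 compiler. The helper name seek_past_2gb() is hypothetical. The macro
has to be defined before the first #include so that off_t is 64 bits
wide and plain lseek() accepts offsets beyond 2GB, even on a 32-bit
system.

```c
#define _FILE_OFFSET_BITS 64      /* must come before any #include */
#define _POSIX_C_SOURCE 200809L   /* for fileno() */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* With the macro above, off_t is expected to be a 64-bit type. */
static_assert(sizeof(off_t) == 8, "off_t should be 64 bits wide here");

/* Seek a scratch file to 3 GiB, an offset that a 32-bit off_t cannot
   represent. Returns the resulting offset, or (off_t)-1 on failure. */
off_t seek_past_2gb(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return (off_t)-1;

    off_t target = (off_t)3 << 30;   /* 3 GiB */
    off_t pos = lseek(fileno(fp), target, SEEK_SET);

    fclose(fp);
    return pos;
}
```

Seeking past the end of the file does not write any data, so the scratch
file stays empty. On a 32-bit glibc system, dropping the
_FILE_OFFSET_BITS line should make the static_assert fail, since off_t
is then 32 bits wide.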

> In the kernel I only found the lseek, llseek, and 32_llseek

I'd ignore 32_llseek -- I guess that's an arch-specific equivalent of
_llseek/llseek.

> (as you can see in the patch).
> So if any other prototype needs to be updated, please do so.
> Especially, have a look at lseek64(3),
> which I suspect needs the same changes I propose in that patch.

I think that no changes to the types are needed in lseek64(3). But
maybe some of the info in this mail should be captured in that manual
page.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

