* [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
From: Alejandro Colomar @ 2022-08-26 21:07 UTC
To: linux-man; +Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide
The WG14 charter for C23 added one principle to the ones in
previous standards:
[
15. Application Programming Interfaces (APIs) should be
self-documenting when possible. In particular, the order of
parameters in function declarations should be arranged such that
the size of an array appears before the array. The purpose is to
allow Variable-Length Array (VLA) notation to be used. This not
only makes the code's purpose clearer to human readers, but also
makes static analysis easier. Any new APIs added to the Standard
should take this into consideration.
]
ISO C doesn't allow using VLA syntax when the parameter used for
the size of the array is declared _after_ the parameter that is a
VLA. That's a minor issue that could easily be changed in the
language without backwards-compatibility issues, and in fact it
seems to have been proposed, and not yet discarded, even if it's
not going to change in C23.
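For comparison, here is a minimal sketch of the ordering that ISO C
already accepts (since C99), with the size parameter declared before
the array that uses it. The name fill_zero() is hypothetical and
only serves as an illustration; it is not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Valid ISO C: 'n' is declared before it is used as the array bound. */
void fill_zero(size_t n, char buf[n])
{
    memset(buf, 0, n);
}

/*
 * Not valid ISO C: 'n' would be used before its declaration.
 * This is the ordering that the SYNOPSIS sections would show.
 *
 *     void fill_zero_rev(char buf[n], size_t n);
 */
```

In both cases the parameter is adjusted to a plain pointer, so the
bound documents the contract rather than changing the calling
convention.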
The SYNOPSIS sections of the manual pages are not bound by
strictly legal C syntax: we already use some "tricks" to convey as
much information as possible to the reader, even when the result
is not valid C. We can make a similar small compromise here,
using syntax that is not legal (at least not yet) to add important
information to the function prototypes.
If we're lucky, compiler authors, and maybe even WG14 members, may
be satisfied by the syntax used in these manual pages, and may
decide to add this feature to the language.
It seems to me a sound syntax that isn't ambiguous, even if it
deviates from the common pattern in C that declarations _always_
come before use. It's a reasonable tradeoff.
This change will make the contract between the programmer and the
implementation clearer just by reading a prototype. For example:
size_t strlcpy(char *restrict dst, const char *restrict src,
size_t size);
vs
size_t strlcpy(char dst[restrict size], const char *restrict src,
size_t size);
The second prototype above makes it clear that the 'dst' buffer
will be safe from overflow, but the 'src' one clearly needs to be
NUL-terminated, or it will cause UB, since nothing tells the
function how long it is.
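The contract described above can be sketched with a hypothetical
strlcpy-like helper, here called bounded_copy(). It is not a libc
function, and it takes the size first only so that the declaration
compiles as ISO C today. Writes to 'dst' are limited by 'size',
while 'src' is read until its terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Copies at most size-1 bytes from src into dst and NUL-terminates dst.
 * Returns strlen(src), so the caller can detect truncation.
 */
size_t bounded_copy(size_t size, char dst[restrict size],
                    const char *restrict src)
{
    size_t srclen = strlen(src);  /* src must be NUL-terminated */

    if (size > 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

A return value greater than or equal to 'size' means that the copy
was truncated.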
Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2611.htm>
Cc: Ingo Schwarze <schwarze@openbsd.org>
Cc: JeanHeyd Meneide <wg14@soasis.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
---
man3/confstr.3 | 2 +-
man3/des_crypt.3 | 6 ++++--
man3/fgetc.3 | 2 +-
man3/getcwd.3 | 2 +-
man3/getdirentries.3 | 3 ++-
man3/getgrent_r.3 | 4 ++--
man3/getgrnam.3 | 4 ++--
man3/gethostbyname.3 | 8 ++++----
man3/getlogin.3 | 2 +-
man3/getmntent.3 | 2 +-
man3/getnameinfo.3 | 6 +++---
man3/getnetent_r.3 | 6 +++---
man3/getprotoent_r.3 | 6 +++---
man3/getpwent_r.3 | 4 ++--
man3/getpwnam.3 | 8 ++++----
man3/getrpcent_r.3 | 6 +++---
man3/getservent_r.3 | 6 +++---
man3/getspnam.3 | 8 ++++----
man3/inet_net_pton.3 | 2 +-
man3/inet_ntop.3 | 2 +-
man3/mblen.3 | 2 +-
man3/mbrlen.3 | 2 +-
man3/mbrtowc.3 | 5 ++---
man3/mbstowcs.3 | 3 ++-
man3/mbtowc.3 | 4 ++--
man3/mq_receive.3 | 5 +++--
man3/mq_send.3 | 4 ++--
man3/printf.3 | 4 ++--
man3/pthread_setname_np.3 | 3 ++-
man3/ptsname.3 | 4 ++--
man3/random.3 | 2 +-
man3/random_r.3 | 3 ++-
man3/regex.3 | 7 ++++---
man3/resolver.3 | 34 ++++++++++++++++++----------------
man3/rpc.3 | 2 +-
man3/setaliasent.3 | 6 ++++--
man3/setbuf.3 | 4 ++--
man3/setnetgrent.3 | 2 +-
man3/stpncpy.3 | 5 +++--
man3/strcasecmp.3 | 2 +-
man3/strcat.3 | 5 +++--
man3/strcmp.3 | 2 +-
man3/strcpy.3 | 5 +++--
man3/strdup.3 | 4 ++--
man3/strerror.3 | 4 ++--
man3/strfmon.3 | 4 ++--
man3/strfromd.3 | 6 +++---
man3/strftime.3 | 4 ++--
man3/string.3 | 25 +++++++++++++++++--------
man3/strnlen.3 | 2 +-
man3/strxfrm.3 | 3 ++-
man3/ttyname.3 | 2 +-
man3/unlocked_stdio.3 | 4 ++--
man3/wcsnrtombs.3 | 2 +-
man3/wcsrtombs.3 | 2 +-
man3/wcstombs.3 | 2 +-
56 files changed, 146 insertions(+), 122 deletions(-)
diff --git a/man3/confstr.3 b/man3/confstr.3
index 5bc334c02..434ab9678 100644
--- a/man3/confstr.3
+++ b/man3/confstr.3
@@ -20,7 +20,7 @@ Standard C library
.nf
.B #include <unistd.h>
.PP
-.BI "size_t confstr(int " "name" ", char *" buf ", size_t " size );
+.BI "size_t confstr(int " "name" ", char " buf [ size "], size_t " size );
.fi
.PP
.RS -4
diff --git a/man3/des_crypt.3 b/man3/des_crypt.3
index 90ce308b9..f419ab026 100644
--- a/man3/des_crypt.3
+++ b/man3/des_crypt.3
@@ -22,9 +22,11 @@ Standard C library
.\" .B #include <des_crypt.h>
.B #include <rpc/des_crypt.h>
.PP
-.BI "int ecb_crypt(char *" key ", char *" data ", unsigned int " datalen ,
+.BI "int ecb_crypt(char *" key ", char " data [ datalen "], \
+unsigned int " datalen ,
.BI " unsigned int " mode );
-.BI "int cbc_crypt(char *" key ", char *" data ", unsigned int " datalen ,
+.BI "int cbc_crypt(char *" key ", char " data [ datalen "], \
+unsigned int " datalen ,
.BI " unsigned int " mode ", char *" ivec );
.PP
.BI "void des_setparity(char *" key );
diff --git a/man3/fgetc.3 b/man3/fgetc.3
index 2cd14a5fb..690cbce80 100644
--- a/man3/fgetc.3
+++ b/man3/fgetc.3
@@ -18,7 +18,7 @@ Standard C library
.BI "int getc(FILE *" stream );
.B "int getchar(void);"
.PP
-.BI "char *fgets(char *restrict " s ", int " size ", FILE *restrict " stream );
+.BI "char *fgets(char " s "[restrict " size "], int " size ", FILE *restrict " stream );
.PP
.BI "int ungetc(int " c ", FILE *" stream );
.fi
diff --git a/man3/getcwd.3 b/man3/getcwd.3
index 382bade77..82f573115 100644
--- a/man3/getcwd.3
+++ b/man3/getcwd.3
@@ -19,7 +19,7 @@ Standard C library
.nf
.B #include <unistd.h>
.PP
-.BI "char *getcwd(char *" buf ", size_t " size );
+.BI "char *getcwd(char " buf [ size "], size_t " size );
.BI "char *getwd(char *" buf );
.B "char *get_current_dir_name(void);"
.fi
diff --git a/man3/getdirentries.3 b/man3/getdirentries.3
index ce8ee69a8..eadc3c86e 100644
--- a/man3/getdirentries.3
+++ b/man3/getdirentries.3
@@ -14,7 +14,8 @@ Standard C library
.nf
.B #include <dirent.h>
.PP
-.BI "ssize_t getdirentries(int " fd ", char *restrict " buf ", size_t " nbytes ,
+.BI "ssize_t getdirentries(int " fd ", char " buf "[restrict " nbytes "], \
+size_t " nbytes ,
.BI " off_t *restrict " basep );
.fi
.PP
diff --git a/man3/getgrent_r.3 b/man3/getgrent_r.3
index 8a47bf59e..e1eeb31c7 100644
--- a/man3/getgrent_r.3
+++ b/man3/getgrent_r.3
@@ -13,10 +13,10 @@ Standard C library
.B #include <grp.h>
.PP
.BI "int getgrent_r(struct group *restrict " gbuf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct group **restrict " gbufp );
.BI "int fgetgrent_r(FILE *restrict " stream ", struct group *restrict " gbuf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct group **restrict " gbufp );
.fi
.PP
diff --git a/man3/getgrnam.3 b/man3/getgrnam.3
index 7ef37819f..ab46a7570 100644
--- a/man3/getgrnam.3
+++ b/man3/getgrnam.3
@@ -26,10 +26,10 @@ Standard C library
.PP
.BI "int getgrnam_r(const char *restrict " name \
", struct group *restrict " grp ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct group **restrict " result );
.BI "int getgrgid_r(gid_t " gid ", struct group *restrict " grp ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct group **restrict " result );
.fi
.PP
diff --git a/man3/gethostbyname.3 b/man3/gethostbyname.3
index 20ad562be..1c7182245 100644
--- a/man3/gethostbyname.3
+++ b/man3/gethostbyname.3
@@ -49,24 +49,24 @@ Standard C library
.BI "struct hostent *gethostbyname2(const char *" name ", int " af );
.PP
.BI "int gethostent_r(struct hostent *restrict " ret ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct hostent **restrict " result ,
.BI " int *restrict " h_errnop );
.PP
.BI "int gethostbyaddr_r(const void *restrict " addr ", socklen_t " len \
", int " type ,
.BI " struct hostent *restrict " ret ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct hostent **restrict " result ,
.BI " int *restrict " h_errnop );
.BI "int gethostbyname_r(const char *restrict " name ,
.BI " struct hostent *restrict " ret ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct hostent **restrict " result ,
.BI " int *restrict " h_errnop );
.BI "int gethostbyname2_r(const char *restrict " name ", int " af,
.BI " struct hostent *restrict " ret ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct hostent **restrict " result ,
.BI " int *restrict " h_errnop );
.fi
diff --git a/man3/getlogin.3 b/man3/getlogin.3
index 50b8b008b..8604f2bcd 100644
--- a/man3/getlogin.3
+++ b/man3/getlogin.3
@@ -16,7 +16,7 @@ Standard C library
.B #include <unistd.h>
.PP
.B "char *getlogin(void);"
-.BI "int getlogin_r(char *" buf ", size_t " bufsize );
+.BI "int getlogin_r(char " buf [ bufsize "], size_t " bufsize );
.PP
.B #include <stdio.h>
.PP
diff --git a/man3/getmntent.3 b/man3/getmntent.3
index 3c704b1d8..41746d9eb 100644
--- a/man3/getmntent.3
+++ b/man3/getmntent.3
@@ -37,7 +37,7 @@ Standard C library
.PP
.BI "struct mntent *getmntent_r(FILE *restrict " streamp ,
.BI " struct mntent *restrict " mntbuf ,
-.BI " char *restrict " buf ", int " buflen );
+.BI " char " buf "[restrict " buflen "], int " buflen );
.fi
.PP
.RS -4
diff --git a/man3/getnameinfo.3 b/man3/getnameinfo.3
index 5c42c09b6..b72c37117 100644
--- a/man3/getnameinfo.3
+++ b/man3/getnameinfo.3
@@ -20,9 +20,9 @@ Standard C library
.PP
.BI "int getnameinfo(const struct sockaddr *restrict " addr \
", socklen_t " addrlen ,
-.BI " char *restrict " host ", socklen_t " hostlen ,
-.BI " char *restrict " serv ", socklen_t " servlen \
-", int " flags );
+.BI " char " host "[restrict " hostlen "], socklen_t " hostlen ,
+.BI " char " serv "[restrict " servlen "], socklen_t " servlen ,
+.BI " int " flags );
.fi
.PP
.RS -4
diff --git a/man3/getnetent_r.3 b/man3/getnetent_r.3
index 36a2ff819..55322df27 100644
--- a/man3/getnetent_r.3
+++ b/man3/getnetent_r.3
@@ -15,17 +15,17 @@ Standard C library
.B #include <netdb.h>
.PP
.BI "int getnetent_r(struct netent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct netent **restrict " result ,
.BI " int *restrict " h_errnop );
.BI "int getnetbyname_r(const char *restrict " name ,
.BI " struct netent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct netent **restrict " result ,
.BI " int *restrict " h_errnop );
.BI "int getnetbyaddr_r(uint32_t " net ", int " type ,
.BI " struct netent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct netent **restrict " result ,
.BI " int *restrict " h_errnop );
.PP
diff --git a/man3/getprotoent_r.3 b/man3/getprotoent_r.3
index 2e3815a30..34ae75634 100644
--- a/man3/getprotoent_r.3
+++ b/man3/getprotoent_r.3
@@ -15,15 +15,15 @@ Standard C library
.B #include <netdb.h>
.PP
.BI "int getprotoent_r(struct protoent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct protoent **restrict " result );
.BI "int getprotobyname_r(const char *restrict " name ,
.BI " struct protoent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct protoent **restrict " result );
.BI "int getprotobynumber_r(int " proto ,
.BI " struct protoent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct protoent **restrict " result );
.PP
.fi
diff --git a/man3/getpwent_r.3 b/man3/getpwent_r.3
index bde13f399..03826578c 100644
--- a/man3/getpwent_r.3
+++ b/man3/getpwent_r.3
@@ -13,11 +13,11 @@ Standard C library
.B #include <pwd.h>
.PP
.BI "int getpwent_r(struct passwd *restrict " pwbuf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct passwd **restrict " pwbufp );
.BI "int fgetpwent_r(FILE *restrict " stream \
", struct passwd *restrict " pwbuf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct passwd **restrict " pwbufp );
.fi
.PP
diff --git a/man3/getpwnam.3 b/man3/getpwnam.3
index 219d37733..d711a4c4a 100644
--- a/man3/getpwnam.3
+++ b/man3/getpwnam.3
@@ -28,12 +28,12 @@ Standard C library
.BI "struct passwd *getpwnam(const char *" name );
.BI "struct passwd *getpwuid(uid_t " uid );
.PP
-.BI "int getpwnam_r(const char *restrict " name \
-", struct passwd *restrict " pwd ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI "int getpwnam_r(const char *restrict " name ", \
+struct passwd *restrict " pwd ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct passwd **restrict " result );
.BI "int getpwuid_r(uid_t " uid ", struct passwd *restrict " pwd ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct passwd **restrict " result );
.fi
.PP
diff --git a/man3/getrpcent_r.3 b/man3/getrpcent_r.3
index 44d20b7ed..74182fd83 100644
--- a/man3/getrpcent_r.3
+++ b/man3/getrpcent_r.3
@@ -14,13 +14,13 @@ Standard C library
.nf
.B #include <netdb.h>
.PP
-.BI "int getrpcent_r(struct rpcent *" result_buf ", char *" buf ,
+.BI "int getrpcent_r(struct rpcent *" result_buf ", char " buf [ buflen "],
.BI " size_t " buflen ", struct rpcent **" result );
.BI "int getrpcbyname_r(const char *" name ,
-.BI " struct rpcent *" result_buf ", char *" buf ,
+.BI " struct rpcent *" result_buf ", char " buf [ buflen "],
.BI " size_t " buflen ", struct rpcent **" result );
.BI "int getrpcbynumber_r(int " number ,
-.BI " struct rpcent *" result_buf ", char *" buf ,
+.BI " struct rpcent *" result_buf ", char " buf [ buflen "],
.BI " size_t " buflen ", struct rpcent **" result );
.PP
.fi
diff --git a/man3/getservent_r.3 b/man3/getservent_r.3
index 4e7b1f03d..6d9c578b4 100644
--- a/man3/getservent_r.3
+++ b/man3/getservent_r.3
@@ -15,17 +15,17 @@ Standard C library
.B #include <netdb.h>
.PP
.BI "int getservent_r(struct servent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct servent **restrict " result );
.BI "int getservbyname_r(const char *restrict " name ,
.BI " const char *restrict " proto ,
.BI " struct servent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct servent **restrict " result );
.BI "int getservbyport_r(int " port ,
.BI " const char *restrict " proto ,
.BI " struct servent *restrict " result_buf ,
-.BI " char *restrict " buf ", size_t " buflen ,
+.BI " char " buf "[restrict " buflen "], size_t " buflen ,
.BI " struct servent **restrict " result );
.PP
.fi
diff --git a/man3/getspnam.3 b/man3/getspnam.3
index 3389105ab..db5f8f5f8 100644
--- a/man3/getspnam.3
+++ b/man3/getspnam.3
@@ -34,14 +34,14 @@ Standard C library
.B #include <shadow.h>
.PP
.BI "int getspent_r(struct spwd *" spbuf ,
-.BI " char *" buf ", size_t " buflen ", struct spwd **" spbufp );
+.BI " char " buf [ buflen "], size_t " buflen ", struct spwd **" spbufp );
.BI "int getspnam_r(const char *" name ", struct spwd *" spbuf ,
-.BI " char *" buf ", size_t " buflen ", struct spwd **" spbufp );
+.BI " char " buf [ buflen "], size_t " buflen ", struct spwd **" spbufp );
.PP
.BI "int fgetspent_r(FILE *" stream ", struct spwd *" spbuf ,
-.BI " char *" buf ", size_t " buflen ", struct spwd **" spbufp );
+.BI " char " buf [ buflen "], size_t " buflen ", struct spwd **" spbufp );
.BI "int sgetspent_r(const char *" s ", struct spwd *" spbuf ,
-.BI " char *" buf ", size_t " buflen ", struct spwd **" spbufp );
+.BI " char " buf [ buflen "], size_t " buflen ", struct spwd **" spbufp );
.fi
.PP
.RS -4
diff --git a/man3/inet_net_pton.3 b/man3/inet_net_pton.3
index 8dce6b299..c7d477695 100644
--- a/man3/inet_net_pton.3
+++ b/man3/inet_net_pton.3
@@ -15,7 +15,7 @@ Resolver library
.BI "int inet_net_pton(int " af ", const char *" pres ,
.BI " void *" netp ", size_t " nsize );
.BI "char *inet_net_ntop(int " af ", const void *" netp ", int " bits ,
-.BI " char *" pres ", size_t " psize );
+.BI " char " pres [ psize "], size_t " psize );
.fi
.PP
.RS -4
diff --git a/man3/inet_ntop.3 b/man3/inet_ntop.3
index b06c268bd..6f73b33fa 100644
--- a/man3/inet_ntop.3
+++ b/man3/inet_ntop.3
@@ -14,7 +14,7 @@ Standard C library
.B #include <arpa/inet.h>
.PP
.BI "const char *inet_ntop(int " af ", const void *restrict " src ,
-.BI " char *restrict " dst ", socklen_t " size );
+.BI " char " dst "[restrict " size "], socklen_t " size );
.fi
.SH DESCRIPTION
This function converts the network address structure
diff --git a/man3/mblen.3 b/man3/mblen.3
index ae7b38f1b..de826f2b8 100644
--- a/man3/mblen.3
+++ b/man3/mblen.3
@@ -18,7 +18,7 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "int mblen(const char *" s ", size_t " n );
+.BI "int mblen(const char " s [ n "], size_t " n );
.fi
.SH DESCRIPTION
If
diff --git a/man3/mbrlen.3 b/man3/mbrlen.3
index 35c2b8db5..4522d2cac 100644
--- a/man3/mbrlen.3
+++ b/man3/mbrlen.3
@@ -18,7 +18,7 @@ Standard C library
.nf
.B #include <wchar.h>
.PP
-.BI "size_t mbrlen(const char *restrict " s ", size_t " n ,
+.BI "size_t mbrlen(const char " s "[restrict " n "], size_t " n ,
.BI " mbstate_t *restrict " ps );
.fi
.SH DESCRIPTION
diff --git a/man3/mbrtowc.3 b/man3/mbrtowc.3
index b91c0fbc2..1de0f1ba7 100644
--- a/man3/mbrtowc.3
+++ b/man3/mbrtowc.3
@@ -19,9 +19,8 @@ Standard C library
.nf
.B #include <wchar.h>
.PP
-.BI "size_t mbrtowc(wchar_t *restrict " pwc ", const char *restrict " s \
-", size_t " n ,
-.BI " mbstate_t *restrict " ps );
+.BI "size_t mbrtowc(wchar_t *restrict " pwc ", const char " s "[restrict " n ],
+.BI " size_t " n ", mbstate_t *restrict " ps );
.fi
.SH DESCRIPTION
The main case for this function is when
diff --git a/man3/mbstowcs.3 b/man3/mbstowcs.3
index 30a2a8679..67b8e569e 100644
--- a/man3/mbstowcs.3
+++ b/man3/mbstowcs.3
@@ -19,7 +19,8 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "size_t mbstowcs(wchar_t *restrict " dest ", const char *restrict " src ,
+.BI "size_t mbstowcs(wchar_t " dest "[restrict " n "], \
+const char *restrict " src ,
.BI " size_t " n );
.fi
.SH DESCRIPTION
diff --git a/man3/mbtowc.3 b/man3/mbtowc.3
index b0a25ae12..18dca1957 100644
--- a/man3/mbtowc.3
+++ b/man3/mbtowc.3
@@ -18,8 +18,8 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "int mbtowc(wchar_t *restrict " pwc ", const char *restrict " s \
-", size_t " n );
+.BI "int mbtowc(wchar_t *restrict " pwc ", const char " s "[restrict " n "], \
+size_t " n );
.fi
.SH DESCRIPTION
The main case for this function is when
diff --git a/man3/mq_receive.3 b/man3/mq_receive.3
index 94f686d97..f43df785f 100644
--- a/man3/mq_receive.3
+++ b/man3/mq_receive.3
@@ -12,13 +12,14 @@ Real-time library
.nf
.B #include <mqueue.h>
.PP
-.BI "ssize_t mq_receive(mqd_t " mqdes ", char *" msg_ptr ,
+.BI "ssize_t mq_receive(mqd_t " mqdes ", char " msg_ptr [ msg_len ],
.BI " size_t " msg_len ", unsigned int *" msg_prio );
.PP
.B #include <time.h>
.B #include <mqueue.h>
.PP
-.BI "ssize_t mq_timedreceive(mqd_t " mqdes ", char *restrict " msg_ptr ,
+.BI "ssize_t mq_timedreceive(mqd_t " mqdes ", \
+char " msg_ptr "[restrict " msg_len ],
.BI " size_t " msg_len ", unsigned int *restrict " msg_prio ,
.BI " const struct timespec *restrict " abs_timeout );
.fi
diff --git a/man3/mq_send.3 b/man3/mq_send.3
index 26947595a..6f147d4fb 100644
--- a/man3/mq_send.3
+++ b/man3/mq_send.3
@@ -12,13 +12,13 @@ Real-time library
.nf
.B #include <mqueue.h>
.PP
-.BI "int mq_send(mqd_t " mqdes ", const char *" msg_ptr ,
+.BI "int mq_send(mqd_t " mqdes ", const char " msg_ptr [ msg_len ],
.BI " size_t " msg_len ", unsigned int " msg_prio );
.PP
.B #include <time.h>
.B #include <mqueue.h>
.PP
-.BI "int mq_timedsend(mqd_t " mqdes ", const char *" msg_ptr ,
+.BI "int mq_timedsend(mqd_t " mqdes ", const char " msg_ptr [ msg_len ],
.BI " size_t " msg_len ", unsigned int " msg_prio ,
.BI " const struct timespec *" abs_timeout );
.fi
diff --git a/man3/printf.3 b/man3/printf.3
index 878f95791..5099b6f72 100644
--- a/man3/printf.3
+++ b/man3/printf.3
@@ -30,7 +30,7 @@ Standard C library
.BI " const char *restrict " format ", ...);"
.BI "int sprintf(char *restrict " str ,
.BI " const char *restrict " format ", ...);"
-.BI "int snprintf(char *restrict " str ", size_t " size ,
+.BI "int snprintf(char " str "[restrict " size "], size_t " size ,
.BI " const char *restrict " format ", ...);"
.PP
.B #include <stdarg.h>
@@ -42,7 +42,7 @@ Standard C library
.BI " const char *restrict " format ", va_list " ap );
.BI "int vsprintf(char *restrict " str ,
.BI " const char *restrict " format ", va_list " ap );
-.BI "int vsnprintf(char *restrict " str ", size_t " size ,
+.BI "int vsnprintf(char " str "[restrict " size "], size_t " size ,
.BI " const char *restrict " format ", va_list " ap );
.fi
.PP
diff --git a/man3/pthread_setname_np.3 b/man3/pthread_setname_np.3
index 115557787..2bab13e85 100644
--- a/man3/pthread_setname_np.3
+++ b/man3/pthread_setname_np.3
@@ -15,7 +15,8 @@ POSIX threads library
.B #include <pthread.h>
.PP
.BI "int pthread_setname_np(pthread_t " thread ", const char *" name );
-.BI "int pthread_getname_np(pthread_t " thread ", char *" name ", size_t " size );
+.BI "int pthread_getname_np(pthread_t " thread ", char " name [ size "], \
+size_t " size );
.fi
.SH DESCRIPTION
By default, all the threads created using
diff --git a/man3/ptsname.3 b/man3/ptsname.3
index e40005df6..135730752 100644
--- a/man3/ptsname.3
+++ b/man3/ptsname.3
@@ -14,8 +14,8 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "char *ptsname(int " fd ");"
-.BI "int ptsname_r(int " fd ", char *" buf ", size_t " buflen ");"
+.BI "char *ptsname(int " fd );
+.BI "int ptsname_r(int " fd ", char " buf [ buflen "], size_t " buflen );
.fi
.PP
.RS -4
diff --git a/man3/random.3 b/man3/random.3
index fd2512626..3a7af437a 100644
--- a/man3/random.3
+++ b/man3/random.3
@@ -23,7 +23,7 @@ Standard C library
.B long random(void);
.BI "void srandom(unsigned int " seed );
.PP
-.BI "char *initstate(unsigned int " seed ", char *" state ", size_t " n );
+.BI "char *initstate(unsigned int " seed ", char " state [ n "], size_t " n );
.BI "char *setstate(char *" state );
.fi
.PP
diff --git a/man3/random_r.3 b/man3/random_r.3
index b2bf97b06..8564e1723 100644
--- a/man3/random_r.3
+++ b/man3/random_r.3
@@ -18,7 +18,8 @@ Standard C library
.BI " int32_t *restrict " result );
.BI "int srandom_r(unsigned int " seed ", struct random_data *" buf );
.PP
-.BI "int initstate_r(unsigned int " seed ", char *restrict " statebuf ,
+.BI "int initstate_r(unsigned int " seed ", \
+char " statebuf "[restrict " statelen ],
.BI " size_t " statelen ", struct random_data *restrict " buf );
.BI "int setstate_r(char *restrict " statebuf ,
.BI " struct random_data *restrict " buf );
diff --git a/man3/regex.3 b/man3/regex.3
index e423e442d..ae66b7980 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -21,11 +21,12 @@ Standard C library
.BI " int " cflags );
.BI "int regexec(const regex_t *restrict " preg \
", const char *restrict " string ,
-.BI " size_t " nmatch ", regmatch_t " pmatch "[restrict]\
-, int " eflags );
+.BI " size_t " nmatch ", regmatch_t " pmatch "[restrict " nmatch ],
+.BI " int " eflags );
.PP
.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg ,
-.BI " char *restrict " errbuf ", size_t " errbuf_size );
+.BI " char " errbuf "[restrict " errbuf_size "], \
+size_t " errbuf_size );
.BI "void regfree(regex_t *" preg );
.fi
.SH DESCRIPTION
diff --git a/man3/resolver.3 b/man3/resolver.3
index b565bb5e6..c701b4629 100644
--- a/man3/resolver.3
+++ b/man3/resolver.3
@@ -35,34 +35,35 @@ Resolver library
.PP
.BI "int res_nquery(res_state " statep ,
.BI " const char *" dname ", int " class ", int " type ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI " unsigned char " answer [ anslen "], int " anslen );
.PP
.BI "int res_nsearch(res_state " statep ,
.BI " const char *" dname ", int " class ", int " type ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI " unsigned char " answer [ anslen "], int " anslen );
.PP
.BI "int res_nquerydomain(res_state " statep ,
.BI " const char *" name ", const char *" domain ,
-.BI " int " class ", int " type ", unsigned char *" answer ,
+.BI " int " class ", int " type ", unsigned char " answer [ anslen ],
.BI " int " anslen );
.PP
.BI "int res_nmkquery(res_state " statep ,
.BI " int " op ", const char *" dname ", int " class ,
-.BI " int " type ", const unsigned char *" data ", int " datalen ,
+.BI " int " type ", const unsigned char " data [ datalen "], \
+int " datalen ,
.BI " const unsigned char *" newrr ,
-.BI " unsigned char *" buf ", int " buflen );
+.BI " unsigned char " buf [ buflen "], int " buflen );
.PP
.BI "int res_nsend(res_state " statep ,
-.BI " const unsigned char *" msg ", int " msglen ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI " const unsigned char " msg [ msglen "], int " msglen ,
+.BI " unsigned char " answer [ anslen "], int " anslen );
.PP
-.BI "int dn_comp(const char *" exp_dn ", unsigned char *" comp_dn ,
+.BI "int dn_comp(const char *" exp_dn ", unsigned char " comp_dn [ length ],
.BI " int " length ", unsigned char **" dnptrs ,
.BI " unsigned char **" lastdnptr );
.PP
.BI "int dn_expand(const unsigned char *" msg ,
.BI " const unsigned char *" eomorig ,
-.BI " const unsigned char *" comp_dn ", char *" exp_dn ,
+.BI " const unsigned char *" comp_dn ", char " exp_dn [ length ],
.BI " int " length );
.fi
.\"
@@ -73,22 +74,23 @@ Resolver library
.B int res_init(void);
.PP
.BI "int res_query(const char *" dname ", int " class ", int " type ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI " unsigned char " answer [ anslen "], int " anslen );
.PP
.BI "int res_search(const char *" dname ", int " class ", int " type ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI " unsigned char " answer [ anslen "], int " anslen );
.PP
.BI "int res_querydomain(const char *" name ", const char *" domain ,
-.BI " int " class ", int " type ", unsigned char *" answer ,
+.BI " int " class ", int " type ", unsigned char " answer [ anslen ],
.BI " int " anslen );
.PP
.BI "int res_mkquery(int " op ", const char *" dname ", int " class ,
-.BI " int " type ", const unsigned char *" data ", int " datalen ,
+.BI " int " type ", const unsigned char " data [ datalen "], \
+int " datalen ,
.BI " const unsigned char *" newrr ,
-.BI " unsigned char *" buf ", int " buflen );
+.BI " unsigned char " buf [ buflen "], int " buflen );
.PP
-.BI "int res_send(const unsigned char *" msg ", int " msglen ,
-.BI " unsigned char *" answer ", int " anslen );
+.BI "int res_send(const unsigned char " msg [ msglen "], int " msglen ,
+.BI " unsigned char " answer [ anslen "], int " anslen );
.fi
.SH DESCRIPTION
.B Note:
diff --git a/man3/rpc.3 b/man3/rpc.3
index b0cfc52e7..80fdd5dc4 100644
--- a/man3/rpc.3
+++ b/man3/rpc.3
@@ -74,7 +74,7 @@ This is the default authentication used by RPC.
.PP
.nf
.BI "AUTH *authunix_create(char *" host ", uid_t " uid ", gid_t " gid ,
-.BI " int " len ", gid_t *" aup_gids );
+.BI " int " len ", gid_t " aup_gids [ len ]);
.fi
.IP
Create and return an RPC authentication handle that contains
diff --git a/man3/setaliasent.3 b/man3/setaliasent.3
index 9d3cfb968..f4401608b 100644
--- a/man3/setaliasent.3
+++ b/man3/setaliasent.3
@@ -20,13 +20,15 @@ Standard C library
.PP
.B "struct aliasent *getaliasent(void);"
.BI "int getaliasent_r(struct aliasent *restrict " result ,
-.BI " char *restrict " buffer ", size_t " buflen ,
+.BI " char " buffer "[restrict " buflen "], \
+size_t " buflen ,
.BI " struct aliasent **restrict " res );
.PP
.BI "struct aliasent *getaliasbyname(const char *" name );
.BI "int getaliasbyname_r(const char *restrict " name ,
.BI " struct aliasent *restrict " result ,
-.BI " char *restrict " buffer ", size_t " buflen ,
+.BI " char " buffer "[restrict " buflen "], \
+size_t " buflen ,
.BI " struct aliasent **restrict " res );
.fi
.SH DESCRIPTION
diff --git a/man3/setbuf.3 b/man3/setbuf.3
index 4a62952d7..8c72b8e0a 100644
--- a/man3/setbuf.3
+++ b/man3/setbuf.3
@@ -27,11 +27,11 @@ Standard C library
.nf
.B #include <stdio.h>
.PP
-.BI "int setvbuf(FILE *restrict " stream ", char *restrict " buf ,
+.BI "int setvbuf(FILE *restrict " stream ", char " buf "[restrict " size ],
.BI " int " mode ", size_t " size );
.PP
.BI "void setbuf(FILE *restrict " stream ", char *restrict " buf );
-.BI "void setbuffer(FILE *restrict " stream ", char *restrict " buf ,
+.BI "void setbuffer(FILE *restrict " stream ", char " buf "[restrict " size ],
.BI " size_t " size );
.BI "void setlinebuf(FILE *" stream );
.fi
diff --git a/man3/setnetgrent.3 b/man3/setnetgrent.3
index 3625adf14..9cfda3c83 100644
--- a/man3/setnetgrent.3
+++ b/man3/setnetgrent.3
@@ -23,7 +23,7 @@ Standard C library
.BI " char **restrict " user ", char **restrict " domain );
.BI "int getnetgrent_r(char **restrict " host ,
.BI " char **restrict " user ", char **restrict " domain ,
-.BI " char *restrict " buf ", size_t " buflen );
+.BI " char " buf "[restrict " buflen "], size_t " buflen );
.PP
.BI "int innetgr(const char *" netgroup ", const char *" host ,
.BI " const char *" user ", const char *" domain );
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index c057845ac..5dd7cc96d 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -16,8 +16,9 @@ Standard C library
.nf
.B #include <string.h>
.PP
-.BI "char *stpncpy(char *restrict " dest ", const char *restrict " src \
-", size_t " n );
+.BI "char *stpncpy(char " dest "[restrict " n "], \
+const char " src "[restrict " n "],
+.BI " size_t " n );
.fi
.PP
.RS -4
diff --git a/man3/strcasecmp.3 b/man3/strcasecmp.3
index 58a22349e..e94c79966 100644
--- a/man3/strcasecmp.3
+++ b/man3/strcasecmp.3
@@ -18,7 +18,7 @@ Standard C library
.B #include <strings.h>
.PP
.BI "int strcasecmp(const char *" s1 ", const char *" s2 );
-.BI "int strncasecmp(const char *" s1 ", const char *" s2 ", size_t " n );
+.BI "int strncasecmp(const char " s1 [ n "], const char " s2 [ n "], size_t " n );
.fi
.SH DESCRIPTION
The
diff --git a/man3/strcat.3 b/man3/strcat.3
index 5738bb9be..ff4c91307 100644
--- a/man3/strcat.3
+++ b/man3/strcat.3
@@ -20,8 +20,9 @@ Standard C library
.B #include <string.h>
.PP
.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
-.BI "char *strncat(char *restrict " dest ", const char *restrict " src \
-", size_t " n );
+.BI "char *strncat(char " dest "[restrict " n "], \
+const char " src "[restrict " n ],
+.BI " size_t " n );
.fi
.SH DESCRIPTION
The
diff --git a/man3/strcmp.3 b/man3/strcmp.3
index 933011b9c..fc5bf1a70 100644
--- a/man3/strcmp.3
+++ b/man3/strcmp.3
@@ -21,7 +21,7 @@ Standard C library
.B #include <string.h>
.PP
.BI "int strcmp(const char *" s1 ", const char *" s2 );
-.BI "int strncmp(const char *" s1 ", const char *" s2 ", size_t " n );
+.BI "int strncmp(const char " s1 [ n "], const char " s2 [ n "], size_t " n );
.fi
.SH DESCRIPTION
The
diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 461b811a5..50543cf7b 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -23,8 +23,9 @@ Standard C library
.B #include <string.h>
.PP
.BI "char *strcpy(char *restrict " dest ", const char *" src );
-.BI "char *strncpy(char *restrict " dest ", const char *restrict " src \
-", size_t " n );
+.BI "char *strncpy(char " dest "[restrict " n "], \
+const char " src "[restrict " n ],
+.BI " size_t " n );
.fi
.SH DESCRIPTION
The
diff --git a/man3/strdup.3 b/man3/strdup.3
index 7d15245b4..ea24a61bd 100644
--- a/man3/strdup.3
+++ b/man3/strdup.3
@@ -20,9 +20,9 @@ Standard C library
.PP
.BI "char *strdup(const char *" s );
.PP
-.BI "char *strndup(const char *" s ", size_t " n );
+.BI "char *strndup(const char " s [ n "], size_t " n );
.BI "char *strdupa(const char *" s );
-.BI "char *strndupa(const char *" s ", size_t " n );
+.BI "char *strndupa(const char " s [ n "], size_t " n );
.fi
.PP
.RS -4
diff --git a/man3/strerror.3 b/man3/strerror.3
index 8857ddb4e..c1621372b 100644
--- a/man3/strerror.3
+++ b/man3/strerror.3
@@ -31,10 +31,10 @@ Standard C library
.BI "const char *strerrorname_np(int " errnum );
.BI "const char *strerrordesc_np(int " errnum );
.PP
-.BI "int strerror_r(int " errnum ", char *" buf ", size_t " buflen );
+.BI "int strerror_r(int " errnum ", char " buf [ buflen "], size_t " buflen );
/* XSI-compliant */
.PP
-.BI "char *strerror_r(int " errnum ", char *" buf ", size_t " buflen );
+.BI "char *strerror_r(int " errnum ", char " buf [ buflen "], size_t " buflen );
/* GNU-specific */
.PP
.BI "char *strerror_l(int " errnum ", locale_t " locale );
diff --git a/man3/strfmon.3 b/man3/strfmon.3
index 40342a900..41b22b95e 100644
--- a/man3/strfmon.3
+++ b/man3/strfmon.3
@@ -12,9 +12,9 @@ Standard C library
.nf
.B #include <monetary.h>
.PP
-.BI "ssize_t strfmon(char *restrict " s ", size_t " max ,
+.BI "ssize_t strfmon(char " s "[restrict " max "], size_t " max ,
.BI " const char *restrict " format ", ...);"
-.BI "ssize_t strfmon_l(char *restrict " s ", size_t " max ", locale_t " locale ,
+.BI "ssize_t strfmon_l(char " s "[restrict " max "], size_t " max ", locale_t " locale ,
.BI " const char *restrict " format ", ...);"
.fi
.SH DESCRIPTION
diff --git a/man3/strfromd.3 b/man3/strfromd.3
index a936489a1..6c4df845c 100644
--- a/man3/strfromd.3
+++ b/man3/strfromd.3
@@ -20,11 +20,11 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "int strfromd(char *restrict " str ", size_t " n ,
+.BI "int strfromd(char " str "[restrict " n "], size_t " n ,
.BI " const char *restrict " format ", double " fp ");"
-.BI "int strfromf(char *restrict " str ", size_t " n ,
+.BI "int strfromf(char " str "[restrict " n "], size_t " n ,
.BI " const char *restrict " format ", float "fp ");"
-.BI "int strfroml(char *restrict " str ", size_t " n ,
+.BI "int strfroml(char " str "[restrict " n "], size_t " n ,
.BI " const char *restrict " format ", long double " fp ");"
.fi
.PP
diff --git a/man3/strftime.3 b/man3/strftime.3
index 9a10275ca..0fb2c3123 100644
--- a/man3/strftime.3
+++ b/man3/strftime.3
@@ -24,11 +24,11 @@ Standard C library
.nf
.B #include <time.h>
.PP
-.BI "size_t strftime(char *restrict " s ", size_t " max ,
+.BI "size_t strftime(char " s "[restrict " max "], size_t " max ,
.BI " const char *restrict " format ,
.BI " const struct tm *restrict " tm );
.PP
-.BI "size_t strftime_l(char *restrict " s ", size_t " max ,
+.BI "size_t strftime_l(char " s "[restrict " max "], size_t " max ,
.BI " const char *restrict " format ,
.BI " const struct tm *restrict " tm ,
.BI " locale_t " locale );
diff --git a/man3/string.3 b/man3/string.3
index ec5ed0bd9..2db0f80eb 100644
--- a/man3/string.3
+++ b/man3/string.3
@@ -26,7 +26,7 @@ and
.I s2
ignoring case.
.TP
-.BI "int strncasecmp(const char *" s1 ", const char *" s2 ", size_t " n );
+.BI "int strncasecmp(const char " s1 [ n "], const char " s2 [ n "], size_t " n );
Compare the first
.I n
bytes of the strings
@@ -112,8 +112,11 @@ Randomly swap the characters in
Return the length of the string
.IR s .
.TP
-.BI "char *strncat(char *restrict " dest ", const char *restrict " src \
-", size_t " n );
+.nf
+.BI "char *strncat(char " dest "[restrict " n "], \
+const char " src "[restrict " n ],
+.BI " size_t " n );
+.fi
Append at most
.I n
bytes from the string
@@ -123,7 +126,7 @@ to the string
returning a pointer to
.IR dest .
.TP
-.BI "int strncmp(const char *" s1 ", const char *" s2 ", size_t " n );
+.BI "int strncmp(const char " s1 [ n "], const char " s2 [ n "], size_t " n );
Compare at most
.I n
bytes of the strings
@@ -131,8 +134,11 @@ bytes of the strings
and
.IR s2 .
.TP
-.BI "char *strncpy(char *restrict " dest ", const char *restrict " src \
-", size_t " n );
+.nf
+.BI "char *strncpy(char " dest "[restrict " n "], \
+const char " src "[restrict " n ],
+.BI " size_t " n );
+.fi
Copy at most
.I n
bytes from string
@@ -179,8 +185,11 @@ Extract tokens from the string
that are delimited by one of the bytes in
.IR delim .
.TP
-.BI "size_t strxfrm(char *restrict " dst ", const char *restrict " src \
-", size_t " n );
+.nf
+.BI "size_t strxfrm(char " dst "[restrict " n "], \
+const char " src "[restrict " n ],
+.BI " size_t " n );
+.fi
Transforms
.I src
to the current locale and copies the first
diff --git a/man3/strnlen.3 b/man3/strnlen.3
index 3cf575735..6df8e0d03 100644
--- a/man3/strnlen.3
+++ b/man3/strnlen.3
@@ -15,7 +15,7 @@ Standard C library
.nf
.B #include <string.h>
.PP
-.BI "size_t strnlen(const char *" s ", size_t " maxlen );
+.BI "size_t strnlen(const char " s [ maxlen "], size_t " maxlen );
.fi
.PP
.RS -4
diff --git a/man3/strxfrm.3 b/man3/strxfrm.3
index 909aed1df..df623a186 100644
--- a/man3/strxfrm.3
+++ b/man3/strxfrm.3
@@ -17,7 +17,8 @@ Standard C library
.nf
.B #include <string.h>
.PP
-.BI "size_t strxfrm(char *restrict " dest ", const char *restrict " src ,
+.BI "size_t strxfrm(char " dest "[restrict " n "], \
+const char " src "[restrict " n ],
.BI " size_t " n );
.fi
.SH DESCRIPTION
diff --git a/man3/ttyname.3 b/man3/ttyname.3
index 39d253356..4e37d6cbf 100644
--- a/man3/ttyname.3
+++ b/man3/ttyname.3
@@ -16,7 +16,7 @@ Standard C library
.B #include <unistd.h>
.PP
.BI "char *ttyname(int " fd );
-.BI "int ttyname_r(int " fd ", char *" buf ", size_t " buflen );
+.BI "int ttyname_r(int " fd ", char " buf [ buflen "], size_t " buflen );
.fi
.SH DESCRIPTION
The function
diff --git a/man3/unlocked_stdio.3 b/man3/unlocked_stdio.3
index f87b57779..cb9de40f6 100644
--- a/man3/unlocked_stdio.3
+++ b/man3/unlocked_stdio.3
@@ -33,7 +33,7 @@ Standard C library
", size_t " n ,
.BI " FILE *restrict " stream );
.PP
-.BI "char *fgets_unlocked(char *restrict " s ", int " n \
+.BI "char *fgets_unlocked(char " s "[restrict " n "], int " n \
", FILE *restrict " stream );
.BI "int fputs_unlocked(const char *restrict " s ", FILE *restrict " stream );
.PP
@@ -47,7 +47,7 @@ Standard C library
.BI "wint_t putwc_unlocked(wchar_t " wc ", FILE *" stream );
.BI "wint_t putwchar_unlocked(wchar_t " wc );
.PP
-.BI "wchar_t *fgetws_unlocked(wchar_t *restrict " ws ", int " n ,
+.BI "wchar_t *fgetws_unlocked(wchar_t " ws "[restrict " n "], int " n ,
.BI " FILE *restrict " stream );
.BI "int fputws_unlocked(const wchar_t *restrict " ws ,
.BI " FILE *restrict " stream );
diff --git a/man3/wcsnrtombs.3 b/man3/wcsnrtombs.3
index ef9aeba4c..bc0a9c64f 100644
--- a/man3/wcsnrtombs.3
+++ b/man3/wcsnrtombs.3
@@ -17,7 +17,7 @@ Standard C library
.nf
.B #include <wchar.h>
.PP
-.BI "size_t wcsnrtombs(char *restrict " dest ", const wchar_t **restrict " src ,
+.BI "size_t wcsnrtombs(char " dest "[restrict " len "], const wchar_t **restrict " src ,
.BI " size_t " nwc ", size_t " len \
", mbstate_t *restrict " ps );
.fi
diff --git a/man3/wcsrtombs.3 b/man3/wcsrtombs.3
index aed7024b7..335498663 100644
--- a/man3/wcsrtombs.3
+++ b/man3/wcsrtombs.3
@@ -18,7 +18,7 @@ Standard C library
.nf
.B #include <wchar.h>
.PP
-.BI "size_t wcsrtombs(char *restrict " dest ", const wchar_t **restrict " src ,
+.BI "size_t wcsrtombs(char " dest "[restrict " len "], const wchar_t **restrict " src ,
.BI " size_t " len ", mbstate_t *restrict " ps );
.fi
.SH DESCRIPTION
diff --git a/man3/wcstombs.3 b/man3/wcstombs.3
index 547381f7e..7c2394b36 100644
--- a/man3/wcstombs.3
+++ b/man3/wcstombs.3
@@ -18,7 +18,7 @@ Standard C library
.nf
.B #include <stdlib.h>
.PP
-.BI "size_t wcstombs(char *restrict " dest ", const wchar_t *restrict " src ,
+.BI "size_t wcstombs(char " dest "[restrict " n "], const wchar_t *restrict " src ,
.BI " size_t " n );
.fi
.SH DESCRIPTION
--
2.37.2
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-08-26 21:07 [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
@ 2022-08-27 11:10 ` Ingo Schwarze
2022-08-27 12:15 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Ingo Schwarze @ 2022-08-27 11:10 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man, JeanHeyd Meneide
Hi Alejandro,
> -.BI "char *getcwd(char *" buf ", size_t " size );
> +.BI "char *getcwd(char " buf [ size "], size_t " size );
I dislike this.
Manual pages should show function prototypes as they really are in
the header file, or if the header file contains useless fluff like
"restrict", a shortened form showing the essence that actually matters
for using the API. They should certainly not show something imaginary
that does not match reality, and even less so using invalid syntax.
Yours,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-08-27 11:10 ` Ingo Schwarze
@ 2022-08-27 12:15 ` Alejandro Colomar
2022-08-27 13:08 ` Ingo Schwarze
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-08-27 12:15 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: linux-man, JeanHeyd Meneide
[-- Attachment #1.1: Type: text/plain, Size: 4593 bytes --]
Hi Ingo,
On 8/27/22 13:10, Ingo Schwarze wrote:
> Hi Alejandro,
>
>> -.BI "char *getcwd(char *" buf ", size_t " size );
>> +.BI "char *getcwd(char " buf [ size "], size_t " size );
>
> I dislike this.
>
> Manual pages should show function prototypes as they really are in
> the header file, or if the header file contains useless fluff like
> "restrict", a shortened form showing the essence that actually matters
> for using the API.
Regarding restrict, it is essential to differentiate memcpy(3) and
memmove(3), which are otherwise identical:
void *memmove(void *dest, const void *src, size_t n);
void *memcpy(void *restrict dest, const void *restrict src,
size_t n);
I guess you will argue that the description specifies the difference, so
it's not necessary in the synopsis. That's true. But the reality is that
programmers have historically not cared about those details; so much so
that glibc had to provide a compat symbol for old programs, which
basically maps memcpy(3) to memmove(3) for code linked against old glibc
versions.
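To make the difference concrete, here is a minimal sketch (the helper name
shift_right is made up for illustration, it is not from any library). The
source and destination ranges overlap, which is exactly the case that
restrict on memcpy(3) rules out, so memmove(3) is the call to use:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: move the first `len` bytes of `buf` to the right
 * by `shift` positions.  The caller must provide at least len + shift
 * bytes of storage.  When shift < len the two ranges overlap, so
 * memcpy(3) would have undefined behavior here; memmove(3) is defined.
 */
void shift_right(char *buf, size_t len, size_t shift)
{
    assert(buf != NULL);
    memmove(buf + shift, buf, len);
}
```

With an 8-byte buffer holding "abcdef", shift_right(buf, 4, 2) leaves
"ababcd" in the first six bytes.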
In some cases, like in memcpy(3), the use of restrict is important; in
others, such as in printf(3), it is irrelevant. But for consistency, I
decided to use restrict wherever either POSIX or glibc used it
(assuming that POSIX would never remove a restrict qualifier if ISO C
required it). In some cases, glibc and POSIX differed, and I used the
most restrictive prototype.
I didn't add that change about restrict without concerns about being too
noisy. I had them, and still have them. But I think the value added
outweighs what was lost. Now prototypes are more precise, and
overcoming the noise shouldn't be too much of a problem.
In the case of (abusing) VLA syntax, it's more or less the same thing,
with a bit of added WTF moments about the "Why is this code using an
identifier declared right after it? Is it a typo?". I guess the WTF
moments will be more relevant the first few months, and less so when
time passes and programmers get used to the syntax.
I used strlcpy(3) in the commit message on purpose, as it's a great
example, just as memcpy(3) is a great example for restrict. Its competitor
(as it was promoted) in the Linux kernel is strscpy(9)
(not available to user space). They seem to be the same thing, but they
are not. Let's show their prototypes:
size_t strlcpy(char dst[size], const char *src, size_t size);
ssize_t strscpy(char dst[size], const char src[size], size_t size);
From those prototypes, I can already see that the kernel function accepts
a possibly unterminated source string, while strlcpy(3) requires that the
string is terminated. I left out restrict here to show the difference in
VLA syntax more clearly (thereby admitting that restrict does add a bit
of noise).
Then there's no difference in the prototypes between
strscpy(9) and strncpy(3), apart from the return type:
char *strncpy(char dest[n], const char src[n], size_t n);
And yet they are different functions: one guarantees that the produced
string is NUL-terminated and the other, strncpy(3), doesn't (and it also
unnecessarily clears the rest of the buffer, so strncpy(3) is just broken).
But they are more or less in the same league, as both are used for
transforming untrusted strings into proper strings (strncpy(3) only if you
use it with sizeof(buf) - 1), and that's shown by the prototypes.
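A small sketch of that "sizeof(buf) - 1" idiom, for illustration only
(copy_terminated is a hypothetical name, not a libc function). The size
comes first so that the array parameter can use it in plain ISO C:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around strncpy(3): copy at most size - 1 bytes
 * and always write a terminating NUL, so dst holds a proper string even
 * when src is longer than the buffer.
 */
char *copy_terminated(size_t size, char dst[size], const char *src)
{
    assert(size > 0);             /* need room for at least the NUL */
    strncpy(dst, src, size - 1);  /* may leave dst unterminated ...  */
    dst[size - 1] = '\0';         /* ... so terminate it explicitly  */
    return dst;
}
```

Calling strncpy(3) directly with the full buffer size and a source that
is too long fills the buffer completely and writes no NUL, which is the
trap the wrapper avoids.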
Do you regard the (abused) VLA syntax as something much worse than the
use of restrict? Or are they more or less equivalent to you?
> They should certainly not show something imaginary
> that does not match reality, and even less so using invalid syntax.
Well, not that I haven't had those thoughts, but we already use illegal
syntax in some cases for good reasons. See for example open(2):
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
Of course, you can't declare two conflicting prototypes like that. But
it shows that those are the only two ways you can use it. I'll admit
that a long time ago I told Michael that we should fix those prototypes
to match reality, with legal syntax, because otherwise they are
confusing. But with time, I got used to that weirdness, and it now
seems to me more informative than just '...' as FreeBSD and OpenBSD
document.
>
> Yours,
> Ingo
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-08-27 12:15 ` Alejandro Colomar
@ 2022-08-27 13:08 ` Ingo Schwarze
2022-08-27 18:38 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Ingo Schwarze @ 2022-08-27 13:08 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man, JeanHeyd Meneide
Hi Alejandro,
Alejandro Colomar wrote on Sat, Aug 27, 2022 at 02:15:32PM +0200:
> On 8/27/22 13:10, Ingo Schwarze wrote:
>> Alejandro Colomar wrote:
>>> -.BI "char *getcwd(char *" buf ", size_t " size );
>>> +.BI "char *getcwd(char " buf [ size "], size_t " size );
>> I dislike this.
>>
>> Manual pages should show function prototypes as they really are in
>> the header file, or if the header file contains useless fluff like
>> "restrict", a shortened form showing the essence that actually matters
>> for using the API.
> Regarding restrict, it is essential to differentiate memcpy(3) and
> memmove(3), which are otherwise identical:
>
> void *memmove(void *dest, const void *src, size_t n);
>
> void *memcpy(void *restrict dest, const void *restrict src,
> size_t n);
Actually, the syntax of both is identical, only the semantics differ.
That said, you are right that using memcpy(3) when memmove(3) is
required is a famous and widespread bug. I doubt putting "restrict"
into the SYNOPSIS will discourage careless programmers from making that
mistake though.
To me, "restrict" feels like a specialized tool for people writing
compiler optimizers, not like something important enough to clutter
API documentation.
> Do you regard the (abused) VLA syntax as something much worse than the
> use of restrict? Or are they more or less equivalent to you?
If your implementation really contains "restrict" in the header
file and it's standardized, putting it into the SYNOPSIS seems
acceptable to me. Not necessary though and maybe somewhat noisy
and distracting.
Putting something that is not in the implementation and/or not
in the standard into the SYNOPSIS seems much worse to me.
And invalid syntax in the SYNOPSIS is even worse than that.
For example, people may attempt to use SYNOPSIS as an example
when designing their own, private function for a similar but
not identical purpose and end up writing non-portable code,
or even code that does not compile anywhere.
They may be wrong if they blame you for that, but I doubt they
will thank you.
>> They should certainly not show something imaginary
>> that does not match reality, and even less so using invalid syntax.
> Well, not that I haven't had those thoughts, but we already use illegal
> syntax in some cases for good reasons. See for example open(2):
>
> int open(const char *pathname, int flags);
> int open(const char *pathname, int flags, mode_t mode);
>
> Of course, you can't declare two conflicting prototypes like that.
This does not seem quite as horrifying as
char *getcwd(char buf[size], size_t size);
because at least each of the prototypes is valid.
My main concern about it would be that it is likely to make some people
think (and C++ programmers in particular :-/) that there is type
checking for the third and subsequent arguments, in which case they
will be unpleasantly surprised when accidentally writing something like
open(pathname, flags, &some_var, mode);
and finding out later that it compiled and ran just fine, but the
resulting file wasn't quite as confidential as they hoped.
Explicitly displaying the ... to indicate the variable number of
arguments, by contrast, makes it very clear that an API is almost
certainly unusually dangerous and needs to be used with especial
diligence.
Either way, certainly not quite as bad as invalid syntax inside
a prototype...
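A minimal sketch of why the ellipsis deserves that caution (demo_open_mode
and DEMO_CREAT are hypothetical names for illustration, this is not the
real open(2)). The callee reads the optional argument only when a flag
says it is there, and the compiler checks neither the number nor the
types of the variadic arguments:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define DEMO_CREAT 0x1  /* hypothetical flag: "a mode argument follows" */

/*
 * Hypothetical open(2)-like function.  Returns the mode it would use,
 * or -1 if it did not read one.  Nothing verifies what the caller
 * actually passed after `flags`.
 */
int demo_open_mode(const char *pathname, int flags, ...)
{
    int mode = -1;

    assert(pathname != NULL);
    if (flags & DEMO_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = va_arg(ap, int);  /* trusts the caller blindly */
        va_end(ap);
    }
    return mode;
}
```

A call such as demo_open_mode("f", 0, 0600, 1234) compiles without a
diagnostic, and the extra arguments are silently ignored.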
Yours,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-08-27 13:08 ` Ingo Schwarze
@ 2022-08-27 18:38 ` Alejandro Colomar
2022-08-28 11:24 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-08-27 18:38 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: linux-man, JeanHeyd Meneide
[-- Attachment #1.1: Type: text/plain, Size: 8610 bytes --]
Hi Ingo,
On 8/27/22 15:08, Ingo Schwarze wrote:
[...]
>> void *memmove(void *dest, const void *src, size_t n);
>>
>> void *memcpy(void *restrict dest, const void *restrict src,
>> size_t n);
>
> Actually, the syntax of both is identical, only the semantics differ.
>
> That said, you are right that using memcpy(3) when memmove(3) is
> required is a famous and widespread bug. I doubt putting "restrict"
> into the SYNOPSIS will discourage careless programmers from making that
> mistake though.
There will always be completely careless programmers, and I won't
attempt to target those. But I hope this change increases the share of
programmers who get the message.
There are a lot of good programmers out there who are unaware of what we'd
probably consider basic stuff. I've seen very successful programmers
using pointer types to store offsets, where ptrdiff_t should be used;
and of course the code is full of casts (including to uintptr_t) to
silence the hundreds of warnings from the compiler yelling at that
blasphemy (3 casts in a line of code that should be just 'q = p + offset;').
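As a small illustration of that last point (the helper names are
hypothetical), offsets and distances between pointers into the same array
can be kept in ptrdiff_t, and then no cast is needed at all:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: step a pointer by a signed element offset. */
const char *advance(const char *p, ptrdiff_t offset)
{
    assert(p != NULL);
    return p + offset;   /* the whole 'q = p + offset;' line, no casts */
}

/* Hypothetical helper: distance between two pointers into one array. */
ptrdiff_t distance(const char *begin, const char *end)
{
    return end - begin;  /* pointer subtraction already yields ptrdiff_t */
}
```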
I don't think it's only carelessness (okay, there's a bit of it), but
more that some programmers still live in the past century, and don't
know that 'const', and more recently 'restrict' were added to the
language. Did they live in a cave? So it seems.
>
> To me, "restrict" feels like a specialized tool for people writing
> compiler optimizers, not like something important enough to clutter
> API documentation.
>
>> Do you regard the (abused) VLA syntax as something much worse than the
>> use of restrict? Or are they more or less equivalent to you?
>
> If your implementation really contains "restrict" in the header
> file and it's standardized, putting it into the SYNOPSIS seems
> acceptable to me. Not necessary though and maybe somewhat noisy
> and distracting.
I would document restrict even if glibc omitted it from their
prototypes, because the manual pages document the behavior, and are not
required to be uninformative when the implementation is. memcpy(3)
requires restrict pointers, even if the implementation doesn't advertise
it. The run-time behavior will produce bugs if the pointers aren't
restrict, and so the documentation had better say so.
Should that only be limited to the DESCRIPTION? Maybe. I don't like
that idea, when we have the language to express that more precisely and
concisely.
What is the usefulness of a prototype that is as short as possible?
Okay, it's less noisy, but at the cost of giving less information to an
interested reader.
void *memcpy(void dest[restrict n], const void src[restrict n],
size_t n);
void *memcpy(void *restrict dest, const void *restrict src,
size_t n);
void *memcpy(void *dest, const void *src, size_t n);
void *memcpy(void *dest, void *src, size_t n);
void *memcpy(void *, void *, size_t);
Where do we stop? Okay, I was abusing too much in the first one. I
didn't dare to propose that, since VLA syntax with void is, right now, a
shooting offense. GNU C allows pointer arithmetic on 'void *', but for
some reason, when they are disguised as false arrays it is not allowed
to compile. Give it some decades, though. I think it would be useful.
These prototypes give several levels of information, from most basic, to
most precise:
- number of arguments.
- type of the arguments.
- small description of what they mean (through the names).
- are pointers only used for reading, or are the pointees also modified?
- can pointers point to the same storage?
- how many elements/bytes has the storage pointed to by pointers?
Not yet in those prototypes, but something I'd also like to convey, is
whether pointers are allowed to be null or not. I'm still not
convinced about how to document that.
Why draw the line between useful and noise at one specific
point? The description could perfectly document the const-ness of a
parameter as well as it documents the restrict-ness or the null-ness. I
think that having the prototypes be concise is good, but the overall
goal is to have the whole manual page concise. If adding a little bit
to the prototypes makes the description much more concise (since it
doesn't need to document const-ness, and hopefully restrict-ness and
null-ness or size, or at least it can be shorter about it since the
prototype already tells you a big part), I'd go for it.
>
> Putting something that is not in the implementation and/or not
> in the standard into the SYNOPSIS seems much worse to me.
I have mixed feelings about this.
As you probably know from older threads, I don't like the standard as a
driving force in C, so I don't like the idea that implementations and
documentation should trail behind the standard, using whatever the standard
provides them with. It's not completely like that, as I acknowledge
that the standard has improved considerably the language and the
library, and I like to use some of their improvements, or even things
that it is still considering (I build my own personal code with
-std=gnu2x, since it's just for me, so I don't care about portability,
and want the most useful extensions, and I'm prepared to deal with
compiler bugs); but I don't like the standard as the _only_ driving force.
That said, I think both GCC and glibc should not be intimidated by the
standard when developing new extensions. Okay, that is not carte blanche
for releasing crap; but if a feature is good, that's fine. Now, are the
manual pages allowed to extend the language as well? Of course not so
much as the compiler or libc, but a little bit wouldn't hurt. So, I
wouldn't take your comment too strictly.
Still, I acknowledge this suggestion of mine is far more aggressive than
most other trivial deviations from valid code. Maybe I should keep the
idea floating around, and suggest it again after the new standard is
released, so that compiler writers are less stressed about it, and can
consider such an extension. I'll maybe talk to GCC maintainers about it
and see what they think.
>
> And invalid syntax in the SYNOPSIS is even worse than that.
> For example, people may attempt to use SYNOPSIS as an example
> when designing their own, private function for a similar but
> not identical purpose and end up writing non-portable code,
> or even code that does not compile anywhere.
>
> They may be wrong if they blame you for that, but I doubt they
> will thank you.
Yeah, that's something important to consider.
>
>>> They should certainly not show something imaginary
>>> that does not match reality, and even less so using invalid syntax.
>
>> Well, not that I haven't had those thoughts, but we already use illegal
>> syntax in some cases for good reasons. See for example open(2):
>>
>> int open(const char *pathname, int flags);
>> int open(const char *pathname, int flags, mode_t mode);
>>
>> Of course, you can't declare two conflicting prototypes like that.
>
> This does not seem quite as horrifying as
>
> char *getcwd(char buf[size], size_t size);
>
> because at least each of the prototypes is valid.
>
> My main concern about it would be that it is likely to make some people
> think (and C++ programmers in particular :-/) that there is type
> checking for the third and subsequent arguments, in which case they
> will be unpleasantly surprised when accidentally writing something like
>
> open(pathname, flags, &some_var, mode);
>
> and finding out later that it compiled and ran just fine, but the
> resulting file wasn't quite as confidential as they hoped.
>
> Explicitly displaying the ... to indicate the variable number of
> arguments, by contrast, makes it very clear that an API is almost
> certainly unusually dangerous and needs to be used with especial
> diligence.
Yeah, I suggested to Michael using '...' and adding the parameter in a comment:
int open(const char *pathname, int flags, ... /* mode_t mode */);
He agreed, but we were doing something else, and then I didn't ask
again, so this change didn't make it. If you recommend it,
I'll do it.
>
> Either way, certainly not quite as bad as invalid syntax inside
> a prototype...
Cheers,
Alex
>
> Yours,
> Ingo
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-08-27 18:38 ` Alejandro Colomar
@ 2022-08-28 11:24 ` Alejandro Colomar
[not found] ` <CACqA6+mfaj6Viw+LVOG=nE350gQhCwVKXRzycVru5Oi4EJzgTg@mail.gmail.com>
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-08-28 11:24 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: linux-man, JeanHeyd Meneide
[-- Attachment #1.1: Type: text/plain, Size: 827 bytes --]
On 8/27/22 20:38, Alejandro Colomar wrote:
>
> void *memcpy(void dest[restrict n], const void src[restrict n],
> size_t n);
>
> void *memcpy(void *restrict dest, const void *restrict src,
> size_t n);
>
> void *memcpy(void *dest, const void *src, size_t n);
>
> void *memcpy(void *dest, void *src, size_t n);
>
> void *memcpy(void *, void *, size_t);
BTW, I forgot about 'noreturn', probably because memcpy(3) doesn't use
it, but it's another layer of information which also adds a bit of
noise, but is also useful to know. The Linux man-pages use it (see
exit(3)); I added that more or less at the same time I added restrict.
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
[not found] ` <CACqA6+mfaj6Viw+LVOG=nE350gQhCwVKXRzycVru5Oi4EJzgTg@mail.gmail.com>
@ 2022-09-02 21:02 ` Alejandro Colomar
2022-09-02 21:57 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-02 21:02 UTC (permalink / raw)
To: JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 3856 bytes --]
Hi JeanHeyd!
I'm forwarding your email to the mailing list, from my post-1996 mail
client ;)
I hope all of your content is kept (even if slightly degraded).
Cheers,
Alex
-------- Forwarded Message --------
Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in
function parameters
Date: Fri, 2 Sep 2022 16:56:00 -0400
From: JeanHeyd Meneide <wg14@soasis.org>
To: Alejandro Colomar <alx.manpages@gmail.com>
CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
Hi Alejandro and Ingo,
Just chiming in from a Standards perspective, here. We discussed,
briefly, a way to allow Variable-Length function parameter declarations
like the ones shown in this thread (e.g., char *getcwd(char buf[size],
size_t size );).
In GCC, there is a GNU extension that allows explicitly
forward-declaring the prototype. Using the above example, it would look
like so:
char *getcwd(size_t size; char buf[size], size_t size);
(Live Example [1])
(Note the `;` after the first "size" declaration). This was brought
before the Committee to vote on for C23 in the form of N2780 [2], around
the January 2022 timeframe. The paper did not pass, and it was seen as a
"failed extension". After that vote failed, we talked about other
approaches, and about whether there was some appetite to allow
"forward parsing" for this sort of case. That is, could we simply allow:
char *getcwd(char buf[size], size_t size);
to work as expected. The vote for this did not gain full consensus
either, but there were a lot of abstentions [3]. While I personally
voted in favor of allowing such for C, there was distinct worry that
this would produce issues for weaker C implementations that did not want
to commit to delayed parsing or forward parsing of the entirety of the
argument list before resolving types. There were enough abstentions
during voting that a working implementation with a writeup of its complexity
could sway the Committee one way or the other.
This is not to dissuade Alejandro's position, or to bolster Ingo's
point; I'm mostly just reporting the Committee's response here. This is
an unsolved problem for the Committee, and also a larger holdover from
the removal of K&R declarations from C23, which COULD solve this problem:
// decl
char *getcwd();
// impl
char* getcwd(buf, size)
char buf[size];
size_t size;
{
/* impl here */
}
There is room for innovation here, or perhaps bolstering of the
GCC original extension. As it stands right now, compilers only very
recently started taking Variably-Modified Type parameters and Static
Extent parameters seriously after carefully separating them out of
Variable-Length Arrays, warning where they can when static or other
array parameters do not match buffer lengths and so-on.
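For comparison, this is roughly what is already valid in ISO C when the
size parameter comes first (a sketch with a hypothetical function name,
not an existing API). The [static n] form promises at least n accessible
elements, and recent compilers may warn when a caller visibly passes less:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical function: count the non-NUL bytes in buf[0..n-1].
 * `n` is declared before `buf`, so `buf[static n]` is ordinary C99
 * variably-modified parameter syntax and needs no extension.
 */
size_t count_nonzero(size_t n, const char buf[static n])
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0')
            count++;
    }
    assert(count <= n);
    return count;
}
```

Swapping the two parameters, as in the getcwd() example discussed in this
thread, is what currently requires either the GNU forward-declaration
extension or a change to the language.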
Not just to the folks in this thread, but to the broader
community for anyone who is paying attention: WG14 would actively like
to solve this problem. If someone can:
- prove out a way to do delayed parsing that is not implementation-costly,
- revive the considered-dead GCC extension, or
- provide a 3rd or 4th way to support the goals,
I am certain WG14 would eventually look favorably upon such a thing,
if it were brought before the Committee for inclusion in C2y/C3a.
Whether or not you feel like the manpages are the best place to
start that, I'll leave up to you!
Thanks,
JeanHeyd
[1]: https://godbolt.org/z/dv1G3qGa3
[2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2780.pdf
[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2991.pdf - search for n2780
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-02 21:02 ` Alejandro Colomar
@ 2022-09-02 21:57 ` Alejandro Colomar
2022-09-03 12:47 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-02 21:57 UTC (permalink / raw)
To: JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 8301 bytes --]
Hi JeanHeyd,
> Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in
> function parameters
> Date: Fri, 2 Sep 2022 16:56:00 -0400
> From: JeanHeyd Meneide <wg14@soasis.org>
> To: Alejandro Colomar <alx.manpages@gmail.com>
> CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
>
>
>
> Hi Alejandro and Ingo,
>
> Just chiming in from a Standards perspective, here. We discussed,
> briefly, a way to allow Variable-Length function parameter declarations
> like the ones shown in this thread (e.g., char *getcwd(char buf[size],
> size_t size );).
>
> In GCC, there is a GNU extension that allows explicitly
> forward-declaring the prototype. Using the above example, it would look
> like so:
I added the GCC list to the thread, so that they can intervene if they
consider it necessary.
>
> char *getcwd(size_t size; char buf[size], size_t size);
I read about that, although I don't like it very much, and never used it.
>
> (Live Example [1])
>
> (Note the `;` after the first "size" declaration). This was brought
> before the Committee to vote on for C23 in the form of N2780 [2], around
> the January 2022 timeframe. The paper did not pass, and it was seen as a
> "failed extension". After the vote on that failed, we talked about other
> ways of allowing this, and whether there was some appetite to allow
> "forward parsing" for this sort of case. That is, could we simply allow:
>
> char *getcwd(char buf[size], size_t size);
>
> to work as expected. The vote for this did not gain full consensus
> either, but there were a lot of abstentions [3]. While I personally
> voted in favor of allowing such for C, there was distinct worry that
> this would produce issues for weaker C implementations that did not want
> to commit to delayed parsing or forward parsing of the entirety of the
> argument list before resolving types. There are enough abstentions
> during voting that a working implementation with a writeup of complexity
> would sway the Committee one way or the other.
I like that this got less hate than the GNU extension. It's nicer to my
eyes.
>
> This is not to dissuade Alejandro's position, or to bolster Ingo's
> point; I'm mostly just reporting the Committee's response here. This is
> an unsolved problem for the Committee, and also a larger holdover from
> the removal of K&R declarations from C23, which COULD solve this problem:
>
> // decl
> char *getcwd();
>
> // impl
> char* getcwd(buf, size)
> char buf[size];
> size_t size;
> {
> /* impl here */
> }
I won't miss them ;)
My regex-based parser[1] that finds declarations and definitions in C
code bases goes nuts with K&R functions. They are dead for good :)
[1]: <http://www.alejandro-colomar.es/src/alx/alx/grepc.git/>
>
> There is room for innovation here, or perhaps bolstering of the
> GCC original extension. As it stands right now, compilers only very
> recently started taking Variably-Modified Type parameters and Static
> Extent parameters seriously after carefully separating them out of
> Variable-Length Arrays, warning where they can when static or other
> array parameters do not match buffer lengths and so-on.
>
> Not just to the folks in this thread, but to the broader
> community for anyone who is paying attention: WG14 would actively like
> to solve this problem. If someone can:
> - prove out a way to do delayed parsing that is not implementation-costly,
> - revive the considered-dead GCC extension, or
> - provide a 3rd or 4th way to support the goals,
>
> I am certain WG14 would look favorably upon such a thing eventually,
> if it were brought before the Committee for inclusion in C2y/C3a.
>
> Whether or not you feel like the manpages are the best place to
> start that, I'll leave up to you!
I'll try to defend the reasons to start this in the man-pages.
This feature is mostly for documentation purposes, not being meaningful
for code at all (for some meaning of meaningful), since it won't change
the function definition in any way, nor the calls to it. At least not
by itself; static analysis may get some benefits, though.
Also, new code can be designed from the beginning so that sizes go
before their corresponding arrays, so that new code won't typically be
affected by the lack of this feature in the language.
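A minimal sketch of that size-first ordering, assuming a made-up function name (count_zeros is not from any real API). Because 'n' is declared before 'a', the bound can be written in the parameter with plain ISO C (C99 or later, where VLAs are supported):

```c
#include <assert.h>
#include <stddef.h>

/* Size first, array second: 'n' is already in scope when 'a' is declared,
 * so the bound can be spelled out without any extension. */
size_t count_zeros(size_t n, const int a[n])
{
    size_t zeros = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 0)
            zeros++;
    }
    return zeros;
}
```

The bound here is documentation for readers and static analyzers; the parameter is still adjusted to a pointer.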
This leaves us with legacy code, especially libc, which just works, and
doesn't have any urgent need to change its prototypes in this regard
(it could, to improve static analysis, but that's not what we'd call urgent).
And since most people don't go around reading libc headers searching for
function declarations (especially since there are manual pages that show
them nicely), it's not like the documentation of the code depends on how
the function is _actually_ declared in code (that's why I also defended
documenting restrict even if glibc wouldn't have cared to declare it),
but it depends basically on what the manual pages say about the
function. If the manual pages say a function gets 'restrict' params, it
means it gets 'restrict' params, no matter what the code says, and if it
doesn't, the function accepts overlapping pointers, at least for most of
the public (modulo manual page bugs, that is).
So this extension could very well be added by the manual pages, as a
form of documentation, and then maybe picked up by compilers that have
enough resources to implement it.
Considering that this feature is mostly about documentation (and a bit
of static analysis too), the documentation should be something appealing
to the reader.
Let's take an example:
    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char *restrict host, socklen_t hostlen,
                    char *restrict serv, socklen_t servlen,
                    int flags);
and some transformations:
    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);
    int getnameinfo(socklen_t hostlen;
                    socklen_t servlen;
                    const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);
(I'm not sure if I used correct GNU syntax, since I never used that
extension myself.)
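For reference, a small sketch of the GNU forward parameter declaration in a function definition. The function name copy_name is hypothetical, and the preprocessor guard is an assumption that only GCC proper accepts the syntax; other compilers get a plain-pointer fallback. As far as I recall from the GCC manual, several forward declarations may be separated by commas or semicolons, as long as the last one ends with a semicolon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU C extension: 'size_t n;' before the real parameters forward-declares
 * 'n', so 'dst[n]' may appear before the real 'size_t n' parameter. */
char *copy_name(size_t n; char dst[n], const char *src, size_t n)
#else
/* Fallback for compilers without the extension: plain pointer syntax. */
char *copy_name(char *dst, const char *src, size_t n)
#endif
{
    size_t len;

    assert(dst != NULL && src != NULL);
    if (n == 0)
        return NULL;
    len = strlen(src);
    if (len >= n)
        len = n - 1;        /* truncate to fit, leaving room for '\0' */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

Callers see the same three arguments either way; the forward declaration only affects how the parameter list is written.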
The first transformation above is non-ambiguous, as concise as possible,
and its only issue is that it might complicate the implementation a bit
too much. I don't think forward-using a parameter's size would be too
much of a parsing problem for human readers.
The second one is unnecessarily long and verbose, and semicolons are not
very distinguishable from commas, for human readers, which may be very
confusing.
int foo(int a; int b[a], int a);
int foo(int a, int b[a], int o);
Those two are very different to the compiler, and yet very similar to
the human eye. I don't like it. The fact that it allows for simpler
compilers isn't enough to overcome the readability issues.
I think I'd prefer having the forward-using syntax as a non-standard
extension --or a standard but optional language feature-- to avoid
forcing small compilers to implement it, rather than having the GNU
extension standardized in all compilers.
Having this extension in any single compiler would even make it more
appealing to manual pages, which could use the syntax more freely
without fear of confusing readers. Even if the standard wouldn't accept it.
Let's see if GCC likes the feature and helps me attempt to use it a
little bit! :-)
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-02 21:57 ` Alejandro Colomar
@ 2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 13:41 ` Alejandro Colomar
0 siblings, 2 replies; 85+ messages in thread
From: Martin Uecker @ 2022-09-03 12:47 UTC (permalink / raw)
To: Alejandro Colomar, JeanHeyd Meneide; +Cc: Ingo Schwarze, linux-man, gcc
...
> >
> > Whether or not you feel like the manpages are the best place to
> > start that, I'll leave up to you!
>
> I'll try to defend the reasons to start this in the man-pages.
>
> This feature is mostly for documentation purposes, not being meaningful
> for code at all (for some meaning of meaningful), since it won't change
> the function definition in any way, nor the calls to it. At least not
> by itself; static analysis may get some benefits, though.
GCC will warn if the bound is specified inconsistently between
declarations and also emit warnings if it can see that a buffer
which is passed is too small:
https://godbolt.org/z/PsjPG1nv7
BTW: If you declare pointers to arrays (not first elements) you
can get run-time bounds checking with UBSan:
https://godbolt.org/z/TvMo89WfP
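A rough sketch of the pointer-to-array form, with hypothetical names (buf_capacity, put_at). The point is that the bound becomes part of the pointed-to type, so sizeof(*buf) recovers it inside the callee:

```c
#include <assert.h>
#include <stddef.h>

/* 'buf' points to the whole array, so its type carries the bound 'n'
 * and sizeof(*buf) evaluates to n at run time. */
size_t buf_capacity(size_t n, char (*buf)[n])
{
    assert(buf != NULL);
    return sizeof(*buf);
}

/* Writes through the pointer-to-array. The index is checked by hand here;
 * a sanitizer build could perform a similar check automatically. */
int put_at(size_t n, char (*buf)[n], size_t i, char c)
{
    if (i >= sizeof(*buf))
        return -1;
    (*buf)[i] = c;
    return 0;
}
```

A caller passes the address of the array, e.g. put_at(sizeof(a), &a, 0, 'x') for some char a[8].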
>
> Also, new code can be designed from the beginning so that sizes go
> before their corresponding arrays, so that new code won't typically be
> affected by the lack of this feature in the language.
>
> This leaves us with legacy code, especially libc, which just works, and
> doesn't have any urgent needs to change their prototypes in this regard
> (they could, to improve static analysis, but not what we'd call urgent).
It would be a useful step toward finding out-of-bounds problems in
applications using libc.
> And since most people don't go around reading libc headers searching for
> function declarations (especially since there are manual pages that show
> them nicely), it's not like the documentation of the code depends on how
> the function is _actually_ declared in code (that's why I also defended
> documenting restrict even if glibc wouldn't have cared to declare it),
> but it depends basically on what the manual pages say about the
> function. If the manual pages say a function gets 'restrict' params, it
> means it gets 'restrict' params, no matter what the code says, and if it
> doesn't, the function accepts overlapping pointers, at least for most of
> the public (modulo manual page bugs, that is).
>
> So this extension could very well be added by the manual pages, as a
> form of documentation, and then maybe picked up by compilers that have
> enough resources to implement it.
>
>
> Considering that this feature is mostly about documentation (and a bit
> of static analysis too), the documentation should be something appealing
> to the reader.
>
>
> Let's take an example:
>
>
> int getnameinfo(const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char *restrict host, socklen_t hostlen,
> char *restrict serv, socklen_t servlen,
> int flags);
>
> and some transformations:
>
>
> int getnameinfo(const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char host[restrict hostlen], socklen_t hostlen,
> char serv[restrict servlen], socklen_t servlen,
> int flags);
>
>
> int getnameinfo(socklen_t hostlen;
> socklen_t servlen;
> const struct sockaddr *restrict addr,
> socklen_t addrlen,
> char host[restrict hostlen], socklen_t hostlen,
> char serv[restrict servlen], socklen_t servlen,
> int flags);
>
> (I'm not sure if I used correct GNU syntax, since I never used that
> extension myself.)
>
> The first transformation above is non-ambiguous, as concise as possible,
> and its only issue is that it might complicate the implementation a bit
> too much. I don't think forward-using a parameter's size would be too
> much of a parsing problem for human readers.
I personally find the second form not terrible. Being
able to read code left-to-right, top-down is helpful in more
complicated examples.
> The second one is unnecessarily long and verbose, and semicolons are not
> very distinguishable from commas, for human readers, which may be very
> confusing.
>
> int foo(int a; int b[a], int a);
> int foo(int a, int b[a], int o);
>
> Those two are very different to the compiler, and yet very similar to
> the human eye. I don't like it. The fact that it allows for simpler
> compilers isn't enough to overcome the readability issues.
This is true, I would probably use it with a comma and/or
syntax highlighting.
> I think I'd prefer having the forward-using syntax as a non-standard
> extension --or a standard but optional language feature-- to avoid
> forcing small compilers to implement it, rather than having the GNU
> extension standardized in all compilers.
The problems with the first form are:
- it is not 100% backwards compatible (which may be OK, though), as
the semantics of the following code change:
int n;
int foo(int a[n], int n); // refers to different n!
Code written for new compilers could then be misunderstood
by old compilers when a variable named 'n' is in scope.
- it would be fundamentally new to C to have
forward references, and parsers might need to be changed
to allow this
- a compiler or tool then has to deal also with ugly
corner cases such as mutual references:
int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
We could consider new syntax such as
int foo(char buf[.n], int n);
Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC
over any alternative. I would also not mind new syntax, but
then one has to define the rules more precisely to avoid the
aforementioned problems.
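To make the compatibility problem concrete, here is a small sketch of how current compilers read such a declaration. The names (elems, the file-scope n) are made up for illustration. A pointer to an array is used so that the bound is observable through sizeof:

```c
#include <assert.h>
#include <stddef.h>

static int n = 3;   /* file-scope variable that happens to be named 'n' */

/* Under today's rules, the 'n' inside '[n]' is the file-scope variable,
 * because the parameter 'n' has not been declared yet at that point. */
size_t elems(int (*a)[n], int n)
{
    (void)n;        /* the parameter is deliberately ignored here */
    return sizeof(*a) / sizeof((*a)[0]);
}
```

With these definitions, elems(&arr, 10) for an int arr[3] returns 3, not 10. If forward references were allowed, the same text would silently bind to the parameter instead.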
Martin
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 12:47 ` Martin Uecker
@ 2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 15:08 ` Alejandro Colomar
2022-09-03 13:41 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Ingo Schwarze @ 2022-09-03 13:29 UTC (permalink / raw)
To: alx.manpages
Cc: Martin Uecker, Alejandro Colomar, JeanHeyd Meneide, linux-man, gcc
Hi,
the only point i strongly care about is this one:
Manual pages should not use
* non-standard syntax
* non-portable syntax
* ambiguous syntax (i.e. syntax that might have different meanings
with different compilers or in different contexts)
* syntax that might be invalid or dangerous with some widely
used compiler collections like GCC or LLVM
Regarding the discussions about standardization and extensions,
all proposals i have seen look seriously ugly and awkward to me,
and i'm not yet convinced such ugliness is sufficiently offset by
the relatively minor benefit that is apparent to me right now.
Yours,
Ingo
--
Ingo Schwarze <schwarze@usta.de>
http://www.openbsd.org/ <schwarze@openbsd.org>
http://mandoc.bsd.lv/ <schwarze@mandoc.bsd.lv>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
@ 2022-09-03 13:41 ` Alejandro Colomar
2022-09-03 14:35 ` Martin Uecker
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-03 13:41 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 5699 bytes --]
Hi Martin,
On 9/3/22 14:47, Martin Uecker wrote:
[...]
> GCC will warn if the bound is specified inconsistently between
> declarations and also emit warnings if it can see that a buffer
> which is passed is too small:
>
> https://godbolt.org/z/PsjPG1nv7
That's very good news!
BTW, it's nice to see that GCC doesn't need 'static' for array
parameters. I never understood what the static keyword adds there.
There's no way one can specify an array size and mean anything other than
requiring that, for a non-null pointer, the array should have at least
that size.
>
>
> BTW: If you declare pointers to arrays (not first elements) you
> can get run-time bounds checking with UBSan:
>
> https://godbolt.org/z/TvMo89WfP
Couldn't that be caught at compile time? n is certainly out of bounds
always for such an array, since the last element is n-1.
>
>
>>
>> Also, new code can be designed from the beginning so that sizes go
>> before their corresponding arrays, so that new code won't typically be
>> affected by the lack of this feature in the language.
>>
>> This leaves us with legacy code, especially libc, which just works, and
>> doesn't have any urgent needs to change their prototypes in this regard
>> (they could, to improve static analysis, but not what we'd call urgent).
>
> It would be a useful step toward finding out-of-bounds problems in
> applications using libc.
Yep, it would be very useful for that. Not urgent, but yes, very useful.
>> Let's take an example:
>>
>>
>> int getnameinfo(const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char *restrict host, socklen_t hostlen,
>> char *restrict serv, socklen_t servlen,
>> int flags);
>>
>> and some transformations:
>>
>>
>> int getnameinfo(const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char host[restrict hostlen], socklen_t hostlen,
>> char serv[restrict servlen], socklen_t servlen,
>> int flags);
>>
>>
>> int getnameinfo(socklen_t hostlen;
>> socklen_t servlen;
>> const struct sockaddr *restrict addr,
>> socklen_t addrlen,
>> char host[restrict hostlen], socklen_t hostlen,
>> char serv[restrict servlen], socklen_t servlen,
>> int flags);
>>
>> (I'm not sure if I used correct GNU syntax, since I never used that
>> extension myself.)
>>
>> The first transformation above is non-ambiguous, as concise as possible,
>> and its only issue is that it might complicate the implementation a bit
>> too much. I don't think forward-using a parameter's size would be too
>> much of a parsing problem for human readers.
>
>
> I personally find the second form not terrible. Being
> able to read code left-to-right, top-down is helpful in more
> complicated examples.
>
>
>
>> The second one is unnecessarily long and verbose, and semicolons are not
>> very distinguishable from commas, for human readers, which may be very
>> confusing.
>>
>> int foo(int a; int b[a], int a);
>> int foo(int a, int b[a], int o);
>>
>> Those two are very different to the compiler, and yet very similar to
>> the human eye. I don't like it. The fact that it allows for simpler
>> compilers isn't enough to overcome the readability issues.
>
> This is true, I would probably use it with a comma and/or
> syntax highlighting.
>
>
>> I think I'd prefer having the forward-using syntax as a non-standard
>> extension --or a standard but optional language feature-- to avoid
>> forcing small compilers to implement it, rather than having the GNU
>> extension standardized in all compilers.
>
> The problems with the second form are:
>
> - it is not 100% backwards compatible (which maybe ok though) as
> the semantics of the following code changes:
>
> int n;
> int foo(int a[n], int n); // refers to different n!
>
> Code written for new compilers could then be misunderstood
> by old compilers when a variable with 'n' is in scope.
>
>
Hmmm, this one is serious. I can't seem to solve it with that syntax.
> - it would generally be fundamentally new to C to have
> backwards references and parser might need to be changes
> to allow this
>
>
> - a compiler or tool then has to deal also with ugly
> corner cases such as mutual references:
>
> int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
>
>
>
> We could consider new syntax such as
>
> int foo(char buf[.n], int n);
>
>
> Personally, I would prefer the conceptual simplicity of forward
> declarations and the fact that these exist already in GCC
> over any alternative. I would also not mind new syntax, but
> then one has to define the rules more precisely to avoid the
> aforementioned problems.
What about taking something from K&R functions for this?:
int foo(q; w; int a[q], int q, int s[w], int w);
By not specifying the types, the syntax is again short.
This is left-to-right, so no problems with global variables, and no need
for complex parsers.
Also, by not specifying types, now it's more obvious to the naked eye
that there's a difference:
int foo(a; int b[a], int a);
int foo(int a, int b[a], int o);
What do you think about this syntax?
Thanks,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 13:41 ` Alejandro Colomar
@ 2022-09-03 14:35 ` Martin Uecker
2022-09-03 14:59 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-09-03 14:35 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 9/3/22 14:47, Martin Uecker wrote:
> [...]
>
> > GCC will warn if the bound is specified inconsistently between
> > declarations and also emit warnings if it can see that a buffer
> > which is passed is too small:
> >
> > https://godbolt.org/z/PsjPG1nv7
>
> That's very good news!
>
> BTW, it's nice to see that GCC doesn't need 'static' for array
> parameters. I never understood what the static keyword adds there.
> There's no way one can specify an array size and mean anything other than
> requiring that, for a non-null pointer, the array should have at least
> that size.
From the C standard's point of view,
void foo(int n, char buf[n]);
is semantically equivalent to
void foo(int, char *buf);
and without 'static' the 'n' has no further meaning
(this is different for pointers to arrays).
The static keyword implies that the pointer is valid and
non-zero and that there must be at least 'n' elements
accessible, so in some sense it is stronger (it implies
valid non-zero pointers), but at the same time it does not
imply a bound.
But I agree that 'n' without 'static' should simply imply
a bound and I think we should use it this way even when
the standard currently does not attach a meaning to it.
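A short sketch of both points, with hypothetical function names. The first function is declared with array syntax, yet it can be assigned to a function pointer that takes a plain 'char *', which shows that the bound is not part of the type. The second uses 'static' to state the stronger "non-null, at least n elements" requirement:

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax; 'buf' is adjusted to 'char *'. */
size_t count_char(int n, char buf[n], char c)
{
    size_t hits = 0;

    for (int i = 0; i < n; i++) {
        if (buf[i] == c)
            hits++;
    }
    return hits;
}

/* No cast needed: the function's type is (int, char *, char). */
size_t (*count_char_ptr)(int, char *, char) = count_char;

/* With 'static', callers promise a non-null pointer to at least n
 * accessible elements; some compilers can diagnose violations. */
size_t count_char_static(int n, char buf[static n], char c)
{
    return count_char(n, buf, c);
}
```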
> >
> > BTW: If you declare pointers to arrays (not first elements) you
> > can get run-time bounds checking with UBSan:
> >
> > https://godbolt.org/z/TvMo89WfP
>
> Couldn't that be caught at compile time? n is certainly out of bounds
> always for such an array, since the last element is n-1.
Yes, in this example it could (and ideally should) be
detected at compile time.
But this notation already today allows passing of a bound
across API boundaries and thus enables run-time detection of
out-of-bounds accesses even in scenarios where they could
not be found at compile time.
> >
> > > Also, new code can be designed from the beginning so that sizes go
> > > before their corresponding arrays, so that new code won't typically be
> > > affected by the lack of this feature in the language.
> > >
> > > This leaves us with legacy code, especially libc, which just works, and
> > > doesn't have any urgent needs to change their prototypes in this regard
> > > (they could, to improve static analysis, but not what we'd call urgent).
> >
> > It would be a useful step toward finding out-of-bounds problems in
> > applications using libc.
>
> Yep, it would be very useful for that. Not urgent, but yes, very useful.
>
>
> > > Let's take an example:
> > >
> > >
> > > int getnameinfo(const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char *restrict host, socklen_t hostlen,
> > > char *restrict serv, socklen_t servlen,
> > > int flags);
> > >
> > > and some transformations:
> > >
> > >
> > > int getnameinfo(const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char host[restrict hostlen], socklen_t hostlen,
> > > char serv[restrict servlen], socklen_t servlen,
> > > int flags);
> > >
> > >
> > > int getnameinfo(socklen_t hostlen;
> > > socklen_t servlen;
> > > const struct sockaddr *restrict addr,
> > > socklen_t addrlen,
> > > char host[restrict hostlen], socklen_t hostlen,
> > > char serv[restrict servlen], socklen_t servlen,
> > > int flags);
> > >
> > > (I'm not sure if I used correct GNU syntax, since I never used that
> > > extension myself.)
> > >
> > > The first transformation above is non-ambiguous, as concise as possible,
> > > and its only issue is that it might complicate the implementation a bit
> > > too much. I don't think forward-using a parameter's size would be too
> > > much of a parsing problem for human readers.
> >
> > I personally find the second form not terrible. Being
> > able to read code left-to-right, top-down is helpful in more
> > complicated examples.
> >
> >
> >
> > > The second one is unnecessarily long and verbose, and semicolons are not
> > > very distinguishable from commas, for human readers, which may be very
> > > confusing.
> > >
> > > int foo(int a; int b[a], int a);
> > > int foo(int a, int b[a], int o);
> > >
> > > Those two are very different to the compiler, and yet very similar to
> > > the human eye. I don't like it. The fact that it allows for simpler
> > > compilers isn't enough to overcome the readability issues.
> >
> > This is true, I would probably use it with a comma and/or
> > syntax highlighting.
> >
> >
> > > I think I'd prefer having the forward-using syntax as a non-standard
> > > extension --or a standard but optional language feature-- to avoid
> > > forcing small compilers to implement it, rather than having the GNU
> > > extension standardized in all compilers.
> >
> > The problems with the second form are:
> >
> > - it is not 100% backwards compatible (which maybe ok though) as
> > the semantics of the following code changes:
> >
> > int n;
> > int foo(int a[n], int n); // refers to different n!
> >
> > Code written for new compilers could then be misunderstood
> > by old compilers when a variable with 'n' is in scope.
> >
> >
>
> Hmmm, this one is serious. I can't seem to solve it with that syntax.
>
> > - it would generally be fundamentally new to C to have
> > backwards references and parser might need to be changes
> > to allow this
> >
> >
> > - a compiler or tool then has to deal also with ugly
> > corner cases such as mutual references:
> >
> > int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
> >
> >
> >
> > We could consider new syntax such as
> >
> > int foo(char buf[.n], int n);
> >
> >
> > Personally, I would prefer the conceptual simplicity of forward
> > declarations and the fact that these exist already in GCC
> > over any alternative. I would also not mind new syntax, but
> > then one has to define the rules more precisely to avoid the
> > aforementioned problems.
>
> What about taking something from K&R functions for this?:
>
> int foo(q; w; int a[q], int q, int s[w], int w);
>
> By not specifying the types, the syntax is again short.
> This is left-to-right, so no problems with global variables, and no need
> for complex parsers.
> Also, by not specifying types, now it's more obvious to the naked eye
> that there's a difference:
I am ok with the syntax, but I am not sure how this would
work. If the type is determined only later you would still
have to change parsers (some C compilers do type
checking and folding during parsing, so need the types
to be known during parsing) and you also still have the
problem with the mutual dependencies.
We thought about using this syntax
int foo(char buf[.n], int n);
because it is new syntax which means we can restrict the
size to be the name of a parameter instead of allowing
arbitrary expressions, which then makes forward references
less problematic. It is also consistent with designators in
initializers and could also be extended to annotate
flexible array members or pointers to arrays stored
in structures:
struct {
int n;
char buf[.n];
};
struct {
int n;
char (*buf)[.n];
};
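Since the '[.n]' notation is only a proposal and does not compile today, the following sketch shows what it would annotate, using current C and hypothetical names (counted_buf and its helpers). Today the link between 'n' and 'buf' exists only by convention, and the bounds check has to be written by hand:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Today's spelling: nothing ties 'buf' to 'n' except convention.
 * The proposed 'char buf[.n];' would make that link explicit. */
struct counted_buf {
    int  n;
    char buf[];     /* flexible array member holding n bytes */
};

/* Allocates a zero-filled buffer of n bytes; returns NULL on failure. */
struct counted_buf *counted_buf_new(int n)
{
    struct counted_buf *cb;

    if (n < 0)
        return NULL;
    cb = calloc(1, sizeof(*cb) + (size_t)n);
    if (cb != NULL)
        cb->n = n;
    return cb;
}

/* Bounds check written by hand; returns -1 when i is out of range. */
int counted_buf_set(struct counted_buf *cb, int i, char c)
{
    assert(cb != NULL);
    if (i < 0 || i >= cb->n)
        return -1;
    cb->buf[i] = c;
    return 0;
}
```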
Martin
>
> int foo(a; int b[a], int a);
> int foo(int a, int b[a], int o);
>
>
> What do you think about this syntax?
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 14:35 ` Martin Uecker
@ 2022-09-03 14:59 ` Alejandro Colomar
2022-09-03 15:31 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-03 14:59 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 3878 bytes --]
Hi Martin,
On 9/3/22 16:35, Martin Uecker wrote:
> Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 9/3/22 14:47, Martin Uecker wrote:
>> [...]
>>
>>> GCC will warn if the bound is specified inconsistently between
>>> declarations and also emit warnings if it can see that a buffer
>>> which is passed is too small:
>>>
>>> https://godbolt.org/z/PsjPG1nv7
>>
>> That's very good news!
>>
>> BTW, it's nice to see that GCC doesn't need 'static' for array
>> parameters. I never understood what the static keyword adds there.
>> There's no way one can specify an array size and mean anything other than
>> requiring that, for a non-null pointer, the array should have at least
>> that size.
>
> From the C standard's point of view,
>
> void foo(int n, char buf[n]);
>
> is semantically equivalent to
>
> void foo(int, char *buf);
>
> and without 'static' the 'n' has no further meaning
> (this is different for pointers to arrays).
I know. I just don't understand the rationale for that decision. :/
>
> The static keyword implies that the pointer is valid and
> non-zero and that there must be at least 'n' elements
> accessible, so in some sense it is stronger (it implies
> valid non-zero pointers), but at the same time it does not
> imply a bound.
That stronger meaning, I think, is a mistake by the standard.
Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
What the standard should have done would be to keep those two things
separate, since one may want to declare non-null non-array pointers, or
possibly-null array ones. So the standard should have standardized some
form of nonnull for that. But the recent discussion about presenting
nonnull pointers as [static 1] is horrible. But let's wait till the
future hopefully fixes this.
>
> But I agree that 'n' without 'static' should simply imply
> a bound and I think we should use it this way even when
> the standard currently does not attach a meaning to it.
Yep.
[...]
>> What about taking something from K&R functions for this?:
>>
>> int foo(q; w; int a[q], int q, int s[w], int w);
>>
>> By not specifying the types, the syntax is again short.
>> This is left-to-right, so no problems with global variables, and no need
>> for complex parsers.
>> Also, by not specifying types, now it's more obvious to the naked eye
>> that there's a difference:
>
> I am ok with the syntax, but I am not sure how this would
> work. If the type is determined only later you would still
> have to change parsers (some C compilers do type
> checking and folding during parsing, so need the types
> to be known during parsing) and you also still have the
> problem with the mutual dependencies.
This syntax closely resembles K&R syntax. Any C compiler that supports
K&R declarations (and I guess most compilers out there do) should be easily
convertible to support this syntax (at least more easily than other
alternatives). But this is just a guess.
>
> We thought about using this syntax
>
> int foo(char buf[.n], int n);
>
> because it is new syntax which means we can restrict the
> size to be the name of a parameter instead of allowing
> arbitrary expressions, which then makes forward references
> less problematic. It is also consistent with designators in
> initializers and could also be extended to annotate
> flexible array members or for storing pointers to arrays
> in structures:
It's not crazy. I don't have much to argue against it.
>
> struct {
> int n;
> char buf[.n];
> };
>
> struct {
> int n;
> char (*buf)[.n];
> };
Perhaps some doubts about how this would work for nested structures, but
not unreasonable.
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 13:29 ` Ingo Schwarze
@ 2022-09-03 15:08 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-03 15:08 UTC (permalink / raw)
To: Ingo Schwarze; +Cc: Martin Uecker, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1460 bytes --]
Hi Ingo,
On 9/3/22 15:29, Ingo Schwarze wrote:
> the only point i strongly care about is this one:
>
> Manual pages should not use
> * non-standard syntax
> * non-portable syntax
> * ambiguous syntax (i.e. syntax that might have different meanings
> with different compilers or in different contexts)
> * syntax that might be invalid or dangerous with some widely
> used compiler collections like GCC or LLVM
The first two are good guidelines, but not strict IMHO if there's a good
reason.
The third and fourth are strong requirements.
For now I won't be applying this patch.
>
> Regarding the discussions about standardization and extensions,
> all proposals i have seen look seriously ugly and awkward to me,
> and i'm not yet convinced such ugliness is sufficiently offset by
> the relatively minor benefit that is apparent to me right now.
I hope we come up with something not ugly from that discussion.
The static analysis / compiler warning capabilities of using VLA syntax
seem strong reasons to me. They help avoid stupid bugs, even for
careless programmers (well, only if those careless programmers care just
enough to enable -Wall, and then to read the warnings). Not something
that will fix an incorrect algorithm, but can stop some typos, or other
stupid mistakes that we all do from time to time.
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 14:59 ` Alejandro Colomar
@ 2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
0 siblings, 2 replies; 85+ messages in thread
From: Martin Uecker @ 2022-09-03 15:31 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Alejandro,
Am Samstag, den 03.09.2022, 16:59 +0200 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 9/3/22 16:35, Martin Uecker wrote:
> > Am Samstag, den 03.09.2022, 15:41 +0200 schrieb Alejandro Colomar:
> > > Hi Martin,
> > >
> > > On 9/3/22 14:47, Martin Uecker wrote:
> > > [...]
> > >
> > > > GCC will warn if the bound is specified inconsistently between
> > > > declarations and also emit warnings if it can see that a buffer
> > > > which is passed is too small:
> > > >
> > > > https://godbolt.org/z/PsjPG1nv7
> > >
> > > That's very good news!
> > >
> > > BTW, it's nice to see that GCC doesn't need 'static' for array
> > > parameters. I never understood what the static keyword adds there.
> > > There's no way one can specify an array size and mean anything other than
> > > requiring that, for a non-null pointer, the array should have at least
> > > that size.
> >
> > From the C standard's point of view,
> >
> > void foo(int n, char buf[n]);
> >
> > is semantically equivalent to
> >
> > void foo(int, char *buf);
> >
> > and without 'static' the 'n' has no further meaning
> > (this is different for pointers to arrays).
>
> I know. I just don't understand the rationale for that decision. :/
I guess it made sense in the past, but is simply not
what we need today.
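
To illustrate the equivalence quoted above, here is a minimal sketch in
standard C. The names fill and fill_ptr are made up for this example and do
not come from the thread. Because the array parameter is adjusted to a plain
pointer, a function declared with 'char buf[n]' can be assigned to a function
pointer whose second parameter is 'char *' without any conversion:

```c
#include <assert.h>
#include <stddef.h>

/* Declared with a VLA-style bound; the standard adjusts buf to 'char *'. */
size_t fill(int n, char buf[n]);

/* Because of that adjustment, this pointer type is compatible as-is. */
size_t (*const fill_ptr)(int, char *) = fill;

/* Hypothetical helper: writes 'x' into the first n bytes of buf. */
size_t fill(int n, char buf[n])
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        buf[i] = 'x';
    return (size_t)n;
}
```

As far as the language is concerned, the 'n' in the bound is documentation
only; any checking of it comes from compiler diagnostics, not from the type.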
> > The static keyword implies that the pointer is valid and
> > non-zero and that there must be at least 'n' elements
> > accessible, so in some sense it is stronger (it implies
> > valid non-zero pointers), but at the same time it does not
> > imply a bound.
>
> That stronger meaning, I think, is a mistake by the standard.
> Basically, [static n] means the same as [n] combined with [[gnu::nonnull]].
> What the standard should have done would be to keep those two things
> separate, since one may want to declare non-null non-array pointers, or
> possibly-null array ones. So the standard should have standardized some
> form of nonnull for that.
I agree the situation is not good.
> But the recent discussion about presenting
> nonnull pointers as [static 1] is horrible. But let's wait till the
> future hopefully fixes this.
yes, [static 1] is problematic because then the number
can not be used as a bound anymore.
My experience is that if one wants to see something fixed,
one has to push for it. Standardization is meant
to standardize existing practice, so if we want to see
this improved, we can not wait for this.
> > But I agree that 'n' without 'static' should simply imply
> > a bound and I think we should use it this way even when
> > the standard currently does not attach a meaning to it.
>
> Yep.
>
> [...]
>
> > > What about taking something from K&R functions for this?:
> > >
> > > int foo(q; w; int a[q], int q, int s[w], int w);
> > >
> > > By not specifying the types, the syntax is again short.
> > > This is left-to-right, so no problems with global variables, and no need
> > > for complex parsers.
> > > Also, by not specifying types, now it's more obvious to the naked eye
> > > that there's a difference:
> >
> > I am ok with the syntax, but I am not sure how this would
> > work. If the type is determined only later you would still
> > have to change parsers (some C compilers do type
> > checking and folding during parsing, so need the types
> > to be known during parsing) and you also still have the
> > problem with the mutual dependencies.
>
> This syntax closely resembles K&R syntax. Any C compiler that supports
> it (and I guess most compilers out there do) should be easily
> convertible to support this syntax (at least more easily than other
> alternatives). But this is just a guess.
In K&R syntax this worked for definitions:
void foo(y, n)
int n;
int y[n];
{ ...
But this worked because you could reorder the
declarations so that later declarations could
refer to previous ones.
So one could do
int foo(int n, char buf[n]; buf, n);
where the second part defines the order of
the parameter or
int foo(buf, n; int n, char buf[n]);
where the first part defines the order,
but the declarations need to have the size
first. But then you need to specify each
parameter twice...
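
For comparison, GNU C already has a forward declaration of parameters that
keeps the array first while still declaring the size before use. The sketch
below is only an illustration: count_byte is a made-up function, and the
preprocessor guard is an assumption that the extension is available in GCC
but not in Clang, with a plain-pointer fallback that keeps the same
parameter order:

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU C extension: 'size_t n;' forward-declares the later parameter n. */
size_t count_byte(size_t n; const char buf[n], size_t n, char c)
#else
/* Fallback for other compilers: same order, but the bound is lost. */
size_t count_byte(const char *buf, size_t n, char c)
#endif
{
    size_t hits = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == c)
            hits++;
    }
    return hits;
}
```

Note that 'n' is written twice in the GNU form, which is the same cost as in
the two hypothetical notations above.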
> > We thought about using this syntax
> >
> > int foo(char buf[.n], int n);
> >
> > because it is new syntax which means we can restrict the
> > size to be the name of a parameter instead of allowing
> > arbitrary expressions, which then makes forward references
> > less problematic. It is also consistent with designators in
> > initializers and could also be extended to annotate
> > flexible array members or for storing pointers to arrays
> > in structures:
>
> It's not crazy. I don't have much to argue against it.
>
> > struct {
> > int n;
> > char buf[.n];
> > };
> >
> > struct {
> > int n;
> > char (*buf)[.n];
> > };
>
> Perhaps some doubts about how this would work for nested structures, but
> not unreasonable.
It is not implemented though...
Martin
> Cheers,
>
> Alex
>
> --
> Alejandro Colomar
> <http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 15:31 ` Martin Uecker
@ 2022-09-03 20:02 ` Alejandro Colomar
2022-09-05 14:31 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-03 20:02 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2782 bytes --]
Hi Martin,
On 9/3/22 17:31, Martin Uecker wrote:
[...]
>> But the recent discussion about presenting
>> nonnull pointers as [static 1] is horrible. But let's wait till the
>> future hopefully fixes this.
>
> yes, [static 1] is problematic because then the number
> can not be used as a bound anymore.
>
> My experience is that if one wants to see something fixed,
> one has to push for it. Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.
>
Yeah, I'm not just waiting to see if it gets fixed on its own. I've been
discussing adding nonnull to the standard, or improving it in the
compilers, but so far no compiler has something convincing. GCC's
attribute is problematic due to UB issues, and Clang's _Nonnull keyword
is useless as of now:
<https://github.com/llvm/llvm-project/issues/57546>
Maybe GCC could add Clang's _Nonnull (and maybe _Nullable and the
pragmas, but definitely not _Null_unspecified), and add some good warnings.
Only then it would make sense to try to standardize the feature.
[...]
> In K&R syntax this worked for definition:
>
> void foo(y, n)
> int n;
> int y[n];
> { ...
>
> But this worked because you could reorder the
> declarations so that later declarations could
> refer to previous ones.
>
> So one could do
>
> int foo(int n, char buf[n]; buf, n);
>
> where the second part defines the order of
> the parameter or
>
> int foo(buf, n; int n, char buf[n]);
>
> where the first part defines the order,
> but the declarations need to have the size
> first. But then you need to specify each
> parameter twice...
Hmm, yeah, maybe the [.n] notation makes more sense.
>
>
>>> We thought about using this syntax
>>>
>>> int foo(char buf[.n], int n);
>>>
>>> because it is new syntax which means we can restrict the
>>> size to be the name of a parameter instead of allowing
>>> arbitrary expressions, which then makes forward references
>>> less problematic. It is also consistent with designators in
>>> initializers and could also be extended to annotate
>>> flexible array members or for storing pointers to arrays
>>> in structures:
>>
>> It's not crazy. I don't have much to argue against it.
>>
>>> struct {
>>> int n;
>>> char buf[.n];
>>> };
>>>
>>> struct {
>>> int n;
>>> char (*buf)[.n];
>>> };
>>
>> Perhaps some doubts about how this would work for nested structures, but
>> not unreasonable.
>
> It is not implemented though...
Well, are you planning to implement it?
If you do, I'm very interested in using it in the documentation ;)
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 20:02 ` Alejandro Colomar
@ 2022-09-05 14:31 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-09-05 14:31 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 730 bytes --]
Hi Martin,
On 9/3/22 22:02, Alejandro Colomar wrote:
>>>> We thought about using this syntax
>>>>
>>>> int foo(char buf[.n], int n);
BTW, it would be useful if this syntax was accepted for void * too,
especially since GNU C allows pointer arithmetic on void *.
void *memmove(void dest[.n], const void src[.n], size_t n);
I understand that a void array doesn't make sense, so defining a VLA of
type void is an error elsewhere, but since array parameters are not
really arrays, and instead pointers, this could be reasonable.
In the same way, these "arrays" can have zero sizes, or even negative ones
in some weird cases.
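
As a point of reference, this is roughly what can be written today without
the proposed notation: the size has to come first, and the element type has
to be something other than void. A minimal sketch, where move_bytes is a
hypothetical wrapper and not an existing function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around memmove(3).  With current syntax the bound
 * can only be written with n first and a non-void element type.
 */
void *move_bytes(size_t n, unsigned char dest[n], const unsigned char src[n])
{
    assert((dest != NULL && src != NULL) || n == 0);
    if (n == 0)
        return dest;
    return memmove(dest, src, n);
}
```

The proposed 'void dest[.n]' form would let the real prototype keep both its
parameter order and its void pointers.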
Cheers,
Alex
--
Alejandro Colomar
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
@ 2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
` (2 more replies)
1 sibling, 3 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 0:06 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1792 bytes --]
Hi Martin,
On 9/3/22 17:31, Martin Uecker wrote:
> My experience is that if one wants to see something fixed,
> one has to push for it. Standardization is meant
> to standardize existing practice, so if we want to see
> this improved, we can not wait for this.
I fully agree with you. I've been ruminating on these patches for some time, to
have some more time to think about them. Now, I like them enough to push.
So, after a few minor cosmetic issues detected by some linters, I've pushed the
changes to document all of man2 and man3 with hypothetical VLA syntax.
Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I
don't plan to release again in a year or two, so there's time to do the
implementation in GCC. From my side, please consider this an ACK or even
somewhat of a push to get things done in the compiler side of things :)
I'll show here an excerpt of what kind of syntax has been pushed. Of course,
there's room for improving/fixing, since it's not seen an official release, but
for now, this is what's up there:
int strncmp(const char s1[.n], const char s2[.n], size_t n);
long mbind(void addr[.len], unsigned long len, int mode,
const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
/ ULONG_WIDTH],
unsigned long maxnode, unsigned int flags);
int cacheflush(void addr[.nbytes], int nbytes, int cache);
I've shown the three kinds of prototypes that have been changed:
- Normal VLA; nothing fancy except for the '.'.
- Complex size expressions.
- 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
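
To make the "complex size expression" case concrete, here is a small sketch
of what the mbind(2) bound computes: the number of unsigned long words needed
to hold maxnode bits. The helpers mask_words and count_nodes are invented for
this illustration, the size is placed first so that it is valid C today, and
the ULONG_WIDTH fallback is an assumption for pre-C23 headers:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#ifndef ULONG_WIDTH   /* C23 <limits.h>; fallback for older headers */
#define ULONG_WIDTH (sizeof(unsigned long) * CHAR_BIT)
#endif

/* Number of unsigned long words needed for maxnode bits (rounds up). */
unsigned long mask_words(unsigned long maxnode)
{
    return (maxnode + ULONG_WIDTH - 1) / ULONG_WIDTH;
}

/* Counts the bits set among the first maxnode bits of nodemask. */
unsigned long count_nodes(unsigned long maxnode,
        const unsigned long nodemask[(maxnode + ULONG_WIDTH - 1) / ULONG_WIDTH])
{
    unsigned long count = 0;

    assert(nodemask != NULL || maxnode == 0);
    for (unsigned long bit = 0; bit < maxnode; bit++) {
        if ((nodemask[bit / ULONG_WIDTH] >> (bit % ULONG_WIDTH)) & 1UL)
            count++;
    }
    return count;
}
```

The rounding expression wraps around for values of maxnode close to
ULONG_MAX; this sketch does not guard against that.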
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
@ 2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
2022-11-10 9:40 ` G. Branden Robinson
2 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 0:09 UTC (permalink / raw)
To: Martin Uecker; +Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2058 bytes --]
On 11/10/22 01:06, Alejandro Colomar wrote:
> Hi Martin,
>
> On 9/3/22 17:31, Martin Uecker wrote:
>> My experience is that if one wants to see something fixed,
>> one has to push for it. Standardization is meant
>> to standardize existing practice, so if we want to see
>> this improved, we can not wait for this.
>
> I fully agree with you. I've been ruminating on these patches for some time, to
> have some more time to think about them. Now, I like them enough to push. So,
> after a few minor cosmetic issues detected by some linters, I've pushed the
> changes to document all of man2 and man3 with hypothetical VLA syntax.
>
> Now, I've released man-pages-6.01 very recently (just a few weeks ago), and I
> don't plan to release again in a year or two, so there's time to do the
> implementation in GCC. From my side, please consider this an ACK or even
> somewhat of a push to get things done in the compiler side of things :)
>
> I'll show here an excerpt of what kind of syntax has been pushed. Of course,
> there's room for improving/fixing, since it's not seen an official release, but
> for now, this is what's up there:
>
>
> int strncmp(const char s1[.n], const char s2[.n], size_t n);
>
> long mbind(void addr[.len], unsigned long len, int mode,
> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
> / ULONG_WIDTH],
> unsigned long maxnode, unsigned int flags);
>
> int cacheflush(void addr[.nbytes], int nbytes, int cache);
>
>
> I've shown the three kinds of prototypes that have been changed:
>
> - Normal VLA; nothing fancy except for the '.'.
> - Complex size expressions.
> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
Oops: sizeof(void)==1
>
>
> Cheers,
>
> Alex
>
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
@ 2022-11-10 1:33 ` Joseph Myers
2022-11-10 1:39 ` Joseph Myers
2022-11-10 9:40 ` G. Branden Robinson
2 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-10 1:33 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> I've shown the three kinds of prototypes that have been changed:
>
> - Normal VLA; nothing fancy except for the '.'.
> - Complex size expressions.
> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
That doesn't cover any of the tricky issues with such proposals, such as
the choice of which entity is referred to by the parameter name when there
are multiple nested parameter lists that use the same parameter name, or
when the identifier is visible from an outer scope (including in
particular the case where it's declared as a typedef name in an outer
scope).
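
A minimal sketch of the outer-scope case in today's C may help here. The
names are made up for illustration. Because the bound is parsed before the
parameter 'n' is declared, it silently refers to the file-scope 'n':

```c
#include <assert.h>
#include <stddef.h>

enum { n = 4 };   /* an unrelated 'n' visible at file scope */

/*
 * The bound in 'char (*row)[n]' is resolved before the parameter 'n' is
 * declared, so it names the enumeration constant above, not the parameter.
 */
size_t row_size(char (*row)[n], size_t n)
{
    (void)n;               /* the parameter plays no role in the bound */
    return sizeof(*row);   /* always 4, whatever the caller passes as n */
}
```

Any rule that lets a bound refer to a later parameter has to say what happens
to declarations like this one.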
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 1:33 ` Joseph Myers
@ 2022-11-10 1:39 ` Joseph Myers
2022-11-10 6:21 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-10 1:39 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
>
> > I've shown the three kinds of prototypes that have been changed:
> >
> > - Normal VLA; nothing fancy except for the '.'.
> > - Complex size expressions.
> > - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
>
> That doesn't cover any of the tricky issues with such proposals, such as
> the choice of which entity is referred to by the parameter name when there
> are multiple nested parameter lists that use the same parameter name, or
> when the identifier is visible from an outer scope (including in
> particular the case where it's declared as a typedef name in an outer
> scope).
In fact I can't tell from these examples whether you mean for a '.' token
after '[' to have special semantics, or whether you mean to have a special
'. identifier' form of expression valid in certain contexts (each of which
introduces its own complications; for the former, typedef names from outer
scopes are problematic; for the latter, it's designated initializers where
you get complications, for example). Designing new syntax that doesn't
cause ambiguity is generally tricky, and this sort of language extension
is the kind of thing where you'd expect to go through at least five
iterations of a WG14 paper before you have something like a sound
specification.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 1:39 ` Joseph Myers
@ 2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
0 siblings, 2 replies; 85+ messages in thread
From: Martin Uecker @ 2022-11-10 6:21 UTC (permalink / raw)
To: Joseph Myers, Alejandro Colomar
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Joseph Myers wrote:
>
> > On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
> >
> > > I've shown the three kinds of prototypes that have been changed:
> > >
> > > - Normal VLA; nothing fancy except for the '.'.
> > > - Complex size expressions.
> > > - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
> >
> > That doesn't cover any of the tricky issues with such proposals, such as
> > the choice of which entity is referred to by the parameter name when there
> > are multiple nested parameter lists that use the same parameter name, or
> > when the identifier is visible from an outer scope (including in
> > particular the case where it's declared as a typedef name in an outer
> > scope).
>
> In fact I can't tell from these examples whether you mean for a '.' token
> after '[' to have special semantics, or whether you mean to have a special
> '. identifier' form of expression valid in certain contexts (each of which
> introduces its own complications; for the former, typedef names from outer
> scopes are problematic; for the latter, it's designated initializers where
> you get complications, for example). Designing new syntax that doesn't
> cause ambiguity is generally tricky, and this sort of language extension
> is the kind of thing where you'd expect to go through at least five
> iterations of a WG14 paper before you have something like a sound
> specification.
I am not sure what Alejandro has in mind exactly, but my idea of using
a new notation [.identifier] would be to limit it to accessing other
parameter names in the same parameter list only, so that there is
1) no ambiguity about what is referred to and
2) one can access parameters which come later
If we want to specify something like this, I think we should also
restrict what kind of expressions one allows, e.g. it has to
be side-effect free. But maybe we want to make this even more
restrictive (at least initially).
One problem with WG14 papers is that people put in too much,
because the overhead is so high and the standard is not updated
very often. It would be better to build such a feature more
incrementally, which could be done more easily with a compiler
extension. One could start supporting just [.x] but not more
complicated expressions.
Later WG14 can still accept or reject or modify this proposal
based on the experience we get.
(I would also be happy with using GNU forward declarations, and
I am not sure why people dislike them so much.)
Martin
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
@ 2022-11-10 9:40 ` G. Branden Robinson
2022-11-10 10:59 ` Alejandro Colomar
2 siblings, 1 reply; 85+ messages in thread
From: G. Branden Robinson @ 2022-11-10 9:40 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1: Type: text/plain, Size: 944 bytes --]
Hi Alex,
At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
> Now, I've released man-pages-6.01 very recently (just a few weeks
> ago), and I don't plan to release again in a year or two, so there's
> time to do the implementation in GCC. From my side, please consider
> this an ACK or even somewhat of a push to get things done in the
> compiler side of things :)
Do you mean you _don't_ plan to release again for a year or two?
You know what Moltke said about plans and contact with the enemy. For
one thing, I think the Linux kernel will move too fast to permit such a
leisurely cadence.
Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
you will, shortly thereafter, migrate to the new `MR` macro.
<tents fingers, laughs villainously>
Regards,
Branden
[1] Only 6 RC bugs left!
https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 6:21 ` Martin Uecker
@ 2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
1 sibling, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:09 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 6277 bytes --]
Hi Joseph and Martin!
On 11/10/22 07:21, Martin Uecker wrote:
> Am Donnerstag, den 10.11.2022, 01:39 +0000 schrieb Joseph Myers:
>> On Thu, 10 Nov 2022, Joseph Myers wrote:
>>
>>> On Thu, 10 Nov 2022, Alejandro Colomar via Gcc wrote:
>>>
>>>> I've shown the three kinds of prototypes that have been changed:
>>>>
>>>> - Normal VLA; nothing fancy except for the '.'.
>>>> - Complex size expressions.
>>>> - 'void *' VLAs (assuming GNU conventions: sizeof(void *)==1).
>>>
>>> That doesn't cover any of the tricky issues with such proposals, such as
>>> the choice of which entity is referred to by the parameter name when there
>>> are multiple nested parameter lists that use the same parameter name, or
>>> when the identifier is visible from an outer scope (including in
>>> particular the case where it's declared as a typedef name in an outer
>>> scope).
>>
>> In fact I can't tell from these examples whether you mean for a '.' token
>> after '[' to have special semantics, or whether you mean to have a special
>> '. identifier' form of expression valid in certain contexts (each of which
>> introduces its own complications; for the former, typedef names from outer
>> scopes are problematic; for the latter, it's designated initializers where
>> you get complications, for example). Designing new syntax that doesn't
>> cause ambiguity is generally tricky, and this sort of language extension
>> is the kind of thing where you'd expect to go through at least five
>> iterations of a WG14 paper before you have something like a sound
>> specification.
>
> I am not sure what Alejandro has in mind exactly, but my idea of using
> a new notation [.identifier] would be to limit it to accessing other
> parameter names in the same parameter list only, so that there is
>
> 1) no ambiguity about what is referred to and
> 2) one can access parameters which come later
Yes, I implemented your idea. As always, I thought I had linked to it in the
commit message, but I didn't. Quite a bad thing for the commit that implements
a completely new feature to not point to the documentation/idea at all.
So, the documentation followed by these 3 patches is Martin's email:
<https://lore.kernel.org/linux-man/601680ae-30d7-1481-e152-034083f6dde1@gmail.com/T/#med2bdfcc31a3d0b3bc6c48b229c8d8dd5088935e>
It was sound in my head, and I couldn't see any inconsistencies.
- I implemented it with '.' as being restricted to refer to parameters of the
function being prototyped (commit 1).
- I also allowed complex expressions in the prototypes (commit 2), since it's
something that can be quite useful (that was already foreseen by Martin's idea,
IIRC). The most useful example that I have in my mind is a patch that I'm
developing for shadow-utils:
<https://github.com/shadow-maint/shadow/pull/569/files#diff-12b560bab6b4fb8f7f3a16f01aaa994de539a8bed3058c976be0daebe16405c1>
The gist of it is a function that gets a fixed-width non-NUL-terminated
string, and copies it into a NUL-terminated string in a buffer that of course
has to be one byte larger than the input string:
void buf2str(char dst[restrict .n+1], const char src[restrict .n],
size_t n);
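
For readers who want to try this with a current compiler, a rough sketch in
today's syntax follows. It moves n to the front so that the bounds are legal
C, and the body is my own guess at the obvious implementation, not the code
from the pull request linked above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch only: copies the n bytes of src (not NUL-terminated) into dst and
 * terminates the result, so dst needs room for n + 1 bytes.
 */
void buf2str(size_t n, char dst[restrict n + 1], const char src[restrict n])
{
    assert(dst != NULL && (src != NULL || n == 0));
    if (n > 0)
        memcpy(dst, src, n);
    dst[n] = '\0';
}
```

A call such as buf2str(sizeof(raw), out, raw), with char out[sizeof(raw) + 1],
then yields an ordinary C string.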
- I extended the idea to apply to void[] (commit 3). Something not yet allowed
by GCC, but very useful IMO, especially for the mem...(3) functions. Since GNU
C consistently treats sizeof(void)==1, it makes sense to allow VLA syntax in
that way. This is not at all about allowing true VLAs of type void[]; that's
forbidden, and should continue to be forbidden. But since parameters are just
pointers, I don't see any issue with allowing false void[] VLAs in parameters
that really are void* in disguise.
The 3 commits are here (last 3 commits in that log):
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?id=c64cd13e002561c6802c6a1a1a8a640f034fea70>
Martin, please check if I implemented your idea faithfully. The 3 example
prototypes I showed are good representatives of what I added, so if you don't
understand man(7) source you could just read them and see if they make sense to
you; the rest of the changes are of the same kind. Or you could install the man
pages from the repo :)
>
> If we want to specify something like this, I think we should also
> restrict what kind of expressions one allows, e.g. it has to
> be side-effect free.
Well, yes, there should be no side effects; it would not make sense in a
prototype. I'd put it as simply as with _Generic(3) and similar stuff, where
the controlling expression is not evaluated for side effects. I never remember
about sizeof() or typeof(): I always need to check whether they have side effects
or not. I'll be documenting that in the man-pages soon.
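
As a reminder of the sizeof rule, here is a small sketch with made-up
function names. sizeof leaves its operand unevaluated unless the operand has
a variable length array type, in which case the size expression is evaluated:

```c
#include <assert.h>
#include <stddef.h>

/* Non-VLA operand: i++ is not evaluated, so i keeps its value. */
int after_sizeof_plain(void)
{
    int i = 1;
    size_t size = sizeof(i++);

    assert(size == sizeof(int));
    return i;   /* still 1 */
}

/* VLA type operand: the size expression i++ is evaluated. */
int after_sizeof_vla(void)
{
    int i = 1;
    size_t size = sizeof(int[i++]);

    assert(size == sizeof(int));
    return i;   /* now 2 */
}
```

Some compilers warn about the side effect in the first function, which is the
point of the example.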
> But maybe we want to make this even more
> restrictive (at least initially).
Yeah, you could go for an initial implementation that only supports my commit 1;
that would be the simplest. That would cover already the vast majority of
cases. But please consider commits 2 and 3 afterwards, since I believe they are
also of great importance.
>
> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often. It would be better to build such a feature more
> incrementally, which could be done more easily with a compiler
> extension. One could start supporting just [.x] but not more
> complicated expressions.
>
> Later WG14 can still accept or reject or modify this proposal
> based on the experience we get.
Yeah, and I also think any WG14 papers with features as important as this one
without prior experience in a real compiler should be rejected. I don't think
it makes sense to standardize something just from theoretical discussions, and
force everyone to implement it afterwards. No matter how good the reviewers are.
>
> (I would also be happy with using GNU forward declarations, and
> I am not sure why people dislike them so much.)
For me, it's how easy it is to confuse a comma with a semicolon. Also,
unnecessarily long lines.
>
> Martin
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 9:40 ` G. Branden Robinson
@ 2022-11-10 10:59 ` Alejandro Colomar
2022-11-10 17:47 ` Alejandro Colomar
2022-11-10 22:25 ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters G. Branden Robinson
0 siblings, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 10:59 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1807 bytes --]
Hi Branden!
On 11/10/22 10:40, G. Branden Robinson wrote:
> Hi Alex,
>
> At 2022-11-10T01:06:31+0100, Alejandro Colomar wrote:
>> Now, I've released man-pages-6.01 very recently (just a few weeks
>> ago), and I don't plan to release again in a year or two, so there's
>> time to do the implementation in GCC. From my side, please consider
>> this an ACK or even somewhat of a push to get things done in the
>> compiler side of things :)
>
> Do you mean you _don't_ plan to release again for a year or two?
>
> You know what Moltke said about plans and contact with the enemy. For
> one thing, I think the Linux kernel will move too fast to permit such a
> leisurely cadence.
Heh, at this point, I burnt my ships, by using enhanced VLA syntax. If I
release that before GCC, I'm expecting to see an avalanche of reports about it
(and I also expect that GCC and forums will receive a similar amount). So yes,
I expect to wait some longish time.
>
> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
> you will, shortly thereafter, migrate to the new `MR` macro.
Not as soon as it gets released, because I expect (at least a decent amount of)
contributors to be able to read the pages to which they contribute, but as
soon as it makes it into Debian stable, yes, that's in my plans. So, if you
make it before the freeze, that means around a couple of months from now.
>
> <tents fingers, laughs villainously>
<also tents fingers, laughs villainously>
>
> Regards,
> Branden
>
> [1] Only 6 RC bugs left!
Looks good!
Cheers,
Alex
>
> https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&set=custom&report_id=225&status_id=1&plan_release_id=103
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 10:59 ` Alejandro Colomar
@ 2022-11-10 17:47 ` Alejandro Colomar
2022-11-10 18:04 ` MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters) Alejandro Colomar
2022-11-10 22:25 ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters G. Branden Robinson
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 17:47 UTC (permalink / raw)
To: G. Branden Robinson; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 2616 bytes --]
[removed gcc@ and other uninterested people; added groff@]
Hi Branden!
On 11/10/22 11:59, Alejandro Colomar wrote:
>> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
>> you will, shortly thereafter, migrate to the new `MR` macro.
>
> Not as soon as it gets released, because I expect (at least a decent amount of)
> contributors to be able to read the pages to which they contribute to, but as
> soon as it makes it into Debian stable, yes, that's in my plans. So, if you
> make it before the freeze, that means around a couple of months from now.
I won't be applying the patch now, so that contributors don't suddenly stop
seeing man page references while preparing patches. But I'll start
preparing the patch, to see where the most difficult parts are. And maybe
report some issues with the usability.
My first thing was to run:
$ grep -rn '^\.BR .* ([1-9]\w*)'
I'm pleasantly surprised that there seem to be no false positives. I
didn't expect that. But since things like exit(1) are code, they are probably
either not highlighted at all, or maybe are italicized (as code is). So that's
a good thing.
It showed a few lines that might be problematic, but that's actually bad code,
which I need to fix:
man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
Those are asking for a 2-line thing, where the second line is RB instead of BR.
Which reminds me to check RB:
$ grep -rn '^\.RB .* ([1-9]\w*)'
There are far fewer cases, and they also seem to be fine to script, with a few
minor ffixes too.
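A rough sketch of the two-line fix for the credentials.7 lines above, written as a filter on stdin rather than an in-place edit. It assumes GNU sed (for \w and for \n in the replacement) and that the lines have exactly the shape grep reported; split_double_ref is a made-up helper name, not something from the man-pages tree.

```shell
# Sketch only: split '.BR a "(N) (" b (M))' into a BR line followed by an RB line.
# Assumes GNU sed; split_double_ref is a hypothetical helper name.
split_double_ref() {
    sed 's/^\.BR \([^ ]*\) "(\([1-9]\w*\)) (" \([^ ]*\) (\([1-9]\w*\)))$/.BR \1 (\2)\n.RB ( \3 (\4))/'
}

# Preview on one of the reported lines; nothing is written to disk.
printf '%s\n' '.BR setuid "(2) (" setgid (2))' | split_double_ref
```

The two resulting lines should render the same as the original single line, since the line break between them is treated as a word space.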
The big issue is that your MR doesn't support leading text:
.MR page‐title manual‐section [trailing‐text]
I remember we had this discussion about what to do with it. A 4th argument?
There's also conflict with a hypothetical link that we might want to add later.
My opinion is that the 4th argument should be the leading text. Asking to use
the escape sequence (was it \c?) to work around that limitation is not very nice.
Especially for scripting the change.
If you want a 5th argument for a URI, you can specify the leading text as "",
which is not much of an issue. And you keep the trailing text and the leading
one together.
What are your thoughts? What should we do?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 17:47 ` Alejandro Colomar
@ 2022-11-10 18:04 ` Alejandro Colomar
2022-11-10 18:11 ` Alejandro Colomar
` (2 more replies)
0 siblings, 3 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 18:04 UTC (permalink / raw)
To: G. Branden Robinson, groff; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 3023 bytes --]
Of course I forgot to rename the title, and to add groff@. Nice.
-------- Forwarded Message --------
Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
Date: Thu, 10 Nov 2022 18:47:38 +0100
From: Alejandro Colomar <alx.manpages@gmail.com>
To: G. Branden Robinson <g.branden.robinson@gmail.com>
CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
[removed gcc@ and other uninterested people; added groff@]
Hi Branden!
On 11/10/22 11:59, Alejandro Colomar wrote:
>> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
>> you will, shortly thereafter, migrate to the new `MR` macro.
>
> Not as soon as it gets released, because I expect (at least a decent amount of)
> contributors to be able to read the pages to which they contribute to, but as
> soon as it makes it into Debian stable, yes, that's in my plans. So, if you
> make it before the freeze, that means around a couple of months from now.
I won't be applying the patch now, so that contributors don't suddenly stop
seeing man page references while preparing patches. But I'll start
preparing the patch, to see where the most difficult parts are. And maybe
report some issues with the usability.
My first thing was to run:
$ grep -rn '^\.BR .* ([1-9]\w*)'
I'm pleasantly surprised that there seem to be no false positives. I
didn't expect that. But since things like exit(1) are code, they are probably
either not highlighted at all, or maybe are italicized (as code is). So that's
a good thing.
It showed a few lines that might be problematic, but that's actually bad code,
which I need to fix:
man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
Those are asking for a 2-line thing, where the second line is RB instead of BR.
Which reminds me to check RB:
$ grep -rn '^\.RB .* ([1-9]\w*)'
There are far fewer cases, and they also seem to be fine to script, with a few
minor ffixes too.
The big issue is that your MR doesn't support leading text:
.MR page‐title manual‐section [trailing‐text]
I remember we had this discussion about what to do with it. A 4th argument?
There's also conflict with a hypothetical link that we might want to add later.
My opinion is that the 4th argument should be the leading text. Asking to use
the escape sequence (was it \c?) to work around that limitation is not very nice.
Especially for scripting the change.
If you want a 5th argument for a URI, you can specify the leading text as "",
which is not much of an issue. And you keep the trailing text and the leading
one together.
What are your thoughts? What should we do?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 18:04 ` MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters) Alejandro Colomar
@ 2022-11-10 18:11 ` Alejandro Colomar
2022-11-10 18:20 ` Alejandro Colomar
2022-11-10 19:37 ` Alejandro Colomar
2022-11-10 22:55 ` G. Branden Robinson
2 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 18:11 UTC (permalink / raw)
To: G. Branden Robinson, groff; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 315 bytes --]
Hi Branden,
Another interesting thing is what to do here:
$ sed -n 319,320p man2/timerfd_create.2
.TP
.BR poll "(2), " select "(2) (and similar)"
Can I have multiple input lines as the tag for a TP? How to put 2 MR references
in there?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 18:11 ` Alejandro Colomar
@ 2022-11-10 18:20 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 18:20 UTC (permalink / raw)
To: G. Branden Robinson, groff; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 477 bytes --]
On 11/10/22 19:11, Alejandro Colomar wrote:
> Hi Branden,
>
> Another interesting thing is what to do here:
>
> $ sed -n 319,320p man2/timerfd_create.2
> .TP
> .BR poll "(2), " select "(2) (and similar)"
>
>
> Can I have multiple input lines as the tag for a TP? How to put 2 MR references
> in there?
Or maybe I should reorganize it and use TQ and multiple separate tags...
>
> Cheers,
>
> Alex
>
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 18:04 ` MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters) Alejandro Colomar
2022-11-10 18:11 ` Alejandro Colomar
@ 2022-11-10 19:37 ` Alejandro Colomar
2022-11-10 20:41 ` Alejandro Colomar
2022-11-10 22:55 ` G. Branden Robinson
2 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 19:37 UTC (permalink / raw)
To: G. Branden Robinson, groff; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 6737 bytes --]
Hi Branden,
On 11/10/22 19:04, Alejandro Colomar wrote:
> Of course I forgot to rename the title, and to add groff@. Nice.
>
> -------- Forwarded Message --------
> Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
> Date: Thu, 10 Nov 2022 18:47:38 +0100
> From: Alejandro Colomar <alx.manpages@gmail.com>
> To: G. Branden Robinson <g.branden.robinson@gmail.com>
> CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
>
> [removed gcc@ and other uninterested people; added groff@]
>
> Hi Branden!
>
> On 11/10/22 11:59, Alejandro Colomar wrote:
> >> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
> >> you will, shortly thereafter, migrate to the new `MR` macro.
> >
> > Not as soon as it gets released, because I expect (at least a decent amount of)
> > contributors to be able to read the pages to which they contribute to, but as
> > soon as it makes it into Debian stable, yes, that's in my plans. So, if you
> > make it before the freeze, that means around a couple of months from now.
>
> I won't be applying the patch now, to avoid contributors seeing people suddenly
> not seeing man page references while preparing patches. But I'll start
> preparing the patch, to see where are the most difficult parts. And maybe
> report some issues with the usability.
>
> My first thing was to run:
>
> $ grep -rn '^\.BR .* ([1-9]\w*)'
>
> I'm surprised for good that it seems that there are no false positives. I
> didn't expect that. But since things like exit(1) are code, they are probably
> either not highlighted at all, or maybe are italicized (as code is). So that's
> a good thing.
>
> It showed a few lines that might be problematic, but that's actually bad code,
> which I need to fix:
>
> man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
> man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
> man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
> man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
> man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
>
> Those are asking for a 2-line thing, where the second line is RB instead of BR.
> Which reminds me to check RB:
>
> $ grep -rn '^\.RB .* ([1-9]\w*)'
>
> There are much less cases, and also seem to be fine to script, with a few minor
> ffixes too.
>
> The big issue is that your MR doesn't support leading text:
>
> .MR page‐title manual‐section [trailing‐text]
>
> I remember we had this discussion about what to do with it. A 4th argument?
> There's also conflict with a hypothetical link that we might want to add later.
>
> My opinion is that the 4th argument should be the leading text. Asking to use
> the escape (was it \c?) sequence to workaround that limitation is not very nice.
> Especially for scripting the change.
>
> If you want a 5th argument for a URI, you can specify the leading text as "",
> which is not much of an issue. And you keep the trailing text and the leading
> one together.
>
> What are your thoughts? What should we do?
To document and discuss the way I'm migrating, I'll share here the scripts:
The simplest case: a single man page reference with no other stuff around it:
$ find man* -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) (\([1-9]\w*\))$/.MR \1 \2/'
Second simplest case: a single man page reference with only trailing stuff:
$ find man* -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) (\([1-9]\w*\))/.MR \1 \2 /'
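Before running these with -i over the whole tree, the two expressions can be previewed on sample input. The following is only a sketch: it assumes GNU sed (for \w), br_to_mr is a hypothetical helper name, and the sample lines are invented for illustration.

```shell
# Sketch: apply the two simplest BR -> MR conversions as a filter on stdin.
# No -i here, so no file is modified. Assumes GNU sed for \w.
br_to_mr() {
    sed -e 's/^\.BR \([^ ]*\) (\([1-9]\w*\))$/.MR \1 \2/' \
        -e 's/^\.BR \([^ ]*\) (\([1-9]\w*\))/.MR \1 \2 /'
}

# Bare reference, reference with trailing punctuation, and a line that
# should be left alone because "()" carries no section number.
printf '%s\n' '.BR read (2)' '.BR write (2),' '.BR exit ()' | br_to_mr
```

The order matters: the first expression handles the bare case, so the second only ever sees lines that still start with .BR and have something after the closing parenthesis.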
And here I continue with hypothetical syntax not yet allowed by groff.
A single man page reference with only leading stuff:
$ find man* -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) (\([1-9]\w*\))$/.MR \2 \3 "" \1/'
A single man page reference with both leading and trailing stuff (thank $DEITY
for not having comments in any of those, so I can just run the script):
$ find man* -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) (\([1-9]\w*\))\(.*\)/.MR \2 \3 \4 \1/'
After running those 4, and inspecting the changes to make sure they look good
(and they do), I have quite a small number of references that my scripts didn't
catch. Some of them just need an ffix before running the scripts again; others
need a manual migration, but nothing too difficult.
alx@asus5775:~/src/linux/man-pages/man-pages/MR$ grep -rn '^\.RB .* .\?([1-9]\w*)'
man2/mremap.2:324:.RB ( mmap "(2) " MAP_PRIVATE ),
man2/perf_event_open.2:35:.RB ( read "(2), " mmap "(2), " prctl "(2), " fcntl "(2), etc.)."
man2/open.2:86:.RB ( read "(2), " write "(2), " lseek "(2), " fcntl (2),
man3type/div_t.3type:43:.RB [[ l ] l ] div (3)
man3/fts.3:189:.RB [ l ] stat (2)
man3/fts.3:200:.RB [ l ] stat (2)
man3/fts.3:331:.RB [ l ] stat (2)
man3/fts.3:745:.RB [ l ] stat (2).
man5/proc.5:3426:.RB ( flock "(2) and " fcntl (2))
man7/pty.7:125:.RB ( ssh "(1), " rlogin "(1), " telnet (1)),
alx@asus5775:~/src/linux/man-pages/man-pages/MR$ grep -rn '^\.BR .* .\?([1-9]\w*)'
man1/getent.1:346:.BR ahosts / getaddrinfo (3)
man2/ioprio_set.2:278:.BR IOPRIO_CLASS_RT " (1)"
man2/ioprio_set.2:293:.BR IOPRIO_CLASS_BE " (2)"
man2/ioprio_set.2:306:.BR IOPRIO_CLASS_IDLE " (3)"
man2/keyctl.2:985:.BR execve (2).
man2/ioctl_iflags.2:63:.BR mount (2)
man2/memfd_create.2:232:.BR open (2)
man2/syslog.2:77:.BR SYSLOG_ACTION_OPEN " (1)"
man2/syslog.2:81:.BR SYSLOG_ACTION_READ " (2)"
man2/syslog.2:93:.BR SYSLOG_ACTION_READ_ALL " (3)"
man2/syslog.2:103:.BR SYSLOG_ACTION_READ_CLEAR " (4)"
man2/syslog.2:109:.BR SYSLOG_ACTION_CLEAR " (5)"
man2/syslog.2:128:.BR SYSLOG_ACTION_CONSOLE_OFF " (6)"
man2/syslog.2:152:.BR SYSLOG_ACTION_CONSOLE_ON " (7)"
man2/syslog.2:175:.BR SYSLOG_ACTION_CONSOLE_LEVEL " (8)"
man2/syslog.2:192:.BR SYSLOG_ACTION_SIZE_UNREAD " (9) (since Linux 2.4.10)"
man2/syslog.2:203:.BR SYSLOG_ACTION_SIZE_BUFFER " (10) (since Linux 2.6.6)"
man2/sigreturn.2:42:.BR sigaltstack "(2))\(emin"
man2/timerfd_create.2:320:.BR poll "(2), " select "(2) (and similar)"
man2/eventfd.2:144:.BR poll "(2), " select "(2) (and similar)"
man2/signalfd.2:134:.BR poll "(2), " select "(2) (and similar)"
man3/duplocale.3:99:.BR freelocale (3).
man7/spufs.7:122:.BR read "(2), " pread "(2), " write "(2), " pwrite "(2), " lseek (2)
man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 19:37 ` Alejandro Colomar
@ 2022-11-10 20:41 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 20:41 UTC (permalink / raw)
To: G. Branden Robinson, groff; +Cc: Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 5503 bytes --]
On 11/10/22 20:37, Alejandro Colomar wrote:
> Hi Branden,
>
> On 11/10/22 19:04, Alejandro Colomar wrote:
>> Of course I forgot to rename the title, and to add groff@. Nice.
>>
>> -------- Forwarded Message --------
>> Subject: Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function
>> parameters
>> Date: Thu, 10 Nov 2022 18:47:38 +0100
>> From: Alejandro Colomar <alx.manpages@gmail.com>
>> To: G. Branden Robinson <g.branden.robinson@gmail.com>
>> CC: Ingo Schwarze <schwarze@usta.de>, linux-man@vger.kernel.org
>>
>> [removed gcc@ and other uninterested people; added groff@]
>>
>> Hi Branden!
>>
>> On 11/10/22 11:59, Alejandro Colomar wrote:
>> >> Also, as soon as Bertrand and I can get groff 1.23 out[1], I am hoping
>> >> you will, shortly thereafter, migrate to the new `MR` macro.
>> >
>> > Not as soon as it gets released, because I expect (at least a decent amount
>> of)
>> > contributors to be able to read the pages to which they contribute to, but as
>> > soon as it makes it into Debian stable, yes, that's in my plans. So, if you
>> > make it before the freeze, that means around a couple of months from now.
>>
>> I won't be applying the patch now, to avoid contributors seeing people
>> suddenly not seeing man page references while preparing patches. But I'll
>> start preparing the patch, to see where are the most difficult parts. And
>> maybe report some issues with the usability.
>>
>> My first thing was to run:
>>
>> $ grep -rn '^\.BR .* ([1-9]\w*)'
>>
>> I'm surprised for good that it seems that there are no false positives. I
>> didn't expect that. But since things like exit(1) are code, they are probably
>> either not highlighted at all, or maybe are italicized (as code is). So
>> that's a good thing.
>>
>> It showed a few lines that might be problematic, but that's actually bad code,
>> which I need to fix:
>>
>> man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
>> man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
>> man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
>> man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
>> man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
>>
>> Those are asking for a 2-line thing, where the second line is RB instead of
>> BR. Which reminds me to check RB:
>>
>> $ grep -rn '^\.RB .* ([1-9]\w*)'
>>
>> There are much less cases, and also seem to be fine to script, with a few
>> minor ffixes too.
>>
>> The big issue is that your MR doesn't support leading text:
>>
>> .MR page‐title manual‐section [trailing‐text]
>>
>> I remember we had this discussion about what to do with it. A 4th argument?
>> There's also conflict with a hypothetical link that we might want to add later.
>>
>> My opinion is that the 4th argument should be the leading text. Asking to use
>> the escape (was it \c?) sequence to workaround that limitation is not very
>> nice. Especially for scripting the change.
>>
>> If you want a 5th argument for a URI, you can specify the leading text as "",
>> which is not much of an issue. And you keep the trailing text and the leading
>> one together.
>>
>> What are your thoughts? What should we do?
>
> To document and discuss the way I'm migrating, I'll share here the scripts:
>
> The simplest case: a single man page reference with no other stuff around it:
>
> $ find man* -type f \
> | xargs sed -i 's/^\.BR \([^ ]*\) (\([1-9]\w*\))$/.MR \1 \2/'
>
> Second simplest case: a single man page reference with only trailing stuff:
>
> $ find man* -type f \
> | xargs sed -i 's/^\.BR \([^ ]*\) (\([1-9]\w*\))/.MR \1 \2 /'
>
>
> And here I continue with hypothetical syntax not yet allowed by groff.
>
> A single man page reference with only leading stuff:
>
> $ find man* -type f \
> | xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) (\([1-9]\w*\))$/.MR \2 \3 "" \1/'
>
> A single man page reference with both leading and trailing stuff (thank $DEITY
> for not having comments in any of those, so I can just run the script):
>
> $ find man* -type f \
> | xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) (\([1-9]\w*\))\(.*\)/.MR \2 \3 \4 \1/'
And a few more, to cover same-page references. As Ingo recommended, I'm adding
the section for consistency. Redundancy is not a big issue here.
Man references in the same page, with no stuff around them:
$ find man2 -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) ()$/.MR \1 2/'
$ find man3 -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) ()$/.MR \1 3/'
Man references in the same page, with trailing stuff:
$ find man2 -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) ()/.MR \1 2 /'
$ find man3 -type f \
| xargs sed -i 's/^\.BR \([^ ]*\) ()/.MR \1 3 /'
Man references in the same page, with only leading stuff:
$ find man2 -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) ()$/.MR \2 2 "" \1/'
$ find man3 -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) ()$/.MR \2 3 "" \1/'
And finally, man references in the same page, with both leading and trailing
stuff (again, I was lucky, and there were no comments):
$ find man2 -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) ()\(.*\)/.MR \2 2 \3 \1/'
$ find man3 -type f \
| xargs sed -i 's/^\.RB \([^ ]*\) \([^ ]*\) ()\(.*\)/.MR \2 3 \3 \1/'
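Since the man2 and man3 commands differ only in the section number, the pair for the BR cases could perhaps be folded into one parameterized filter. This is a sketch, not what was actually run: same_page_to_mr is a hypothetical name, and it reads stdin instead of editing files in place.

```shell
# Sketch: same-page references "name ()" get the section passed as $1.
# same_page_to_mr is a hypothetical helper; it only filters stdin.
same_page_to_mr() {
    sec=$1
    sed -e "s/^\\.BR \\([^ ]*\\) ()\$/.MR \\1 $sec/" \
        -e "s/^\\.BR \\([^ ]*\\) ()/.MR \\1 $sec /"
}

# Sample lines as they might appear in a section 2 page (invented input).
printf '%s\n' '.BR timerfd_settime ()' '.BR read (),' | same_page_to_mr 2
```

A loop such as "for sec in 2 3" over the matching man$sec directories could then feed each file through the filter with the right section.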
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 10:59 ` Alejandro Colomar
2022-11-10 17:47 ` Alejandro Colomar
@ 2022-11-10 22:25 ` G. Branden Robinson
1 sibling, 0 replies; 85+ messages in thread
From: G. Branden Robinson @ 2022-11-10 22:25 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]
Hi Alex,
At 2022-11-10T11:59:02+0100, Alejandro Colomar wrote:
> > You know what Moltke said about plans and contact with the enemy.
> > For one thing, I think the Linux kernel will move too fast to permit
> > such a leisurely cadence.
>
> Heh, at this point, I burnt my ships, by using enhanced VLA syntax.
> If I release that before GCC, I'm expecting to see an avalanche of
> reports about it (and I also expect that GCC and forums will receive a
> similar amount). So yes, I expect to wait some longish time.
Hah, you rebutted my Moltke with your namesake. You understand that I'm
obligated to spring a reference to the Battle of Lepanto or something on
you at some point.
> > Also, as soon as Bertrand and I can get groff 1.23 out[1], I am
> > hoping you will, shortly thereafter, migrate to the new `MR` macro.
>
> Not as soon as it gets released, because I expect (at least a decent
> amount of) contributors to be able to read the pages to which they
> contribute to,
Laggardly adopters can always put this in man.local.
.if !d MR \{\
. de MR
. IR \\$1 (\\$2)\\$3
. .
.\}
> but as soon as it makes it into Debian stable, yes, that's in my
> plans. So, if you make it before the freeze, that means around a
> couple of months from now.
Yes. It is a major personal goal to get groff 1.23 into Debian
bookworm.
> > <tents fingers, laughs villainously>
>
> <also tents fingers, laughs villainously>
https://www.youtube.com/watch?v=VhH2egTLohM
Regards,
Branden
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 18:04 ` MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters) Alejandro Colomar
2022-11-10 18:11 ` Alejandro Colomar
2022-11-10 19:37 ` Alejandro Colomar
@ 2022-11-10 22:55 ` G. Branden Robinson
2022-11-10 23:55 ` Alejandro Colomar
2 siblings, 1 reply; 85+ messages in thread
From: G. Branden Robinson @ 2022-11-10 22:55 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: groff, Ingo Schwarze, linux-man
[-- Attachment #1: Type: text/plain, Size: 6493 bytes --]
Hi Alex,
At 2022-11-10T19:04:46+0100, Alejandro Colomar wrote:
> Of course I forgot to rename the title, and to add groff@. Nice.
It gave me time to reply to this one. :)
> On 11/10/22 11:59, Alejandro Colomar wrote:
> I won't be applying the patch now, to avoid contributors seeing people
> suddenly not seeing man page references while preparing patches. But
> I'll start preparing the patch, to see where are the most difficult
> parts. And maybe report some issues with the usability.
>
> My first thing was to run:
>
> $ grep -rn '^\.BR .* ([1-9]\w*)'
>
> I'm surprised for good that it seems that there are no false
> positives. I didn't expect that. But since things like exit(1) are
> code, they are probably either not highlighted at all, or maybe are
> italicized (as code is). So that's a good thing.
>
> It showed a few lines that might be problematic, but that's actually
> bad code, which I need to fix:
>
> man7/credentials.7:270:.BR setuid "(2) (" setgid (2))
> man7/credentials.7:274:.BR seteuid "(2) (" setegid (2))
> man7/credentials.7:277:.BR setfsuid "(2) (" setfsgid (2))
> man7/credentials.7:280:.BR setreuid "(2) (" setregid (2))
> man7/credentials.7:284:.BR setresuid "(2) (" setresgid (2))
>
> Those are asking for a 2-line thing, where the second line is RB instead of
> BR. Which reminds me to check RB:
>
> $ grep -rn '^\.RB .* ([1-9]\w*)'
>
> There are much less cases, and also seem to be fine to script, with a few
> minor ffixes too.
>
> The big issue is that your MR doesn't support leading text:
>
> .MR page‐title manual‐section [trailing‐text]
>
> I remember we had this discussion about what to do with it. A 4th
> argument? There's also conflict with a hypothetical link that we
> might want to add later.
>
> My opinion is that the 4th argument should be the leading text.
> Asking to use the escape (was it \c?) sequence to workaround that
> limitation is not very nice. Especially for scripting the change.
Here's what I did for groff.
commit 2ab0dacb95863a2e347d06cf970676c74c784ce2
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date: Fri Oct 8 00:46:41 2021 +1100
[man pages]: Migrate man(7) cross refs to `MR`.
# Handle simplest case: ".IR foo (1)".
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))$/.MR \2 \3/
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))$/.MR \2 \3/
# Handle case: trailing punctuation, e.g., ".IR foo (1),".
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))\([^[:space:]]\+\)/.MR \2 \3 \4/
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))\([^[:space:]]\+\)/.MR \2 \3 \4/
# Handle case: 3rd+ arguments or trailing comments. This case is rare
# and will require manual fixup if there are 4+ arguments to MR. Use
# groff -man -rCHECKSTYLE=1 to have them automatically reported.
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))\( .*\)/.MR \2 \3\4/
s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))\( .*\)/.MR \2 \3\4/
You can ignore the 'MAN[157]EXT' lines; they are relevant only to
within-groff pages (because all of our man pages undergo sed-processing
to be prepared for installation).
> If you want a 5th argument for a URI, you can specify the leading text
> as "", which is not much of an issue. And you keep the trailing text
> and the leading one together.
>
> What are your thoughts? What should we do?
I am reluctant to extend the interface of `MR` at this point because as
it is it has two nice properties: it aligns with mdoc(7)'s `Xr` macro
and with Plan 9 from User Space troff's `MR`, which did it first.
(Admittedly, P9US troff's `MR` macro doesn't supply the parentheses. I
don't know if they intend to change that. I'm willing to supply a patch
to change their implementation and their man pages to align with what I
did in groff. As shown above, I believe my sed-fu is in order.)
I think man page authors should learn when the `\c` escape sequence is
appropriate and use it when warranted, and recast their sentences
otherwise. That is why I provided an explicit example in the
groff_man_style(7) page.
.MR page-title manual-section [trailing-text]
(since groff 1.23) Set a man page cross reference as
"page-title(manual-section)". If trailing-text (typically
punctuation) is specified, it follows the closing parenthesis
without intervening space. Hyphenation is disabled while the
cross reference is set. page-title is set in the font specified
by the MF string. The cross reference hyperlinks to a URI of
the form "man:page-title(manual-section)".
The output driver
.MR grops 1
produces PostScript from
.I troff
output.
.
The Ghostscript program (\c
.MR gs 1 )
interprets PostScript and PDF.
`\c` solves problems that are complicated to solve any other way. As
far as I have seen, you don't ever need it in mdoc(7) pages, for
example...but you pay a price. You must learn which of mdoc's several
dozen macros are "parsed" versus "callable" (and what the heck the
package even _means_ by those words); you must learn that `Pf` and `Ns`
exist and when to use them; you must learn that certain two-letter words
will not behave as you expect; and if you thought using mdoc(7) meant
you wouldn't have to type any groff escape sequences, think
again--you'll be putting `\&` all over the place.
People can use mdoc(7) if they want to (and now that I'm learning it
better, I will consult as I am able), but its reputation in some circles
as a superior solution to man(7) on all fronts that should have kicked
its predecessor into the grave long ago is due solely to irresponsible
hype from its exponents.
If you need help automating a change to adapt some Linux man-pages
documents to use `\c` before an `MR` call on the next line (where you
were using `RB` before, for instance), just let me know. I am nearly
certain that a sed script utilizing its hold space feature can get the
job done. (I've used the hold space profitably before, but occasions
for it come up seldom enough that I have to review my past solutions
before the knowledge comes back. Or maybe it's creeping senescence.)
Regards,
Branden
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
@ 2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
` (2 more replies)
1 sibling, 3 replies; 85+ messages in thread
From: Joseph Myers @ 2022-11-10 23:19 UTC (permalink / raw)
To: Martin Uecker
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
> One problem with WG14 papers is that people put in too much,
> because the overhead is so high and the standard is not updated
> very often. It would be better to build such feature more
> incrementally, which could be done more easily with a compiler
> extension. One could start supporting just [.x] but not more
> complicated expressions.
Even a compiler extension requires the level of detail of specification
that you get with a WG14 paper (and the level of work on finding bugs in
that specification), to avoid the problem we've had before with too many
features added in GCC 2.x days where a poorly defined feature is "whatever
the compiler accepts".
If you use .x as the notation but don't limit it to [.x], you have a
completely new ambiguity between ordinary identifiers and member names
struct s { int a; };
void f(int a, int b[((struct s) { .a = 1 }).a]);
where it's newly ambiguous whether ".a = 1" is an assignment to the
expression ".a" or a use of a designated initializer.
(I think that if you add any syntax for this, GNU VLA forward declarations
are clearly to be preferred to inventing something new like [.x] which
introduces its own problems.)
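
For readers who have not met the GNU extension mentioned here, a minimal sketch follows. It assumes GCC's documented parameter forward declaration syntax; the function name sum_tail_size is made up for illustration, and the fallback branch exists only so the file still compiles on compilers that, as far as I know, do not accept the extension (Clang, for instance).

```c
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU extension: "size_t n;" forward-declares the parameter n, so the
 * array bound a[n] can name it even though n is the last real parameter. */
static long sum_tail_size(size_t n; const int a[n], size_t n)
#else
/* Fallback without the extension: the bound cannot be written down. */
static long sum_tail_size(const int *a, size_t n)
#endif
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Callers are unaffected either way: the call is sum_tail_size(v, 3) in both branches, and only the declaration carries the extra size information.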
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
@ 2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
2022-11-12 12:34 ` Alejandro Colomar
2 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 23:28 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2085 bytes --]
Hi Joseph,
On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often. It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension. One could start supporting just [.x] but not more
>> complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
>
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
>
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
Yeah, I think limiting it to [.n] initially, and only moving forward, step by
step, if it's perfectly clear that it's doable seems very reasonable.
Re: GNU VLA fwd decl:
This example is what I'm worried about:
int foo(int a; int b[a], int a);
int foo(int a, int b[a], int o);
Okay, parameters should have more readable names... But still, it allows for a
high chance of wtf moments. However, I can think of a syntax very similar to
GNU's, that would make it a bit better in terms of readability: not declaring
the type in the fwd decl:
int foo(a; int b[a], int a);
int foo(int a, int b[a], int o);
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 22:55 ` G. Branden Robinson
@ 2022-11-10 23:55 ` Alejandro Colomar
2022-11-11 4:44 ` G. Branden Robinson
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-10 23:55 UTC (permalink / raw)
To: G. Branden Robinson; +Cc: groff, Ingo Schwarze, linux-man
[-- Attachment #1.1: Type: text/plain, Size: 7194 bytes --]
Hi Branden,
On 11/10/22 23:55, G. Branden Robinson wrote:
> Hi Alex,
>
> At 2022-11-10T19:04:46+0100, Alejandro Colomar wrote:
>> Of course I forgot to rename the title, and to add groff@. Nice.
>
> It gave me time to reply to this one. :)
:)
[...]
>> The big issue is that your MR doesn't support leading text:
>>
>> .MR page‐title manual‐section [trailing‐text]
>>
>> I remember we had this discussion about what to do with it. A 4th
>> argument? There's also conflict with a hypothetical link that we
>> might want to add later.
>>
>> My opinion is that the 4th argument should be the leading text.
>> Asking to use the escape (was it \c?) sequence to workaround that
>> limitation is not very nice. Especially for scripting the change.
>
> Here's what I did for groff.
>
> commit 2ab0dacb95863a2e347d06cf970676c74c784ce2
> Author: G. Branden Robinson <g.branden.robinson@gmail.com>
> Date: Fri Oct 8 00:46:41 2021 +1100
>
> [man pages]: Migrate man(7) cross refs to `MR`.
>
> # Handle simplest case: ".IR foo (1)".
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))$/.MR \2 \3/
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))$/.MR \2 \3/
> # Handle case: trailing punctuation, e.g., ".IR foo (1),".
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))\([^[:space:]]\+\)/.MR \2 \3 \4/
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))\([^[:space:]]\+\)/.MR \2 \3 \4/
> # Handle case: 3rd+ arguments or trailing comments. This case is rare
> # and will require manual fixup if there are 4+ arguments to MR. Use
> # groff -man -rCHECKSTYLE=1 to have them automatically reported.
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\(@MAN[157]EXT@\))\( .*\)/.MR \2 \3\4/
> s/^.[BI]R \(\\%\)*\([@_[:alnum:]\\-]\+\) (\([1-8a-z]\+\))\( .*\)/.MR \2 \3\4/
>
> You can ignore the 'MAN[157]EXT' lines; they are relevant only to
> within-groff pages (because all of our man pages undergo sed-processing
> to be prepared for installation).
Hmm, will need to parse that. Anyway, I think now that I have the MR with 4
arguments, moving the 4th to the previous line with sed and N should not be that
difficult.
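
The rewrite described here (taking the leading text out of a 4-argument call and putting it on the previous line with \c) can be sketched as below. This is only an illustration in C rather than the sed script under discussion; the 4-argument form of MR is the hypothetical one from this thread, the function names are invented, and the tokenizer is simplified (it does not handle roff's doubled-quote escapes inside quoted arguments).

```c
#include <stdio.h>
#include <string.h>

/* Copy the next macro argument from *p into arg (at most argsz-1 bytes).
 * Handles plain words and double-quoted arguments such as "".
 * Returns 1 if an argument was read, 0 at end of line. */
static int next_arg(const char **p, char *arg, size_t argsz)
{
    const char *s = *p;
    size_t n = 0;

    while (*s == ' ')
        s++;
    if (*s == '\0' || *s == '\n')
        return 0;
    if (*s == '"') {
        s++;
        while (*s != '\0' && *s != '\n' && *s != '"') {
            if (n + 1 < argsz)
                arg[n++] = *s;
            s++;
        }
        if (*s == '"')
            s++;
    } else {
        while (*s != '\0' && *s != '\n' && *s != ' ') {
            if (n + 1 < argsz)
                arg[n++] = *s;
            s++;
        }
    }
    arg[n] = '\0';
    *p = s;
    return 1;
}

/* Rewrite ".MR page section trailing leading" (hypothetical 4-argument
 * form) as "leading\c" followed by the 3-argument call on the next line.
 * Returns 1 if the line was rewritten, 0 if it was copied unchanged. */
static int mr_leading_to_backslash_c(const char *line, char *out, size_t outsz)
{
    char a[4][64];
    const char *p = line;
    int argc = 0;

    if (strncmp(line, ".MR ", 4) != 0) {
        snprintf(out, outsz, "%s", line);
        return 0;
    }
    p += 4;
    while (argc < 4 && next_arg(&p, a[argc], sizeof a[argc]))
        argc++;
    if (argc < 4) {
        snprintf(out, outsz, "%s", line);
        return 0;
    }
    if (a[2][0] != '\0')
        snprintf(out, outsz, "%s\\c\n.MR %s %s %s", a[3], a[0], a[1], a[2]);
    else
        snprintf(out, outsz, "%s\\c\n.MR %s %s", a[3], a[0], a[1]);
    return 1;
}
```

Fed the line `.MR exec 3 "" pre-`, the function produces `pre-\c` followed by `.MR exec 3` on the next line; ordinary 3-argument calls pass through untouched.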
>
>> If you want a 5th argument for a URI, you can specify the leading text
>> as "", which is not much of an issue. And you keep the trailing text
>> and the leading one together.
>>
>> What are your thoughts? What should we do?
>
> I am reluctant to extend the interface of `MR` at this point because as
> it is it has two nice properties: it aligns with mdoc(7)'s `Xr` macro
> and with Plan 9 from User Space troff's `MR`, which did it first.
Well, being a compatible extension to the others is not that bad. How does
mdoc(7) solve it with Xr?
>
> (Admittedly, P9US troff's `MR` macro doesn't supply the parentheses. I
> don't know if they intend to change that. I'm willing to supply a patch
> to change their implementation and their man pages to align with what I
> did in groff. As shown above, I believe my sed-fu is in order.)
>
> I think man page authors should learn when the `\c` escape sequence is
> appropriate and use it when warranted, and recast their sentences
> otherwise. That is why I provided an explicit example in the
> groff_man_style(7) page.
>
> .MR page-title manual-section [trailing-text]
> (since groff 1.23) Set a man page cross reference as "page-
> title(manual-section)". If trailing-text (typically
> punctuation) is specified, it follows the closing parenthesis
> without intervening space. Hyphenation is disabled while the
> cross reference is set. page-title is set in the font specified
> by the MF string. The cross reference hyperlinks to a URI of
> the form "man:page-title(manual-section)".
>
> The output driver
> .MR grops 1
> produces PostScript from
> .I troff
> output.
> .
> The Ghostscript program (\c
> .MR gs 1 )
> interprets PostScript and PDF.
One of the biggest issues with this is that it breaks what would otherwise
represent a single entity, into two lines, so it hurts readability. See as an
extreme example the following change I did with my scripts (from posix_spawn(3),
if you're curious):
@@ -129,7 +129,7 @@ .SH DESCRIPTION
Below, the functions are described in terms of a three-step process: the
.MR fork 3
step, the
-.RB pre- exec ()
+.MR exec 3 "" pre-
step (executed in the child),
and the
.MR exec 3
Having 'pre-' as the last part of some random line separates it from the other
part of the word. The \c alternative would be:
step, the pre-\c
.MR exec 3
step ...
Not terrible, but I'm not in love with it.
>
> `\c` solves problems that are complicated to solve any other way. As
> far as I have seen, you don't ever need it in mdoc(7) pages, for
> example...but you pay a price. You must learn which of mdoc's several
> dozen macros are "parsed" versus "callable" (and what the heck the
> package even _means_ by those words); you must learn that `Pf` and `Ns`
> exist and when to use them; you must learn that certain two-letter words
> will not behave as you expect; and if you thought using mdoc(7) meant
> you wouldn't have to type any groff escape sequences, think
> again--you'll be putting `\&` all over the place.
>
> People can use mdoc(7) if they want to (and now that I'm learning it
> better, I will consult as I am able), but its reputation in some circles
> as a superior solution to man(7) on all fronts that should have kicked
> its predecessor into the grave long ago is due solely to irresponsible
> hype from its exponents.
>
> If you need help automating a change to adapt some Linux man-pages
> documents to use `\c` before an `MR` call on the next line (where you
> were using `RB` before, for instance), just let me know. I am nearly
> certain that a sed script utilizing its hold space feature can get the
> job done. (I've used the hold space profitably before, but occasions
> for it come up seldom enough that I have to review my past solutions
> before the knowledge comes back. Or maybe it's creeping senescence.)
I hope I can come up with something, but yes, if not, I'll call you ;)
BTW, so far I've never found a case where I had to use the hold space. I wonder
if I may meet a case where I need it in my life. This week I came up with some
script for inserting an element into a JSON array at a specified position, but N
is all that was needed:
<http://www.alejandro-colomar.es/src/alx/nginx/unitcli.git/tree/bin/setup-unit#n969>.
I've met a few more-complex cases, but not really that much. I always come up
with some combination of filters that allows me to avoid the hold space.
Sometimes, running two scripts consecutively also helps keep it simple :)
Cheers,
Alex
>
> Regards,
> Branden
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters)
2022-11-10 23:55 ` Alejandro Colomar
@ 2022-11-11 4:44 ` G. Branden Robinson
0 siblings, 0 replies; 85+ messages in thread
From: G. Branden Robinson @ 2022-11-11 4:44 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: groff, Ingo Schwarze, linux-man
[-- Attachment #1: Type: text/plain, Size: 2901 bytes --]
At 2022-11-11T00:55:18+0100, Alejandro Colomar wrote:
> Hmm, will need to parse that. Anyway, I think now that I have the MR
> with 4 arguments, moving the 4th to the previous line with sed and N
> should not be that difficult.
Okay.
> Well, being a compatible extension to the others is not that bad. How
> does mdoc(7) solve it with Xr?
I alluded to it: the `Pf` ("prefix") macro.
man(7):
.TH foo 1 2022-11-10 "groff test suite"
.SH Description
pre-\c
.MR exec 3
mdoc(7):
.Dd 2022-11-10
.Dt foo 1
.Os "groff test suite"
.Sh Description
.Pf pre- Xr exec 3
> One of the biggest issues with this is that it breaks what would
> otherwise represent a single entity, into two lines, so it hurts
> readability. See as an extreme example the following change I did
> with my scripts (from posix_spawn(3), if you're curious):
>
> @@ -129,7 +129,7 @@ .SH DESCRIPTION
> Below, the functions are described in terms of a three-step process: the
> .MR fork 3
> step, the
> -.RB pre- exec ()
> +.MR exec 3 "" pre-
> step (executed in the child),
> and the
> .MR exec 3
>
> Having 'pre-' as the last part of some random line, separates it from the
> other part of the word. The \c alternative would be:
>
> step, the pre-\c
> .MR exec 3
> step ...
>
> Not terrible, but I'm not in love with it.
I personally find the derangement of word ordering more disruptive to my
reading than a mid-word line break...especially after a hyphen, where
years of experience have prepared me to expect a continued word on the
next line anyway. ;-)
I would also note that I don't think it's necessary to hyperlink every
single occurrence of a cross-referenced man page topic, especially if the
same page topic comes up repeatedly in a section (or even paragraph).
IIRC Ingo doesn't agree, and you might not either.
> I hope I can come up with something, but yes, if not, I'll call you ;)
My bat-shaped phone is plugged in.
> BTW, so far I've never found a case where I had to use the hold space.
> I wonder if I may meet a case where I need it in my life. This week I
> came up with some script for inserting an element into a JSON array at
> a specified position, but N is all that was needed:
> <http://www.alejandro-colomar.es/src/alx/nginx/unitcli.git/tree/bin/setup-unit#n969>.
Multi-line patterns solve a lot of problems. A person knows that they
are no longer a sed(1) beginner when they use those effectively. :D
> I've met a few more-complex cases, but not really that much. I always
> come up with some combination of filters that allows me to avoid the
> hold space. Sometimes, two scripts run consecutively also helps keep
> it simple :)
I've resorted to this too. It's just that sed is such a small language
(even in its GNU dialect) that it taunts me. Surely mastering it should
be _easy_...
Regards,
Branden
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
@ 2022-11-11 19:52 ` Martin Uecker
2022-11-12 1:09 ` Joseph Myers
2022-11-12 12:34 ` Alejandro Colomar
2 siblings, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-11-11 19:52 UTC (permalink / raw)
To: Joseph Myers
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Donnerstag, den 10.11.2022, 23:19 +0000 schrieb Joseph Myers:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
> > One problem with WG14 papers is that people put in too much,
> > because the overhead is so high and the standard is not updated
> > very often. It would be better to build such feature more
> > incrementally, which could be done more easily with a compiler
> > extension. One could start supporting just [.x] but not more
> > complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
I think the effort needed to specify the feature correctly
can be minimized by making the first version of the feature
as simple as possible.
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
If we only allowed [ . a ] then this example would not be allowed.
If we need more flexibility, we could incrementally extend it.
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
I also prefer this.
I proposed forward declarations, but WG14 and also people in this
discussion did not like them. If we actually started using
them, we could propose them again for the next revision.
Martin
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-11 19:52 ` Martin Uecker
@ 2022-11-12 1:09 ` Joseph Myers
2022-11-12 7:24 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-12 1:09 UTC (permalink / raw)
To: Martin Uecker
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:
> > Even a compiler extension requires the level of detail of specification
> > that you get with a WG14 paper (and the level of work on finding bugs in
> > that specification), to avoid the problem we've had before with too many
> > features added in GCC 2.x days where a poorly defined feature is "whatever
> > the compiler accepts".
>
> I think the effort needed to specify the feature correctly
> can be minimized by making the first version of the feature
> as simple as possible.
The version of constexpr in the current C2x working draft is more or less
as simple as possible. It also went through lots of revisions to get
there. I'm currently testing an implementation of C2x constexpr for GCC
13, and there are still several issues with the specification I found in
the implementation process, beyond those raised in WG14 discussions, for
which I'll need to raise NB comments to clarify things.
I think that illustrates that you need the several iterations on the
specification process, *and* making it as simple as possible, *and*
getting implementation experience, *and* the implementation experience
being with a close eye to what it implies for all the details in the
specification rather than just getting something vaguely functional but
not clearly specified.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 1:09 ` Joseph Myers
@ 2022-11-12 7:24 ` Martin Uecker
0 siblings, 0 replies; 85+ messages in thread
From: Martin Uecker @ 2022-11-12 7:24 UTC (permalink / raw)
To: Joseph Myers
Cc: Alejandro Colomar, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 12.11.2022, 01:09 +0000 schrieb Joseph Myers:
> On Fri, 11 Nov 2022, Martin Uecker via Gcc wrote:
>
> > > Even a compiler extension requires the level of detail of specification
> > > that you get with a WG14 paper (and the level of work on finding bugs in
> > > that specification), to avoid the problem we've had before with too many
> > > features added in GCC 2.x days where a poorly defined feature is "whatever
> > > the compiler accepts".
> >
> > I think the effort needed to specify the feature correctly
> > can be minimized by making the first version of the feature
> > as simple as possible.
>
> The version of constexpr in the current C2x working draft is more or less
> as simple as possible. It also went through lots of revisions to get
> there. I'm currently testing an implementation of C2x constexpr for GCC
> 13, and there are still several issues with the specification I found in
> the implementation process, beyond those raised in WG14 discussions, for
> which I'll need to raise NB comments to clarify things.
constexpr had no implementation experience in C at all, and I have
always suspected that the idea that C++ experience should somehow
count is not really justified.
> I think that illustrates that you need the several iterations on the
> specification process, *and* making it as simple as possible, *and*
> getting implementation experience, *and* the implementation experience
> being with a close eye to what it implies for all the details in the
> specification rather than just getting something vaguely functional but
> not clearly specified.
I agree. We should work on specification and on prototyping
new features in parallel.
Martin
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
@ 2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
2 siblings, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:34 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 2166 bytes --]
Hi Joseph,
On 11/11/22 00:19, Joseph Myers wrote:
> On Thu, 10 Nov 2022, Martin Uecker via Gcc wrote:
>
>> One problem with WG14 papers is that people put in too much,
>> because the overhead is so high and the standard is not updated
>> very often. It would be better to build such feature more
>> incrementally, which could be done more easily with a compiler
>> extension. One could start supporting just [.x] but not more
>> complicated expressions.
>
> Even a compiler extension requires the level of detail of specification
> that you get with a WG14 paper (and the level of work on finding bugs in
> that specification), to avoid the problem we've had before with too many
> features added in GCC 2.x days where a poorly defined feature is "whatever
> the compiler accepts".
>
> If you use .x as the notation but don't limit it to [.x], you have a
> completely new ambiguity between ordinary identifiers and member names
>
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
Is it really ambiguous? Let's show some currently-valid code:
struct s {
int a;
};
struct t {
struct s s;
int a;
};
void f(void)
{
struct t x = {
.a = 1,
.s = {
.a = ((struct s) {.a = 1}).a,
},
};
}
It is ambiguous to a human reader, but that's a subjective thing, and of course
shadowing should be avoided by programmers. However, for a compiler, scoping
and syntax rules should be unambiguous, I think. In your code example, I
believe it is unambiguous that both '.a' refer to the struct member.
But maybe we're not considering more complex situations that might really be
ambiguous to the compiler, so a first round of supporting only [.a] would be a
good first implementation.
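
A compilable version of the currently-valid code above, as a rough check of the claim that today's rules resolve every '.a' to a struct member. The helper name make_t is made up, and the compound literal's value is changed to 2 (an assumption for illustration only) so the two members can be told apart.

```c
struct s {
    int a;
};

struct t {
    struct s s;
    int a;
};

/* Build the object from the example and return it by value. */
static struct t make_t(void)
{
    struct t x = {
        .a = 1,                     /* designator for member t.a */
        .s = {
            /* The first '.a' designates s.a; the '.a' inside the braces
             * designates a member of the compound literal; the trailing
             * '.a' is ordinary member access on that literal. */
            .a = ((struct s) {.a = 2}).a,
        },
    };
    return x;
}
```

With these values, make_t() yields an object whose a is 1 and whose s.a is 2, so no '.a' here is ever looked up as an ordinary identifier.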
>
> where it's newly ambiguous whether ".a = 1" is an assignment to the
> expression ".a" or a use of a designated initializer.
>
> (I think that if you add any syntax for this, GNU VLA forward declarations
> are clearly to be preferred to inventing something new like [.x] which
> introduces its own problems.)
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 12:34 ` Alejandro Colomar
@ 2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
1 sibling, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 12:46 UTC (permalink / raw)
To: Joseph Myers, Martin Uecker
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 1004 bytes --]
On 11/12/22 13:34, Alejandro Colomar wrote:
> struct s {
> int a;
> };
>
> struct t {
> struct s s;
> int a;
> };
>
> void f(void)
> {
> struct t x = {
> .a = 1,
> .s = {
> .a = ((struct s) {.a = 1}).a,
> },
> };
> }
Building on this, my understanding from Martin's email is that there's also
an idea of allowing the following:
struct s {
int a;
int b;
};
struct t {
struct s s;
int a;
int b;
};
void f(void)
{
struct t x = {
.a = 1,
.s = {
// In the following line, .b=.a is assigning 2
.a = ((struct s) {.a = 2, .b = .a}).b,
// The previous line assigned 2, since the compound had 2 in .b
},
// In the following line, .b=.a is assigning 1
.b = .a,
};
}
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
@ 2022-11-12 13:03 ` Joseph Myers
2022-11-12 13:40 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-12 13:03 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> > struct s { int a; };
> > void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> Is it really ambiguous? Let's show some currently-valid code:
Well, I still don't know what the syntax addition you propose is. Is it
postfix-expression : . identifier
(with a special rule about how the identifier is interpreted, different
from the normal scope rules)? If so, then ".a = 1" could either match
assignment-expression directly (assigning to the postfix-expression ".a").
Or it could match designation[opt] initializer, where ".a" is a
designator. And as I've noted many times in discussions of C2x proposals
on the WG14 reflector, if some sequence of tokens can match the syntax in
more than one way, there always needs to be explicit normative text to
disambiguate the intended parse - it's not enough that one parse might
lead later to a violation of some other constraint (not that either parse
leads to a constraint violation in this case).
Or is the syntax
array-declarator : direct-declarator [ . assignment-expression ]
(with appropriate variants with static and type-qualifier-list and for
array-abstract-declarator as well, and with different identifier
interpretation rules inside the assignment-expression)? If so, then there
are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
in an outer scope, because the appropriate parse would depend on whether
'a' is shadowed by a parameter - unless of course you add appropriate
wording like that present in some places about not being able to use this
syntax to shadow a typedef name.
Or is it just
array-declarator : direct-declarator [ . identifier ]
which might avoid some of these problems at the expense of being less
expressive?
If you're proposing a C syntax addition, you always need to be clear about
exactly what the new cases in the syntax would be, and how you resolve
ambiguities with any other existing part of the syntax, how you interact
with rules on scopes, namespaces and linkage of identifiers, etc.
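
The typedef-name problem raised above for [ . ( a ) + ( b ) ] already has a counterpart in today's C, which may make it easier to see. In this sketch (function names invented for illustration) the same token sequence parses as a cast in one function and as an addition in the other, depending only on whether a parameter shadows the typedef:

```c
typedef int a;   /* 'a' names a type at file scope */

/* 'a' is still the typedef here, so (a) + (b) is a cast applied to +b. */
static int parsed_as_cast(int b)
{
    return (a) + (b);
}

/* The parameter 'a' shadows the typedef, so the same tokens are a sum. */
static int parsed_as_sum(int a, int b)
{
    return (a) + (b);
}
```

A parser handling the array bound would need to know which case applies before it has seen the later parameter that does the shadowing, which is presumably why extra wording would be required.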
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:03 ` Joseph Myers
@ 2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
0 siblings, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:40 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
[-- Attachment #1.1: Type: text/plain, Size: 3876 bytes --]
Hi Joseph,
On 11/12/22 14:03, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>> struct s { int a; };
>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>> Is it really ambiguous? Let's show some currently-valid code:
>
> Well, I still don't know what the syntax addition you propose is. Is it
>
> postfix-expression : . identifier
I'll try to explain it in standardese, but I'm not sure if I'll get it right,
so I'll accompany it with plain English.
Maybe Martin can help.
Since it's to be used as an rvalue, not as an lvalue, I guess a
postfix-expression wouldn't be the right one.
>
> (with a special rule about how the identifier is interpreted, different
> from the normal scope rules)? If so, then ".a = 1" could either match
> assignment-expression directly (assigning to the postfix-expression ".a").
No, assigning to a function parameter from within another parameter declaration
wouldn't make sense. They should be readonly. Side effects should be
forbidden, I think.
> Or it could match designation[opt] initializer, where ".a" is a
> designator. And as I've noted many times in discussions of C2x proposals
> on the WG14 reflector, if some sequence of tokens can match the syntax in
> more than one way, there always needs to be explicit normative text to
> disambiguate the intended parse - it's not enough that one parse might
> lead later to a violation of some other constraint (not that either parse
> leads to a constraint violation in this case).
>
> Or is the syntax
>
> array-declarator : direct-declarator [ . assignment-expression ]
Not good either. The '.' should prefix the identifier, not the expression. So,
for example, you would have:
void *bsearch(const void key[.size], const void base[.size * .nmemb],
size_t nmemb, size_t size,
int (*compar)(const void [.size], const void [.size]));
That's taken from the current manual page from git HEAD. See 'base', which gets
its size from the multiplication of 'size' and 'nmemb'.
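
For comparison, here is roughly what standard C can already express when the sizes come first. This is a sketch under assumptions: bsearch_sized is a hypothetical wrapper, not a proposed API, and it uses unsigned char arrays because ISO C has no arrays of void (the manual page notation is documentation, not legal C).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: nmemb and size precede the arrays, so ordinary
 * VLA notation can state each bound, including the product for base. */
static void *bsearch_sized(size_t nmemb, size_t size,
                           const unsigned char key[size],
                           const unsigned char base[nmemb * size],
                           int (*compar)(const void *, const void *))
{
    return bsearch(key, base, nmemb, size, compar);
}

/* Example comparison function for int elements. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;

    return (a > b) - (a < b);
}
```

The price is the reordered parameter list and the casts at the call site, which is exactly what the [.size] notation tries to avoid for existing interfaces.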
>
> (with appropriate variants with static and type-qualifier-list and for
> array-abstract-declarator as well, and with different identifier
> interpretation rules inside the assignment-expression)? If so, then there
> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
> in an outer scope, because the appropriate parse would depend on whether
> 'a' is shadowed by a parameter - unless of course you add appropriate
> wording like that present in some places about not being able to use this
> syntax to shadow a typedef name.
>
> Or is it just
>
> array-declarator : direct-declarator [ . identifier ]
For the initial implementation, it would be, I think.
>
> which might avoid some of these problems at the expense of being less
> expressive?
Yes.
>
> If you're proposing a C syntax addition, you always need to be clear about
> exactly what the new cases in the syntax would be, and how you resolve
> ambiguities with any other existing part of the syntax, how you interact
> with rules on scopes, namespaces and linkage of identifiers, etc.
Yeah, I'll try.
I think that the complete feature would allow 'designator' to be used within
unary-expression:
unary-expression: designator
Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing
that similar rules could be used for '.size'.
That would have the effect of allowing both features suggested by Martin: being
able to use designators in both structures (as demonstrated in my last email)
and function prototypes (as in the thing we're discussing).
I hope I got it right. I'm not used to lexical grammar so much.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:40 ` Alejandro Colomar
@ 2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
1 sibling, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 13:58 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/12/22 14:40, Alejandro Colomar wrote:
> Hi Joseph,
>
> On 11/12/22 14:03, Joseph Myers wrote:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>>> struct s { int a; };
>>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>> Is it really ambiguous? Let's show some currently-valid code:
>>
>> Well, I still don't know what the syntax addition you propose is. Is it
>>
>> postfix-expression : . identifier
>
> I'll try to explain it in standardeese, but I'm not sure if I'll get it right,
> so I'll accompany it with plain English.
>
> Maybe Martin can help.
>
> Since it's to be used as an rvalue, not as a lvalue, I guess a
> postfix-expression wouldn't be the right one.
>
>>
>> (with a special rule about how the identifier is interpreted, different
>> from the normal scope rules)? If so, then ".a = 1" could either match
>> assignment-expression directly (assigning to the postfix-expression ".a").
>
> No, assigning to a function parameter from within another parameter declaration
> wouldn't make sense. They should be readonly. Side effects should be
> forbidden, I think.
>
>> Or it could match designation[opt] initializer, where ".a" is a
>> designator. And as I've noted many times in discussions of C2x proposals
>> on the WG14 reflector, if some sequence of tokens can match the syntax in
>> more than one way, there always needs to be explicit normative text to
>> disambiguate the intended parse - it's not enough that one parse might
>> lead later to a violation of some other constraint (not that either parse
>> leads to a constraint violation in this case).
>>
>> Or is the syntax
>>
>> array-declarator : direct-declarator [ . assignment-expression ]
>
> Not good either. The '.' should prefix the identifier, not the expression. So,
> for example, you would have:
>
> void *bsearch(const void key[.size], const void base[.size * .nmemb],
> size_t nmemb, size_t size,
> int (*compar)(const void [.size], const void [.size]));
>
> That's taken from the current manual page from git HEAD. See 'base', which gets
> its size from the multiplication of 'size' and 'nmemb'.
>
>>
>> (with appropriate variants with static and type-qualifier-list and for
>> array-abstract-declarator as well, and with different identifier
>> interpretation rules inside the assignment-expression)? If so, then there
>> are big problems parsing [ . ( a ) + ( b ) ], where 'a' is a typedef name
>> in an outer scope, because the appropriate parse would depend on whether
>> 'a' is shadowed by a parameter - unless of course you add appropriate
>> wording like that present in some places about not being able to use this
>> syntax to shadow a typedef name.
>>
>> Or is it just
>>
>> array-declarator : direct-declarator [ . identifier ]
>
> For the initial implementation, it would be, I think.
>
>>
>> which might avoid some of these problems at the expense of being less
>> expressive?
>
> Yes.
>
>>
>> If you're proposing a C syntax addition, you always need to be clear about
>> exactly what the new cases in the syntax would be, and how you resolve
>> ambiguities with any other existing part of the syntax, how you interact
>> with rules on scopes, namespaces and linkage of identifiers, etc.
>
> Yeah, I'll try.
>
> I think that the complete feature would allow 'designator' to be used within
> unary-expression:
>
> unary-expression: designator
I made a mistake: since enum designators don't make sense in this feature, it
should only be:
unary-expression: . identifier
>
> Since sizeof(foo) is a unary-expression and you can't assign to it, I'm guessing
> that similar rules could be used for '.size'.
>
>
> That would have the effect of allowing both features suggested by Martin: being
> able to use designators in both structures (as demonstrated in my last email)
> and function prototypes (as in the thing we're discussing).
>
> I hope I got it right. I'm not used to lexical grammar so much.
>
> Cheers,
>
> Alex
>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
@ 2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 15:56 ` Martin Uecker
1 sibling, 2 replies; 85+ messages in thread
From: Joseph Myers @ 2022-11-12 14:54 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> Since it's to be used as an rvalue, not as a lvalue, I guess a
> postfix-expression wouldn't be the right one.
Several forms of postfix-expression are only rvalues.
> > (with a special rule about how the identifier is interpreted, different
> > from the normal scope rules)? If so, then ".a = 1" could either match
> > assignment-expression directly (assigning to the postfix-expression ".a").
>
> No, assigning to a function parameter from within another parameter
> declaration wouldn't make sense. They should be readonly. Side effects
> should be forbidden, I think.
Such assignments are already allowed. In a function definition, the side
effects (including in size expressions for array parameters adjusted to
pointers) take place before entry to the function body.
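A small sketch of that behaviour, using a pointer-to-array parameter so that the parameter type stays variably modified. The function name rows_after_entry is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical example: the size expression of the variably modified
 * parameter 'a' contains an assignment.  It is evaluated on entry to
 * the function, before the body runs.
 */
size_t
rows_after_entry(size_t n, int (*a)[n += 1])
{
	(void) a;	/* only the side effect of the bound matters here */
	return n;	/* 'n' has already been modified at this point */
}
```

So a rule that merely makes such assignments invalid would change what is accepted today for ordinary identifiers, and it still would not tell the parser which production to choose.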
And, in any case, if you did have a constraint disallowing such
assignments, it wouldn't suffice for syntactic disambiguation (see the
previous point I made about that; I have some rough notes towards a WG14
paper on syntactic disambiguation, but haven't converted them into a
coherent paper).
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 14:54 ` Joseph Myers
@ 2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 17:02 ` Joseph Myers
2022-11-12 15:56 ` Martin Uecker
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 15:35 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph,
On 11/12/22 15:54, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>> postfix-expression wouldn't be the right one.
>
> Several forms of postfix-expression are only rvalues.
>
>>> (with a special rule about how the identifier is interpreted, different
>>> from the normal scope rules)? If so, then ".a = 1" could either match
>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>
>> No, assigning to a function parameter from within another parameter
>> declaration wouldn't make sense. They should be readonly. Side effects
>> should be forbidden, I think.
>
> Such assignments are already allowed. In a function definition, the side
> effects (including in size expressions for array parameters adjusted to
> pointers) take place before entry to the function body.
Then, I'm guessing that rules need to change in a way that .initializer cannot
appear as the left operand of an assignment-expression.
That is, for the following current definition of the assignment-expression (as
of N3054):
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
The unary-expression cannot be formed by a .initializer.
Would that be doable and sufficient?
Cheers,
Alex
>
> And, in any case, if you did have a constraint disallowing such
> assignments, it wouldn't suffice for syntactic disambiguation (see the
> previous point I made about that; I have some rough notes towards a WG14
> paper on syntactic disambiguation, but haven't converted them into a
> coherent paper).
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
@ 2022-11-12 15:56 ` Martin Uecker
2022-11-13 13:19 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-11-12 15:56 UTC (permalink / raw)
To: Joseph Myers, Alejandro Colomar
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
> > Since it's to be used as an rvalue, not as a lvalue, I guess a
> > postfix-expression wouldn't be the right one.
>
> Several forms of postfix-expression are only rvalues.
>
> > > (with a special rule about how the identifier is interpreted, different
> > > from the normal scope rules)? If so, then ".a = 1" could either match
> > > assignment-expression directly (assigning to the postfix-expression ".a").
> >
> > No, assigning to a function parameter from within another parameter
> > declaration wouldn't make sense. They should be readonly. Side effects
> > should be forbidden, I think.
>
> Such assignments are already allowed. In a function definition, the side
> effects (including in size expressions for array parameters adjusted to
> pointers) take place before entry to the function body.
>
> And, in any case, if you did have a constraint disallowing such
> assignments, it wouldn't suffice for syntactic disambiguation (see the
> previous point I made about that; I have some rough notes towards a WG14
> paper on syntactic disambiguation, but haven't converted them into a
> coherent paper).
My idea was to only allow
array-declarator : direct-declarator [ . identifier ]
and only for parameter (not nested inside structs declared
in parameter list) as a first step because it seems this
would exclude all difficult cases.
But if we need to allow more complicated expressions, then
it starts getting more complicated.
One could allow more generic expressions, and
specify that the .identifier refers to a parameter in
the nearest lexically enclosing parameter list or
struct/union.
Then
void foo(struct bar { int x; char c[.x] } a, int x);
would not be allowed (which is good because then we
could later use the syntax also inside structs). If
we apply scoping rules, the following would work:
struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);
But not:
struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);
But there are not only syntactical problems, because
also the type of the parameter might become relevant
and then you can get circular dependencies:
void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
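For contrast, a minimal sketch of the one-directional case that is already valid: a later parameter may take its length from the type of an earlier one. The function name second_length is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical example: 'b' takes its length from the type of 'a',
 * which is legal because 'a' has been fully declared by that point.
 * The circular case arises only when 'a' may also refer forward to 'b'.
 */
size_t
second_length(size_t n, char (*a)[n], char (*b)[sizeof *a])
{
	(void) a;
	return sizeof *b;	/* equals n, since sizeof(char) is 1 */
}
```

With forward references allowed in both directions, neither type can be completed first, which is the dependency problem described above.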
I am not sure what would be the best way to fix it. One
could specify that parameters referred to by
the .identifier syntax must be of some integer type and
that the sub-expression .identifier is always
converted to a 'size_t'.
Maybe one should also add a constraint that all new
type length expressions, i.e. those using the new syntax,
cannot have side effects. Or even that they follow
all the rules of integer constant expressions with
the fictitious assumption that all . identifier
sub-expressions are integer constant expressions.
The rationale being that this would facilitate
compile time reasoning about length expressions.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 15:35 ` Alejandro Colomar
@ 2022-11-12 17:02 ` Joseph Myers
2022-11-12 17:08 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-12 17:02 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
> > > No, assigning to a function parameter from within another parameter
> > > declaration wouldn't make sense. They should be readonly. Side effects
> > > should be forbidden, I think.
> >
> > Such assignments are already allowed. In a function definition, the side
> > effects (including in size expressions for array parameters adjusted to
> > pointers) take place before entry to the function body.
>
> Then, I'm guessing that rules need to change in a way that .initializer cannot
> appear as the left operand of an assignment-expression.
I think needing such a very special case rule tends to indicate that some
alternative syntax, not needing such a rule, would be better.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 17:02 ` Joseph Myers
@ 2022-11-12 17:08 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-12 17:08 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/12/22 18:02, Joseph Myers wrote:
> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>>> No, assigning to a function parameter from within another parameter
>>>> declaration wouldn't make sense. They should be readonly. Side effects
>>>> should be forbidden, I think.
>>>
>>> Such assignments are already allowed. In a function definition, the side
>>> effects (including in size expressions for array parameters adjusted to
>>> pointers) take place before entry to the function body.
>>
>> Then, I'm guessing that rules need to change in a way that .initializer cannot
>> appear as the left operand of an assignment-expression.
>
> I think needing such a very special case rule tends to indicate that some
> alternative syntax, not needing such a rule, would be better.
Well, by not being an lvalue, it can't be assigned to. That would be somewhat
like sizeof(identifier), which is also a unary-expression, so it's not so much
of a special case, is it?
void f(size_t s, int a[sizeof(1) = 1]); // constraint violation
void g(size_t s, int a[.s = 1]); // Also constraint violation
void h(size_t s, int a[s = 1]); // This is fine
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-12 15:56 ` Martin Uecker
@ 2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
2022-11-14 17:52 ` Joseph Myers
0 siblings, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:19 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin!
On 11/12/22 16:56, Martin Uecker wrote:
> Am Samstag, den 12.11.2022, 14:54 +0000 schrieb Joseph Myers:
>> On Sat, 12 Nov 2022, Alejandro Colomar via Gcc wrote:
>>
>>> Since it's to be used as an rvalue, not as a lvalue, I guess a
>>> postfix-expression wouldn't be the right one.
>>
>> Several forms of postfix-expression are only rvalues.
>>
>>>> (with a special rule about how the identifier is interpreted, different
>>>> from the normal scope rules)? If so, then ".a = 1" could either match
>>>> assignment-expression directly (assigning to the postfix-expression ".a").
>>>
>>> No, assigning to a function parameter from within another parameter
>>> declaration wouldn't make sense. They should be readonly. Side effects
>>> should be forbidden, I think.
>>
>> Such assignments are already allowed. In a function definition, the side
>> effects (including in size expressions for array parameters adjusted to
>> pointers) take place before entry to the function body.
>>
>> And, in any case, if you did have a constraint disallowing such
>> assignments, it wouldn't suffice for syntactic disambiguation (see the
>> previous point I made about that; I have some rough notes towards a WG14
>> paper on syntactic disambiguation, but haven't converted them into a
>> coherent paper).
>
> My idea was to only allow
>
> array-declarator : direct-declarator [ . identifier ]
>
> and only for parameter (not nested inside structs declared
> in parameter list) as a first step because it seems this
> would exclude all difficult cases.
>
> But if we need to allow more complicated expressions, then
> it starts getting more complicated.
Ahh, I guess my work in documenting the man-pages prototypes got me thinking of
those extensions to the idea. I don't remember all the details :)
>
> One could allow more generic expressions, and
> specify that the .identifier refers to a parameter in
> the nearest lexically enclosing parameter list or
> struct/union.
>
> Then
>
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> would not be allowed (which is good because then we
> could later use the syntax also inside structs). If
> we apply scoping rules, the following would work:
>
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);
Makes sense.
>
> But not:
>
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);
Although it clearly is nonsense, I'm not sure I'd make it a constraint
violation, but rather Undefined Behavior. How is it different than this?:
$ cat foo.c
int main(void)
{
int i = i;
return i;
}
$ gcc --version | head -n1
gcc (Debian 12.2.0-9) 12.2.0
$ gcc -Wall -Wextra -Werror foo.c
$
$ clang --version | head -n1
Debian clang version 14.0.6
$ clang -Wall -Wextra -Werror foo.c
foo.c:3:10: error: variable 'i' is uninitialized when used within its own
initialization [-Werror,-Wuninitialized]
int i = i;
~ ^
1 error generated.
BTW, I just freaked out that GCC can't catch this trivial bug. Should I open a
bug report?
>
>
> But there are not only syntactical problems, because
> also the type of the parameter might become relevant
> and then you can get circular dependencies:
>
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
This seems to be a difficult stone in the road.
>
> I am not sure what would be the best way to fix it. One
> could specify that parameters referred to by
> the .identifier syntax must be of some integer type and
> that the sub-expression .identifier is always
> converted to a 'size_t'.
That makes sense, but then overnight some quite useful thing came to my mind
that would not be possible with this limitation:
<https://software.codidact.com/posts/285946>
char *
stpecpy(char dst[.end - .dst], char *src, char end[1])
{
for (/* void */; dst <= end; dst++) {
*dst = *src++;
if (*dst == '\0')
return dst;
}
/* Truncation detected */
*end = '\0';
#if !defined(NDEBUG)
/* Consume the rest of the input string. */
while (*src++) {};
#endif
return end + 1;
}
stpecpy() is a function similar to strlcat(3) that gets a pointer to the end of
the array instead of the size of the buffer. This allows chaining without
having performance issues[1].
[1]: <https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm>
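A sketch of how the chaining could look from the caller's side, with the function rewritten in standard C (no '.identifier' bounds, and without the debug-only loop). The helper join3 and its parameters are hypothetical, added only to show the usage pattern.

```c
#include <stddef.h>

/* Standard-C rendition of the function above; 'end' is the last element. */
char *
stpecpy(char *dst, const char *src, char *end)
{
	for (/* void */; dst <= end; dst++) {
		*dst = *src++;
		if (*dst == '\0')
			return dst;
	}
	/* Truncation detected */
	*end = '\0';
	return end + 1;
}

/*
 * Hypothetical caller: concatenates three strings into buf[size] and
 * checks for truncation once, at the end of the chain.  Assumes size > 0.
 */
int
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
	char *end = buf + size - 1;	/* last element of the buffer */
	char *p = buf;

	p = stpecpy(p, a, end);
	p = stpecpy(p, b, end);
	p = stpecpy(p, c, end);

	return p > end;			/* nonzero if anything was truncated */
}
```

Each call continues from where the previous one stopped, so the already-written prefix is never rescanned, and a call that starts at one past 'end' simply reports truncation again.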
Maybe allowing integral types and pointers would be enough. However, foreseeing
that the _Lengthof() proposal (BTW, which paper was it?) will succeed, and
combining it with this one, _Lengthof(pointer) would ideally give the length of
the array, so allowing pointers would conflict.
My solution is to disallow sizeof() and _Lengthof() on .identifier. That could
be done simply by saying that variably-modified types (VMT) are incomplete types
until immediately after the comma that follows the parameter declaration.
Therefore it would be allowed only in the same way as it is allowed right now
with the normal syntax (i.e., after the parameter has been seen).
BTW, what was the number of the latest paper for _Lengthof() and what happened
to it? I guess it's likely to be added to C3x, isn't it?
And another BTW: there's some kind of consistency in (some) projects for naming
sizes, and I have pending a review of the Linux man-pages to make it consistent
there too.
See the following table of usual conventions:
Operator/macro                 | Variable names    | Description
-------------------------------|-------------------|--------------------------
strlen(3)                      | length, len, l    | String length.
sizeof()                       | size, sz, nbytes  | Identifier size in bytes.
nitems(), nelems()             | n, nelem, nitems  | Array number of elements.
sizeof_array(), array_bytes()  | size, sz, nbytes  | Array size in bytes.
Naming _Lengthof() the operator that gets the number of elements in an array
would create naming confusion, since then length can mean two different things.
I suggest _Nitemsof().
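To make the three quantities concrete, a minimal sketch follows. The macro name NITEMS and the objects greeting and primes are illustrative assumptions, not existing interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Number of elements in an array (not a pointer); the name is illustrative. */
#define NITEMS(arr)  (sizeof(arr) / sizeof((arr)[0]))

static const char greeting[16] = "hello";
static const int  primes[4] = {2, 3, 5, 7};

/* Three different quantities that are easy to conflate: */
size_t greeting_len(void)   { return strlen(greeting); }   /* string length */
size_t greeting_size(void)  { return sizeof(greeting); }   /* size in bytes */
size_t primes_nitems(void)  { return NITEMS(primes); }     /* element count */
```

For the same char array, "length" and "size" already differ; using "length" for the element count as well would add a third meaning.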
>
> Maybe one should also add a constraint that all new
> type length expressions, i.e. those using the new syntax,
> cannot have side effects. Or even that they follow
> all the rules of integer constant expressions with
> the fictitious assumption that all . identifier
> sub-expressions are integer constant expressions.
> The rationale being that this would facilitate
> compile time reasoning about length expressions.
>
>
> Martin
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:19 ` Alejandro Colomar
@ 2022-11-13 13:33 ` Alejandro Colomar
2022-11-13 14:02 ` Alejandro Colomar
2022-11-14 17:52 ` Joseph Myers
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 13:33 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 14:19, Alejandro Colomar wrote:
>> But there are not only syntactical problems, because
>> also the type of the parameter might become relevant
>> and then you can get circular dependencies:
>>
>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>
> This seems to be a difficult stone in the road.
>
>>
>> I am not sure what would be the best way to fix it. One
>> could specify that parameters referred to by
>> the .identifier syntax must be of some integer type and
>> that the sub-expression .identifier is always
>> converted to a 'size_t'.
>
> That makes sense, but then overnight some quite useful thing came to my mind
> that would not be possible with this limitation:
>
>
> <https://software.codidact.com/posts/285946>
>
> char *
> stpecpy(char dst[.end - .dst], char *src, char end[1])
> {
> for (/* void */; dst <= end; dst++) {
> *dst = *src++;
> if (*dst == '\0')
> return dst;
> }
> /* Truncation detected */
> *end = '\0';
>
> #if !defined(NDEBUG)
> /* Consume the rest of the input string. */
> while (*src++) {};
> #endif
>
> return end + 1;
> }
And I forgot to say it: Default promotions rank high (probably the highest) in
my list of most hated features^Wbugs in C. I wouldn't convert it to size_t, but
rather follow normal promotion rules.
Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
array (which took me some time to understand), I'd also allow the same here.
So, the type of the expression between [] could perfectly be signed or unsigned.
So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
allow negative numbers. In the function above, since dst can be a pointer to
one-past-the-end (it represents a previous truncation; that's why the test
dst<=end), forcing a size_t conversion would disallow that syntax.
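A minimal sketch of the signed-index point, using a pointer into the middle of an array. The names samples and sample_at are hypothetical.

```c
#include <stddef.h>

static const int samples[5] = {10, 20, 30, 40, 50};

/*
 * Hypothetical helper: 'mid' points into the middle of an array, so a
 * negative ptrdiff_t offset is a valid subscript as long as the result
 * stays inside the original array.
 */
int
sample_at(const int *mid, ptrdiff_t off)
{
	return mid[off];
}
```

Subscripts are defined in terms of pointer arithmetic, so the index expression does not have to be unsigned.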
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:33 ` Alejandro Colomar
@ 2022-11-13 14:02 ` Alejandro Colomar
2022-11-13 14:58 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 14:02 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 14:33, Alejandro Colomar wrote:
> Hi Martin,
>
> On 11/13/22 14:19, Alejandro Colomar wrote:
>>> But there are not only syntactical problems, because
>>> also the type of the parameter might become relevant
>>> and then you can get circular dependencies:
>>>
>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>> This seems to be a difficult stone in the road.
>>
>>>
>>> I am not sure what would be the best way to fix it. One
>>> could specify that parameters referred to by
>>> the .identifier syntax must be of some integer type and
>>> that the sub-expression .identifier is always
>>> converted to a 'size_t'.
>>
>> That makes sense, but then overnight some quite useful thing came to my mind
>> that would not be possible with this limitation:
>>
>>
>> <https://software.codidact.com/posts/285946>
>>
>> char *
>> stpecpy(char dst[.end - .dst], char *src, char end[1])
Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
and then the result of the whole expression would be 0, which is fine as size_t.
So, never mind.
>> {
>> for (/* void */; dst <= end; dst++) {
>> *dst = *src++;
>> if (*dst == '\0')
>> return dst;
>> }
>> /* Truncation detected */
>> *end = '\0';
>>
>> #if !defined(NDEBUG)
>> /* Consume the rest of the input string. */
>> while (*src++) {};
>> #endif
>>
>> return end + 1;
>> }
>
> And I forgot to say it: Default promotions rank high (probably the highest) in
> my list of most hated features^Wbugs in C. I wouldn't convert it to size_t, but
> rather follow normal promotion rules.
>
> Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
> array (which took me some time to understand), I'd also allow the same here. So,
> the type of the expression between [] could perfectly be signed or unsigned.
>
> So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
> allow negative numbers. In the function above, since dst can be a pointer to
> one-past-the-end (it represents a previous truncation; that's why the test
> dst<=end), forcing a size_t conversion would disallow that syntax.
>
> Cheers,
>
> Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:02 ` Alejandro Colomar
@ 2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
2022-11-28 23:18 ` Alex Colomar
0 siblings, 2 replies; 85+ messages in thread
From: Martin Uecker @ 2022-11-13 14:58 UTC (permalink / raw)
To: Alejandro Colomar, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>
> On 11/13/22 14:33, Alejandro Colomar wrote:
> > Hi Martin,
> >
> > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > But there are not only syntactical problems, because
> > > > also the type of the parameter might become relevant
> > > > and then you can get circular dependencies:
> > > >
> > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > >
> > > This seems to be a difficult stone in the road.
But note that GNU forward declarations solve this nicely.
> > >
> > > > I am not sure what would be the best way to fix it. One
> > > > could specify that parameters referred to by
> > > > the .identifier syntax must be of some integer type and
> > > > that the sub-expression .identifier is always
> > > > converted to a 'size_t'.
> > >
> > > That makes sense, but then overnight some quite useful thing came to my mind
> > > that would not be possible with this limitation:
> > >
> > >
> > > <https://software.codidact.com/posts/285946>
> > >
> > > char *
> > > stpecpy(char dst[.end - .dst], char *src, char end[1])
>
> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
> and then the result of the whole expression would be 0, which is fine as size_t.
>
> So, never mind.
.end and .dst would have pointer size though.
> > > {
> > > for (/* void */; dst <= end; dst++) {
> > > *dst = *src++;
> > > if (*dst == '\0')
> > > return dst;
> > > }
> > > /* Truncation detected */
> > > *end = '\0';
> > >
> > > #if !defined(NDEBUG)
> > > /* Consume the rest of the input string. */
> > > while (*src++) {};
> > > #endif
> > >
> > > return end + 1;
> > > }
> > And I forgot to say it: Default promotions rank high (probably the highest) in
> > my list of most hated features^Wbugs in C.
If you replaced them with explicit conversions that you then
have to add by hand all the time, I am pretty sure most people
would hate this more (and it could also hide bugs).
> > I wouldn't convert it to size_t, but
> > rather follow normal promotion rules.
The point of making it size_t is that you then
do not need to know the type of the parameter to make
sense of the expression. If the type matters, then you get
mutual dependencies as in the example above.
> > Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
> > array (which took me some time to understand), I'd also allow the same here. So,
> > the type of the expression between [] could perfectly be signed or unsigned.
> >
> > So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
> > allow negative numbers. In the function above, since dst can be a pointer to
> > one-past-the-end (it represents a previous truncation; that's why the test
> > dst<=end), forcing a size_t conversion would disallow that syntax.
Yes, this then does not work.
Martin
> > Cheers,
> >
> > Alex
> >
>
> --
> <http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:58 ` Martin Uecker
@ 2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:28 ` Alejandro Colomar
2022-11-28 23:18 ` Alex Colomar
1 sibling, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 15:15 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 15:58, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>
>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>
>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>> But there are not only syntactical problems, because
>>>>> also the type of the parameter might become relevant
>>>>> and then you can get circular dependencies:
>>>>>
>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> This seems to be a difficult stone in the road.
>
> But note that GNU forward declarations solve this nicely.
How would the above be solved with GNU forward declarations? I'm guessing that it can't be.
How do you forward-declare incomplete VMTs?
>
>>>>
>>>>> I am not sure what would the best way to fix it. One
>>>>> could specifiy that parameters referred to by
>>>>> the .identifer syntax must of some integer type and
>>>>> that the sub-expression .identifer is always
>>>>> converted to a 'size_t'.
>>>>
>>>> That makes sense, but then overnight some quite useful thing came to my mind
>>>> that would not be possible with this limitation:
>>>>
>>>>
>>>> <https://software.codidact.com/posts/285946>
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst], char *src, char end[1])
>>
>> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
>> and then the result of the whole expression would be 0, which is fine as size_t.
>>
>> So, never mind.
>
> .end and .dst would have pointer size though.
>
>>>> {
>>>> for (/* void */; dst <= end; dst++) {
>>>> *dst = *src++;
>>>> if (*dst == '\0')
>>>> return dst;
>>>> }
>>>> /* Truncation detected */
>>>> *end = '\0';
>>>>
>>>> #if !defined(NDEBUG)
>>>> /* Consume the rest of the input string. */
>>>> while (*src++) {};
>>>> #endif
>>>>
>>>> return end + 1;
>>>> }
>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>> my list of most hated features^Wbugs in C.
>
> If you replaced them with explicit conversion you then have
> to add by hand all the time, I am pretty sure most people
> would hate this more. (and it could also hide bugs)
Yeah, casts are also in my top 3 list of things to avoid (although in this case
there's no bug); maybe a bit over default promotions :)
I didn't mean that all promotions are bad. Just the gratuitous ones, like
promoting everything to int before even needing it. That makes uint16_t a
theoretical type, because whenever you try to use it, you end up with a signed
32-bit type; fun heh? :P _BitInt() solves that for me.
But sure, in (1u + 1l), promotions are fine to get a common type.
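A small sketch of the uint16_t behaviour described above, assuming a platform where int is wider than 16 bits (the static_assert states that assumption). The function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: int is wider than uint16_t. */
static_assert(sizeof(int) > sizeof(uint16_t),
              "int must be wider than uint16_t");

/* Returns 1 if adding two uint16_t values yields an expression of type int. */
int
sum_is_int(void)
{
    uint16_t x = 1;

    return _Generic(x + x, int: 1, default: 0);
}

/* ~x is computed in int, so for x == 0 the result is -1, not 0xFFFF. */
int
complement_is_ffff(uint16_t x)
{
    return ~x == 0xFFFF;
}

/* Converting back to uint16_t restores the expected 16-bit value. */
int
complement_is_ffff_cast(uint16_t x)
{
    return (uint16_t)~x == 0xFFFF;
}
```

The cast in the last function is the kind of by-hand conversion that the promotion to int forces on code that wants to stay in 16 bits.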
>
>>> I wouldn't convert it to size_t, but
>>> rather follow normal promotion rules.
>
> The point of making it size_t is that you then
> do not need to know the type of the parameter to make
> sense of the expression. If the type matters, then you get
> mutual dependencies as in the example above.
Except if you treat incomplete types as... incomplete types (for which sizeof()
is disallowed by the standard). And the issue we're having is that the types
are not yet complete at the time we're using them, aren't they?
Kind of like the initialization order fiasco, but since we're in a limited
scope, we can detect it.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:15 ` Alejandro Colomar
@ 2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:25 ` Alejandro Colomar
2022-11-13 16:28 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-11-13 15:32 UTC (permalink / raw)
To: Alejandro Colomar, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
> Hi Martin,
>
> On 11/13/22 15:58, Martin Uecker wrote:
> > Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
> > > On 11/13/22 14:33, Alejandro Colomar wrote:
> > > > On 11/13/22 14:19, Alejandro Colomar wrote:
> > > > > > But there are not only syntactical problems, because
> > > > > > also the type of the parameter might become relevant
> > > > > > and then you can get circular dependencies:
> > > > > >
> > > > > > void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
> > > > >
> > > > > This seems to be a difficult stone in the road.
> >
> > But note that GNU forward declarations solve this nicely.
>
> How would that above be solved with GNU fwd decl? I'm guessing that it can't.
> How do you forward declare incomplete VMTs?.
You can't express it. This was my point: it is impossible
to create circular dependencies.
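As a sketch of what this looks like in practice: GCC documents parameter forward declarations as an extension, so the example is guarded and falls back to a plain pointer on compilers assumed not to support it. The function name count_nonzero() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU extension: "size_t n;" forward-declares n, so buf[n] may precede it. */
size_t
count_nonzero(size_t n; const char buf[n], size_t n)
#else
/* Fallback for compilers without the extension. */
size_t
count_nonzero(const char *buf, size_t n)
#endif
{
    size_t count = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        count += (buf[i] != '\0');
    return count;
}
```

A forward declaration has to give the full type of the parameter before any other parameter uses it, which is why two parameters whose types each depend on the other cannot be written this way.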
...
> > > > > {
> > > > > for (/* void */; dst <= end; dst++) {
> > > > > *dst = *src++;
> > > > > if (*dst == '\0')
> > > > > return dst;
> > > > > }
> > > > > /* Truncation detected */
> > > > > *end = '\0';
> > > > >
> > > > > #if !defined(NDEBUG)
> > > > > /* Consume the rest of the input string. */
> > > > > while (*src++) {};
> > > > > #endif
> > > > >
> > > > > return end + 1;
> > > > > }
> > > > And I forgot to say it: Default promotions rank high (probably the highest) in
> > > > my list of most hated features^Wbugs in C.
> >
> > If you replaced them with explicit conversion you then have
> > to add by hand all the time, I am pretty sure most people
> > would hate this more. (and it could also hide bugs)
>
> Yeah, casts are also in my top 3 list of things to avoid (although in this case
> there's no bug); maybe a bit over default promotions :)
>
> I didn't mean that all promotions are bad. Just the gratuitous ones, like
> promoting everything to int before even needing it. That makes uint16_t a
> theoretical type, because whenever you try to use it, you end up with a signed
> 32-bit type; fun heh? :P _BitInt() solves that for me.
uint16_t is for storing data. My expectation is that people
will find _BitInt() difficult and error-prone to use for
small sizes. But maybe I am wrong...
> But sure, in (1u + 1l), promotions are fine to get a common type.
>
> > > > I wouldn't convert it to size_t, but
> > > > rather follow normal promotion rules.
> >
> > The point of making it size_t is that you then
> > do not need to know the type of the parameter to make
> > sense of the expression. If the type matters, then you get
> > mutual dependencies as in the example above.
>
> Except if you treat incomplete types as... incomplete types (for which sizeof()
> is disallowed by the standard). And the issue we're having is that the types
> are not yet complete at the time we're using them, aren't they?
It is not an incomplete type. When parsing, if we do not yet have
a declaration, we know nothing about it (not just its size).
If we assume we know the type (by looking ahead), we get mutual
dependencies.
Also the capability to parse, fold, and do type checking
in one go is something worth preserving in my opinion.
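The single-pass property can be seen in today's C, where each parameter may refer only to parameters declared before it. A minimal sketch, assuming VLA support; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* n is declared first, so the type of a is fully known when it is parsed. */
size_t
row_size(size_t n, char (*a)[n])
{
    assert(a != NULL);
    return sizeof(*a);
}

/* b may depend on a, because the type of a is already known at that point. */
size_t
doubled_row_size(size_t n, char (*a)[n], char (*b)[2 * sizeof(*a)])
{
    assert(a != NULL && b != NULL);
    return sizeof(*b);
}
```

Here the dependency runs strictly left to right, so parsing, folding, and type checking can all happen as each declarator is read.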
Martin
> Kind of like the initialization order fiasco, but since we're in a limited
> scope, we can detect it.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:32 ` Martin Uecker
@ 2022-11-13 16:25 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:25 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 16:32, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 16:15 +0100 schrieb Alejandro Colomar:
>> Hi Martin,
>>
>> On 11/13/22 15:58, Martin Uecker wrote:
>>> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>>>> But there are not only syntactical problems, because
>>>>>>> also the type of the parameter might become relevant
>>>>>>> and then you can get circular dependencies:
>>>>>>>
>>>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>>>
>>>>>> This seems to be a difficult stone in the road.
>>>
>>> But note that GNU forward declarations solve this nicely.
>>
>> How would that above be solved with GNU fwd decl? I'm guessing that it can't.
>> How do you forward declare incomplete VMTs?.
>
> You can't express it. This was my point: it is impossible
> to create circular dependencies.
>
> ...
>
>>>>>> {
>>>>>> for (/* void */; dst <= end; dst++) {
>>>>>> *dst = *src++;
>>>>>> if (*dst == '\0')
>>>>>> return dst;
>>>>>> }
>>>>>> /* Truncation detected */
>>>>>> *end = '\0';
>>>>>>
>>>>>> #if !defined(NDEBUG)
>>>>>> /* Consume the rest of the input string. */
>>>>>> while (*src++) {};
>>>>>> #endif
>>>>>>
>>>>>> return end + 1;
>>>>>> }
>>>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>>>> my list of most hated features^Wbugs in C.
>>>
>>> If you replaced them with explicit conversion you then have
>>> to add by hand all the time, I am pretty sure most people
>>> would hate this more. (and it could also hide bugs)
>>
>> Yeah, casts are also in my top 3 list of things to avoid (although in this case
>> there's no bug); maybe a bit over default promotions :)
>>
>> I didn't mean that all promotions are bad. Just the gratuitous ones, like
>> promoting everything to int before even needing it. That makes uint16_t a
>> theoretical type, because whenever you try to use it, you end up with a signed
>> 32-bit type; fun heh? :P _BitInt() solves that for me.
>
> uint16_t is for storing data. My expectation is that people
> will find _BitInt() difficult and error-prone to use for
> small sizes. But maybe I am wrong...
I'm a bit concerned about the suffix to create literals. I'd have preferred a
suffix that allowed creating a specific size (instead of the minimum one), e.g.,
1u16 or something like that. But otherwise I think it can be better. I can't
recall the details of a big issue I had a year ago with uint16_t, but it required 3 casts
in a line. With _BitInt() I think none (or maybe one, for giving 1 the
appropriate size) would have been needed. But we'll see how it works out.
>
>> But sure, in (1u + 1l), promotions are fine to get a common type.
>>
>>>>> I wouldn't convert it to size_t, but
>>>>> rather follow normal promotion rules.
>>>
>>> The point of making it size_t is that you then
>>> do not need to know the type of the parameter to make
>>> sense of the expression. If the type matters, then you get
>>> mutual dependencies as in the example above.
>>
>> Except if you treat incomplete types as... incomplete types (for which sizeof()
>> is disallowed by the standard). And the issue we're having is that the types
>> are not yet complete at the time we're using them, aren't they?
>
> It is not an incomplete type. When doing parsing and do not have
> a declaration we know nothing about it (not just not the size).
> If we assume we know the type (by looking ahead) we get mutual
> dependencies.
Then I'd do the following: .identifier always has an incomplete type.
I'm preparing a complete description of what I think of the feature. I'll add that.
>
> Also the capability to parse, fold, and do type checking
> in one go is something worth preserving in my opinion.
Makes sense.
Thanks for all the help, both!
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
@ 2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
1 sibling, 2 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:28 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
SYNOPSIS:
unary-operator: . identifier
DESCRIPTION:
- It is not an lvalue.
- This means sizeof() and _Lengthof() cannot be applied to them.
- This prevents ambiguity with a designator in an initializer-list within a
nested braced-initializer.
- The type of a .identifier is always an incomplete type.
- This prevents circular dependencies involving sizeof() or _Lengthof().
- Shadowing rules apply.
- This prevents ambiguity.
EXAMPLES:
- Valid examples (libc):
int
strncmp(const char s1[.n],
const char s2[.n],
size_t n);
int
cacheflush(void addr[.nbytes],
int nbytes,
int cache);
long
mbind(void addr[.len],
unsigned long len,
int mode,
const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
/ ULONG_WIDTH],
unsigned long maxnode, unsigned int flags);
void *
bsearch(const void key[.size],
const void base[.size * .nmemb],
size_t nmemb,
size_t size,
int (*compar)(const void [.size], const void [.size]));
- Valid examples (my own):
void
ustr2str(char dst[restrict .len + 1],
const char src[restrict .len],
size_t len);
char *
stpecpy(char dst[.end - .dst + 1],
char *restrict src,
char end[1]);
- Valid examples (from this thread):
-
struct s { int a; };
void f(int a, int b[((struct s) { .a = 1 }).a]);
Explanation:
- Because of shadowing rules, .a=1 refers to the struct member.
- Also, if .a referred to the parameter, it would be an rvalue, so
it wouldn't be valid to assign to it.
- (...).a refers to the struct member too, since otherwise an rvalue is
not expected there.
-
void foo(struct bar { int x; char c[.x] } a, int x);
Explanation:
- Because of shadowing rules, [.x] refers to the struct member.
-
struct bar { int y; };
void foo(char p[((struct bar){ .y = .x }).y], int x);
Explanation:
- .x unambiguously refers to the parameter.
- Undefined behavior:
-
struct bar { int y; };
void foo(char p[((struct bar){ .y = .y }).y], int y);
Explanation:
- Because of shadowing rules, =.y refers to the struct member.
- .y=.y means initialize the member with itself (uninitialized use).
- (...).y refers to the struct member, since otherwise an rvalue is not
expected there.
- Constraint violations:
-
void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
Explanation:
- sizeof(*.b): Cannot get size of incomplete type.
- sizeof(*.a): Cannot get size of incomplete type.
-
void f(size_t s, int a[sizeof(1) = 1]);
Explanation:
- Cannot assign to rvalue.
-
void f(size_t s, int a[.s = 1]);
Explanation:
- Cannot assign to rvalue.
-
void f(size_t s, int a[sizeof(.s)]);
Explanation:
- sizeof(.s): Cannot get size of incomplete type.
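As a side note on the mbind() example above: the nodemask bound is a round-up division, giving the number of unsigned long elements needed to hold maxnode bits. A minimal sketch of that arithmetic, assuming unsigned long has no padding bits; ULONG_BITS and nodemask_words() are names invented here (ULONG_WIDTH itself comes from C23's <limits.h>).

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long in bits (assumes no padding bits). */
#define ULONG_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long elements needed to store maxnode bits. */
unsigned long
nodemask_words(unsigned long maxnode)
{
    /* The sum must not wrap around for the round-up to be valid. */
    assert(maxnode <= ULONG_MAX - (ULONG_BITS - 1));
    return (maxnode + ULONG_BITS - 1) / ULONG_BITS;
}
```

With a 64-bit unsigned long, 64 bits fit in one element and 65 bits need two.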
Does this idea make sense to you?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:28 ` Alejandro Colomar
@ 2022-11-13 16:31 ` Alejandro Colomar
2022-11-13 16:34 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
1 sibling, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:31 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 17:28, Alejandro Colomar wrote:
> SYNOPSIS:
>
> unary-operator: . identifier
>
>
> DESCRIPTION:
>
> - It is not an lvalue.
>
> - This means sizeof() and _Lengthof() cannot be applied to them.
Sorry, the above is a thinko.
I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
> - This prevents ambiguity with a designator in an initializer-list within a
> nested braced-initializer.
>
> - The type of a .identifier is always an incomplete type.
>
> - This prevents circular dependencies involving sizeof() or _Lengthof().
>
> - Shadowing rules apply.
>
> - This prevents ambiguity.
>
>
> EXAMPLES:
>
>
> - Valid examples (libc):
>
> int
> strncmp(const char s1[.n],
> const char s2[.n],
> size_t n);
>
> int
> cacheflush(void addr[.nbytes],
> int nbytes,
> int cache);
>
> long
> mbind(void addr[.len],
> unsigned long len,
> int mode,
> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
> / ULONG_WIDTH],
> unsigned long maxnode, unsigned int flags);
>
> void *
> bsearch(const void key[.size],
> const void base[.size * .nmemb],
> size_t nmemb,
> size_t size,
> int (*compar)(const void [.size], const void [.size]));
>
> - Valid examples (my own):
>
> void
> ustr2str(char dst[restrict .len + 1],
> const char src[restrict .len],
> size_t len);
>
> char *
> stpecpy(char dst[.end - .dst + 1],
> char *restrict src,
> char end[1]);
>
> - Valid examples (from this thread):
>
> -
> struct s { int a; };
> void f(int a, int b[((struct s) { .a = 1 }).a]);
>
> Explanation:
> - Because of shadowing rules, .a=1 refers to the struct member.
> - Also, if .a referred to the parameter, it would be an rvalue, so
> it wouldn't be valid to assign to it.
> - (...).a refers to the struct member too, since otherwise an rvalue is
> not expected there.
>
> -
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> Explanation:
> - Because of shadowing rules, [.x] refers to the struct member.
>
> -
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .x }).y], int x);
>
> Explanation:
> - .x unambiguously refers to the parameter.
>
> - Undefined behavior:
>
> -
> struct bar { int y; };
> void foo(char p[((struct bar){ .y = .y }).y], int y);
>
> Explanation:
> - Because of shadowing rules, =.y refers to the struct member.
> - .y=.y means initialize the member with itself (uninitialized use).
> - (...).y refers to the struct member, since otherwise an rvalue is not
> expected there.
>
> - Constraint violations:
>
> -
> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>
> Explanation:
> - sizeof(*.b): Cannot get size of incomplete type.
> - sizeof(*.a): Cannot get size of incomplete type.
>
> -
> void f(size_t s, int a[sizeof(1) = 1]);
>
> Explanation:
> - Cannot assign to rvalue.
>
> -
> void f(size_t s, int a[.s = 1]);
>
> Explanation:
> - Cannot assign to rvalue.
>
> -
> void f(size_t s, int a[sizeof(.s)]);
>
> Explanation:
> - sizeof(.s): Cannot get size of incomplete type.
>
>
> Does this idea make sense to you?
>
>
> Cheers,
> Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:31 ` Alejandro Colomar
@ 2022-11-13 16:34 ` Alejandro Colomar
2022-11-13 16:56 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:34 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 17:31, Alejandro Colomar wrote:
>
>
> On 11/13/22 17:28, Alejandro Colomar wrote:
>> SYNOPSIS:
>>
>> unary-operator: . identifier
>>
>>
>> DESCRIPTION:
>>
>> - It is not an lvalue.
>>
>> - This means sizeof() and _Lengthof() cannot be applied to them.
>
> Sorry, the above is a thinko.
>
> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>
>> - This prevents ambiguity with a designator in an initializer-list within
>> a nested braced-initializer.
>>
>> - The type of a .identifier is always an incomplete type.
Or rather, more simply: explicitly prohibit applying typeof(), sizeof(), and
_Lengthof() to it.
>>
>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>
>> - Shadowing rules apply.
>>
>> - This prevents ambiguity.
>>
>>
>> EXAMPLES:
>>
>>
>> - Valid examples (libc):
>>
>> int
>> strncmp(const char s1[.n],
>> const char s2[.n],
>> size_t n);
>>
>> int
>> cacheflush(void addr[.nbytes],
>> int nbytes,
>> int cache);
>>
>> long
>> mbind(void addr[.len],
>> unsigned long len,
>> int mode,
>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>> / ULONG_WIDTH],
>> unsigned long maxnode, unsigned int flags);
>>
>> void *
>> bsearch(const void key[.size],
>> const void base[.size * .nmemb],
>> size_t nmemb,
>> size_t size,
>> int (*compar)(const void [.size], const void [.size]));
>>
>> - Valid examples (my own):
>>
>> void
>> ustr2str(char dst[restrict .len + 1],
>> const char src[restrict .len],
>> size_t len);
>>
>> char *
>> stpecpy(char dst[.end - .dst + 1],
>> char *restrict src,
>> char end[1]);
>>
>> - Valid examples (from this thread):
>>
>> -
>> struct s { int a; };
>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>
>> Explanation:
>> - Because of shadowing rules, .a=1 refers to the struct member.
>> - Also, if .a referred to the parameter, it would be an rvalue, so
>> it wouldn't be valid to assign to it.
>> - (...).a refers to the struct member too, since otherwise an rvalue
>> is not expected there.
>>
>> -
>> void foo(struct bar { int x; char c[.x] } a, int x);
>>
>> Explanation:
>> - Because of shadowing rules, [.x] refers to the struct member.
>>
>> -
>> struct bar { int y; };
>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>
>> Explanation:
>> - .x unambiguously refers to the parameter.
>>
>> - Undefined behavior:
>>
>> -
>> struct bar { int y; };
>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>
>> Explanation:
>> - Because of shadowing rules, =.y refers to the struct member.
>> - .y=.y means initialize the member with itself (uninitialized use).
>> - (...).y refers to the struct member, since otherwise an rvalue is
>> not expected there.
>>
>> - Constraint violations:
>>
>> -
>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>
>> Explanation:
>> - sizeof(*.b): Cannot get size of incomplete type.
>> - sizeof(*.a): Cannot get size of incomplete type.
>>
>> -
>> void f(size_t s, int a[sizeof(1) = 1]);
>>
>> Explanation:
>> - Cannot assign to rvalue.
>>
>> -
>> void f(size_t s, int a[.s = 1]);
>>
>> Explanation:
>> - Cannot assign to rvalue.
>>
>> -
>> void f(size_t s, int a[sizeof(.s)]);
>>
>> Explanation:
>> - sizeof(.s): Cannot get size of incomplete type.
>>
>>
>> Does this idea make sense to you?
>>
>>
>> Cheers,
>> Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:34 ` Alejandro Colomar
@ 2022-11-13 16:56 ` Alejandro Colomar
2022-11-13 19:05 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 16:56 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 17:34, Alejandro Colomar wrote:
>
>
> On 11/13/22 17:31, Alejandro Colomar wrote:
>>
>>
>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>> SYNOPSIS:
>>>
>>> unary-operator: . identifier
>>>
>>>
>>> DESCRIPTION:
>>>
>>> - It is not an lvalue.
>>>
>>> - This means sizeof() and _Lengthof() cannot be applied to them.
>>
>> Sorry, the above is a thinko.
>>
>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>
>>> - This prevents ambiguity with a designator in an initializer-list
>>> within a nested braced-initializer.
>>>
>>> - The type of a .identifier is always an incomplete type.
>
> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and
> _Lengthof() to it.
Hmm, this is not enough. Pointer arithmetic is interesting, and for that you
need to implicitly know sizeof(*.p).
How about allowing only integral types or pointers to integral types?
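To make the dependency concrete, a short sketch in today's C showing that a pointer difference is scaled by the size of the pointed-to type, so an expression such as .end - .dst cannot be evaluated without knowing that type. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in elements: the compiler divides by sizeof(int) implicitly. */
ptrdiff_t
span_elems(const int *first, const int *last)
{
    assert(first <= last);
    return last - first;
}

/* Distance in bytes, obtained by measuring the same range as char. */
ptrdiff_t
span_bytes(const int *first, const int *last)
{
    assert(first <= last);
    return (const char *)last - (const char *)first;
}
```

For an array of four int, the first function returns 4 and the second returns 4 * sizeof(int).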
>
>>>
>>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>>
>>> - Shadowing rules apply.
>>>
>>> - This prevents ambiguity.
>>>
>>>
>>> EXAMPLES:
>>>
>>>
>>> - Valid examples (libc):
>>>
>>> int
>>> strncmp(const char s1[.n],
>>> const char s2[.n],
>>> size_t n);
>>>
>>> int
>>> cacheflush(void addr[.nbytes],
>>> int nbytes,
>>> int cache);
>>>
>>> long
>>> mbind(void addr[.len],
>>> unsigned long len,
>>> int mode,
>>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>> / ULONG_WIDTH],
>>> unsigned long maxnode, unsigned int flags);
>>>
>>> void *
>>> bsearch(const void key[.size],
>>> const void base[.size * .nmemb],
>>> size_t nmemb,
>>> size_t size,
>>> int (*compar)(const void [.size], const void [.size]));
>>>
>>> - Valid examples (my own):
>>>
>>> void
>>> ustr2str(char dst[restrict .len + 1],
>>> const char src[restrict .len],
>>> size_t len);
>>>
>>> char *
>>> stpecpy(char dst[.end - .dst + 1],
>>> char *restrict src,
>>> char end[1]);
>>>
>>> - Valid examples (from this thread):
>>>
>>> -
>>> struct s { int a; };
>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>
>>> Explanation:
>>> - Because of shadowing rules, .a=1 refers to the struct member.
>>> - Also, if .a referred to the parameter, it would be an rvalue,
>>> so it wouldn't be valid to assign to it.
>>> - (...).a refers to the struct member too, since otherwise an rvalue
>>> is not expected there.
>>>
>>> -
>>> void foo(struct bar { int x; char c[.x] } a, int x);
>>>
>>> Explanation:
>>> - Because of shadowing rules, [.x] refers to the struct member.
>>>
>>> -
>>> struct bar { int y; };
>>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>
>>> Explanation:
>>> - .x unambiguously refers to the parameter.
>>>
>>> - Undefined behavior:
>>>
>>> -
>>> struct bar { int y; };
>>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>
>>> Explanation:
>>> - Because of shadowing rules, =.y refers to the struct member.
>>> - .y=.y means initialize the member with itself (uninitialized use).
>>> - (...).y refers to the struct member, since otherwise an rvalue is
>>> not expected there.
>>>
>>> - Constraint violations:
>>>
>>> -
>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>
>>> Explanation:
>>> - sizeof(*.b): Cannot get size of incomplete type.
>>> - sizeof(*.a): Cannot get size of incomplete type.
>>>
>>> -
>>> void f(size_t s, int a[sizeof(1) = 1]);
>>>
>>> Explanation:
>>> - Cannot assign to rvalue.
>>>
>>> -
>>> void f(size_t s, int a[.s = 1]);
>>>
>>> Explanation:
>>> - Cannot assign to rvalue.
>>>
>>> -
>>> void f(size_t s, int a[sizeof(.s)]);
>>>
>>> Explanation:
>>> - sizeof(.s): Cannot get size of incomplete type.
>>>
>>>
>>> Does this idea make sense to you?
>>>
>>>
>>> Cheers,
>>> Alex
>>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:56 ` Alejandro Colomar
@ 2022-11-13 19:05 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-13 19:05 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/13/22 17:56, Alejandro Colomar wrote:
>>> On 11/13/22 17:28, Alejandro Colomar wrote:
>>>> SYNOPSIS:
>>>>
>>>> unary-operator: . identifier
>>>>
>>>>
>>>> DESCRIPTION:
>>>>
>>>> - It is not an lvalue.
>>>>
>>>> - This means sizeof() and _Lengthof() cannot be applied to them.
>>>
>>> Sorry, the above is a thinko.
>>>
>>> I wanted to say that, like sizeof() and _Lengthof(), you can't assign to it.
>>>
>>>> - This prevents ambiguity with a designator in an initializer-list
>>>> within a nested braced-initializer.
>>>>
>>>> - The type of a .identifier is always an incomplete type.
>>
>> Or rather, more easily prohibit explicitly using typeof(), sizeof(), and
>> _Lengthof() to it.
>
> Hmm, this is not enough. Pointer arithmetics are interesting, and for that, you
> need to implicitly know the sizeof(*.p).
>
> How about allowing only integral types or pointers to integral types?
I've been thinking about keeping the number of passes as low as possible, while
allowing most useful expressions.
Maybe forcing some ordering can help:
- The type of a .identifier is complete after the opening parenthesis of the
function-declarator (if it refers to a parameter) or after the opening brace of
a braced-initializer (if it refers to a struct/union member), except when the
type is a variably-modified type, which will be complete after the closing
parenthesis or brace, respectively.
I'm not sure I got the wording precisely right, or whether I covered all cases (like types
that cannot be completed for other reasons, even after the closing ')' or '}').
>
>>
>>>>
>>>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>>>>
>>>> - Shadowing rules apply.
>>>>
>>>> - This prevents ambiguity.
>>>>
>>>>
>>>> EXAMPLES:
>>>>
>>>>
>>>> - Valid examples (libc):
>>>>
>>>> int
>>>> strncmp(const char s1[.n],
>>>> const char s2[.n],
>>>> size_t n);
>>>>
>>>> int
>>>> cacheflush(void addr[.nbytes],
>>>> int nbytes,
>>>> int cache);
>>>>
>>>> long
>>>> mbind(void addr[.len],
>>>> unsigned long len,
>>>> int mode,
>>>> const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
>>>> / ULONG_WIDTH],
>>>> unsigned long maxnode, unsigned int flags);
>>>>
>>>> void *
>>>> bsearch(const void key[.size],
>>>> const void base[.size * .nmemb],
>>>> size_t nmemb,
>>>> size_t size,
>>>> int (*compar)(const void [.size], const void [.size]));
>>>>
>>>> - Valid examples (my own):
>>>>
>>>> void
>>>> ustr2str(char dst[restrict .len + 1],
>>>> const char src[restrict .len],
>>>> size_t len);
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst + 1],
>>>> char *restrict src,
>>>> char end[1]);
>>>>
>>>> - Valid examples (from this thread):
>>>>
>>>> -
>>>> struct s { int a; };
>>>> void f(int a, int b[((struct s) { .a = 1 }).a]);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, .a=1 refers to the struct member.
>>>> - Also, if .a referred to the parameter, it would be an rvalue,
>>>> so it wouldn't be valid to assign to it.
>>>> - (...).a refers to the struct member too, since otherwise an
>>>> rvalue is not expected there.
>>>>
>>>> -
>>>> void foo(struct bar { int x; char c[.x] } a, int x);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, [.x] refers to the struct member.
>>>>
>>>> -
>>>> struct bar { int y; };
>>>> void foo(char p[((struct bar){ .y = .x }).y], int x);
>>>>
>>>> Explanation:
>>>> - .x unambiguously refers to the parameter.
>>>>
>>>> - Undefined behavior:
>>>>
>>>> -
>>>> struct bar { int y; };
>>>> void foo(char p[((struct bar){ .y = .y }).y], int y);
>>>>
>>>> Explanation:
>>>> - Because of shadowing rules, =.y refers to the struct member.
>>>> - .y=.y means initialize the member with itself (uninitialized use).
>>>> - (...).y refers to the struct member, since otherwise an rvalue is
>>>> not expected there.
>>>>
>>>> - Constraint violations:
>>>>
>>>> -
>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> Explanation:
>>>> - sizeof(*.b): Cannot get size of incomplete type.
>>>> - sizeof(*.a): Cannot get size of incomplete type.
>>>>
>>>> -
>>>> void f(size_t s, int a[sizeof(1) = 1]);
>>>>
>>>> Explanation:
>>>> - Cannot assign to rvalue.
>>>>
>>>> -
>>>> void f(size_t s, int a[.s = 1]);
>>>>
>>>> Explanation:
>>>> - Cannot assign to rvalue.
>>>>
>>>> -
>>>> void f(size_t s, int a[sizeof(.s)]);
This should actually be valid.
>>>>
>>>> Explanation:
>>>> - sizeof(.s): Cannot get size of incomplete type.
>>>>
>>>>
>>>> Does this idea make sense to you?
>>>>
>>>>
>>>> Cheers,
>>>> Alex
>>>
>>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
@ 2022-11-14 17:52 ` Joseph Myers
2022-11-14 17:57 ` Alejandro Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-14 17:52 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
> Maybe allowing integral types and pointers would be enough. However,
> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
> the length of the array, so allowing pointers would conflict.
Do you mean N2529 Romero, New pointer-proof keyword to determine array
length? To quote the convenor in WG14 reflector message 18575 (17 Nov
2020) when I asked about its status, "The author asked me not to put those
on the agenda. He will supply updated versions later.".
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 17:52 ` Joseph Myers
@ 2022-11-14 17:57 ` Alejandro Colomar
2022-11-14 18:26 ` Joseph Myers
0 siblings, 1 reply; 85+ messages in thread
From: Alejandro Colomar @ 2022-11-14 17:57 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph!
On 11/14/22 18:52, Joseph Myers wrote:
> On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> Maybe allowing integral types and pointers would be enough. However,
>> foreseeing that the _Lengthof() proposal (BTW, which paper was it?) will
>> succeed, and combining it with this one, _Lengthof(pointer) would ideally give
>> the length of the array, so allowing pointers would conflict.
>
> Do you mean N2529 Romero, New pointer-proof keyword to determine array
> length?
Yes, that's it! Thanks.
> To quote the convenor in WG14 reflector message 18575 (17 Nov
> 2020) when I asked about its status, "The author asked me not to put those
> on the agenda. He will supply updated versions later.".
Since his email is not in the paper, would you mind forwarding him this
suggestion of mine of renaming it to avoid confusion with string lengths? Or
maybe point him to the mailing list discussion[1]?
[1]:
<https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
@ 2022-11-14 18:13 ` Joseph Myers
2022-11-28 22:59 ` Alex Colomar
1 sibling, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-14 18:13 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
> SYNOPSIS:
>
> unary-operator: . identifier
That's not what you mean. See the standard syntax.
unary-expression:
[other alternatives]
unary-operator cast-expression
unary-operator: one of
& * + - ~ !
> - It is not an lvalue.
>
> - This means sizeof() and _Lengthof() cannot be applied to them.
sizeof can be applied to non-lvalues.
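To illustrate this with C as it exists today (a minimal sketch; the function name is made up for the example): sizeof only needs the type of its operand, so the operand does not have to be an lvalue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any proposal.  1 + 2 is an rvalue of
 * type int; sizeof yields the size of that type and does not evaluate
 * the expression. */
size_t size_of_rvalue(void)
{
	size_t s = sizeof(1 + 2);

	assert(s == sizeof(int));
	return s;
}
```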
> - This prevents ambiguity with a designator in an initializer-list within
> a nested braced-initializer.
No, it doesn't. See my previous points about syntactic disambiguation
being a separate matter from "one parse would result in a constraint
violation, so choose another parse that doesn't" (necessarily, because the
constraint violation that results could in general be at an arbitrary
distance from the point where a choice of parse has to be made). Or see
e.g. the disambiguation rule about enum type specifiers: there is an
explicit rule "If an enum type specifier is present, then the longest
possible sequence of tokens that can be interpreted as a specifier
qualifier list is interpreted as part of the enum type specifier." that
ensures that "enum e : long int;" interprets "long int" as the enum type
specifier, rather than "long" as the enum type specifier and "int" as
another type specifier in the sequence of declaration specifiers, even
though the latter parse would result in a constraint violation later.
Also, requiring unbounded lookahead to determine what kind of construct is
being parsed may be considered questionable for C. (If you have an
initializer starting .a.b.c.d.e, possibly with array element access as
well, those could all be designators or .a might be a reference to a
parameter of struct or union type and .b.c.d.e a sequence of references to
members within it and disambiguation under your rule would depend on
whether an '=' follows such an unbounded sequence.)
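For reference, this is what such a token sequence means in C today (a sketch with made-up type names): at the start of an initializer inside braces, .a.b.c is a chain of designators, and the parser can commit to that reading immediately. Under the rule being discussed, the same tokens could also begin an expression, and only the '=' after the chain would tell the two readings apart.

```c
#include <assert.h>

/* Hypothetical nested types, only here to give .a.b.c something to name. */
struct inner  { int c; };
struct middle { struct inner b; };
struct outer  { struct middle a; int other; };

int nested_designator(void)
{
	/* Each leading '.' below starts a designator, not an expression. */
	struct outer o = { .a.b.c = 7, .other = 1 };

	assert(o.other == 1);
	return o.a.b.c;
}
```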
> - The type of a .identifier is always an incomplete type.
>
> - This prevents circular dependencies involving sizeof() or _Lengthof().
We have typeof as well, which can be applied to expressions with
incomplete type.
> - Shadowing rules apply.
>
> - This prevents ambiguity.
"Shadowing rules apply" isn't much of a specification. You need detailed
wording that would be added to 6.2.1 Scopes of identifiers (or equivalent
elsewhere) to make it clear exactly what scopes apply for identifiers
looked up using this construct.
> -
> void foo(struct bar { int x; char c[.x] } a, int x);
>
> Explanation:
> - Because of shadowing rules, [.x] refers to the struct member.
I really don't think standardizing VLAs-in-structures would be a good
idea. Certainly it would be a massive pain to specify meaningful
semantics for them and this outline doesn't even attempt to work through
the consequences of removing the rule that "If an identifier is declared
as having a variably modified type, it shall be an ordinary identifier (as
defined in 6.2.3), have no linkage, and have either block scope or
function prototype scope.".
The idea that .x as an expression might refer to either a member or a
parameter is also a massive change to the namespace rules, where at
present those are in completely different namespaces and so in any given
context a name only needs looking up as one or the other.
Again, proposals should be *minimal*. And even when they are, many issues
may well arise in practice (see the long list of constexpr issues in my
commit message for that C2x feature, for example, which I expect to turn
into multiple NB comments and at least two accompanying documents).
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 17:57 ` Alejandro Colomar
@ 2022-11-14 18:26 ` Joseph Myers
2022-11-28 23:02 ` Alex Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-14 18:26 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:
> > To quote the convenor in WG14 reflector message 18575 (17 Nov
> > 2020) when I asked about its status, "The author asked me not to put those
> > on the agenda. He will supply updated versions later.".
>
> Since his email is not in the paper, would you mind forwarding him this
> suggestion of mine of renaming it to avoid confusion with string lengths? Or
> maybe point him to the mailing list discussion[1]?
>
> [1]:
> <https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
I don't have his email address (I don't see any emails from him on the
reflector since I joined it in 2001).
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 18:13 ` Joseph Myers
@ 2022-11-28 22:59 ` Alex Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alex Colomar @ 2022-11-28 22:59 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph,
On 11/14/22 19:13, Joseph Myers wrote:
> On Sun, 13 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>> SYNOPSIS:
>>
>> unary-operator: . identifier
>
> That's not what you mean. See the standard syntax.
Yup; typo there.
>
> unary-expression:
> [other alternatives]
> unary-operator cast-expression
>
> unary-operator: one of
> & * + - ~ !
>
>> - It is not an lvalue.
>>
>> - This means sizeof() and _Lengthof() cannot be applied to them.
>
> sizeof can be applied to non-lvalues.
thinko there. I fixed it in a subsequent email.
>
>> - This prevents ambiguity with a designator in an initializer-list within
>> a nested braced-initializer.
>
> No, it doesn't. See my previous points about syntactic disambiguation
> being a separate matter from "one parse would result in a constraint
> violation, so choose another parse that doesn't" (necessarily, because the
> constraint violation that results could in general be at an arbitrary
> distance from the point where a choice of parse has to be made). Or see
> e.g. the disambiguation rule about enum type specifiers: there is an
> explicit rule "If an enum type specifier is present, then the longest
> possible sequence of tokens that can be interpreted as a specifier
> qualifier list is interpreted as part of the enum type specifier." that
> ensures that "enum e : long int;" interprets "long int" as the enum type
> specifier, rather than "long" as the enum type specifier and "int" as
> another type specifier in the sequence of declaration specifiers, even
> though the latter parse would result in a constraint violation later.
I get it. It's only unambiguous if there's lookahead.
>
> Also, requiring unbounded lookahead to determine what kind of construct is
> being parsed may be considered questionable for C. (If you have an
> initializer starting .a.b.c.d.e, possibly with array element access as
> well, those could all be designators or .a might be a reference to a
> parameter of struct or union type and .b.c.d.e a sequence of references to
> members within it and disambiguation under your rule would depend on
> whether an '=' follows such an unbounded sequence.)
I'm thinking of an idea for this.
>
>> - The type of a .identifier is always an incomplete type.
>>
>> - This prevents circular dependencies involving sizeof() or _Lengthof().
>
> We have typeof as well, which can be applied to expressions with
> incomplete type.
Yes, but it would not be problematic in the two-pass parsing I have in mind.
>
>> - Shadowing rules apply.
>>
>> - This prevents ambiguity.
>
> "Shadowing rules apply" isn't much of a specification. You need detailed
> wording that would be added to 6.2.1 Scopes of identifiers (or equivalent
> elsewhere) to make it clear exactly what scopes apply for identifiers
> looked up using this construct.
Yeah, I guess. I'm going easy on this draft. I'll try to be more
precise in future revisions.
>
>> -
>> void foo(struct bar { int x; char c[.x] } a, int x);
>>
>> Explanation:
>> - Because of shadowing rules, [.x] refers to the struct member.
>
> I really don't think standardizing VLAs-in-structures would be a good
> idea. Certainly it would be a massive pain to specify meaningful
> semantics for them and this outline doesn't even attempt to work through
> the consequences of removing the rule that "If an identifier is declared
> as having a variably modified type, it shall be an ordinary identifier (as
> defined in 6.2.3), have no linkage, and have either block scope or
> function prototype scope.".
Maybe. I didn't have them in mind until Martin mentioned them. Now
that he mentioned them, I'd like at least to be careful so that any new
syntax doesn't do something that impedes adding them in the future, if
it is ever considered desirable.
>
> The idea that .x as an expression might refer to either a member or a
> parameter is also a massive change to the namespace rules, where at
> present those are in completely different namespaces and so in any given
> context a name only needs looking up as one or the other.
>
> Again, proposals should be *minimal*.
Yes. I only want to have a rough discussion about what the entire
feature would look like in an ideal future where everything is added.
Otherwise, adding a minimal feature without considering this future
might do something that prevents some part of it from being implemented,
due to backwards compatibility.
So I'd like to discuss the whole idea before then going to a minimal
proposal that will be *much* smaller than this idea that I'm discussing.
I'm happy with the Linux man-pages implementing the whole idea (even if
it's impossible to implement it in C ever), and letting ISO C / GCC
implement initially (and possibly ever) only the minimal stuff.
> And even when they are, many issues
> may well arise in practice (see the long list of constexpr issues in my
> commit message for that C2x feature, for example, which I expect to turn
> into multiple NB comments and at least two accompanying documents).
Sure; I expect that.
Cheers,
Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-14 18:26 ` Joseph Myers
@ 2022-11-28 23:02 ` Alex Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alex Colomar @ 2022-11-28 23:02 UTC (permalink / raw)
To: Joseph Myers
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Joseph,
On 11/14/22 19:26, Joseph Myers wrote:
> On Mon, 14 Nov 2022, Alejandro Colomar via Gcc wrote:
>
>>> To quote the convenor in WG14 reflector message 18575 (17 Nov
>>> 2020) when I asked about its status, "The author asked me not to put those
>>> on the agenda. He will supply updated versions later.".
>>
>> Since his email is not in the paper, would you mind forwarding him this
>> suggestion of mine of renaming it to avoid confusion with string lengths? Or
>> maybe point him to the mailing list discussion[1]?
>>
>> [1]:
>> <https://lore.kernel.org/linux-man/20221110222540.as3jrjdzxsnot3zm@illithid/T/#m794ad2a3173a19099625ee1dec7ea11ab754513d>
>
> I don't have his email address (I don't see any emails from him on the
> reflector since I joined it in 2001).
Meh; thanks. Would you mind mentioning this issue to whoever defends
his document, whenever you talk about it?
Thanks,
Alex
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
@ 2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
1 sibling, 2 replies; 85+ messages in thread
From: Alex Colomar @ 2022-11-28 23:18 UTC (permalink / raw)
To: Martin Uecker, Joseph Myers
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin,
On 11/13/22 15:58, Martin Uecker wrote:
> Am Sonntag, den 13.11.2022, 15:02 +0100 schrieb Alejandro Colomar:
>>
>> On 11/13/22 14:33, Alejandro Colomar wrote:
>>> Hi Martin,
>>>
>>> On 11/13/22 14:19, Alejandro Colomar wrote:
>>>>> But there are not only syntactical problems, because
>>>>> also the type of the parameter might become relevant
>>>>> and then you can get circular dependencies:
>>>>>
>>>>> void foo(char (*a)[sizeof *.b], char (*b)[sizeof *.a]);
>>>>
>>>> This seems to be a difficult stone in the road.
>
> But note that GNU forward declarations solve this nicely.
Okay, so GNU declarations basically work by duplicating (some of) the
declarations.
How about the compiler parsing the parameter list twice? The first pass
would get the declarations and their types, but without resolving any
sizeof(), _Lengthof(), or typeof() whose operand contains .identifier (or
an expression containing it); in those cases, it would leave the type
incomplete, to be completed in the second pass. It would be as if the
programmer had specified the forward declarations, but it's the compiler
that gets them automatically.
I guess asking the compiler to do two passes on the param list isn't as
bad as asking to do unbound lookahead. In this case it's bound: look
ahead till the end of the param list; get as much info as possible, and
then do it again to complete. Anything not yet clear after two passes
is not valid.
So, for
void foo(char (*a)[sizeof(*.b)], char (*b)[sizeof(*.a)]);
in the first pass, the compiler would read:
char (*a)[sizeof(*.b)]; // sizeof .identifier; incomplete type; continue parsing
char (*b)[sizeof(*.a)]; // sizeof .identifier; incomplete type; continue parsing
At the end of the first pass, the compiler only knows:
char (*a)[];
char (*b)[];
In the second pass, sizeof() can't be evaluated, since the types of its
operands are still incomplete; therefore, there's an error at the first
sizeof(*.b): *.b has incomplete type.
---
Let's show a distinct case:
void foo(char (*a)[sizeof(*.b)], char (*b)[10]);
After the first pass, the compiler would know:
char (*a)[];
char (*b)[10];
In the second pass, sizeof(*.b) would unambiguously evaluate to
sizeof(char[10]), and the parameter list would then be fine.
Does this 2-pass parsing make sense to you? Did I miss any details?
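The two passes described above can be sketched as a toy model. This is not compiler code, and the struct and function names are invented for the illustration. It assumes every parameter is a pointer to an array of char whose bound is either a constant or sizeof(*.other), and that the second pass may only rely on what the first pass established.

```c
#include <assert.h>
#include <stddef.h>

/* Toy description of one parameter (hypothetical, for illustration). */
struct param {
	int     dep;    /* index of the parameter named in sizeof(*.x), or -1 */
	size_t  bound;  /* array bound; 0 while the type is incomplete */
};

/* Returns 0 if every bound is known after the second pass, else -1. */
int resolve_two_pass(struct param *p, size_t n)
{
	/* Pass 1: constant bounds are kept as given; any bound that
	 * mentions .identifier is left incomplete. */
	for (size_t i = 0; i < n; i++) {
		if (p[i].dep >= 0)
			p[i].bound = 0;
	}

	/* Pass 2: complete the remaining types.  Referring to a type
	 * that pass 1 left incomplete is an error. */
	for (size_t i = 0; i < n; i++) {
		const struct param *target;

		if (p[i].dep < 0)
			continue;
		assert((size_t)p[i].dep < n);
		target = &p[p[i].dep];
		if (target->dep >= 0 || target->bound == 0)
			return -1;
		p[i].bound = target->bound;  /* sizeof(char[bound]) == bound */
	}
	return 0;
}
```

In this model, the first example in this message (a and b each depending on the other) is rejected, while the second one (b declared with bound 10) resolves a's bound to 10.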
>
>>>>
>>>>> I am not sure what would be the best way to fix it. One
>>>>> could specify that parameters referred to by
>>>>> the .identifier syntax must be of some integer type and
>>>>> that the sub-expression .identifier is always
>>>>> converted to a 'size_t'.
>>>>
>>>> That makes sense, but then overnight some quite useful thing came to my mind
>>>> that would not be possible with this limitation:
>>>>
>>>>
>>>> <https://software.codidact.com/posts/285946>
>>>>
>>>> char *
>>>> stpecpy(char dst[.end - .dst], char *src, char end[1])
>>
>> Heh, I got an off-by-one error. It should be dst[.end - .dst + 1], of course,
>> and then the result of the whole expression would be 0, which is fine as size_t.
>>
>> So, never mind.
>
> .end and .dst would have pointer size though.
>
>>>> {
>>>> for (/* void */; dst <= end; dst++) {
>>>> *dst = *src++;
>>>> if (*dst == '\0')
>>>> return dst;
>>>> }
>>>> /* Truncation detected */
>>>> *end = '\0';
>>>>
>>>> #if !defined(NDEBUG)
>>>> /* Consume the rest of the input string. */
>>>> while (*src++) {};
>>>> #endif
>>>>
>>>> return end + 1;
>>>> }
>>> And I forgot to say it: Default promotions rank high (probably the highest) in
>>> my list of most hated features^Wbugs in C.
>
> If you replaced them with explicit conversions that you then have
> to add by hand all the time, I am pretty sure most people
> would hate this more. (And it could also hide bugs.)
>
>>> I wouldn't convert it to size_t, but
>>> rather follow normal promotion rules.
>
> The point of making it size_t is that you then
> do not need to know the type of the parameter to make
> sense of the expression. If the type matters, then you get
> mutual dependencies as in the example above.
>
>>> Since you can use anything between INTMAX_MIN and UINTMAX_MAX for accessing an
>>> array (which took me some time to understand), I'd also allow the same here. So,
>>> the type of the expression between [] could perfectly be signed or unsigned.
>>>
>>> So, you could use size_t for very high indices, or e.g. ptrdiff_t if you want to
>>> allow negative numbers. In the function above, since dst can be a pointer to
>>> one-past-the-end (it represents a previous truncation; that's why the test
>>> dst<=end), forcing a size_t conversion would disallow that syntax.
>
> Yes, this then does not work.
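As a small illustration of the point about signed subscripts (standard C; the function name is invented for the example): the expression inside [] only needs integer type, so a negative ptrdiff_t is fine as long as the resulting element lies within the same array.

```c
#include <assert.h>
#include <stddef.h>

/* p must point at least one element past the start of its array. */
int element_before(const int *p)
{
	ptrdiff_t i = -1;

	assert(p != NULL);
	return p[i];	/* same as *(p - 1) */
}
```

Forcing the subscript expression to size_t would turn that -1 into a very large unsigned value, which is the problem discussed above.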
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-28 23:18 ` Alex Colomar
@ 2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
1 sibling, 0 replies; 85+ messages in thread
From: Joseph Myers @ 2022-11-29 0:05 UTC (permalink / raw)
To: Alex Colomar
Cc: Martin Uecker, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
> I guess asking the compiler to do two passes on the param list isn't as bad as
> asking to do unbound lookahead. In this case it's bound: look ahead till the
> end of the param list; get as much info as possible, and then do it again to
> complete. Anything not yet clear after two passes is not valid.
Unbounded here means an unbounded number of tokens, as opposed to e.g.
looking one token ahead after seeing an identifier in statement context to
determine if it's a label.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
@ 2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
2022-11-29 16:49 ` Joseph Myers
1 sibling, 2 replies; 85+ messages in thread
From: Michael Matz @ 2022-11-29 14:58 UTC (permalink / raw)
To: Alex Colomar
Cc: Martin Uecker, Joseph Myers, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
Hey,
On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
> How about the compiler parsing the parameter list twice?
This _is_ unbounded look-ahead. You could avoid this by not using "." for
your new syntax. Use something unambiguous that can't be confused with
other syntactic elements, e.g. with a different punctuator like '@' or the
like. But I'm generally doubtful of this whole feature within C itself.
It serves a purpose in documentation, so in man-pages it seems fine enough
(but then still could use a different punctuator to not be confusable with
C syntax).
But within C it still can only serve a documentation purpose as no
checking could be performed without also changing how e.g. arrays are
represented (they always would need to come with a size). It seems
doubtful to introduce completely new and ambiguous syntax with all the
problems Joseph lists just in order to be able to write documentation when
there's a perfectly fine method to do so: comments.
Ciao,
Michael.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 14:58 ` Michael Matz
@ 2022-11-29 15:17 ` Uecker, Martin
2022-11-29 15:44 ` Michael Matz
2022-11-29 16:49 ` Joseph Myers
1 sibling, 1 reply; 85+ messages in thread
From: Uecker, Martin @ 2022-11-29 15:17 UTC (permalink / raw)
To: alx.manpages, matz; +Cc: gcc, linux-man, joseph, schwarze, wg14
Am Dienstag, dem 29.11.2022 um 14:58 +0000 schrieb Michael Matz:
> Hey,
>
> On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:
>
> > How about the compiler parsing the parameter list twice?
>
> This _is_ unbounded look-ahead. You could avoid this by not using "." for
> your new syntax. Use something unambiguous that can't be confused with
> other syntactic elements, e.g. with a different punctuator like '@' or the
> like. But I'm generally doubtful of this whole feature within C itself.
> It serves a purpose in documentation, so in man-pages it seems fine enough
> (but then still could use a different punctuator to not be confusable with
> C syntax).
>
> But within C it still can only serve a documentation purpose as no
> checking could be performed without also changing how e.g. arrays are
> represented (they always would need to come with a size).
It does not require any changes on how arrays are represented.
As part of VM-types the size becomes part of the type and this
can be used for static or dynamic analysis, e.g. you can
- today - get a run-time bounds violation with the sanitizer:
void foo(int n, char (*buf)[n])
{
(*buf)[n] = 1;
}
int main()
{
char buf[10];
foo(10, &buf);
}
https://godbolt.org/z/WWEdeYchs
I personally find this already extremely useful.
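A companion sketch to the example above, again using only what compilers accept today (the function name is made up): because the bound is part of the variably modified type, the callee can recover it with sizeof instead of trusting n a second time.

```c
#include <assert.h>
#include <stddef.h>

/* *buf has the variably modified type char[n], so sizeof *buf is
 * evaluated at run time and yields n. */
size_t bound_of(int n, char (*buf)[n])
{
	assert(n > 0);
	return sizeof *buf;
}
```

With char b[10], the call bound_of(10, &b) yields 10.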
For
void foo(int n, char buf[n]);
it semantically has no meaning according to the C standard,
but a compiler could still warn.
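The "no meaning" part can be checked directly (a sketch; the function name is invented): the array declarator in a parameter is adjusted to a pointer, so inside the function buf is a plain char * and n plays no part in its type.

```c
#include <assert.h>

/* Returns 1 if the parameter has type char *, which is always the case
 * after the array-to-pointer adjustment of parameter declarations. */
int is_plain_pointer(int n, char buf[n])
{
	assert(n >= 0);
	return _Generic(buf, char *: 1, default: 0);
}
```

Any diagnostic based on [n] is therefore something the compiler does on top of the type system, not something the type requires.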
It could also warn for
void foo(int n, char buf[n]);
int main()
{
char buf[9];
foo(buf);
}
if the passed buffer is too short. And here, GCC and Clang
already do this! (although - so far - only for static
bounds I think)
https://godbolt.org/z/afPhnxfzx
With "static"
void foo(int n, char buf[static n]);
this would also be UB according to C.
GCC is missing some features that would make this more useful (and
I filed bugs a while ago). For example, the UB sanitizer should detect
additional cases which are UB.
But in general: This feature is useful not only for documentation
but also for analysis. You can get bounds checking in C which
works today and with additional compiler features this would
be very useful!
Martin
> It seems
> doubtful to introduce completely new and ambiguous syntax with all the
> problems Joseph lists just in order to be able to write documentation when
> there's a perfectly fine method to do so: comments.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 15:17 ` Uecker, Martin
@ 2022-11-29 15:44 ` Michael Matz
2022-11-29 16:58 ` Uecker, Martin
0 siblings, 1 reply; 85+ messages in thread
From: Michael Matz @ 2022-11-29 15:44 UTC (permalink / raw)
To: Uecker, Martin; +Cc: alx.manpages, gcc, linux-man, joseph, schwarze, wg14
Hey,
On Tue, 29 Nov 2022, Uecker, Martin wrote:
> It does not require any changes on how arrays are represented.
>
> As part of VM-types the size becomes part of the type and this
> can be used for static or dynamic analysis, e.g. you can
> - today - get a run-time bounds violation with the sanitizer:
>
> void foo(int n, char (*buf)[n])
> {
> (*buf)[n] = 1;
> }
This can already be statically analyzed as being wrong, no need for dynamic
checking. What I mean is the checking of the claimed contract. Above you
assure for the function body that buf has n elements. This is also a
pre-condition for calling this function and _that_ can't be checked in all
cases because:
void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
void callfoo(char * buf) { foo(10, buf); }
buf doesn't have a known size. And a pre-condition that can't be checked
is no pre-condition at all, as only then it can become a guarantee for the
body.
The compiler has no choice but to trust the user that the pre-condition
for calling foo is fulfilled. I can see how being able to just check half
of the contract might be useful, but if it doesn't give full checking then
any proposal for syntax should be even more obviously orthogonal than the
current one.
> For
>
> void foo(int n, char buf[n]);
>
> it semantically has no meaning according to the C standard,
> but a compiler could still warn.
Hmm? Warn about what in this decl?
> It could also warn for
>
> void foo(int n, char buf[n]);
>
> int main()
> {
> char buf[9];
> foo(buf);
> }
You mean if you write 'foo(10,buf)' (the above, as is, is simply a syntax
error for non-matching number of args). Or was it a mispaste and you mean
the one from the godbolt link, i.e.:
void foo(char buf[10]){ buf[9] = 1; }
int main()
{
char buf[9];
foo(buf);
}
? If so, yeah, we warn already. I don't think this is an argument for
(or against) introducing new syntax.
...
> But in general: This feature is useful not only for documentation
> but also for analysis.
Which feature are we talking about now? The ones you used all work today,
as you demonstrated. I thought we would be talking about that ".whatever"
syntax to refer to arbitrary parameters, even following ones? I think a
disruptive syntax change like that should have a higher bar than "in some
cases, depending on circumstance, we might even be able to warn".
Ciao,
Michael.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
@ 2022-11-29 16:49 ` Joseph Myers
2022-11-29 16:53 ` Jonathan Wakely
1 sibling, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2022-11-29 16:49 UTC (permalink / raw)
To: Michael Matz
Cc: Alex Colomar, Martin Uecker, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
> like. But I'm generally doubtful of this whole feature within C itself.
> It serves a purpose in documentation, so in man-pages it seems fine enough
> (but then still could use a different punctuator to not be confusable with
> C syntax).
In man-pages you don't need to invent syntax at all. You can write
int f(char buf[n], int n);
and in the context of a man page it will be clear to readers what is
meant, though such a syntax would be problematic in actual C source files
because of issues with circular dependencies between parameters and with n
already being declared in an outer scope.
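The outer-scope problem can be shown with code that compiles today (a sketch; the enum and the function name are made up for the illustration). At the point where [n] is parsed, the parameter n has not been declared yet, so the compiler silently uses whatever n is already visible.

```c
#include <assert.h>
#include <stddef.h>

enum { n = 4 };		/* an unrelated n in an outer scope */

/* In [n], the parameter n is not yet in scope, so the outer n is used. */
size_t bound_seen_by_compiler(char (*buf)[n], int n)
{
	assert(n >= 0);		/* here, n is the parameter */
	return sizeof *buf;	/* always 4, whatever the caller passed as n */
}
```

A reader of a man page would take buf to have n elements, while a compiler reading the same line as source code sees a fixed bound of 4.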
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:49 ` Joseph Myers
@ 2022-11-29 16:53 ` Jonathan Wakely
2022-11-29 17:00 ` Martin Uecker
0 siblings, 1 reply; 85+ messages in thread
From: Jonathan Wakely @ 2022-11-29 16:53 UTC (permalink / raw)
To: Joseph Myers
Cc: Michael Matz, Alex Colomar, Martin Uecker, Ingo Schwarze,
JeanHeyd Meneide, linux-man, gcc
On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>
> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>
> > like. But I'm generally doubtful of this whole feature within C itself.
> > It serves a purpose in documentation, so in man-pages it seems fine enough
> > (but then still could use a different punctuator to not be confusable with
> > C syntax).
>
> In man-pages you don't need to invent syntax at all. You can write
>
> int f(char buf[n], int n);
>
> and in the context of a man page it will be clear to readers what is
> meant,
Considerably more clear than new invented syntax IMHO.
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 15:44 ` Michael Matz
@ 2022-11-29 16:58 ` Uecker, Martin
2022-11-29 17:28 ` Alex Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Uecker, Martin @ 2022-11-29 16:58 UTC (permalink / raw)
To: matz; +Cc: gcc, alx.manpages, linux-man, joseph, schwarze, wg14
Hi,
Am Dienstag, dem 29.11.2022 um 15:44 +0000 schrieb Michael Matz:
> Hey,
>
> On Tue, 29 Nov 2022, Uecker, Martin wrote:
>
> > It does not require any changes on how arrays are represented.
> >
> > As part of VM-types the size becomes part of the type and this
> > can be used for static or dynamic analysis, e.g. you can
> > - today - get a run-time bounds violation with the sanitizer:
> >
> > void foo(int n, char (*buf)[n])
> > {
> > (*buf)[n] = 1;
> > }
>
> This can already be statically analyzed as being wrong, no need for
> dynamic checking.
In this toy example, yes, but in general it can be checked
only at run time by using the information about the
dynamic bound.
> What I mean is the checking of the claimed contract.
> Above you assure for the function body that buf has n elements.
Yes.
> This is also a pre-condition for calling this function and
> _that_ can't be checked in all cases because:
>
> void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
> void callfoo(char * buf) { foo(10, buf); }
>
> buf doesn't have a known size.
This does not type check.
> And a pre-condition that can't be checked
> is no pre-condition at all, as only then it can become a guarantee
> for the body.
The example above should look like:
void foo(int n, char (*buf)[n]);
void callfoo(char (*buf)[12]) { foo(10, buf); }
This could be checked by a UB sanitizer, as calling
the function with an argument of incompatible type
is UB (but we currently do not do this).
If you think about
void foo(int n, char buf[n]);
void callfoo(char *buf) { foo(10, buf); }
Then you are right that this can not be checked at this
time. But this does not mean it is useless because we
still can detect inconsistencies in other cases:
void callfoo(int n, char buf[n - 1]) { foo(n, buf); }
We could also - in the future - have a warning about all
situations where bound information is lost, making sure
that preconditions are always checked for people who
consistently use these annotations.
> The compiler has no choice than to trust the user that the pre-
> condition for calling foo is fulfilled. I can see how
> being able to just check half of the contract might be
> useful, but if it doesn't give full checking then
> any proposal for syntax should be even more obviously
> orthogonal than the current one.
Your argument is not clear to me.
> > For
> >
> > void foo(int n, char buf[n]);
> >
> > it semantically has no meaning according to the C standard,
> > but a compiler could still warn.
>
> Hmm? Warn about what in this decl?
I meant, we could warn about something like this
because it is likely an error:
void foo(int n, char buf[n])
{
buf[n] = 1;
}
> > It could also warn for
> >
> > void foo(int n, char buf[n]);
> >
> > int main()
> > {
> > char buf[9];
> > foo(buf);
> > }
>
> You mean if you write 'foo(10,buf)' (the above, as is, is simply a
> syntax error for non-matching number of args). Or was it a mispaste
> and you mean the one from the godbolt link, i.e.:
I meant:
char buf[9];
foo(10, buf);
In fact, it turns out we warn already:
https://godbolt.org/z/qcvsv87Ev
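For contrast, a sketch of the consistent case, where the array passed is
at least as large as the stated bound. The names fill and first_after_fill
are hypothetical and chosen only for illustration:

```c
/* Writes n bytes; the [n] bound documents the size the callee expects. */
void fill(int n, char buf[n])
{
    for (int i = 0; i < n; i++)
        buf[i] = 'x';
}

char first_after_fill(void)
{
    /* Matches the bound passed below. Shrinking this to char buf[9]
     * gives the mismatched call discussed above. */
    char buf[10];

    fill(10, buf);
    return buf[0];
}
```

Only the declared size of the local array differs between this and the
mismatched call, which is why the check can be done at the call site.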
> void foo(char buf[10]){ buf[9] = 1; }
> int main()
> {
> char buf[9];
> foo(buf);
> }
>
> ? If so, yeah, we warn already. I don't think this is an argument
> for (or against) introducing new syntax.
> ...
It is an argument for having this syntax, because we could
extend such warnings (those we already have and those we
could still add) to more common cases such as
void foo(char buf[.n], size_t n);
In my opinion, this would be a huge step forward for
the safety of C programs, as we already have a lot of
infrastructure for checking bounds.
Of course, the existing GNU extension would achieve
the same thing:
void foo(size_t n; char buf[n], size_t n);
> > But in general: This feature is useful not only for documentation
> > but also for analysis.
>
> Which feature we're talking about now? The ones you used all work
> today,
> as you demonstrated. I thought we would be talking about that
> ".whatever"
> syntax to refer to arbitrary parameters, even following ones? I
> think a
> disrupting syntax change like that should have a higher bar than "in
> some
> cases, depending on circumstance, we might even be able to warn".
We can use our existing features and then apply them
to cases where the bound is specified after the pointer,
which is more common in practice.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:53 ` Jonathan Wakely
@ 2022-11-29 17:00 ` Martin Uecker
2022-11-29 17:19 ` Alex Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-11-29 17:00 UTC (permalink / raw)
To: Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Alex Colomar, Ingo Schwarze, JeanHeyd Meneide,
linux-man, gcc
Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
> >
> > On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
> >
> > > like. But I'm generally doubtful of this whole feature within C
> > > itself.
> > > It serves a purpose in documentation, so in man-pages it seems
> > > fine enough
> > > > (but then still could use a different punctuator to not be
> > > confusable with
> > > C syntax).
> >
> > In man-pages you don't need to invent syntax at all. You can write
> >
> > int f(char buf[n], int n);
> >
> > and in the context of a man page it will be clear to readers what
> > is
> > meant,
>
> Considerably more clear than new invented syntax IMHO.
True, but I think it would be a mistake to use code in
man pages which then does not work as expected (or even
is subtly wrong) in actual code.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:00 ` Martin Uecker
@ 2022-11-29 17:19 ` Alex Colomar
2022-11-29 17:29 ` Alex Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alex Colomar @ 2022-11-29 17:19 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin, Joseph,
On 11/29/22 18:00, Martin Uecker wrote:
> Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
>> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>>>
>>> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>>>
>>>> like. But I'm generally doubtful of this whole feature within C
>>>> itself.
>>>> It serves a purpose in documentation, so in man-pages it seems
>>>> fine enough
>>>> (but then still could use a different punctuator to not be
>>>> confusable with
>>>> C syntax).
>>>
>>> In man-pages you don't need to invent syntax at all. You can write
>>>
>>> int f(char buf[n], int n);
>>>
>>> and in the context of a man page it will be clear to readers what
>>> is
>>> meant,
>>
>> Considerably more clear than new invented syntax IMHO.
>
> True, but I think it would be a mistake to use code in
> man pages which then does not work as expected (or even
> is subtly wrong) in actual code.
Exactly. Using your proposed syntax (which was my first draft) would
have probably been the source of hidden bugs, since it might work (read
compile) in some cases, but with wrong results.
I prefer this hypothetical syntax, which at most will cause compile errors.
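One way the plain spelling can compile and still give wrong results,
sketched under the assumption that an unrelated n is already in scope
(the enum constant and the name bound_seen are hypothetical, used only
for illustration):

```c
#include <stddef.h>

enum { n = 4 };  /* hypothetical, unrelated file-scope n */

/* The [n] in the first parameter is parsed before the parameter n is
 * declared, so it silently binds to the enum constant above. */
size_t bound_seen(char (*buf)[n], int n)
{
    (void)n;              /* the parameter n is never consulted */
    return sizeof(*buf);  /* always 4, whatever the caller passes */
}
```

The declaration is accepted without complaint, yet the bound it records
has nothing to do with the argument, which is the kind of hidden bug
meant here.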
Cheers,
Alex
>
> Martin
>
>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 16:58 ` Uecker, Martin
@ 2022-11-29 17:28 ` Alex Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alex Colomar @ 2022-11-29 17:28 UTC (permalink / raw)
To: Uecker, Martin, matz; +Cc: gcc, linux-man, joseph, schwarze, wg14
Hi Martin and Michael,
On 11/29/22 17:58, Uecker, Martin wrote:
>
> Hi,
>
> Am Dienstag, dem 29.11.2022 um 15:44 +0000 schrieb Michael Matz:
>> Hey,
>>
>> On Tue, 29 Nov 2022, Uecker, Martin wrote:
>>
>>> It does not require any changes on how arrays are represented.
>>>
>>> As part of VM-types the size becomes part of the type and this
>>> can be used for static or dynamic analysis, e.g. you can
>>> - today - get a run-time bounds violation with the sanitizer:
>>>
>>> void foo(int n, char (*buf)[n])
>>> {
>>> (*buf)[n] = 1;
>>> }
>>
>> This can already be statically analyzed as being wrong, no need for
>> dynamic checking.
>
> In this toy example, yes, but in general it can be checked
> only at run-time by using the information about the
> dynamic bound.
>
>> What I mean is the checking of the claimed contract.
>> Above you assure for the function body that buf has n elements.
>
> Yes.
>
>> This is also a pre-condition for calling this function and
>> _that_ can't be checked in all cases because:
>>
>> void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
>> void callfoo(char * buf) { foo(10, buf); }
>>
>> buf doesn't have a known size.
>
> This does not type check.
>
>> And a pre-condition that can't be checked
>> is no pre-condition at all, as only then it can become a guarantee
>> for the body.
>
> The example above should look like:
>
> void foo(int n, char (*buf)[n]);
>
> void callfoo(char (*buf)[12]) { foo(10, buf); }
>
> This could be checked by an UB sanitizer as calling
> the function with an argument of incompatible type
> is UB (but we currently do not do this)
>
>
> If you think about
>
> void foo(int n, char buf[n]);
>
> void callfoo(char *buf) { foo(10, buf); }
>
>
> Then you are right that this can not be checked at this
> time. But this does not mean it is useless because we
> still can detect inconsistencies in other cases:
>
> void callfoo(int n, char buf[n - 1]) { foo(n, buf); }
>
> We could also - in the future - have a warning about all
> situations where bound information is lost, making sure
> that preconditions are always checked for people who
> consistently use these annotations.
>
>
>> The compiler has no choice than to trust the user that the pre-
>> condition for calling foo is fulfilled. I can see how
>> being able to just check half of the contract might be
>> useful, but if it doesn't give full checking then
>> any proposal for syntax should be even more obviously
>> orthogonal than the current one.
>
> Your argument is not clear to me.
>
>
>>> For
>>>
>>> void foo(int n, char buf[n]);
>>>
>>> it semantically has no meaning according to the C standard,
>>> but a compiler could still warn.
>>
>> Hmm? Warn about what in this decl?
>
> I meant, we could warn about something like this
> because it is likely an error:
>
> void foo(int n, char buf[n])
> {
> buf[n] = 1;
> }
>
>
>>> It could also warn for
>>>
>>> void foo(int n, char buf[n]);
>>>
>>> int main()
>>> {
>>> char buf[9];
>>> foo(buf);
>>> }
>>
>> You mean if you write 'foo(10,buf)' (the above, as is, is simply a
>> syntax error for non-matching number of args). Or was it a mispaste
>> and you mean the one from the godbolt link, i.e.:
>
> I meant:
>
> char buf[9];
> foo(10, buf);
>
> In fact, it turns out we warn already:
>
> https://godbolt.org/z/qcvsv87Ev
>
>> void foo(char buf[10]){ buf[9] = 1; }
>> int main()
>> {
>> char buf[9];
>> foo(buf);
>> }
>>
>> ? If so, yeah, we warn already. I don't think this is an argument
>> for (or against) introducing new syntax.
>> ...
>
> It is an argument for having this syntax, because we could
> extend such warnings (those we already have and those we
> could still add) to more common cases such as
>
> void foo(char buf[.n], size_t n);
>
> In my opinion, this would be a huge step forward for
> safety of C programs as we already have a lot of
> infrastructure for checking bounds.
>
> Of course, the existing GNU extension would achieve
> the same thing:
>
> void foo(size_t n; char buf[n], size_t n);
>
>
>
>>> But in general: This feature is useful not only for documentation
>>> but also for analysis.
>>
>> Which feature we're talking about now? The ones you used all work
>> today,
>> as you demonstrated. I thought we would be talking about that
>> ".whatever"
>> syntax to refer to arbitrary parameters, even following ones? I
>> think a
>> disrupting syntax change like that should have a higher bar than "in
>> some
>> cases, depending on circumstance, we might even be able to warn".
>
> We can use our existing features and then apply them
> to cases where the bound is specified after the pointer,
> which is more common in practice.
Yep; basically adding some (not perfect, but some) static analysis to
many libc function calls.
Also, considering the issues with sizeof and arrays, and the lack of a
_Nitems() [proposed as _Lengthof()] operator, there's a lot of manual
work in array (read pointer) parameters.
However, a hypothetical _Nitems() operator could make use of this
syntactic sugar, and be more useful than just providing static analysis.
Using _Nitems() on a VMT (including pointer parameters) could be
specified to return the number of elements, so I foresee code like:
void foo(int arr[nmemb], size_t nmemb)
{
// _Nitems() evaluates to nmemb
for (size_t i = 0; i < _Nitems(arr); i++)
arr[i] = i;
}
void bar(int arr[])
{
// Constraint violation
for (size_t i = 0; i < _Nitems(arr); i++)
arr[i] = i;
}
This is probably the most useful part of this feature (but admittedly
it's not only about this feature, and it could even be added without
this feature).
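Something close to this can be approximated today with a pointer to a VLA
and a sizeof-based macro. This is only a sketch: NITEMS is a hypothetical
stand-in for the proposed operator, fill_iota is an illustrative name, and
the bound has to come before the array:

```c
#include <stddef.h>

/* Hypothetical stand-in for _Nitems(); only meaningful on array types,
 * never on a plain pointer. */
#define NITEMS(a) (sizeof(a) / sizeof((a)[0]))

void fill_iota(size_t nmemb, int (*arr)[nmemb])
{
    /* *arr has type int[nmemb], so NITEMS(*arr) evaluates to nmemb. */
    for (size_t i = 0; i < NITEMS(*arr); i++)
        (*arr)[i] = (int)i;
}
```

A caller with "int v[5];" would write fill_iota(5, &v). The extra
indirection and the fixed parameter order are the costs that the
syntax discussed in this thread is meant to avoid.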
>
>
> Martin
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:19 ` Alex Colomar
@ 2022-11-29 17:29 ` Alex Colomar
2022-12-03 21:03 ` Alejandro Colomar
0 siblings, 1 reply; 85+ messages in thread
From: Alex Colomar @ 2022-11-29 17:29 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers
Cc: Michael Matz, Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On 11/29/22 18:19, Alex Colomar wrote:
> Hi Martin, Joseph,
>
> On 11/29/22 18:00, Martin Uecker wrote:
>> Am Dienstag, dem 29.11.2022 um 16:53 +0000 schrieb Jonathan Wakely:
>>> On Tue, 29 Nov 2022 at 16:49, Joseph Myers wrote:
>>>>
>>>> On Tue, 29 Nov 2022, Michael Matz via Gcc wrote:
>>>>
>>>>> like. But I'm generally doubtful of this whole feature within C
>>>>> itself.
>>>>> It serves a purpose in documentation, so in man-pages it seems
>>>>> fine enough
>>>>> (but then still could use a different punctuator to not be
>>>>> confusable with
>>>>> C syntax).
>>>>
>>>> In man-pages you don't need to invent syntax at all. You can write
>>>>
>>>> int f(char buf[n], int n);
>>>>
>>>> and in the context of a man page it will be clear to readers what
>>>> is
>>>> meant,
>>>
>>> Considerably more clear than new invented syntax IMHO.
>>
>> True, but I think it would be a mistake to use code in
>> man pages which then does not work as expected (or even
>> is subtly wrong) in actual code.
>
> Exactly. Using your
s/your/Joseph's/
> proposed syntax (which was my first draft) would
> have probably been the source of hidden bugs, since it might work (read
> compile) in some cases, but with wrong results.
>
> I prefer this hypothetical syntax, which at most will cause compile errors.
>
> Cheers,
>
> Alex
>
>>
>> Martin
>>
>>
>>
>
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-11-29 17:29 ` Alex Colomar
@ 2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
` (2 more replies)
0 siblings, 3 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-12-03 21:03 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi!
I'll probably have to release again before the Debian freeze of Bookworm.
That's something I didn't want to do, but there's some important bug that
affects downstream projects (translation pages), and I need to release. It's a
bit weird that the bug has been reported now, because it has always been there
(it's not a regression), but still, I want to address it before the next Debian.
And I don't want to start with stable releases, so I won't be releasing
man-pages-6.01.1. That means that all changes that I have in the project that I
didn't plan to release until 2024 will be released in a few weeks, notably
including the VLA syntax.
This means that while this syntax is still an invention, not something real that
can be used, I need to be careful about the future if I plan to make it public
so soon.
Since we've seen that using a '.' prefix seems to be problematic because of
lookahead, and recently Michael Matz proposed using a different punctuator (he
proposed '@') for differentiating parameters from struct members, I think going
in that direction may be a good idea.
How about '$'?
It's been used for function parameters since... forever? in sh(1). And it's
being added to the source character set in C23, so it seems to be a good choice.
It should also be intuitive what it means.
What do you think about it? I'm not asking for your opinion about adding it to
GCC, but rather for replacing the current '.' in the man-pages before I release
later this month. Do you think I should apply that change?
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
@ 2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
2022-12-06 2:08 ` Joseph Myers
2 siblings, 0 replies; 85+ messages in thread
From: Andrew Pinski @ 2022-12-03 21:13 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz,
Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
On Sat, Dec 3, 2022 at 1:05 PM Alejandro Colomar via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Hi!
>
> I'll probably have to release again before the Debian freeze of Bookworm.
> That's something I didn't want to do, but there's some important bug that
> affects downstream projects (translation pages), and I need to release. It's a
> bit weird that the bug has been reported now, because it has always been there
> (it's not a regression), but still, I want to address it before the next Debian.
>
> And I don't want to start with stable releases, so I won't be releasing
> man-pages-6.01.1. That means that all changes that I have in the project that I
> didn't plan to release until 2024 will be released in a few weeks, notably
> including the VLA syntax.
>
> This means that while this syntax is still an invention, not something real that
> can be used, I need to be careful about the future if I plan to make it public
> so soon.
>
> Since we've seen that using a '.' prefix seems to be problematic because of
> lookahead, and recently Michael Matz proposed using a different punctuator (he
> proposed '@') for differentiating parameters from struct members, I think going
> in that direction may be a good idea.
>
> How about '$'?
$ is a GNU extension for identifiers already.
See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Dollar-Signs.html#Dollar-Signs
Thanks,
Andrew
>
> It's been used for function parameters since... forever? in sh(1). And it's
> being added to the source character set in C23, so it seems to be a good choice.
> It should also be intuitive what it means.
>
> What do you think about it? I'm not asking for your opinion about adding it to
> GCC, but rather for replacing the current '.' in the man-pages before I release
> later this month. Do you think I should apply that change?
>
> Cheers,
>
> Alex
>
>
> --
> <http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
@ 2022-12-03 21:15 ` Martin Uecker
2022-12-03 21:18 ` Alejandro Colomar
2022-12-06 2:08 ` Joseph Myers
2 siblings, 1 reply; 85+ messages in thread
From: Martin Uecker @ 2022-12-03 21:15 UTC (permalink / raw)
To: Alejandro Colomar, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Am Samstag, dem 03.12.2022 um 22:03 +0100 schrieb Alejandro Colomar:
...
> Since we've seen that using a '.' prefix seems to be problematic
> because of lookahead, and recently Michael Matz proposed using a
> different punctuator (he proposed '@') for differentiating parameters
> from struct members, I think going in that direction may be a good
> idea.
>
> How about '$'?
I don't see how the lookahead issue has anything to do with the choice
of the symbol. Here, too, the context would fully disambiguate it from
other uses, so I do not think there is any issue with using this
syntax. '$' is much more problematic as people use it in identifiers,
and '@' may cause confusion with Objective-C.
Martin
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:15 ` Martin Uecker
@ 2022-12-03 21:18 ` Alejandro Colomar
0 siblings, 0 replies; 85+ messages in thread
From: Alejandro Colomar @ 2022-12-03 21:18 UTC (permalink / raw)
To: Martin Uecker, Jonathan Wakely, Joseph Myers, Michael Matz
Cc: Ingo Schwarze, JeanHeyd Meneide, linux-man, gcc
Hi Martin and Andrew!
On 12/3/22 22:15, Martin Uecker wrote:
> Am Samstag, dem 03.12.2022 um 22:03 +0100 schrieb Alejandro Colomar:
> ...
>> Since we've seen that using a '.' prefix seems to be problematic
>> because of lookahead, and recently Michael Matz proposed using a
>> different punctuator (he proposed '@') for differentiating parameters
>> from struct members, I think going in that direction may be a good
>> idea.
>>
>> How about '$'?
>
> I don't see how the lookahead issue has anything to do with the choice
> of the symbol.
In simple [.identifier] expressions it's not a problem. I was foreseeing more
complex expressions, as I suggested earlier.
> Here, too, the context would fully disambiguate it from
> other uses, so I do not think there is any issue with using this
> syntax. '$' is much more problematic as people use it in identifiers,
> and '@' may cause confusion with Objective-C.
On 12/3/22 22:13, Andrew Pinski wrote:
> $ is a GNU extension for identifiers already.
> See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Dollar-Signs.html#Dollar-Signs
>
> Thanks,
> Andrew
>
Hmmm, I see. '$' is too bad. '@' is confusing. I think I'll keep the '.' for
now then, and assume that there's a high possibility that we'll never have
complex expressions with it.
>
> Martin
>
Thank you!
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
* Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
@ 2022-12-06 2:08 ` Joseph Myers
2 siblings, 0 replies; 85+ messages in thread
From: Joseph Myers @ 2022-12-06 2:08 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Martin Uecker, Jonathan Wakely, Michael Matz, Ingo Schwarze,
JeanHeyd Meneide, linux-man, gcc
On Sat, 3 Dec 2022, Alejandro Colomar via Gcc wrote:
> What do you think about it? I'm not asking for your opinion about adding it
> to GCC, but rather for replacing the current '.' in the man-pages before I
> release later this month. Do you think I should apply that change?
I think man pages should not use any novel syntax - even syntax newly
added to the C standard or GCC, unless required to express the standard
prototype for a function. They should be written for maximal
comprehensibility to C users in general, who are often behind on their
knowledge of standard features, let alone the more obscure extensions - and certainly
don't know about random, highly speculative suggestions for possible
features suggested in random mailing list threads. So: don't use any
invented syntax (even if you explain it somewhere in the man pages), don't
use any syntax newly introduced in C23 unless strictly necessary and
you're sure it's already extremely widely understood among C users, be
wary of syntax introduced in C11. If a new feature in this area were
introduced in C29, waiting at least several years after that standard is
released (*not* just after the feature gets added to a draft) to start
using the new syntax in man pages would be a good idea.
--
Joseph S. Myers
joseph@codesourcery.com
end of thread, other threads:[~2022-12-06 2:08 UTC | newest]
Thread overview: 85+ messages
2022-08-26 21:07 [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters Alejandro Colomar
2022-08-27 11:10 ` Ingo Schwarze
2022-08-27 12:15 ` Alejandro Colomar
2022-08-27 13:08 ` Ingo Schwarze
2022-08-27 18:38 ` Alejandro Colomar
2022-08-28 11:24 ` Alejandro Colomar
[not found] ` <CACqA6+mfaj6Viw+LVOG=nE350gQhCwVKXRzycVru5Oi4EJzgTg@mail.gmail.com>
2022-09-02 21:02 ` Alejandro Colomar
2022-09-02 21:57 ` Alejandro Colomar
2022-09-03 12:47 ` Martin Uecker
2022-09-03 13:29 ` Ingo Schwarze
2022-09-03 15:08 ` Alejandro Colomar
2022-09-03 13:41 ` Alejandro Colomar
2022-09-03 14:35 ` Martin Uecker
2022-09-03 14:59 ` Alejandro Colomar
2022-09-03 15:31 ` Martin Uecker
2022-09-03 20:02 ` Alejandro Colomar
2022-09-05 14:31 ` Alejandro Colomar
2022-11-10 0:06 ` Alejandro Colomar
2022-11-10 0:09 ` Alejandro Colomar
2022-11-10 1:33 ` Joseph Myers
2022-11-10 1:39 ` Joseph Myers
2022-11-10 6:21 ` Martin Uecker
2022-11-10 10:09 ` Alejandro Colomar
2022-11-10 23:19 ` Joseph Myers
2022-11-10 23:28 ` Alejandro Colomar
2022-11-11 19:52 ` Martin Uecker
2022-11-12 1:09 ` Joseph Myers
2022-11-12 7:24 ` Martin Uecker
2022-11-12 12:34 ` Alejandro Colomar
2022-11-12 12:46 ` Alejandro Colomar
2022-11-12 13:03 ` Joseph Myers
2022-11-12 13:40 ` Alejandro Colomar
2022-11-12 13:58 ` Alejandro Colomar
2022-11-12 14:54 ` Joseph Myers
2022-11-12 15:35 ` Alejandro Colomar
2022-11-12 17:02 ` Joseph Myers
2022-11-12 17:08 ` Alejandro Colomar
2022-11-12 15:56 ` Martin Uecker
2022-11-13 13:19 ` Alejandro Colomar
2022-11-13 13:33 ` Alejandro Colomar
2022-11-13 14:02 ` Alejandro Colomar
2022-11-13 14:58 ` Martin Uecker
2022-11-13 15:15 ` Alejandro Colomar
2022-11-13 15:32 ` Martin Uecker
2022-11-13 16:25 ` Alejandro Colomar
2022-11-13 16:28 ` Alejandro Colomar
2022-11-13 16:31 ` Alejandro Colomar
2022-11-13 16:34 ` Alejandro Colomar
2022-11-13 16:56 ` Alejandro Colomar
2022-11-13 19:05 ` Alejandro Colomar
2022-11-14 18:13 ` Joseph Myers
2022-11-28 22:59 ` Alex Colomar
2022-11-28 23:18 ` Alex Colomar
2022-11-29 0:05 ` Joseph Myers
2022-11-29 14:58 ` Michael Matz
2022-11-29 15:17 ` Uecker, Martin
2022-11-29 15:44 ` Michael Matz
2022-11-29 16:58 ` Uecker, Martin
2022-11-29 17:28 ` Alex Colomar
2022-11-29 16:49 ` Joseph Myers
2022-11-29 16:53 ` Jonathan Wakely
2022-11-29 17:00 ` Martin Uecker
2022-11-29 17:19 ` Alex Colomar
2022-11-29 17:29 ` Alex Colomar
2022-12-03 21:03 ` Alejandro Colomar
2022-12-03 21:13 ` Andrew Pinski
2022-12-03 21:15 ` Martin Uecker
2022-12-03 21:18 ` Alejandro Colomar
2022-12-06 2:08 ` Joseph Myers
2022-11-14 17:52 ` Joseph Myers
2022-11-14 17:57 ` Alejandro Colomar
2022-11-14 18:26 ` Joseph Myers
2022-11-28 23:02 ` Alex Colomar
2022-11-10 9:40 ` G. Branden Robinson
2022-11-10 10:59 ` Alejandro Colomar
2022-11-10 17:47 ` Alejandro Colomar
2022-11-10 18:04 ` MR macro 4th argument (was: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters) Alejandro Colomar
2022-11-10 18:11 ` Alejandro Colomar
2022-11-10 18:20 ` Alejandro Colomar
2022-11-10 19:37 ` Alejandro Colomar
2022-11-10 20:41 ` Alejandro Colomar
2022-11-10 22:55 ` G. Branden Robinson
2022-11-10 23:55 ` Alejandro Colomar
2022-11-11 4:44 ` G. Branden Robinson
2022-11-10 22:25 ` [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters G. Branden Robinson