All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] mctp.7: Add man page for Linux MCTP support
@ 2021-11-11  1:53 Jeremy Kerr
  2021-11-11 21:38 ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Kerr @ 2021-11-11  1:53 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar, Michael Kerrisk

This change adds a brief description for the new Management Component
Transport Protocol (MCTP) support added to Linux as of bc49d8169.

This is a fairly regular sockets-API implementation, so we're just
describing the semantics of socket, bind, sendto and recvfrom for the
new protocol.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>

---
v2:
 - Fix synopsis variable formatting
 - Semantic newlines
 - remove unnecessary escape
 - make custom constants more obvious
 - Add URI breakpoints
 - fix sockaddr_mctp misuse
 - update to reflect upstream sockaddr_mctp types
---
 man7/address_families.7 |   7 ++
 man7/mctp.7             | 187 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 194 insertions(+)
 create mode 100644 man7/mctp.7

diff --git a/man7/address_families.7 b/man7/address_families.7
index 3e535e66d..3c8299e69 100644
--- a/man7/address_families.7
+++ b/man7/address_families.7
@@ -405,6 +405,13 @@ XDP (express data path) interface (since Linux 4.18).
 See
 .I Documentation/networking/af_xdp.rst
 in the Linux kernel source tree for details.
+.TP
+.B AF_MCTP
+.\" commit: bc49d8169aa72295104f1558830c568efb946315
+MCTP (Management Component Transport Protocol) interface (since Linux 5.15),
+as defined by the DMTF specification DSP0236.
+For further information, see
+.BR mctp (7).
 .SH SEE ALSO
 .BR socket (2),
 .BR socket (7)
diff --git a/man7/mctp.7 b/man7/mctp.7
new file mode 100644
index 000000000..506826ed6
--- /dev/null
+++ b/man7/mctp.7
@@ -0,0 +1,187 @@
+.\" Copyright (c) 2021 Jeremy Kerr <jk@codeconstruct.com.au>
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.\" commit: bc49d8169aa72295104f1558830c568efb946315
+.TH MCTP  7 2021-10-14 "Linux" "Linux Programmer's Manual"
+.SH NAME
+mctp \- Management Component Transport Protocol
+.SH SYNOPSIS
+.nf
+.B #include <sys/socket.h>
+.B #include <linux/mctp.h>
+.PP
+.IB mctp_socket " = socket(AF_MCTP, SOCK_DGRAM, 0);"
+.fi
+.SH DESCRIPTION
+Linux implements the Management Component Transport Protocol (MCTP),
+specified by DMTF spec DSP0236, currently at version 1.
+This is a connectionless protocol, typically used between devices within a
+server system.
+Message reliability and ordering are not guaranteed, but message boundaries are
+preserved.
+.PP
+The API for MCTP messaging uses a standard sockets interface, using the
+.BR sendto (2)
+and
+.BR recvfrom (2)
+classes of system calls to transfer messages.
+Messages may be fragmented into packets before transmission, and reassembled at
+the remote endpoint.
+This fragmentation and reassembly is transparent to userspace.
+.PP
+.SS Address format
+MCTP addresses (also referred to as EIDs, or Endpoint Identifiers) are
+single-byte values, typed as
+.BR mctp_eid_t .
+Packets between a local and remote endpoint are identified by the source
+and destination EIDs, plus a three-bit tag value.
+.PP
+Addressing data is passed in socket system calls through
+.B struct sockaddr_mctp
+defined as:
+.PP
+.in +4n
+.EX
+typedef uint8_t        mctp_eid_t;
+
+struct mctp_addr {
+    mctp_eid_t         s_addr;
+};
+
+struct sockaddr_mctp {
+    unsigned short     smctp_family;  /* = AF_MCTP */
+    uint16_t           __smctp_pad0;  /* pad, must be zero */
+    int                smctp_network; /* local network identifier */
+    struct mctp_addr   smctp_addr;    /* EID */
+    uint8_t            smctp_type;    /* message type byte */
+    uint8_t            smctp_tag;     /* tag value, including TO flag */
+    uint8_t            __smctp_pad1;  /* pad, must be zero */
+};
+.EE
+.in
+.SS Sending messages
+Messages can be transmitted using the
+.BR sendto (2)
+and
+.BR sendmsg (2)
+system calls, by providing a
+.B struct sockaddr_mctp
+describing the addressing parameters.
+.PP
+.in +4n
+.EX
+struct sockaddr_mctp addr;
+ssize_t len;
+char *buf;
+
+/* unused fields must be zero */
+memset(&addr, 0, sizeof(addr));
+
+/* set message destination */
+addr.smctp_family = AF_MCTP;
+addr.smctp_network = 0;
+addr.smctp_addr.s_addr = 8; /* remote EID */
+addr.smctp_tag = MCTP_TAG_OWNER;
+addr.smctp_type = MYPROGRAM_MCTP_TYPE_ECHO; /* example type */
+
+buf = "hello, world!"
+
+len = sendto(sd, buf, 13, 0,
+             (struct sockaddr *)&addr, sizeof(addr));
+.EE
+.in
+.PP
+Here, the sender owns the message tag; so
+.B MCTP_TAG_OWNER
+is used as the tag data.
+The kernel will allocate a specific tag value for this message.
+If no tag is available,
+.BR sendto (2)
+will return an error, with errno set to
+.BR EBUSY .
+This allocated tag remains associated with the socket, so that any replies to
+the sent message will be received by the same socket.
+.PP
+Sending a MCTP message requires the
+.B CAP_NET_RAW
+capability.
+.SS Receiving messages
+Messages can be received using the
+.BR recvfrom (2)
+and
+.BR recvmsg (2)
+system calls.
+.PP
+.in +4n
+.EX
+struct sockaddr_mctp addr;
+socklen_t addrlen;
+char buf[13];
+
+addrlen = sizeof(addr);
+
+len = recvfrom(sd, buf, sizeof(buf), 0,
+               (struct sockaddr *)&addr, &addrlen);
+.EE
+.in
+.PP
+In order to receive an incoming message, the receiver will need to either have
+previously sent a message to the same endpoint, or performed a
+.BR bind (2)
+to receive all messages of a certain type:
+.PP
+.in +4n
+.EX
+struct sockaddr_mctp addr;
+
+addr.smctp_family = AF_MCTP;
+addr.smctp_network = MCTP_NET_ANY;
+addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
+addr.smctp_type = MYPROGRAM_MCTP_TYPE_ECHO; /* our 'echo' type */
+
+int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
+.EE
+.in
+.PP
+This call requires the
+.B CAP_NET_BIND_SERVICE
+capability, and will result in the socket receiving all messages sent to
+locally-assigned EIDs, for the specified message type.
+.PP
+After a
+.BR recvfrom (2)
+or
+.BR recvmsg (2)
+returns a success condition, the provided address argument will be populated
+with the sender's network and EID, as well as the tag value used for the
+message.
+Any reply to this message should pass the same address and tag value (with the
+TO bit cleared) to indicate that is is directed to the same remote endpoint.
+.SH SEE ALSO
+.BR socket (7)
+.PP
+The kernel source file
+.IR Documentation/networking/mctp.rst .
+.PP
+The DSP0236 specification, at
+.UR https://www.dmtf.org\:/standards\:/pmci
+.UE .
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-11  1:53 [PATCH v2] mctp.7: Add man page for Linux MCTP support Jeremy Kerr
@ 2021-11-11 21:38 ` Alejandro Colomar (man-pages)
  2021-11-12  1:12   ` Jeremy Kerr
  2021-11-12  9:35   ` G. Branden Robinson
  0 siblings, 2 replies; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-11 21:38 UTC (permalink / raw)
  To: Jeremy Kerr, G. Branden Robinson; +Cc: Michael Kerrisk, linux-man

[CC += Branden]

Hi, Jeremy (and Branden)!

On 11/11/21 02:53, Jeremy Kerr wrote:
> This change adds a brief description for the new Management Component
> Transport Protocol (MCTP) support added to Linux as of bc49d8169.
> 
> This is a fairly regular sockets-API implementation, so we're just
> describing the semantics of socket, bind, sendto and recvfrom for the
> new protocol.
> 
> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
> 

I have a few nitpicks below :)

Thanks!

Alex

> ---
> v2:
>   - Fix synopsis variable formatting
>   - Semantic newlines
>   - remove unnecessary escape
>   - make custom constants more obvious
>   - Add URI breakpoints
>   - fix sockaddr_mctp misuse
>   - update to reflect upstream sockaddr_mctp types
> ---
>   man7/address_families.7 |   7 ++
>   man7/mctp.7             | 187 ++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 194 insertions(+)
>   create mode 100644 man7/mctp.7
> 
> diff --git a/man7/address_families.7 b/man7/address_families.7
> index 3e535e66d..3c8299e69 100644
> --- a/man7/address_families.7
> +++ b/man7/address_families.7
> @@ -405,6 +405,13 @@ XDP (express data path) interface (since Linux 4.18).
>   See
>   .I Documentation/networking/af_xdp.rst
>   in the Linux kernel source tree for details.
> +.TP
> +.B AF_MCTP
> +.\" commit: bc49d8169aa72295104f1558830c568efb946315
> +MCTP (Management Component Transport Protocol) interface (since Linux 5.15),
> +as defined by the DMTF specification DSP0236.
> +For further information, see
> +.BR mctp (7).
>   .SH SEE ALSO
>   .BR socket (2),
>   .BR socket (7)
> diff --git a/man7/mctp.7 b/man7/mctp.7
> new file mode 100644
> index 000000000..506826ed6
> --- /dev/null
> +++ b/man7/mctp.7
> @@ -0,0 +1,187 @@
> +.\" Copyright (c) 2021 Jeremy Kerr <jk@codeconstruct.com.au>
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" <http://www.gnu.org/licenses/>.
> +.\" %%%LICENSE_END
> +.\" commit: bc49d8169aa72295104f1558830c568efb946315
> +.TH MCTP  7 2021-10-14 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +mctp \- Management Component Transport Protocol
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/socket.h>
> +.B #include <linux/mctp.h>

I prefer alphabetic sorting of includes.  And except for the one that 
provides the prototype of the function, I also want them to have a 
comment saying why they are needed.

See for example membarrier(2):

[
SYNOPSIS
        #include <linux/membarrier.h> /* Definition of MEMBARRIER_* 
constants */
        #include <sys/syscall.h>      /* Definition of SYS_* constants */
        #include <unistd.h>

        int syscall(SYS_membarrier, int cmd, unsigned int flags, int 
cpu_id);

        Note:  glibc provides no wrapper for membarrier(), neces-
        sitating the use of syscall(2).

]

Since syscall() is provided by <unistd.h>, it doesn't have a comment. 
Any other include has its comment.  Also, please have a look at that 
page for the formatting of those comments.

> +.PP
> +.IB mctp_socket " = socket(AF_MCTP, SOCK_DGRAM, 0);"
> +.fi
> +.SH DESCRIPTION
> +Linux implements the Management Component Transport Protocol (MCTP),
> +specified by DMTF spec DSP0236, currently at version 1.
> +This is a connectionless protocol, typically used between devices within a
> +server system.
> +Message reliability and ordering are not guaranteed, but message boundaries are
> +preserved.
> +.PP
> +The API for MCTP messaging uses a standard sockets interface, using the
> +.BR sendto (2)
> +and
> +.BR recvfrom (2)
> +classes of system calls to transfer messages.
> +Messages may be fragmented into packets before transmission, and reassembled at
> +the remote endpoint.

Break at the comma.

> +This fragmentation and reassembly is transparent to userspace.
> +.PP
> +.SS Address format
> +MCTP addresses (also referred to as EIDs, or Endpoint Identifiers) are
> +single-byte values, typed as
> +.BR mctp_eid_t .
Types should be in italics.

Branden, I thought this was specified somewhere, but I can't find it. 
Do you know where it is?  Or maybe if your more up to date 
groff_man[_style](7) pages mention that?

groff_man(7) (groff 1.22.4):

[
        .I [text]
               Set text in italics.  If the macro is given no ar-
               guments, the text of the next input line is set in
               italics.

               Use italics for file and path names, for  environ-
               ment  variables,  for  enumeration or preprocessor
               constants in  C,  for  variable  (user-determined)
               portions  of syntax synopses, for the first occur-
               rence only of a  technical  concept  being  intro-
               duced,  for  names of works of software (including
               commands and functions, but excluding names of op-
               erating  systems or their kernels), and anywhere a
               parameter requiring replacement by the user is en-
               countered.  An exception involves variable text in
               a context that is already marked  up  in  italics,
               such  as  file  or path names with variable compo-
               nents; in such cases,  follow  the  convention  of
               mathematical typography: set the file or path name
               in italics as usual (see .IR below), but use roman
               for  the  variable part, and italics again in run-
               ning roman text when referring to the variable ma-
               terial.
]

Anyway, for you Jeremy, I have other pages to follow for consistency:
For example, gettimeofday(2).

> +Packets between a local and remote endpoint are identified by the source

Break after "by" (or perhaps just before it).

> +and destination EIDs, plus a three-bit tag value.
> +.PP
> +Addressing data is passed in socket system calls through
> +.B struct sockaddr_mctp
Type in italics too.

> +defined as:
> +.PP
> +.in +4n
> +.EX
> +typedef uint8_t        mctp_eid_t;
> +
> +struct mctp_addr {
> +    mctp_eid_t         s_addr;
> +};
> +
> +struct sockaddr_mctp {
> +    unsigned short     smctp_family;  /* = AF_MCTP */
> +    uint16_t           __smctp_pad0;  /* pad, must be zero */
> +    int                smctp_network; /* local network identifier */
> +    struct mctp_addr   smctp_addr;    /* EID */
> +    uint8_t            smctp_type;    /* message type byte */
> +    uint8_t            smctp_tag;     /* tag value, including TO flag */
> +    uint8_t            __smctp_pad1;  /* pad, must be zero */

Do we want to tie the implementation to this pad?
Future implementations of sockaddr_mctp are not going to use that byte 
for anything else?

If that definition is not to be set in stone, we probably want to cover 
it with a layer of mistery.  system_data_types(7) for example mentions:

[
        The structures described in this manual page  shall  con-
        tain, at least, the members shown in their definition, in
        no particular order.
]

Something similar might be good for this page.  Maybe "trailing fields 
may be added in the future to this structure.  The structure should be 
zeroed before use, so that future fields are zeroed" or something like 
that (I'm not very inspired for the wording, sorry :), and then remove 
the pad field.

> +};
> +.EE
> +.in
> +.SS Sending messages
> +Messages can be transmitted using the
> +.BR sendto (2)
> +and
> +.BR sendmsg (2)
> +system calls, by providing a
> +.B struct sockaddr_mctp
> +describing the addressing parameters.
> +.PP
> +.in +4n
> +.EX
> +struct sockaddr_mctp addr;
> +ssize_t len;
> +char *buf;
> +
> +/* unused fields must be zero */
> +memset(&addr, 0, sizeof(addr));
> +
> +/* set message destination */
> +addr.smctp_family = AF_MCTP;
> +addr.smctp_network = 0;
> +addr.smctp_addr.s_addr = 8; /* remote EID */
> +addr.smctp_tag = MCTP_TAG_OWNER;
> +addr.smctp_type = MYPROGRAM_MCTP_TYPE_ECHO; /* example type */
> +
> +buf = "hello, world!"
> +
> +len = sendto(sd, buf, 13, 0,
> +             (struct sockaddr *)&addr, sizeof(addr));
> +.EE
> +.in
> +.PP
> +Here, the sender owns the message tag; so
> +.B MCTP_TAG_OWNER
> +is used as the tag data.
> +The kernel will allocate a specific tag value for this message.
> +If no tag is available,
> +.BR sendto (2)
> +will return an error, with errno set to
> +.BR EBUSY .

This is correct.  Please ignore the following comment.

Only for Branden:  I just noticed a difference between man-pages(7) and 
groff_man(7) advise:  groff_man(7) advises to use italics for 
preprocessor constants, while man-pages(7) recommends bold:

[
        Special macros, which are usually in  uppercase,  are  in
        bold (e.g., MAXINT).  Exception: don't boldface NULL.
]

I find it better with bold, since that differentiates variables from 
constants.

> +This allocated tag remains associated with the socket, so that any replies to
> +the sent message will be received by the same socket.
> +.PP
> +Sending a MCTP message requires the
> +.B CAP_NET_RAW
> +capability.
> +.SS Receiving messages
> +Messages can be received using the
> +.BR recvfrom (2)
> +and
> +.BR recvmsg (2)
> +system calls.
> +.PP
> +.in +4n
> +.EX
> +struct sockaddr_mctp addr;
> +socklen_t addrlen;
> +char buf[13];
> +
> +addrlen = sizeof(addr);
> +
> +len = recvfrom(sd, buf, sizeof(buf), 0,
> +               (struct sockaddr *)&addr, &addrlen);
> +.EE
> +.in
> +.PP
> +In order to receive an incoming message, the receiver will need to either have
> +previously sent a message to the same endpoint, or performed a
> +.BR bind (2)
> +to receive all messages of a certain type:
> +.PP
> +.in +4n
> +.EX
> +struct sockaddr_mctp addr;
> +
> +addr.smctp_family = AF_MCTP;
> +addr.smctp_network = MCTP_NET_ANY;
> +addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
> +addr.smctp_type = MYPROGRAM_MCTP_TYPE_ECHO; /* our 'echo' type */
> +
> +int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
> +.EE
> +.in
> +.PP
> +This call requires the
> +.B CAP_NET_BIND_SERVICE
> +capability, and will result in the socket receiving all messages sent to
> +locally-assigned EIDs, for the specified message type.
> +.PP
> +After a
> +.BR recvfrom (2)
> +or
> +.BR recvmsg (2)
> +returns a success condition, the provided address argument will be populated
> +with the sender's network and EID, as well as the tag value used for the
> +message.
> +Any reply to this message should pass the same address and tag value (with the
> +TO bit cleared) to indicate that is is directed to the same remote endpoint.
> +.SH SEE ALSO
> +.BR socket (7)
> +.PP
> +The kernel source file
> +.IR Documentation/networking/mctp.rst .
> +.PP
> +The DSP0236 specification, at
> +.UR https://www.dmtf.org\:/standards\:/pmci

Please, use \: after /, not before.  So '/\:'.  I had a discussion with 
Branden, and he showed me the merits of that style:

<https://lists.gnu.org/archive/html/groff/2021-10/msg00046.html>

> +.UE .
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-11 21:38 ` Alejandro Colomar (man-pages)
@ 2021-11-12  1:12   ` Jeremy Kerr
  2021-11-12 18:45     ` Alejandro Colomar (man-pages)
  2021-11-12  9:35   ` G. Branden Robinson
  1 sibling, 1 reply; 15+ messages in thread
From: Jeremy Kerr @ 2021-11-12  1:12 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), G. Branden Robinson
  Cc: Michael Kerrisk, linux-man

Hi Alex,

Thanks for the review! I've updated in line with most of your comments,
and will send a v3 soon. However, I do have a couple of queries, mainly
for my own understanding:

> > +.SH SYNOPSIS
> > +.nf
> > +.B #include <sys/socket.h>
> > +.B #include <linux/mctp.h>
> 
> I prefer alphabetic sorting of includes.

OK, does that take priority over the convention to list the header(s)
specific to this man page last?

In that case, we end up with:

    #include <linux/mctp.h>
    #include <sys/socket.h> /* Definition of socket() & SOCK_DGRAM */

> > +.PP
> > +The API for MCTP messaging uses a standard sockets interface, using the
> > +.BR sendto (2)
> > +and
> > +.BR recvfrom (2)
> > +classes of system calls to transfer messages.
> > +Messages may be fragmented into packets before transmission, and reassembled at
> > +the remote endpoint.
> 
> Break at the comma.

Just this comma? or all of them? There's a couple of sentences right
before this one that would seem to have a similar style - if it's the
former, for my own learning here: what makes this one different?

[and you mean a line-break, right? or a break-point escape sequence?]

> > +Packets between a local and remote endpoint are identified by the source
> 
> Break after "by" (or perhaps just before it).

Same as the above, why is this?

> > +struct sockaddr_mctp {
> > +    unsigned short     smctp_family;  /* = AF_MCTP */
> > +    uint16_t           __smctp_pad0;  /* pad, must be zero */
> > +    int                smctp_network; /* local network identifier */
> > +    struct mctp_addr   smctp_addr;    /* EID */
> > +    uint8_t            smctp_type;    /* message type byte */
> > +    uint8_t            smctp_tag;     /* tag value, including TO flag */
> > +    uint8_t            __smctp_pad1;  /* pad, must be zero */
> 
> Do we want to tie the implementation to this pad?

Yes. The pad will be there anyway, due to the natural alignment of the
struct. Since we want to be explicit about the padding (and require it
to be zeroed), I would strongly suggest keeping it documented.

There is an 'extended' MCTP addressing struct, which encapsulates a
struct sockaddr_mctp. For us to be absolutely clear about the layout of
that structure, the explicit pad here makes that unambiguous.

[unless, for man pages, we don't care about the ABI, only the API?]

> Future implementations of sockaddr_mctp are not going to use that
> byte for anything else?

They might, hence requiring zero at present.

Cheers,


Jeremy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-11 21:38 ` Alejandro Colomar (man-pages)
  2021-11-12  1:12   ` Jeremy Kerr
@ 2021-11-12  9:35   ` G. Branden Robinson
  2021-11-12 19:50     ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 15+ messages in thread
From: G. Branden Robinson @ 2021-11-12  9:35 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: Jeremy Kerr, Michael Kerrisk, linux-man

[-- Attachment #1: Type: text/plain, Size: 5094 bytes --]

Hi, Alex!

At 2021-11-11T22:38:43+0100, Alejandro Colomar (man-pages) wrote:
> > +Messages may be fragmented into packets before transmission, and reassembled at
> > +the remote endpoint.
> 
> Break at the comma.

I use "/[;:,]." in vi(m) to help myself find these quickly (you can get
false positives in comments; a more sophisticated regex that one might
want to bind to a key can rule those out).  Breaking input lines after
commas, semicolons, and colons is considered good style by *roff
veterans going back to Kernighan in 1974[1].

"/[!?.][^\\]" is more important--it's an un-semantic-newline finder
(though again with some false positives).  Those have a real impact on
the resulting typography (due to inter-sentence spacing).

> Types should be in italics.
> 
> Branden, I thought this was specified somewhere, but I can't find it.
> Do you know where it is?  Or maybe if your more up to date
> groff_man[_style](7) pages mention that?

Nope, apparently I never made a prescription in this area.  It's worth
making explicit note of, since it deviates from the "literal -> bold,
variable -> italics" mapping that people overgeneralize/overapply.

So I'll queue this up for my next revision of groff_man_style(7).  Thank
you for catching it!

> groff_man(7) (groff 1.22.4):
[...]
>               Use italics for
[...]
>               for  names of works of software (including
>               commands and functions, but excluding names of op-
>               erating  systems or their kernels),

As an FYI, I'm feeling an urge to drop the foregoing item of advice.
Exceptions are often also made for names of software packages (both in
the loose sense and the technical one--who italicizes "TeX", for
example?); usage is so inconsistent that I despair of providing
comprehensible guidance.

Now that groff man(7) has the 'MR' semantic macro for man page cross
references[2], most of the instances where people would fail to
italicize will be taken care of without the foregoing.

> Anyway, for you Jeremy, I have other pages to follow for consistency:
> For example, gettimeofday(2).
> 
> > +Packets between a local and remote endpoint are identified by the source
> 
> Break after "by" (or perhaps just before it).

Phrasal semantic newlines!  :D  This 180-proof Kernighan whiskey is a
stronger prescription than I would write (mainly because it requires
natural-language-aware grepping), but if your contributors don't rebel,
I think we will all ultimately see the benefits in diffs.

> Something similar might be good for this page.  Maybe "trailing fields
> may be added in the future to this structure.  The structure should be
> zeroed before use, so that future fields are zeroed" or something like
> that (I'm not very inspired for the wording, sorry :), and then remove
> the pad field.

The idiom is `memset(mystructp, 0, sizeof(struct mystruct));`, isn't it?
If so, then maybe the point doesn't need to be made.

> Only for Branden:  I just noticed a difference between man-pages(7)
> and groff_man(7) advise:  groff_man(7) advises to use italics for
> preprocessor constants, while man-pages(7) recommends bold:
> 
> [
>        Special macros, which are usually in  uppercase,  are  in
>        bold (e.g., MAXINT).  Exception: don't boldface NULL.
> ]

That was a deliberate difference on my part, motivated partially by own
preference for reduction of what I regard as excessive use of bold in
man pages since the '90s, and partly due to precedent.  The 4.4BSD book
by McKusick, et al., for example, uses italicized small caps for some
things enumeration constants (like open(2) flags) and upright small caps
for others (like errno(3) values and signal names).  man(1) output to a
terminal just doesn't have enough typefaces to go around.

"If in any doubt, use bold" seems to have become the prevailing wisdom
in the 1990s due, as far as I can tell, to a historical accident
involving the (lack of) capability of VGA hardware and text mode console
drivers[3].  Some readers might remember the days when getting an X11
server working on your hardware was considered an achievement.

> I find it better with bold, since that differentiates variables from
> constants.

Would we then also bold constants that are C objects with the "const"
type qualifier rather than language literals emplaced by the
preprocessor?

My intuition is that this distinction isn't worth making with a
typeface; the use of bold is not necessary to cue the user that they
should not redefine a symbol, since there are plenty of other things
set in italics that the user _also_ shouldn't (try to) redefine.

I'm certainly open to hammering out a reasoned basis for typeface
selections, though.  Much of current practice arose in an ad hoc way.

Regards,
Branden

[1] https://rhodesmill.org/brandon/2012/one-sentence-per-line/
[2] https://lists.gnu.org/archive/html/groff/2021-10/msg00012.html
[3] Warning: lengthy.
    https://lists.gnu.org/archive/html/groff/2021-08/msg00023.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12  1:12   ` Jeremy Kerr
@ 2021-11-12 18:45     ` Alejandro Colomar (man-pages)
  2021-11-12 19:40       ` G. Branden Robinson
  2021-11-18  5:08       ` Jeremy Kerr
  0 siblings, 2 replies; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-12 18:45 UTC (permalink / raw)
  To: Jeremy Kerr; +Cc: Michael Kerrisk, G. Branden Robinson, linux-man

Hi Jeremy,

On 11/12/21 02:12, Jeremy Kerr wrote:
> Hi Alex,
> 
> Thanks for the review! I've updated in line with most of your comments,
> and will send a v3 soon. However, I do have a couple of queries, mainly
> for my own understanding:

Sure.

> 
>>> +.SH SYNOPSIS
>>> +.nf
>>> +.B #include <sys/socket.h>
>>> +.B #include <linux/mctp.h>
>>
>> I prefer alphabetic sorting of includes.
> 
> OK, does that take priority over the convention to list the header(s)
> specific to this man page last?

I didn't even know there was such a convention, but if there is, yes, I 
explicitly want to override it.

Rationales:

<https://google.github.io/styleguide/cppguide.html#Include_What_You_Use>
<https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes>
<https://stackoverflow.com/a/2763292/6872717>
<https://stackoverflow.com/a/2762626/6872717>

Basically, alphabetic order avoids undeclared dependencies that can be 
hidden in some specific orderings which "just work but if you reorder 
them it breaks".  Of course, it can still hide some dependency, but it's 
more unlikely.  It's a "this can be random order, but let's use 
something more readable and consistent than random".

Unless for some exception that I can't remember now, Google's style 
guide applies to includes in the man-pages (or I intend it to be so).


> 
> In that case, we end up with:
> 
>      #include <linux/mctp.h>
>      #include <sys/socket.h> /* Definition of socket() & SOCK_DGRAM */

Since <sys/socket.h> provides the prototype socket(), it's 
<linux/mctp.h> that should specify why it's needed, so it should be

       #include <linux/mctp.h>  /* Definition of AF_MCTP */
       #include <sys/socket.h>

Also, please use at least 2 spaces between code and comments (unless the 
line goes close to the 78 column (right margin), in which case, reducing 
that space is the most readable thing to do).

Rationale:

<https://google.github.io/styleguide/cppguide.html#Horizontal_Whitespace>

That helps visually separate code from non-code.

> 
>>> +.PP
>>> +The API for MCTP messaging uses a standard sockets interface, using the
>>> +.BR sendto (2)
>>> +and
>>> +.BR recvfrom (2)
>>> +classes of system calls to transfer messages.
>>> +Messages may be fragmented into packets before transmission, and reassembled at
>>> +the remote endpoint.
>>
>> Break at the comma.
> 
> Just this comma? or all of them? There's a couple of sentences right
> before this one that would seem to have a similar style - if it's the
> former, for my own learning here: what makes this one different?

There are a few more that I missed, that's right:

[
 > +This is a connectionless protocol, typically used between devices 
within a
 > +server system.
 > +Message reliability and ordering are not guaranteed, but message 
boundaries are
 > +preserved.
]

Those should also be broken at the comma.  Rationale: semantic newlines 
(man-pages(7)).

In the case of the following one, although you could break at it if you 
want (maybe better for consistency), I won't enforce it too much, since 
it is a couple of words and the line is already broken in a non-semantic 
way due to the formatting.  So I don't care too much:

[
 > +The API for MCTP messaging uses a standard sockets interface, using the
 > +.BR sendto (2)
]

As Branden said, you can use "/[;:,]." and "/[!?.][^\\]" to check the 
correct use of semantic newlines.

> 
> [and you mean a line-break, right? or a break-point escape sequence?]

Yes, line break.

> 
>>> +Packets between a local and remote endpoint are identified by the source
>>
>> Break after "by" (or perhaps just before it).
> 
> Same as the above, why is this?

This is more or less for the same reasons as above, semantic newlines, 
but it goes a bit deeper.  Branden and I discussed a few months ago 
about my strong preference for semantic newlines not only in clause 
boundaries but also phrase boundaries.

man-pages(7) recommends breaking long lines at clause boundaries 
(commas, semicolons, and so on), but somethimes clauses (if you don't 
know the difference between phrases and clauses, which you don't need 
to, basically clauses are made up of phrases) are too long, and you can 
have a single clause that uses more than a single line.  In those cases, 
the most sensible place to break the line is at the next level: phrase 
boundaries.

"the source and destination EIDs" is a single phrase, so it's a bit 
weird to break the line in the middle of it.  I avoid breaking phrases, 
which makes reading the source code a bit more difficult.  Hopefully, it 
will also make diffs easier to read in the future.

> 
>>> +struct sockaddr_mctp {
>>> +    unsigned short     smctp_family;  /* = AF_MCTP */
>>> +    uint16_t           __smctp_pad0;  /* pad, must be zero */
>>> +    int                smctp_network; /* local network identifier */
>>> +    struct mctp_addr   smctp_addr;    /* EID */
>>> +    uint8_t            smctp_type;    /* message type byte */
>>> +    uint8_t            smctp_tag;     /* tag value, including TO flag */
>>> +    uint8_t            __smctp_pad1;  /* pad, must be zero */
>>
>> Do we want to tie the implementation to this pad?
> 
> Yes. The pad will be there anyway, due to the natural alignment of the
> struct. Since we want to be explicit about the padding (and require it
> to be zeroed), I would strongly suggest keeping it documented.

If there was padding in the middle of the struct, yes, it should 
definitely be documented in the man page.

> 
> There is an 'extended' MCTP addressing struct, which encapsulates a
> struct sockaddr_mctp. For us to be absolutely clear about the layout of
> that structure, the explicit pad here makes that unambiguous.

What I mean is, if in the future this structure will have additional 
trailing fields, documenting this padding is unnecessary, since that may 
vary.  Code should not rely on this structure having _only_ that 
padding.  And if code handles any arbitrary extra stuff in this 
structure, it will implicitly also handle that __smctp_pad1 field, so 
there's no need to mention it.

Example:

struct sockaddr_mctp {
     unsigned short     smctp_family;  /* = AF_MCTP */
     uint16_t           __smctp_pad0;  /* pad, must be zero */
     int                smctp_network; /* local network identifier */
     struct mctp_addr   smctp_addr;    /* EID */
     uint8_t            smctp_type;    /* message type byte */
     uint8_t            smctp_tag;     /* tag value, including TO flag */
     uint8_t            foo;           /* was __smctp_pad1 */
     uint8_t            bar;           /* extra stuff */
};

Here I got rid of the pad, and even added an extra field.  Code should 
be written to be compatible with this case, right?  If so, I don't see 
any reason to document that padding field, IMHO.

Also, we prevent some crazy programmers from relying on that padding 
byte being actually padding and not something else, even if it "must" be 
zero.  I've seen too much crazy stuff; programmers relying on undefined 
behavior just because "we don't plan to move from C++17 to C++20, so 
this is safe".

> 
> [unless, for man pages, we don't care about the ABI, only the API?]

We care about the ABI.  Especially, if it's about a type that we control.

In the case of ISO C or POSIX types, system_data_types(7) doesn't 
document Linux-specific details, for portability reasons, but that's an 
exception, not the rule.

> 
>> Future implementations of sockaddr_mctp are not going to use that
>> byte for anything else?
> 
> They might, hence requiring zero at present.

Okay, then code should be able to handle _any_ trailing fields, 
including that padding, so documenting it is irrelevant, I think.  We 
could say something like "trailing fields may be added to this 
structure", and that already implies the current padding byte, doesn't it?


Cheers,
Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12 18:45     ` Alejandro Colomar (man-pages)
@ 2021-11-12 19:40       ` G. Branden Robinson
  2021-11-12 20:07         ` Alejandro Colomar (man-pages)
  2021-11-18  5:08       ` Jeremy Kerr
  1 sibling, 1 reply; 15+ messages in thread
From: G. Branden Robinson @ 2021-11-12 19:40 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: Jeremy Kerr, Michael Kerrisk, linux-man

[-- Attachment #1: Type: text/plain, Size: 2118 bytes --]

Hi, Alex!

At 2021-11-12T19:45:16+0100, Alejandro Colomar (man-pages) wrote:
> man-pages(7) recommends breaking long lines at clause boundaries
> (commas, semicolons, and so on), but somethimes clauses (if you don't
> know the difference between phrases and clauses, which you don't need
> to, basically clauses are made up of phrases)

Slightly more formally, a clause is a grammatical unit containing a
subject and a predicate: "allocation fails" is a near-minimal example.
I wouldn't be surprised if caveats exist for "non-standard" speech, but
man pages are written in a fairly formal register so the rule should
apply reliably.  A moment's attention to clause boundaries can help one
clarify or recast complex constructions to aid readability and accuracy.

> "the source and destination EIDs" is a single phrase, so it's a bit
> weird to break the line in the middle of it.  I avoid breaking
> phrases, which makes reading the source code a bit more difficult.
> Hopefully, it will also make diffs easier to read in the future.

My only gripe about list-related phrase breaks, and it's a minor one, is
that in English grammar they nearly always require a conjunction ("and",
"or") somewhere, and no matter where I place it, it feels wrong, because
it's neither an item of the list I'm itemizing, nor part of an item.

I mention this because I suspect there is a point of diminishing returns
in our prescriptive advice, because English is an unwieldy beast that
was not formally designed as any respectable system would be.

Forget killing baby Hitler--if I ever get a time machine, I'm going to
go back and teach Shakespeare Haskell.

> Also, we prevent some crazy programmers from relying on that padding
> byte being actually padding and not something else, even if it "must"
> be zero.  I've seen too much crazy stuff; programmers relying on
> undefined behavior just because "we don't plan to move from C++17 to
> C++20, so this is safe".

+1.  One can almost see them producing a cowboy hat from an
extradimensional space in their cubicle.  Strap in--the UB bronco is
gonna start bucking.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12  9:35   ` G. Branden Robinson
@ 2021-11-12 19:50     ` Alejandro Colomar (man-pages)
  2021-11-22  9:06       ` Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support) G. Branden Robinson
  0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-12 19:50 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Jeremy Kerr, Michael Kerrisk, linux-man



On 11/12/21 10:35, G. Branden Robinson wrote:
> Hi, Alex!
> 
> At 2021-11-11T22:38:43+0100, Alejandro Colomar (man-pages) wrote:
>>> +Messages may be fragmented into packets before transmission, and reassembled at
>>> +the remote endpoint.
>>
>> Break at the comma.
> 
> I use "/[;:,]." in vi(m) to help myself find these quickly (you can get
> false positives in comments; a more sophisticated regex that one might
> want to bind to a key can rule those out).  Breaking input lines after
> commas, semicolons, and colons is considered good style by *roff
> veterans going back to Kernighan in 1974[1].
> 
> "/[!?.][^\\]" is more important--it's an un-semantic-newline finder
> (though again with some false positives).  Those have a real impact on
> the resulting typography (due to inter-sentence spacing).
> 
>> Types should be in italics.
>>
>> Branden, I thought this was specified somewhere, but I can't find it.
>> Do you know where it is?  Or maybe if your more up to date
>> groff_man[_style](7) pages mention that?
> 
> Nope, apparently I never made a prescription in this area.  It's worth
> making explicit note of, since it deviates from the "literal -> bold,
> variable -> italics" mapping that people overgeneralize/overapply.

I'll save this for below as an argument.

> 
> So I'll queue this up for my next revision of groff_man_style(7).  Thank
> you for catching it!

It's a pleasure! :-}

> 
>> groff_man(7) (groff 1.22.4):
> [...]
>>                Use italics for
> [...]
>>                for  names of works of software (including
>>                commands and functions, but excluding names of op-
>>                erating  systems or their kernels),
> 
> As an FYI, I'm feeling an urge to drop the foregoing item of advice.
> Exceptions are often also made for names of software packages (both in
> the loose sense and the technical one--who italicizes "TeX", for
> example?); usage is so inconsistent that I despair of providing
> comprehensible guidance.

Okay, I had to write about a different package recently, and I had some 
doubts if I should or not, given current status quo.  If we completely 
remove it, okay.  Maybe Michael will be more conservative, I don't know. 
  But the status quo is already very screwed, since I seldom see that used.

I think there are a few pages that may make use of it, but I don't 
remember which.  Please give me some time (maybe a month? I hope it 
isn't too much) to come with feedback about usage of this in current 
pages, before you remove it.

> 
> Now that groff man(7) has the 'MR' semantic macro for man page cross
> references[2], most of the instances where people would fail to
> italicize will be taken care of without the foregoing.

If only each package had its own manual page...  Not even in Debian...

> 
>> Anyway, for you Jeremy, I have other pages to follow for consistency:
>> For example, gettimeofday(2).
>>
>>> +Packets between a local and remote endpoint are identified by the source
>>
>> Break after "by" (or perhaps just before it).
> 
> Phrasal semantic newlines!  :D  This 180-proof Kernighan whiskey is a
> stronger prescription than I would write (mainly because it requires
> natural-language-aware grepping), but if your contributors don't rebel,
> I think we will all ultimately see the benefits in diffs.

I feel an urge to add it to man-pages(7).  :-}

> 
>> Something similar might be good for this page.  Maybe "trailing fields
>> may be added in the future to this structure.  The structure should be
>> zeroed before use, so that future fields are zeroed" or something like
>> that (I'm not very inspired for the wording, sorry :), and then remove
>> the pad field.
> 
> The idiom is `memset(mystructp, 0, sizeof(struct mystruct));`, isn't it?

Yes.

> If so, then maybe the point doesn't need to be made.

Well, if someone doesn't know that idiom, it may leave the structure 
with garbage padding, so I'd put some notice, even if it's very short.

> 
>> Only for Branden:  I just noticed a difference between man-pages(7)
>> and groff_man(7) advise:  groff_man(7) advises to use italics for
>> preprocessor constants, while man-pages(7) recommends bold:
>>
>> [
>>         Special macros, which are usually in  uppercase,  are  in
>>         bold (e.g., MAXINT).  Exception: don't boldface NULL.
>> ]
> 
> That was a deliberate difference on my part, motivated partially by own
> preference for reduction of what I regard as excessive use of bold in
> man pages since the '90s, and partly due to precedent.  The 4.4BSD book
> by McKusick, et al., for example, uses italicized small caps for some
> things enumeration constants (like open(2) flags) and upright small caps
> for others (like errno(3) values and signal names).  man(1) output to a
> terminal just doesn't have enough typefaces to go around.
> 
> "If in any doubt, use bold" seems to have become the prevailing wisdom
> in the 1990s due, as far as I can tell, to a historical accident
> involving the (lack of) capability of VGA hardware and text mode console
> drivers[3].  Some readers might remember the days when getting an X11
> server working on your hardware was considered an achievement.
> 
>> I find it better with bold, since that differentiates variables from
>> constants.
> 
> Would we then also bold constants that are C objects with the "const"
> type qualifier rather than language literals emplaced by the
> preprocessor?

Yes!  The difference between "const" variables and macros is just 
preprocessor, but they are all intended for very similar usage.

> 
> My intuition is that this distinction isn't worth making with a
> typeface; the use of bold is not necessary to cue the user that they
> should not redefine a symbol, since there are plenty of other things
> set in italics that the user _also_ shouldn't (try to) redefine.
> 
> I'm certainly open to hammering out a reasoned basis for typeface
> selections, though.  Much of current practice arose in an ad hoc way.

Let me try to convince you.

We have a mapping in our brains that says
"literal -> bold, variable -> italics".
If we extend that mapping,
macros are replacements for literals,
so we would use bold for them too.
And "const"s are also mostly intended for the same use as macros,
so bold goes for them too.

Existing practice seems to have followed that
(or maybe a parallel) reasoning.

BTW, I noticed that the Linux man-pages are inconsistent with
the mapping of literal -> bold,
and tend to not highlight literals.
I'll change that for future patches.

Did I convince you? :)


Cheers,
Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12 19:40       ` G. Branden Robinson
@ 2021-11-12 20:07         ` Alejandro Colomar (man-pages)
  2021-11-12 20:07           ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-12 20:07 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Jeremy Kerr, Michael Kerrisk, linux-man



On 11/12/21 20:40, G. Branden Robinson wrote:
> At 2021-11-12T19:45:16+0100, Alejandro Colomar (man-pages) wrote:
>> Also, we prevent some crazy programmers from relying on that padding
>> byte being actually padding and not something else, even if it "must"
>> be zero.  I've seen too much crazy stuff; programmers relying on
>> undefined behavior just because "we don't plan to move from C++17 to
>> C++20, so this is safe".

A more complete report of the horrifying story can be found here:
<https://github.com/tcbrindle/span/issues/42#event-5473150481>

The code to which it refers is closed-source, so I can't link to it.
But you can feel how wrong it was.
It made use of a library that provided C20's std::span for older 
standards, and has some specified behavior for what C20 regarded as UB. 
  Guess what.

> 
> +1.  One can almost see them producing a cowboy hat from an
> extradimensional space in their cubicle.  Strap in--the UB bronco is
> gonna start bucking.

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12 20:07         ` Alejandro Colomar (man-pages)
@ 2021-11-12 20:07           ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-12 20:07 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Jeremy Kerr, Michael Kerrisk, linux-man



On 11/12/21 21:07, Alejandro Colomar (man-pages) wrote:
> 
> 
> On 11/12/21 20:40, G. Branden Robinson wrote:
>> At 2021-11-12T19:45:16+0100, Alejandro Colomar (man-pages) wrote:
>>> Also, we prevent some crazy programmers from relying on that padding
>>> byte being actually padding and not something else, even if it "must"
>>> be zero.  I've seen too much crazy stuff; programmers relying on
>>> undefined behavior just because "we don't plan to move from C++17 to
>>> C++20, so this is safe".
> 
> A more complete report of the horrifying story can be found here:
> <https://github.com/tcbrindle/span/issues/42#event-5473150481>
> 
> The code to which it refers is closed-source, so I can't link to it.
> But you can feel how wrong it was.
> It made use of a library that provided C20's std::span for older 
> standards, and has some specified behavior for what C20 regarded as UB. 
>   Guess what.

D'oh! C++20
-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-12 18:45     ` Alejandro Colomar (man-pages)
  2021-11-12 19:40       ` G. Branden Robinson
@ 2021-11-18  5:08       ` Jeremy Kerr
  2021-11-22 16:35         ` Alejandro Colomar (man-pages)
  1 sibling, 1 reply; 15+ messages in thread
From: Jeremy Kerr @ 2021-11-18  5:08 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: Michael Kerrisk, G. Branden Robinson, linux-man

Hi Alex,

> I didn't even know there was such a convention, but if there is, yes,
> I explicitly want to override it.

OK, I'll update to suit.

> man-pages(7) recommends breaking long lines at clause boundaries
> (commas, semicolons, and so on), but somethimes clauses (if you don't
> know the difference between phrases and clauses, which you don't need
> to, basically clauses are made up of phrases) are too long, and you
> can have a single clause that uses more than a single line.  In those
> cases, the most sensible place to break the line is at the next level:
> phrase boundaries.

OK, understood, thanks for that.

> What I mean is, if in the future this structure will have additional 
> trailing fields, documenting this padding is unnecessary, since that
> may 
> vary.  Code should not rely on this structure having _only_ that 
> padding.  And if code handles any arbitrary extra stuff in this 
> structure, it will implicitly also handle that __smctp_pad1 field, so
> there's no need to mention it.
> 
> Example:
> 
> struct sockaddr_mctp {
>      unsigned short     smctp_family;  /* = AF_MCTP */
>      uint16_t           __smctp_pad0;  /* pad, must be zero */
>      int                smctp_network; /* local network identifier */
>      struct mctp_addr   smctp_addr;    /* EID */
>      uint8_t            smctp_type;    /* message type byte */
>      uint8_t            smctp_tag;     /* tag value, including TO flag */
>      uint8_t            foo;           /* was __smctp_pad1 */
>      uint8_t            bar;           /* extra stuff */
> };
> 
> Here I got rid of the pad, and even added an extra field.

Right, but you've also broken the ABI if that previous padding byte
wasn't explicitly present, and required to be zero. In that future ABI
implementation, the kernel can't distinguish between 'foo' being properly
initialised, and not just random stack garbage from an old-ABI caller.

That's why we have the _pad1 field, and why we require it to be zero.
Since that's enforced by the kernel, I'd rather have it documented,
rather than users seeing their calls fail for "invisible" reasons, when
a call's _pad1 happens to contain a non-zero byte due to not being
initialised.

> Code should be written to be compatible with this case, right?

I'm not 100% clear on you mean by compatible there - you want to prevent
the case where __smctp_pad1 is removed from the header, and that code is
now referencing an invalid struct member?

That's somewhat unavoidable, and also applies to _pad0; I'm not sure why
_pad1 needs to be different.

> Also, we prevent some crazy programmers from relying on that padding 
> byte being actually padding and not something else, even if it "must"
> be zero.

Right, and the kernel's zero check helps to prevent that.

> I've seen too much crazy stuff; programmers relying on
> undefined behavior just because "we don't plan to move from C++17 to
> C++20, so this is safe".

... sure, but I don't think hiding implementation details will fix that.

Regards,


Jeremy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support)
  2021-11-12 19:50     ` Alejandro Colomar (man-pages)
@ 2021-11-22  9:06       ` G. Branden Robinson
  2021-11-22 11:50         ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: G. Branden Robinson @ 2021-11-22  9:06 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: Michael Kerrisk, linux-man

[-- Attachment #1: Type: text/plain, Size: 11156 bytes --]

[Jeremy Kerr dropped from CC list with the change of Subject:]

Hi Alex,

At 2021-11-12T20:50:06+0100, Alejandro Colomar (man-pages) wrote:
> > > Types should be in italics.
> > > 
> > > Branden, I thought this was specified somewhere, but I can't find
> > > it.  Do you know where it is?  Or maybe if your more up to date
> > > groff_man[_style](7) pages mention that?
> > 
> > Nope, apparently I never made a prescription in this area.  It's
> > worth making explicit note of, since it deviates from the "literal
> > -> bold, variable -> italics" mapping that people
> > overgeneralize/overapply.
> 
> I'll save this for below as an argument.

Be sure you keep that observation that people "overgeneralize/overapply"
this rule in mind.

> > > groff_man(7) (groff 1.22.4):
> > [...]
> > >                Use italics for
> > [...]
> > >                for  names of works of software (including
> > >                commands and functions, but excluding names of op-
> > >                erating  systems or their kernels),
> > 
> > As an FYI, I'm feeling an urge to drop the foregoing item of advice.
> > Exceptions are often also made for names of software packages (both in
> > the loose sense and the technical one--who italicizes "TeX", for
> > example?); usage is so inconsistent that I despair of providing
> > comprehensible guidance.
> 
> Okay, I had to write about a different package recently, and I had
> some doubts if I should or not, given current status quo.  If we
> completely remove it, okay.  Maybe Michael will be more conservative,
> I don't know.  But the status quo is already very screwed, since I
> seldom see that used.

I agree.  That was one of my first contributions to groff (3-4 years
ago), and at the time I was more strongly under the influence of the
Chicago Manual of Style.  As I've learned a little more about typography
I've discovered that Chicago can be an untrustworthy oracle.

> I think there are a few pages that may make use of it, but I don't
> remember which.  Please give me some time (maybe a month? I hope it
> isn't too much) to come with feedback about usage of this in current
> pages, before you remove it.

Er, oops, I've already pushed a change to dispose of it.  It wouldn't be
troublesome to put it back, but there might be no need since...

> > Now that groff man(7) has the 'MR' semantic macro for man page cross
> > references[2], most of the instances where people would fail to
> > italicize will be taken care of without the foregoing.
> 
> If only each package had its own manual page...  Not even in Debian...

True, but one of the advantages of the V7 Unix manuals' practice of
setting program names in italics[1] is that you can follow the same
typographic rule irrespective of whether a man page exists for that
program or not.

> > Phrasal semantic newlines!  :D  This 180-proof Kernighan whiskey is
> > a stronger prescription than I would write (mainly because it
> > requires natural-language-aware grepping), but if your contributors
> > don't rebel, I think we will all ultimately see the benefits in
> > diffs.
> 
> I feel an urge to add it to man-pages(7).  :-}

I saw that.  I don't object, exactly, but I would be prepared for some
contributors to simply disregard it.  Some man pages are going to be
more excellent than others.  man-pages(7) calls out wait(2) and pipe(2)
as models, for example.

I think in many cases this Kernighanization of man page running text is
going to fall to you because contributors will feel it unnecessary or
that they cannot justify to their managers the expenditure of the time
necessary to write documentation so scrupulously.  I would agree that it
can improve man page quality--attention to phrase and clause boundaries
compels the writer to re-read what they write and may reveal to them
gaps in the discussion or outright errors.

But documentary diligence is not part of the culture of much software
development these days.  It's not Lean/Agile/XP/MVP.  Documentation is a
cost center.  It's the opposite of secret sauce.  What will please the
Street?  Move fast; break stuff; be far away when the building catches
on fire.

> > The idiom is `memset(mystructp, 0, sizeof(struct mystruct));`, isn't
> > it?
[...]
> Well, if someone doesn't know that idiom, it may leave the structure
> with garbage padding, so I'd put some notice, even if it's very short.

Conceded.

> > > Only for Branden:  I just noticed a difference between
> > > man-pages(7) and groff_man(7) advise:  groff_man(7) advises to use
> > > italics for preprocessor constants, while man-pages(7) recommends
> > > bold:
[...]
> > Would we then also bold constants that are C objects with the
> > "const" type qualifier rather than language literals emplaced by the
> > preprocessor?
> 
> Yes!  The difference between "const" variables and macros is just
> preprocessor, but they are all intended for very similar usage.

Hmm, yes, but personally I wouldn't mind it if we had a notation that
distinguished preprocessor symbols from things the compiler proper sees.
With only 3 faces in man(7), we probably can't get there from here, but
I have dreams of a better world.  A Texinfo partisan would likely point
out that the distinction is already made clear in the GNU C Library
manual, for example.

> > My intuition is that this distinction isn't worth making with a
> > typeface; the use of bold is not necessary to cue the user that they
> > should not redefine a symbol, since there are plenty of other things
> > set in italics that the user _also_ shouldn't (try to) redefine.
> > 
> > I'm certainly open to hammering out a reasoned basis for typeface
> > selections, though.  Much of current practice arose in an ad hoc
> > way.
> 
> Let me try to convince you.
> 
> We have a mapping in our brains that says
> "literal -> bold, variable -> italics".

I don't think I will ever be able to suppress my need to point out that
this is a _loose_ generalization.  Italics are also used for emphasis
and work titles in standard English.

> If we extend that mapping,
> macros are replacements for literals,
> so we would use bold for them too.
> And "const"s are also mostly intended for the same use as macros,
> so bold goes for them too.

But constness can be cast away; and some C library functions that
_should_ take const-qualified parameters or return const-qualified
objects, don't (and worse, got standardized without such qualifiers
either out of excessive deference to poor practice or for fear of
upsetting too many implementors).  So you might mislead the reader who
assumes that a C object in bold type is always const-qualified, leading
them to perform superfluous casts, which make the code noisier to read.
Or you might mislead the reader who assumes that a C object in bold type
is always "semantically" to be treated as "const" (i.e., don't assign to
objects of this name even if you can without provoking a compiler
complaint) might make the opposite false and overgeneralized inference,
and fail to apply "const" where it would be useful.

I'm not saying your idea is bad, but I am deeply wary of coupling to
typography something that has proven as slippery as "const" has in the
history of C as written by mere mortals (who happen to be your
readership, since true rock stars are born knowing everything they'll
ever need, and read nothing).

If you accept my premises, that a bold C object/symbol name doesn't
necessarily mean it's const-qualified, and doesn't necessarily mean that
it's a true C symbol (as opposed to a preprocessor macro), then I think
the argument for setting it in bold at all loses force; the boldness
doesn't _necessarily_ tell the reader anything that italics--or
quotation marks, for that matter--wouldn't.

> Existing practice seems to have followed that (or maybe a parallel)
> reasoning.

It is hard to say.  As I've noted elsewhere, in the early 1990s the old
Unix man page conventions of typeface usage started to fall away.  This
coincided with the rise of Unix on x86 PC hardware, and particularly
with the development of console drivers for VGA display adapters and the
text modes they implemented, which could not render italics.  A bunch of
new people started using Unices at that point, Cynthia Livingston wrote
(or re-wrote) the BSD mdoc package into the design that persists to this
day, and the explosive growth of Linux motivated people to write man
pages in typefaces they could read.  For a while it was considered
really cool to associate different ANSI colors with italics and
sometimes bold.  On my Debian system, man-db man(1) still does this when
using a Linux VC.

> BTW, I noticed that the Linux man-pages are inconsistent with
> the mapping of literal -> bold,
> and tend to not highlight literals.
> I'll change that for future patches.
> 
> Did I convince you? :)

No, but not for lack of trying.  :)  I don't think there is much
distance between us, practically speaking.  There are about 60 groff man
pages totalling, at last count, only 375 dead tree pages, and all of
these are in sections 1, 5, and 7.  I have not acquired deep experience
writing section 2 and 3 man pages, which are the Linux man-pages
project's bread and butter.  I am therefore hesitant to write
prescriptions for the preparation of pages in those sections where there
is no overlap with other sections of the manual.  §2 and 3 tend to have
a different sort of audience (you can be almost certain it's familiar
with the C language, or seeks to be).

Even though I've spilled a lot of electrons discussing typefaces with
you, in the long run I think it is an argument that cannot bear much
fruit.  The mild disagreements we have arise from the unnecessary
conflation of presentation with semantics.  The way out of this morass
is straightforward but revolutionary: add semantic macros to man(7).
With enough man pages revised to use them, distributors and individual
users can set strings (in /etc/groff/man.local or a $HOME counterpart)
to declare their style preferences, and get formatted output they can be
happy with.  This would extend a technique groff man(7) already employs
with its "HF" string for (sub)section headings and "MF" for man page
cross references.  These strings work because the new 'MR' is a semantic
macro, and 'TH', 'SH', and 'SS' have been since 1979.

I guess I should mention...I didn't announce it here, but groff 1.23.0
will have a new "MR" macro for man page cross references[2] and its
grotty(1) driver will emit OSC 8 hyperlink escape sequences to
supporting terminal devices for clickable URLs and man page cross
references declared with "MR".[3]

I reckon groff is due for another release candidate...

Regards,
Branden

[1] laboriously documented in
    <https://lists.gnu.org/archive/html/groff/2021-08/msg00023.html>
[2] https://lists.gnu.org/archive/html/groff/2021-10/msg00012.html
[3] https://lists.gnu.org/archive/html/groff/2021-10/msg00000.html


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support)
  2021-11-22  9:06       ` Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support) G. Branden Robinson
@ 2021-11-22 11:50         ` Alejandro Colomar (man-pages)
  2021-11-22 13:52           ` G. Branden Robinson
  0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-22 11:50 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Michael Kerrisk, linux-man

Hi, Branden!

On 11/22/21 10:06, G. Branden Robinson wrote:
>>> Phrasal semantic newlines!  :D  This 180-proof Kernighan whiskey is
>>> a stronger prescription than I would write (mainly because it
>>> requires natural-language-aware grepping), but if your contributors
>>> don't rebel, I think we will all ultimately see the benefits in
>>> diffs.
>>
>> I feel an urge to add it to man-pages(7).  :-}
> 
> I saw that.  I don't object, exactly, but I would be prepared for some
> contributors to simply disregard it.  Some man pages are going to be
> more excellent than others.  man-pages(7) calls out wait(2) and pipe(2)
> as models, for example.

So far, only one rebelled a bit.  I assume that I'll be fixing those few 
pages myself.  If this advise reduces the amount of work that I have to 
do, it's useful.  If some contributors learn to appreciate the benefits 
of thinking in terms of phrases when writing technical stuff, it will 
have been very useful.

> 
> I think in many cases this Kernighanization of man page running text is
> going to fall to you because contributors will feel it unnecessary or
> that they cannot justify to their managers the expenditure of the time
> necessary to write documentation so scrupulously.  I would agree that it
> can improve man page quality--attention to phrase and clause boundaries
> compels the writer to re-read what they write and may reveal to them
> gaps in the discussion or outright errors.

Those that would complain about phrases I think would also complain 
about clauses.  It's the same switch for the brain, just a bit further. 
  If they prefer to maintain the man-pages with looser rules, they can 
do themselves; no-one's paying me for reviewing their patches.  I'll 
keep doing this, as long as it's entertaining to me.  If managers don't 
want to spend worker's time in writing quality pages, I may ask some 
salary for my time reviewing their pages.

> 
> But documentary diligence is not part of the culture of much software
> development these days.  It's not Lean/Agile/XP/MVP.  Documentation is a
> cost center.  It's the opposite of secret sauce.  What will please the
> Street?  Move fast; break stuff; be far away when the building catches
> on fire.

I have had a very bad (and luckily short) experience with 
Lean/Agile/XP/MVP.  If I can help sabotage that, I will happily and 
intentionally do.


>>>> Only for Branden:  I just noticed a difference between
>>>> man-pages(7) and groff_man(7) advise:  groff_man(7) advises to use
>>>> italics for preprocessor constants, while man-pages(7) recommends
>>>> bold:
> [...]
>>> Would we then also bold constants that are C objects with the
>>> "const" type qualifier rather than language literals emplaced by the
>>> preprocessor?
>>
>> Yes!  The difference between "const" variables and macros is just
>> preprocessor, but they are all intended for very similar usage.
> 
> Hmm, yes, but personally I wouldn't mind it if we had a notation that
> distinguished preprocessor symbols from things the compiler proper sees.
> With only 3 faces in man(7), we probably can't get there from here, but
> I have dreams of a better world.  A Texinfo partisan would likely point
> out that the distinction is already made clear in the GNU C Library
> manual, for example.

Are we talking about libc, or C documentation in general?  Because libc 
doesn't have any 'const' variables at all, at least that I know of.

So we don't even need to care in the Linux man-pages.  Maybe manual 
pages for other C libraries can better decide what to do with those.



>> We have a mapping in our brains that says
>> "literal -> bold, variable -> italics".
> 
> I don't think I will ever be able to suppress my need to point out that
> this is a _loose_ generalization.  Italics are also used for emphasis
> and work titles in standard English.
> 
>> If we extend that mapping,
>> macros are replacements for literals,
>> so we would use bold for them too.
>> And "const"s are also mostly intended for the same use as macros,
>> so bold goes for them too.
> 
> But constness can be cast away; and some C library functions that
> _should_ take const-qualified parameters or return const-qualified
> objects, don't (and worse, got standardized without such qualifiers
> either out of excessive deference to poor practice or for fear of
> upsetting too many implementors).  So you might mislead the reader who
> assumes that a C object in bold type is always const-qualified, leading
> them to perform superfluous casts, which make the code noisier to read.
> Or you might mislead the reader who assumes that a C object in bold type
> is always "semantically" to be treated as "const" (i.e., don't assign to
> objects of this name even if you can without provoking a compiler
> complaint) might make the opposite false and overgeneralized inference,
> and fail to apply "const" where it would be useful.

I think we're talking about 2 different things:

- 'const' variables
- pointers to 'const'

'const' variables can never be cast away.  It's an error.

$ cat const.c
void foo(void)
{
	const int a = 1;

	a = 2;
}

That will never compile, and you can't cast 'a' enough so that you can 
make it work.  If you modify it through a pointer to 'const', then you 
can, but it will be Undefined Behavior, which brings us to the other 
thing: pointers to 'const'.

$ cat pointer_to_const.c
void bar(const int *p)
{
	int *q;

	q = (int *)p;
	*q = 3;
}

This is allowed by the compiler, but it is Undefined Behavior _unless_ 
the variable pointed to by 'p' is really not const.  Casting away const 
is IMO also braindamaged, so I don't consider it a valid thing.

One of the things that I like from C++ is that they correctly 
implemented const.  Hey, "asd" is 'char *' in C, but of course if you 
modify it, demons fly out of your nose!


Going back to formatting:

Pointers to const are just variables.  Their value is the address of 
some const, but that's not important.  The important thing is that they 
are variables, and therefore, you use italics with them.

So the only thing that IMHO should be bold (apart from constants and 
macros that expand to constants) should be global 'const' variables:

const int FOO = 3;

which some people prefer over macros, and which in C++, one could write as:

constexpr int FOO = 3;

In the case above, I don't see a reason why one would want to 
differentiate that from say

#define FOO  3


But in function parameters, 'const' is useless in variables, since they 
already got a copy, so they won't alter the original.  And in pointers, 
const applies to the pointee, but the pointer itself is variable, so 
italics.


>> Did I convince you? :)
> 
> No, but not for lack of trying.  :)  I don't think there is much
> distance between us, practically speaking.

No there isn't. :)

> I guess I should mention...I didn't announce it here, but groff 1.23.0
> will have a new "MR" macro for man page cross references[2] and its
> grotty(1) driver will emit OSC 8 hyperlink escape sequences to
> supporting terminal devices for clickable URLs and man page cross
> references declared with "MR".[3]

OMG, those HTML people better be scared of competition :D

> 
> I reckon groff is due for another release candidate...

Yes, please :-)


Cheers,
Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support)
  2021-11-22 11:50         ` Alejandro Colomar (man-pages)
@ 2021-11-22 13:52           ` G. Branden Robinson
  2021-11-22 16:00             ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 15+ messages in thread
From: G. Branden Robinson @ 2021-11-22 13:52 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: Michael Kerrisk, linux-man

[-- Attachment #1: Type: text/plain, Size: 5536 bytes --]

Hi Alex,

At 2021-11-22T12:50:55+0100, Alejandro Colomar (man-pages) wrote:
[much snipped]
> I have had a very bad (and luckily short) experience with
> Lean/Agile/XP/MVP.

Well, they're all different things.  Supposedly.  Perhaps not.  :-|

> If I can help sabotage that, I will happily and intentionally do.

Hustlers will always be flogging "revolutionary" innovations to
managers, and managers will always suck them up.  It's not even that
there are never any worthwhile ideas in these manifestos, it's just that
you can fit that subset on an index card and the margins in publishing
index cards isn't high enough to attract anyone's interest.

> Are we talking about libc, or C documentation in general?  Because
> libc doesn't have any 'const' variables at all, at least that I know
> of.

I was speaking in general, but the Austin Group is kicking around a
const struct member for 'tm' right now.[1]

> So we don't even need to care in the Linux man-pages.  Maybe manual
> pages for other C libraries can better decide what to do with those.

I have this sick idea that everything that can be const, should be.
It's better for both parallelism and, generally, robustness.

> I think we're talking about 2 different things:
> 
> - 'const' variables
> - pointers to 'const'
> 
> 'const' variables can never be cast away.  It's an error.

Agreed.

> $ cat pointer_to_const.c
> void bar(const int *p)
> {
> 	int *q;
> 
> 	q = (int *)p;
> 	*q = 3;
> }
> 
> This is allowed by the compiler, but it is Undefined Behavior _unless_
> the variable pointed to by 'p' is really not const.  Casting away
> const is IMO also braindamaged, so I don't consider it a valid thing.

I remember having to do this on rare occasions, but don't recollect the
details.  I'm uncertain whether I was working around a bad design or
just being a bad programmer.

There is of course the constant pointer to a constant object.

const int * const foo;

...which, because people insist that type qualifiers must come before
type names so that declarations read slightly more like English, leads
them to ugly constructions like

const int *const foo;

to remind themselves which way the operator binds...when they _could_
just write things so that a simpler "noun-adjective" rule leads to a
correct parse in meatspace.

int const * const foo;

Klemens flogs this in _21st Century C_ (O'Reilly).

> One of the things that I like from C++ is that they correctly
> implemented const.  Hey, "asd" is 'char *' in C, but of course if you
> modify it, demons fly out of your nose!

With -fwritable-strings, they stay inside and poke your sinuses with
pitchforks...surely an improvement(?).

> Going back to formatting:
> 
> Pointers to const are just variables.  Their value is the address of
> some const, but that's not important.  The important thing is that
> they are variables, and therefore, you use italics with them.

Okay, I'm with you so far...

> So the only thing that IMHO should be bold (apart from constants and
> macros that expand to constants) should be global 'const' variables:
> 
> const int FOO = 3;
> 
> which some people prefer over macros, and which in C++, one could
> write as:
> 
> constexpr int FOO = 3;
> 
> In the case above, I don't see a reason why one would want to
> differentiate that from say
> 
> #define FOO  3

It's a good argument.  The only ones that I can marshal against it are
that in the first case, 'FOO' is a C object and an lvalue (which is
almost saying the same thing).  I don't know C++ well enough to address
the second example, but I'm learning C++98 so that I can deal better
with the groff code base, which is written "Annotated Reference Manual
C++", a dialect that is as old as groff itself (1990).

> But in function parameters, 'const' is useless in variables, since
> they already got a copy, so they won't alter the original.  And in
> pointers, const applies to the pointee, but the pointer itself is
> variable, so italics.

I think of 'const' in function definitions as asserting invariants.
Consider the following example, where I use two consts that it sounds
like you would not, to prevent 2 different forms of stupidity.

    void func(int const foo, int const * const bar) {
    //void func(int const foo, int const * bar) {
    //void func(int foo, int * bar) {
        foo = 3;  // prevented by 'int const foo'
        *bar = 4; // prevented by 'int const * bar'
        bar++;    // prevented by 'int const * const bar'
    }

    int main(int argc, char *argv[]) {
        int foo = 1;
        int bar = 2;
        func(foo, &bar);
        (void) printf("foo=%d, bar=%d\n", foo, bar);
    }

When you're writing a parser (groff has many of them), you pass 'const
char *' around all the time.  Often these pointers are ones you got back
from strtok() or similar.  You can pass them to a function to do some
kind of validity checking--but do you want that pointer advanced through
the string by the helper function, or not?  If not, then you can make
the pointer const, and rely on the helper function to make a copy of it
if necessary.  If you do want it to advance the pointer, for instance if
it's a function that skips comments and/or white space, then you can
pass a non-const pointer, and it will give you back a pointer that is
aimed at the next valid input token.

Regards,
Branden

[1] https://austingroupbugs.net/view.php?id=1533#c5532

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support)
  2021-11-22 13:52           ` G. Branden Robinson
@ 2021-11-22 16:00             ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-22 16:00 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: linux-man

[I removed Michael; not to spam him :]

On 11/22/21 14:52, G. Branden Robinson wrote:
> Hi Alex,
> 
> At 2021-11-22T12:50:55+0100, Alejandro Colomar (man-pages) wrote:
> [much snipped]
>> I have had a very bad (and luckily short) experience with
>> Lean/Agile/XP/MVP.
> 
> Well, they're all different things.  Supposedly.  Perhaps not.  :-|

My experience was with Agile/Scrum.  I've read about the others, and
they all seem variants of a similar thing.  I can't talk about the
others so much, since I haven't experienced them, but prefer to live
ignorant.  For agile, I can say that, for how my brain works internally,
it just destroys my performance, and my happiness.  Most job offers I
see today, either say Agile, or say nothing at all; I rarely see any of
the others being mentioned in job descriptions.  If I see agile, I will
very likely discard the offer.

> 
>> If I can help sabotage that, I will happily and intentionally do.
> 
> Hustlers will always be flogging "revolutionary" innovations to
> managers, and managers will always suck them up.  It's not even that
> there are never any worthwhile ideas in these manifestos, it's just that
> you can fit that subset on an index card and the margins in publishing
> index cards isn't high enough to attract anyone's interest.

My opinion is (of course I'm not a manager, and wouldn't like to be):
let the programmers program however they are more comfortable.  Each
brain works in a different manner.  Forcing one way will, at best,
reduce performance.  If anything, a manager should recommend reading
about a methodology (maybe after reading about it, the programmer likes
it), not impose it.  I've been in interviews where I've been asked a lot
about if at my current job they used Agile and how much they implemented
of it, and almost no questions about my programming skills...

> 
>> Are we talking about libc, or C documentation in general?  Because
>> libc doesn't have any 'const' variables at all, at least that I know
>> of.
> 
> I was speaking in general, but the Austin Group is kicking around a
> const struct member for 'tm' right now.[1]
> 
>> So we don't even need to care in the Linux man-pages.  Maybe manual
>> pages for other C libraries can better decide what to do with those.
> 
> I have this sick idea that everything that can be const, should be.
> It's better for both parallelism and, generally, robustness.
> 
>> I think we're talking about 2 different things:
>>
>> - 'const' variables
>> - pointers to 'const'
>>
>> 'const' variables can never be cast away.  It's an error.
> 
> Agreed.
> 
>> $ cat pointer_to_const.c
>> void bar(const int *p)
>> {
>> 	int *q;
>>
>> 	q = (int *)p;
>> 	*q = 3;
>> }
>>
>> This is allowed by the compiler, but it is Undefined Behavior _unless_
>> the variable pointed to by 'p' is really not const.  Casting away
>> const is IMO also braindamaged, so I don't consider it a valid thing.
> 
> I remember having to do this on rare occasions, but don't recollect the
> details.  I'm uncertain whether I was working around a bad design or
> just being a bad programmer.

I also remember doing this.  There's a valid scenario:

Callbacks where the interface is generic (void *).  In that case, you
need to provide a prototype that matches the interface, and even if you
don't modify the input, you need to provide a function that accepts a
non-const, to avoid having to cast the function pointer, which would be
even more dangerous.

Then, when you pass const data through the (void *), since you know it
won't be modified (you wrote the callback), you're safe casting it to
(void *).

I think there's no better way around that than casting to (void *).
However, that's rare enough not to be alarmed.

> 
> There is of course the constant pointer to a constant object.
> 
> const int * const foo;
> 
> ...which, because people insist that type qualifiers must come before
> type names so that declarations read slightly more like English, leads
> them to ugly constructions like
> 
> const int *const foo;

I like it this way.  I may be inconsistent, but I'm used to that way :)
I've read an ad-hoc argument that you can read it right-to-left as:
"foo is a const pointer to an int constant."

To try to justify myself, maybe, my brain considers 'const' to be a more
important piece of information than 'int', which is normally already
obvious by the context.  Since 'const' is closer to the left-side blank
it is more easily parsed by my eyes... just trying to make some sense.

> 
> to remind themselves which way the operator binds...when they _could_
> just write things so that a simpler "noun-adjective" rule leads to a
> correct parse in meatspace.
> 
> int const * const foo;

Yes, every objective argument would recommend this form, but
subjectively, it is ugly to my eyes :p

> 
> Klemens flogs this in _21st Century C_ (O'Reilly).
> 
>> One of the things that I like from C++ is that they correctly
>> implemented const.  Hey, "asd" is 'char *' in C, but of course if you
>> modify it, demons fly out of your nose!
> 
> With -fwritable-strings, they stay inside and poke your sinuses with
> pitchforks...surely an improvement(?).

Sometimes I would just like to see backwards compatibility to be
ignored.  There are things that are just wrong.  If C30 decided to make
const-correctness as it should have been from the beginning, and made
string literals to be (const char *), old programs that would want to
recompile with C30 would need to be fixed, but how much of a problem
would that be?  There's always the option to specify -std=c89 if you
want to avoid fixing some program.

But I also like how slow the C standard advances.  That way they avoid
adding much of the insane stuff that C++ adds.  Even the C standard adds
things that I very much doubt they were ever needed (static for array
parameters?).  So in the end, having so few changes is a good thing.

> 
>> Going back to formatting:
>>
>> Pointers to const are just variables.  Their value is the address of
>> some const, but that's not important.  The important thing is that
>> they are variables, and therefore, you use italics with them.
> 
> Okay, I'm with you so far...
> 
>> So the only thing that IMHO should be bold (apart from constants and
>> macros that expand to constants) should be global 'const' variables:
>>
>> const int FOO = 3;
>>
>> which some people prefer over macros, and which in C++, one could
>> write as:
>>
>> constexpr int FOO = 3;
>>
>> In the case above, I don't see a reason why one would want to
>> differentiate that from say
>>
>> #define FOO  3
> 
> It's a good argument.  The only ones that I can marshal against it are
> that in the first case, 'FOO' is a C object and an lvalue (which is
> almost saying the same thing).  I don't know C++ well enough to address
> the second example, but I'm learning C++98 so that I can deal better
> with the groff code base, which is written "Annotated Reference Manual
> C++", a dialect that is as old as groff itself (1990).

The second one guarantees that FOO evaluates to a constant at
compile-time.  Therefore, you can for example use it to initialize
non-VLA array sizes.

I use C++ as "C with extensions" (not even "C with classes").  I like
some of it's extensions so much that I use them in C:

#if !defined(__cplusplus)
#define auto  __auto_type
#endif

If it confuses someone, I'm sorry :)

> 
>> But in function parameters, 'const' is useless in variables, since
>> they already got a copy, so they won't alter the original.  And in
>> pointers, const applies to the pointee, but the pointer itself is
>> variable, so italics.
> 
> I think of 'const' in function definitions as asserting invariants.
> Consider the following example, where I use two consts that it sounds
> like you would not, to prevent 2 different forms of stupidity.

In the function definiton, it may be useful (but not so much).
In the function prototype, it is useless.

> 
>     void func(int const foo, int const * const bar) {
>     //void func(int const foo, int const * bar) {
>     //void func(int foo, int * bar) {
>         foo = 3;  // prevented by 'int const foo'
>         *bar = 4; // prevented by 'int const * bar'
>         bar++;    // prevented by 'int const * const bar'
>     }

I tend to write functions so short (usually fit a screen; < 24 LOC) that
you can see if you're doing something really stupid.  In something like
glibc's source code, where you see functions with more than 200 LOC,
then const may make sense.

I also very consistently differentiate array syntax from pointer syntax,
so I rarely use ++ to advance pointers.

I'd write:

     void func(int foo, const int *bar, int baz[foo])
     {
	 int i = 0;

         foo = 3;  // I'm editing my own copy. There's not much danger
                   // If I'm really stupid in the implementation,
                   // that has nothing to do with the interface.
         *bar = 4; // prevented by 'const int *bar'
         bar++;    // Having a clear distinction in the prototype
                   // makes this mistake very rare.
         baz[i++] = 5;
     }

In glibc code, for example, they don't use const variables in the
prototypes, but they use them in the definitions.  I'd rather write
shorter functions, and a more readable coding style would also help.

I like to limit prototypes to info that is relevant to the interface,
and I also like the definition to be identical to the prototype.

I limit my usage of const to interfaces, and to achieve
const-correctness.  Avoiding silly mistakes in the implementation of an
algorithm creates too much noise to my taste.

> 
>     int main(int argc, char *argv[]) {
>         int foo = 1;
>         int bar = 2;
>         func(foo, &bar);
>         (void) printf("foo=%d, bar=%d\n", foo, bar);
>     }
> 
> When you're writing a parser (groff has many of them), you pass 'const
> char *' around all the time.  Often these pointers are ones you got back
> from strtok() or similar.  You can pass them to a function to do some
> kind of validity checking--but do you want that pointer advanced through
> the string by the helper function, or not?  If not, then you can make
> the pointer const, and rely on the helper function to make a copy of it
> if necessary.  If you do want it to advance the pointer, for instance if
> it's a function that skips comments and/or white space, then you can
> pass a non-const pointer, and it will give you back a pointer that is
> aimed at the next valid input token.

By returning the pointer?  Or by a pointer-to-pointer?  Otherwise, the
edit is only visible to the copy of the pointer.

Regards,
Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mctp.7: Add man page for Linux MCTP support
  2021-11-18  5:08       ` Jeremy Kerr
@ 2021-11-22 16:35         ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-11-22 16:35 UTC (permalink / raw)
  To: Jeremy Kerr; +Cc: Michael Kerrisk, G. Branden Robinson, linux-man

Hi Jeremy,

On 11/18/21 06:08, Jeremy Kerr wrote:
> Hi Alex,
> 
>> I didn't even know there was such a convention, but if there is, yes,
>> I explicitly want to override it.
> 
> OK, I'll update to suit.
> 
>> man-pages(7) recommends breaking long lines at clause boundaries
>> (commas, semicolons, and so on), but somethimes clauses (if you don't
>> know the difference between phrases and clauses, which you don't need
>> to, basically clauses are made up of phrases) are too long, and you
>> can have a single clause that uses more than a single line.  In those
>> cases, the most sensible place to break the line is at the next level:
>> phrase boundaries.
> 
> OK, understood, thanks for that.
> 
>> What I mean is, if in the future this structure will have additional
>> trailing fields, documenting this padding is unnecessary, since that
>> may
>> vary.  Code should not rely on this structure having _only_ that
>> padding.  And if code handles any arbitrary extra stuff in this
>> structure, it will implicitly also handle that __smctp_pad1 field, so
>> there's no need to mention it.
>>
>> Example:
>>
>> struct sockaddr_mctp {
>>       unsigned short     smctp_family;  /* = AF_MCTP */
>>       uint16_t           __smctp_pad0;  /* pad, must be zero */
>>       int                smctp_network; /* local network identifier */
>>       struct mctp_addr   smctp_addr;    /* EID */
>>       uint8_t            smctp_type;    /* message type byte */
>>       uint8_t            smctp_tag;     /* tag value, including TO flag */
>>       uint8_t            foo;           /* was __smctp_pad1 */
>>       uint8_t            bar;           /* extra stuff */
>> };
>>
>> Here I got rid of the pad, and even added an extra field.
> 
> Right, but you've also broken the ABI if that previous padding byte
> wasn't explicitly present, and required to be zero.

Ahh, that's why I require in such cases a memset(p, 0, sizeof(*p));
That covers any undocumented padding; present or future.

> In that future ABI
> implementation, the kernel can't distinguish between 'foo' being properly
> initialised, and not just random stack garbage from an old-ABI caller.
> 
> That's why we have the _pad1 field, and why we require it to be zero.
> Since that's enforced by the kernel, I'd rather have it documented,
> rather than users seeing their calls fail for "invisible" reasons, when
> a call's _pad1 happens to contain a non-zero byte due to not being
> initialised.
> 
>> Code should be written to be compatible with this case, right?
> 
> I'm not 100% clear on you mean by compatible there - you want to prevent
> the case where __smctp_pad1 is removed from the header, and that code is
> now referencing an invalid struct member?
> 
> That's somewhat unavoidable, and also applies to _pad0; I'm not sure why
> _pad1 needs to be different.

I want that programmers zero the structure not by p->_pad0 = 0,
but by bzero(3) or memset(3).  That's how it covers any ammount of padding.

Nevertheless, I don't have a strong feeling about this, and if you 
prefer to document the padding, I'm fine with that.

Cheers!
Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-11-22 16:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-11  1:53 [PATCH v2] mctp.7: Add man page for Linux MCTP support Jeremy Kerr
2021-11-11 21:38 ` Alejandro Colomar (man-pages)
2021-11-12  1:12   ` Jeremy Kerr
2021-11-12 18:45     ` Alejandro Colomar (man-pages)
2021-11-12 19:40       ` G. Branden Robinson
2021-11-12 20:07         ` Alejandro Colomar (man-pages)
2021-11-12 20:07           ` Alejandro Colomar (man-pages)
2021-11-18  5:08       ` Jeremy Kerr
2021-11-22 16:35         ` Alejandro Colomar (man-pages)
2021-11-12  9:35   ` G. Branden Robinson
2021-11-12 19:50     ` Alejandro Colomar (man-pages)
2021-11-22  9:06       ` Bold and italics, semantics and constness (was: [PATCH v2] mctp.7: Add man page for Linux MCTP support) G. Branden Robinson
2021-11-22 11:50         ` Alejandro Colomar (man-pages)
2021-11-22 13:52           ` G. Branden Robinson
2021-11-22 16:00             ` Alejandro Colomar (man-pages)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.