* string_copy(7): New manual page documenting string copying functions.
@ 2022-12-11 23:59 Alejandro Colomar
  2022-12-12  0:17 ` Alejandro Colomar
                   ` (5 more replies)
  0 siblings, 6 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-11 23:59 UTC
  To: linux-man



Hi all!

I'm planning to add a new manual page that documents all string copying 
functions.  It covers more detail than any of the existing manual pages (and in 
fact, I've discovered some properties of the functions documented while working 
on this page).  The intention is to remove the existing separate manual pages 
for all string copying functions, and make them links to this new page.  It 
intends to be the only reference documentation for copying strings in C, and 
hopefully fix the half century of suboptimal string copying library with which 
we've lived.  (Say goodbye to std::string, here come back C strings ;)

The formatted manual page is below.

Alex

P.S.: I'm sorry for your beloved string copying function(s); there's a good 
chance that the page below deprecates them.  Not sorry.  Oh well, at least I 
justified it, or I tried :-)

---

string_copy(7)         Miscellaneous Information Manual         string_copy(7)

NAME
        stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
        stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings

SYNOPSIS
    (Null‐terminated) strings
        // Chain‐copy a string.
        char *stpcpy(char *restrict dst, const char *restrict src);

        // Chain‐copy a string with truncation (not in libc).
        char *stpecpy(char *dst, char past_end[0], const char *restrict src);

        // Chain‐copy a string with truncation and SIGSEGV on invalid input.
        char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

        // Copy a string with truncation and SIGSEGV on invalid input.
        [[deprecated]]  // Use stpecpyx() instead.
        size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Concatenate a string with truncation.
        [[deprecated]]  // Use stpecpyx() instead.
        size_t strlcat(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Copy a string with truncation (not in libc).
        [[deprecated]]  // Use stpecpy() instead.
        ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                       size_t sz);

        // Copy a string.
        [[deprecated]]  // Use stpcpy(3) instead.
        char *strcpy(char *restrict dst, const char *restrict src);

        // Concatenate a string.
        [[deprecated]]  // Use stpcpy(3) instead.
        char *strcat(char *restrict dst, const char *restrict src);

    Unterminated strings (null‐padded fixed‐width buffers)
        // Zero a fixed‐width buffer, and
        // copy a string with truncation into an unterminated string.
        char *stpncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Chain‐copy an unterminated string into a string (not in libc).
        char *ustr2stp(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

        // Zero a fixed‐width buffer, and
        // copy a string with truncation into an unterminated string.
        [[deprecated]]  // Use stpncpy(3) instead.
        char *strncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Concatenate an unterminated string into a string.
        [[deprecated]]  // Use ustr2stp() instead.
        char *strncat(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

    String structures
        // (Null‐terminated) string structure.
        struct str_s {
            size_t  len;
            char    *str;
        };

        // Unterminated string structure (overlapping strings).
        struct ustr_s {
            size_t  len;
            char    *ustr;
        };

        // Chain‐copy a string structure into an unterminated string.
        void *mempcpy(void *restrict dst, const void src[restrict .len],
                       size_t len);

DESCRIPTION
    Terms (and abbreviations)
        string (str)
               is a sequence of zero or more non‐null characters, followed by a
               null byte.

        unterminated string (ustr)
               is a sequence of zero or more  non‐null  characters.   They  are
               sometimes  contained  in fixed‐width buffers, which usually con‐
               tain padding null bytes after the unterminated string,  to  fill
               the  rest  of  the  buffer  without  affecting  the unterminated
               string; however, those padding null bytes are not  part  of  the
               unterminated string.

        length (len)
               is the number of non‐null characters in a string.  It is the re‐
               turn value of strlen(str) and of strnlen(ustr, sz).

        size (sz)
               refers to the entire buffer where the string is contained.

        end    is  the  name  of  a  pointer  to the terminating null byte of a
               string, or a pointer to one past the last character of an unter‐
               minated string.  This is the return value of functions that  al‐
               low chaining.  It is equivalent to &str[len].

        past_end
               is  the name of a pointer to one past the end of the buffer that
               contains a string.  It is equivalent to &str[sz].  It is used as
               a sentinel value, to be able  to  truncate  strings  instead  of
               overrunning a buffer.

        string structure
        unterminated string structure
               Structure  that  contains the length of a string, as well as the
               string or the unterminated string.
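
               As a rough illustration of how these terms relate, the sketch
               below computes end for a string and len for an unterminated
               string.  The helper names str_end() and ustr_len() are made up
               for this example and are not part of any library; memchr(3) is
               used in place of strnlen(3) only to stay within ISO C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 'end' of a string, that is, &str[len]. */
static inline const char *
str_end(const char *str)
{
    return str + strlen(str);
}

/* Hypothetical helper: 'len' of an unterminated string held in a
   null-padded fixed-width buffer of size sz.  If the buffer contains
   no null byte, the unterminated string fills the whole buffer. */
static inline size_t
ustr_len(const char *ustr, size_t sz)
{
    const char *nul = memchr(ustr, '\0', sz);

    return (nul != NULL) ? (size_t) (nul - ustr) : sz;
}
```

               For a buffer declared as char buf[16] that holds "Hello", len is
               5, sz is 16, end is &buf[5], and past_end is buf + 16.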

    Types of functions
        Copy, concatenate, and chain‐copy
               Originally, there was a distinction between functions that  copy
               and  those that concatenate.  However, newer functions that copy
               while allowing chaining cover both use cases with a single  API.
               They  are  also algorithmically faster, since they don’t need to
               search for the end of the existing string.

               To chain copy functions, they need to return a  pointer  to  the
               end.   That’s  a  byproduct  of the copy operation, so it has no
               performance costs.  These functions are preferred over  copy  or
               concatenation  functions.  Functions that return such a pointer,
               and thus can be chained, have names of the form  *stp*(),  since
               it’s also common to name the pointer just p.
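
               The idea can be sketched as follows.  join_words() is a
               hypothetical helper, not a library function; like stpcpy(3)
               itself, it assumes that the caller has made dst large enough.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenate n words into dst by chaining
   stpcpy(3) calls.  Returns a pointer to the terminating null byte. */
static char *
join_words(char *dst, const char *const words[], size_t n)
{
    char *p = dst;

    *p = '\0';  /* n == 0 yields an empty string */
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, words[i]);  /* continues at the end; no rescan of dst */

    return p;
}
```

               A loop of strcat(3) calls would produce the same string, but
               each call would first walk the whole destination to find its
               end.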

        Truncate or not?
               The  first  thing  to note is that programmers should be careful
               with buffers, so they always have the correct size, and  trunca‐
               tion is not necessary.

               In  most  cases, truncation is not desired, and it is simpler to
               just do the copy.  Simpler  code  is  safer  code.   Programming
               against  programming mistakes by adding more code just adds more
               points where mistakes can be made.

               Nowadays, compilers can detect most programmer errors with  fea‐
               tures    like   compiler   warnings,   static   analyzers,   and
               _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
               these error‐detection features be more precise.

               When validating user input, however, it makes sense to truncate.
               Remember to check the return value of such function calls.

               Functions that truncate:

               •  stpecpy()  is  the  most  efficient string copy function that
                  performs truncation.  It requires checking for truncation only
                  once, after all chained calls.

               •  stpecpyx() is a variant of stpecpy() that consumes the entire
                  source string, to catch bugs in the program by forcing a seg‐
                  mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).

               •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
                  are designed to crash if the input string is invalid (doesn’t
                  contain a null byte).

               •  strscpy(9) is a function in the Linux kernel which reports an
                  error instead of crashing.

               •  stpncpy(3) and strncpy(3) also truncate, but they write
                  unterminated strings rather than strings.

    Unterminated strings (null‐padded fixed‐width buffers)
        For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
        minated strings in fixed‐width buffers.  To interface with  them,  spe‐
        cialized functions need to be used.

        To copy strings into them, use stpncpy(3).

        To  copy from an unterminated string within a fixed‐width buffer into a
        string, ignoring any trailing null  bytes  in  the  source  fixed‐width
        buffer, you should use ustr2stp().
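
        A minimal sketch of both directions follows.  struct record and its
        two helpers are hypothetical and only loosely modeled on the fixed‐width
        members described in utmpx(5); record_get_user() stands in for
        ustr2stp() and is written with memchr(3) and memcpy(3) so that the
        example is self‐contained.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpncpy(3) */
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct record {
    char user[8];
};

/* Store a string into the field as an unterminated string.
   Returns -1 if the field ends up completely full, which is how
   truncation shows up (an exact fit cannot be told apart from it). */
static int
record_set_user(struct record *r, const char *src)
{
    char *end = stpncpy(r->user, src, sizeof(r->user));

    return (end == r->user + sizeof(r->user)) ? -1 : 0;
}

/* Copy the field into a (null-terminated) string, ignoring padding.
   dst needs room for sizeof(r->user) + 1 bytes.  Returns the end. */
static char *
record_get_user(char *dst, const struct record *r)
{
    const char *nul = memchr(r->user, '\0', sizeof(r->user));
    size_t len = (nul != NULL) ? (size_t) (nul - r->user) : sizeof(r->user);

    memcpy(dst, r->user, len);
    dst[len] = '\0';

    return dst + len;
}
```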

    String structures
        The simplest string copying function is mempcpy(3).  It requires always
        knowing  the length of your strings, for which string structures can be
        used.  It makes the code simpler, since you always know the  length  of
        your strings, and it’s also faster, since it doesn’t need to repeatedly
        calculate  those  lengths.   mempcpy(3)  always creates an unterminated
        string, so you need to explicitly set the terminating null byte.

        String structure
               The following code can be  used  to  chain‐copy  from  a  string
               structure into a string:

                   p = mempcpy(p, src->str, src->len);
                   *p = '\0';

               The  following  code  can  be  used  to chain‐copy from a string
               structure into an unterminated string:

                   p = mempcpy(p, src->str, src->len);

        Unterminated string structure (overlapping strings)
               In programs that make considerable use of strings, and need  the
               best  performance, using overlapping strings can make a big dif‐
               ference.  It allows holding substrings of a bigger string  while
               not duplicating memory nor using time to do a copy.

               However,  this is delicate, since it requires using unterminated
               strings.  C library APIs use strings, so programs that  use  un‐
               terminated  strings  will  have  to  take  care to differentiate
               strings from unterminated strings.

               The following code can be used to chain‐copy  from  an  untermi‐
               nated string structure to a string:

                   p = mempcpy(p, src->ustr, src->len);
                   *p = '\0';

               The  following  code  can be used to chain‐copy from an untermi‐
               nated string structure to an unterminated string:

                   p = mempcpy(p, src->ustr, src->len);
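
               Put together, concatenating two string structures into a string
               could look like the sketch below.  str_s_concat() is a
               hypothetical helper, and my_mempcpy() is a stand‐in built on
               memcpy(3), used here only because mempcpy(3) is a GNU extension
               that is not available everywhere.

```c
#include <stddef.h>
#include <string.h>

struct str_s {
    size_t  len;
    char    *str;
};

/* Stand-in for mempcpy(3): copy len bytes, return one past the last. */
static void *
my_mempcpy(void *restrict dst, const void *restrict src, size_t len)
{
    return (char *) memcpy(dst, src, len) + len;
}

/* Hypothetical helper: write a followed by b into dst as a string.
   dst needs room for a->len + b->len + 1 bytes.  Returns the length. */
static size_t
str_s_concat(char *dst, const struct str_s *a, const struct str_s *b)
{
    char *p = dst;

    p = my_mempcpy(p, a->str, a->len);
    p = my_mempcpy(p, b->str, b->len);
    *p = '\0';  /* the copies never write the terminator themselves */

    return (size_t) (p - dst);
}
```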

    Functions
        stpcpy(3)
               This function copies the input string into a destination string.
               The programmer is responsible  for  allocating  a  buffer  large
               enough.  It returns a pointer suitable for chaining.

        stpecpy()
        stpecpyx()
               These functions copy the input string into a destination string.
               If  the destination buffer, limited by a pointer to one past the
               end of it, isn’t large enough to hold the  copy,  the  resulting
               string  is  truncated  (but  it  is guaranteed to be null‐termi‐
               nated).  They return a pointer suitable for  chaining.   Trunca‐
               tion needs to be detected only once after the last chained call.
               stpecpyx()  has identical semantics to stpecpy(), except that it
               forces a SIGSEGV on Undefined Behavior.

               These functions are not provided by any library, but you can de‐
               fine them with the following reference implementations:

                   /* This code is in the public domain. */
                   char *
                   stpecpy(char *dst, char past_end[0],
                           const char *restrict src)
                   {
                       char *p;

                       if (dst == past_end)
                           return past_end;

                       p = memccpy(dst, src, '\0', past_end - dst);
                       if (p != NULL)
                           return p - 1;

                       /* truncation detected */
                       past_end[-1] = '\0';
                       return past_end;
                   }

                   /* This code is in the public domain. */
                   char *
                   stpecpyx(char *dst, char past_end[0],
                            const char *restrict src)
                   {
                       if (src[strlen(src)] != '\0')
                           raise(SIGSEGV);

                       return stpecpy(dst, past_end, src);
                   }
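
               One possible way to wrap a chain of such calls is sketched
               below.  my_stpecpy() restates the reference stpecpy() above
               under a different name, with an ordinary pointer parameter
               instead of a zero‐length array; greet() is a hypothetical helper
               made up for this example.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <stddef.h>
#include <string.h>

/* Same logic as the reference stpecpy() on this page. */
static char *
my_stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char *p;

    if (dst == past_end)
        return past_end;  /* an earlier call already truncated */

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;     /* pointer to the null byte just written */

    past_end[-1] = '\0';  /* truncation: terminate and report it */
    return past_end;
}

/* Hypothetical helper: build "Hello <name>!" in buf of size sz.
   Returns the length, or -1 if the result was truncated. */
static ptrdiff_t
greet(char *buf, size_t sz, const char *name)
{
    char *past_end = buf + sz;
    char *p = buf;

    p = my_stpecpy(p, past_end, "Hello ");
    p = my_stpecpy(p, past_end, name);
    p = my_stpecpy(p, past_end, "!");
    if (p == past_end)    /* checked once, after the whole chain */
        return -1;

    return p - buf;
}
```

               Even when greet() reports truncation, buf still holds a
               null‐terminated prefix of the intended result.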

        stpncpy(3)
               This function copies the input string into a  destination  null‐
               padded  fixed‐width  unterminated  string.   If  the destination
               buffer, limited by its size, isn’t  large  enough  to  hold  the
               copy,  the  resulting  string is truncated.  Since it creates an
               unterminated string, it doesn’t need to write a terminating null
               byte.  It returns a pointer suitable for chaining, but it’s  not
               ideal for that.  Truncation needs to be detected only once after
               the last chained call.

               If  you’re going to use this function in chained calls, it would
               probably be useful to develop a function similar to stpecpy().

        ustr2stp()
               This function copies the input unterminated string contained  in
               a  null‐padded fixed‐width buffer, into a destination (null‐ter‐
               minated) string.  The programmer is responsible for allocating a
               buffer large enough.  It returns a pointer suitable  for  chain‐
               ing.

               A truncating version of this function doesn’t exist,  since  the
               size  of  the original string is always known, so it wouldn’t be
               very useful.

               This function is not provided by any library, but you can define
               it with the following reference implementation:

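                   /* This code is in the public domain. */
                   char *
                   ustr2stp(char *restrict dst, const char *restrict src,
                            size_t sz)
                   {
                       char  *end;

                       /* memccpy(3) returns one past the copied null byte,
                          or NULL if src has no null byte in sz bytes. */
                       end = memccpy(dst, src, '\0', sz);
                       if (end != NULL)
                           return end - 1;

                       end = dst + sz;
                       *end = '\0';

                       return end;
                   }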

        mempcpy(3)
               This function copies the input string, limited  by  its  length,
               into  a  destination unterminated string.  The programmer is re‐
               sponsible for allocating a buffer large enough.   It  returns  a
               pointer suitable for chaining.

    Deprecated functions
        strlcpy(3bsd)
        strlcat(3bsd)
               Deprecated.  These functions copy the input string into a desti‐
               nation  string.  If the destination buffer, limited by its size,
               isn’t large enough to hold the copy,  the  resulting  string  is
               truncated  (but  it  is guaranteed to be null‐terminated).  They
               return the length of the total  string  they  tried  to  create.
               These functions force a SIGSEGV on Undefined Behavior.

               stpecpyx()  is  a better replacement for these functions for the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  Only requires detecting truncation once per chain of calls.

        strscpy(9)
               Deprecated.  This function copies the input string into a desti‐
               nation string.  If the destination buffer, limited by its  size,
               isn’t  large  enough  to  hold the copy, the resulting string is
               truncated (but it is guaranteed to be null‐terminated).  It  re‐
               turns the length of the destination string, or -E2BIG on trunca‐
               tion.

               stpecpy()  is  a  better replacement for this function, since it
               has a much simpler interface.

        strcpy(3)
        strcat(3)
               Deprecated.  These functions copy the input string into a desti‐
               nation string.  The programmer is responsible for  allocating  a
               buffer large enough.  The return value is useless.

               strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
               turn value.

               stpcpy(3) is a better replacement for these  functions  for  the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  No need to call strlen(3), thanks to the useful return value.

        strncpy(3)
               Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
               the useless return value.  Due to the return  value,  with  this
               function  it’s hard to correctly check for truncation.  Use stp‐
               ncpy(3) instead.

        strncat(3)
               Deprecated.  Do not confuse this function with strncpy(3);  they
               are not related at all.

               This  function  concatenates  the input unterminated string con‐
               tained in a null‐padded fixed‐width buffer, into  a  destination
               (null‐terminated) string.  The programmer is responsible for al‐
               locating a buffer large enough.  The return value is useless.

               ustr2stp()  is  a  better  replacement for this function for the
               following reasons:

               •  Better performance (chain copy instead of concatenating).

               •  No need to call strlen(3), thanks to the useful return value.

               •  Function name that is not actively confusing.

RETURN VALUE
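        The following functions return a pointer to the end of the destination
        (they never truncate).  For stpcpy(3) and ustr2stp(), that is the
        terminating null byte of the destination string; for mempcpy(3), it is
        one past the last byte written.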

        •  stpcpy(3)

        •  ustr2stp()

        •  mempcpy(3)

        The  following  functions return a pointer to the terminating null byte
        in the destination string, except when truncation occurs; if truncation
        occurs, they return a pointer to one past the end  of  the  destination
        buffer.

        •  stpecpy()

        •  stpecpyx()

        The  following function returns a pointer to one after the last charac‐
        ter in the destination unterminated string; if truncation occurs,  that
        pointer  is equivalent to a pointer to one past the end of the destina‐
        tion buffer.

        •  stpncpy(3)

    Deprecated
        The following functions return the length of the total string that they
        tried to create (as if truncation didn’t occur).

        •  strlcpy(3bsd)

        •  strlcat(3bsd)

        The following function returns the length of the destination string, or
        -E2BIG on truncation.

        •  strscpy(9)

        The following functions return the dst pointer, which is useless.

        •  strcpy(3)

        •  strcat(3)

        •  strncpy(3)

        •  strncat(3)

CAVEATS
        Some of the functions described here are not provided by  any  library;
        you should write your own copy if you want to use them.

        The  deprecated status of these functions varies from system to system.
        This page declares as deprecated those functions that have a better re‐
        placement documented in this same page.

EXAMPLES
        The following are examples of correct use of each of these functions.

        stpcpy(3)
                   p = buf;
                   p = stpcpy(p, "Hello ");
                   p = stpcpy(p, "world");
                   p = stpcpy(p, "!");
                   len = p - buf;
                   puts(buf);

        stpecpy()
        stpecpyx()
                   past_end = buf + sizeof(buf);
                   p = buf;
                   p = stpecpy(p, past_end, "Hello ");
                   p = stpecpy(p, past_end, "world");
                   p = stpecpy(p, past_end, "!");
                   if (p == past_end) {
                       p--;
                       goto toolong;
                   }
                   len = p - buf;
                   puts(buf);

        stpncpy(3)
                   past_end = buf + sizeof(buf);
                   end = stpncpy(buf, "Hello world!", sizeof(buf));
                   if (end == past_end)
                       goto toolong;
                   len = end - buf;
                   for (size_t i = 0; i < sizeof(buf); i++)
                       putchar(buf[i]);

        ustr2stp()
                   p = buf;
                   p = ustr2stp(p, "Hello ", 6);
                   p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
                   p = ustr2stp(p, "!", 1);
                   len = p - buf;
                   puts(buf);

        mempcpy(3)
                   p = buf;
                   p = mempcpy(p, "Hello ", 6);
                   p = mempcpy(p, "world", 5);
                   p = mempcpy(p, "!", 1);
                   *p = '\0';
                   len = p - buf;
                   puts(buf);

    Deprecated
        strlcpy(3bsd)
        strlcat(3bsd)
                   if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                       goto toolong;
                   if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                       goto toolong;
                   len = strlcat(buf, "!", sizeof(buf));
                   if (len >= sizeof(buf))
                       goto toolong;
                   puts(buf);

        strscpy(9)
                   len = strscpy(buf, "Hello world!", sizeof(buf));
                   if (len == -E2BIG)
                       goto toolong;
                   puts(buf);

        strcpy(3)
        strcat(3)
                   strcpy(buf, "Hello ");
                   strcat(buf, "world");
                   strcat(buf, "!");
                   len = strlen(buf);
                   puts(buf);

        strncpy(3)
                   strncpy(buf, "Hello world!", sizeof(buf));
                   if (buf[sizeof(buf) - 1] != '\0')
                       goto toolong;
                   len = strnlen(buf, sizeof(buf));
                   for (size_t i = 0; i < sizeof(buf); i++)
                       putchar(buf[i]);

        strncat(3)
                   buf[0] = '\0';  // strncat(3) needs a string in buf.
                   strncat(buf, "Hello ", 6);
                   strncat(buf, "world", 42);  // Padding null bytes ignored.
                   strncat(buf, "!", 1);
                   puts(buf);

SEE ALSO
        memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                      string_copy(7)



-- 
<http://www.alejandro-colomar.es/>


* Re: string_copy(7): New manual page documenting string copying functions.
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
@ 2022-12-12  0:17 ` Alejandro Colomar
  2022-12-12  0:25 ` Alejandro Colomar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12  0:17 UTC
  To: linux-man





On 12/12/22 00:59, Alejandro Colomar wrote:
> Hi all!
> 
> I'm planning to add a new manual page that documents all string copying 
> functions.  It covers more detail than any of the existing manual pages (and in 
> fact, I've discovered some properties of the functions documented while working 
> on this page).  The intention is to remove the existing separate manual pages 
> for all string copying functions, and make them links to this new page.  It 
> intends to be the only reference documentation for copying strings in C, and 
> hopefully fix the half century of suboptimal string copying library with which 
> we've lived.  (Say goodbye to std::string, here come back C strings ;)
> 
> The formatted manual page is below.
> 
> Alex
> 
> P.S.: I'm sorry for your beloved string copying function(s); it has high chances 
> of being dreaded by the page below.  Not sorry.  Oh well, at least I justified 
> it, or I tried :-)
> 
> ---
> 
> string_copy(7)         Miscellaneous Information Manual         string_copy(7)
> 
> NAME
>         stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
>         stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings
> 
> SYNOPSIS
>     (Null‐terminated) strings
>         // Chain‐copy a string.
>         char *stpcpy(char *restrict dst, const char *restrict src);
> 
>         // Chain‐copy a string with truncation (not in libc).
>         char *stpecpy(char *dst, char past_end[0], const char *restrict src);
> 
>         // Chain‐copy a string with truncation and SIGSEGV on invalid input.
>         char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
> 
>         // Copy a string with truncation and SIGSEGV on invalid input.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate a string with truncation.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Copy a string with truncation (not in libc).
>         [[deprecated]]  // Use stpecpy() instead.
>         ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                        size_t sz);
> 
>         // Copy a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcpy(char *restrict dst, const char *restrict src);
> 
>         // Concatenate a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcat(char *restrict dst, const char *restrict src);
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string.
>         char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Chain‐copy an unterminated string into a string (not in libc).
>         char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string
>         [[deprecated]]  // Use stpncpy(3) instead.
>         char *strncpy(char dest[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate an unterminated string into a string.
>         [[deprecated]]  // Use ustr2stp() instead.
>         char *strncat(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>     String structures
>         // (Null‐terminated) string structure.
>         struct str_s {
>             size_t  len;
>             char    *str;
>         };
> 
>         // Unterminated string structure (overlapping strings).
>         struct ustr_s {
>             size_t  len;
>             char    *ustr;
>         };
> 
>         // Chain‐copy a string structure into an unterminated string.
>         void *mempcpy(void *restrict dst, const void src[restrict len],
>                        size_t len);
> 
> DESCRIPTION
>     Terms (and abbreviations)
>         string (str)
>                is a sequence of zero or more non‐null characters, followed by a
>                null byte.
> 
>         unterminated string (ustr)
>                is a sequence of zero or more  non‐null  characters.   They  are
>                sometimes  contained  in fixed‐width buffers, which usually con‐
>                tain padding null bytes after the unterminated string,  to  fill
>                the  rest  of  the  buffer  without  affecting  the unterminated
>                string; however, those padding null bytes are not  part  of  the
>                unterminated string.
> 
>         length (len)
>                is the number of non‐null characters in a string.  It is the re‐
>                turn value of strlen(str) and of strnlen(ustr, sz).
> 
>         size (sz)
>                refers to the entire buffer where the string is contained.
> 
>         end    is  the  name  of  a  pointer  to the terminating null byte of a
>                string, or a pointer to one past the last character of an unter‐
>                minated string.  This is the return value of functions that  al‐
>                low chaining.  It is equivalent to &str[len].
> 
>         past_end
>                is  the name of a pointer to one past the end of the buffer that
>                contains a string.  It is equivalent to &str[sz].  It is used as
>                a sentinel value, to be able  to  truncate  strings  instead  of
>                overrunning a buffer.
> 
>         string structure
>         unterminated string structure
>                Structure  that  contains the length of a string, as well as the
>                string or the unterminated string.
> 
>     Types of functions
>         Copy, concatenate, and chain‐copy
>                Originally, there was a distinction between functions that  copy
>                and  those that concatenate.  However, newer functions that copy
>                while allowing chaining cover both use cases with a single  API.
>                They  are  also algorithmically faster, since they don’t need to
>                search for the end of the existing string.
> 
>                To chain copy functions, they need to return a  pointer  to  the
>                end.   That’s  a  byproduct  of the copy operation, so it has no
>                performance costs.  These functions are preferred over  copy  or
>                concatenation  functions.  Functions that return such a pointer,
>                and thus can be chained, have names of the form  *stp*(),  since
>                it’s also common to name the pointer just p.
> 
>         Truncate or not?
>                The  first  thing  to note is that programmers should be careful
>                with buffers, so they always have the correct size, and  trunca‐
>                tion is not necessary.
> 
>                In  most  cases, truncation is not desired, and it is simpler to
>                just do the copy.  Simpler  code  is  safer  code.   Programming
>                against  programming mistakes by adding more code just adds more
>                points where mistakes can be made.
> 
>                Nowadays, compilers can detect most programmer errors with  fea‐
>                tures    like   compiler   warnings,   static   analyzers,   and
>                _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
>                these error‐detection features be more precise.
> 
>                When validating user input, however, it makes sense to truncate.
>                Remember to check the return value of such function calls.
> 
>                Functions that truncate:
> 
>                •  stpecpy()  is  the  most  efficient string copy function that
>                   performs truncation.  Truncation needs to be checked only
>                   once, after all chained calls.
> 
>                •  stpecpyx() is a variant of stpecpy() that consumes the entire
>                   source string, to catch bugs in the program by forcing a seg‐
>                   mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).
> 
>                •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
>                   are designed to crash if the input string is invalid (doesn’t
>                   contain a null byte).
> 
>                •  strscpy(9) is a function in the Linux kernel which reports an
>                   error instead of crashing.
> 
>                •  stpncpy(3) and strncpy(3) also truncate, but they write
>                   unterminated strings rather than strings.
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
>         minated strings in fixed‐width buffers.  To interface with  them,  spe‐
>         cialized functions need to be used.
> 
>         To copy strings into them, use stpncpy(3).
> 
>         To  copy from an unterminated string within a fixed‐width buffer into a
>         string, ignoring any trailing null  bytes  in  the  source  fixed‐width
>         buffer, you should use ustr2stp().
> 
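
A sketch of the round trip described above, under some assumptions: struct record and record_set_name() are hypothetical (the field only imitates the style of the utmpx(5) ones), and this ustr2stp() is a variant written with strnlen(3) and memcpy(3), not the memccpy(3)-based reference implementation from the page.

```c
#define _GNU_SOURCE  /* for stpncpy(3) and strnlen(3) on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct record {
    char  name[8];  /* unterminated string; NOT necessarily null-terminated */
};

/* Copy a string into the fixed-width field.  Names longer than the
 * field are truncated; shorter names are padded with null bytes. */
void
record_set_name(struct record *r, const char *name)
{
    assert(r != NULL);

    (void) stpncpy(r->name, name, sizeof(r->name));
}

/* Variant of ustr2stp(): copy the unterminated string in src[sz] into
 * the string dst, ignoring padding null bytes.  dst needs room for
 * sz + 1 bytes.  Returns a pointer to the terminating null byte. */
char *
ustr2stp(char *restrict dst, const char *restrict src, size_t sz)
{
    size_t  len;

    len = strnlen(src, sz);  /* never reads past the fixed-width buffer */
    memcpy(dst, src, len);
    dst[len] = '\0';

    return dst + len;
}
```

The important detail is the size of the destination: a field of 8 bytes can hold an 8-character unterminated string, so the string that receives it needs 9 bytes.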
>     String structures
>         The simplest string copying function is mempcpy(3).  It requires always
>         knowing  the length of your strings, for which string structures can be
>         used.  It makes the code simpler, since you always know the  length  of
>         your strings, and it’s also faster, since it doesn’t need to repeatedly
>         calculate  those  lengths.   mempcpy(3)  always creates an unterminated
>         string, so you need to explicitly set the terminating null byte.
> 
>         String structure
>                The following code can be  used  to  chain‐copy  from  a  string
>                structure into a string:
> 
>                    p = mempcpy(p, src->str, src->len);
>                    *p = '\0';
> 
>                The  following  code  can  be  used  to chain‐copy from a string
>                structure into an unterminated string:
> 
>                    p = mempcpy(p, src->str, src->len);
> 
>         Unterminated string structure (overlapping strings)
>                In programs that make considerable use of strings, and need  the
>                best  performance, using overlapping strings can make a big dif‐
>                ference.  It allows holding substrings of a bigger string  while
>                not duplicating memory nor using time to do a copy.
> 
>                However,  this is delicate, since it requires using unterminated
>                strings.  C library APIs use strings, so programs that  use  un‐
>                terminated  strings  will  have  to  take  care to differentiate
>                strings from unterminated strings.
> 
>                The following code can be used to chain‐copy  from  an  untermi‐
>                nated string structure to a string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
>                    *p = '\0';
> 
>                The  following  code  can be used to chain‐copy from an untermi‐
>                nated string structure to an unterminated string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
> 
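
The snippets above can be put together into something compilable.  This is only a sketch: ustr_s_join() is a hypothetical helper, struct ustr_s is the structure from the SYNOPSIS, and mempcpy(3) is assumed to be available (it's a GNU extension).

```c
#define _GNU_SOURCE  /* mempcpy(3) is a GNU extension */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unterminated string structure (overlapping strings). */
struct ustr_s {
    size_t  len;
    char    *ustr;
};

/*
 * Hypothetical helper: chain-copy two unterminated string structures
 * into the string buf.  The caller must provide room for
 * a->len + b->len + 1 bytes.
 * Returns a pointer to the terminating null byte.
 */
char *
ustr_s_join(char *buf, const struct ustr_s *a, const struct ustr_s *b)
{
    char *p;

    assert(buf != NULL);

    p = buf;
    p = mempcpy(p, a->ustr, a->len);  /* lengths are known; no scanning */
    p = mempcpy(p, b->ustr, b->len);
    *p = '\0';                        /* mempcpy(3) doesn't terminate */

    return p;
}
```

Since only len bytes are copied, a structure such as { 5, "worldwide" } refers to the substring "world" without a copy of its own, which is the overlapping-strings use case.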
>     Functions
>         stpcpy(3)
>                This function copies the input string into a destination string.
>                The programmer is responsible  for  allocating  a  buffer  large
>                enough.  It returns a pointer suitable for chaining.
> 
>         stpecpy()
>         stpecpyx()
>                These functions copy the input string into a destination string.
>                If  the destination buffer, limited by a pointer to one past the
>                end of it, isn’t large enough to hold the  copy,  the  resulting
>                string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>                nated).  They return a pointer suitable for  chaining.   Trunca‐
>                tion needs to be detected only once after the last chained call.
>                stpecpyx()  has identical semantics to stpecpy(), except that it
>                forces a SIGSEGV on Undefined Behavior.
> 
>                These functions are not provided by any library, but you can de‐
>                fine them with the following reference implementations:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpy(char *dst, char past_end[0],
>                            const char *restrict src)
>                    {
>                        char *p;
> 
>                        if (dst == past_end)
>                            return past_end;
> 
>                        p = memccpy(dst, src, '\0', past_end - dst);
>                        if (p != NULL)
>                            return p - 1;
> 
>                        /* truncation detected */
>                        past_end[-1] = '\0';
>                        return past_end;
>                    }
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpyx(char *dst, char past_end[0],
>                             const char *restrict src)
>                    {
>                        if (src[strlen(src)] != '\0')
>                            raise(SIGSEGV);
> 
>                        return stpecpy(dst, past_end, src);
>                    }
> 
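
As an illustration of "detect truncation only once", here is a sketch built on the stpecpy() reference implementation above.  build_path() is a hypothetical helper; stpecpy() is repeated only to keep the sketch self-contained, and it is declared with char *past_end instead of char past_end[0] so that it stays within ISO C.

```c
#define _GNU_SOURCE  /* for memccpy(3) on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the reference implementation in the page. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: build "dir/file" in buf[sz].
 * Returns the length of the path, or -1 if it was truncated
 * (buf still holds a null-terminated, truncated string in that case).
 */
ptrdiff_t
build_path(char *buf, size_t sz, const char *dir, const char *file)
{
    char *p, *past_end;

    assert(sz > 0);

    past_end = buf + sz;
    p = buf;
    p = stpecpy(p, past_end, dir);
    p = stpecpy(p, past_end, "/");
    p = stpecpy(p, past_end, file);
    if (p == past_end)  /* a single check covers the three calls */
        return -1;

    return p - buf;
}
```

Once a call truncates, it returns past_end, and every later call in the chain sees dst == past_end and returns immediately; that's why the intermediate return values don't need to be checked.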
>         stpncpy(3)
>                This function copies the input string into a  destination  null‐
>                padded  fixed‐width  unterminated  string.   If  the destination
>                buffer, limited by its size, isn’t  large  enough  to  hold  the
>                copy,  the  resulting  string is truncated.  Since it creates an
>                unterminated string, it doesn’t need to write a terminating null
>                byte.  It returns a pointer suitable for chaining, but it’s  not
>                ideal for that.  Truncation needs to be detected only once after
>                the last chained call.
> 
>                If  you’re going to use this function in chained calls, it would
>                probably be useful to develop a function similar to stpecpy().
> 
>         ustr2stp()
>                This function copies the input unterminated string contained  in
>                a null‐padded fixed‐width buffer into a destination
>                (null‐terminated) string.  The programmer is responsible for
>                allocating a buffer large enough.  It returns a pointer suitable
>                for chaining.
> 
>                A truncating version of this function doesn’t exist,  since  the
>                size  of  the original string is always known, so it wouldn’t be
>                very useful.
> 
>                This function is not provided by any library, but you can define
>                it with the following reference implementation:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    ustr2stp(char *restrict dst, const char *restrict src,
>                             size_t sz)
>                    {
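>                        char  *end;
> 
>                        end = memccpy(dst, src, '\0', sz);
>                        if (end != NULL)
>                            return end - 1;  /* the copied null byte */
> 
>                        /* no null byte within sz bytes; terminate here */
>                        end = dst + sz;
>                        *end = '\0';
> 
>                        return end;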
>                    }
> 
>         mempcpy(3)
>                This function copies the input string, limited  by  its  length,
>                into  a  destination unterminated string.  The programmer is re‐
>                sponsible for allocating a buffer large enough.   It  returns  a
>                pointer suitable for chaining.
> 
>     Deprecated functions
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation  string.  If the destination buffer, limited by its size,
>                isn’t large enough to hold the copy,  the  resulting  string  is
>                truncated  (but  it  is guaranteed to be null‐terminated).  They
>                return the length of the total  string  they  tried  to  create.
>                These functions force a SIGSEGV on Undefined Behavior.
> 
>                stpecpyx()  is  a better replacement for these functions for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  Only requires detecting truncation once per chain of calls.
> 
>         strscpy(9)
>                Deprecated.  This function copies the input string into a desti‐
>                nation string.  If the destination buffer, limited by its  size,
>                isn’t  large  enough  to  hold the copy, the resulting string is
>                truncated (but it is guaranteed to be null‐terminated).  It  re‐
>                turns the length of the destination string, or -E2BIG on trunca‐
>                tion.
> 
>                stpecpy()  is  a  better replacement for this function, since it
>                has a much simpler interface.
> 
>         strcpy(3)
>         strcat(3)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation string.  The programmer is responsible for  allocating  a
>                buffer large enough.  The return value is useless.
> 
>                strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
>                turn value.
> 
>                stpcpy(3) is a better replacement for these  functions  for  the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>         strncpy(3)
>                Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
>                the useless return value.  Due to the return  value,  with  this
>                function  it’s hard to correctly check for truncation.  Use stp‐
>                ncpy(3) instead.
> 
>         strncat(3)
>                Deprecated.  Do not confuse this function with strncpy(3);  they
>                are not related at all.
> 
>                This  function  concatenates  the input unterminated string
>                contained in a null‐padded fixed‐width buffer onto a destination
>                (null‐terminated) string.  The programmer is responsible for
>                allocating a buffer large enough.  The return value is useless.
> 
>                ustr2stp()  is  a  better  replacement for this function for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>                •  Function name that is not actively confusing.
> 
> RETURN VALUE
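>         The following functions never truncate.  stpcpy(3) and ustr2stp()
>         return a pointer to the terminating null byte in the destination
>         string; mempcpy(3) returns a pointer to one past the last byte it
>         wrote, which is where the caller sets the terminating null byte.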
> 
>         •  stpcpy(3)
> 
>         •  ustr2stp()
> 
>         •  mempcpy(3)
> 
>         The  following  functions return a pointer to the terminating null byte
>         in the destination string, except when truncation occurs; if truncation
>         occurs, they return a pointer to one past the end  of  the  destination
>         buffer.
> 
>         •  stpecpy()
> 
>         •  stpecpyx()
> 
>         The  following function returns a pointer to one after the last charac‐
>         ter in the destination unterminated string; if truncation occurs,  that
>         pointer  is equivalent to a pointer to one past the end of the destina‐
>         tion buffer.
> 
>         •  stpncpy(3)
> 
>     Deprecated
>         The following functions return the length of the total string that they
>         tried to create (as if truncation didn’t occur).
> 
>         •  strlcpy(3bsd)
> 
>         •  strlcat(3bsd)
> 
>         The following function returns the length of the destination string, or
>         -E2BIG on truncation.
> 
>         •  strscpy(9)
> 
>         The following functions return the dst pointer, which is useless.
> 
>         •  strcpy(3)
> 
>         •  strcat(3)
> 
>         •  strncpy(3)
> 
>         •  strncat(3)

And here goes the STANDARDS section:

STANDARDS
        stpcpy(3)
               POSIX.1‐2008.

        stpecpy()
        stpecpyx()
        ustr2stp()
               Not defined by any standard or library.

        stpncpy(3)
               POSIX.1‐2008.

        mempcpy(3)
               This function is a GNU extension.

        strlcpy(3bsd)
        strlcat(3bsd)
               These functions originated in OpenBSD and are present in some
               Unix systems.  They are provided in GNU/Linux systems by libbsd.

        strscpy(9)
               Linux kernel internal function.

        strcpy(3)
        strcat(3)
               POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

        strncpy(3)
               POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

        strncat(3)
               POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.


> 
> CAVEATS
>         Some of the functions described here are not provided by  any  library;
>         you should write your own copy if you want to use them.
> 
>         The  deprecated status of these functions varies from system to system.
>         This page declares as deprecated those functions that have a better re‐
>         placement documented in this same page.
> 
> EXAMPLES
>         The following are examples of correct use of each of these functions.
> 
>         stpcpy(3)
>                    p = buf;
>                    p = stpcpy(p, "Hello ");
>                    p = stpcpy(p, "world");
>                    p = stpcpy(p, "!");
>                    len = p - buf;
>                    puts(buf);
> 
>         stpecpy()
>         stpecpyx()
>                    past_end = buf + sizeof(buf);
>                    p = buf;
>                    p = stpecpy(p, past_end, "Hello ");
>                    p = stpecpy(p, past_end, "world");
>                    p = stpecpy(p, past_end, "!");
>                    if (p == past_end) {
>                        p--;
>                        goto toolong;
>                    }
>                    len = p - buf;
>                    puts(buf);
> 
>         stpncpy(3)
>                    past_end = buf + sizeof(buf);
>                    end = stpncpy(buf, "Hello world!", sizeof(buf));
>                    if (end == past_end)
>                        goto toolong;
>                    len = end - buf;
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         ustr2stp()
>                    p = buf;
>                    p = ustr2stp(p, "Hello ", 6);
>                    p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                    p = ustr2stp(p, "!", 1);
>                    len = p - buf;
>                    puts(buf);
> 
>         mempcpy(3)
>                    p = buf;
>                    p = mempcpy(p, "Hello ", 6);
>                    p = mempcpy(p, "world", 5);
>                    p = mempcpy(p, "!", 1);
>                    *p = '\0';
>                    len = p - buf;
>                    puts(buf);
> 
>     Deprecated
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                    if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    len = strlcat(buf, "!", sizeof(buf));
>                    if (len >= sizeof(buf))
>                        goto toolong;
>                    puts(buf);
> 
>         strscpy(9)
>                    len = strscpy(buf, "Hello world!", sizeof(buf));
>                    if (len == -E2BIG)
>                        goto toolong;
>                    puts(buf);
> 
>         strcpy(3)
>         strcat(3)
>                    strcpy(buf, "Hello ");
>                    strcat(buf, "world");
>                    strcat(buf, "!");
>                    len = strlen(buf);
>                    puts(buf);
> 
>         strncpy(3)
>                    strncpy(buf, "Hello world!", sizeof(buf));
>                    if (buf[sizeof(buf) - 1] != '\0')
>                        goto toolong;
>                    len = strnlen(buf, sizeof(buf));
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         strncat(3)
>                    buf[0] = '\0';  // strncat(3) needs a string in dst.
>                    strncat(buf, "Hello ", 6);
>                    strncat(buf, "world", 42);  // Padding null bytes ignored.
>                    strncat(buf, "!", 1);
>                    puts(buf);
> 
> SEE ALSO
>         memcpy(3), memccpy(3), mempcpy(3), string(3)
> 
> Linux man‐pages (unreleased)        (date)                      string_copy(7)
> 
> 
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: string_copy(7): New manual page documenting string copying functions.
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
  2022-12-12  0:17 ` Alejandro Colomar
@ 2022-12-12  0:25 ` Alejandro Colomar
  2022-12-12  0:32 ` Alejandro Colomar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12  0:25 UTC (permalink / raw)
  To: linux-man


[-- Attachment #1.1: Type: text/plain, Size: 31166 bytes --]



On 12/12/22 00:59, Alejandro Colomar wrote:
> Hi all!
> 
> I'm planning to add a new manual page that documents all string copying 
> functions.  It covers more detail than any of the existing manual pages (and in 
> fact, I've discovered some properties of the functions documented while working 
> on this page).  The intention is to remove the existing separate manual pages 
> for all string copying functions, and make them links to this new page.  It 
> intends to be the only reference documentation for copying strings in C, and 
> hopefully fix the half century of suboptimal string copying library with which 
> we've lived.  (Say goodbye to std::string, here come back C strings ;)
> 
> The formatted manual page is below.
> 
> Alex
> 
> P.S.: I'm sorry for your beloved string copying function(s); it has high chances 
> of being dreaded by the page below.  Not sorry.  Oh well, at least I justified 
> it, or I tried :-)
> 
> ---
> 
> string_copy(7)         Miscellaneous Information Manual         string_copy(7)
> 
> NAME
>         stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
>         stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings
> 
> SYNOPSIS
>     (Null‐terminated) strings
>         // Chain‐copy a string.
>         char *stpcpy(char *restrict dst, const char *restrict src);
> 
>         // Chain‐copy a string with truncation (not in libc).
>         char *stpecpy(char *dst, char past_end[0], const char *restrict src);
> 
>         // Chain‐copy a string with truncation and SIGSEGV on invalid input.
>         char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
> 
>         // Copy a string with truncation and SIGSEGV on invalid input.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate a string with truncation.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Copy a string with truncation (not in libc).
>         [[deprecated]]  // Use stpecpy() instead.
>         ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                        size_t sz);
> 
>         // Copy a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcpy(char *restrict dst, const char *restrict src);
> 
>         // Concatenate a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcat(char *restrict dst, const char *restrict src);
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string.
>         char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Chain‐copy an unterminated string into a string (not in libc).
>         char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string
>         [[deprecated]]  // Use stpncpy(3) instead.
>         char *strncpy(char dest[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate an unterminated string into a string.
>         [[deprecated]]  // Use ustr2stp() instead.
>         char *strncat(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>     String structures
>         // (Null‐terminated) string structure.
>         struct str_s {
>             size_t  len;
>             char    *str;
>         };
> 
>         // Unterminated string structure (overlapping strings).
>         struct ustr_s {
>             size_t  len;
>             char    *ustr;
>         };
> 
>         // Chain‐copy a string structure into an unterminated string.
>         void *mempcpy(void *restrict dst, const void src[restrict len],
>                        size_t len);
> 
> DESCRIPTION
>     Terms (and abbreviations)
>         string (str)
>                is a sequence of zero or more non‐null characters, followed by a
>                null byte.
> 
>         unterminated string (ustr)
>                is a sequence of zero or more  non‐null  characters.   They  are
>                sometimes  contained  in fixed‐width buffers, which usually con‐
>                tain padding null bytes after the unterminated string,  to  fill
>                the  rest  of  the  buffer  without  affecting  the unterminated
>                string; however, those padding null bytes are not  part  of  the
>                unterminated string.
> 
>         length (len)
>                is the number of non‐null characters in a string.  It is the re‐
>                turn value of strlen(str) and of strnlen(ustr, sz).
> 
>         size (sz)
>                refers to the entire buffer where the string is contained.
> 
>         end    is  the  name  of  a  pointer  to the terminating null byte of a
>                string, or a pointer to one past the last character of an unter‐
>                minated string.  This is the return value of functions that  al‐
>                low chaining.  It is equivalent to &str[len].
> 
>         past_end
>                is  the name of a pointer to one past the end of the buffer that
>                contains a string.  It is equivalent to &str[sz].  It is used as
>                a sentinel value, to be able  to  truncate  strings  instead  of
>                overrunning a buffer.
> 
>         string structure
>         unterminated string structure
>                Structure  that  contains the length of a string, as well as the
>                string or the unterminated string.
> 
>     Types of functions
>         Copy, concatenate, and chain‐copy
>                Originally, there was a distinction between functions that  copy
>                and  those that concatenate.  However, newer functions that copy
>                while allowing chaining cover both use cases with a single  API.
>                They  are  also algorithmically faster, since they don’t need to
>                search for the end of the existing string.
> 
>                To chain copy functions, they need to return a  pointer  to  the
>                end.   That’s  a  byproduct  of the copy operation, so it has no
>                performance costs.  These functions are preferred over  copy  or
>                concatenation  functions.  Functions that return such a pointer,
>                and thus can be chained, have names of the form  *stp*(),  since
>                it’s also common to name the pointer just p.
> 
>         Truncate or not?
>                The  first  thing  to note is that programmers should be careful
>                with buffers, so they always have the correct size, and  trunca‐
>                tion is not necessary.
> 
>                In  most  cases, truncation is not desired, and it is simpler to
>                just do the copy.  Simpler  code  is  safer  code.   Programming
>                against  programming mistakes by adding more code just adds more
>                points where mistakes can be made.
> 
>                Nowadays, compilers can detect most programmer errors with  fea‐
>                tures    like   compiler   warnings,   static   analyzers,   and
>                _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
>                these error‐detection features be more precise.
> 
>                When validating user input, however, it makes sense to truncate.
>                Remember to check the return value of such function calls.
> 
>                Functions that truncate:
> 
>                •  stpecpy()  is  the  most  efficient string copy function that
>                   performs truncation.  Truncation needs to be checked only
>                   once, after all chained calls.
> 
>                •  stpecpyx() is a variant of stpecpy() that consumes the entire
>                   source string, to catch bugs in the program by forcing a seg‐
>                   mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).
> 
>                •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
>                   are designed to crash if the input string is invalid (doesn’t
>                   contain a null byte).
> 
>                •  strscpy(9) is a function in the Linux kernel which reports an
>                   error instead of crashing.
> 
>                •  stpncpy(3) and strncpy(3) also truncate, but they write
>                   unterminated strings rather than strings.
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
>         minated strings in fixed‐width buffers.  To interface with  them,  spe‐
>         cialized functions need to be used.
> 
>         To copy strings into them, use stpncpy(3).
> 
>         To  copy from an unterminated string within a fixed‐width buffer into a
>         string, ignoring any trailing null  bytes  in  the  source  fixed‐width
>         buffer, you should use ustr2stp().
> 
>     String structures
>         The simplest string copying function is mempcpy(3).  It requires always
>         knowing  the length of your strings, for which string structures can be
>         used.  It makes the code simpler, since you always know the  length  of
>         your strings, and it’s also faster, since it doesn’t need to repeatedly
>         calculate  those  lengths.   mempcpy(3)  always creates an unterminated
>         string, so you need to explicitly set the terminating null byte.
> 
>         String structure
>                The following code can be  used  to  chain‐copy  from  a  string
>                structure into a string:
> 
>                    p = mempcpy(p, src->str, src->len);
>                    *p = '\0';
> 
>                The  following  code  can  be  used  to chain‐copy from a string
>                structure into an unterminated string:
> 
>                    p = mempcpy(p, src->str, src->len);
> 
>         Unterminated string structure (overlapping strings)
>                In programs that make considerable use of strings, and need  the
>                best  performance, using overlapping strings can make a big dif‐
>                ference.  It allows holding substrings of a bigger string  while
>                not duplicating memory nor using time to do a copy.
> 
>                However,  this is delicate, since it requires using unterminated
>                strings.  C library APIs use strings, so programs that  use  un‐
>                terminated  strings  will  have  to  take  care to differentiate
>                strings from unterminated strings.
> 
>                The following code can be used to chain‐copy  from  an  untermi‐
>                nated string structure to a string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
>                    *p = '\0';
> 
>                The  following  code  can be used to chain‐copy from an untermi‐
>                nated string structure to an unterminated string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
> 
>     Functions
>         stpcpy(3)
>                This function copies the input string into a destination string.
>                The programmer is responsible  for  allocating  a  buffer  large
>                enough.  It returns a pointer suitable for chaining.
> 
>         stpecpy()
>         stpecpyx()
>                These functions copy the input string into a destination string.
>                If  the destination buffer, limited by a pointer to one past the
>                end of it, isn’t large enough to hold the  copy,  the  resulting
>                string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>                nated).  They return a pointer suitable for  chaining.   Trunca‐
>                tion needs to be detected only once after the last chained call.
>                stpecpyx()  has identical semantics to stpecpy(), except that it
>                forces a SIGSEGV on Undefined Behavior.
> 
>                These functions are not provided by any library, but you can de‐
>                fine them with the following reference implementations:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpy(char *dst, char past_end[0],
>                            const char *restrict src)
>                    {
>                        char *p;
> 
>                        if (dst == past_end)
>                            return past_end;
> 
>                        p = memccpy(dst, src, '\0', past_end - dst);
>                        if (p != NULL)
>                            return p - 1;
> 
>                        /* truncation detected */
>                        past_end[-1] = '\0';
>                        return past_end;
>                    }
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpyx(char *dst, char past_end[0],
>                             const char *restrict src)
>                    {
>                        if (src[strlen(src)] != '\0')
>                            raise(SIGSEGV);
> 
>                        return stpecpy(dst, past_end, src);
>                    }
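
To see the single truncation check at work, here is a minimal sketch
built on the stpecpy() reference implementation quoted above.  join3()
is a hypothetical helper, not part of the page, and the sketch declares
past_end as a plain char * instead of char past_end[0] only to stay
within ISO C syntax.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <stddef.h>
#include <string.h>

/* Same logic as the reference implementation quoted above. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: build "<a><b><c>" in buf[0..size).
 * Returns the length of the result, or -1 if it had to be truncated.
 */
long
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
    char *past_end = buf + size;
    char *p = buf;

    p = stpecpy(p, past_end, a);
    p = stpecpy(p, past_end, b);
    p = stpecpy(p, past_end, c);
    if (p == past_end)  /* a single check covers the whole chain */
        return -1;
    return p - buf;
}
```

With a 16-byte buffer, join3(buf, sizeof(buf), "Hello ", "world", "!")
returns 12.  With an 8-byte buffer it returns -1 and leaves the
truncated string "Hello w" in the buffer.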
> 
>         stpncpy(3)
>                This function copies the input string into a  destination  null‐
>                padded  fixed‐width  unterminated  string.   If  the destination
>                buffer, limited by its size, isn’t  large  enough  to  hold  the
>                copy,  the  resulting  string is truncated.  Since it creates an
>                unterminated string, it doesn’t need to write a terminating null
>                byte.  It returns a pointer suitable for chaining, but it’s  not
>                ideal for that.  Truncation needs to be detected only once after
>                the last chained call.
> 
>                If  you’re going to use this function in chained calls, it would
>                probably be useful to develop a function similar to stpecpy().
> 
>         ustr2stp()
>                This function copies the input unterminated string contained  in
>                a  null‐padded fixed‐width buffer, into a destination (null‐ter‐
>                minated) string.  The programmer is responsible for allocating a
>                buffer large enough.  It returns a pointer suitable  for  chain‐
>                ing.
> 
>                A truncating version of this function doesn’t exist,  since  the
>                size  of  the original string is always known, so it wouldn’t be
>                very useful.
> 
>                This function is not provided by any library, but you can define
>                it with the following reference implementation:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    ustr2stp(char *restrict dst, const char *restrict src,
>                             size_t sz)
>                    {
>                        char  *end;
> 
>                        /* memccpy(3) returns a pointer one past the copied null byte. */
>                        end = memccpy(dst, src, '\0', sz);
>                        end = (end != NULL) ? end - 1 : dst + sz;
>                        *end = '\0';
> 
>                        return end;
>                    }
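
A sketch of how ustr2stp() might be used with fixed-width fields.  This
variant is written with strnlen(3) and memcpy(3) rather than memccpy(3),
purely for illustration; it is not the page's reference implementation.
struct rec and rec_format() are hypothetical names.

```c
#define _XOPEN_SOURCE 700  /* for strnlen(3) */
#include <stddef.h>
#include <string.h>

/*
 * Copy the unterminated string held in src[0..sz), stopping at the
 * first padding null byte if there is one, terminate the copy, and
 * return a pointer to the terminating null byte.
 * dst must have room for strnlen(src, sz) + 1 bytes.
 */
char *
ustr2stp(char *restrict dst, const char *restrict src, size_t sz)
{
    size_t len = strnlen(src, sz);

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst + len;
}

/* Hypothetical record with fixed-width, possibly unterminated fields. */
struct rec {
    char user[4];
    char host[8];
};

/* Write "user@host" into buf; buf needs at least 4 + 1 + 8 + 1 bytes. */
size_t
rec_format(char *buf, const struct rec *r)
{
    char *p = buf;

    p = ustr2stp(p, r->user, sizeof(r->user));
    p = ustr2stp(p, "@", 1);
    p = ustr2stp(p, r->host, sizeof(r->host));
    return p - buf;
}
```

A user field that is completely full (no null byte at all) and a host
field that is null-padded are both handled by the same call.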
> 
>         mempcpy(3)
>                This function copies the input string, limited  by  its  length,
>                into  a  destination unterminated string.  The programmer is re‐
>                sponsible for allocating a buffer large enough.   It  returns  a
>                pointer suitable for chaining.
> 
>     Deprecated functions
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation  string.  If the destination buffer, limited by its size,
>                isn’t large enough to hold the copy,  the  resulting  string  is
>                truncated  (but  it  is guaranteed to be null‐terminated).  They
>                return the length of the total  string  they  tried  to  create.
>                These functions force a SIGSEGV on Undefined Behavior.
> 
>                stpecpyx()  is  a better replacement for these functions for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  Only requires detecting truncation once per chain of calls.
> 
>         strscpy(9)
>                Deprecated.  This function copies the input string into a desti‐
>                nation string.  If the destination buffer, limited by its  size,
>                isn’t  large  enough  to  hold the copy, the resulting string is
>                truncated (but it is guaranteed to be null‐terminated).  It  re‐
>                turns the length of the destination string, or -E2BIG on trunca‐
>                tion.
> 
>                stpecpy()  is  a  better replacement for this function, since it
>                has a much simpler interface.
> 
>         strcpy(3)
>         strcat(3)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation string.  The programmer is responsible for  allocating  a
>                buffer large enough.  The return value is useless.
> 
>                strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
>                turn value.
> 
>                stpcpy(3) is a better replacement for these  functions  for  the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>         strncpy(3)
>                Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
>                the useless return value.  Due to the return  value,  with  this
>                function  it’s hard to correctly check for truncation.  Use stp‐
>                ncpy(3) instead.
> 
>         strncat(3)
>                Deprecated.  Do not confuse this function with strncpy(3);  they
>                are not related at all.
> 
>                This  function  concatenates  the input unterminated string con‐
>                tained in a null‐padded fixed‐width buffer, into  a  destination
>                (null‐terminated) string.  The programmer is responsible for al‐
>                locating a buffer large enough.  The return value is useless.
> 
>                ustr2stp()  is  a  better  replacement for this function for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>                •  Function name that is not actively confusing.
> 
> RETURN VALUE
>         The following functions return a pointer to the terminating  null  byte
>         in the destination string (they never truncate).
> 
>         •  stpcpy(3)
> 
>         •  ustr2stp()
> 
>         •  mempcpy(3)
> 
>         The  following  functions return a pointer to the terminating null byte
>         in the destination string, except when truncation occurs; if truncation
>         occurs, they return a pointer to one past the end  of  the  destination
>         buffer.
> 
>         •  stpecpy()
> 
>         •  stpecpyx()
> 
>         The  following function returns a pointer to one after the last charac‐
>         ter in the destination unterminated string; if truncation occurs,  that
>         pointer  is equivalent to a pointer to one past the end of the destina‐
>         tion buffer.
> 
>         •  stpncpy(3)
> 
>     Deprecated
>         The following functions return the length of the total string that they
>         tried to create (as if truncation didn’t occur).
> 
>         •  strlcpy(3bsd)
> 
>         •  strlcat(3bsd)
> 
>         The following function returns the length of the destination string, or
>         -E2BIG on truncation.
> 
>         •  strscpy(9)
> 
>         The following functions return the dst pointer, which is useless.
> 
>         •  strcpy(3)
> 
>         •  strcat(3)
> 
>         •  strncpy(3)
> 
>         •  strncat(3)
> 
> CAVEATS
>         Some of the functions described here are not provided by  any  library;
>         you should write your own copy if you want to use them.
> 
>         The  deprecated status of these functions varies from system to system.
>         This page declares as deprecated those functions that have a better re‐
>         placement documented in this same page.
> 
> EXAMPLES
>         The following are examples of correct use of each of these functions.
> 
>         stpcpy(3)
>                    p = buf;
>                    p = stpcpy(p, "Hello ");
>                    p = stpcpy(p, "world");
>                    p = stpcpy(p, "!");
>                    len = p - buf;
>                    puts(buf);
> 
>         stpecpy()
>         stpecpyx()
>                    past_end = buf + sizeof(buf);
>                    p = buf;
>                    p = stpecpy(p, past_end, "Hello ");
>                    p = stpecpy(p, past_end, "world");
>                    p = stpecpy(p, past_end, "!");
>                    if (p == past_end) {
>                        p--;
>                        goto toolong;
>                    }
>                    len = p - buf;
>                    puts(buf);
> 
>         stpncpy(3)
>                    past_end = buf + sizeof(buf);
>                    end = stpncpy(buf, "Hello world!", sizeof(buf));
>                    if (end == past_end)
>                        goto toolong;
>                    len = end - buf;
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         ustr2stp()
>                    p = buf;
>                    p = ustr2stp(p, "Hello ", 6);
>                    p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                    p = ustr2stp(p, "!", 1);
>                    len = p - buf;
>                    puts(buf);
> 
>         mempcpy(3)
>                    p = buf;
>                    p = mempcpy(p, "Hello ", 6);
>                    p = mempcpy(p, "world", 5);
>                    p = mempcpy(p, "!", 1);
>                    *p = '\0';
>                    len = p - buf;
>                    puts(buf);
> 
>     Deprecated
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                    if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    len = strlcat(buf, "!", sizeof(buf));
>                    if (len >= sizeof(buf))
>                        goto toolong;
>                    puts(buf);
> 
>         strscpy(9)
>                    len = strscpy(buf, "Hello world!", sizeof(buf));
>                    if (len == -E2BIG)
>                        goto toolong;
>                    puts(buf);
> 
>         strcpy(3)
>         strcat(3)
>                    strcpy(buf, "Hello ");
>                    strcat(buf, "world");
>                    strcat(buf, "!");
>                    len = strlen(buf);
>                    puts(buf);
> 
>         strncpy(3)
>                    strncpy(buf, "Hello world!", sizeof(buf));
>                    if (buf[sizeof(buf) - 1] != '\0')
>                        goto toolong;
>                    len = strnlen(buf, sizeof(buf));
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         strncat(3)
>                    strncpy(buf, "Hello ", 6);
>                    strncat(buf, "world", 42);  // Padding null bytes ignored.
>                    strncat(buf, "!", 1);
>                    puts(buf);

Oops, that example was mistaken; too much cut and paste.

        strncat(3)
                   buf[0] = '\0';
                   strncat(buf, "Hello ", 6);
                   strncat(buf, "world", 42);  // Padding null bytes ignored.
                   strncat(buf, "!", 1);
                   len = strlen(buf);
                   puts(buf);

> 
> SEE ALSO
>         memcpy(3), memccpy(3), mempcpy(3), string(3)
> 
> Linux man‐pages (unreleased)        (date)                      string_copy(7)
> 
> 
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: string_copy(7): New manual page documenting string copying functions.
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
  2022-12-12  0:17 ` Alejandro Colomar
  2022-12-12  0:25 ` Alejandro Colomar
@ 2022-12-12  0:32 ` Alejandro Colomar
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12  0:32 UTC (permalink / raw)
  To: linux-man


[-- Attachment #1.1: Type: text/plain, Size: 31367 bytes --]



On 12/12/22 00:59, Alejandro Colomar wrote:
> Hi all!
> 
> I'm planning to add a new manual page that documents all string copying 
> functions.  It covers more detail than any of the existing manual pages (and in 
> fact, I've discovered some properties of the functions documented while working 
> on this page).  The intention is to remove the existing separate manual pages 
> for all string copying functions, and make them links to this new page.  It 
> intends to be the only reference documentation for copying strings in C, and 
> hopefully fix the half century of suboptimal string copying library with which 
> we've lived.  (Say goodbye to std::string, here come back C strings ;)
> 
> The formatted manual page is below.
> 
> Alex
> 
> P.S.: I'm sorry for your beloved string copying function(s); it has high chances 
> of being dreaded by the page below.  Not sorry.  Oh well, at least I justified 
> it, or I tried :-)
> 
> ---
> 
> string_copy(7)         Miscellaneous Information Manual         string_copy(7)
> 
> NAME
>         stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
>         stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings
> 
> SYNOPSIS
>     (Null‐terminated) strings
>         // Chain‐copy a string.
>         char *stpcpy(char *restrict dst, const char *restrict src);
> 
>         // Chain‐copy a string with truncation (not in libc).
>         char *stpecpy(char *dst, char past_end[0], const char *restrict src);
> 
>         // Chain‐copy a string with truncation and SIGSEGV on invalid input.
>         char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
> 
>         // Copy a string with truncation and SIGSEGV on invalid input.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate a string with truncation.
>         [[deprecated]]  // Use stpecpyx() instead.
>         size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Copy a string with truncation (not in libc).
>         [[deprecated]]  // Use stpecpy() instead.
>         ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                        size_t sz);
> 
>         // Copy a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcpy(char *restrict dst, const char *restrict src);
> 
>         // Concatenate a string.
>         [[deprecated]]  // Use stpcpy(3) instead.
>         char *strcat(char *restrict dst, const char *restrict src);
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string.
>         char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Chain‐copy an unterminated string into a string (not in libc).
>         char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into an unterminated string.
>         [[deprecated]]  // Use stpncpy(3) instead.
>         char *strncpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Concatenate an unterminated string into a string.
>         [[deprecated]]  // Use ustr2stp() instead.
>         char *strncat(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>     String structures
>         // (Null‐terminated) string structure.
>         struct str_s {
>             size_t  len;
>             char    *str;
>         };
> 
>         // Unterminated string structure (overlapping strings).
>         struct ustr_s {
>             size_t  len;
>             char    *ustr;
>         };
> 
>         // Chain‐copy a string structure into an unterminated string.
>         void *mempcpy(void *restrict dst, const void src[restrict .len],
>                        size_t len);
> 
> DESCRIPTION
>     Terms (and abbreviations)
>         string (str)
>                is a sequence of zero or more non‐null characters, followed by a
>                null byte.
> 
>         unterminated string (ustr)
>                is a sequence of zero or more  non‐null  characters.   They  are
>                sometimes  contained  in fixed‐width buffers, which usually con‐
>                tain padding null bytes after the unterminated string,  to  fill
>                the  rest  of  the  buffer  without  affecting  the unterminated
>                string; however, those padding null bytes are not  part  of  the
>                unterminated string.
> 
>         length (len)
>                is the number of non‐null characters in a string.  It is the re‐
>                turn value of strlen(str) and of strnlen(ustr, sz).
> 
>         size (sz)
>                refers to the entire buffer where the string is contained.
> 
>         end    is  the  name  of  a  pointer  to the terminating null byte of a
>                string, or a pointer to one past the last character of an unter‐
>                minated string.  This is the return value of functions that  al‐
>                low chaining.  It is equivalent to &str[len].
> 
>         past_end
>                is  the name of a pointer to one past the end of the buffer that
>                contains a string.  It is equivalent to &str[sz].  It is used as
>                a sentinel value, to be able  to  truncate  strings  instead  of
>                overrunning a buffer.
> 
>         string structure
>         unterminated string structure
>                Structure  that  contains the length of a string, as well as the
>                string or the unterminated string.
> 
>     Types of functions
>         Copy, concatenate, and chain‐copy
>                Originally, there was a distinction between functions that  copy
>                and  those that concatenate.  However, newer functions that copy
>                while allowing chaining cover both use cases with a single  API.
>                They  are  also algorithmically faster, since they don’t need to
>                search for the end of the existing string.
> 
>                To chain copy functions, they need to return a  pointer  to  the
>                end.   That’s  a  byproduct  of the copy operation, so it has no
>                performance costs.  These functions are preferred over  copy  or
>                concatenation  functions.  Functions that return such a pointer,
>                and thus can be chained, have names of the form  *stp*(),  since
>                it’s also common to name the pointer just p.
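
The difference between concatenating and chain-copying can be sketched
as follows.  greet_concat() and greet_chain() are hypothetical helpers;
both assume that the caller provides a buffer of at least
strlen(name) + 8 bytes.

```c
#define _XOPEN_SOURCE 700  /* for stpcpy(3) */
#include <stddef.h>
#include <string.h>

/* Concatenating: each strcat(3) call walks buf again to find its end. */
size_t
greet_concat(char *buf, const char *name)
{
    strcpy(buf, "Hello ");
    strcat(buf, name);
    strcat(buf, "!");
    return strlen(buf);  /* one more pass to learn the length */
}

/* Chain-copying: each call resumes at the pointer the last one returned. */
size_t
greet_chain(char *buf, const char *name)
{
    char *p = buf;

    p = stpcpy(p, "Hello ");
    p = stpcpy(p, name);
    p = stpcpy(p, "!");
    return p - buf;      /* the length falls out of the final pointer */
}
```

Both produce the same string; the chained version never rescans what it
has already written.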
> 
>         Truncate or not?
>                The  first  thing  to note is that programmers should be careful
>                with buffers, so they always have the correct size, and  trunca‐
>                tion is not necessary.
> 
>                In  most  cases, truncation is not desired, and it is simpler to
>                just do the copy.  Simpler  code  is  safer  code.   Programming
>                against  programming mistakes by adding more code just adds more
>                points where mistakes can be made.
> 
>                Nowadays, compilers can detect most programmer errors with  fea‐
>                tures    like   compiler   warnings,   static   analyzers,   and
>                _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
>                these error‐detection features be more precise.
> 
>                When validating user input, however, it makes sense to truncate.
>                Remember to check the return value of such function calls.
> 
>                Functions that truncate:
> 
>                •  stpecpy()  is  the  most  efficient string copy function that
>                   performs truncation.  It requires checking for truncation
>                   only once, after all chained calls.
> 
>                •  stpecpyx() is a variant of stpecpy() that consumes the entire
>                   source string, to catch bugs in the program by forcing a seg‐
>                   mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).
> 
>                •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
>                   are designed to crash if the input string is invalid (doesn’t
>                   contain a null byte).
> 
>                •  strscpy(9) is a function in the Linux kernel which reports an
>                   error instead of crashing.
> 
>                •  stpncpy(3) and strncpy(3) also truncate, but they don’t write
>                   strings, but rather unterminated strings.
> 
>     Unterminated strings (null‐padded fixed‐width buffers)
>         For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
>         minated strings in fixed‐width buffers.  To interface with  them,  spe‐
>         cialized functions need to be used.
> 
>         To copy strings into them, use stpncpy(3).
> 
>         To  copy from an unterminated string within a fixed‐width buffer into a
>         string, ignoring any trailing null  bytes  in  the  source  fixed‐width
>         buffer, you should use ustr2stp().
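
A small sketch of filling a null-padded fixed-width field with
stpncpy(3).  field_set() is a hypothetical helper; the return value of
stpncpy(3) is what lets it report how much of the field was used.

```c
#define _XOPEN_SOURCE 700  /* for stpncpy(3) */
#include <stddef.h>
#include <string.h>

/*
 * Store src in the fixed-width field fld[0..sz), padding the rest of
 * the field with null bytes.  Returns the number of bytes of src that
 * were stored.  A return value equal to sz means that the field is
 * full and holds no null byte, so src may have been truncated.
 */
size_t
field_set(char *fld, size_t sz, const char *src)
{
    char *end = stpncpy(fld, src, sz);

    return end - fld;
}
```

Note that a source string that fits exactly yields the same return
value as one that was cut short, so a caller that must tell them apart
needs to compare against strlen(src) as well.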
> 
>     String structures
>         The simplest string copying function is mempcpy(3).  It requires always
>         knowing  the length of your strings, for which string structures can be
>         used.  It makes the code simpler, since you always know the  length  of
>         your strings, and it’s also faster, since it doesn’t need to repeatedly
>         calculate  those  lengths.   mempcpy(3)  always creates an unterminated
>         string, so you need to explicitly set the terminating null byte.
> 
>         String structure
>                The following code can be  used  to  chain‐copy  from  a  string
>                structure into a string:
> 
>                    p = mempcpy(p, src->str, src->len);
>                    *p = '\0';
> 
>                The  following  code  can  be  used  to chain‐copy from a string
>                structure into an unterminated string:
> 
>                    p = mempcpy(p, src->str, src->len);
> 
>         Unterminated string structure (overlapping strings)
>                In programs that make considerable use of strings, and need  the
>                best  performance, using overlapping strings can make a big dif‐
>                ference.  It allows holding substrings of a bigger string  while
>                not duplicating memory nor using time to do a copy.
> 
>                However,  this is delicate, since it requires using unterminated
>                strings.  C library APIs use strings, so programs that  use  un‐
>                terminated  strings  will  have  to  take  care to differentiate
>                strings from unterminated strings.
> 
>                The following code can be used to chain‐copy  from  an  untermi‐
>                nated string structure to a string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
>                    *p = '\0';
> 
>                The  following  code  can be used to chain‐copy from an untermi‐
>                nated string structure to an unterminated string:
> 
>                    p = mempcpy(p, src->ustr, src->len);
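
Putting the snippets above together, a chain of string structures could
be joined as in the sketch below.  str_join() is a hypothetical helper,
and xmempcpy() is a portable stand-in for mempcpy(3) (a GNU extension)
so that the example builds with plain ISO C.

```c
#include <stddef.h>
#include <string.h>

/* (Null-terminated) string structure, as in the SYNOPSIS. */
struct str_s {
    size_t  len;
    char    *str;
};

/* Stand-in for mempcpy(3): copy len bytes, return one past the copy. */
static void *
xmempcpy(void *restrict dst, const void *restrict src, size_t len)
{
    return (char *) memcpy(dst, src, len) + len;
}

/*
 * Join n string structures into buf and terminate the result.
 * buf must hold the sum of the lengths, plus one byte.
 * Returns the length of the resulting string.
 */
size_t
str_join(char *buf, const struct str_s *v, size_t n)
{
    char *p = buf;

    for (size_t i = 0; i < n; i++)
        p = xmempcpy(p, v[i].str, v[i].len);
    *p = '\0';  /* the copies never write the terminating null byte */
    return p - buf;
}
```

Since only len bytes are read from each element, the same loop would
work for unterminated string structures that point into the middle of a
larger string.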
> 
>     Functions
>         stpcpy(3)
>                This function copies the input string into a destination string.
>                The programmer is responsible  for  allocating  a  buffer  large
>                enough.  It returns a pointer suitable for chaining.
> 
>         stpecpy()
>         stpecpyx()
>                These functions copy the input string into a destination string.
>                If  the destination buffer, limited by a pointer to one past the
>                end of it, isn’t large enough to hold the  copy,  the  resulting
>                string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>                nated).  They return a pointer suitable for  chaining.   Trunca‐
>                tion needs to be detected only once after the last chained call.
>                stpecpyx()  has identical semantics to stpecpy(), except that it
>                forces a SIGSEGV on Undefined Behavior.
> 
>                These functions are not provided by any library, but you can de‐
>                fine them with the following reference implementations:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpy(char *dst, char past_end[0],
>                            const char *restrict src)
>                    {
>                        char *p;
> 
>                        if (dst == past_end)
>                            return past_end;
> 
>                        p = memccpy(dst, src, '\0', past_end - dst);
>                        if (p != NULL)
>                            return p - 1;
> 
>                        /* truncation detected */
>                        past_end[-1] = '\0';
>                        return past_end;
>                    }
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpyx(char *dst, char past_end[0],
>                             const char *restrict src)
>                    {
>                        if (src[strlen(src)] != '\0')
>                            raise(SIGSEGV);
> 
>                        return stpecpy(dst, past_end, src);
>                    }
> 
>         stpncpy(3)
>                This function copies the input string into a  destination  null‐
>                padded  fixed‐width  unterminated  string.   If  the destination
>                buffer, limited by its size, isn’t  large  enough  to  hold  the
>                copy,  the  resulting  string is truncated.  Since it creates an
>                unterminated string, it doesn’t need to write a terminating null
>                byte.  It returns a pointer suitable for chaining, but it’s  not
>                ideal for that.  Truncation needs to be detected only once after
>                the last chained call.
> 
>                If  you’re going to use this function in chained calls, it would
>                probably be useful to develop a function similar to stpecpy().
> 
>         ustr2stp()
>                This function copies the input unterminated string contained  in
>                a  null‐padded fixed‐width buffer, into a destination (null‐ter‐
>                minated) string.  The programmer is responsible for allocating a
>                buffer large enough.  It returns a pointer suitable  for  chain‐
>                ing.
> 
>                A truncating version of this function doesn’t exist,  since  the
>                size  of  the original string is always known, so it wouldn’t be
>                very useful.
> 
>                This function is not provided by any library, but you can define
>                it with the following reference implementation:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    ustr2stp(char *restrict dst, const char *restrict src,
>                             size_t sz)
>                    {
>                        char  *end;
> 
>                        /* memccpy(3) returns a pointer one past the copied null byte. */
>                        end = memccpy(dst, src, '\0', sz);
>                        end = (end != NULL) ? end - 1 : dst + sz;
>                        *end = '\0';
> 
>                        return end;
>                    }
> 
>         mempcpy(3)
>                This function copies the input string, limited  by  its  length,
>                into  a  destination unterminated string.  The programmer is re‐
>                sponsible for allocating a buffer large enough.   It  returns  a
>                pointer suitable for chaining.
> 
>     Deprecated functions
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation  string.  If the destination buffer, limited by its size,
>                isn’t large enough to hold the copy,  the  resulting  string  is
>                truncated  (but  it  is guaranteed to be null‐terminated).  They
>                return the length of the total  string  they  tried  to  create.
>                These functions force a SIGSEGV on Undefined Behavior.
> 
>                stpecpyx()  is  a better replacement for these functions for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  Only requires detecting truncation once per chain of calls.
> 
>         strscpy(9)
>                Deprecated.  This function copies the input string into a desti‐
>                nation string.  If the destination buffer, limited by its  size,
>                isn’t  large  enough  to  hold the copy, the resulting string is
>                truncated (but it is guaranteed to be null‐terminated).  It  re‐
>                turns the length of the destination string, or -E2BIG on trunca‐
>                tion.
> 
>                stpecpy()  is  a  better replacement for this function, since it
>                has a much simpler interface.
> 
>         strcpy(3)
>         strcat(3)
>                Deprecated.  These functions copy the input string into a desti‐
>                nation string.  The programmer is responsible for  allocating  a
>                buffer large enough.  The return value is useless.
> 
>                strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
>                turn value.
> 
>                stpcpy(3) is a better replacement for these  functions  for  the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>         strncpy(3)
>                Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
>                the useless return value.  Due to the return  value,  with  this
>                function  it’s hard to correctly check for truncation.  Use stp‐
>                ncpy(3) instead.
> 
>         strncat(3)
>                Deprecated.  Do not confuse this function with strncpy(3);  they
>                are not related at all.
> 
>                This  function  concatenates  the input unterminated string con‐
>                tained in a null‐padded fixed‐width buffer, into  a  destination
>                (null‐terminated) string.  The programmer is responsible for al‐
>                locating a buffer large enough.  The return value is useless.
> 
>                ustr2stp()  is  a  better  replacement for this function for the
>                following reasons:
> 
>                •  Better performance (chain copy instead of concatenating).
> 
>                •  No need to call strlen(3), thanks to the useful return value.
> 
>                •  Function name that is not actively confusing.
> 
> RETURN VALUE
>         The following functions return a pointer to the terminating  null  byte
>         in the destination string (they never truncate).
> 
>         •  stpcpy(3)
> 
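>         •  ustr2stp()
> 
>         The following function returns a pointer to one after the last charac‐
>         ter in the destination unterminated string.
> 
>         •  mempcpy(3)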
> 
>         The  following  functions return a pointer to the terminating null byte
>         in the destination string, except when truncation occurs; if truncation
>         occurs, they return a pointer to one past the end  of  the  destination
>         buffer.
> 
>         •  stpecpy()
> 
>         •  stpecpyx()
> 
>         The  following function returns a pointer to one after the last charac‐
>         ter in the destination unterminated string; if truncation occurs,  that
>         pointer  is equivalent to a pointer to one past the end of the destina‐
>         tion buffer.
> 
>         •  stpncpy(3)
> 
>     Deprecated
>         The following functions return the length of the total string that they
>         tried to create (as if truncation didn’t occur).
> 
>         •  strlcpy(3bsd)
> 
>         •  strlcat(3bsd)
> 
>         The following function returns the length of the destination string, or
>         -E2BIG on truncation.
> 
>         •  strscpy(9)
> 
>         The following functions return the dst pointer, which is useless.
> 
>         •  strcpy(3)
> 
>         •  strcat(3)
> 
>         •  strncpy(3)
> 
>         •  strncat(3)
> 
> CAVEATS

And a new caveat.  I think it's obvious, but better safe than sorry.

        Don’t chain calls to truncating and non‐truncating  functions.   It  is
        conceptually  wrong  unless you know that the first part of a copy will
        always fit.  Anyway, the performance difference will probably be negli‐
        gible, so it will probably be more clear if you use  consistent  seman‐
        tics:  either  truncating  or non‐truncating.  Calling a non‐truncating
        function after a truncating one is necessarily wrong.
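
A minimal sketch of that last point, in case it helps.  stpecpy() follows
the reference implementation from the page, with past_end spelled as a
plain pointer; join_addr() and its parameters are hypothetical names made
up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Chain-copy a string with truncation (not in libc). */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* Truncation detected: terminate at the last byte of the buffer. */
    past_end[-1] = '\0';
    return past_end;
}

/* Hypothetical helper: build "<user>@<host>" in buf, which has sz > 0 bytes.
   Return false on truncation; buf holds a valid string either way. */
bool
join_addr(char *buf, size_t sz, const char *user, const char *host)
{
    char  *past_end = buf + sz;
    char  *p = buf;

    p = stpecpy(p, past_end, user);
    p = stpecpy(p, past_end, "@");
    p = stpecpy(p, past_end, host);
    /* Wrong here: p = stpcpy(p, host);
       After truncation p == past_end, so stpcpy(3) would start
       writing one past the end of the buffer. */

    return p != past_end;
}
```

With an 8-byte buffer, join_addr(buf, 8, "alx", "kernel.org") leaves
"alx@ker" in buf and reports truncation, and the check happens only once,
after the last chained call.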


>         Some of the functions described here are not provided by  any  library;
>         you should write your own copy if you want to use them.
> 
>         The  deprecated status of these functions varies from system to system.
>         This page declares as deprecated those functions that have a better re‐
>         placement documented in this same page.
> 
> EXAMPLES
>         The following are examples of correct use of each of these functions.
> 
>         stpcpy(3)
>                    p = buf;
>                    p = stpcpy(p, "Hello ");
>                    p = stpcpy(p, "world");
>                    p = stpcpy(p, "!");
>                    len = p - buf;
>                    puts(buf);
> 
>         stpecpy()
>         stpecpyx()
>                    past_end = buf + sizeof(buf);
>                    p = buf;
>                    p = stpecpy(p, past_end, "Hello ");
>                    p = stpecpy(p, past_end, "world");
>                    p = stpecpy(p, past_end, "!");
>                    if (p == past_end) {
>                        p--;
>                        goto toolong;
>                    }
>                    len = p - buf;
>                    puts(buf);
> 
>         stpncpy(3)
>                    past_end = buf + sizeof(buf);
>                    end = stpncpy(buf, "Hello world!", sizeof(buf));
>                    if (end == past_end)
>                        goto toolong;
>                    len = end - buf;
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         ustr2stp()
>                    p = buf;
>                    p = ustr2stp(p, "Hello ", 6);
>                    p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                    p = ustr2stp(p, "!", 1);
>                    len = p - buf;
>                    puts(buf);
> 
>         mempcpy(3)
>                    p = buf;
>                    p = mempcpy(p, "Hello ", 6);
>                    p = mempcpy(p, "world", 5);
>                    p = mempcpy(p, "!", 1);
>                    *p = '\0';
>                    len = p - buf;
>                    puts(buf);
> 
>     Deprecated
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                    if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    len = strlcat(buf, "!", sizeof(buf));
>                    if (len >= sizeof(buf))
>                        goto toolong;
>                    puts(buf);
> 
>         strscpy(9)
>                    len = strscpy(buf, "Hello world!", sizeof(buf));
>                    if (len == -E2BIG)
>                        goto toolong;
>                    puts(buf);
> 
>         strcpy(3)
>         strcat(3)
>                    strcpy(buf, "Hello ");
>                    strcat(buf, "world");
>                    strcat(buf, "!");
>                    len = strlen(buf);
>                    puts(buf);
> 
>         strncpy(3)
>                    strncpy(buf, "Hello world!", sizeof(buf));
>                    if (buf[sizeof(buf) - 1] != '\0')
>                        goto toolong;
>                    len = strnlen(buf, sizeof(buf));
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         strncat(3)
>                    buf[0] = '\0';  // There's no 'cpy' function to this 'cat'.
>                    strncat(buf, "Hello ", 6);
>                    strncat(buf, "world", 42);  // Padding null bytes ignored.
>                    strncat(buf, "!", 1);
>                    puts(buf);
> 
> SEE ALSO
>         memcpy(3), memccpy(3), mempcpy(3), string(3)
> 
> Linux man‐pages (unreleased)        (date)                      string_copy(7)
> 
> 
> 
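
For the unterminated-string pair, a round trip through a fixed-width field
might look like the sketch below.  ustr2stp() is written here with
strnlen(3) and memcpy(3), which is my assumption of an equivalent form
rather than the reference implementation from the page; NAME_SZ,
set_name(), and get_name() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>

#define NAME_SZ  4  /* Hypothetical width of the fixed-width field. */

/* Copy an unterminated string (at most sz bytes, padding null bytes
   ignored) into a string; return a pointer to its terminating null byte. */
char *
ustr2stp(char *restrict dst, const char *restrict src, size_t sz)
{
    size_t  len;

    len = strnlen(src, sz);
    memcpy(dst, src, len);
    dst[len] = '\0';

    return dst + len;
}

/* Store name in a null-padded fixed-width field (no terminator is
   written if the name fills the field).  Return the length stored. */
size_t
set_name(char field[NAME_SZ], const char *name)
{
    return (size_t) (stpncpy(field, name, NAME_SZ) - field);
}

/* Read the field back into out, which needs NAME_SZ + 1 bytes.
   Return the length of the resulting string. */
size_t
get_name(char *out, const char field[NAME_SZ])
{
    return (size_t) (ustr2stp(out, field, NAME_SZ) - out);
}
```

The destination of get_name() is sized from the field width, which is why
a truncating variant of ustr2stp() isn't needed here.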

-- 
<http://www.alejandro-colomar.es/>


* [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
                   ` (2 preceding siblings ...)
  2022-12-12  0:32 ` Alejandro Colomar
@ 2022-12-12 14:24 ` Alejandro Colomar
  2022-12-12 17:33   ` Alejandro Colomar
                     ` (4 more replies)
  2022-12-12 14:24 ` [PATCH 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into " Alejandro Colomar
  2022-12-12 14:24 ` [PATCH 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new " Alejandro Colomar
  5 siblings, 5 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 14:24 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Many functions that are inferior to another one, have been marked as
deprecated, notwithstanding the deprecation status in C libraries or
any standards.  Alternatives have been given in the same page, with
reference implementations.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strcpy.3 | 1053 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 965 insertions(+), 88 deletions(-)

diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 74c3180ae..661319f0d 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,48 +1,764 @@
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
+.\" SPDX-License-Identifier: BSD-3-Clause
 .\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
-strcpy \- copy a string
+stpcpy,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+strscpy,
+strcpy, strcat,
+stpncpy,
+ustr2stp,
+strncpy,
+strncat,
+mempcpy
+\- copy strings
+.\" ----- LIBRARY :: --------------------------------------------------/
 .SH LIBRARY
+.TP
+.BR stpcpy (3)
+.TQ
+.BR stpncpy (3)
+.TQ
+.BR mempcpy (3)
+.TQ
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
 Standard C library
 .RI ( libc ", " \-lc )
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+Not provided by any library.
+.TP
+.BR strlcpy "(3), \c"
+.BR strlcat (3)
+Utility functions from BSD systems
+.RI ( libbsd ", " \-lbsd )
+.TP
+.BR strscpy (9)
+Not provided by any library.
+It is a Linux kernel internal function.
+.\" ----- SYNOPSIS :: -------------------------------------------------/
 .SH SYNOPSIS
+.\" ----- SYNOPSIS :: (Null-terminated) strings :: --------------------/
 .nf
 .B #include <string.h>
+.fi
+.SS (Null-terminated) strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+// Chain-copy a string with truncation.
+// Not defined in libc.
+.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on invalid input.
+// Not defined in libc.
+.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Copy a string with truncation and SIGSEGV on invalid input.
+.BR [[deprecated]] "  // Use stpecpyx(3) instead."
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Concatenate a string with truncation.
+.BR [[deprecated]] "  // Use stpecpyx(3) instead."
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Copy a string with truncation.
+// Not defined in libc.
+.BR [[deprecated]] "  // Use stpecpy(3) instead."
+.BI "ssize_t strscpy(char " dst "[restrict ." sz "], \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Copy a string.
+.BR [[deprecated]] "  // Use stpcpy(3) instead."
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.PP
+// Concatenate a string.
+.BR [[deprecated]] "  // Use stpcpy(3) instead."
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.fi
+.\" ----- SYNOPSIS :: Unterminated strings (null-padded fixed-width buffers)
+.SS Unterminated strings (null-padded fixed-width buffers)
+.nf
+// Zero a fixed-width buffer, and
+// copy a string with truncation into an unterminated string.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy an unterminated string into a string.
+// Not defined in libc.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string with truncation into an unterminated string.
+.BR [[deprecated]] "  // Use stpncpy(3) instead."
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Concatenate an unterminated string into a string.
+.BR [[deprecated]] "  // Use ustr2stp(3) instead."
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: String structures :: ----------------------------/
+.SS String structures
+.nf
+// (Null-terminated) string structure.
+// Not defined in libc.
+.B struct str_s {
+.B "    size_t  len;"
+.B "    char    *str;"
+.B };
+.PP
+// Unterminated string structure (overlapping strings).
+// Not defined in libc.
+.B struct ustr_s {
+.B "    size_t  len;"
+.B "    char    *ustr;"
+.B };
+.PP
+// Chain-copy a string structure into an unterminated string.
+.BI "void *mempcpy(void *restrict " dst ", \
+const void " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy (3),
+.BR stpncpy (3):
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
+.fi
+.PP
+.BR mempcpy (3):
+.nf
+    _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
-.BR strcpy ()
-function copies the string pointed to by
-.IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters, followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: unterminated string (ustr)
+.TP
+.IR "unterminated string " ( ustr )
+is a sequence of zero or more non-null characters.
+They are sometimes contained in fixed-width buffers,
+which usually contain padding null bytes after the unterminated string,
+to fill the rest of the buffer
+without affecting the unterminated string;
+however, those padding null bytes are not part of the unterminated string.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer where the string is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to the terminating null byte of a string,
+or a pointer to one past the last character of an unterminated string.
+This is the return value of functions that allow chaining.
+It is equivalent to
+.IR &str[len] .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
+.TP
+.I past_end
+is the name of a pointer to one past the end of the buffer
+that contains a string.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings instead of overrunning a buffer.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string structures
+.TP
+.I string structure
+.TQ
+.I unterminated string structure
+Structure that contains the length of a string,
+as well as the string or the unterminated string.
+.\" ----- DESCRIPTION :: Types of functions :: ------------------------/
+.SS Types of functions
+.\" ----- DESCRIPTION :: Types of functions :: Copy, concatenate, and chain-copy
+.TP
+Copy, concatenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that concatenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for the end of the existing string.
+.IP
+To chain copy functions,
+they need to return a pointer to the
+.IR end .
+That's a byproduct of the copy operation,
+so it has no performance costs.
+These functions are preferred over copy or concatenation functions.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *(),
+since it's also common to name the pointer just
+.IR p .
+.IP
+Chain-copying functions that truncate
+should accept a pointer to one past the end of the destination buffer.
+This allows not having to recalculate the remaining size after each call.
+.\" ----- DESCRIPTION :: Types of functions :: Truncate or not? -------/
+.TP
+Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.IP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.IP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these error-detection features be more precise.
+.IP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.IP
+Functions that truncate:
+.RS
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It requires checking for truncation only once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a null byte).
+.IP \(bu
+.BR strscpy (9)
+reports an error instead of crashing (similar to
+.BR stpecpy (3)).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they don't write strings,
+but rather unterminated strings.
+.RE
+.\" ----- DESCRIPTION :: Unterminated strings :: ----------------------/
+.SS Unterminated strings (null-padded fixed-width buffers)
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use unterminated strings in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from an unterminated string within a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR ustr2stp (3).
+.\" ----- DESCRIPTION :: String structures :: -------------------------/
+.SS String structures
+The simplest string copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your strings,
+for which string structures can be used.
+It makes the code simpler,
+since you always know the length of your strings,
+and it's also faster,
+since it doesn't need to repeatedly calculate those lengths.
+.BR mempcpy (3)
+always creates an unterminated string,
+so you need to explicitly set the terminating null byte.
+.PP
+.\" ----- DESCRIPTION :: String structures :: String structure --------/
+.TP
+String structure
+The following code can be used to
+chain-copy from a string structure into a string:
+.IP
+.in +4n
+.EX
+p = mempcpy(p, src\->str, src\->len);
+*p = \(aq\e0\(aq;
+.EE
+.in
+.IP
+The following code can be used to
+chain-copy from a string structure into an unterminated string:
+.IP
+.in +4n
+.EX
+p = mempcpy(p, src\->str, src\->len);
+.EE
+.in
+.\" ----- DESCRIPTION :: String structures :: Unterminated string structure
+.TP
+Unterminated string structure (overlapping strings)
+In programs that make considerable use of strings,
+and need the best performance,
+using overlapping strings can make a big difference.
+It allows holding substrings of a bigger string
+while not duplicating memory
+nor using time to do a copy.
+.IP
+However, this is delicate,
+since it requires using unterminated strings.
+C library APIs use strings,
+so programs that use unterminated strings
+will have to take care to differentiate strings from unterminated strings.
+.IP
+The following code can be used to
+chain-copy from an unterminated string structure to a string:
+.IP
+.in +4n
+.EX
+p = mempcpy(p, src\->ustr, src\->len);
+*p = \(aq\e0\(aq;
+.EE
+.in
+.IP
+The following code can be used to
+chain-copy from an unterminated string structure to an unterminated string:
+.IP
+.in +4n
+.EX
+p = mempcpy(p, src\->ustr, src\->len);
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A simple implementation of
+.BR stpcpy (3)
+might be:
+.IP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    return (char *) mempcpy(dst, src, strlen(src) + 1) \- 1;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to one past the end of it,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV on Undefined Behavior.
+.IP
+These functions are not provided by any library,
+but you can define them with the following reference implementations:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+stpecpy(char *dst, char past_end[0],
+        const char *restrict src)
+{
+    char *p;
+
+    if (dst == past_end)
+        return past_end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    past_end[\-1] = \(aq\e0\(aq;
+    return past_end;
+}
+
+/* This code is in the public domain. */
+char *
+stpecpyx(char *dst, char past_end[0],
+         const char *restrict src)
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, past_end, src);
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded fixed-width unterminated string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated.
+Since it creates an unterminated string,
+it doesn't need to write a terminating null byte.
+It returns a pointer suitable for chaining,
+but it's not ideal for that.
+Truncation needs to be detected only once after the last chained call.
+.IP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function
+that accepts a pointer to one past the end of the buffer instead of a size.
+.IP
+A simple implementation of
+.BR stpncpy (3)
+might be:
+.IP
+.in +4n
+.EX
+char *
+stpncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    char  *p;
+
+    bzero(dst, sz);
+    p = memccpy(dst, src, \(aq\e0\(aq, sz);
+    if (p == NULL)
+        return dst + sz;
+
+    return p \- 1;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input unterminated string
+contained in a null-padded fixed-width buffer,
+into a destination (null-terminated) string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original string is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library,
+but you can define it with the following reference implementation:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+ustr2stp(char *restrict dst, const char *restrict src,
+         size_t sz)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strnlen(src, sz));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: mempcpy(3) ----------------------/
+.TP
+.BR mempcpy (3)
+This function copies the input string,
+limited by its length,
+into a destination unterminated string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A simple implementation of
+.BR mempcpy (3)
+might be:
+.IP
+.in +4n
+.EX
+void *
+mempcpy(void *restrict dst, const void *restrict src,
+        size_t len)
+{
+    return (char *) memcpy(dst, src, len) + len;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Deprecated functions :: ----------------------/
+.SS Deprecated functions
+.\" ----- DESCRIPTION :: Deprecated functions :: strlcpy(3bsd), strlcat(3bsd)
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.IR Deprecated .
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV on Undefined Behavior.
+.IP
+.BR stpecpyx (3)
+is a better replacement for these functions.
+.\" ----- DESCRIPTION :: Deprecated functions :: strscpy(9) -----------/
+.TP
+.BR strscpy (9)
+.IR Deprecated .
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a better replacement for this function.
+.RE
+.\" ----- DESCRIPTION :: Deprecated functions :: strcpy(3), strcat(3) -/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.IR Deprecated .
+These functions copy the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a better replacement for these functions.
+.IP
+A simple implementation of
+.BR strcpy (3)
+and
+.BR strcat (3)
+might be:
+.IP
+.in +4n
+.EX
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Deprecated functions :: strncpy(3) -----------/
+.TP
+.BR strncpy (3)
+.IR Deprecated .
+.BR strncpy (3)
+is identical to
+.BR stpncpy (3)
+except for the useless return value.
+Due to the return value,
+with this function it's hard to correctly check for truncation.
+.IP
+.BR stpncpy (3)
+is a better replacement for this function.
+.IP
+A simple implementation of
+.BR strncpy (3)
+might be:
+.IP
+.in +4n
+.EX
+char *
+strncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Deprecated functions :: strncat(3) -----------/
+.TP
+.BR strncat (3)
+.IR Deprecated .
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function concatenates the input unterminated string
+contained in a null-padded fixed-width buffer,
+into a destination (null-terminated) string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR ustr2stp (3)
+is a better replacement for this function.
+.IP
+A simple implementation of
+.BR strncat (3)
+might be:
+.IP
+.in +4n
+.EX
+char *
+strncat(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    ustr2stp(dst + strlen(dst), src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- RETURN VALUE :: ---------------------------------------------/
 .SH RETURN VALUE
-The
-.BR strcpy ()
-function returns a pointer to
-the destination string
-.IR dest .
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.PD 0
+.IP \(bu 3
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination unterminated string;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination unterminated string.
+.IP \(bu 3
+.BR mempcpy (3)
+.\" ----- RETURN VALUE :: Deprecated ----------------------------------/
+.SS Deprecated
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following function returns
+the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP \(bu 3
+.BR strscpy (9)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.PD 0
+.IP \(bu 3
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- ATTRIBUTES :: -----------------------------------------------/
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -54,73 +770,234 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR stpecpy (),
+.BR stpecpyx (),
+.BR strlcpy (),
+.BR strlcat (),
+.BR strscpy (),
+.BR strcpy (),
+.BR strcat (),
+.BR stpncpy (),
+.BR ustr2stp (),
+.BR strncpy (),
+.BR strncat (),
+.BR mempcpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
+.\" ----- STANDARDS :: ------------------------------------------------/
 .SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.TP
+.BR stpcpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was not part of
+.\" the C or POSIX.1 standards, nor customary on UNIX systems.
+.\" It first appeared at least as early as 1986,
+.\" in the Lattice C AmigaDOS compiler,
+.\" then in the GNU fileutils and GNU textutils in 1989,
+.\" and in the GNU C library by 1992.
+.\" It is also present on the BSDs.
+.TQ
+.BR stpncpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+POSIX.1-2008.
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.TQ
+.BR ustr2stp (3)
+Not defined by any standard or library.
+.TP
+.BR mempcpy (3)
+This function is a GNU extension.
+.TP
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+These functions originated in OpenBSD and are present in some Unix systems.
+.TP
+.BR strscpy (9)
+Linux kernel internal function.
+.TP
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+The performance difference will probably be negligible anyway,
+so the code will probably be clearer if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
 .PP
+Some of the functions described here are not provided by any library;
+you should write your own copy if you want to use them.
+See STANDARDS.
+.PP
+The deprecation status of these functions varies from system to system.
+This page declares as deprecated
+those functions that have a better replacement documented in this same page.
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
 .in +4n
 .EX
-size_t strlcpy(char *dest, const char *src, size_t size);
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
 .EE
 .in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, past_end, "Hello ");
+p = stpecpy(p, past_end, "world");
+p = stpecpy(p, past_end, "!");
+if (p == past_end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+end = stpncpy(buf, "Hello world!", sizeof(buf));
+if (end == past_end)
+    goto toolong;
+len = end \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.in +4n
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: mempcpy(3) --------------------------------------/
+.TP
+.BR mempcpy (3)
+.in +4n
+.EX
+p = buf;
+p = mempcpy(p, "Hello ", 6);
+p = mempcpy(p, "world", 5);
+p = mempcpy(p, "!", 1);
+*p = \(aq\e0\(aq;
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: Deprecated :: -----------------------------------/
+.SS Deprecated
+.\" ----- EXAMPLES :: Deprecated :: strlcpy(3bsd), strlcat(3bsd) ------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.in +4n
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: Deprecated :: strscpy(9) ------------------------/
+.TP
+.BR strscpy (9)
+.in +4n
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: Deprecated :: strcpy(3), strcat(3) --------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.in +4n
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: Deprecated :: strncpy(3) ------------------------/
+.TP
+.BR strncpy (3)
+.in +4n
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (buf[sizeof(buf) \- 1] != \(aq\e0\(aq)
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: Deprecated :: strncat(3) ------------------------/
+.TP
+.BR strncat (3)
+.in +4n
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
+.BR bzero (3),
 .BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
-.BR strdup (3),
-.BR string (3),
-.BR wcscpy (3)
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR string (3)
-- 
2.38.1
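
Since stpecpy() is not provided by any library (see STANDARDS and CAVEATS
in the patch above), anyone who wants to try its EXAMPLES entry needs a
definition first.  The following is a minimal sketch, not the author's
implementation.  It assumes the semantics implied by that example: the
result is always null-terminated, and past_end is returned on truncation.
It also declares past_end as a plain char * instead of char past_end[0].
build_greeting() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of stpecpy(): copy src to dst without ever writing
 * at or beyond past_end.  Returns a pointer to the terminating null byte,
 * or past_end if the copy was truncated (dst is still terminated).
 */
char *
stpecpy(char *dst, char *past_end, const char *src)
{
    assert(dst != NULL && past_end != NULL && src != NULL);

    if (dst == past_end)        /* An earlier call in the chain truncated. */
        return past_end;

    while (dst < past_end) {
        *dst = *src++;
        if (*dst == '\0')
            return dst;         /* Points at the terminating null byte. */
        dst++;
    }

    past_end[-1] = '\0';        /* Out of room: terminate and report. */
    return past_end;
}

/* Mirrors the EXAMPLES entry; returns the length, or -1 if truncated. */
ptrdiff_t
build_greeting(char *buf, size_t size)
{
    char *past_end = buf + size;
    char *p = buf;

    p = stpecpy(p, past_end, "Hello ");
    p = stpecpy(p, past_end, "world");
    p = stpecpy(p, past_end, "!");
    if (p == past_end)
        return -1;

    return p - buf;
}
```

Once a call in the chain truncates, every later call returns past_end
unchanged, so a single check after the last call is enough.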



* [PATCH 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into links to strcpy(3)
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
                   ` (3 preceding siblings ...)
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
@ 2022-12-12 14:24 ` Alejandro Colomar
  2022-12-12 14:24 ` [PATCH 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new " Alejandro Colomar
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 14:24 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpcpy.3  | 115 +--------------------------------
 man3/stpncpy.3 | 123 +----------------------------------
 man3/strcat.3  | 161 +--------------------------------------------
 man3/strncat.3 | 172 +------------------------------------------------
 man3/strncpy.3 | 130 +------------------------------------
 5 files changed, 5 insertions(+), 696 deletions(-)

diff --git a/man3/stpcpy.3 b/man3/stpcpy.3
index 5770790fc..ff7476a84 100644
--- a/man3/stpcpy.3
+++ b/man3/stpcpy.3
@@ -1,114 +1 @@
-.\" Copyright 1995 James R. Van Zandt <jrv@vanzandt.mv.com>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.TH stpcpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-stpcpy \- copy a string returning a pointer to its end
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *stpcpy(char *restrict " dest ", const char *restrict " src );
-.fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpcpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
-.SH DESCRIPTION
-The
-.BR stpcpy ()
-function copies the string pointed to by
-.I src
-(including the terminating null byte (\(aq\e0\(aq)) to the array pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.SH RETURN VALUE
-.BR stpcpy ()
-returns a pointer to the
-.B end
-of the string
-.I dest
-(that is, the address of the terminating null byte)
-rather than the beginning.
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR stpcpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was not part of
-the C or POSIX.1 standards, nor customary on UNIX systems.
-It first appeared at least as early as 1986,
-in the Lattice C AmigaDOS compiler,
-then in the GNU fileutils and GNU textutils in 1989,
-and in the GNU C library by 1992.
-It is also present on the BSDs.
-.SH BUGS
-This function may overrun the buffer
-.IR dest .
-.SH EXAMPLES
-For example, this program uses
-.BR stpcpy ()
-to concatenate
-.B foo
-and
-.B bar
-to produce
-.BR foobar ,
-which it then prints.
-.PP
-.\" SRC BEGIN (stpcpy.c)
-.EX
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <string.h>
-
-int
-main(void)
-{
-    char buffer[20];
-    char *to = buffer;
-
-    to = stpcpy(to, "foo");
-    to = stpcpy(to, "bar");
-    printf("%s\en", buffer);
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR memmove (3),
-.BR stpncpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR wcpcpy (3)
+.so man3/strcpy.3
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 0a62e3055..ff7476a84 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -1,122 +1 @@
-.\" Copyright (c) Bruno Haible <haible@clisp.cons.org>
-.\" Copyright (c) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
-.\"
-.\" References consulted:
-.\"   GNU glibc-2 source code and manual
-.\"
-.\" Corrected, aeb, 990824
-.TH stpncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-stpncpy \- copy string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *stpncpy(char " dest "[restrict ." n "], \
-const char " src "[restrict ." n ],
-.BI "              size_t " n );
-.fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpncpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
-.SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-.PP
-The
-.BR stpncpy ()
-function copies at most
-.I n
-characters of
-.I src
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null character among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-stpncpy(char *dest, const char *src, size_t n)
-{
-    char  *p
-
-    bzero(dest, n);
-    p = memccpy(dest, src, \(aq\e0\(aq, n);
-    if (p == NULL)
-        return dest + n;
-
-    return p - 1;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-.SH RETURN VALUE
-.BR stpncpy ()
-returns a pointer to the terminating null byte
-in
-.IR dest ,
-or, if
-.I dest
-is not null-terminated,
-.IR dest + n
-(that is, a pointer to one-past-the-end of the array).
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR stpncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was a GNU extension.
-It first appeared in glibc 1.07 in 1993.
-.SH SEE ALSO
-.BR strlcpy (3bsd)
-.BR wcpncpy (3)
+.so man3/strcpy.3
diff --git a/man3/strcat.3 b/man3/strcat.3
index 277e5b1e4..ff7476a84 100644
--- a/man3/strcat.3
+++ b/man3/strcat.3
@@ -1,160 +1 @@
-.\" Copyright 1993 David Metcalfe (david@prism.demon.co.uk)
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:11:47 1993 by Rik Faith (faith@cs.unc.edu)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncat().
-.TH strcat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strcat \- concatenate two strings
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
-.fi
-.SH DESCRIPTION
-The
-.BR strcat ()
-function appends the
-.I src
-string to the
-.I dest
-string,
-overwriting the terminating null byte (\(aq\e0\(aq) at the end of
-.IR dest ,
-and then adds a terminating null byte.
-The strings may not overlap, and the
-.I dest
-string must have
-enough space for the result.
-If
-.I dest
-is not large enough, program behavior is unpredictable;
-.IR "buffer overruns are a favorite avenue for attacking secure programs" .
-.SH RETURN VALUE
-The
-.BR strcat ()
-function returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strcat (),
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-Some systems (the BSDs, Solaris, and others) provide the following function:
-.PP
-.in +4n
-.EX
-size_t strlcat(char *dest, const char *src, size_t size);
-.EE
-.in
-.PP
-This function appends the null-terminated string
-.I src
-to the string
-.IR dest ,
-copying at most
-.I size\-strlen(dest)\-1
-from
-.IR src ,
-and adds a terminating null byte to the result,
-.I unless
-.I size
-is less than
-.IR strlen(dest) .
-This function fixes the buffer overrun problem of
-.BR strcat (),
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The function returns the length of the string
-.BR strlcat ()
-tried to create; if the return value is greater than or equal to
-.IR size ,
-data loss occurred.
-If data loss matters, the caller
-.I must
-either check the arguments before the call, or test the function return value.
-.BR strlcat ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.\"
-.SH EXAMPLES
-Because
-.BR strcat ()
-must find the null byte that terminates the string
-.I dest
-using a search that starts at the beginning of the string,
-the execution time of this function
-scales according to the length of the string
-.IR dest .
-This can be demonstrated by running the program below.
-(If the goal is to concatenate many strings to one target,
-then manually copying the bytes from each source string
-while maintaining a pointer to the end of the target string
-will provide better performance.)
-.\"
-.SS Program source
-\&
-.\" SRC BEGIN (strcat.c)
-.EX
-#include <stdint.h>
-#include <stdio.h>
-#include <string.h>
-#include <time.h>
-
-int
-main(void)
-{
-#define LIM 4000000
-    char p[LIM + 1];    /* +1 for terminating null byte */
-    time_t base;
-
-    base = time(NULL);
-    p[0] = \(aq\e0\(aq;
-
-    for (unsigned int j = 0; j < LIM; j++) {
-        if ((j % 10000) == 0)
-            printf("%u %jd\en", j, (intmax_t) (time(NULL) \- base));
-        strcat(p, "a");
-    }
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR strlcat (3bsd),
-.BR wcscat (3),
-.BR wcsncat (3)
+.so man3/strcpy.3
diff --git a/man3/strncat.3 b/man3/strncat.3
index 6e4bf6d78..ff7476a84 100644
--- a/man3/strncat.3
+++ b/man3/strncat.3
@@ -1,171 +1 @@
-.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.TH strncat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncat \- concatenate an unterminated string into a string
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strncat(char " dest "[restrict strlen(." dest ") + ." n " + 1],"
-.BI "              const char " src "[restrict ." n ],
-.BI "              size_t " n );
-.fi
-.SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string concatenation with truncation, see
-.BR strlcat (3bsd).
-For copying or concatenating a string into a fixed-length buffer
-with zeroing of the rest, see
-.BR stpncpy (3).
-.PP
-.BR strncat ()
-appends at most
-.I n
-characters of
-.I src
-to the end of
-.IR dst .
-It always terminates with a null character the string placed in
-.IR dest .
-.PP
-An implementation of
-.BR strncat ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncat(char *dest, const char *src, size_t n)
-{
-    char    *cat;
-    size_t  len;
-
-    cat = dest + strlen(dest);
-    len = strnlen(src, n);
-    memcpy(cat, src, len);
-    cat[len] = \(aq\e0\(aq;
-
-    return dest;
-}
-.EE
-.in
-.SH RETURN VALUE
-.BR strncat ()
-returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS ustr2stpe()
-You may want to write your own function similar to
-.BR strncpy (),
-with the following improvements:
-.IP \(bu 3
-Copy, instead of concatenating.
-There's no equivalent of
-.BR strncat ()
-that copies instead of concatenating.
-.IP \(bu
-Allow chaining the function,
-by returning a suitable pointer.
-Copy chaining is faster than concatenating.
-.IP \(bu
-Don't check for null characters in the middle of the unterminated string.
-If the string is terminated, this function should not be used.
-If the string is unterminated, it is unnecessary.
-.IP \(bu
-A name that tells what it does:
-Copy from an
-.IR u nterminated
-.IR str ing
-to a
-.IR st ring,
-and return a
-.IR p ointer
-to its end.
-.PP
-.in +4n
-.EX
-/* This code is in the public domain.
- *
- * char *ustr2stp(char dst[restrict .n+1],
- *                const char src[restrict .n],
- *                size_t len);
- */
-char *
-ustr2stp(char *restrict dst, const char *restrict src, size_t len)
-{
-    memcpy(dst, src, len);
-    dst[len] = \(aq\e0\(aq;
-
-    return dst + len;
-}
-.EE
-.in
-.SH CAVEATS
-This function doesn't know the size of the destination buffer,
-so it can overrun the buffer if the programmer wasn't careful enough.
-.SH BUGS
-.BR strncat (3)
-has a misleading name;
-it has no relationship with
-.BR strncpy (3).
-.SH EXAMPLES
-The following program creates a string
-from a concatenation of unterminated strings.
-.\" SRC BEGIN (strncpy.c)
-.EX
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#define nitems(arr)  (sizeof((arr)) / sizeof((arr)[0]))
-
-int
-main(void)
-{
-    char pre[4] = "pre.";
-    char *post = ".post";
-    char *src = "some_long_body.post";
-    char dest[100];
-
-    dest[0] = \(aq\e0\(aq;
-    strncat(dest, pre, nitems(pre));
-    strncat(dest, src, strlen(src) \- strlen(post));
-
-    puts(dest);  // "pre.some_long_body"
-    exit(EXIT_SUCCESS);
-}
-.EE
-.\" SRC END
-.in
-.SH SEE ALSO
-.BR memccpy (3),
-.BR memcpy (3),
-.BR mempcpy (3),
-.BR strcpy (3),
-.BR string (3)
+.so man3/strcpy.3
diff --git a/man3/strncpy.3 b/man3/strncpy.3
index e2ffc683f..ff7476a84 100644
--- a/man3/strncpy.3
+++ b/man3/strncpy.3
@@ -1,129 +1 @@
-.\" Copyright (C) 1993 David Metcalfe <david@prism.demon.co.uk>
-.\" Copyright (C) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
-.TH strncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncpy \- copy a string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "[[deprecated]] char *strncpy(char " dest "[restrict ." n ],
-.BI "                             const char " src "[restrict ." n "], \
-size_t " n );
-.fi
-.SH DESCRIPTION
-.BI Note: " This is not the function you want to use."
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-For copying a string into a fixed-length buffer with zeroing of the rest,
-see
-.BR stpncpy (3).
-.PP
-.BR strncpy ()
-copies at most
-.I n
-bytes of
-.IR src ,
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null byte
-among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncpy(char *dest, const char *src, size_t n)
-{
-    bzero(dest, n);
-    memccpy(dest, src, \(aq\e0\(aq, n);
-
-    return dest;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-But
-.BR stpncpy (3)
-is better for this purpose,
-since it detects truncation.
-See BUGS below.
-.SH RETURN VALUE
-The
-.BR strncpy ()
-function returns a pointer to
-the destination buffer
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH BUGS
-.BR strncpy ()
-has a misleading name.
-It doesn't produce a (null-terminated) string;
-and it should never be used for producing a string.
-.PP
-It can't detect truncation.
-It's probably better to explicitly call
-.BR bzero (3)
-and
-.BR memccpy (3),
-or
-.BR stpncpy (3)
-since they allow detecting truncation.
-.SH SEE ALSO
-.BR bzero (3),
-.BR memccpy (3),
-.BR stpncpy (3),
-.BR string (3),
-.BR wcsncpy (3)
+.so man3/strcpy.3
-- 
2.38.1
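
The ustr2stp() listing removed from strncat.3 above copies exactly len
bytes with memcpy(), whereas the EXAMPLES entry in patch 1/3 passes a size
larger than the source ("world", 42) and comments that padding null bytes
are ignored.  A minimal sketch that is consistent with that comment could
look as follows.  It is an assumption about the intended behavior, not the
definition from the new page: it stops at the first null byte or after sz
bytes, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy at most sz bytes of a possibly unterminated
 * src into dst, stop early at a null byte, and always terminate dst.
 * dst must have room for sz + 1 bytes.  Returns a pointer to the
 * terminating null byte, so that calls can be chained.
 */
char *
ustr2stp(char *dst, const char *src, size_t sz)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < sz && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return dst + i;
}
```

Like the removed listing, this version does not know the size of the
destination buffer, so the caller is still responsible for making it
large enough.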



* [PATCH 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new links to strcpy(3)
  2022-12-11 23:59 string_copy(7): New manual page documenting string copying functions Alejandro Colomar
                   ` (4 preceding siblings ...)
  2022-12-12 14:24 ` [PATCH 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into " Alejandro Colomar
@ 2022-12-12 14:24 ` Alejandro Colomar
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 14:24 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpecpy.3  | 1 +
 man3/stpecpyx.3 | 1 +
 man3/strlcat.3  | 1 +
 man3/strlcpy.3  | 1 +
 man3/strscpy.3  | 1 +
 5 files changed, 5 insertions(+)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/strlcat.3
 create mode 100644 man3/strlcpy.3
 create mode 100644 man3/strscpy.3

diff --git a/man3/stpecpy.3 b/man3/stpecpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/stpecpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/stpecpyx.3 b/man3/stpecpyx.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/stpecpyx.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strlcat.3 b/man3/strlcat.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strlcat.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strlcpy.3 b/man3/strlcpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strlcpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strscpy.3 b/man3/strscpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strscpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
-- 
2.38.1



* Re: [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
@ 2022-12-12 17:33   ` Alejandro Colomar
  2022-12-12 18:38     ` groff man(7) extensions (was: [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions) G. Branden Robinson
  2022-12-12 23:00   ` [PATCH v2 0/3] Rewrite strcpy(3) Alejandro Colomar
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 17:33 UTC (permalink / raw)
  To: G. Branden Robinson, groff; +Cc: linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1510 bytes --]

Hi Branden,

On 12/12/22 15:24, Alejandro Colomar wrote:
> +.\" ----- RETURN VALUE :: Deprecated ----------------------------------/
> +.SS Deprecated
> +The following functions return
> +the length of the total string that they tried to create
> +(as if truncation didn't occur).
> +.IP \(bu 3
> +.BR strlcpy (3bsd),
> +.BR strlcat (3bsd)
> +.PP
> +The following function returns
> +the length of the destination string, or
> +.B \-E2BIG
> +on truncation.
> +.IP \(bu 3
> +.BR strscpy (9)
> +.PP
> +The following functions return the
> +.I dst
> +pointer,
> +which is useless.
> +.PD 0
> +.IP \(bu 3
> +.BR strcpy (3),
> +.BR strcat (3)
> +.IP \(bu
> +.BR strncpy (3)
> +.IP \(bu
> +.BR strncat (3)
> +.PD

I realized that the above doesn't produce exactly what I wanted.  I wanted this:

        The following functions return the dst pointer, which is useless.

        •  strcpy(3), strcat(3)
        •  strncpy(3)
        •  strncat(3)

But I got this:

        The following functions return the dst pointer, which is useless.
        •  strcpy(3), strcat(3)
        •  strncpy(3)
        •  strncat(3)

I see various possible solutions, but which would you recommend?

I've thought of:

[
[...]
.PP
.PD 0
.IP \(bu 3
[...]
]

or

[
[...]
.IP \(bu 3
.PD 0
[...]
]

I was thinking about your future (I hope) .LS and .LE, and how they would also 
fit in here.

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* groff man(7) extensions (was: [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions)
  2022-12-12 17:33   ` Alejandro Colomar
@ 2022-12-12 18:38     ` G. Branden Robinson
  2022-12-13 15:45       ` a Q quotation macro for man(7) (was: groff man(7) extensions) G. Branden Robinson
  0 siblings, 1 reply; 53+ messages in thread
From: G. Branden Robinson @ 2022-12-12 18:38 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: groff, linux-man

[-- Attachment #1: Type: text/plain, Size: 5681 bytes --]

Hi Alex,

At 2022-12-12T18:33:52+0100, Alejandro Colomar wrote:
> On 12/12/22 15:24, Alejandro Colomar wrote:
> > +.\" ----- RETURN VALUE :: Deprecated ----------------------------------/
> > +.SS Deprecated
> > +The following functions return
> > +the length of the total string that they tried to create
> > +(as if truncation didn't occur).
> > +.IP \(bu 3
> > +.BR strlcpy (3bsd),
> > +.BR strlcat (3bsd)
> > +.PP
> > +The following function returns
> > +the length of the destination string, or
> > +.B \-E2BIG
> > +on truncation.
> > +.IP \(bu 3
> > +.BR strscpy (9)
> > +.PP
> > +The following functions return the
> > +.I dst
> > +pointer,
> > +which is useless.
> > +.PD 0
> > +.IP \(bu 3
> > +.BR strcpy (3),
> > +.BR strcat (3)
> > +.IP \(bu
> > +.BR strncpy (3)
> > +.IP \(bu
> > +.BR strncat (3)
> > +.PD
> 
> I realized that the above doesn't produce exactly what I wanted.  I
> wanted this:
> 
>        The following functions return the dst pointer, which is useless.
> 
>        •  strcpy(3), strcat(3)
>        •  strncpy(3)
>        •  strncat(3)
> 
> But I got this:
> 
>        The following functions return the dst pointer, which is useless.
>        •  strcpy(3), strcat(3)
>        •  strncpy(3)
>        •  strncat(3)
> 
> I see various possible solutions, but which would you recommend?
> 
> I've thought of:
> 
> [
> [...]
> .PP
> .PD 0
> .IP \(bu 3
> [...]
> ]
> 
> or
> 
> [
> [...]
> .IP \(bu 3
> .PD 0
> [...]
> ]
> 
> I was thinking about your future (I hope) .LS and .LE, and how they
> would also fit in here.

Either is fine; if it were me, after threatening another radical
innovation, I would probably go with the latter, using ".PD 0" _after_
the first `IP` macro.  The hazard there is that if you re-order the
list, you might move the ".PD 0" with it accidentally.  Your earlier
approach avoids that at the cost of a _seemingly_ useless `PP` call.

Paragraphing macros in man(7) are not enclosures; they are spot
marks.[1]  This is an impedance mismatch with the brains of people who
grew up on HTML/XML.

Also, you don't need to keep restating the indentation amount ("3").

  Horizontal and vertical spacing
    The indentation argument accepted by .RS, .IP, .TP, and the
    deprecated .HP is a number plus an optional scaling unit.  If no
    scaling unit is given, the man package assumes "n".  An indentation
    specified in a call to .IP, .TP, or the deprecated .HP persists
    until (1) another of these macros is called with an explicit
    indentation argument, or (2) .SH, .SS, or .P or its synonyms is
    called; these clear the indentation entirely. [...]

(ms(7) works this way, too, though its macro repertoire differs a
bit.[2])

I haven't given much more thought to `LS` and `LE`.  I haven't soured on
them; I simply have more urgent fish to fry.  The possibility of having
`LS` accept an argument to set the paragraph indentation so that `IP` or
`TP` items can be rearranged freely within has occurred to me.  So has
making the inter-paragraph distance itself an argument (possibly just a
Boolean).  So has support for auto-enumerated lists.  But then I wonder
if man(7) authors really need a macro that is as tricked-out as
mdoc(7)'s list macros, which take up about 5 of its 31 U.S. letter-sized
pages of documentation.  That's heavy.

Here's a list of man(7) extensions to which I have given consideration.

	KS/KE	Keeps.  Easy.[3]  Harmlessly ignorable by other
		implementations.
	LS/LE	List enclosure.  Throws a semantic hint (e.g., for HTML
		output) and eliminates final use case of `PD` macro.[4]
	DC/TG	Semantics at last.  Sure to rouse anger in people who
		decided long ago that man(7) can't do this.[5]  Having
		looked more closely at mdoc(7) since writing that, I
		think `DC` should accept a _pair_ of arguments as its
		second and third parameters for bracketing purposes.
		But again, most man page authors would never need to
		mess with `DC` at all.

`DS`/`DE` have been squatted on by groff man(7) for 13 years and have
precedent going back at least to DEC Ultrix, but apart from using them
as a sort of ersatz tbl(1) for people who don't want to use
tbl(1),[6] I haven't been able to come up with any use cases for them.

Regards,
Branden

[1] For the curious, all the paragraphing macros in groff man(7) call
    the same common macro.  (They all perform additional operations.)

.\" Break a paragraph.  Restore defaults, except for indentation.
.de an-break-paragraph
.  ft R
.  ps \\n[PS]u
.  vs \\n[VS]u
.  sp \\n[PD]u
.  ns

   This internal macro name is subject to change.

[2] The new ms(7) manual for groff 1.23 appears to have stabilized.[7]
    Here's a URL to a work area I use to proof-read groff documentation.
    I invite you (and others) to check out ms.2022-12-07.pdf, or
    whatever version is there at the time.

    https://www.dropbox.com/sh/17ftu3z31couf07/AAC_9kq0ZA-Ra2ZhmZFWlLuva?dl=0

[3] I initially shied away from dealing with nested diversions, but I
    think I know how to cope with them now.  It seems that in a lot of
    cases, "bubbling up" as illustrated in groff Git's tbl(1) page is
    all that is required.

[4] https://lists.gnu.org/archive/html/groff/2022-05/msg00026.html
[5] https://lore.kernel.org/linux-man/20220724172947.qlunrfnje56yaasv@illithid/
[6] https://lore.kernel.org/linux-man/20220722222045.y7i3yc7d6agygien@illithid/

[7] By saying this, I increase my ability to find a flaw in it, or for
    a reader to report one.  We use all the QA tools at our disposal.


* [PATCH v2 0/3] Rewrite strcpy(3)
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
  2022-12-12 17:33   ` Alejandro Colomar
@ 2022-12-12 23:00   ` Alejandro Colomar
  2022-12-13 20:56     ` Jakub Wilk
                       ` (2 more replies)
  2022-12-12 23:00   ` [PATCH v2 1/3] " Alejandro Colomar
                     ` (2 subsequent siblings)
  4 siblings, 3 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 23:00 UTC (permalink / raw)
  To: linux-man; +Cc: Martin Sebor, Alejandro Colomar

I'm describing all string-copying functions together in a single manual
page, using consistent and clear language to help fix long-standing
misuses of those interfaces.

v2 has seen many changes, but the two major ones are:

-  Don't deprecate functions.  A friendly explanation of why they are
   inferior is probably more appealing.

-  Use more precise syntax: mostly
   s/unterminated string/character sequence/g [Martin].


See the formatted page below.

Alejandro Colomar (3):
  strcpy.3: Rewrite page to document all string-copying functions
  stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old
    pages into links to strcpy(3)
  stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new links
    to strcpy(3)

 man3/stpcpy.3   |  115 +-----
 man3/stpecpy.3  |    1 +
 man3/stpecpyx.3 |    1 +
 man3/stpncpy.3  |  123 +-----
 man3/strcat.3   |  161 +-------
 man3/strcpy.3   | 1048 +++++++++++++++++++++++++++++++++++++++++++----
 man3/strlcat.3  |    1 +
 man3/strlcpy.3  |    1 +
 man3/strncat.3  |  172 +-------
 man3/strncpy.3  |  130 +-----
 man3/strscpy.3  |    1 +
 11 files changed, 970 insertions(+), 784 deletions(-)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/strlcat.3
 create mode 100644 man3/strlcpy.3
 create mode 100644 man3/strscpy.3


strcpy(3)                  Library Functions Manual                  strcpy(3)

NAME
       stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, strscpy,
       stpncpy, strncpy, ustr2stp, strncat, mempcpy - copy strings and charac‐
       ter sequences

LIBRARY
       stpcpy(3)
       strcpy(3), strcat(3)
       stpncpy(3)
       strncpy(3)
       strncat(3)
       mempcpy(3)
              Standard C library (libc, -lc)

       stpecpy(3), stpecpyx(3)
              Not provided by any library.

       strlcpy(3), strlcat(3)
              Utility functions from BSD systems (libbsd, -lbsd)

       strscpy(3)
              Not provided by any library.  It  is  a  Linux  kernel  internal
              function.

SYNOPSIS
       #include <string.h>

   Strings
       // Chain‐copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Copy/concatenate a string.
       char *strcpy(char *restrict dst, const char *restrict src);
       char *strcat(char *restrict dst, const char *restrict src);

       // Chain‐copy a string with truncation.
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain‐copy a string with truncation and SIGSEGV on UB.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy/concatenate a string with truncation and SIGSEGV on UB.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
       size_t strlcat(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Copy a string with truncation.
       ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                      size_t sz);

   Null‐padded character sequences
       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *strncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a string.
       char *ustr2stp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Concatenate a null‐padded character sequence into a string.
       char *strncat(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

   Measured character sequences
       // Chain‐copy a measured character sequence.
       void *mempcpy(void *restrict dst, const void src[restrict .len],
                      size_t len);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       stpcpy(3), stpncpy(3):
           Since glibc 2.10:
               _POSIX_C_SOURCE >= 200809L
           Before glibc 2.10:
               _GNU_SOURCE

       mempcpy(3):
           _GNU_SOURCE

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is  a sequence of zero or more non‐null characters followed by a
              null byte.

       character sequence (ustr)
              is a sequence of zero or more non‐null  characters.   A  program
              should  never  use  a  character  sequence where a string is re‐
              quired.  However, with appropriate care, a string can be used in
              the place of a character sequence.

              null‐padded character sequence
                     Character  sequences  can  be  contained  in  fixed‐width
                     buffers, which contain padding null bytes after the char‐
                     acter  sequence,  to  fill the rest of the buffer without
                     affecting the character sequence; however, those  padding
                     null bytes are not part of the character sequence.

              measured character sequence
                     Character sequence delimited by its length.

       length (len)
              is  the  number  of non‐null characters in a string or character
              sequence.   It  is  the  return  value  of  strlen(str)  and  of
              strnlen(ustr, sz).

       size (sz)
              refers  to  the  entire buffer where the string or character se‐
              quence is contained.

       end    is the name of a pointer to  the  terminating  null  byte  of  a
              string, or a pointer to one past the last character of a charac‐
              ter  sequence.  This is the return value of functions that allow
              chaining.  It is equivalent to &str[len].

       past_end
              is the name of a pointer to one past the end of the buffer  that
              contains  a  string  or character sequence.  It is equivalent to
              &str[sz].  It is used as a sentinel value, to be able  to  trun‐
              cate  strings  or character sequences instead of overrunning the
              containing buffer.

   Copy, concatenate, and chain‐copy
       Originally, there was a distinction between  functions  that  copy  and
       those  that  concatenate.  However, newer functions that copy while al‐
       lowing chaining cover both use cases with a single API.  They are  also
       algorithmically  faster, since they don’t need to search for the end of
       the existing string.  However, their use is a bit more verbose.

       To chain copy functions, they need to return  a  pointer  to  the  end.
       That’s  a  byproduct  of  the  copy operation, so it has no performance
       costs.  Functions that return such a pointer, and thus can be  chained,
       have  names  of the form *stp*() or *memp*(), since it’s also common to
       name the pointer just p.

       Chain‐copying functions that truncate should accept a  pointer  to  one
       past  the  end  of  the  destination buffer, and have names of the form
       *stpe*().  This allows not having to recalculate the remaining size af‐
       ter each call.
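
       As a minimal sketch of chaining, the helper below joins several
       strings with stpcpy(3).  The name join_strings() and its interface
       are assumptions made for illustration; they are not part of any
       library.  The caller is assumed to provide a large enough buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: concatenate n strings into buf, which the caller
 * must have made large enough.  Returns the length of the result.
 */
size_t
join_strings(char *buf, const char *const strs[], size_t n)
{
    char *p = buf;

    *p = '\0';  /* So that buf holds a valid string even if n == 0. */
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, strs[i]);  /* p always points to the end. */

    return p - buf;
}
```

       Each call starts writing where the previous one stopped, so the
       existing contents are never rescanned, and the length comes for free
       as p - buf.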

   Truncate or not?
       The first thing to note is that  programmers  should  be  careful  with
       buffers,  so  they  always have the correct size, and truncation is not
       necessary.

       In most cases, truncation is not desired, and it is simpler to just  do
       the copy.  Simpler code is safer code.  Programming against programming
       mistakes  by  adding more code just adds more points where mistakes can
       be made.

       Nowadays, compilers can detect most  programmer  errors  with  features
       like  compiler  warnings,  static  analyzers,  and _FORTIFY_SOURCE (see
       ftm(7)).  Keeping the code simple helps these  overflow‐detection  fea‐
       tures be more precise.

       When  validating  user input, however, it makes sense to truncate.  Re‐
       member to check the return value of such function calls.

       Functions that truncate:

       •  stpecpy(3) is the most efficient string copy function that  performs
          truncation.  It only requires checking for truncation once after all
          chained calls.

       •  stpecpyx(3)  is  a  variant  of  stpecpy(3) that consumes the entire
          source string, to catch bugs in the program by forcing  a  segmenta‐
          tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

       •  strlcpy(3bsd)  and  strlcat(3bsd) are designed to crash if the input
          string is invalid (doesn’t contain a terminating null byte).

       •  strscpy(3)  reports  an  error  instead  of  crashing  (similar   to
          stpecpy(3)).

       •  stpncpy(3)  and  strncpy(3)  also  truncate,  but  they  don’t write
          strings, but rather null‐padded character sequences.
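
       The single truncation check after a chain of stpecpy(3) calls could
       look like the sketch below.  stpecpy() follows the reference
       implementation given in this page, with past_end declared as a plain
       pointer for portability; greet() is a hypothetical helper invented
       for the example.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <stddef.h>
#include <string.h>

/* Same logic as the reference implementation in this page. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: write "Hello, <name>!" into buf[sz] (sz > 0).
 * Returns the length of the string, or -1 if it was truncated.
 */
ptrdiff_t
greet(char *buf, size_t sz, const char *name)
{
    char *past_end = buf + sz;
    char *p = buf;

    p = stpecpy(p, past_end, "Hello, ");
    p = stpecpy(p, past_end, name);
    p = stpecpy(p, past_end, "!");
    if (p == past_end)  /* One check covers all three calls. */
        return -1;

    return p - buf;
}
```

       Once a call truncates, it returns past_end, and every later call in
       the chain returns past_end without writing, which is why one check at
       the end is enough.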

   Null‐padded character sequences
       For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
       padded  character  sequences in fixed‐width buffers.  To interface with
       them, specialized functions need to be used.

       To copy strings into them, use stpncpy(3).

       To copy from a character sequence within a fixed‐width buffer  into  a
       string,  ignoring  any  trailing  null  bytes in the source fixed‐width
       buffer, you should use ustr2stp(3) or strncat(3).
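
       A round trip through a fixed‐width field might be sketched as
       follows.  struct record and both helpers are hypothetical stand‐ins
       for an API such as utmpx(5).  The sketch uses strncpy(3), which
       differs from stpncpy(3) only in its return value, and strncat(3),
       because both are in ISO C.

```c
#include <string.h>

/* Hypothetical record with a fixed-width, null-padded field. */
struct record {
    char user[8];  /* Character sequence; not necessarily terminated. */
};

/* String -> null-padded character sequence (truncates silently). */
void
record_set_user(struct record *r, const char *name)
{
    strncpy(r->user, name, sizeof(r->user));
}

/* Null-padded character sequence -> string; dst needs 9 bytes. */
void
record_get_user(char *dst, const struct record *r)
{
    dst[0] = '\0';  /* strncat(3) requires a string in dst. */
    strncat(dst, r->user, sizeof(r->user));
}
```

       The destination string needs one byte more than the field, since the
       field may be completely full and carry no null byte at all.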

   Measured character sequences
       The simplest character sequence copying function is mempcpy(3).  It re‐
       quires always knowing the length of your character sequences, for which
       structures can be used.  It makes the code much faster, since  you  al‐
       ways  know the length of your character sequences, and can do the mini‐
       mal copies and length measurements.  mempcpy(3)  copies  character  se‐
       quences, so you need to explicitly set the terminating null byte if you
       need a string.

       The  following code can be used to chain‐copy from a measured character
       sequence into a string:

           p = mempcpy(p, foo->str, foo->len);
           *p = '\0';

       The following code can be used to chain‐copy from a measured  character
       sequence into a character sequence:

           p = mempcpy(p, src->str, src->len);

       In  programs  that  make  considerable  use of strings or character se‐
       quences, and need the best performance, using overlapping character se‐
       quences can make a big difference.  It allows holding subsequences of a
       larger character sequence, while not duplicating memory nor using time
       to do a copy.

       However, this is delicate, since it requires using character sequences.
       C library APIs use strings, so programs that  use  character  sequences
       will  have  to  take care of differentiating strings from character se‐
       quences.
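
       One possible shape for such measured, overlapping character sequences
       is a pointer‐plus‐length structure.  The type struct ustr and the
       helpers below are assumptions made for illustration, and mempcpy(3)
       is assumed to be available (it is a GNU extension).

```c
#define _GNU_SOURCE  /* for mempcpy(3) */
#include <stddef.h>
#include <string.h>

/* Hypothetical measured character sequence: pointer plus length. */
struct ustr {
    const char *str;
    size_t      len;
};

/*
 * Subsequence of s; it shares memory with s, and nothing is copied.
 * The caller guarantees that pos + len <= s.len.
 */
struct ustr
ustr_sub(struct ustr s, size_t pos, size_t len)
{
    return (struct ustr) { s.str + pos, len };
}

/* Chain-copy s into dst as a string; returns a pointer to its end. */
char *
ustr_to_str(char *dst, struct ustr s)
{
    char *p;

    p = mempcpy(dst, s.str, s.len);
    *p = '\0';  /* Only now does dst hold a string. */
    return p;
}
```

       A subsequence such as the "key" part of "key=value" never needs its
       own terminating null byte; one is written only at the boundary where
       a C library API requires a string.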

   String vs character sequence
       Some functions only operate on strings.  Those require that  the  input
       src  is  a string, and guarantee an output string (even when truncation
       occurs).  Functions that concatenate also  require  that  dst  holds  a
       string before the call.  List of functions:

       •  stpcpy(3)
       •  strcpy(3), strcat(3)
       •  stpecpy(3), stpecpyx(3)
       •  strlcpy(3bsd), strlcat(3bsd)
       •  strscpy(3)

       Other  functions  require  an  input string, but create a character se‐
       quence as output.  These functions have confusing  names,  and  have  a
       long history of misuse.  List of functions:

       •  stpncpy(3)
       •  strncpy(3)

       Other  functions  operate on an input character sequence, and create an
       output string.  Functions that concatenate also require that dst  holds
       a  string before the call.  strncat(3) has an even more misleading name
       than the functions above.  List of functions:

       •  ustr2stp(3)
       •  strncat(3)

       The last function operates on an input character sequence to create an
       output  character  sequence.  But because it asks for the length, and a
       string is by nature composed of a character sequence of the same length
       plus a terminating null byte, a  string  is  also  accepted  as  input.
       Function:

       •  mempcpy(3)

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.
              The  programmer  is  responsible  for  allocating a buffer large
              enough.  It returns a pointer suitable for chaining.

              An implementation of this function might be:

                  char *
                  stpcpy(char *restrict dst, const char *restrict src)
                  {
                      return mempcpy(dst, src, strlen(src));
                  }

       strcpy(3)
       strcat(3)
              These functions copy the input string into a destination string.
              The programmer is responsible  for  allocating  a  buffer  large
              enough.  The return value is useless.

              stpcpy(3) is a faster alternative to these functions.

              An implementation of these functions might be:

                  char *
                  strcpy(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst, src);
                      return dst;
                  }

                  char *
                  strcat(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst + strlen(dst), src);
                      return dst;
                  }

       stpecpy(3)
       stpecpyx(3)
              These functions copy the input string into a destination string.
              If  the destination buffer, limited by a pointer to one past the
              end of it, isn’t large enough to hold the  copy,  the  resulting
              string  is  truncated  (but  it  is guaranteed to be null‐termi‐
              nated).  They return a pointer suitable for  chaining.   Trunca‐
              tion needs to be detected only once after the last chained call.
              stpecpyx(3)  has  identical semantics to stpecpy(3), except that
              it forces a SIGSEGV if the src pointer is not a string.

              These functions are not provided by any library, but you can de‐
              fine them with the following reference implementations:

                  /* This code is in the public domain. */
                  char *
                  stpecpy(char *dst, char past_end[0],
                          const char *restrict src)
                  {
                      char *p;

                      if (dst == past_end)
                          return past_end;

                      p = memccpy(dst, src, '\0', past_end - dst);
                      if (p != NULL)
                          return p - 1;

                      /* truncation detected */
                      past_end[-1] = '\0';
                      return past_end;
                  }

                  /* This code is in the public domain. */
                  char *
                  stpecpyx(char *dst, char past_end[0],
                           const char *restrict src)
                  {
                      if (src[strlen(src)] != '\0')
                          raise(SIGSEGV);

                      return stpecpy(dst, past_end, src);
                  }

       strlcpy(3bsd)
       strlcat(3bsd)
              These functions copy the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  They return the length
              of the total string they tried to create.  These functions force
              a SIGSEGV if the src pointer is not a string.

              stpecpyx(3) is a faster alternative to these functions.

       strscpy(3)
              This function copies the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  It returns the  length
              of the destination string, or -E2BIG on truncation.

              stpecpy(3) is a simpler and faster alternative to this function.

       stpncpy(3)
              This  function  copies the input string into a destination null‐
              padded character sequence in a fixed‐width buffer.  If the  des‐
              tination buffer, limited by its size, isn’t large enough to hold
              the  copy, the resulting character sequence is truncated.  Since
              it creates a character sequence, it doesn’t need to write a ter‐
              minating null byte.  It returns a pointer suitable for chaining,
              but it’s not ideal for that.  Truncation needs  to  be  detected
              only once after the last chained call.

              If  you’re going to use this function in chained calls, it would
              be useful to develop a similar function that accepts  a  pointer
              to one past the end of the buffer instead of a size.

              An implementation of this function might be:

                  char *
                  stpncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      char  *p;

                      bzero(dst, sz);
                      p = memccpy(dst, src, '\0', sz);
                      if (p == NULL)
                          return dst + sz;

                      return p - 1;
                  }
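
              The chaining variant suggested above, taking a pointer to one
              past the end of the buffer instead of a size, might be
              sketched like this.  The name my_stpencpy() is hypothetical;
              no library provides such a function.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: copy the string src into the null-padded character
 * sequence that ends at past_end.  Returns a pointer to one past the
 * last character written; no terminating null byte is guaranteed.
 */
char *
my_stpencpy(char *dst, char *past_end, const char *restrict src)
{
    size_t room = past_end - dst;
    size_t len = strnlen(src, room);

    memcpy(dst, src, len);
    memset(dst + len, '\0', room - len);  /* Null padding. */

    return dst + len;
}
```

              As with stpncpy(3), a return value equal to past_end means
              that the buffer is full and the input may have been
              truncated; checking once after the last chained call is
              enough.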

       ustr2stp(3)
              This function copies the input character sequence contained in a
              null‐padded  fixed‐width buffer, into a destination string.  The
              programmer is responsible for allocating a buffer large  enough.
              It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library, but you can define
              it with the following reference implementation:

                  /* This code is in the public domain. */
                  char *
                  ustr2stp(char *restrict dst, const char *restrict src,
                           size_t sz)
                  {
                      char  *end;

                      /* memccpy(3) returns a pointer past the copied null. */
                      end = memccpy(dst, src, '\0', sz);
                      end = (end != NULL) ? end - 1 : dst + sz;
                      *end = '\0';

                      return end;
                  }

       strncpy(3)
              This  function is identical to stpncpy(3) except for the useless
              return value.  Due to the return value, with this function  it’s
              hard to correctly check for truncation.

              stpncpy(3) is a simpler alternative to this function.

              An implementation of this function might be:

                  char *
                  strncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      stpncpy(dst, src, sz);
                      return dst;
                  }

       strncat(3)
              Do  not  confuse this function with strncpy(3); they are not re‐
              lated at all.

              This function concatenates the  input  character  sequence  con‐
              tained  in  a null‐padded fixed‐width buffer, into a destination
              string.  The programmer is responsible for allocating  a  buffer
              large enough.  The return value is useless.

              ustr2stp(3) is a faster alternative to this function.

              An implementation of this function might be:

                  char *
                  strncat(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      ustr2stp(dst + strlen(dst), src, sz);
                      return dst;
                  }

       mempcpy(3)
              This  function  copies  the input character sequence, limited by
              its length, into a destination character sequence.  The program‐
              mer is responsible for allocating a buffer large enough.  It re‐
              turns a pointer suitable for chaining.

              An implementation of this function might be:

                  void *
                  mempcpy(void *restrict dst, const void *restrict src,
                          size_t len)
                  {
                      return memcpy(dst, src, len) + len;
                  }

RETURN VALUE
       The following functions return a pointer to the terminating  null  byte
       in the destination string.

       •  stpcpy(3)
       •  ustr2stp(3)

       The  following  functions return a pointer to the terminating null byte
       in the destination string, except when truncation occurs; if truncation
       occurs, they return a pointer to one past the end  of  the  destination
       buffer (past_end).

       •  stpecpy(3), stpecpyx(3)

       The  following function returns a pointer to one after the last charac‐
       ter in the destination character sequence; if truncation  occurs,  that
       pointer  is equivalent to a pointer to one past the end of the destina‐
       tion buffer.

       •  stpncpy(3)

       The following function returns a pointer to one after the last  charac‐
       ter in the destination character sequence.

       •  mempcpy(3)

       The following functions return the length of the total string that they
       tried to create (as if truncation didn’t occur).

       •  strlcpy(3bsd), strlcat(3bsd)

       The following function returns the length of the destination string, or
       -E2BIG on truncation.

       •  strscpy(3)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3), strcat(3)
       •  strncpy(3)
       •  strncat(3)

ATTRIBUTES
       For  an  explanation  of  the  terms  used in this section, see attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │stpcpy(), strcpy(), strcat(), stpecpy(),    │ Thread safety │ MT‐Safe │
       │stpecpyx(), strlcpy(), strlcat(), strscpy(),│               │         │
       │stpncpy(), strncpy(), ustr2stp(),           │               │         │
       │strncat(), mempcpy()                        │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       strcpy(3), strcat(3)
       strncpy(3)
       strncat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       stpcpy(3)
       stpncpy(3)
              POSIX.1‐2008.

       strlcpy(3bsd), strlcat(3bsd)
              Functions originated in OpenBSD and present in  some  Unix  sys‐
              tems.

       mempcpy(3)
              This function is a GNU extension.

       strscpy(3)
              Linux kernel internal function.

       stpecpy(3), stpecpyx(3)
       ustr2stp(3)
              Not defined by any standards nor libraries.

CAVEATS
       Don’t  mix  chain calls to truncating and non‐truncating functions.  It
       is conceptually wrong unless you know that the first  part  of  a  copy
       will  always  fit.  Anyway, the performance difference will probably be
       negligible, so it will probably be more clear if you use consistent se‐
       mantics: either truncating or non‐truncating.  Calling a non‐truncating
       function after a truncating one is necessarily wrong.

       Some of the functions described here are not provided by  any  library;
       you should write your own copy if you want to use them.  See STANDARDS.

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
                  p = buf;
                  p = stpcpy(p, "Hello ");
                  p = stpcpy(p, "world");
                  p = stpcpy(p, "!");
                  len = p - buf;
                  puts(buf);

       strcpy(3)
       strcat(3)
                  strcpy(buf, "Hello ");
                  strcat(buf, "world");
                  strcat(buf, "!");
                  len = strlen(buf);
                  puts(buf);

       stpecpy(3)
       stpecpyx(3)
                  past_end = buf + sizeof(buf);
                  p = buf;
                  p = stpecpy(p, past_end, "Hello ");
                  p = stpecpy(p, past_end, "world");
                  p = stpecpy(p, past_end, "!");
                  if (p == past_end) {
                      p--;
                      goto toolong;
                  }
                  len = p - buf;
                  puts(buf);

       strlcpy(3bsd)
       strlcat(3bsd)
                  if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  len = strlcat(buf, "!", sizeof(buf));
                  if (len >= sizeof(buf))
                      goto toolong;
                  puts(buf);

       strscpy(3)
                  len = strscpy(buf, "Hello world!", sizeof(buf));
                  if (len == -E2BIG)
                      goto toolong;
                  puts(buf);

       stpncpy(3)
                  past_end = buf + sizeof(buf);
                  end = stpncpy(buf, "Hello world!", sizeof(buf));
                  if (end == past_end)
                      goto toolong;
                  len = end - buf;
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       strncpy(3)
                  strncpy(buf, "Hello world!", sizeof(buf));
                  if (buf[sizeof(buf) - 1] != '\0')
                      goto toolong;
                  len = strnlen(buf, sizeof(buf));
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       ustr2stp(3)
                  p = buf;
                  p = ustr2stp(p, "Hello ", 6);
                  p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
                  p = ustr2stp(p, "!", 1);
                  len = p - buf;
                  puts(buf);

       strncat(3)
                  buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);

       mempcpy(3)
                  p = buf;
                  p = mempcpy(p, "Hello ", 6);
                  p = mempcpy(p, "world", 5);
                  p = mempcpy(p, "!", 1);
                  *p = '\0';
                  len = p - buf;
                  puts(buf);

SEE ALSO
       bzero(3), memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                           strcpy(3)


-- 
2.38.1



* [PATCH v2 1/3] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
  2022-12-12 17:33   ` Alejandro Colomar
  2022-12-12 23:00   ` [PATCH v2 0/3] Rewrite strcpy(3) Alejandro Colomar
@ 2022-12-12 23:00   ` Alejandro Colomar
  2022-12-12 23:00   ` [PATCH v2 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into links to strcpy(3) Alejandro Colomar
  2022-12-12 23:00   ` [PATCH v2 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new " Alejandro Colomar
  4 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 23:00 UTC (permalink / raw)
  To: linux-man; +Cc: Martin Sebor, Alejandro Colomar

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Many functions that are inferior to another one, have been marked as
deprecated, notwithstanding the deprecation status in C libraries or
any standards.  Alternatives have been given in the same page, with
reference implementations.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strcpy.3 | 1048 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 960 insertions(+), 88 deletions(-)

diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 74c3180ae..7e216e3bf 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,48 +1,765 @@
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
+.\" SPDX-License-Identifier: BSD-3-Clause
 .\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
-strcpy \- copy a string
+stpcpy,
+strcpy, strcat,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+strscpy,
+stpncpy,
+strncpy,
+ustr2stp,
+strncat,
+mempcpy
+\- copy strings and character sequences
+.\" ----- LIBRARY :: --------------------------------------------------/
 .SH LIBRARY
+.TP
+.BR stpcpy (3)
+.TQ
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR stpncpy (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+.TQ
+.BR mempcpy (3)
 Standard C library
 .RI ( libc ", " \-lc )
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+Not provided by any library.
+.TP
+.BR strlcpy "(3), \c"
+.BR strlcat (3)
+Utility functions from BSD systems
+.RI ( libbsd ", " \-lbsd )
+.TP
+.BR strscpy (3)
+Not provided by any library.
+It is a Linux kernel internal function.
+.\" ----- SYNOPSIS :: -------------------------------------------------/
 .SH SYNOPSIS
 .nf
 .B #include <string.h>
+.fi
+.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
+.SS Strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+// Copy/concatenate a string.
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.PP
+// Chain-copy a string with truncation.
+.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on UB.
+.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Copy/concatenate a string with truncation and SIGSEGV on UB.
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Copy a string with truncation.
+.BI "ssize_t strscpy(char " dst "[restrict ." sz "], \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Null-padded character sequences --------/
+.SS Null-padded character sequences
+.nf
+// Zero a fixed-width buffer, and
+// copy a string with truncation into a character sequence.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string with truncation into a character sequence.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a string.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Concatenate a null-padded character sequence into a string.
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Measured character sequences --------------------/
+.SS Measured character sequences
+.nf
+// Chain-copy a measured character sequence.
+.BI "void *mempcpy(void *restrict " dst ", \
+const void " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy (3),
+.BR stpncpy (3):
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
+.fi
+.PP
+.BR mempcpy (3):
+.nf
+    _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
-.BR strcpy ()
-function copies the string pointed to by
-.IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
+.TP
+.IR "character sequence " ( ustr )
+is a sequence of zero or more non-null characters.
+A program should never use a character sequence where a string is required.
+However, with appropriate care,
+a string can be used in the place of a character sequence.
+.RS
+.TP
+.I null-padded character sequence
+Character sequences can be contained in fixed-width buffers,
+which contain padding null bytes after the character sequence,
+to fill the rest of the buffer
+without affecting the character sequence;
+however, those padding null bytes are not part of the character sequence.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
+.TP
+.I measured character sequence
+Character sequence delimited by its length.
+.RE
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string or character sequence.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer
+where the string or character sequence is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to the terminating null byte of a string,
+or a pointer to one past the last character of a character sequence.
+This is the return value of functions that allow chaining.
+It is equivalent to
+.IR &str[len] .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
+.TP
+.I past_end
+is the name of a pointer to one past the end of the buffer
+that contains a string or character sequence.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings or character sequences
+instead of overrunning the containing buffer.
+.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
+.SS Copy, concatenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that concatenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for the end of the existing string.
+However, their use is a bit more verbose.
+.PP
+To chain copy functions,
+they need to return a pointer to the
+.IR end .
+That's a byproduct of the copy operation,
+so it has no performance costs.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *()
+or
+.RB * memp *(),
+since it's also common to name the pointer just
+.IR p .
+.PP
+Chain-copying functions that truncate
+should accept a pointer to one past the end of the destination buffer,
+and have names of the form
+.RB * stpe *().
+This allows not having to recalculate the remaining size after each call.
+.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
+.SS Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.PP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.PP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these overflow-detection features be more precise.
+.PP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.PP
+Functions that truncate:
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It requires checking for truncation only once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a terminating null byte).
+.IP \(bu
+.BR strscpy (3)
+reports an error instead of crashing (similar to
+.BR stpecpy (3)).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they write null-padded character sequences
+rather than strings.
+.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
+.SS Null-padded character sequences
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use null-padded character sequences in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from a null-padded character sequence in a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR ustr2stp (3)
+or
+.BR strncat (3).
+.\" ----- DESCRIPTION :: Measured character sequences -----------------/
+.SS Measured character sequences
+The simplest character sequence copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your character sequences,
+for which structures can be used.
+It makes the code much faster,
+since you always know the length of your character sequences,
+and can do the minimal copies and length measurements.
+.BR mempcpy (3)
+copies character sequences,
+so you need to explicitly set the terminating null byte if you need a string.
+.PP
+The following code can be used to
+chain-copy from a measured character sequence into a string:
+.PP
+.in +4n
+.EX
+p = mempcpy(p, foo\->str, foo\->len);
+*p = \(aq\e0\(aq;
+.EE
+.in
+.PP
+The following code can be used to
+chain-copy from a measured character sequence into another character sequence:
+.PP
+.in +4n
+.EX
+p = mempcpy(p, src\->str, src\->len);
+.EE
+.in
+.PP
+In programs that make considerable use of strings or character sequences,
+and need the best performance,
+using overlapping character sequences can make a big difference.
+It allows holding subsequences of a larger character sequence,
+while not duplicating memory
+nor using time to do a copy.
+.PP
+However, this is delicate,
+since it requires using character sequences.
+C library APIs use strings,
+so programs that use character sequences
+will have to take care of differentiating strings from character sequences.
+.\" ----- DESCRIPTION :: String vs character sequence -----------------/
+.SS String vs character sequence
+Some functions only operate on strings.
+Those require that the input
+.I src
+is a string,
+and guarantee an output string
+(even when truncation occurs).
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.IP \(bu
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.IP \(bu
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+.IP \(bu
+.BR strscpy (3)
+.PD
+.PP
+Other functions require an input string,
+but create a character sequence as output.
+These functions have confusing names,
+and have a long history of misuse.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpncpy (3)
+.IP \(bu
+.BR strncpy (3)
+.PD
+.PP
+Other functions operate on an input character sequence,
+and create an output string.
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+.BR strncat (3)
+has an even more misleading name than the functions above.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR ustr2stp (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.PP
+And the last one
+operates on an input character sequence
+to create an output character sequence.
+But because it asks for the length,
+and a string is by nature composed of a character sequence of the same length
+plus a terminating null byte,
+a string is also accepted as input.
+Function:
+.IP \(bu 3
+.BR mempcpy (3)
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strlen(src));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+These functions copy the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a faster alternative to these functions.
+.IP
+An implementation of these functions might be:
+.IP
+.in +4n
+.EX
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to one past the end of it,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+These functions are not provided by any library,
+but you can define them with the following reference implementations:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+stpecpy(char *dst, char past_end[0],
+        const char *restrict src)
+{
+    char *p;
+
+    if (dst == past_end)
+        return past_end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    past_end[\-1] = \(aq\e0\(aq;
+    return past_end;
+}
+
+/* This code is in the public domain. */
+char *
+stpecpyx(char *dst, char past_end[0],
+         const char *restrict src)
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, past_end, src);
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+.BR stpecpyx (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strscpy(3) ----------------------/
+.TP
+.BR strscpy (3)
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a simpler and faster alternative to this function.
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded character sequence in a fixed-width buffer.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+Since it creates a character sequence,
+it doesn't need to write a terminating null byte.
+It returns a pointer suitable for chaining,
+but it's not ideal for that.
+Truncation needs to be detected only once after the last chained call.
+.IP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function
+that accepts a pointer to one past the end of the buffer instead of a size.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    char  *p;
+
+    bzero(dst, sz);
+    p = memccpy(dst, src, \(aq\e0\(aq, sz);
+    if (p == NULL)
+        return dst + sz;
+
+    return p \- 1;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library,
+but you can define it with the following reference implementation:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+ustr2stp(char *restrict dst, const char *restrict src,
+         size_t sz)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strnlen(src, sz));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
+.TP
+.BR strncpy (3)
+This function is identical to
+.BR stpncpy (3)
+except for the useless return value.
+Due to the return value,
+with this function it's hard to correctly check for truncation.
+.IP
+.BR stpncpy (3)
+is a simpler alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
+.TP
+.BR strncat (3)
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function concatenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR ustr2stp (3)
+is a faster alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncat(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    ustr2stp(dst + strlen(dst), src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: mempcpy(3) ----------------------/
+.TP
+.BR mempcpy (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+void *
+mempcpy(void *restrict dst, const void *restrict src,
+        size_t len)
+{
+    return (char *) memcpy(dst, src, len) + len;
+}
+.EE
+.in
+.\" ----- RETURN VALUE :: ---------------------------------------------/
 .SH RETURN VALUE
-The
-.BR strcpy ()
-function returns a pointer to
-the destination string
-.IR dest .
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to one past the end of the destination buffer
+.RI ( past_end ).
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence.
+.IP \(bu 3
+.BR mempcpy (3)
+.PP
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following function returns
+the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP \(bu 3
+.BR strscpy (3)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.IP \(bu 3
+.PD 0
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- ATTRIBUTES :: -----------------------------------------------/
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -54,73 +771,228 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR strcpy (),
+.BR strcat (),
+.BR stpecpy (),
+.BR stpecpyx (),
+.BR strlcpy (),
+.BR strlcat (),
+.BR strscpy (),
+.BR stpncpy (),
+.BR strncpy (),
+.BR ustr2stp (),
+.BR strncat (),
+.BR mempcpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
+.\" ----- STANDARDS :: ------------------------------------------------/
 .SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.TP
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
+.TP
+.BR stpcpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was not part of
+.\" the C or POSIX.1 standards, nor customary on UNIX systems.
+.\" It first appeared at least as early as 1986,
+.\" in the Lattice C AmigaDOS compiler,
+.\" then in the GNU fileutils and GNU textutils in 1989,
+.\" and in the GNU C library by 1992.
+.\" It is also present on the BSDs.
+.TQ
+.BR stpncpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+POSIX.1-2008.
+.TP
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+These functions originated in OpenBSD and are present in some UNIX systems.
+.TP
+.BR mempcpy (3)
+This function is a GNU extension.
+.TP
+.BR strscpy (3)
+Linux kernel internal function.
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.TQ
+.BR ustr2stp (3)
+Not defined by any standard or library.
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+Anyway, the performance difference will probably be negligible,
+so it will probably be clearer if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
 .PP
+Some of the functions described here are not provided by any library;
+you should write your own copy if you want to use them.
+See STANDARDS.
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
 .in +4n
 .EX
-size_t strlcpy(char *dest, const char *src, size_t size);
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
 .EE
 .in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.in +4n
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, past_end, "Hello ");
+p = stpecpy(p, past_end, "world");
+p = stpecpy(p, past_end, "!");
+if (p == past_end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.in +4n
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strscpy(3) --------------------------------------/
+.TP
+.BR strscpy (3)
+.in +4n
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+end = stpncpy(buf, "Hello world!", sizeof(buf));
+if (end == past_end)
+    goto toolong;
+len = end \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
+.TP
+.BR strncpy (3)
+.in +4n
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (buf[sizeof(buf) \- 1] != \(aq\e0\(aq)
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.in +4n
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
+.TP
+.BR strncat (3)
+.in +4n
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: mempcpy(3) --------------------------------------/
+.TP
+.BR mempcpy (3)
+.in +4n
+.EX
+p = buf;
+p = mempcpy(p, "Hello ", 6);
+p = mempcpy(p, "world", 5);
+p = mempcpy(p, "!", 1);
+*p = \(aq\e0\(aq;
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
+.BR bzero (3),
 .BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
-.BR strdup (3),
-.BR string (3),
-.BR wcscpy (3)
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR string (3)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into links to strcpy(3)
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
                     ` (2 preceding siblings ...)
  2022-12-12 23:00   ` [PATCH v2 1/3] " Alejandro Colomar
@ 2022-12-12 23:00   ` Alejandro Colomar
  2022-12-12 23:00   ` [PATCH v2 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new " Alejandro Colomar
  4 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 23:00 UTC (permalink / raw)
  To: linux-man; +Cc: Martin Sebor, Alejandro Colomar

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpcpy.3  | 115 +--------------------------------
 man3/stpncpy.3 | 123 +----------------------------------
 man3/strcat.3  | 161 +--------------------------------------------
 man3/strncat.3 | 172 +------------------------------------------------
 man3/strncpy.3 | 130 +------------------------------------
 5 files changed, 5 insertions(+), 696 deletions(-)

diff --git a/man3/stpcpy.3 b/man3/stpcpy.3
index 5770790fc..ff7476a84 100644
--- a/man3/stpcpy.3
+++ b/man3/stpcpy.3
@@ -1,114 +1 @@
-.\" Copyright 1995 James R. Van Zandt <jrv@vanzandt.mv.com>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.TH stpcpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-stpcpy \- copy a string returning a pointer to its end
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *stpcpy(char *restrict " dest ", const char *restrict " src );
-.fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpcpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
-.SH DESCRIPTION
-The
-.BR stpcpy ()
-function copies the string pointed to by
-.I src
-(including the terminating null byte (\(aq\e0\(aq)) to the array pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.SH RETURN VALUE
-.BR stpcpy ()
-returns a pointer to the
-.B end
-of the string
-.I dest
-(that is, the address of the terminating null byte)
-rather than the beginning.
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR stpcpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was not part of
-the C or POSIX.1 standards, nor customary on UNIX systems.
-It first appeared at least as early as 1986,
-in the Lattice C AmigaDOS compiler,
-then in the GNU fileutils and GNU textutils in 1989,
-and in the GNU C library by 1992.
-It is also present on the BSDs.
-.SH BUGS
-This function may overrun the buffer
-.IR dest .
-.SH EXAMPLES
-For example, this program uses
-.BR stpcpy ()
-to concatenate
-.B foo
-and
-.B bar
-to produce
-.BR foobar ,
-which it then prints.
-.PP
-.\" SRC BEGIN (stpcpy.c)
-.EX
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <string.h>
-
-int
-main(void)
-{
-    char buffer[20];
-    char *to = buffer;
-
-    to = stpcpy(to, "foo");
-    to = stpcpy(to, "bar");
-    printf("%s\en", buffer);
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR memmove (3),
-.BR stpncpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR wcpcpy (3)
+.so man3/strcpy.3
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 0a62e3055..ff7476a84 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -1,122 +1 @@
-.\" Copyright (c) Bruno Haible <haible@clisp.cons.org>
-.\" Copyright (c) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
-.\"
-.\" References consulted:
-.\"   GNU glibc-2 source code and manual
-.\"
-.\" Corrected, aeb, 990824
-.TH stpncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-stpncpy \- copy string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *stpncpy(char " dest "[restrict ." n "], \
-const char " src "[restrict ." n ],
-.BI "              size_t " n );
-.fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpncpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
-.SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-.PP
-The
-.BR stpncpy ()
-function copies at most
-.I n
-characters of
-.I src
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null character among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-stpncpy(char *dest, const char *src, size_t n)
-{
-    char  *p
-
-    bzero(dest, n);
-    p = memccpy(dest, src, \(aq\e0\(aq, n);
-    if (p == NULL)
-        return dest + n;
-
-    return p - 1;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-.SH RETURN VALUE
-.BR stpncpy ()
-returns a pointer to the terminating null byte
-in
-.IR dest ,
-or, if
-.I dest
-is not null-terminated,
-.IR dest + n
-(that is, a pointer to one-past-the-end of the array).
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR stpncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was a GNU extension.
-It first appeared in glibc 1.07 in 1993.
-.SH SEE ALSO
-.BR strlcpy (3bsd)
-.BR wcpncpy (3)
+.so man3/strcpy.3
diff --git a/man3/strcat.3 b/man3/strcat.3
index 277e5b1e4..ff7476a84 100644
--- a/man3/strcat.3
+++ b/man3/strcat.3
@@ -1,160 +1 @@
-.\" Copyright 1993 David Metcalfe (david@prism.demon.co.uk)
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:11:47 1993 by Rik Faith (faith@cs.unc.edu)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncat().
-.TH strcat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strcat \- concatenate two strings
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
-.fi
-.SH DESCRIPTION
-The
-.BR strcat ()
-function appends the
-.I src
-string to the
-.I dest
-string,
-overwriting the terminating null byte (\(aq\e0\(aq) at the end of
-.IR dest ,
-and then adds a terminating null byte.
-The strings may not overlap, and the
-.I dest
-string must have
-enough space for the result.
-If
-.I dest
-is not large enough, program behavior is unpredictable;
-.IR "buffer overruns are a favorite avenue for attacking secure programs" .
-.SH RETURN VALUE
-The
-.BR strcat ()
-function returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strcat (),
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-Some systems (the BSDs, Solaris, and others) provide the following function:
-.PP
-.in +4n
-.EX
-size_t strlcat(char *dest, const char *src, size_t size);
-.EE
-.in
-.PP
-This function appends the null-terminated string
-.I src
-to the string
-.IR dest ,
-copying at most
-.I size\-strlen(dest)\-1
-from
-.IR src ,
-and adds a terminating null byte to the result,
-.I unless
-.I size
-is less than
-.IR strlen(dest) .
-This function fixes the buffer overrun problem of
-.BR strcat (),
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The function returns the length of the string
-.BR strlcat ()
-tried to create; if the return value is greater than or equal to
-.IR size ,
-data loss occurred.
-If data loss matters, the caller
-.I must
-either check the arguments before the call, or test the function return value.
-.BR strlcat ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.\"
-.SH EXAMPLES
-Because
-.BR strcat ()
-must find the null byte that terminates the string
-.I dest
-using a search that starts at the beginning of the string,
-the execution time of this function
-scales according to the length of the string
-.IR dest .
-This can be demonstrated by running the program below.
-(If the goal is to concatenate many strings to one target,
-then manually copying the bytes from each source string
-while maintaining a pointer to the end of the target string
-will provide better performance.)
-.\"
-.SS Program source
-\&
-.\" SRC BEGIN (strcat.c)
-.EX
-#include <stdint.h>
-#include <stdio.h>
-#include <string.h>
-#include <time.h>
-
-int
-main(void)
-{
-#define LIM 4000000
-    char p[LIM + 1];    /* +1 for terminating null byte */
-    time_t base;
-
-    base = time(NULL);
-    p[0] = \(aq\e0\(aq;
-
-    for (unsigned int j = 0; j < LIM; j++) {
-        if ((j % 10000) == 0)
-            printf("%u %jd\en", j, (intmax_t) (time(NULL) \- base));
-        strcat(p, "a");
-    }
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR strlcat (3bsd),
-.BR wcscat (3),
-.BR wcsncat (3)
+.so man3/strcpy.3
diff --git a/man3/strncat.3 b/man3/strncat.3
index 6e4bf6d78..ff7476a84 100644
--- a/man3/strncat.3
+++ b/man3/strncat.3
@@ -1,171 +1 @@
-.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.TH strncat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncat \- concatenate an unterminated string into a string
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strncat(char " dest "[restrict strlen(." dest ") + ." n " + 1],"
-.BI "              const char " src "[restrict ." n ],
-.BI "              size_t " n );
-.fi
-.SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string concatenation with truncation, see
-.BR strlcat (3bsd).
-For copying or concatenating a string into a fixed-length buffer
-with zeroing of the rest, see
-.BR stpncpy (3).
-.PP
-.BR strncat ()
-appends at most
-.I n
-characters of
-.I src
-to the end of
-.IR dst .
-It always terminates with a null character the string placed in
-.IR dest .
-.PP
-An implementation of
-.BR strncat ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncat(char *dest, const char *src, size_t n)
-{
-    char    *cat;
-    size_t  len;
-
-    cat = dest + strlen(dest);
-    len = strnlen(src, n);
-    memcpy(cat, src, len);
-    cat[len] = \(aq\e0\(aq;
-
-    return dest;
-}
-.EE
-.in
-.SH RETURN VALUE
-.BR strncat ()
-returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS ustr2stpe()
-You may want to write your own function similar to
-.BR strncpy (),
-with the following improvements:
-.IP \(bu 3
-Copy, instead of concatenating.
-There's no equivalent of
-.BR strncat ()
-that copies instead of concatenating.
-.IP \(bu
-Allow chaining the function,
-by returning a suitable pointer.
-Copy chaining is faster than concatenating.
-.IP \(bu
-Don't check for null characters in the middle of the unterminated string.
-If the string is terminated, this function should not be used.
-If the string is unterminated, it is unnecessary.
-.IP \(bu
-A name that tells what it does:
-Copy from an
-.IR u nterminated
-.IR str ing
-to a
-.IR st ring,
-and return a
-.IR p ointer
-to its end.
-.PP
-.in +4n
-.EX
-/* This code is in the public domain.
- *
- * char *ustr2stp(char dst[restrict .n+1],
- *                const char src[restrict .n],
- *                size_t len);
- */
-char *
-ustr2stp(char *restrict dst, const char *restrict src, size_t len)
-{
-    memcpy(dst, src, len);
-    dst[len] = \(aq\e0\(aq;
-
-    return dst + len;
-}
-.EE
-.in
-.SH CAVEATS
-This function doesn't know the size of the destination buffer,
-so it can overrun the buffer if the programmer wasn't careful enough.
-.SH BUGS
-.BR strncat (3)
-has a misleading name;
-it has no relationship with
-.BR strncpy (3).
-.SH EXAMPLES
-The following program creates a string
-from a concatenation of unterminated strings.
-.\" SRC BEGIN (strncpy.c)
-.EX
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#define nitems(arr)  (sizeof((arr)) / sizeof((arr)[0]))
-
-int
-main(void)
-{
-    char pre[4] = "pre.";
-    char *post = ".post";
-    char *src = "some_long_body.post";
-    char dest[100];
-
-    dest[0] = \(aq\e0\(aq;
-    strncat(dest, pre, nitems(pre));
-    strncat(dest, src, strlen(src) \- strlen(post));
-
-    puts(dest);  // "pre.some_long_body"
-    exit(EXIT_SUCCESS);
-}
-.EE
-.\" SRC END
-.in
-.SH SEE ALSO
-.BR memccpy (3),
-.BR memcpy (3),
-.BR mempcpy (3),
-.BR strcpy (3),
-.BR string (3)
+.so man3/strcpy.3
diff --git a/man3/strncpy.3 b/man3/strncpy.3
index e2ffc683f..ff7476a84 100644
--- a/man3/strncpy.3
+++ b/man3/strncpy.3
@@ -1,129 +1 @@
-.\" Copyright (C) 1993 David Metcalfe <david@prism.demon.co.uk>
-.\" Copyright (C) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
-.TH strncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncpy \- copy a string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "[[deprecated]] char *strncpy(char " dest "[restrict ." n ],
-.BI "                             const char " src "[restrict ." n "], \
-size_t " n );
-.fi
-.SH DESCRIPTION
-.BI Note: " This is not the function you want to use."
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-For copying a string into a fixed-length buffer with zeroing of the rest,
-see
-.BR stpncpy (3).
-.PP
-.BR strncpy ()
-copies at most
-.I n
-bytes of
-.IR src ,
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null byte
-among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncpy(char *dest, const char *src, size_t n)
-{
-    bzero(dest, n);
-    memccpy(dest, src, \(aq\e0\(aq, n);
-
-    return dest;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-But
-.BR stpncpy (3)
-is better for this purpose,
-since it detects truncation.
-See BUGS below.
-.SH RETURN VALUE
-The
-.BR strncpy ()
-function returns a pointer to
-the destination buffer
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH BUGS
-.BR strncpy ()
-has a misleading name.
-It doesn't produce a (null-terminated) string;
-and it should never be used for producing a string.
-.PP
-It can't detect truncation.
-It's probably better to explicitly call
-.BR bzero (3)
-and
-.BR memccpy (3),
-or
-.BR stpncpy (3)
-since they allow detecting truncation.
-.SH SEE ALSO
-.BR bzero (3),
-.BR memccpy (3),
-.BR stpncpy (3),
-.BR string (3),
-.BR wcsncpy (3)
+.so man3/strcpy.3
-- 
2.38.1



* [PATCH v2 3/3] stpecpy.3, stpecpyx.3, strlcat.3, strlcpy.3, strscpy.3: Add new links to strcpy(3)
  2022-12-12 14:24 ` [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
                     ` (3 preceding siblings ...)
  2022-12-12 23:00   ` [PATCH v2 2/3] stpcpy.3, stpncpy.3, strcat.3, strncat.3, strncpy.3: Transform the old pages into links to strcpy(3) Alejandro Colomar
@ 2022-12-12 23:00   ` Alejandro Colomar
  4 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-12 23:00 UTC (permalink / raw)
  To: linux-man; +Cc: Martin Sebor, Alejandro Colomar

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpecpy.3  | 1 +
 man3/stpecpyx.3 | 1 +
 man3/strlcat.3  | 1 +
 man3/strlcpy.3  | 1 +
 man3/strscpy.3  | 1 +
 5 files changed, 5 insertions(+)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/strlcat.3
 create mode 100644 man3/strlcpy.3
 create mode 100644 man3/strscpy.3

diff --git a/man3/stpecpy.3 b/man3/stpecpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/stpecpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/stpecpyx.3 b/man3/stpecpyx.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/stpecpyx.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strlcat.3 b/man3/strlcat.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strlcat.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strlcpy.3 b/man3/strlcpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strlcpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
diff --git a/man3/strscpy.3 b/man3/strscpy.3
new file mode 100644
index 000000000..ff7476a84
--- /dev/null
+++ b/man3/strscpy.3
@@ -0,0 +1 @@
+.so man3/strcpy.3
-- 
2.38.1



* a Q quotation macro for man(7) (was: groff man(7) extensions)
  2022-12-12 18:38     ` groff man(7) extensions (was: [PATCH 1/3] strcpy.3: Rewrite page to document all string-copying functions) G. Branden Robinson
@ 2022-12-13 15:45       ` G. Branden Robinson
  0 siblings, 0 replies; 53+ messages in thread
From: G. Branden Robinson @ 2022-12-13 15:45 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: groff, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 2256 bytes --]

[self-reply]

At 2022-12-12T12:38:42-0600, G. Branden Robinson wrote:
> Here's a list of man(7) extensions to which I have given consideration.
> 
> 	KS/KE	Keeps.  Easy.[3]  Harmlessly ignorable by other
> 		implementations.
> 	LS/LE	List enclosure.  Throws a semantic hint (e.g., for HTML
> 		output) and eliminates final use case of `PD` macro.[4]
> 	DC/TG	Semantics at last.  Sure to rouse anger in people who
> 		decided long ago that man(7) can't do this.[5]  Having
> 		looked more closely at mdoc(7) since writing that, I
> 		think `DC` should accept a _pair_ of arguments as its
> 		second and third parameters for bracketing purposes.
> 		But again, most man page authors would never need to
> 		mess with `DC` at all.

There was one more.

	Q	Quotation macro.  It's madness that one doesn't already
		exist.  Its absence, the imperfect portability of
		special character identifiers for various types of
		quotation mark, and the bad ergonomics of introducing
		*roff strings just to serve this one purpose have made
		quotation such a pain point in man(7) writing that
		authors have tended to not bother with it and instead abuse
		font style changes for it, putting things that should
		simply be quoted into stentorian italics or screaming
		bold instead, when these faces are already heavily
		burdened by other uses.

I experimentally implemented `Q` at one point but ran into a corner case
I wasn't happy with.  Looking back over it now I see that I got it
entangled with an extension to `SY`/`YS` to support arguments to help
the formatter compute tab stops.  I'm attaching "clone.man" so you can
have a look.

I've also pondered having private strings (i.e., not for use directly by
man pages) for opening and closing quotation marks that localization
packages can set.  This might save Helge Kreutzmann and collaborators
some tedium.

Even with that wrinkle, a `Q` macro would be dead simple.

Here's an an-ext.tmac portable version.

.\" Define opening and closing quotation marks as appropriate to your
.\" language and/or output device.
.ds oq \(lq
.ds cq \(rq
.
.\" Quote first argument with second argument immediately following.
.de Q
\*(oq\\$1\*(cq\\$2
..

Regards,
Branden

[-- Attachment #1.2: clone.man --]
[-- Type: application/x-troff-man, Size: 1751 bytes --]


* Re: [PATCH v2 0/3] Rewrite strcpy(3)
  2022-12-12 23:00   ` [PATCH v2 0/3] Rewrite strcpy(3) Alejandro Colomar
@ 2022-12-13 20:56     ` Jakub Wilk
  2022-12-13 20:57       ` Alejandro Colomar
  2022-12-13 22:05       ` Alejandro Colomar
  2022-12-14  0:03     ` [PATCH v3 0/1] Rewritten page for string-copying functions Alejandro Colomar
  2022-12-14  0:03     ` [PATCH v3 " Alejandro Colomar
  2 siblings, 2 replies; 53+ messages in thread
From: Jakub Wilk @ 2022-12-13 20:56 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man, Martin Sebor

The sheer size of this page makes it almost unusable for me.
Please don't merge it.

* Alejandro Colomar <alx.manpages@gmail.com>, 2022-12-13 00:00:
>       stpecpy(3), stpecpyx(3)
>              Not provided by any library.

Then they don't belong in the man-pages project.

>       strscpy(3)
>              Not provided by any library.  It  is  a  Linux  kernel  internal
>              function.

Ditto.

-- 
Jakub Wilk


* Re: [PATCH v2 0/3] Rewrite strcpy(3)
  2022-12-13 20:56     ` Jakub Wilk
@ 2022-12-13 20:57       ` Alejandro Colomar
  2022-12-13 22:05       ` Alejandro Colomar
  1 sibling, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-13 20:57 UTC (permalink / raw)
  To: Jakub Wilk; +Cc: linux-man, Martin Sebor


[-- Attachment #1.1: Type: text/plain, Size: 709 bytes --]

Hi Jakub,

On 12/13/22 21:56, Jakub Wilk wrote:
> The sheer size of this page makes it almost unusable for me.
> Please don't merge it.

Plan B is to add a string_copy(7) page and keep the other pages minimal.
Would that work for you?

Thanks,

Alex

> 
> * Alejandro Colomar <alx.manpages@gmail.com>, 2022-12-13 00:00:
>>       stpecpy(3), stpecpyx(3)
>>              Not provided by any library.
> 
> Then they don't belong in the man-pages project.
>
>>       strscpy(3)
>>              Not provided by any library.  It  is  a  Linux  kernel  internal
>>              function.
> 
> Ditto.
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v2 0/3] Rewrite strcpy(3)
  2022-12-13 20:56     ` Jakub Wilk
  2022-12-13 20:57       ` Alejandro Colomar
@ 2022-12-13 22:05       ` Alejandro Colomar
  2022-12-13 22:46         ` Alejandro Colomar
  1 sibling, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-13 22:05 UTC (permalink / raw)
  To: Jakub Wilk; +Cc: linux-man, Martin Sebor


[-- Attachment #1.1: Type: text/plain, Size: 1817 bytes --]

Hi Jakub,

On 12/13/22 21:56, Jakub Wilk wrote:
> The sheer size of this page make it almost unusable for me.
> Please don't merge it.
> 
> * Alejandro Colomar <alx.manpages@gmail.com>, 2022-12-13 00:00:
>>       stpecpy(3), stpecpyx(3)
>>              Not provided by any library.
> 
> Then they don't belong in the man-pages project.
> 
>>       strscpy(3)
>>              Not provided by any library.  It  is  a  Linux  kernel  internal
>>              function.
> 
> Ditto.

And strictly speaking, I shouldn't document strlcpy(3bsd) and strlcat(3bsd) 
either because they're not provided by our libc; libbsd already has manual pages 
for them, anyway.

Regarding this, the intention of the page is not to coldly document the behavior 
of functions in terms of the byte operations they perform.  That's what has been 
done until now, and the result is what we know: many string copy functions are 
dreaded (e.g., strncpy(3)), because most programmers don't use them correctly.

This new page, instead, shows all string copying functions, including those 
developed by other systems as alternatives to the standard ones.  They did it 
for a reason: the standard functions don't cover all use cases, and there's a 
need to roll your own.  But rolling your own is bad.  It's better if someone 
explains what alternative string copy functions exist, when they are more 
appropriate than libc ones, and when they are not.  Even the old pages 
documented strlcpy(3) a little bit!

For a first release, I suggest using the new page string_copy(7).  I'll rewrite 
strcpy(3) and all the others anyway, to be minimal _and_ to be reductions of 
string_copy(7), for fast lookup.

Cheers,

Alex



-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v2 0/3] Rewrite strcpy(3)
  2022-12-13 22:05       ` Alejandro Colomar
@ 2022-12-13 22:46         ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-13 22:46 UTC (permalink / raw)
  To: Jakub Wilk; +Cc: linux-man, Martin Sebor


[-- Attachment #1.1: Type: text/plain, Size: 3445 bytes --]



On 12/13/22 23:05, Alejandro Colomar wrote:
> Hi Jakub,
> 
> On 12/13/22 21:56, Jakub Wilk wrote:
>> The sheer size of this page makes it almost unusable for me.

Moreover, I'd like to ask, what's your use case for these (string copy) pages? 
And how am I impeding it?

-  stpcpy(3)
-  strcpy(3)
-  strcat(3)

-  stpncpy(3)
-  strncpy(3)

-  strncat(3)


Except for the last one, they are so simple in terms of the byte operation that 
they perform, or the return value, that the pages are useless.  Once you know 
what they do, you don't forget (and I bet you know what they do).  And even 
strncat(3) is simple, when you understand it.


The return value is simple:  'r' functions return dst.  'p' functions return a 
pointer past the last non-null character written.

The operation of 'cat' functions is simple:  strlen(dst), and append a string there.

The operation of 'st.cpy' functions is even simpler: read a string, and copy it 
at dst.

The operation of st.ncpy(3) is slightly less intuitive (probably due to 
misdesign, the name doesn't match what they do): read a string, and copy it with 
truncation into a null-padded character sequence in a fixed-width array.

strncat(3) is the most misdesigned of all:  it reads a character sequence from a 
null-padded fixed-width array, and creates a string out of it.
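
The conventions described above can be checked with a small sketch.  The
helper names and FIELD_WIDTH are hypothetical, chosen only for illustration;
the libc calls themselves are the standard ones, and the buffer sizes are
assumptions that the caller is expected to respect.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes stpcpy(3) and stpncpy(3) in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_WIDTH 4  /* hypothetical width of a fixed-width field */

/* 'r' functions return dst itself. */
int r_functions_return_dst(void)
{
    char buf[16];

    return strcpy(buf, "abc") == buf && strcat(buf, "de") == buf;
}

/* stpcpy(3) returns a pointer to the terminating null byte it wrote,
 * so the returned offset equals strlen(src). */
size_t stpcpy_end_offset(const char *src)
{
    char buf[32];

    assert(strlen(src) < sizeof(buf));  /* the caller must make sure it fits */
    return (size_t) (stpcpy(buf, src) - buf);
}

/* stpncpy(3) fills a fixed-width field, padding it with null bytes, and
 * returns a pointer past the last non-null character written. */
size_t stpncpy_end_offset(const char *src)
{
    char field[FIELD_WIDTH];

    return (size_t) (stpncpy(field, src, sizeof(field)) - field);
}

/* strncat(3) reads a character sequence from a null-padded fixed-width
 * field and appends it to dst as a (null-terminated) string. */
size_t field_to_string(char dst[FIELD_WIDTH + 1], const char field[FIELD_WIDTH])
{
    dst[0] = '\0';
    strncat(dst, field, FIELD_WIDTH);
    return strlen(dst);
}
```

Note that the field passed to field_to_string() does not need a terminating
null byte, while the destination needs room for FIELD_WIDTH + 1 bytes.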


That covers it all.  If I were to put those paragraphs in a separate page for 
each function, what good would they do?


So, the pages are not very informative for those who already know.  And for 
those who don't know, I very much prefer that they read the entire page.

Cheers,

Alex



>> Please don't merge it.
>>
>> * Alejandro Colomar <alx.manpages@gmail.com>, 2022-12-13 00:00:
>>>       stpecpy(3), stpecpyx(3)
>>>              Not provided by any library.
>>
>> Then they don't belong in the man-pages project.
>>
>>>       strscpy(3)
>>>              Not provided by any library.  It  is  a  Linux  kernel  internal
>>>              function.
>>
>> Ditto.
> 
> And strictly speaking, I shouldn't document strlcpy(3bsd) and strlcat(3bsd) 
> either because they're not provided by our libc; libbsd already has manual pages 
> for them, anyway.
> 
> Regarding this, the intention of the page is not to coldly document the behavior 
> of functions in terms of the byte operations they perform.  That's what has been 
> done until now, and the result is what we know: many string copy functions are 
> dreaded (e.g., strncpy(3)), because most programmers don't use them correctly.
> 
> This new page, instead, shows all string copying functions, including those 
> developed by other systems as alternatives to the standard ones.  They did it 
> for a reason: the standard functions don't cover all use cases, and there's a 
> need to roll your own.  But rolling your own is bad.  It's better if someone 
> explains what alternative string copy functions exist, when they are more 
> appropriate than libc ones, and when they are not.  Even the old pages 
> documented strlcpy(3) a little bit!
> 
> For a first release, I suggest using the new page string_copy(7).  I'll rewrite 
> strcpy(3) and all the others anyway, to be minimal _and_ to be reductions of 
> string_copy(7), for fast lookup.
> 
> Cheers,
> 
> Alex
> 
> 
> 

-- 
<http://www.alejandro-colomar.es/>


* [PATCH v3 0/1] Rewritten page for string-copying functions
  2022-12-12 23:00   ` [PATCH v2 0/3] Rewrite strcpy(3) Alejandro Colomar
  2022-12-13 20:56     ` Jakub Wilk
@ 2022-12-14  0:03     ` Alejandro Colomar
  2022-12-14  0:14       ` Alejandro Colomar
                         ` (2 more replies)
  2022-12-14  0:03     ` [PATCH v3 " Alejandro Colomar
  2 siblings, 3 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14  0:03 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk
  Cc: Alejandro Colomar


Hi!

I've written a new manual page for documenting string-copying functions
so that it's clear what's the purpose of each of them.  It may differ
from the original design of the functions, since my guess for several of
them is simply that they were misdesigned.  However, after investigating
the operation that they perform on bytes, I've come up with a story that
can make sense of functions that were once believed to be broken by
many.  In fact, my conclusion after writing the page is that only one
function is really useless:

-  strncpy(3):  stpncpy(3) is _always_ better.

The others depend on the program.  If you don't care at all about
performance and Shlemiel is a friend of yours, then rcpy and [rn]cat
are your friends.  If you don't like Shlemiel, and don't mind slightly
more complex code, you'll go for 'p' functions.
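
As a rough illustration of that trade-off, here is a minimal sketch that
joins the same strings both ways.  The names join_cat() and join_stp() are
hypothetical, and both assume that the caller provides a destination buffer
large enough for the result; neither one checks for overflow.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes stpcpy(3) in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shlemiel's way: every strcat(3) call rescans dst from its beginning
 * to find the terminating null byte before appending. */
char *join_cat(char *dst, const char *const parts[], size_t n)
{
    assert(dst != NULL);

    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, parts[i]);

    return dst;  /* 'r' convention: the start of the string */
}

/* Chain-copy: stpcpy(3) hands back the end of the string, so the next
 * copy starts there and nothing is rescanned. */
char *join_stp(char *dst, const char *const parts[], size_t n)
{
    char *p = dst;

    assert(dst != NULL);

    *p = '\0';
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, parts[i]);

    return p;  /* 'p' convention: pointer to the terminating null byte */
}
```

The resulting strings are identical; the difference is that the strcat(3)
loop does work proportional to the length of everything copied so far on each
iteration, while the stpcpy(3) loop only touches the new bytes.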

And so on.  I won't spoil the page more.

Basically I want to put an end to this situation where a function like
strncpy(3) is dreaded by some because it looks broken (I thought so
myself for a long time), while others who don't really know it misuse it
for purposes it isn't suited to, which is even worse.  Or where
programmers think that strncpy(3) and strncat(3) have any relationship
at all (they don't).

Below goes the formatted page.  Please review independently of it being
in strcpy(3) or string_copy(7), and address that as a separate issue
(but of course feel free to cover it, and any other issues).


Cheers,

Alex


Alejandro Colomar (1):
  strcpy.3: Rewrite page to document all string-copying functions

 man3/strcpy.3 | 1058 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 970 insertions(+), 88 deletions(-)


strcpy(3)                  Library Functions Manual                  strcpy(3)

NAME
       stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, strscpy,
       stpncpy, strncpy, ustr2stp, strncat, mempcpy - copy strings and charac‐
       ter sequences

LIBRARY
       stpcpy(3)
       strcpy(3), strcat(3)
       stpncpy(3)
       strncpy(3)
       strncat(3)
       mempcpy(3)
              Standard C library (libc, -lc)

       stpecpy(3), stpecpyx(3)
              Not provided by any library.

       strlcpy(3), strlcat(3)
              Utility functions from BSD systems (libbsd, -lbsd)

       strscpy(3)
              Not provided by any library.  It  is  a  Linux  kernel  internal
              function.

SYNOPSIS
       #include <string.h>

   Strings
       // Chain‐copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Copy/concatenate a string.
       char *strcpy(char *restrict dst, const char *restrict src);
       char *strcat(char *restrict dst, const char *restrict src);

       // Chain‐copy a string with truncation.
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain‐copy a string with truncation and SIGSEGV on UB.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy/concatenate a string with truncation and SIGSEGV on UB.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
       size_t strlcat(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Copy a string with truncation.
       ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                      size_t sz);

   Null‐padded character sequences
       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Zero a fixed‐width buffer, and
       // copy a string with truncation into a character sequence.
       char *strncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a string.
       char *ustr2stp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Concatenate a null‐padded character sequence into a string.
       char *strncat(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

   Measured character sequences
       // Chain‐copy a measured character sequence.
       void *mempcpy(void *restrict dst, const void src[restrict .len],
                      size_t len);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       stpcpy(3), stpncpy(3):
           Since glibc 2.10:
               _POSIX_C_SOURCE >= 200809L
           Before glibc 2.10:
               _GNU_SOURCE

       mempcpy(3):
           _GNU_SOURCE

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is  a sequence of zero or more non‐null characters followed by a
              null byte.

       character sequence (ustr)
              is a sequence of zero or more non‐null  characters.   A  program
              should never use a character sequence where a string is
              required.  However, with appropriate care, a string can be used
              in place of a character sequence.

              null‐padded character sequence
                     Character  sequences  can  be  contained  in  fixed‐width
                     buffers, which contain padding null bytes after the char‐
                     acter  sequence,  to  fill the rest of the buffer without
                     affecting the character sequence; however, those  padding
                     null bytes are not part of the character sequence.

              measured character sequence
                     Character sequence delimited by its length.

       length (len)
              is  the  number  of non‐null characters in a string or character
              sequence.   It  is  the  return  value  of  strlen(str)  and  of
              strnlen(ustr, sz).

       size (sz)
              refers  to  the  entire buffer where the string or character se‐
              quence is contained.

       end    is the name of a pointer to  the  terminating  null  byte  of  a
              string, or a pointer to one past the last character of a charac‐
              ter  sequence.  This is the return value of functions that allow
              chaining.  It is equivalent to &str[len].

       past_end
              is the name of a pointer to one past the end of the buffer  that
              contains  a  string  or character sequence.  It is equivalent to
              &str[sz].  It is used as a sentinel value, to be able  to  trun‐
              cate  strings  or character sequences instead of overrunning the
              containing buffer.

   Copy, concatenate, and chain‐copy
       Originally, there was a distinction between  functions  that  copy  and
       those  that  concatenate.  However, newer functions that copy while al‐
       lowing chaining cover both use cases with a single API.  They are  also
       algorithmically  faster, since they don’t need to search for the end of
       the existing string.  However, functions that concatenate have  a  much
       simpler  use,  so if performance is not important, it can make sense to
       use them for improving readability.

       To chain copy functions, they need to return  a  pointer  to  the  end.
       That’s  a  byproduct  of  the  copy operation, so it has no performance
       costs.  Functions that return such a pointer, and thus can be  chained,
       have  names  of the form *stp*() or *memp*(), since it’s also common to
       name the pointer just p.

       Chain‐copying functions that truncate should accept a  pointer  to  one
       past  the  end  of  the  destination buffer, and have names of the form
       *stpe*().  This allows not having to recalculate the remaining size af‐
       ter each call.
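
       As a minimal sketch of the chaining described above, the following
       hypothetical helper (join3() is not part of any library) joins three
       strings with stpcpy(3).  It assumes that the caller provides a buffer
       that is large enough, so no truncation is handled here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* stpcpy(3) is POSIX.1-2008 */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: join three strings into buf with chained
 * stpcpy(3) calls.  The caller must guarantee that buf is large
 * enough.  Returns the length of the resulting string.
 */
size_t
join3(char *buf, const char *a, const char *b, const char *c)
{
    char  *p;

    assert(buf != NULL);

    p = buf;
    p = stpcpy(p, a);  /* p points to the terminating null byte ... */
    p = stpcpy(p, b);  /* ... so the next copy starts right there */
    p = stpcpy(p, c);

    /* The length is a byproduct of the copy; no strlen(3) is needed. */
    return (size_t) (p - buf);
}
```

       Each call resumes where the previous one stopped, so the existing
       contents of the buffer are never scanned again, which is what a chain
       of strcat(3) calls would do.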

   Truncate or not?
       The first thing to note is that  programmers  should  be  careful  with
       buffers,  so  they  always have the correct size, and truncation is not
       necessary.

       In most cases, truncation is not desired, and it is simpler to just  do
       the copy.  Simpler code is safer code.  Programming against programming
       mistakes  by  adding more code just adds more points where mistakes can
       be made.

       Nowadays, compilers can detect most  programmer  errors  with  features
       like  compiler  warnings,  static  analyzers,  and _FORTIFY_SOURCE (see
       ftm(7)).  Keeping the code simple helps these  overflow‐detection  fea‐
       tures be more precise.

       When  validating  user input, however, it makes sense to truncate.  Re‐
       member to check the return value of such function calls.

       Functions that truncate:

       •  stpecpy(3) is the most efficient string copy function that  performs
          truncation.  Truncation needs to be checked for only once, after all
          chained calls.

       •  stpecpyx(3)  is  a  variant  of  stpecpy(3) that consumes the entire
          source string, to catch bugs in the program by forcing  a  segmenta‐
          tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

       •  strlcpy(3bsd)  and  strlcat(3bsd) are designed to crash if the input
          string is invalid (doesn’t contain a terminating null byte).

       •  strscpy(3)  reports  an  error  instead  of  crashing  (similar   to
          stpecpy(3)).

       •  stpncpy(3)  and  strncpy(3)  also  truncate,  but  they  don’t write
          strings, but rather null‐padded character sequences.

   Null‐padded character sequences
       For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
       padded  character  sequences in fixed‐width buffers.  To interface with
       them, specialized functions need to be used.

       To copy strings into them, use stpncpy(3).

       To copy from a null‐padded character sequence in a fixed‐width buffer
       into a string, ignoring any trailing null bytes in the source buffer,
       use ustr2stp(3) or strncat(3).
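
       A minimal sketch of both directions follows.  The struct record type,
       the NAME_WIDTH constant, and the two helper functions are hypothetical
       names used only for illustration; real APIs such as utmpx(5) define
       their own field names and widths.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* stpncpy(3) is POSIX.1-2008 */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_WIDTH  8  /* hypothetical fixed-width field size */

/* Hypothetical record with a null-padded field, in the style of utmpx(5). */
struct record {
    char  name[NAME_WIDTH];  /* character sequence; may lack a null byte */
};

/*
 * Store a string in the fixed-width field, truncating if necessary.
 * Returns the length of the stored character sequence.
 */
size_t
record_set_name(struct record *r, const char *name)
{
    char  *end;

    assert(r != NULL && name != NULL);
    end = stpncpy(r->name, name, sizeof(r->name));  /* zero-pads the rest */

    return (size_t) (end - r->name);
}

/*
 * Read the field back as a string.  dst must have room for
 * NAME_WIDTH + 1 bytes.  Returns the length of the string.
 */
size_t
record_get_name(char *dst, const struct record *r)
{
    assert(dst != NULL && r != NULL);
    dst[0] = '\0';
    strncat(dst, r->name, sizeof(r->name));  /* stops at padding, if any */

    return strlen(dst);
}
```

       Note that the field itself is never passed to a function that expects
       a string: when the name fills all NAME_WIDTH bytes, the field contains
       no null byte at all.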

   Measured character sequences
       The simplest character sequence copying function is mempcpy(3).  It
       requires always knowing the length of your character sequences, which
       can be stored alongside them in a structure.  Knowing the length makes
       the code much faster, since each copy and length computation does only
       the minimal work.  mempcpy(3) copies character sequences, so you need
       to explicitly write the terminating null byte if you need a string.

       The  following code can be used to chain‐copy from a measured character
       sequence into a string:

           p = mempcpy(p, foo->ustr, foo->len);
           *p = '\0';

       The following code can be used to chain‐copy from a measured  character
       sequence into another character sequence (no terminating null byte is
       written):

           p = mempcpy(p, bar->ustr, bar->len);

       In programs that make considerable use of strings or character
       sequences, and need the best performance, using overlapping character
       sequences can make a big difference.  It allows holding subsequences of
       a larger character sequence, while neither duplicating memory nor
       spending time on a copy.

       However, this is delicate, since it requires using character sequences.
       C library APIs use strings, so programs that  use  character  sequences
       will  have  to  take care of differentiating strings from character se‐
       quences.
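
       The idea of overlapping measured character sequences can be sketched
       as follows.  The struct ustr_view type and both helpers are
       hypothetical, not part of any library.  Since mempcpy(3) is a GNU
       extension, this sketch uses the equivalent memcpy(3) plus length to
       stay portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical measured character sequence: a pointer plus a length. */
struct ustr_view {
    const char  *ustr;
    size_t      len;
};

/* Refer to bytes [start, start + len) of s without copying them. */
struct ustr_view
view_of(const char *s, size_t start, size_t len)
{
    struct ustr_view  v = { s + start, len };

    return v;
}

/*
 * Chain-copy a view into dst.  Returns a pointer to one past the last
 * byte written.  No terminating null byte is written.
 */
char *
view_copy(char *dst, struct ustr_view v)
{
    assert(dst != NULL);

    return (char *) memcpy(dst, v.ustr, v.len) + v.len;
}
```

       Several views can refer to different parts of the same underlying
       buffer.  A string is produced only at the end of a chain of
       view_copy() calls, by writing a null byte through the returned
       pointer.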

   String vs character sequence
       Some functions only operate on strings.  Those require that  the  input
       src  is  a string, and guarantee an output string (even when truncation
       occurs).  Functions that concatenate also  require  that  dst  holds  a
       string before the call.  List of functions:

       •  stpcpy(3)
       •  strcpy(3), strcat(3)
       •  stpecpy(3), stpecpyx(3)
       •  strlcpy(3bsd), strlcat(3bsd)
       •  strscpy(3)

       Other  functions  require  an  input string, but create a character se‐
       quence as output.  These functions have confusing  names,  and  have  a
       long history of misuse.  List of functions:

       •  stpncpy(3)
       •  strncpy(3)

       Other  functions  operate on an input character sequence, and create an
       output string.  Functions that concatenate also require that dst  holds
       a  string before the call.  strncat(3) has an even more misleading name
       than the functions above.  List of functions:

       •  ustr2stp(3)
       •  strncat(3)

       The last one operates on an input character sequence to create an
       output character sequence.  Because it takes the length as an argument,
       and a string is by nature a character sequence of that length followed
       by a terminating null byte, a string is also accepted as input.
       Function:

       •  mempcpy(3)

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.
              The  programmer  is  responsible  for  allocating a buffer large
              enough.  It returns a pointer suitable for chaining.

              An implementation of this function might be:

                  char *
                  stpcpy(char *restrict dst, const char *restrict src)
                  {
                      char  *end;

                      end = mempcpy(dst, src, strlen(src));
                      *end = '\0';  /* mempcpy(3) didn't copy the null byte */

                      return end;
                  }

       strcpy(3)
       strcat(3)
              These functions copy (strcpy(3)) or concatenate (strcat(3)) the
              input string into a destination string.  The programmer is
              responsible for allocating a buffer large enough.  The return
              value is useless.

              stpcpy(3) is a faster alternative to these functions.

              An implementation of these functions might be:

                  char *
                  strcpy(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst, src);
                      return dst;
                  }

                  char *
                  strcat(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst + strlen(dst), src);
                      return dst;
                  }

       stpecpy(3)
       stpecpyx(3)
              These functions copy the input string into a destination string.
              If  the destination buffer, limited by a pointer to one past the
              end of it, isn’t large enough to hold the  copy,  the  resulting
              string  is  truncated  (but  it  is guaranteed to be null‐termi‐
              nated).  They return a pointer suitable for  chaining.   Trunca‐
              tion needs to be detected only once after the last chained call.
              stpecpyx(3)  has  identical semantics to stpecpy(3), except that
              it forces a SIGSEGV if the src pointer is not a string.

              These functions are not provided by any library, but you can de‐
              fine them with the following reference implementations:

                  /* This code is in the public domain. */
                  char *
                  stpecpy(char *dst, char past_end[0],
                          const char *restrict src)
                  {
                      char *p;

                      if (dst == past_end)
                          return past_end;

                      p = memccpy(dst, src, '\0', past_end - dst);
                      if (p != NULL)
                          return p - 1;

                      /* truncation detected */
                      past_end[-1] = '\0';
                      return past_end;
                  }

                  /* This code is in the public domain. */
                  char *
                  stpecpyx(char *dst, char past_end[0],
                           const char *restrict src)
                  {
                      if (src[strlen(src)] != '\0')
                          raise(SIGSEGV);

                      return stpecpy(dst, past_end, src);
                  }

       strlcpy(3bsd)
       strlcat(3bsd)
              These functions copy (strlcpy(3bsd)) or concatenate
              (strlcat(3bsd)) the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  They return the length
              of the total string they tried to create.  These functions force
              a SIGSEGV if the src pointer is not a string.

              stpecpyx(3) is a faster alternative to these functions.

       strscpy(3)
              This function copies the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  It returns the  length
              of the destination string, or -E2BIG on truncation.

              stpecpy(3) is a simpler and faster alternative to this function.

       stpncpy(3)
              This  function  copies the input string into a destination null‐
              padded character sequence in a fixed‐width buffer.  If the  des‐
              tination buffer, limited by its size, isn’t large enough to hold
              the  copy, the resulting character sequence is truncated.  Since
              it creates a character sequence, it doesn’t need to write a ter‐
              minating null byte.  It returns a pointer suitable for chaining,
              but it’s not ideal for that.  Truncation needs  to  be  detected
              only once after the last chained call.

              If  you’re going to use this function in chained calls, it would
              be useful to develop a similar function that accepts  a  pointer
              to one past the end of the buffer instead of a size.

              An implementation of this function might be:

                  char *
                  stpncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      char  *p;

                      bzero(dst, sz);
                      p = memccpy(dst, src, '\0', sz);
                      if (p == NULL)
                          return dst + sz;

                      return p - 1;
                  }

       ustr2stp(3)
              This function copies the input character sequence contained in a
              null‐padded  fixed‐width buffer, into a destination string.  The
              programmer is responsible for allocating a buffer large  enough.
              It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library, but you can define
              it with the following reference implementation:

                  /* This code is in the public domain. */
                  char *
                  ustr2stp(char *restrict dst, const char *restrict src,
                           size_t sz)
                  {
                      char  *end;

                      /* memccpy(3) returns one past the copied null byte. */
                      end = memccpy(dst, src, '\0', sz);
                      end = (end != NULL) ? end - 1 : dst + sz;
                      *end = '\0';

                      return end;
                  }

       strncpy(3)
              This  function is identical to stpncpy(3) except for the useless
              return value.  Due to the return value, with this function  it’s
              hard to correctly check for truncation.

              stpncpy(3) is a simpler alternative to this function.

              An implementation of this function might be:

                  char *
                  strncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      stpncpy(dst, src, sz);
                      return dst;
                  }

       strncat(3)
              Do  not  confuse this function with strncpy(3); they are not re‐
              lated at all.

              This function concatenates the input character sequence contained
              in a null‐padded fixed‐width buffer, into a destination
              string.  The programmer is responsible for allocating  a  buffer
              large enough.  The return value is useless.

              ustr2stp(3) is a faster alternative to this function.

              An implementation of this function might be:

                  char *
                  strncat(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      ustr2stp(dst + strlen(dst), src, sz);
                      return dst;
                  }

       mempcpy(3)
              This  function  copies  the input character sequence, limited by
              its length, into a destination character sequence.  The program‐
              mer is responsible for allocating a buffer large enough.  It re‐
              turns a pointer suitable for chaining.

              An implementation of this function might be:

                  void *
                  mempcpy(void *restrict dst, const void *restrict src,
                          size_t len)
                  {
                      return (char *) memcpy(dst, src, len) + len;
                  }

RETURN VALUE
       The following functions return a pointer to the terminating  null  byte
       in the destination string.

       •  stpcpy(3)
       •  ustr2stp(3)

       The  following  functions return a pointer to the terminating null byte
       in the destination string, except when truncation occurs; if truncation
       occurs, they return a pointer to one past the end  of  the  destination
       buffer (past_end).

       •  stpecpy(3), stpecpyx(3)

       The  following function returns a pointer to one after the last charac‐
       ter in the destination character sequence; if truncation  occurs,  that
       pointer  is equivalent to a pointer to one past the end of the destina‐
       tion buffer.

       •  stpncpy(3)

       The following function returns a pointer to one after the last  charac‐
       ter in the destination character sequence.

       •  mempcpy(3)

       The following functions return the length of the total string that they
       tried to create (as if truncation didn’t occur).

       •  strlcpy(3bsd), strlcat(3bsd)

       The following function returns the length of the destination string, or
       -E2BIG on truncation.

       •  strscpy(3)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3), strcat(3)
       •  strncpy(3)
       •  strncat(3)

ATTRIBUTES
       For  an  explanation  of  the  terms  used in this section, see attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │stpcpy(), strcpy(), strcat(), stpecpy(),    │ Thread safety │ MT‐Safe │
       │stpecpyx(), strlcpy(), strlcat(),           │               │         │
       │strscpy(), stpncpy(), strncpy(),            │               │         │
       │ustr2stp(), strncat(), mempcpy()            │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       strcpy(3), strcat(3)
       strncpy(3)
       strncat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       stpcpy(3)
       stpncpy(3)
              POSIX.1‐2008.

       strlcpy(3bsd), strlcat(3bsd)
              Functions originated in OpenBSD and present in  some  Unix  sys‐
              tems.

       mempcpy(3)
              This function is a GNU extension.

       strscpy(3)
              Linux kernel internal function.

       stpecpy(3), stpecpyx(3)
       ustr2stp(3)
              Not defined by any standards nor libraries.

CAVEATS
       Don’t  mix  chain calls to truncating and non‐truncating functions.  It
       is conceptually wrong unless you know that the first  part  of  a  copy
       will  always  fit.  Anyway, the performance difference will probably be
       negligible, so it will probably be more clear if you use consistent se‐
       mantics: either truncating or non‐truncating.  Calling a non‐truncating
       function after a truncating one is necessarily wrong.

       Some of the functions described here are not provided by  any  library;
       you should write your own copy if you want to use them.  See STANDARDS.

BUGS
       All  concatenation  (*cat()) functions share the same performance prob‐
       lem: Shlemiel the  painter  ⟨https://www.joelonsoftware.com/2001/12/11/
       back-to-basics/⟩.

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
                  p = buf;
                  p = stpcpy(p, "Hello ");
                  p = stpcpy(p, "world");
                  p = stpcpy(p, "!");
                  len = p - buf;
                  puts(buf);

       strcpy(3)
       strcat(3)
                  strcpy(buf, "Hello ");
                  strcat(buf, "world");
                  strcat(buf, "!");
                  len = strlen(buf);
                  puts(buf);

       stpecpy(3)
       stpecpyx(3)
                  past_end = buf + sizeof(buf);
                  p = buf;
                  p = stpecpy(p, past_end, "Hello ");
                  p = stpecpy(p, past_end, "world");
                  p = stpecpy(p, past_end, "!");
                  if (p == past_end) {
                      p--;
                      goto toolong;
                  }
                  len = p - buf;
                  puts(buf);

       strlcpy(3bsd)
       strlcat(3bsd)
                  if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  len = strlcat(buf, "!", sizeof(buf));
                  if (len >= sizeof(buf))
                      goto toolong;
                  puts(buf);

       strscpy(3)
                  len = strscpy(buf, "Hello world!", sizeof(buf));
                  if (len == -E2BIG)
                      goto toolong;
                  puts(buf);

       stpncpy(3)
                  past_end = buf + sizeof(buf);
                  end = stpncpy(buf, "Hello world!", sizeof(buf));
                  if (end == past_end)
                      goto toolong;
                  len = end - buf;
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       strncpy(3)
                  strncpy(buf, "Hello world!", sizeof(buf));
                  if (buf[sizeof(buf) - 1] != '\0')
                      goto toolong;
                  len = strnlen(buf, sizeof(buf));
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       ustr2stp(3)
                  p = buf;
                  p = ustr2stp(p, "Hello ", 6);
                  p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
                  p = ustr2stp(p, "!", 1);
                  len = p - buf;
                  puts(buf);

       strncat(3)
                  buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);

       mempcpy(3)
                  p = buf;
                  p = mempcpy(p, "Hello ", 6);
                  p = mempcpy(p, "world", 5);
                  p = mempcpy(p, "!", 1);
                  *p = '\0';
                  len = p - buf;
                  puts(buf);
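
       The stpecpy(3) example above can be turned into a self‐contained
       sketch as follows.  It repeats the reference implementation given in
       DESCRIPTION, with past_end spelled as a plain pointer so that it
       compiles as standard C.  The greet() helper and its return convention
       are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* memccpy(3) is an XSI function */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference implementation from this page. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: write "Hello <name>!" into buf[sz].
 * Returns the length of the string, or -1 if it was truncated.
 */
long
greet(char *buf, size_t sz, const char *name)
{
    char  *p, *past_end;

    assert(buf != NULL && sz > 0);

    past_end = buf + sz;
    p = buf;
    p = stpecpy(p, past_end, "Hello ");
    p = stpecpy(p, past_end, name);
    p = stpecpy(p, past_end, "!");
    if (p == past_end)  /* checked once, after the last chained call */
        return -1;

    return p - buf;
}
```

       Once truncation has happened, every later call in the chain returns
       past_end without writing anything, which is why a single check at the
       end is enough.  The buffer still holds a null‐terminated (truncated)
       string in that case.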

SEE ALSO
       bzero(3), memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                           strcpy(3)

-- 
2.38.1



* [PATCH v3 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-12 23:00   ` [PATCH v2 0/3] Rewrite strcpy(3) Alejandro Colomar
  2022-12-13 20:56     ` Jakub Wilk
  2022-12-14  0:03     ` [PATCH v3 0/1] Rewritten page for string-copying functions Alejandro Colomar
@ 2022-12-14  0:03     ` Alejandro Colomar
  2022-12-14 16:22       ` Douglas McIlroy
  2 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14  0:03 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Many functions that are inferior to another one have been marked as
deprecated, notwithstanding the deprecation status in C libraries or
any standards.  Alternatives have been given in the same page, with
reference implementations.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strcpy.3 | 1058 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 970 insertions(+), 88 deletions(-)

diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 74c3180ae..e04a7b149 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,48 +1,767 @@
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
+.\" SPDX-License-Identifier: BSD-3-Clause
 .\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
-strcpy \- copy a string
+stpcpy,
+strcpy, strcat,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+strscpy,
+stpncpy,
+strncpy,
+ustr2stp,
+strncat,
+mempcpy
+\- copy strings and character sequences
+.\" ----- LIBRARY :: --------------------------------------------------/
 .SH LIBRARY
+.TP
+.BR stpcpy (3)
+.TQ
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR stpncpy (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+.TQ
+.BR mempcpy (3)
 Standard C library
 .RI ( libc ", " \-lc )
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+Not provided by any library.
+.TP
+.BR strlcpy "(3), \c"
+.BR strlcat (3)
+Utility functions from BSD systems
+.RI ( libbsd ", " \-lbsd )
+.TP
+.BR strscpy (3)
+Not provided by any library.
+It is a Linux kernel internal function.
+.\" ----- SYNOPSIS :: -------------------------------------------------/
 .SH SYNOPSIS
 .nf
 .B #include <string.h>
+.fi
+.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
+.SS Strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+// Copy/concatenate a string.
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.PP
+// Chain-copy a string with truncation.
+.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on UB.
+.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Copy/concatenate a string with truncation and SIGSEGV on UB.
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Copy a string with truncation.
+.BI "ssize_t strscpy(char " dst "[restrict ." sz "], \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Null-padded character sequences --------/
+.SS Null-padded character sequences
+.nf
+// Zero a fixed-width buffer, and
+// copy a string with truncation into a character sequence.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string with truncation into a character sequence.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a string.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Concatenate a null-padded character sequence into a string.
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Measured character sequences --------------------/
+.SS Measured character sequences
+.nf
+// Chain-copy a measured character sequence.
+.BI "void *mempcpy(void *restrict " dst ", \
+const void " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy (3),
+.BR stpncpy (3):
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
+.fi
+.PP
+.BR mempcpy (3):
+.nf
+    _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
-.BR strcpy ()
-function copies the string pointed to by
-.IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
+.TP
+.IR "character sequence " ( ustr )
+is a sequence of zero or more non-null characters.
+A program should never use a character sequence where a string is required.
+However, with appropriate care,
+a string can be used in the place of a character sequence.
+.RS
+.TP
+.I null-padded character sequence
+Character sequences can be contained in fixed-width buffers,
+which contain padding null bytes after the character sequence,
+to fill the rest of the buffer
+without affecting the character sequence;
+however, those padding null bytes are not part of the character sequence.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
+.TP
+.I measured character sequence
+Character sequence delimited by its length.
+.RE
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string or character sequence.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer
+where the string or character sequence is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to the terminating null byte of a string,
+or a pointer to one past the last character of a character sequence.
+This is the return value of functions that allow chaining.
+It is equivalent to
+.IR &str[len] .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
+.TP
+.I past_end
+is the name of a pointer to one past the end of the buffer
+that contains a string or character sequence.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings or character sequences
+instead of overrunning the containing buffer.
+.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
+.SS Copy, concatenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that concatenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for the end of the existing string.
+However, functions that concatenate have a much simpler use,
+so if performance is not important,
+it can make sense to use them for improving readability.
+.PP
+To chain copy functions,
+they need to return a pointer to the
+.IR end .
+That's a byproduct of the copy operation,
+so it has no performance costs.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *()
+or
+.RB * memp *(),
+since it's also common to name the pointer just
+.IR p .
+.PP
+Chain-copying functions that truncate
+should accept a pointer to one past the end of the destination buffer,
+and have names of the form
+.RB * stpe *().
+This allows not having to recalculate the remaining size after each call.
+.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
+.SS Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.PP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.PP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these overflow-detection features be more precise.
+.PP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.PP
+Functions that truncate:
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It requires checking for truncation only once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a terminating null byte).
+.IP \(bu
+.BR strscpy (3)
+reports an error instead of crashing (similar to
+.BR stpecpy (3)).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they write null-padded character sequences
+rather than strings.
+.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
+.SS Null-padded character sequences
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use null-padded character sequences in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from a character sequence within a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR ustr2stp (3)
+or
+.BR strncat (3).
+.\" ----- DESCRIPTION :: Measured character sequences -----------------/
+.SS Measured character sequences
+The simplest character sequence copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your character sequences,
+which can be stored alongside them in a structure.
+Because the length is always known,
+the code can be much faster:
+it performs only the minimal copies and no redundant length measurements.
+.BR mempcpy (3)
+copies character sequences,
+so you need to explicitly set the terminating null byte if you need a string.
+.PP
+The following code can be used to
+chain-copy from a measured character sequence into a string:
+.PP
+.in +4n
+.EX
+p = mempcpy(p, foo\->ustr, foo\->len);
+*p = \(aq\e0\(aq;
+.EE
+.in
+.PP
+The following code can be used to
+chain-copy from a measured character sequence into a character sequence:
+.PP
+.in +4n
+.EX
+p = mempcpy(p, bar\->ustr, bar\->len);
+.EE
+.in
+.PP
+In programs that make considerable use of strings or character sequences,
+and need the best performance,
+using overlapping character sequences can make a big difference.
+It allows holding subsequences of a larger character sequence,
+while not duplicating memory
+nor using time to do a copy.
+.PP
+However, this is delicate,
+since it requires using character sequences.
+C library APIs use strings,
+so programs that use character sequences
+will have to take care of differentiating strings from character sequences.
+.\" ----- DESCRIPTION :: String vs character sequence -----------------/
+.SS String vs character sequence
+Some functions only operate on strings.
+Those require that the input
+.I src
+is a string,
+and guarantee an output string
+(even when truncation occurs).
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.IP \(bu
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.IP \(bu
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+.IP \(bu
+.BR strscpy (3)
+.PD
+.PP
+Other functions require an input string,
+but create a character sequence as output.
+These functions have confusing names,
+and have a long history of misuse.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpncpy (3)
+.IP \(bu
+.BR strncpy (3)
+.PD
+.PP
+Other functions operate on an input character sequence,
+and create an output string.
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+.BR strncat (3)
+has an even more misleading name than the functions above.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR ustr2stp (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.PP
+The last function
+operates on an input character sequence
+to create an output character sequence.
+But because it asks for the length,
+and a string is by nature composed of a character sequence of the same length
+plus a terminating null byte,
+a string is also accepted as input.
+Function:
+.IP \(bu 3
+.BR mempcpy (3)
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strlen(src));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+These functions copy the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a faster alternative to these functions.
+.IP
+An implementation of these functions might be:
+.IP
+.in +4n
+.EX
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to one past the end of it,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+These functions are not provided by any library,
+but you can define them with the following reference implementations:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+stpecpy(char *dst, char past_end[0],
+        const char *restrict src)
+{
+    char *p;
+
+    if (dst == past_end)
+        return past_end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    past_end[\-1] = \(aq\e0\(aq;
+    return past_end;
+}
+
+/* This code is in the public domain. */
+char *
+stpecpyx(char *dst, char past_end[0],
+         const char *restrict src)
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, past_end, src);
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+.BR stpecpyx (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strscpy(3) ----------------------/
+.TP
+.BR strscpy (3)
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a simpler and faster alternative to this function.
+.RE
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded character sequence in a fixed-width buffer.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+Since it creates a character sequence,
+it doesn't need to write a terminating null byte.
+It returns a pointer suitable for chaining,
+but it's not ideal for that.
+Truncation needs to be detected only once after the last chained call.
+.IP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function
+that accepts a pointer to one past the end of the buffer instead of a size.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    char  *p;
+
+    bzero(dst, sz);
+    p = memccpy(dst, src, \(aq\e0\(aq, sz);
+    if (p == NULL)
+        return dst + sz;
+
+    return p \- 1;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library,
+but you can define it with the following reference implementation:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+ustr2stp(char *restrict dst, const char *restrict src,
+         size_t sz)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strnlen(src, sz));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
+.TP
+.BR strncpy (3)
+This function is identical to
+.BR stpncpy (3)
+except for the useless return value.
+Due to the return value,
+with this function it's hard to correctly check for truncation.
+.IP
+.BR stpncpy (3)
+is a simpler alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
+.TP
+.BR strncat (3)
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function concatenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR ustr2stp (3)
+is a faster alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncat(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    ustr2stp(dst + strlen(dst), src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: mempcpy(3) ----------------------/
+.TP
+.BR mempcpy (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+void *
+mempcpy(void *restrict dst, const void *restrict src,
+        size_t len)
+{
+    return (char *) memcpy(dst, src, len) + len;
+}
+.EE
+.in
+.\" ----- RETURN VALUE :: ---------------------------------------------/
 .SH RETURN VALUE
-The
-.BR strcpy ()
-function returns a pointer to
-the destination string
-.IR dest .
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to one past the end of the destination buffer
+.RI ( past_end ).
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence.
+.IP \(bu 3
+.BR mempcpy (3)
+.PP
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following function returns
+the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP \(bu 3
+.BR strscpy (3)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.IP \(bu 3
+.PD 0
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- ATTRIBUTES :: -----------------------------------------------/
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -54,73 +773,236 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR strcpy (),
+.BR strcat (),
+.BR stpecpy (),
+.BR stpecpyx (),
+.BR strlcpy (),
+.BR strlcat (),
+.BR strscpy (),
+.BR stpncpy (),
+.BR strncpy (),
+.BR ustr2stp (),
+.BR strncat (),
+.BR mempcpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
+.\" ----- STANDARDS :: ------------------------------------------------/
 .SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.TP
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
+.TP
+.BR stpcpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was not part of
+.\" the C or POSIX.1 standards, nor customary on UNIX systems.
+.\" It first appeared at least as early as 1986,
+.\" in the Lattice C AmigaDOS compiler,
+.\" then in the GNU fileutils and GNU textutils in 1989,
+.\" and in the GNU C library by 1992.
+.\" It is also present on the BSDs.
+.TQ
+.BR stpncpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+POSIX.1-2008.
+.TP
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+These functions originated in OpenBSD and are present in some Unix systems.
+.TP
+.BR mempcpy (3)
+This function is a GNU extension.
+.TP
+.BR strscpy (3)
+Linux kernel internal function.
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.TQ
+.BR ustr2stp (3)
+Not defined by any standard or provided by any library.
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+Anyway, the performance difference will probably be negligible,
+so it will probably be clearer if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
 .PP
+Some of the functions described here are not provided by any library;
+you should write your own copy if you want to use them.
+See STANDARDS.
+.\" ----- BUGS :: -----------------------------------------------------/
+.SH BUGS
+All concatenation
+.RB (* cat ())
+functions share the same performance problem:
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
 .in +4n
 .EX
-size_t strlcpy(char *dest, const char *src, size_t size);
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
 .EE
 .in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.in +4n
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, past_end, "Hello ");
+p = stpecpy(p, past_end, "world");
+p = stpecpy(p, past_end, "!");
+if (p == past_end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.in +4n
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strscpy(3) --------------------------------------/
+.TP
+.BR strscpy (3)
+.in +4n
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+end = stpncpy(buf, "Hello world!", sizeof(buf));
+if (end == past_end)
+    goto toolong;
+len = end \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
+.TP
+.BR strncpy (3)
+.in +4n
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (buf[sizeof(buf) \- 1] != \(aq\e0\(aq)
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.in +4n
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
+.TP
+.BR strncat (3)
+.in +4n
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: mempcpy(3) --------------------------------------/
+.TP
+.BR mempcpy (3)
+.in +4n
+.EX
+p = buf;
+p = mempcpy(p, "Hello ", 6);
+p = mempcpy(p, "world", 5);
+p = mempcpy(p, "!", 1);
+*p = \(aq\e0\(aq;
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
+.BR bzero (3),
 .BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
-.BR strdup (3),
-.BR string (3),
-.BR wcscpy (3)
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR string (3)
-- 
2.38.1


* Re: [PATCH v3 0/1] Rewritten page for string-copying functions
From: Alejandro Colomar @ 2022-12-14  0:14 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk
  Cc: Alejandro Colomar


On 12/14/22 01:03, Alejandro Colomar wrote:
> 
> Hi!
> 
> I've written a new manual page for documenting string-copying functions
> so that it's clear what's the purpose of each of them.  It may differ
> from the original design of the functions, since my guess for several of
> them is simply that they were misdesigned.  However, after investigating
> the operation that they perform on bytes, I've come up with a story that
> can make sense of functions that were once believed to be broken by
> many.  In fact, my conclusion after writing the page is that only one
> function is really useless:
> 
> -  strncpy(3):  stpncpy(3) is _always_ better.
> 
> The others depend on the program.  If you don't care at all about
> performance and Shlemiel is a friend of yours, then rcpy and [rn]cat
> are your friends.  If you don't like Shlemiel, and don't mind slightly
> more complex code, you'll go for 'p' functions.
> 
> And so on.  I won't spoil the page more.
> 
> Basically I want to put an end to this situation where a function like
> strncpy(3) is dreaded by some because it looks broken (I thought so
> myself for a long time), while others who don't know it well misuse it
> for things it isn't meant for, which is even worse.  Or where programmers
> think that strncpy(3) and strncat(3) have any relationship at all (they
> don't).
> 
> Below goes the formatted page.  Please review independently of it being
> in strcpy(3) or string_copy(7), and address that as a separate issue
> (but of course feel free to cover it, and any other issues).
> 
> 
> Cheers,
> 
> Alex
> 
> 
> Alejandro Colomar (1):
>    strcpy.3: Rewrite page to document all string-copying functions
> 
>   man3/strcpy.3 | 1058 +++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 970 insertions(+), 88 deletions(-)
> 
> 
> strcpy(3)                  Library Functions Manual                  strcpy(3)
> 
> NAME
>         stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, strscpy,
>         stpncpy, strncpy, ustr2stp, strncat, mempcpy - copy strings and charac‐
>         ter sequences
> 
> LIBRARY
>         stpcpy(3)
>         strcpy(3), strcat(3)
>         stpncpy(3)
>         strncpy(3)
>         strncat(3)
>         mempcpy(3)
>                Standard C library (libc, -lc)
> 
>         stpecpy(3), stpecpyx(3)
>                Not provided by any library.
> 
>         strlcpy(3), strlcat(3)
>                Utility functions from BSD systems (libbsd, -lbsd)
> 
>         strscpy(3)
>                Not provided by any library.  It  is  a  Linux  kernel  internal
>                function.
> 
> SYNOPSIS
>         #include <string.h>
> 
>     Strings
>         // Chain‐copy a string.
>         char *stpcpy(char *restrict dst, const char *restrict src);
> 
>         // Copy/concatenate a string.
>         char *strcpy(char *restrict dst, const char *restrict src);
>         char *strcat(char *restrict dst, const char *restrict src);
> 
>         // Chain‐copy a string with truncation.
>         char *stpecpy(char *dst, char past_end[0], const char *restrict src);
> 
>         // Chain‐copy a string with truncation and SIGSEGV on UB.
>         char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
> 
>         // Copy/concatenate a string with truncation and SIGSEGV on UB.
>         size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
>         size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Copy a string with truncation.
>         ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                        size_t sz);
> 
>     Null‐padded character sequences
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into a character sequence.
>         char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Zero a fixed‐width buffer, and
>         // copy a string with truncation into a character sequence.
>         char *strncpy(char dest[restrict .sz], const char *restrict src,
>                        size_t sz);
> 
>         // Chain‐copy a null‐padded character sequence into a string.
>         char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>         // Concatenate a null‐padded character sequence into a string.
>         char *strncat(char *restrict dst, const char src[restrict .sz],
>                        size_t sz);
> 
>     Measured character sequences
>         // Chain‐copy a measured character sequence.
>         void *mempcpy(void *restrict dst, const void src[restrict .len],
>                        size_t len);
> 
>     Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
> 
>         stpcpy(3), stpncpy(3):
>             Since glibc 2.10:
>                 _POSIX_C_SOURCE >= 200809L
>             Before glibc 2.10:
>                 _GNU_SOURCE
> 
>         mempcpy(3):
>             _GNU_SOURCE
> 
> DESCRIPTION
>     Terms (and abbreviations)
>         string (str)
>                is  a sequence of zero or more non‐null characters followed by a
>                null byte.
> 
>         character sequence (ustr)
>                is a sequence of zero or more non‐null  characters.   A  program
>                should  never  use  a  character  sequence where a string is re‐
>                quired.  However, with appropriate care, a string can be used in
>                the place of a character sequence.
> 
>                null‐padded character sequence
>                       Character  sequences  can  be  contained  in  fixed‐width
>                       buffers, which contain padding null bytes after the char‐
>                       acter  sequence,  to  fill the rest of the buffer without
>                       affecting the character sequence; however, those  padding
>                       null bytes are not part of the character sequence.
> 
>                measured character sequence
>                       Character sequence delimited by its length.
> 
>         length (len)
>                is  the  number  of non‐null characters in a string or character
>                sequence.   It  is  the  return  value  of  strlen(str)  and  of
>                strnlen(ustr, sz).
> 
>         size (sz)
>                refers  to  the  entire buffer where the string or character se‐
>                quence is contained.
> 
>         end    is the name of a pointer to  the  terminating  null  byte  of  a
>                string, or a pointer to one past the last character of a charac‐
>                ter  sequence.  This is the return value of functions that allow
>                chaining.  It is equivalent to &str[len].
> 
>         past_end
>                is the name of a pointer to one past the end of the buffer  that
>                contains  a  string  or character sequence.  It is equivalent to
>                &str[sz].  It is used as a sentinel value, to be able  to  trun‐
>                cate  strings  or character sequences instead of overrunning the
>                containing buffer.
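
A rough sketch of how these four terms relate for a single buffer.  The
names measure() and struct extent are hypothetical, made up for this
illustration; only ISO C calls are used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the quantities defined in the list above. */
struct extent {
    size_t  len;        /* number of non-null characters */
    size_t  sz;         /* size of the whole buffer */
    char    *end;       /* &buf[len] */
    char    *past_end;  /* &buf[sz] */
};

/*
 * Measure a string or null-padded character sequence stored in buf[sz].
 * memchr(3) is bounded by sz, so this also works when the buffer holds
 * no null byte at all (a character sequence that fills the buffer).
 */
struct extent
measure(char *buf, size_t sz)
{
    struct extent  e;
    char           *nul;

    nul = memchr(buf, '\0', sz);
    e.len = (nul != NULL) ? (size_t) (nul - buf) : sz;
    e.sz = sz;
    e.end = buf + e.len;
    e.past_end = buf + sz;
    return e;
}
```

For char buf[8] = "abc", this gives len 3, sz 8, end == &buf[3], and
past_end == &buf[8].  For a buffer completely filled with non-null
characters, end and past_end coincide.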
> 
>     Copy, concatenate, and chain‐copy
>         Originally, there was a distinction between  functions  that  copy  and
>         those  that  concatenate.  However, newer functions that copy while al‐
>         lowing chaining cover both use cases with a single API.  They are  also
>         algorithmically  faster, since they don’t need to search for the end of
>         the existing string.  However, functions that concatenate have  a  much
>         simpler  use,  so if performance is not important, it can make sense to
>         use them for improving readability.
> 
>         To be chained, copy functions need to return  a  pointer  to  the  end.
>         That’s  a  byproduct  of  the  copy operation, so it has no performance
>         costs.  Functions that return such a pointer, and thus can be  chained,
>         have  names  of the form *stp*() or *memp*(), since it’s also common to
>         name the pointer just p.
> 
>         Chain‐copying functions that truncate should accept a  pointer  to  one
>         past  the  end  of  the  destination buffer, and have names of the form
>         *stpe*().  This allows not having to recalculate the remaining size af‐
>         ter each call.
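
To make the difference concrete, here is a minimal sketch that builds the
same string both ways.  stpcpy_sketch() is a hypothetical stand-in for
stpcpy(3), written with ISO C calls only so that it compiles without
feature test macros; join_chained() and join_concatenated() are also
made-up names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for stpcpy(3): copy src, including its null
 * byte, and return a pointer to the terminating null byte in dst.
 */
char *
stpcpy_sketch(char *dst, const char *src)
{
    size_t  len;

    len = strlen(src);
    memcpy(dst, src, len + 1);
    return dst + len;
}

/* Chain-copy: each call starts where the previous one ended. */
size_t
join_chained(char *buf, const char *a, const char *b, const char *c)
{
    char  *p;

    p = buf;
    p = stpcpy_sketch(p, a);
    p = stpcpy_sketch(p, b);
    p = stpcpy_sketch(p, c);
    return (size_t) (p - buf);  /* The length is a byproduct. */
}

/* Concatenate: shorter to read, but strcat(3) searches for the end of buf. */
size_t
join_concatenated(char *buf, const char *a, const char *b, const char *c)
{
    strcpy(buf, a);
    strcat(buf, b);
    strcat(buf, c);
    return strlen(buf);
}
```

Both functions assume that the caller provides a buffer that is large
enough; neither truncates.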
> 
>     Truncate or not?
>         The first thing to note is that  programmers  should  be  careful  with
>         buffers,  so  they  always have the correct size, and truncation is not
>         necessary.
> 
>         In most cases, truncation is not desired, and it is simpler to just  do
>         the copy.  Simpler code is safer code.  Programming against programming
>         mistakes  by  adding more code just adds more points where mistakes can
>         be made.
> 
>         Nowadays, compilers can detect most  programmer  errors  with  features
>         like  compiler  warnings,  static  analyzers,  and _FORTIFY_SOURCE (see
>         ftm(7)).  Keeping the code simple helps these  overflow‐detection  fea‐
>         tures be more precise.
> 
>         When  validating  user input, however, it makes sense to truncate.  Re‐
>         member to check the return value of such function calls.
> 
>         Functions that truncate:
> 
>         •  stpecpy(3) is the most efficient string copy function that  performs
>            truncation.  Truncation needs to be checked only once, after all
>            chained calls.
> 
>         •  stpecpyx(3)  is  a  variant  of  stpecpy(3) that consumes the entire
>            source string, to catch bugs in the program by forcing  a  segmenta‐
>            tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).
> 
>         •  strlcpy(3bsd)  and  strlcat(3bsd) are designed to crash if the input
>            string is invalid (doesn’t contain a terminating null byte).
> 
>         •  strscpy(3)  reports  an  error  instead  of  crashing  (similar   to
>            stpecpy(3)).
> 
>         •  stpncpy(3) and strncpy(3) also truncate, but they write  null‐padded
>            character sequences rather than strings.
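
A sketch of the "check once" property of a past_end‐style interface.
stpecpy_sketch() is a hypothetical variant that follows the semantics
described for stpecpy(3) in this page but uses memchr(3) and memcpy(3)
instead of memccpy(3), so that it needs nothing beyond ISO C; greet() is
a made-up caller.

```c
#include <stddef.h>
#include <string.h>

char *
stpecpy_sketch(char *dst, char *past_end, const char *src)
{
    size_t      room, len;
    const char  *nul;

    if (dst == past_end)
        return past_end;  /* An earlier call already truncated. */

    room = (size_t) (past_end - dst);
    nul = memchr(src, '\0', room);
    if (nul != NULL) {
        len = (size_t) (nul - src);
        memcpy(dst, src, len + 1);  /* Copy including the null byte. */
        return dst + len;
    }

    /* Truncation: keep room - 1 characters and terminate. */
    memcpy(dst, src, room - 1);
    past_end[-1] = '\0';
    return past_end;
}

/* Returns 0 on success, or -1 if the result had to be truncated. */
int
greet(char *buf, size_t sz, const char *name)
{
    char  *p, *past_end;

    past_end = buf + sz;
    p = buf;
    p = stpecpy_sketch(p, past_end, "Hello ");
    p = stpecpy_sketch(p, past_end, name);
    p = stpecpy_sketch(p, past_end, "!");
    return (p == past_end) ? -1 : 0;  /* One check for all three calls. */
}
```

Once a call truncates, every later call in the chain sees
dst == past_end and returns immediately, which is why a single
comparison at the end is enough.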
> 
>     Null‐padded character sequences
>         For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
>         padded  character  sequences in fixed‐width buffers.  To interface with
>         them, specialized functions need to be used.
> 
>         To copy strings into them, use stpncpy(3).
> 
>         To copy from a character sequence within a fixed‐width  buffer  into  a
>         string,  ignoring  any  trailing  null  bytes in the source fixed‐width
>         buffer, you should use ustr2stp(3) or strncat(3).
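
A sketch of a round trip through a fixed‐width field, using ISO C calls
only.  field_set() and field_get() are hypothetical names: the first
leaves the same bytes in the field as stpncpy(3) would, and the second
plays the role described for ustr2stp(3).

```c
#include <stddef.h>
#include <string.h>

/*
 * Store a string into a null-padded fixed-width field.
 * Returns 1 if characters of src were dropped, 0 otherwise.
 * No null byte is written when src fills the field exactly.
 */
int
field_set(char *field, size_t sz, const char *src)
{
    size_t  len;

    len = strlen(src);
    memset(field, '\0', sz);
    memcpy(field, src, (len < sz) ? len : sz);
    return len > sz;
}

/*
 * Read a null-padded fixed-width field into a string.
 * dst must have room for sz + 1 bytes.
 * Returns a pointer to the terminating null byte in dst.
 */
char *
field_get(char *dst, const char *field, size_t sz)
{
    const char  *nul;
    size_t      len;

    nul = memchr(field, '\0', sz);
    len = (nul != NULL) ? (size_t) (nul - field) : sz;
    memcpy(dst, field, len);
    dst[len] = '\0';
    return dst + len;
}
```

The field itself must never be passed to functions that expect a
string, since it may hold no null byte.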
> 
>     Measured character sequences
>         The simplest character sequence copying function is mempcpy(3).  It re‐
>         quires always knowing the length of your character sequences, for which
>         structures can be used.  Knowing the lengths makes the code faster,
>         because copies and length measurements can be kept to the minimum
>         necessary.  mempcpy(3) copies character sequences, so you need to
>         explicitly set the terminating null byte if you need a string.
> 
>         The  following code can be used to chain‐copy from a measured character
>         sequence into a string:
> 
>             p = mempcpy(p, foo->ustr, foo->len);
>             *p = '\0';
> 
>         The following code can be used to chain‐copy from a measured  character
>         sequence into a character sequence (no null byte is written):
> 
>             p = mempcpy(p, bar->ustr, bar->len);
> 
>         In  programs  that  make  considerable  use of strings or character se‐
>         quences, and need the best performance, using overlapping character se‐
>         quences can make a big difference.  It allows holding subsequences of a
>         larger character sequence without duplicating memory or spending time on
>         a copy.
> 
>         However, this is delicate, since it requires using character sequences.
>         C library APIs use strings, so programs that  use  character  sequences
>         will  have  to  take care of differentiating strings from character se‐
>         quences.
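
A sketch of overlapping measured character sequences.  struct slice,
slice_copy(), and split_pair() are hypothetical names; memcpy(3) plus an
addition is used in place of mempcpy(3) to avoid needing _GNU_SOURCE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical measured character sequence: pointer plus length. */
struct slice {
    const char  *ustr;
    size_t      len;
};

/* Chain-copy a slice; no null byte is written. */
char *
slice_copy(char *dst, struct slice s)
{
    memcpy(dst, s.ustr, s.len);
    return dst + s.len;
}

/*
 * Split "key=value" into two slices that point into the original
 * text.  No bytes are copied.  Returns 0 on success, or -1 if the
 * text contains no '='.
 */
int
split_pair(const char *text, struct slice *key, struct slice *value)
{
    const char  *eq;

    eq = strchr(text, '=');
    if (eq == NULL)
        return -1;

    key->ustr = text;
    key->len = (size_t) (eq - text);
    value->ustr = eq + 1;
    value->len = strlen(eq + 1);
    return 0;
}
```

The key slice is not followed by a null byte (the next byte is '='), so
it must only be handed to functions that take a length.  To get a string
out of slices, chain the copies and then write the terminating null byte
explicitly.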
> 
>     String vs character sequence
>         Some functions only operate on strings.  Those require that  the  input
>         src  is  a string, and guarantee an output string (even when truncation
>         occurs).  Functions that concatenate also  require  that  dst  holds  a
>         string before the call.  List of functions:
> 
>         •  stpcpy(3)
>         •  strcpy(3), strcat(3)
>         •  stpecpy(3), stpecpyx(3)
>         •  strlcpy(3bsd), strlcat(3bsd)
>         •  strscpy(3)
> 
>         Other  functions  require  an  input string, but create a character se‐
>         quence as output.  These functions have confusing  names,  and  have  a
>         long history of misuse.  List of functions:
> 
>         •  stpncpy(3)
>         •  strncpy(3)
> 
>         Other  functions  operate on an input character sequence, and create an
>         output string.  Functions that concatenate also require that dst  holds
>         a  string before the call.  strncat(3) has an even more misleading name
>         than the functions above.  List of functions:
> 
>         •  ustr2stp(3)
>         •  strncat(3)
> 
>         The last function operates on an input character sequence to create  an
>         output  character  sequence.  But because it asks for the length, and a
>         string is by nature composed of a character sequence of the same length
>         plus a terminating null byte, a  string  is  also  accepted  as  input.
>         Function:
> 
>         •  mempcpy(3)
> 
>     Functions
>         stpcpy(3)
>                This function copies the input string into a destination string.
>                The  programmer  is  responsible  for  allocating a buffer large
>                enough.  It returns a pointer suitable for chaining.
> 
>                An implementation of this function might be:
> 
>                    char *
>                    stpcpy(char *restrict dst, const char *restrict src)
>                    {
>                        return mempcpy(dst, src, strlen(src));

Oops.  It should have been:

char *p;

p = mempcpy(dst, src, strlen(src));
p = '\0';
return p;

>                    }
> 
>         strcpy(3)
>         strcat(3)
>                These functions copy the input string into a destination string.
>                The programmer is responsible  for  allocating  a  buffer  large
>                enough.  The return value is useless.
> 
>                stpcpy(3) is a faster alternative to these functions.
> 
>                An implementation of these functions might be:
> 
>                    char *
>                    strcpy(char *restrict dst, const char *restrict src)
>                    {
>                        stpcpy(dst, src);
>                        return dst;
>                    }
> 
>                    char *
>                    strcat(char *restrict dst, const char *restrict src)
>                    {
>                        stpcpy(dst + strlen(dst), src);
>                        return dst;
>                    }
> 
>         stpecpy(3)
>         stpecpyx(3)
>                These functions copy the input string into a destination string.
>                If  the destination buffer, limited by a pointer to one past the
>                end of it, isn’t large enough to hold the  copy,  the  resulting
>                string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>                nated).  They return a pointer suitable for  chaining.   Trunca‐
>                tion needs to be detected only once after the last chained call.
>                stpecpyx(3)  has  identical semantics to stpecpy(3), except that
>                it forces a SIGSEGV if the src pointer is not a string.
> 
>                These functions are not provided by any library, but you can de‐
>                fine them with the following reference implementations:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpy(char *dst, char past_end[0],
>                            const char *restrict src)
>                    {
>                        char *p;
> 
>                        if (dst == past_end)
>                            return past_end;
> 
>                        p = memccpy(dst, src, '\0', past_end - dst);
>                        if (p != NULL)
>                            return p - 1;
> 
>                        /* truncation detected */
>                        past_end[-1] = '\0';
>                        return past_end;
>                    }
> 
>                    /* This code is in the public domain. */
>                    char *
>                    stpecpyx(char *dst, char past_end[0],
>                             const char *restrict src)
>                    {
>                        if (src[strlen(src)] != '\0')
>                            raise(SIGSEGV);
> 
>                        return stpecpy(dst, past_end, src);
>                    }
> 
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                These functions copy the input string into a destination string.
>                If the destination buffer, limited  by  its  size,  isn’t  large
>                enough  to hold the copy, the resulting string is truncated (but
>                it is guaranteed to be null‐terminated).  They return the length
>                of the total string they tried to create.  These functions force
>                a SIGSEGV if the src pointer is not a string.
> 
>                stpecpyx(3) is a faster alternative to these functions.
> 
>         strscpy(3)
>                This function copies the input string into a destination string.
>                If the destination buffer, limited  by  its  size,  isn’t  large
>                enough  to hold the copy, the resulting string is truncated (but
>                it is guaranteed to be null‐terminated).  It returns the  length
>                of the destination string, or -E2BIG on truncation.
> 
>                stpecpy(3) is a simpler and faster alternative to this function.
> 
>         stpncpy(3)
>                This  function  copies the input string into a destination null‐
>                padded character sequence in a fixed‐width buffer.  If the  des‐
>                tination buffer, limited by its size, isn’t large enough to hold
>                the  copy, the resulting character sequence is truncated.  Since
>                it creates a character sequence, it doesn’t need to write a ter‐
>                minating null byte.  It returns a pointer suitable for chaining,
>                but it’s not ideal for that.  Truncation needs  to  be  detected
>                only once after the last chained call.
> 
>                If  you’re going to use this function in chained calls, it would
>                be useful to develop a similar function that accepts  a  pointer
>                to one past the end of the buffer instead of a size.
> 
>                An implementation of this function might be:
> 
>                    char *
>                    stpncpy(char *restrict dst, const char *restrict src,
>                            size_t sz)
>                    {
>                        char  *p;
> 
>                        bzero(dst, sz);
>                        p = memccpy(dst, src, '\0', sz);
>                        if (p == NULL)
>                            return dst + sz;
> 
>                        return p - 1;
>                    }
> 
>         ustr2stp(3)
>                This function copies the input character sequence contained in a
>                null‐padded  fixed‐width buffer, into a destination string.  The
>                programmer is responsible for allocating a buffer large  enough.
>                It returns a pointer suitable for chaining.
> 
>                A  truncating  version of this function doesn’t exist, since the
>                size of the original character sequence is always known,  so  it
>                wouldn’t be very useful.
> 
>                This function is not provided by any library, but you can define
>                it with the following reference implementation:
> 
>                    /* This code is in the public domain. */
>                    char *
>                    ustr2stp(char *restrict dst, const char *restrict src,
>                             size_t sz)
>                    {
>                        char  *end;
> 
>                        end = memccpy(dst, src, '\0', sz);
>                        /* memccpy(3) returns one past the copied null byte. */
>                        end = (end != NULL) ? end - 1 : dst + sz;
>                        *end = '\0';
> 
>                        return end;
>                    }
> 
>         strncpy(3)
>                This  function is identical to stpncpy(3) except for the useless
>                return value.  Due to the return value, with this function  it’s
>                hard to correctly check for truncation.
> 
>                stpncpy(3) is a simpler alternative to this function.
> 
>                An implementation of this function might be:
> 
>                    char *
>                    strncpy(char *restrict dst, const char *restrict src,
>                            size_t sz)
>                    {
>                        stpncpy(dst, src, sz);
>                        return dst;
>                    }
> 
>         strncat(3)
>                Do  not  confuse this function with strncpy(3); they are not re‐
>                lated at all.
> 
>                This function concatenates the  input  character  sequence  con‐
>                tained  in  a null‐padded fixed‐width buffer, into a destination
>                string.  The programmer is responsible for allocating  a  buffer
>                large enough.  The return value is useless.
> 
>                ustr2stp(3) is a faster alternative to this function.
> 
>                An implementation of this function might be:
> 
>                    char *
>                    strncat(char *restrict dst, const char *restrict src,
>                            size_t sz)
>                    {
>                        ustr2stp(dst + strlen(dst), src, sz);
>                        return dst;
>                    }
> 
>         mempcpy(3)
>                This  function  copies  the input character sequence, limited by
>                its length, into a destination character sequence.  The program‐
>                mer is responsible for allocating a buffer large enough.  It re‐
>                turns a pointer suitable for chaining.
> 
>                An implementation of this function might be:
> 
>                    void *
>                    mempcpy(void *restrict dst, const void *restrict src,
>                            size_t len)
>                    {
>                        return (char *) memcpy(dst, src, len) + len;
>                    }
> 
> RETURN VALUE
>         The following functions return a pointer to the terminating  null  byte
>         in the destination string.
> 
>         •  stpcpy(3)
>         •  ustr2stp(3)
> 
>         The  following  functions return a pointer to the terminating null byte
>         in the destination string, except when truncation occurs; if truncation
>         occurs, they return a pointer to one past the end  of  the  destination
>         buffer (past_end).
> 
>         •  stpecpy(3), stpecpyx(3)
> 
>         The  following function returns a pointer to one after the last charac‐
>         ter in the destination character sequence; if truncation  occurs,  that
>         pointer  is equivalent to a pointer to one past the end of the destina‐
>         tion buffer.
> 
>         •  stpncpy(3)
> 
>         The following function returns a pointer to one after the last  charac‐
>         ter in the destination character sequence.
> 
>         •  mempcpy(3)
> 
>         The following functions return the length of the total string that they
>         tried to create (as if truncation didn’t occur).
> 
>         •  strlcpy(3bsd), strlcat(3bsd)
> 
>         The following function returns the length of the destination string, or
>         -E2BIG on truncation.
> 
>         •  strscpy(3)
> 
>         The following functions return the dst pointer, which is useless.
> 
>         •  strcpy(3), strcat(3)
>         •  strncpy(3)
>         •  strncat(3)
> 
> ATTRIBUTES
>         For  an  explanation  of  the  terms  used in this section, see attrib‐
>         utes(7).
>         ┌────────────────────────────────────────────┬───────────────┬─────────┐
>         │Interface                                   │ Attribute     │ Value   │
>         ├────────────────────────────────────────────┼───────────────┼─────────┤
>         │stpcpy(), strcpy(), strcat(), stpecpy(),    │ Thread safety │ MT‐Safe │
>         │stpecpyx(), strlcpy(), strlcat(), strscpy(),│               │         │
>         │stpncpy(), strncpy(), ustr2stp(),           │               │         │
>         │strncat(), mempcpy()                        │               │         │
>         └────────────────────────────────────────────┴───────────────┴─────────┘
> 
> STANDARDS
>         strcpy(3), strcat(3)
>         strncpy(3)
>         strncat(3)
>                POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
> 
>         stpcpy(3)
>         stpncpy(3)
>                POSIX.1‐2008.
> 
>         strlcpy(3bsd), strlcat(3bsd)
>                Functions originated in OpenBSD and present in  some  Unix  sys‐
>                tems.
> 
>         mempcpy(3)
>                This function is a GNU extension.
> 
>         strscpy(3)
>                Linux kernel internal function.
> 
>         stpecpy(3), stpecpyx(3)
>         ustr2stp(3)
>                Not defined by any standards nor libraries.
> 
> CAVEATS
>         Don’t mix chained calls to truncating and non‐truncating functions.  It
>         is conceptually wrong unless you know that the first  part  of  a  copy
>         will always fit.  In any case, the performance difference will probably
>         be negligible, so the code will be clearer if you use  consistent  se‐
>         mantics: either truncating or non‐truncating.  Calling a non‐truncating
>         function after a truncating one is necessarily wrong.
> 
>         Some of the functions described here are not provided by  any  library;
>         you should write your own copy if you want to use them.  See STANDARDS.
> 
> BUGS
>         All  concatenation  (*cat()) functions share the same performance prob‐
>         lem: Shlemiel the  painter  ⟨https://www.joelonsoftware.com/2001/12/11/
>         back-to-basics/⟩.
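
A small sketch that puts a number on this.  rescanned_by_cat() is a
hypothetical helper; the strlen(3) call in the loop only models the
search for the end of the destination that strcat(3) has to perform
before it can append.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append piece to buf n times with strcat(3), and return how many
 * bytes had to be examined just to find the end of buf before each
 * append.  buf must be large enough for n copies plus a null byte.
 */
size_t
rescanned_by_cat(char *buf, const char *piece, size_t n)
{
    size_t  scanned = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        scanned += strlen(buf);  /* Models the search done by strcat(3). */
        strcat(buf, piece);
    }
    return scanned;
}
```

For a piece of length k appended n times, the count is k * n * (n - 1) / 2,
so it grows quadratically with n.  A chain‐copying function keeps the
end pointer from the previous call and examines none of those bytes
again.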
> 
> EXAMPLES
>         The following are examples of correct use of each of these functions.
> 
>         stpcpy(3)
>                    p = buf;
>                    p = stpcpy(p, "Hello ");
>                    p = stpcpy(p, "world");
>                    p = stpcpy(p, "!");
>                    len = p - buf;
>                    puts(buf);
> 
>         strcpy(3)
>         strcat(3)
>                    strcpy(buf, "Hello ");
>                    strcat(buf, "world");
>                    strcat(buf, "!");
>                    len = strlen(buf);
>                    puts(buf);
> 
>         stpecpy(3)
>         stpecpyx(3)
>                    past_end = buf + sizeof(buf);
>                    p = buf;
>                    p = stpecpy(p, past_end, "Hello ");
>                    p = stpecpy(p, past_end, "world");
>                    p = stpecpy(p, past_end, "!");
>                    if (p == past_end) {
>                        p--;
>                        goto toolong;
>                    }
>                    len = p - buf;
>                    puts(buf);
> 
>         strlcpy(3bsd)
>         strlcat(3bsd)
>                    if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                        goto toolong;
>                    len = strlcat(buf, "!", sizeof(buf));
>                    if (len >= sizeof(buf))
>                        goto toolong;
>                    puts(buf);
> 
>         strscpy(3)
>                    len = strscpy(buf, "Hello world!", sizeof(buf));
>                    if (len == -E2BIG)
>                        goto toolong;
>                    puts(buf);
> 
>         stpncpy(3)
>                    past_end = buf + sizeof(buf);
>                    end = stpncpy(buf, "Hello world!", sizeof(buf));
>                    if (end == past_end)
>                        goto toolong;
>                    len = end - buf;
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         strncpy(3)
>                    strncpy(buf, "Hello world!", sizeof(buf));
>                    if (buf[sizeof(buf) - 1] != '\0')
>                        goto toolong;
>                    len = strnlen(buf, sizeof(buf));
>                    for (size_t i = 0; i < sizeof(buf); i++)
>                        putchar(buf[i]);
> 
>         ustr2stp(3)
>                    p = buf;
>                    p = ustr2stp(p, "Hello ", 6);
>                    p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                    p = ustr2stp(p, "!", 1);
>                    len = p - buf;
>                    puts(buf);
> 
>         strncat(3)
>                    buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
>                    strncat(buf, "Hello ", 6);
>                    strncat(buf, "world", 42);  // Padding null bytes ignored.
>                    strncat(buf, "!", 1);
>                    len = strlen(buf);
>                    puts(buf);
> 
>         mempcpy(3)
>                    p = buf;
>                    p = mempcpy(p, "Hello ", 6);
>                    p = mempcpy(p, "world", 5);
>                    p = mempcpy(p, "!", 1);
>                    *p = '\0';
>                    len = p - buf;
>                    puts(buf);
> 
> SEE ALSO
>         bzero(3), memcpy(3), memccpy(3), mempcpy(3), string(3)
> 
> Linux man‐pages (unreleased)        (date)                           strcpy(3)
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v3 0/1] Rewritten page for string-copying functions
  2022-12-14  0:14       ` Alejandro Colomar
@ 2022-12-14  0:16         ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14  0:16 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk
  Cc: Alejandro Colomar





On 12/14/22 01:14, Alejandro Colomar wrote:

>>                An implementation of this function might be:
>>
>>                    char *
>>                    stpcpy(char *restrict dst, const char *restrict src)
>>                    {
>>                        return mempcpy(dst, src, strlen(src));
> 
> Oops.  It should have been:
> 
> char *p;
> 
> p = mempcpy(dst, src, strlen(src));
> p = '\0';

*p = '\0';  //:)
> return p;
> 
>>                    }
>>

-- 
<http://www.alejandro-colomar.es/>


* [PATCH v4 0/1] Rewritten page for string-copying functions
  2022-12-14  0:03     ` [PATCH v3 0/1] Rewritten page for string-copying functions Alejandro Colomar
  2022-12-14  0:14       ` Alejandro Colomar
@ 2022-12-14 16:17       ` Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
                           ` (5 more replies)
  2022-12-14 16:17       ` [PATCH v4 1/1] strcpy.3: Rewrite page to document all string-copying functions Alejandro Colomar
  2 siblings, 6 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14 16:17 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk
  Cc: Alejandro Colomar

Several improvements, including new functions (I wasn't happy with raw
mempcpy(3) and its type unsafety), and fixed an off-by-one error, and
improved descriptions.

Here goes the new version of the formatted page.

Cheers,

Alex

Alejandro Colomar (1):
  strcpy.3: Rewrite page to document all string-copying functions

 man3/strcpy.3 | 1164 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 1076 insertions(+), 88 deletions(-)



strcpy(3)                  Library Functions Manual                  strcpy(3)

NAME
       stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, strscpy,
       stpncpy, strncpy, zustr2ustp, zustr2stp, strncat, ustpcpy,  ustr2stp  -
       copy strings and character sequences

LIBRARY
       stpcpy(3)
       strcpy(3), strcat(3)
       stpncpy(3)
       strncpy(3)
       strncat(3)
              Standard C library (libc, -lc)

       stpecpy(3), stpecpyx(3)
       zustr2ustp(3), zustr2stp(3)
       ustpcpy(3), ustr2stp(3)
              Not provided by any library.

       strlcpy(3), strlcat(3)
              Utility functions from BSD systems (libbsd, -lbsd)

       strscpy(3)
              Not  provided  by  any  library.   It is a Linux kernel internal
              function.

SYNOPSIS
       #include <string.h>

   Strings
       // Chain‐copy a string.
       char *stpcpy(char *restrict dst, const char *restrict src);

       // Copy/concatenate a string.
       char *strcpy(char *restrict dst, const char *restrict src);
       char *strcat(char *restrict dst, const char *restrict src);

       // Chain‐copy a string with truncation.
       char *stpecpy(char *dst, char past_end[0], const char *restrict src);

       // Chain‐copy a string with truncation and SIGSEGV on UB.
       char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

       // Copy/concatenate a string with truncation and SIGSEGV on UB.
       size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
       size_t strlcat(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Copy a string with truncation.
       ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
                      size_t sz);

   Null‐padded character sequences
       // Zero a fixed‐width buffer, and
       // copy a string into a character sequence with truncation.
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);

       // Zero a fixed‐width buffer, and
       // copy a string into a character sequence with truncation.
       char *strncpy(char dest[restrict .sz], const char *restrict src,
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a character sequence.
       char *zustr2ustp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Chain‐copy a null‐padded character sequence into a string.
       char *zustr2stp(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

       // Concatenate a null‐padded character sequence into a string.
       char *strncat(char *restrict dst, const char src[restrict .sz],
                      size_t sz);

   Measured character sequences
       // Chain‐copy a measured character sequence.
       char *ustpcpy(char *restrict dst, const char src[restrict .len],
                      size_t len);

       // Chain‐copy a measured character sequence into a string.
       char *ustr2stp(char *restrict dst, const char src[restrict .len],
                      size_t len);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       stpcpy(3), stpncpy(3):
           Since glibc 2.10:
               _POSIX_C_SOURCE >= 200809L
           Before glibc 2.10:
               _GNU_SOURCE

DESCRIPTION
   Terms (and abbreviations)
       string (str)
              is a sequence of zero or more non‐null characters followed by  a
              null byte.

       character sequence
              is  a  sequence  of zero or more non‐null characters.  A program
              should never use a character sequence  where  a  string  is  re‐
              quired.  However, with appropriate care, a string can be used in
              the place of a character sequence.

              null‐padded character sequence (zustr)
                     Character  sequences  can  be  contained  in  fixed‐width
                     buffers, which contain padding null bytes after the char‐
                     acter sequence, to fill the rest of  the  buffer  without
                     affecting  the character sequence; however, those padding
                     null bytes are not part of the character sequence.

              measured character sequence (ustr)
                     Character sequence delimited by its length.  It may be  a
                     slice  of  a  larger  character  sequence,  or  even of a
                     string.

       length (len)
              is the number of non‐null characters in a  string  or  character
              sequence.   It  is  the  return  value  of  strlen(str)  and  of
              strnlen(ustr, sz).

       size (sz)
              refers to the entire buffer where the string  or  character  se‐
              quence is contained.

       end    is  the  name  of  a  pointer  to the terminating null byte of a
              string, or a pointer to one past the last character of a charac‐
              ter sequence.  This is the return value of functions that  allow
              chaining.  It is equivalent to &str[len].

       past_end
              is  the name of a pointer to one past the end of the buffer that
              contains a string or character sequence.  It  is  equivalent  to
              &str[sz].   It  is used as a sentinel value, to be able to trun‐
              cate strings or character sequences instead of  overrunning  the
              containing buffer.
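
       As an illustration of the terms above, the following sketch computes
       len, sz, end, and past_end for a single buffer.  The struct terms and
       the function describe() are hypothetical names invented for this
       example; they are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, only for this illustration. */
struct terms {
    size_t  len;        /* number of non-null characters */
    size_t  sz;         /* size of the entire buffer */
    char    *end;       /* &buf[len]: the terminating null byte */
    char    *past_end;  /* &buf[sz]: one past the end of the buffer */
};

/* buf must hold a string; sz is the size of the buffer that holds it. */
static struct terms
describe(char *buf, size_t sz)
{
    struct terms  t;

    t.len = strlen(buf);
    t.sz = sz;
    t.end = &buf[t.len];
    t.past_end = &buf[sz];

    return t;
}
```

       For char buf[16] = "Hello", this gives a len of 5 and a sz of 16; end
       points to buf[5], and past_end - end is 11, which is the room left in
       the buffer counting the byte that currently holds the null byte.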

   Copy, concatenate, and chain‐copy
       Originally,  there  was  a  distinction between functions that copy and
       those that concatenate.  However, newer functions that copy  while  al‐
       lowing  chaining cover both use cases with a single API.  They are also
       algorithmically faster, since they don’t need to search for the end  of
       the  existing  string.  However, functions that concatenate have a much
       simpler use, so if performance is not important, it can make  sense  to
       use them for improving readability.

       To  chain  copy  functions,  they  need to return a pointer to the end.
       That’s a byproduct of the copy operation,  so  it  has  no  performance
       costs.   Functions that return such a pointer, and thus can be chained,
       have names of the form *stp*(), since it’s  also  common  to  name  the
       pointer just p.

       Chain‐copying  functions  that  truncate should accept a pointer to one
       past the end of the destination buffer, and  have  names  of  the  form
       *stpe*().  This allows not having to recalculate the remaining size af‐
       ter each call.

   Truncate or not?
       The  first  thing  to  note  is that programmers should be careful with
       buffers, so they always have the correct size, and  truncation  is  not
       necessary.

       In  most cases, truncation is not desired, and it is simpler to just do
       the copy.  Simpler code is safer code.  Programming against programming
       mistakes by adding more code just adds more points where  mistakes  can
       be made.

       Nowadays,  compilers  can  detect  most programmer errors with features
       like compiler warnings,  static  analyzers,  and  _FORTIFY_SOURCE  (see
       ftm(7)).   Keeping  the code simple helps these overflow‐detection fea‐
       tures be more precise.

       When validating user input, however, it makes sense to  truncate.   Re‐
       member to check the return value of such function calls.

       Functions that truncate:

       •  stpecpy(3)  is the most efficient string copy function that performs
          truncation.  It requires checking for truncation only  once,  after
          all chained calls.

       •  stpecpyx(3) is a variant of  stpecpy(3)  that  consumes  the  entire
          source  string,  to catch bugs in the program by forcing a segmenta‐
          tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

       •  strlcpy(3bsd) and strlcat(3bsd) are designed to crash if  the  input
          string is invalid (doesn’t contain a terminating null byte).

       •  strscpy(3)   reports  an  error  instead  of  crashing  (similar  to
          stpecpy(3)).

       •  stpncpy(3) and  strncpy(3)  also  truncate,  but  they  don’t  write
          strings, but rather null‐padded character sequences.
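
       The following sketch shows how truncation of user input might be
       checked once after a chain of calls.  stpecpy_sketch() is a stand-in
       written with ISO C calls only (memchr(3) and memcpy(3)); it is meant
       to follow the same contract as the stpecpy(3) reference implementation
       given later in this page, but it is an assumption of this example, as
       is the function greet().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of stpecpy() that uses only ISO C calls. */
static char *
stpecpy_sketch(char *dst, char *past_end, const char *src)
{
    size_t      room, n;
    const char  *nul;

    if (dst == past_end)
        return past_end;            /* an earlier call already truncated */

    room = (size_t) (past_end - dst);
    nul = memchr(src, '\0', room);
    if (nul != NULL) {
        n = (size_t) (nul - src);
        memcpy(dst, src, n + 1);    /* fits, including the null byte */
        return dst + n;
    }

    memcpy(dst, src, room - 1);     /* truncate; keep room for '\0' */
    past_end[-1] = '\0';
    return past_end;
}

/* Build "Hello, <name>!" in buf; return false if it was truncated. */
static bool
greet(char *buf, size_t sz, const char *name)
{
    char  *p, *past_end;

    past_end = buf + sz;
    p = buf;
    p = stpecpy_sketch(p, past_end, "Hello, ");
    p = stpecpy_sketch(p, past_end, name);
    p = stpecpy_sketch(p, past_end, "!");

    return p != past_end;           /* one check, after the last call */
}
```

       With a 10-byte buffer and the name "world", greet() returns false and
       leaves the truncated (but still null-terminated) string "Hello, wo" in
       the buffer.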

   Null‐padded character sequences
       For  historic  reasons, some standard APIs, such as utmpx(5), use null‐
       padded character sequences in fixed‐width buffers.  To  interface  with
       them, specialized functions need to be used.

       To copy strings into them, use stpncpy(3).

       To  copy from an unterminated string within a fixed‐width buffer into a
       string, ignoring any trailing null  bytes  in  the  source  fixed‐width
       buffer, you should use zustr2stp(3) or strncat(3).

       To  copy from an unterminated string within a fixed‐width buffer into a
       character sequence, ignoring any trailing  null  bytes  in  the  source
       fixed‐width buffer, you should use zustr2ustp(3).
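
       A minimal sketch of both directions follows.  The struct record and
       its accessors are hypothetical; the field merely imitates the fixed-
       width name fields found in utmpx(5).  The setter uses memset(3) and
       memcpy(3), which here has the same effect on the field as stpncpy(3),
       and the getter does what zustr2stp(3) is described as doing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a fixed-width, null-padded field. */
struct record {
    char  user[8];      /* character sequence; may lack a null byte */
};

/* Store a string into the null-padded field.
   Return false if the name had to be truncated. */
static bool
record_set_user(struct record *r, const char *name)
{
    size_t  len, n;

    len = strlen(name);
    n = (len < sizeof(r->user)) ? len : sizeof(r->user);

    memset(r->user, 0, sizeof(r->user));    /* null padding */
    memcpy(r->user, name, n);

    return len <= sizeof(r->user);  /* detected from the original length */
}

/* Copy the field out into a string, ignoring the padding.
   dst needs room for sizeof(r->user) + 1 bytes. */
static char *
record_get_user(char *dst, const struct record *r)
{
    const char  *nul;
    size_t      len;

    nul = memchr(r->user, '\0', sizeof(r->user));
    len = (nul != NULL) ? (size_t) (nul - r->user) : sizeof(r->user);

    memcpy(dst, r->user, len);
    dst[len] = '\0';

    return dst + len;   /* pointer to the terminating null byte */
}
```

       A name of exactly 8 characters fills the field and leaves no null byte
       in it, which is why the getter must never treat the field as a string.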

   Measured character sequences
       The simplest character sequence copying function is mempcpy(3).  It re‐
       quires always knowing the length of your character sequences, for which
       structures  can  be used.  It makes the code much faster, since you al‐
       ways know the length of your character sequences, and can do the  mini‐
       mal  copies  and  length measurements.  mempcpy(3) copies character se‐
       quences, so you need to explicitly set the terminating null byte if you
       need a string.

       However, for keeping type safety, it’s good to add a wrapper that  uses
       char * instead of void *: ustpcpy(3).

       In  programs  that  make  considerable  use of strings or character se‐
       quences, and need the best performance, using overlapping character se‐
       quences can make a big difference.  It allows holding subsequences of a
       larger  character  sequence, while neither duplicating memory nor using
       time to do a copy.

       However, this is delicate, since it requires using character sequences.
       C library APIs use strings, so programs that  use  character  sequences
       will  have  to  take care of differentiating strings from character se‐
       quences.

       To copy a measured character sequence, use ustpcpy(3).

       To copy a measured character sequence into a string, use ustr2stp(3).

       Because these functions ask for the length, and a string is  by  nature
       composed  of a character sequence of the same length plus a terminating
       null byte, a string is also accepted as input.
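
       The idea of holding subsequences without copying can be sketched with
       a small structure.  The struct slice and both functions below are
       hypothetical names for this example; slice_to_str() behaves as
       ustr2stp(3) is described, but is written with memcpy(3) so that the
       example depends only on ISO C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a measured character sequence. */
struct slice {
    const char  *p;
    size_t      len;
};

/* Return the part of s that precedes the first occurrence of sep
   (or all of s if sep does not occur).  No bytes are copied. */
static struct slice
slice_before(const char *s, char sep)
{
    const char    *q;
    struct slice  sl;

    q = strchr(s, sep);
    sl.p = s;
    sl.len = (q != NULL) ? (size_t) (q - s) : strlen(s);

    return sl;
}

/* Copy a slice into a string.  dst needs room for sl.len + 1 bytes. */
static char *
slice_to_str(char *dst, struct slice sl)
{
    memcpy(dst, sl.p, sl.len);
    dst[sl.len] = '\0';             /* the slice itself has no null byte */

    return dst + sl.len;
}
```

       The slice returned for "key=value" and '=' has length 3 and points
       into the original string; a string is produced only when
       slice_to_str() is called.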

   String vs character sequence
       Some functions only operate on strings.  Those require that  the  input
       src  is  a string, and guarantee an output string (even when truncation
       occurs).  Functions that concatenate also  require  that  dst  holds  a
       string before the call.  List of functions:

       •  stpcpy(3)
       •  strcpy(3), strcat(3)
       •  stpecpy(3), stpecpyx(3)
       •  strlcpy(3bsd), strlcat(3bsd)
       •  strscpy(3)

       Other  functions  require  an  input string, but create a character se‐
       quence as output.  These functions have confusing  names,  and  have  a
       long history of misuse.  List of functions:

       •  stpncpy(3)
       •  strncpy(3)

       Other  functions  operate on an input character sequence, and create an
       output string.  Functions that concatenate also require that dst  holds
       a  string before the call.  strncat(3) has an even more misleading name
       than the functions above.  List of functions:

       •  zustr2stp(3)
       •  strncat(3)
       •  ustr2stp(3)

       Other functions operate on an input character  sequence  to  create  an
       output character sequence.  List of functions:

       •  ustpcpy(3)
       •  zustr2ustp(3)

   Functions
       stpcpy(3)
              This function copies the input string into a destination string.
              The  programmer  is  responsible  for  allocating a buffer large
              enough.  It returns a pointer suitable for chaining.

              An implementation of this function might be:

                  char *
                  stpcpy(char *restrict dst, const char *restrict src)
                  {
                      char  *end;

                      end = mempcpy(dst, src, strlen(src));
                      *end = '\0';

                      return end;
                  }

       strcpy(3)
       strcat(3)
              These functions copy the input string into a destination string.
              The programmer is responsible  for  allocating  a  buffer  large
              enough.  The return value is useless.

              stpcpy(3) is a faster alternative to these functions.

              An implementation of these functions might be:

                  char *
                  strcpy(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst, src);
                      return dst;
                  }

                  char *
                  strcat(char *restrict dst, const char *restrict src)
                  {
                      stpcpy(dst + strlen(dst), src);
                      return dst;
                  }

       stpecpy(3)
       stpecpyx(3)
              These functions copy the input string into a destination string.
              If  the destination buffer, limited by a pointer to one past the
              end of it, isn’t large enough to hold the  copy,  the  resulting
              string  is  truncated  (but  it  is guaranteed to be null‐termi‐
              nated).  They return a pointer suitable for  chaining.   Trunca‐
              tion needs to be detected only once after the last chained call.
              stpecpyx(3)  has  identical semantics to stpecpy(3), except that
              it forces a SIGSEGV if the src pointer is not a string.

              These functions are not provided by any library, but you can de‐
              fine them with the following reference implementations:

                  /* This code is in the public domain. */
                  char *
                  stpecpy(char *dst, char past_end[0],
                          const char *restrict src)
                  {
                      char *p;

                      if (dst == past_end)
                          return past_end;

                      p = memccpy(dst, src, '\0', past_end - dst);
                      if (p != NULL)
                          return p - 1;

                      /* truncation detected */
                      past_end[-1] = '\0';
                      return past_end;
                  }

                  /* This code is in the public domain. */
                  char *
                  stpecpyx(char *dst, char past_end[0],
                           const char *restrict src)
                  {
                      if (src[strlen(src)] != '\0')
                          raise(SIGSEGV);

                      return stpecpy(dst, past_end, src);
                  }

       strlcpy(3bsd)
       strlcat(3bsd)
              These functions copy the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  They return the length
              of the total string they tried to create.  These functions force
              a SIGSEGV if the src pointer is not a string.

              stpecpyx(3) is a faster alternative to these functions.

       strscpy(3)
              This function copies the input string into a destination string.
              If the destination buffer, limited  by  its  size,  isn’t  large
              enough  to hold the copy, the resulting string is truncated (but
              it is guaranteed to be null‐terminated).  It returns the  length
              of the destination string, or -E2BIG on truncation.

              stpecpy(3) is a simpler and faster alternative to this function.

       stpncpy(3)
              This  function  copies the input string into a destination null‐
              padded character sequence in a fixed‐width buffer.  If the  des‐
              tination buffer, limited by its size, isn’t large enough to hold
              the  copy, the resulting character sequence is truncated.  Since
              it creates a character sequence, it doesn’t need to write a ter‐
              minating null byte.  It returns a pointer suitable for chaining,
              but it’s not ideal for that.   It’s  impossible  to  distinguish
              truncation  after  the call, from a character sequence that just
              fits the destination buffer; truncation should be detected  from
              the length of the original string.

              If  you’re going to use this function in chained calls, it would
              be useful to develop a similar function that accepts  a  pointer
              to one past the end of the buffer instead of a size.

              An implementation of this function might be:

                  char *
                  stpncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      char  *p;

                      bzero(dst, sz);
                      p = memccpy(dst, src, '\0', sz);
                      if (p == NULL)
                          return dst + sz;

                      return p - 1;
                  }
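
              The variant suggested above, which takes a pointer to one past
              the end of the buffer instead of a size, might look like the
              following sketch.  The name stpencpy_sketch() and its exact
              contract are assumptions made for illustration only; no such
              function is defined by this page or by any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of stpncpy() that takes past_end instead of a
   size, so that chained calls need not recalculate the remaining size. */
static char *
stpencpy_sketch(char *dst, char *past_end, const char *src)
{
    size_t  room, len;

    room = (size_t) (past_end - dst);
    if (room == 0)
        return dst;                 /* buffer already full */

    len = strlen(src);
    if (len > room)
        len = room;                 /* truncate the character sequence */

    memset(dst, 0, room);           /* null-pad the rest of the buffer */
    memcpy(dst, src, len);

    return dst + len;
}
```

              As with stpncpy(3), a return value equal to past_end does not
              tell truncation apart from a sequence that exactly fits.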

       strncpy(3)
              This  function is identical to stpncpy(3) except for the useless
              return value.

              stpncpy(3) is a simpler alternative to this function.

              An implementation of this function might be:

                  char *
                  strncpy(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      stpncpy(dst, src, sz);
                      return dst;
                  }

       zustr2ustp(3)
              This function copies the input character sequence contained in a
              null‐padded fixed‐width buffer, into a destination character se‐
              quence.  The programmer is responsible for allocating  a  buffer
              large enough.  It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library, but you can define
              it with the following reference implementation:

                  /* This code is in the public domain. */
                  char *
                  zustr2ustp(char *restrict dst, const char *restrict src,
                             size_t sz)
                  {
                      return ustpcpy(dst, src, strnlen(src, sz));
                  }

       zustr2stp(3)
              This function copies the input character sequence contained in a
              null‐padded  fixed‐width buffer, into a destination string.  The
              programmer is responsible for allocating a buffer large  enough.
              It returns a pointer suitable for chaining.

              A  truncating  version of this function doesn’t exist, since the
              size of the original character sequence is always known,  so  it
              wouldn’t be very useful.

              This function is not provided by any library, but you can define
              it with the following reference implementation:

                  /* This code is in the public domain. */
                  char *
                  zustr2stp(char *restrict dst, const char *restrict src,
                            size_t sz)
                  {
                      char  *end;

                      end = zustr2ustp(dst, src, sz);
                      *end = '\0';

                      return end;
                  }

       strncat(3)
              Do  not  confuse this function with strncpy(3); they are not re‐
              lated at all.

              This function concatenates the  input  character  sequence  con‐
              tained  in  a null‐padded fixed‐width buffer, into a destination
              string.  The programmer is responsible for allocating  a  buffer
              large enough.  The return value is useless.

              zustr2stp(3) is a faster alternative to this function.

              An implementation of this function might be:

                  char *
                  strncat(char *restrict dst, const char *restrict src,
                          size_t sz)
                  {
                      zustr2stp(dst + strlen(dst), src, sz);
                      return dst;
                  }

       ustpcpy(3)
              This  function  copies  the input character sequence, limited by
              its length, into a destination character sequence.  The program‐
              mer is responsible for allocating a buffer large enough.  It re‐
              turns a pointer suitable for chaining.

              An implementation of this function might be:

                  /* This code is in the public domain. */
                  char *
                  ustpcpy(char *restrict dst, const char *restrict src,
                          size_t len)
                  {
                      return mempcpy(dst, src, len);
                  }

       ustr2stp(3)
              This function copies the input character  sequence,  limited  by
              its  length,  into  a destination string.  The programmer is re‐
              sponsible for allocating a buffer large enough.   It  returns  a
              pointer suitable for chaining.

              An implementation of this function might be:

                  /* This code is in the public domain. */
                  char *
                  ustr2stp(char *restrict dst, const char *restrict src,
                          size_t len)
                  {
                      char  *end;

                      end = ustpcpy(dst, src, len);
                      *end = '\0';

                      return end;
                  }

RETURN VALUE
       The  following  functions return a pointer to the terminating null byte
       in the destination string.

       •  stpcpy(3)
       •  ustr2stp(3)
       •  zustr2stp(3)

       The following functions return a pointer to the terminating  null  byte
       in the destination string, except when truncation occurs; if truncation
       occurs,  they  return  a pointer to one past the end of the destination
       buffer (past_end).

       •  stpecpy(3), stpecpyx(3)

       The following function returns a pointer to one after the last  charac‐
       ter  in  the destination character sequence; if truncation occurs, that
       pointer is equivalent to a pointer to one past the end of the  destina‐
       tion buffer.

       •  stpncpy(3)

       The following functions return a pointer to one after the last  charac‐
       ter in the destination character sequence.

       •  zustr2ustp(3)
       •  ustpcpy(3)

       The following functions return the length of the total string that they
       tried to create (as if truncation didn’t occur).

       •  strlcpy(3bsd), strlcat(3bsd)

       The following function returns the length of the destination string, or
       -E2BIG on truncation.

       •  strscpy(3)

       The following functions return the dst pointer, which is useless.

       •  strcpy(3), strcat(3)
       •  strncpy(3)
       •  strncat(3)

ATTRIBUTES
       For an explanation of the terms  used  in  this  section,  see  attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │stpcpy(), strcpy(), strcat(), stpecpy(),    │ Thread safety │ MT‐Safe │
       │stpecpyx(), strlcpy(), strlcat(), strscpy(),│               │         │
       │stpncpy(), strncpy(), zustr2ustp(),         │               │         │
       │zustr2stp(), strncat(), ustr2stp(),         │               │         │
       │ustpcpy()                                   │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       strcpy(3), strcat(3)
       strncpy(3)
       strncat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       stpcpy(3)
       stpncpy(3)
              POSIX.1‐2008.

       strlcpy(3bsd), strlcat(3bsd)
              Functions  originated  in  OpenBSD and present in some Unix sys‐
              tems.

       strscpy(3)
              Linux kernel internal function.

       stpecpy(3), stpecpyx(3)
       zustr2ustp(3)
       zustr2stp(3)
       ustr2stp(3), ustpcpy(3)
              Not defined by any standard nor provided by any library.

CAVEATS
       Don’t mix chain calls to truncating and non‐truncating  functions.   It
       is  conceptually  wrong  unless  you know that the first part of a copy
       will always fit.  Anyway, the performance difference will  probably  be
       negligible, so it will probably be more clear if you use consistent se‐
       mantics: either truncating or non‐truncating.  Calling a non‐truncating
       function after a truncating one is necessarily wrong.

       Some  of  the functions described here are not provided by any library;
       you should write your own copy if you want to use them.  See STANDARDS.

BUGS
       All concatenation (*cat()) functions share the same  performance  prob‐
       lem:  Shlemiel  the painter ⟨https://www.joelonsoftware.com/2001/12/11/
       back-to-basics/⟩.
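
       The cost can be made visible with a small experiment.  In the sketch
       below, repeat_cat() and repeat_chain() are hypothetical functions that
       build the same string; the first one counts the bytes that strcat(3)
       has to walk just to find the end of dst, and the second one keeps a
       pointer to the end, as the chaining functions in this page do.

```c
#include <stddef.h>
#include <string.h>

/* Append n copies of piece with strcat(3).  Return how many bytes of
   dst were scanned only to find its end.  dst must be large enough. */
static size_t
repeat_cat(char *dst, const char *piece, size_t n)
{
    size_t  scanned = 0;

    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        scanned += strlen(dst);     /* the search strcat() repeats inside */
        strcat(dst, piece);
    }

    return scanned;
}

/* Build the same string while remembering the end.  Return its length.
   Nothing is rescanned.  dst must be large enough. */
static size_t
repeat_chain(char *dst, const char *piece, size_t n)
{
    size_t  len = strlen(piece);
    char    *p = dst;

    for (size_t i = 0; i < n; i++) {
        memcpy(p, piece, len);
        p += len;
    }
    *p = '\0';

    return (size_t) (p - dst);
}
```

       For n pieces of length k, the scanned total is k * n * (n - 1) / 2, so
       it grows quadratically with the number of concatenations, while the
       chained version does work proportional to the length of the result.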

EXAMPLES
       The following are examples of correct use of each of these functions.

       stpcpy(3)
                  p = buf;
                  p = stpcpy(p, "Hello ");
                  p = stpcpy(p, "world");
                  p = stpcpy(p, "!");
                  len = p - buf;
                  puts(buf);

       strcpy(3)
       strcat(3)
                  strcpy(buf, "Hello ");
                  strcat(buf, "world");
                  strcat(buf, "!");
                  len = strlen(buf);
                  puts(buf);

       stpecpy(3)
       stpecpyx(3)
                  past_end = buf + sizeof(buf);
                  p = buf;
                  p = stpecpy(p, past_end, "Hello ");
                  p = stpecpy(p, past_end, "world");
                  p = stpecpy(p, past_end, "!");
                  if (p == past_end) {
                      p--;
                      goto toolong;
                  }
                  len = p - buf;
                  puts(buf);

       strlcpy(3bsd)
       strlcat(3bsd)
                  if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                      goto toolong;
                  len = strlcat(buf, "!", sizeof(buf));
                  if (len >= sizeof(buf))
                      goto toolong;
                  puts(buf);

       strscpy(3)
                  len = strscpy(buf, "Hello world!", sizeof(buf));
                  if (len == -E2BIG)
                      goto toolong;
                  puts(buf);

       stpncpy(3)
                  end = stpncpy(buf, "Hello world!", sizeof(buf));
                  if (sizeof(buf) < strlen("Hello world!"))
                      goto toolong;
                  len = end - buf;
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       strncpy(3)
                  strncpy(buf, "Hello world!", sizeof(buf));
                  if (sizeof(buf) < strlen("Hello world!"))
                      goto toolong;
                  len = strnlen(buf, sizeof(buf));
                  for (size_t i = 0; i < sizeof(buf); i++)
                      putchar(buf[i]);

       zustr2ustp(3)
                  p = buf;
                  p = zustr2ustp(p, "Hello ", 6);
                  p = zustr2ustp(p, "world", 42); // Padding null bytes ignored.
                  p = zustr2ustp(p, "!", 1);
                  len = p - buf;
                  printf("%.*s\n", (int) len, buf);

       zustr2stp(3)
                  p = buf;
                  p = zustr2stp(p, "Hello ", 6);
                  p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
                  p = zustr2stp(p, "!", 1);
                  len = p - buf;
                  puts(buf);

       strncat(3)
                  buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);

       ustpcpy(3)
                  p = buf;
                  p = ustpcpy(p, "Hello ", 6);
                  p = ustpcpy(p, "world", 5);
                  p = ustpcpy(p, "!", 1);
                  len = p - buf;
                  printf("%.*s\n", (int) len, buf);

       ustr2stp(3)
                  p = buf;
                  p = ustr2stp(p, "Hello ", 6);
                  p = ustr2stp(p, "world", 5);
                  p = ustr2stp(p, "!", 1);
                  len = p - buf;
                  puts(buf);

SEE ALSO
       bzero(3), memcpy(3), memccpy(3), mempcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                           strcpy(3)

-- 
2.38.1


* [PATCH v4 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-14  0:03     ` [PATCH v3 0/1] Rewritten page for string-copying functions Alejandro Colomar
  2022-12-14  0:14       ` Alejandro Colomar
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
@ 2022-12-14 16:17       ` Alejandro Colomar
  2 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14 16:17 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk
  Cc: Alejandro Colomar

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Many functions that are inferior to another one, have been marked as
deprecated, notwithstanding the deprecation status in C libraries or
any standards.  Alternatives have been given in the same page, with
reference implementations.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strcpy.3 | 1164 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 1076 insertions(+), 88 deletions(-)

diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 74c3180ae..3b97da822 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,48 +1,845 @@
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
+.\" SPDX-License-Identifier: BSD-3-Clause
 .\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
-strcpy \- copy a string
+stpcpy,
+strcpy, strcat,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+strscpy,
+stpncpy,
+strncpy,
+zustr2ustp, zustr2stp,
+strncat,
+ustpcpy, ustr2stp
+\- copy strings and character sequences
+.\" ----- LIBRARY :: --------------------------------------------------/
 .SH LIBRARY
+.TP
+.BR stpcpy (3)
+.TQ
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR stpncpy (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
 Standard C library
 .RI ( libc ", " \-lc )
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.TQ
+.BR zustr2ustp "(3), \c"
+.BR zustr2stp (3)
+.TQ
+.BR ustpcpy "(3), \c"
+.BR ustr2stp (3)
+Not provided by any library.
+.TP
+.BR strlcpy "(3), \c"
+.BR strlcat (3)
+Utility functions from BSD systems
+.RI ( libbsd ", " \-lbsd )
+.TP
+.BR strscpy (3)
+Not provided by any library.
+It is a Linux kernel internal function.
+.\" ----- SYNOPSIS :: -------------------------------------------------/
 .SH SYNOPSIS
 .nf
 .B #include <string.h>
+.fi
+.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
+.SS Strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+// Copy/concatenate a string.
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.PP
+// Chain-copy a string with truncation.
+.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on UB.
+.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Copy/concatenate a string with truncation and SIGSEGV on UB.
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Copy a string with truncation.
+.BI "ssize_t strscpy(char " dst "[restrict ." sz "], \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Null-padded character sequences --------/
+.SS Null-padded character sequences
+.nf
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *strncpy(char " dest "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a character sequence.
+.BI "char *zustr2ustp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a string.
+.BI "char *zustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Concatenate a null-padded character sequence into a string.
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Measured character sequences --------------------/
+.SS Measured character sequences
+.nf
+// Chain-copy a measured character sequence.
+.BI "char *ustpcpy(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.PP
+// Chain-copy a measured character sequence into a string.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy (3),
+.BR stpncpy (3):
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
-.BR strcpy ()
-function copies the string pointed to by
-.IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
+.TP
+.I character sequence
+is a sequence of zero or more non-null characters.
+A program should never use a character sequence where a string is required.
+However, with appropriate care,
+a string can be used in the place of a character sequence.
+.RS
+.TP
+.IR "null-padded character sequence " ( zustr )
+Character sequences can be contained in fixed-width buffers,
+which contain padding null bytes after the character sequence,
+to fill the rest of the buffer
+without affecting the character sequence;
+however, those padding null bytes are not part of the character sequence.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
+.TP
+.IR "measured character sequence " ( ustr )
+Character sequence delimited by its length.
+It may be a slice of a larger character sequence,
+or even of a string.
+.RE
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string or character sequence.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer
+where the string or character sequence is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to the terminating null byte of a string,
+or a pointer to one past the last character of a character sequence.
+This is the return value of functions that allow chaining.
+It is equivalent to
+.IR &str[len] .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
+.TP
+.I past_end
+is the name of a pointer to one past the end of the buffer
+that contains a string or character sequence.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings or character sequences
+instead of overrunning the containing buffer.
+.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
+.SS Copy, concatenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that concatenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for the end of the existing string.
+However, functions that concatenate have a much simpler use,
+so if performance is not important,
+it can make sense to use them for improving readability.
+.PP
+To chain copy functions,
+they need to return a pointer to the
+.IR end .
+That's a byproduct of the copy operation,
+so it has no performance costs.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *(),
+since it's also common to name the pointer just
+.IR p .
+.PP
+Chain-copying functions that truncate
+should accept a pointer to one past the end of the destination buffer,
+and have names of the form
+.RB * stpe *().
+This allows not having to recalculate the remaining size after each call.
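The difference between concatenating and chain-copying can be sketched as follows. This is only an illustration: greet_cat() and greet_chain() are hypothetical names, not part of the page, and both assume the caller has allocated a buffer large enough for the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Concatenating: every strcat() call searches dst for its end again. */
static size_t
greet_cat(char *buf, const char *name)
{
    strcpy(buf, "Hello, ");
    strcat(buf, name);
    strcat(buf, "!");
    return strlen(buf);    /* one more scan to learn the length */
}

/* Chain-copying: each call resumes at the end returned by the last one. */
static size_t
greet_chain(char *buf, const char *name)
{
    char *p = buf;

    p = stpcpy(p, "Hello, ");
    p = stpcpy(p, name);
    p = stpcpy(p, "!");
    return p - buf;        /* the length falls out of the end pointer */
}
```

Both helpers build the same string; the chained version never rescans what it has already written.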
+.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
+.SS Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.PP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.PP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these overflow-detection features be more precise.
+.PP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.PP
+Functions that truncate:
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It requires checking for truncation only once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a terminating null byte).
+.IP \(bu
+.BR strscpy (3)
+reports an error instead of crashing (similar to
+.BR stpecpy (3)).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they write null-padded character sequences
+rather than strings.
+.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
+.SS Null-padded character sequences
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use null-padded character sequences in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from an unterminated string within a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2stp (3)
+or
+.BR strncat (3).
+.PP
+To copy from an unterminated string within a fixed-width buffer
+into a character sequence,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2ustp (3).
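As a rough sketch of interfacing with such a fixed-width field: struct record and its 8-byte name member are hypothetical stand-ins for something like a utmpx(5) field, and the getter spells out with strnlen(3) and memcpy(3) what zustr2stp() does in this page.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a null-padded, fixed-width field. */
struct record {
    char name[8];
};

/* Store a string in the field; return -1 if it had to be truncated. */
static int
record_set_name(struct record *r, const char *name)
{
    stpncpy(r->name, name, sizeof(r->name));  /* pads the rest with nulls */
    return (strlen(name) > sizeof(r->name)) ? -1 : 0;
}

/* Read the field back as a string; dst needs sizeof(r->name) + 1 bytes. */
static size_t
record_get_name(char *dst, const struct record *r)
{
    size_t len = strnlen(r->name, sizeof(r->name));  /* skip the padding */

    memcpy(dst, r->name, len);
    dst[len] = '\0';
    return len;
}
```

A name that exactly fills the field has no null byte at all, which is why the getter bounds its scan with the field size instead of calling strlen(3).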
+.\" ----- DESCRIPTION :: Measured character sequences -----------------/
+.SS Measured character sequences
+The simplest character sequence copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your character sequences,
+for which structures can be used.
+It makes the code much faster,
+since you always know the length of your character sequences,
+and can do the minimal copies and length measurements.
+.BR mempcpy (3)
+copies character sequences,
+so you need to explicitly set the terminating null byte if you need a string.
+.PP
+However,
+for keeping type safety,
+it's good to add a wrapper that uses
+.I char\~*
+instead of
+.IR void\~* :
+.BR ustpcpy (3).
+.PP
+In programs that make considerable use of strings or character sequences,
+and need the best performance,
+using overlapping character sequences can make a big difference.
+It allows holding subsequences of a larger character sequence,
+while not duplicating memory
+nor using time to do a copy.
+.PP
+However, this is delicate,
+since it requires using character sequences.
+C library APIs use strings,
+so programs that use character sequences
+will have to take care of differentiating strings from character sequences.
+.PP
+To copy a measured character sequence, use
+.BR ustpcpy (3).
+.PP
+To copy a measured character sequence into a string, use
+.BR ustr2stp (3).
+.PP
+Because these functions ask for the length,
+and a string is by nature composed of a character sequence of the same length
+plus a terminating null byte,
+a string is also accepted as input.
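One possible way to carry measured character sequences around is a pointer-plus-length pair. In the sketch below, struct slice, slice_copy() and slice_to_string() are hypothetical names; they mirror ustpcpy() and ustr2stp() but use memcpy(3) rather than mempcpy(3), since the latter is a GNU extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical measured character sequence: no terminator, just a length. */
struct slice {
    const char *ptr;
    size_t      len;
};

/* Like ustpcpy(): copy the characters and return the new end. */
static char *
slice_copy(char *dst, struct slice s)
{
    memcpy(dst, s.ptr, s.len);
    return dst + s.len;
}

/* Like ustr2stp(): copy the characters and terminate the result. */
static char *
slice_to_string(char *dst, struct slice s)
{
    char *end = slice_copy(dst, s);

    *end = '\0';
    return end;
}
```

The slices can point into a larger string without copying it; only the final call needs to produce a null-terminated result.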
+.\" ----- DESCRIPTION :: String vs character sequence -----------------/
+.SS String vs character sequence
+Some functions only operate on strings.
+Those require that the input
+.I src
+is a string,
+and guarantee an output string
+(even when truncation occurs).
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.IP \(bu
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.IP \(bu
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+.IP \(bu
+.BR strscpy (3)
+.PD
+.PP
+Other functions require an input string,
+but create a character sequence as output.
+These functions have confusing names,
+and have a long history of misuse.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpncpy (3)
+.IP \(bu
+.BR strncpy (3)
+.PD
+.PP
+Other functions operate on an input character sequence,
+and create an output string.
+Functions that concatenate
+also require that
+.I dst
+holds a string before the call.
+.BR strncat (3)
+has an even more misleading name than the functions above.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR zustr2stp (3)
+.IP \(bu
+.BR strncat (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+Other functions operate on an input character sequence
+to create an output character sequence.
+List of functions:
+.IP \(bu 3
+.BR ustpcpy (3)
+.IP \(bu
+.BR zustr2ustp (3)
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strlen(src));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+These functions copy the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a faster alternative to these functions.
+.IP
+An implementation of these functions might be:
+.IP
+.in +4n
+.EX
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to one past the end of it,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+These functions are not provided by any library,
+but you can define them with the following reference implementations:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+stpecpy(char *dst, char past_end[0],
+        const char *restrict src)
+{
+    char *p;
+
+    if (dst == past_end)
+        return past_end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    past_end[\-1] = \(aq\e0\(aq;
+    return past_end;
+}
+
+/* This code is in the public domain. */
+char *
+stpecpyx(char *dst, char past_end[0],
+         const char *restrict src)
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, past_end, src);
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+.BR stpecpyx (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strscpy(3) ----------------------/
+.TP
+.BR strscpy (3)
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a simpler and faster alternative to this function.
+.RE
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded character sequence in a fixed-width buffer.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+Since it creates a character sequence,
+it doesn't need to write a terminating null byte.
+It returns a pointer suitable for chaining,
+but it's not ideal for that.
+It's impossible to distinguish truncation after the call,
+from a character sequence that just fits the destination buffer;
+truncation should be detected from the length of the original string.
+.IP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function
+that accepts a pointer to one past the end of the buffer instead of a size.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+stpncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    char  *p;
+
+    bzero(dst, sz);
+    p = memccpy(dst, src, \(aq\e0\(aq, sz);
+    if (p == NULL)
+        return dst + sz;
+
+    return p \- 1;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
+.TP
+.BR strncpy (3)
+This function is identical to
+.BR stpncpy (3)
+except for the useless return value.
+.IP
+.BR stpncpy (3)
+is a simpler alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncpy(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: zustr2ustp(3) --------------------/
+.TP
+.BR zustr2ustp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library,
+but you can define it with the following reference implementation:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+zustr2ustp(char *restrict dst, const char *restrict src,
+           size_t sz)
+{
+    return ustpcpy(dst, src, strnlen(src, sz));
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: zustr2stp(3) --------------------/
+.TP
+.BR zustr2stp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library,
+but you can define it with the following reference implementation:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+zustr2stp(char *restrict dst, const char *restrict src,
+          size_t sz)
+{
+    char  *end;
+
+    end = zustr2ustp(dst, src, sz);
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
+.TP
+.BR strncat (3)
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function concatenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR zustr2stp (3)
+is a faster alternative to this function.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+char *
+strncat(char *restrict dst, const char *restrict src,
+        size_t sz)
+{
+    zustr2stp(dst + strlen(dst), src, sz);
+    return dst;
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ustpcpy(3) ----------------------/
+.TP
+.BR ustpcpy (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+ustpcpy(char *restrict dst, const char *restrict src,
+        size_t len)
+{
+    return mempcpy(dst, src, len);
+}
+.EE
+.in
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+An implementation of this function might be:
+.IP
+.in +4n
+.EX
+/* This code is in the public domain. */
+char *
+ustr2stp(char *restrict dst, const char *restrict src,
+         size_t len)
+{
+    char  *end;
+
+    end = ustpcpy(dst, src, len);
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- RETURN VALUE :: ---------------------------------------------/
 .SH RETURN VALUE
-The
-.BR strcpy ()
-function returns a pointer to
-the destination string
-.IR dest .
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.IP \(bu
+.BR zustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to one past the end of the destination buffer
+.RI ( past_end ).
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following functions return
+a pointer to one after the last character
+in the destination character sequence.
+.IP \(bu 3
+.BR zustr2ustp (3)
+.IP \(bu
+.BR ustpcpy (3)
+.PP
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following function returns
+the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP \(bu 3
+.BR strscpy (3)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.IP \(bu 3
+.PD 0
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- ATTRIBUTES :: -----------------------------------------------/
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -54,73 +851,264 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR strcpy (),
+.BR strcat (),
+.BR stpecpy (),
+.BR stpecpyx (),
+.BR strlcpy (),
+.BR strlcat (),
+.BR strscpy (),
+.BR stpncpy (),
+.BR strncpy (),
+.BR zustr2ustp (),
+.BR zustr2stp (),
+.BR strncat (),
+.BR ustr2stp (),
+.BR ustpcpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
+.\" ----- STANDARDS :: ------------------------------------------------/
 .SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.TP
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.TQ
+.BR strncpy (3)
+.TQ
+.BR strncat (3)
+POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
+.TP
+.BR stpcpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was not part of
+.\" the C or POSIX.1 standards, nor customary on UNIX systems.
+.\" It first appeared at least as early as 1986,
+.\" in the Lattice C AmigaDOS compiler,
+.\" then in the GNU fileutils and GNU textutils in 1989,
+.\" and in the GNU C library by 1992.
+.\" It is also present on the BSDs.
+.TQ
+.BR stpncpy (3)
+.\" This function was added to POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+POSIX.1-2008.
+.TP
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+Functions originated in OpenBSD and present in some Unix systems.
+.TP
+.BR strscpy (3)
+Linux kernel internal function.
+.TP
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.TQ
+.BR zustr2ustp (3)
+.TQ
+.BR zustr2stp (3)
+.TQ
+.BR ustr2stp "(3), \c"
+.BR ustpcpy (3)
+Not defined by any standards nor libraries.
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+Anyway, the performance difference will probably be negligible,
+so it will probably be more clear if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
 .PP
+Some of the functions described here are not provided by any library;
+you should write your own copy if you want to use them.
+See STANDARDS.
+.\" ----- BUGS :: -----------------------------------------------------/
+.SH BUGS
+All concatenation
+.RB (* cat ())
+functions share the same performance problem:
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
 .in +4n
 .EX
-size_t strlcpy(char *dest, const char *src, size_t size);
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
 .EE
 .in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.in +4n
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.in +4n
+.EX
+past_end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, past_end, "Hello ");
+p = stpecpy(p, past_end, "world");
+p = stpecpy(p, past_end, "!");
+if (p == past_end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.in +4n
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strscpy(3) --------------------------------------/
+.TP
+.BR strscpy (3)
+.in +4n
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.in +4n
+.EX
+end = stpncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = end \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
+.TP
+.BR strncpy (3)
+.in +4n
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.in
+.\" ----- EXAMPLES :: zustr2ustp(3) -----------------------------------/
+.TP
+.BR zustr2ustp (3)
+.in +4n
+.EX
+p = buf;
+p = zustr2ustp(p, "Hello ", 6);
+p = zustr2ustp(p, "world", 42); // Padding null bytes ignored.
+p = zustr2ustp(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.in
+.\" ----- EXAMPLES :: zustr2stp(3) ------------------------------------/
+.TP
+.BR zustr2stp (3)
+.in +4n
+.EX
+p = buf;
+p = zustr2stp(p, "Hello ", 6);
+p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = zustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
+.TP
+.BR strncat (3)
+.in +4n
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.in
+.\" ----- EXAMPLES :: ustpcpy(3) --------------------------------------/
+.TP
+.BR ustpcpy (3)
+.in +4n
+.EX
+p = buf;
+p = ustpcpy(p, "Hello ", 6);
+p = ustpcpy(p, "world", 5);
+p = ustpcpy(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.in
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.in +4n
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 5);
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
+.BR bzero (3),
 .BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
-.BR strdup (3),
-.BR string (3),
-.BR wcscpy (3)
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR string (3)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-14  0:03     ` [PATCH v3 " Alejandro Colomar
@ 2022-12-14 16:22       ` Douglas McIlroy
  2022-12-14 16:36         ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Douglas McIlroy @ 2022-12-14 16:22 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Jakub Wilk

> a sequence of zero or more non-null characters followed by a null byte

Varying terminology (character vs byte) is poor style in technical writing.

> concatenate

We began fighting this pomposity before v7. There has only been
backsliding since.
"Catenate" is crisper, means the same thing, and concurs with the "cat"
command. I invite you to join the battle for simplicity.

> chain copy

This term is never overtly defined. The definition might be inferred
from, "To chain copy functions, they need to return a pointer to the
end", but the problematic grammar of the sentence diverts attention
from its content.

> strscpy

Doesn't it muddy the waters to include a non-library function in man3?

Doug

On Tue, Dec 13, 2022 at 7:03 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> This is an opportunity to use consistent language across the
> documentation for all string-copying functions.
>
> It is also easier to show the similarities and differences between all
> of the functions, so that a reader can use this page to know which
> function is needed for a given task.
>
> Many functions that are inferior to another one, have been marked as
> deprecated, notwithstanding the deprecation status in C libraries or
> any standards.  Alternatives have been given in the same page, with
> reference implementations.
>
> Cc: Martin Sebor <msebor@redhat.com>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
> Cc: Jakub Wilk <jwilk@jwilk.net>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>  man3/strcpy.3 | 1058 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 970 insertions(+), 88 deletions(-)
>
> diff --git a/man3/strcpy.3 b/man3/strcpy.3
> index 74c3180ae..e04a7b149 100644
> --- a/man3/strcpy.3
> +++ b/man3/strcpy.3
> @@ -1,48 +1,767 @@
> -.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
>  .\"
> -.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> -.\"
> -.\" References consulted:
> -.\"     Linux libc source code
> -.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
> -.\"     386BSD man pages
> -.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
> -.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
> -.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
> -.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
> -.\"     Improve discussion of strncpy().
> +.\" SPDX-License-Identifier: BSD-3-Clause
>  .\"
>  .TH strcpy 3 (date) "Linux man-pages (unreleased)"
> +.\" ----- NAME :: -----------------------------------------------------/
>  .SH NAME
> -strcpy \- copy a string
> +stpcpy,
> +strcpy, strcat,
> +stpecpy, stpecpyx,
> +strlcpy, strlcat,
> +strscpy,
> +stpncpy,
> +strncpy,
> +ustr2stp,
> +strncat,
> +mempcpy
> +\- copy strings and character sequences
> +.\" ----- LIBRARY :: --------------------------------------------------/
>  .SH LIBRARY
> +.TP
> +.BR stpcpy (3)
> +.TQ
> +.BR strcpy "(3), \c"
> +.BR strcat (3)
> +.TQ
> +.BR stpncpy (3)
> +.TQ
> +.BR strncpy (3)
> +.TQ
> +.BR strncat (3)
> +.TQ
> +.BR mempcpy (3)
>  Standard C library
>  .RI ( libc ", " \-lc )
> +.TP
> +.BR stpecpy "(3), \c"
> +.BR stpecpyx (3)
> +Not provided by any library.
> +.TP
> +.BR strlcpy "(3), \c"
> +.BR strlcat (3)
> +Utility functions from BSD systems
> +.RI ( libbsd ", " \-lbsd )
> +.TP
> +.BR strscpy (3)
> +Not provided by any library.
> +It is a Linux kernel internal function.
> +.\" ----- SYNOPSIS :: -------------------------------------------------/
>  .SH SYNOPSIS
>  .nf
>  .B #include <string.h>
> +.fi
> +.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
> +.SS Strings
> +.nf
> +// Chain-copy a string.
> +.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
>  .PP
> -.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
> +// Copy/concatenate a string.
> +.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
> +.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
> +.PP
> +// Chain-copy a string with truncation.
> +.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
> +const char *restrict " src );
> +.PP
> +// Chain-copy a string with truncation and SIGSEGV on UB.
> +.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
> +const char *restrict " src );
> +.PP
> +// Copy/concatenate a string with truncation and SIGSEGV on UB.
> +.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.BI "size_t strlcat(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.PP
> +// Copy a string with truncation.
> +.BI "ssize_t strscpy(char " dst "[restrict ." sz "], \
> +const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.fi
> +.\" ----- SYNOPSIS :: Null-padded character sequences --------/
> +.SS Null-padded character sequences
> +.nf
> +// Zero a fixed-width buffer, and
> +// copy a string with truncation into a character sequence.
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.PP
> +// Zero a fixed-width buffer, and
> +// copy a string with truncation into a character sequence.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.PP
> +// Chain-copy a null-padded character sequence into a string.
> +.BI "char *ustr2stp(char *restrict " dst ", \
> +const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.PP
> +// Concatenate a null-padded character sequence into a string.
> +.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.fi
> +.\" ----- SYNOPSIS :: Measured character sequences --------------------/
> +.SS Measured character sequences
> +.nf
> +// Chain-copy a measured character sequence.
> +.BI "void *mempcpy(void *restrict " dst ", \
> +const void " src "[restrict ." len ],
> +.BI "               size_t " len );
> +.fi
> +.PP
> +.RS -4
> +Feature Test Macro Requirements for glibc (see
> +.BR feature_test_macros (7)):
> +.RE
> +.PP
> +.BR stpcpy (3),
> +.BR stpncpy (3):
> +.nf
> +    Since glibc 2.10:
> +        _POSIX_C_SOURCE >= 200809L
> +    Before glibc 2.10:
> +        _GNU_SOURCE
> +.fi
> +.PP
> +.BR mempcpy (3):
> +.nf
> +    _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> -The
> -.BR strcpy ()
> -function copies the string pointed to by
> -.IR src ,
> -including the terminating null byte (\(aq\e0\(aq),
> -to the buffer pointed to by
> -.IR dest .
> -The strings may not overlap, and the destination string
> -.I dest
> -must be large enough to receive the copy.
> -.I Beware of buffer overruns!
> -(See BUGS.)
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
> +.SS Terms (and abbreviations)
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
> +.TP
> +.IR "string " ( str )
> +is a sequence of zero or more non-null characters followed by a null byte.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
> +.TP
> +.IR "character sequence " ( ustr )
> +is a sequence of zero or more non-null characters.
> +A program should never use a character sequence where a string is required.
> +However, with appropriate care,
> +a string can be used in the place of a character sequence.
> +.RS
> +.TP
> +.I null-padded character sequence
> +Character sequences can be contained in fixed-width buffers,
> +which contain padding null bytes after the character sequence,
> +to fill the rest of the buffer
> +without affecting the character sequence;
> +however, those padding null bytes are not part of the character sequence.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
> +.TP
> +.I measured character sequence
> +Character sequence delimited by its length.
> +.RE
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
> +.TP
> +.IR "length " ( len )
> +is the number of non-null characters in a string or character sequence.
> +It is the return value of
> +.I strlen(str)
> +and of
> +.IR "strnlen(ustr, sz)" .
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
> +.TP
> +.IR "size " ( sz )
> +refers to the entire buffer
> +where the string or character sequence is contained.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
> +.TP
> +.I end
> +is the name of a pointer to the terminating null byte of a string,
> +or a pointer to one past the last character of a character sequence.
> +This is the return value of functions that allow chaining.
> +It is equivalent to
> +.IR &str[len] .
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
> +.TP
> +.I past_end
> +is the name of a pointer to one past the end of the buffer
> +that contains a string or character sequence.
> +It is equivalent to
> +.IR &str[sz] .
> +It is used as a sentinel value,
> +to be able to truncate strings or character sequences
> +instead of overrunning the containing buffer.
> +.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
> +.SS Copy, concatenate, and chain-copy
> +Originally,
> +there was a distinction between functions that copy and those that concatenate.
> +However, newer functions that copy while allowing chaining
> +cover both use cases with a single API.
> +They are also algorithmically faster,
> +since they don't need to search for the end of the existing string.
> +However, functions that concatenate have a much simpler use,
> +so if performance is not important,
> +it can make sense to use them for improving readability.
> +.PP
> +To chain copy functions,
> +they need to return a pointer to the
> +.IR end .
> +That's a byproduct of the copy operation,
> +so it has no performance costs.
> +Functions that return such a pointer,
> +and thus can be chained,
> +have names of the form
> +.RB * stp *()
> +or
> +.RB * memp *(),
> +since it's also common to name the pointer just
> +.IR p .
> +.PP
> +Chain-copying functions that truncate
> +should accept a pointer to one past the end of the destination buffer,
> +and have names of the form
> +.RB * stpe *().
> +This allows not having to recalculate the remaining size after each call.
> +.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
> +.SS Truncate or not?
> +The first thing to note is that programmers should be careful with buffers,
> +so they always have the correct size,
> +and truncation is not necessary.
> +.PP
> +In most cases,
> +truncation is not desired,
> +and it is simpler to just do the copy.
> +Simpler code is safer code.
> +Programming against programming mistakes by adding more code
> +just adds more points where mistakes can be made.
> +.PP
> +Nowadays,
> +compilers can detect most programmer errors with features like
> +compiler warnings,
> +static analyzers, and
> +.BR \%_FORTIFY_SOURCE
> +(see
> +.BR ftm (7)).
> +Keeping the code simple
> +helps these overflow-detection features be more precise.
> +.PP
> +When validating user input,
> +however,
> +it makes sense to truncate.
> +Remember to check the return value of such function calls.
> +.PP
> +Functions that truncate:
> +.IP \(bu 3
> +.BR stpecpy (3)
> +is the most efficient string copy function that performs truncation.
> +It requires checking for truncation only once, after all chained calls.
> +.IP \(bu
> +.BR stpecpyx (3)
> +is a variant of
> +.BR stpecpy (3)
> +that consumes the entire source string,
> +to catch bugs in the program
> +by forcing a segmentation fault (as
> +.BR strlcpy (3bsd)
> +and
> +.BR strlcat (3bsd)
> +do).
> +.IP \(bu
> +.BR strlcpy (3bsd)
> +and
> +.BR strlcat (3bsd)
> +are designed to crash if the input string is invalid
> +(doesn't contain a terminating null byte).
> +.IP \(bu
> +.BR strscpy (3)
> +reports an error instead of crashing (similar to
> +.BR stpecpy (3)).
> +.IP \(bu
> +.BR stpncpy (3)
> +and
> +.BR strncpy (3)
> +also truncate, but they write null-padded character sequences,
> +not strings.
> +.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
> +.SS Null-padded character sequences
> +For historic reasons,
> +some standard APIs,
> +such as
> +.BR utmpx (5),
> +use null-padded character sequences in fixed-width buffers.
> +To interface with them,
> +specialized functions need to be used.
> +.PP
> +To copy strings into them, use
> +.BR stpncpy (3).
> +.PP
> +To copy from a character sequence within a fixed-width buffer into a string,
> +ignoring any trailing null bytes in the source fixed-width buffer,
> +you should use
> +.BR ustr2stp (3)
> +or
> +.BR strncat (3).
> +.\" ----- DESCRIPTION :: Measured character sequences -----------------/
> +.SS Measured character sequences
> +The simplest character sequence copying function is
> +.BR mempcpy (3).
> +It requires always knowing the length of your character sequences,
> +for which structures can be used.
> +It makes the code much faster,
> +since you always know the length of your character sequences,
> +and can do the minimal copies and length measurements.
> +.BR mempcpy (3)
> +copies character sequences,
> +so you need to explicitly set the terminating null byte if you need a string.
> +.PP
> +The following code can be used to
> +chain-copy from a measured character sequence into a string:
> +.PP
> +.in +4n
> +.EX
> +p = mempcpy(p, foo\->ustr, foo\->len);
> +*p = \(aq\e0\(aq;
> +.EE
> +.in
> +.PP
> +The following code can be used to
> +chain-copy from a measured character sequence into a character sequence:
> +.PP
> +.in +4n
> +.EX
> +p = mempcpy(p, bar\->ustr, bar\->len);
> +.EE
> +.in
> +.PP
> +In programs that make considerable use of strings or character sequences,
> +and need the best performance,
> +using overlapping character sequences can make a big difference.
> +It allows holding subsequences of a larger character sequence,
> +while not duplicating memory
> +nor using time to do a copy.
> +.PP
> +However, this is delicate,
> +since it requires using character sequences.
> +C library APIs use strings,
> +so programs that use character sequences
> +will have to take care of differentiating strings from character sequences.
> +.\" ----- DESCRIPTION :: String vs character sequence -----------------/
> +.SS String vs character sequence
> +Some functions only operate on strings.
> +Those require that the input
> +.I src
> +is a string,
> +and guarantee an output string
> +(even when truncation occurs).
> +Functions that concatenate
> +also require that
> +.I dst
> +holds a string before the call.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR stpcpy (3)
> +.IP \(bu
> +.BR strcpy "(3), \c"
> +.BR strcat (3)
> +.IP \(bu
> +.BR stpecpy "(3), \c"
> +.BR stpecpyx (3)
> +.IP \(bu
> +.BR strlcpy "(3bsd), \c"
> +.BR strlcat (3bsd)
> +.IP \(bu
> +.BR strscpy (3)
> +.PD
> +.PP
> +Other functions require an input string,
> +but create a character sequence as output.
> +These functions have confusing names,
> +and have a long history of misuse.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR stpncpy (3)
> +.IP \(bu
> +.BR strncpy (3)
> +.PD
> +.PP
> +Other functions operate on an input character sequence,
> +and create an output string.
> +Functions that concatenate
> +also require that
> +.I dst
> +holds a string before the call.
> +.BR strncat (3)
> +has an even more misleading name than the functions above.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR ustr2stp (3)
> +.IP \(bu
> +.BR strncat (3)
> +.PD
> +.PP
> +And the last one,
> +operates on an input character sequence
> +to create an output character sequence.
> +But because it asks for the length,
> +and a string is by nature composed of a character sequence of the same length
> +plus a terminating null byte,
> +a string is also accepted as input.
> +Function:
> +.IP \(bu 3
> +.BR mempcpy (3)
> +.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
> +.SS Functions
> +.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
> +.TP
> +.BR stpcpy (3)
> +This function copies the input string into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.IP
> +An implementation of this function might be:
> +.IP
> +.in +4n
> +.EX
> +char *
> +stpcpy(char *restrict dst, const char *restrict src)
> +{
> +    return (char *) mempcpy(dst, src, strlen(src) + 1) \- 1;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
> +.TP
> +.BR strcpy (3)
> +.TQ
> +.BR strcat (3)
> +These functions copy the input string into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +The return value is useless.
> +.IP
> +.BR stpcpy (3)
> +is a faster alternative to these functions.
> +.IP
> +An implementation of these functions might be:
> +.IP
> +.in +4n
> +.EX
> +char *
> +strcpy(char *restrict dst, const char *restrict src)
> +{
> +    stpcpy(dst, src);
> +    return dst;
> +}
> +
> +char *
> +strcat(char *restrict dst, const char *restrict src)
> +{
> +    stpcpy(dst + strlen(dst), src);
> +    return dst;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
> +.TP
> +.BR stpecpy (3)
> +.TQ
> +.BR stpecpyx (3)
> +These functions copy the input string into a destination string.
> +If the destination buffer,
> +limited by a pointer to one past the end of it,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +They return a pointer suitable for chaining.
> +Truncation needs to be detected only once after the last chained call.
> +.BR stpecpyx (3)
> +has identical semantics to
> +.BR stpecpy (3),
> +except that it forces a SIGSEGV if the
> +.I src
> +pointer is not a string.
> +.IP
> +These functions are not provided by any library,
> +but you can define them with the following reference implementations:
> +.IP
> +.in +4n
> +.EX
> +/* This code is in the public domain. */
> +char *
> +stpecpy(char *dst, char past_end[0],
> +        const char *restrict src)
> +{
> +    char *p;
> +
> +    if (dst == past_end)
> +        return past_end;
> +
> +    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
> +    if (p != NULL)
> +        return p \- 1;
> +
> +    /* truncation detected */
> +    past_end[\-1] = \(aq\e0\(aq;
> +    return past_end;
> +}
> +
> +/* This code is in the public domain. */
> +char *
> +stpecpyx(char *dst, char past_end[0],
> +         const char *restrict src)
> +{
> +    if (src[strlen(src)] != \(aq\e0\(aq)
> +        raise(SIGSEGV);
> +
> +    return stpecpy(dst, past_end, src);
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
> +.TP
> +.BR strlcpy (3bsd)
> +.TQ
> +.BR strlcat (3bsd)
> +These functions copy the input string into a destination string.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +They return the length of the total string they tried to create.
> +These functions force a SIGSEGV if the
> +.I src
> +pointer is not a string.
> +.IP
> +.BR stpecpyx (3)
> +is a faster alternative to these functions.
> +.\" ----- DESCRIPTION :: Functions :: strscpy(3) ----------------------/
> +.TP
> +.BR strscpy (3)
> +This function copies the input string into a destination string.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +It returns the length of the destination string, or
> +.B \-E2BIG
> +on truncation.
> +.IP
> +.BR stpecpy (3)
> +is a simpler and faster alternative to this function.
> +.RE
> +.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
> +.TP
> +.BR stpncpy (3)
> +This function copies the input string into
> +a destination null-padded character sequence in a fixed-width buffer.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting character sequence is truncated.
> +Since it creates a character sequence,
> +it doesn't need to write a terminating null byte.
> +It returns a pointer suitable for chaining,
> +but it's not ideal for that.
> +Truncation needs to be detected only once after the last chained call.
> +.IP
> +If you're going to use this function in chained calls,
> +it would be useful to develop a similar function
> +that accepts a pointer to one past the end of the buffer instead of a size.
> +.IP
> +An implementation of this function might be:
> +.IP
> +.in +4n
> +.EX
> +char *
> +stpncpy(char *restrict dst, const char *restrict src,
> +        size_t sz)
> +{
> +    char  *p;
> +
> +    bzero(dst, sz);
> +    p = memccpy(dst, src, \(aq\e0\(aq, sz);
> +    if (p == NULL)
> +        return dst + sz;
> +
> +    return p \- 1;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
> +.TP
> +.BR ustr2stp (3)
> +This function copies the input character sequence
> +contained in a null-padded fixed-width buffer,
> +into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.IP
> +A truncating version of this function doesn't exist,
> +since the size of the original character sequence is always known,
> +so it wouldn't be very useful.
> +.IP
> +This function is not provided by any library,
> +but you can define it with the following reference implementation:
> +.IP
> +.in +4n
> +.EX
> +/* This code is in the public domain. */
> +char *
> +ustr2stp(char *restrict dst, const char *restrict src,
> +         size_t sz)
> +{
> +    char  *end;
> +
> +    end = mempcpy(dst, src, strnlen(src, sz));
> +    *end = \(aq\e0\(aq;
> +
> +    return end;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
> +.TP
> +.BR strncpy (3)
> +This function is identical to
> +.BR stpncpy (3)
> +except for the useless return value.
> +Due to the return value,
> +with this function it's hard to correctly check for truncation.
> +.IP
> +.BR stpncpy (3)
> +is a simpler alternative to this function.
> +.IP
> +An implementation of this function might be:
> +.IP
> +.in +4n
> +.EX
> +char *
> +strncpy(char *restrict dst, const char *restrict src,
> +        size_t sz)
> +{
> +    stpncpy(dst, src, sz);
> +    return dst;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
> +.TP
> +.BR strncat (3)
> +Do not confuse this function with
> +.BR strncpy (3);
> +they are not related at all.
> +.IP
> +This function concatenates the input character sequence
> +contained in a null-padded fixed-width buffer,
> +into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +The return value is useless.
> +.IP
> +.BR ustr2stp (3)
> +is a faster alternative to this function.
> +.IP
> +An implementation of this function might be:
> +.IP
> +.in +4n
> +.EX
> +char *
> +strncat(char *restrict dst, const char *restrict src,
> +        size_t sz)
> +{
> +    ustr2stp(dst + strlen(dst), src, sz);
> +    return dst;
> +}
> +.EE
> +.in
> +.\" ----- DESCRIPTION :: Functions :: mempcpy(3) ----------------------/
> +.TP
> +.BR mempcpy (3)
> +This function copies the input character sequence,
> +limited by its length,
> +into a destination character sequence.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.IP
> +An implementation of this function might be:
> +.IP
> +.in +4n
> +.EX
> +void *
> +mempcpy(void *restrict dst, const void *restrict src,
> +        size_t len)
> +{
> +    return memcpy(dst, src, len) + len;
> +}
> +.EE
> +.in
> +.\" ----- RETURN VALUE :: ---------------------------------------------/
>  .SH RETURN VALUE
> -The
> -.BR strcpy ()
> -function returns a pointer to
> -the destination string
> -.IR dest .
> +The following functions return
> +a pointer to the terminating null byte in the destination string.
> +.IP \(bu 3
> +.PD 0
> +.BR stpcpy (3)
> +.IP \(bu
> +.BR ustr2stp (3)
> +.PD
> +.PP
> +The following functions return
> +a pointer to the terminating null byte in the destination string,
> +except when truncation occurs;
> +if truncation occurs,
> +they return a pointer to one past the end of the destination buffer
> +.RI ( past_end ).
> +.IP \(bu 3
> +.BR stpecpy (3),
> +.BR stpecpyx (3)
> +.PP
> +The following function returns
> +a pointer to one after the last character
> +in the destination character sequence;
> +if truncation occurs,
> +that pointer is equivalent to
> +a pointer to one past the end of the destination buffer.
> +.IP \(bu 3
> +.BR stpncpy (3)
> +.PP
> +The following function returns
> +a pointer to one after the last character
> +in the destination character sequence.
> +.IP \(bu 3
> +.BR mempcpy (3)
> +.PP
> +The following functions return
> +the length of the total string that they tried to create
> +(as if truncation didn't occur).
> +.IP \(bu 3
> +.BR strlcpy (3bsd),
> +.BR strlcat (3bsd)
> +.PP
> +The following function returns
> +the length of the destination string, or
> +.B \-E2BIG
> +on truncation.
> +.IP \(bu 3
> +.BR strscpy (3)
> +.PP
> +The following functions return the
> +.I dst
> +pointer,
> +which is useless.
> +.IP \(bu 3
> +.PD 0
> +.BR strcpy (3),
> +.BR strcat (3)
> +.IP \(bu
> +.BR strncpy (3)
> +.IP \(bu
> +.BR strncat (3)
> +.PD
> +.\" ----- ATTRIBUTES :: -----------------------------------------------/
>  .SH ATTRIBUTES
>  For an explanation of the terms used in this section, see
>  .BR attributes (7).
> @@ -54,73 +773,236 @@ .SH ATTRIBUTES
>  l l l.
>  Interface      Attribute       Value
>  T{
> -.BR strcpy ()
> +.BR stpcpy (),
> +.BR strcpy (),
> +.BR strcat (),
> +.BR stpecpy (),
> +.BR stpecpyx (),
> +.BR strlcpy (),
> +.BR strlcat (),
> +.BR strscpy (),
> +.BR stpncpy (),
> +.BR strncpy (),
> +.BR ustr2stp (),
> +.BR strncat (),
> +.BR mempcpy ()
>  T}     Thread safety   MT-Safe
>  .TE
>  .hy
>  .ad
>  .sp 1
> +.\" ----- STANDARDS :: ------------------------------------------------/
>  .SH STANDARDS
> -POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
> -.SH NOTES
> -.SS strlcpy()
> -Some systems (the BSDs, Solaris, and others) provide the following function:
> +.TP
> +.BR strcpy "(3), \c"
> +.BR strcat (3)
> +.TQ
> +.BR strncpy (3)
> +.TQ
> +.BR strncat (3)
> +POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
> +.TP
> +.BR stpcpy (3)
> +.\" This function was added to POSIX.1-2008.
> +.\" Before that, it was not part of
> +.\" the C or POSIX.1 standards, nor customary on UNIX systems.
> +.\" It first appeared at least as early as 1986,
> +.\" in the Lattice C AmigaDOS compiler,
> +.\" then in the GNU fileutils and GNU textutils in 1989,
> +.\" and in the GNU C library by 1992.
> +.\" It is also present on the BSDs.
> +.TQ
> +.BR stpncpy (3)
> +.\" This function was added to POSIX.1-2008.
> +.\" Before that, it was a GNU extension.
> +.\" It first appeared in glibc 1.07 in 1993.
> +POSIX.1-2008.
> +.TP
> +.BR strlcpy "(3bsd), \c"
> +.BR strlcat (3bsd)
> +Functions originated in OpenBSD and present in some Unix systems.
> +.TP
> +.BR mempcpy (3)
> +This function is a GNU extension.
> +.TP
> +.BR strscpy (3)
> +Linux kernel internal function.
> +.TP
> +.BR stpecpy "(3), \c"
> +.BR stpecpyx (3)
> +.TQ
> +.BR ustr2stp (3)
> +Not defined by any standard or library.
> +.\" ----- CAVEATS :: --------------------------------------------------/
> +.SH CAVEATS
> +Don't mix chain calls to truncating and non-truncating functions.
> +It is conceptually wrong
> +unless you know that the first part of a copy will always fit.
> +The performance difference is probably negligible anyway,
> +so the code will probably be clearer if you use consistent semantics:
> +either truncating or non-truncating.
> +Calling a non-truncating function after a truncating one is necessarily wrong.
>  .PP
> +Some of the functions described here are not provided by any library;
> +you should write your own copy if you want to use them.
> +See STANDARDS.
> +.\" ----- BUGS :: -----------------------------------------------------/
> +.SH BUGS
> +All concatenation
> +.RB (* cat ())
> +functions share the same performance problem:
> +.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
> +Shlemiel the painter
> +.UE .
> +.\" ----- EXAMPLES :: -------------------------------------------------/
> +.SH EXAMPLES
> +The following are examples of correct use of each of these functions.
> +.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
> +.TP
> +.BR stpcpy (3)
>  .in +4n
>  .EX
> -size_t strlcpy(char *dest, const char *src, size_t size);
> +p = buf;
> +p = stpcpy(p, "Hello ");
> +p = stpcpy(p, "world");
> +p = stpcpy(p, "!");
> +len = p \- buf;
> +puts(buf);
>  .EE
>  .in
> -.PP
> -.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
> -.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
> -.\"     1999 USENIX Annual Technical Conference
> -This function is similar to
> -.BR strcpy (),
> -but it copies at most
> -.I size\-1
> -bytes to
> -.IR dest ,
> -truncating the string as necessary.
> -It always adds a terminating null byte.
> -This function fixes some of the problems of
> -.BR strcpy ()
> -but the caller must still handle the possibility of data loss if
> -.I size
> -is too small.
> -The return value of the function is the length of
> -.IR src ,
> -which allows truncation to be easily detected:
> -if the return value is greater than or equal to
> -.IR size ,
> -truncation occurred.
> -If loss of data matters, the caller
> -.I must
> -either check the arguments before the call,
> -or test the function return value.
> -.BR strlcpy ()
> -is not present in glibc and is not standardized by POSIX,
> -.\" https://lwn.net/Articles/506530/
> -but is available on Linux via the
> -.I libbsd
> -library.
> -.SH BUGS
> -If the destination string of a
> -.BR strcpy ()
> -is not large enough, then anything might happen.
> -Overflowing fixed-length string buffers is a favorite cracker technique
> -for taking complete control of the machine.
> -Any time a program reads or copies data into a buffer,
> -the program first needs to check that there's enough space.
> -This may be unnecessary if you can show that overflow is impossible,
> -but be careful: programs can get changed over time,
> -in ways that may make the impossible possible.
> +.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
> +.TP
> +.BR strcpy (3)
> +.TQ
> +.BR strcat (3)
> +.in +4n
> +.EX
> +strcpy(buf, "Hello ");
> +strcat(buf, "world");
> +strcat(buf, "!");
> +len = strlen(buf);
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
> +.TP
> +.BR stpecpy (3)
> +.TQ
> +.BR stpecpyx (3)
> +.in +4n
> +.EX
> +past_end = buf + sizeof(buf);
> +p = buf;
> +p = stpecpy(p, past_end, "Hello ");
> +p = stpecpy(p, past_end, "world");
> +p = stpecpy(p, past_end, "!");
> +if (p == past_end) {
> +    p\-\-;
> +    goto toolong;
> +}
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
> +.TP
> +.BR strlcpy (3bsd)
> +.TQ
> +.BR strlcat (3bsd)
> +.in +4n
> +.EX
> +if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
> +    goto toolong;
> +if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
> +    goto toolong;
> +len = strlcat(buf, "!", sizeof(buf));
> +if (len >= sizeof(buf))
> +    goto toolong;
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: strscpy(3) --------------------------------------/
> +.TP
> +.BR strscpy (3)
> +.in +4n
> +.EX
> +len = strscpy(buf, "Hello world!", sizeof(buf));
> +if (len == \-E2BIG)
> +    goto toolong;
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
> +.TP
> +.BR stpncpy (3)
> +.in +4n
> +.EX
> +past_end = buf + sizeof(buf);
> +end = stpncpy(buf, "Hello world!", sizeof(buf));
> +if (end == past_end)
> +    goto toolong;
> +len = end \- buf;
> +for (size_t i = 0; i < sizeof(buf); i++)
> +    putchar(buf[i]);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
> +.TP
> +.BR strncpy (3)
> +.in +4n
> +.EX
> +strncpy(buf, "Hello world!", sizeof(buf));
> +if (buf[sizeof(buf) \- 1] != \(aq\e0\(aq)
> +    goto toolong;
> +len = strnlen(buf, sizeof(buf));
> +for (size_t i = 0; i < sizeof(buf); i++)
> +    putchar(buf[i]);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
> +.TP
> +.BR ustr2stp (3)
> +.in +4n
> +.EX
> +p = buf;
> +p = ustr2stp(p, "Hello ", 6);
> +p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
> +p = ustr2stp(p, "!", 1);
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
> +.TP
> +.BR strncat (3)
> +.in +4n
> +.EX
> +buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
> +strncat(buf, "Hello ", 6);
> +strncat(buf, "world", 42);  // Padding null bytes ignored.
> +strncat(buf, "!", 1);
> +len = strlen(buf);
> +puts(buf);
> +.EE
> +.in
> +.\" ----- EXAMPLES :: mempcpy(3) --------------------------------------/
> +.TP
> +.BR mempcpy (3)
> +.in +4n
> +.EX
> +p = buf;
> +p = mempcpy(p, "Hello ", 6);
> +p = mempcpy(p, "world", 5);
> +p = mempcpy(p, "!", 1);
> +*p = \(aq\e0\(aq;
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.in
> +.\" ----- SEE ALSO :: -------------------------------------------------/
>  .SH SEE ALSO
> -.BR bcopy (3),
> -.BR memccpy (3),
> +.BR bzero (3),
>  .BR memcpy (3),
> -.BR memmove (3),
> -.BR stpcpy (3),
> -.BR strdup (3),
> -.BR string (3),
> -.BR wcscpy (3)
> +.BR memccpy (3),
> +.BR mempcpy (3),
> +.BR string (3)
> --
> 2.38.1
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-14 16:22       ` Douglas McIlroy
@ 2022-12-14 16:36         ` Alejandro Colomar
  2022-12-14 17:11           ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14 16:36 UTC (permalink / raw)
  To: Douglas McIlroy
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Jakub Wilk


[-- Attachment #1.1: Type: text/plain, Size: 2017 bytes --]

Hi Doug!

Thanks for having a look at it!

On 12/14/22 17:22, Douglas McIlroy wrote:
>> a sequence of zero or more non-null characters followed by a null byte
> 
> Varying terminology (character vs byte) is poor style in technical writing.

I thought of using null character, but it was longer, and I prefer shorter terms.

About using byte for everything... it feels a bit wrong especially when I'm 
trying to make a clear distinction between strings and character sequences that 
don't have a terminating NUL.

And, since the two are distinct things that should not be mixed (as far as 
strings are concerned), it doesn't feel so bad using different terms for them.

> 
>> concatenate
> 
> We began fighting this pomposity before v7. There has only been
> backsliding since.
> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
> I invite you to join the battle for simplicity.

Heh, I didn't know the word existed.  In Spanish we only have "concatenar". 
I'll happily join this battle for simplicity :)

> 
>> chain copy
> 
> This term is never overtly defined. The definition might be inferred
> from, "To chain copy
> functions, they need to return a pointer to the end", but the
> problematic grammar of the
> sentence diverts attention from its content.

Okay, I'll try to improve the wording in that paragraph; indeed that subsection 
intended to define the "chain copy" term.

> 
>> strscpy
> 
> Doesn't it muddy the waters to include a non-library function in man3?

Initially I wanted to discuss it because it always comes up in discussions about 
better string-copying functions.

But since I don't provide an implementation for it (it's hard to get 
right, as opposed to the other functions that are not in libraries, for which I 
show trivial implementations), and don't find it very useful, I can remove it. 
Fewer lines wasted on it.

Maybe I'll keep a small mention of it.

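To illustrate how small such an implementation can be, here is a minimal
sketch of a truncating chain-copy function with the semantics the page gives
for stpecpy(): it returns a pointer to the terminating null byte, or the
past-the-end pointer on truncation, and a call that starts at the past-the-end
pointer writes nothing.  It is an assumption-laden illustration, not the
reference implementation from the patch; the name stpecpy_sketch() is
hypothetical, and dst is assumed to point into a buffer that ends at past_end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: chain-copy src into [dst, past_end), truncating. */
char *
stpecpy_sketch(char *dst, char *past_end, const char *src)
{
    size_t      room, len;
    const char  *nul;

    if (dst == past_end)            /* an earlier call already truncated */
        return past_end;

    room = past_end - dst;          /* bytes left, terminator included */
    nul = memchr(src, '\0', room);  /* does src end within that room? */
    if (nul != NULL) {
        len = nul - src;
        memcpy(dst, src, len + 1);  /* copy the terminator too */
        return dst + len;           /* the new end, for chaining */
    }

    memcpy(dst, src, room - 1);     /* truncate ... */
    past_end[-1] = '\0';            /* ... but always terminate */
    return past_end;                /* signals truncation to the caller */
}
```

Because a truncated call returns past_end and later calls pass it through
unchanged, the caller only needs one `p == past_end` check after the last
chained call.
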
> 
> Doug

Cheers,

Alex


* Re: [PATCH v3 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-14 16:36         ` Alejandro Colomar
@ 2022-12-14 17:11           ` Alejandro Colomar
  2022-12-14 17:19             ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14 17:11 UTC (permalink / raw)
  To: Douglas McIlroy
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Jakub Wilk


[-- Attachment #1.1: Type: text/plain, Size: 2409 bytes --]

Hi Doug,

On 12/14/22 17:36, Alejandro Colomar wrote:
> On 12/14/22 17:22, Douglas McIlroy wrote:
>>> chain copy
>>
>> This term is never overtly defined. The definition might be inferred
>> from, "To chain copy
>> functions, they need to return a pointer to the end", but the
>> problematic grammar of the
>> sentence diverts attention from its content.
> 
> Okay, I'll try to improve the wording in that paragraph; indeed that subsection 
> intended to define the "chain copy" term.
> 
>>

I'll hold on sending v5 to see if there is more feedback from others, but here's 
what I have for documenting the chain term:


@@ -202,15 +192,36 @@ .SS Terms (and abbreviations)
  It is used as a sentinel value,
  to be able to truncate strings or character sequences
  instead of overrunning the containing buffer.
-.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
-.SS Copy, concatenate, and chain-copy
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
+.TP
+.I copy
+This term is used when
+the writing starts at the first element pointed to by
+.IR dst .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
+.TP
+.I catenate
+This term is used when
+a function first finds the terminating null byte in
+.IR dst ,
+and then starts writing at that position.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
+.TP
+.I chain
+This term is used when
+it's the programmer who provides a pointer to the
+.IR end ,
+and the function starts writing at that location.
+.IR dst .
+.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
+.SS Copy, catenate, and chain-copy
  Originally,
-there was a distinction between functions that copy and those that concatenate.
+there was a distinction between functions that copy and those that catenate.
  However, newer functions that copy while allowing chaining
  cover both use cases with a single API.
  They are also algorithmically faster,
  since they don't need to search for the end of the existing string.
-However, functions that concatenate have a much simpler use,
+However, functions that catenate have a much simpler use,
  so if performance is not important,
  it can make sense to use them for improving readability.
  .PP


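To make the three terms concrete, the following sketch contrasts them using
strcpy(3), strcat(3), and stpcpy(3).  The helper names greet_cat() and
greet_chain() are made up for this illustration, and the caller is assumed to
pass a buffer of at least 13 bytes.

```c
#define _POSIX_C_SOURCE 200809L  /* stpcpy() is specified by POSIX.1-2008 */
#include <stddef.h>
#include <string.h>

/* Copy, then catenate.  Each strcat() call first searches dst for its
   terminating null byte, and only then starts writing. */
size_t
greet_cat(char *dst)
{
    strcpy(dst, "Hello ");      /* copy: writing starts at dst[0] */
    strcat(dst, "world");       /* catenate: find the end, then write */
    strcat(dst, "!");
    return strlen(dst);         /* one more pass to learn the length */
}

/* Chain-copy.  The caller passes the current end and receives the new
   end, so the existing string is never searched. */
size_t
greet_chain(char *dst)
{
    char  *p;

    p = dst;
    p = stpcpy(p, "Hello ");    /* returns a pointer to the null byte */
    p = stpcpy(p, "world");     /* starts writing exactly there */
    p = stpcpy(p, "!");
    return p - dst;             /* the length is a byproduct */
}
```

Both helpers leave the same string in the buffer; the difference is only in
how the position of each write is found.
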

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v3 1/1] strcpy.3: Rewrite page to document all string-copying functions
  2022-12-14 17:11           ` Alejandro Colomar
@ 2022-12-14 17:19             ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-14 17:19 UTC (permalink / raw)
  To: Douglas McIlroy
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Jakub Wilk


[-- Attachment #1.1: Type: text/plain, Size: 3030 bytes --]



On 12/14/22 18:11, Alejandro Colomar wrote:
> Hi Doug,
> 
> On 12/14/22 17:36, Alejandro Colomar wrote:
>> On 12/14/22 17:22, Douglas McIlroy wrote:
>>>> chain copy
>>>
>>> This term is never overtly defined. The definition might be inferred
>>> from, "To chain copy
>>> functions, they need to return a pointer to the end", but the
>>> problematic grammar of the
>>> sentence diverts attention from its content.
>>
>> Okay, I'll try to improve the wording in that paragraph; indeed that 
>> subsection intended to define the "chain copy" term.
>>
>>>
> 
> I'll hold on sending v5 to see if there is more feedback from others, but here's 
> what I have for documenting the chain term:
> 
> 
> @@ -202,15 +192,36 @@ .SS Terms (and abbreviations)
>   It is used as a sentinel value,
>   to be able to truncate strings or character sequences
>   instead of overrunning the containing buffer.
> -.\" ----- DESCRIPTION :: Copy, concatenate, and chain-copy ------------/
> -.SS Copy, concatenate, and chain-copy
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
> +.TP
> +.I copy
> +This term is used when
> +the writing starts at the first element pointed to by
> +.IR dst .
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
> +.TP
> +.I catenate
> +This term is used when
> +a function first finds the terminating null byte in
> +.IR dst ,
> +and then starts writing at that position.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
> +.TP
> +.I chain
> +This term is used when
> +it's the programmer who provides a pointer to the
> +.IR end ,
> +and the function starts writing at that location.
> +.IR dst .


@@ -213,6 +213,10 @@ .SS Terms (and abbreviations)
  .IR end ,
  and the function starts writing at that location.
  .IR dst .
+The function returns a pointer to the new
+.I end
+after the call,
+so that the programmer can use it to chain such calls.
  .\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
  .SS Copy, catenate, and chain-copy
  Originally,



And this is for completeness. :)


> +.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
> +.SS Copy, catenate, and chain-copy
>   Originally,
> -there was a distinction between functions that copy and those that concatenate.
> +there was a distinction between functions that copy and those that catenate.
>   However, newer functions that copy while allowing chaining
>   cover both use cases with a single API.
>   They are also algorithmically faster,
>   since they don't need to search for the end of the existing string.
> -However, functions that concatenate have a much simpler use,
> +However, functions that catenate have a much simpler use,
>   so if performance is not important,
>   it can make sense to use them for improving readability.
>   .PP
> 
> 
> 
> Cheers,
> 
> Alex
> 

-- 
<http://www.alejandro-colomar.es/>


* [PATCH v5 0/5] Rewrite pages about string-copying functions
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 0/5] Rewrite documentation for " Alejandro Colomar
                             ` (5 more replies)
  2022-12-15  0:26         ` [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions Alejandro Colomar
                           ` (4 subsequent siblings)
  5 siblings, 6 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk, Serge Hallyn, Iker Pedrosa, Andrew Pinski
  Cc: Alejandro Colomar


Hi,

After a long investigation, I'm rewriting all manual pages about string-
copying functions (and also non-string ones).  When there was no term
for a thing, I used an invented one, and documented it, in an attempt to
form precedent; for example, for non-terminated strings (which is an
oxymoron, since strings are necessarily terminated) I used "character
sequence" (suggested by Martin, to improve on my own "unterminated string").

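As an illustration of the distinction (a sketch, not code taken from the
patches): strncpy(3) turns a string into a null-padded character sequence in a
fixed-width buffer, and going back to a string requires measuring with
strnlen(3) and writing the terminator explicitly.  The names str2zustr() and
zustr2str_sketch() are hypothetical helpers invented here.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen() is specified by POSIX.1-2008 */
#include <stddef.h>
#include <string.h>

/* String -> null-padded character sequence in a buffer of sz bytes.
   If strlen(src) >= sz, the result has no terminating null byte. */
void
str2zustr(char *dst, const char *src, size_t sz)
{
    strncpy(dst, src, sz);
}

/* Null-padded character sequence -> string.
   dst must have room for sz + 1 bytes.
   Returns a pointer to the terminating null byte in dst. */
char *
zustr2str_sketch(char *dst, const char *src, size_t sz)
{
    size_t  len;

    len = strnlen(src, sz);     /* padding is not part of the sequence */
    memcpy(dst, src, len);
    dst[len] = '\0';            /* a string needs its terminator */
    return dst + len;
}
```

A 4-byte field holding "abcd" is a valid character sequence but not a string,
which is why passing it to a function that expects a string would read past
the field.
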
v5 addresses a few suggestions by Doug, and also a few concerns by
Jakub.  I'm no longer replacing the current pages, but rather adding a new
one, and also rewriting the old ones to be consistent with it, but I
kept them as a quick reference, for those who need it.  They also have
complete example programs each.

This time, I'll send the formatted pages as replies to the corresponding
diffs, since there are several.

Cheers,

Alex


Alejandro Colomar (5):
  string_copy.7: Add page to document all string-copying functions
  stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3,
    zustr2ustp.3: Add new links to string_copy(7)
  stpcpy.3, strcpy.3, strcat.3: Document in a single page
  stpncpy.3, strncpy.3: Document in a single page
  strncat.3: Rewrite to be consistent with string_copy.7.

 man3/stpcpy.3      |  13 -
 man3/stpecpy.3     |   1 +
 man3/stpecpyx.3    |   1 +
 man3/stpncpy.3     | 163 +++++----
 man3/strcat.3      | 161 +--------
 man3/strcpy.3      | 226 +++++++-----
 man3/strncat.3     | 147 +++-----
 man3/strncpy.3     | 130 +------
 man3/ustpcpy.3     |   1 +
 man3/ustr2stp.3    |   1 +
 man3/zustr2stp.3   |   1 +
 man3/zustr2ustp.3  |   1 +
 man7/string_copy.7 | 869 +++++++++++++++++++++++++++++++++++++++++++++
 13 files changed, 1162 insertions(+), 553 deletions(-)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/ustpcpy.3
 create mode 100644 man3/ustr2stp.3
 create mode 100644 man3/zustr2stp.3
 create mode 100644 man3/zustr2ustp.3
 create mode 100644 man7/string_copy.7

-- 
2.38.1



* [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-15  0:30           ` Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7) Alejandro Colomar
                           ` (3 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Alternative functions not provided by libc have been given in the same
page, with reference implementations.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copy.7 | 869 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 869 insertions(+)
 create mode 100644 man7/string_copy.7

diff --git a/man7/string_copy.7 b/man7/string_copy.7
new file mode 100644
index 000000000..be8b841e2
--- /dev/null
+++ b/man7/string_copy.7
@@ -0,0 +1,869 @@
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
+.\"
+.\" SPDX-License-Identifier: BSD-3-Clause
+.\"
+.TH string_copy 7 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
+.SH NAME
+stpcpy,
+strcpy, strcat,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+stpncpy,
+strncpy,
+zustr2ustp, zustr2stp,
+strncat,
+ustpcpy, ustr2stp
+\- copy strings and character sequences
+.\" ----- SYNOPSIS :: -------------------------------------------------/
+.SH SYNOPSIS
+.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
+.SS Strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
+.PP
+// Copy/catenate a string.
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.PP
+// Chain-copy a string with truncation.
+.BI "char *stpecpy(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on UB.
+.BI "char *stpecpyx(char *" dst ", char " past_end "[0], \
+const char *restrict " src );
+.PP
+// Copy/catenate a string with truncation and SIGSEGV on UB.
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Null-padded character sequences --------/
+.SS Null-padded character sequences
+.nf
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a character sequence.
+.BI "char *zustr2ustp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a string.
+.BI "char *zustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Catenate a null-padded character sequence into a string.
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Measured character sequences --------------------/
+.SS Measured character sequences
+.nf
+// Chain-copy a measured character sequence.
+.BI "char *ustpcpy(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.PP
+// Chain-copy a measured character sequence into a string.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.SH DESCRIPTION
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
+.TP
+.I character sequence
+is a sequence of zero or more non-null characters.
+A program should never use a character sequence where a string is required.
+However, with appropriate care,
+a string can be used in the place of a character sequence.
+.RS
+.TP
+.IR "null-padded character sequence " ( zustr )
+Character sequences can be contained in fixed-width buffers,
+which contain padding null bytes after the character sequence,
+to fill the rest of the buffer
+without affecting the character sequence;
+however, those padding null bytes are not part of the character sequence.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
+.TP
+.IR "measured character sequence " ( ustr )
+Character sequence delimited by its length.
+It may be a slice of a larger character sequence,
+or even of a string.
+.RE
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string or character sequence.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer
+where the string or character sequence is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to the terminating null byte of a string,
+or a pointer to one past the last character of a character sequence.
+This is the return value of functions that allow chaining.
+It is equivalent to
+.IR &str[len] .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: past_end --------/
+.TP
+.I past_end
+is the name of a pointer to one past the end of the buffer
+that contains a string or character sequence.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings or character sequences
+instead of overrunning the containing buffer.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
+.TP
+.I copy
+This term is used when
+the writing starts at the first element pointed to by
+.IR dst .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
+.TP
+.I catenate
+This term is used when
+a function first finds the terminating null byte in
+.IR dst ,
+and then starts writing at that position.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
+.TP
+.I chain
+This term is used when
+it's the programmer who provides a pointer to the
+.I end
+in
+.IR dst ,
+and the function starts writing at that location.
+The function returns a pointer to the new
+.I end
+after the call,
+so that the programmer can use it to chain such calls.
+.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
+.SS Copy, catenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that catenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for the end of the existing string.
+However, functions that catenate have a much simpler use,
+so if performance is not important,
+it can make sense to use them for improving readability.
+.PP
+For calls to copy functions to be chained,
+the functions need to return a pointer to the
+.IR end .
+That's a byproduct of the copy operation,
+so it has no performance costs.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *(),
+since it's also common to name the pointer just
+.IR p .
+.PP
+Chain-copying functions that truncate
+should accept a pointer to one past the end of the destination buffer,
+and have names of the form
+.RB * stpe *().
+This allows not having to recalculate the remaining size after each call.
+.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
+.SS Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.PP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.PP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these overflow-detection features be more precise.
+.PP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.PP
+Functions that truncate:
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It requires checking for truncation only once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a terminating null byte).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they write null-padded character sequences
+rather than strings.
+.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
+.SS Null-padded character sequences
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use null-padded character sequences in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from a character sequence within a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2stp (3)
+or
+.BR strncat (3).
+.PP
+To copy from a character sequence within a fixed-width buffer
+into a character sequence,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2ustp (3).
+.\" ----- DESCRIPTION :: Measured character sequences -----------------/
+.SS Measured character sequences
+The simplest character sequence copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your character sequences,
+for which structures can be used.
+It makes the code much faster,
+since you always know the length of your character sequences,
+and can do the minimal copies and length measurements.
+.BR mempcpy (3)
+copies character sequences,
+so you need to explicitly set the terminating null byte if you need a string.
+.PP
+However,
+for keeping type safety,
+it's good to add a wrapper that uses
+.I char\~*
+instead of
+.IR void\~* :
+.BR ustpcpy (3).
+.PP
+In programs that make considerable use of strings or character sequences,
+and need the best performance,
+using overlapping character sequences can make a big difference.
+It allows holding subsequences of a larger character sequence,
+while not duplicating memory
+nor using time to do a copy.
+.PP
+However, this is delicate,
+since it requires using character sequences.
+C library APIs use strings,
+so programs that use character sequences
+will have to take care of differentiating strings from character sequences.
+.PP
+To copy a measured character sequence, use
+.BR ustpcpy (3).
+.PP
+To copy a measured character sequence into a string, use
+.BR ustr2stp (3).
+.PP
+Because these functions ask for the length,
+and a string is by nature composed of a character sequence of the same length
+plus a terminating null byte,
+a string is also accepted as input.
+.\" ----- DESCRIPTION :: String vs character sequence -----------------/
+.SS String vs character sequence
+Some functions only operate on strings.
+Those require that the input
+.I src
+is a string,
+and guarantee an output string
+(even when truncation occurs).
+Functions that catenate
+also require that
+.I dst
+holds a string before the call.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.IP \(bu
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.IP \(bu
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+.PD
+.PP
+Other functions require an input string,
+but create a character sequence as output.
+These functions have confusing names,
+and have a long history of misuse.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpncpy (3)
+.IP \(bu
+.BR strncpy (3)
+.PD
+.PP
+Other functions operate on an input character sequence,
+and create an output string.
+Functions that catenate
+also require that
+.I dst
+holds a string before the call.
+.BR strncat (3)
+has an even more misleading name than the functions above.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR zustr2stp (3)
+.IP \(bu
+.BR strncat (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+Other functions operate on an input character sequence
+to create an output character sequence.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR ustpcpy (3)
+.IP \(bu
+.BR zustr2ustp (3)
+.PD
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+These functions copy and catenate the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to one past the end of it,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+These functions are not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+These functions copy and catenate the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+.BR stpecpyx (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded character sequence in a fixed-width buffer.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+Since it creates a character sequence,
+it doesn't need to write a terminating null byte.
+It's impossible to distinguish truncation after the call,
+from a character sequence that just fits the destination buffer;
+truncation should be detected from the length of the original string.
+.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
+.TP
+.BR strncpy (3)
+This function is identical to
+.BR stpncpy (3)
+except for the useless return value.
+.IP
+.BR stpncpy (3)
+is a more useful alternative to this function.
+.\" ----- DESCRIPTION :: Functions :: zustr2ustp(3) --------------------/
+.TP
+.BR zustr2ustp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: zustr2stp(3) --------------------/
+.TP
+.BR zustr2stp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
+.TP
+.BR strncat (3)
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function catenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR zustr2stp (3)
+is a faster alternative to this function.
+.\" ----- DESCRIPTION :: Functions :: ustpcpy(3) ----------------------/
+.TP
+.BR ustpcpy (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- RETURN VALUE :: ---------------------------------------------/
+.SH RETURN VALUE
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.IP \(bu
+.BR zustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to one past the end of the destination buffer
+.RI ( past_end ).
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to one past the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following functions return
+a pointer to one after the last character
+in the destination character sequence.
+.IP \(bu 3
+.PD 0
+.BR zustr2ustp (3)
+.IP \(bu
+.BR ustpcpy (3)
+.PD
+.PP
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.IP \(bu 3
+.PD 0
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- NOTES :: strscpy(9) -----------------------------------------/
+.SH NOTES
+The Linux kernel has an internal function for copying strings,
+which is similar to
+.BR stpecpy (3),
+except that it can't be chained:
+.TP
+.BR strscpy (9)
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a simpler and faster alternative to this function.
+.RE
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+Anyway, the performance difference will probably be negligible,
+so the code will be clearer if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
+.\" ----- BUGS :: -----------------------------------------------------/
+.SH BUGS
+All catenation functions share the same performance problem:
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
+.EX
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.EX
+past_end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, past_end, "Hello ");
+p = stpecpy(p, past_end, "world");
+p = stpecpy(p, past_end, "!");
+if (p == past_end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
+.TP
+.BR strscpy (9)
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.EX
+end = stpncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = end \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
+.TP
+.BR strncpy (3)
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.\" ----- EXAMPLES :: zustr2ustp(3) -----------------------------------/
+.TP
+.BR zustr2ustp (3)
+.EX
+p = buf;
+p = zustr2ustp(p, "Hello ", 6);
+p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
+p = zustr2ustp(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.\" ----- EXAMPLES :: zustr2stp(3) ------------------------------------/
+.TP
+.BR zustr2stp (3)
+.EX
+p = buf;
+p = zustr2stp(p, "Hello ", 6);
+p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = zustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
+.TP
+.BR strncat (3)
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: ustpcpy(3) --------------------------------------/
+.TP
+.BR ustpcpy (3)
+.EX
+p = buf;
+p = ustpcpy(p, "Hello ", 6);
+p = ustpcpy(p, "world", 5);
+p = ustpcpy(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 5);
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: Implementations :: ------------------------------/
+.SS Implementations
+Here are reference implementations for functions not provided by libc.
+.PP
+.in +4n
+.EX
+/* This code is in the public domain. */
+
+.\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
+char *
+.IR stpecpy "(char *dst, char past_end[0], const char *restrict src)"
+{
+    char *p;
+
+    if (dst == past_end)
+        return past_end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, past_end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    past_end[\-1] = \(aq\e0\(aq;
+    return past_end;
+}
+
+.\" ----- EXAMPLES :: Implementations :: stpecpyx(3) ------------------/
+char *
+.IR stpecpyx "(char *dst, char past_end[0], const char *restrict src)"
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, past_end, src);
+}
+
+.\" ----- EXAMPLES :: Implementations :: zustr2ustp(3) ----------------/
+char *
+.IR zustr2ustp "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    return ustpcpy(dst, src, strnlen(src, sz));
+}
+
+.\" ----- EXAMPLES :: Implementations :: zustr2stp(3) -----------------/
+char *
+.IR zustr2stp "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    char  *end;
+
+    end = zustr2ustp(dst, src, sz);
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+
+.\" ----- EXAMPLES :: Implementations :: ustpcpy(3) -------------------/
+char *
+.IR ustpcpy "(char *restrict dst, const char *restrict src, size_t len)"
+{
+    return mempcpy(dst, src, len);
+}
+
+.\" ----- EXAMPLES :: Implementations :: ustr2stp(3) ------------------/
+char *
+.IR ustr2stp "(char *restrict dst, const char *restrict src, size_t len)"
+{
+    char  *end;
+
+    end = ustpcpy(dst, src, len);
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
+.SH SEE ALSO
+.BR bzero (3),
+.BR memcpy (3),
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR stpcpy (3),
+.BR strlcpy (3bsd),
+.BR strncat (3),
+.BR strcpy (3),
+.BR string (3)
-- 
2.38.1
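
A note on the stpecpy(3) example and reference implementation in the
page above: the sketch below is the same logic in plain C (without the
roff escapes), wrapped in a small helper so that the truncation check
can be exercised.  The helper name join3() and its return convention
(-1 on truncation) are assumptions made for illustration; they are not
part of the page.  _XOPEN_SOURCE is defined only so that memccpy(3) is
declared under strict compiler modes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of stpecpy() as described in the page; not a libc function. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    assert(dst <= past_end);

    /* A previous call in the chain already truncated (or size is 0). */
    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;  /* Pointer to the terminating null byte. */

    /* Truncation: terminate the string in the last byte of the buffer. */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: catenate three strings into buf.
 * Returns the length of the result, or -1 if it was truncated.
 */
int
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
    char  *p, *past_end;

    past_end = buf + size;
    p = buf;
    p = stpecpy(p, past_end, a);
    p = stpecpy(p, past_end, b);
    p = stpecpy(p, past_end, c);

    return (p == past_end) ? -1 : (int) (p - buf);
}
```

With an 8-byte buffer, join3(buf, 8, "Hello ", "world", "!") leaves
"Hello w" in buf and reports truncation; a single comparison after the
whole chain is enough, since every call after the truncating one
returns past_end unchanged.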


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-15  0:27           ` Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page Alejandro Colomar
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpecpy.3    | 1 +
 man3/stpecpyx.3   | 1 +
 man3/ustpcpy.3    | 1 +
 man3/ustr2stp.3   | 1 +
 man3/zustr2stp.3  | 1 +
 man3/zustr2ustp.3 | 1 +
 6 files changed, 6 insertions(+)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/ustpcpy.3
 create mode 100644 man3/ustr2stp.3
 create mode 100644 man3/zustr2stp.3
 create mode 100644 man3/zustr2ustp.3

diff --git a/man3/stpecpy.3 b/man3/stpecpy.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/stpecpy.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/stpecpyx.3 b/man3/stpecpyx.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/stpecpyx.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/ustpcpy.3 b/man3/ustpcpy.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/ustpcpy.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/ustr2stp.3 b/man3/ustr2stp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/ustr2stp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/zustr2stp.3 b/man3/zustr2stp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/zustr2stp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/zustr2ustp.3 b/man3/zustr2ustp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/zustr2ustp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
                           ` (2 preceding siblings ...)
  2022-12-15  0:26         ` [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7) Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-16 14:46           ` Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 4/5] stpncpy.3, strncpy.3: " Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 5/5] strncat.3: Rewrite to be consistent with string_copy.7 Alejandro Colomar
  5 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Rewrite to be consistent with the new string_copy.7 page.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpcpy.3 |  13 ---
 man3/strcat.3 | 161 +----------------------------------
 man3/strcpy.3 | 226 +++++++++++++++++++++++++++++++-------------------
 3 files changed, 143 insertions(+), 257 deletions(-)

diff --git a/man3/stpcpy.3 b/man3/stpcpy.3
index 5770790fc..d01c0239b 100644
--- a/man3/stpcpy.3
+++ b/man3/stpcpy.3
@@ -14,19 +14,6 @@ .SH SYNOPSIS
 .PP
 .BI "char *stpcpy(char *restrict " dest ", const char *restrict " src );
 .fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpcpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
 .SH DESCRIPTION
 The
 .BR stpcpy ()
diff --git a/man3/strcat.3 b/man3/strcat.3
index 277e5b1e4..ff7476a84 100644
--- a/man3/strcat.3
+++ b/man3/strcat.3
@@ -1,160 +1 @@
-.\" Copyright 1993 David Metcalfe (david@prism.demon.co.uk)
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:11:47 1993 by Rik Faith (faith@cs.unc.edu)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncat().
-.TH strcat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strcat \- concatenate two strings
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
-.fi
-.SH DESCRIPTION
-The
-.BR strcat ()
-function appends the
-.I src
-string to the
-.I dest
-string,
-overwriting the terminating null byte (\(aq\e0\(aq) at the end of
-.IR dest ,
-and then adds a terminating null byte.
-The strings may not overlap, and the
-.I dest
-string must have
-enough space for the result.
-If
-.I dest
-is not large enough, program behavior is unpredictable;
-.IR "buffer overruns are a favorite avenue for attacking secure programs" .
-.SH RETURN VALUE
-The
-.BR strcat ()
-function returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strcat (),
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-Some systems (the BSDs, Solaris, and others) provide the following function:
-.PP
-.in +4n
-.EX
-size_t strlcat(char *dest, const char *src, size_t size);
-.EE
-.in
-.PP
-This function appends the null-terminated string
-.I src
-to the string
-.IR dest ,
-copying at most
-.I size\-strlen(dest)\-1
-from
-.IR src ,
-and adds a terminating null byte to the result,
-.I unless
-.I size
-is less than
-.IR strlen(dest) .
-This function fixes the buffer overrun problem of
-.BR strcat (),
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The function returns the length of the string
-.BR strlcat ()
-tried to create; if the return value is greater than or equal to
-.IR size ,
-data loss occurred.
-If data loss matters, the caller
-.I must
-either check the arguments before the call, or test the function return value.
-.BR strlcat ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.\"
-.SH EXAMPLES
-Because
-.BR strcat ()
-must find the null byte that terminates the string
-.I dest
-using a search that starts at the beginning of the string,
-the execution time of this function
-scales according to the length of the string
-.IR dest .
-This can be demonstrated by running the program below.
-(If the goal is to concatenate many strings to one target,
-then manually copying the bytes from each source string
-while maintaining a pointer to the end of the target string
-will provide better performance.)
-.\"
-.SS Program source
-\&
-.\" SRC BEGIN (strcat.c)
-.EX
-#include <stdint.h>
-#include <stdio.h>
-#include <string.h>
-#include <time.h>
-
-int
-main(void)
-{
-#define LIM 4000000
-    char p[LIM + 1];    /* +1 for terminating null byte */
-    time_t base;
-
-    base = time(NULL);
-    p[0] = \(aq\e0\(aq;
-
-    for (unsigned int j = 0; j < LIM; j++) {
-        if ((j % 10000) == 0)
-            printf("%u %jd\en", j, (intmax_t) (time(NULL) \- base));
-        strcat(p, "a");
-    }
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR strlcat (3bsd),
-.BR wcscat (3),
-.BR wcsncat (3)
+.so man3/strcpy.3
diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 74c3180ae..424648c46 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,20 +1,10 @@
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-strcpy \- copy a string
+strcpy \- copy or catenate a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -22,26 +12,87 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy ():
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
+.TP
+.BR stpcpy ()
+.TQ
 .BR strcpy ()
-function copies the string pointed to by
+These functions copy the string pointed to by
 .IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+into a string
+at the buffer pointed to by
+.IR dst .
+The programmer is responsible for allocating a buffer large enough,
+that is,
+.IR "strlen(src) + 1" .
+They only differ in the return value.
+.TP
+.BR strcat ()
+This function catenates the string pointed to by
+.I src
+at the end of the string pointed to by
+.IR dst .
+The programmer is responsible for allocating a buffer large enough,
+that is,
+.IR "strlen(dst) + strlen(src) + 1" .
+.PP
+An implementation of these functions might be:
+.PP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    char  *end;
+
+    end = mempcpy(dst, src, strlen(src));
+    *end = \(aq\e0\(aq;
+
+    return end;
+}
+
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
 .SH RETURN VALUE
-The
+.TP
+.BR stpcpy ()
+This function returns
+a pointer to the terminating null byte at the end of the copied string.
+.TP
 .BR strcpy ()
-function returns a pointer to
-the destination string
+.TQ
+.BR strcat ()
+These functions return
-.IR dest .
+.IR dst .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
@@ -54,73 +105,80 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR strcpy (),
+.BR strcat ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
 .SH STANDARDS
+.TP
+.BR stpcpy ()
+POSIX.1-2008.
+.TP
+.BR strcpy ()
+.TQ
+.BR strcat ()
 POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.SH CAVEATS
+The strings
+.I src
+and
+.I dst
+may not overlap.
 .PP
-.in +4n
-.EX
-size_t strlcpy(char *dest, const char *src, size_t size);
-.EE
-.in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
+If the destination buffer is not large enough,
+the behavior is undefined.
+See
+.B _FORTIFY_SOURCE
+in
+.BR feature_test_macros (7).
 .SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.TP
+.BR strcat ()
+This function can be very inefficient.
+Read about
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.SH EXAMPLES
+.EX
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main(void)
+{
+    char    *p;
+    char    buf1[BUFSIZ];
+    char    buf2[BUFSIZ];
+    size_t  len;
+
+    p = buf1;
+    p = stpcpy(p, "Hello ");
+    p = stpcpy(p, "world");
+    p = stpcpy(p, "!");
+    len = p \- buf1;
+
+    printf("[len = %zu]: ", len);
+    puts(buf1);  // "Hello world!"
+
+    strcpy(buf2, "Hello ");
+    strcat(buf2, "world");
+    strcat(buf2, "!");
+    len = strlen(buf2);
+
+    printf("[len = %zu]: ", len);
+    puts(buf2);  // "Hello world!"
+
+    exit(EXIT_SUCCESS);
+}
+.EE
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
 .BR strdup (3),
 .BR string (3),
-.BR wcscpy (3)
+.BR wcscpy (3),
+.BR string_copy (7)
-- 
2.38.1
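
A note on the buffer-size rule stated in the DESCRIPTION above
(strlen(dst) + strlen(src) + 1 bytes for a catenation): the sketch
below applies it to a heap buffer and fills the buffer with chained
stpcpy(3) calls, so no call has to search for the end of the
destination again.  The helper name concat_alloc() is hypothetical and
only serves the illustration.  _POSIX_C_SOURCE is defined so that
stpcpy(3) is declared, as listed in the feature test macro
requirements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated string holding
 * a followed by b, or NULL if allocation fails.
 * The caller is responsible for calling free(3) on the result.
 */
char *
concat_alloc(const char *a, const char *b)
{
    char  *buf, *p;

    assert(a != NULL && b != NULL);

    /* Both lengths plus one byte for the terminating null byte. */
    buf = malloc(strlen(a) + strlen(b) + 1);
    if (buf == NULL)
        return NULL;

    p = buf;
    p = stpcpy(p, a);  /* p points to the null byte written after a. */
    stpcpy(p, b);      /* Overwrites that null byte and appends b. */

    return buf;
}
```

The same result could be had with strcpy(3) followed by strcat(3), but
strcat(3) would have to walk over the copy of a to find its end.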


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 4/5] stpncpy.3, strncpy.3: Document in a single page
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
                           ` (3 preceding siblings ...)
  2022-12-15  0:26         ` [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-15  0:28           ` Alejandro Colomar
  2022-12-15  0:26         ` [PATCH v5 5/5] strncat.3: Rewrite to be consistent with string_copy.7 Alejandro Colomar
  5 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Rewrite to be consistent with the new string_copy.7 page.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpncpy.3 | 163 +++++++++++++++++++++++++++++--------------------
 man3/strncpy.3 | 130 +--------------------------------------
 2 files changed, 99 insertions(+), 194 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 0a62e3055..ab69be8ec 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -1,15 +1,13 @@
-.\" Copyright (c) Bruno Haible <haible@clisp.cons.org>
-.\" Copyright (c) 2022 Alejandro Colomar <alx@kernel.org>
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
-.\" References consulted:
-.\"   GNU glibc-2 source code and manual
-.\"
-.\" Corrected, aeb, 990824
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-stpncpy \- copy string into a fixed-length buffer and zero the rest of it
+stpncpy, strncpy
+\- zero a fixed-width buffer and
+copy a string into it as a null-padded character sequence,
+truncating if necessary
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -17,9 +15,12 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *stpncpy(char " dest "[restrict ." n "], \
-const char " src "[restrict ." n ],
-.BI "              size_t " n );
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
 .fi
 .PP
 .RS -4
@@ -35,67 +36,44 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-.PP
-The
-.BR stpncpy ()
-function copies at most
-.I n
-characters of
+These functions copy the string pointed to by
 .I src
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null character among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
+into a null-padded character sequence at the fixed-width buffer pointed to by
+.IR dst .
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+They only differ in the return value.
 .PP
-A simple implementation of
-.BR strncpy ()
-might be:
+An implementation of these functions might be:
 .PP
 .in +4n
 .EX
 char *
-stpncpy(char *dest, const char *src, size_t n)
+stpncpy(char *restrict dst, const char *restrict src, size_t sz)
 {
-    char  *p
+    bzero(dst, sz);
+    return mempcpy(dst, src, strnlen(src, sz));
+}
 
-    bzero(dest, n);
-    p = memccpy(dest, src, \(aq\e0\(aq, n);
-    if (p == NULL)
-        return dest + n;
-
-    return p - 1;
+char *
+strncpy(char *restrict dst, const char *restrict src, size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
 }
 .EE
 .in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
 .SH RETURN VALUE
+.TP
 .BR stpncpy ()
-returns a pointer to the terminating null byte
-in
-.IR dest ,
-or, if
-.I dest
-is not null-terminated,
-.IR dest + n
-(that is, a pointer to one-past-the-end of the array).
+returns a pointer to
+one after the last character in the destination character sequence.
+.TP
+.BR strncpy ()
+returns
+.IR dst .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -107,16 +85,71 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR stpncpy ()
+.BR stpncpy (),
+.BR strncpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
 .SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was a GNU extension.
-It first appeared in glibc 1.07 in 1993.
+.TP
+.BR stpncpy ()
+POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+.TP
+.BR strncpy ()
+POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
+.SH CAVEATS
+The names of these functions are confusing.
+These functions produce a null-padded character sequence,
+not a string (see
+.BR string_copy (7)).
+.PP
+Truncation should be determined by
+comparing the length of the input string
+with the size of the destination buffer.
+.PP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function that accepts
+a pointer to one past the end of the destination buffer instead of its size.
+.SH EXAMPLES
+.\" SRC BEGIN (stpncpy.c)
+.EX
+#include <err.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main(void)
+{
+    char    *end;
+    char    buf1[20];
+    char    buf2[20];
+    size_t  len;
+
+    if (sizeof(buf1) < strlen("Hello world!"))
+        warnx("stpncpy: truncating character sequence");
+    end = stpncpy(buf1, "Hello world!", sizeof(buf1));
+    len = end \- buf1;
+
+    printf("[len = %zu]: ", len);
+    printf("%.*s\en", (int) len, buf1);  // "Hello world!"
+
+    if (sizeof(buf2) < strlen("Hello world!"))
+        warnx("strncpy: truncating character sequence");
+    strncpy(buf2, "Hello world!", sizeof(buf2));
+    len = strnlen(buf2, sizeof(buf2));
+
+    printf("[len = %zu]: ", len);
+    printf("%.*s\en", (int) len, buf2);  // "Hello world!"
+
+    exit(EXIT_SUCCESS);
+}
+.EE
+.\" SRC END
 .SH SEE ALSO
-.BR strlcpy (3bsd)
-.BR wcpncpy (3)
+.BR wcpncpy (3),
+.BR string_copy (7)
diff --git a/man3/strncpy.3 b/man3/strncpy.3
index e2ffc683f..4710b0201 100644
--- a/man3/strncpy.3
+++ b/man3/strncpy.3
@@ -1,129 +1 @@
-.\" Copyright (C) 1993 David Metcalfe <david@prism.demon.co.uk>
-.\" Copyright (C) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
-.TH strncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncpy \- copy a string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "[[deprecated]] char *strncpy(char " dest "[restrict ." n ],
-.BI "                             const char " src "[restrict ." n "], \
-size_t " n );
-.fi
-.SH DESCRIPTION
-.BI Note: " This is not the function you want to use."
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-For copying a string into a fixed-length buffer with zeroing of the rest,
-see
-.BR stpncpy (3).
-.PP
-.BR strncpy ()
-copies at most
-.I n
-bytes of
-.IR src ,
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null byte
-among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncpy(char *dest, const char *src, size_t n)
-{
-    bzero(dest, n);
-    memccpy(dest, src, \(aq\e0\(aq, n);
-
-    return dest;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-But
-.BR stpncpy (3)
-is better for this purpose,
-since it detects truncation.
-See BUGS below.
-.SH RETURN VALUE
-The
-.BR strncpy ()
-function returns a pointer to
-the destination buffer
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH BUGS
-.BR strncpy ()
-has a misleading name.
-It doesn't produce a (null-terminated) string;
-and it should never be used for producing a string.
-.PP
-It can't detect truncation.
-It's probably better to explicitly call
-.BR bzero (3)
-and
-.BR memccpy (3),
-or
-.BR stpncpy (3)
-since they allow detecting truncation.
-.SH SEE ALSO
-.BR bzero (3),
-.BR memccpy (3),
-.BR stpncpy (3),
-.BR string (3),
-.BR wcsncpy (3)
+.so man3/stpncpy.3
-- 
2.38.1



* [PATCH v5 5/5] strncat.3: Rewrite to be consistent with string_copy.7.
  2022-12-14 16:17       ` [PATCH v4 " Alejandro Colomar
                           ` (4 preceding siblings ...)
  2022-12-15  0:26         ` [PATCH v5 4/5] stpncpy.3, strncpy.3: " Alejandro Colomar
@ 2022-12-15  0:26         ` Alejandro Colomar
  2022-12-15  0:29           ` Alejandro Colomar
  5 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strncat.3 | 147 +++++++++++++++----------------------------------
 1 file changed, 45 insertions(+), 102 deletions(-)

diff --git a/man3/strncat.3 b/man3/strncat.3
index 6e4bf6d78..108a9c450 100644
--- a/man3/strncat.3
+++ b/man3/strncat.3
@@ -4,7 +4,7 @@
 .\"
 .TH strncat 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-strncat \- concatenate an unterminated string into a string
+strncat \- concatenate a null-padded character sequence into a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -12,53 +12,39 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *strncat(char " dest "[restrict strlen(." dest ") + ." n " + 1],"
-.BI "              const char " src "[restrict ." n ],
-.BI "              size_t " n );
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
 .fi
 .SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string concatenation with truncation, see
-.BR strlcat (3bsd).
-For copying or concatenating a string into a fixed-length buffer
-with zeroing of the rest, see
-.BR stpncpy (3).
-.PP
-.BR strncat ()
-appends at most
-.I n
-characters of
-.I src
-to the end of
+This function catenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a string at the buffer pointed to by
 .IR dst .
-It always terminates with a null character the string placed in
-.IR dest .
+The programmer is responsible for allocating a buffer large enough, that is,
+.IR "strlen(dst) + strnlen(src, sz) + 1" .
 .PP
-An implementation of
-.BR strncat ()
-might be:
+An implementation of this function might be:
 .PP
 .in +4n
 .EX
 char *
-strncat(char *dest, const char *src, size_t n)
+strncat(char *restrict dst, const char *restrict src, size_t sz)
 {
-    char    *cat;
-    size_t  len;
+    size_t  len;
+    char    *end;
 
-    cat = dest + strlen(dest);
-    len = strnlen(src, n);
-    memcpy(cat, src, len);
-    cat[len] = \(aq\e0\(aq;
+    len = strnlen(src, sz);
+    end = dst + strlen(dst);
+    end = mempcpy(end, src, len);
+    *end = \(aq\e0\(aq;
 
-    return dest;
+    return dst;
 }
 .EE
 .in
 .SH RETURN VALUE
 .BR strncat ()
-returns a pointer to the resulting string
+returns
 .IR dest .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
@@ -79,93 +65,50 @@ .SH ATTRIBUTES
 .sp 1
 .SH STANDARDS
 POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS ustr2stpe()
-You may want to write your own function similar to
-.BR strncpy (),
-with the following improvements:
-.IP \(bu 3
-Copy, instead of concatenating.
-There's no equivalent of
-.BR strncat ()
-that copies instead of concatenating.
-.IP \(bu
-Allow chaining the function,
-by returning a suitable pointer.
-Copy chaining is faster than concatenating.
-.IP \(bu
-Don't check for null characters in the middle of the unterminated string.
-If the string is terminated, this function should not be used.
-If the string is unterminated, it is unnecessary.
-.IP \(bu
-A name that tells what it does:
-Copy from an
-.IR u nterminated
-.IR str ing
-to a
-.IR st ring,
-and return a
-.IR p ointer
-to its end.
-.PP
-.in +4n
-.EX
-/* This code is in the public domain.
- *
- * char *ustr2stp(char dst[restrict .n+1],
- *                const char src[restrict .n],
- *                size_t len);
- */
-char *
-ustr2stp(char *restrict dst, const char *restrict src, size_t len)
-{
-    memcpy(dst, src, len);
-    dst[len] = \(aq\e0\(aq;
-
-    return dst + len;
-}
-.EE
-.in
 .SH CAVEATS
-This function doesn't know the size of the destination buffer,
-so it can overrun the buffer if the programmer wasn't careful enough.
-.SH BUGS
-.BR strncat (3)
-has a misleading name;
-it has no relationship with
+The name of this function is confusing.
+This function has no relation to
 .BR strncpy (3).
+.PP
+If the destination buffer is not large enough,
+the behavior is undefined.
+See
+.B _FORTIFY_SOURCE
+in
+.BR feature_test_macros (7).
+.SH BUGS
+This function can be very inefficient.
+Read about
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
 .SH EXAMPLES
-The following program creates a string
-from a concatenation of unterminated strings.
 .\" SRC BEGIN (strncpy.c)
 .EX
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 
-#define nitems(arr)  (sizeof((arr)) / sizeof((arr)[0]))
-
 int
 main(void)
 {
-    char pre[4] = "pre.";
-    char *post = ".post";
-    char *src = "some_long_body.post";
-    char dest[100];
+    char    buf[BUFSIZ];
+    size_t  len;
 
-    dest[0] = \(aq\e0\(aq;
-    strncat(dest, pre, nitems(pre));
-    strncat(dest, src, strlen(src) \- strlen(post));
+    buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+    strncat(buf, "Hello XXX", 6);
+    strncat(buf, "world", 42);
+    strncat(buf, "!", 1);
+    len = strlen(buf);
+
+    printf("[len = %zu]: ", len);
+    puts(buf);  // "Hello world!"
 
-    puts(dest);  // "pre.some_long_body"
     exit(EXIT_SUCCESS);
 }
 .EE
 .\" SRC END
 .in
 .SH SEE ALSO
-.BR memccpy (3),
-.BR memcpy (3),
-.BR mempcpy (3),
-.BR strcpy (3),
-.BR string (3)
+.BR string (3),
+.BR string_copy (7)
-- 
2.38.1



* Re: [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-15  0:26         ` [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7) Alejandro Colomar
@ 2022-12-15  0:27           ` Alejandro Colomar
  2022-12-16 18:47             ` Stefan Puiu
  0 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:27 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 4853 bytes --]

Formatted strcpy(3):

strcpy(3)                  Library Functions Manual                  strcpy(3)

NAME
        strcpy - copy or catenate a string

LIBRARY
        Standard C library (libc, -lc)

SYNOPSIS
        #include <string.h>

        char *stpcpy(char *restrict dst, const char *restrict src);
        char *strcpy(char *restrict dst, const char *restrict src);
        char *strcat(char *restrict dst, const char *restrict src);

    Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

        stpcpy():
            Since glibc 2.10:
                _POSIX_C_SOURCE >= 200809L
            Before glibc 2.10:
                _GNU_SOURCE

DESCRIPTION
        stpcpy()
        strcpy()
               These functions copy the string pointed to by src, into a string
               at  the buffer pointed to by dst.  The programmer is responsible
               for allocating a buffer large enough, that is, strlen(src) +  1.
               They only differ in the return value.

        strcat()
               This function catenates the string pointed to by src, at the end
               of  the string pointed to by dst.  The programmer is responsible
               for allocating a buffer large enough,  that  is,  strlen(dst)  +
               strlen(src) + 1.

        An implementation of these functions might be:

            char *
            stpcpy(char *restrict dst, const char *restrict src)
            {
                char  *end;

                end = mempcpy(dst, src, strlen(src));
                *end = '\0';

                return end;
            }

            char *
            strcpy(char *restrict dst, const char *restrict src)
            {
                stpcpy(dst, src);
                return dst;
            }

            char *
            strcat(char *restrict dst, const char *restrict src)
            {
                stpcpy(dst + strlen(dst), src);
                return dst;
            }

RETURN VALUE
        stpcpy()
               This  function returns a pointer to the terminating null byte at
               the end of the copied string.

        strcpy()
        strcat()
               These functions return dst.

ATTRIBUTES
        For an explanation of the terms  used  in  this  section,  see  attrib‐
        utes(7).
        ┌────────────────────────────────────────────┬───────────────┬─────────┐
        │Interface                                   │ Attribute     │ Value   │
        ├────────────────────────────────────────────┼───────────────┼─────────┤
        │stpcpy(), strcpy(), strcat()                │ Thread safety │ MT‐Safe │
        └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
        stpcpy()
               POSIX.1‐2008.

        strcpy()
        strcat()
               POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

CAVEATS
        The strings src and dst may not overlap.

        If  the  destination  buffer is not large enough, the behavior is unde‐
        fined.  See _FORTIFY_SOURCE in feature_test_macros(7).

BUGS
        strcat()
               This function can be  very  inefficient.   Read  about  Shlemiel
               the      painter     ⟨https://www.joelonsoftware.com/2001/12/11/
               back-to-basics/⟩.

EXAMPLES
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int
        main(void)
        {
            char    *p;
            char    buf1[BUFSIZ];
            char    buf2[BUFSIZ];
            size_t  len;

            p = buf1;
            p = stpcpy(p, "Hello ");
            p = stpcpy(p, "world");
            p = stpcpy(p, "!");
            len = p - buf1;

            printf("[len = %zu]: ", len);
            puts(buf1);  // "Hello world!"

            strcpy(buf2, "Hello ");
            strcat(buf2, "world");
            strcat(buf2, "!");
            len = strlen(buf2);

            printf("[len = %zu]: ", len);
            puts(buf2);  // "Hello world!"

            exit(EXIT_SUCCESS);
        }

SEE ALSO
        strdup(3), string(3), wcscpy(3), string_copy(7)

Linux man‐pages (unreleased)        (date)                           strcpy(3)

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 4/5] stpncpy.3, strncpy.3: Document in a single page
  2022-12-15  0:26         ` [PATCH v5 4/5] stpncpy.3, strncpy.3: " Alejandro Colomar
@ 2022-12-15  0:28           ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:28 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 4732 bytes --]

Formatted stpncpy(3):

stpncpy(3)                 Library Functions Manual                 stpncpy(3)

NAME
        stpncpy, strncpy - copy a string into a null‐padded character sequence
        in a fixed‐width buffer, with truncation

LIBRARY
        Standard C library (libc, -lc)

SYNOPSIS
        #include <string.h>

        char *stpncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);
        char *strncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

    Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

        stpncpy():
            Since glibc 2.10:
                _POSIX_C_SOURCE >= 200809L
            Before glibc 2.10:
                _GNU_SOURCE

DESCRIPTION
        These functions copy the string pointed to by src  into  a  null‐padded
        character sequence at the fixed‐width buffer pointed to by dst.  If the
        destination buffer, limited by its size, isn’t large enough to hold the
        copy,  the resulting character sequence is truncated.  They only differ
        in the return value.

        An implementation of these functions might be:

            char *
            stpncpy(char *restrict dst, const char *restrict src, size_t sz)
            {
                bzero(dst, sz);
                return mempcpy(dst, src, strnlen(src, sz));
            }

            char *
            strncpy(char *restrict dst, const char *restrict src, size_t sz)
            {
                stpncpy(dst, src, sz);
                return dst;
            }

RETURN VALUE
        stpncpy()
               returns a pointer to one after the last character in the  desti‐
               nation character sequence.

        strncpy()
               returns dst.

ATTRIBUTES
        For  an  explanation  of  the  terms  used in this section, see attrib‐
        utes(7).
        ┌────────────────────────────────────────────┬───────────────┬─────────┐
        │Interface                                   │ Attribute     │ Value   │
        ├────────────────────────────────────────────┼───────────────┼─────────┤
        │stpncpy(), strncpy()                        │ Thread safety │ MT‐Safe │
        └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
        stpncpy()
               POSIX.1‐2008.

        strncpy()
               POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

CAVEATS
        The name of these functions is confusing.  These  functions  produce  a
        null‐padded character sequence, not a string (see string_copy(7)).

        Truncation  should  be  determined by comparing the length of the input
        string with the size of the destination buffer.

        If you’re going to use this function in chained calls, it would be use‐
        ful to develop a similar function that accepts a pointer  to  one  past
        the end of the destination buffer instead of its size.
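
        As a minimal sketch of the function suggested above, the following
        behaves like stpncpy(3) but is bounded by a pointer to one past the
        end of the destination buffer, so calls can be chained without
        recalculating the remaining size.  The name stpencpy_sketch() is
        hypothetical and not part of any library.  Only ISO C functions are
        used.  Truncation still has to be determined by the caller, as
        described above.

```c
#include <string.h>

/*
 * Hypothetical sketch, not part of any library.
 * Copy the string src into the null-padded character sequence at dst,
 * writing no further than past_end, and zero the rest of the buffer.
 * Returns a pointer to one after the last character copied.
 */
static char *
stpencpy_sketch(char *restrict dst, char *past_end, const char *restrict src)
{
    size_t      sz, len;
    const char  *nul;

    sz = past_end - dst;

    /* Equivalent of strnlen(src, sz); memchr() stops at the first match. */
    nul = memchr(src, '\0', sz);
    len = (nul != NULL) ? (size_t) (nul - src) : sz;

    memset(dst, 0, sz);
    memcpy(dst, src, len);

    return dst + len;
}
```

        Chained calls then take the form p = stpencpy_sketch(p, past_end,
        src), starting from p = buf and past_end = buf + sizeof(buf).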

EXAMPLES
        #include <err.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int
        main(void)
        {
            char    *end;
            char    buf1[20];
            char    buf2[20];
            size_t  len;

            if (sizeof(buf1) < strlen("Hello world!"))
                warnx("stpncpy: truncating character sequence");
            end = stpncpy(buf1, "Hello world!", sizeof(buf1));
            len = end - buf1;

            printf("[len = %zu]: ", len);
            printf("%.*s\n", (int) len, buf1);  // "Hello world!"

            if (sizeof(buf2) < strlen("Hello world!"))
                warnx("strncpy: truncating character sequence");
            strncpy(buf2, "Hello world!", sizeof(buf2));
            len = strnlen(buf2, sizeof(buf2));

            printf("[len = %zu]: ", len);
            printf("%.*s\n", (int) len, buf2);  // "Hello world!"

            exit(EXIT_SUCCESS);
        }

SEE ALSO
        wcpncpy(3), string_copy(7)

Linux man‐pages (unreleased)        (date)                          stpncpy(3)

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 5/5] strncat.3: Rewrite to be consistent with string_copy.7.
  2022-12-15  0:26         ` [PATCH v5 5/5] strncat.3: Rewrite to be consistent with string_copy.7 Alejandro Colomar
@ 2022-12-15  0:29           ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:29 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 3433 bytes --]

Formatted strncat(3):

strncat(3)                 Library Functions Manual                 strncat(3)

NAME
        strncat - concatenate a null‐padded character sequence into a string

LIBRARY
        Standard C library (libc, -lc)

SYNOPSIS
        #include <string.h>

        char *strncat(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

DESCRIPTION
        This  function  catenates  the  input character sequence contained in a
        null‐padded fixed‐width buffer, into a string at the buffer pointed  to
        by  dst.   The  programmer is responsible for allocating a buffer large
        enough, that is, strlen(dst) + strnlen(src, sz) + 1.

        An implementation of this function might be:

            char *
            strncat(char *restrict dst, const char *restrict src, size_t sz)
            {
                size_t  len;
                char    *end;

                len = strnlen(src, sz);
                end = dst + strlen(dst);
                end = mempcpy(end, src, len);
                *end = '\0';

                return dst;
            }

RETURN VALUE
        strncat() returns dst.

ATTRIBUTES
        For an explanation of the terms  used  in  this  section,  see  attrib‐
        utes(7).
        ┌────────────────────────────────────────────┬───────────────┬─────────┐
        │Interface                                   │ Attribute     │ Value   │
        ├────────────────────────────────────────────┼───────────────┼─────────┤
        │strncat()                                   │ Thread safety │ MT‐Safe │
        └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
        POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

CAVEATS
        The  name of this function is confusing.  This function has no relation
        to strncpy(3).

        If the destination buffer is not large enough, the  behavior  is  unde‐
        fined.  See _FORTIFY_SOURCE in feature_test_macros(7).

BUGS
        This function can be very inefficient.  Read about Shlemiel the painter
        ⟨https://www.joelonsoftware.com/2001/12/11/back-to-basics/⟩.

EXAMPLES
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int
        main(void)
        {
            char    buf[BUFSIZ];
            size_t  len;

            buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
            strncat(buf, "Hello XXX", 6);
            strncat(buf, "world", 42);
            strncat(buf, "!", 1);
            len = strlen(buf);

            printf("[len = %zu]: ", len);
            puts(buf);  // "Hello world!"

            exit(EXIT_SUCCESS);
        }

SEE ALSO
        string(3), string_copy(7)

Linux man‐pages (unreleased)        (date)                          strncat(3)

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-15  0:26         ` [PATCH v5 1/5] string_copy.7: Add page to document all string-copying functions Alejandro Colomar
@ 2022-12-15  0:30           ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-15  0:30 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 26068 bytes --]

Formatted string_copy(7):

string_copy(7)         Miscellaneous Information Manual         string_copy(7)

NAME
        stpcpy,  strcpy,  strcat, stpecpy, stpecpyx, strlcpy, strlcat, stpncpy,
        strncpy, zustr2ustp,  zustr2stp,  strncat,  ustpcpy,  ustr2stp  -  copy
        strings and character sequences

SYNOPSIS
    Strings
        // Chain‐copy a string.
        char *stpcpy(char *restrict dst, const char *restrict src);

        // Copy/catenate a string.
        char *strcpy(char *restrict dst, const char *restrict src);
        char *strcat(char *restrict dst, const char *restrict src);

        // Chain‐copy a string with truncation.
        char *stpecpy(char *dst, char past_end[0], const char *restrict src);

        // Chain‐copy a string with truncation and SIGSEGV on UB.
        char *stpecpyx(char *dst, char past_end[0], const char *restrict src);

        // Copy/catenate a string with truncation and SIGSEGV on UB.
        size_t strlcpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);
        size_t strlcat(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

    Null‐padded character sequences
        // Zero a fixed‐width buffer, and
        // copy a string into a character sequence with truncation.
        char *stpncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Zero a fixed‐width buffer, and
        // copy a string into a character sequence with truncation.
        char *strncpy(char dst[restrict .sz], const char *restrict src,
                       size_t sz);

        // Chain‐copy a null‐padded character sequence into a character sequence.
        char *zustr2ustp(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

        // Chain‐copy a null‐padded character sequence into a string.
        char *zustr2stp(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

        // Catenate a null‐padded character sequence into a string.
        char *strncat(char *restrict dst, const char src[restrict .sz],
                       size_t sz);

    Measured character sequences
        // Chain‐copy a measured character sequence.
        char *ustpcpy(char *restrict dst, const char src[restrict .len],
                       size_t len);

        // Chain‐copy a measured character sequence into a string.
        char *ustr2stp(char *restrict dst, const char src[restrict .len],
                       size_t len);

DESCRIPTION
    Terms (and abbreviations)
        string (str)
               is  a sequence of zero or more non‐null characters followed by a
               null byte.

        character sequence
               is a sequence of zero or more non‐null  characters.   A  program
               should  never  use  a  character  sequence where a string is re‐
               quired.  However, with appropriate care, a string can be used in
               the place of a character sequence.

               null‐padded character sequence (zustr)
                      Character  sequences  can  be  contained  in  fixed‐width
                      buffers, which contain padding null bytes after the char‐
                      acter  sequence,  to  fill the rest of the buffer without
                      affecting the character sequence; however, those  padding
                      null bytes are not part of the character sequence.

               measured character sequence (ustr)
                      Character  sequence delimited by its length.  It may be a
                      slice of a  larger  character  sequence,  or  even  of  a
                      string.

        length (len)
               is  the  number  of non‐null characters in a string or character
               sequence.   It  is  the  return  value  of  strlen(str)  and  of
               strnlen(ustr, sz).

        size (sz)
               refers  to  the  entire buffer where the string or character se‐
               quence is contained.

        end    is the name of a pointer to  the  terminating  null  byte  of  a
               string, or a pointer to one past the last character of a charac‐
               ter  sequence.  This is the return value of functions that allow
               chaining.  It is equivalent to &str[len].

        past_end
               is the name of a pointer to one past the end of the buffer  that
               contains  a  string  or character sequence.  It is equivalent to
               &str[sz].  It is used as a sentinel value, to be able  to  trun‐
               cate  strings  or character sequences instead of overrunning the
               containing buffer.

        copy   This term is used when the writing starts at the  first  element
               pointed to by dst.

        catenate
               This  term  is  used when a function first finds the terminating
               null byte in dst, and then starts writing at that position.

        chain  This term is used  when  it’s  the  programmer  who  provides  a
               pointer  to  the  end in dst, and the function starts writing at
               that location.  The function returns a pointer to  the  new  end
               after  the call, so that the programmer can use it to chain such
               calls.

    Copy, catenate, and chain‐copy
        Originally, there was a distinction between  functions  that  copy  and
        those that catenate.  However, newer functions that copy while allowing
        chaining  cover  both use cases with a single API.  They are also algo‐
        rithmically faster, since they don’t need to search for the end of  the
        existing  string.  However, functions that catenate have a much simpler
        use, so if performance is not important, it can make sense to use  them
        for improving readability.

        To  chain  copy  functions,  they  need to return a pointer to the end.
        That’s a byproduct of the copy operation,  so  it  has  no  performance
        costs.   Functions that return such a pointer, and thus can be chained,
        have names of the form *stp*(), since it’s  also  common  to  name  the
        pointer just p.

        Chain‐copying  functions  that  truncate should accept a pointer to one
        past the end of the destination buffer, and  have  names  of  the  form
        *stpe*().  This allows not having to recalculate the remaining size af‐
        ter each call.
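
        A function of the *stpe*() form can be sketched as follows.  This is
        a minimal illustration under stated assumptions, not a reference
        implementation: the name stpecpy_sketch() is hypothetical, and so is
        the convention of returning past_end to report truncation.

```c
/*
 * Hypothetical sketch of a chain-copying function that truncates.
 * Assumption: past_end is returned when the string was truncated,
 * so that later chained calls become no-ops.
 */
static char *
stpecpy_sketch(char *dst, char *past_end, const char *restrict src)
{
    if (dst == past_end)
        return past_end;        /* An earlier call already truncated. */

    for (; dst < past_end; dst++, src++) {
        *dst = *src;
        if (*src == '\0')
            return dst;         /* Pointer to the terminating null byte. */
    }

    past_end[-1] = '\0';        /* Truncate, but keep a valid string. */
    return past_end;
}
```

        After a series of p = stpecpy_sketch(p, past_end, ...) calls, a
        single comparison p == past_end tells whether any of them truncated,
        and the remaining size never needs to be recomputed.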

    Truncate or not?
        The  first  thing  to  note  is that programmers should be careful with
        buffers, so they always have the correct size, and  truncation  is  not
        necessary.

        In  most cases, truncation is not desired, and it is simpler to just do
        the copy.  Simpler code is safer code.  Programming against programming
        mistakes by adding more code just adds more points where  mistakes  can
        be made.

        Nowadays,  compilers  can  detect  most programmer errors with features
        like compiler warnings,  static  analyzers,  and  _FORTIFY_SOURCE  (see
        ftm(7)).   Keeping  the code simple helps these overflow‐detection fea‐
        tures be more precise.

        When validating user input, however, it makes sense to  truncate.   Re‐
        member to check the return value of such function calls.

        Functions that truncate:

        •  stpecpy(3)  is the most efficient string copy function that performs
           truncation.  Truncation needs to be checked only once, after all
           chained calls.

        •  stpecpyx(3) is a variant of  stpecpy(3)  that  consumes  the  entire
           source  string,  to catch bugs in the program by forcing a segmenta‐
           tion fault (as strlcpy(3bsd) and strlcat(3bsd) do).

        •  strlcpy(3bsd) and strlcat(3bsd) are designed to crash if  the  input
           string is invalid (doesn’t contain a terminating null byte).

        •  stpncpy(3)  and  strncpy(3)  also  truncate,  but  they  don’t write
           strings, but rather null‐padded character sequences.

    Null‐padded character sequences
        For historic reasons, some standard APIs, such as utmpx(5),  use  null‐
        padded  character  sequences in fixed‐width buffers.  To interface with
        them, specialized functions need to be used.

        To copy strings into them, use stpncpy(3).

        To copy from an unterminated string within a fixed‐width buffer into  a
        string,  ignoring  any  trailing  null  bytes in the source fixed‐width
        buffer, you should use zustr2stp(3) or strncat(3).

        To copy from an unterminated string within a fixed‐width buffer into  a
        character  sequence,  ignoring  any  trailing  null bytes in the source
        fixed‐width buffer, you should use zustr2ustp(3).
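
        As a hedged illustration of the first two cases, the sketch below
        stores a string into a null‐padded field and reads it back into a
        string with strncat(3).  struct record and the helpers
        record_set_user() and record_get_user() are hypothetical, loosely
        modelled on the ut_user field of utmpx(5).  strncpy(3) is used
        instead of stpncpy(3) only to stay within ISO C; the two differ only
        in the return value.

```c
#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct record {
    char  user[8];
};

/* Store the string name into the field, truncating it if necessary.
 * The result is a null-padded character sequence, not a string. */
static void
record_set_user(struct record *r, const char *name)
{
    strncpy(r->user, name, sizeof(r->user));
}

/* Extract the field into the string dst.
 * dst must have room for sizeof(r->user) + 1 bytes. */
static void
record_get_user(char *dst, const struct record *r)
{
    dst[0] = '\0';  /* strncat(3) catenates, so start from an empty string. */
    strncat(dst, r->user, sizeof(r->user));
}
```

        The destination of record_get_user() needs one byte more than the
        field, since a field that is completely full carries no null byte of
        its own.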

    Measured character sequences
        The simplest character sequence copying function is mempcpy(3).  It re‐
        quires always knowing the length of your character sequences, for which
        structures can be used.  It makes the code much faster, since  you  al‐
        ways  know the length of your character sequences, and can do the mini‐
        mal copies and length measurements.  mempcpy(3)  copies  character  se‐
        quences, so you need to explicitly set the terminating null byte if you
        need a string.

        However,  for keeping type safety, it’s good to add a wrapper that uses
        char * instead of void *: ustpcpy(3).

        In programs that make considerable use  of  strings  or  character  se‐
        quences, and need the best performance, using overlapping character se‐
        quences can make a big difference.  It allows holding subsequences of a
        larger character sequence, while not duplicating memory nor using time
        to do a copy.

        However, this is delicate, since it requires using character sequences.
        C  library  APIs  use strings, so programs that use character sequences
        will have to take care of differentiating strings  from  character  se‐
        quences.

        To copy a measured character sequence, use ustpcpy(3).

        To copy a measured character sequence into a string, use ustr2stp(3).

        Because  these  functions ask for the length, and a string is by nature
        composed of a character sequence of the same length plus a  terminating
        null byte, a string is also accepted as input.
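
        These two functions might be sketched as follows.  The bodies are
        assumptions based on the signatures in the SYNOPSIS, with memcpy(3)
        used in place of mempcpy(3) to stay within ISO C.  join_slices() is
        a hypothetical helper that shows how the calls chain.

```c
#include <string.h>

/* Chain-copy a measured character sequence (sketch). */
static char *
ustpcpy(char *restrict dst, const char *restrict src, size_t len)
{
    memcpy(dst, src, len);
    return dst + len;
}

/* Chain-copy a measured character sequence into a string (sketch). */
static char *
ustr2stp(char *restrict dst, const char *restrict src, size_t len)
{
    char  *end;

    end = ustpcpy(dst, src, len);
    *end = '\0';

    return end;
}

/* Hypothetical helper: join two slices into a string at dst,
 * which must have room for alen + blen + 1 bytes.
 * Returns the length of the resulting string. */
static size_t
join_slices(char *dst, const char *a, size_t alen,
            const char *b, size_t blen)
{
    char  *p;

    p = dst;
    p = ustpcpy(p, a, alen);   /* No null byte is written yet. */
    p = ustr2stp(p, b, blen);  /* The last call terminates the string. */

    return p - dst;
}
```

        The slices passed to join_slices() may point into the middle of a
        larger string, which is what makes subsequences usable without
        copying them first.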

    String vs character sequence
        Some  functions  only operate on strings.  Those require that the input
        src is a string, and guarantee an output string (even  when  truncation
        occurs).   Functions that catenate also require that dst holds a string
        before the call.  List of functions:

        •  stpcpy(3)
        •  strcpy(3), strcat(3)
        •  stpecpy(3), stpecpyx(3)
        •  strlcpy(3bsd), strlcat(3bsd)

        Other functions require an input string, but  create  a  character  se‐
        quence  as  output.   These  functions have confusing names, and have a
        long history of misuse.  List of functions:

        •  stpncpy(3)
        •  strncpy(3)

        Other functions operate on an input character sequence, and  create  an
        output  string.   Functions that catenate also require that dst holds a
        string before the call.  strncat(3) has an even  more  misleading  name
        than the functions above.  List of functions:

        •  zustr2stp(3)
        •  strncat(3)
        •  ustr2stp(3)

        Other  functions  operate  on  an input character sequence to create an
        output character sequence.  List of functions:

        •  ustpcpy(3)
        •  zustr2ustp(3)

    Functions
        stpcpy(3)
               This function copies the input string into a destination string.
               The programmer is responsible  for  allocating  a  buffer  large
               enough.  It returns a pointer suitable for chaining.

        strcpy(3)
        strcat(3)
               These functions copy and catenate the input string into a desti‐
               nation  string.   The programmer is responsible for allocating a
               buffer large enough.  The return value is useless.

               stpcpy(3) is a faster alternative to these functions.

        stpecpy(3)
        stpecpyx(3)
               These functions copy the input string into a destination string.
               If the destination buffer, limited by a pointer to one past  the
               end  of  it,  isn’t large enough to hold the copy, the resulting
               string is truncated (but it  is  guaranteed  to  be  null‐termi‐
               nated).   They  return a pointer suitable for chaining.  Trunca‐
               tion needs to be detected only once after the last chained call.
               stpecpyx(3) has identical semantics to stpecpy(3),  except  that
               it forces a SIGSEGV if the src pointer is not a string.

               These  functions  are  not provided by any library; see EXAMPLES
               for a reference implementation.
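
               A sketch of a chained copy with a single truncation check  might
               look  like  this.   join3() is a hypothetical helper, and stpe‐
               cpy() is the reference implementation from EXAMPLES, except that
               past_end is declared here as a plain pointer.

```c
#define _XOPEN_SOURCE 700  /* memccpy(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* stpecpy() as in EXAMPLES, with past_end declared as a plain pointer. */
static char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: write a, b, and c, one after another, into buf.
 * Returns 0 on success, or -1 if the result was truncated.
 * If size is nonzero, buf always holds a string on return.
 */
static int
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
    char  *p, *past_end;

    assert(buf != NULL);

    past_end = buf + size;
    p = buf;
    p = stpecpy(p, past_end, a);
    p = stpecpy(p, past_end, b);
    p = stpecpy(p, past_end, c);

    /* A single check after the last call covers the whole chain. */
    return (p == past_end) ? -1 : 0;
}
```

               Once a call truncates, it returns past_end, and every later call
               in the chain returns immediately without writing anything.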

        strlcpy(3bsd)
        strlcat(3bsd)
               These functions copy and catenate the input string into a desti‐
               nation string.  If the destination buffer, limited by its  size,
               isn’t  large  enough  to  hold the copy, the resulting string is
               truncated (but it is guaranteed to  be  null‐terminated).   They
               return  the  length  of  the  total string they tried to create.
               These functions force a SIGSEGV if the  src  pointer  is  not  a
               string.

               stpecpyx(3) is a faster alternative to these functions.

        stpncpy(3)
               This  function  copies the input string into a destination null‐
               padded character sequence in a fixed‐width buffer.  If the  des‐
               tination buffer, limited by its size, isn’t large enough to hold
               the  copy, the resulting character sequence is truncated.  Since
               it creates a character sequence, it doesn’t need to write a ter‐
               minating null byte.  It’s impossible to  distinguish  truncation
               after  the  call,  from  a character sequence that just fits the
               destination buffer;  truncation  should  be  detected  from  the
               length of the original string.
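
               The following sketch fills a fixed‐width field and reports trun‐
               cation  separately,  as  described above.  set_field() is a hy‐
               pothetical helper, not a function provided by any library.

```c
#define _POSIX_C_SOURCE 200809L  /* stpncpy(3) */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: store src in the fixed-width, null-padded field
 * of sz bytes at field.  The field holds a character sequence, not a
 * string: when src fills it, there is no terminating null byte.
 * Returns the length of the stored character sequence, and reports in
 * *truncated whether src didn't fit.
 */
static size_t
set_field(char *restrict field, size_t sz, const char *restrict src,
          bool *truncated)
{
    char  *end;

    assert(field != NULL && src != NULL && truncated != NULL);

    end = stpncpy(field, src, sz);

    /* The field alone can't tell "just fits" from "truncated". */
    *truncated = (strlen(src) > sz);

    return end - field;
}
```

               An 8‐byte source and a 9‐byte source leave the same bytes in  an
               8‐byte field; only the comparison against strlen(src) tells them
               apart.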

        strncpy(3)
               This  function is identical to stpncpy(3) except for the useless
               return value.

               stpncpy(3) is a more useful alternative to this function.

        zustr2ustp(3)
               This function copies the input character sequence contained in a
               null‐padded fixed‐width buffer, into a destination character se‐
               quence.  The programmer is responsible for allocating  a  buffer
               large enough.  It returns a pointer suitable for chaining.

               A  truncating  version of this function doesn’t exist, since the
               size of the original character sequence is always known,  so  it
               wouldn’t be very useful.

               This function is not provided by any library; see EXAMPLES for a
               reference implementation.

        zustr2stp(3)
               This function copies the input character sequence contained in a
               null‐padded  fixed‐width buffer, into a destination string.  The
               programmer is responsible for allocating a buffer large  enough.
               It returns a pointer suitable for chaining.

               A  truncating  version of this function doesn’t exist, since the
               size of the original character sequence is always known,  so  it
               wouldn’t be very useful.

               This function is not provided by any library; see EXAMPLES for a
               reference implementation.
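
               As an illustration, the sketch below turns two fixed‐width  fields
               into  one  string.   struct  record and record_to_string() are
               hypothetical names, and zustr2stp() follows the reference imple‐
               mentation, written with strnlen(3) and memcpy(3) instead of memp‐
               cpy(3).

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* zustr2stp() as in EXAMPLES, written without mempcpy(3). */
static char *
zustr2stp(char *restrict dst, const char *restrict src, size_t sz)
{
    size_t  len;

    len = strnlen(src, sz);
    memcpy(dst, src, len);
    dst[len] = '\0';

    return dst + len;
}

/*
 * Hypothetical record with fixed-width, null-padded fields.  A field
 * that is completely full has no terminating null byte.
 */
struct record {
    char  user[4];
    char  host[4];
};

/*
 * Hypothetical helper: write "user@host" into dst as a string.
 * dst needs sizeof(r->user) + sizeof(r->host) + 2 bytes.
 * Returns the length of the resulting string.
 */
static size_t
record_to_string(char *restrict dst, const struct record *restrict r)
{
    char  *p;

    assert(dst != NULL && r != NULL);

    p = dst;
    p = zustr2stp(p, r->user, sizeof(r->user));
    p = zustr2stp(p, "@", 1);
    p = zustr2stp(p, r->host, sizeof(r->host));

    return p - dst;
}
```

               The destination size is known from the field sizes  alone,  which
               is why a truncating variant would add little.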

        strncat(3)
               Do  not  confuse this function with strncpy(3); they are not re‐
               lated at all.

               This function catenates the input character  sequence  contained
               in  a null‐padded fixed‐width buffer, into a destination string.
               The programmer is responsible  for  allocating  a  buffer  large
               enough.  The return value is useless.

               zustr2stp(3) is a faster alternative to this function.
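
               A short sketch of this use of strncat(3) follows.  append_field()
               is a hypothetical helper; the point it tries to  show  is  that
               the  size  argument  bounds the bytes read from the source, not
               the size of the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append the character sequence held in the
 * fixed-width, null-padded field of sz bytes at field to the string
 * in dst.  dst needs room for strlen(dst) + sz + 1 bytes.
 * Returns the length of the resulting string.
 */
static size_t
append_field(char *restrict dst, const char *restrict field, size_t sz)
{
    assert(dst != NULL && field != NULL);

    /* sz limits the bytes read from field; it is not the size of dst. */
    strncat(dst, field, sz);

    return strlen(dst);
}
```

               dst must hold a string before the call, so a  fresh  buffer  has
               to be initialized with a null byte first.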

        ustpcpy(3)
               This  function  copies  the input character sequence, limited by
               its length, into a destination character sequence.  The program‐
               mer is responsible for allocating a buffer large enough.  It re‐
               turns a pointer suitable for chaining.

        ustr2stp(3)
               This function copies the input character  sequence,  limited  by
               its  length,  into  a destination string.  The programmer is re‐
               sponsible for allocating a buffer large enough.   It  returns  a
               pointer suitable for chaining.

RETURN VALUE
        The  following  functions return a pointer to the terminating null byte
        in the destination string.

        •  stpcpy(3)
        •  ustr2stp(3)
        •  zustr2stp(3)

        The following functions return a pointer to the terminating  null  byte
        in the destination string, except when truncation occurs; if truncation
        occurs,  they  return  a pointer to one past the end of the destination
        buffer (past_end).

        •  stpecpy(3), stpecpyx(3)

        The following function returns a pointer to one after the last  charac‐
        ter  in  the destination character sequence; if truncation occurs, that
        pointer is equivalent to a pointer to one past the end of the  destina‐
        tion buffer.

        •  stpncpy(3)

        The  following functions return a pointer to one after the last charac‐
        ter in the destination character sequence.

        •  zustr2ustp(3)
        •  ustpcpy(3)

        The following functions return the length of the total string that they
        tried to create (as if truncation didn’t occur).

        •  strlcpy(3bsd), strlcat(3bsd)

        The following functions return the dst pointer, which is useless.

        •  strcpy(3), strcat(3)
        •  strncpy(3)
        •  strncat(3)

NOTES
        The Linux kernel has an internal function for copying strings, which is
        similar to stpecpy(3), except that it can’t be chained:

        strscpy(9)
               This function copies the input string into a destination string.
               If the destination buffer, limited  by  its  size,  isn’t  large
               enough  to hold the copy, the resulting string is truncated (but
               it is guaranteed to be null‐terminated).  It returns the  length
               of the destination string, or -E2BIG on truncation.

               stpecpy(3) is a simpler and faster alternative to this function.

CAVEATS
        Don’t  mix  chain calls to truncating and non‐truncating functions.  It
        is conceptually wrong unless you know that the first  part  of  a  copy
        will  always  fit.  Anyway, the performance difference will probably be
        negligible, so it will probably be more clear if you use consistent se‐
        mantics: either truncating or non‐truncating.  Calling a non‐truncating
        function after a truncating one is necessarily wrong.

BUGS
        All catenation functions share the same performance  problem:  Shlemiel
        the painter ⟨https://www.joelonsoftware.com/2001/12/11/back-to-basics/⟩.
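
        The problem can be sketched as follows.  Both helpers  are  hypothetical
        and  build  the  same string; the first one counts, with strlen(3), the
        bytes that strcat(3) has to walk over on each call, which is an approx‐
        imation of the extra work and not a measurement of any real implementa‐
        tion.

```c
#define _POSIX_C_SOURCE 200809L  /* stpcpy(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append word to buf n times with strcat(3).
 * Returns the number of bytes that strcat(3) had to walk over to find
 * the end of buf, added up over all calls.
 */
static size_t
repeat_strcat(char *restrict buf, const char *restrict word, size_t n)
{
    size_t  walked = 0;

    assert(buf != NULL && word != NULL);

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        walked += strlen(buf);  /* What strcat(3) scans before copying. */
        strcat(buf, word);
    }

    return walked;
}

/*
 * Same result with stpcpy(3): the end is carried from one call to the
 * next, so nothing is walked again.  Returns the length of the string.
 */
static size_t
repeat_stpcpy(char *restrict buf, const char *restrict word, size_t n)
{
    char  *p;

    assert(buf != NULL && word != NULL);

    p = buf;
    *p = '\0';
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, word);

    return p - buf;
}
```

        For n words of length w, the strcat(3) loop walks w * n * (n - 1) /  2
        bytes in total, so the cost grows quadratically with n.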

EXAMPLES
        The following are examples of correct use of each of these functions.

        stpcpy(3)
               p = buf;
               p = stpcpy(p, "Hello ");
               p = stpcpy(p, "world");
               p = stpcpy(p, "!");
               len = p - buf;
               puts(buf);

        strcpy(3)
        strcat(3)
               strcpy(buf, "Hello ");
               strcat(buf, "world");
               strcat(buf, "!");
               len = strlen(buf);
               puts(buf);

        stpecpy(3)
        stpecpyx(3)
               past_end = buf + sizeof(buf);
               p = buf;
               p = stpecpy(p, past_end, "Hello ");
               p = stpecpy(p, past_end, "world");
               p = stpecpy(p, past_end, "!");
               if (p == past_end) {
                   p--;
                   goto toolong;
               }
               len = p - buf;
               puts(buf);

        strlcpy(3bsd)
        strlcat(3bsd)
               if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
                   goto toolong;
               if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
                   goto toolong;
               len = strlcat(buf, "!", sizeof(buf));
               if (len >= sizeof(buf))
                   goto toolong;
               puts(buf);

        strscpy(9)
               len = strscpy(buf, "Hello world!", sizeof(buf));
               if (len == -E2BIG)
                   goto toolong;
               puts(buf);

        stpncpy(3)
               end = stpncpy(buf, "Hello world!", sizeof(buf));
               if (sizeof(buf) < strlen("Hello world!"))
                   goto toolong;
               len = end - buf;
               for (size_t i = 0; i < sizeof(buf); i++)
                   putchar(buf[i]);

        strncpy(3)
               strncpy(buf, "Hello world!", sizeof(buf));
               if (sizeof(buf) < strlen("Hello world!"))
                   goto toolong;
               len = strnlen(buf, sizeof(buf));
               for (size_t i = 0; i < sizeof(buf); i++)
                   putchar(buf[i]);

        zustr2ustp(3)
               p = buf;
               p = zustr2ustp(p, "Hello ", 6);
               p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
               p = zustr2ustp(p, "!", 1);
               len = p - buf;
               printf("%.*s\n", (int) len, buf);

        zustr2stp(3)
               p = buf;
               p = zustr2stp(p, "Hello ", 6);
               p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
               p = zustr2stp(p, "!", 1);
               len = p - buf;
               puts(buf);

        strncat(3)
               buf[0] = '\0';  // There’s no ’cpy’ function to this ’cat’.
               strncat(buf, "Hello ", 6);
               strncat(buf, "world", 42);  // Padding null bytes ignored.
               strncat(buf, "!", 1);
               len = strlen(buf);
               puts(buf);

        ustpcpy(3)
               p = buf;
               p = ustpcpy(p, "Hello ", 6);
               p = ustpcpy(p, "world", 5);
               p = ustpcpy(p, "!", 1);
               len = p - buf;
               printf("%.*s\n", (int) len, buf);

        ustr2stp(3)
               p = buf;
               p = ustr2stp(p, "Hello ", 6);
               p = ustr2stp(p, "world", 5);
               p = ustr2stp(p, "!", 1);
               len = p - buf;
               puts(buf);

    Implementations
        Here are reference implementations for functions not provided by libc.

            /* This code is in the public domain. */

            char *
            stpecpy(char *dst, char past_end[0], const char *restrict src)
            {
                char *p;

                if (dst == past_end)
                    return past_end;

                p = memccpy(dst, src, '\0', past_end - dst);
                if (p != NULL)
                    return p - 1;

                /* truncation detected */
                past_end[-1] = '\0';
                return past_end;
            }

            char *
            stpecpyx(char *dst, char past_end[0], const char *restrict src)
            {
                if (src[strlen(src)] != '\0')
                    raise(SIGSEGV);

                return stpecpy(dst, past_end, src);
            }

            char *
            zustr2ustp(char *restrict dst, const char *restrict src, size_t sz)
            {
                return ustpcpy(dst, src, strnlen(src, sz));
            }

            char *
            zustr2stp(char *restrict dst, const char *restrict src, size_t sz)
            {
                char  *end;

                end = zustr2ustp(dst, src, sz);
                *end = '\0';

                return end;
            }

            char *
            ustpcpy(char *restrict dst, const char *restrict src, size_t len)
            {
                return mempcpy(dst, src, len);
            }

            char *
            ustr2stp(char *restrict dst, const char *restrict src, size_t len)
            {
                char  *end;

                end = ustpcpy(dst, src, len);
                *end = '\0';

                return end;
            }

SEE ALSO
        bzero(3), memcpy(3), memccpy(3), mempcpy(3), stpcpy(3),  strlcpy(3bsd),
        strncat(3), strpcpy(3), string(3)

Linux man‐pages (unreleased)        (date)                      string_copy(7)


-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page
  2022-12-15  0:26         ` [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page Alejandro Colomar
@ 2022-12-16 14:46           ` Alejandro Colomar
  2022-12-16 14:47             ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-16 14:46 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 13361 bytes --]

Hi!

The formatted version of this page was sent accidentally as reply to 2/3.
Since 2/5 are only link pages, there's no formatted page for them.

Cheers,

Alex

On 12/15/22 01:26, Alejandro Colomar wrote:
> Rewrite to be consistent with the new string_copy.7 page.
> 
> Cc: Martin Sebor <msebor@redhat.com>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
> Cc: Jakub Wilk <jwilk@jwilk.net>
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Iker Pedrosa <ipedrosa@redhat.com>
> Cc: Andrew Pinski <pinskia@gmail.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>   man3/stpcpy.3 |  13 ---
>   man3/strcat.3 | 161 +----------------------------------
>   man3/strcpy.3 | 226 +++++++++++++++++++++++++++++++-------------------
>   3 files changed, 143 insertions(+), 257 deletions(-)
> 
> diff --git a/man3/stpcpy.3 b/man3/stpcpy.3
> index 5770790fc..d01c0239b 100644
> --- a/man3/stpcpy.3
> +++ b/man3/stpcpy.3
> @@ -14,19 +14,6 @@ .SH SYNOPSIS
>   .PP
>   .BI "char *stpcpy(char *restrict " dest ", const char *restrict " src );
>   .fi
> -.PP
> -.RS -4
> -Feature Test Macro Requirements for glibc (see
> -.BR feature_test_macros (7)):
> -.RE
> -.PP
> -.BR stpcpy ():
> -.nf
> -    Since glibc 2.10:
> -        _POSIX_C_SOURCE >= 200809L
> -    Before glibc 2.10:
> -        _GNU_SOURCE
> -.fi
>   .SH DESCRIPTION
>   The
>   .BR stpcpy ()
> diff --git a/man3/strcat.3 b/man3/strcat.3
> index 277e5b1e4..ff7476a84 100644
> --- a/man3/strcat.3
> +++ b/man3/strcat.3
> @@ -1,160 +1 @@
> -.\" Copyright 1993 David Metcalfe (david@prism.demon.co.uk)
> -.\"
> -.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> -.\"
> -.\" References consulted:
> -.\"     Linux libc source code
> -.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
> -.\"     386BSD man pages
> -.\" Modified Sat Jul 24 18:11:47 1993 by Rik Faith (faith@cs.unc.edu)
> -.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
> -.\"     Improve discussion of strncat().
> -.TH strcat 3 (date) "Linux man-pages (unreleased)"
> -.SH NAME
> -strcat \- concatenate two strings
> -.SH LIBRARY
> -Standard C library
> -.RI ( libc ", " \-lc )
> -.SH SYNOPSIS
> -.nf
> -.B #include <string.h>
> -.PP
> -.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
> -.fi
> -.SH DESCRIPTION
> -The
> -.BR strcat ()
> -function appends the
> -.I src
> -string to the
> -.I dest
> -string,
> -overwriting the terminating null byte (\(aq\e0\(aq) at the end of
> -.IR dest ,
> -and then adds a terminating null byte.
> -The strings may not overlap, and the
> -.I dest
> -string must have
> -enough space for the result.
> -If
> -.I dest
> -is not large enough, program behavior is unpredictable;
> -.IR "buffer overruns are a favorite avenue for attacking secure programs" .
> -.SH RETURN VALUE
> -The
> -.BR strcat ()
> -function returns a pointer to the resulting string
> -.IR dest .
> -.SH ATTRIBUTES
> -For an explanation of the terms used in this section, see
> -.BR attributes (7).
> -.ad l
> -.nh
> -.TS
> -allbox;
> -lbx lb lb
> -l l l.
> -Interface	Attribute	Value
> -T{
> -.BR strcat (),
> -.BR strncat ()
> -T}	Thread safety	MT-Safe
> -.TE
> -.hy
> -.ad
> -.sp 1
> -.SH STANDARDS
> -POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
> -.SH NOTES
> -Some systems (the BSDs, Solaris, and others) provide the following function:
> -.PP
> -.in +4n
> -.EX
> -size_t strlcat(char *dest, const char *src, size_t size);
> -.EE
> -.in
> -.PP
> -This function appends the null-terminated string
> -.I src
> -to the string
> -.IR dest ,
> -copying at most
> -.I size\-strlen(dest)\-1
> -from
> -.IR src ,
> -and adds a terminating null byte to the result,
> -.I unless
> -.I size
> -is less than
> -.IR strlen(dest) .
> -This function fixes the buffer overrun problem of
> -.BR strcat (),
> -but the caller must still handle the possibility of data loss if
> -.I size
> -is too small.
> -The function returns the length of the string
> -.BR strlcat ()
> -tried to create; if the return value is greater than or equal to
> -.IR size ,
> -data loss occurred.
> -If data loss matters, the caller
> -.I must
> -either check the arguments before the call, or test the function return value.
> -.BR strlcat ()
> -is not present in glibc and is not standardized by POSIX,
> -.\" https://lwn.net/Articles/506530/
> -but is available on Linux via the
> -.I libbsd
> -library.
> -.\"
> -.SH EXAMPLES
> -Because
> -.BR strcat ()
> -must find the null byte that terminates the string
> -.I dest
> -using a search that starts at the beginning of the string,
> -the execution time of this function
> -scales according to the length of the string
> -.IR dest .
> -This can be demonstrated by running the program below.
> -(If the goal is to concatenate many strings to one target,
> -then manually copying the bytes from each source string
> -while maintaining a pointer to the end of the target string
> -will provide better performance.)
> -.\"
> -.SS Program source
> -\&
> -.\" SRC BEGIN (strcat.c)
> -.EX
> -#include <stdint.h>
> -#include <stdio.h>
> -#include <string.h>
> -#include <time.h>
> -
> -int
> -main(void)
> -{
> -#define LIM 4000000
> -    char p[LIM + 1];    /* +1 for terminating null byte */
> -    time_t base;
> -
> -    base = time(NULL);
> -    p[0] = \(aq\e0\(aq;
> -
> -    for (unsigned int j = 0; j < LIM; j++) {
> -        if ((j % 10000) == 0)
> -            printf("%u %jd\en", j, (intmax_t) (time(NULL) \- base));
> -        strcat(p, "a");
> -    }
> -}
> -.EE
> -.\" SRC END
> -.SH SEE ALSO
> -.BR bcopy (3),
> -.BR memccpy (3),
> -.BR memcpy (3),
> -.BR strcpy (3),
> -.BR string (3),
> -.BR strlcat (3bsd),
> -.BR wcscat (3),
> -.BR wcsncat (3)
> +.so man3/strcpy.3
> diff --git a/man3/strcpy.3 b/man3/strcpy.3
> index 74c3180ae..424648c46 100644
> --- a/man3/strcpy.3
> +++ b/man3/strcpy.3
> @@ -1,20 +1,10 @@
> -.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
>   .\"
>   .\" SPDX-License-Identifier: Linux-man-pages-copyleft
>   .\"
> -.\" References consulted:
> -.\"     Linux libc source code
> -.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
> -.\"     386BSD man pages
> -.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
> -.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
> -.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
> -.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
> -.\"     Improve discussion of strncpy().
> -.\"
>   .TH strcpy 3 (date) "Linux man-pages (unreleased)"
>   .SH NAME
> -strcpy \- copy a string
> +strcpy \- copy or catenate a string
>   .SH LIBRARY
>   Standard C library
>   .RI ( libc ", " \-lc )
> @@ -22,26 +12,87 @@ .SH SYNOPSIS
>   .nf
>   .B #include <string.h>
>   .PP
> -.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
> +.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
> +.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
> +.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
> +.fi
> +.PP
> +.RS -4
> +Feature Test Macro Requirements for glibc (see
> +.BR feature_test_macros (7)):
> +.RE
> +.PP
> +.BR stpcpy ():
> +.nf
> +    Since glibc 2.10:
> +        _POSIX_C_SOURCE >= 200809L
> +    Before glibc 2.10:
> +        _GNU_SOURCE
>   .fi
>   .SH DESCRIPTION
> -The
> +.TP
> +.BR stpcpy ()
> +.TQ
>   .BR strcpy ()
> -function copies the string pointed to by
> +These functions copy the string pointed to by
>   .IR src ,
> -including the terminating null byte (\(aq\e0\(aq),
> -to the buffer pointed to by
> -.IR dest .
> -The strings may not overlap, and the destination string
> -.I dest
> -must be large enough to receive the copy.
> -.I Beware of buffer overruns!
> -(See BUGS.)
> +into a string
> +at the buffer pointed to by
> +.IR dst .
> +The programmer is responsible for allocating a buffer large enough,
> +that is,
> +.IR "strlen(src) + 1" .
> +They only differ in the return value.
> +.TP
> +.BR strcat ()
> +This function catenates the string pointed to by
> +.IR src ,
> +at the end of the string pointed to by
> +.IR dst .
> +The programmer is responsible for allocating a buffer large enough,
> +that is,
> +.IR "strlen(dst) + strlen(src) + 1" .
> +.PP
> +An implementation of these functions might be:
> +.PP
> +.in +4n
> +.EX
> +char *
> +stpcpy(char *restrict dst, const char *restrict src)
> +{
> +    char  *end;
> +
> +    end = mempcpy(dst, src, strlen(src));
> +    *end = \(aq\e0\(aq;
> +
> +    return end;
> +}
> +
> +char *
> +strcpy(char *restrict dst, const char *restrict src)
> +{
> +    stpcpy(dst, src);
> +    return dst;
> +}
> +
> +char *
> +strcat(char *restrict dst, const char *restrict src)
> +{
> +    stpcpy(dst + strlen(dst), src);
> +    return dst;
> +}
> +.EE
> +.in
>   .SH RETURN VALUE
> -The
> +.TP
> +.BR stpcpy ()
> +This function returns
> +a pointer to the terminating null byte at the end of the copied string.
> +.TP
>   .BR strcpy ()
> -function returns a pointer to
> -the destination string
> +.TQ
> +.BR strcat ()
> +These functions return
>   .IR dest .
>   .SH ATTRIBUTES
>   For an explanation of the terms used in this section, see
> @@ -54,73 +105,80 @@ .SH ATTRIBUTES
>   l l l.
>   Interface	Attribute	Value
>   T{
> -.BR strcpy ()
> +.BR stpcpy (),
> +.BR strcpy (),
> +.BR strcat ()
>   T}	Thread safety	MT-Safe
>   .TE
>   .hy
>   .ad
>   .sp 1
>   .SH STANDARDS
> +.TP
> +.BR stpcpy ()
> +POSIX.1-2008.
> +.TP
> +.BR strcpy ()
> +.TQ
> +.BR strcat ()
>   POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
> -.SH NOTES
> -.SS strlcpy()
> -Some systems (the BSDs, Solaris, and others) provide the following function:
> +.SH CAVEATS
> +The strings
> +.I src
> +and
> +.I dst
> +may not overlap.
>   .PP
> -.in +4n
> -.EX
> -size_t strlcpy(char *dest, const char *src, size_t size);
> -.EE
> -.in
> -.PP
> -.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
> -.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
> -.\"     1999 USENIX Annual Technical Conference
> -This function is similar to
> -.BR strcpy (),
> -but it copies at most
> -.I size\-1
> -bytes to
> -.IR dest ,
> -truncating the string as necessary.
> -It always adds a terminating null byte.
> -This function fixes some of the problems of
> -.BR strcpy ()
> -but the caller must still handle the possibility of data loss if
> -.I size
> -is too small.
> -The return value of the function is the length of
> -.IR src ,
> -which allows truncation to be easily detected:
> -if the return value is greater than or equal to
> -.IR size ,
> -truncation occurred.
> -If loss of data matters, the caller
> -.I must
> -either check the arguments before the call,
> -or test the function return value.
> -.BR strlcpy ()
> -is not present in glibc and is not standardized by POSIX,
> -.\" https://lwn.net/Articles/506530/
> -but is available on Linux via the
> -.I libbsd
> -library.
> +If the destination buffer is not large enough,
> +the behavior is undefined.
> +See
> +.B _FORTIFY_SOURCE
> +in
> +.BR feature_test_macros (7).
>   .SH BUGS
> -If the destination string of a
> -.BR strcpy ()
> -is not large enough, then anything might happen.
> -Overflowing fixed-length string buffers is a favorite cracker technique
> -for taking complete control of the machine.
> -Any time a program reads or copies data into a buffer,
> -the program first needs to check that there's enough space.
> -This may be unnecessary if you can show that overflow is impossible,
> -but be careful: programs can get changed over time,
> -in ways that may make the impossible possible.
> +.TP
> +.BR strcat ()
> +This function can be very inefficient.
> +Read about
> +.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
> +Shlemiel the painter
> +.UE .
> +.SH EXAMPLES
> +.EX
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +int
> +main(void)
> +{
> +    char    *p;
> +    char    buf1[BUFSIZ];
> +    char    buf2[BUFSIZ];
> +    size_t  len;
> +
> +    p = buf1;
> +    p = stpcpy(p, "Hello ");
> +    p = stpcpy(p, "world");
> +    p = stpcpy(p, "!");
> +    len = p \- buf1;
> +
> +    printf("[len = %zu]: ", len);
> +    puts(buf1);  // "Hello world!"
> +
> +    strcpy(buf2, "Hello ");
> +    strcat(buf2, "world");
> +    strcat(buf2, "!");
> +    len = strlen(buf2);
> +
> +    printf("[len = %zu]: ", len);
> +    puts(buf2);  // "Hello world!"
> +
> +    exit(EXIT_SUCCESS);
> +}
> +.EE
>   .SH SEE ALSO
> -.BR bcopy (3),
> -.BR memccpy (3),
> -.BR memcpy (3),
> -.BR memmove (3),
> -.BR stpcpy (3),
>   .BR strdup (3),
>   .BR string (3),
> -.BR wcscpy (3)
> +.BR wcscpy (3),
> +.BR string_copy (7)

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page
  2022-12-16 14:46           ` Alejandro Colomar
@ 2022-12-16 14:47             ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-16 14:47 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 273 bytes --]



On 12/16/22 15:46, Alejandro Colomar wrote:
> Hi!
> 
> The formatted version of this page was sent accidentally as reply to 2/3.

D'oh!  I meant 2/5.

> Since 2/5 are only link pages, there's no formatted page for them.
-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-15  0:27           ` Alejandro Colomar
@ 2022-12-16 18:47             ` Stefan Puiu
  2022-12-16 19:03               ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Stefan Puiu @ 2022-12-16 18:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Hi Alex!

On Thu, Dec 15, 2022 at 2:46 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Formatted strcpy(3):
>
> strcpy(3)                  Library Functions Manual                  strcpy(3)
>
> NAME
>         strcpy - copy or catenate a string
>
> LIBRARY
>         Standard C library (libc, -lc)
>
> SYNOPSIS
>         #include <string.h>
>
>         char *stpcpy(char *restrict dst, const char *restrict src);
>         char *strcpy(char *restrict dst, const char *restrict src);
>         char *strcat(char *restrict dst, const char *restrict src);
>
>     Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
>
>         stpcpy():
>             Since glibc 2.10:
>                 _POSIX_C_SOURCE >= 200809L
>             Before glibc 2.10:
>                 _GNU_SOURCE
>
> DESCRIPTION
>         stpcpy()
>         strcpy()
>                These functions copy the string pointed to by src, into a string
>                at  the buffer pointed to by dst.  The programmer is responsible
>                for allocating a buffer large enough, that is, strlen(src) +  1.
>                They only differ in the return value.

A destination buffer large enough? It's not that obvious to me from
the text, but maybe I'm tired :).
I was also a bit at a loss about the difference between the two; maybe
you can say "For the difference between the two, see RETURN VALUE"?

>
>         strcat()
>                This function catenates the string pointed to by src, at the end
>                of  the string pointed to by dst.  The programmer is responsible
>                for allocating a buffer large enough,  that  is,  strlen(dst)  +
>                strlen(src) + 1.

Ditto here.

>
>         An implementation of these functions might be:
>
>             char *
>             stpcpy(char *restrict dst, const char *restrict src)
>             {
>                 char  *end;
>
>                 end = mempcpy(dst, src, strlen(src));
>                 *end = '\0';
>
>                 return end;
>             }
>
>             char *
>             strcpy(char *restrict dst, const char *restrict src)
>             {
>                 stpcpy(dst, src);
>                 return dst;
>             }
>
>             char *
>             strcat(char *restrict dst, const char *restrict src)
>             {
>                 stpcpy(dst + strlen(dst), src);
>                 return dst;
>             }

Are you sure this section adds any value? I think good documentation
should explain how a function works without delving into the
implementation. Also, people might get confused and think this is the
actual implementation.

>
> RETURN VALUE
>         stpcpy()
>                This  function returns a pointer to the terminating null byte at
>                the end of the copied string.
>
>         strcpy()
>         strcat()
>                These functions return dst.
>
> ATTRIBUTES
>         For an explanation of the terms  used  in  this  section,  see  attrib‐
>         utes(7).
>         ┌────────────────────────────────────────────┬───────────────┬─────────┐
>         │Interface                                   │ Attribute     │ Value   │
>         ├────────────────────────────────────────────┼───────────────┼─────────┤
>         │stpcpy(), strcpy(), strcat()                │ Thread safety │ MT‐Safe │
>         └────────────────────────────────────────────┴───────────────┴─────────┘
>
> STANDARDS
>         stpcpy()
>                POSIX.1‐2008.
>
>         strcpy()
>         strcat()
>                POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
>
> CAVEATS
>         The strings src and dst may not overlap.
>
>         If  the  destination  buffer is not large enough, the behavior is unde‐
>         fined.  See _FORTIFY_SOURCE in feature_test_macros(7).
>
> BUGS
>         strcat()
>                This function can be  very  inefficient.   Read  about  Shlemiel
>                the      painter     ⟨https://www.joelonsoftware.com/2001/12/11/
>                back-to-basics/⟩.

I'm not sure this is a bug, rather a design limitation. Maybe it
belongs in NOTES or CAVEATS? Also, I think this can be summarized
along the lines of 'strcat needs to walk the destination buffer to
find the null terminator, so it has linear complexity with respect to
the size of the destination buffer up to the terminator' (hmm, I'm
sure this can be expressed more concisely), so the page is more self
contained. Outside links sometimes go dead, like on Wikipedia, so I
think just in case, it helps to make explicit the point that you want
the reader to study further in the URL.
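
To make the point concrete, here is a minimal sketch (not taken from the
page) of the two ways to join several strings.  The helper names
join_strcat() and join_stpcpy() are made up for illustration, and both
assume the caller has already allocated a destination buffer large
enough for the whole result:

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join n strings with strcat(3).
 * Every call walks dst from its start to find the terminating null
 * byte, so the total work grows quadratically with the result length. */
char *
join_strcat(char *dst, const char *const pieces[], size_t n)
{
    assert(dst != NULL);

    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, pieces[i]);

    return dst;
}

/* Hypothetical helper: same result with stpcpy(3).
 * The returned pointer already points at the terminating null byte,
 * so nothing is rescanned.  Returns a pointer to that null byte. */
char *
join_stpcpy(char *dst, const char *const pieces[], size_t n)
{
    char *p;

    assert(dst != NULL);

    p = dst;
    *p = '\0';
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, pieces[i]);

    return p;
}
```

Both produce the same string; the difference is only in how many bytes
get re-read along the way, which is probably the one sentence the page
needs to carry if the URL ever goes away.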

Regards,
Stefan.

>
> EXAMPLES
>         #include <stdio.h>
>         #include <stdlib.h>
>         #include <string.h>
>
>         int
>         main(void)
>         {
>             char    *p;
>             char    buf1[BUFSIZ];
>             char    buf2[BUFSIZ];
>             size_t  len;
>
>             p = buf1;
>             p = stpcpy(p, "Hello ");
>             p = stpcpy(p, "world");
>             p = stpcpy(p, "!");
>             len = p - buf1;
>
>             printf("[len = %zu]: ", len);
>             puts(buf1);  // "Hello world!"
>
>             strcpy(buf2, "Hello ");
>             strcat(buf2, "world");
>             strcat(buf2, "!");
>             len = strlen(buf2);
>
>             printf("[len = %zu]: ", len);
>             puts(buf2);  // "Hello world!"
>
>             exit(EXIT_SUCCESS);
>         }
>
> SEE ALSO
>         strdup(3), string(3), wcscpy(3), string_copy(7)
>
> Linux man‐pages (unreleased)        (date)                           strcpy(3)
>
> --
> <http://www.alejandro-colomar.es/>


* Re: [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-16 18:47             ` Stefan Puiu
@ 2022-12-16 19:03               ` Alejandro Colomar
  2022-12-16 19:09                 ` Alejandro Colomar
  0 siblings, 1 reply; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-16 19:03 UTC (permalink / raw)
  To: Stefan Puiu
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski



Hi Stefan,

On 12/16/22 19:47, Stefan Puiu wrote:
> Hi Alex!
> 
> On Thu, Dec 15, 2022 at 2:46 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Formatted strcpy(3):
>>
>> strcpy(3)                  Library Functions Manual                  strcpy(3)
>>
>> NAME
>>          strcpy - copy or catenate a string
>>
>> LIBRARY
>>          Standard C library (libc, -lc)
>>
>> SYNOPSIS
>>          #include <string.h>
>>
>>          char *stpcpy(char *restrict dst, const char *restrict src);
>>          char *strcpy(char *restrict dst, const char *restrict src);
>>          char *strcat(char *restrict dst, const char *restrict src);
>>
>>      Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
>>
>>          stpcpy():
>>              Since glibc 2.10:
>>                  _POSIX_C_SOURCE >= 200809L
>>              Before glibc 2.10:
>>                  _GNU_SOURCE
>>
>> DESCRIPTION
>>          stpcpy()
>>          strcpy()
>>                 These functions copy the string pointed to by src, into a string
>>                 at  the buffer pointed to by dst.  The programmer is responsible
>>                 for allocating a buffer large enough, that is, strlen(src) +  1.
>>                 They only differ in the return value.
> 
> A destination buffer large enough? It's not that obvious to me from
> the text, but maybe I'm tired :).

Sure.  Thanks!

> I was also a bit at a loss about the difference between the two; maybe
> you can say "For the difference between the two, see RETURN VALUE"?

That can make sense, yes.

> 
>>
>>          strcat()
>>                 This function catenates the string pointed to by src, at the end
>>                 of  the string pointed to by dst.  The programmer is responsible
>>                 for allocating a buffer large enough,  that  is,  strlen(dst)  +
>>                 strlen(src) + 1.
> 
> Ditto here.

:)

> 
>>
>>          An implementation of these functions might be:
>>
>>              char *
>>              stpcpy(char *restrict dst, const char *restrict src)
>>              {
>>                  char  *end;
>>
>>                  end = mempcpy(dst, src, strlen(src));
>>                  *end = '\0';
>>
>>                  return end;
>>              }
>>
>>              char *
>>              strcpy(char *restrict dst, const char *restrict src)
>>              {
>>                  stpcpy(dst, src);
>>                  return dst;
>>              }
>>
>>              char *
>>              strcat(char *restrict dst, const char *restrict src)
>>              {
>>                  stpcpy(dst + strlen(dst), src);
>>                  return dst;
>>              }
> 
> Are you sure this section adds any value? I think good documentation
> should explain how a function works without delving into the
> interpretation.

To be honest, this page doesn't benefit too much from it.  strcpy(3)/strcat(3) 
are dead simple, and the explanations above should be enough.

However, the same thing in strncpy(3) and strncat(3) is very helpful, IMO.  For 
consistency I just showed trivial implementations in all of the pages.  (And in 
fact, there was an example implementation in the old strncat(3) and maybe a few 
others, IIRC.)

> Also, people might get confused and think this is the
> actual implementation.

I don't think there's any problem if one believes this is the implementation. 
Except for stpcpy(3), in which I preferred readability, they are actually quite 
good implementations.  A faster implementation of stpcpy(3) might be done in 
terms of memccpy(3).

Funnily enough, I just checked what musl libc does, and it's the same as shown here:


alx@debian:~/src/musl/musl$ grepc -tfd strcpy
./src/string/strcpy.c:3:
char *strcpy(char *restrict dest, const char *restrict src)
{
	__stpcpy(dest, src);
	return dest;
}
alx@debian:~/src/musl/musl$ grepc -tfd strcat
./src/string/strcat.c:3:
char *strcat(char *restrict dest, const char *restrict src)
{
	strcpy(dest + strlen(dest), src);
	return dest;
}


> 
>>
>> RETURN VALUE
>>          stpcpy()
>>                 This  function returns a pointer to the terminating null byte at
>>                 the end of the copied string.
>>
>>          strcpy()
>>          strcat()
>>                 These functions return dst.
>>
>> ATTRIBUTES
>>          For an explanation of the terms  used  in  this  section,  see  attrib‐
>>          utes(7).
>>          ┌────────────────────────────────────────────┬───────────────┬─────────┐
>>          │Interface                                   │ Attribute     │ Value   │
>>          ├────────────────────────────────────────────┼───────────────┼─────────┤
>>          │stpcpy(), strcpy(), strcat()                │ Thread safety │ MT‐Safe │
>>          └────────────────────────────────────────────┴───────────────┴─────────┘
>>
>> STANDARDS
>>          stpcpy()
>>                 POSIX.1‐2008.
>>
>>          strcpy()
>>          strcat()
>>                 POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
>>
>> CAVEATS
>>          The strings src and dst may not overlap.
>>
>>          If  the  destination  buffer is not large enough, the behavior is unde‐
>>          fined.  See _FORTIFY_SOURCE in feature_test_macros(7).
>>
>> BUGS
>>          strcat()
>>                 This function can be  very  inefficient.   Read  about  Shlemiel
>>                 the      painter     ⟨https://www.joelonsoftware.com/2001/12/11/
>>                 back-to-basics/⟩.
> 
> I'm not sure this is a bug, rather a design limitation. Maybe it
> belongs in NOTES or CAVEATS?

Yeah, I had been thinking of downgrading it.  I'll do it.

> Also, I think this can be summarized
> along the lines of 'strcat needs to walk the destination buffer to
> find the null terminator, so it has linear complexity with respect to
> the size of the destination buffer up to the terminator' (hmm, I'm
> sure this can be expressed more concisely), so the page is more self
> contained. Outside links sometimes go dead, like on Wikipedia, so I
> think just in case, it helps to make explicit the point that you want
> the reader to study further in the URL.

I wasn't inspired to write it short enough to not be too verbose.  Maybe I'll 
write something based on your suggestion.

> 
> Regards,
> Stefan.

Thanks for the review!

Cheers,

Alex

> 
>>
>> EXAMPLES
>>          #include <stdio.h>
>>          #include <stdlib.h>
>>          #include <string.h>
>>
>>          int
>>          main(void)
>>          {
>>              char    *p;
>>              char    buf1[BUFSIZ];
>>              char    buf2[BUFSIZ];
>>              size_t  len;
>>
>>              p = buf1;
>>              p = stpcpy(p, "Hello ");
>>              p = stpcpy(p, "world");
>>              p = stpcpy(p, "!");
>>              len = p - buf1;
>>
>>              printf("[len = %zu]: ", len);
>>              puts(buf1);  // "Hello world!"
>>
>>              strcpy(buf2, "Hello ");
>>              strcat(buf2, "world");
>>              strcat(buf2, "!");
>>              len = strlen(buf2);
>>
>>              printf("[len = %zu]: ", len);
>>              puts(buf2);  // "Hello world!"
>>
>>              exit(EXIT_SUCCESS);
>>          }
>>
>> SEE ALSO
>>          strdup(3), string(3), wcscpy(3), string_copy(7)
>>
>> Linux man‐pages (unreleased)        (date)                           strcpy(3)
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>



* Re: [PATCH v5 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-16 19:03               ` Alejandro Colomar
@ 2022-12-16 19:09                 ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-16 19:09 UTC (permalink / raw)
  To: Stefan Puiu
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski





On 12/16/22 20:03, Alejandro Colomar wrote:
> On 12/16/22 19:47, Stefan Puiu wrote:
>> On Thu, Dec 15, 2022 at 2:46 AM Alejandro Colomar
>> <alx.manpages@gmail.com> wrote:
>>>          An implementation of these functions might be:
>>>
>>>              char *
>>>              stpcpy(char *restrict dst, const char *restrict src)
>>>              {
>>>                  char  *end;
>>>
>>>                  end = mempcpy(dst, src, strlen(src));
>>>                  *end = '\0';
>>>
>>>                  return end;
>>>              }
>>>
>>>              char *
>>>              strcpy(char *restrict dst, const char *restrict src)
>>>              {
>>>                  stpcpy(dst, src);
>>>                  return dst;
>>>              }
>>>
>>>              char *
>>>              strcat(char *restrict dst, const char *restrict src)
>>>              {
>>>                  stpcpy(dst + strlen(dst), src);
>>>                  return dst;
>>>              }
>>
>> Are you sure this section adds any value? I think good documentation
>> should explain how a function works without delving into the
>> interpretation.
> 
> To be honest, this page doesn't benefit too much from it.  strcpy(3)/strcat(3) 
> are dead simple, and the explanations above should be enough.
> 
> However, the same thing in strncpy(3) and strncat(3) is very helpful, IMO.  For 
> consistency I just showed trivial implementations in all of the pages.  (And in 
> fact, there was an example implementation in the old strncat(3) and maybe a few 
> others, IIRC.)
> 
>> Also, people might get confused and think this is the
>> actual implementation.
> 
> I don't think there's any problem if one believes this is the implementation. 
> Except for stpcpy(3), in which I preferred readability, they are actually quite 
> good implementations.  A faster implementation of stpcpy(3) might be done in 
> terms of memccpy(3).
> 
> Funnily enough, I just checked what musl libc does, and it's the same as shown 
> here:
> 
> 
> alx@debian:~/src/musl/musl$ grepc -tfd strcpy
> ./src/string/strcpy.c:3:
> char *strcpy(char *restrict dest, const char *restrict src)
> {
>      __stpcpy(dest, src);
>      return dest;
> }
> alx@debian:~/src/musl/musl$ grepc -tfd strcat
> ./src/string/strcat.c:3:
> char *strcat(char *restrict dest, const char *restrict src)
> {
>      strcpy(dest + strlen(dest), src);
>      return dest;
> }
> 
> 

And considering memccpy(3) is defined in terms of memchr(3) and mempcpy(3) in 
glibc, I don't feel so bad about my own stpcpy(3) :).  See:


alx@debian:~/src/gnu/glibc$ grepc -tfd __memccpy
./string/memccpy.c:30:
void *
__memccpy (void *dest, const void *src, int c, size_t n)
{
   void *p = memchr (src, c, n);

   if (p != NULL)
     return __mempcpy (dest, src, p - src + 1);

   memcpy (dest, src, n);
   return NULL;
}
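
For reference, this is roughly how memccpy(3) could be used for a
string copy when the size of the destination is known.  It's only a
sketch: copy_via_memccpy() is a name I made up, it is not a libc
function, and it assumes sz is the real size of dst.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any libc.
 * Copy the string src into the buffer dst of size sz.
 * Returns a pointer to the terminating null byte written in dst,
 * or NULL if src (including its null byte) does not fit.  In the
 * NULL case, dst holds sz bytes of src and is NOT null-terminated. */
char *
copy_via_memccpy(char *restrict dst, const char *restrict src, size_t sz)
{
    char *p;

    assert(sz > 0);

    /* memccpy(3) returns a pointer to one past the copied '\0',
     * or NULL if no '\0' was found within the first sz bytes. */
    p = memccpy(dst, src, '\0', sz);
    if (p == NULL)
        return NULL;

    return p - 1;
}
```

The search for the null byte and the copy happen in a single call, which
is where any speedup over strlen(3) plus mempcpy(3) would have to come
from; whether it is actually faster depends on the libc.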


Cheers,

Alex


>>
>>>
>>> RETURN VALUE
>>>          stpcpy()
>>>                 This  function returns a pointer to the terminating null byte at
>>>                 the end of the copied string.
>>>
>>>          strcpy()
>>>          strcat()
>>>                 These functions return dst.
>>>
>>> ATTRIBUTES
>>>          For an explanation of the terms  used  in  this  section,  see  attrib‐
>>>          utes(7).
>>>          ┌────────────────────────────────────────────┬───────────────┬─────────┐
>>>          │Interface                                   │ Attribute     │ Value   │
>>>          ├────────────────────────────────────────────┼───────────────┼─────────┤
>>>          │stpcpy(), strcpy(), strcat()                │ Thread safety │ MT‐Safe │
>>>          └────────────────────────────────────────────┴───────────────┴─────────┘
>>>
>>> STANDARDS
>>>          stpcpy()
>>>                 POSIX.1‐2008.
>>>
>>>          strcpy()
>>>          strcat()
>>>                 POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.
>>>
>>> CAVEATS
>>>          The strings src and dst may not overlap.
>>>
>>>          If  the  destination  buffer is not large enough, the behavior is unde‐
>>>          fined.  See _FORTIFY_SOURCE in feature_test_macros(7).
>>>
>>> BUGS
>>>          strcat()
>>>                 This function can be  very  inefficient.   Read  about  Shlemiel
>>>                 the      painter     ⟨https://www.joelonsoftware.com/2001/12/11/
>>>                 back-to-basics/⟩.
>>
>> I'm not sure this is a bug, rather a design limitation. Maybe it
>> belongs in NOTES or CAVEATS?
> 
> Yeah, I had been thinking of downgrading it.  I'll do it.
> 
>> Also, I think this can be summarized
>> along the lines of 'strcat needs to walk the destination buffer to
>> find the null terminator, so it has linear complexity with respect to
>> the size of the destination buffer up to the terminator' (hmm, I'm
>> sure this can be expressed more concisely), so the page is more self
>> contained. Outside links sometimes go dead, like on Wikipedia, so I
>> think just in case, it helps to make explicit the point that you want
>> the reader to study further in the URL.
> 
> I wasn't inspired to write it short enough to not be too verbose.  Maybe I'll 
> write something based on your suggestion.
> 
>>
>> Regards,
>> Stefan.
> 
> Thanks for the review!
> 
> Cheers,
> 
> Alex
> 
>>
>>>
>>> EXAMPLES
>>>          #include <stdio.h>
>>>          #include <stdlib.h>
>>>          #include <string.h>
>>>
>>>          int
>>>          main(void)
>>>          {
>>>              char    *p;
>>>              char    buf1[BUFSIZ];
>>>              char    buf2[BUFSIZ];
>>>              size_t  len;
>>>
>>>              p = buf1;
>>>              p = stpcpy(p, "Hello ");
>>>              p = stpcpy(p, "world");
>>>              p = stpcpy(p, "!");
>>>              len = p - buf1;
>>>
>>>              printf("[len = %zu]: ", len);
>>>              puts(buf1);  // "Hello world!"
>>>
>>>              strcpy(buf2, "Hello ");
>>>              strcat(buf2, "world");
>>>              strcat(buf2, "!");
>>>              len = strlen(buf2);
>>>
>>>              printf("[len = %zu]: ", len);
>>>              puts(buf2);  // "Hello world!"
>>>
>>>              exit(EXIT_SUCCESS);
>>>          }
>>>
>>> SEE ALSO
>>>          strdup(3), string(3), wcscpy(3), string_copy(7)
>>>
>>> Linux man‐pages (unreleased)        (date)                           strcpy(3)
>>>
>>> -- 
>>> <http://www.alejandro-colomar.es/>
> 

-- 
<http://www.alejandro-colomar.es/>



* [PATCH v6 0/5] Rewrite documentation for string-copying functions
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 1/5] string_copy.7: Add page to document all " Alejandro Colomar
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man, Martin Sebor, G. Branden Robinson, Douglas McIlroy,
	Jakub Wilk, Serge Hallyn, Iker Pedrosa, Andrew Pinski,
	Stefan Puiu
  Cc: Alejandro Colomar


Hi,

Yet another revision of this patch set.

v6:

-  Fixed a link page (stpcpy(3)).

-  Use malloc(3) in the examples, to show that buffers need to be
   properly allocated before these calls.

-  Return to the example program of strncat(3) that showed a more
   realistic use (based on groff(1)'s source code, plus some
   imagination).

-  Use the term 'end' for one after the last element of an array, to be
   consistent with C++ (as Andrew pointed out).  It is also less to
   type, and using end for the end of the string and past_end for the
   buffer was a bit confusing, since it wasn't true that
   'end == past_end - 1'.  Now, I don't have a term for the end of a
   string, so I used the description instead of a term.  Such pointers
   are named 'p', following tradition (and the names of mempcpy(3),
   stpcpy(3), and others).
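
As a quick illustration of the 'end' and 'p' naming, here is a
simplified sketch of a truncating chain-copy.  It is written just for
this message under the name sketch_stpecpy(), which is hypothetical; it
uses a plain 'char *end' parameter, and it should not be read as the
reference implementation from the patch.

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a truncating chain-copy.
 * 'end' points to one past the last element of the buffer.
 * Returns a pointer to the terminating null byte of the result,
 * or 'end' if the string had to be truncated. */
char *
sketch_stpecpy(char *dst, char *end, const char *restrict src)
{
    char *p;

    assert(dst != NULL && dst <= end);

    if (dst == end)
        return end;  /* an earlier call in the chain already truncated */

    p = memccpy(dst, src, '\0', end - dst);
    if (p != NULL)
        return p - 1;  /* points at the terminating null byte */

    /* src did not fit: terminate at the last element and report it */
    end[-1] = '\0';
    return end;
}
```

A caller would set 'end = buf + sizeof(buf)' once, start with 'p = buf',
chain 'p = sketch_stpecpy(p, end, ...)' calls, and compare 'p == end'
a single time at the end to detect truncation.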


This is likely to be the last revision before pushing.  I don't expect
important changes to occur, and I think we can improve the page once
it's been published.  This is already a big improvement over what we've
had for many years, and worthy of being released to the public.


Cheers,

Alex


P.S.: I'm writing a library that implements the functions suggested here
that are not part of libc.  The code is already done, and I'm now
working on the build system.  After that, manual pages and Debian
packaging (I'll need help for the latter), and it'll be done.


Alejandro Colomar (5):
  string_copy.7: Add page to document all string-copying functions
  stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3,
    zustr2ustp.3: Add new links to string_copy(7)
  stpcpy.3, strcpy.3, strcat.3: Document in a single page
  stpncpy.3, strncpy.3: Document in a single page
  strncat.3: Rewrite to be consistent with string_copy.7.

 man3/stpcpy.3      | 116 +-----
 man3/stpecpy.3     |   1 +
 man3/stpecpyx.3    |   1 +
 man3/stpncpy.3     | 166 +++++----
 man3/strcat.3      | 162 +--------
 man3/strcpy.3      | 234 ++++++++-----
 man3/strncat.3     | 157 +++------
 man3/strncpy.3     | 130 +------
 man3/ustpcpy.3     |   1 +
 man3/ustr2stp.3    |   1 +
 man3/zustr2stp.3   |   1 +
 man3/zustr2ustp.3  |   1 +
 man7/string_copy.7 | 855 +++++++++++++++++++++++++++++++++++++++++++++
 13 files changed, 1172 insertions(+), 654 deletions(-)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/ustpcpy.3
 create mode 100644 man3/ustr2stp.3
 create mode 100644 man3/zustr2stp.3
 create mode 100644 man3/zustr2ustp.3
 create mode 100644 man7/string_copy.7

-- 
2.39.0



* [PATCH v6 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 0/5] Rewrite documentation for " Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  2022-12-20 15:00             ` Stefan Puiu
  2023-01-20  3:43             ` Eric Biggers
  2022-12-19 21:02           ` [PATCH v6 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7) Alejandro Colomar
                             ` (3 subsequent siblings)
  5 siblings, 2 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

This is an opportunity to use consistent language across the
documentation for all string-copying functions.

It is also easier to show the similarities and differences between all
of the functions, so that a reader can use this page to know which
function is needed for a given task.

Alternative functions not provided by libc have been given in the same
page, with reference implementations.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copy.7 | 855 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 855 insertions(+)
 create mode 100644 man7/string_copy.7

diff --git a/man7/string_copy.7 b/man7/string_copy.7
new file mode 100644
index 000000000..a32b93c01
--- /dev/null
+++ b/man7/string_copy.7
@@ -0,0 +1,855 @@
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
+.\"
+.\" SPDX-License-Identifier: BSD-3-Clause
+.\"
+.TH string_copy 7 (date) "Linux man-pages (unreleased)"
+.\" ----- NAME :: -----------------------------------------------------/
+.SH NAME
+stpcpy,
+strcpy, strcat,
+stpecpy, stpecpyx,
+strlcpy, strlcat,
+stpncpy,
+strncpy,
+zustr2ustp, zustr2stp,
+strncat,
+ustpcpy, ustr2stp
+\- copy strings and character sequences
+.\" ----- SYNOPSIS :: -------------------------------------------------/
+.SH SYNOPSIS
+.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
+.SS Strings
+.nf
+// Chain-copy a string.
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
+.PP
+// Copy/catenate a string.
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.PP
+// Chain-copy a string with truncation.
+.BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
+.PP
+// Chain-copy a string with truncation and SIGSEGV on UB.
+.BI "char *stpecpyx(char *" dst ", char " end "[0], const char *restrict " src );
+.PP
+// Copy/catenate a string with truncation and SIGSEGV on UB.
+.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "size_t strlcat(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Null-padded character sequences --------/
+.SS Null-padded character sequences
+.nf
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Zero a fixed-width buffer, and
+// copy a string into a character sequence with truncation.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a character sequence.
+.BI "char *zustr2ustp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Chain-copy a null-padded character sequence into a string.
+.BI "char *zustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.PP
+// Catenate a null-padded character sequence into a string.
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
+.fi
+.\" ----- SYNOPSIS :: Measured character sequences --------------------/
+.SS Measured character sequences
+.nf
+// Chain-copy a measured character sequence.
+.BI "char *ustpcpy(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.PP
+// Chain-copy a measured character sequence into a string.
+.BI "char *ustr2stp(char *restrict " dst ", \
+const char " src "[restrict ." len ],
+.BI "               size_t " len );
+.fi
+.SH DESCRIPTION
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
+.SS Terms (and abbreviations)
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
+.TP
+.IR "string " ( str )
+is a sequence of zero or more non-null characters followed by a null byte.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
+.TP
+.I character sequence
+is a sequence of zero or more non-null characters.
+A program should never use a character sequence where a string is required.
+However, with appropriate care,
+a string can be used in the place of a character sequence.
+.RS
+.TP
+.IR "null-padded character sequence " ( zustr )
+Character sequences can be contained in fixed-width buffers,
+which contain padding null bytes after the character sequence,
+to fill the rest of the buffer
+without affecting the character sequence;
+however, those padding null bytes are not part of the character sequence.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
+.TP
+.IR "measured character sequence " ( ustr )
+Character sequence delimited by its length.
+It may be a slice of a larger character sequence,
+or even of a string.
+.RE
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
+.TP
+.IR "length " ( len )
+is the number of non-null characters in a string or character sequence.
+It is the return value of
+.I strlen(str)
+and of
+.IR "strnlen(ustr, sz)" .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
+.TP
+.IR "size " ( sz )
+refers to the entire buffer
+where the string or character sequence is contained.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
+.TP
+.I end
+is the name of a pointer to one past the last element of a buffer.
+It is equivalent to
+.IR &str[sz] .
+It is used as a sentinel value,
+to be able to truncate strings or character sequences
+instead of overrunning the containing buffer.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
+.TP
+.I copy
+This term is used when
+the writing starts at the first element pointed to by
+.IR dst .
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
+.TP
+.I catenate
+This term is used when
+a function first finds the terminating null byte in
+.IR dst ,
+and then starts writing at that position.
+.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
+.TP
+.I chain
+This term is used when
+it's the programmer who provides
+a pointer to the terminating null byte in the string
+.I dst
+(or one after the last character in a character sequence),
+and the function starts writing at that location.
+The function returns
+a pointer to the new location of the terminating null byte
+(or one after the last character in a character sequence)
+after the call,
+so that the programmer can use it to chain such calls.
+.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
+.SS Copy, catenate, and chain-copy
+Originally,
+there was a distinction between functions that copy and those that catenate.
+However, newer functions that copy while allowing chaining
+cover both use cases with a single API.
+They are also algorithmically faster,
+since they don't need to search for
+the terminating null byte of the existing string.
+However, functions that catenate have a much simpler use,
+so if performance is not important,
+it can make sense to use them for improving readability.
+.PP
+The pointer returned by functions that allow chaining
+is a byproduct of the copy operation,
+so it has no performance costs.
+Functions that return such a pointer,
+and thus can be chained,
+have names of the form
+.RB * stp *(),
+since it's common to name the pointer just
+.IR p .
+.PP
+Chain-copying functions that truncate
+should accept a pointer to the end of the destination buffer,
+and have names of the form
+.RB * stpe *().
+This allows not having to recalculate the remaining size after each call.
+.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
+.SS Truncate or not?
+The first thing to note is that programmers should be careful with buffers,
+so they always have the correct size,
+and truncation is not necessary.
+.PP
+In most cases,
+truncation is not desired,
+and it is simpler to just do the copy.
+Simpler code is safer code.
+Programming against programming mistakes by adding more code
+just adds more points where mistakes can be made.
+.PP
+Nowadays,
+compilers can detect most programmer errors with features like
+compiler warnings,
+static analyzers, and
+.BR \%_FORTIFY_SOURCE
+(see
+.BR ftm (7)).
+Keeping the code simple
+helps these overflow-detection features be more precise.
+.PP
+When validating user input,
+however,
+it makes sense to truncate.
+Remember to check the return value of such function calls.
+.PP
+Functions that truncate:
+.IP \(bu 3
+.BR stpecpy (3)
+is the most efficient string copy function that performs truncation.
+It only requires checking for truncation once, after all chained calls.
+.IP \(bu
+.BR stpecpyx (3)
+is a variant of
+.BR stpecpy (3)
+that consumes the entire source string,
+to catch bugs in the program
+by forcing a segmentation fault (as
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+do).
+.IP \(bu
+.BR strlcpy (3bsd)
+and
+.BR strlcat (3bsd)
+are designed to crash if the input string is invalid
+(doesn't contain a terminating null byte).
+.IP \(bu
+.BR stpncpy (3)
+and
+.BR strncpy (3)
+also truncate, but they write null-padded character sequences,
+not strings.
+.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
+.SS Null-padded character sequences
+For historic reasons,
+some standard APIs,
+such as
+.BR utmpx (5),
+use null-padded character sequences in fixed-width buffers.
+To interface with them,
+specialized functions need to be used.
+.PP
+To copy strings into them, use
+.BR stpncpy (3).
+.PP
+To copy from an unterminated string within a fixed-width buffer into a string,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2stp (3)
+or
+.BR strncat (3).
+.PP
+To copy from an unterminated string within a fixed-width buffer
+into a character sequence,
+ignoring any trailing null bytes in the source fixed-width buffer,
+you should use
+.BR zustr2ustp (3).
+.\" ----- DESCRIPTION :: Measured character sequences -----------------/
+.SS Measured character sequences
+The simplest character sequence copying function is
+.BR mempcpy (3).
+It requires always knowing the length of your character sequences,
+for which structures can be used.
+It makes the code much faster,
+since you always know the length of your character sequences,
+and can do the minimal copies and length measurements.
+.BR mempcpy (3)
+copies character sequences,
+so you need to explicitly set the terminating null byte if you need a string.
+.PP
+However,
+for keeping type safety,
+it's good to add a wrapper that uses
+.I char\~*
+instead of
+.IR void\~* :
+.BR ustpcpy (3).
+.PP
+In programs that make considerable use of strings or character sequences,
+and need the best performance,
+using overlapping character sequences can make a big difference.
+It allows holding subsequences of a larger character sequence,
+while not duplicating memory
+nor using time to do a copy.
+.PP
+However, this is delicate,
+since it requires using character sequences.
+C library APIs use strings,
+so programs that use character sequences
+will have to take care of differentiating strings from character sequences.
+.PP
+To copy a measured character sequence, use
+.BR ustpcpy (3).
+.PP
+To copy a measured character sequence into a string, use
+.BR ustr2stp (3).
+.PP
+Because these functions ask for the length,
+and a string is by nature composed of a character sequence of the same length
+plus a terminating null byte,
+a string is also accepted as input.
+.\" ----- DESCRIPTION :: String vs character sequence -----------------/
+.SS String vs character sequence
+Some functions only operate on strings.
+Those require that the input
+.I src
+is a string,
+and guarantee an output string
+(even when truncation occurs).
+Functions that catenate
+also require that
+.I dst
+holds a string before the call.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR strcpy "(3), \c"
+.BR strcat (3)
+.IP \(bu
+.BR stpecpy "(3), \c"
+.BR stpecpyx (3)
+.IP \(bu
+.BR strlcpy "(3bsd), \c"
+.BR strlcat (3bsd)
+.PD
+.PP
+Other functions require an input string,
+but create a character sequence as output.
+These functions have confusing names,
+and have a long history of misuse.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR stpncpy (3)
+.IP \(bu
+.BR strncpy (3)
+.PD
+.PP
+Other functions operate on an input character sequence,
+and create an output string.
+Functions that catenate
+also require that
+.I dst
+holds a string before the call.
+.BR strncat (3)
+has an even more misleading name than the functions above.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR zustr2stp (3)
+.IP \(bu
+.BR strncat (3)
+.IP \(bu
+.BR ustr2stp (3)
+.PD
+.PP
+Other functions operate on an input character sequence
+to create an output character sequence.
+List of functions:
+.IP \(bu 3
+.PD 0
+.BR ustpcpy (3)
+.IP \(bu
+.BR zustr2ustp (3)
+.PD
+.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
+.SS Functions
+.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
+.TP
+.BR stpcpy (3)
+This function copies the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+These functions copy and catenate the input string into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR stpcpy (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+These functions copy the input string into a destination string.
+If the destination buffer,
+limited by a pointer to its end,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return a pointer suitable for chaining.
+Truncation needs to be detected only once after the last chained call.
+.BR stpecpyx (3)
+has identical semantics to
+.BR stpecpy (3),
+except that it forces a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+These functions are not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+These functions copy and catenate the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+They return the length of the total string they tried to create.
+These functions force a SIGSEGV if the
+.I src
+pointer is not a string.
+.IP
+.BR stpecpyx (3)
+is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
+.TP
+.BR stpncpy (3)
+This function copies the input string into
+a destination null-padded character sequence in a fixed-width buffer.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+Since it creates a character sequence,
+it doesn't need to write a terminating null byte.
+The result of the call alone can't distinguish truncation
+from a character sequence that just fits the destination buffer;
+truncation should be detected by
+comparing the length of the input string
+with the size of the destination buffer.
+.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
+.TP
+.BR strncpy (3)
+This function is identical to
+.BR stpncpy (3)
+except for the useless return value.
+.IP
+.BR stpncpy (3)
+is a more useful alternative to this function.
+.\" ----- DESCRIPTION :: Functions :: zustr2ustp(3) --------------------/
+.TP
+.BR zustr2ustp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: zustr2stp(3) --------------------/
+.TP
+.BR zustr2stp (3)
+This function copies the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.IP
+A truncating version of this function doesn't exist,
+since the size of the original character sequence is always known,
+so it wouldn't be very useful.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
+.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
+.TP
+.BR strncat (3)
+Do not confuse this function with
+.BR strncpy (3);
+they are not related at all.
+.IP
+This function catenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+The return value is useless.
+.IP
+.BR zustr2stp (3)
+is a faster alternative to this function.
+.\" ----- DESCRIPTION :: Functions :: ustpcpy(3) ----------------------/
+.TP
+.BR ustpcpy (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination character sequence.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
+.TP
+.BR ustr2stp (3)
+This function copies the input character sequence,
+limited by its length,
+into a destination string.
+The programmer is responsible for allocating a buffer large enough.
+It returns a pointer suitable for chaining.
+.\" ----- RETURN VALUE :: ---------------------------------------------/
+.SH RETURN VALUE
+The following functions return
+a pointer to the terminating null byte in the destination string.
+.IP \(bu 3
+.PD 0
+.BR stpcpy (3)
+.IP \(bu
+.BR ustr2stp (3)
+.IP \(bu
+.BR zustr2stp (3)
+.PD
+.PP
+The following functions return
+a pointer to the terminating null byte in the destination string,
+except when truncation occurs;
+if truncation occurs,
+they return a pointer to the end of the destination buffer.
+.IP \(bu 3
+.BR stpecpy (3),
+.BR stpecpyx (3)
+.PP
+The following function returns
+a pointer to one after the last character
+in the destination character sequence;
+if truncation occurs,
+that pointer is equivalent to
+a pointer to the end of the destination buffer.
+.IP \(bu 3
+.BR stpncpy (3)
+.PP
+The following functions return
+a pointer to one after the last character
+in the destination character sequence.
+.IP \(bu 3
+.PD 0
+.BR zustr2ustp (3)
+.IP \(bu
+.BR ustpcpy (3)
+.PD
+.PP
+The following functions return
+the length of the total string that they tried to create
+(as if truncation didn't occur).
+.IP \(bu 3
+.BR strlcpy (3bsd),
+.BR strlcat (3bsd)
+.PP
+The following functions return the
+.I dst
+pointer,
+which is useless.
+.IP \(bu 3
+.PD 0
+.BR strcpy (3),
+.BR strcat (3)
+.IP \(bu
+.BR strncpy (3)
+.IP \(bu
+.BR strncat (3)
+.PD
+.\" ----- NOTES :: strscpy(9) -----------------------------------------/
+.SH NOTES
+The Linux kernel has an internal function for copying strings,
+which is similar to
+.BR stpecpy (3),
+except that it can't be chained:
+.TP
+.BR strscpy (9)
+This function copies the input string into a destination string.
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the destination string, or
+.B \-E2BIG
+on truncation.
+.IP
+.BR stpecpy (3)
+is a simpler and faster alternative to this function.
+.RE
+.\" ----- CAVEATS :: --------------------------------------------------/
+.SH CAVEATS
+Don't mix chain calls to truncating and non-truncating functions.
+It is conceptually wrong
+unless you know that the first part of a copy will always fit.
+Anyway, the performance difference will probably be negligible,
+so the code will be clearer if you use consistent semantics:
+either truncating or non-truncating.
+Calling a non-truncating function after a truncating one is necessarily wrong.
+.\" ----- BUGS :: -----------------------------------------------------/
+.SH BUGS
+All catenation functions share the same performance problem:
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.\" ----- EXAMPLES :: -------------------------------------------------/
+.SH EXAMPLES
+The following are examples of correct use of each of these functions.
+.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
+.TP
+.BR stpcpy (3)
+.EX
+p = buf;
+p = stpcpy(p, "Hello ");
+p = stpcpy(p, "world");
+p = stpcpy(p, "!");
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
+.TP
+.BR strcpy (3)
+.TQ
+.BR strcat (3)
+.EX
+strcpy(buf, "Hello ");
+strcat(buf, "world");
+strcat(buf, "!");
+len = strlen(buf);
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
+.TP
+.BR stpecpy (3)
+.TQ
+.BR stpecpyx (3)
+.EX
+end = buf + sizeof(buf);
+p = buf;
+p = stpecpy(p, end, "Hello ");
+p = stpecpy(p, end, "world");
+p = stpecpy(p, end, "!");
+if (p == end) {
+    p\-\-;
+    goto toolong;
+}
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
+.TP
+.BR strlcpy (3bsd)
+.TQ
+.BR strlcat (3bsd)
+.EX
+if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
+    goto toolong;
+len = strlcat(buf, "!", sizeof(buf));
+if (len >= sizeof(buf))
+    goto toolong;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
+.TP
+.BR strscpy (9)
+.EX
+len = strscpy(buf, "Hello world!", sizeof(buf));
+if (len == \-E2BIG)
+    goto toolong;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
+.TP
+.BR stpncpy (3)
+.EX
+p = stpncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = p \- buf;
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
+.TP
+.BR strncpy (3)
+.EX
+strncpy(buf, "Hello world!", sizeof(buf));
+if (sizeof(buf) < strlen("Hello world!"))
+    goto toolong;
+len = strnlen(buf, sizeof(buf));
+for (size_t i = 0; i < sizeof(buf); i++)
+    putchar(buf[i]);
+.EE
+.\" ----- EXAMPLES :: zustr2ustp(3) -----------------------------------/
+.TP
+.BR zustr2ustp (3)
+.EX
+p = buf;
+p = zustr2ustp(p, "Hello ", 6);
+p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
+p = zustr2ustp(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.\" ----- EXAMPLES :: zustr2stp(3) ------------------------------------/
+.TP
+.BR zustr2stp (3)
+.EX
+p = buf;
+p = zustr2stp(p, "Hello ", 6);
+p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
+p = zustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
+.TP
+.BR strncat (3)
+.EX
+buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
+strncat(buf, "Hello ", 6);
+strncat(buf, "world", 42);  // Padding null bytes ignored.
+strncat(buf, "!", 1);
+len = strlen(buf);
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: ustpcpy(3) --------------------------------------/
+.TP
+.BR ustpcpy (3)
+.EX
+p = buf;
+p = ustpcpy(p, "Hello ", 6);
+p = ustpcpy(p, "world", 5);
+p = ustpcpy(p, "!", 1);
+len = p \- buf;
+printf("%.*s\en", (int) len, buf);
+.EE
+.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
+.TP
+.BR ustr2stp (3)
+.EX
+p = buf;
+p = ustr2stp(p, "Hello ", 6);
+p = ustr2stp(p, "world", 5);
+p = ustr2stp(p, "!", 1);
+len = p \- buf;
+puts(buf);
+.EE
+.\" ----- EXAMPLES :: Implementations :: ------------------------------/
+.SS Implementations
+Here are reference implementations for functions not provided by libc.
+.PP
+.in +4n
+.EX
+/* This code is in the public domain. */
+
+.\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
+char *
+.IR stpecpy "(char *dst, char end[0], const char *restrict src)"
+{
+    char *p;
+
+    if (dst == end)
+        return end;
+
+    p = memccpy(dst, src, \(aq\e0\(aq, end \- dst);
+    if (p != NULL)
+        return p \- 1;
+
+    /* truncation detected */
+    end[\-1] = \(aq\e0\(aq;
+    return end;
+}
+
+.\" ----- EXAMPLES :: Implementations :: stpecpyx(3) ------------------/
+char *
+.IR stpecpyx "(char *dst, char end[0], const char *restrict src)"
+{
+    if (src[strlen(src)] != \(aq\e0\(aq)
+        raise(SIGSEGV);
+
+    return stpecpy(dst, end, src);
+}
+
+.\" ----- EXAMPLES :: Implementations :: zustr2ustp(3) ----------------/
+char *
+.IR zustr2ustp "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    return ustpcpy(dst, src, strnlen(src, sz));
+}
+
+.\" ----- EXAMPLES :: Implementations :: zustr2stp(3) -----------------/
+char *
+.IR zustr2stp "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    char  *p;
+
+    p = zustr2ustp(dst, src, sz);
+    *p = \(aq\e0\(aq;
+
+    return p;
+}
+
+.\" ----- EXAMPLES :: Implementations :: ustpcpy(3) -------------------/
+char *
+.IR ustpcpy "(char *restrict dst, const char *restrict src, size_t len)"
+{
+    return mempcpy(dst, src, len);
+}
+
+.\" ----- EXAMPLES :: Implementations :: ustr2stp(3) ------------------/
+char *
+.IR ustr2stp "(char *restrict dst, const char *restrict src, size_t len)"
+{
+    char  *p;
+
+    p = ustpcpy(dst, src, len);
+    *p = \(aq\e0\(aq;
+
+    return p;
+}
+.EE
+.in
+.\" ----- SEE ALSO :: -------------------------------------------------/
+.SH SEE ALSO
+.BR bzero (3),
+.BR memcpy (3),
+.BR memccpy (3),
+.BR mempcpy (3),
+.BR stpcpy (3),
+.BR strlcpy (3bsd),
+.BR strncat (3),
+.BR stpncpy (3),
+.BR string (3)
-- 
2.39.0



* [PATCH v6 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7)
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 0/5] Rewrite documentation for " Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 1/5] string_copy.7: Add page to document all " Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page Alejandro Colomar
                             ` (2 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpecpy.3    | 1 +
 man3/stpecpyx.3   | 1 +
 man3/ustpcpy.3    | 1 +
 man3/ustr2stp.3   | 1 +
 man3/zustr2stp.3  | 1 +
 man3/zustr2ustp.3 | 1 +
 6 files changed, 6 insertions(+)
 create mode 100644 man3/stpecpy.3
 create mode 100644 man3/stpecpyx.3
 create mode 100644 man3/ustpcpy.3
 create mode 100644 man3/ustr2stp.3
 create mode 100644 man3/zustr2stp.3
 create mode 100644 man3/zustr2ustp.3

diff --git a/man3/stpecpy.3 b/man3/stpecpy.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/stpecpy.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/stpecpyx.3 b/man3/stpecpyx.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/stpecpyx.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/ustpcpy.3 b/man3/ustpcpy.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/ustpcpy.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/ustr2stp.3 b/man3/ustr2stp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/ustr2stp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/zustr2stp.3 b/man3/zustr2stp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/zustr2stp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
diff --git a/man3/zustr2ustp.3 b/man3/zustr2ustp.3
new file mode 100644
index 000000000..6ff53887b
--- /dev/null
+++ b/man3/zustr2ustp.3
@@ -0,0 +1 @@
+.so man7/string_copy.7
-- 
2.39.0



* [PATCH v6 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
                             ` (2 preceding siblings ...)
  2022-12-19 21:02           ` [PATCH v6 2/5] stpecpy.3, stpecpyx.3, ustpcpy.3, ustr2stp.3, zustr2stp.3, zustr2ustp.3: Add new links to string_copy(7) Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 4/5] stpncpy.3, strncpy.3: " Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 5/5] strncat.3: Rewrite to be consistent with string_copy.7 Alejandro Colomar
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

Rewrite to be consistent with the new string_copy.7 page.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpcpy.3 | 116 +------------------------
 man3/strcat.3 | 162 +---------------------------------
 man3/strcpy.3 | 234 ++++++++++++++++++++++++++++++++------------------
 3 files changed, 152 insertions(+), 360 deletions(-)

diff --git a/man3/stpcpy.3 b/man3/stpcpy.3
index 42751d356..ff7476a84 100644
--- a/man3/stpcpy.3
+++ b/man3/stpcpy.3
@@ -1,115 +1 @@
-'\" t
-.\" Copyright 1995 James R. Van Zandt <jrv@vanzandt.mv.com>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.TH stpcpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-stpcpy \- copy a string returning a pointer to its end
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *stpcpy(char *restrict " dest ", const char *restrict " src );
-.fi
-.PP
-.RS -4
-Feature Test Macro Requirements for glibc (see
-.BR feature_test_macros (7)):
-.RE
-.PP
-.BR stpcpy ():
-.nf
-    Since glibc 2.10:
-        _POSIX_C_SOURCE >= 200809L
-    Before glibc 2.10:
-        _GNU_SOURCE
-.fi
-.SH DESCRIPTION
-The
-.BR stpcpy ()
-function copies the string pointed to by
-.I src
-(including the terminating null byte (\(aq\e0\(aq)) to the array pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.SH RETURN VALUE
-.BR stpcpy ()
-returns a pointer to the
-.B end
-of the string
-.I dest
-(that is, the address of the terminating null byte)
-rather than the beginning.
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR stpcpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was not part of
-the C or POSIX.1 standards, nor customary on UNIX systems.
-It first appeared at least as early as 1986,
-in the Lattice C AmigaDOS compiler,
-then in the GNU fileutils and GNU textutils in 1989,
-and in the GNU C library by 1992.
-It is also present on the BSDs.
-.SH BUGS
-This function may overrun the buffer
-.IR dest .
-.SH EXAMPLES
-For example, this program uses
-.BR stpcpy ()
-to concatenate
-.B foo
-and
-.B bar
-to produce
-.BR foobar ,
-which it then prints.
-.PP
-.\" SRC BEGIN (stpcpy.c)
-.EX
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <string.h>
-
-int
-main(void)
-{
-    char buffer[20];
-    char *to = buffer;
-
-    to = stpcpy(to, "foo");
-    to = stpcpy(to, "bar");
-    printf("%s\en", buffer);
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR memmove (3),
-.BR stpncpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR wcpcpy (3)
+.so man3/strcpy.3
diff --git a/man3/strcat.3 b/man3/strcat.3
index 90b9d260d..ff7476a84 100644
--- a/man3/strcat.3
+++ b/man3/strcat.3
@@ -1,161 +1 @@
-'\" t
-.\" Copyright 1993 David Metcalfe (david@prism.demon.co.uk)
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:11:47 1993 by Rik Faith (faith@cs.unc.edu)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncat().
-.TH strcat 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strcat \- concatenate two strings
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "char *strcat(char *restrict " dest ", const char *restrict " src );
-.fi
-.SH DESCRIPTION
-The
-.BR strcat ()
-function appends the
-.I src
-string to the
-.I dest
-string,
-overwriting the terminating null byte (\(aq\e0\(aq) at the end of
-.IR dest ,
-and then adds a terminating null byte.
-The strings may not overlap, and the
-.I dest
-string must have
-enough space for the result.
-If
-.I dest
-is not large enough, program behavior is unpredictable;
-.IR "buffer overruns are a favorite avenue for attacking secure programs" .
-.SH RETURN VALUE
-The
-.BR strcat ()
-function returns a pointer to the resulting string
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strcat (),
-.BR strncat ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-Some systems (the BSDs, Solaris, and others) provide the following function:
-.PP
-.in +4n
-.EX
-size_t strlcat(char *dest, const char *src, size_t size);
-.EE
-.in
-.PP
-This function appends the null-terminated string
-.I src
-to the string
-.IR dest ,
-copying at most
-.I size\-strlen(dest)\-1
-from
-.IR src ,
-and adds a terminating null byte to the result,
-.I unless
-.I size
-is less than
-.IR strlen(dest) .
-This function fixes the buffer overrun problem of
-.BR strcat (),
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The function returns the length of the string
-.BR strlcat ()
-tried to create; if the return value is greater than or equal to
-.IR size ,
-data loss occurred.
-If data loss matters, the caller
-.I must
-either check the arguments before the call, or test the function return value.
-.BR strlcat ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.\"
-.SH EXAMPLES
-Because
-.BR strcat ()
-must find the null byte that terminates the string
-.I dest
-using a search that starts at the beginning of the string,
-the execution time of this function
-scales according to the length of the string
-.IR dest .
-This can be demonstrated by running the program below.
-(If the goal is to concatenate many strings to one target,
-then manually copying the bytes from each source string
-while maintaining a pointer to the end of the target string
-will provide better performance.)
-.\"
-.SS Program source
-\&
-.\" SRC BEGIN (strcat.c)
-.EX
-#include <stdint.h>
-#include <stdio.h>
-#include <string.h>
-#include <time.h>
-
-int
-main(void)
-{
-#define LIM 4000000
-    char p[LIM + 1];    /* +1 for terminating null byte */
-    time_t base;
-
-    base = time(NULL);
-    p[0] = \(aq\e0\(aq;
-
-    for (unsigned int j = 0; j < LIM; j++) {
-        if ((j % 10000) == 0)
-            printf("%u %jd\en", j, (intmax_t) (time(NULL) \- base));
-        strcat(p, "a");
-    }
-}
-.EE
-.\" SRC END
-.SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR strcpy (3),
-.BR string (3),
-.BR strlcat (3bsd),
-.BR wcscat (3),
-.BR wcsncat (3)
+.so man3/strcpy.3
diff --git a/man3/strcpy.3 b/man3/strcpy.3
index 685a8e77a..ba6820dab 100644
--- a/man3/strcpy.3
+++ b/man3/strcpy.3
@@ -1,21 +1,11 @@
 '\" t
-.\" Copyright (C) 1993 David Metcalfe (david@prism.demon.co.uk)
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
 .TH strcpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-strcpy \- copy a string
+stpcpy, strcpy, strcat \- copy or catenate a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -23,27 +13,89 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *strcpy(char *restrict " dest ", const char *restrict " src );
+.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
+.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
+.fi
+.PP
+.RS -4
+Feature Test Macro Requirements for glibc (see
+.BR feature_test_macros (7)):
+.RE
+.PP
+.BR stpcpy ():
+.nf
+    Since glibc 2.10:
+        _POSIX_C_SOURCE >= 200809L
+    Before glibc 2.10:
+        _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-The
+.TP
+.BR stpcpy ()
+.TQ
 .BR strcpy ()
-function copies the string pointed to by
+These functions copy the string pointed to by
 .IR src ,
-including the terminating null byte (\(aq\e0\(aq),
-to the buffer pointed to by
-.IR dest .
-The strings may not overlap, and the destination string
-.I dest
-must be large enough to receive the copy.
-.I Beware of buffer overruns!
-(See BUGS.)
+into a string
+at the buffer pointed to by
+.IR dst .
+The programmer is responsible for allocating a destination buffer large enough,
+that is,
+.IR "strlen(src) + 1" .
+For the difference between the two functions, see RETURN VALUE.
+.TP
+.BR strcat ()
+This function catenates the string pointed to by
+.IR src ,
+after the string pointed to by
+.I dst
+(overwriting its terminating null byte).
+The programmer is responsible for allocating a destination buffer large enough,
+that is,
+.IR "strlen(dst) + strlen(src) + 1" .
+.PP
+An implementation of these functions might be:
+.PP
+.in +4n
+.EX
+char *
+stpcpy(char *restrict dst, const char *restrict src)
+{
+    char  *p;
+
+    p = mempcpy(dst, src, strlen(src));
+    *p = \(aq\e0\(aq;
+
+    return p;
+}
+
+char *
+strcpy(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst, src);
+    return dst;
+}
+
+char *
+strcat(char *restrict dst, const char *restrict src)
+{
+    stpcpy(dst + strlen(dst), src);
+    return dst;
+}
+.EE
+.in
 .SH RETURN VALUE
-The
+.TP
+.BR stpcpy ()
+This function returns
+a pointer to the terminating null byte of the copied string.
+.TP
 .BR strcpy ()
-function returns a pointer to
-the destination string
-.IR dest .
+.TQ
+.BR strcat ()
+These functions return
+.IR dst .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -55,73 +107,87 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR strcpy ()
+.BR stpcpy (),
+.BR strcpy (),
+.BR strcat ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
 .SH STANDARDS
+.TP
+.BR stpcpy ()
+POSIX.1-2008.
+.TP
+.BR strcpy ()
+.TQ
+.BR strcat ()
 POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS strlcpy()
-Some systems (the BSDs, Solaris, and others) provide the following function:
+.SH CAVEATS
+The strings
+.I src
+and
+.I dst
+may not overlap.
 .PP
-.in +4n
+If the destination buffer is not large enough,
+the behavior is undefined.
+See
+.B _FORTIFY_SOURCE
+in
+.BR feature_test_macros (7).
+.PP
+.BR strcat ()
+can be very inefficient.
+Read about
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
+.SH EXAMPLES
+.\" SRC BEGIN (strcpy.c)
 .EX
-size_t strlcpy(char *dest, const char *src, size_t size);
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main(void)
+{
+    char    *p;
+    char    *buf1;
+    char    *buf2;
+    size_t  len, maxsize;
+
+    maxsize = strlen("Hello ") + strlen("world") + strlen("!") + 1;
+    buf1 = malloc(sizeof(*buf1) * maxsize);
+    buf2 = malloc(sizeof(*buf2) * maxsize);
+
+    p = buf1;
+    p = stpcpy(p, "Hello ");
+    p = stpcpy(p, "world");
+    p = stpcpy(p, "!");
+    len = p \- buf1;
+
+    printf("[len = %zu]: ", len);
+    puts(buf1);  // "Hello world!"
+    free(buf1);
+
+    strcpy(buf2, "Hello ");
+    strcat(buf2, "world");
+    strcat(buf2, "!");
+    len = strlen(buf2);
+
+    printf("[len = %zu]: ", len);
+    puts(buf2);  // "Hello world!"
+    free(buf2);
+
+    exit(EXIT_SUCCESS);
+}
 .EE
-.in
-.PP
-.\" http://static.usenix.org/event/usenix99/full_papers/millert/millert_html/index.html
-.\"     "strlcpy and strlcat - consistent, safe, string copy and concatenation"
-.\"     1999 USENIX Annual Technical Conference
-This function is similar to
-.BR strcpy (),
-but it copies at most
-.I size\-1
-bytes to
-.IR dest ,
-truncating the string as necessary.
-It always adds a terminating null byte.
-This function fixes some of the problems of
-.BR strcpy ()
-but the caller must still handle the possibility of data loss if
-.I size
-is too small.
-The return value of the function is the length of
-.IR src ,
-which allows truncation to be easily detected:
-if the return value is greater than or equal to
-.IR size ,
-truncation occurred.
-If loss of data matters, the caller
-.I must
-either check the arguments before the call,
-or test the function return value.
-.BR strlcpy ()
-is not present in glibc and is not standardized by POSIX,
-.\" https://lwn.net/Articles/506530/
-but is available on Linux via the
-.I libbsd
-library.
-.SH BUGS
-If the destination string of a
-.BR strcpy ()
-is not large enough, then anything might happen.
-Overflowing fixed-length string buffers is a favorite cracker technique
-for taking complete control of the machine.
-Any time a program reads or copies data into a buffer,
-the program first needs to check that there's enough space.
-This may be unnecessary if you can show that overflow is impossible,
-but be careful: programs can get changed over time,
-in ways that may make the impossible possible.
+.\" SRC END
 .SH SEE ALSO
-.BR bcopy (3),
-.BR memccpy (3),
-.BR memcpy (3),
-.BR memmove (3),
-.BR stpcpy (3),
 .BR strdup (3),
 .BR string (3),
-.BR wcscpy (3)
+.BR wcscpy (3),
+.BR string_copy (7)
-- 
2.39.0
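
As a side note to the patch above, here is a minimal sketch of the chaining
idiom that the stpcpy(3) EXAMPLES section relies on.  The names my_stpcpy()
and join3() are hypothetical and not part of the patch; my_stpcpy() is a
portable stand-in written with ISO C calls only, under the assumption that
the caller has sized the destination correctly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for stpcpy(3), using only ISO C calls. */
char *
my_stpcpy(char *restrict dst, const char *restrict src)
{
    size_t  len = strlen(src);

    memcpy(dst, src, len + 1);  /* also copies the terminating null byte */
    return dst + len;           /* points at the terminating null byte */
}

/*
 * Hypothetical helper: join three strings into dst by chaining.
 * The caller must provide at least
 * strlen(a) + strlen(b) + strlen(c) + 1 bytes.
 * Returns the length of the resulting string.
 */
size_t
join3(char *dst, const char *a, const char *b, const char *c)
{
    char  *p = dst;

    p = my_stpcpy(p, a);  /* each call resumes where the last one ended, */
    p = my_stpcpy(p, b);  /* so dst is never rescanned, unlike strcat(3) */
    p = my_stpcpy(p, c);

    return (size_t) (p - dst);
}
```

With a 13-byte buffer, join3(buf, "Hello ", "world", "!") leaves
"Hello world!" in buf, and the length falls out of the pointer arithmetic
without a final strlen(3) call.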



* [PATCH v6 4/5] stpncpy.3, strncpy.3: Document in a single page
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
                             ` (3 preceding siblings ...)
  2022-12-19 21:02           ` [PATCH v6 3/5] stpcpy.3, strcpy.3, strcat.3: Document in a single page Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  2022-12-19 21:02           ` [PATCH v6 5/5] strncat.3: Rewrite to be consistent with string_copy.7 Alejandro Colomar
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

Rewrite to be consistent with the new string_copy.7 page.

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpncpy.3 | 166 ++++++++++++++++++++++++++++++-------------------
 man3/strncpy.3 | 130 +-------------------------------------
 2 files changed, 102 insertions(+), 194 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index e7b24036b..e80ec2fd4 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -1,16 +1,14 @@
 '\" t
-.\" Copyright (c) Bruno Haible <haible@clisp.cons.org>
-.\" Copyright (c) 2022 Alejandro Colomar <alx@kernel.org>
+.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
-.\" References consulted:
-.\"   GNU glibc-2 source code and manual
-.\"
-.\" Corrected, aeb, 990824
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-stpncpy \- copy string into a fixed-length buffer and zero the rest of it
+stpncpy, strncpy
+\- zero a fixed-width buffer and
+copy a string into it
+as a character sequence, with truncation
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -18,9 +16,12 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *stpncpy(char " dest "[restrict ." n "], \
-const char " src "[restrict ." n ],
-.BI "              size_t " n );
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
 .fi
 .PP
 .RS -4
@@ -36,67 +37,44 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-.PP
-The
-.BR stpncpy ()
-function copies at most
-.I n
-characters of
+These functions copy the string pointed to by
 .I src
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null character among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
+into a null-padded character sequence at the fixed-width buffer pointed to by
+.IR dst .
+If the destination buffer,
+limited by its size,
+isn't large enough to hold the copy,
+the resulting character sequence is truncated.
+For the difference between the two functions, see RETURN VALUE.
 .PP
-A simple implementation of
-.BR strncpy ()
-might be:
+An implementation of these functions might be:
 .PP
 .in +4n
 .EX
 char *
-stpncpy(char *dest, const char *src, size_t n)
+stpncpy(char *restrict dst, const char *restrict src, size_t sz)
 {
-    char  *p
+    bzero(dst, sz);
+    return mempcpy(dst, src, strnlen(src, sz));
+}
 
-    bzero(dest, n);
-    p = memccpy(dest, src, \(aq\e0\(aq, n);
-    if (p == NULL)
-        return dest + n;
-
-    return p - 1;
+char *
+strncpy(char *restrict dst, const char *restrict src, size_t sz)
+{
+    stpncpy(dst, src, sz);
+    return dst;
 }
 .EE
 .in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
 .SH RETURN VALUE
+.TP
 .BR stpncpy ()
-returns a pointer to the terminating null byte
-in
-.IR dest ,
-or, if
-.I dest
-is not null-terminated,
-.IR dest + n
-(that is, a pointer to one-past-the-end of the array).
+returns a pointer to
+one after the last character in the destination character sequence.
+.TP
+.BR strncpy ()
+returns
+.IR dst .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -108,16 +86,74 @@ .SH ATTRIBUTES
 l l l.
 Interface	Attribute	Value
 T{
-.BR stpncpy ()
+.BR stpncpy (),
+.BR strncpy ()
 T}	Thread safety	MT-Safe
 .TE
 .hy
 .ad
 .sp 1
 .SH STANDARDS
-This function was added to POSIX.1-2008.
-Before that, it was a GNU extension.
-It first appeared in glibc 1.07 in 1993.
+.TP
+.BR stpncpy ()
+POSIX.1-2008.
+.\" Before that, it was a GNU extension.
+.\" It first appeared in glibc 1.07 in 1993.
+.TP
+.BR strncpy ()
+POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
+.SH CAVEATS
+The names of these functions are confusing.
+These functions produce a null-padded character sequence,
+not a string (see
+.BR string_copy (7)).
+.PP
+It's impossible to distinguish truncation by the result of the call,
+from a character sequence that just fits the destination buffer;
+truncation should be detected by
+comparing the length of the input string
+with the size of the destination buffer.
+.PP
+If you're going to use this function in chained calls,
+it would be useful to develop a similar function that accepts
+a pointer to the end (one after the last element) of the destination buffer
+instead of its size.
+.SH EXAMPLES
+.\" SRC BEGIN (stpncpy.c)
+.EX
+#include <err.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main(void)
+{
+    char    *p;
+    char    buf1[20];
+    char    buf2[20];
+    size_t  len;
+
+    if (sizeof(buf1) < strlen("Hello world!"))
+        warnx("stpncpy: truncating character sequence");
+    p = stpncpy(buf1, "Hello world!", sizeof(buf1));
+    len = p \- buf1;
+
+    printf("[len = %zu]: ", len);
+    printf("%.*s\en", (int) len, buf1);  // "Hello world!"
+
+    if (sizeof(buf2) < strlen("Hello world!"))
+        warnx("strncpy: truncating character sequence");
+    strncpy(buf2, "Hello world!", sizeof(buf2));
+    len = strnlen(buf2, sizeof(buf2));
+
+    printf("[len = %zu]: ", len);
+    printf("%.*s\en", (int) len, buf2);  // "Hello world!"
+
+    exit(EXIT_SUCCESS);
+}
+.EE
+.\" SRC END
 .SH SEE ALSO
-.BR strlcpy (3bsd)
-.BR wcpncpy (3)
+.BR wcpncpy (3),
+.BR string_copy (7)
diff --git a/man3/strncpy.3 b/man3/strncpy.3
index e2ffc683f..4710b0201 100644
--- a/man3/strncpy.3
+++ b/man3/strncpy.3
@@ -1,129 +1 @@
-.\" Copyright (C) 1993 David Metcalfe <david@prism.demon.co.uk>
-.\" Copyright (C) 2022 Alejandro Colomar <alx@kernel.org>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\" References consulted:
-.\"     Linux libc source code
-.\"     Lewine's _POSIX Programmer's Guide_ (O'Reilly & Associates, 1991)
-.\"     386BSD man pages
-.\" Modified Sat Jul 24 18:06:49 1993 by Rik Faith (faith@cs.unc.edu)
-.\" Modified Fri Aug 25 23:17:51 1995 by Andries Brouwer (aeb@cwi.nl)
-.\" Modified Wed Dec 18 00:47:18 1996 by Andries Brouwer (aeb@cwi.nl)
-.\" 2007-06-15, Marc Boyer <marc.boyer@enseeiht.fr> + mtk
-.\"     Improve discussion of strncpy().
-.\"
-.TH strncpy 3 (date) "Linux man-pages (unreleased)"
-.SH NAME
-strncpy \- copy a string into a fixed-length buffer and zero the rest of it
-.SH LIBRARY
-Standard C library
-.RI ( libc ", " \-lc )
-.SH SYNOPSIS
-.nf
-.B #include <string.h>
-.PP
-.BI "[[deprecated]] char *strncpy(char " dest "[restrict ." n ],
-.BI "                             const char " src "[restrict ." n "], \
-size_t " n );
-.fi
-.SH DESCRIPTION
-.BI Note: " This is not the function you want to use."
-For string copying with truncation, see
-.BR strlcpy (3bsd).
-For copying a string into a fixed-length buffer with zeroing of the rest,
-see
-.BR stpncpy (3).
-.PP
-.BR strncpy ()
-copies at most
-.I n
-bytes of
-.IR src ,
-and fills the rest of the
-.I dest
-buffer with null bytes.
-.BR Warning :
-If there is no null byte
-among the first
-.I n
-bytes of
-.IR src ,
-the string placed in
-.I dest
-will not be null-terminated.
-.PP
-A simple implementation of
-.BR strncpy ()
-might be:
-.PP
-.in +4n
-.EX
-char *
-strncpy(char *dest, const char *src, size_t n)
-{
-    bzero(dest, n);
-    memccpy(dest, src, \(aq\e0\(aq, n);
-
-    return dest;
-}
-.EE
-.in
-.PP
-The use of
-.BR strncpy ()
-is to copy a C string to a fixed-length buffer
-while ensuring that unused bytes in the destination buffer are zeroed out
-(perhaps to prevent information leaks if the buffer is to be
-written to media or transmitted to another process via an
-interprocess communication technique).
-But
-.BR stpncpy (3)
-is better for this purpose,
-since it detects truncation.
-See BUGS below.
-.SH RETURN VALUE
-The
-.BR strncpy ()
-function returns a pointer to
-the destination buffer
-.IR dest .
-.SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
-.ad l
-.nh
-.TS
-allbox;
-lbx lb lb
-l l l.
-Interface	Attribute	Value
-T{
-.BR strncpy ()
-T}	Thread safety	MT-Safe
-.TE
-.hy
-.ad
-.sp 1
-.SH STANDARDS
-POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH BUGS
-.BR strncpy ()
-has a misleading name.
-It doesn't produce a (null-terminated) string;
-and it should never be used for producing a string.
-.PP
-It can't detect truncation.
-It's probably better to explicitly call
-.BR bzero (3)
-and
-.BR memccpy (3),
-or
-.BR stpncpy (3)
-since they allow detecting truncation.
-.SH SEE ALSO
-.BR bzero (3),
-.BR memccpy (3),
-.BR stpncpy (3),
-.BR string (3),
-.BR wcsncpy (3)
+.so man3/stpncpy.3
-- 
2.39.0
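
The CAVEATS in the patch above say that truncation has to be detected by
comparing the length of the input string with the size of the destination
buffer.  A minimal sketch of that check follows; fill_field() and
field_len() are hypothetical helper names, not part of the patch, and the
sketch uses strncpy(3) so that it stays within ISO C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write src into the fixed-width field dst[sz]
 * as a null-padded character sequence.
 * Returns true if the character sequence was truncated.
 */
bool
fill_field(char *restrict dst, size_t sz, const char *restrict src)
{
    strncpy(dst, src, sz);    /* copies, then pads the rest with null bytes */
    return strlen(src) > sz;  /* the result of the call can't tell us this */
}

/*
 * Hypothetical helper: length of the character sequence stored in a
 * null-padded field.  A field with no null byte is completely full.
 */
size_t
field_len(const char *field, size_t sz)
{
    const char  *nul = memchr(field, '\0', sz);

    return nul ? (size_t) (nul - field) : sz;
}
```

Note that a source string whose length equals sz is not truncated, even
though the field then holds no null byte at all; that is why the result
must be treated as a character sequence and not as a string.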



* [PATCH v6 5/5] strncat.3: Rewrite to be consistent with string_copy.7.
  2022-12-15  0:26         ` [PATCH v5 0/5] Rewrite pages about " Alejandro Colomar
                             ` (4 preceding siblings ...)
  2022-12-19 21:02           ` [PATCH v6 4/5] stpncpy.3, strncpy.3: " Alejandro Colomar
@ 2022-12-19 21:02           ` Alejandro Colomar
  5 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-19 21:02 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

Cc: Martin Sebor <msebor@redhat.com>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: Jakub Wilk <jwilk@jwilk.net>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strncat.3 | 157 ++++++++++++++++++-------------------------------
 1 file changed, 57 insertions(+), 100 deletions(-)

diff --git a/man3/strncat.3 b/man3/strncat.3
index 6e4bf6d78..45fe0575c 100644
--- a/man3/strncat.3
+++ b/man3/strncat.3
@@ -1,10 +1,11 @@
+'\" t
 .\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
 .\"
 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
 .TH strncat 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
-strncat \- concatenate an unterminated string into a string
+strncat \- concatenate a null-padded character sequence into a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -12,54 +13,41 @@ .SH SYNOPSIS
 .nf
 .B #include <string.h>
 .PP
-.BI "char *strncat(char " dest "[restrict strlen(." dest ") + ." n " + 1],"
-.BI "              const char " src "[restrict ." n ],
-.BI "              size_t " n );
+.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
+.BI "               size_t " sz );
 .fi
 .SH DESCRIPTION
-.IR Note :
-This is probably not the function you want to use.
-For string concatenation with truncation, see
-.BR strlcat (3bsd).
-For copying or concatenating a string into a fixed-length buffer
-with zeroing of the rest, see
-.BR stpncpy (3).
-.PP
-.BR strncat ()
-appends at most
-.I n
-characters of
-.I src
-to the end of
+This function catenates the input character sequence
+contained in a null-padded fixed-width buffer,
+into a string at the buffer pointed to by
 .IR dst .
-It always terminates with a null character the string placed in
-.IR dest .
+The programmer is responsible for allocating a destination buffer large enough,
+that is,
+.IR "strlen(dst) + strnlen(src, sz) + 1" .
 .PP
-An implementation of
-.BR strncat ()
-might be:
+An implementation of this function might be:
 .PP
 .in +4n
 .EX
 char *
-strncat(char *dest, const char *src, size_t n)
+strncat(char *restrict dst, const char *restrict src, size_t sz)
 {
-    char    *cat;
-    size_t  len;
+    size_t  len;
+    char    *p;
 
-    cat = dest + strlen(dest);
-    len = strnlen(src, n);
-    memcpy(cat, src, len);
-    cat[len] = \(aq\e0\(aq;
+    len = strnlen(src, sz);
+    p = dst + strlen(dst);
+    p = mempcpy(p, src, len);
+    *p = \(aq\e0\(aq;
 
-    return dest;
+    return dst;
 }
 .EE
 .in
 .SH RETURN VALUE
 .BR strncat ()
-returns a pointer to the resulting string
-.IR dest .
+returns
+.IR dst .
 .SH ATTRIBUTES
 For an explanation of the terms used in this section, see
 .BR attributes (7).
@@ -79,65 +67,25 @@ .SH ATTRIBUTES
 .sp 1
 .SH STANDARDS
 POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
-.SH NOTES
-.SS ustr2stpe()
-You may want to write your own function similar to
-.BR strncpy (),
-with the following improvements:
-.IP \(bu 3
-Copy, instead of concatenating.
-There's no equivalent of
-.BR strncat ()
-that copies instead of concatenating.
-.IP \(bu
-Allow chaining the function,
-by returning a suitable pointer.
-Copy chaining is faster than concatenating.
-.IP \(bu
-Don't check for null characters in the middle of the unterminated string.
-If the string is terminated, this function should not be used.
-If the string is unterminated, it is unnecessary.
-.IP \(bu
-A name that tells what it does:
-Copy from an
-.IR u nterminated
-.IR str ing
-to a
-.IR st ring,
-and return a
-.IR p ointer
-to its end.
-.PP
-.in +4n
-.EX
-/* This code is in the public domain.
- *
- * char *ustr2stp(char dst[restrict .n+1],
- *                const char src[restrict .n],
- *                size_t len);
- */
-char *
-ustr2stp(char *restrict dst, const char *restrict src, size_t len)
-{
-    memcpy(dst, src, len);
-    dst[len] = \(aq\e0\(aq;
-
-    return dst + len;
-}
-.EE
-.in
 .SH CAVEATS
-This function doesn't know the size of the destination buffer,
-so it can overrun the buffer if the programmer wasn't careful enough.
-.SH BUGS
-.BR strncat (3)
-has a misleading name;
-it has no relationship with
+The name of this function is confusing.
+This function has no relation to
 .BR strncpy (3).
+.PP
+If the destination buffer is not large enough,
+the behavior is undefined.
+See
+.B _FORTIFY_SOURCE
+in
+.BR feature_test_macros (7).
+.SH BUGS
+This function can be very inefficient.
+Read about
+.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
+Shlemiel the painter
+.UE .
 .SH EXAMPLES
-The following program creates a string
-from a concatenation of unterminated strings.
-.\" SRC BEGIN (strncpy.c)
+.\" SRC BEGIN (strncat.c)
 .EX
 #include <stdio.h>
 #include <stdlib.h>
@@ -148,24 +96,33 @@ .SH EXAMPLES
 int
 main(void)
 {
-    char pre[4] = "pre.";
-    char *post = ".post";
-    char *src = "some_long_body.post";
-    char dest[100];
+    size_t  maxsize;
 
-    dest[0] = \(aq\e0\(aq;
+    // Null-padded fixed-width character sequences
+    char    pre[4] = "pre.";
+    char    new_post[50] = ".foo.bar";
+
+    // Strings
+    char    post[] = ".post";
+    char    src[] = "some_long_body.post";
+    char    *dest;
+
+    maxsize = nitems(pre) + strlen(src) \- strlen(post) +
+              nitems(new_post) + 1;
+    dest = malloc(sizeof(*dest) * maxsize);
+
+    dest[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
     strncat(dest, pre, nitems(pre));
     strncat(dest, src, strlen(src) \- strlen(post));
+    strncat(dest, new_post, nitems(new_post));
 
-    puts(dest);  // "pre.some_long_body"
+    puts(dest);  // "pre.some_long_body.foo.bar"
+    free(dest);
     exit(EXIT_SUCCESS);
 }
 .EE
 .\" SRC END
 .in
 .SH SEE ALSO
-.BR memccpy (3),
-.BR memcpy (3),
-.BR mempcpy (3),
-.BR strcpy (3),
-.BR string (3)
+.BR string (3),
+.BR string_copy (7)
-- 
2.39.0
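
The example in the patch above has to write dest[0] = '\0' before the first
strncat(3) call, because there is no copying counterpart to that function.
A minimal sketch of such a counterpart is shown here; the name
zfield_to_str() is hypothetical and is not claimed to match the
zustr2stp() that string_copy.7 documents.  It assumes the destination has
room for sz + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copying counterpart to strncat(3):
 * copy the character sequence held in the null-padded field src[sz]
 * into dst as a string, and return a pointer to the terminating
 * null byte so that calls can be chained.
 */
char *
zfield_to_str(char *restrict dst, const char *restrict src, size_t sz)
{
    const char  *nul = memchr(src, '\0', sz);
    size_t      len = nul ? (size_t) (nul - src) : sz;

    memcpy(dst, src, len);  /* padding null bytes are not copied */
    dst[len] = '\0';

    return dst + len;
}
```

Chaining it over a full 4-byte field holding "pre." and an 8-byte field
holding ".foo" plus padding yields the string "pre..foo", with no need to
initialize the destination first and no rescanning of it between calls.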



* Re: [PATCH v6 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-19 21:02           ` [PATCH v6 1/5] string_copy.7: Add page to document all " Alejandro Colomar
@ 2022-12-20 15:00             ` Stefan Puiu
  2022-12-20 15:03               ` Alejandro Colomar
  2023-01-20  3:43             ` Eric Biggers
  1 sibling, 1 reply; 53+ messages in thread
From: Stefan Puiu @ 2022-12-20 15:00 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski

Hi,

Noticed a typo below

On Mon, Dec 19, 2022 at 11:02 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> This is an opportunity to use consistent language across the
> documentation for all string-copying functions.
>
> It is also easier to show the similarities and differences between all
> of the functions, so that a reader can use this page to know which
> function is needed for a given task.
>
> Alternative functions not provided by libc have been given in the same
> page, with reference implementations.
>
> Cc: Martin Sebor <msebor@redhat.com>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
> Cc: Jakub Wilk <jwilk@jwilk.net>
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Iker Pedrosa <ipedrosa@redhat.com>
> Cc: Andrew Pinski <pinskia@gmail.com>
> Cc: Stefan Puiu <stefan.puiu@gmail.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>  man7/string_copy.7 | 855 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 855 insertions(+)
>  create mode 100644 man7/string_copy.7
>
> diff --git a/man7/string_copy.7 b/man7/string_copy.7
> new file mode 100644
> index 000000000..a32b93c01
> --- /dev/null
> +++ b/man7/string_copy.7
> @@ -0,0 +1,855 @@
> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
> +.\"
> +.\" SPDX-License-Identifier: BSD-3-Clause
> +.\"
> +.TH string_copy 7 (date) "Linux man-pages (unreleased)"
> +.\" ----- NAME :: -----------------------------------------------------/
> +.SH NAME
> +stpcpy,
> +strcpy, strcat,
> +stpecpy, stpecpyx,
> +strlcpy, strlcat,
> +stpncpy,
> +strncpy,
> +zustr2ustp, zustr2stp,
> +strncat,
> +ustpcpy, ustr2stp
> +\- copy strings and character sequences
> +.\" ----- SYNOPSIS :: -------------------------------------------------/
> +.SH SYNOPSIS
> +.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
> +.SS Strings
> +.nf
> +// Chain-copy a string.
> +.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
> +.PP
> +// Copy/catenate a string.
> +.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
> +.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
> +.PP
> +// Chain-copy a string with truncation.
> +.BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
> +.PP
> +// Chain-copy a string with truncation and SIGSEGV on UB.
> +.BI "char *stpecpyx(char *" dst ", char " end "[0], const char *restrict " src );
> +.PP
> +// Copy/catenate a string with truncation and SIGSEGV on UB.
> +.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.BI "size_t strlcat(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.fi
> +.\" ----- SYNOPSIS :: Null-padded character sequences --------/
> +.SS Null-padded character sequences
> +.nf
> +// Zero a fixed-width buffer, and
> +// copy a string into a character sequence with truncation.
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.PP
> +// Zero a fixed-width buffer, and
> +// copy a string into a character sequence with truncation.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +const char *restrict " src ,
> +.BI "               size_t " sz );
> +.PP
> +// Chain-copy a null-padded character sequence into a character sequence.
> +.BI "char *zustr2ustp(char *restrict " dst ", \
> +const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.PP
> +// Chain-copy a null-padded character sequence into a string.
> +.BI "char *zustr2stp(char *restrict " dst ", \
> +const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.PP
> +// Catenate a null-padded character sequence into a string.
> +.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
> +.BI "               size_t " sz );
> +.fi
> +.\" ----- SYNOPSIS :: Measured character sequences --------------------/
> +.SS Measured character sequences
> +.nf
> +// Chain-copy a measured character sequence.
> +.BI "char *ustpcpy(char *restrict " dst ", \
> +const char " src "[restrict ." len ],
> +.BI "               size_t " len );
> +.PP
> +// Chain-copy a measured character sequence into a string.
> +.BI "char *ustr2stp(char *restrict " dst ", \
> +const char " src "[restrict ." len ],
> +.BI "               size_t " len );
> +.fi
> +.SH DESCRIPTION
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
> +.SS Terms (and abbreviations)
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
> +.TP
> +.IR "string " ( str )
> +is a sequence of zero or more non-null characters followed by a null byte.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
> +.TP
> +.I character sequence
> +is a sequence of zero or more non-null characters.
> +A program should never usa a character sequence where a string is required.

Here I think you want s/usa/use above.

Thanks,
Stefan.

> +However, with appropriate care,
> +a string can be used in the place of a character sequence.
> +.RS
> +.TP
> +.IR "null-padded character sequence " ( zustr )
> +Character sequences can be contained in fixed-width buffers,
> +which contain padding null bytes after the character sequence,
> +to fill the rest of the buffer
> +without affecting the character sequence;
> +however, those padding null bytes are not part of the character sequence.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
> +.TP
> +.IR "measured character sequence " ( ustr )
> +Character sequence delimited by its length.
> +It may be a slice of a larger character sequence,
> +or even of a string.
> +.RE
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
> +.TP
> +.IR "length " ( len )
> +is the number of non-null characters in a string or character sequence.
> +It is the return value of
> +.I strlen(str)
> +and of
> +.IR "strnlen(ustr, sz)" .
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
> +.TP
> +.IR "size " ( sz )
> +refers to the entire buffer
> +where the string or character sequence is contained.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
> +.TP
> +.I end
> +is the name of a pointer to one past the last element of a buffer.
> +It is equivalent to
> +.IR &str[sz] .
> +It is used as a sentinel value,
> +to be able to truncate strings or character sequences
> +instead of overrunning the containing buffer.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
> +.TP
> +.I copy
> +This term is used when
> +the writing starts at the first element pointed to by
> +.IR dst .
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
> +.TP
> +.I catenate
> +This term is used when
> +a function first finds the terminating null byte in
> +.IR dst ,
> +and then starts writing at that position.
> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
> +.TP
> +.I chain
> +This term is used when
> +it's the programmer who provides
> +a pointer to the terminating null byte in the string
> +.I dst
> +(or one after the last character in a character sequence),
> +and the function starts writing at that location.
> +The function returns
> +a pointer to the new location of the terminating null byte
> +(or one after the last character in a character sequence)
> +after the call,
> +so that the programmer can use it to chain such calls.
> +.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
> +.SS Copy, catenate, and chain-copy
> +Originally,
> +there was a distinction between functions that copy and those that catenate.
> +However, newer functions that copy while allowing chaining
> +cover both use cases with a single API.
> +They are also algorithmically faster,
> +since they don't need to search for
> +the terminating null byte of the existing string.
> +However, functions that catenate have a much simpler use,
> +so if performance is not important,
> +it can make sense to use them for improving readability.
> +.PP
> +The pointer returned by functions that allow chaining
> +is a byproduct of the copy operation,
> +so it has no performance costs.
> +Functions that return such a pointer,
> +and thus can be chained,
> +have names of the form
> +.RB * stp *(),
> +since it's common to name the pointer just
> +.IR p .
> +.PP
> +Chain-copying functions that truncate
> +should accept a pointer to the end of the destination buffer,
> +and have names of the form
> +.RB * stpe *().
> +This allows not having to recalculate the remaining size after each call.
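
To illustrate the point about passing the end pointer, here is a minimal
sketch of a truncating chain-copy.  The name chain_copy_trunc() and its
exact semantics are assumptions made for this illustration; it is not the
stpecpy() reference implementation from the page.  On truncation it
returns end, and a call with dst == end does nothing, so a chain needs a
single check at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical truncating chain-copy (illustration only).
 * dst points where writing starts; end points one past the buffer.
 * Returns a pointer to the terminating null byte, or end if the
 * string was truncated (or if a previous call already truncated).
 */
char *
chain_copy_trunc(char *dst, char *end, const char *restrict src)
{
    size_t  room, len;

    assert(dst != NULL && end != NULL && dst <= end);

    if (dst == end)
        return end;                 /* an earlier call truncated */

    room = (size_t) (end - dst) - 1;  /* keep one byte for the null */
    len = strlen(src);

    if (len > room) {
        memcpy(dst, src, room);
        dst[room] = '\0';
        return end;                 /* report truncation */
    }

    memcpy(dst, src, len + 1);
    return dst + len;
}
```

Because end never changes, the caller passes the same two pointers to every
call in the chain and compares the final result against end once, instead
of recomputing a remaining size after each call.
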
> +.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
> +.SS Truncate or not?
> +The first thing to note is that programmers should be careful with buffers,
> +so they always have the correct size,
> +and truncation is not necessary.
> +.PP
> +In most cases,
> +truncation is not desired,
> +and it is simpler to just do the copy.
> +Simpler code is safer code.
> +Programming against programming mistakes by adding more code
> +just adds more points where mistakes can be made.
> +.PP
> +Nowadays,
> +compilers can detect most programmer errors with features like
> +compiler warnings,
> +static analyzers, and
> +.BR \%_FORTIFY_SOURCE
> +(see
> +.BR ftm (7)).
> +Keeping the code simple
> +helps these overflow-detection features be more precise.
> +.PP
> +When validating user input,
> +however,
> +it makes sense to truncate.
> +Remember to check the return value of such function calls.
> +.PP
> +Functions that truncate:
> +.IP \(bu 3
> +.BR stpecpy (3)
> +is the most efficient string copy function that performs truncation.
> +It requires checking for truncation only once, after all chained calls.
> +.IP \(bu
> +.BR stpecpyx (3)
> +is a variant of
> +.BR stpecpy (3)
> +that consumes the entire source string,
> +to catch bugs in the program
> +by forcing a segmentation fault (as
> +.BR strlcpy (3bsd)
> +and
> +.BR strlcat (3bsd)
> +do).
> +.IP \(bu
> +.BR strlcpy (3bsd)
> +and
> +.BR strlcat (3bsd)
> +are designed to crash if the input string is invalid
> +(doesn't contain a terminating null byte).
> +.IP \(bu
> +.BR stpncpy (3)
> +and
> +.BR strncpy (3)
> +also truncate, but they don't write strings,
> +but rather null-padded character sequences.
> +.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
> +.SS Null-padded character sequences
> +For historic reasons,
> +some standard APIs,
> +such as
> +.BR utmpx (5),
> +use null-padded character sequences in fixed-width buffers.
> +To interface with them,
> +specialized functions need to be used.
> +.PP
> +To copy strings into them, use
> +.BR stpncpy (3).
> +.PP
> +To copy from an unterminated string within a fixed-width buffer into a string,
> +ignoring any trailing null bytes in the source fixed-width buffer,
> +you should use
> +.BR zustr2stp (3)
> +or
> +.BR strncat (3).
> +.PP
> +To copy from an unterminated string within a fixed-width buffer
> +into a character sequence,
> +ignoring any trailing null bytes in the source fixed-width buffer,
> +you should use
> +.BR zustr2ustp (3).
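
A minimal sketch of the round trip described in this subsection, assuming a
hypothetical struct record with an 8-byte field in the style of utmpx(5).
The names record_set_name() and record_get_name() are illustrative only and
not part of the page; record_get_name() is meant to have the same effect as
the page's zustr2stp().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a fixed-width, null-padded field,
 * in the style of the ut_user member of struct utmpx. */
struct record {
    char name[8];  /* Not necessarily null-terminated. */
};

/* Store a string in the field.  Returns 0 on success,
 * or -1 if the name had to be truncated. */
static int
record_set_name(struct record *r, const char *name)
{
    assert(r != NULL && name != NULL);

    stpncpy(r->name, name, sizeof(r->name));  /* Copies and zero-pads. */

    /* The result of the call can't tell truncation from an exact fit,
     * so compare the input length with the field size. */
    return (strlen(name) > sizeof(r->name)) ? -1 : 0;
}

/* Read the field back into a string.  dst needs room for
 * sizeof(r->name) + 1 bytes.  Returns the length. */
static size_t
record_get_name(char *dst, const struct record *r)
{
    size_t len;

    assert(dst != NULL && r != NULL);

    len = strnlen(r->name, sizeof(r->name));  /* Ignore the padding. */
    memcpy(dst, r->name, len);
    dst[len] = '\0';

    return len;
}
```

The field itself never needs a terminating null byte; only the copy read
back through record_get_name() is a string.
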
> +.\" ----- DESCRIPTION :: Measured character sequences -----------------/
> +.SS Measured character sequences
> +The simplest character sequence copying function is
> +.BR mempcpy (3).
> +It requires always knowing the length of your character sequences,
> +for which structures can be used.
> +It makes the code much faster,
> +since you always know the length of your character sequences,
> +and can do the minimal copies and length measurements.
> +.BR mempcpy (3)
> +copies character sequences,
> +so you need to explicitly set the terminating null byte if you need a string.
> +.PP
> +However,
> +for keeping type safety,
> +it's good to add a wrapper that uses
> +.I char\~*
> +instead of
> +.IR void\~* :
> +.BR ustpcpy (3).
> +.PP
> +In programs that make considerable use of strings or character sequences,
> +and need the best performance,
> +using overlapping character sequences can make a big difference.
> +It allows holding subsequences of a larger character sequence,
> +while not duplicating memory
> +nor using time to do a copy.
> +.PP
> +However, this is delicate,
> +since it requires using character sequences.
> +C library APIs use strings,
> +so programs that use character sequences
> +will have to take care of differentiating strings from character sequences.
> +.PP
> +To copy a measured character sequence, use
> +.BR ustpcpy (3).
> +.PP
> +To copy a measured character sequence into a string, use
> +.BR ustr2stp (3).
> +.PP
> +Because these functions ask for the length,
> +and a string is by nature composed of a character sequence of the same length
> +plus a terminating null byte,
> +a string is also accepted as input.
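
A rough sketch of the "structures" idea from this subsection, under the
assumption that a measured character sequence is kept as a pointer plus a
length.  struct ustr, ustr_slice() and ustr_to_str() are hypothetical names;
ustr_to_str() is intended to behave like the page's ustr2stp(), written with
memcpy(3) so it doesn't depend on the GNU-specific mempcpy(3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical measured character sequence: a pointer plus a length.
 * It owns no memory and needs no terminating null byte. */
struct ustr {
    const char *s;
    size_t      len;
};

/* Slice [start, start + len) of a larger sequence, without copying. */
static struct ustr
ustr_slice(struct ustr u, size_t start, size_t len)
{
    assert(start <= u.len && len <= u.len - start);

    return (struct ustr){ .s = u.s + start, .len = len };
}

/* Copy a measured sequence into a string.  dst needs room for
 * u.len + 1 bytes.  Returns a pointer to the terminating null byte,
 * suitable for chaining. */
static char *
ustr_to_str(char *dst, struct ustr u)
{
    memcpy(dst, u.s, u.len);
    dst[u.len] = '\0';

    return dst + u.len;
}
```

Slices share the memory of the original sequence, so a terminating null
byte is only written at the point where a string is actually needed.
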
> +.\" ----- DESCRIPTION :: String vs character sequence -----------------/
> +.SS String vs character sequence
> +Some functions only operate on strings.
> +Those require that the input
> +.I src
> +is a string,
> +and guarantee an output string
> +(even when truncation occurs).
> +Functions that catenate
> +also require that
> +.I dst
> +holds a string before the call.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR stpcpy (3)
> +.IP \(bu
> +.BR strcpy "(3), \c"
> +.BR strcat (3)
> +.IP \(bu
> +.BR stpecpy "(3), \c"
> +.BR stpecpyx (3)
> +.IP \(bu
> +.BR strlcpy "(3bsd), \c"
> +.BR strlcat (3bsd)
> +.PD
> +.PP
> +Other functions require an input string,
> +but create a character sequence as output.
> +These functions have confusing names,
> +and have a long history of misuse.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR stpncpy (3)
> +.IP \(bu
> +.BR strncpy (3)
> +.PD
> +.PP
> +Other functions operate on an input character sequence,
> +and create an output string.
> +Functions that catenate
> +also require that
> +.I dst
> +holds a string before the call.
> +.BR strncat (3)
> +has an even more misleading name than the functions above.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR zustr2stp (3)
> +.IP \(bu
> +.BR strncat (3)
> +.IP \(bu
> +.BR ustr2stp (3)
> +.PD
> +.PP
> +Other functions operate on an input character sequence
> +to create an output character sequence.
> +List of functions:
> +.IP \(bu 3
> +.PD 0
> +.BR ustpcpy (3)
> +.IP \(bu
> +.BR zustr2ustp (3)
> +.PD
> +.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
> +.SS Functions
> +.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
> +.TP
> +.BR stpcpy (3)
> +This function copies the input string into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
> +.TP
> +.BR strcpy (3)
> +.TQ
> +.BR strcat (3)
> +These functions copy and catenate the input string into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +The return value is useless.
> +.IP
> +.BR stpcpy (3)
> +is a faster alternative to these functions.
> +.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
> +.TP
> +.BR stpecpy (3)
> +.TQ
> +.BR stpecpyx (3)
> +These functions copy the input string into a destination string.
> +If the destination buffer,
> +limited by a pointer to its end,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +They return a pointer suitable for chaining.
> +Truncation needs to be detected only once after the last chained call.
> +.BR stpecpyx (3)
> +has identical semantics to
> +.BR stpecpy (3),
> +except that it forces a SIGSEGV if the
> +.I src
> +pointer is not a string.
> +.IP
> +These functions are not provided by any library;
> +see EXAMPLES for a reference implementation.
> +.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
> +.TP
> +.BR strlcpy (3bsd)
> +.TQ
> +.BR strlcat (3bsd)
> +These functions copy and catenate the input string into a destination string.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +They return the length of the total string they tried to create.
> +These functions force a SIGSEGV if the
> +.I src
> +pointer is not a string.
> +.IP
> +.BR stpecpyx (3)
> +is a faster alternative to these functions.
> +.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
> +.TP
> +.BR stpncpy (3)
> +This function copies the input string into
> +a destination null-padded character sequence in a fixed-width buffer.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting character sequence is truncated.
> +Since it creates a character sequence,
> +it doesn't need to write a terminating null byte.
> +It's impossible to distinguish truncation by the result of the call,
> +from a character sequence that just fits the destination buffer;
> +truncation should be detected by
> +comparing the length of the input string
> +with the size of the destination buffer.
> +.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
> +.TP
> +.BR strncpy (3)
> +This function is identical to
> +.BR stpncpy (3)
> +except for the useless return value.
> +.IP
> +.BR stpncpy (3)
> +is a more useful alternative to this function.
> +.\" ----- DESCRIPTION :: Functions :: zustr2ustp(3) --------------------/
> +.TP
> +.BR zustr2ustp (3)
> +This function copies the input character sequence
> +contained in a null-padded fixed-width buffer,
> +into a destination character sequence.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.IP
> +A truncating version of this function doesn't exist,
> +since the size of the original character sequence is always known,
> +so it wouldn't be very useful.
> +.IP
> +This function is not provided by any library;
> +see EXAMPLES for a reference implementation.
> +.\" ----- DESCRIPTION :: Functions :: zustr2stp(3) --------------------/
> +.TP
> +.BR zustr2stp (3)
> +This function copies the input character sequence
> +contained in a null-padded fixed-width buffer,
> +into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.IP
> +A truncating version of this function doesn't exist,
> +since the size of the original character sequence is always known,
> +so it wouldn't be very useful.
> +.IP
> +This function is not provided by any library;
> +see EXAMPLES for a reference implementation.
> +.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
> +.TP
> +.BR strncat (3)
> +Do not confuse this function with
> +.BR strncpy (3);
> +they are not related at all.
> +.IP
> +This function catenates the input character sequence
> +contained in a null-padded fixed-width buffer,
> +into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +The return value is useless.
> +.IP
> +.BR zustr2stp (3)
> +is a faster alternative to this function.
> +.\" ----- DESCRIPTION :: Functions :: ustpcpy(3) ----------------------/
> +.TP
> +.BR ustpcpy (3)
> +This function copies the input character sequence,
> +limited by its length,
> +into a destination character sequence.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
> +.TP
> +.BR ustr2stp (3)
> +This function copies the input character sequence,
> +limited by its length,
> +into a destination string.
> +The programmer is responsible for allocating a buffer large enough.
> +It returns a pointer suitable for chaining.
> +.\" ----- RETURN VALUE :: ---------------------------------------------/
> +.SH RETURN VALUE
> +The following functions return
> +a pointer to the terminating null byte in the destination string.
> +.IP \(bu 3
> +.PD 0
> +.BR stpcpy (3)
> +.IP \(bu
> +.BR ustr2stp (3)
> +.IP \(bu
> +.BR zustr2stp (3)
> +.PD
> +.PP
> +The following functions return
> +a pointer to the terminating null byte in the destination string,
> +except when truncation occurs;
> +if truncation occurs,
> +they return a pointer to the end of the destination buffer.
> +.IP \(bu 3
> +.BR stpecpy (3),
> +.BR stpecpyx (3)
> +.PP
> +The following function returns
> +a pointer to one after the last character
> +in the destination character sequence;
> +if truncation occurs,
> +that pointer is equivalent to
> +a pointer to the end of the destination buffer.
> +.IP \(bu 3
> +.BR stpncpy (3)
> +.PP
> +The following functions return
> +a pointer to one after the last character
> +in the destination character sequence.
> +.IP \(bu 3
> +.PD 0
> +.BR zustr2ustp (3)
> +.IP \(bu
> +.BR ustpcpy (3)
> +.PD
> +.PP
> +The following functions return
> +the length of the total string that they tried to create
> +(as if truncation didn't occur).
> +.IP \(bu 3
> +.BR strlcpy (3bsd),
> +.BR strlcat (3bsd)
> +.PP
> +The following functions return the
> +.I dst
> +pointer,
> +which is useless.
> +.IP \(bu 3
> +.PD 0
> +.BR strcpy (3),
> +.BR strcat (3)
> +.IP \(bu
> +.BR strncpy (3)
> +.IP \(bu
> +.BR strncat (3)
> +.PD
> +.\" ----- NOTES :: strscpy(9) -----------------------------------------/
> +.SH NOTES
> +The Linux kernel has an internal function for copying strings,
> +which is similar to
> +.BR stpecpy (3),
> +except that it can't be chained:
> +.TP
> +.BR strscpy (9)
> +This function copies the input string into a destination string.
> +If the destination buffer,
> +limited by its size,
> +isn't large enough to hold the copy,
> +the resulting string is truncated
> +(but it is guaranteed to be null-terminated).
> +It returns the length of the destination string, or
> +.B \-E2BIG
> +on truncation.
> +.IP
> +.BR stpecpy (3)
> +is a simpler and faster alternative to this function.
> +.RE
> +.\" ----- CAVEATS :: --------------------------------------------------/
> +.SH CAVEATS
> +Don't mix chain calls to truncating and non-truncating functions.
> +It is conceptually wrong
> +unless you know that the first part of a copy will always fit.
> +Anyway, the performance difference will probably be negligible,
> +so it will probably be more clear if you use consistent semantics:
> +either truncating or non-truncating.
> +Calling a non-truncating function after a truncating one is necessarily wrong.
> +.\" ----- BUGS :: -----------------------------------------------------/
> +.SH BUGS
> +All catenation functions share the same performance problem:
> +.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
> +Shlemiel the painter
> +.UE .
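
To make the cost concrete, here is a small sketch contrasting catenation
with chaining.  join_strcat() and join_stpcpy() are hypothetical helpers,
and both assume the caller has made dst large enough for the result.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Join n words with strcat(3).  Every call first walks dst from the
 * start to find its terminating null byte, so the total work grows
 * quadratically with the length of the result. */
static void
join_strcat(char *dst, const char *const words[], size_t n)
{
    assert(dst != NULL);

    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, words[i]);
}

/* Join n words with stpcpy(3).  The returned pointer already marks the
 * end of the string, so nothing is scanned twice.  Returns the length. */
static size_t
join_stpcpy(char *dst, const char *const words[], size_t n)
{
    char *p = dst;

    assert(dst != NULL);

    *p = '\0';
    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, words[i]);

    return p - dst;
}
```

Both produce the same string; the difference only shows up as the number
and size of the pieces grows.
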
> +.\" ----- EXAMPLES :: -------------------------------------------------/
> +.SH EXAMPLES
> +The following are examples of correct use of each of these functions.
> +.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
> +.TP
> +.BR stpcpy (3)
> +.EX
> +p = buf;
> +p = stpcpy(p, "Hello ");
> +p = stpcpy(p, "world");
> +p = stpcpy(p, "!");
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
> +.TP
> +.BR strcpy (3)
> +.TQ
> +.BR strcat (3)
> +.EX
> +strcpy(buf, "Hello ");
> +strcat(buf, "world");
> +strcat(buf, "!");
> +len = strlen(buf);
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
> +.TP
> +.BR stpecpy (3)
> +.TQ
> +.BR stpecpyx (3)
> +.EX
> +end = buf + sizeof(buf);
> +p = buf;
> +p = stpecpy(p, end, "Hello ");
> +p = stpecpy(p, end, "world");
> +p = stpecpy(p, end, "!");
> +if (p == end) {
> +    p\-\-;
> +    goto toolong;
> +}
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
> +.TP
> +.BR strlcpy (3bsd)
> +.TQ
> +.BR strlcat (3bsd)
> +.EX
> +if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
> +    goto toolong;
> +if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
> +    goto toolong;
> +len = strlcat(buf, "!", sizeof(buf));
> +if (len >= sizeof(buf))
> +    goto toolong;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
> +.TP
> +.BR strscpy (9)
> +.EX
> +len = strscpy(buf, "Hello world!", sizeof(buf));
> +if (len == \-E2BIG)
> +    goto toolong;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
> +.TP
> +.BR stpncpy (3)
> +.EX
> +p = stpncpy(buf, "Hello world!", sizeof(buf));
> +if (sizeof(buf) < strlen("Hello world!"))
> +    goto toolong;
> +len = p \- buf;
> +for (size_t i = 0; i < sizeof(buf); i++)
> +    putchar(buf[i]);
> +.EE
> +.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
> +.TP
> +.BR strncpy (3)
> +.EX
> +strncpy(buf, "Hello world!", sizeof(buf));
> +if (sizeof(buf) < strlen("Hello world!"))
> +    goto toolong;
> +len = strnlen(buf, sizeof(buf));
> +for (size_t i = 0; i < sizeof(buf); i++)
> +    putchar(buf[i]);
> +.EE
> +.\" ----- EXAMPLES :: zustr2ustp(3) -----------------------------------/
> +.TP
> +.BR zustr2ustp (3)
> +.EX
> +p = buf;
> +p = zustr2ustp(p, "Hello ", 6);
> +p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
> +p = zustr2ustp(p, "!", 1);
> +len = p \- buf;
> +printf("%.*s\en", (int) len, buf);
> +.EE
> +.\" ----- EXAMPLES :: zustr2stp(3) ------------------------------------/
> +.TP
> +.BR zustr2stp (3)
> +.EX
> +p = buf;
> +p = zustr2stp(p, "Hello ", 6);
> +p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
> +p = zustr2stp(p, "!", 1);
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
> +.TP
> +.BR strncat (3)
> +.EX
> +buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
> +strncat(buf, "Hello ", 6);
> +strncat(buf, "world", 42);  // Padding null bytes ignored.
> +strncat(buf, "!", 1);
> +len = strlen(buf);
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: ustpcpy(3) --------------------------------------/
> +.TP
> +.BR ustpcpy (3)
> +.EX
> +p = buf;
> +p = ustpcpy(p, "Hello ", 6);
> +p = ustpcpy(p, "world", 5);
> +p = ustpcpy(p, "!", 1);
> +len = p \- buf;
> +printf("%.*s\en", (int) len, buf);
> +.EE
> +.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
> +.TP
> +.BR ustr2stp (3)
> +.EX
> +p = buf;
> +p = ustr2stp(p, "Hello ", 6);
> +p = ustr2stp(p, "world", 5);
> +p = ustr2stp(p, "!", 1);
> +len = p \- buf;
> +puts(buf);
> +.EE
> +.\" ----- EXAMPLES :: Implementations :: ------------------------------/
> +.SS Implementations
> +Here are reference implementations for functions not provided by libc.
> +.PP
> +.in +4n
> +.EX
> +/* This code is in the public domain. */
> +
> +.\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
> +char *
> +.IR stpecpy "(char *dst, char end[0], const char *restrict src)"
> +{
> +    char *p;
> +
> +    if (dst == end)
> +        return end;
> +
> +    p = memccpy(dst, src, \(aq\e0\(aq, end \- dst);
> +    if (p != NULL)
> +        return p \- 1;
> +
> +    /* truncation detected */
> +    end[\-1] = \(aq\e0\(aq;
> +    return end;
> +}
> +
> +.\" ----- EXAMPLES :: Implementations :: stpecpyx(3) ------------------/
> +char *
> +.IR stpecpyx "(char *dst, char end[0], const char *restrict src)"
> +{
> +    if (src[strlen(src)] != \(aq\e0\(aq)
> +        raise(SIGSEGV);
> +
> +    return stpecpy(dst, end, src);
> +}
> +
> +.\" ----- EXAMPLES :: Implementations :: zustr2ustp(3) ----------------/
> +char *
> +.IR zustr2ustp "(char *restrict dst, const char *restrict src, size_t sz)"
> +{
> +    return ustpcpy(dst, src, strnlen(src, sz));
> +}
> +
> +.\" ----- EXAMPLES :: Implementations :: zustr2stp(3) -----------------/
> +char *
> +.IR zustr2stp "(char *restrict dst, const char *restrict src, size_t sz)"
> +{
> +    char  *p;
> +
> +    p = zustr2ustp(dst, src, sz);
> +    *p = \(aq\e0\(aq;
> +
> +    return p;
> +}
> +
> +.\" ----- EXAMPLES :: Implementations :: ustpcpy(3) -------------------/
> +char *
> +.IR ustpcpy "(char *restrict dst, const char *restrict src, size_t len)"
> +{
> +    return mempcpy(dst, src, len);
> +}
> +
> +.\" ----- EXAMPLES :: Implementations :: ustr2stp(3) ------------------/
> +char *
> +.IR ustr2stp "(char *restrict dst, const char *restrict src, size_t len)"
> +{
> +    char  *p;
> +
> +    p = ustpcpy(dst, src, len);
> +    *p = \(aq\e0\(aq;
> +
> +    return p;
> +}
> +.EE
> +.in
> +.\" ----- SEE ALSO :: -------------------------------------------------/
> +.SH SEE ALSO
> +.BR bzero (3),
> +.BR memcpy (3),
> +.BR memccpy (3),
> +.BR mempcpy (3),
> +.BR stpcpy (3),
> +.BR strlcpy (3bsd),
> +.BR strncat (3),
> +.BR stpncpy (3),
> +.BR string (3)
> --
> 2.39.0
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-20 15:00             ` Stefan Puiu
@ 2022-12-20 15:03               ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2022-12-20 15:03 UTC (permalink / raw)
  To: Stefan Puiu
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski


[-- Attachment #1.1: Type: text/plain, Size: 31116 bytes --]

Hi Stefan,

On 12/20/22 16:00, Stefan Puiu wrote:
> Hi,
> 
> Noticed a typo below

Typo fixed.  Thanks,

Alex

<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=3d395282860f7b86f65c6735351f24b52c486718>

> 
> On Mon, Dec 19, 2022 at 11:02 PM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> This is an opportunity to use consistent language across the
>> documentation for all string-copying functions.
>>
>> It is also easier to show the similarities and differences between all
>> of the functions, so that a reader can use this page to know which
>> function is needed for a given task.
>>
>> Alternative functions not provided by libc have been given in the same
>> page, with reference implementations.
>>
>> Cc: Martin Sebor <msebor@redhat.com>
>> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
>> Cc: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
>> Cc: Jakub Wilk <jwilk@jwilk.net>
>> Cc: Serge Hallyn <serge@hallyn.com>
>> Cc: Iker Pedrosa <ipedrosa@redhat.com>
>> Cc: Andrew Pinski <pinskia@gmail.com>
>> Cc: Stefan Puiu <stefan.puiu@gmail.com>
>> Signed-off-by: Alejandro Colomar <alx@kernel.org>
>> ---
>>   man7/string_copy.7 | 855 +++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 855 insertions(+)
>>   create mode 100644 man7/string_copy.7
>>
>> diff --git a/man7/string_copy.7 b/man7/string_copy.7
>> new file mode 100644
>> index 000000000..a32b93c01
>> --- /dev/null
>> +++ b/man7/string_copy.7
>> @@ -0,0 +1,855 @@
>> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
>> +.\"
>> +.\" SPDX-License-Identifier: BSD-3-Clause
>> +.\"
>> +.TH string_copy 7 (date) "Linux man-pages (unreleased)"
>> +.\" ----- NAME :: -----------------------------------------------------/
>> +.SH NAME
>> +stpcpy,
>> +strcpy, strcat,
>> +stpecpy, stpecpyx,
>> +strlcpy, strlcat,
>> +stpncpy,
>> +strncpy,
>> +zustr2ustp, zustr2stp,
>> +strncat,
>> +ustpcpy, ustr2stp
>> +\- copy strings and character sequences
>> +.\" ----- SYNOPSIS :: -------------------------------------------------/
>> +.SH SYNOPSIS
>> +.\" ----- SYNOPSIS :: (Null-terminated) strings -----------------------/
>> +.SS Strings
>> +.nf
>> +// Chain-copy a string.
>> +.BI "char *stpcpy(char *restrict " dst ", const char *restrict " src );
>> +.PP
>> +// Copy/catenate a string.
>> +.BI "char *strcpy(char *restrict " dst ", const char *restrict " src );
>> +.BI "char *strcat(char *restrict " dst ", const char *restrict " src );
>> +.PP
>> +// Chain-copy a string with truncation.
>> +.BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
>> +.PP
>> +// Chain-copy a string with truncation and SIGSEGV on UB.
>> +.BI "char *stpecpyx(char *" dst ", char " end "[0], const char *restrict " src );
>> +.PP
>> +// Copy/catenate a string with truncation and SIGSEGV on UB.
>> +.BI "size_t strlcpy(char " dst "[restrict ." sz "], \
>> +const char *restrict " src ,
>> +.BI "               size_t " sz );
>> +.BI "size_t strlcat(char " dst "[restrict ." sz "], \
>> +const char *restrict " src ,
>> +.BI "               size_t " sz );
>> +.fi
>> +.\" ----- SYNOPSIS :: Null-padded character sequences --------/
>> +.SS Null-padded character sequences
>> +.nf
>> +// Zero a fixed-width buffer, and
>> +// copy a string into a character sequence with truncation.
>> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
>> +const char *restrict " src ,
>> +.BI "               size_t " sz );
>> +.PP
>> +// Zero a fixed-width buffer, and
>> +// copy a string into a character sequence with truncation.
>> +.BI "char *strncpy(char " dest "[restrict ." sz "], \
>> +const char *restrict " src ,
>> +.BI "               size_t " sz );
>> +.PP
>> +// Chain-copy a null-padded character sequence into a character sequence.
>> +.BI "char *zustr2ustp(char *restrict " dst ", \
>> +const char " src "[restrict ." sz ],
>> +.BI "               size_t " sz );
>> +.PP
>> +// Chain-copy a null-padded character sequence into a string.
>> +.BI "char *zustr2stp(char *restrict " dst ", \
>> +const char " src "[restrict ." sz ],
>> +.BI "               size_t " sz );
>> +.PP
>> +// Catenate a null-padded character sequence into a string.
>> +.BI "char *strncat(char *restrict " dst ", const char " src "[restrict ." sz ],
>> +.BI "               size_t " sz );
>> +.fi
>> +.\" ----- SYNOPSIS :: Measured character sequences --------------------/
>> +.SS Measured character sequences
>> +.nf
>> +// Chain-copy a measured character sequence.
>> +.BI "char *ustpcpy(char *restrict " dst ", \
>> +const char " src "[restrict ." len ],
>> +.BI "               size_t " len );
>> +.PP
>> +// Chain-copy a measured character sequence into a string.
>> +.BI "char *ustr2stp(char *restrict " dst ", \
>> +const char " src "[restrict ." len ],
>> +.BI "               size_t " len );
>> +.fi
>> +.SH DESCRIPTION
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: -----------------/
>> +.SS Terms (and abbreviations)
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: string (str) ----/
>> +.TP
>> +.IR "string " ( str )
>> +is a sequence of zero or more non-null characters followed by a null byte.
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: null-padded character seq
>> +.TP
>> +.I character sequence
>> +is a sequence of zero or more non-null characters.
>> +A program should never usa a character sequence where a string is required.
> 
> Here I think you want s/usa/use above.
> 
> Thanks,
> Stefan.
> 
>> +However, with appropriate care,
>> +a string can be used in the place of a character sequence.
>> +.RS
>> +.TP
>> +.IR "null-padded character sequence " ( zustr )
>> +Character sequences can be contained in fixed-width buffers,
>> +which contain padding null bytes after the character sequence,
>> +to fill the rest of the buffer
>> +without affecting the character sequence;
>> +however, those padding null bytes are not part of the character sequence.
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: measured character sequence
>> +.TP
>> +.IR "measured character sequence " ( ustr )
>> +Character sequence delimited by its length.
>> +It may be a slice of a larger character sequence,
>> +or even of a string.
>> +.RE
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: length (len) ----/
>> +.TP
>> +.IR "length " ( len )
>> +is the number of non-null characters in a string or character sequence.
>> +It is the return value of
>> +.I strlen(str)
>> +and of
>> +.IR "strnlen(ustr, sz)" .
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: size (sz) -------/
>> +.TP
>> +.IR "size " ( sz )
>> +refers to the entire buffer
>> +where the string or character sequence is contained.
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: end -------------/
>> +.TP
>> +.I end
>> +is the name of a pointer to one past the last element of a buffer.
>> +It is equivalent to
>> +.IR &str[sz] .
>> +It is used as a sentinel value,
>> +to be able to truncate strings or character sequences
>> +instead of overrunning the containing buffer.
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: copy ------------/
>> +.TP
>> +.I copy
>> +This term is used when
>> +the writing starts at the first element pointed to by
>> +.IR dst .
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: catenate --------/
>> +.TP
>> +.I catenate
>> +This term is used when
>> +a function first finds the terminating null byte in
>> +.IR dst ,
>> +and then starts writing at that position.
>> +.\" ----- DESCRIPTION :: Terms (and abbreviations) :: chain -----------/
>> +.TP
>> +.I chain
>> +This term is used when
>> +it's the programmer who provides
>> +a pointer to the terminating null byte in the string
>> +.I dst
>> +(or one after the last character in a character sequence),
>> +and the function starts writing at that location.
>> +The function returns
>> +a pointer to the new location of the terminating null byte
>> +(or one after the last character in a character sequence)
>> +after the call,
>> +so that the programmer can use it to chain such calls.
>> +.\" ----- DESCRIPTION :: Copy, catenate, and chain-copy ---------------/
>> +.SS Copy, catenate, and chain-copy
>> +Originally,
>> +there was a distinction between functions that copy and those that catenate.
>> +However, newer functions that copy while allowing chaining
>> +cover both use cases with a single API.
>> +They are also algorithmically faster,
>> +since they don't need to search for
>> +the terminating null byte of the existing string.
>> +However, functions that catenate have a much simpler use,
>> +so if performance is not important,
>> +it can make sense to use them for improving readability.
>> +.PP
>> +The pointer returned by functions that allow chaining
>> +is a byproduct of the copy operation,
>> +so it has no performance costs.
>> +Functions that return such a pointer,
>> +and thus can be chained,
>> +have names of the form
>> +.RB * stp *(),
>> +since it's common to name the pointer just
>> +.IR p .
>> +.PP
>> +Chain-copying functions that truncate
>> +should accept a pointer to the end of the destination buffer,
>> +and have names of the form
>> +.RB * stpe *().
>> +This allows not having to recalculate the remaining size after each call.
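
As an illustration of the end-pointer convention, here is a minimal sketch.
stpecpy() below follows the reference implementation given later in the
page (it is not in libc); join3() is a hypothetical helper added only for
the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of stpecpy(), following the page's reference implementation. */
static char *
stpecpy(char *dst, char *end, const char *restrict src)
{
    char *p;

    if (dst == end)
        return end;

    p = memccpy(dst, src, '\0', end - dst);
    if (p != NULL)
        return p - 1;

    /* Truncation: terminate the string and report it via 'end'. */
    end[-1] = '\0';
    return end;
}

/* Hypothetical helper: build "<a><b><c>" in buf[size].
 * Returns the length, or -1 if the result was truncated. */
static ptrdiff_t
join3(char *buf, size_t size, const char *a, const char *b, const char *c)
{
    char *p, *end;

    assert(size > 0);

    end = buf + size;  /* Computed once; no per-call size arithmetic. */
    p = buf;
    p = stpecpy(p, end, a);
    p = stpecpy(p, end, b);
    p = stpecpy(p, end, c);
    if (p == end)  /* One check covers all three calls. */
        return -1;

    return p - buf;
}
```

With a size-based interface, the caller would have to subtract the bytes
already written before every call; here 'end' stays fixed and only 'p'
moves.
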
>> +.\" ----- DESCRIPTION :: Truncate or not? -----------------------------/
>> +.SS Truncate or not?
>> +The first thing to note is that programmers should be careful with buffers,
>> +so they always have the correct size,
>> +and truncation is not necessary.
>> +.PP
>> +In most cases,
>> +truncation is not desired,
>> +and it is simpler to just do the copy.
>> +Simpler code is safer code.
>> +Programming against programming mistakes by adding more code
>> +just adds more points where mistakes can be made.
>> +.PP
>> +Nowadays,
>> +compilers can detect most programmer errors with features like
>> +compiler warnings,
>> +static analyzers, and
>> +.BR \%_FORTIFY_SOURCE
>> +(see
>> +.BR ftm (7)).
>> +Keeping the code simple
>> +helps these overflow-detection features be more precise.
>> +.PP
>> +When validating user input,
>> +however,
>> +it makes sense to truncate.
>> +Remember to check the return value of such function calls.
>> +.PP
>> +Functions that truncate:
>> +.IP \(bu 3
>> +.BR stpecpy (3)
>> +is the most efficient string copy function that performs truncation.
>> +It only requires checking for truncation once, after all chained calls.
>> +.IP \(bu
>> +.BR stpecpyx (3)
>> +is a variant of
>> +.BR stpecpy (3)
>> +that consumes the entire source string,
>> +to catch bugs in the program
>> +by forcing a segmentation fault (as
>> +.BR strlcpy (3bsd)
>> +and
>> +.BR strlcat (3bsd)
>> +do).
>> +.IP \(bu
>> +.BR strlcpy (3bsd)
>> +and
>> +.BR strlcat (3bsd)
>> +are designed to crash if the input string is invalid
>> +(doesn't contain a terminating null byte).
>> +.IP \(bu
>> +.BR stpncpy (3)
>> +and
>> +.BR strncpy (3)
>> +also truncate, but they write null-padded character sequences,
>> +rather than strings.
>> +.\" ----- DESCRIPTION :: Null-padded character sequences --------------/
>> +.SS Null-padded character sequences
>> +For historic reasons,
>> +some standard APIs,
>> +such as
>> +.BR utmpx (5),
>> +use null-padded character sequences in fixed-width buffers.
>> +To interface with them,
>> +specialized functions need to be used.
>> +.PP
>> +To copy strings into them, use
>> +.BR stpncpy (3).
>> +.PP
>> +To copy from an unterminated string within a fixed-width buffer into a string,
>> +ignoring any trailing null bytes in the source fixed-width buffer,
>> +you should use
>> +.BR zustr2stp (3)
>> +or
>> +.BR strncat (3).
>> +.PP
>> +To copy from an unterminated string within a fixed-width buffer
>> +into a character sequence,
>> +ignoring any trailing null bytes in the source fixed-width buffer,
>> +you should use
>> +.BR zustr2ustp (3).
>> +.\" ----- DESCRIPTION :: Measured character sequences -----------------/
>> +.SS Measured character sequences
>> +The simplest character sequence copying function is
>> +.BR mempcpy (3).
>> +It requires always knowing the length of your character sequences,
>> +for which structures can be used.
>> +It makes the code much faster,
>> +since you always know the length of your character sequences,
>> +and can do the minimal copies and length measurements.
>> +.BR mempcpy (3)
>> +copies character sequences,
>> +so you need to explicitly set the terminating null byte if you need a string.
>> +.PP
>> +However,
>> +for keeping type safety,
>> +it's good to add a wrapper that uses
>> +.I char\~*
>> +instead of
>> +.IR void\~* :
>> +.BR ustpcpy (3).
>> +.PP
>> +In programs that make considerable use of strings or character sequences,
>> +and need the best performance,
>> +using overlapping character sequences can make a big difference.
>> +It allows holding subsequences of a larger character sequence,
>> +while not duplicating memory
>> +nor using time to do a copy.
>> +.PP
>> +However, this is delicate,
>> +since it requires using character sequences.
>> +C library APIs use strings,
>> +so programs that use character sequences
>> +will have to take care of differentiating strings from character sequences.
>> +.PP
>> +To copy a measured character sequence, use
>> +.BR ustpcpy (3).
>> +.PP
>> +To copy a measured character sequence into a string, use
>> +.BR ustr2stp (3).
>> +.PP
>> +Because these functions ask for the length,
>> +and a string is by nature composed of a character sequence of the same length
>> +plus a terminating null byte,
>> +a string is also accepted as input.
>> +.\" ----- DESCRIPTION :: String vs character sequence -----------------/
>> +.SS String vs character sequence
>> +Some functions only operate on strings.
>> +Those require that the input
>> +.I src
>> +is a string,
>> +and guarantee an output string
>> +(even when truncation occurs).
>> +Functions that catenate
>> +also require that
>> +.I dst
>> +holds a string before the call.
>> +List of functions:
>> +.IP \(bu 3
>> +.PD 0
>> +.BR stpcpy (3)
>> +.IP \(bu
>> +.BR strcpy "(3), \c"
>> +.BR strcat (3)
>> +.IP \(bu
>> +.BR stpecpy "(3), \c"
>> +.BR stpecpyx (3)
>> +.IP \(bu
>> +.BR strlcpy "(3bsd), \c"
>> +.BR strlcat (3bsd)
>> +.PD
>> +.PP
>> +Other functions require an input string,
>> +but create a character sequence as output.
>> +These functions have confusing names,
>> +and have a long history of misuse.
>> +List of functions:
>> +.IP \(bu 3
>> +.PD 0
>> +.BR stpncpy (3)
>> +.IP \(bu
>> +.BR strncpy (3)
>> +.PD
>> +.PP
>> +Other functions operate on an input character sequence,
>> +and create an output string.
>> +Functions that catenate
>> +also require that
>> +.I dst
>> +holds a string before the call.
>> +.BR strncat (3)
>> +has an even more misleading name than the functions above.
>> +List of functions:
>> +.IP \(bu 3
>> +.PD 0
>> +.BR zustr2stp (3)
>> +.IP \(bu
>> +.BR strncat (3)
>> +.IP \(bu
>> +.BR ustr2stp (3)
>> +.PD
>> +.PP
>> +Other functions operate on an input character sequence
>> +to create an output character sequence.
>> +List of functions:
>> +.IP \(bu 3
>> +.PD 0
>> +.BR ustpcpy (3)
>> +.IP \(bu
>> +.BR zustr2ustp (3)
>> +.PD
>> +.\" ----- DESCRIPTION :: Functions :: ---------------------------------/
>> +.SS Functions
>> +.\" ----- DESCRIPTION :: Functions :: stpcpy(3) -----------------------/
>> +.TP
>> +.BR stpcpy (3)
>> +This function copies the input string into a destination string.
>> +The programmer is responsible for allocating a buffer large enough.
>> +It returns a pointer suitable for chaining.
>> +.\" ----- DESCRIPTION :: Functions :: strcpy(3), strcat(3) ------------/
>> +.TP
>> +.BR strcpy (3)
>> +.TQ
>> +.BR strcat (3)
>> +These functions copy and catenate the input string into a destination string.
>> +The programmer is responsible for allocating a buffer large enough.
>> +The return value is useless.
>> +.IP
>> +.BR stpcpy (3)
>> +is a faster alternative to these functions.
>> +.\" ----- DESCRIPTION :: Functions :: stpecpy(3), stpecpyx(3) ---------/
>> +.TP
>> +.BR stpecpy (3)
>> +.TQ
>> +.BR stpecpyx (3)
>> +These functions copy the input string into a destination string.
>> +If the destination buffer,
>> +limited by a pointer to its end,
>> +isn't large enough to hold the copy,
>> +the resulting string is truncated
>> +(but it is guaranteed to be null-terminated).
>> +They return a pointer suitable for chaining.
>> +Truncation needs to be detected only once after the last chained call.
>> +.BR stpecpyx (3)
>> +has identical semantics to
>> +.BR stpecpy (3),
>> +except that it forces a SIGSEGV if the
>> +.I src
>> +pointer is not a string.
>> +.IP
>> +These functions are not provided by any library;
>> +see EXAMPLES for a reference implementation.
>> +.\" ----- DESCRIPTION :: Functions :: strlcpy(3bsd), strlcat(3bsd) ----/
>> +.TP
>> +.BR strlcpy (3bsd)
>> +.TQ
>> +.BR strlcat (3bsd)
>> +These functions copy and catenate the input string into a destination string.
>> +If the destination buffer,
>> +limited by its size,
>> +isn't large enough to hold the copy,
>> +the resulting string is truncated
>> +(but it is guaranteed to be null-terminated).
>> +They return the length of the total string they tried to create.
>> +These functions force a SIGSEGV if the
>> +.I src
>> +pointer is not a string.
>> +.IP
>> +.BR stpecpyx (3)
>> +is a faster alternative to these functions.
>> +.\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
>> +.TP
>> +.BR stpncpy (3)
>> +This function copies the input string into
>> +a destination null-padded character sequence in a fixed-width buffer.
>> +If the destination buffer,
>> +limited by its size,
>> +isn't large enough to hold the copy,
>> +the resulting character sequence is truncated.
>> +Since it creates a character sequence,
>> +it doesn't need to write a terminating null byte.
>> +It's impossible to distinguish truncation by the result of the call,
>> +from a character sequence that just fits the destination buffer;
>> +truncation should be detected by
>> +comparing the length of the input string
>> +with the size of the destination buffer.
>> +.\" ----- DESCRIPTION :: Functions :: strncpy(3) ----------------------/
>> +.TP
>> +.BR strncpy (3)
>> +This function is identical to
>> +.BR stpncpy (3)
>> +except for the useless return value.
>> +.IP
>> +.BR stpncpy (3)
>> +is a more useful alternative to this function.
>> +.\" ----- DESCRIPTION :: Functions :: zustr2ustp(3) --------------------/
>> +.TP
>> +.BR zustr2ustp (3)
>> +This function copies the input character sequence
>> +contained in a null-padded fixed-width buffer,
>> +into a destination character sequence.
>> +The programmer is responsible for allocating a buffer large enough.
>> +It returns a pointer suitable for chaining.
>> +.IP
>> +A truncating version of this function doesn't exist,
>> +since the size of the original character sequence is always known,
>> +so it wouldn't be very useful.
>> +.IP
>> +This function is not provided by any library;
>> +see EXAMPLES for a reference implementation.
>> +.\" ----- DESCRIPTION :: Functions :: zustr2stp(3) --------------------/
>> +.TP
>> +.BR zustr2stp (3)
>> +This function copies the input character sequence
>> +contained in a null-padded fixed-width buffer,
>> +into a destination string.
>> +The programmer is responsible for allocating a buffer large enough.
>> +It returns a pointer suitable for chaining.
>> +.IP
>> +A truncating version of this function doesn't exist,
>> +since the size of the original character sequence is always known,
>> +so it wouldn't be very useful.
>> +.IP
>> +This function is not provided by any library;
>> +see EXAMPLES for a reference implementation.
>> +.\" ----- DESCRIPTION :: Functions :: strncat(3) ----------------------/
>> +.TP
>> +.BR strncat (3)
>> +Do not confuse this function with
>> +.BR strncpy (3);
>> +they are not related at all.
>> +.IP
>> +This function catenates the input character sequence
>> +contained in a null-padded fixed-width buffer,
>> +into a destination string.
>> +The programmer is responsible for allocating a buffer large enough.
>> +The return value is useless.
>> +.IP
>> +.BR zustr2stp (3)
>> +is a faster alternative to this function.
>> +.\" ----- DESCRIPTION :: Functions :: ustpcpy(3) ----------------------/
>> +.TP
>> +.BR ustpcpy (3)
>> +This function copies the input character sequence,
>> +limited by its length,
>> +into a destination character sequence.
>> +The programmer is responsible for allocating a buffer large enough.
>> +It returns a pointer suitable for chaining.
>> +.\" ----- DESCRIPTION :: Functions :: ustr2stp(3) ---------------------/
>> +.TP
>> +.BR ustr2stp (3)
>> +This function copies the input character sequence,
>> +limited by its length,
>> +into a destination string.
>> +The programmer is responsible for allocating a buffer large enough.
>> +It returns a pointer suitable for chaining.
>> +.\" ----- RETURN VALUE :: ---------------------------------------------/
>> +.SH RETURN VALUE
>> +The following functions return
>> +a pointer to the terminating null byte in the destination string.
>> +.IP \(bu 3
>> +.PD 0
>> +.BR stpcpy (3)
>> +.IP \(bu
>> +.BR ustr2stp (3)
>> +.IP \(bu
>> +.BR zustr2stp (3)
>> +.PD
>> +.PP
>> +The following functions return
>> +a pointer to the terminating null byte in the destination string,
>> +except when truncation occurs;
>> +if truncation occurs,
>> +they return a pointer to the end of the destination buffer.
>> +.IP \(bu 3
>> +.BR stpecpy (3),
>> +.BR stpecpyx (3)
>> +.PP
>> +The following function returns
>> +a pointer to one after the last character
>> +in the destination character sequence;
>> +if truncation occurs,
>> +that pointer is equivalent to
>> +a pointer to the end of the destination buffer.
>> +.IP \(bu 3
>> +.BR stpncpy (3)
>> +.PP
>> +The following functions return
>> +a pointer to one after the last character
>> +in the destination character sequence.
>> +.IP \(bu 3
>> +.PD 0
>> +.BR zustr2ustp (3)
>> +.IP \(bu
>> +.BR ustpcpy (3)
>> +.PD
>> +.PP
>> +The following functions return
>> +the length of the total string that they tried to create
>> +(as if truncation didn't occur).
>> +.IP \(bu 3
>> +.BR strlcpy (3bsd),
>> +.BR strlcat (3bsd)
>> +.PP
>> +The following functions return the
>> +.I dst
>> +pointer,
>> +which is useless.
>> +.IP \(bu 3
>> +.PD 0
>> +.BR strcpy (3),
>> +.BR strcat (3)
>> +.IP \(bu
>> +.BR strncpy (3)
>> +.IP \(bu
>> +.BR strncat (3)
>> +.PD
>> +.\" ----- NOTES :: strscpy(9) -----------------------------------------/
>> +.SH NOTES
>> +The Linux kernel has an internal function for copying strings,
>> +which is similar to
>> +.BR stpecpy (3),
>> +except that it can't be chained:
>> +.TP
>> +.BR strscpy (9)
>> +This function copies the input string into a destination string.
>> +If the destination buffer,
>> +limited by its size,
>> +isn't large enough to hold the copy,
>> +the resulting string is truncated
>> +(but it is guaranteed to be null-terminated).
>> +It returns the length of the destination string, or
>> +.B \-E2BIG
>> +on truncation.
>> +.IP
>> +.BR stpecpy (3)
>> +is a simpler and faster alternative to this function.
>> +.\" ----- CAVEATS :: --------------------------------------------------/
>> +.SH CAVEATS
>> +Don't mix chain calls to truncating and non-truncating functions.
>> +It is conceptually wrong
>> +unless you know that the first part of a copy will always fit.
>> +Anyway, the performance difference will probably be negligible,
>> +so it will probably be clearer if you use consistent semantics:
>> +either truncating or non-truncating.
>> +Calling a non-truncating function after a truncating one is necessarily wrong.
>> +.\" ----- BUGS :: -----------------------------------------------------/
>> +.SH BUGS
>> +All catenation functions share the same performance problem:
>> +.UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
>> +Shlemiel the painter
>> +.UE .
>> +.\" ----- EXAMPLES :: -------------------------------------------------/
>> +.SH EXAMPLES
>> +The following are examples of correct use of each of these functions.
>> +.\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
>> +.TP
>> +.BR stpcpy (3)
>> +.EX
>> +p = buf;
>> +p = stpcpy(p, "Hello ");
>> +p = stpcpy(p, "world");
>> +p = stpcpy(p, "!");
>> +len = p \- buf;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: strcpy(3), strcat(3) ----------------------------/
>> +.TP
>> +.BR strcpy (3)
>> +.TQ
>> +.BR strcat (3)
>> +.EX
>> +strcpy(buf, "Hello ");
>> +strcat(buf, "world");
>> +strcat(buf, "!");
>> +len = strlen(buf);
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: stpecpy(3), stpecpyx(3) -------------------------/
>> +.TP
>> +.BR stpecpy (3)
>> +.TQ
>> +.BR stpecpyx (3)
>> +.EX
>> +end = buf + sizeof(buf);
>> +p = buf;
>> +p = stpecpy(p, end, "Hello ");
>> +p = stpecpy(p, end, "world");
>> +p = stpecpy(p, end, "!");
>> +if (p == end) {
>> +    p\-\-;
>> +    goto toolong;
>> +}
>> +len = p \- buf;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: strlcpy(3bsd), strlcat(3bsd) --------------------/
>> +.TP
>> +.BR strlcpy (3bsd)
>> +.TQ
>> +.BR strlcat (3bsd)
>> +.EX
>> +if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>> +    goto toolong;
>> +if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>> +    goto toolong;
>> +len = strlcat(buf, "!", sizeof(buf));
>> +if (len >= sizeof(buf))
>> +    goto toolong;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
>> +.TP
>> +.BR strscpy (9)
>> +.EX
>> +len = strscpy(buf, "Hello world!", sizeof(buf));
>> +if (len == \-E2BIG)
>> +    goto toolong;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
>> +.TP
>> +.BR stpncpy (3)
>> +.EX
>> +p = stpncpy(buf, "Hello world!", sizeof(buf));
>> +if (sizeof(buf) < strlen("Hello world!"))
>> +    goto toolong;
>> +len = p \- buf;
>> +for (size_t i = 0; i < sizeof(buf); i++)
>> +    putchar(buf[i]);
>> +.EE
>> +.\" ----- EXAMPLES :: strncpy(3) --------------------------------------/
>> +.TP
>> +.BR strncpy (3)
>> +.EX
>> +strncpy(buf, "Hello world!", sizeof(buf));
>> +if (sizeof(buf) < strlen("Hello world!"))
>> +    goto toolong;
>> +len = strnlen(buf, sizeof(buf));
>> +for (size_t i = 0; i < sizeof(buf); i++)
>> +    putchar(buf[i]);
>> +.EE
>> +.\" ----- EXAMPLES :: zustr2ustp(3) -----------------------------------/
>> +.TP
>> +.BR zustr2ustp (3)
>> +.EX
>> +p = buf;
>> +p = zustr2ustp(p, "Hello ", 6);
>> +p = zustr2ustp(p, "world", 42);  // Padding null bytes ignored.
>> +p = zustr2ustp(p, "!", 1);
>> +len = p \- buf;
>> +printf("%.*s\en", (int) len, buf);
>> +.EE
>> +.\" ----- EXAMPLES :: zustr2stp(3) ------------------------------------/
>> +.TP
>> +.BR zustr2stp (3)
>> +.EX
>> +p = buf;
>> +p = zustr2stp(p, "Hello ", 6);
>> +p = zustr2stp(p, "world", 42);  // Padding null bytes ignored.
>> +p = zustr2stp(p, "!", 1);
>> +len = p \- buf;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: strncat(3) --------------------------------------/
>> +.TP
>> +.BR strncat (3)
>> +.EX
>> +buf[0] = \(aq\e0\(aq;  // There's no 'cpy' function to this 'cat'.
>> +strncat(buf, "Hello ", 6);
>> +strncat(buf, "world", 42);  // Padding null bytes ignored.
>> +strncat(buf, "!", 1);
>> +len = strlen(buf);
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: ustpcpy(3) --------------------------------------/
>> +.TP
>> +.BR ustpcpy (3)
>> +.EX
>> +p = buf;
>> +p = ustpcpy(p, "Hello ", 6);
>> +p = ustpcpy(p, "world", 5);
>> +p = ustpcpy(p, "!", 1);
>> +len = p \- buf;
>> +printf("%.*s\en", (int) len, buf);
>> +.EE
>> +.\" ----- EXAMPLES :: ustr2stp(3) -------------------------------------/
>> +.TP
>> +.BR ustr2stp (3)
>> +.EX
>> +p = buf;
>> +p = ustr2stp(p, "Hello ", 6);
>> +p = ustr2stp(p, "world", 5);
>> +p = ustr2stp(p, "!", 1);
>> +len = p \- buf;
>> +puts(buf);
>> +.EE
>> +.\" ----- EXAMPLES :: Implementations :: ------------------------------/
>> +.SS Implementations
>> +Here are reference implementations for functions not provided by libc.
>> +.PP
>> +.in +4n
>> +.EX
>> +/* This code is in the public domain. */
>> +
>> +.\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
>> +char *
>> +.IR stpecpy "(char *dst, char end[0], const char *restrict src)"
>> +{
>> +    char *p;
>> +
>> +    if (dst == end)
>> +        return end;
>> +
>> +    p = memccpy(dst, src, \(aq\e0\(aq, end \- dst);
>> +    if (p != NULL)
>> +        return p \- 1;
>> +
>> +    /* truncation detected */
>> +    end[\-1] = \(aq\e0\(aq;
>> +    return end;
>> +}
>> +
>> +.\" ----- EXAMPLES :: Implementations :: stpecpyx(3) ------------------/
>> +char *
>> +.IR stpecpyx "(char *dst, char end[0], const char *restrict src)"
>> +{
>> +    if (src[strlen(src)] != \(aq\e0\(aq)
>> +        raise(SIGSEGV);
>> +
>> +    return stpecpy(dst, end, src);
>> +}
>> +
>> +.\" ----- EXAMPLES :: Implementations :: zustr2ustp(3) ----------------/
>> +char *
>> +.IR zustr2ustp "(char *restrict dst, const char *restrict src, size_t sz)"
>> +{
>> +    return ustpcpy(dst, src, strnlen(src, sz));
>> +}
>> +
>> +.\" ----- EXAMPLES :: Implementations :: zustr2stp(3) -----------------/
>> +char *
>> +.IR zustr2stp "(char *restrict dst, const char *restrict src, size_t sz)"
>> +{
>> +    char  *p;
>> +
>> +    p = zustr2ustp(dst, src, sz);
>> +    *p = \(aq\e0\(aq;
>> +
>> +    return p;
>> +}
>> +
>> +.\" ----- EXAMPLES :: Implementations :: ustpcpy(3) -------------------/
>> +char *
>> +.IR ustpcpy "(char *restrict dst, const char *restrict src, size_t len)"
>> +{
>> +    return mempcpy(dst, src, len);
>> +}
>> +
>> +.\" ----- EXAMPLES :: Implementations :: ustr2stp(3) ------------------/
>> +char *
>> +.IR ustr2stp "(char *restrict dst, const char *restrict src, size_t len)"
>> +{
>> +    char  *p;
>> +
>> +    p = ustpcpy(dst, src, len);
>> +    *p = \(aq\e0\(aq;
>> +
>> +    return p;
>> +}
>> +.EE
>> +.in
>> +.\" ----- SEE ALSO :: -------------------------------------------------/
>> +.SH SEE ALSO
>> +.BR bzero (3),
>> +.BR memcpy (3),
>> +.BR memccpy (3),
>> +.BR mempcpy (3),
>> +.BR stpcpy (3),
>> +.BR strlcpy (3bsd),
>> +.BR strncat (3),
>> +.BR stpncpy (3),
>> +.BR string (3)
>> --
>> 2.39.0
>>

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH v6 1/5] string_copy.7: Add page to document all string-copying functions
  2022-12-19 21:02           ` [PATCH v6 1/5] string_copy.7: Add page to document all " Alejandro Colomar
  2022-12-20 15:00             ` Stefan Puiu
@ 2023-01-20  3:43             ` Eric Biggers
  2023-01-20 12:55               ` Alejandro Colomar
  1 sibling, 1 reply; 53+ messages in thread
From: Eric Biggers @ 2023-01-20  3:43 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu

On Mon, Dec 19, 2022 at 10:02:05PM +0100, Alejandro Colomar wrote:
> diff --git a/man7/string_copy.7 b/man7/string_copy.7
> new file mode 100644
> index 000000000..a32b93c01
> --- /dev/null
> +++ b/man7/string_copy.7
> @@ -0,0 +1,855 @@
> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
> +.\"
> +.\" SPDX-License-Identifier: BSD-3-Clause
> +.\"
> +.TH string_copy 7 (date) "Linux man-pages (unreleased)"
> +.\" ----- NAME :: -----------------------------------------------------/
> +.SH NAME
> +stpcpy,
> +strcpy, strcat,
> +stpecpy, stpecpyx,
> +strlcpy, strlcat,
> +stpncpy,
> +strncpy,
> +zustr2ustp, zustr2stp,
> +strncat,
> +ustpcpy, ustr2stp

I happened to come across this new man page, and I'm confused by the inclusion
of functions like "ustpcpy".  These functions don't seem to actually exist, so
why are they documented in the man page?

- Eric


* Re: [PATCH v6 1/5] string_copy.7: Add page to document all string-copying functions
  2023-01-20  3:43             ` Eric Biggers
@ 2023-01-20 12:55               ` Alejandro Colomar
  0 siblings, 0 replies; 53+ messages in thread
From: Alejandro Colomar @ 2023-01-20 12:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-man, Alejandro Colomar, Martin Sebor, G. Branden Robinson,
	Douglas McIlroy, Jakub Wilk, Serge Hallyn, Iker Pedrosa,
	Andrew Pinski, Stefan Puiu


[-- Attachment #1.1: Type: text/plain, Size: 1811 bytes --]

Hi Eric,

On 1/20/23 04:43, Eric Biggers wrote:
> On Mon, Dec 19, 2022 at 10:02:05PM +0100, Alejandro Colomar wrote:
>> diff --git a/man7/string_copy.7 b/man7/string_copy.7
>> new file mode 100644
>> index 000000000..a32b93c01
>> --- /dev/null
>> +++ b/man7/string_copy.7
>> @@ -0,0 +1,855 @@
>> +.\" Copyright 2022 Alejandro Colomar <alx@kernel.org>
>> +.\"
>> +.\" SPDX-License-Identifier: BSD-3-Clause
>> +.\"
>> +.TH string_copy 7 (date) "Linux man-pages (unreleased)"
>> +.\" ----- NAME :: -----------------------------------------------------/
>> +.SH NAME
>> +stpcpy,
>> +strcpy, strcat,
>> +stpecpy, stpecpyx,
>> +strlcpy, strlcat,
>> +stpncpy,
>> +strncpy,
>> +zustr2ustp, zustr2stp,
>> +strncat,
>> +ustpcpy, ustr2stp
> 
> I happened to come across this new man page, and I'm confused by the inclusion
> of functions like "ustpcpy".  These functions don't seem to actually exist, so
> why are they documented in the man page?

That page is not documenting the existing libc functions for copying strings, 
but rather trying to explain all the alternatives, including other systems' ones 
(such as strlcpy(3bsd)), and custom ones that are not provided by any system 
(yet).  It tries to guide a programmer who knows nothing about string copying 
to allow him to produce quality code, independently of libc support.  For 
documentation of the libc functions we still have the separate pages for each, 
which have also been updated.

Those specific functions are similar to the old saying "just use memcpy(3) 
and forget about string copying functions", which is not bad advice: 
it's the fastest; however, those functions are a bit safer than directly calling 
memcpy(3), since they take char * instead of void * and so keep type safety.
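
For illustration, here is a minimal sketch of that idea, adapted from the
reference implementations in the EXAMPLES section of the page.  It assumes a
glibc-like system: mempcpy(3) is a GNU extension, hence _GNU_SOURCE.
build_greeting() is a hypothetical helper that exists only for this example;
it is not part of the page or of any library.

```c
#define _GNU_SOURCE   /* mempcpy(3) is a GNU extension */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a measured character sequence.
 * Returns a pointer to one past the last byte written (for chaining). */
char *
ustpcpy(char *restrict dst, const char *restrict src, size_t len)
{
    return mempcpy(dst, src, len);
}

/* Same as ustpcpy(), but also writes the terminating null byte,
 * so that dst holds a string.  Returns a pointer to that null byte. */
char *
ustr2stp(char *restrict dst, const char *restrict src, size_t len)
{
    char *p;

    p = ustpcpy(dst, src, len);
    *p = '\0';

    return p;
}

/* Hypothetical helper, for illustration only:
 * builds "Hello world!" in buf and returns its length.
 * The caller must provide a buffer that is large enough. */
size_t
build_greeting(char *buf, size_t size)
{
    char *p;

    assert(size >= sizeof("Hello world!"));

    p = buf;
    p = ustpcpy(p, "Hello ", 6);
    p = ustpcpy(p, "world", 5);
    p = ustr2stp(p, "!", 1);   /* last call terminates the string */

    return p - buf;
}
```

Only the last call in the chain needs to write the terminating null byte,
and passing an int * by mistake gets a diagnostic from the compiler, which a
bare memcpy(3) call with its void * parameters would not give.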

> 
> - Eric

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

