On 12/12/22 00:59, Alejandro Colomar wrote:
> Hi all!
>
> I'm planning to add a new manual page that documents all string copying
> functions.  It covers more detail than any of the existing manual pages (and in
> fact, I've discovered some properties of the functions documented while working
> on this page).  The intention is to remove the existing separate manual pages
> for all string copying functions, and make them links to this new page.  It
> intends to be the only reference documentation for copying strings in C, and
> hopefully fix the half century of suboptimal string copying library with which
> we've lived.  (Say goodbye to std::string, here come back C strings ;)
>
> The formatted manual page is below.
>
> Alex
>
> P.S.: I'm sorry for your beloved string copying function(s); it has high chances
> of being dreaded by the page below.  Not sorry.  Oh well, at least I justified
> it, or I tried :-)
>
> ---
>
> string_copy(7)         Miscellaneous Information Manual         string_copy(7)
>
> NAME
>        stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
>        stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings
>
> SYNOPSIS
>    (Null‐terminated) strings
>        // Chain‐copy a string.
>        char *stpcpy(char *restrict dst, const char *restrict src);
>
>        // Chain‐copy a string with truncation (not in libc).
>        char *stpecpy(char *dst, char past_end[0], const char *restrict src);
>
>        // Chain‐copy a string with truncation and SIGSEGV on invalid input.
>        char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
>
>        // Copy a string with truncation and SIGSEGV on invalid input.
>        [[deprecated]]  // Use stpecpyx() instead.
>        size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Concatenate a string with truncation.
>        [[deprecated]]  // Use stpecpyx() instead.
>        size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Copy a string with truncation (not in libc).
>        [[deprecated]]  // Use stpecpy() instead.
>        ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                       size_t sz);
>
>        // Copy a string.
>        [[deprecated]]  // Use stpcpy(3) instead.
>        char *strcpy(char *restrict dst, const char *restrict src);
>
>        // Concatenate a string.
>        [[deprecated]]  // Use stpcpy(3) instead.
>        char *strcat(char *restrict dst, const char *restrict src);
>
>    Unterminated strings (null‐padded fixed‐width buffers)
>        // Zero a fixed‐width buffer, and
>        // copy a string with truncation into an unterminated string.
>        char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Chain‐copy an unterminated string into a string (not in libc).
>        char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                       size_t sz);
>
>        // Zero a fixed‐width buffer, and
>        // copy a string with truncation into an unterminated string.
>        [[deprecated]]  // Use stpncpy(3) instead.
>        char *strncpy(char dest[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Concatenate an unterminated string into a string.
>        [[deprecated]]  // Use ustr2stp() instead.
>        char *strncat(char *restrict dst, const char src[restrict .sz],
>                       size_t sz);
>
>    String structures
>        // (Null‐terminated) string structure.
>        struct str_s {
>            size_t  len;
>            char    *str;
>        };
>
>        // Unterminated string structure (overlapping strings).
>        struct ustr_s {
>            size_t  len;
>            char    *ustr;
>        };
>
>        // Chain‐copy a string structure into an unterminated string.
>        void *mempcpy(void *restrict dst, const void src[restrict len],
>                       size_t len);
>
> DESCRIPTION
>    Terms (and abbreviations)
>        string (str)
>               is a sequence of zero or more non‐null characters, followed by a
>               null byte.
>
>        unterminated string (ustr)
>               is a sequence of zero or more  non‐null  characters.   They  are
>               sometimes  contained  in fixed‐width buffers, which usually con‐
>               tain padding null bytes after the unterminated string,  to  fill
>               the  rest  of  the  buffer  without  affecting  the unterminated
>               string; however, those padding null bytes are not  part  of  the
>               unterminated string.
>
>        length (len)
>               is the number of non‐null characters in a string.  It is the re‐
>               turn value of strlen(str) and of strnlen(ustr, sz).
>
>        size (sz)
>               refers to the entire buffer where the string is contained.
>
>        end    is  the  name  of  a  pointer  to the terminating null byte of a
>               string, or a pointer to one past the last character of an unter‐
>               minated string.  This is the return value of functions that  al‐
>               low chaining.  It is equivalent to &str[len].
>
>        past_end
>               is  the name of a pointer to one past the end of the buffer that
>               contains a string.  It is equivalent to &str[sz].  It is used as
>               a sentinel value, to be able  to  truncate  strings  instead  of
>               overrunning a buffer.
>
>        string structure
>        unterminated string structure
>               Structure  that  contains the length of a string, as well as the
>               string or the unterminated string.
>
>    Types of functions
>        Copy, concatenate, and chain‐copy
>               Originally, there was a distinction between functions that  copy
>               and  those that concatenate.  However, newer functions that copy
>               while allowing chaining cover both use cases with a single  API.
>               They  are  also algorithmically faster, since they don’t need to
>               search for the end of the existing string.
>
>               To chain copy functions, they need to return a  pointer  to  the
>               end.   That’s  a  byproduct  of the copy operation, so it has no
>               performance costs.  These functions are preferred over  copy  or
>               concatenation  functions.  Functions that return such a pointer,
>               and thus can be chained, have names of the form  *stp*(),  since
>               it’s also common to name the pointer just p.
>
>        Truncate or not?
>               The  first  thing  to note is that programmers should be careful
>               with buffers, so they always have the correct size, and  trunca‐
>               tion is not necessary.
>
>               In  most  cases, truncation is not desired, and it is simpler to
>               just do the copy.  Simpler  code  is  safer  code.   Programming
>               against  programming mistakes by adding more code just adds more
>               points where mistakes can be made.
>
>               Nowadays, compilers can detect most programmer errors with  fea‐
>               tures    like   compiler   warnings,   static   analyzers,   and
>               _FORTIFY_SOURCE (see ftm(7)).  Keeping  the  code  simple  helps
>               these error‐detection features be more precise.
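
To make the chain‐copy point in the quoted "Copy, concatenate, and chain‐copy" paragraph concrete, here is a minimal sketch that builds the same string both ways.  The helper names join3_cat() and join3_chain() are hypothetical and only for illustration; the sketch assumes a system that provides stpcpy(3) (POSIX.1‐2008) and a caller that supplies a large enough buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenation style.  Each strcat(3) call walks
 * buf from the start to find its end, and the length is only known
 * after a final strlen(3).  The caller must size buf correctly. */
static size_t
join3_cat(char *buf, const char *a, const char *b, const char *c)
{
    strcpy(buf, a);
    strcat(buf, b);
    strcat(buf, c);
    return strlen(buf);
}

/* Hypothetical helper: chain-copy style.  Each stpcpy(3) call returns a
 * pointer to the terminating null byte, which is where the next copy
 * starts, so nothing is rescanned and the length is a subtraction. */
static size_t
join3_chain(char *buf, const char *a, const char *b, const char *c)
{
    char *p = buf;

    p = stpcpy(p, a);
    p = stpcpy(p, b);
    p = stpcpy(p, c);
    return (size_t)(p - buf);
}
```

Both helpers produce the same string and length; the chain‐copy one simply never looks at bytes it has already written.
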
>
>               When validating user input, however, it makes sense to truncate.
>               Remember to check the return value of such function calls.
>
>               Functions that truncate:
>
>               •  stpecpy()  is  the  most  efficient string copy function that
>                  performs truncation.  It only requires to check  for  trunca‐
>                  tion once after all chained calls.
>
>               •  stpecpyx() is a variant of stpecpy() that consumes the entire
>                  source string, to catch bugs in the program by forcing a seg‐
>                  mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).
>
>               •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
>                  are designed to crash if the input string is invalid (doesn’t
>                  contain a null byte).
>
>               •  strscpy(9) is a function in the Linux kernel which reports an
>                  error instead of crashing.
>
>               •  stpncpy(3) and strncpy(3) also truncate, but they don’t write
>                  strings, but rather unterminated strings.
>
>    Unterminated strings (null‐padded fixed‐width buffers)
>        For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
>        minated strings in fixed‐width buffers.  To interface with  them,  spe‐
>        cialized functions need to be used.
>
>        To copy strings into them, use stpncpy(3).
>
>        To  copy from an unterminated string within a fixed‐width buffer into a
>        string, ignoring any trailing null  bytes  in  the  source  fixed‐width
>        buffer, you should use ustr2stp().
>
>    String structures
>        The simplest string copying function is mempcpy(3).  It requires always
>        knowing  the length of your strings, for which string structures can be
>        used.
>        It makes the code simpler, since you always know the  length  of
>        your strings, and it’s also faster, since it doesn’t need to repeatedly
>        calculate  those  lengths.   mempcpy(3)  always creates an unterminated
>        string, so you need to explicitly set the terminating null byte.
>
>        String structure
>               The following code can be  used  to  chain‐copy  from  a  string
>               structure into a string:
>
>                   p = mempcpy(p, src->str, src->len);
>                   *p = '\0';
>
>               The  following  code  can  be  used  to chain‐copy from a string
>               structure into an unterminated string:
>
>                   p = mempcpy(p, src->str, src->len);
>
>        Unterminated string structure (overlapping strings)
>               In programs that make considerable use of strings, and need  the
>               best  performance, using overlapping strings can make a big dif‐
>               ference.  It allows holding substrings of a bigger string  while
>               not duplicating memory nor using time to do a copy.
>
>               However,  this is delicate, since it requires using unterminated
>               strings.  C library APIs use strings, so programs that  use  un‐
>               terminated  strings  will  have  to  take  care to differentiate
>               strings from unterminated strings.
>
>               The following code can be used to chain‐copy  from  an  untermi‐
>               nated string structure to a string:
>
>                   p = mempcpy(p, src->ustr, src->len);
>                   *p = '\0';
>
>               The  following  code  can be used to chain‐copy from an untermi‐
>               nated string structure to an unterminated string:
>
>                   p = mempcpy(p, src->ustr, src->len);
>
>    Functions
>        stpcpy(3)
>               This function copies the input string into a destination string.
>               The programmer is responsible  for  allocating  a  buffer  large
>               enough.  It returns a pointer suitable for chaining.
>
>        stpecpy()
>        stpecpyx()
>               These functions copy the input string into a destination string.
>               If  the destination buffer, limited by a pointer to one past the
>               end of it, isn’t large enough to hold the  copy,  the  resulting
>               string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>               nated).  They return a pointer suitable for  chaining.   Trunca‐
>               tion needs to be detected only once after the last chained call.
>               stpecpyx()  has identical semantics to stpecpy(), except that it
>               forces a SIGSEGV on Undefined Behavior.
>
>               These functions are not provided by any library, but you can de‐
>               fine them with the following reference implementations:
>
>                   /* This code is in the public domain. */
>                   char *
>                   stpecpy(char *dst, char past_end[0],
>                           const char *restrict src)
>                   {
>                       char *p;
>
>                       if (dst == past_end)
>                           return past_end;
>
>                       p = memccpy(dst, src, '\0', past_end - dst);
>                       if (p != NULL)
>                           return p - 1;
>
>                       /* truncation detected */
>                       past_end[-1] = '\0';
>                       return past_end;
>                   }
>
>                   /* This code is in the public domain.
>                    */
>                   char *
>                   stpecpyx(char *dst, char past_end[0],
>                            const char *restrict src)
>                   {
>                       if (src[strlen(src)] != '\0')
>                           raise(SIGSEGV);
>
>                       return stpecpy(dst, past_end, src);
>                   }
>
>        stpncpy(3)
>               This function copies the input string into a  destination  null‐
>               padded  fixed‐width  unterminated  string.   If  the destination
>               buffer, limited by its size, isn’t  large  enough  to  hold  the
>               copy,  the  resulting  string is truncated.  Since it creates an
>               unterminated string, it doesn’t need to write a terminating null
>               byte.  It returns a pointer suitable for chaining, but it’s  not
>               ideal for that.  Truncation needs to be detected only once after
>               the last chained call.
>
>               If  you’re going to use this function in chained calls, it would
>               probably be useful to develop a function similar to stpecpy().
>
>        ustr2stp()
>               This function copies the input unterminated string contained  in
>               a  null‐padded fixed‐width buffer, into a destination (null‐ter‐
>               minated) string.  The programmer is responsible for allocating a
>               buffer large enough.  It returns a pointer suitable  for  chain‐
>               ing.
>
>               A truncating version of this function doesn’t exist,  since  the
>               size  of  the original string is always known, so it wouldn’t be
>               very useful.
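
As a small illustration of the stpncpy(3) description quoted above, here is a hedged sketch of filling a null‐padded fixed‐width field.  The struct rec type and the rec_set_name() helper are hypothetical (they stand in for something like a utmpx(5) member); the only library call assumed is stpncpy(3) from POSIX.1‐2008.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpncpy(3) */
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct rec {
    char  name[8];  /* unterminated string, null-padded */
};

/* Hypothetical helper: store src in r->name, truncating if needed.
 * Returns the length of the resulting unterminated string.  A return
 * value equal to sizeof(r->name) means the field is completely full,
 * which is either an exact fit or a truncated copy. */
static size_t
rec_set_name(struct rec *r, const char *src)
{
    char *end;

    /* Copies at most sizeof(r->name) bytes and zeroes the remainder. */
    end = stpncpy(r->name, src, sizeof(r->name));
    return (size_t)(end - r->name);
}
```

Note that r->name is not a string afterwards: when the field is full there is no terminating null byte, so it has to be read back with something like ustr2stp() or strnlen(3) rather than with strlen(3).
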
>
>               This function is not provided by any library, but you can define
>               it with the following reference implementation:
>
>                   /* This code is in the public domain. */
>                   char *
>                   ustr2stp(char *restrict dst, const char *restrict src,
>                            size_t sz)
>                   {
>                       char  *end;
>
>                       /* memccpy(3) returns one past the copied null byte. */
>                       end = memccpy(dst, src, '\0', sz);
>                       if (end != NULL)
>                           return end - 1;
>
>                       /* No null byte in src[0..sz); terminate explicitly. */
>                       end = dst + sz;
>                       *end = '\0';
>
>                       return end;
>                   }
>
>        mempcpy(3)
>               This function copies the input string, limited  by  its  length,
>               into  a  destination unterminated string.  The programmer is re‐
>               sponsible for allocating a buffer large enough.   It  returns  a
>               pointer suitable for chaining.
>
>    Deprecated functions
>        strlcpy(3bsd)
>        strlcat(3bsd)
>               Deprecated.  These functions copy the input string into a desti‐
>               nation  string.  If the destination buffer, limited by its size,
>               isn’t large enough to hold the copy,  the  resulting  string  is
>               truncated  (but  it  is guaranteed to be null‐terminated).  They
>               return the length of the total  string  they  tried  to  create.
>               These functions force a SIGSEGV on Undefined Behavior.
>
>               stpecpyx()  is  a better replacement for these functions for the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  Only requires detecting truncation once per chain of calls.
>
>        strscpy(9)
>               Deprecated.  This function copies the input string into a desti‐
>               nation string.
>               If the destination buffer, limited by its  size,
>               isn’t  large  enough  to  hold the copy, the resulting string is
>               truncated (but it is guaranteed to be null‐terminated).  It  re‐
>               turns the length of the destination string, or -E2BIG on trunca‐
>               tion.
>
>               stpecpy()  is  a  better replacement for this function, since it
>               has a much simpler interface.
>
>        strcpy(3)
>        strcat(3)
>               Deprecated.  These functions copy the input string into a desti‐
>               nation string.  The programmer is responsible for  allocating  a
>               buffer large enough.  The return value is useless.
>
>               strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
>               turn value.
>
>               stpcpy(3) is a better replacement for these  functions  for  the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  No need to call strlen(3), thanks to the useful return value.
>
>        strncpy(3)
>               Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
>               the useless return value.  Due to the return  value,  with  this
>               function  it’s hard to correctly check for truncation.  Use stp‐
>               ncpy(3) instead.
>
>        strncat(3)
>               Deprecated.  Do not confuse this function with strncpy(3);  they
>               are not related at all.
>
>               This  function  concatenates  the input unterminated string con‐
>               tained in a null‐padded fixed‐width buffer, into  a  destination
>               (null‐terminated) string.  The programmer is responsible for al‐
>               locating a buffer large enough.  The return value is useless.
>
>               ustr2stp()  is  a  better  replacement for this function for the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  No need to call strlen(3), thanks to the useful return value.
>
>               •  Function name that is not actively confusing.
>
> RETURN VALUE
>        The following functions return a pointer to the terminating  null  byte
>        in the destination string (they never truncate).
>
>        •  stpcpy(3)
>
>        •  ustr2stp()
>
>        The following function returns a pointer to one after  the  last  byte
>        copied into the destination; it never truncates, and it writes no ter‐
>        minating null byte.
>
>        •  mempcpy(3)
>
>        The  following  functions return a pointer to the terminating null byte
>        in the destination string, except when truncation occurs; if truncation
>        occurs, they return a pointer to one past the end  of  the  destination
>        buffer.
>
>        •  stpecpy()
>
>        •  stpecpyx()
>
>        The  following function returns a pointer to one after the last charac‐
>        ter in the destination unterminated string; if truncation occurs,  that
>        pointer  is equivalent to a pointer to one past the end of the destina‐
>        tion buffer.
>
>        •  stpncpy(3)
>
>    Deprecated
>        The following functions return the length of the total string that they
>        tried to create (as if truncation didn’t occur).
>
>        •  strlcpy(3bsd)
>
>        •  strlcat(3bsd)
>
>        The following function returns the length of the destination string, or
>        -E2BIG on truncation.
>
>        •  strscpy(9)
>
>        The following functions return the dst pointer, which is useless.
>
>        •  strcpy(3)
>
>        •  strcat(3)
>
>        •  strncpy(3)
>
>        •  strncat(3)

And here goes the STANDARDS section:

STANDARDS
       stpcpy(3)
              POSIX.1‐2008.

       stpecpy()
       stpecpyx()
       ustr2stp()
              Not defined by any standards nor libraries.

       stpncpy(3)
              POSIX.1‐2008.

       mempcpy(3)
              This function is a GNU extension.

       strlcpy(3bsd)
       strlcat(3bsd)
              Functions originated in OpenBSD and present in some Unix systems.
              They are provided in GNU/Linux systems by libbsd.

       strscpy(9)
              Linux kernel internal function.

       strcpy(3)
       strcat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       strncpy(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

       strncat(3)
              POSIX.1‐2001, POSIX.1‐2008, C89, C99, SVr4, 4.3BSD.

>
> CAVEATS
>        Some of the functions described here are not provided by  any  library;
>        you should write your own copy if you want to use them.
>
>        The  deprecated status of these functions varies from system to system.
>        This page declares as deprecated those functions that have a better re‐
>        placement documented in this same page.
>
> EXAMPLES
>        The following are examples of correct use of each of these functions.
>
>        stpcpy(3)
>                   p = buf;
>                   p = stpcpy(p, "Hello ");
>                   p = stpcpy(p, "world");
>                   p = stpcpy(p, "!");
>                   len = p - buf;
>                   puts(buf);
>
>        stpecpy()
>        stpecpyx()
>                   past_end = buf + sizeof(buf);
>                   p = buf;
>                   p = stpecpy(p, past_end, "Hello ");
>                   p = stpecpy(p, past_end, "world");
>                   p = stpecpy(p, past_end, "!");
>                   if (p == past_end) {
>                       p--;
>                       goto toolong;
>                   }
>                   len = p - buf;
>                   puts(buf);
>
>        stpncpy(3)
>                   past_end = buf + sizeof(buf);
>                   end = stpncpy(buf, "Hello world!", sizeof(buf));
>                   if (end == past_end)
>                       goto toolong;
>                   len = end - buf;
>                   for (size_t i = 0; i < sizeof(buf); i++)
>                       putchar(buf[i]);
>
>        ustr2stp()
>                   p = buf;
>                   p = ustr2stp(p, "Hello ", 6);
>                   p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                   p = ustr2stp(p, "!", 1);
>                   len = p - buf;
>                   puts(buf);
>
>        mempcpy(3)
>                   p = buf;
>                   p = mempcpy(p, "Hello ", 6);
>                   p = mempcpy(p, "world", 5);
>                   p = mempcpy(p, "!", 1);
>                   *p = '\0';
>                   len = p - buf;
>                   puts(buf);
>
>    Deprecated
>        strlcpy(3bsd)
>        strlcat(3bsd)
>                   if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                       goto toolong;
>                   if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                       goto toolong;
>                   len = strlcat(buf, "!", sizeof(buf));
>                   if (len >= sizeof(buf))
>                       goto toolong;
>                   puts(buf);
>
>        strscpy(9)
>                   len = strscpy(buf, "Hello world!", sizeof(buf));
>                   if (len == -E2BIG)
>                       goto toolong;
>                   puts(buf);
>
>        strcpy(3)
>        strcat(3)
>                   strcpy(buf, "Hello ");
>                   strcat(buf, "world");
>                   strcat(buf, "!");
>                   len = strlen(buf);
>                   puts(buf);
>
>        strncpy(3)
>                   strncpy(buf, "Hello world!", sizeof(buf));
>                   if (buf[sizeof(buf) - 1] != '\0')
>                       goto toolong;
>                   len = strnlen(buf, sizeof(buf));
>                   for (size_t i = 0; i < sizeof(buf); i++)
>                       putchar(buf[i]);
>
>        strncat(3)
>                   buf[0] = '\0';  // strncat(3) needs a string in dst.
>                   strncat(buf, "Hello ", 6);
>                   strncat(buf, "world", 42);  // Padding null bytes ignored.
>                   strncat(buf, "!", 1);
>                   puts(buf);
>
> SEE ALSO
>        memcpy(3), memccpy(3), mempcpy(3), string(3)
>
> Linux man‐pages (unreleased)        (date)                      string_copy(7)
>
>
> --