On 12/12/22 00:59, Alejandro Colomar wrote:
> Hi all!
>
> I'm planning to add a new manual page that documents all string copying
> functions.  It covers more detail than any of the existing manual pages (and in
> fact, I've discovered some properties of the functions documented while working
> on this page).  The intention is to remove the existing separate manual pages
> for all string copying functions, and make them links to this new page.  It is
> intended to be the only reference documentation for copying strings in C, and
> hopefully to fix the half century of suboptimal string copying library with
> which we've lived.  (Say goodbye to std::string, here come back C strings ;)
>
> The formatted manual page is below.
>
> Alex
>
> P.S.: I'm sorry for your beloved string copying function(s); it has a high
> chance of being dreaded by the page below.  Not sorry.  Oh well, at least I
> justified it, or I tried :-)
>
> ---
>
> string_copy(7)         Miscellaneous Information Manual         string_copy(7)
>
> NAME
>        stpcpy,  stpecpy,  stpecpyx, strlcpy, strlcat, strscpy, strcpy, strcat,
>        stpncpy, ustr2stp, strncpy, strncat, mempcpy - copy strings
>
> SYNOPSIS
>    (Null‐terminated) strings
>        // Chain‐copy a string.
>        char *stpcpy(char *restrict dst, const char *restrict src);
>
>        // Chain‐copy a string with truncation (not in libc).
>        char *stpecpy(char *dst, char past_end[0], const char *restrict src);
>
>        // Chain‐copy a string with truncation and SIGSEGV on invalid input.
>        char *stpecpyx(char *dst, char past_end[0], const char *restrict src);
>
>        // Copy a string with truncation and SIGSEGV on invalid input.
>        [[deprecated]]  // Use stpecpyx() instead.
>        size_t strlcpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Concatenate a string with truncation.
>        [[deprecated]]  // Use stpecpyx() instead.
>        size_t strlcat(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Copy a string with truncation (not in libc).
>        [[deprecated]]  // Use stpecpy() instead.
>        ssize_t strscpy(char dst[restrict .sz], const char src[restrict .sz],
>                       size_t sz);
>
>        // Copy a string.
>        [[deprecated]]  // Use stpcpy(3) instead.
>        char *strcpy(char *restrict dst, const char *restrict src);
>
>        // Concatenate a string.
>        [[deprecated]]  // Use stpcpy(3) instead.
>        char *strcat(char *restrict dst, const char *restrict src);
>
>    Unterminated strings (null‐padded fixed‐width buffers)
>        // Zero a fixed‐width buffer, and
>        // copy a string with truncation into an unterminated string.
>        char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Chain‐copy an unterminated string into a string (not in libc).
>        char *ustr2stp(char *restrict dst, const char src[restrict .sz],
>                       size_t sz);
>
>        // Zero a fixed‐width buffer, and
>        // copy a string with truncation into an unterminated string.
>        [[deprecated]]  // Use stpncpy(3) instead.
>        char *strncpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
>
>        // Concatenate an unterminated string into a string.
>        [[deprecated]]  // Use ustr2stp() instead.
>        char *strncat(char *restrict dst, const char src[restrict .sz],
>                       size_t sz);
>
>    String structures
>        // (Null‐terminated) string structure.
>        struct str_s {
>            size_t  len;
>            char    *str;
>        };
>
>        // Unterminated string structure (overlapping strings).
>        struct ustr_s {
>            size_t  len;
>            char    *ustr;
>        };
>
>        // Chain‐copy a string structure into an unterminated string.
>        void *mempcpy(void *restrict dst, const void src[restrict .len],
>                       size_t len);
>
> DESCRIPTION
>    Terms (and abbreviations)
>        string (str)
>               is a sequence of zero or more non‐null characters, followed by a
>               null byte.
>
>        unterminated string (ustr)
>               is a sequence of zero or more  non‐null  characters.   They  are
>               sometimes  contained  in fixed‐width buffers, which usually con‐
>               tain padding null bytes after the unterminated string,  to  fill
>               the  rest  of  the  buffer  without  affecting  the unterminated
>               string; however, those padding null bytes are not  part  of  the
>               unterminated string.
>
>        length (len)
>               is the number of non‐null characters in a string.  It is the re‐
>               turn value of strlen(str) and of strnlen(ustr, sz).
>
>        size (sz)
>               refers to the entire buffer where the string is contained.
>
>        end    is  the  name  of  a  pointer  to the terminating null byte of a
>               string, or a pointer to one past the last character of an unter‐
>               minated string.  This is the return value of functions that  al‐
>               low chaining.  It is equivalent to &str[len].
>
>        past_end
>               is  the name of a pointer to one past the end of the buffer that
>               contains a string.  It is equivalent to &str[sz].  It is used as
>               a sentinel value, to be able  to  truncate  strings  instead  of
>               overrunning a buffer.
>
>        string structure
>        unterminated string structure
>               Structure  that  contains the length of a string, as well as the
>               string or the unterminated string.
>
>    Types of functions
>        Copy, concatenate, and chain‐copy
>               Originally, there was a distinction between functions that  copy
>               and  those that concatenate.  However, newer functions that copy
>               while allowing chaining cover both use cases with a single  API.
>               They  are  also algorithmically faster, since they don’t need to
>               search for the end of the existing string.
>
>               To allow chaining, copy functions need to return a pointer to
>               the end.  That pointer is a byproduct of the copy operation, so
>               returning it has no performance cost.  These functions are
>               preferred over copy or concatenation functions.  Functions that
>               return such a pointer, and thus can be chained, have names of
>               the form *stp*(), since it’s also common to name the pointer
>               just p.
>
>        Truncate or not?
>               The  first  thing  to note is that programmers should be careful
>               with buffers, so they always have the correct size, and  trunca‐
>               tion is not necessary.
>
>               In most cases, truncation is not desired, and it is simpler to
>               just do the copy.  Simpler code is safer code.  Guarding
>               against programming mistakes by adding more code only adds more
>               places where mistakes can be made.
>
>               Nowadays, many programmer errors can be detected with tools
>               such as compiler warnings, static analyzers, and
>               _FORTIFY_SOURCE (see ftm(7)).  Keeping the code simple helps
>               these error‐detection features be more precise.
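A minimal sketch of the "size the buffer correctly, then just copy" approach from the paragraphs above, assuming a POSIX.1-2008 system (for stpcpy(3)).  concat3() is a hypothetical helper, not part of any library.  Because the buffer is allocated from the known input lengths, the chained stpcpy(3) calls need no truncation handling at all:

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy(3) */

#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated string holding a, b
 * and c concatenated, or NULL if malloc(3) fails.  The buffer is sized
 * from the input lengths, so the chained stpcpy(3) calls cannot overrun
 * it, and no truncation check is needed.
 * (Overflow of the length sum is not checked in this sketch.)
 */
char *
concat3(const char *a, const char *b, const char *c)
{
    char  *buf, *p;

    buf = malloc(strlen(a) + strlen(b) + strlen(c) + 1);  /* +1: null byte */
    if (buf == NULL)
        return NULL;

    p = stpcpy(buf, a);  /* p points to the terminating null byte */
    p = stpcpy(p, b);
    stpcpy(p, c);

    return buf;
}
```

The caller owns the result and must free(3) it; the only failure to check is the allocation, not truncation.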
>
>               When validating user input, however, it makes sense to truncate.
>               Remember to check the return value of such function calls.
>
>               Functions that truncate:
>
>               •  stpecpy() is the most efficient string copy function that
>                  performs truncation.  Truncation needs to be checked only
>                  once, after all chained calls.
>
>               •  stpecpyx() is a variant of stpecpy() that consumes the entire
>                  source string, to catch bugs in the program by forcing a seg‐
>                  mentation fault (as strlcpy(3bsd) and strlcat(3bsd) do).
>
>               •  strlcpy(3bsd) and strlcat(3bsd), which originated in OpenBSD,
>                  are designed to crash if the input string is invalid (doesn’t
>                  contain a null byte).
>
>               •  strscpy(9) is a function in the Linux kernel which reports an
>                  error instead of crashing.
>
>               •  stpncpy(3) and strncpy(3) also truncate, but they write
>                  unterminated strings rather than strings.
>
>    Unterminated strings (null‐padded fixed‐width buffers)
>        For  historic reasons, some standard APIs, such as utmpx(5), use unter‐
>        minated strings in fixed‐width buffers.  To interface with  them,  spe‐
>        cialized functions need to be used.
>
>        To copy strings into them, use stpncpy(3).
>
>        To  copy from an unterminated string within a fixed‐width buffer into a
>        string, ignoring any trailing null  bytes  in  the  source  fixed‐width
>        buffer, you should use ustr2stp().
>
>    String structures
>        The simplest string copying function is mempcpy(3).  It requires always
>        knowing  the length of your strings, for which string structures can be
>        used.
>        This makes the code simpler, since you always know the length of
>        your strings, and it’s also faster, since it doesn’t need to repeatedly
>        calculate  those  lengths.   mempcpy(3)  always creates an unterminated
>        string, so you need to explicitly set the terminating null byte.
>
>        String structure
>               The following code can be  used  to  chain‐copy  from  a  string
>               structure into a string:
>
>                   p = mempcpy(p, src->str, src->len);
>                   *p = '\0';
>
>               The  following  code  can  be  used  to chain‐copy from a string
>               structure into an unterminated string:
>
>                   p = mempcpy(p, src->str, src->len);
>
>        Unterminated string structure (overlapping strings)
>               In programs that make considerable use of strings, and need  the
>               best  performance, using overlapping strings can make a big dif‐
>               ference.  It allows holding substrings of a bigger string
>               without duplicating memory or spending time on a copy.
>
>               However,  this is delicate, since it requires using unterminated
>               strings.  C library APIs use strings, so programs that  use  un‐
>               terminated  strings  will  have  to  take  care to differentiate
>               strings from unterminated strings.
>
>               The following code can be used to chain‐copy  from  an  untermi‐
>               nated string structure to a string:
>
>                   p = mempcpy(p, src->ustr, src->len);
>                   *p = '\0';
>
>               The  following  code  can be used to chain‐copy from an untermi‐
>               nated string structure to an unterminated string:
>
>                   p = mempcpy(p, src->ustr, src->len);
>
>    Functions
>        stpcpy(3)
>               This function copies the input string into a destination string.
>               The programmer is responsible  for  allocating  a  buffer  large
>               enough.  It returns a pointer suitable for chaining.
>
>        stpecpy()
>        stpecpyx()
>               These functions copy the input string into a destination string.
>               If  the destination buffer, limited by a pointer to one past the
>               end of it, isn’t large enough to hold the  copy,  the  resulting
>               string  is  truncated  (but  it  is guaranteed to be null‐termi‐
>               nated).  They return a pointer suitable for  chaining.   Trunca‐
>               tion needs to be detected only once after the last chained call.
>               stpecpyx()  has identical semantics to stpecpy(), except that it
>               forces a SIGSEGV on Undefined Behavior.
>
>               These functions are not provided by any library, but you can de‐
>               fine them with the following reference implementations:
>
>                   /* This code is in the public domain. */
>                   char *
>                   stpecpy(char *dst, char past_end[0],
>                           const char *restrict src)
>                   {
>                       char *p;
>
>                       if (dst == past_end)
>                           return past_end;
>
>                       p = memccpy(dst, src, '\0', past_end - dst);
>                       if (p != NULL)
>                           return p - 1;
>
>                       /* truncation detected */
>                       past_end[-1] = '\0';
>                       return past_end;
>                   }
>
>                   /* This code is in the public domain. */
>                   char *
>                   stpecpyx(char *dst, char past_end[0],
>                            const char *restrict src)
>                   {
>                       if (src[strlen(src)] != '\0')
>                           raise(SIGSEGV);
>
>                       return stpecpy(dst, past_end, src);
>                   }
>
>        stpncpy(3)
>               This function copies the input string into a  destination  null‐
>               padded  fixed‐width  unterminated  string.   If  the destination
>               buffer, limited by its size, isn’t  large  enough  to  hold  the
>               copy,  the  resulting  string is truncated.  Since it creates an
>               unterminated string, it doesn’t need to write a terminating null
>               byte.  It returns a pointer suitable for chaining, but it’s  not
>               ideal for that.  Truncation needs to be detected only once after
>               the last chained call.
>
>               If  you’re going to use this function in chained calls, it would
>               probably be useful to develop a function similar to stpecpy().
>
>        ustr2stp()
>               This function copies the input unterminated string contained  in
>               a  null‐padded fixed‐width buffer, into a destination (null‐ter‐
>               minated) string.  The programmer is responsible for allocating a
>               buffer large enough.  It returns a pointer suitable  for  chain‐
>               ing.
>
>               A truncating version of this function doesn’t exist,  since  the
>               size  of  the original string is always known, so it wouldn’t be
>               very useful.
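To illustrate why a truncating ustr2stp() wouldn't buy much: when the source is a fixed-width field, the destination can always be sized from the field itself.  This is a sketch under stated assumptions: struct record and its 8-byte name field are hypothetical stand-ins for something like a utmpx(5) field, and record_name_to_str() is a made-up helper that does what ustr2stp() is described to do, using strnlen(3) and memcpy(3), since ustr2stp() isn't in any library:

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */

#include <string.h>

/* Hypothetical record with a null-padded fixed-width field. */
struct record {
    char  name[8];
};

/*
 * Hypothetical helper: copy the unterminated string in r->name into
 * dst as a (null-terminated) string, ignoring padding null bytes.
 * dst must hold at least sizeof(r->name) + 1 bytes; since that size
 * is known from the field itself, truncation can't happen.
 * Returns a pointer to the terminating null byte, for chaining.
 */
char *
record_name_to_str(char *dst, const struct record *r)
{
    size_t  len;

    len = strnlen(r->name, sizeof(r->name));  /* stop at first padding null */
    memcpy(dst, r->name, len);
    dst[len] = '\0';

    return dst + len;
}
```

A caller would declare something like char buf[sizeof(r.name) + 1], and then never needs a truncation check.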
>
>               This function is not provided by any library, but you can define
>               it with the following reference implementation:
>
>                   /* This code is in the public domain. */
>                   char *
>                   ustr2stp(char *restrict dst, const char *restrict src,
>                            size_t sz)
>                   {
>                       char  *end;
>
>                       end = memccpy(dst, src, '\0', sz);
>                       if (end != NULL)
>                           end--;  /* memccpy() points past the null byte */
>                       else
>                           end = dst + sz;  /* src fills all sz bytes */
>                       *end = '\0';
>
>                       return end;
>                   }
>
>        mempcpy(3)
>               This function copies the input string, limited  by  its  length,
>               into  a  destination unterminated string.  The programmer is re‐
>               sponsible for allocating a buffer large enough.   It  returns  a
>               pointer suitable for chaining.
>
>    Deprecated functions
>        strlcpy(3bsd)
>        strlcat(3bsd)
>               Deprecated.  These functions copy the input string into a desti‐
>               nation  string.  If the destination buffer, limited by its size,
>               isn’t large enough to hold the copy,  the  resulting  string  is
>               truncated  (but  it  is guaranteed to be null‐terminated).  They
>               return the length of the total  string  they  tried  to  create.
>               These functions force a SIGSEGV on Undefined Behavior.
>
>               stpecpyx()  is  a better replacement for these functions for the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  Only requires detecting truncation once per chain of calls.
>
>        strscpy(9)
>               Deprecated.  This function copies the input string into a desti‐
>               nation string.
>               If the destination buffer, limited by its size,
>               isn’t  large  enough  to  hold the copy, the resulting string is
>               truncated (but it is guaranteed to be null‐terminated).  It  re‐
>               turns the length of the destination string, or -E2BIG on trunca‐
>               tion.
>
>               stpecpy()  is  a  better replacement for this function, since it
>               has a much simpler interface.
>
>        strcpy(3)
>        strcat(3)
>               Deprecated.  These functions copy the input string into a desti‐
>               nation string.  The programmer is responsible for  allocating  a
>               buffer large enough.  The return value is useless.
>
>               strcpy(3)  is  identical to stpcpy(3) except for the useless re‐
>               turn value.
>
>               stpcpy(3) is a better replacement for these  functions  for  the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  No need to call strlen(3), thanks to the useful return value.
>
>        strncpy(3)
>               Deprecated.   strncpy(3)  is  identical to stpncpy(3) except for
>               the useless return value.  Because of that return value, it’s
>               hard to check correctly for truncation with this function.  Use
>               stpncpy(3) instead.
>
>        strncat(3)
>               Deprecated.  Do not confuse this function with strncpy(3);  they
>               are not related at all.
>
>               This  function  concatenates  the input unterminated string con‐
>               tained in a null‐padded fixed‐width buffer, into  a  destination
>               (null‐terminated) string.  The programmer is responsible for al‐
>               locating a buffer large enough.  The return value is useless.
>
>               ustr2stp()  is  a  better  replacement for this function for the
>               following reasons:
>
>               •  Better performance (chain copy instead of concatenating).
>
>               •  No need to call strlen(3), thanks to the useful return value.
>
>               •  Function name that is not actively confusing.
>
> RETURN VALUE
>        The following functions return a pointer to the terminating  null  byte
>        in the destination string (they never truncate).
>
>        •  stpcpy(3)
>
>        •  ustr2stp()
>
>        The following function returns a pointer to one past the last byte
>        written to the destination unterminated string (it never truncates,
>        and it doesn’t write a terminating null byte).
>
>        •  mempcpy(3)
>
>        The  following  functions return a pointer to the terminating null byte
>        in the destination string, except when truncation occurs; if truncation
>        occurs, they return a pointer to one past the end  of  the  destination
>        buffer.
>
>        •  stpecpy()
>
>        •  stpecpyx()
>
>        The  following function returns a pointer to one after the last charac‐
>        ter in the destination unterminated string; if truncation occurs,  that
>        pointer  is equivalent to a pointer to one past the end of the destina‐
>        tion buffer.
>
>        •  stpncpy(3)
>
>    Deprecated
>        The following functions return the length of the total string that they
>        tried to create (as if truncation didn’t occur).
>
>        •  strlcpy(3bsd)
>
>        •  strlcat(3bsd)
>
>        The following function returns the length of the destination string, or
>        -E2BIG on truncation.
>
>        •  strscpy(9)
>
>        The following functions return the dst pointer, which is useless.
>
>        •  strcpy(3)
>
>        •  strcat(3)
>
>        •  strncpy(3)
>
>        •  strncat(3)
>
> CAVEATS
>        Some of the functions described here are not provided by  any  library;
>        you should write your own copy if you want to use them.
>
>        The  deprecated status of these functions varies from system to system.
>        This page declares as deprecated those functions that have a better re‐
>        placement documented in this same page.
>
> EXAMPLES
>        The following are examples of correct use of each of these functions.
>
>        stpcpy(3)
>                   p = buf;
>                   p = stpcpy(p, "Hello ");
>                   p = stpcpy(p, "world");
>                   p = stpcpy(p, "!");
>                   len = p - buf;
>                   puts(buf);
>
>        stpecpy()
>        stpecpyx()
>                   past_end = buf + sizeof(buf);
>                   p = buf;
>                   p = stpecpy(p, past_end, "Hello ");
>                   p = stpecpy(p, past_end, "world");
>                   p = stpecpy(p, past_end, "!");
>                   if (p == past_end) {
>                       p--;
>                       goto toolong;
>                   }
>                   len = p - buf;
>                   puts(buf);
>
>        stpncpy(3)
>                   past_end = buf + sizeof(buf);
>                   end = stpncpy(buf, "Hello world!", sizeof(buf));
>                   if (end == past_end)
>                       goto toolong;
>                   len = end - buf;
>                   for (size_t i = 0; i < sizeof(buf); i++)
>                       putchar(buf[i]);
>
>        ustr2stp()
>                   p = buf;
>                   p = ustr2stp(p, "Hello ", 6);
>                   p = ustr2stp(p, "world", 42);  // Padding null bytes ignored.
>                   p = ustr2stp(p, "!", 1);
>                   len = p - buf;
>                   puts(buf);
>
>        mempcpy(3)
>                   p = buf;
>                   p = mempcpy(p, "Hello ", 6);
>                   p = mempcpy(p, "world", 5);
>                   p = mempcpy(p, "!", 1);
>                   *p = '\0';
>                   len = p - buf;
>                   puts(buf);
>
>    Deprecated
>        strlcpy(3bsd)
>        strlcat(3bsd)
>                   if (strlcpy(buf, "Hello ", sizeof(buf)) >= sizeof(buf))
>                       goto toolong;
>                   if (strlcat(buf, "world", sizeof(buf)) >= sizeof(buf))
>                       goto toolong;
>                   len = strlcat(buf, "!", sizeof(buf));
>                   if (len >= sizeof(buf))
>                       goto toolong;
>                   puts(buf);
>
>        strscpy(9)
>                   len = strscpy(buf, "Hello world!", sizeof(buf));
>                   if (len == -E2BIG)
>                       goto toolong;
>                   puts(buf);
>
>        strcpy(3)
>        strcat(3)
>                   strcpy(buf, "Hello ");
>                   strcat(buf, "world");
>                   strcat(buf, "!");
>                   len = strlen(buf);
>                   puts(buf);
>
>        strncpy(3)
>                   strncpy(buf, "Hello world!", sizeof(buf));
>                   if (buf[sizeof(buf) - 1] != '\0')
>                       goto toolong;
>                   len = strnlen(buf, sizeof(buf));
>                   for (size_t i = 0; i < sizeof(buf); i++)
>                       putchar(buf[i]);
>
>        strncat(3)
>                   strncpy(buf, "Hello ", 6);
>                   strncat(buf, "world", 42);  // Padding null bytes ignored.
>                   strncat(buf, "!", 1);
>                   puts(buf);

Oops, that example was mistaken; too much cut and paste.

       strncat(3)
                  buf[0] = '\0';
                  strncat(buf, "Hello ", 6);
                  strncat(buf, "world", 42);  // Padding null bytes ignored.
                  strncat(buf, "!", 1);
                  len = strlen(buf);
                  puts(buf);

>
> SEE ALSO
>        memcpy(3), memccpy(3), mempcpy(3), string(3)
>
> Linux man‐pages (unreleased)        (date)                      string_copy(7)
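For anyone who wants to try the truncation semantics of the stpecpy() examples above, here is a self-contained sketch.  It copies the stpecpy() reference implementation from the page (with past_end spelled as a plain pointer instead of char past_end[0]) and chains it through a hypothetical greet() helper that checks for truncation only once.  It assumes memccpy(3) is available, which POSIX (XSI) and C23 provide:

```c
#define _XOPEN_SOURCE 700  /* for memccpy(3) */

#include <stddef.h>
#include <string.h>

/* Copy of the page's stpecpy() reference implementation. */
char *
stpecpy(char *dst, char *past_end, const char *restrict src)
{
    char  *p;

    if (dst == past_end)
        return past_end;

    p = memccpy(dst, src, '\0', past_end - dst);
    if (p != NULL)
        return p - 1;  /* points to the copied null byte */

    /* truncation detected */
    past_end[-1] = '\0';
    return past_end;
}

/*
 * Hypothetical helper: build "Hello <name>!" in buf, whose size sz must
 * be > 0.  Returns the length of the result, or -1 if it was truncated
 * (buf then holds a truncated, but null-terminated, string).
 */
ptrdiff_t
greet(char *buf, size_t sz, const char *name)
{
    char  *p, *past_end;

    past_end = buf + sz;
    p = buf;
    p = stpecpy(p, past_end, "Hello ");
    p = stpecpy(p, past_end, name);
    p = stpecpy(p, past_end, "!");
    if (p == past_end)  /* one check covers all three calls */
        return -1;

    return p - buf;
}
```

With char buf[8], greet(buf, sizeof(buf), "world") returns -1 and leaves "Hello w" in buf; with a 13-byte buffer, "Hello world!" fits exactly and the length 12 is returned.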