* [RFC] strcpys(): New function for copying strings safely @ 2021-06-27 19:26 Alejandro Colomar (man-pages) 2021-06-27 19:46 ` Alejandro Colomar (man-pages) 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-06-27 19:26 UTC (permalink / raw) To: glibc; +Cc: tech, Christoph Hellwig, linux-kernel Hi, I recently re-read the discussions about strscpy() and strlcpy(). https://lwn.net/Articles/612244/ https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00052.html https://mafford.com/text/the-many-ways-to-copy-a-string-in-c/ https://lkml.org/lkml/2017/7/14/637 https://lwn.net/Articles/659214/ Arguments against strlcpy(): - Inefficiency (compared to memcpy()) - It doesn't report truncation, needing extra checks by the user - It doesn't prevent overrunning the source string (requires a C-string input) Arguments in favor of strlcpy(): - Avoid code bloat - Self-documenting code - Avoid everybody rolling their own strcat() safe variant - Prevent buffer overflows I rolled my own strcpy safer functions some time ago, and after reading those discussions, I decided to propose them for inclusion in glibc. I think they address all of the arguments before (well, efficiency will never be as good as memcpy(), I guess, but a good compiler, and -flto can make it good enough). It reports two kinds of errors: "hard" errors, where the string is invalid, and "soft" errors, where the string is valid but truncated. The method for reporting the errors is an 'int' return value, which is 0 for success, -1 for "hard" error, and 1 for "soft" error. The value of written characters that strcpy() functions typically return, is now moved to a pointer parameter (4th parameter). The order of the parameters has been changed (compared to other strcpy() typicall variants), according to a new principle of the C2x standard, which says that "the size of an array appears before the array". <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm> It doesn't require a C string in the input. It only reads up to the size limit provided by the user. I added the _np suffix to the function to mark it as non-portable (similar to existing practice in glibc with non-standard functions). I used 'ssize_t' instead of 'size_t', because I consider unsigned types to be inherently unsafe. See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. It is designed so that usage requires the minimum number of lines of code for complete usage (including error handling checks): [[ // When we already checked that 'size' is >= 1 // and truncation is not an issue: strcpys_np(size, dest, src, NULL); // When we ddidn't check the value of 'size', // but truncation is still not an issue: if (strcpys_np(size, dest, src, NULL) == -1) goto handle_hard_error; // When truncation is an issue: if (strcpys_np(size, dest, src, NULL)) goto handle_all_errors; // When truncation is an error, // but it requires a different handling than a "hard" error: status = strcpys_np(size, dest, src, NULL); if (status == 1) goto handle_truncation; if (status) goto handle_hard_error; ]] After any of those samples of code, the string has been copied, and is a valid C-string. Here goes the code (strcpys_np() is defined in terms of strscpy_np(), which similar to the known strscpy(), but with some of the improvements mentioned above, such as using array parameters, and ssize_t): [[ #include <string.h> #include <sys/types.h> [[gnu::nonnull]] ssize_t strscpy_np(ssize_t size, char dest[static restrict size], const char src[static restrict size]) { ssize_t len; if (size <= 0) return -1; len = strnlen(src, size - 1); memcpy(dest, src, len); dest[len] = '\0'; return len; } [[gnu::nonnull(2, 3)]] int strcpys_np(ssize_t size, char dest[static restrict size], const char src[static restrict size], ssize_t *restrict len) { ssize_t l; l = strscpy_np(size, dest, src); if (len) *len = l; if (l == -1) return -1; if (l >= size) return 1; return 0; } ]] I may have introduced some bugs right now, as I adapted the code a bit before sending, but I expect it to be free of any bugs known of the existing str*cpy() interfaces. What do you think about this function? Would you want to add it to glibc? Thanks, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] strcpys(): New function for copying strings safely 2021-06-27 19:26 [RFC] strcpys(): New function for copying strings safely Alejandro Colomar (man-pages) @ 2021-06-27 19:46 ` Alejandro Colomar (man-pages) 2021-06-28 8:15 ` David Laight 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-06-27 19:46 UTC (permalink / raw) To: glibc; +Cc: tech, Christoph Hellwig, linux-kernel On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote: > > It is designed so that usage requires the minimum number of lines of > code for complete usage (including error handling checks): > > [[ > // When we already checked that 'size' is >= 1 > // and truncation is not an issue: > > strcpys_np(size, dest, src, NULL); Also, given how unlikely this case is, I have in my code: `[[gnu::warn_unused_result]]` I forgot to talk about it in the definition I sent. I would put that attribute in the glibc definition, if this is added to glibc. To ignore it, a simple cast of the result to `(void)` should be enough (or a more complex macro, like `UNUSED(strcpys_np(...));`). > > [[ > > #include <string.h> > #include <sys/types.h> > > > [[gnu::nonnull]] > ssize_t strscpy_np(ssize_t size, > char dest[static restrict size], > const char src[static restrict size]) > { > ssize_t len; > > if (size <= 0) > return -1; > > len = strnlen(src, size - 1); > memcpy(dest, src, len); > dest[len] = '\0'; > > return len; > } > > [[gnu::nonnull(2, 3)]] [[gnu::warn_unused_result]] > int strcpys_np(ssize_t size, > char dest[static restrict size], > const char src[static restrict size], > ssize_t *restrict len) > { > ssize_t l; > > l = strscpy_np(size, dest, src); > if (len) > *len = l; > > if (l == -1) > return -1; > if (l >= size) > return 1; > return 0; > } > > ]] -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [RFC] strcpys(): New function for copying strings safely 2021-06-27 19:46 ` Alejandro Colomar (man-pages) @ 2021-06-28 8:15 ` David Laight 2021-06-28 11:32 ` Alejandro Colomar (man-pages) 0 siblings, 1 reply; 6+ messages in thread From: David Laight @ 2021-06-28 8:15 UTC (permalink / raw) To: 'Alejandro Colomar (man-pages)', glibc Cc: tech, Christoph Hellwig, linux-kernel From: Alejandro Colomar > Sent: 27 June 2021 20:47 > > On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote: > > > > It is designed so that usage requires the minimum number of lines of > > code for complete usage (including error handling checks): > > > > [[ > > // When we already checked that 'size' is >= 1 > > // and truncation is not an issue: > > > > strcpys_np(size, dest, src, NULL); > > Also, given how unlikely this case is, I have in my code: > `[[gnu::warn_unused_result]]` warn_unused_result is such a PITA it should never have been invented. At least no one seems to have been stupid enough to apply it to fprintf() (you need to do fflush() before looking for errors). In most cases strcpy() is safe - but if the inputs are 'corrupt' horrid things happen - so you need truncation for safety. But you really may not care whether the data got truncated. The other use is where you want a sequence of: len *= str_copy(dest + len, dest_len - len, src); But don't really want to test for overflow after each call. In this case returning the buffer length (including the added '\0' on truncation works quite nicely. No chance of writing outside the buffer, safe truncation and a simple 'len == dest_len' check for overflow. OTOH there are already too many string copy functions. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] strcpys(): New function for copying strings safely 2021-06-28 8:15 ` David Laight @ 2021-06-28 11:32 ` Alejandro Colomar (man-pages) 2021-06-28 12:00 ` Alejandro Colomar (man-pages) 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-06-28 11:32 UTC (permalink / raw) To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper Hello David, On 6/28/21 10:15 AM, David Laight wrote: > From: Alejandro Colomar >> Sent: 27 June 2021 20:47 >> >> On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote: >>> >>> It is designed so that usage requires the minimum number of lines of >>> code for complete usage (including error handling checks): >>> >>> [[ >>> // When we already checked that 'size' is >= 1 >>> // and truncation is not an issue: >>> >>> strcpys_np(size, dest, src, NULL); >> >> Also, given how unlikely this case is, I have in my code: >> `[[gnu::warn_unused_result]]` > > warn_unused_result is such a PITA it should never have been invented. > At least no one seems to have been stupid enough to apply it to fprintf() > (you need to do fflush() before looking for errors). Okay. After some thought, I think removing it is better. > > In most cases strcpy() is safe - but if the inputs are 'corrupt' > horrid things happen - so you need truncation for safety. > But you really may not care whether the data got truncated. Yes, in most cases, expecially with literal strings ("xxx") and known buffers, strcpy() is completely safe. And yes, if the input is untrusted (may be corrupt or not), you need to be prepared for truncation. You don't need to truncate, hey, maybe the input is well-formed, but you need a function that may truncate if needed. For that, we have strncpy(3), strlcpy(3bsd), and strscpy (kernel). strncpy(3) is not for strings; it is for fixed length buffers. It has high performance penalties, and yet doesn't guarantee a C-string. DISCARDED. strlcpy(3bsd) still requires that the input is a C-string, so it can't be used with untrusted strings. It really is broken, as Linus said. Its only valid use is if the input string is known (maybe a literal), but the output buffer may be smaller than the input string. DISCARDED. strscpy (kernel) is great. BUT it is not supported by glibc. The reason for not including it (actually Ulrich used it for not including strlcpy, but it also applies to strscpy) is that it doesn't report an error when the string is truncated. I think that strscpy adds value to the current strcpy variants, and should be added to glibc. And then, coming back to truncation, yes, in some cases, truncation is wanted, and not an error, for which strscpy is great (so please add it to glibc :), but then there are other cases where truncation is not wanted, but yet we deal with untrusted strings. That is the case that I think Ulrich mentioned that wasn't covered by strlcpy (and isn't yet covered by strscpy). And the reason to add this function. Let's say a user introduces a username; it's an untrusted string, we need to process it at least with strscpy(). But if it is corrupted (too long), we don't want to silently store a truncated username; we want to inform the user, so that he introduces a new (hopefully valid) username. For that, strcpys() as defined in the previous email, is necessary. > > The other use is where you want a sequence of: > len *= str_copy(dest + len, dest_len - len, src); > But don't really want to test for overflow after each call. This is a legitimate use of strscpy(). Please, add it to glibc, as there is no libc function to do that. > > In this case returning the buffer length (including the added > '\0' on truncation works quite nicely. > No chance of writing outside the buffer, safe truncation and > a simple 'len == dest_len' check for overflow. > > OTOH there are already too many string copy functions. There are, but because the standard ones don't serve all purposes correctly, so people need to develop their own. If libc provided the necessary functions, less custom string copy functions would be necessary, as Christoph said a long time ago, which is a good thing. As I see it, there are the following, each of which has its valid usage: strcpy(3): known input && known buffer strncpy(3): not for C strings; only for fixed width buffers of characters! strlcpy)3bsd): known input && unknown buffer Given that performance-wise it's similar to strscpy(), it should probably always be replaced by strscpy(). strscpy(): unknown input && truncation is silently ignored Except for performance-critical applications, this call may replace strcpy(3), to add some extra safety. strcpys(): unknown input && truncation may be an error (or not). This call can replace strscpy() in most cases, simplifying usage. The only case I can see that strscpy() is superior is for chains of strscpy(), where I'd only use strcpys() in the last one (if any strscpy() in the chain has been trunncated, so will the last strcpys() (unless it's "")). For the reasons above, I propose both strscpy() and strcpys() for inclusion in glibc. Both trivial sources are in the previous email. Thanks, Alex --- BTW, I did some research in the kernel and glibc sources to see how are str*cpy functions being used there: user@sqli:~/src/gnu/glibc$ grep -rn 'strcpy(' * | sort -R | head -n 7 string/tester.c:1126: (void) strcpy(one, "abcd"); resolv/nss_dns/dns-host.c:532: strcpy(qp, "ip6.arpa"); dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX"); string/tester.c:1006: cp = strcpy(one, "abc"); timezone/zic.c:459: return result ? strcpy(result, str) : result; timezone/zic.c:2886: strcpy(startbuf, zp->z_format); resolv/compat-gethnamaddr.c:393: strcpy(bp, qname); - string/tester.c:1126: (void) strcpy(one, "abcd"); Okay. - resolv/nss_dns/dns-host.c:532: strcpy(qp, "ip6.arpa"); Okay. - dirent/tst-fdopendir.c:70: strcpy(fname, "/tmp/dXXXXXX"); Okay. - string/tester.c:1006: cp = strcpy(one, "abc"); Okay. - timezone/zic.c:459: return result ? strcpy(result, str) : result; Okay. - timezone/zic.c:2886: strcpy(startbuf, zp->z_format); I couldn't find a guarantee that zp->z_format is NUL-terminated. Possible buffer overrun. I couldn't find a guarantee that zp->z_format fits in startbuf. Possible buffer overflow. This call could be replaced, at least, by strlcpy. The calls to strchr() previous to that strcpy() may also benefit from using strnchr(). - resolv/compat-gethnamaddr.c:393: strcpy(bp, qname); Okay. user@sqli:~/src/linux/linux$ grep -rn 'strcpy(' * | sort -R | head -n 3 drivers/platform/x86/wmi.c:346: strcpy(method, "WQ"); sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)"); sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH"); user@sqli:~/src/linux/linux$ grep -rn 'strlcpy(' * | sort -R | head -n 3 tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename, sizeof(from_dir)); drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version, UETH__VERSION, sizeof(p->version)); drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username, chap_table->name, QL4_CHAP_MAX_NAME_LEN); user@sqli:~/src/linux/linux$ grep -rn 'strscpy(' * | sort -R | head -n 3 sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id)); sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio", sizeof(pcm->name)); crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher", sizeof(rcipher.type)); - drivers/platform/x86/wmi.c:346: strcpy(method, "WQ"); Okay - sound/pci/atiixp.c:1305: strcpy(pcm->name, "ATI IXP IEC958 (AC97)"); Okay - sound/pci/intel8x0.c:1439: strcpy(name, "Intel ICH"); Okay The sprintf() 2 lines above is also okay. - tools/perf/builtin-buildid-cache.c:124: strlcpy(from_dir, filename, sizeof(from_dir)); Okay. BUT no libc support! - drivers/usb/gadget/function/u_ether.c:148: strlcpy(p->version, UETH__VERSION, sizeof(p->version)); Okay. BUT no libc support! However, in the next line, it's not obvious where dev->gadget->name comes from. eth_get_drvinfo() is a callback (it's stored in a pointer), so I can't follow it. If it were an untrusted string (not NUL-terminated), it would be a bug. - drivers/scsi/qla4xxx/ql4_mbx.c:1615: strlcpy(username, chap_table->name, QL4_CHAP_MAX_NAME_LEN); Okay. BUT no libc support! - sound/core/pcm.c:732: strscpy(pcm->id, id, sizeof(pcm->id)); Okay. BUT no libc support! - sound/usb/hiface/pcm.c:597: strscpy(pcm->name, "USB-SPDIF Audio", sizeof(pcm->name)); Okay. BUT no libc support! - crypto/crypto_user_stat.c:54: strscpy(rcipher.type, "cipher", sizeof(rcipher.type)); Okay. BUT no libc support! None of the above checked for truncation, so I assume it's not an issue there. -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] strcpys(): New function for copying strings safely 2021-06-28 11:32 ` Alejandro Colomar (man-pages) @ 2021-06-28 12:00 ` Alejandro Colomar (man-pages) 2021-06-28 12:10 ` Alejandro Colomar (man-pages) 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-06-28 12:00 UTC (permalink / raw) To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper On 6/28/21 1:32 PM, Alejandro Colomar (man-pages) wrote: >> >> The other use is where you want a sequence of: >> len *= str_copy(dest + len, dest_len - len, src); >> But don't really want to test for overflow after each call. > > This is a legitimate use of strscpy(). Please, add it to glibc, as > there is no libc function to do that. > >> >> In this case returning the buffer length (including the added >> '\0' on truncation works quite nicely. >> No chance of writing outside the buffer, safe truncation and >> a simple 'len == dest_len' check for overflow. >> >> OTOH there are already too many string copy functions. > > There are, but because the standard ones don't serve all purposes > correctly, so people need to develop their own. If libc provided the > necessary functions, less custom string copy functions would be > necessary, as Christoph said a long time ago, which is a good thing. > > As I see it, there are the following, each of which has its valid usage: > > strcpy(3): > known input && known buffer > strncpy(3): > not for C strings; only for fixed width buffers of characters! > strlcpy)3bsd): > known input && unknown buffer > Given that performance-wise it's similar to strscpy(), > it should probably always be replaced by strscpy(). > strscpy(): > unknown input && truncation is silently ignored > Except for performance-critical applications, > this call may replace strcpy(3), to add some extra safety. > strcpys(): > unknown input && truncation may be an error (or not). > This call can replace strscpy() in most cases, > simplifying usage. > The only case I can see that strscpy() is superior > is for chains of strscpy(), where I'd only use strcpys() > in the last one (if any strscpy() in the chain has been > trunncated, so will the last strcpys() (unless it's "")). > > BTW, for chains of str_cpy(), a strscat() and a strcats() function should probably replace chained strcpy()s, so it would be something like: l = strscpy(n, dest, src); l = strscat(n - l, dest + l, src2); l = strscat(n - l, dest + l, src3); ... if (strcats(n - l, dest + l, srcN, NULL)) goto handle_truncation_or_error; And the user should make sure that srcN is not "" unless he doesn't care about truncation. Otherwise, check at every step. I'll send my source code for strscat and strcats, if strscpy and strcpys are considered for addition. -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] strcpys(): New function for copying strings safely 2021-06-28 12:00 ` Alejandro Colomar (man-pages) @ 2021-06-28 12:10 ` Alejandro Colomar (man-pages) 0 siblings, 0 replies; 6+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-06-28 12:10 UTC (permalink / raw) To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper On 6/28/21 2:00 PM, Alejandro Colomar (man-pages) wrote: > l = strscat(n - l, dest + l, src2); Hmm, that's a bug; I wrote it too fast. Either l += strscpy(n - l, dest + l, src2); or l = strscat(n - l, dest, src2); should be used instead. -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-06-28 12:10 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-27 19:26 [RFC] strcpys(): New function for copying strings safely Alejandro Colomar (man-pages) 2021-06-27 19:46 ` Alejandro Colomar (man-pages) 2021-06-28 8:15 ` David Laight 2021-06-28 11:32 ` Alejandro Colomar (man-pages) 2021-06-28 12:00 ` Alejandro Colomar (man-pages) 2021-06-28 12:10 ` Alejandro Colomar (man-pages)
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.