linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] strcpys(): New function for copying strings safely
@ 2021-06-27 19:26 Alejandro Colomar (man-pages)
  2021-06-27 19:46 ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-06-27 19:26 UTC (permalink / raw)
  To: glibc; +Cc: tech, Christoph Hellwig, linux-kernel

Hi,

I recently re-read the discussions about strscpy() and strlcpy().

https://lwn.net/Articles/612244/
https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00052.html
https://mafford.com/text/the-many-ways-to-copy-a-string-in-c/
https://lkml.org/lkml/2017/7/14/637
https://lwn.net/Articles/659214/

Arguments against strlcpy():

- Inefficiency (compared to memcpy())
- It doesn't report truncation, needing extra checks by the user
- It doesn't prevent overrunning the source string (requires a C-string 
input)

Arguments in favor of strlcpy():

- Avoid code bloat
- Self-documenting code
- Avoid everybody rolling their own strcat() safe variant
- Prevent buffer overflows


I rolled my own strcpy safer functions some time ago, and after reading 
those discussions, I decided to propose them for inclusion in glibc.

I think they address all of the arguments before (well, efficiency will 
never be as good as memcpy(), I guess, but a good compiler, and -flto 
can make it good enough).

It reports two kinds of errors: "hard" errors, where the string is 
invalid, and "soft" errors, where the string is valid but truncated.
The method for reporting the errors is an 'int' return value, which is 0 
for success, -1 for "hard" error, and 1 for "soft" error.

The value of written characters that strcpy() functions typically 
return, is now moved to a pointer parameter (4th parameter).

The order of the parameters has been changed (compared to other strcpy() 
typicall variants), according to a new principle of the C2x standard, 
which says that "the size of an array appears before the array".
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm>

It doesn't require a C string in the input.  It only reads up to the 
size limit provided by the user.

I added the _np suffix to the function to mark it as non-portable 
(similar to existing practice in glibc with non-standard functions).

I used 'ssize_t' instead of 'size_t', because I consider unsigned types 
to be inherently unsafe.  See 
<https://google.github.io/styleguide/cppguide.html#Integer_Types>.

It is designed so that usage requires the minimum number of lines of 
code for complete usage (including error handling checks):

[[
// When we already checked that 'size' is >= 1
// and truncation is not an issue:

strcpys_np(size, dest, src, NULL);

// When we ddidn't check the value of 'size',
// but truncation is still not an issue:

if (strcpys_np(size, dest, src, NULL) == -1)
	goto handle_hard_error;

// When truncation is an issue:

if (strcpys_np(size, dest, src, NULL))
	goto handle_all_errors;

// When truncation is an error,
// but it requires a different handling than a "hard" error:

status = strcpys_np(size, dest, src, NULL);
if (status == 1)
	goto handle_truncation;
if (status)
	goto handle_hard_error;
]]

After any of those samples of code, the string has been copied, and is a 
valid C-string.


Here goes the code (strcpys_np() is defined in terms of strscpy_np(), 
which similar to the known strscpy(), but with some of the improvements 
mentioned above, such as using array parameters, and ssize_t):


[[

#include <string.h>
#include <sys/types.h>


[[gnu::nonnull]]
ssize_t strscpy_np(ssize_t size,
                    char dest[static restrict size],
                    const char src[static restrict size])
{
	ssize_t len;

	if (size <= 0)
		return -1;

	len = strnlen(src, size - 1);
	memcpy(dest, src, len);
	dest[len] = '\0';

	return len;
}

[[gnu::nonnull(2, 3)]]
int strcpys_np(ssize_t size,
                char dest[static restrict size],
                const char src[static restrict size],
                ssize_t *restrict len)
{
	ssize_t l;

	l = strscpy_np(size, dest, src);
	if (len)
		*len = l;

	if (l == -1)
		return -1;
	if (l >= size)
		return 1;
	return 0;
}

]]

I may have introduced some bugs right now, as I adapted the code a bit 
before sending, but I expect it to be free of any bugs known of the 
existing str*cpy() interfaces.


What do you think about this function?  Would you want to add it to glibc?


Thanks,

Alex



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] strcpys(): New function for copying strings safely
  2021-06-27 19:26 [RFC] strcpys(): New function for copying strings safely Alejandro Colomar (man-pages)
@ 2021-06-27 19:46 ` Alejandro Colomar (man-pages)
  2021-06-28  8:15   ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-06-27 19:46 UTC (permalink / raw)
  To: glibc; +Cc: tech, Christoph Hellwig, linux-kernel

On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
> 
> It is designed so that usage requires the minimum number of lines of 
> code for complete usage (including error handling checks):
> 
> [[
> // When we already checked that 'size' is >= 1
> // and truncation is not an issue:
> 
> strcpys_np(size, dest, src, NULL);

Also, given how unlikely this case is, I have in my code:
`[[gnu::warn_unused_result]]`

I forgot to talk about it in the definition I sent.  I would put that 
attribute in the glibc definition, if this is added to glibc.

To ignore it, a simple cast of the result to `(void)` should be enough 
(or a more complex macro, like `UNUSED(strcpys_np(...));`).
> 
> [[
> 
> #include <string.h>
> #include <sys/types.h>
> 
> 
> [[gnu::nonnull]]
> ssize_t strscpy_np(ssize_t size,
>                     char dest[static restrict size],
>                     const char src[static restrict size])
> {
>      ssize_t len;
> 
>      if (size <= 0)
>          return -1;
> 
>      len = strnlen(src, size - 1);
>      memcpy(dest, src, len);
>      dest[len] = '\0';
> 
>      return len;
> }
> 
> [[gnu::nonnull(2, 3)]]
[[gnu::warn_unused_result]]
> int strcpys_np(ssize_t size,
>                 char dest[static restrict size],
>                 const char src[static restrict size],
>                 ssize_t *restrict len)
> {
>      ssize_t l;
> 
>      l = strscpy_np(size, dest, src);
>      if (len)
>          *len = l;
> 
>      if (l == -1)
>          return -1;
>      if (l >= size)
>          return 1;
>      return 0;
> }
> 
> ]]

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [RFC] strcpys(): New function for copying strings safely
  2021-06-27 19:46 ` Alejandro Colomar (man-pages)
@ 2021-06-28  8:15   ` David Laight
  2021-06-28 11:32     ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2021-06-28  8:15 UTC (permalink / raw)
  To: 'Alejandro Colomar (man-pages)', glibc
  Cc: tech, Christoph Hellwig, linux-kernel

From: Alejandro Colomar
> Sent: 27 June 2021 20:47
> 
> On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
> >
> > It is designed so that usage requires the minimum number of lines of
> > code for complete usage (including error handling checks):
> >
> > [[
> > // When we already checked that 'size' is >= 1
> > // and truncation is not an issue:
> >
> > strcpys_np(size, dest, src, NULL);
> 
> Also, given how unlikely this case is, I have in my code:
> `[[gnu::warn_unused_result]]`

warn_unused_result is such a PITA it should never have been invented.
At least no one seems to have been stupid enough to apply it to fprintf()
(you need to do fflush() before looking for errors).

In most cases strcpy() is safe - but if the inputs are 'corrupt'
horrid things happen - so you need truncation for safety.
But you really may not care whether the data got truncated.

The other use is where you want a sequence of:
	len *= str_copy(dest + len, dest_len - len, src);
But don't really want to test for overflow after each call.

In this case returning the buffer length (including the added
'\0' on truncation works quite nicely.
No chance of writing outside the buffer, safe truncation and
a simple 'len == dest_len' check for overflow.

OTOH there are already too many string copy functions.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] strcpys(): New function for copying strings safely
  2021-06-28  8:15   ` David Laight
@ 2021-06-28 11:32     ` Alejandro Colomar (man-pages)
  2021-06-28 12:00       ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-06-28 11:32 UTC (permalink / raw)
  To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper

Hello David,

On 6/28/21 10:15 AM, David Laight wrote:
> From: Alejandro Colomar
>> Sent: 27 June 2021 20:47
>>
>> On 6/27/21 9:26 PM, Alejandro Colomar (man-pages) wrote:
>>>
>>> It is designed so that usage requires the minimum number of lines of
>>> code for complete usage (including error handling checks):
>>>
>>> [[
>>> // When we already checked that 'size' is >= 1
>>> // and truncation is not an issue:
>>>
>>> strcpys_np(size, dest, src, NULL);
>>
>> Also, given how unlikely this case is, I have in my code:
>> `[[gnu::warn_unused_result]]`
> 
> warn_unused_result is such a PITA it should never have been invented.
> At least no one seems to have been stupid enough to apply it to fprintf()
> (you need to do fflush() before looking for errors).

Okay.  After some thought, I think removing it is better.

> 
> In most cases strcpy() is safe - but if the inputs are 'corrupt'
> horrid things happen - so you need truncation for safety.
> But you really may not care whether the data got truncated.

Yes, in most cases, expecially with literal strings ("xxx") and known 
buffers, strcpy() is completely safe.

And yes, if the input is untrusted (may be corrupt or not), you need to 
be prepared for truncation.  You don't need to truncate, hey, maybe the 
input is well-formed, but you need a function that may truncate if needed.

For that, we have strncpy(3), strlcpy(3bsd), and strscpy (kernel).

strncpy(3) is not for strings; it is for fixed length buffers.  It has 
high performance penalties, and yet doesn't guarantee a C-string. 
DISCARDED.

strlcpy(3bsd) still requires that the input is a C-string, so it can't 
be used with untrusted strings.  It really is broken, as Linus said. 
Its only valid use is if the input string is known (maybe a literal), 
but the output buffer may be smaller than the input string.  DISCARDED.

strscpy (kernel) is great.  BUT it is not supported by glibc.  The 
reason for not including it (actually Ulrich used it for not including 
strlcpy, but it also applies to strscpy) is that it doesn't report an 
error when the string is truncated.  I think that strscpy adds value to 
the current strcpy variants, and should be added to glibc.

And then, coming back to truncation, yes, in some cases, truncation is 
wanted, and not an error, for which strscpy is great (so please add it 
to glibc :), but then there are other cases where truncation is not 
wanted, but yet we deal with untrusted strings.  That is the case that I 
think Ulrich mentioned that wasn't covered by strlcpy (and isn't yet 
covered by strscpy).  And the reason to add this function.

Let's say a user introduces a username;  it's an untrusted string, we 
need to process it at least with strscpy().  But if it is corrupted (too 
long), we don't want to silently store a truncated username; we want to 
inform the user, so that he introduces a new (hopefully valid) username. 
  For that, strcpys() as defined in the previous email, is necessary.

> 
> The other use is where you want a sequence of:
> 	len *= str_copy(dest + len, dest_len - len, src);
> But don't really want to test for overflow after each call.

This is a legitimate use of strscpy().  Please, add it to glibc, as 
there is no libc function to do that.

> 
> In this case returning the buffer length (including the added
> '\0' on truncation works quite nicely.
> No chance of writing outside the buffer, safe truncation and
> a simple 'len == dest_len' check for overflow.
> 
> OTOH there are already too many string copy functions.

There are, but because the standard ones don't serve all purposes 
correctly, so people need to develop their own.  If libc provided the 
necessary functions, less custom string copy functions would be 
necessary, as Christoph said a long time ago, which is a good thing.

As I see it, there are the following, each of which has its valid usage:

strcpy(3):
	known input && known buffer
strncpy(3):
	not for C strings; only for fixed width buffers of characters!
strlcpy)3bsd):
	known input && unknown buffer
	Given that performance-wise it's similar to strscpy(),
	it should probably always be replaced by strscpy().
strscpy():
	unknown input && truncation is silently ignored
	Except for performance-critical applications,
	this call may replace strcpy(3), to add some extra safety.
strcpys():
	unknown input && truncation may be an error (or not).
	This call can replace strscpy() in most cases,
	simplifying usage.
	The only case I can see that strscpy() is superior
	is for chains of strscpy(), where I'd only use strcpys()
	in the last one (if any strscpy() in the chain has been
	trunncated, so will the last strcpys() (unless it's "")).


For the reasons above, I propose both strscpy() and strcpys() for 
inclusion in glibc.  Both trivial sources are in the previous email.

Thanks,

Alex


---

BTW, I did some research in the kernel and glibc sources to see how are 
str*cpy functions being used there:


user@sqli:~/src/gnu/glibc$ grep -rn 'strcpy(' * | sort -R | head -n 7
string/tester.c:1126:  (void) strcpy(one, "abcd");
resolv/nss_dns/dns-host.c:532:      strcpy(qp, "ip6.arpa");
dirent/tst-fdopendir.c:70:  strcpy(fname, "/tmp/dXXXXXX");
string/tester.c:1006:  cp = strcpy(one, "abc");
timezone/zic.c:459:  return result ? strcpy(result, str) : result;
timezone/zic.c:2886:					strcpy(startbuf, zp->z_format);
resolv/compat-gethnamaddr.c:393:			strcpy(bp, qname);



- string/tester.c:1126:  (void) strcpy(one, "abcd");
Okay.

- resolv/nss_dns/dns-host.c:532:      strcpy(qp, "ip6.arpa");
Okay.

- dirent/tst-fdopendir.c:70:  strcpy(fname, "/tmp/dXXXXXX");
Okay.

- string/tester.c:1006:  cp = strcpy(one, "abc");
Okay.

- timezone/zic.c:459:  return result ? strcpy(result, str) : result;
Okay.

- timezone/zic.c:2886:					strcpy(startbuf, zp->z_format);
I couldn't find a guarantee that zp->z_format is NUL-terminated.
Possible buffer overrun.
I couldn't find a guarantee that zp->z_format fits in startbuf.
Possible buffer overflow.
This call could be replaced, at least, by strlcpy.  The calls to strchr()
previous to that strcpy() may also benefit from using strnchr().

- resolv/compat-gethnamaddr.c:393:			strcpy(bp, qname);
Okay.

user@sqli:~/src/linux/linux$ grep -rn 'strcpy(' * | sort -R | head -n 3
drivers/platform/x86/wmi.c:346:	strcpy(method, "WQ");
sound/pci/atiixp.c:1305:		strcpy(pcm->name, "ATI IXP IEC958 (AC97)");
sound/pci/intel8x0.c:1439:		strcpy(name, "Intel ICH");

user@sqli:~/src/linux/linux$ grep -rn 'strlcpy(' * | sort -R | head -n 3
tools/perf/builtin-buildid-cache.c:124:	strlcpy(from_dir, filename, 
sizeof(from_dir));
drivers/usb/gadget/function/u_ether.c:148:	strlcpy(p->version, 
UETH__VERSION, sizeof(p->version));
drivers/scsi/qla4xxx/ql4_mbx.c:1615:	strlcpy(username, chap_table->name, 
QL4_CHAP_MAX_NAME_LEN);

user@sqli:~/src/linux/linux$ grep -rn 'strscpy(' * | sort -R | head -n 3
sound/core/pcm.c:732:		strscpy(pcm->id, id, sizeof(pcm->id));
sound/usb/hiface/pcm.c:597:	strscpy(pcm->name, "USB-SPDIF Audio", 
sizeof(pcm->name));
crypto/crypto_user_stat.c:54:	strscpy(rcipher.type, "cipher", 
sizeof(rcipher.type));




- drivers/platform/x86/wmi.c:346:	strcpy(method, "WQ");
Okay

- sound/pci/atiixp.c:1305:		strcpy(pcm->name, "ATI IXP IEC958 (AC97)");
Okay

- sound/pci/intel8x0.c:1439:		strcpy(name, "Intel ICH");
Okay
The sprintf() 2 lines above is also okay.


- tools/perf/builtin-buildid-cache.c:124:	strlcpy(from_dir, filename, 
sizeof(from_dir));
Okay.  BUT no libc support!

- drivers/usb/gadget/function/u_ether.c:148:	strlcpy(p->version, 
UETH__VERSION, sizeof(p->version));
Okay.  BUT no libc support!
However, in the next line, it's not obvious where dev->gadget->name 
comes from.
eth_get_drvinfo() is a callback (it's stored in a pointer), so I can't 
follow it.
If it were an untrusted string (not NUL-terminated), it would be a bug.

- drivers/scsi/qla4xxx/ql4_mbx.c:1615:	strlcpy(username, 
chap_table->name, QL4_CHAP_MAX_NAME_LEN);
Okay.  BUT no libc support!


- sound/core/pcm.c:732:		strscpy(pcm->id, id, sizeof(pcm->id));
Okay.  BUT no libc support!

- sound/usb/hiface/pcm.c:597:	strscpy(pcm->name, "USB-SPDIF Audio", 
sizeof(pcm->name));
Okay.  BUT no libc support!

- crypto/crypto_user_stat.c:54:	strscpy(rcipher.type, "cipher", 
sizeof(rcipher.type));
Okay.  BUT no libc support!


None of the above checked for truncation, so I assume it's not an issue 
there.



-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] strcpys(): New function for copying strings safely
  2021-06-28 11:32     ` Alejandro Colomar (man-pages)
@ 2021-06-28 12:00       ` Alejandro Colomar (man-pages)
  2021-06-28 12:10         ` Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-06-28 12:00 UTC (permalink / raw)
  To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper

On 6/28/21 1:32 PM, Alejandro Colomar (man-pages) wrote:
>>
>> The other use is where you want a sequence of:
>>     len *= str_copy(dest + len, dest_len - len, src);
>> But don't really want to test for overflow after each call.
> 
> This is a legitimate use of strscpy().  Please, add it to glibc, as 
> there is no libc function to do that.
> 
>>
>> In this case returning the buffer length (including the added
>> '\0' on truncation works quite nicely.
>> No chance of writing outside the buffer, safe truncation and
>> a simple 'len == dest_len' check for overflow.
>>
>> OTOH there are already too many string copy functions.
> 
> There are, but because the standard ones don't serve all purposes 
> correctly, so people need to develop their own.  If libc provided the 
> necessary functions, less custom string copy functions would be 
> necessary, as Christoph said a long time ago, which is a good thing.
> 
> As I see it, there are the following, each of which has its valid usage:
> 
> strcpy(3):
>      known input && known buffer
> strncpy(3):
>      not for C strings; only for fixed width buffers of characters!
> strlcpy)3bsd):
>      known input && unknown buffer
>      Given that performance-wise it's similar to strscpy(),
>      it should probably always be replaced by strscpy().
> strscpy():
>      unknown input && truncation is silently ignored
>      Except for performance-critical applications,
>      this call may replace strcpy(3), to add some extra safety.
> strcpys():
>      unknown input && truncation may be an error (or not).
>      This call can replace strscpy() in most cases,
>      simplifying usage.
>      The only case I can see that strscpy() is superior
>      is for chains of strscpy(), where I'd only use strcpys()
>      in the last one (if any strscpy() in the chain has been
>      trunncated, so will the last strcpys() (unless it's "")).
> 
> 

BTW, for chains of str_cpy(), a strscat() and a strcats() function 
should probably replace chained strcpy()s, so it would be something like:

l = strscpy(n, dest, src);
l = strscat(n - l, dest + l, src2);
l = strscat(n - l, dest + l, src3);
...
if (strcats(n - l, dest + l, srcN, NULL))
	goto handle_truncation_or_error;

And the user should make sure that srcN is not "" unless he doesn't care 
about truncation.  Otherwise, check at every step.

I'll send my source code for strscat and strcats, if strscpy and strcpys 
are considered for addition.


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] strcpys(): New function for copying strings safely
  2021-06-28 12:00       ` Alejandro Colomar (man-pages)
@ 2021-06-28 12:10         ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-06-28 12:10 UTC (permalink / raw)
  To: David Laight, glibc; +Cc: tech, Christoph Hellwig, linux-kernel, drepper

On 6/28/21 2:00 PM, Alejandro Colomar (man-pages) wrote:
> l = strscat(n - l, dest + l, src2);

Hmm, that's a bug; I wrote it too fast.

Either

l += strscpy(n - l, dest + l, src2);

or

l = strscat(n - l, dest, src2);

should be used instead.

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-28 12:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-27 19:26 [RFC] strcpys(): New function for copying strings safely Alejandro Colomar (man-pages)
2021-06-27 19:46 ` Alejandro Colomar (man-pages)
2021-06-28  8:15   ` David Laight
2021-06-28 11:32     ` Alejandro Colomar (man-pages)
2021-06-28 12:00       ` Alejandro Colomar (man-pages)
2021-06-28 12:10         ` Alejandro Colomar (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).