dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] parser: don't keep alloca()ing in a loop for substitutions
@ 2022-12-14 23:39 наб
  2022-12-15 10:27 ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: наб @ 2022-12-14 23:39 UTC (permalink / raw)
  To: dash

[-- Attachment #1: Type: text/plain, Size: 2415 bytes --]

When encountering
  printf %010000d | tr 0 \` | sh -n
  printf %09999d  | tr 0 \` | sh -n
you want no output and "Syntax error: EOF in backquote substitution",
respectively; instead, current dash segfaults.

This is because the alloca for the save buffer is run, naturally,
in the same function, so first it allocates one byte, then two,
then ..., then appx. 4000 (for me, depends on the binary),
then it segfaults on the memcpy (it's even worse, since due to
alignment, it usually allocates much more for the early stuff).

Nevertheless, the stack frame grows unboundedly, until we completely
destroy the stack. Instead, alloca a 1KiB buffer on first use and
fall back to ckmalloc for bigger save buffers. In practice this means
that we'll alloca nothing in a good amount of cases, then just about
always use the alloca buffer except for truly pathological input.

Fixes: https://bugs.debian.org/966156
---
 src/parser.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/parser.c b/src/parser.c
index a552c47..3f7e50a 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -898,6 +898,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	struct nodelist *bqlist;
 	int quotef;
 	int oldstyle;
+	char *parsebackq_save;
 	/* syntax stack */
 	struct synstack synbase = { .syntax = syntax };
 	struct synstack *synstack = &synbase;
@@ -906,6 +907,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 		synstack->dblquote = 1;
 	quotef = 0;
 	bqlist = NULL;
+	parsebackq_save = NULL;
 
 	STARTSTACKSTR(out);
 	loop: {	/* for each line, until end of word */
@@ -1355,15 +1357,18 @@ badsub:
 parsebackq: {
 	struct nodelist **nlpp;
 	union node *n;
-	char *str;
+	char *str, *mstr;
 	size_t savelen;
 	struct heredoc *saveheredoclist;
 	int uninitialized_var(saveprompt);
 
-	str = NULL;
+	str = mstr = NULL;
 	savelen = out - (char *)stackblock();
 	if (savelen > 0) {
-		str = alloca(savelen);
+		if (savelen > 1024)
+			str = mstr = ckmalloc(savelen);
+		else
+			str = parsebackq_save ?: (parsebackq_save = alloca(1024));
 		memcpy(str, stackblock(), savelen);
 	}
         if (oldstyle) {
@@ -1449,6 +1454,7 @@ done:
 	if (str) {
 		memcpy(out, str, savelen);
 		STADJUST(savelen, out);
+		free(mstr);
 	}
 	USTPUTC(CTLBACKQ, out);
 	if (oldstyle)
-- 
2.30.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] parser: don't keep alloca()ing in a loop for substitutions
  2022-12-14 23:39 [PATCH] parser: don't keep alloca()ing in a loop for substitutions наб
@ 2022-12-15 10:27 ` Herbert Xu
  2022-12-15 17:02   ` [PATCH v2] " наб
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2022-12-15 10:27 UTC (permalink / raw)
  To: наб; +Cc: dash

наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
>
> diff --git a/src/parser.c b/src/parser.c
> index a552c47..3f7e50a 100644
> --- a/src/parser.c
> +++ b/src/parser.c
> @@ -898,6 +898,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
>        struct nodelist *bqlist;
>        int quotef;
>        int oldstyle;
> +       char *parsebackq_save;
>        /* syntax stack */
>        struct synstack synbase = { .syntax = syntax };
>        struct synstack *synstack = &synbase;
> @@ -906,6 +907,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
>                synstack->dblquote = 1;
>        quotef = 0;
>        bqlist = NULL;
> +       parsebackq_save = NULL;
> 
>        STARTSTACKSTR(out);
>        loop: { /* for each line, until end of word */
> @@ -1355,15 +1357,18 @@ badsub:
> parsebackq: {
>        struct nodelist **nlpp;
>        union node *n;
> -       char *str;
> +       char *str, *mstr;
>        size_t savelen;
>        struct heredoc *saveheredoclist;
>        int uninitialized_var(saveprompt);
> 
> -       str = NULL;
> +       str = mstr = NULL;
>        savelen = out - (char *)stackblock();
>        if (savelen > 0) {
> -               str = alloca(savelen);
> +               if (savelen > 1024)
> +                       str = mstr = ckmalloc(savelen);
> +               else
> +                       str = parsebackq_save ?: (parsebackq_save = alloca(1024));
>                memcpy(str, stackblock(), savelen);
>        }
>         if (oldstyle) {
> @@ -1449,6 +1454,7 @@ done:
>        if (str) {
>                memcpy(out, str, savelen);
>                STADJUST(savelen, out);
> +               free(mstr);

You can't just call ckmalloc because that memory will be leaked
if there is a longjmp (such as sh_error) before you free it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] parser: don't keep alloca()ing in a loop for substitutions
  2022-12-15 10:27 ` Herbert Xu
@ 2022-12-15 17:02   ` наб
  2023-01-05  9:02     ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: наб @ 2022-12-15 17:02 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dash

[-- Attachment #1: Type: text/plain, Size: 2678 bytes --]

When encountering
  printf %010000d | tr 0 \` | sh -n
  printf %09999d  | tr 0 \` | sh -n
you want no output and "Syntax error: EOF in backquote substitution",
respectively; instead, current dash segfaults.

This is because the alloca for the save buffer is run, naturally,
in the same function, so first it allocates one byte, then two,
then ..., then appx. 4000 (for me, depends on the binary),
then it segfaults on the memcpy (it's even worse, since due to
alignment, it usually allocates much more for the early stuff).

Nevertheless, the stack frame grows unboundedly, until we completely
destroy the stack. Instead of squirreling the out block away, then
letting subsequent allocations override the original, mark it used,
and just re-copy it to the top of the dash stack. This increases peak
memory usage somewhat
(in the most pathological case ‒ the above but with three nines ‒
 from 23.26 to 173.7KiB according to massif,
 in parsing a regular program (ratrun from ratrun 0c)
 from 28.68 to 29.19;
 a simpler program (ibid., rat) stays at 5.422;
 parsing libtoolize, debootstrap, and dkms
 (the biggest shell programs in my /[s]bin by size + by `/$( count)
 likewise stay the same at 12.02, 41.48, and 6.438)
but it's barely measurable outside of truly pathological conditions
that were a step away from a segfault previously.

Fixes: https://bugs.debian.org/966156
---
Naturally, I hadn't considered that.

This version I've run through valgrind in a good few configurations and
am happy to conclude there are no leaks (and the memory usage bump is
imperceptible unless you were almost-crashing anyway).

 src/parser.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/parser.c b/src/parser.c
index a552c47..89698cb 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -1360,12 +1360,9 @@ parsebackq: {
 	struct heredoc *saveheredoclist;
 	int uninitialized_var(saveprompt);
 
-	str = NULL;
+	str = stackblock();
 	savelen = out - (char *)stackblock();
-	if (savelen > 0) {
-		str = alloca(savelen);
-		memcpy(str, stackblock(), savelen);
-	}
+	grabstackblock(savelen);
         if (oldstyle) {
                 /* We must read until the closing backquote, giving special
                    treatment to some slashes, and then push the string and
@@ -1446,10 +1443,8 @@ done:
 	if (oldstyle)
 		tokpushback = 0;
 	out = growstackto(savelen + 1);
-	if (str) {
-		memcpy(out, str, savelen);
-		STADJUST(savelen, out);
-	}
+	memcpy(out, str, savelen);
+	STADJUST(savelen, out);
 	USTPUTC(CTLBACKQ, out);
 	if (oldstyle)
 		goto parsebackq_oldreturn;
-- 
2.30.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] parser: don't keep alloca()ing in a loop for substitutions
  2022-12-15 17:02   ` [PATCH v2] " наб
@ 2023-01-05  9:02     ` Herbert Xu
  2023-01-05 13:42       ` [PATCH v3] " наб
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2023-01-05  9:02 UTC (permalink / raw)
  To: наб; +Cc: dash

наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
>
> Naturally, I hadn't considered that.
> 
> This version I've run through valgrind in a good few configurations and
> am happy to conclude there are no leaks (and the memory usage bump is
> imperceptible unless you were almost-crashing anyway).

Nice work!

> @@ -1446,10 +1443,8 @@ done:
>        if (oldstyle)
>                tokpushback = 0;
>        out = growstackto(savelen + 1);
> -       if (str) {
> -               memcpy(out, str, savelen);
> -               STADJUST(savelen, out);
> -       }
> +       memcpy(out, str, savelen);
> +       STADJUST(savelen, out);

Minor nit but these three lines can be combined into:

	out = stnputs(stackblock(), str, savelen);

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3] parser: don't keep alloca()ing in a loop for substitutions
  2023-01-05  9:02     ` Herbert Xu
@ 2023-01-05 13:42       ` наб
  2023-01-06  3:15         ` [PATCH] parser: Print CTLBACKQ early in parsesub Herbert Xu
  2023-01-08 12:07         ` [PATCH v3] parser: don't keep alloca()ing in a loop for substitutions Herbert Xu
  0 siblings, 2 replies; 8+ messages in thread
From: наб @ 2023-01-05 13:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dash

[-- Attachment #1: Type: text/plain, Size: 2682 bytes --]

When encountering
  printf %010000d | tr 0 \` | sh -n
  printf %09999d  | tr 0 \` | sh -n
you want no output and "Syntax error: EOF in backquote substitution",
respectively; instead, current dash segfaults.

This is because the alloca for the save buffer is run, naturally,
in the same function, so first it allocates one byte, then two,
then ..., then appx. 4000 (for me, depends on the binary),
then it segfaults on the memcpy (it's even worse, since due to
alignment, it usually allocates much more for the early stuff).

Nevertheless, the stack frame grows unboundedly, until we completely
destroy the stack. Instead of squirreling the out block away, then
letting subsequent allocations override the original, mark it used,
and just re-copy it to the top of the dash stack. This increases peak
memory usage somewhat
(in the most pathological case ‒ the above but with three nines ‒
 from 23.26 to 173.7KiB according to massif,
 in parsing a regular program (ratrun from ratrun 0c)
 from 28.68 to 29.19;
 a simpler program (ibid., rat) stays at 5.422;
 parsing libtoolize, debootstrap, and dkms
 (the biggest shell programs in my /[s]bin by size + by `/$( count)
 likewise stay the same at 12.02, 41.48, and 6.438)
but it's barely measurable outside of truly pathological conditions
that were a step away from a segfault previously.

Fixes: https://bugs.debian.org/966156
---
I think this means we also need to turn the USTPUTC() into STPUTC(),
since the previous code explicitly over-accounted for it in growstackto().

 src/parser.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/parser.c b/src/parser.c
index 8a06b9e..f5f76d5 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -1360,12 +1360,9 @@ parsebackq: {
 	struct heredoc *saveheredoclist;
 	int uninitialized_var(saveprompt);
 
-	str = NULL;
+	str = stackblock();
 	savelen = out - (char *)stackblock();
-	if (savelen > 0) {
-		str = alloca(savelen);
-		memcpy(str, stackblock(), savelen);
-	}
+	grabstackblock(savelen);
         if (oldstyle) {
                 /* We must read until the closing backquote, giving special
                    treatment to some slashes, and then push the string and
@@ -1445,12 +1442,8 @@ done:
 	/* Ignore any pushed back tokens left from the backquote parsing. */
 	if (oldstyle)
 		tokpushback = 0;
-	out = growstackto(savelen + 1);
-	if (str) {
-		memcpy(out, str, savelen);
-		STADJUST(savelen, out);
-	}
-	USTPUTC(CTLBACKQ, out);
+	out = stnputs(str, savelen, stackblock());
+	STPUTC(CTLBACKQ, out);
 	if (oldstyle)
 		goto parsebackq_oldreturn;
 	else
-- 
2.30.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] parser: Print CTLBACKQ early in parsesub
  2023-01-05 13:42       ` [PATCH v3] " наб
@ 2023-01-06  3:15         ` Herbert Xu
  2023-01-06 11:49           ` наб
  2023-01-08 12:07         ` [PATCH v3] parser: don't keep alloca()ing in a loop for substitutions Herbert Xu
  1 sibling, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2023-01-06  3:15 UTC (permalink / raw)
  To: наб; +Cc: dash

On Thu, Jan 05, 2023 at 02:42:04PM +0100, наб wrote:
>
> I think this means we also need to turn the USTPUTC() into STPUTC(),
> since the previous code explicitly over-accounted for it in growstackto().

Good catch.  However, we can easily get rid of it by moving the
USTPUTC to the top:

---8<---
As we are allowed to perform 4 USTPUTC's we can save a growstackstr
call by adding the CTLBACKQ before we save the string.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/parser.c b/src/parser.c
index f5f76d5..299c260 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -1360,6 +1360,7 @@ parsebackq: {
 	struct heredoc *saveheredoclist;
 	int uninitialized_var(saveprompt);
 
+	USTPUTC(CTLBACKQ, out);
 	str = stackblock();
 	savelen = out - (char *)stackblock();
 	grabstackblock(savelen);
@@ -1443,7 +1444,6 @@ done:
 	if (oldstyle)
 		tokpushback = 0;
 	out = stnputs(str, savelen, stackblock());
-	STPUTC(CTLBACKQ, out);
 	if (oldstyle)
 		goto parsebackq_oldreturn;
 	else
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] parser: Print CTLBACKQ early in parsesub
  2023-01-06  3:15         ` [PATCH] parser: Print CTLBACKQ early in parsesub Herbert Xu
@ 2023-01-06 11:49           ` наб
  0 siblings, 0 replies; 8+ messages in thread
From: наб @ 2023-01-06 11:49 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dash

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

Hi!

On Fri, Jan 06, 2023 at 11:15:55AM +0800, Herbert Xu wrote:
> On Thu, Jan 05, 2023 at 02:42:04PM +0100, наб wrote:
> > I think this means we also need to turn the USTPUTC() into STPUTC(),
> > since the previous code explicitly over-accounted for it in growstackto().
> Good catch.  However, we can easily get rid of it by moving the
> USTPUTC to the top:

Yep, patch works for me and appears sound; that's the first USTPUTC()
for the new path and the second for the old path.

наб

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] parser: don't keep alloca()ing in a loop for substitutions
  2023-01-05 13:42       ` [PATCH v3] " наб
  2023-01-06  3:15         ` [PATCH] parser: Print CTLBACKQ early in parsesub Herbert Xu
@ 2023-01-08 12:07         ` Herbert Xu
  1 sibling, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2023-01-08 12:07 UTC (permalink / raw)
  To: наб; +Cc: dash

On Thu, Jan 05, 2023 at 02:42:04PM +0100, наб wrote:
> When encountering
>   printf %010000d | tr 0 \` | sh -n
>   printf %09999d  | tr 0 \` | sh -n
> you want no output and "Syntax error: EOF in backquote substitution",
> respectively; instead, current dash segfaults.
> 
> This is because the alloca for the save buffer is run, naturally,
> in the same function, so first it allocates one byte, then two,
> then ..., then appx. 4000 (for me, depends on the binary),
> then it segfaults on the memcpy (it's even worse, since due to
> alignment, it usually allocates much more for the early stuff).
> 
> Nevertheless, the stack frame grows unboundedly, until we completely
> destroy the stack. Instead of squirreling the out block away, then
> letting subsequent allocations override the original, mark it used,
> and just re-copy it to the top of the dash stack. This increases peak
> memory usage somewhat
> (in the most pathological case ‒ the above but with three nines ‒
>  from 23.26 to 173.7KiB according to massif,
>  in parsing a regular program (ratrun from ratrun 0c)
>  from 28.68 to 29.19;
>  a simpler program (ibid., rat) stays at 5.422;
>  parsing libtoolize, debootstrap, and dkms
>  (the biggest shell programs in my /[s]bin by size + by `/$( count)
>  likewise stay the same at 12.02, 41.48, and 6.438)
> but it's barely measurable outside of truly pathological conditions
> that were a step away from a segfault previously.
> 
> Fixes: https://bugs.debian.org/966156
> ---
> I think this means we also need to turn the USTPUTC() into STPUTC(),
> since the previous code explicitly over-accounted for it in growstackto().
> 
>  src/parser.c | 15 ++++-----------
>  1 file changed, 4 insertions(+), 11 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-08 12:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-14 23:39 [PATCH] parser: don't keep alloca()ing in a loop for substitutions наб
2022-12-15 10:27 ` Herbert Xu
2022-12-15 17:02   ` [PATCH v2] " наб
2023-01-05  9:02     ` Herbert Xu
2023-01-05 13:42       ` [PATCH v3] " наб
2023-01-06  3:15         ` [PATCH] parser: Print CTLBACKQ early in parsesub Herbert Xu
2023-01-06 11:49           ` наб
2023-01-08 12:07         ` [PATCH v3] parser: don't keep alloca()ing in a loop for substitutions Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).