* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
@ 2021-05-06 13:27 Petr Vorel
  2021-05-06 14:44 ` Cyril Hrubis
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Vorel @ 2021-05-06 13:27 UTC (permalink / raw)
  To: ltp

* escape backslash (\) and double quote (")
  escaping the backslash effectively escapes other C escape sequences
  (\t, \n, ...), which we sometimes want (in comments) but sometimes not
  (in .options we want them interpreted)
* replace tab with 8 spaces
* skip and warn on invalid chars (< 0x20, i.e. anything below space),
  as defined by RFC 8259 (https://tools.ietf.org/html/rfc8259#page-9)
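
The escaping rules in the list above can be sketched as a standalone function. This is a minimal sketch, not the patch itself: json_escape_buf() and buf_put() are hypothetical names, and the output goes to a caller-supplied buffer instead of a FILE * so the result is easy to check. One assumption differs from the patch: each byte is read as unsigned char, because on platforms where plain char is signed, bytes >= 0x80 (e.g. UTF-8) would otherwise compare below 0x20 and be dropped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append s to out (capacity size), -1 if it won't fit. */
static int buf_put(char *out, size_t size, size_t *len, const char *s)
{
	size_t n = strlen(s);

	if (*len + n + 1 > size)
		return -1;
	memcpy(out + *len, s, n + 1);	/* copies the terminating NUL too */
	*len += n;
	return 0;
}

/*
 * Hypothetical buffer-based variant of the rules above: quote the string,
 * escape \ and ", expand tab to 8 spaces, drop other bytes below 0x20 with
 * a warning on stderr. Returns the length written (without the NUL), or -1
 * if out is too small.
 */
static int json_escape_buf(const char *str, char *out, size_t size)
{
	size_t len = 0;

	assert(out != NULL);
	if (size == 0)
		return -1;
	out[0] = '\0';

	if (buf_put(out, size, &len, "\""))
		return -1;

	for (; *str; str++) {
		/* unsigned, so UTF-8 bytes >= 0x80 are not seen as negative */
		unsigned char c = (unsigned char)*str;
		char one[2] = { (char)c, '\0' };
		const char *rep;

		switch (c) {
		case '\\':
			rep = "\\\\";
			break;
		case '"':
			rep = "\\\"";
			break;
		case '\t':
			rep = "        ";
			break;
		default:
			if (c < 0x20) {
				fprintf(stderr, "invalid character for JSON: 0x%02x\n", c);
				continue;	/* skip the byte */
			}
			rep = one;
			break;
		}

		if (buf_put(out, size, &len, rep))
			return -1;
	}

	if (buf_put(out, size, &len, "\""))
		return -1;

	return (int)len;
}
```

For example, the input a"b\c<TAB>d comes out as "a\"b\\c        d".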

NOTE: at the moment the fix is only required for ", but tab was problematic in the past.

TODO: This is just a "hot fix" before the release. The proper solution
would be to check whether the chars that need escaping (", \, /) are
already escaped.

Also, to decide correctly whether \n, \t should be escaped or
interpreted, the decision should be made in the parser, which has the
context. C strings should probably be interpreted (so nothing needs to be
done, as C escapes them in a way compatible with JSON), but comments
should probably display \n, \t literally, thus an extra \ is added.
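
One way the parser-side decision described above could look is a flag telling the escaper where the text came from. This is a hedged sketch under assumptions: json_escape_ctx(), put_char() and the from_c_literal parameter are hypothetical, the parser would have to supply the flag, and only C escapes that JSON shares (\", \\, \/, \b, \f, \n, \r, \t) are passed through; anything else, such as \x41, is shown literally. Tab expansion is left out for brevity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append one char plus NUL, -1 if it won't fit. */
static int put_char(char *out, size_t size, size_t *len, char c)
{
	if (*len + 2 > size)
		return -1;
	out[(*len)++] = c;
	out[*len] = '\0';
	return 0;
}

/* C escapes whose spelling is also a valid JSON escape (RFC 8259, section 7). */
static int json_shared_escape(char c)
{
	return c != '\0' && strchr("\"\\/bfnrt", c) != NULL;
}

/*
 * from_c_literal != 0: str is the raw text of a C string literal, so shared
 *                      escapes are copied and the JSON reader interprets them.
 * from_c_literal == 0: str is comment text, so every backslash is doubled
 *                      and \n, \t are displayed literally.
 */
static int json_escape_ctx(const char *str, int from_c_literal,
			   char *out, size_t size)
{
	size_t len = 0;

	assert(out != NULL);
	if (size == 0)
		return -1;
	out[0] = '\0';

	if (put_char(out, size, &len, '"'))
		return -1;

	for (; *str; str++) {
		unsigned char c = (unsigned char)*str;

		if (c == '\\' && from_c_literal && json_shared_escape(str[1])) {
			/* keep e.g. \n as is: JSON decodes it like C does */
			if (put_char(out, size, &len, '\\') ||
			    put_char(out, size, &len, str[1]))
				return -1;
			str++;
		} else if (c == '\\' || c == '"') {
			/* lone backslash or bare quote: escape it */
			if (put_char(out, size, &len, '\\') ||
			    put_char(out, size, &len, (char)c))
				return -1;
		} else if (c >= 0x20) {
			if (put_char(out, size, &len, (char)c))
				return -1;
		}
		/* raw control bytes (including tab) are dropped for brevity */
	}

	if (put_char(out, size, &len, '"'))
		return -1;
	return (int)len;
}
```

With the source text a\nb (a backslash followed by n), comment mode yields "a\\nb", which a JSON reader displays as a\nb, while C-literal mode yields "a\nb", which it decodes to a newline.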

Fixes: c39b29f0a ("bpf: Check truncation on 32bit div/mod by zero")

Suggested-by: Cyril Hrubis <chrubis@suse.cz>
Co-developed-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
 docparse/data_storage.h | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/docparse/data_storage.h b/docparse/data_storage.h
index ef420c08f..9f36dd6f0 100644
--- a/docparse/data_storage.h
+++ b/docparse/data_storage.h
@@ -256,6 +256,40 @@ static inline void data_fprintf(FILE *f, unsigned int padd, const char *fmt, ...
 	va_end(va);
 }
 
+
+static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
+{
+	while (padd-- > 0)
+		fputc(' ', f);
+
+	fputc('"', f);
+
+	while (*str) {
+		switch (*str) {
+		case '\\':
+			fputs("\\\\", f);
+			break;
+		case '"':
+			fputs("\\\"", f);
+			break;
+		case '\t':
+			fputs("        ", f);
+			break;
+		default:
+			/* RFC 8259 specify  chars before 0x20 as invalid */
+			if (*str >= 0x20)
+				putc(*str, f);
+			else
+				fprintf(stderr, "%s:%d %s(): invalid character for JSON: %x\n",
+						__FILE__, __LINE__, __func__, *str);
+			break;
+		}
+		str++;
+	}
+
+	fputc('"', f);
+}
+
 static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int padd, int do_padd)
 {
 	unsigned int i;
@@ -263,7 +297,7 @@ static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int p
 	switch (self->type) {
 	case DATA_STRING:
 		padd = do_padd ? padd : 0;
-		data_fprintf(f, padd, "\"%s\"", self->string.val);
+		data_fprintf_esc(f, padd, self->string.val);
 	break;
 	case DATA_HASH:
 		for (i = 0; i < self->hash.elems_used; i++) {
-- 
2.31.1



* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
  2021-05-06 13:27 [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON Petr Vorel
@ 2021-05-06 14:44 ` Cyril Hrubis
  2021-05-06 18:21   ` Petr Vorel
  2021-05-06 19:35   ` Petr Vorel
  0 siblings, 2 replies; 6+ messages in thread
From: Cyril Hrubis @ 2021-05-06 14:44 UTC (permalink / raw)
  To: ltp

Hi!
> * escape backslash (/) and double quote (")
                      ^
		      \
>   escaping backslash effectively escapes other C escaped strings (\t,
>   \n, ...), which we sometimes want (in the comment) but sometimes not
>   (in .option we want to have them interpreted)
> * replace tab with 8x space
> * skip and TWARN invalid chars (< 0x20, i.e. anything before space)
             ^
	     warn on? We are not actually using TWARN here, right?
>   defined by RFC 8259 (https://tools.ietf.org/html/rfc8259#page-9)
> 
> NOTE: atm fix is required only for ", but tab was problematic in the past.
> 
> TODO: This is just a "hot fix" solution before release. Proper solution
> would be to check if chars needed to be escaped (", \, /) aren't already
> escaped.
> 
> Also for correct decision whether \n, \t should be escaped or interpreted
> we should decide in the parser which has the context. C string should be
> probably interpreted (thus nothing needed to be done as it escapes in
> a compatible way with JSON), but comments probably should display \n, \t
> thus add extra \.
>
> Fixes: c39b29f0a ("bpf: Check truncation on 32bit div/mod by zero")
> 
> Suggested-by: Cyril Hrubis <chrubis@suse.cz>
> Co-developed-by: Cyril Hrubis <chrubis@suse.cz>
> Signed-off-by: Petr Vorel <pvorel@suse.cz>
> ---
>  docparse/data_storage.h | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/docparse/data_storage.h b/docparse/data_storage.h
> index ef420c08f..9f36dd6f0 100644
> --- a/docparse/data_storage.h
> +++ b/docparse/data_storage.h
> @@ -256,6 +256,40 @@ static inline void data_fprintf(FILE *f, unsigned int padd, const char *fmt, ...
>  	va_end(va);
>  }
>  
> +
> +static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
> +{
> +	while (padd-- > 0)
> +		fputc(' ', f);
> +
> +	fputc('"', f);

	int was_backslash = 0;

> +	while (*str) {
> +		switch (*str) {
> +		case '\\':
> +		break;
> +		case '"':
> +			fputs("\\\"", f);
			was_backslash = 0;
> +			break;
> +		case '\t':
> +			fputs("        ", f);
> +			break;
> +		default:
> +			/* RFC 8259 specify  chars before 0x20 as invalid */
> +			if (*str >= 0x20)
> +				putc(*str, f);
> +			else
> +				fprintf(stderr, "%s:%d %s(): invalid character for JSON: %x\n",
> +						__FILE__, __LINE__, __func__, *str);
> +			break;
> +		}

		if (was_backslash)
			fputs("\\\\", f);

		was_backslash = (*str == '\\');
> +		str++;
> +	}
> +
> +	fputc('"', f);
> +}

This should avoid "unescaping" an escaped double quote. We defer
printing the backslash until we know the character after it, and we make
sure that we do not escape a backslash before a ".

Consider what would happen if someone put "\"text\"" into an options
string: the original code would escape the backslashes and we would end
up with "\\"text"\\", which would break the parser again.

This way we can at least avoid parsing errors until we fix the problem
one level down in the parser where we have the context required for a
proper fix.
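
The deferral described above can be sketched as follows. This is a hypothetical variant, not the code quoted in this message: as quoted, the if (was_backslash) check runs after the switch, so the held backslash appears to be written after the character that follows it, and a backslash at the very end of the string is never written. Here the held backslash is flushed before the current character and once more after the loop. json_escape_defer(), put_char() and put_esc() are illustrative names. As the later replies in this thread conclude, plain escaping already produces valid JSON, so this only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: append one char plus NUL, -1 if it won't fit. */
static int put_char(char *out, size_t size, size_t *len, char c)
{
	if (*len + 2 > size)
		return -1;
	out[(*len)++] = c;
	out[*len] = '\0';
	return 0;
}

/* Append an escaped pair: a backslash followed by c. */
static int put_esc(char *out, size_t size, size_t *len, char c)
{
	if (put_char(out, size, len, '\\'))
		return -1;
	return put_char(out, size, len, c);
}

/* Hold each backslash until the next char: before " keep \", else emit \\. */
static int json_escape_defer(const char *str, char *out, size_t size)
{
	size_t len = 0;
	int was_backslash = 0;

	assert(out != NULL);
	if (size == 0)
		return -1;
	out[0] = '\0';

	if (put_char(out, size, &len, '"'))
		return -1;

	for (; *str; str++) {
		unsigned char c = (unsigned char)*str;

		if (was_backslash) {
			/* flush the held backslash before the current char */
			was_backslash = 0;
			if (c == '"') {
				if (put_esc(out, size, &len, '"'))	/* \" stays \" */
					return -1;
				continue;
			}
			if (put_esc(out, size, &len, '\\'))		/* lone \ becomes \\ */
				return -1;
		}

		if (c == '\\') {
			was_backslash = 1;			/* decide on the next char */
		} else if (c == '"') {
			if (put_esc(out, size, &len, '"'))
				return -1;
		} else if (c >= 0x20) {
			if (put_char(out, size, &len, (char)c))
				return -1;
		}
		/* other control bytes are dropped (tab expansion omitted) */
	}

	/* a trailing backslash must not be lost */
	if (was_backslash && put_esc(out, size, &len, '\\'))
		return -1;

	if (put_char(out, size, &len, '"'))
		return -1;
	return (int)len;
}
```

For the raw text \"text\" the sketch produces "\"text\"", i.e. the escaped quotes are kept as they are instead of becoming \\\".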

>  static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int padd, int do_padd)
>  {
>  	unsigned int i;
> @@ -263,7 +297,7 @@ static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int p
>  	switch (self->type) {
>  	case DATA_STRING:
>  		padd = do_padd ? padd : 0;
> -		data_fprintf(f, padd, "\"%s\"", self->string.val);
> +		data_fprintf_esc(f, padd, self->string.val);
>  	break;
>  	case DATA_HASH:
>  		for (i = 0; i < self->hash.elems_used; i++) {
> -- 
> 2.31.1
> 

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
  2021-05-06 14:44 ` Cyril Hrubis
@ 2021-05-06 18:21   ` Petr Vorel
  2021-05-06 19:35   ` Petr Vorel
  1 sibling, 0 replies; 6+ messages in thread
From: Petr Vorel @ 2021-05-06 18:21 UTC (permalink / raw)
  To: ltp

Hi Cyril,
> Hi!
> > * escape backslash (/) and double quote (")
>                       ^
> 		      \
+1

> >   escaping backslash effectively escapes other C escaped strings (\t,
> >   \n, ...), which we sometimes want (in the comment) but sometimes not
> >   (in .option we want to have them interpreted)
> > * replace tab with 8x space
> > * skip and TWARN invalid chars (< 0x20, i.e. anything before space)
>              ^
> 	     warn on? We are not actually using TWARN o here right?
Yep, I didn't update the commit message (at first I included tst_test.h with
TST_NO_DEFAULT_MAIN, but the include path was missing => stderr is enough).

> >   defined by RFC 8259 (https://tools.ietf.org/html/rfc8259#page-9)

> > NOTE: atm fix is required only for ", but tab was problematic in the past.

> > TODO: This is just a "hot fix" solution before release. Proper solution
> > would be to check if chars needed to be escaped (", \, /) aren't already
> > escaped.

> > Also for correct decision whether \n, \t should be escaped or interpreted
> > we should decide in the parser which has the context. C string should be
> > probably interpreted (thus nothing needed to be done as it escapes in
> > a compatible way with JSON), but comments probably should display \n, \t
> > thus add extra \.

> > Fixes: c39b29f0a ("bpf: Check truncation on 32bit div/mod by zero")

> > Suggested-by: Cyril Hrubis <chrubis@suse.cz>
> > Co-developed-by: Cyril Hrubis <chrubis@suse.cz>
> > Signed-off-by: Petr Vorel <pvorel@suse.cz>
> > ---
> >  docparse/data_storage.h | 36 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 35 insertions(+), 1 deletion(-)

> > diff --git a/docparse/data_storage.h b/docparse/data_storage.h
> > index ef420c08f..9f36dd6f0 100644
> > --- a/docparse/data_storage.h
> > +++ b/docparse/data_storage.h
> > @@ -256,6 +256,40 @@ static inline void data_fprintf(FILE *f, unsigned int padd, const char *fmt, ...
> >  	va_end(va);
> >  }

> > +
> > +static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
> > +{
> > +	while (padd-- > 0)
> > +		fputc(' ', f);
> > +
> > +	fputc('"', f);

> 	int was_backslash = 0;

> > +	while (*str) {
> > +		switch (*str) {
> > +		case '\\':
> > +		break;
> > +		case '"':
> > +			fputs("\\\"", f);
> 			was_backslash = 0;
> > +			break;
> > +		case '\t':
> > +			fputs("        ", f);
> > +			break;
> > +		default:
> > +			/* RFC 8259 specify  chars before 0x20 as invalid */
> > +			if (*str >= 0x20)
> > +				putc(*str, f);
> > +			else
> > +				fprintf(stderr, "%s:%d %s(): invalid character for JSON: %x\n",
> > +						__FILE__, __LINE__, __func__, *str);
> > +			break;
> > +		}

> 		if (was_backslash)
> 			fputs("\\\\", f);

> 		was_backslash = (*str == '\\');
> > +		str++;
> > +	}
> > +
> > +	fputc('"', f);
> > +}

> This should avoid "unescaping" an escaped double quote. We deffer
> printing the backslash until we know the character after it and we make
> sure that we do not excape backslash before ".

> Consider what would happen if someone did put a "\"text\"" into options
> strings, the original code would escape the backslashes and we would end
> up with "\\"text"\\" which would break parser again.

> This way we can at least avoid parsing errors until we fix the problem
> one level down in the parser where we have the context required for a
> proper fix.

+1.

I'll test it and merge it under your name, as it's basically your work :).
Thanks!

Kind regards,
Petr


* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
  2021-05-06 14:44 ` Cyril Hrubis
  2021-05-06 18:21   ` Petr Vorel
@ 2021-05-06 19:35   ` Petr Vorel
  2021-05-07 10:10     ` Cyril Hrubis
  1 sibling, 1 reply; 6+ messages in thread
From: Petr Vorel @ 2021-05-06 19:35 UTC (permalink / raw)
  To: ltp

Hi Cyril,

Looking at your code, I'm not sure if it's needed.

> > +static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
> > +{
> > +	while (padd-- > 0)
> > +		fputc(' ', f);
> > +
> > +	fputc('"', f);

> 	int was_backslash = 0;

> > +	while (*str) {
> > +		switch (*str) {
> > +		case '\\':
> > +		break;
> > +		case '"':
> > +			fputs("\\\"", f);
> 			was_backslash = 0;
> > +			break;
> > +		case '\t':
> > +			fputs("        ", f);
> > +			break;
> > +		default:
> > +			/* RFC 8259 specify  chars before 0x20 as invalid */
> > +			if (*str >= 0x20)
> > +				putc(*str, f);
> > +			else
> > +				fprintf(stderr, "%s:%d %s(): invalid character for JSON: %x\n",
> > +						__FILE__, __LINE__, __func__, *str);
> > +			break;
> > +		}

> 		if (was_backslash)
> 			fputs("\\\\", f);

> 		was_backslash = (*str == '\\');
> > +		str++;
> > +	}
> > +
> > +	fputc('"', f);
> > +}

> This should avoid "unescaping" an escaped double quote. We deffer
> printing the backslash until we know the character after it and we make
> sure that we do not excape backslash before ".

> Consider what would happen if someone did put a "\"text\"" into options
> strings, the original code would escape the backslashes and we would end
> up with "\\"text"\\" which would break parser again.

> This way we can at least avoid parsing errors until we fix the problem
> one level down in the parser where we have the context required for a
> proper fix.

It looks to me like it works exactly the same with and without was_backslash.

Trying to escape \" results in first escaping \ (=> \\), then " (=> \").
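
A quick way to check such claims is a small validator for a single JSON string token. This is a minimal sketch assuming the string grammar of RFC 8259 section 7; is_json_string() is a hypothetical helper, not part of docparse, and it does not validate UTF-8. Escaping \ and " one character at a time always yields valid escape pairs, whereas the "\\"text"\\" feared earlier in the thread closes the string right after \\ and leaves trailing garbage.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if s is exactly one JSON string token: opening quote, body,
 * closing quote, nothing after it. \uXXXX needs four hex digits.
 */
static int is_json_string(const char *s)
{
	assert(s != NULL);

	if (*s++ != '"')
		return 0;

	while (*s) {
		unsigned char c = (unsigned char)*s;

		if (c == '"')
			return s[1] == '\0';	/* closing quote must end the token */
		if (c < 0x20)
			return 0;		/* raw control chars must be escaped */
		if (c == '\\') {
			s++;			/* look at the escaped char */
			if (*s == 'u') {
				for (int i = 1; i <= 4; i++)
					if (!isxdigit((unsigned char)s[i]))
						return 0;
				s += 4;
			} else if (*s == '\0' || !strchr("\"\\/bfnrt", *s)) {
				return 0;
			}
		}
		s++;
	}
	return 0;	/* no closing quote */
}
```

Run against the option value escaped by the original code in the example that follows, the check passes; run against "\\"text"\\", it fails.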

Example C code:

/*\
 * [Description]
 * "expected" \\ behaviour "\"text\""
 */

static struct tst_test test = {
	.options = (struct tst_option[]) {
		{"a:", &can_dev_name, "\"text \\ \""},
		{}
	},
};

The results from both the original code and your was_backslash version
are valid JSON, but was_backslash adds extra backslashes.

Result from the original code:

  "testfile": {
   "options": [
     [
      "a:",
      "can_dev_name",
      "\\\"text \\\\ \\\""
     ]
    ],
   "doc": [
     "[Description]",
     "\"expected\" \\\\ behaviour \"\\\"text\\\"\""
    ],
   "fname": "testfile.c"
  }

Result from the was_backslash version:

  "testfile": {
   "options": [
     [
      "a:",
      "can_dev_name",
      "\\\"text \\\\\\ \\\\\""
     ]
    ],
   "doc": [
     "[Description]",
     "\"expected\" \\\\\\ \\behaviour \"\\\"text\\\"\""
    ],
   "fname": "testfile.c"
  }

What am I missing?

Kind regards,
Petr


* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
  2021-05-06 19:35   ` Petr Vorel
@ 2021-05-07 10:10     ` Cyril Hrubis
  2021-05-07 10:52       ` Petr Vorel
  0 siblings, 1 reply; 6+ messages in thread
From: Cyril Hrubis @ 2021-05-07 10:10 UTC (permalink / raw)
  To: ltp

Hi!
> What am I missing?

Nothing; I got confused again. The original code works fine, it just
produces ugly output as you described.

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON
  2021-05-07 10:10     ` Cyril Hrubis
@ 2021-05-07 10:52       ` Petr Vorel
  0 siblings, 0 replies; 6+ messages in thread
From: Petr Vorel @ 2021-05-07 10:52 UTC (permalink / raw)
  To: ltp

Hi Cyril,
> Hi!
> > What am I missing?

> Nothing I got confused again and the original code works fine, just
> produces ugly output as you described.

OK, pushed the original version. Travis should be fixed now, and we have
a TODO for after the release :).

Kind regards,
Petr


Thread overview: 6 messages
2021-05-06 13:27 [LTP] [PATCH v4 1/1] docparse: Handle special characters in JSON Petr Vorel
2021-05-06 14:44 ` Cyril Hrubis
2021-05-06 18:21   ` Petr Vorel
2021-05-06 19:35   ` Petr Vorel
2021-05-07 10:10     ` Cyril Hrubis
2021-05-07 10:52       ` Petr Vorel
