* [LTP] [PATCH v3 1/1] docparse: Escape backslash, tab and double quote in JSON
@ 2021-05-04 12:57 Petr Vorel
From: Petr Vorel @ 2021-05-04 12:57 UTC (permalink / raw)
  To: ltp

From: Cyril Hrubis <chrubis@suse.cz>

NOTE: quoting new lines requires transforming .options from an array to an
array of arrays.

Tested-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
Hi,

changes v2-v3:
* remove *not* quoting new lines (asked by Cyril)

The patch is now exactly what Cyril suggested on the ML.
Changing .options is on my TODO list.

Kind regards,
Petr

 docparse/data_storage.h | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/docparse/data_storage.h b/docparse/data_storage.h
index ef420c08f..24381334b 100644
--- a/docparse/data_storage.h
+++ b/docparse/data_storage.h
@@ -256,6 +256,35 @@ static inline void data_fprintf(FILE *f, unsigned int padd, const char *fmt, ...
 	va_end(va);
 }
 
+
+static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
+{
+	while (padd-- > 0)
+		fputc(' ', f);
+
+	fputc('"', f);
+
+	while (*str) {
+		switch (*str) {
+		case '\\':
+			fputs("\\\\", f);
+			break;
+		case '"':
+			fputs("\\\"", f);
+			break;
+		case '\t':
+			fputs("\\t", f);
+			break;
+		default:
+			putc(*str, f);
+			break;
+		}
+		str++;
+	}
+
+	fputc('"', f);
+}
+
 static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int padd, int do_padd)
 {
 	unsigned int i;
@@ -263,7 +292,7 @@ static inline void data_to_json_(struct data_node *self, FILE *f, unsigned int p
 	switch (self->type) {
 	case DATA_STRING:
 		padd = do_padd ? padd : 0;
-		data_fprintf(f, padd, "\"%s\"", self->string.val);
+		data_fprintf_esc(f, padd, self->string.val);
 	break;
 	case DATA_HASH:
 		for (i = 0; i < self->hash.elems_used; i++) {
-- 
2.31.1



* [LTP] [PATCH v3 1/1] docparse: Escape backslash, tab and double quote in JSON
From: Cyril Hrubis @ 2021-05-04 13:10 UTC (permalink / raw)
  To: ltp

Hi!
> +static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
> +{
> +	while (padd-- > 0)
> +		fputc(' ', f);
> +
> +	fputc('"', f);
> +
> +	while (*str) {
> +		switch (*str) {
> +		case '\\':
> +			fputs("\\\\", f);
> +			break;
> +		case '"':
> +			fputs("\\\"", f);
> +			break;
> +		case '\t':
> +			fputs("\\t", f);
> +			break;
> +		default:
> +			putc(*str, f);
> +			break;
> +		}
> +		str++;
> +	}
> +
> +	fputc('"', f);
> +}

Also, does this even escape newlines? If you write "\n" in C it's stored
in memory as [0x0a, 0x00]; no actual \ is stored in the string. What
the '\\' case does is escape a literal backslash, i.e. "\\", which is
stored as [0x5c, 0x00]. According to the JSON specification, any ASCII
character below 0x20 (space) is invalid inside a JSON string. I
guess that the safest way to write strings would be:

	...
	default:
		if (*str >= 20)
			putc(*str, f);
	}

And we would have to add an escape at least for '\n', the same way we have
for '\t' in the switch.
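A sketch of how the `default:` branch could be extended, assuming we want to escape control characters rather than drop them (dropping silently loses data). `json_fputs_esc()` and `json_esc_matches()` are hypothetical names, not LTP code. Two details differ from the pseudo-code above: the threshold is 0x20 rather than decimal 20, and bytes are read as `unsigned char`, because with a signed `char` every UTF-8 byte (>= 0x80) compares as negative and a `>= 0x20` test would discard it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical variant of data_fprintf_esc(): writes str as a quoted JSON
 * string and escapes every byte RFC 8259 does not allow unescaped.
 */
static void json_fputs_esc(FILE *f, unsigned int padd, const char *str)
{
	/* Read bytes as unsigned so UTF-8 bytes (>= 0x80) are not negative. */
	const unsigned char *s = (const unsigned char *)str;

	while (padd-- > 0)
		fputc(' ', f);

	fputc('"', f);

	for (; *s; s++) {
		switch (*s) {
		case '\\':
			fputs("\\\\", f);
			break;
		case '"':
			fputs("\\\"", f);
			break;
		case '\b':
			fputs("\\b", f);
			break;
		case '\f':
			fputs("\\f", f);
			break;
		case '\n':
			fputs("\\n", f);
			break;
		case '\r':
			fputs("\\r", f);
			break;
		case '\t':
			fputs("\\t", f);
			break;
		default:
			/* Other control characters use the \u00XX form. */
			if (*s < 0x20)
				fprintf(f, "\\u%04x", (unsigned int)*s);
			/* Printable ASCII and UTF-8 bytes pass through. */
			else
				fputc(*s, f);
			break;
		}
	}

	fputc('"', f);
}

/* Hypothetical test helper: escape in into a temp file, compare with expected. */
static int json_esc_matches(const char *in, const char *expected)
{
	char buf[256];
	size_t n;
	FILE *f = tmpfile();

	assert(f != NULL);
	json_fputs_esc(f, 0, in);
	rewind(f);
	n = fread(buf, 1, sizeof(buf) - 1, f);
	buf[n] = '\0';
	fclose(f);

	return strcmp(buf, expected) == 0;
}
```

For example, `json_fputs_esc(stdout, 2, "tab\there\n")` should print `  "tab\there\n"`: two spaces of padding, then the tab and newline bytes in their short escaped forms. Other control characters such as 0x01 fall back to `\u0001`, a form RFC 8259 allows for any character.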

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] [PATCH v3 1/1] docparse: Escape backslash, tab and double quote in JSON
From: Petr Vorel @ 2021-05-04 14:44 UTC (permalink / raw)
  To: ltp

Hi Cyril,

> Hi!
> > +static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
> > +{
> > +	while (padd-- > 0)
> > +		fputc(' ', f);
> > +
> > +	fputc('"', f);
> > +
> > +	while (*str) {
> > +		switch (*str) {
> > +		case '\\':
> > +			fputs("\\\\", f);
> > +			break;
> > +		case '"':
> > +			fputs("\\\"", f);
> > +			break;
> > +		case '\t':
> > +			fputs("\\t", f);
> > +			break;
> > +		default:
> > +			putc(*str, f);
> > +			break;
> > +		}
> > +		str++;
> > +	}
> > +
> > +	fputc('"', f);
> > +}

> Also, does this even escape newlines? If you write "\n" in C it's stored
> in memory as [0x0a, 0x00]; no actual \ is stored in the string. What
> the '\\' case does is escape a literal backslash, i.e. "\\", which is
> stored as [0x5c, 0x00].
Well, because '\\' is handled first, any text written as \n will be kept as \n
(obviously anything starting with \ will be handled the same way, e.g. \t, \r, \b, \f).
We'd like to interpret \n at least for .options (unless we transform them
to an array of arrays as you suggested). But I'm not sure we want to do that
everywhere, e.g. the doc might contain \n which we want to keep, thus I'd prefer
to interpret only tabs ('\t' => "\\t") and escape the rest by escaping '\\'
(already in the patch).
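The two cases being distinguished here can be checked directly against the v3 `data_fprintf_esc()` from the patch, copied unchanged below; `v3_matches()` is a hypothetical test helper, not LTP code. Text written as `\n` in a test's source reaches docparse as the two bytes 0x5c 0x6e, while a string containing a real newline holds the single byte 0x0a.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* data_fprintf_esc() exactly as posted in v3 of the patch. */
static inline void data_fprintf_esc(FILE *f, unsigned int padd, const char *str)
{
	while (padd-- > 0)
		fputc(' ', f);

	fputc('"', f);

	while (*str) {
		switch (*str) {
		case '\\':
			fputs("\\\\", f);
			break;
		case '"':
			fputs("\\\"", f);
			break;
		case '\t':
			fputs("\\t", f);
			break;
		default:
			putc(*str, f);
			break;
		}
		str++;
	}

	fputc('"', f);
}

/* Hypothetical test helper: capture the v3 output and compare it. */
static int v3_matches(const char *in, const char *expected)
{
	char buf[256];
	size_t n;
	FILE *f = tmpfile();

	assert(f != NULL);
	data_fprintf_esc(f, 0, in);
	rewind(f);
	n = fread(buf, 1, sizeof(buf) - 1, f);
	buf[n] = '\0';
	fclose(f);

	return strcmp(buf, expected) == 0;
}
```

For the source text `a\nb` (bytes `a`, 0x5c, `n`, `b`) the output is `"a\\nb"`, which a JSON parser decodes back to the same four characters. For a string holding an actual 0x0a byte, that raw byte ends up between the quotes, which RFC 8259 does not permit, so the output is not valid JSON.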

I don't think there is a real newline character in our JSON (unlike tab, which
was put into the CAN tests and needed to be reverted). If there is, I think we'd
prefer to interpret it instead of escaping it (as well as form feed and carriage return),
unless any of these is ASCII below 0x20 (which will be handled by the last change
you suggested).

> According to the JSON specification, any ASCII
> character below 0x20 (space) is invalid inside a JSON string. I
> guess that the safest way to write strings would be:
Very good point.

> 	...
> 	default:
> 		if (*str >= 20)
> 			putc(*str, f);
> 	}
Thus this should be the only change.

> And we would have to add an escape at least for '\n', the same way we have
> for '\t' in the switch.
And I'd avoid this due to the previous explanation.

I can merge this under your name with my Reviewed-by: tag.
Or feel free to commit these changes yourself.

Kind regards,
Petr


* [LTP] [PATCH v3 1/1] docparse: Escape backslash, tab and double quote in JSON
From: Cyril Hrubis @ 2021-05-04 14:57 UTC (permalink / raw)
  To: ltp

Hi!
> > Also, does this even escape newlines? If you write "\n" in C it's stored
> > in memory as [0x0a, 0x00]; no actual \ is stored in the string. What
> > the '\\' case does is escape a literal backslash, i.e. "\\", which is
> > stored as [0x5c, 0x00].
> Well, because '\\' is handled first, any text written as \n will be kept as \n
> (obviously anything starting with \ will be handled the same way, e.g. \t, \r, \b, \f).
> We'd like to interpret \n at least for .options (unless we transform them
> to an array of arrays as you suggested). But I'm not sure we want to do that
> everywhere, e.g. the doc might contain \n which we want to keep, thus I'd prefer
> to interpret only tabs ('\t' => "\\t") and escape the rest by escaping '\\'
> (already in the patch).
>
> I don't think there is a real newline character in our JSON (unlike tab, which
> was put into the CAN tests and needed to be reverted). If there is, I think we'd
> prefer to interpret it instead of escaping it (as well as form feed and carriage return),
> unless any of these is ASCII below 0x20 (which will be handled by the last change
> you suggested).

Ah, I got confused here as well: we parse the C code and we do not
replace the \n with the actual ASCII value in the docparse code, so it
ends up verbatim in the strings in memory and then it's translated into
the JSON files.

This is even more complicated than I originally thought, since there are
several types of strings from different parts of the C code, i.e. the
expected values depend on context.

If we parse a C comment, '"' is a valid character and does not need to be escaped,
while in the middle of a C string it has to be encoded as "\"".
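One possible way to handle the string-literal case is sketched below, assuming docparse can tell whether a piece of text came from a C string literal or from a comment (an assumption, not something the current parser is shown to do). The hypothetical `c_str_unescape()` decodes the simple C escapes in a literal's body, so the real characters can then be handed to the JSON escaper; comment text would skip this step, because there a backslash is just a backslash.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: decode, in place, the simple escapes in the body of
 * a C string literal (\n, \t, \r, \", \', \\). Other escapes (octal, hex,
 * \a, ...) are not handled here and are kept verbatim. Not meant for
 * comment text, where a backslash has no special meaning.
 */
static char *c_str_unescape(char *s)
{
	char *src, *dst;

	assert(s != NULL);

	/* Nothing changes before the first backslash, so start there. */
	src = dst = strchr(s, '\\');
	if (!src)
		return s;

	while (*src) {
		/* Ordinary byte, or a lone trailing backslash: copy as is. */
		if (*src != '\\' || !src[1]) {
			*dst++ = *src++;
			continue;
		}

		src++; /* skip the backslash */
		switch (*src) {
		case 'n':
			*dst++ = '\n';
			break;
		case 't':
			*dst++ = '\t';
			break;
		case 'r':
			*dst++ = '\r';
			break;
		case '"':
		case '\'':
		case '\\':
			*dst++ = *src;
			break;
		default:
			/* Unhandled escape: keep backslash and character. */
			*dst++ = '\\';
			*dst++ = *src;
			break;
		}
		src++;
	}
	*dst = '\0';

	return s;
}
```

For example, a literal written as `"say \"hi\""` in a test's source would be stored as `say \"hi\"` and decoded to `say "hi"`; the JSON escaper would then emit `"say \"hi\""` once, instead of the doubled `"say \\\"hi\\\""` that escaping the raw source text produces.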

We have to think of all the different cases here; I will have a closer look tomorrow.

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] [PATCH v3 1/1] docparse: Escape backslash, tab and double quote in JSON
From: Petr Vorel @ 2021-05-06  7:50 UTC (permalink / raw)
  To: ltp

Hi Cyril,

...
> Also, does this even escape newlines? If you write "\n" in C it's stored
> in memory as [0x0a, 0x00]; no actual \ is stored in the string. What
> the '\\' case does is escape a literal backslash, i.e. "\\", which is
> stored as [0x5c, 0x00]. According to the JSON specification, any ASCII
> character below 0x20 (space) is invalid inside a JSON string. I
> guess that the safest way to write strings would be:

> 	...
> 	default:
> 		if (*str >= 20)
> 			putc(*str, f);
> 	}
Actually, space is 20 in hex, so the check should use 0x20 (or 32).

https://tools.ietf.org/html/rfc8259#page-9
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
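The ABNF rule above translates into a byte-level check. The sketch below assumes UTF-8 input processed one byte at a time and an ASCII-compatible character set; `json_byte_needs_escape()` is a hypothetical name. Every byte >= 0x80 belongs to a multi-byte UTF-8 sequence for a code point >= 0x80, which the `%x5D-10FFFF` range already covers, so only control characters, '"' (0x22) and '\' (0x5C) need escaping.

```c
#include <assert.h>
#include <stdbool.h>

/* Space is 0x20 (decimal 32); decimal 20 is 0x14, a control character. */
static_assert(' ' == 0x20, "space must be 0x20");

/*
 * Hypothetical helper following RFC 8259, section 7:
 *   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
 * Anything outside those ranges (0x00-0x1F, 0x22 '"', 0x5C '\') must be
 * written as an escape sequence. Bytes >= 0x80 are parts of UTF-8
 * sequences for code points >= 0x80 and may appear unescaped.
 */
static bool json_byte_needs_escape(unsigned char c)
{
	return c < 0x20 || c == 0x22 || c == 0x5c;
}
```

With this check, `json_byte_needs_escape(20)` is true, since decimal 20 is the control character 0x14; a `*str >= 20` test would therefore have let 0x14 to 0x1f through unescaped. The `static_assert` needs C11 or later.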

BTW, I'm preparing v4; it should definitely go into the release (otherwise we need
to fix the formatting of c39b29f0a ("bpf: Check truncation on 32bit div/mod by zero")).

Kind regards,
Petr

> And we would have to add an escape at least for '\n', the same way we have
> for '\t' in the switch.

