Bad interaction between macro expansion and literal concatenation

All of lore.kernel.org
 help / color / mirror / Atom feed

* Bad interaction between macro expansion and literal concatenation
@ 2015-01-30 22:16 Rasmus Villemoes
  2015-01-31  1:23 ` [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals Luc Van Oostenryck
  2015-01-31  5:16 ` Bad interaction between macro expansion and literal concatenation Christopher Li
  0 siblings, 2 replies; 28+ messages in thread
From: Rasmus Villemoes @ 2015-01-30 22:16 UTC (permalink / raw)
  To: linux-sparse

Hi,

I'm trying to write a type checker for the kernel's %p format extensions, but
I've run into a problem with sparse which I can't debug myself.

It seems that something goes wrong when sparse is concatenating literal
strings, one of which was hidden behind a macro. As a random example,
drivers/net/ethernet/realtek/atp.c:471 has this format string:

  KERN_DEBUG "%s: Reset: current Rx mode %d.\n"

KERN_DEBUG ultimately expands to "\001" "7". When I print the
corresponding ->string->data, I get

  ^A6%s7%s: Reset: current Rx mode %d.

(^A is just less' way of showing the \001 byte - that part is
fine). Note the extra "6%s". Needless to say, such random extra content
makes format string verification very hard...

If I replace KERN_DEBUG by its expansion "\001" "7", it works fine. If I
only replace it by its immediate replacement KERN_SOH "7", I see the
same bad behaviour.

Any ideas?

Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-01-30 22:16 Bad interaction between macro expansion and literal concatenation Rasmus Villemoes
@ 2015-01-31  1:23 ` Luc Van Oostenryck
  2015-02-03 22:38   ` Rasmus Villemoes
  2015-01-31  5:16 ` Bad interaction between macro expansion and literal concatenation Christopher Li
  1 sibling, 1 reply; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-01-31  1:23 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-sparse, Christopher Li

In get_string_constant(), the code tried to reuse the storage for the string
but only if the expansion of the string was not bigger than its unexpanded form.
But this fail when the string constant is a sequence of adjacent string litterals
(each being possibly shared, used elsewhere, isolated or in another order).
The minimal exemple would be something like this:

#define P "\001"
const char a[] = P "a";
const char b[] = P "b";

The expansion for 'a' will produce a string which is smaller than
the unexpanded "\001" (2 instead of 4).
By trying to reuse the storage, all further occurrence of "\001"
(probably only from the same 'origin', here the macro P) will then be replaced by "\001a".

The fix is thus to not try to reuse the storage for the string if it consit of
several adjacent litterals.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 char.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/char.c b/char.c
index 08ca2230..ce1a0700 100644
--- a/char.c
+++ b/char.c
@@ -93,6 +93,7 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
 	static char buffer[MAX_STRING];
 	int len = 0;
 	int bits;
+	int parts = 0;

 	while (!done) {
 		switch (token_type(next)) {
@@ -117,13 +118,14 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
 			len++;
 		}
 		token = token->next;
+		parts++;
 	}
 	if (len > MAX_STRING) {
 		warning(token->pos, "trying to concatenate %d-character string (%d bytes max)", len, MAX_STRING);
 		len = MAX_STRING;
 	}

-	if (len >= string->length)	/* can't cannibalize */
+	if (len >= string->length || parts > 1)	/* safe to reuse the string buffer */
 		string = __alloc_string(len+1);
 	string->length = len+1;
 	memcpy(string->data, buffer, len);
-- 
2.2.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Bad interaction between macro expansion and literal concatenation
  2015-01-30 22:16 Bad interaction between macro expansion and literal concatenation Rasmus Villemoes
  2015-01-31  1:23 ` [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals Luc Van Oostenryck
@ 2015-01-31  5:16 ` Christopher Li
  2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
  1 sibling, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-01-31  5:16 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Linux-Sparse

On Fri, Jan 30, 2015 at 2:16 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
>   KERN_DEBUG "%s: Reset: current Rx mode %d.\n"
>
> KERN_DEBUG ultimately expands to "\001" "7". When I print the
> corresponding ->string->data, I get
>
>   ^A6%s7%s: Reset: current Rx mode %d.

Can you construct a small C program to demonstrate the problem
without including any real kernel headers?

e.g. some thing like:

#define KERNEL_DEBUG "\001" "7"

void foo(void)
{
     printk(KERN_DEBUG "%s: .....", ...);
}

Obviously, my test case is not a complete one. It is not
even valid C. It just show you some idea how a test case
can be constructed.

Once you have a small test case, it is much easier
to discuss and debug what is going on there.

BTW, how do you print the "string->data"? It helps to show the
functions that does the print.

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/4] Teach sparse to display data/initial values
  2015-01-31  5:16 ` Bad interaction between macro expansion and literal concatenation Christopher Li
@ 2015-02-01  2:19   ` Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 1/4] Add support for '-vdata', the equivalent of '-ventry' but for data Luc Van Oostenryck
                       ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-01  2:19 UTC (permalink / raw)
  To: sparse; +Cc: linux-sparse, linux

The patches in this serie implement the display of toplevel variable's initializer.
It should allow to easily investigate the bug reported here: http://article.gmane.org/gmane.comp.parsers.sparse/4080
but can be usefull for others things.

For convenience, this patch series can also be found at git://github.com/looxix/sparse.git show-data


Luc Van Oostenryck (4):
 Add support for '-vdata', the equivalent of '-ventry' but for data
 Add support for show_data()
 Teach sparse to display data/initial values
 Small test/exemple for using '-vdata'

 lib.c                  |  2 ++
 lib.h                  |  1 +
 linearize.c            | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 linearize.h            |  1 +
 sparse.c               |  3 +++
 validation/show_data.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 168 insertions(+)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/4] Add support for '-vdata', the equivalent of '-ventry' but for data
  2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
@ 2015-02-01  2:19     ` Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 2/4] Add support for show_data() Luc Van Oostenryck
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-01  2:19 UTC (permalink / raw)
  To: sparse; +Cc: linux-sparse, linux, Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 lib.c | 2 ++
 lib.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/lib.c b/lib.c
index 8dc5bcf9..73714109 100644
--- a/lib.c
+++ b/lib.c
@@ -243,6 +243,7 @@ int Wuninitialized = 1;
 int Wvla = 1;
 
 int dbg_entry = 0;
+int dbg_data = 0;
 int dbg_dead = 0;
 
 int preprocess_only;
@@ -517,6 +518,7 @@ static char **handle_switch_W(char *arg, char **next)
 
 static struct warning debugs[] = {
 	{ "entry", &dbg_entry},
+	{ "data", &dbg_data},
 	{ "dead", &dbg_dead},
 };
 
diff --git a/lib.h b/lib.h
index 15b69fa2..2e1170b4 100644
--- a/lib.h
+++ b/lib.h
@@ -129,6 +129,7 @@ extern int Wuninitialized;
 extern int Wvla;
 
 extern int dbg_entry;
+extern int dbg_data;
 extern int dbg_dead;
 
 extern int arch_m64;
-- 
2.2.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/4] Add support for show_data()
  2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 1/4] Add support for '-vdata', the equivalent of '-ventry' but for data Luc Van Oostenryck
@ 2015-02-01  2:19     ` Luc Van Oostenryck
  2015-02-02  5:30       ` Christopher Li
  2015-02-01  2:19     ` [PATCH 3/4] Teach sparse to display data/initial values Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 4/4] Small test/exemple for using '-vdata' Luc Van Oostenryck
  3 siblings, 1 reply; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-01  2:19 UTC (permalink / raw)
  To: sparse; +Cc: linux-sparse, linux, Luc Van Oostenryck

The purpose of show_data() is to visualize data, more or less
like show_entry() allow to visualize the code.

The support is not complete but at least the basic types are handled.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 linearize.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 linearize.h |  1 +
 2 files changed, 87 insertions(+)

diff --git a/linearize.c b/linearize.c
index c6ada1e8..28beb271 100644
--- a/linearize.c
+++ b/linearize.c
@@ -2220,3 +2220,89 @@ struct entrypoint *linearize_symbol(struct symbol *sym)
 		return linearize_fn(sym, base_type);
 	return NULL;
 }
+
+static void do_indent(int level)
+{
+	while (level--)
+		printf("\t");
+}
+
+static void show_data_expr(struct expression *expr, int indent)
+{
+	struct expression *entry;
+	struct symbol *sym;
+
+	switch (expr->type) {
+	case EXPR_VALUE:
+		printf("%lld", expr->value);
+		break;
+	case EXPR_SYMBOL:
+		sym = expr->symbol;
+		if (sym->initializer && sym->initializer->type == EXPR_STRING)
+			return show_data_expr(sym->initializer, indent);
+		printf("%s", show_ident(expr->symbol_name));
+		break;
+	case EXPR_IMPLIED_CAST:
+		show_data_expr(expr->cast_expression, indent);
+		break;
+	case EXPR_STRING:
+		printf("%s", show_string(expr->string));
+		break;
+	case EXPR_INITIALIZER:
+		printf("{");
+		FOR_EACH_PTR(expr->expr_list, entry) {
+			switch (entry->type) {
+			case EXPR_POS:
+			case EXPR_INDEX:
+			case EXPR_IDENTIFIER:
+				continue;
+			}
+			do_indent(indent+1);
+			show_data_expr(entry, indent+1);
+			printf(",\n");
+		} END_FOR_EACH_PTR(entry);
+		do_indent(indent);
+		printf("}");
+		break;
+	case EXPR_PREOP:
+		if (expr->op == '*') {
+			show_data_expr(expr->unop, indent);
+			break;
+		}
+		// Fall through
+	default:
+		printf("!!unhandled expression type (%d)", expr->type);
+		break;
+	}
+}
+
+void show_data(struct symbol *sym)
+{
+	struct expression *expr = sym->initializer;
+
+	printf("symbol %s:\n", show_ident(sym->ident));
+	printf("\t"); show_type(sym); printf("\n");
+	printf("\tbit_size = %i\n", sym->bit_size);
+	printf("\tval = ");
+	switch (sym->ctype.base_type->type) {
+	case SYM_BASETYPE:
+	case SYM_ENUM:
+		show_data_expr(expr, 0);
+		break;
+	case SYM_PTR:
+		printf("&");
+		show_data_expr(expr, 0);
+		break;
+	case SYM_ARRAY:
+		show_data_expr(expr, 0);
+		break;
+	case SYM_STRUCT:
+	case SYM_UNION:
+		show_data_expr(expr, 1);
+		break;
+	default:
+		printf("\ttype = %d\n", sym->ctype.base_type->type);
+		break;
+	}
+	printf("\n");
+}
diff --git a/linearize.h b/linearize.h
index 61fbd831..7b2d0681 100644
--- a/linearize.h
+++ b/linearize.h
@@ -344,6 +344,7 @@ void show_entry(struct entrypoint *ep);
 const char *show_pseudo(pseudo_t pseudo);
 void show_bb(struct basic_block *bb);
 const char *show_instruction(struct instruction *insn);
+void show_data(struct symbol *sym);
 
 #endif /* LINEARIZE_H */
 
-- 
2.2.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/4] Teach sparse to display data/initial values
  2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 1/4] Add support for '-vdata', the equivalent of '-ventry' but for data Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 2/4] Add support for show_data() Luc Van Oostenryck
@ 2015-02-01  2:19     ` Luc Van Oostenryck
  2015-02-01  2:19     ` [PATCH 4/4] Small test/exemple for using '-vdata' Luc Van Oostenryck
  3 siblings, 0 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-01  2:19 UTC (permalink / raw)
  To: sparse; +Cc: linux-sparse, linux, Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 sparse.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sparse.c b/sparse.c
index 6b3324cf..e87f8c4f 100644
--- a/sparse.c
+++ b/sparse.c
@@ -285,6 +285,9 @@ static void check_symbols(struct symbol_list *list)
 				show_entry(ep);
 
 			check_context(ep);
+		} else {
+			if (dbg_data)
+				show_data(sym);
 		}
 	} END_FOR_EACH_PTR(sym);
 
-- 
2.2.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/4] Small test/exemple for using '-vdata'
  2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
                       ` (2 preceding siblings ...)
  2015-02-01  2:19     ` [PATCH 3/4] Teach sparse to display data/initial values Luc Van Oostenryck
@ 2015-02-01  2:19     ` Luc Van Oostenryck
  3 siblings, 0 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-01  2:19 UTC (permalink / raw)
  To: sparse; +Cc: linux-sparse, linux, Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 validation/show_data.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 validation/show_data.c

diff --git a/validation/show_data.c b/validation/show_data.c
new file mode 100644
index 00000000..b620b610
--- /dev/null
+++ b/validation/show_data.c
@@ -0,0 +1,75 @@
+#define P "\001"
+
+static const char a[] = P "a";
+static const char b[] = P "b";
+
+static const char *c = "abc";
+
+static int i = 42;
+
+static int  *ipi = &i;
+static void *vpi = &i;
+
+struct s {
+	int i;
+};
+static struct s s = {
+	.i = 42,
+};
+
+union u {
+	int i;
+	void *p;
+};
+static union u u = {
+	.i = 42,
+};
+
+enum e {
+	first,
+	next,
+	last
+};
+static enum e e = next;
+/*
+ * check-name: show_data.c
+ * check-command: sparse -m32 -vdata $file
+ * check-output-start:
+symbol a:
+	char static const [toplevel] a[0]
+	bit_size = 24
+	val = "\1a"
+symbol b:
+	char static const [toplevel] b[0]
+	bit_size = 32
+	val = "\1b"
+symbol c:
+	char const *static [toplevel] c
+	bit_size = 32
+	val = &"abc"
+symbol i:
+	int static [signed] [addressable] [toplevel] i
+	bit_size = 32
+	val = 42
+symbol ipi:
+	int *static [toplevel] ipi
+	bit_size = 32
+	val = &i
+symbol vpi:
+	void *static [toplevel] vpi
+	bit_size = 32
+	val = &i
+symbol s:
+	struct s static [toplevel] s
+	bit_size = 32
+	val = {	}
+symbol u:
+	union u static [toplevel] u
+	bit_size = 32
+	val = {	}
+symbol e:
+	int enum e static [signed] [toplevel] e
+	bit_size = 32
+	val = 1
+ * check-output-end
+ */
-- 
2.2.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] Add support for show_data()
  2015-02-01  2:19     ` [PATCH 2/4] Add support for show_data() Luc Van Oostenryck
@ 2015-02-02  5:30       ` Christopher Li
  2015-02-04  0:50         ` Luc Van Oostenryck
  0 siblings, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-02  5:30 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse, Rasmus Villemoes

On Sat, Jan 31, 2015 at 6:19 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> The purpose of show_data() is to visualize data, more or less

I think show_data() is a bad name. Data is very generic term,
it has not specific meaning what exactly is that data.

What you really want to show is the initializer expression.
Make the function name reflect that.

> diff --git a/linearize.c b/linearize.c

Why is this function belongs to linearize.c?
It has nothing to do with linearization instruction. It has
every thing to do with the AST structure.

I suggest that move this code to show-parse.c.
There are many functions showing internal of the AST
there.

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-01-31  1:23 ` [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals Luc Van Oostenryck
@ 2015-02-03 22:38   ` Rasmus Villemoes
  2015-02-04  0:32     ` Luc Van Oostenryck
  2015-02-04  2:01     ` [PATCH v2] Avoid reusing string buffer when doing string expansion Luc Van Oostenryck
  0 siblings, 2 replies; 28+ messages in thread
From: Rasmus Villemoes @ 2015-02-03 22:38 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-sparse, Christopher Li

On Sat, Jan 31 2015, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:

> In get_string_constant(), the code tried to reuse the storage for the string
> but only if the expansion of the string was not bigger than its unexpanded form.
> But this fail when the string constant is a sequence of adjacent string litterals
> (each being possibly shared, used elsewhere, isolated or in another order).
> The minimal exemple would be something like this:
>
> #define P "\001"
> const char a[] = P "a";
> const char b[] = P "b";
>
> The expansion for 'a' will produce a string which is smaller than
> the unexpanded "\001" (2 instead of 4).
> By trying to reuse the storage, all further occurrence of "\001"
> (probably only from the same 'origin', here the macro P) will then be replaced by "\001a".
>
> The fix is thus to not try to reuse the storage for the string if it consit of
> several adjacent litterals.
>

Thanks, but there's still something wrong. Using your show-data feature
on this:

===
#define BACKSLASH "\\"
#define LETTER_t "t"

static const char s1[] = BACKSLASH;
/* static const char s2[] = BACKSLASH; */
static const char s3[] = BACKSLASH LETTER_t;
static const char s4[] = "a" BACKSLASH LETTER_t "b";
===

I get

symbol s1:
        char static const [toplevel] s1[0]
        bit_size = 16
        val = "\\"
symbol s3:
        char static const [toplevel] s3[0]
        bit_size = 24
        val = "\0t"
symbol s4:
        char static const [toplevel] s4[0]
        bit_size = 40
        val = "a\0tb"

Now if I do the same with s2 not commented out, I get


symbol s1:
        char static const [toplevel] s1[0]
        bit_size = 16
        val = "\0"
symbol s2:
        char static const [toplevel] s2[0]
        bit_size = 16
        val = "\0"
symbol s3:
        char static const [toplevel] s3[0]
        bit_size = 24
        val = "\0t"
symbol s4:
        char static const [toplevel] s4[0]
        bit_size = 40
        val = "a\0tb"

So the expansion of BACKSLASH changes depending on how often it is
expanded...

The LETTER_t thing above is because I thought I had somehow provoked a
double expansion, making BACKSLASH LETTER_t (or some variant) expand to
a single-character string containing just a tab. But I can't seem to
reproduce that particular behaviour, so maybe I'm imagining
stuff. Anyway, the above is certainly real.

Thanks,
Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-03 22:38   ` Rasmus Villemoes
@ 2015-02-04  0:32     ` Luc Van Oostenryck
  2015-02-04  3:26       ` Christopher Li
  2015-02-04  8:39       ` Rasmus Villemoes
  2015-02-04  2:01     ` [PATCH v2] Avoid reusing string buffer when doing string expansion Luc Van Oostenryck
  1 sibling, 2 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04  0:32 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-sparse, Christopher Li

On Tue, Feb 03, 2015 at 11:38:02PM +0100, Rasmus Villemoes wrote:
> On Sat, Jan 31 2015, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:
> 
> > In get_string_constant(), the code tried to reuse the storage for the string
> > but only if the expansion of the string was not bigger than its unexpanded form.
> > But this fail when the string constant is a sequence of adjacent string litterals
> > (each being possibly shared, used elsewhere, isolated or in another order).
> > The minimal exemple would be something like this:
> >
> > #define P "\001"
> > const char a[] = P "a";
> > const char b[] = P "b";
> >
> > The expansion for 'a' will produce a string which is smaller than
> > the unexpanded "\001" (2 instead of 4).
> > By trying to reuse the storage, all further occurrence of "\001"
> > (probably only from the same 'origin', here the macro P) will then be replaced by "\001a".
> >
> > The fix is thus to not try to reuse the storage for the string if it consit of
> > several adjacent litterals.
> >
> 
> Thanks, but there's still something wrong. Using your show-data feature
> on this:
> 
> ===
> #define BACKSLASH "\\"
> #define LETTER_t "t"
> 
> static const char s1[] = BACKSLASH;
> /* static const char s2[] = BACKSLASH; */
> static const char s3[] = BACKSLASH LETTER_t;
> static const char s4[] = "a" BACKSLASH LETTER_t "b";
> ===
> 
> I get
> 
> symbol s1:
>         char static const [toplevel] s1[0]
>         bit_size = 16
>         val = "\\"
> symbol s3:
>         char static const [toplevel] s3[0]
>         bit_size = 24
>         val = "\0t"
> symbol s4:
>         char static const [toplevel] s4[0]
>         bit_size = 40
>         val = "a\0tb"
> 
> Now if I do the same with s2 not commented out, I get
> 
> 
> symbol s1:
>         char static const [toplevel] s1[0]
>         bit_size = 16
>         val = "\0"
> symbol s2:
>         char static const [toplevel] s2[0]
>         bit_size = 16
>         val = "\0"
> symbol s3:
>         char static const [toplevel] s3[0]
>         bit_size = 24
>         val = "\0t"
> symbol s4:
>         char static const [toplevel] s4[0]
>         bit_size = 40
>         val = "a\0tb"
> 
> So the expansion of BACKSLASH changes depending on how often it is
> expanded...
> 
> The LETTER_t thing above is because I thought I had somehow provoked a
> double expansion, making BACKSLASH LETTER_t (or some variant) expand to
> a single-character string containing just a tab. But I can't seem to
> reproduce that particular behaviour, so maybe I'm imagining
> stuff. Anyway, the above is certainly real.
> 
> Thanks,
> Rasmus
> --
Yes, I see.

Now thinking about it, it's obvious that the string buffer can't be reused at all
if there is any kind of expansion done on it, the adjacent strings concatenation
make just the thing worse but are not the cause of it.

I'll post an updated patch later.


Regards,
Luc

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] Add support for show_data()
  2015-02-02  5:30       ` Christopher Li
@ 2015-02-04  0:50         ` Luc Van Oostenryck
  0 siblings, 0 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04  0:50 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linux-Sparse, Rasmus Villemoes

On Sun, Feb 01, 2015 at 09:30:44PM -0800, Christopher Li wrote:
> On Sat, Jan 31, 2015 at 6:19 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> > The purpose of show_data() is to visualize data, more or less
> 
> I think show_data() is a bad name. Data is very generic term,
> it has not specific meaning what exactly is that data.
> 
> What you really want to show is the initializer expression.
> Make the function name reflect that.

Yes, the name is too generic. The purpose was to use 'data'
in opposition to 'code'.

> > diff --git a/linearize.c b/linearize.c
> 
> Why is this function belongs to linearize.c?
> It has nothing to do with linearization instruction. It has
> every thing to do with the AST structure.
> 
> I suggest that move this code to show-parse.c.
> There are many functions showing internal of the AST
> there.

Well, I put it in linearize because it's where show_entry() is defined
and since I wanted this new show function be its complement, it seemed
to me much better to put it there than in show-parse.c
Even more so because I'm not interested in showing anything close to the AST,
on the contrary, I want to show the values like you would find them in
an assembler code file generated by a compiler after constant folding,
string expansion, type coercion, ...

Anyway, my intention was not to have it included as is (mainly because it's
not complete) but only as a quick tool to investigate the bug that Rasmus have reported.
I'll try to complete it and resubmit it then, taking your remarks in account.

Regards,
Luc

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-03 22:38   ` Rasmus Villemoes
  2015-02-04  0:32     ` Luc Van Oostenryck
@ 2015-02-04  2:01     ` Luc Van Oostenryck
  2015-02-04  5:30       ` Christopher Li
  1 sibling, 1 reply; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04  2:01 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-sparse, Christopher Li, Luc Van Oostenryck


In get_string_constant(), the code tried to reuse the storage for the string
but only if the expansion of the string was not bigger than its unexpanded form.
But this string can be shared with other expressions and reusing the buffer will
result in later corruption

A minimal exemple would be something like:
const char a[] = BACKSLASH;
const char b[] = BACKSLASH;

The expansion for 'a' will correctly produce the two-char string consisting
of a backslash char followed by a null char.
But then the expansion of 'b' will expand this once more,
producing the expansion of "\0": the two-char string: { '\0', '\0' }.

The fix is to not reuse the storage for the string if any king of expansion
have been done.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 char.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/char.c b/char.c
index 08ca2230..2e21bb77 100644
--- a/char.c
+++ b/char.c
@@ -123,11 +123,15 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
 		len = MAX_STRING;
 	}
 
-	if (len >= string->length)	/* can't cannibalize */
+	/* The input string can be shared with other expression and so
+	 * its storage can't be reused if any kind of expansion have been done on it.
+	 */
+	if ((len != string->length) || memcmp(buffer, string->data, len)) {
 		string = __alloc_string(len+1);
-	string->length = len+1;
-	memcpy(string->data, buffer, len);
-	string->data[len] = '\0';
+		string->length = len+1;
+		memcpy(string->data, buffer, len);
+		string->data[len] = '\0';
+	}
 	expr->string = string;
 	expr->wide = is_wide;
 	return token;
-- 
2.2.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-04  0:32     ` Luc Van Oostenryck
@ 2015-02-04  3:26       ` Christopher Li
  2015-02-04  8:39       ` Rasmus Villemoes
  1 sibling, 0 replies; 28+ messages in thread
From: Christopher Li @ 2015-02-04  3:26 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Rasmus Villemoes, Linux-Sparse

On Tue, Feb 3, 2015 at 4:32 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Now thinking about it, it's obvious that the string buffer can't be reused at all
> if there is any kind of expansion done on it, the adjacent strings concatenation
> make just the thing worse but are not the cause of it.
>
Right. That is what I think after reading your patch too.
String concatenation is a not a good indicator on macro expand.
There should be a fix base on the macro expand.

Even though I haven't construct an test case like Rasmus did.

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04  2:01     ` [PATCH v2] Avoid reusing string buffer when doing string expansion Luc Van Oostenryck
@ 2015-02-04  5:30       ` Christopher Li
  2015-02-04  6:22         ` Luc Van Oostenryck
  0 siblings, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-04  5:30 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Rasmus Villemoes, Linux-Sparse

On Tue, Feb 3, 2015 at 6:01 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> In get_string_constant(), the code tried to reuse the storage for the string
> but only if the expansion of the string was not bigger than its unexpanded form.
> But this string can be shared with other expressions and reusing the buffer will
> result in later corruption
>
> A minimal exemple would be something like:
> const char a[] = BACKSLASH;
> const char b[] = BACKSLASH;
>
> The expansion for 'a' will correctly produce the two-char string consisting
> of a backslash char followed by a null char.
> But then the expansion of 'b' will expand this once more,
> producing the expansion of "\0": the two-char string: { '\0', '\0' }.

Are you sure about this behavior? You mean you see "b" has the string
size as 2. I haven't understand how this can happen.

> The fix is to not reuse the storage for the string if any king of expansion
> have been done.

That is a bit over kill. We only need to avoid reuse storage if the
destination part of the string is come from a preprocessor macro.
It is pretty common string contain escape sequence. We don't
want to allocate extra memory copy if it is not part of a macro
expansion.

>
> Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  char.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/char.c b/char.c
> index 08ca2230..2e21bb77 100644
> --- a/char.c
> +++ b/char.c
> @@ -123,11 +123,15 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
>                 len = MAX_STRING;
>         }
>
> -       if (len >= string->length)      /* can't cannibalize */
> +       /* The input string can be shared with other expression and so
> +        * its storage can't be reused if any kind of expansion have been done on it.
> +        */
> +       if ((len != string->length) || memcmp(buffer, string->data, len)) {

I don' think this check take into account the preprocessor macro has
been used or not. In other words, any general "hello world\n" which
contain the escape character will produce a different buffer, there for,
a new copy of the string. Which is not necessary. That is a pretty
common case.

I am working on patch to address it in the preprocessor macro.
The idea is that just mark the string as immutable if it is part of the
macro expansion. I will see how it goes.

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04  5:30       ` Christopher Li
@ 2015-02-04  6:22         ` Luc Van Oostenryck
  2015-02-04  8:01           ` Christopher Li
  0 siblings, 1 reply; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04  6:22 UTC (permalink / raw)
  To: Christopher Li; +Cc: Rasmus Villemoes, Linux-Sparse

On Tue, Feb 03, 2015 at 09:30:15PM -0800, Christopher Li wrote:
> On Tue, Feb 3, 2015 at 6:01 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > In get_string_constant(), the code tried to reuse the storage for the string
> > but only if the expansion of the string was not bigger than its unexpanded form.
> > But this string can be shared with other expressions and reusing the buffer will
> > result in later corruption
> >
> > A minimal exemple would be something like:
> > const char a[] = BACKSLASH;
> > const char b[] = BACKSLASH;
> >
> > The expansion for 'a' will correctly produce the two-char string consisting
> > of a backslash char followed by a null char.
> > But then the expansion of 'b' will expand this once more,
> > producing the expansion of "\0": the two-char string: { '\0', '\0' }.
> 
> Are you sure about this behavior? You mean you see "b" has the string
> size as 2. I haven't understand how this can happen.

Using the show_data() / sparse -vdata on:
===
#define BACKSLASH "\\"
const char a[] = BACKSLASH;
===

gives, correctly:
===
symbol a:
	char const [addressable] [toplevel] b[0]
	bit_size = 16
	val = "\\"
=== 

But if the macro is used several times:
===
#define BACKSLASH "\\"
const char a[] = BACKSLASH;
const char b[] = BACKSLASH;
const char c[] = "<" BACKSLASH ">";
===

the, we get:
===
symbol a:
	char const [addressable] [toplevel] a[0]
	bit_size = 16
	val = "\0"
symbol b:
	char const [addressable] [toplevel] b[0]
	bit_size = 16
	val = "\0"
symbol c:
	char const [addressable] [toplevel] c[0]
	bit_size = 32
	val = "<\0>"
===

And even worse:
===
#define BACKSLASH "(\\)"
const char m[] = BACKSLASH;
const char n[] = BACKSLASH;
const char k[] = "<" BACKSLASH ">";
===

gives:
===
symbol m:
	char const [addressable] [toplevel] m[0]
	bit_size = 24
	val = "()"
symbol n:
	char const [addressable] [toplevel] n[0]
	bit_size = 24
	val = "()"
symbol k:
	char const [addressable] [toplevel] k[0]
	bit_size = 40
	val = "<()>"
===

> > The fix is to not reuse the storage for the string if any king of expansion
> > have been done.
> 
> That is a bit over kill. We only need to avoid reuse storage if the
> destination part of the string is come from a preprocessor macro.
> It is pretty common string contain escape sequence. We don't
> want to allocate extra memory copy if it is not part of a macro
> expansion.

Well yes ...
Is it only with macros that the string structure is so shared?
And have we a way to test if the string is coming from a macro?

 
> >
> > Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> > Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> > ---
> >  char.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/char.c b/char.c
> > index 08ca2230..2e21bb77 100644
> > --- a/char.c
> > +++ b/char.c
> > @@ -123,11 +123,15 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
> >                 len = MAX_STRING;
> >         }
> >
> > -       if (len >= string->length)      /* can't cannibalize */
> > +       /* The input string can be shared with other expression and so
> > +        * its storage can't be reused if any kind of expansion have been done on it.
> > +        */
> > +       if ((len != string->length) || memcmp(buffer, string->data, len)) {
> 
> I don' think this check take into account the preprocessor macro has
> been used or not. In other words, any general "hello world\n" which
> contain the escape character will produce a different buffer, there for,
> a new copy of the string. Which is not necessary. That is a pretty
> common case.

No, indeed, it does not.
It just allocate a new buffer every time there is any modification/expansion
so that the original one is not touched (in case it is used elsewhere).

> 
> I am working on patch to address it in the preprocessor macro.
> The idea is that just mark the string as immutable if it is part of the
> macro expansion. I will see how it goes.
> 
> Chris
> --

A simpler and safer way would be to directly do the string expansion just after
a string token is recognized, or even better in the lexer itself.
So the string buffer, macro or not, will always directly contain the right values.
But maybe there was good reasons to not do it this way.

Luc

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04  6:22         ` Luc Van Oostenryck
@ 2015-02-04  8:01           ` Christopher Li
  2015-02-04 16:38             ` Christopher Li
  2015-02-04 23:38             ` Luc Van Oostenryck
  0 siblings, 2 replies; 28+ messages in thread
From: Christopher Li @ 2015-02-04  8:01 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Rasmus Villemoes, Linux-Sparse

On Tue, Feb 3, 2015 at 10:22 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> Are you sure about this behavior? You mean you see "b" has the string
>> size as 2. I haven't understand how this can happen.
>
>
> But if the macro is used several times:
> ===
> #define BACKSLASH "\\"
> const char a[] = BACKSLASH;
> const char b[] = BACKSLASH;
> const char c[] = "<" BACKSLASH ">";
> ===
>
> the, we get:
> ===
> symbol a:
>         char const [addressable] [toplevel] a[0]
>         bit_size = 16
>         val = "\0"
> symbol b:
>         char const [addressable] [toplevel] b[0]
>         bit_size = 16

The value buffer is corrupted. But the bit_size is still 16, which
is correct. I just think that in your example it shouldn't corrupt
the size. Your test case seems confirm that.

> Is it only with macros that the string structure is so shared?

That is right. I haven't see it can happen any other way.
The tokenizer always construct new token and string structure
from the C source file.

It is the preprocessor using macro expand which copy and duplicate
the token list. The token has a pointer point to the string which
is shared across different invocation of macro.

> And have we a way to test if the string is coming from a macro?

Not right now. But we can add it.

>
> A simpler and safer way would be to directly do the string expansion just after
> a string token is recognized, or even better in the lexer itself.
> So the string buffer, macro or not, will always directly contain the right values.
> But maybe there was good reasons to not do it this way.

I have an counter example that will not work. Let say

#define b(a, d) a##d
wchar_t s[] = b(L, "\xabcdabc");

When the lexer process the escape char, you did not know the string
is wide char or not. That can be changed after the macro expansion.

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-04  0:32     ` Luc Van Oostenryck
  2015-02-04  3:26       ` Christopher Li
@ 2015-02-04  8:39       ` Rasmus Villemoes
  2015-02-04  8:58         ` Rasmus Villemoes
  1 sibling, 1 reply; 28+ messages in thread
From: Rasmus Villemoes @ 2015-02-04  8:39 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-sparse, Christopher Li

On Wed, Feb 04 2015, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:

> On Tue, Feb 03, 2015 at 11:38:02PM +0100, Rasmus Villemoes wrote:
>> 
>> Thanks, but there's still something wrong. Using your show-data feature
>> on this:
>> 
>> ===
>> #define BACKSLASH "\\"
>> #define LETTER_t "t"
>> 
>> static const char s1[] = BACKSLASH;
>> /* static const char s2[] = BACKSLASH; */
>> static const char s3[] = BACKSLASH LETTER_t;
>> static const char s4[] = "a" BACKSLASH LETTER_t "b";
>> ===
>> 
>> I get
>> 
>> symbol s1:
>>         char static const [toplevel] s1[0]
>>         bit_size = 16
>>         val = "\\"
>> symbol s3:
>>         char static const [toplevel] s3[0]
>>         bit_size = 24
>>         val = "\0t"
>> symbol s4:
>>         char static const [toplevel] s4[0]
>>         bit_size = 40
>>         val = "a\0tb"
>> 
>> Now if I do the same with s2 not commented out, I get
>> 
>> 
>> symbol s1:
>>         char static const [toplevel] s1[0]
>>         bit_size = 16
>>         val = "\0"
>> symbol s2:
>>         char static const [toplevel] s2[0]
>>         bit_size = 16
>>         val = "\0"
>> symbol s3:
>>         char static const [toplevel] s3[0]
>>         bit_size = 24
>>         val = "\0t"
>> symbol s4:
>>         char static const [toplevel] s4[0]
>>         bit_size = 40
>>         val = "a\0tb"
>> 
>> So the expansion of BACKSLASH changes depending on how often it is
>> expanded...
>> 
>> The LETTER_t thing above is because I thought I had somehow provoked a
>> double expansion, making BACKSLASH LETTER_t (or some variant) expand to
>> a single-character string containing just a tab. But I can't seem to
>> reproduce that particular behaviour, so maybe I'm imagining
>> stuff. Anyway, the above is certainly real.
>> 
>> Thanks,
>> Rasmus
>> --
> Yes, I see.
>
> Now thinking about it, it's obvious that the string buffer can't be reused at all
> if there is any kind of expansion done on it, the adjacent strings concatenation
> make just the thing worse but are not the cause of it.
>

That was also my conclusion from looking at the code, but I was unable
to do anything about it. And I wasn't hallucinating, I was just
overcomplicating things:

#define NOT_TAB "\\t"

static const char s1[] = NOT_TAB;
static const char s2[] = NOT_TAB;

indeed fails.

Thanks,
Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-04  8:39       ` Rasmus Villemoes
@ 2015-02-04  8:58         ` Rasmus Villemoes
  2015-02-04 16:20           ` Christopher Li
  0 siblings, 1 reply; 28+ messages in thread
From: Rasmus Villemoes @ 2015-02-04  8:58 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: linux-sparse, Christopher Li

On Wed, Feb 04 2015, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> And I wasn't hallucinating, I was just overcomplicating things:
>
> #define NOT_TAB "\\t"
>
> static const char s1[] = NOT_TAB;
> static const char s2[] = NOT_TAB;
>
> indeed fails.

While we're collecting examples, let me also mention that __FILE__
doesn't work for files with backslash in their name. Sane people of
course don't put backslashes in file names, but they are a rather normal
occurence in path names on a certain operating system.

Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-04  8:58         ` Rasmus Villemoes
@ 2015-02-04 16:20           ` Christopher Li
  2015-02-06 21:52             ` Rasmus Villemoes
  0 siblings, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-04 16:20 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Luc Van Oostenryck, Linux-Sparse

On Wed, Feb 4, 2015 at 12:58 AM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
> On Wed, Feb 04 2015, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>
>> And I wasn't hallucinating, I was just overcomplicating things:
>>
>> #define NOT_TAB "\\t"
>>
>> static const char s1[] = NOT_TAB;
>> static const char s2[] = NOT_TAB;
>>
>> indeed fails.
>
> While we're collecting examples, let me also mention that __FILE__
> doesn't work for files with backslash in their name. Sane people of
> course don't put backslashes in file names, but they are a rather normal
> occurence in path names on a certain operating system.

Can you submit a patch for adding the test case you found?
I will include those into the the test suit.

Thanks

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04  8:01           ` Christopher Li
@ 2015-02-04 16:38             ` Christopher Li
  2015-02-04 23:38               ` Luc Van Oostenryck
  2015-02-04 23:38             ` Luc Van Oostenryck
  1 sibling, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-04 16:38 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Rasmus Villemoes, Linux-Sparse

On Wed, Feb 4, 2015 at 12:01 AM, Christopher Li <sparse@chrisli.org> wrote:
>
> When the lexer process the escape char, you did not know the string
> is wide char or not. That can be changed after the macro expansion.

With that in mind, we can't actually perform the escape char substitution
before the pre-processor stage.

Here is an untested patch adding the immutable string to macro body.
I need to double check if the macro arguments needs it as well.

Can you guys try it on the test case you have?

Thanks

Chris

diff --git a/char.c b/char.c
index 08ca223..7f86b8a 100644
--- a/char.c
+++ b/char.c
@@ -123,7 +123,7 @@ struct token *get_string_constant(struct token
*token, struct expression *expr)
                len = MAX_STRING;
        }

-       if (len >= string->length)      /* can't cannibalize */
+       if (string->immutable || len >= string->length) /* can't cannibalize */
                string = __alloc_string(len+1);
        string->length = len+1;
        memcpy(string->data, buffer, len);
diff --git a/pre-process.c b/pre-process.c
index 1aa3d2c..57d84b9 100644
--- a/pre-process.c
+++ b/pre-process.c
@@ -1259,8 +1259,16 @@ static struct token *parse_expansion(struct
token *expansion, struct token *argl
                } else {
                        try_arg(token, TOKEN_MACRO_ARGUMENT, arglist);
                }
-               if (token_type(token) == TOKEN_ERROR)
+               switch (token_type(token)) {
+               case TOKEN_ERROR:
                        goto Earg;
+
+               case TOKEN_STRING:
+               case TOKEN_WIDE_STRING:
+                       token->string->immutable = 1;
+               default:
+                       ;
+               }
        }
        token = alloc_token(&expansion->pos);
        token_type(token) = TOKEN_UNTAINT;
diff --git a/token.h b/token.h
index 8dbd80f..af66b2b 100644
--- a/token.h
+++ b/token.h
@@ -164,7 +164,8 @@ enum special_token {
 };

 struct string {
-       unsigned int length;
+       unsigned int length:31;
+       unsigned int immutable:1;
        char data[];
 };

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04  8:01           ` Christopher Li
  2015-02-04 16:38             ` Christopher Li
@ 2015-02-04 23:38             ` Luc Van Oostenryck
  1 sibling, 0 replies; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04 23:38 UTC (permalink / raw)
  To: Christopher Li; +Cc: Rasmus Villemoes, Linux-Sparse

On Wed, Feb 04, 2015 at 12:01:39AM -0800, Christopher Li wrote:
> On Tue, Feb 3, 2015 at 10:22 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >> Are you sure about this behavior? You mean you see "b" has the string
> >> size as 2. I haven't understand how this can happen.
> >
> >
> > But if the macro is used several times:
> > ===
> > #define BACKSLASH "\\"
> > const char a[] = BACKSLASH;
> > const char b[] = BACKSLASH;
> > const char c[] = "<" BACKSLASH ">";
> > ===
> >
> > the, we get:
> > ===
> > symbol a:
> >         char const [addressable] [toplevel] a[0]
> >         bit_size = 16
> >         val = "\0"
> > symbol b:
> >         char const [addressable] [toplevel] b[0]
> >         bit_size = 16
> 
> The value buffer is corrupted. But the bit_size is still 16, which
> is correct. I just think that in your example it shouldn't corrupt
> the size. Your test case seems confirm that.
> 
> > Is it only with macros that the string structure is so shared?
> 
> That is right. I haven't see it can happen any other way.
> The tokenizer always construct new token and string structure
> from the C source file.
> 
> It is the preprocessor using macro expand which copy and duplicate
> the token list. The token has a pointer point to the string which
> is shared across different invocation of macro.

Fine.
I was affraid that there was other possibilities, like, for exemple,
if the identical string litterals are put in an hash table, like it is done
for identifiers.

> > And have we a way to test if the string is coming from a macro?
> 
> Not right now. But we can add it.
> 
> >
> > A simpler and safer way would be to directly do the string expansion just after
> > a string token is recognized, or even better in the lexer itself.
> > So the string buffer, macro or not, will always directly contain the right values.
> > But maybe there was good reasons to not do it this way.
> 
> I have an counter example that will not work. Let say
> 
> #define b(a, d) a##d
> wchar_t s[] = b(L, "\xabcdabc");
> 
> When the lexer process the escape char, you did not know the string
> is wide char or not. That can be changed after the macro expansion.
> 
> Chris

Yes, I see.

BTW, I've checked and there is a lot of problems with wide strings.
I'll send some test case later.


Regards,
Luc

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04 16:38             ` Christopher Li
@ 2015-02-04 23:38               ` Luc Van Oostenryck
  2015-02-06 13:58                 ` Christopher Li
  0 siblings, 1 reply; 28+ messages in thread
From: Luc Van Oostenryck @ 2015-02-04 23:38 UTC (permalink / raw)
  To: Christopher Li; +Cc: Rasmus Villemoes, Linux-Sparse

On Wed, Feb 04, 2015 at 08:38:19AM -0800, Christopher Li wrote:
> On Wed, Feb 4, 2015 at 12:01 AM, Christopher Li <sparse@chrisli.org> wrote:
> >
> > When the lexer process the escape char, you did not know the string
> > is wide char or not. That can be changed after the macro expansion.
> 
> With that in mind, we can't actually perform the escape char substitution
> before the pre-processor stage.
> 
> Here is an untested patch adding the immutable string to macro body.
> I need to double check if the macro arguments needs it as well.
> 
> Can you guys try it on the test case you have?
> 
> Thanks
> 
> Chris

Working perfectly here.

Greetings,
Luc

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-04 23:38               ` Luc Van Oostenryck
@ 2015-02-06 13:58                 ` Christopher Li
  2015-02-06 20:32                   ` Rasmus Villemoes
  0 siblings, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-06 13:58 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Rasmus Villemoes, Linux-Sparse

On Wed, Feb 4, 2015 at 3:38 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Working perfectly here.
>

Any update of the string test case?

I commit a review version of the immutable string fix in review-immutable-string
branch. Compare to the last patch,  it take into account that string
can come from
macro arguments rather than macro body.

It also avoid the string copy if there is no change, that is pretty
common as well.

Care to give this version a try?

https://git.kernel.org/cgit/devel/sparse/chrisl/sparse.git/commit/?h=review-immutable-string&id=de1fa7e60d3d179a1b67c47a0429b2d0ac4e4842

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
  2015-02-06 13:58                 ` Christopher Li
@ 2015-02-06 20:32                   ` Rasmus Villemoes
  0 siblings, 0 replies; 28+ messages in thread
From: Rasmus Villemoes @ 2015-02-06 20:32 UTC (permalink / raw)
  To: Christopher Li; +Cc: Luc Van Oostenryck, Linux-Sparse

On Fri, Feb 06 2015, Christopher Li <sparse@chrisli.org> wrote:

> On Wed, Feb 4, 2015 at 3:38 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> Working perfectly here.
>>
>
> Any update of the string test case?
>
> I commit a review version of the immutable string fix in review-immutable-string
> branch. Compare to the last patch,  it take into account that string
> can come from
> macro arguments rather than macro body.
>
> It also avoid the string copy if there is no change, that is pretty
> common as well.
>
> Care to give this version a try?
>
> https://git.kernel.org/cgit/devel/sparse/chrisl/sparse.git/commit/?h=review-immutable-string&id=de1fa7e60d3d179a1b67c47a0429b2d0ac4e4842

Sorry for taking so long to reply. This seems to work for the cases I
can think of, but I haven't put it to the real test yet (whether the
format strings are going to be messed up).

Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-04 16:20           ` Christopher Li
@ 2015-02-06 21:52             ` Rasmus Villemoes
  2015-02-07  1:30               ` Christopher Li
  0 siblings, 1 reply; 28+ messages in thread
From: Rasmus Villemoes @ 2015-02-06 21:52 UTC (permalink / raw)
  To: Christopher Li; +Cc: Luc Van Oostenryck, Linux-Sparse

On Wed, Feb 04 2015, Christopher Li <sparse@chrisli.org> wrote:

> On Wed, Feb 4, 2015 at 12:58 AM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>> On Wed, Feb 04 2015, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>>
>>> And I wasn't hallucinating, I was just overcomplicating things:
>>>
>>> #define NOT_TAB "\\t"
>>>
>>> static const char s1[] = NOT_TAB;
>>> static const char s2[] = NOT_TAB;
>>>
>>> indeed fails.
>>
>> While we're collecting examples, let me also mention that __FILE__
>> doesn't work for files with backslash in their name. Sane people of
>> course don't put backslashes in file names, but they are a rather normal
>> occurence in path names on a certain operating system.
>
> Can you submit a patch for adding the test case you found?
> I will include those into the the test suit.

I'd like to, but I'm not sure how to write the test in terms of sparse's
test frame work. How do I check that the string is as expected? 

Rasmus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-06 21:52             ` Rasmus Villemoes
@ 2015-02-07  1:30               ` Christopher Li
  2015-02-09 21:48                 ` Damien Lespiau
  0 siblings, 1 reply; 28+ messages in thread
From: Christopher Li @ 2015-02-07  1:30 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Luc Van Oostenryck, Linux-Sparse

On Fri, Feb 6, 2015 at 1:52 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>> Can you submit a patch for adding the test case you found?
>> I will include those into the the test suit.
>
> I'd like to, but I'm not sure how to write the test in terms of sparse's
> test frame work. How do I check that the string is as expected?

You can use ./test-parsing to evaluate the test file.
$ cat /tmp/v.c
#define BACKSLASH "\\"
static const char a[] = BACKSLASH "a";
static const char b[] = BACKSLASH "b";

# on current master branch it shows:
$ ./test-parsing /tmp/v.c
.align 1
char static const [toplevel] a[0]
 =
    movi.64        v2,&"\7b"
    ld.24        v3,[v2]
,
.align 1
char static const [toplevel] b[0]
 =
    movi.64        v5,&"\7b"
    ld.24        v6,[v5]

# on the review-immutable-string branch it shows:
$ ./test-parsing /tmp/v.c

.align 1
char static const [toplevel] a[0]
 =
    movi.64        v2,&"\\a"
    ld.24        v3,[v2]
,
.align 1
char static const [toplevel] b[0]
 =
    movi.64        v5,&"\\b"
    ld.24        v6,[v5]

Notice that "\\a" and "\\b" was not there before the
bug was fixed.

The test suit is just some C file in the "validations" directory.
Each file has a test name and optional how to invoke the C file
and what to expect from the running result.
You should able to find some example in the test suit.
Or, you can just submit a patch to include those C file
in the "validations" directory. Let some one help you complete
the test suit.

You run the test suit as:

make check

Chris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals
  2015-02-07  1:30               ` Christopher Li
@ 2015-02-09 21:48                 ` Damien Lespiau
  0 siblings, 0 replies; 28+ messages in thread
From: Damien Lespiau @ 2015-02-09 21:48 UTC (permalink / raw)
  To: Christopher Li; +Cc: Rasmus Villemoes, Luc Van Oostenryck, Linux-Sparse

On 7 February 2015 at 01:30, Christopher Li <sparse@chrisli.org> wrote:
> On Fri, Feb 6, 2015 at 1:52 PM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>> Can you submit a patch for adding the test case you found?
>>> I will include those into the the test suit.
>>
>> I'd like to, but I'm not sure how to write the test in terms of sparse's
>> test frame work. How do I check that the string is as expected?
>
> You can use ./test-parsing to evaluate the test file.

There's even a bit of documentation!

http://git.kernel.org/cgit/devel/sparse/sparse.git/tree/Documentation/test-suite

-- 
Damien

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-02-09 21:48 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-30 22:16 Bad interaction between macro expansion and literal concatenation Rasmus Villemoes
2015-01-31  1:23 ` [PATCH] Avoid reuse of string buffer when concatening adjacent string litterals Luc Van Oostenryck
2015-02-03 22:38   ` Rasmus Villemoes
2015-02-04  0:32     ` Luc Van Oostenryck
2015-02-04  3:26       ` Christopher Li
2015-02-04  8:39       ` Rasmus Villemoes
2015-02-04  8:58         ` Rasmus Villemoes
2015-02-04 16:20           ` Christopher Li
2015-02-06 21:52             ` Rasmus Villemoes
2015-02-07  1:30               ` Christopher Li
2015-02-09 21:48                 ` Damien Lespiau
2015-02-04  2:01     ` [PATCH v2] Avoid reusing string buffer when doing string expansion Luc Van Oostenryck
2015-02-04  5:30       ` Christopher Li
2015-02-04  6:22         ` Luc Van Oostenryck
2015-02-04  8:01           ` Christopher Li
2015-02-04 16:38             ` Christopher Li
2015-02-04 23:38               ` Luc Van Oostenryck
2015-02-06 13:58                 ` Christopher Li
2015-02-06 20:32                   ` Rasmus Villemoes
2015-02-04 23:38             ` Luc Van Oostenryck
2015-01-31  5:16 ` Bad interaction between macro expansion and literal concatenation Christopher Li
2015-02-01  2:19   ` [PATCH 0/4] Teach sparse to display data/initial values Luc Van Oostenryck
2015-02-01  2:19     ` [PATCH 1/4] Add support for '-vdata', the equivalent of '-ventry' but for data Luc Van Oostenryck
2015-02-01  2:19     ` [PATCH 2/4] Add support for show_data() Luc Van Oostenryck
2015-02-02  5:30       ` Christopher Li
2015-02-04  0:50         ` Luc Van Oostenryck
2015-02-01  2:19     ` [PATCH 3/4] Teach sparse to display data/initial values Luc Van Oostenryck
2015-02-01  2:19     ` [PATCH 4/4] Small test/exemple for using '-vdata' Luc Van Oostenryck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.