From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
To: Christopher Li <sparse@chrisli.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Linux-Sparse <linux-sparse@vger.kernel.org>
Subject: Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
Date: Wed, 4 Feb 2015 07:22:50 +0100
Message-ID: <20150204062250.GA9989@macbook.lan>
In-Reply-To: <CANeU7QnYCGWK0LH8+f=bDSbdPHfDvjdRtmUQF5R8j6h9fDBp2g@mail.gmail.com>

On Tue, Feb 03, 2015 at 09:30:15PM -0800, Christopher Li wrote:
> On Tue, Feb 3, 2015 at 6:01 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > In get_string_constant(), the code tried to reuse the storage for the string
> > but only if the expansion of the string was not bigger than its unexpanded form.
> > But this string can be shared with other expressions, and reusing the buffer
> > will result in later corruption.
> >
> > A minimal example would be something like:
> > #define BACKSLASH "\\"
> > const char a[] = BACKSLASH;
> > const char b[] = BACKSLASH;
> >
> > The expansion for 'a' will correctly produce the two-char string consisting
> > of a backslash char followed by a null char.
> > But then the expansion of 'b' will expand this once more,
> > producing the expansion of "\0": the two-char string: { '\0', '\0' }.
> 
> Are you sure about this behavior? You mean you see that "b" has a string
> size of 2? I don't understand how this can happen.

Using show_data() / sparse -vdata on:
===
#define BACKSLASH "\\"
const char a[] = BACKSLASH;
===

gives, correctly:
===
symbol a:
	char const [addressable] [toplevel] a[0]
	bit_size = 16
	val = "\\"
=== 

But if the macro is used several times:
===
#define BACKSLASH "\\"
const char a[] = BACKSLASH;
const char b[] = BACKSLASH;
const char c[] = "<" BACKSLASH ">";
===

then we get:
===
symbol a:
	char const [addressable] [toplevel] a[0]
	bit_size = 16
	val = "\0"
symbol b:
	char const [addressable] [toplevel] b[0]
	bit_size = 16
	val = "\0"
symbol c:
	char const [addressable] [toplevel] c[0]
	bit_size = 32
	val = "<\0>"
===

And even worse:
===
#define BACKSLASH "(\\)"
const char m[] = BACKSLASH;
const char n[] = BACKSLASH;
const char k[] = "<" BACKSLASH ">";
===

gives:
===
symbol m:
	char const [addressable] [toplevel] m[0]
	bit_size = 24
	val = "()"
symbol n:
	char const [addressable] [toplevel] n[0]
	bit_size = 24
	val = "()"
symbol k:
	char const [addressable] [toplevel] k[0]
	bit_size = 40
	val = "<()>"
===
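
To make the mechanism concrete, here is a minimal standalone sketch of the
double expansion. It is not sparse's code: the struct and function names are
hypothetical, the expansion is reduced to "drop the backslash, keep the next
char", and the length here does not count the terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the string data attached to a token. */
struct shared_str {
	size_t length;		/* number of chars, terminating NUL not counted */
	char data[16];
};

/*
 * Simplified expansion: a backslash is dropped and the char after it
 * is kept as is. The result is never longer than the input, so it is
 * always written back into the same storage (the "cannibalize" case).
 */
static void expand_in_place(struct shared_str *s)
{
	char buf[sizeof(s->data)];
	size_t i, len = 0;

	for (i = 0; i < s->length; i++) {
		char c = s->data[i];

		if (c == '\\' && i + 1 < s->length)
			c = s->data[++i];
		buf[len++] = c;
	}
	memcpy(s->data, buf, len);
	s->data[len] = '\0';
	s->length = len;
}
```

Starting from the four chars of "(\\)" as they appear in the source, the first
call leaves the three chars '(' '\' ')' in the buffer, which is the right
value. A second call on the same shared buffer treats that backslash as the
start of a new escape and leaves only "()", the same kind of corruption as in
the dump above.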

> > The fix is to not reuse the storage for the string if any kind of expansion
> > has been done.
> 
> That is a bit overkill. We only need to avoid reusing the storage if the
> destination part of the string comes from a preprocessor macro.
> It is pretty common for a string to contain escape sequences. We don't
> want to allocate an extra memory copy if it is not part of a macro
> expansion.

Well yes ...
Is it only with macros that the string structure is shared like this?
And do we have a way to test whether the string comes from a macro?

 
> >
> > Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> > Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> > ---
> >  char.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/char.c b/char.c
> > index 08ca2230..2e21bb77 100644
> > --- a/char.c
> > +++ b/char.c
> > @@ -123,11 +123,15 @@ struct token *get_string_constant(struct token *token, struct expression *expr)
> >                 len = MAX_STRING;
> >         }
> >
> > -       if (len >= string->length)      /* can't cannibalize */
> > +       /* The input string can be shared with other expressions and so
> > +        * its storage can't be reused if any kind of expansion has been done on it.
> > +        */
> > +       if ((len != string->length) || memcmp(buffer, string->data, len)) {
> 
> I don't think this check takes into account whether a preprocessor macro
> has been used or not. In other words, any general "hello world\n" which
> contains an escape character will produce a different buffer and, therefore,
> a new copy of the string, which is not necessary. That is a pretty
> common case.

No, indeed, it does not.
It just allocates a new buffer every time there is any modification/expansion
so that the original one is not touched (in case it is used elsewhere).
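
As a rough illustration of that rule (only a sketch, with hypothetical names
and a simplified expansion, not the actual code in char.c): expand into a
scratch buffer, compare it with the input using the same kind of test as in
the patch, and only hand back the original when nothing changed. Here the
caller supplies the fresh storage instead of an allocator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the string data attached to a token. */
struct str_buf {
	size_t length;		/* number of chars, terminating NUL not counted */
	char data[16];
};

/* Simplified expansion into 'out': drop a backslash, keep the next char. */
static size_t expand_into(const struct str_buf *in, char *out)
{
	size_t i, len = 0;

	for (i = 0; i < in->length; i++) {
		char c = in->data[i];

		if (c == '\\' && i + 1 < in->length)
			c = in->data[++i];
		out[len++] = c;
	}
	return len;
}

/*
 * Return 'in' itself when the expansion changed nothing; otherwise
 * fill 'fresh' and return it. 'in' is never written to, so other
 * users of the same string are not affected.
 */
static const struct str_buf *expand_no_reuse(const struct str_buf *in,
					     struct str_buf *fresh)
{
	char buf[sizeof(in->data)];
	size_t len = expand_into(in, buf);

	if (len != in->length || memcmp(buf, in->data, len)) {
		memcpy(fresh->data, buf, len);
		fresh->data[len] = '\0';
		fresh->length = len;
		return fresh;
	}
	return in;
}
```

With this shape, two users of the same macro string each get their own
expanded copy and the shared input keeps its unexpanded form, at the cost of
one extra copy for every string that contains an escape.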

> 
> I am working on a patch to address it in the preprocessor macro.
> The idea is to just mark the string as immutable if it is part of a
> macro expansion. I will see how it goes.
> 
> Chris
> --

A simpler and safer way would be to do the string expansion directly, just after
a string token is recognized or, even better, in the lexer itself.
Then the string buffer, macro or not, will always directly contain the right values.
But maybe there were good reasons not to do it this way.
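
Something along these lines is what I have in mind. This is only a sketch
under my own assumptions: lex_string_body() is a hypothetical helper, not an
existing function of the tokenizer, and it only knows a couple of escapes.

```c
#include <stddef.h>

/*
 * Hypothetical lexer helper: 'src' points just after the opening
 * quote. Escapes are decoded while copying into 'out', so the stored
 * string never needs a later expansion pass. Stops at the closing
 * quote, at the end of the input or when 'out' is full.
 * 'cap' must be at least 1. Returns the number of chars stored,
 * terminating NUL not counted.
 */
static size_t lex_string_body(const char *src, char *out, size_t cap)
{
	size_t len = 0;

	while (*src && *src != '"' && len + 1 < cap) {
		char c = *src++;

		if (c == '\\' && *src) {
			c = *src++;
			switch (c) {
			case 'n': c = '\n'; break;
			case 't': c = '\t'; break;
			default: break;	/* \\, \" and unknown escapes: keep the char */
			}
		}
		out[len++] = c;
	}
	out[len] = '\0';
	return len;
}
```

Since the buffer holds the final value from the start, sharing it between
several expressions is harmless: nothing needs to rewrite it afterwards.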

Luc
