From: Luc Van Oostenryck
Subject: Re: [PATCH v2] Avoid reusing string buffer when doing string expansion
Date: Thu, 5 Feb 2015 00:38:03 +0100
Message-ID: <20150204233802.GA2275@macpro.local>
To: Christopher Li
Cc: Rasmus Villemoes, Linux-Sparse
List-Id: linux-sparse@vger.kernel.org

On Wed, Feb 04, 2015 at 12:01:39AM -0800, Christopher Li wrote:
> On Tue, Feb 3, 2015 at 10:22 PM, Luc Van Oostenryck
> wrote:
> >> Are you sure about this behavior? You mean you see "b" has the string
> >> size as 2? I don't understand how this can happen.
> >
> >
> > But if the macro is used several times:
> > ===
> > #define BACKSLASH "\\"
> > const char a[] = BACKSLASH;
> > const char b[] = BACKSLASH;
> > const char c[] = "<" BACKSLASH ">";
> > ===
> >
> > then we get:
> > ===
> > symbol a:
> > char const [addressable] [toplevel] a[0]
> > bit_size = 16
> > val = "\0"
> > symbol b:
> > char const [addressable] [toplevel] b[0]
> > bit_size = 16
>
> The value buffer is corrupted. But the bit_size is still 16, which
> is correct. I just think that in your example it shouldn't corrupt
> the size. Your test case seems to confirm that.
>
> > Is it only with macros that the string structure is shared like this?
>
> That is right. I haven't seen it happen any other way.
> The tokenizer always constructs new token and string structures
> from the C source file.
>
> It is the preprocessor's macro expansion that copies and duplicates
> the token list. The token has a pointer to the string, which
> is shared across different invocations of the macro.

Fine. I was afraid that there were other possibilities, for example
identical string literals being put in a hash table, as is done for
identifiers.

> > And do we have a way to test whether the string is coming from a macro?
>
> Not right now. But we can add it.
>
> >
> > A simpler and safer way would be to do the string expansion directly after
> > a string token is recognized, or even better in the lexer itself.
> > Then the string buffer, macro or not, would always contain the right values.
> > But maybe there were good reasons not to do it this way.
>
> I have a counterexample where that will not work. Let's say
>
> #define b(a, d) a##d
> wchar_t s[] = b(L, "\xabcdabc");
>
> When the lexer processes the escape char, you don't know whether the string
> is a wide char string or not. That can change after the macro expansion.
>
> Chris

Yes, I see.

BTW, I've checked and there are a lot of problems with wide strings.
I'll send some test cases later.

Regards,
Luc
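A minimal sketch of the sharing problem discussed above, under a simplified model: struct str, str_new(), expand_escapes(), expand_in_place() and expand_copy() are all hypothetical names, not sparse's actual code, and only the "\\" and "\n" escapes are handled. Since macro expansion duplicates the tokens but not the string they point to, expanding escapes in place means the next use of the same macro expands text that was already expanded; expanding into a fresh buffer, as the patch subject suggests, leaves the shared raw text intact.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string payload shared by every token copied from one macro body. */
struct str {
	int length;     /* bytes in data[], including the terminating NUL */
	char data[];
};

/* Build a payload holding the raw, unexpanded text between the quotes. */
static struct str *str_new(const char *raw)
{
	size_t n = strlen(raw) + 1;
	struct str *s = malloc(sizeof(*s) + n);
	if (!s)
		abort();
	s->length = (int)n;
	memcpy(s->data, raw, n);
	return s;
}

/*
 * Expand only the two escapes backslash-backslash and backslash-n from src
 * into dst; return the new length (including the NUL). The output never
 * outruns the input, so dst may be the same buffer as src.
 */
static int expand_escapes(char *dst, const char *src, int len)
{
	int out = 0;

	for (int i = 0; i < len - 1; i++) {
		char c = src[i];

		if (c == '\\' && i + 1 < len - 1) {
			if (src[i + 1] == '\\') {
				c = '\\';
				i++;
			} else if (src[i + 1] == 'n') {
				c = '\n';
				i++;
			}
		}
		dst[out++] = c;
	}
	dst[out++] = '\0';
	return out;
}

/* Buggy pattern: rewrites the shared buffer, so the next expansion of the
 * same macro starts from already-expanded text. */
static void expand_in_place(struct str *s)
{
	s->length = expand_escapes(s->data, s->data, s->length);
}

/* Fixed pattern: expand into a fresh buffer and leave the shared raw text alone. */
static struct str *expand_copy(const struct str *s)
{
	struct str *out = malloc(sizeof(*out) + (size_t)s->length);
	if (!out)
		abort();
	out->length = expand_escapes(out->data, s->data, s->length);
	assert(out->length <= s->length);   /* expansion never grows the string */
	return out;
}
```

With the raw text \\n (two backslashes, then n), two copying expansions both give a backslash followed by n, while two in-place expansions of the same shared buffer leave a lone newline: the second pass reads the output of the first, the same kind of corruption as in the BACKSLASH values quoted above.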
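Chris's counterexample can be checked with an ordinary compiler. The snippet below declares the string from his message; it assumes a platform with a 32-bit wchar_t (such as glibc on Linux). The token "\xabcdabc" only becomes a wide literal once ## pastes the L prefix onto it, so a lexer expanding escapes on the spot would not yet know how wide the \x escape is allowed to be.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */

/* The macro from the thread: token pasting glues the L prefix onto the string. */
#define b(a, d) a##d

/*
 * After expansion this is L"\xabcdabc": one wide character plus the
 * terminator. As a plain narrow string, the value 0xabcdabc would be far
 * out of range for an 8-bit char; only after the paste is it a valid
 * wchar_t value (assuming a 32-bit wchar_t).
 */
wchar_t s[] = b(L, "\xabcdabc");
```

On such a platform s has two elements and s[0] holds 0xabcdabc; with a 16-bit wchar_t the same escape would be out of range.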