From: Konrad Eisele <eiselekd@gmail.com>
To: Christopher Li <sparse@chrisli.org>
Cc: Konrad Eisele <konrad@gaisler.com>, linux-sparse@vger.kernel.org
Subject: Re: dependency tee from c parser entities downto token
Date: Sat, 5 May 2012 18:59:39 +0200
Message-ID: <CAEjhO7JqWNwGzEYX+xr2gvuCTcnNPovNgk3X1q69E5_CMPnSTw@mail.gmail.com>
In-Reply-To: <CANeU7QnNemo=iyx3+VJBgEzf4kOacxOx9JTsbu5Of46+3NkAJA@mail.gmail.com>

>
> I am not sure I understand your range representation yet.
>

You need to view it with a fixed-width font. It's not rocket science:
token lists (or arrays) are shown as dot-separated lists. The token.pos
field is listed below each token, either as p[x] or, for tokens at
file scope, as the file location.

I'll write a patch that implements this scheme and send it when I
have time; it might take a while.
-- Konrad

> To be continued...
>
> Chris
>
>
>>
>> Note that a reference to p[] in p[x] notation only references
>> the "start" of the PP_struct.copy. A unique identification
>> of the "source" token might not always be possible because
>> of ambiguities, so when copying the tokens into
>> PP_struct.copy I might use an extended version of struct token
>> that also includes an offset.
>>
>> ----- file a.h start -----
>> #define D0(d0a0,d0a1) 1 D1(d0a0) 2 D2(d0a1) 3
>> #define D1(d1a0) 4 d1a0 5
>> #define D2(d2a0) 6 d2a0 7
>> #define D3(d3a0) 8 d3a0 9
>> D0(D3(10),11)
>> ----- file a.h end -----
>>
>> Preprocessor output (gcc -E a.h): "1 4 8 10 9 5 2 6 11 7 3"
>>
>> PreProcessor macro trace on p[]:
>>
>> p[0]:mdefn_body[D0]     :1.D1.(.d0a0.).2.D2.(.d0a1.).3
>>                         [ a.h:1:23     ..   a.h:1:45]
>> p[1]:mdefn_body[D1]     :4   .   d1a0   .    5
>>                         [ a.h:2:18..a.h:2:25]
>> p[2]:mdefn_body[D2]     :6   .   d2a0   .    7
>>                         [ a.h:3:18..a.h:3:25]
>> p[3]:mdefn_body[D3]     :8   .   d3a0   .    9
>>                         [ a.h:4:18..a.h:4:25]
>> p[4]:minst_arg0[D0]     :D3  . (  .   10 . )
>>                         [ a.h:5:4..a.h:5:9]
>> p[5]:minst_arg1[D0]     :11
>>                         [a.h:5:11]
>> p[6]:minst_arg0[D3]     :10
>>                         p[4]
>> p[7]:(args)expand[p[3]] :8    .  10   .  9
>>                         p[3]    p[4]    p[3]
>> p[8]:minst_arg0[D2]     :11
>>                         p[5]
>> p[9]:(body)expand[p[2]] :6   .   11   .    7
>>                         p[2]    p[5]      p[2]
>> p[10]:(body)expand[p[0]]:1  .4  .8  .10 .9  .5  .2  .6  .11 .7  .3
>>                         p[0]p[1]p[7]p[7]p[7]p[1]p[0]p[9]p[9]p[9]p[0]
>>
>>
>> p[0]-p[3] are built up when the macro is defined.
>>          A p[] entry is needed to distinguish between
>>          the different sources of tokens.
>> p[4],p[5] are built in collect_arguments() for D0(D3(10),11)
>> p[6]      is built in collect_arguments() for D3(10)
>> p[7]      is built in the call to the macro_expand() hook with a flag
>>          that it is an (args)expand
>> p[8]      is built in collect_arguments() for D2(11)
>>          (inside D0's expansion)
>> p[9]      is built in the call to the macro_expand() hook with a flag
>>          that it is a (body)expand (of D2)
>> p[10]     is built in the call to the macro_expand() hook with a flag
>>          that it is a (body)expand (of D0)
>>
>> PP_struct {
>>          enum {minst_arg, expand_body, expand_arg, mdef_body} typ;
>>          uint argidx;
>>          struct symbol *macro;
>>          struct token copy[];
>> };
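
A minimal sketch of how the PP_struct above could be written as compilable C,
assuming the "extended" token carries both the index of the p[] entry it was
copied from and an offset into that entry. The names struct tok, struct
pp_entry, pp_new() and pp_origin() are hypothetical stand-ins and are not
part of sparse; the macro is reduced to a name string and an ntok field is
added so the copy[] length is known.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for sparse's struct token; only what the sketch needs. */
struct tok {
	const char *text;  /* token spelling */
	int src;           /* index of the p[] entry it was copied from, -1 = file scope */
	int off;           /* offset inside that entry (the proposed extension) */
};

enum pp_type { MINST_ARG, EXPAND_BODY, EXPAND_ARG, MDEF_BODY };

struct pp_entry {
	enum pp_type typ;
	unsigned int argidx;
	const char *macro;   /* stand-in for struct symbol *macro */
	size_t ntok;         /* number of tokens in copy[] (assumed extra field) */
	struct tok copy[];   /* C99 flexible array member */
};

/* Allocate one p[] entry holding a private copy of ntok tokens. */
static struct pp_entry *pp_new(enum pp_type typ, const char *macro,
			       unsigned int argidx,
			       const struct tok *toks, size_t ntok)
{
	struct pp_entry *e = malloc(sizeof(*e) + ntok * sizeof(e->copy[0]));

	if (!e)
		return NULL;
	e->typ = typ;
	e->argidx = argidx;
	e->macro = macro;
	e->ntok = ntok;
	if (ntok)
		memcpy(e->copy, toks, ntok * sizeof(e->copy[0]));
	return e;
}

/* Follow (src, off) links until a token that came straight from a file. */
static const struct tok *pp_origin(struct pp_entry *const *p,
				   const struct tok *t)
{
	while (t->src >= 0) {
		assert((size_t)t->off < p[t->src]->ntok);
		t = &p[t->src]->copy[t->off];
	}
	return t;
}
```

With p[3], p[4] and p[7] filled in as in the trace, pp_origin() on the "10"
token of p[7] would walk back to the "10" inside p[4], which is the token
that was read from a.h line 5.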
>>
>> Conclusion:
>> -----------
>> Apart from the macro_expand() hook I also need hooks
>> in macro definition and also in collect_arguments() or expand().
>>
>>
>> Concerning (3) How to connect (1) and (2) to the AST
>> ----------------------------------------------------
>>
>> This can maybe wait for a later iteration. There are more complex parts
>> involved...
>>
>>
>>
>>>
>>> Now, how to connect the AST with that information is a
>>> very good question. Notice the symbol->aux pointer? That is
>>> the place to attach extra context or back-end-related data
>>> to symbols.
>>>
>>> Each symbol has "pos" and "endpos". If the symbol
>>> is expanded from a macro, then under the previous scheme the pos
>>> should point to a line in the "<pre-processor>" stream.
>>>
>>> However, if the macro expansion happens between "pos" and
>>> "endpos", you will not be able to easily access the token that
>>> contains the macro expansion "pos".
>>>
>>> For that, we could, just thinking out loud, add a parser
>>> hook for declarations that fires when a symbol has finished building.
>>> That would be a very small and straightforward change.
>>> If the hook is not NULL, the callback function will be called
>>> with the symbol that just got defined, and the start and end
>>> tokens of that symbol.
>>>
>>> So your dependency program just needs to register the
>>> symbol parsing hook. Inside the callback function, walk
>>> the tokens from start to end. Look up macro expansion information
>>> as needed. Build up the dependency struct and store it in
>>> symbol->aux.
>>>
>>> BTW, unrelated to this patch, I can see that other programs might
>>> be able to use the same parser hook to perform source code
>>> transformations as well.
>>>
>>> Make sense? This way, you don't even need the hash
>>> table to attach a context to the token. You can get it directly
>>> from symbol->aux.
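
A rough sketch of the parser hook described above, assuming a single global
callback that is invoked once a symbol is complete. The hook does not exist
in sparse as far as this thread shows; symbol_hook, symbol_complete(),
dep_hook() and struct dep_info are hypothetical, and struct token and struct
symbol are cut down to the fields used here (the from_macro flag in
particular is an invented placeholder for "look up macro expansion
information").

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down stand-ins for sparse's structs. */
struct token {
	struct token *next;
	int from_macro;      /* placeholder: token came out of a macro expansion */
};

struct symbol {
	const char *name;
	void *aux;           /* extra context attached by the hook's owner */
};

typedef void (*symbol_hook_t)(struct symbol *sym,
			      struct token *start, struct token *end);

/* NULL means no client registered a hook; the parser then does nothing extra. */
static symbol_hook_t symbol_hook;

/* What the parser would call when a symbol has finished building. */
static void symbol_complete(struct symbol *sym,
			    struct token *start, struct token *end)
{
	if (symbol_hook)
		symbol_hook(sym, start, end);
}

/* Dependency data the client stores in symbol->aux. */
struct dep_info {
	int ntokens;
	int nmacro_tokens;
};

/* Client callback: walk the tokens from start to end (inclusive). */
static void dep_hook(struct symbol *sym,
		     struct token *start, struct token *end)
{
	struct dep_info *d = calloc(1, sizeof(*d));

	if (!d)
		return;
	for (struct token *t = start; t; t = t->next) {
		d->ntokens++;
		if (t->from_macro)
			d->nmacro_tokens++;
		if (t == end)
			break;
	}
	sym->aux = d;
}
```

A program that never assigns symbol_hook sees no change in behaviour, which
is the point made above about not affecting other users of the library.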
>>>
>>>> In my patch I have modeled (2) using 2 structs:
>>>> struct macro_expansion {
>>>>        int nargs;
>>>>        struct symbol *sym;
>>>>        struct token *m;
>>>>        struct arg args[0];
>>>> };
>>>> struct tok_macro_dep {
>>>>        struct macro_expansion *m;
>>>>        unsigned int argi;
>>>>        unsigned int isbody : 1;
>>>>        unsigned int visited : 1;
>>>> };
>>>> Each token from a macro expansion gets tagged with
>>>> tok_macro_dep. If it is a macro argument, <argi> shows the
>>>> index; if it is from the macro body, <isbody> is 1.
>>>> Now, I haven't yet thought about special cases like
>>>> token concatenation; even more data is needed to
>>>> model this. Also, when a macro argument is again used as a
>>>> macro argument inside the body expansion, then I kind of
>>>> lose the chain: I would also need a "token *dup_of" pointer
>>>> to point to the original token that the token is a copy
>>>> of (when arguments are created...) etc.
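
One way the "dup_of" idea could look, as a sketch only: each copied token
keeps a pointer to the token it was duplicated from, so the chain of
expansions survives when an argument is passed on to another macro. struct
dep_token and dep_chain() are hypothetical, and struct macro_expansion is
reduced to a name.

```c
#include <assert.h>
#include <stddef.h>

struct macro_expansion {
	const char *name;    /* stand-in for struct symbol *sym */
};

/* A token together with its tok_macro_dep-style tag. */
struct dep_token {
	const char *text;
	struct macro_expansion *m;       /* expansion it came from, NULL = file token */
	unsigned int argi;               /* argument index if it is an argument */
	unsigned int isbody : 1;         /* 1 if it comes from the macro body */
	const struct dep_token *dup_of;  /* token this one was copied from, or NULL */
};

/*
 * Collect the expansions a token passed through, most recent first.
 * Returns the number of entries written to out[] (at most max).
 */
static size_t dep_chain(const struct dep_token *t,
			const struct macro_expansion **out, size_t max)
{
	size_t n = 0;

	for (; t && n < max; t = t->dup_of)
		if (t->m)
			out[n++] = t->m;
	return n;
}
```

For the "10" in D0(D3(10),11), a copy made while expanding D1 would point
via dup_of to the copy made while expanding D3, which in turn points to the
token read from the file.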
>>>>
>>>> I have read your macro_expand() hook idea; however,
>>>> if I understand it right you want to reuse position.stream and
>>>> position.line as a kind of pointer (to save the extra 4 bytes).
>>>> (Your goal is to minimize codebase changes, but I wonder
>>>> whether you don't change the semantics of struct position and then
>>>> need to change the code that uses struct position anyway...)
>>>
>>>
>>> Nope, because the position.stream change only happens in
>>> your dependency analysis program. It is the dependency program that
>>> registers the hook. This behaviour is private to the dependency
>>> analysis program. Other programs that use the sparse library don't see
>>> it at all, because they don't register macro_expand hooks to perform
>>> those stream manipulations. They will receive the exact same AST as before.
>>>
>>>> Maybe it is possible like this... I doubt it: where should
>>>> all the extra context that each token has be saved and
>>>> extracted from, using that scheme?
>>>
>>>
>>> Two places. One is symbol->aux. Also, the macro_expand context
>>> can be looked up by pos->line, which indexes into the macro_expand
>>> array that stores the context.
>>>
>>> Having these two should be enough to produce the exact same
>>> dependency result as you are getting right now.
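
A sketch of the pos->line lookup described above, assuming the dependency
program rewrites the position of expanded tokens so that stream names a
private "<pre-processor>" stream and line is an index into its own array of
expansion contexts. PP_STREAM, struct expand_ctx, struct expand_table and
lookup_expand() are hypothetical, and struct position is simplified here
(sparse's real one is a packed bitfield with more members).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for sparse's struct position. */
struct position {
	unsigned int stream;
	unsigned int line;
};

/* Hypothetical stream id reserved for the "<pre-processor>" stream. */
#define PP_STREAM 1000u

/* Per-expansion context kept privately by the dependency program. */
struct expand_ctx {
	const char *macro;   /* name of the expanded macro */
};

struct expand_table {
	const struct expand_ctx *ctx;
	size_t n;
};

/*
 * Return the expansion context a position refers to, or NULL if the
 * position is an ordinary file position or the index is out of range.
 */
static const struct expand_ctx *lookup_expand(const struct expand_table *tab,
					      struct position pos)
{
	if (pos.stream != PP_STREAM)
		return NULL;
	if (pos.line >= tab->n)
		return NULL;
	return &tab->ctx[pos.line];
}
```

Code that never sees PP_STREAM positions keeps treating stream and line as
plain file coordinates, so the reinterpretation stays private to the program
that installed the hook.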
>>>
>>>> Maybe it is possible, but I don't want to have saving 4 bytes
>>>> as a design goal (I'd use the void *custom scheme to
>>>> save all my extra data, including the pointers to tokens that
>>>> "sit around") and adjust everything else to
>>>> that. The consequence is that the code complexity would
>>>> grow on the other end.
>>>
>>>
>>> It is not only about saving 4 bytes. It is about other programs
>>> not having to pull in the full token struct if they don't need it.
>>> It is about reusable macro hooks and parser hooks with which
>>> external programs can do fancier things, like source code
>>> transformations, without impacting the other users of the sparse lib.
>>>
>>>> Here is my compromise then:
>>>> Keep the original "pos". But still grant me, for
>>>> each struct, a "void *custom" pointer that I can use
>>>> to store extra data, i.e. a pointer to a token.
>>>
>>>
>>> symbol->aux.
>>>
>>> Chris
>>>
>>