* master merge plans
@ 2017-08-21 19:39 Christopher Li
  2017-08-22 13:32 ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-21 19:39 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

Here is the stuff I am considering:

1) gcc attribute update (trivial and safe to merge)
2) Makefile update to enable debug builds.
    This I want to merge first because it impacts a lot of the
    Makefile; other patches typically just add some source files,
    which makes their conflicts easier to resolve than the other
    way around.
    I will send out the update soon.
3) Luc's SSSA
    Luc has sssa-mini-v2 now. Is that the latest one to be merged?
    I still want an option to disable SSSA and an option to enable the
    old code path. That can be done after the merge, though.

Anything else I missed? Please let me know.


Chris


* Re: master merge plans
  2017-08-21 19:39 master merge plans Christopher Li
@ 2017-08-22 13:32 ` Luc Van Oostenryck
  2017-08-22 14:47   ` Christopher Li
  2017-08-22 14:53   ` Dibyendu Majumdar
  0 siblings, 2 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-08-22 13:32 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linux-Sparse

On Mon, Aug 21, 2017 at 9:39 PM, Christopher Li <sparse@chrisli.org> wrote:
> Here is the stuff I am considering:
>
> 3) Luc's SSSA
>     Luc has sssa-mini-v2 now. Is that the latest one to be merged?
>     I still want an option to disable SSSA and an option to enable the
>     old code path. That can be done after the merge, though.

I'm busy making significant changes.
I'll resubmit everything in a few days.

> Anything else I missed? Please let me know.

Nicolai's constexpr series.

-- Luc


* Re: master merge plans
  2017-08-22 13:32 ` Luc Van Oostenryck
@ 2017-08-22 14:47   ` Christopher Li
  2017-08-22 15:51     ` Christopher Li
  2017-08-22 14:53   ` Dibyendu Majumdar
  1 sibling, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-22 14:47 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 9:32 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Nicolai's constexpr series.
>

Is it the constexpr-v4 branch on your git repository that you want me to merge?

If you have a newer one, please post the git url.

Chris


* Re: master merge plans
  2017-08-22 13:32 ` Luc Van Oostenryck
  2017-08-22 14:47   ` Christopher Li
@ 2017-08-22 14:53   ` Dibyendu Majumdar
  2017-08-22 14:59     ` Christopher Li
  1 sibling, 1 reply; 33+ messages in thread
From: Dibyendu Majumdar @ 2017-08-22 14:53 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse

Hi,

On 22 August 2017 at 14:32, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 9:39 PM, Christopher Li <sparse@chrisli.org> wrote:
>> Here is the stuff I am considering:
>>
>> 3) Luc's SSSA
>>     Luc has sssa-mini-v2 now. Is that the latest one to be merged?
>> Anything else I missed? Please let me know.
>
> Nicolai's constexpr series.
>

It would be great if the SSSA series were merged and tested a bit before
merging other changes.

Regards
Dibyendu


* Re: master merge plans
  2017-08-22 14:53   ` Dibyendu Majumdar
@ 2017-08-22 14:59     ` Christopher Li
  2017-09-03 20:26       ` Simple SSA status Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-22 14:59 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Luc Van Oostenryck, Linux-Sparse

On Tue, Aug 22, 2017 at 10:53 AM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
>>
>
> It would be great if the SSSA series were merged and tested a bit before
> merging other changes.

I can create a topic branch for SSSA on the sparse repository.

Merge it when it is ready?

Right now I do wish the SSSA series had the two options I requested
before.

Chris


* Re: master merge plans
  2017-08-22 14:47   ` Christopher Li
@ 2017-08-22 15:51     ` Christopher Li
  2017-08-22 20:05       ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-22 15:51 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 10:47 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Tue, Aug 22, 2017 at 9:32 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> Nicolai's constexpr series.
>>
>
> Is it the constexpr-v4 branch on your git repository that you want me to merge?

There might be a tiny slowdown due to the constexpr series. I will run more
tests to see if I can pinpoint it. The last three groups in each test were
started by a script. To be continued. The current numbers are not conclusive.

================ master merge constexpr-v4====================
1202.35user 446.83system 1:16.27elapsed 2162%CPU (0avgtext+0avgdata
238104maxresident)k
352inputs+12768outputs (0major+128889305minor)pagefaults 0swaps
1204.87user 447.25system 1:16.28elapsed 2165%CPU (0avgtext+0avgdata
238080maxresident)k
0inputs+12768outputs (0major+128888977minor)pagefaults 0swaps
1202.92user 447.03system 1:16.42elapsed 2158%CPU (0avgtext+0avgdata
238048maxresident)k
0inputs+12768outputs (0major+128888805minor)pagefaults 0swaps
1206.48user 446.75system 1:16.36elapsed 2164%CPU (0avgtext+0avgdata
238072maxresident)k
0inputs+12768outputs (0major+128888959minor)pagefaults 0swaps
1207.37user 447.35system 1:16.57elapsed 2161%CPU (0avgtext+0avgdata
238072maxresident)k
0inputs+12768outputs (0major+128888909minor)pagefaults 0swaps


1204.73user 446.87system 1:16.39elapsed 2161%CPU (0avgtext+0avgdata
238040maxresident)k
0inputs+12776outputs (0major+128889143minor)pagefaults 0swaps
1205.96user 447.91system 1:16.39elapsed 2164%CPU (0avgtext+0avgdata
238080maxresident)k
0inputs+12768outputs (0major+128889529minor)pagefaults 0swaps
1207.49user 448.00system 1:16.37elapsed 2167%CPU (0avgtext+0avgdata
238044maxresident)k
0inputs+12784outputs (0major+128888947minor)pagefaults 0swaps


================ rc5 =====================================
1201.56user 448.22system 1:16.24elapsed 2163%CPU (0avgtext+0avgdata
238048maxresident)k
0inputs+12768outputs (0major+128883583minor)pagefaults 0swaps
1203.27user 447.90system 1:16.22elapsed 2166%CPU (0avgtext+0avgdata
238036maxresident)k
0inputs+12768outputs (0major+128882925minor)pagefaults 0swaps
1201.65user 448.79system 1:16.20elapsed 2165%CPU (0avgtext+0avgdata
238044maxresident)k
0inputs+12768outputs (0major+128883529minor)pagefaults 0swaps

1202.19user 448.79system 1:16.24elapsed 2165%CPU (0avgtext+0avgdata
238192maxresident)k
0inputs+12768outputs (0major+128883477minor)pagefaults 0swaps
1203.38user 449.73system 1:16.33elapsed 2165%CPU (0avgtext+0avgdata
238044maxresident)k
0inputs+12776outputs (0major+128883089minor)pagefaults 0swaps
1203.37user 449.05system 1:16.23elapsed 2167%CPU (0avgtext+0avgdata
238080maxresident)k
0inputs+12768outputs (0major+128883199minor)pagefaults 0swaps

Chris


* Re: master merge plans
  2017-08-22 15:51     ` Christopher Li
@ 2017-08-22 20:05       ` Luc Van Oostenryck
  2017-08-23 20:50         ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-08-22 20:05 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 5:51 PM, Christopher Li <sparse@chrisli.org> wrote:
>>
>> Is it the constexpr-v4 branch on your git repository that you want me to merge?

Yes.

> There might be a tiny slowdown due to the constexpr series. I will run more
> tests to see if I can pinpoint it. The last three groups in each test were
> started by a script. To be continued. The current numbers are not conclusive.

At worst, these numbers show something like 0.1-0.3%.

-- Luc


* Re: master merge plans
  2017-08-22 20:05       ` Luc Van Oostenryck
@ 2017-08-23 20:50         ` Christopher Li
  2017-08-29 11:27           ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-23 20:50 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Tue, Aug 22, 2017 at 4:05 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> At worst, these numbers show something like 0.1-0.3%.

I am doing some final-stage testing and reviewing of the constexpr patch.

I have found out that my server actually does get affected by the CPU
temperature, so I turned off the Intel Turbo Stepping and allowed some
rest between tests. I have also finished writing a script to run the
tests for each commit. I include the log in this email.

I can't narrow it down to one commit causing a big jump. That is no
reason to hold against the patch; I am just interested to see whether
the measurement can catch such changes in the numbers.

Chris

BTW, for testing purposes, I rebased against master. That is *not*
how I am going to merge it.


exp~0 17c0bae constexpr: flag __builtin_bswap() as constexpr

real 1m24.786s
user 20m18.346s
sys 7m24.193s

real 1m18.803s
user 20m55.562s
sys 7m36.994s

real 1m18.827s
user 20m55.731s
sys 7m36.688s

real 1m18.848s
user 20m57.298s
sys 7m35.525s

real 1m18.922s
user 20m56.519s
sys 7m36.264s
exp~1 58d888a give default return type in evaluate_call()

real 1m18.989s
user 20m59.896s
sys 7m35.895s

real 1m18.934s
user 20m58.567s
sys 7m36.808s

real 1m19.037s
user 20m58.249s
sys 7m37.441s

real 1m18.993s
user 20m59.233s
sys 7m36.422s

real 1m19.092s
user 20m58.818s
sys 7m37.009s
exp~2 ba12c3c return an error if too few args

real 1m18.984s
user 20m59.210s
sys 7m37.683s

real 1m18.993s
user 20m58.759s
sys 7m37.619s

real 1m19.133s
user 20m59.426s
sys 7m37.658s

real 1m19.067s
user 21m0.375s
sys 7m36.571s

real 1m19.040s
user 20m59.331s
sys 7m37.687s
exp~3 ace5869 constexpr: treat comparisons between types as integer constexpr

real 1m18.853s
user 20m57.653s
sys 7m37.073s

real 1m18.870s
user 20m56.782s
sys 7m38.441s

real 1m18.949s
user 20m56.683s
sys 7m37.473s

real 1m19.101s
user 20m57.496s
sys 7m37.183s

real 1m18.889s
user 20m57.439s
sys 7m37.235s
exp~4 82f78d2 constexpr: support compound literals as address constants

real 1m18.966s
user 20m56.740s
sys 7m37.477s

real 1m18.882s
user 20m55.245s
sys 7m38.047s

real 1m18.872s
user 20m56.149s
sys 7m37.610s

real 1m19.015s
user 20m54.708s
sys 7m38.447s

real 1m18.831s
user 20m55.296s
sys 7m37.978s
exp~5 df2d126 constexpr: relax some constant expression rules for
pointer expressions

real 1m18.831s
user 20m55.178s
sys 7m37.580s

real 1m18.802s
user 20m55.791s
sys 7m36.577s

real 1m18.920s
user 20m55.301s
sys 7m37.593s

real 1m18.801s
user 20m54.810s
sys 7m37.699s

real 1m18.814s
user 20m55.836s
sys 7m36.516s
exp~6 96f5c7e constexpr: flag builtins constant_p, safe_p and warning
as constexprs

real 1m19.001s
user 20m54.861s
sys 7m39.497s

real 1m18.894s
user 20m54.176s
sys 7m39.225s

real 1m18.955s
user 20m55.225s
sys 7m38.273s

real 1m18.837s
user 20m55.395s
sys 7m38.423s

real 1m19.038s
user 20m56.132s
sys 7m37.417s
exp~7 82ba05e constexpr: examine constness of __builtin_offsetof at
evaluation only

real 1m18.844s
user 20m52.640s
sys 7m38.045s

real 1m18.796s
user 20m52.641s
sys 7m38.324s

real 1m18.908s
user 20m53.584s
sys 7m37.451s

real 1m18.842s
user 20m52.688s
sys 7m38.812s

real 1m18.806s
user 20m53.336s
sys 7m37.414s
exp~8 f585495 constexpr: recognize references to labels as address constants

real 1m18.921s
user 20m53.766s
sys 7m37.205s

real 1m18.784s
user 20m53.527s
sys 7m37.331s

real 1m18.846s
user 20m54.222s
sys 7m36.130s

real 1m18.900s
user 20m53.766s
sys 7m36.899s

real 1m18.676s
user 20m53.485s
sys 7m37.334s
exp~9 c19757f constexpr: recognize string literals as address constants

real 1m19.184s
user 20m54.707s
sys 7m37.612s

real 1m18.782s
user 20m54.507s
sys 7m37.990s

real 1m18.830s
user 20m54.008s
sys 7m38.648s

real 1m18.984s
user 20m53.784s
sys 7m38.611s

real 1m18.770s
user 20m53.433s
sys 7m39.090s
exp~10 a1ba3b8 constexpr: recognize members of static compound objects
as address constants

real 1m19.012s
user 20m53.325s
sys 7m37.647s

real 1m18.815s
user 20m53.541s
sys 7m38.065s

real 1m18.716s
user 20m52.065s
sys 7m39.047s

real 1m18.762s
user 20m52.854s
sys 7m38.205s

real 1m18.867s
user 20m53.234s
sys 7m38.251s
exp~11 89f03a1 constexpr: recognize address constants created through
pointer arithmetic

real 1m18.891s
user 20m56.707s
sys 7m37.824s

real 1m18.944s
user 20m56.198s
sys 7m37.925s

real 1m18.980s
user 20m57.025s
sys 7m37.688s

real 1m18.861s
user 20m56.359s
sys 7m38.265s

real 1m18.927s
user 20m56.978s
sys 7m37.942s
exp~12 add9d6b constexpr: recognize address constants created through casts

real 1m18.827s
user 20m54.727s
sys 7m37.786s

real 1m18.823s
user 20m55.051s
sys 7m37.619s

real 1m18.840s
user 20m55.395s
sys 7m37.611s

real 1m18.828s
user 20m54.439s
sys 7m37.358s

real 1m18.996s
user 20m55.342s
sys 7m37.761s
exp~13 1c5d817 constexpr: recognize static objects as address constants

real 1m18.849s
user 20m55.353s
sys 7m37.570s

real 1m18.951s
user 20m55.800s
sys 7m36.969s

real 1m18.901s
user 20m56.035s
sys 7m37.285s

real 1m19.008s
user 20m54.143s
sys 7m38.469s

real 1m18.851s
user 20m54.583s
sys 7m38.325s
exp~14 5931437 constexpr: check static storage duration objects'
intializers' constness

real 1m18.963s
user 20m57.428s
sys 7m36.810s

real 1m18.992s
user 20m57.136s
sys 7m37.428s

real 1m18.990s
user 20m57.323s
sys 7m36.503s

real 1m18.889s
user 20m57.453s
sys 7m37.012s

real 1m18.916s
user 20m56.250s
sys 7m37.578s
exp~15 3d1f615 constexpr: collect storage modifiers of initializers

real 1m18.620s
user 20m50.093s
sys 7m38.680s

real 1m18.710s
user 20m52.010s
sys 7m37.180s

real 1m18.658s
user 20m51.324s
sys 7m37.858s

real 1m18.708s
user 20m50.750s
sys 7m38.064s

real 1m18.729s
user 20m50.268s
sys 7m38.261s
exp~16 c5b00b9 constexpr: rename handle_simple_initializer() to
handle_initializer()

real 1m18.762s
user 20m53.856s
sys 7m39.205s

real 1m19.032s
user 20m53.479s
sys 7m39.081s

real 1m18.952s
user 20m53.434s
sys 7m39.073s

real 1m18.841s
user 20m53.849s
sys 7m38.662s

real 1m18.876s
user 20m53.638s
sys 7m39.069s
exp~17 ce1284b constexpr: add support for tagging address constants

real 1m18.841s
user 20m53.490s
sys 7m38.805s

real 1m18.808s
user 20m53.957s
sys 7m39.191s

real 1m18.979s
user 20m54.010s
sys 7m38.661s

real 1m18.923s
user 20m53.627s
sys 7m39.547s

real 1m18.837s
user 20m54.008s
sys 7m38.673s
exp~18 ca44c48 constexpr: add support for tagging arithmetic constant
expressions

real 1m18.732s
user 20m52.681s
sys 7m37.328s

real 1m18.795s
user 20m52.313s
sys 7m37.950s

real 1m18.755s
user 20m52.972s
sys 7m37.638s

real 1m18.964s
user 20m52.372s
sys 7m38.090s

real 1m18.906s
user 20m52.971s
sys 7m37.603s
exp~19 1d14279 constexpr: examine constness of conditionals at evaluation only

real 1m18.778s
user 20m52.130s
sys 7m38.598s

real 1m18.781s
user 20m53.484s
sys 7m37.828s

real 1m18.917s
user 20m53.524s
sys 7m37.803s

real 1m18.855s
user 20m52.568s
sys 7m38.649s

real 1m18.880s
user 20m52.250s
sys 7m38.596s
exp~20 d032f28 constexpr: examine constness of preops at evaluation only

real 1m18.670s
user 20m52.232s
sys 7m37.297s

real 1m18.740s
user 20m51.975s
sys 7m37.008s

real 1m18.690s
user 20m51.828s
sys 7m37.761s

real 1m18.874s
user 20m50.968s
sys 7m38.967s

real 1m18.689s
user 20m51.353s
sys 7m37.997s
exp~21 2322e92 constexpr: examine constness of binops and alike at
evaluation only

real 1m18.848s
user 20m52.420s
sys 7m38.549s

real 1m18.964s
user 20m53.504s
sys 7m37.359s

real 1m18.687s
user 20m52.094s
sys 7m39.136s

real 1m18.730s
user 20m52.640s
sys 7m38.264s

real 1m18.738s
user 20m52.135s
sys 7m38.986s
exp~22 63db9df constexpr: examine constness of casts at evaluation only

real 1m18.931s
user 20m52.036s
sys 7m38.964s

real 1m18.781s
user 20m51.949s
sys 7m38.190s

real 1m18.822s
user 20m52.409s
sys 7m38.207s

real 1m18.819s
user 20m51.696s
sys 7m39.160s

real 1m18.939s
user 20m51.307s
sys 7m39.398s
exp~23 3a4d6eb constexpr: init flags at expression allocation

real 1m18.735s
user 20m51.678s
sys 7m37.659s

real 1m18.763s
user 20m51.829s
sys 7m37.649s

real 1m18.878s
user 20m51.538s
sys 7m37.923s

real 1m18.764s
user 20m51.689s
sys 7m37.561s

real 1m18.842s
user 20m50.505s
sys 7m38.146s
exp~24 6b1e4ad constexpr: introduce additional expression constness
tracking flags

real 1m18.821s
user 20m52.272s
sys 7m38.586s

real 1m18.829s
user 20m50.961s
sys 7m39.365s

real 1m18.821s
user 20m52.493s
sys 7m38.452s

real 1m18.850s
user 20m51.705s
sys 7m39.267s

real 1m18.923s
user 20m53.136s
sys 7m37.625s
exp~25 9151ef4 gcc attr: add nonstring warn_if_not_aligned

real 1m18.813s
user 20m51.490s
sys 7m38.610s

real 1m18.703s
user 20m50.492s
sys 7m38.937s

real 1m18.723s
user 20m50.613s
sys 7m38.524s

real 1m18.654s
user 20m52.010s
sys 7m37.222s

real 1m18.790s
user 20m51.441s
sys 7m38.248s


* Re: master merge plans
  2017-08-23 20:50         ` Christopher Li
@ 2017-08-29 11:27           ` Christopher Li
  2017-09-03 19:24             ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-08-29 11:27 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Wed, Aug 23, 2017 at 4:50 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Tue, Aug 22, 2017 at 4:05 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> At worst, these numbers show something like 0.1-0.3%.
>
> I am doing some final-stage testing and reviewing of the constexpr patch.
>

The constexpr branch has been merged to master.

Chris


* Re: master merge plans
  2017-08-29 11:27           ` Christopher Li
@ 2017-09-03 19:24             ` Luc Van Oostenryck
  0 siblings, 0 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-03 19:24 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linux-Sparse

On Tue, Aug 29, 2017 at 1:27 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Aug 23, 2017 at 4:50 PM, Christopher Li <sparse@chrisli.org> wrote:
>> On Tue, Aug 22, 2017 at 4:05 PM, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>>>
>>> At worst, these numbers show something like 0.1-0.3%.
>>
>> I am doing some final-stage testing and reviewing of the constexpr patch.
>>
>
> The constexpr branch has been merged to master.

That's really great!
Thanks.

-- Luc


* Simple SSA status
  2017-08-22 14:59     ` Christopher Li
@ 2017-09-03 20:26       ` Luc Van Oostenryck
  2017-09-04 18:29         ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-03 20:26 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Tue, Aug 22, 2017 at 10:59:54AM -0400, Christopher Li wrote:
> On Tue, Aug 22, 2017 at 10:53 AM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
> >>
> >
> > It would be great if the SSSA series were merged and tested a bit before
> > merging other changes.
> 
> I can create a topic branch for SSSA on the sparse repository.
> 
> Merge it when it is ready?

I've put it on hold for the moment.
The conversion of purely local vars was the easy job.
The real challenge is to do it for the other loads & stores
(in simplify_symbol_usage() & simplify_memops()).
I'm working on it and have some encouraging results:
- correct phi-nodes
- passes the test-suite
- no crashes in the GCC test-suite and others
- no infinite loops
- no catastrophic performance on exceptional workloads
- roughly converts as many stores & loads as currently
- good performance (within 2-5% of rc5 on some workloads)
- in some cases, surprisingly good effects on optimization

I don't know yet whether keeping the Simple SSA during linearization
will be worthwhile or not.

> Right now I do wish the SSSA series had the two options I requested
> before.

I don't remember what the other option you asked for was, but keeping
both the old and the new method is not something I'm interested in.
We know that the current method is broken. In fact, it's broken in two
different ways:
- the phi-nodes are misplaced (they are at the use site instead of the
  meet point).
- each phi-node has as many sources as there are definitions for this
  variable (more or less) instead of having one for each parent.

I also recently discovered that the logic for aliases is broken.
For example, code like:
	int foo(void)
	{
		int i = 1;
		int *a;
		int **p;
		
		a = &i;
		p = &a;
		*p[0] = 0;
		return i;
	}
is miscompiled, as it always returns 1 instead of 0
(it fails to see that *p[0] is aliased to i).

What I'm interested in is, in this order:
1) produce correct IR
2) convert as many loads & stores as currently
   (but only when it's correct to do it, of course)
3) have performance similar to what we currently have

Point 2) is needed to avoid more false warnings during
context checking (and it has a very significant effect on
performance).
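
(For illustration: this is the checking behind the kernel's
__acquires/__releases annotations. A tiny invented example, where
'lock', 'unlock' and 'counter' are made up and only the context
attribute itself is sparse's own:)

	static int counter;

	static void lock(void)   __attribute__((context(counter, 0, 1)));
	static void unlock(void) __attribute__((context(counter, 1, 0)));

	static void bump(int c)
	{
		lock();
		counter++;
		if (c)
			return;		/* one path leaves without unlock(), */
		unlock();		/* so sparse reports a context imbalance */
	}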

-- Luc


* Re: Simple SSA status
  2017-09-03 20:26       ` Simple SSA status Luc Van Oostenryck
@ 2017-09-04 18:29         ` Christopher Li
  2017-09-04 20:07           ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-04 18:29 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Sun, Sep 3, 2017 at 4:26 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>>
>> Merge it when it is ready?
>
> I've put it on hold for the moment.

Thanks for the update. Glad to hear back from you.

> The conversion of purely local vars was the easy job.
> The real challenge is to do it for the other loads & stores
> (in simplify_symbol_usage() & simplify_memops()).
> I'm working on it and have some encouraging results:
> - correct phi-nodes
> - passes the test-suite
> - no crashes in the GCC test-suite and others
> - no infinite loops
> - no catastrophic performance on exceptional workloads
> - roughly converts as many stores & loads as currently
> - good performance (within 2-5% of rc5 on some workloads)
> - in some cases, surprisingly good effects on optimization

That is great. Very exciting news.

>
> I don't know yet whether keeping the Simple SSA during linearization
> will be worthwhile or not.

I have the same question as well. Once I found out that the Simple SSA
can't handle memory-to-register conversion through pointers, it already
meant we need to keep the other, non-local-variable path alive.
We can do some benchmarks to find out. If the performance
is very similar, removing the Simple SSA will actually simplify
the code because we only need to care about one code path.

>> Right now I do wish the SSSA series had the two options I requested
>> before.
>
> I don't remember what the other option you asked for was, but keeping
> both the old and the new method is not something I'm interested in.
> We know that the current method is broken. In fact, it's broken in two
> different ways:

I have no intention of keeping the broken behavior. My wish list for the two
options, rephrased:
1) An option to bypass the Simple SSA conversion and generate the
    raw load/store instructions before converting memory to registers.
2) An option to do the memory-to-register conversion not covered by the
    Simple SSA conversion. It sounds like you have implemented this already.

> - the phi-nodes are misplaced (they are at the use site instead of the
>   meet point).

Right. The meet point is what the books also call the Dominance Frontier :-)

> - each phi-node has as many sources as there are definitions for this
>   variable (more or less) instead of having one for each parent.

I am not sure I understand this. If you always place the phi-node at the
DF, it will have as many sources as incoming edges. That is part
of the common SSA dominance property. If we do it right, I can't
see why we would still need the phi-source instruction.


>
> I also recently discovered that the logic for aliases is broken.
> For example, code like:
>         int foo(void)
>         {
>                 int i = 1;
>                 int *a;
>                 int **p;
>
>                 a = &i;
>                 p = &a;
>                 *p[0] = 0;
>                 return i;
>         }
> is miscompiled, as it always returns 1 instead of 0
> (it fails to see that *p[0] is aliased to i).

That is exactly why we need instruction-level alias
analysis. The SSA conversion should be done after
the alias analysis.
>
> What I'm interested in is, in this order:
> 1) produce correct IR

Agree.

> 2) convert as many loads & stores as currently
>    (but only when it's correct to do it, of course)

Agree.

> 3) have performance similar to what we currently have

Agree.

>
> Point 2) is needed to avoid more false warnings during
> context checking (and it has a very significant effect on
> performance).

It seems we are all in agreement.

Chris


* Re: Simple SSA status
  2017-09-04 18:29         ` Christopher Li
@ 2017-09-04 20:07           ` Luc Van Oostenryck
  2017-09-04 20:37             ` Dibyendu Majumdar
  2017-09-04 23:31             ` Christopher Li
  0 siblings, 2 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-04 20:07 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 02:29:46PM -0400, Christopher Li wrote:
> On Sun, Sep 3, 2017 at 4:26 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >>
> >> Merge it when it is ready?
> >
> > I've put it on hold for the moment.
> 
> Thanks for the update. Glad to hear back from you.
> 
> > The conversion of purely local vars was the easy job.
> > The real challenge is to do it for the other loads & stores
> > (in simplify_symbol_usage() & simplify_memops()).
> > I'm working on it and have some encouraging results:
> > - correct phi-nodes
> > - passes the test-suite
> > - no crashes in the GCC test-suite and others
> > - no infinite loops
> > - no catastrophic performance on exceptional workloads
> > - roughly converts as many stores & loads as currently
> > - good performance (within 2-5% of rc5 on some workloads)
> > - in some cases, surprisingly good effects on optimization
> 
> That is great. Very exciting news.
> 
> >
> > I don't know yet whether keeping the Simple SSA during linearization
> > will be worthwhile or not.
> 
> I have the same question as well. Once I found out that the Simple SSA
> can't handle memory-to-register conversion through pointers, it already
> meant we need to keep the other, non-local-variable path alive.

Well ... we've already talked about this, but
- identifying which memory accesses can be converted or not is
  orthogonal to the conversion itself
- the current implementation also has two different conversions:
  * simplify_symbol_usage(), called early
  * simplify_loads(), called later, several times
  Both differ in what they can handle or not.

The more I've looked at this, the more I realize that what *can*
be done on local vars is most of the time unsuited for non-local
ones, and what is *needed* for non-local ones is unneeded for
local ones.

Doing a quick, simple conversion of all local vars can be a good
thing, to be able to easily and efficiently *identify*, in a later
pass, which other accesses can be converted or not.

> We can do some benchmarks to find out. If the performance
> is very similar, removing the Simple SSA will actually simplify
> the code because we only need to care about one code path.
> 
> >> Right now I do wish the SSSA series had the two options I requested
> >> before.
> >
> > I don't remember what the other option you asked for was, but keeping
> > both the old and the new method is not something I'm interested in.
> > We know that the current method is broken. In fact, it's broken in two
> > different ways:
> 
> I have no intention of keeping the broken behavior. My wish list for the two
> options, rephrased:
> 1) An option to bypass the Simple SSA conversion and generate the
>     raw load/store instructions before converting memory to registers.
> 2) An option to do the memory-to-register conversion not covered by the
>     Simple SSA conversion. It sounds like you have implemented this already.
> 
> > - the phi-nodes are misplaced (they are at the use site instead of the
> >   meet point).
> 
> Right. The meet point is what the books also call the Dominance Frontier :-)

Not necessarily. The dominance frontier is where the phi-nodes are
strictly needed to guarantee the SSA property. You can put some other
phi-nodes, trivial or not, at some other meet points, and this will
give you correct SSA too (like dumb-SSA: put a phi-node for each variable
at every meet point).

So no, the DF is only a subset of all the meet points.
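
To make the difference concrete, here is a small invented example
(not taken from any test case):

	int meet_example(int c)
	{
		int x = 1;	/* single definition, dominates every use */
		int y;
		if (c)
			y = 2;	/* 'y' is defined on both arms, ... */
		else
			y = 3;	/* ... so the join below is in its DF */
		return x + y;	/* minimal SSA needs a phi-node here for
				 * 'y' only; dumb-SSA would also add a
				 * trivial one for 'x' at this meet point */
	}
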
Here I was saying that some phi-nodes are not even put at a meet point.
For example:
	.L1:
		...
		phisrc	%phi1, %r1
		br	.L3
	.L2
		...
		phisrc	%phi2, %r2
		br	.L3
	.L3
		// phi-node should be here, it's a meet point
		...
		br	.L4
	.L4
		# lnop ...
		phi	%r4 <- %phi1, %phi2

The method puts the phi-node at .L4 because it was in this block
that the load/use was present, but it should have been put at .L3,
the meet point. Often it's not a problem, which is why it hasn't been
seen earlier (and probably few people took much time to look at
the generated IR). It creates problems though once you:
- merge some blocks: phi-conversion, try_to_simplify_bb(), ...
- have undefined vars in the way

> > - each phi-node has as many sources as there are definitions for this
> >   variable (more or less) instead of having one for each parent.
> 
> I am not sure I understand this. If you always place the phi-node at the
> DF, it will have as many sources as incoming edges. That is part
> of the common SSA dominance property. If we do it right, I can't
> see why we would still need the phi-source instruction.

Hehe, note how I said 'sources' and not 'phi-sources'. I just meant
the arguments of the phi-nodes, their operands.

Note that I was saying that, currently, even when the phi-node
is correctly placed, it sometimes has more operands than predecessors
because the current method sort of accumulates the operands of
other dominating phi-nodes related to this variable.

About the phi-sources, I agree strongly with this. To me these
phi-sources are just a nuisance. Once things are in better
shape, I'll see what can be done to get rid of them (quite a bit
of the logic depends on them for the moment, though).

To be fair, these phi-sources have their use for the moment, but
as far as I can see it's only because we don't have a one-to-one
mapping between the phi-node operands and the block's predecessors.
In most notations, articles, ... you will see phi-nodes like:
	phi	%r4 <- [.L1:%r1, .L2:%r2]
to explicitly show the correspondence between each operand
and each predecessor (internally, things can be implicit;
it's what I have now, in fact).

> >
> > I also recently discovered that the logic for aliases is broken.
> > For example, code like:
> >         int foo(void)
> >         {
> >                 int i = 1;
> >                 int *a;
> >                 int **p;
> >
> >                 a = &i;
> >                 p = &a;
> >                 *p[0] = 0;
> >                 return i;
> >         }
> > is miscompiled, as it always returns 1 instead of 0
> > (it fails to see that *p[0] is aliased to i).
> 
> That is exactly why we need instruction-level alias
> analysis. The SSA conversion should be done after
> the alias analysis.

And it's here that the egg & chicken problem begins.
To efficiently do the alias analysis, you'd better already
have the IR in SSA form, have constant propagation already
done, have dead code elimination already done, ...

And, it's here where doing first the simple SSA can be
a good thing.

Currently we have:
	1) linearization
	2) simple alias analysis + conversion of local loads +
	   conversion of other loads + simplification of
           unneeded stores (simplify_symbol_usage())
	3) simplification & CSE
	4) more conversion of memory accesses (simplify_memops())
	5) goto 3)

With the Simple SSA code we have:
	1) linearization & conversion of local loads & stores
	2) reuse or not the broken simplify_symbol_usage() to
           convert loads
	3) simplification & CSE
	4) reuse or not the broken simplify_memops()
           or reuse only the non-broken part of it.
	5) goto 3)

What we would like to have (more or less):
	1) linearization
	1') conversion of local vars (maybe via Simple SSA)
	2) first pass of simplifications & CSE
	3) alias analysis
	3') conversion of memory accesses using 3)
	5) more simplification & CSE
	6) goto 3)
This corresponds to a very classical architecture.


Note: here above, I use 'local var' to mean '(local) var which can
      safely be converted because it cannot be aliased to anything'.
Note: there are a lot of possible alias analyses, varying greatly
      in precision, complexity & run-time resources.
      It's still very OK to me to have (first) something quite simple,
      as we currently have.
      It's also OK to do the conversion while doing the analysis if
      it's advantageous to do so (no need for memory to hold the info).
Note: simplification & CSE should include Dead Code Elimination and
      it's not clear to me what exactly we're doing about it.

-- Luc


* Re: Simple SSA status
  2017-09-04 20:07           ` Luc Van Oostenryck
@ 2017-09-04 20:37             ` Dibyendu Majumdar
  2017-09-04 20:55               ` Luc Van Oostenryck
  2017-09-04 23:31             ` Christopher Li
  1 sibling, 1 reply; 33+ messages in thread
From: Dibyendu Majumdar @ 2017-09-04 20:37 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

Hi Luc,

On 4 September 2017 at 21:07, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:

> About the phi-sources, I agree strongly with this. To me these
> phi-sources are just a nuisance. Once things are in better
> shape, I'll see what can be done to get rid of them (quite a bit
> of the logic depends on them for the moment, though).
>

As you know, the backend relies upon the phisrc instruction and treats
it like a 'store' while treating the phi instructions as 'loads'.
Selfishly, I would like this to remain, as it seems changing it is not
necessary, just nice to have?

Regards
Dibyendu


* Re: Simple SSA status
  2017-09-04 20:37             ` Dibyendu Majumdar
@ 2017-09-04 20:55               ` Luc Van Oostenryck
  2017-09-04 21:24                 ` Dibyendu Majumdar
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-04 20:55 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 09:37:17PM +0100, Dibyendu Majumdar wrote:
> Hi Luc,
> 
> On 4 September 2017 at 21:07, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> 
> > About the phi-sources, I agree strongly with this. To me these
> > phi-sources are just a nuisance. Once things are in better
> > shape, I'll see what can be done to get rid of them (quite a bit
> > of the logic depends on them for the moment, though).
> >
> 
> As you know, the backend relies upon the phisrc instruction and treats
> it like a 'store' while treating the phi instructions as 'loads'.
> Selfishly, I would like this to remain, as it seems changing it is not
> necessary, just nice to have?

Well, I'm not very sure about how you use these phisrc, but from what
I've understood, in sparse-llvm it's done like this because LLVM's phi
can't be used *because* of the problems described here above.
If I remove these phisources, I would of course adapt sparse-llvm
to do the right thing.

And no, removing these phisources is not a nice-to-have. Like
I said, they're a real nuisance and make a lot of things much
more complicated than needed.

Once the phi-nodes are correct, you can trivially recreate these
phisources if needed. But the simple fact that you can do this
trivially is a clear sign that you really don't *need* them.
Among other things, you can do the alloca/store/load as easily
without them (instead of putting the store at the place where the
phisrc was, you put the store at the end of each predecessor,
which is where these phisrc should have been and where they are
now in my code).
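
In C terms, that lowering corresponds to something like this (a rough
sketch invented for illustration, not actual sparse-llvm output):

	int lowered_phi(int c, int a, int b)
	{
		int slot;		/* plays the role of the alloca */
		if (c)
			slot = a;	/* store at the end of predecessor 1 */
		else
			slot = b;	/* store at the end of predecessor 2 */
		return slot;		/* a load at the join replaces the phi */
	}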

Anyway, we're not yet at the point where it would be possible to
remove them. We're even still far from it.

-- Luc


* Re: Simple SSA status
  2017-09-04 20:55               ` Luc Van Oostenryck
@ 2017-09-04 21:24                 ` Dibyendu Majumdar
  0 siblings, 0 replies; 33+ messages in thread
From: Dibyendu Majumdar @ 2017-09-04 21:24 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Christopher Li, Linux-Sparse, Linus Torvalds

Hi Luc,

On 4 September 2017 at 21:55, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Mon, Sep 04, 2017 at 09:37:17PM +0100, Dibyendu Majumdar wrote:
>> On 4 September 2017 at 21:07, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>>
>> > About the phi-sources, I agree strongly with this. To me these
>> > phi-sources are just a nuisance. Once things are in better
>> > shape, I'll see what can be done to get rid of them (quite a bit
>> > of the logic depends on them for the moment, though).
>> >
>>
>> As you know, the backend relies upon the phisrc instruction and treats
>> it like a 'store' while treating the phi instructions as 'loads'.
>> Selfishly, I would like this to remain, as it seems changing it is not
>> necessary, just nice to have?
>
> Well, I'm not very sure about how you use these phisrc, but from what
> I've understood, in sparse-llvm it's done like this because LLVM's phi
> can't be used *because* of the problems described here above.
> If I remove these phisources, I would of course adapt sparse-llvm
> to do the right thing.
>

Well, I have an alternative backend that does not have phi instructions,
which also makes me think now that maybe it is better to keep the
introduction of phi instructions as a separate phase rather than doing
it at linearization time.

Regards
Dibyendu


* Re: Simple SSA status
  2017-09-04 20:07           ` Luc Van Oostenryck
  2017-09-04 20:37             ` Dibyendu Majumdar
@ 2017-09-04 23:31             ` Christopher Li
  2017-09-05  0:55               ` Luc Van Oostenryck
  1 sibling, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-04 23:31 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 4, 2017 at 4:07 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> I have the same question as well. Once I found out that the Simple SSA
>> can't handle memory-to-register conversion through pointers, it already
>> meant we need to keep the other, non-local-variable path alive.
>
> Well ... we've already talked about this but
> - identifying which memory accesses can be converted or not is
>   orthogonal to the conversion itself

I am not sure that the Simple SSA can work on the CFG after the CFG has
been fully generated. Maybe it can.

> - the current implementation also has two different conversions:
>   * simplify_symbol_usage(), called early
>   * simplify_loads(), called later, several times
>   Both differ in what they can handle or not.

I am thinking at some point we might just do a proper Cytron et al.
conversion; then we might be able to discard those simplifications. They
are ad hoc anyway.
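
For reference, the core of that conversion is phi insertion over the
iterated dominance frontier. A toy, self-contained sketch (hand-built
CFG and hand-computed frontiers, nothing from sparse's code):

	#include <stdio.h>

	/* Phi placement for one variable on a hardcoded CFG:
	 * 0->{1,2,4}, 1->3, 2->3, 3->5, 4->5.
	 * df[b] is the dominance frontier of block b as a bitmask;
	 * computing dominance frontiers is a separate step. */
	#define NBLOCKS 6

	static const unsigned df[NBLOCKS] = { 0, 1<<3, 1<<3, 1<<5, 1<<5, 0 };
	static const unsigned defs = (1<<1) | (1<<2) | (1<<4);

	int main(void)
	{
		unsigned work = defs, queued = defs, hasphi = 0;

		while (work) {
			int b = __builtin_ctz(work);	/* pop one block */
			work &= work - 1;
			for (int d = 0; d < NBLOCKS; d++) {
				if (!(df[b] & (1u << d)) || (hasphi & (1u << d)))
					continue;
				hasphi |= 1u << d;	/* phi-node in block d */
				if (!(queued & (1u << d))) {
					queued |= 1u << d;
					work |= 1u << d; /* a phi is a new def */
				}
			}
		}
		for (int d = 0; d < NBLOCKS; d++)
			if (hasphi & (1u << d))
				printf("phi-node needed in .L%d\n", d);
		return 0;
	}

This prints .L3 and .L5, the two joins reached by different definitions.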


> The more I've looked at this, the more I realize that what *can*
> be done on local vars is most of the time unsuited for non-local
> ones, and what is *needed* for non-local ones is unneeded for
> local ones.

That is because you have already bought into the Simple SSA. The Cytron
et al. approach gives very similar treatment to local vs non-local vars: they
all need to find the DF and then insert the phi-nodes. Things like finding
the dominator tree will be the same for both local and non-local variables.

>> Right. The meet point is what the books also call the Dominance Frontier :-)
>
> Not necessarily. The dominance frontier is where the phi-nodes are
> strictly needed to guarantee the SSA property. You can put some other
> phi-nodes, trivial or not, at some other meet points, and this will
> give you correct SSA too (like dumb-SSA: put a phi-node for each variable
> at every meet point).

SSA, yes. Minimal SSA? No.
I assume we want minimal SSA?


>
> So no, the DF is only a subset of all the meet points.
> Here I was saying that some phi-nodes are not even put at a meet point.
> For example:
>         .L1:
>                 ...
>                 phisrc  %phi1, %r1
>                 br      .L3
>         .L2
>                 ...
>                 phisrc  %phi2, %r2
>                 br      .L3
>         .L3
>                 // phi-node should be here, it's a meet point
>                 ...
>                 br      .L4
>         .L4
>                 # lnop ...
>                 phi     %r4 <- %phi1, %phi2
>
> The method puts the phi-node at .L4 because it was in this block
> that the load/use was present, but it should have been put at .L3,

I think I gave an example very similar to this before. That is where
the current sparse does it wrong.

> the meet point. Often it's not a problem, which is why it hasn't been

It is a problem already, because at .L4 there is no incoming-edge
information like at .L3. At .L3 we can place the phi-node properly.
You need to place one phi-node at .L3 anyway; you can't do it at .L4
without placing one at .L3. The phi-node at .L3 is the minimal requirement;
.L4 is not required for minimal SSA.

> seen earlier (and probably few people took much time to look at
> the generated IR). It creates problems though once you:
> - merge some blocks: phi-conversion, try_to_simplify_bb(), ...
> - have undefined vars in the way


>> > - each phi-node has as many sources as there are definitions for this
>> >   variable (more or less) instead of having one for each parent.
>>
>> I am not sure I understand this. If you always place the phi-node at the
>> DF, it will have as many sources as incoming edges. That is part
>> of the common SSA dominance property. If we do it right, I can't
>> see why we would still need the phi-source instruction.
>
> Hehe, note how I said 'sources' and not 'phi-sources'. I just meant
> the arguments of the phi-nodes, their operands.



>
> Note that I was saying that, currently, even when the phi-node
> is correctly placed, it sometimes has more operands than predecessors
> because the current method sort of accumulates the operands of
> other dominating phi-nodes related to this variable.

Are you saying the current sparse does that, or that it is the right thing
to do? I think sparse currently might do that, but that is wrong.
It conflicts with the SSA dominance property. Every incoming edge
should have one definition; there should not be more than one
definition per incoming edge. Do you have an example?


> About the phi-sources, I agree strongly with this. To me these
> phi-sources are just a nuisance. Once things are in better
> shape, I'll see what can be done to get rid of them (quite a bit
> of the logic depends on them for the moment, though).
>
> To be fair, these phi-sources have their use for the moment, but
> as far as I can see it's only because we don't have a one-to-one
> mapping between the phi-node operands and the block's predecessors.
> In most notations, articles, ... you will see phi-nodes like:
>         phi     %r4 <- [.L1:%r1, .L2:%r2]
> to explicitly show the correspondence between each operand
> and each predecessor (internally, things can be implicit;
> it's what I have now, in fact).

Seems all good then.


>> That is exactly why we need instruction-level alias
>> analysis. The SSA conversion should be done after
>> the alias analysis.
>
> And it's here that the egg & chicken problem begins.
> To efficiently do the alias analysis, you'd better already
> have the IR in SSA form, have constant propagation already
> done, have dead code elimination already done, ...

That just means it needs more than one pass. Ideally, if
the optimization can be driven by incremental changes,
then we don't need to do too many unnecessary passes.

> And, it's here where doing first the simple SSA can be
> a good thing.
>
> Currently we have:
>         1) linearization
>         2) simple alias analysis + conversion of local loads +
>            conversion of other loads + simplification of
>            unneeded stores (simplify_symbol_usage())
>         3) simplification & CSE
>         4) more conversion of memory accesses (simplify_memops())
>         5) goto 3)
>
> With the Simple SSA code we have:
>         1) linearization & conversion of local loads & stores
>         2) reuse or not the broken simplify_symbol_usage() to
>            convert loads
>         3) simplification & CSE
>         4) reuse or not the broken simplify_memops()
>            or reuse only the non-broken part of it.
>         5) goto 3)
>
> What we would like to have (more or less):
>         1) linearization
>         1') conversion of local vars (maybe via Simple SSA)
>         2) first pass of simplifications & CSE
>         3) alias analysis
>         3') conversion of memory accesses using 3)
>         5) more simplification & CSE
>         6) goto 3)
> This corresponds to a very classical architecture.

I think it is a good direction to go after. I also think constant
propagation should be separated out from 2) and 5),
because there is a classic SSA algorithm to do it very
effectively.
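
(The classic algorithm I mean is Wegman & Zadeck's SCCP; its heart is a
three-level lattice meet, roughly like this invented sketch:)

	enum lat { TOP, CONST, BOTTOM };	/* unvisited/constant/varying */

	struct val { enum lat l; long long c; };

	/* Meet of two lattice values, as used on the operands of a
	 * phi-node: agreeing constants stay constant, disagreeing
	 * ones fall to BOTTOM (varying). */
	static struct val meet(struct val a, struct val b)
	{
		if (a.l == TOP)
			return b;
		if (b.l == TOP)
			return a;
		if (a.l == CONST && b.l == CONST && a.c == b.c)
			return a;
		return (struct val){ BOTTOM, 0 };
	}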

> Note: here above, I use 'local var' to mean '(local) var which can
>       safely be converted because it cannot be aliased to anything'.
> Note: there are a lot of possible alias analyses, varying greatly
>       in precision, complexity & run-time resources.
>       It's still very OK to me to have (first) something quite simple,
>       as we currently have.

Sure, it depends on how much this fast path can save, and how
complex this fast path is.

Chris


* Re: Simple SSA status
  2017-09-04 23:31             ` Christopher Li
@ 2017-09-05  0:55               ` Luc Van Oostenryck
  2017-09-05  3:28                 ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-05  0:55 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 07:31:02PM -0400, Christopher Li wrote:
> On Mon, Sep 4, 2017 at 4:07 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >> I have the same question as well. Once I found out that the Simple SSA
> >> can't handle memory-to-register conversion through pointers, it already
> >> meant we need to keep the other, non-local-variable path alive.
> >
> > Well ... we've already talked about this but
> > - identifying which memory accesses can be converted or not is
> >   orthogonal to the conversion itself
> 
> I am not sure that the Simple SSA can work on the CFG after the CFG has
> been fully generated. Maybe it can.

It can, and it has pros & cons:
- pro: the CFG is known in advance, so no complication with gotos
- con: for non-goto blocks, during linearization we know when
	all the predecessors have already been processed.
	If done after linearization, this info needs to be somehow
	deduced (but it's just a matter of keeping a flag).
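
Roughly, the flag would work like this (an invented sketch, none of it
sparse's actual code):

	struct pending_phi {
		struct pending_phi *next;
		void *phi;			/* an incomplete phi-node */
	};

	struct bb_state {
		int sealed;		/* all predecessors processed? */
		struct pending_phi *incomplete;
	};

	static void complete_phi(void *phi)
	{
		/* stub: look the variable up in every predecessor and
		 * fill in this phi-node's operands */
	}

	static void seal_block(struct bb_state *bb)
	{
		struct pending_phi *p;

		bb->sealed = 1;
		for (p = bb->incomplete; p; p = p->next)
			complete_phi(p->phi);
		bb->incomplete = NULL;
	}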
 
> > - the current implementation also has two different conversions:
> >   * simplify_symbol_usage(), called early
> >   * simplify_loads(), called later, several times
> >   Both differ in what they can handle or not.
> 
> I am thinking at some point we might just do a proper Cytron et al.
> conversion; then we might be able to discard those simplifications. They
> are ad hoc anyway.

If you look at the literature you will see a few things:
- either the article doesn't talk at all about which variables are
  processed, so the alias problem is ignored and implicitly they
  talk only about local vars (whose address is never taken in any way)
- or they explicitly say that for this or that reason SSA is only
  well suited for local vars (whose address is never taken in any way)
- only rarely is there any mention that some alias analysis must be
  done *before* the conversion, to select which variables can be converted

So, if it is a matter of doing Cytron et al. or anything else but only on
local vars, I largely prefer the Simple SSA way. And this still leaves
open the problem for the other variables.

> > The more I've looked at this, the more I realize that what *can*
> > be done on local vars is most of the time unsuited for non-local
> > ones, and what is *needed* for non-local ones is unneeded for
> > local ones.
> 
> That is because you have already bought into the Simple SSA.

Not really. I'm appealing to the Simple SSA because with it we don't
have to create those phony stores & loads for local variables and
destroy them just after. Local variables are not memory, don't behave
like memory, and are not concerned by aliasing, so to me it's the right
thing not to use stores & loads on them.
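
A small invented illustration of the distinction:

	int example(int c)
	{
		int n = 1;	/* never aliased: a pure pseudo candidate */
		int m = 2;	/* its address escapes below */
		int *p = &m;

		if (c)
			n = 3;	/* renaming 'n' needs no load/store at all */
		*p = 4;		/* may redefine 'm'; can never touch 'n' */
		return n + m;	/* 'm' must stay a load until alias analysis
				 * proves what '*p' can point to */
	}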

> The Cytron
> et al. approach gives very similar treatment to local vs non-local vars: they
> all need to find the DF and then insert the phi-nodes. Things like finding
> the dominator tree will be the same for both local and non-local variables.

Yes and no.
When you're talking about the dominator tree, it's just that: the dominance
of the BBs in the CFG. What really matters for the SSA conversion is
not this dominance, it's the dominance that variable definitions have
over variable uses.
Of course, the latter depends on the former, but the problem for non-local
variables is not the same as the one for local variables: you need to take
into account the presence of calls, stores to unrelated vars (and non-vars),
... You can consider this as being part of the alias analysis if you
wish, but ...

>> Right. The meet point is what the books also call the Dominance Frontier :-)
> >
> > Not necessarily. The dominance frontier is where the phi-nodes are
> > strictly needed to guarantee the SSA property. You can put some other
> > phi-nodes, trivial or not, at some other meet points, and this will
> > give you correct SSA too (like dumb-SSA: put a phi-node for each variable
> > at every meet point).
> 
> SSA, yes. Minimal SSA? No.
> I assume we want minimal SSA?

We certainly don't want dumb-SSA.
Some superfluous phi-nodes in irreducible regions are not a problem.
Anyway, this 'minimal' SSA is only minimal in a very narrow sense, and
you can have lots of unneeded phi-nodes and still be minimal-SSA.

> 
> >
> > So no, the DF is only a subset of all the meet points.
> > Here I was saying that some phi-nodes are not even put at a meet point.
> > For example:
> >         .L1:
> >                 ...
> >                 phisrc  %phi1, %r1
> >                 br      .L3
> >         .L2
> >                 ...
> >                 phisrc  %phi2, %r2
> >                 br      .L3
> >         .L3
> >                 // phi-node should be here, it's a meet point
> >                 ...
> >                 br      .L4
> >         .L4
> >                 # lnop ...
> >                 phi     %r4 <- %phi1, %phi2
> >
> > The method puts the phi-node at .L4 because it was in this block
> > that the load/use was present, but it should have been put at .L3,
> 
> I think I gave an example very similar to this before. That is where
> the current sparse does it wrong.
> 
> > the meet point. Often it's not a problem, which is why it hasn't been
> 
> It is a problem already, because at .L4 there is no incoming-edge
> information like at .L3. At .L3 we can place the phi-node properly.
> You need to place one phi-node at .L3 anyway; you can't do it at .L4
> without placing one at .L3. The phi-node at .L3 is the minimal requirement;
> .L4 is not required for minimal SSA.

Yes yes yes, it's wrong. What I meant by "often it's not a problem"
is that currently, with the help of the phisources, the code can often
deal with it and do something sensible.
This doesn't make it less wrong, and there are also enough situations
where the code can't possibly do anything sensible with it.

> > seen earlier (and probably few people took much time to look at
> > the generated IR). It creates problems though once you:
> > - merge some blocks: phi-conversion, try_to_simplify_bb(), ...
> > - have undefined vars in the way
> 
> 
> >> > - each phi-node has as many sources as there are definitions for this
> >> >   variable (more or less) instead of having one for each parent.
> >>
> >> I am not sure I understand this. If you always place the phi-node at the
> >> DF, it will have as many sources as incoming edges. That is part
> >> of the common SSA dominance property. If we do it right, I can't
> >> see why we would still need the phi-source instruction.
> >
> > Hehe, note how I said 'sources' and not 'phi-sources'. I just meant
> > the arguments of the phi-nodes, their operands.
> 
> 
> 
> >
> > Note that I was saying that, currently, even when the phi-node
> > is correctly placed, it sometimes has more operands than predecessors
> > because the current method sort of accumulates the operands of
> > other dominating phi-nodes related to this variable.
> 
> Are you saying the current sparse does that, or that it is the right thing
> to do? I think sparse currently might do that, but that is wrong.

Yes, sparse currently does that and it's very, very wrong: even more
wrong than the misplacement of phi-nodes. The code can never do anything
sensible with it.

> It conflicts with the SSA dominance property. Every incoming edge
> should have one definition; there should not be more than one
> definition per incoming edge.

Currently, there is no relation between the phi-node operands and the
incoming edges. The only relation that exists (and that the code
needs and cares about) is the one between phi-nodes and their
phi-sources.

> Do you have an example?

I have lots of examples :)
In a few days I'll submit a series of test cases and some will
exhibit this. A simple example is:
	#define TEST(N)			\
		do {			\
			d = b + a[N];	\
			if (d < b)	\
				c++;	\
			b = d;		\
		} while (0)
	
	int foo(int *a, int b, int c)
	{
		int d;
	
		TEST(0);
		TEST(1);
		TEST(2);
	
		return d + c;
	}

Here you will have 3 phi-nodes (OK, there are 3 ifs), the first with
2 sources (OK), the second with 3 and the third with 4 sources, while
all nodes have 2 parents or less.
Even more interestingly, if you add another TEST(x), you will have
another phi-node with ... 5 sources, and so on. One more source for
each TEST(x) invocation, so yes, this gives a nice quadratic processing
time and memory consumption.

> > About the phi-sources, I agree strongly with this. To me these
> > phi-sources are just a nuisance. Once things are in better
> > shape, I'll see what can be done to get rid of them (quite a bit
> > of the logic depends on them for the moment, though).
> >
> > To be fair, these phi-sources have their use for the moment, but
> > as far as I can see it's only because we don't have a one-to-one
> > mapping between the phi-node operands and the block's predecessors.
> > In most notations, articles, ... you will see phi-nodes like:
> >         phi     %r4 <- [.L1:%r1, .L2:%r2]
> > to explicitly show the correspondence between each operand
> > and each predecessor (internally, things can be implicit;
> > it's what I have now, in fact).
> 
> Seems all good then.
> 
> 
> >> That is exactly why we need instruction-level alias
> >> analysis. The SSA conversion should be done after
> >> the alias analysis.
> >
> > And it's here that the egg & chicken problem begins.
> > To efficiently do the alias analysis, you'd better already
> > have the IR in SSA form, have constant propagation already
> > done, have dead code elimination already done, ...
> 
> That just means it needs more than one pass. Ideally, if
> the optimization can be driven by incremental changes,
> then we don't need to do too many unnecessary passes.
> 
> > And, it's here where doing first the simple SSA can be
> > a good thing.
> >
> > Currently we have:
> >         1) linearization
> >         2) simple alias analysis + conversion of local loads +
> >            conversion of other loads + simplification of
> >            unneeded stores (simplify_symbol_usage())
> >         3) simplification & CSE
> >         4) more conversion of memory accesses (simplify_memops())
> >         5) goto 3)
> >
> > With the Simple SSA code we have:
> >         1) linearization & conversion of local loads & stores
> >         2) reuse or not the broken simplify_symbol_usage() to
> >            convert loads
> >         3) simplification & CSE
> >         4) reuse or not the broken simplify_memops()
> >            or reuse only the non-broken part of it.
> >         5) goto 3)
> >
> > What we would like to have (more or less):
> >         1) linearization
> >         1') conversion of local vars (maybe via Simple SSA)
> >         2) first pass of simplifications & CSE
> >         3) alias analysis
> >         3') conversion of memory accesses using 3)
> >         5) more simplification & CSE
> >         6) goto 3)
> > This corresponds to a very classical architecture.
> 
> I think it is a good direction to go after. I also think constant
> propagation should be separated out from 2) and 5),
> because there is a classic SSA algorithm to do it very
> effectively.

It was implicitly included in 'simplification', which can contain all
sorts of optimizations. And yes, sure, we want SCCP for this.
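
For reference, the heart of SCCP is just a three-level lattice and a
meet operator over it. A minimal C sketch (the types and names here
are mine, not sparse's):

    /* lattice for SCCP (Wegman & Zadeck): TOP = not yet known,
     * CONST = a single known constant, BOTTOM = varies at run time */
    enum lat_kind { TOP, CONST, BOTTOM };

    struct lat {
        enum lat_kind kind;
        long long value;        /* meaningful only when kind == CONST */
    };

    /* meet: combine the values reaching a phi-node along the
     * incoming edges already marked executable */
    static struct lat meet(struct lat a, struct lat b)
    {
        struct lat bottom = { BOTTOM, 0 };

        if (a.kind == TOP)
            return b;
        if (b.kind == TOP)
            return a;
        if (a.kind == CONST && b.kind == CONST && a.value == b.value)
            return a;
        return bottom;
    }

The rest of the algorithm is two worklists (one of CFG edges, one of
SSA edges) and a per-instruction transfer function, re-evaluated until
a fixed point is reached.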
 
> > Note: here above, I use 'local var' to mean '(local) var which can
> >       safely be converted because it cannot be aliased to anything'
> > Note: there are a lot of possible alias analyses, varying greatly
> >       in precision, complexity & run-time resources.
> >       It's still very OK to me to have (first) something quite simple
> >       as we currently have.
> 
> Sure, it depends on how much this fast path can save, and how
> complex this fast path is.

Sure. 


-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-05  0:55               ` Luc Van Oostenryck
@ 2017-09-05  3:28                 ` Christopher Li
  2017-09-07  2:03                   ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-05  3:28 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 4, 2017 at 8:55 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> It can and it has pro & cons
> - pro: the CFG is known in advance, so no complication with gotos
> - cons: for non-goto blocks during linearization we know when
>         all the predecessors have already been processed.
>         If done after linearization, this info needs to be somehow
>         deduced (but it's just keeping a flag).

For non-goto blocks the dominator tree is easy.

>>
>> I am thinking at some point we might just do a proper Cytron et al., then
>> we might be able to discard those simplifications. Those simplifications
>> are ad hoc anyway.
>
> If you look at the literature you will see a few things:
> - either the article doesn't talk at all about which variables are
>   processed, so the alias problem is ignored and implicitly they
>   talk only about local vars (whose address is never taken in any way)
> - or they explicitly say that for this or that reason SSA is only
>   well suited for local vars (whose address is never taken in any way)
> - only rarely is there any mention that some alias analysis must be
>   done *before* the conversion to select which variables can be converted
>
> So, if it is to do Cytron or something else but only on local vars, I largely
> prefer the Simple SSA way. And this still leaves open the problem
> for the other variables.

I mean doing Cytron et al. for all memory stores/loads, not just
limited to local variables. Yes, it will need the help of alias analysis.

> Of course, the latter depends on the former but the problem for non-local
> variables is not the same as the one for local variables: you need to take
> into account the presence of calls, stores to unrelated vars (and non-vars),
> ... You can consider this as being part of the alias analysis if you
> wish but ...

Local variables can have their address taken and passed around as well.
If you only look at local variables whose address has never been taken,
yes, those are different from the rest. But that difference is largely due
to how you select the set. If you consider the more general case, a local
variable can have its address taken, and then it is just like any other
memory storage.
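
A two-line illustration (sink() is a made-up external function that
may keep the pointer or store through it):

    extern void sink(int *p);

    int g(void)
    {
        int x = 1;
        sink(&x);   /* the address escapes: sink() may store to x */
        return x;   /* so this cannot be folded to 1 */
    }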

> Yes yes yes, it's wrong. What I meant by "often it's not a problem"
> is that currently, with the help of the phi-sources, often the code
> can deal with it and do something sensible.
> This doesn't make it less wrong and there are also enough situations
> where the code can't possibly do anything sensible with it.

Yes, I think we have the same view on this.

>> It conflicts with the SSA dominance property. Every incoming edge
>> should have one definition. It should not have more than one
>> definition per incoming edge.
>
> Currently, there is no relation between phi-node operands and
> incoming edges. The only relation that exists (and that the code
> needs and cares about) is the one between phi-nodes and their
> phi-sources.
>
>> Do you have an example?
>
> I have lots of examples :)
> In a few days I'll submit a series of test cases and some will
> exhibit this.  A simple example is:
>         #define TEST(N)                 \
>                 do {                    \
>                         d = b + a[N];   \
>                         if (d < b)      \
>                                 c++;    \
>                         b = d;          \
>                 } while (0)
>
>         int foo(int *a, int b, int c)
>         {
>                 int d;
>
>                 TEST(0);
>                 TEST(1);
>                 TEST(2);
>
>                 return d + c;
>         }
>
> Where you will have 3 phi-nodes (OK, there are 3 ifs), the first with
> 2 sources (OK), the second with 3 and the third with 4 sources, while
> all nodes have 2 parents or less.
> Even more interestingly, if you add another TEST(x), you will have
> another phi-node with ... 5 sources and so on. One more source for
> each TEST(x) invocation, so yes, this gives a nice quadratic processing
> time and memory consumption.

Yes, we are talking about how things are done wrong right now,
not how things should be.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-05  3:28                 ` Christopher Li
@ 2017-09-07  2:03                   ` Luc Van Oostenryck
  2017-09-07  2:15                     ` Linus Torvalds
  2017-09-07  2:20                     ` Christopher Li
  0 siblings, 2 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  2:03 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Mon, Sep 04, 2017 at 11:28:07PM -0400, Christopher Li wrote:
> On Mon, Sep 4, 2017 at 8:55 PM, Luc Van Oostenryck wrote:
> >
> > It can and it has pro & cons
> > - pro: the CFG is known in advance, so no complication with gotos
> > - cons: for non-goto blocks during linearization we know when
> >         all the predecessors have already been processed.
> >         If done after linearization, this info needs to be somehow
> >         deduced (but it's just keeping a flag).
> 
> For non-goto blocks the dominator tree is easy.

Mmmm, I think you're confused.
The dominator tree is a function of the whole graph, not just
some blocks. Also, after linearization, there are no more for(),
while() & do-while() loops or gotos: all CFG edges are alike.

What was written here above is that, *during linearization*,
when you have a branch from a structured-programming statement,
you know all the possible parents of the current block.
For branches coming from gotos, it's not the case.
This has nothing to do with the dominance tree as such.

What is trivial about the dominator tree is when it is done
on an acyclic graph (you just have to do a DFS and voilà)
but that's independent of the presence or absence of gotos.

[Of course, the presence of gotos *can* create irreducible graphs
 and for those calculating their dominance tree *can* take a bit
 more time, but that's all.]

> > Of course, the latter depends on the former but the problem for non-local
> > variables is not the same as the one for local variables: you need to take
> > into account the presence of calls, stores to unrelated vars (and non-vars),
> > ... You can consider this as being part of the alias analysis if you
> > wish but ...
> 
> Local variables can have their address taken and passed around as well.

I suppose you hadn't seen the note I added:
> > Note: here above, I use 'local var' to mean '(local) var which can
> >       safely be converted because it cannot be aliased to anything'
This, of course, excludes variables which had their address (explicitly)
taken but also things like:
- arrays
- structures & unions
- global variables
- EQUIVALENCE declarations in fortran
- variables with alias attribute (as gcc extension)
- ...

-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:03                   ` Luc Van Oostenryck
@ 2017-09-07  2:15                     ` Linus Torvalds
  2017-09-07  2:55                       ` Christopher Li
  2017-09-07  3:05                       ` Luc Van Oostenryck
  2017-09-07  2:20                     ` Christopher Li
  1 sibling, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2017-09-07  2:15 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Christopher Li, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 7:03 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> What was written here above is that, *during linearization*,
> when you have a branch from a structured-programming statement,
> you know all the possible parents of the current block.
> For branches coming from gotos, it's not the case.

Even this is not the case.

Not for a C compiler, it isn't.

Sure, if you do compilers for toy languages, it might be, but a C
compiler has things like "switch()" statements that may not nest with
loops etc. So even without a goto, you can pretty much have arbitrary
flow.

So I think people should entirely stop thinking that "goto" is somehow
special. It isn't. It is only special in toy languages like Pascal,
not in the real world.

It's much better to get rid of that "goto is special" mindset, and
instead say "all control flow is just a goto in the end".

So yes, get rid of the *real* special cases - those structured flow
constructs. They are nice special cases, but they are nice for
*humans*, they aren't actually relevant to computers, and they
shouldn't be.

So the real special case is those simple "structured" loops. You
should never design around them, because it will just hurt later.

Turn those "structured loop" constructs into goto's as quickly as you
can, and just forget about the structure that they really never
enforced anyway.

                   Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:03                   ` Luc Van Oostenryck
  2017-09-07  2:15                     ` Linus Torvalds
@ 2017-09-07  2:20                     ` Christopher Li
  2017-09-07  3:46                       ` Luc Van Oostenryck
  1 sibling, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07  2:20 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Wed, Sep 6, 2017 at 10:03 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Mmmm, I think you're confused.
> The dominator tree is a function of the whole graph, not just
> some blocks. Also, after linearization, there are no more for(),
> while() & do-while() loops or gotos: all CFG edges are alike.

I am not confused. I might have jumped ahead too fast in my sentence.

> What was written here above is that, *during linearization*,
> when you have a branch from a structured-programming statement,
> you know all the possible parents of the current block.
> For branches coming from gotos, it's not the case.
> This has nothing to do with the dominance tree as such.

I am referring to the "A Simple, Fast Dominance Algorithm" paper.
If there is no goto, the function's CFG is a reducible graph.
For a reducible graph that algorithm converges in a constant number
of iterations. So finding the dominator tree is pretty much linear
if the graph is reducible.
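
The heart of that algorithm is tiny. Here is a sketch of it in C,
assuming the blocks are already numbered 0..n-1 in reverse postorder
(the data layout and names are mine, this is not sparse code):

    /* walk both candidates up the current idom guesses until they
     * meet; with RPO numbering, an idom always has a smaller index */
    static int intersect(const int *idom, int b1, int b2)
    {
        while (b1 != b2) {
            while (b1 > b2)
                b1 = idom[b1];
            while (b2 > b1)
                b2 = idom[b2];
        }
        return b1;
    }

    /* preds[b] holds the npreds[b] predecessors of block b;
     * on return, idom[b] is the immediate dominator of b */
    void compute_idoms(int n, const int npreds[], int *const preds[],
                       int idom[])
    {
        int b, i, changed = 1;

        for (b = 0; b < n; b++)
            idom[b] = -1;               /* not processed yet */
        idom[0] = 0;                    /* the entry block */

        while (changed) {
            changed = 0;
            for (b = 1; b < n; b++) {   /* reverse postorder */
                int new_idom = -1;

                for (i = 0; i < npreds[b]; i++) {
                    int p = preds[b][i];
                    if (idom[p] < 0)
                        continue;
                    new_idom = (new_idom < 0) ? p
                             : intersect(idom, p, new_idom);
                }
                if (idom[b] != new_idom) {
                    idom[b] = new_idom;
                    changed = 1;
                }
            }
        }
    }

For a reducible CFG the outer while loop stabilizes after a couple of
passes; that is the constant number of iterations mentioned above.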

> [Of course, the presence of gotos *can* create irreducible graphs
>  and for those calculating their dominance tree *can* take a bit
>  more time, but that's all.]

Right, irreducible graphs are the troublemakers that account for
the nasty worst-case performance.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:15                     ` Linus Torvalds
@ 2017-09-07  2:55                       ` Christopher Li
  2017-09-07  3:17                         ` Luc Van Oostenryck
  2017-09-07  3:05                       ` Luc Van Oostenryck
  1 sibling, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07  2:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Luc Van Oostenryck, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 10:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Even this is not the case.
>
> Not for a C compiler, it isn't.
>
> Sure, if you do compilers for toy languages, it might be, but a C
> compiler has things like "switch()" statements that may not nest with
> loops etc. So even without a goto, you can pretty much have arbitrary
> flow.

I am trying to think whether, without goto statements, we can create an
irreducible graph. For a reducible graph, finding the dominators has a
fast path. One way to look at the Simple SSA paper is that it is taking
advantage of that fast path. The slow path is actually worse than that
of other algorithms.

I originally thought we had to use goto to create an irreducible graph.
Now that you mention it, you are right: by creatively mixing switch
statements and while loops, someone can create an irreducible graph
without goto.
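
The classic instance is Duff's device, where the switch jumps into the
middle of the loop body (assumes count > 0):

    /* the case labels give the do-while body several entry points,
     * so the loop is a cycle with more than one entry: irreducible */
    void copy(int *to, const int *from, int count)
    {
        int n = (count + 3) / 4;

        switch (count % 4) {
        case 0: do {    *to++ = *from++;
        case 3:         *to++ = *from++;
        case 2:         *to++ = *from++;
        case 1:         *to++ = *from++;
                } while (--n > 0);
        }
    }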

> So I think people should entirely stop thinking that "goto" is somehow
> special. It isn't. It is only special in toy languages like Pascal,
> not in the real world.

Goto can introduce irreducible graphs. That is the only
special part. The compiler of course needs to handle all kinds of
CFGs.

> It's much better to get rid of that "goto is special" mindset, and
> instead say "all control flow is just a goto in the end".
>
> So yes, get rid of the *real* special cases - those structured flow
> constructs. They are nice special cases, but they are nice for
> *humans*, they aren't actually relevant to computers, and they
> shouldn't be.

The only part relevant to the computer is whether a fast path exists or not.

> So the real special case is those simple "structured" loops. You
> should never design around them, because it will just hurt later.

In a sense that is why I like Cytron et al. for the SSA conversion
over Simple SSA. Cytron et al. does not have a special case for
non-structured CFGs. If the CFG is reducible, the algorithm will
naturally run faster.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:15                     ` Linus Torvalds
  2017-09-07  2:55                       ` Christopher Li
@ 2017-09-07  3:05                       ` Luc Van Oostenryck
  1 sibling, 0 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  3:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christopher Li, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 06, 2017 at 07:15:13PM -0700, Linus Torvalds wrote:
> On Wed, Sep 6, 2017 at 7:03 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > What was written here above is that, *during linearization*,
> > when you have a branch from a structured-programming statement,
> > you know all the possible parents of the current block.
> > For branches coming from gotos, it's not the case.
> 
> Even this is not the case.
> 
> Not for a C compiler, it isn't.
> 
> Sure, if you do compilers for toy languages, it might be, but a C
> compiler has things like "switch()" statements that may not nest with
> loops etc. So even without a goto, you can pretty much have arbitrary
> flow.

Yes, sure. But what I wrote still holds because, even for such cases,
the linearization creates trivial and useless BBs, and for each of those
you know where the possible parents are and when their linearization
is finished while doing the linearization of each structured construct
of the AST. For gotos & computed gotos, it's not the case.

It doesn't matter in itself, but it matters for the Simple SSA method
because the processing of BBs for which you don't yet know all the
parents must be delayed (it's the price you have to pay for being able
to do things mostly on the fly). This delaying happens everywhere you
have some kind of back-edge, but knowing when all parents are done means
that you also know when you can 'un-delay' things (the article calls
this 'sealing' the block) and it's advantageous to do that as soon as
possible.

So for this Simple SSA method, gotos are kinda annoying and computed
gotos even more so.
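
In the article's terms it boils down to something like this (a rough
sketch; the types and helpers here are made up, not sparse's):

    struct phi;
    struct phi_list {
        struct phi *phi;
        struct phi_list *next;
    };

    struct bb {
        int sealed;                     /* all parents known? */
        struct phi_list *incomplete;    /* placeholder phis created
                                         * while the block was unsealed */
        /* parents, instructions, ... */
    };

    extern void add_phi_operands(struct phi *phi, struct bb *bb);

    /* called as soon as the last parent of bb has been linearized;
     * for structured statements that point is known during the
     * linearization itself, for (computed) gotos it must be deduced */
    void seal_block(struct bb *bb)
    {
        struct phi_list *p;

        for (p = bb->incomplete; p; p = p->next)
            add_phi_operands(p->phi, bb);
        bb->incomplete = NULL;
        bb->sealed = 1;
    }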

Anyway, I've put this method on hold for the moment, focusing on
the conversion of non-local variables (which Simple SSA can't do)
and several related issues.

> So I think people should entirely stop thinking that "goto" is somehow
> special. It isn't. It is only special in toy languages like Pascal,
> not in the real world.
> 
> It's much better to get rid of that "goto is special" mindset, and
> instead say "all control flow is just a goto in the end".
> 
> So yes, get rid of the *real* special cases - those structured flow
> constructs. They are nice special cases, but they are nice for
> *humans*, they aren't actually relevant to computers, and they
> shouldn't be.
> 
> So the real special case is those simple "structured" loops. You
> should never design around them, because it will just hurt later.
> 
> Turn those "structured loop" constructs into goto's as quickly as you
> can, and just forget about the structure that they really never
> enforced anyway.

Totally agree and it's what's done (I suppose by you and Chris,
years ago) and will stay as such: directly after linearization
you just have BBs with the list of parents & children.

-- Luc Van Oostenryck

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:55                       ` Christopher Li
@ 2017-09-07  3:17                         ` Luc Van Oostenryck
  2017-09-07  4:04                           ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  3:17 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 06, 2017 at 10:55:43PM -0400, Christopher Li wrote:
> On Wed, Sep 6, 2017 at 10:15 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Even this is not the case.
> >
> > Not for a C compiler, it isn't.
> >
> > Sure, if you do compilers for toy languages, it might be, but a C
> > compiler has things like "switch()" statements that may not nest with
> > loops etc. So even without a goto, you can pretty much have arbitrary
> > flow.
> 
> I am trying to think whether, without goto statements, we can create an
> irreducible graph. For a reducible graph, finding the dominators has a fast path.

Bah ...
Depending on the algorithm you use, irreducible graphs may indeed be a bit
faster to calculate but since you don't know in advance if your graph is
reducible or not, you can't take advantage of that. So, at least to me,
this doesn't make it a fast path.

> One way to look at the Simple SSA paper is that it is taking advantage
> of that fast path.

No. It takes advantage at every point where it is known that all the parents
have already been processed. That's all.

The part where reducibility matters is after the conversion is done, if
you wish to simplify what they call trivial phi-nodes. And again, since
you don't know in advance if your graph is reducible or not, there is no
fast path.

-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  2:20                     ` Christopher Li
@ 2017-09-07  3:46                       ` Luc Van Oostenryck
  2017-09-07  4:02                         ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  3:46 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 4:20 AM, Christopher Li <sparse@chrisli.org> wrote:
>
> I am referring to the "A Simple, Fast Dominance Algorithm" paper.
> ...
> For a reducible graph that algorithm converges in a constant number
> of iterations.

Where the 'constant' is a function of the loop complexity (maximum
nesting level) ...

-- Luc Van Oostenryck

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  3:46                       ` Luc Van Oostenryck
@ 2017-09-07  4:02                         ` Christopher Li
  2017-09-07  4:24                           ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07  4:02 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:

>> For a reducible graph that algorithm converges in a constant number
>> of iterations.
>
> Where the 'constant' is a function of the loop complexity (maximum
> nesting level) ...

I think we are talking about different things.

My iteration means one pass over the blocks in reverse postorder.
In the Keith Cooper et al. paper, at the end of the first page:
quote "
show that reverse postorder iterative scheme solves these
equations in a single pass over the CFG for reducible graphs.
"
Page 2, reference 19, also mentions this being solved in linear time
on reducible graphs.

It is just linear in the size of the CFG if the graph is reducible.
I don't recall it being related to the maximum nesting level of loops.

For the Simple SSA paper, maybe.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  3:17                         ` Luc Van Oostenryck
@ 2017-09-07  4:04                           ` Christopher Li
  2017-09-07  4:49                             ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07  4:04 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Wed, Sep 6, 2017 at 11:17 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Bah ...
> Depending on the algorithm you use, irreducible graphs may indeed be a bit
> faster to calculate but since you don't know in advance if your graph is
 ^^^^^^^^
You mean slower here.

> reducible or not, you can't take advantage of that. So, at least to me,
> this doesn't make it a fast path.

Well, if the graph is not reducible, it will need an extra step to
get minimal SSA; that is what I consider the slow path.

For Cytron et al., the slowest part is finding the dominator tree.
When using the Keith Cooper paper, a reducible graph will naturally
converge faster. You don't need to know in advance whether it is
reducible or not to benefit from it.

>
>>> One way to look at the Simple SSA paper is that it is taking advantage
>>> of that fast path.
>
> No. It takes advantage at every point where it is known that all the parents
> have already been processed. That's all.

That is only half of the job. To get minimal SSA there is an extra step
to remove duplicate phi-nodes. That is quadratic, if I recall correctly.

> The part where reducibility matters is after the conversion is done, if
> you wish to simplify what they call trivial phi-nodes. And again, since
> you don't know in advance if your graph is reducible or not, there is no
> fast path.

So you are saying that in order to get minimal SSA, you need to run the
extra phi-node removal even if the SSA is already minimal? Because we
don't know whether it is reducible or not.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  4:02                         ` Christopher Li
@ 2017-09-07  4:24                           ` Luc Van Oostenryck
  2017-09-07  4:33                             ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  4:24 UTC (permalink / raw)
  To: Christopher Li; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 6:02 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>
>>> For a reducible graph that algorithm converges in a constant number
>>> of iterations.
>>
>> Where the 'constant' is a function of the loop complexity (maximum
>> nesting level) ...
>
> I think we are talking about different things.

Since you're talking performance and complexity, I suggest that you read
carefully the section "Complexity Analysis" of the article and look where
it's written 'd(G)' and 'loop connectedness'.

-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  4:24                           ` Luc Van Oostenryck
@ 2017-09-07  4:33                             ` Christopher Li
  0 siblings, 0 replies; 33+ messages in thread
From: Christopher Li @ 2017-09-07  4:33 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Dibyendu Majumdar, Linux-Sparse, Linus Torvalds

On Thu, Sep 7, 2017 at 12:24 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 6:02 AM, Christopher Li <sparse@chrisli.org> wrote:
>> On Wed, Sep 6, 2017 at 11:46 PM, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>>
>>>> For a reducible graph that algorithm converges in a constant number
>>>> of iterations.

My statement is regarding *reducible* graphs.

>>>
>>> Where the 'constant' is a function of the loop complexity (maximum
>>> nesting level) ...
>>
>> I think we are talking about different things.
>
> Since you're talking performance and complexity, I suggest that you read
> carefully the section "Complexity Analysis" of the article and look where
> it's written 'd(G)' and 'loop connectedness'.

I assume you are talking about page 7.  The 'd(G)' and the "loop
connectedness" are actually referring to *irreducible* graphs. Figure 4
is a classic *irreducible* graph. That is why I feel we are talking
about different things.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  4:04                           ` Christopher Li
@ 2017-09-07  4:49                             ` Luc Van Oostenryck
  2017-09-07  6:17                               ` Christopher Li
  0 siblings, 1 reply; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  4:49 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 6:04 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Sep 6, 2017 at 11:17 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> Bah ...
>> Depending on the algorithm you use, irreducible graphs may indeed be a bit
>> faster to calculate but since you don't know in advance if your graph is
>  ^^^^^^^^
> You mean slower here.

Of course.

>> reducible or not, you can't take advantage of that. So, at least to me,
>> this doesn't make it a fast path.
>
>> Well, if the graph is not reducible, it will need an extra step to
>> get minimal SSA; that is what I consider the slow path.
>>
>> For Cytron et al., the slowest part is finding the dominator tree.
>> When using the Keith Cooper paper, a reducible graph will naturally
>> converge faster. You don't need to know in advance whether it is
>> reducible or not to benefit from it.

OK, it's just faster on reducible graphs.

>>
>>> One way to look at the Simple SSA paper is that it is taking advantage
>>> of that fast path.
>>
>> No. It takes advantage at every point where it is known that all the parents
>> have already been processed. That's all.
>
> That is only half of the job.

No, it's the main job.

>> To get minimal SSA there is an extra step
>> to remove duplicate phi-nodes. That is quadratic, if I recall correctly.

Quadratic in what?
If you have plenty of phi-nodes with plenty of operands, you will have
more work to do, irrespective of the size of the graph and its reducibility.

Also, even if reducibility is a property of the whole graph, it's not really
what matters here because each phi-chain is to be taken independently.
In other words, what matters here is whether the *subgraph* involved in the
phi-chain is reducible or not. So even if the graph is irreducible, you can
have 99% of the phis behave as if the graph were reducible.

Anyway, this minimal SSA and irreducibility is a nice theoretical question
you seem a bit obsessed with. I absolutely don't care about it because in
practice it won't make any noticeable difference.
OTOH, there is a huge number of things that can be done on sparse
which *make* a lot of difference.

>> The part where reducibility matters is after the conversion is done, if
>> you wish to simplify what they call trivial phi-nodes. And again, since
>> you don't know in advance if your graph is reducible or not, there is no
>> fast path.
>
>> So you are saying that in order to get minimal SSA, you need to run the
>> extra phi-node removal even if the SSA is already minimal? Because we
>> don't know whether it is reducible or not.

Yes and no.
There are a lot of cases where you can simplify away the phi-nodes in
simple ways.  For the remaining ones, yes, most probably.

[Warning, I'm only talking here about the situation with Simple SSA;
with the classic methods you have minimal SSA almost by definition, but
again 'minimal SSA' is 'minimal' only in this very narrow sense and it doesn't
mean that no phi-nodes can be removed. For example, I'm pretty
sure that what the "Simple SSA" article calls 'trivial phi' *can* also
happen even in minimal SSA].
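
For reference, detecting and removing such a redundant phi is cheap.
A sketch along the lines of the article's tryRemoveTrivialPhi, with
made-up types and helpers (this is not sparse's IR):

    struct value;
    struct phi {
        struct value *target;
        struct value **operands;        /* one per phi-source */
        int nr_operands;
    };

    extern void replace_all_uses(struct value *old, struct value *rep);
    extern void kill_phi(struct phi *phi);

    /* a phi is trivial if every operand is either one same value
     * or a reference to the phi itself */
    struct value *try_remove_trivial_phi(struct phi *phi)
    {
        struct value *same = NULL;
        int i;

        for (i = 0; i < phi->nr_operands; i++) {
            struct value *op = phi->operands[i];
            if (op == same || op == phi->target)
                continue;               /* duplicate or self-cycle */
            if (same)
                return phi->target;     /* merges two values: keep it */
            same = op;
        }
        /* same == NULL would mean unreachable or undefined; the full
         * algorithm also revisits the users of the removed phi */
        replace_all_uses(phi->target, same);
        kill_phi(phi);
        return same;
    }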

-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  4:49                             ` Luc Van Oostenryck
@ 2017-09-07  6:17                               ` Christopher Li
  2017-09-07  7:38                                 ` Luc Van Oostenryck
  0 siblings, 1 reply; 33+ messages in thread
From: Christopher Li @ 2017-09-07  6:17 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 12:49 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:

>>
>> That is only half of the job.
>
> No, it's the main job.

From the time-complexity point of view,
it is not the main job. The later part is.

>> To get minimal SSA there is an extra step
>> to remove duplicate phi-nodes. That is quadratic, if I recall correctly.
>
> Quadratic in what?

Simple SSA paper, section 4.1, Time Complexity:

O(P + B*(B + E)*V*V),
that is, O(B*B*V*V + B*E*V*V), where B is the number of basic blocks,
E the number of edges, and V the number of variables. That is why I
consider it quadratic.

The first half, the part you consider the main job, is not
nearly as bad.

> If you have plenty of phi-nodes with plenty of operands, you will have
> more work to do, irrespective of the size of the graph and its reducibility.

The Keith Cooper et al. paper's complexity is tied to reducibility.
The Simple SSA complexity, in the O() form above, is not.

> Also, even if reducibility is a property of the whole graph, it's not really
> what matters here because each phi-chain is to be taken independently.
> In other words, what matters here is whether the *subgraph* involved in the
> phi-chain is reducible or not. So even if the graph is irreducible, you can
> have 99% of the phis behave as if the graph were reducible.
>
> Anyway, this minimal SSA and irreducibility is a nice theoretical question
> you seem a bit obsessed with. I absolutely don't care about it because in

Minimal SSA is actually nice to have. Non-minimal SSA will make the
optimizations run less effectively. Cytron et al. gives the minimal form.

> practice it won't make any noticeable difference.

I think it will have some difference there in terms of optimizations.
If we implement Cytron et al. in the future, we can compare the results.

> OTOH, there is a huge number of things that can be done on sparse
> which *make* a lot of difference.

Sure, that is a separate patch.


> [Warning, I'm only talking here about the situation with Simple SSA;
> with the classic methods you have minimal SSA almost by definition, but
> again 'minimal SSA' is 'minimal' only in this very narrow sense and it doesn't
> mean that no phi-nodes can be removed. For example, I'm pretty
> sure that what the "Simple SSA" article calls 'trivial phi' *can* also
> happen even in minimal SSA].

I think trivial phis can't happen in minimal SSA. If you place a phi-node
in the DF, it will need to have more than one parent and more than one
definition.

OTOH, minimal SSA can have *dead* phi-nodes. Liveness is
independent of the SSA form.

Chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Simple SSA status
  2017-09-07  6:17                               ` Christopher Li
@ 2017-09-07  7:38                                 ` Luc Van Oostenryck
  0 siblings, 0 replies; 33+ messages in thread
From: Luc Van Oostenryck @ 2017-09-07  7:38 UTC (permalink / raw)
  To: Christopher Li; +Cc: Linus Torvalds, Dibyendu Majumdar, Linux-Sparse

On Thu, Sep 7, 2017 at 8:17 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 7, 2017 at 12:49 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>

>
> Minimal SSA is actually nice to have. Non-minimal SSA will make the
> optimizations run less effectively. Cytron et al. gives the minimal form.
>
>> practice it won't make any noticeable difference.
>
> I think it will have some difference there in terms of optimizations.
> If we implement Cytron et al. in the future, we can compare the results.

That's the difference between 'in practice' and '(in theory) it will have some'.

In practice, it will, of course, make a big difference *if* what you
have is not at all minimal, like dumb-SSA. But if 99% of your phis are
in fact needed and only 1% are part of the non-minimality, you won't
see any *significant* differences. It's not as if the presence of a
single superfluous phi-node will totally collapse your performance.

In practice, it also won't make any significant difference because *even* if
you use gotos, this won't automatically make your graph irreducible.
To make your graph irreducible you need to have a goto that jumps *into*
a loop and normally that's not something people write (think about it:
when would it make sense?). So in practice, even with gotos, you will
rarely have irreducible graphs, and so even without implementing the full
phi-node simplification, you will have minimal SSA.
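
For illustration, this is the kind of code it takes (contrived, of
course):

    int f(int x, int n)
    {
        int i = 0;

        if (x)
            goto inside;    /* a second entry into the loop body */
        for (; i < n; i++) {
    inside:
            x++;
        }
        return x;
    }

The cycle formed by the loop can now be entered both through its head
and through the label, which is exactly what makes it irreducible.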

In practice, it also won't make a difference because the difference in
complexity will only show up if you have a very large number of nodes,
and in this case you have plenty of other problems to worry about
(like memory consumption).

>> [Warning, I'm only talking here about the situation with Simple SSA;
>> with the classic methods you have minimal SSA almost by definition, but
>> again 'minimal SSA' is 'minimal' only in this very narrow sense and it doesn't
>> mean that no phi-nodes can be removed. For example, I'm pretty
>> sure that what the "Simple SSA" article calls 'trivial phi' *can* also
>> happen even in minimal SSA].
>
> I think trivial phis can't happen in minimal SSA. If you place a phi-node
> in the DF, it will need to have more than one parent and more than one
> definition.

The thing is that these trivial/redundant phis (as defined in the SSSA
article) can also happen when phi-nodes are placed at the DF. They happen
naturally in some loops. In the article, when talking about these, they
explicitly say "This implies a definition of minimality [that is independent
of the source program and] *more strict* than the definition by Cytron et al."
So it's pretty clear that even with minimal SSA (Cytron's minimality)
you can have those redundant phis. In other words, if you eliminate all
redundant phi-nodes as described in SSSA, not only will you not
have more phi-nodes than Cytron but you *can*, in some cases, have fewer!

Anyway, the advantages of SSSA lie elsewhere.


-- Luc

^ permalink raw reply	[flat|nested] 33+ messages in thread


Thread overview: 33+ messages
2017-08-21 19:39 master merge plans Christopher Li
2017-08-22 13:32 ` Luc Van Oostenryck
2017-08-22 14:47   ` Christopher Li
2017-08-22 15:51     ` Christopher Li
2017-08-22 20:05       ` Luc Van Oostenryck
2017-08-23 20:50         ` Christopher Li
2017-08-29 11:27           ` Christopher Li
2017-09-03 19:24             ` Luc Van Oostenryck
2017-08-22 14:53   ` Dibyendu Majumdar
2017-08-22 14:59     ` Christopher Li
2017-09-03 20:26       ` Simple SSA status Luc Van Oostenryck
2017-09-04 18:29         ` Christopher Li
2017-09-04 20:07           ` Luc Van Oostenryck
2017-09-04 20:37             ` Dibyendu Majumdar
2017-09-04 20:55               ` Luc Van Oostenryck
2017-09-04 21:24                 ` Dibyendu Majumdar
2017-09-04 23:31             ` Christopher Li
2017-09-05  0:55               ` Luc Van Oostenryck
2017-09-05  3:28                 ` Christopher Li
2017-09-07  2:03                   ` Luc Van Oostenryck
2017-09-07  2:15                     ` Linus Torvalds
2017-09-07  2:55                       ` Christopher Li
2017-09-07  3:17                         ` Luc Van Oostenryck
2017-09-07  4:04                           ` Christopher Li
2017-09-07  4:49                             ` Luc Van Oostenryck
2017-09-07  6:17                               ` Christopher Li
2017-09-07  7:38                                 ` Luc Van Oostenryck
2017-09-07  3:05                       ` Luc Van Oostenryck
2017-09-07  2:20                     ` Christopher Li
2017-09-07  3:46                       ` Luc Van Oostenryck
2017-09-07  4:02                         ` Christopher Li
2017-09-07  4:24                           ` Luc Van Oostenryck
2017-09-07  4:33                             ` Christopher Li
