From mboxrd@z Thu Jan 1 00:00:00 1970 From: Luc Van Oostenryck Subject: Re: [RFC] rationale for systematic elimination of OP_SYMADDR instructions Date: Fri, 11 Aug 2017 14:25:51 +0200 Message-ID: <20170811122549.3v44es54aghlrskx@ltop.local> References: <20170309142044.96408-1-luc.vanoostenryck@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from mail-wm0-f41.google.com ([74.125.82.41]:37326 "EHLO mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752196AbdHKMZy (ORCPT ); Fri, 11 Aug 2017 08:25:54 -0400 Received: by mail-wm0-f41.google.com with SMTP id i66so46093000wmg.0 for ; Fri, 11 Aug 2017 05:25:54 -0700 (PDT) Content-Disposition: inline In-Reply-To: Sender: linux-sparse-owner@vger.kernel.org List-Id: linux-sparse@vger.kernel.org To: Christopher Li Cc: Linux-Sparse , Linus Torvalds On Thu, Aug 10, 2017 at 09:17:44PM -0400, Christopher Li wrote: > First of all, I want to clarify that I ask this question is > to satisfy my curiously why it is done this way, > either it is the best optimal way to do it. > > It is not reason to reject your patch. I trust you have good > reason to do it, I will accept the patch even if I don't fully > understand it. I still want to understand it after applying > it. OK, I'll try to explain it the best I can. > Do CSE do you mean just CSE OP_SYMADDR or CSE the complicated > instruction expand from OP_SYMADDR. Just the OP_SYMADDR, there is no expansion for them at this level. > Let say you have code like this, the symbol is assign in different > scope block. > > Before CSE there is 3 OP_SYMADDR "a". After CSE where is those 3 > OP_SYMADDR "a" is likely to be combine into one? Where would the > CSE place the OP_SYMADDR if it is not one of the beginning or dominator > etc? Exactly like any other expression that is CSEed. Yes, I elude your question, on purpose, because the question about *where* those instructions need to be is unrelated to the question "must these instructions be present or not". > In that case, you might just place it before load or store. You don't seem > to need CSE to do it. Sorry I feel that I am still missing some big picture. > I am desperately trying to fill it. Thing is that the current linearizer already *create* them, all of them, just before the load or store that need them. There are also a few other situations where they are needed: when doing address calculation for symbols. For example: extern int a[]; int *ptr; ... ptr = &a[i]; But in all cases, the OP_SYMADDRs are correctly *created*. The problme is that (in most cases) these OP_SYMADDR are *directly* optimized away or more exactly back-folded with their symbol. For example: extern int a; int v; ... v = a; The linearized code is something like: symaddr %r1, a load %r2, %r1[0] And this is *exactly* what is needed. Among others things the load take as operand an *address* and not a symbol which is something that belong more to the source language. But after the very first pass of simplify_instruction() the symaddr is 'simplified away' and we get: load %r2, a[0] It's just this simplification that should be removed. > I also think that you make the OP_SYMADDR a lower level IR than > what I original have in mind writing linearizer. I didn't create them :) > I expect there is a separate > phase where you translate the sparse IR into machine level code, that > is where you can insert the equivalent of OP_SYMADDR. Of course. But again, look at the title of the thread: the question has never been about the creation/insertion of these OP_SYMADDRs but about their non-elimination. > OK. For the sparse checker question. I won't answer now to this question because I think, if it is still pertinant, it needs to be reformulated taking in account the fact that *all* needed OP_SYMADDR are already created during linearization. -- Luc