From: Christopher Li
Subject: Re: [RFC] rationale for systematic elimination of OP_SYMADDR instructions
Date: Wed, 26 Apr 2017 03:20:35 +0800
References: <20170309142044.96408-1-luc.vanoostenryck@gmail.com>
In-Reply-To: <20170309142044.96408-1-luc.vanoostenryck@gmail.com>
List-Id: linux-sparse@vger.kernel.org
To: Luc Van Oostenryck
Cc: Linux-Sparse, Linus Torvalds

On Thu, Mar 9, 2017 at 10:20 PM, Luc Van Oostenryck wrote:
> While investigating some problems related to code generation
> I realized that OP_SYMADDR are systematically eliminated:
> the target addresses are simply replaced by the symbol itself.
>
> While it's not wrong per se, as it all depends on the semantics
> we want to give to pseudos and the instructions and how high-
> or low-level we want the IR to be, I don't think it was the intention
> to remove them and, more importantly, I don't think it's desirable.
>
> Those OP_SYMADDR allowed to make a clear separation between a symbol
> (a name with a type and info for storage & linkage) and its address
> (which can be stored in memory or in a register and on which
> arithmetic operations can then be done). Once these addresses
> are replaced by the symbol itself, those symbols can appear almost
> everywhere in the linearized code:
> - in calls' arguments,
> - in adds and subs (while doing pointer arithmetic),
> - in casts,
> - in loads & stores,
> - ...
> and they complicate things considerably once you begin to be
> interested concretely in things after linearization & simplification,
> since sooner or later you will need the address anyway.
>
> So my question is:
> "is there a good reason to eliminate those instructions?",

This change was introduced in 962279e8 by Linus:

    Remove OP_SETVAL after symbol-pseudo simplification.

    We can just replace all users with the symbol pseudo directly.

    This means that we can no longer re-do symbol simplification
    after CSE, and we need to rely on the generic memop simplification.

I can see the reason for doing that: it simplifies CSE. Before this
change, every reference to the symbol would emit an OP_SETVAL (or
OP_SYMADDR nowadays) to get the address into a new pseudo. That is
extra work for CSE, which has to discover: "Oh, all those different
pseudos are actually the same address of the same symbol. Let's
replace them with the same pseudo."

I don't understand why things become more complicated after
linearization if we replace all those address pseudos with the single
symbol pseudo. Even if we don't do it here, wouldn't CSE do that anyway?

The way I see it, the pseudo of the symbol *is* the address of the
symbol. I don't see a problem with using the address of the symbol.

Maybe you have some specific use case in mind. Can you give some
examples?

Chris
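For reference, a minimal sketch of the simplification that 962279e8 describes ("replace all users with the symbol pseudo directly"). It is written against a hypothetical toy IR, not sparse's real struct instruction and pseudo_t: the opcode names mirror this thread, but every type and function below (struct insn, struct pseudo, replace_uses(), simplify_symaddr()) is invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy IR for illustration only: these types are NOT
 * sparse's real struct instruction / pseudo_t definitions.
 */
enum opcode { OP_SYMADDR, OP_LOAD, OP_ADD };
enum pseudo_kind { PSEUDO_REG, PSEUDO_SYM };

struct pseudo {
	enum pseudo_kind kind;
	const char *name;       /* symbol name, or a register name like "%r1" */
};

struct insn {
	enum opcode op;
	struct pseudo *target;  /* result pseudo */
	struct pseudo *src1;    /* for OP_SYMADDR: the symbol pseudo */
	struct pseudo *src2;    /* NULL when unused */
	int dead;               /* set once the instruction is removed */
};

/* Rewrite every live operand equal to 'from' so that it refers to 'to'. */
static void replace_uses(struct insn *insns, int n,
			 struct pseudo *from, struct pseudo *to)
{
	for (int i = 0; i < n; i++) {
		if (insns[i].dead)
			continue;
		if (insns[i].src1 == from)
			insns[i].src1 = to;
		if (insns[i].src2 == from)
			insns[i].src2 = to;
	}
}

/*
 * Each 'target <- symaddr sym' is deleted and the users of its target
 * now take the symbol pseudo itself as operand.  Returns the number of
 * OP_SYMADDR instructions removed.
 */
static int simplify_symaddr(struct insn *insns, int n)
{
	int removed = 0;

	for (int i = 0; i < n; i++) {
		if (insns[i].dead || insns[i].op != OP_SYMADDR)
			continue;
		assert(insns[i].src1 != NULL && insns[i].src1->kind == PSEUDO_SYM);
		replace_uses(insns, n, insns[i].target, insns[i].src1);
		insns[i].dead = 1;
		removed++;
	}
	return removed;
}
```

Given `%r1 <- symaddr x; load %r3 <- %r1; %r2 <- symaddr x; load %r4 <- %r2`, simplify_symaddr() removes both symaddr instructions and both loads end up with `x` as their address operand, so CSE sees two identical loads directly instead of first having to prove that %r1 and %r2 hold the same address. The same result also shows Luc's point: the symbol pseudo now sits in the operand slot of the load.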