All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v0 0/4] Give a type to constants too
@ 2017-03-11 15:47 Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v0 1/4] be more careful with concat_user_list() Luc Van Oostenryck
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-11 15:47 UTC (permalink / raw)
  To: linux-sparse
  Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg,
	Luc Van Oostenryck

This is a RFC for giving a type to constants/PSEUDO_VALs.

Not having this info if fine for the linearization/simplification
but is quite painful once trying to generate code of it.

This serie try to do this by:
- shuffling some fields in struct pseudo to allow
  PSEUDO_VAL to have the ::sym field too without
  without increasing the size of the structure.
- mechanically initialize each PSEUDO_VAL's ::sym
  with the appropriate type.

This serie have been tested on the test suite only.
Thsi serie also doesn't make any use yet of this type info.

Luc Van Oostenryck (4):
  be more careful with concat_user_list()
  make space for PSEUDO_VAL have a type
  add helper pseudo_type()
  give a type to PSEUDO_VALs

 flow.c      |  5 +++--
 linearize.c | 17 +++++++++--------
 linearize.h | 24 +++++++++++++++++++++---
 memops.c    |  2 +-
 simplify.c  | 18 +++++++++---------
 5 files changed, 43 insertions(+), 23 deletions(-)

-- 
2.11.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v0 1/4] be more careful with concat_user_list()
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
@ 2017-03-11 15:47 ` Luc Van Oostenryck
  2017-04-27 22:41   ` Christopher Li
  2017-03-11 15:47 ` [PATCH v1 2/4] make space for PSEUDO_VAL have a type Luc Van Oostenryck
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-11 15:47 UTC (permalink / raw)
  To: linux-sparse
  Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg,
	Luc Van Oostenryck

In convert_instruction_target(), once all users have been converted
the old user list is concatened to the one of the replacing pseudo.
But this pseudo may be one for which a user list is meaningless,
like PSEUDO_VAL.

While as such it's not a problem, it inhibit the reuse of the users
pointer for other uses.

Fix this by doing the concatenation only if the pseudo can have
an use-list.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 flow.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/flow.c b/flow.c
index 8111e1ae6..6878c0b4c 100644
--- a/flow.c
+++ b/flow.c
@@ -254,7 +254,8 @@ void convert_instruction_target(struct instruction *insn, pseudo_t src)
 			*pu->userp = src;
 		}
 	} END_FOR_EACH_PTR(pu);
-	concat_user_list(target->users, &src->users);
+	if (has_use_list(src))
+		concat_user_list(target->users, &src->users);
 	target->users = NULL;
 }
 
-- 
2.11.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 2/4] make space for PSEUDO_VAL have a type
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v0 1/4] be more careful with concat_user_list() Luc Van Oostenryck
@ 2017-03-11 15:47 ` Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v0 3/4] add helper pseudo_type() Luc Van Oostenryck
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-11 15:47 UTC (permalink / raw)
  To: linux-sparse
  Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg,
	Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 linearize.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/linearize.h b/linearize.h
index 9d192f7aa..7a3403095 100644
--- a/linearize.h
+++ b/linearize.h
@@ -31,12 +31,14 @@ enum pseudo_type {
 struct pseudo {
 	int nr;
 	enum pseudo_type type;
-	struct pseudo_user_list *users;
+	union {
+		struct pseudo_user_list *users;
+		long long value;	// PSEUDO_VAL
+	};
 	struct ident *ident;
 	union {
 		struct symbol *sym;	// PSEUDO_SYM & ARG
 		struct instruction *def;// PSEUDO_REG & PHI
-		long long value;	// PSEUDO_VAL
 	};
 	void *priv;
 };
-- 
2.11.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v0 3/4] add helper pseudo_type()
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v0 1/4] be more careful with concat_user_list() Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v1 2/4] make space for PSEUDO_VAL have a type Luc Van Oostenryck
@ 2017-03-11 15:47 ` Luc Van Oostenryck
  2017-03-11 15:47 ` [PATCH v0 4/4] give a type to PSEUDO_VALs Luc Van Oostenryck
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-11 15:47 UTC (permalink / raw)
  To: linux-sparse
  Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg,
	Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 linearize.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/linearize.h b/linearize.h
index 7a3403095..a14733bd9 100644
--- a/linearize.h
+++ b/linearize.h
@@ -310,6 +310,22 @@ static inline void use_pseudo(struct instruction *insn, pseudo_t p, pseudo_t *pp
 		add_pseudo_user_ptr(alloc_pseudo_user(insn, pp), &p->users);
 }
 
+static inline struct symbol *pseudo_type(pseudo_t pseudo)
+{
+	switch (pseudo->type) {
+	case PSEUDO_SYM:
+	case PSEUDO_ARG:
+	case PSEUDO_VAL:
+		return pseudo->sym;
+	case PSEUDO_REG:
+	case PSEUDO_PHI:
+		return pseudo->def->type;
+	case PSEUDO_VOID:
+	default:
+		return &void_ctype;
+	}
+}
+
 static inline void remove_bb_from_list(struct basic_block_list **list, struct basic_block *entry, int count)
 {
 	delete_ptr_list_entry((struct ptr_list **)list, entry, count);
-- 
2.11.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v0 4/4] give a type to PSEUDO_VALs
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
                   ` (2 preceding siblings ...)
  2017-03-11 15:47 ` [PATCH v0 3/4] add helper pseudo_type() Luc Van Oostenryck
@ 2017-03-11 15:47 ` Luc Van Oostenryck
  2017-03-12 20:30 ` [RFC v0 0/4] Give a type to constants, considered harmful Luc Van Oostenryck
  2017-03-16 17:25 ` [RFC v0 0/4] Give a type to constants too Linus Torvalds
  5 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-11 15:47 UTC (permalink / raw)
  To: linux-sparse
  Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg,
	Luc Van Oostenryck

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 flow.c      |  2 +-
 linearize.c | 17 +++++++++--------
 linearize.h |  2 +-
 memops.c    |  2 +-
 simplify.c  | 18 +++++++++---------
 5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/flow.c b/flow.c
index 6878c0b4c..8013628c8 100644
--- a/flow.c
+++ b/flow.c
@@ -474,7 +474,7 @@ found:
 		if (!local)
 			return 0;
 		check_access(insn);
-		convert_load_instruction(insn, value_pseudo(0));
+		convert_load_instruction(insn, value_pseudo(0, insn->type));
 		return 1;
 	}
 
diff --git a/linearize.c b/linearize.c
index 255231c60..16dd8b3f3 100644
--- a/linearize.c
+++ b/linearize.c
@@ -779,7 +779,7 @@ static pseudo_t symbol_pseudo(struct entrypoint *ep, struct symbol *sym)
 	return pseudo;
 }
 
-pseudo_t value_pseudo(long long val)
+pseudo_t value_pseudo(long long val, struct symbol *type)
 {
 #define MAX_VAL_HASH 64
 	static struct pseudo_list *prev[MAX_VAL_HASH];
@@ -788,13 +788,14 @@ pseudo_t value_pseudo(long long val)
 	pseudo_t pseudo;
 
 	FOR_EACH_PTR(*list, pseudo) {
-		if (pseudo->value == val)
+		if (pseudo->value == val && pseudo->sym == type)
 			return pseudo;
 	} END_FOR_EACH_PTR(pseudo);
 
 	pseudo = __alloc_pseudo(0);
 	pseudo->type = PSEUDO_VAL;
 	pseudo->value = val;
+	pseudo->sym = type;
 	add_pseudo(list, pseudo);
 
 	/* Value pseudos have neither nr, usage nor def */
@@ -951,10 +952,10 @@ static pseudo_t linearize_store_gen(struct entrypoint *ep,
 		unsigned long long mask = (1ULL << ad->bit_size)-1;
 
 		if (shift) {
-			store = add_binary_op(ep, ad->source_type, OP_SHL, value, value_pseudo(shift));
+			store = add_binary_op(ep, ad->source_type, OP_SHL, value, value_pseudo(shift, &int_ctype));
 			mask <<= shift;
 		}
-		orig = add_binary_op(ep, ad->source_type, OP_AND, orig, value_pseudo(~mask));
+		orig = add_binary_op(ep, ad->source_type, OP_AND, orig, value_pseudo(~mask, ad->source_type));
 		store = add_binary_op(ep, ad->source_type, OP_OR, orig, store);
 	}
 	add_store(ep, ad, store);
@@ -999,7 +1000,7 @@ static pseudo_t linearize_load_gen(struct entrypoint *ep, struct access_data *ad
 	pseudo_t new = add_load(ep, ad);
 
 	if (ad->bit_offset) {
-		pseudo_t shift = value_pseudo(ad->bit_offset);
+		pseudo_t shift = value_pseudo(ad->bit_offset, &int_ctype);
 		pseudo_t newval = add_binary_op(ep, ad->source_type, OP_LSR, new, shift);
 		new = newval;
 	}
@@ -1031,7 +1032,7 @@ static pseudo_t linearize_inc_dec(struct entrypoint *ep, struct expression *expr
 		return VOID;
 
 	old = linearize_load_gen(ep, &ad);
-	one = value_pseudo(expr->op_value);
+	one = value_pseudo(expr->op_value, expr->ctype);
 	new = add_binary_op(ep, expr->ctype, op, old, one);
 	linearize_store_gen(ep, new, &ad);
 	finish_address_gen(ep, &ad);
@@ -1070,7 +1071,7 @@ static pseudo_t linearize_regular_preop(struct entrypoint *ep, struct expression
 	case '+':
 		return pre;
 	case '!': {
-		pseudo_t zero = value_pseudo(0);
+		pseudo_t zero = value_pseudo(0, expr->unop->ctype);
 		return add_binary_op(ep, expr->ctype, OP_SET_EQ, pre, zero);
 	}
 	case '~':
@@ -1556,7 +1557,7 @@ pseudo_t linearize_expression(struct entrypoint *ep, struct expression *expr)
 		return add_symbol_address(ep, expr);
 
 	case EXPR_VALUE:
-		return value_pseudo(expr->value);
+		return value_pseudo(expr->value, expr->ctype);
 
 	case EXPR_STRING: case EXPR_FVALUE: case EXPR_LABEL:
 		return add_setval(ep, expr->ctype, expr);
diff --git a/linearize.h b/linearize.h
index a14733bd9..6a6eb1448 100644
--- a/linearize.h
+++ b/linearize.h
@@ -351,7 +351,7 @@ extern void insert_branch(struct basic_block *bb, struct instruction *br, struct
 
 pseudo_t alloc_phi(struct basic_block *source, pseudo_t pseudo, struct symbol *type);
 pseudo_t alloc_pseudo(struct instruction *def);
-pseudo_t value_pseudo(long long val);
+pseudo_t value_pseudo(long long val, struct symbol *type);
 
 struct entrypoint *linearize_symbol(struct symbol *sym);
 int unssa(struct entrypoint *ep);
diff --git a/memops.c b/memops.c
index 187a63284..05ce698c8 100644
--- a/memops.c
+++ b/memops.c
@@ -125,7 +125,7 @@ static void simplify_loads(struct basic_block *bb)
 				if (!dominators) {
 					if (local) {
 						assert(pseudo->type != PSEUDO_ARG);
-						convert_load_instruction(insn, value_pseudo(0));
+						convert_load_instruction(insn, value_pseudo(0, insn->type));
 					}
 					goto next_load;
 				}
diff --git a/simplify.c b/simplify.c
index a84e4787f..c9969531a 100644
--- a/simplify.c
+++ b/simplify.c
@@ -368,7 +368,7 @@ static int simplify_asr(struct instruction *insn, pseudo_t pseudo, long long val
 
 	if (value >= size) {
 		warning(insn->pos, "right shift by bigger than source value");
-		return replace_with_pseudo(insn, value_pseudo(0));
+		return replace_with_pseudo(insn, value_pseudo(0, insn->type));
 	}
 	if (!value)
 		return replace_with_pseudo(insn, pseudo);
@@ -478,7 +478,7 @@ static int simplify_constant_rightside(struct instruction *insn)
 	case OP_SUB:
 		if (value) {
 			insn->opcode = OP_ADD;
-			insn->src2 = value_pseudo(-value);
+			insn->src2 = value_pseudo(-value, insn->src2->sym);
 			return REPEAT_CSE;
 		}
 	/* Fall through */
@@ -495,7 +495,7 @@ static int simplify_constant_rightside(struct instruction *insn)
 
 	case OP_MODU: case OP_MODS:
 		if (value == 1)
-			return replace_with_pseudo(insn, value_pseudo(0));
+			return replace_with_pseudo(insn, value_pseudo(0, insn->type));
 		return 0;
 
 	case OP_DIVU: case OP_DIVS:
@@ -656,7 +656,7 @@ static int simplify_constant_binop(struct instruction *insn)
 	}
 	res &= bits;
 
-	replace_with_pseudo(insn, value_pseudo(res));
+	replace_with_pseudo(insn, value_pseudo(res, insn->type));
 	return REPEAT_CSE;
 }
 
@@ -670,14 +670,14 @@ static int simplify_binop_same_args(struct instruction *insn, pseudo_t arg)
 			warning(insn->pos, "self-comparison always evaluates to false");
 	case OP_SUB:
 	case OP_XOR:
-		return replace_with_pseudo(insn, value_pseudo(0));
+		return replace_with_pseudo(insn, value_pseudo(0, insn->type));
 
 	case OP_SET_EQ:
 	case OP_SET_LE: case OP_SET_GE:
 	case OP_SET_BE: case OP_SET_AE:
 		if (Wtautological_compare)
 			warning(insn->pos, "self-comparison always evaluates to true");
-		return replace_with_pseudo(insn, value_pseudo(1));
+		return replace_with_pseudo(insn, value_pseudo(1, insn->type));
 
 	case OP_AND:
 	case OP_OR:
@@ -686,7 +686,7 @@ static int simplify_binop_same_args(struct instruction *insn, pseudo_t arg)
 	case OP_AND_BOOL:
 	case OP_OR_BOOL:
 		remove_usage(arg, &insn->src2);
-		insn->src2 = value_pseudo(0);
+		insn->src2 = value_pseudo(0, pseudo_type(insn->src1));
 		insn->opcode = OP_SET_NE;
 		return REPEAT_CSE;
 
@@ -789,7 +789,7 @@ static int simplify_constant_unop(struct instruction *insn)
 	mask = 1ULL << (insn->size-1);
 	res &= mask | (mask-1);
 	
-	replace_with_pseudo(insn, value_pseudo(res));
+	replace_with_pseudo(insn, value_pseudo(res, insn->type));
 	return REPEAT_CSE;
 }
 
@@ -913,7 +913,7 @@ static int simplify_cast(struct instruction *insn)
 	if (constant(src)) {
 		int sign = orig_type->ctype.modifiers & MOD_SIGNED;
 		long long val = get_cast_value(src->value, orig_size, size, sign);
-		src = value_pseudo(val);
+		src = value_pseudo(val, insn->type);
 		goto simplify;
 	}
 
-- 
2.11.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants, considered harmful
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
                   ` (3 preceding siblings ...)
  2017-03-11 15:47 ` [PATCH v0 4/4] give a type to PSEUDO_VALs Luc Van Oostenryck
@ 2017-03-12 20:30 ` Luc Van Oostenryck
  2017-03-12 22:25   ` Dibyendu Majumdar
  2017-03-16 17:25 ` [RFC v0 0/4] Give a type to constants too Linus Torvalds
  5 siblings, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-12 20:30 UTC (permalink / raw)
  To: linux-sparse; +Cc: Dibyendu Majumdar, Christopher Li, Jeff Garzik, Pekka Enberg

On Sat, Mar 11, 2017 at 04:47:21PM +0100, Luc Van Oostenryck wrote:
> This is a RFC for giving a type to constants/PSEUDO_VALs.
> 
> Not having this info if fine for the linearization/simplification
> but is quite painful once trying to generate code of it.
> 
> This serie try to do this by:
> - shuffling some fields in struct pseudo to allow
>   PSEUDO_VAL to have the ::sym field too without
>   without increasing the size of the structure.
> - mechanically initialize each PSEUDO_VAL's ::sym
>   with the appropriate type.
> 
> This serie have been tested on the test suite only.
> Thsi serie also doesn't make any use yet of this type info.

I have begun to try to make use of this and I'm now convinced
that this direction is not a viable solution for sparse.

Sparse's IR is slightly lower-level that LLVM's IR, more close
to what a real CPU would do. This can already be seen at some
instructions (nothing like GEP in sparse), the real difference
is less obvious but it's heer that things begin to hurt.
Indeed, sparse's CPU-like model implies that values are typeless
but have a size and sparse's CSE and simplification is heavily
based on this.
Once you try to add and maintain complete and correct typing to
sparse's instructions so that they can be used easily by sparse-llvm
you realize that:
- you need to add a lot more casts
- you need to change CSE to make things equivalent only if they
  have the same type
- a lot of simplifications are wrong, some can be corrected by adding
  even more casts.

So, while I'm very fine to add typing info where it was missing,
I have no interest in making the simplifications more complex and
of lesser quality.


-- Luc Van Oostenryck

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants, considered harmful
  2017-03-12 20:30 ` [RFC v0 0/4] Give a type to constants, considered harmful Luc Van Oostenryck
@ 2017-03-12 22:25   ` Dibyendu Majumdar
  2017-03-16 17:20     ` Luc Van Oostenryck
  0 siblings, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-03-12 22:25 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Linux-Sparse, Christopher Li, Jeff Garzik, Pekka Enberg

On 12 March 2017 at 20:30, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Sat, Mar 11, 2017 at 04:47:21PM +0100, Luc Van Oostenryck wrote:
>> This is a RFC for giving a type to constants/PSEUDO_VALs.
>>
>> Not having this info if fine for the linearization/simplification
>> but is quite painful once trying to generate code of it.
>>
>> This serie try to do this by:
>> - shuffling some fields in struct pseudo to allow
>>   PSEUDO_VAL to have the ::sym field too without
>>   without increasing the size of the structure.
>> - mechanically initialize each PSEUDO_VAL's ::sym
>>   with the appropriate type.
>>
>> This serie have been tested on the test suite only.
>> Thsi serie also doesn't make any use yet of this type info.
>
> I have begun to try to make use of this and I'm now convinced
> that this direction is not a viable solution for sparse.
>
> Sparse's IR is slightly lower-level that LLVM's IR, more close
> to what a real CPU would do. This can already be seen at some
> instructions (nothing like GEP in sparse), the real difference
> is less obvious but it's heer that things begin to hurt.
> Indeed, sparse's CPU-like model implies that values are typeless
> but have a size and sparse's CSE and simplification is heavily
> based on this.
> Once you try to add and maintain complete and correct typing to
> sparse's instructions so that they can be used easily by sparse-llvm
> you realize that:
> - you need to add a lot more casts
> - you need to change CSE to make things equivalent only if they
>   have the same type
> - a lot of simplifications are wrong, some can be corrected by adding
>   even more casts.
>
> So, while I'm very fine to add typing info where it was missing,
> I have no interest in making the simplifications more complex and
> of lesser quality.
>

I do not know / understand enough to comment on this but I find that
your patches are working well for sparse-llvm. In particular without
the type information in constants, I cannot see how variadic functions
can be called correctly.

If the changes done so far haven't broken anything then perhaps they
can be left in?

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants, considered harmful
  2017-03-12 22:25   ` Dibyendu Majumdar
@ 2017-03-16 17:20     ` Luc Van Oostenryck
  2017-03-17 11:03       ` Dibyendu Majumdar
  0 siblings, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-16 17:20 UTC (permalink / raw)
  To: Dibyendu Majumdar; +Cc: Linux-Sparse, Christopher Li, Jeff Garzik, Pekka Enberg

On Sun, Mar 12, 2017 at 10:25:48PM +0000, Dibyendu Majumdar wrote:
> On 12 March 2017 at 20:30, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:
> > I have begun to try to make use of this and I'm now convinced
> > that this direction is not a viable solution for sparse.
> >
> > Sparse's IR is slightly lower-level that LLVM's IR, more close
> > to what a real CPU would do. This can already be seen at some
> > instructions (nothing like GEP in sparse), the real difference
> > is less obvious but it's heer that things begin to hurt.
> > Indeed, sparse's CPU-like model implies that values are typeless
> > but have a size and sparse's CSE and simplification is heavily
> > based on this.
> > Once you try to add and maintain complete and correct typing to
> > sparse's instructions so that they can be used easily by sparse-llvm
> > you realize that:
> > - you need to add a lot more casts
> > - you need to change CSE to make things equivalent only if they
> >   have the same type
> > - a lot of simplifications are wrong, some can be corrected by adding
> >   even more casts.
> >
> > So, while I'm very fine to add typing info where it was missing,
> > I have no interest in making the simplifications more complex and
> > of lesser quality.
> >
> 
> I do not know / understand enough to comment on this but I find that
> your patches are working well for sparse-llvm.

Yes, sure. This fixes a number of issues regarding sparse-llvm and
more importantly it gives opportinities for even more fixes.

But if you look at patch 4/4, you can see that I already had to
restrict equivalent (for Common Subexpression Elimination) 
PSEUDO_VAL to those of the same type. That's annoying.

Once you take the simplifications in account, you realize that a
pseudo that had one type before simplification become of another
type after simplification. This is more annoying but yes fixable
with a cost.

And in general, the simplifications we do destroy the exact (C) types.
From what I've seen there is no way we can keep the full types and
do the simplifications we do.

So, even giving the correct types to the instructions that missed
them is useless once you do the CSE and the simplifications.
Which is perfectly logical, once the types have been validated
why would the IR instructions mind that the value is 'int' or 'long'
if both have the same size, same with a plain 'int' and a 'const int'?
Same with addresses of object of different types.

After all, LLVM also don't care much about primitive types, integers
also are not typed, just their size matter (and the information about
the size is carried by the instruction). It's only for pointers that
LLVM care about the size.

> In particular without
> the type information in constants, I cannot see how variadic functions
> can be called correctly.

Yes, variadic called with constants is an 'interesting' case.
But here also, it's not the the type that is needed for correctness,
it's only the size.

> If the changes done so far haven't broken anything then perhaps they
> can be left in?

I'll of course do my best to keep as much as possible.

For sparse-llvm, I haven't thought a lot about it, partly because
I'm not interested in it, but I think there is two possibilities
for it to be correct and complete:
1) ignore as much typing as possible, including casting pointers
   to integer of the right size (wich will emiminate all issues with
   GEP and pointer arithmetic, and only casting them back to pointers
   for loads & stores.
2) bypass the CSE & simplification (and possibly using LLVM's
   optimization phases).


-- Luc Van Oostenryck

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
                   ` (4 preceding siblings ...)
  2017-03-12 20:30 ` [RFC v0 0/4] Give a type to constants, considered harmful Luc Van Oostenryck
@ 2017-03-16 17:25 ` Linus Torvalds
  2017-03-16 18:04   ` Dibyendu Majumdar
  2017-03-16 20:42   ` Luc Van Oostenryck
  5 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 17:25 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Sparse Mailing-list, Dibyendu Majumdar, Christopher Li,
	Jeff Garzik, Pekka Enberg

Sorry for not reacting to this earlier..

On Sat, Mar 11, 2017 at 7:47 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> This is a RFC for giving a type to constants/PSEUDO_VALs.

This seems completely broken. Not from an implementation standpoint,
but from a conceptual one.

To explain, let me give a completely idiotic example:

    unsigned int test(int arg)
    {
        return arg + (unsigned int)arg;
    }

note how we're adding a "int" and an "unsigned int" together. But the
CSE etc doesn't actually care at all, and we will linearize this to
just

    test:
        add.32      %r5 <- %arg1, %arg1
        ret.32      %r5

because the type just isn't relevant at the linearization phase.

You can tell that there *used* to be multiple pseudos from the
numbering ("%r5"? What happened to "%r1..4"?), but they have all been
smushed together.

Linearization has fundamentally gotten rid of all the C types, and all
you can find are some rough remnants of them (you can find the *size*
of the type, and you can find the rough "type" of type - is it a
pointer, FP value or integer. There aren't even any signs, although
some _operations_ are signed (but not the pseudos).

The same pseudo can have many different types.

                    Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 17:25 ` [RFC v0 0/4] Give a type to constants too Linus Torvalds
@ 2017-03-16 18:04   ` Dibyendu Majumdar
  2017-03-16 18:14     ` Linus Torvalds
  2017-03-16 20:42   ` Luc Van Oostenryck
  1 sibling, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-03-16 18:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

Hi Linus,

On 16 March 2017 at 17:25, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sat, Mar 11, 2017 at 7:47 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> This is a RFC for giving a type to constants/PSEUDO_VALs.
>
> Linearization has fundamentally gotten rid of all the C types, and all
> you can find are some rough remnants of them (you can find the *size*
> of the type, and you can find the rough "type" of type - is it a
> pointer, FP value or integer. There aren't even any signs, although
> some _operations_ are signed (but not the pseudos).
>

I think that having the size and information whether an integer
constant is an integer or pointer should be good enough, but as far as
I understood for pseudos of type PSEUDO_VAL even this is not
available. Only the value is available. This poses a problem
especially when passing arguments to a variadic function. In
sparse-llvm I think we need to cast values to their expected types to
keep LLVM happy. From my investigations so far this works most of the
time but the corner case appears to be passing integer/pointer
constants to variadic functions. In the normal case we can use the
function prototype to work out the required types.

Is it possible to at least retain size / and kind (int or pointer) for
pseudos representing integer constants? Perhaps the RFC could be
amended to say that?

Thanks and Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 18:04   ` Dibyendu Majumdar
@ 2017-03-16 18:14     ` Linus Torvalds
  2017-03-16 18:24       ` Dibyendu Majumdar
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 18:14 UTC (permalink / raw)
  To: Dibyendu Majumdar
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 11:04 AM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
>
> I think that having the size and information whether an integer
> constant is an integer or pointer should be good enough

The thing is, I don't think you should should actually look at the
pseudo for that AT ALL.

The size is encoded in the operation itself, and that's what you should use.

Now, the "integer vs pointer" thing is perhaps more interesting. We do
*not* encode that, and we use the same "add" operation for both
integers and pointer values. That's because basically linearization
treats pointers as integers too, although we do actually keep the
pseudos separate because we consider casts to/from pointers to be
special (so they aren't _purely_ just integers).

So the instruction itself has

 - size

 - sign

 - _much_ of optype

and I think that the proper long-term fix is perhaps to try to make
that "much" into "all". And part of that may actually be about getting
rid of some of the type differences entirely (ie turn pointers truly
into just pointer-sized integers, the way they almost are already).

That does mean that the llvm interface would need to use even more
explicit casts to make llvm happy.

                Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 18:14     ` Linus Torvalds
@ 2017-03-16 18:24       ` Dibyendu Majumdar
  2017-03-16 18:40         ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-03-16 18:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

Hi Linus,

On 16 March 2017 at 18:14, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Mar 16, 2017 at 11:04 AM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
>>
>> I think that having the size and information whether an integer
>> constant is an integer or pointer should be good enough
>
> The thing is, I don't think you should should actually look at the
> pseudo for that AT ALL.
>
> The size is encoded in the operation itself, and that's what you should use.
>

Agreed and we are doing that except that the function call instruction
only has the type of the call, not the arguments (as far as I
understand - apologies if I have got this wrong). The original issue
arose out of that - i.e. the code was using the type of the call
instruction to decide the type of the argument. To solve this we are
now using the function prototype to work out the types of arguments -
this works okay except for the variadic case.

> Now, the "integer vs pointer" thing is perhaps more interesting. We do
> *not* encode that, and we use the same "add" operation for both
> integers and pointer values. That's because basically linearization
> treats pointers as integers too, although we do actually keep the
> pseudos separate because we consider casts to/from pointers to be
> special (so they aren't _purely_ just integers).
>
> So the instruction itself has
>
>  - size
>
>  - sign
>
>  - _much_ of optype
>
> and I think that the proper long-term fix is perhaps to try to make
> that "much" into "all". And part of that may actually be about getting
> rid of some of the type differences entirely (ie turn pointers truly
> into just pointer-sized integers, the way they almost are already).
>
> That does mean that the llvm interface would need to use even more
> explicit casts to make llvm happy.
>

I think that is fine. I have ended up always casting values to their
expected types in sparse-llvm on the basis that a) the linearized
output has already worked out that it is legal, and b) the casts are
just to keep LLVM happy.

Thanks and Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 18:24       ` Dibyendu Majumdar
@ 2017-03-16 18:40         ` Linus Torvalds
  2017-03-16 20:19           ` Dibyendu Majumdar
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 18:40 UTC (permalink / raw)
  To: Dibyendu Majumdar
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 11:24 AM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
>
> Agreed and we are doing that except that the function call instruction
> only has the type of the call, not the arguments (as far as I
> understand - apologies if I have got this wrong).

The "OP_CALL" should have the call type in the instruction itself:

                struct /* call */ {
                        pseudo_t func;
                        struct pseudo_list *arguments;
                        struct symbol *fntype;
                };

in that "fntype".

So you should not need it for the pseudo (that contains the address of
the function to call).

That was done exactly for the varargs issue, see commit ff527e2
("sparse, llvm: Make function declaration accessible to backend")

Now, that code may be *buggy*, of course, but it's actually very
important, exactly for the same kind of "one pseudo may be associated
with multiple types".

In particular, it's not uncommon to have auto-generated code (or
various handwritten interpreters) have the function be encoded as some
kind of void pointer, and then depending on use, the same pointer
value is used differently.

Eg code like this:

    typedef int (*binop_t)(int, int);
    typedef int (*unop_t)(int);

    #define BINOP 0

    unsigned int execute(int type, void *fn, int arg1, int arg2)
    {
        if (type == BINOP)
                return ((binop_t)fn)(arg1,arg2);
        return ((unop_t)fn)(arg1);
    }

which will linearize to something that does:

    call.32     %r7 <- %r6, %arg3, %arg4

in one branch, and

    call.32     %r13 <- %r6, %arg3

in another. Notice how it uses the same pseudo (%r6) in both cases,
even though the type of the function called is different.

So like all the other cases, the type of the operation (in this case,
a function call) has to be encoded in the instruction itself, not in
the pseudo.

               Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 18:40         ` Linus Torvalds
@ 2017-03-16 20:19           ` Dibyendu Majumdar
  2017-03-16 20:43             ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-03-16 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

Hi Linus,

On 16 March 2017 at 18:40, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Mar 16, 2017 at 11:24 AM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
>>
>> Agreed and we are doing that except that the function call instruction
>> only has the type of the call, not the arguments (as far as I
>> understand - apologies if I have got this wrong).
>
> The "OP_CALL" should have the call type in the instruction itself:
>
>                 struct /* call */ {
>                         pseudo_t func;
>                         struct pseudo_list *arguments;
>                         struct symbol *fntype;
>                 };
>
> in that "fntype".
>
> So you should not need it for the pseudo (that contains the address of
> the function to call).
>

Yes, the function type is available, and this is used.

> In particular, it's not uncommon to have auto-generated code (or
> various handwritten interpreters) have the function be encoded as some
> kind of void pointer, and then depending on use, the same pointer
> value is used differently.
>
> Eg code like this:
>
>     typedef int (*binop_t)(int, int);
>     typedef int (*unop_t)(int);
>
>     #define BINOP 0
>
>     unsigned int execute(int type, void *fn, int arg1, int arg2)
>     {
>         if (type == BINOP)
>                 return ((binop_t)fn)(arg1,arg2);
>         return ((unop_t)fn)(arg1);
>     }
>
> which will linearize to something that does:
>
>     call.32     %r7 <- %r6, %arg3, %arg4
>
> in one branch, and
>
>     call.32     %r13 <- %r6, %arg3
>
> in another. Notice how it uses the same pseudo (%r6) in both cases,
> even though the type of the function called is different.
>

The issue is not with the type of %r6 but %arg3 and %arg4 if these
happen to be integer constants, and the function is variadic so we
cannot work out the type from the function prototype.

Here is a contrived example:

extern void concat(char *buf, unsigned long long len, ...);
#define NULL ((void *)0)
int main(void)
{
 char temp[256];
 concat(temp, sizeof temp, "hello", "world", NULL);
 return 0;
}

The linearized output is:

main:
.L0:
        <entry-point>
        symaddr.64  %r1 <- temp
        symaddr.64  %r2 <- <anon symbol:0000029439DF4E18>
        symaddr.64  %r3 <- <anon symbol:0000029439DF5198>
        call        concat, %r1, $256, %r2, %r3, $0
        ret.32      $0

The last argument $0 is a PSEUDO_VAL. The issue is working out what
should be the type/size of this constant.

I hope this is clearer in explaining what the problem is.

Btw the example you gave failed in sparse-llvm - because there is no
explicit cast in the linearized output. To fix it we basically have to
always cast a function to its expected type.

Thanks and Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 17:25 ` [RFC v0 0/4] Give a type to constants too Linus Torvalds
  2017-03-16 18:04   ` Dibyendu Majumdar
@ 2017-03-16 20:42   ` Luc Van Oostenryck
  1 sibling, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-16 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sparse Mailing-list, Dibyendu Majumdar, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 10:25:08AM -0700, Linus Torvalds wrote:
> Sorry for not reacting to this earlier..

No problem, of course, but since you're looking over here, hehe,s
I wonder if you would have an opinion on:
   http://marc.info/?l=linux-sparse&m=148728997923939&w=4
and to a lesser degree:
   http://marc.info/?l=linux-sparse&m=148906980631077&w=4
I would appreciate it very much.

> On Sat, Mar 11, 2017 at 7:47 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> > This is a RFC for giving a type to constants/PSEUDO_VALs.
> 
> This seems completely broken. Not from an implementation standpoint,
> but from a conceptual one.

Yes, I agree completely. I realized once I tried to make use of it.

> To explain, let me give a completely idiotic example:

...

> Linearization has fundamentally gotten rid of all the C types, and all
> you can find are some rough remnants of them (you can find the *size*
> of the type, and you can find the rough "type" of type - is it a
> pointer, FP value or integer. There aren't even any signs, although
> some _operations_ are signed (but not the pseudos).
> 
> The same pseudo can have many different types.
> 
>                     Linus

There is just one problem, like Dibyendu noticed, it is when you use
a constant in the variable part of a variadic function. In this case
you don't have any type/size information from the function and
nothing from the pseudo either.

I don't see how this could be done without requiring that pseudos
(which indded have indeed no type) have a fixed size that must not
be changed by CSE or the simplifications.


-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 20:19           ` Dibyendu Majumdar
@ 2017-03-16 20:43             ` Linus Torvalds
  2017-03-16 21:19               ` Luc Van Oostenryck
  2017-08-10 15:25               ` [RFC v0 0/4] Give a type to constants too Christopher Li
  0 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 20:43 UTC (permalink / raw)
  To: Dibyendu Majumdar
  Cc: Luc Van Oostenryck, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 1:19 PM, Dibyendu Majumdar
<mobile@majumdar.org.uk> wrote:
>
> The issue is not with the type of %r6 but %arg3 and %arg4 if these
> happen to be integer constants, and the function is variadic so we
> cannot work out the type from the function prototype.

Yes. Ok, I see the problem.

I think we could add a 'size' to the pseudo, and solve it that way.
CSE and other linearization artifacts may mean that the type is
undefined, but the size should be well-defined.

Alternatively - and this might be the better solution - the OP_CALL
instruction might be better off split up a bit. Instead of having just
that

    struct pseudo_list *arguments;

that lists the arguments, maybe it should have a

    struct instruction_list *arguments;

with a new OP_ARG instruction to generate the argument.

That way all the arguments would have a size, and always a generating
instruction.

              Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 20:43             ` Linus Torvalds
@ 2017-03-16 21:19               ` Luc Van Oostenryck
  2017-03-16 22:28                 ` Linus Torvalds
  2017-08-10 15:25               ` [RFC v0 0/4] Give a type to constants too Christopher Li
  1 sibling, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-16 21:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dibyendu Majumdar, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 01:43:05PM -0700, Linus Torvalds wrote:
> On Thu, Mar 16, 2017 at 1:19 PM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
> >
> > The issue is not with the type of %r6 but %arg3 and %arg4 if these
> > happen to be integer constants, and the function is variadic so we
> > cannot work out the type from the function prototype.
> 
> Yes. Ok, I see the problem.
> 
> I think we could add a 'size' to the pseudo, and solve it that way.
> CSE and other linearization artifacts may mean that the type is
> undefined, but the size should be well-defined.
> 
> Alternatively - and this might be the better solution - the OP_CALL
> instruction might be better off split up a bit. Instead of having just
> that
> 
>     struct pseudo_list *arguments;
> 
> that lists the arguments, maybe it should have a
> 
>     struct instruction_list *arguments;
> 
> with a new OP_ARG instruction to generate the argument.
> 
> That way all the arguments would have a size, and always a generating
> instruction.

Just to be sure of what you're suggesting, you mean that code like:
	int foo(int a)
	{
		printf("%d %d", a, 123);
	}

Would be linearized as something like:
	foo:
		<entry-point>
		set.64		%r1 <- "%d %d"
		arg.64		%r2 <- %r1 
		arg.32		%r3 <- %arg1
		arg.32		%r4 <- %123
		call.32		%r5 <- printf, %r2, %r3, %r4

with the %r2, %r3, %r4 in the call identifying their defining
instructions and the size of these OP_ARG instruction being,
of course, defined at linearization time with the type
of the corresponding expression?

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 21:19               ` Luc Van Oostenryck
@ 2017-03-16 22:28                 ` Linus Torvalds
  2017-03-16 23:12                   ` Luc Van Oostenryck
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 22:28 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Dibyendu Majumdar, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 2:19 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Just to be sure of what you're suggesting, you mean that code like:
>         int foo(int a)
>         {
>                 printf("%d %d", a, 123);
>         }
>
> Would be linearized as something like:
>         foo:
>                 <entry-point>
>                 set.64          %r1 <- "%d %d"
>                 arg.64          %r2 <- %r1
>                 arg.32          %r3 <- %arg1
>                 arg.32          %r4 <- %123
>                 call.32         %r5 <- printf, %r2, %r3, %r4
>
> with the %r2, %r3, %r4 in the call identifying their defining
> instructions and the size of these OP_ARG instruction being,
> of course, defined at linearization time with the type
> of the corresponding expression?

Close. I was thinking that the "OP_ARG" instructions wouldn't really
generate any pseudo's, they'd just consume them (like the OP_CALL does
now).

So taking your example, which currently linearizes as

  foo:
      call.32     %r3 <- printf, "%d %d", %arg1, $123
      ret.32      %r3

I was more thinking that you'd linearize it purely as a split of the
OP_CALL into "OP_ARG* + OP_CALL":

  foo:
      arg.64   "%d %d"
      arg.32   %arg1
      arg.32   $123
      call.32   %r3 <- printf
      ret.32   %r3

where the OP_ARG would act kind of like a OP_SETVAL instruction, but
it wouldn't have a target pseudo (the target is the "call" instruction
that follows).

Now, an alternative would be that we *do* give it a pseudo target, but
not a regular one, but an "outgoing argument" one. We currently have
those PSEUDO_ARG pseudos that are for incoming arguments, and we could
split that into PSEUDO_ARG_IN and PSEUDO_ARG_OUT.

I dunno. My mental model was to just always have the OP_ARG
instructions tie very closely to the OP_CALL that they are associated
with.

                Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 22:28                 ` Linus Torvalds
@ 2017-03-16 23:12                   ` Luc Van Oostenryck
  2017-03-16 23:51                     ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-16 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dibyendu Majumdar, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 03:28:10PM -0700, Linus Torvalds wrote:
> On Thu, Mar 16, 2017 at 2:19 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > Just to be sure of what you're suggesting, you mean that code like:
> >         int foo(int a)
> >         {
> >                 printf("%d %d", a, 123);
> >         }
> >
> > Would be linearized as something like:
> >         foo:
> >                 <entry-point>
> >                 set.64          %r1 <- "%d %d"
> >                 arg.64          %r2 <- %r1
> >                 arg.32          %r3 <- %arg1
> >                 arg.32          %r4 <- %123
> >                 call.32         %r5 <- printf, %r2, %r3, %r4
> >
> > with the %r2, %r3, %r4 in the call identifying their defining
> > instructions and the size of these OP_ARG instruction being,
> > of course, defined at linearization time with the type
> > of the corresponding expression?
> 
> Close. I was thinking that the "OP_ARG" instructions wouldn't really
> generate any pseudo's, they'd just consume them (like the OP_CALL does
> now).
> 
> So taking your example, which currently linearizes as
> 
>   foo:
>       call.32     %r3 <- printf, "%d %d", %arg1, $123
>       ret.32      %r3
> 
> I was more thinking that you'd linearize it purely as a split of the
> OP_CALL into "OP_ARG* + OP_CALL":
> 
>   foo:
>       arg.64   "%d %d"
>       arg.32   %arg1
>       arg.32   $123
>       call.32   %r3 <- printf
>       ret.32   %r3
> 
> where the OP_ARG would act kind of like a OP_SETVAL instruction, but
> it wouldn't have a target pseudo (the target is the "call" instruction
> that follows).

OK, I see.
 
> Now, an alternative would be that we *do* give it a pseudo target, but
> not a regular one, but an "outgoing argument" one. We currently have
> those PSEUDO_ARG pseudos that are for incoming arguments, and we could
> split that into PSEUDO_ARG_IN and PSEUDO_ARG_OUT.
> 
> I dunno. My mental model was to just always have the OP_ARG
> instructions tie very closely to the OP_CALL that they are associated
> with.

I think it's good that there is no pseudos between these OP_ARG and
the corresponding OP_CALL since it's kinda a 'single use' value
that must not be part of CSE & simplification.
So I think your mental model is the good one even if it is a bit
different from what other instructions are doing, but of course
the argument passing is *not* like other instructions..
We can simply see these OP_ARG as a sort of pseudo-push instruction.

Anyway, thanks for the feedback.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 23:12                   ` Luc Van Oostenryck
@ 2017-03-16 23:51                     ` Linus Torvalds
  2017-03-17 11:30                       ` [RFC PATCH] use OP_PUSH + OP_CALL Luc Van Oostenryck
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2017-03-16 23:51 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Dibyendu Majumdar, Sparse Mailing-list, Christopher Li,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 4:12 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> We can simply see these OP_ARG as a sort of pseudo-push instruction.

Exactly, that was my thinking. It might even show in the
test-linearize pseudo-code output as a "push".

              Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants, considered harmful
  2017-03-16 17:20     ` Luc Van Oostenryck
@ 2017-03-17 11:03       ` Dibyendu Majumdar
  0 siblings, 0 replies; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-03-17 11:03 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Linux-Sparse, Christopher Li, Jeff Garzik, Pekka Enberg

Hi Luc,

On 16 March 2017 at 17:20, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Sun, Mar 12, 2017 at 10:25:48PM +0000, Dibyendu Majumdar wrote:
>> On 12 March 2017 at 20:30, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:
>> > I have begun to try to make use of this and I'm now convinced
>> > that this direction is not a viable solution for sparse.
>> >
>> > Sparse's IR is slightly lower-level that LLVM's IR, more close
>> > to what a real CPU would do. This can already be seen at some
>> > instructions (nothing like GEP in sparse), the real difference
>> > is less obvious but it's heer that things begin to hurt.
>> > Indeed, sparse's CPU-like model implies that values are typeless
>> > but have a size and sparse's CSE and simplification is heavily
>> > based on this.
>> > Once you try to add and maintain complete and correct typing to
>> > sparse's instructions so that they can be used easily by sparse-llvm
>> > you realize that:
>> > - you need to add a lot more casts
>> > - you need to change CSE to make things equivalent only if they
>> >   have the same type
>> > - a lot of simplifications are wrong, some can be corrected by adding
>> >   even more casts.
>> >
>> > So, while I'm very fine to add typing info where it was missing,
>> > I have no interest in making the simplifications more complex and
>> > of lesser quality.
>> >
>>
>> I do not know / understand enough to comment on this but I find that
>> your patches are working well for sparse-llvm.
>
> Yes, sure. This fixes a number of issues regarding sparse-llvm and
> more importantly it gives opportinities for even more fixes.
>
> But if you look at patch 4/4, you can see that I already had to
> restrict equivalent (for Common Subexpression Elimination)
> PSEUDO_VAL to those of the same type. That's annoying.
>
> Once you take the simplifications in account, you realize that a
> pseudo that had one type before simplification become of another
> type after simplification. This is more annoying but yes fixable
> with a cost.
>
> And in general, the simplifications we do destroy the exact (C) types.
> From what I've seen there is no way we can keep the full types and
> do the simplifications we do.
>
> So, even giving the correct types to the instructions that missed
> them is useless once you do the CSE and the simplifications.
> Which is perfectly logical, once the types have been validated
> why would the IR instructions mind that the value is 'int' or 'long'
> if both have the same size, same with a plain 'int' and a 'const int'?
> Same with addresses of object of different types.
>
> After all, LLVM also don't care much about primitive types, integers
> also are not typed, just their size matter (and the information about
> the size is carried by the instruction). It's only for pointers that
> LLVM care about the size.
>
>> In particular without
>> the type information in constants, I cannot see how variadic functions
>> can be called correctly.
>
> Yes, variadic called with constants is an 'interesting' case.
> But here also, it's not the the type that is needed for correctness,
> it's only the size.
>
>> If the changes done so far haven't broken anything then perhaps they
>> can be left in?
>
> I'll of course do my best to keep as much as possible.
>
> For sparse-llvm, I haven't thought a lot about it, partly because
> I'm not interested in it, but I think there is two possibilities
> for it to be correct and complete:
> 1) ignore as much typing as possible, including casting pointers
>    to integer of the right size (wich will emiminate all issues with
>    GEP and pointer arithmetic, and only casting them back to pointers
>    for loads & stores.
> 2) bypass the CSE & simplification (and possibly using LLVM's
>    optimization phases).
>

Thank you for the detailed explanation of issues. I agree that there
is a mismatch of levels in the sparse linearized IR which is low level
and the LLVM IR which is somewhat higher level. It is still possible
to make sparse-llvm work by essentially casting most results to the
expected type. If for PSEUDO_VALs we had the size information only -
that would help because all we need is the size to ensure that the
value is passed correctly during a function call in the variadic
function case.

The downside of the approach required in sparse-llvm is that LLVM will
not be able to perform many of its optimisations which require
additional type based metadata.

Once the sparse-llvm implementation works correctly for real life
complex programs, I hope to start work on a different backend using
'nanojit'. The nanojit IR is close to sparse IR, except there are no
phi nodes. But we convert phis to stack variables in sparse-llvm
anyway so this approach will translate well. 'nanojit' is very small
JIT - much less capable than LLVM but at the same time, better as a
JIT engine due to speed of compilation and compactness.

Thanks and Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [RFC PATCH] use OP_PUSH + OP_CALL
  2017-03-16 23:51                     ` Linus Torvalds
@ 2017-03-17 11:30                       ` Luc Van Oostenryck
  0 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-03-17 11:30 UTC (permalink / raw)
  To: linux-sparse; +Cc: Linus Torvalds, Dibyendu Majumdar, Luc Van Oostenryck

Here is the introduction of a new instruction
very similar to 'push' instructions in real CPU
which purpose is to give a correct type to function
arguments.

It seems to work well but I still need to check
liveness & DCE.

sparse-llvm is fine with it too but it will need
to be first integrated with the other changes.


-- Luc Van Oostenryck

---
 example.c                  |  4 ++--
 linearize.c                | 29 +++++++++++++++++++++--------
 linearize.h                |  7 ++++++-
 liveness.c                 | 14 ++++++++------
 simplify.c                 | 11 ++++++++++-
 sparse-llvm.c              |  4 ++--
 validation/call-variadic.c | 31 +++++++++++++++++++++++++++++++
 7 files changed, 80 insertions(+), 20 deletions(-)
 create mode 100644 validation/call-variadic.c

diff --git a/example.c b/example.c
index 691e0f97c..69e00b325 100644
--- a/example.c
+++ b/example.c
@@ -1121,11 +1121,11 @@ static void generate_ret(struct bb_state *state, struct instruction *ret)
  */
 static void generate_call(struct bb_state *state, struct instruction *insn)
 {
+	struct instruction *arg;
 	int offset = 0;
-	pseudo_t arg;
 
 	FOR_EACH_PTR(insn->arguments, arg) {
-		output_insn(state, "pushl %s", generic(state, arg));
+		output_insn(state, "pushl %s", generic(state, arg->src));
 		offset += 4;
 	} END_FOR_EACH_PTR(arg);
 	flush_reg(state, hardregs+0);
diff --git a/linearize.c b/linearize.c
index 5199b6b02..d5ac71f3d 100644
--- a/linearize.c
+++ b/linearize.c
@@ -233,6 +233,7 @@ static const char *opcodes[] = {
 	[OP_FPCAST] = "fpcast",
 	[OP_PTRCAST] = "ptrcast",
 	[OP_INLINED_CALL] = "# call",
+	[OP_PUSH] = "push",
 	[OP_CALL] = "call",
 	[OP_VANEXT] = "va_next",
 	[OP_VAARG] = "va_arg",
@@ -407,17 +408,15 @@ const char *show_instruction(struct instruction *insn)
 	case OP_STORE: case OP_SNOP:
 		buf += sprintf(buf, "%s -> %d[%s]", show_pseudo(insn->target), insn->offset, show_pseudo(insn->src));
 		break;
+	case OP_PUSH:
+		buf += sprintf(buf, "%s", show_pseudo(insn->src));
+		break;
 	case OP_INLINED_CALL:
-	case OP_CALL: {
-		struct pseudo *arg;
+	case OP_CALL:
 		if (insn->target && insn->target != VOID)
 			buf += sprintf(buf, "%s <- ", show_pseudo(insn->target));
 		buf += sprintf(buf, "%s", show_pseudo(insn->func));
-		FOR_EACH_PTR(insn->arguments, arg) {
-			buf += sprintf(buf, ", %s", show_pseudo(arg));
-		} END_FOR_EACH_PTR(arg);
 		break;
-	}
 	case OP_CAST:
 	case OP_SCAST:
 	case OP_FPCAST:
@@ -1197,6 +1196,20 @@ static pseudo_t linearize_assignment(struct entrypoint *ep, struct expression *e
 	return value;
 }
 
+/*
+ * Add an argument for a call.
+ *   -) insn->opcode == O_CALL | OP_INLINE_CALL
+ *   -) ctype = typeof(arg)
+ */
+static void push_argument(struct entrypoint *ep, struct instruction *insn, pseudo_t arg, struct symbol *ctype)
+{
+	struct instruction *push = alloc_typed_instruction(OP_PUSH, ctype);
+	push->call = insn;
+	use_pseudo(push, arg, &push->src);
+	add_instruction(&insn->arguments, push);
+	add_one_insn(ep, push);
+}
+
 static pseudo_t linearize_call_expression(struct entrypoint *ep, struct expression *expr)
 {
 	struct expression *arg, *fn;
@@ -1213,7 +1226,7 @@ static pseudo_t linearize_call_expression(struct entrypoint *ep, struct expressi
 
 	FOR_EACH_PTR(expr->args, arg) {
 		pseudo_t new = linearize_expression(ep, arg);
-		use_pseudo(insn, new, add_pseudo(&insn->arguments, new));
+		push_argument(ep, insn, new, arg->ctype);
 	} END_FOR_EACH_PTR(arg);
 
 	fn = expr->fn;
@@ -1680,7 +1693,7 @@ static pseudo_t linearize_inlined_call(struct entrypoint *ep, struct statement *
 		concat_symbol_list(args->declaration, &ep->syms);
 		FOR_EACH_PTR(args->declaration, sym) {
 			pseudo_t value = linearize_one_symbol(ep, sym);
-			use_pseudo(insn, value, add_pseudo(&insn->arguments, value));
+			push_argument(ep, insn, value, sym);
 		} END_FOR_EACH_PTR(sym);
 	}
 
diff --git a/linearize.h b/linearize.h
index bac82d7ff..7b6e53fc5 100644
--- a/linearize.h
+++ b/linearize.h
@@ -112,9 +112,13 @@ struct instruction {
 		};
 		struct /* call */ {
 			pseudo_t func;
-			struct pseudo_list *arguments;
+			struct instruction_list *arguments;
 			struct symbol *fntype;
 		};
+		struct /* push/arg */ {
+			pseudo_t arg;			/* same as src, src1 & symbol */
+			struct instruction *call;
+		};
 		struct /* context */ {
 			int increment;
 			int check;
@@ -201,6 +205,7 @@ enum opcode {
 	OP_FPCAST,
 	OP_PTRCAST,
 	OP_INLINED_CALL,
+	OP_PUSH,
 	OP_CALL,
 	OP_VANEXT,
 	OP_VAARG,
diff --git a/liveness.c b/liveness.c
index 7461738b4..7b5b1693a 100644
--- a/liveness.c
+++ b/liveness.c
@@ -46,13 +46,12 @@ static void track_instruction_usage(struct basic_block *bb, struct instruction *
 	void (*def)(struct basic_block *, pseudo_t),
 	void (*use)(struct basic_block *, pseudo_t))
 {
-	pseudo_t pseudo;
-
 	#define USES(x)		use(bb, insn->x)
 	#define DEFINES(x)	def(bb, insn->x)
 
 	switch (insn->opcode) {
 	case OP_RET:
+	case OP_PUSH:
 		USES(src);
 		break;
 
@@ -118,14 +117,17 @@ static void track_instruction_usage(struct basic_block *bb, struct instruction *
 		USES(src); DEFINES(target);
 		break;
 
-	case OP_CALL:
+	case OP_CALL: {
+		struct instruction *arg;
+
 		USES(func);
 		if (insn->target != VOID)
 			DEFINES(target);
-		FOR_EACH_PTR(insn->arguments, pseudo) {
-			use(bb, pseudo);
-		} END_FOR_EACH_PTR(pseudo);
+		FOR_EACH_PTR(insn->arguments, arg) {
+			use(bb, arg->src);
+		} END_FOR_EACH_PTR(arg);
 		break;
+	}
 
 	case OP_SLICE:
 		USES(base); DEFINES(target);
diff --git a/simplify.c b/simplify.c
index 5d00937f1..9a3abdff4 100644
--- a/simplify.c
+++ b/simplify.c
@@ -182,6 +182,14 @@ static void kill_use_list(struct pseudo_list *list)
 	} END_FOR_EACH_PTR(p);
 }
 
+static void kill_insn_list(struct instruction_list *list)
+{
+	struct instruction *insn;
+	FOR_EACH_PTR(list, insn) {
+		kill_insn(insn, 0);
+	} END_FOR_EACH_PTR(insn);
+}
+
 /*
  * kill an instruction:
  * - remove it from its bb
@@ -213,6 +221,7 @@ void kill_insn(struct instruction *insn, int force)
 	case OP_SETVAL:
 	case OP_NOT: case OP_NEG:
 	case OP_SLICE:
+	case OP_PUSH:
 		kill_use(&insn->src1);
 		break;
 
@@ -240,7 +249,7 @@ void kill_insn(struct instruction *insn, int force)
 			if (!(insn->func->sym->ctype.modifiers & MOD_PURE))
 				return;
 		}
-		kill_use_list(insn->arguments);
+		kill_insn_list(insn->arguments);
 		if (insn->func->type == PSEUDO_REG)
 			kill_use(&insn->func);
 		break;
diff --git a/sparse-llvm.c b/sparse-llvm.c
index 9f362b3ed..ecc4f032f 100644
--- a/sparse-llvm.c
+++ b/sparse-llvm.c
@@ -707,7 +707,7 @@ static void output_op_call(struct function *fn, struct instruction *insn)
 {
 	LLVMValueRef target, func;
 	int n_arg = 0, i;
-	struct pseudo *arg;
+	struct instruction *arg;
 	LLVMValueRef *args;
 
 	FOR_EACH_PTR(insn->arguments, arg) {
@@ -718,7 +718,7 @@ static void output_op_call(struct function *fn, struct instruction *insn)
 
 	i = 0;
 	FOR_EACH_PTR(insn->arguments, arg) {
-		args[i++] = pseudo_to_value(fn, insn, arg);
+		args[i++] = pseudo_to_value(fn, arg, arg->src);
 	} END_FOR_EACH_PTR(arg);
 
 	func = pseudo_to_value(fn, insn, insn->func);
diff --git a/validation/call-variadic.c b/validation/call-variadic.c
new file mode 100644
index 000000000..372c8fc7d
--- /dev/null
+++ b/validation/call-variadic.c
@@ -0,0 +1,31 @@
+#define NULL	((void*)0)
+
+extern int print(const char *msg, ...);
+
+int foo(int a, long l, int *p)
+{
+	print("msg %c: %d %d/%ld %ld/%p %p\n", 'x', a, __LINE__, l, 0L, p, NULL);
+}
+
+/*
+ * check-name: call-variadic
+ * check-command: test-linearize -Wno-decl $file
+ *
+ * check-output-start
+foo:
+.L0:
+	<entry-point>
+	push.64     "msg %c: %d %d/%ld %ld/%p %p\n"
+	push.32     $120
+	push.32     %arg1
+	push.32     $7
+	push.64     %arg2
+	push.64     $0
+	push.64     %arg3
+	push.64     $0
+	call.32     %r5 <- print
+	ret.32      %r5
+
+
+ * check-output-end
+ */
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v0 1/4] be more careful with concat_user_list()
  2017-03-11 15:47 ` [PATCH v0 1/4] be more careful with concat_user_list() Luc Van Oostenryck
@ 2017-04-27 22:41   ` Christopher Li
  0 siblings, 0 replies; 37+ messages in thread
From: Christopher Li @ 2017-04-27 22:41 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Linux-Sparse, Dibyendu Majumdar, Jeff Garzik, Pekka Enberg

On Sat, Mar 11, 2017 at 10:47 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> In convert_instruction_target(), once all users have been converted
> the old user list is concatened to the one of the replacing pseudo.
> But this pseudo may be one for which a user list is meaningless,
> like PSEUDO_VAL.
>
> While as such it's not a problem, it inhibit the reuse of the users
> pointer for other uses.


Looks good to me.

Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-03-16 20:43             ` Linus Torvalds
  2017-03-16 21:19               ` Luc Van Oostenryck
@ 2017-08-10 15:25               ` Christopher Li
  2017-08-10 22:34                 ` Luc Van Oostenryck
  1 sibling, 1 reply; 37+ messages in thread
From: Christopher Li @ 2017-08-10 15:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dibyendu Majumdar, Luc Van Oostenryck, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Thu, Mar 16, 2017 at 4:43 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Mar 16, 2017 at 1:19 PM, Dibyendu Majumdar
> <mobile@majumdar.org.uk> wrote:
>>
>> The issue is not with the type of %r6 but %arg3 and %arg4 if these
>> happen to be integer constants, and the function is variadic so we
>> cannot work out the type from the function prototype.
>
> Yes. Ok, I see the problem.
>
> I think we could add a 'size' to the pseudo, and solve it that way.
> CSE and other linearization artifacts may mean that the type is
> undefined, but the size should be well-defined.

Sorry for not able to join the discussion earlier. I am catching up
my backlogs of discussion. I have think about this for a while now.
I think there is one alternative solution is just give set_val instruction
a size (not type).

reason being:
1) Necessary for CSE as well.
   If CSE combine two set_val of different size as one pesudo.
   and it is used as two different size. That is likely a bug.
   That is independent of having OP_PUSH or not.

2) Sufficient.
   If we have the size of set_val, we can solve the problem of
   calling function with instant constants and the function is variadic.

That is likely the minimal required change to get both issue
resolved.

Thanks

Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-10 15:25               ` [RFC v0 0/4] Give a type to constants too Christopher Li
@ 2017-08-10 22:34                 ` Luc Van Oostenryck
  2017-08-11  2:14                   ` Christopher Li
                                     ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-10 22:34 UTC (permalink / raw)
  To: Christopher Li
  Cc: Linus Torvalds, Dibyendu Majumdar, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Thu, Aug 10, 2017 at 5:25 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Mar 16, 2017 at 4:43 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Sorry for not able to join the discussion earlier. I am catching up
> my backlogs of discussion. I have think about this for a while now.
> I think there is one alternative solution is just give set_val instruction
> a size (not type).
>
> reason being:
> 1) Necessary for CSE as well.
>    If CSE combine two set_val of different size as one pesudo.
>    and it is used as two different size. That is likely a bug.

Absolutely not.
Anyway, currectlyeach distinct value correspond to a *single/unique*
pseudo. IN other words, there is no need to do any kind of CSE on
*values*, they are already kinda pre-CSEed at their creation
(because a constant value is nothing more than this value).

Sorry, this has been tried and discussed already. It doesn't really work.
* a size is essentially a type, certainly for integers in sparse IR.
* A constant is a constant, 1 is 1, when it has 1 bit or 64 bits, it's the same
  constant and follow the same mathematical rules. It's even not signed or
  unsigned, it's just 1.
* in sparse, types are conveyed by the *operations* made on their operands.
  There are lots of good reasons for this. Trying to do the opposite for these
  constants will just break a lot of things, duplicate a lot of code
(try it) and
  bring confusion everywhere (because of simply using the operation you
  then need at each time check the type of the operands. It's something
  we want to avoid (like for the discussion about casts we had yesterday)
  not something we want to add).
* the OP_PUSH instructions do a perfect job of giving a type to functions
  arguments
* the OP_PUSH instructions also nicely abstract away the *operation* of
  argument passing that all architectures have. It's a good thing.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-10 22:34                 ` Luc Van Oostenryck
@ 2017-08-11  2:14                   ` Christopher Li
  2017-08-11 11:21                     ` Luc Van Oostenryck
  2017-08-11 10:28                   ` Dibyendu Majumdar
  2017-08-11 11:51                   ` Christopher Li
  2 siblings, 1 reply; 37+ messages in thread
From: Christopher Li @ 2017-08-11  2:14 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Linus Torvalds, Dibyendu Majumdar, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Thu, Aug 10, 2017 at 6:34 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Absolutely not.
> Anyway, currectlyeach distinct value correspond to a *single/unique*
> pseudo. IN other words, there is no need to do any kind of CSE on
> *values*, they are already kinda pre-CSEed at their creation
> (because a constant value is nothing more than this value).

I see. value_pseudo() has the hash for all constant values.

But float is not covered.

#define PI 3.1415

double a;
void foo(void)
{
a = a*PI + a*PI*5;
}

 ./test-linearize /tmp/pi.c
/tmp/pi.c:3:8: warning: symbol 'a' was not declared. Should it be static?
/tmp/pi.c:4:6: warning: symbol 'foo' was not declared. Should it be static?
foo:
.L0:
<entry-point>
load.64     %r1 <- 0[a]
set.64      %r2 <- 3.141500 <================
mulu.64     %r3 <- %r1, %r2
set.64      %r5 <- 3.141500 <===duplicate======
mulu.64     %r6 <- %r1, %r5
set.64      %r7 <- 5.000000
mulu.64     %r8 <- %r6, %r7
add.64      %r9 <- %r3, %r8
store.64    %r9 -> 0[a]
ret

>
> Sorry, this has been tried and discussed already. It doesn't really work.
> * a size is essentially a type, certainly for integers in sparse IR.
> * A constant is a constant, 1 is 1, when it has 1 bit or 64 bits, it's the same
>   constant and follow the same mathematical rules. It's even not signed or
>   unsigned, it's just 1.
> * in sparse, types are conveyed by the *operations* made on their operands.
>   There are lots of good reasons for this. Trying to do the opposite for these
>   constants will just break a lot of things, duplicate a lot of code
> (try it) and
>   bring confusion everywhere (because of simply using the operation you
>   then need at each time check the type of the operands. It's something
>   we want to avoid (like for the discussion about casts we had yesterday)
>   not something we want to add).

I see, thanks for the explain. Let me think about it for a while.

> * the OP_PUSH instructions do a perfect job of giving a type to functions
>   arguments
> * the OP_PUSH instructions also nicely abstract away the *operation* of
>   argument passing that all architectures have. It's a good thing.

I need to take a closer look at that patch.

Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-10 22:34                 ` Luc Van Oostenryck
  2017-08-11  2:14                   ` Christopher Li
@ 2017-08-11 10:28                   ` Dibyendu Majumdar
  2017-08-11 11:49                     ` Luc Van Oostenryck
  2017-08-11 11:51                   ` Christopher Li
  2 siblings, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-08-11 10:28 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Christopher Li, Linus Torvalds, Sparse Mailing-list, Jeff Garzik,
	Pekka Enberg

Hi Luc,

On 10 August 2017 at 23:34, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> * the OP_PUSH instructions do a perfect job of giving a type to functions
>   arguments
> * the OP_PUSH instructions also nicely abstract away the *operation* of
>   argument passing that all architectures have. It's a good thing.
>

It is true that OP_PUSH instructions resolved the issue I was having.
However there is something odd about them as they are not really
instructions in the instruction stream, are they? They get added as
arguments to the OP_CALL instruction. The solution works great though.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11  2:14                   ` Christopher Li
@ 2017-08-11 11:21                     ` Luc Van Oostenryck
  0 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 11:21 UTC (permalink / raw)
  To: Christopher Li
  Cc: Linus Torvalds, Dibyendu Majumdar, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Thu, Aug 10, 2017 at 10:14:40PM -0400, Christopher Li wrote:
> 
> I see. value_pseudo() has the hash for all constant values.
> 
> But float is not covered.

Indeed, but:
- floats are very different beasts anyway
- they are defined via OP_SETVAL
- I have patch(es) to change this to OP_SETFVAL
  (it's annoying that SETVAL is used for several unrelated purposes)
  and let CSE know about it.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 10:28                   ` Dibyendu Majumdar
@ 2017-08-11 11:49                     ` Luc Van Oostenryck
  2017-08-11 12:00                       ` Christopher Li
                                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 11:49 UTC (permalink / raw)
  To: Dibyendu Majumdar
  Cc: Christopher Li, Linus Torvalds, Sparse Mailing-list, Jeff Garzik,
	Pekka Enberg

On Fri, Aug 11, 2017 at 11:28:23AM +0100, Dibyendu Majumdar wrote:
> Hi Luc,
> 
> On 10 August 2017 at 23:34, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> > * the OP_PUSH instructions do a perfect job of giving a type to functions
> >   arguments
> > * the OP_PUSH instructions also nicely abstract away the *operation* of
> >   argument passing that all architectures have. It's a good thing.
> >
> 
> It is true that OP_PUSH instructions resolved the issue I was having.
> However there is something odd about them as they are not really
> instructions in the instruction stream, are they? They get added as
> arguments to the OP_CALL instruction. The solution works great though.

Yes, they are somehow odd. I'm not totally sure by what you exactly
mean by "in the instruction stream", though.

Let's use an example:
	fun(1, a)

Currently it's linearized as:
	call	fun, 1, a

The first version of the patch gave:
	arg	%r1, $1
	arg	%r2, a
	call	fun, %r1, %r2

The current version gives:
	push	$1
	push	a
	call	fun
which is very much like a lot of functions really work.
And yes, the 'pushes' is somehow disocciated from the 'call',
or the 'call' is not a consumer of the 'pushes'.
It's the stack that is the consumer from those 'pushes' and
the 'call' (or more exactly the function 'fun') that is the
consumer of (what have been pushed on) the stack.

The 'args' were really only a 'move' or 'copy' in disguise,
they are not needed at this level (but OK they *also* gave
the type).

The 'pushes' are IMO the right thing needed here.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-10 22:34                 ` Luc Van Oostenryck
  2017-08-11  2:14                   ` Christopher Li
  2017-08-11 10:28                   ` Dibyendu Majumdar
@ 2017-08-11 11:51                   ` Christopher Li
  2 siblings, 0 replies; 37+ messages in thread
From: Christopher Li @ 2017-08-11 11:51 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Linus Torvalds, Dibyendu Majumdar, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Thu, Aug 10, 2017 at 6:34 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Thu, Aug 10, 2017 at 5:25 PM, Christopher Li <sparse@chrisli.org> wrote:
>> On Thu, Mar 16, 2017 at 4:43 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> Sorry for not able to join the discussion earlier. I am catching up
>> my backlogs of discussion. I have think about this for a while now.
>> I think there is one alternative solution is just give set_val instruction
>> a size (not type).
>>
>> reason being:
>> 1) Necessary for CSE as well.
>>    If CSE combine two set_val of different size as one pesudo.
>>    and it is used as two different size. That is likely a bug.
>
> Absolutely not.
> Anyway, currectlyeach distinct value correspond to a *single/unique*
> pseudo. IN other words, there is no need to do any kind of CSE on
> *values*, they are already kinda pre-CSEed at their creation
> (because a constant value is nothing more than this value).
>
> Sorry, this has been tried and discussed already. It doesn't really work.
> * a size is essentially a type, certainly for integers in sparse IR.
> * A constant is a constant, 1 is 1, when it has 1 bit or 64 bits, it's the same
>   constant and follow the same mathematical rules. It's even not signed or
>   unsigned, it's just 1.
> * in sparse, types are conveyed by the *operations* made on their operands.
>   There are lots of good reasons for this. Trying to do the opposite for these
>   constants will just break a lot of things, duplicate a lot of code
> (try it) and

OK. I have think about it. I look back the email. It is actually on the
email I reply to (and quoted) . So Linus was thinking two solutions.
One of them is add size to pseudo. The other is add push instruction.
He said that after he said pseudo shouldn't have the size,
which is what you are repeating here. So the reason you use are
actually older one, which already overrule by Linus. He thinks have
size for pseudo is acceptable.

He said the type might be undefined, the size is well-defined.
I totally agree.

What I am popularizing is actually along the line of Linus' first
suggestion. Except that I don't give size to *all* the pseudo,
only the OP_SETVAL, via the instruction to acquire the size.

I have to look at what impact it has on CSE. It might not  be that
big a deal.

Chris

======quote from Linus email May 16 on this thread ========
Yes. Ok, I see the problem.

I think we could add a 'size' to the pseudo, and solve it that way.
CSE and other linearization artifacts may mean that the type is
undefined, but the size should be well-defined.

Alternatively - and this might be the better solution - the OP_CALL
instruction might be better off split up a bit. Instead of having just
that
==================================================

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 11:49                     ` Luc Van Oostenryck
@ 2017-08-11 12:00                       ` Christopher Li
  2017-08-11 12:35                         ` Luc Van Oostenryck
  2017-08-11 12:20                       ` Dibyendu Majumdar
  2017-08-11 13:16                       ` Dibyendu Majumdar
  2 siblings, 1 reply; 37+ messages in thread
From: Christopher Li @ 2017-08-11 12:00 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Dibyendu Majumdar, Linus Torvalds, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Fri, Aug 11, 2017 at 7:49 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> The first version of the patch gave:
>         arg     %r1, $1
>         arg     %r2, a
>         call    fun, %r1, %r2
>
> The current version gives:
>         push    $1
>         push    a
>         call    fun


I think he means the push instruction is not in the bb->insns list.

I agree with that view. The push instruction is very different than
any other instruction.  It contain mostly "repeated" information.
It is only needed when two things goes together:
1) the functions is varidic,
2) the OP_SETVAL is on the varidic part of the argument.

Which is rare, most of the function call are not varidic.

Another reason is that, it actually mess up with the parsing
state. If I am writing a script to parse the linearized output.
Notice that all other instruction don't have state itself.
The PUSH has state. It need to remember the ordered
position to the parser to figure out which PUSH apply
to which calling argument.

It also take up valuable screen space when I am staring
at the linearized byte code.

That is why I am actually what to explore Linus' alternative
suggest to push. Give OP_SETVAL a size.

Linus proposal two valid solution, add size to pseudo,
or the push. I haven't explore the first one yet.

Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 11:49                     ` Luc Van Oostenryck
  2017-08-11 12:00                       ` Christopher Li
@ 2017-08-11 12:20                       ` Dibyendu Majumdar
  2017-08-11 12:39                         ` Luc Van Oostenryck
  2017-08-11 13:16                       ` Dibyendu Majumdar
  2 siblings, 1 reply; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-08-11 12:20 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Christopher Li, Linus Torvalds, Sparse Mailing-list, Jeff Garzik,
	Pekka Enberg

Hi Luc,

On 11 August 2017 at 12:49, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Fri, Aug 11, 2017 at 11:28:23AM +0100, Dibyendu Majumdar wrote:
>> On 10 August 2017 at 23:34, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>> > * the OP_PUSH instructions do a perfect job of giving a type to functions
>> >   arguments
>> > * the OP_PUSH instructions also nicely abstract away the *operation* of
>> >   argument passing that all architectures have. It's a good thing.
>> >
>>
>> It is true that OP_PUSH instructions resolved the issue I was having.
>> However there is something odd about them as they are not really
>> instructions in the instruction stream, are they? They get added as
>> arguments to the OP_CALL instruction. The solution works great though.
>
> Yes, they are somehow odd. I'm not totally sure by what you exactly
> mean by "in the instruction stream", though.
>
> Let's use an example:
>         fun(1, a)
>
> Currently it's linearized as:
>         call    fun, 1, a
>
> The first version of the patch gave:
>         arg     %r1, $1
>         arg     %r2, a
>         call    fun, %r1, %r2
>
> The current version gives:
>         push    $1
>         push    a
>         call    fun
> which is very much like a lot of functions really work.

I was mistaken - I think they do appear in the instruction stream. I
thought they didn't because we kind of ignore them - and only look at
them as part of the OP_CALL arguments.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 12:00                       ` Christopher Li
@ 2017-08-11 12:35                         ` Luc Van Oostenryck
  2017-08-11 12:40                           ` Christopher Li
  0 siblings, 1 reply; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 12:35 UTC (permalink / raw)
  To: Christopher Li
  Cc: Dibyendu Majumdar, Linus Torvalds, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Fri, Aug 11, 2017 at 08:00:50AM -0400, Christopher Li wrote:
> 
> I think he means the push instruction is not in the bb->insns list.
> 
> I agree with that view.

But these instructions *are* in the bb->insns list.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 12:20                       ` Dibyendu Majumdar
@ 2017-08-11 12:39                         ` Luc Van Oostenryck
  0 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 12:39 UTC (permalink / raw)
  To: Dibyendu Majumdar
  Cc: Christopher Li, Linus Torvalds, Sparse Mailing-list, Jeff Garzik,
	Pekka Enberg

On Fri, Aug 11, 2017 at 01:20:38PM +0100, Dibyendu Majumdar wrote:
> 
> I was mistaken - I think they do appear in the instruction stream.

Yes, they are in bb->insns.

> I thought they didn't because we kind of ignore them - and only look at
> them as part of the OP_CALL arguments.

Well, it depends what we do with them. For the LLVM front-end,
indeed they don't correspond to an LLVM instruction.
For other situations they are or will be used more directly.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 12:35                         ` Luc Van Oostenryck
@ 2017-08-11 12:40                           ` Christopher Li
  2017-08-11 12:45                             ` Luc Van Oostenryck
  0 siblings, 1 reply; 37+ messages in thread
From: Christopher Li @ 2017-08-11 12:40 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Dibyendu Majumdar, Linus Torvalds, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Fri, Aug 11, 2017 at 8:35 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Fri, Aug 11, 2017 at 08:00:50AM -0400, Christopher Li wrote:
>>
>> I think he means the push instruction is not in the bb->insns list.
>>
>> I agree with that view.
>
> But these instructions *are* in the bb->insns list.

If they are in the bb->insns, that is even worse in my book. It means
the back end processing the IR need to remember the state to match
up push into the calling arguments. The call instruction show up *after*
the push. You are forcing the back end to simulate a push stack to
properly process  the function call. When the back end see the
push instruction, they have noway of known the following call
will use varidic or not. The back end have to save the state to
wait for the call instruction.

You might just as well give call instruction a separate type list matching
the calling one. That at least don't show up as bb->insns.

I still want to explore Linus's first suggestion, give pseudo a type.
Using the OP_SETVAL, When I get a chance.

Chris

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 12:40                           ` Christopher Li
@ 2017-08-11 12:45                             ` Luc Van Oostenryck
  0 siblings, 0 replies; 37+ messages in thread
From: Luc Van Oostenryck @ 2017-08-11 12:45 UTC (permalink / raw)
  To: Christopher Li
  Cc: Dibyendu Majumdar, Linus Torvalds, Sparse Mailing-list,
	Jeff Garzik, Pekka Enberg

On Fri, Aug 11, 2017 at 08:40:25AM -0400, Christopher Li wrote:
> On Fri, Aug 11, 2017 at 8:35 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> > On Fri, Aug 11, 2017 at 08:00:50AM -0400, Christopher Li wrote:
> >>
> >> I think he means the push instruction is not in the bb->insns list.
> >>
> >> I agree with that view.
> >
> > But these instructions *are* in the bb->insns list.
> 
> If they are in the bb->insns, that is even worse in my book. It means
> the back end processing the IR need to remember the state to match
> up push into the calling arguments. The call instruction show up *after*
> the push.

Yes. Calling conventions are often like this.
isn't the opposite direction somehow less meaningful?

> You are forcing the back end to simulate a push stack to
> properly process  the function call.

That's not true.  Look at the LLVM backend.

-- Luc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC v0 0/4] Give a type to constants too
  2017-08-11 11:49                     ` Luc Van Oostenryck
  2017-08-11 12:00                       ` Christopher Li
  2017-08-11 12:20                       ` Dibyendu Majumdar
@ 2017-08-11 13:16                       ` Dibyendu Majumdar
  2 siblings, 0 replies; 37+ messages in thread
From: Dibyendu Majumdar @ 2017-08-11 13:16 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Christopher Li, Linus Torvalds, Sparse Mailing-list, Jeff Garzik,
	Pekka Enberg

Hi Luc,

On 11 August 2017 at 12:49, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Fri, Aug 11, 2017 at 11:28:23AM +0100, Dibyendu Majumdar wrote:
>> On 10 August 2017 at 23:34, Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>> > * the OP_PUSH instructions do a perfect job of giving a type to functions
>> >   arguments
>> > * the OP_PUSH instructions also nicely abstract away the *operation* of
>> >   argument passing that all architectures have. It's a good thing.
>> >
>>
>> It is true that OP_PUSH instructions resolved the issue I was having.
>> However there is something odd about them as they are not really
>> instructions in the instruction stream, are they? They get added as
>> arguments to the OP_CALL instruction. The solution works great though.
>
> Yes, they are somehow odd.

I am trying to think why I find them odd. I suppose being used to LLVM
IR - where each instruction results in a new value, I wasn't sure how
the push instruction plays with everything else. But since we ignore
them in the LLVM backend except as arguments to CALL, it doesn't make
any difference in practical terms. I also ignore these instructions in
my NanoJIT backend except as arguments to CALL.

Regards
Dibyendu

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-08-11 13:16 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-11 15:47 [RFC v0 0/4] Give a type to constants too Luc Van Oostenryck
2017-03-11 15:47 ` [PATCH v0 1/4] be more careful with concat_user_list() Luc Van Oostenryck
2017-04-27 22:41   ` Christopher Li
2017-03-11 15:47 ` [PATCH v1 2/4] make space for PSEUDO_VAL have a type Luc Van Oostenryck
2017-03-11 15:47 ` [PATCH v0 3/4] add helper pseudo_type() Luc Van Oostenryck
2017-03-11 15:47 ` [PATCH v0 4/4] give a type to PSEUDO_VALs Luc Van Oostenryck
2017-03-12 20:30 ` [RFC v0 0/4] Give a type to constants, considered harmful Luc Van Oostenryck
2017-03-12 22:25   ` Dibyendu Majumdar
2017-03-16 17:20     ` Luc Van Oostenryck
2017-03-17 11:03       ` Dibyendu Majumdar
2017-03-16 17:25 ` [RFC v0 0/4] Give a type to constants too Linus Torvalds
2017-03-16 18:04   ` Dibyendu Majumdar
2017-03-16 18:14     ` Linus Torvalds
2017-03-16 18:24       ` Dibyendu Majumdar
2017-03-16 18:40         ` Linus Torvalds
2017-03-16 20:19           ` Dibyendu Majumdar
2017-03-16 20:43             ` Linus Torvalds
2017-03-16 21:19               ` Luc Van Oostenryck
2017-03-16 22:28                 ` Linus Torvalds
2017-03-16 23:12                   ` Luc Van Oostenryck
2017-03-16 23:51                     ` Linus Torvalds
2017-03-17 11:30                       ` [RFC PATCH] use OP_PUSH + OP_CALL Luc Van Oostenryck
2017-08-10 15:25               ` [RFC v0 0/4] Give a type to constants too Christopher Li
2017-08-10 22:34                 ` Luc Van Oostenryck
2017-08-11  2:14                   ` Christopher Li
2017-08-11 11:21                     ` Luc Van Oostenryck
2017-08-11 10:28                   ` Dibyendu Majumdar
2017-08-11 11:49                     ` Luc Van Oostenryck
2017-08-11 12:00                       ` Christopher Li
2017-08-11 12:35                         ` Luc Van Oostenryck
2017-08-11 12:40                           ` Christopher Li
2017-08-11 12:45                             ` Luc Van Oostenryck
2017-08-11 12:20                       ` Dibyendu Majumdar
2017-08-11 12:39                         ` Luc Van Oostenryck
2017-08-11 13:16                       ` Dibyendu Majumdar
2017-08-11 11:51                   ` Christopher Li
2017-03-16 20:42   ` Luc Van Oostenryck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.