All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
@ 2011-09-26  7:46 Jan Kiszka
  2011-09-26  8:01 ` Mulyadi Santosa
  2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Kiszka @ 2011-09-26  7:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Blue Swirl, Peter Maydell, qemu-devel

This increases the overhead of frequently executed helpers.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Maybe this should be applied to more hot-path functions, but I haven't
done any thorough analysis yet.

 Makefile.target |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..1cc758d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -91,7 +91,7 @@ tcg/tcg.o: cpu.h
 
 # HELPER_CFLAGS is used for all the code compiled with static register
 # variables
-op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
+op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
 
 # Note: this is a workaround. The real fix is to avoid compiling
 # cpu_signal_handler() in user-exec.c.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26  7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka
@ 2011-09-26  8:01 ` Mulyadi Santosa
  2011-09-26  8:15   ` Laurent Desnogues
  2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka
  1 sibling, 1 reply; 22+ messages in thread
From: Mulyadi Santosa @ 2011-09-26  8:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel

Hi...

On Mon, Sep 26, 2011 at 14:46, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This increases the overhead of frequently executed helpers.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

IMHO, stack protector setup put more stuffs during epilogue, but quite
likely it is negligible unless it cause too much L1 cache misses. So,
I think this micro tuning is somewhat unnecessary but still okay.
Security wise, I think it's better to just leave it as is like now.

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26  8:01 ` Mulyadi Santosa
@ 2011-09-26  8:15   ` Laurent Desnogues
  2011-09-26 17:41     ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Laurent Desnogues @ 2011-09-26  8:15 UTC (permalink / raw)
  To: Mulyadi Santosa; +Cc: Jan Kiszka, qemu-devel

On Mon, Sep 26, 2011 at 10:01 AM, Mulyadi Santosa
<mulyadi.santosa@gmail.com> wrote:
> Hi...
>
> On Mon, Sep 26, 2011 at 14:46, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This increases the overhead of frequently executed helpers.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>
> IMHO, stack protector setup put more stuffs during epilogue, but quite
> likely it is negligible unless it cause too much L1 cache misses. So,
> I think this micro tuning is somewhat unnecessary but still okay.

The impact of stack protection is very high for instance running
FFmpeg ARM with NEON optimizations:  a few months ago I
measured that removing stack protection improved the run time
by more than 10%.  Of course it's extreme since the proportion
of NEON instructions (and hence of helper calls) is very high.


Laurent

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26  7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka
  2011-09-26  8:01 ` Mulyadi Santosa
@ 2011-09-26 10:51 ` Jan Kiszka
  2011-09-26 11:33   ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2011-09-26 10:51 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Blue Swirl, Peter Maydell, Laurent Desnogues, Mulyadi Santosa,
	qemu-devel

This increases the overhead of frequently executed helpers. We need to
move rule past QEMU_CFLAGS assignment to ensure that the required simple
assignment picks up all bits. The signal workaround is moved just for
the sake of consistency.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v2:
 - unbreak qemu-user build

Maybe some real make guru has a nicer solution for removing the switch.

 Makefile.target |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..b545161 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -89,14 +89,6 @@ translate-all.o: translate-all.c cpu.h
 
 tcg/tcg.o: cpu.h
 
-# HELPER_CFLAGS is used for all the code compiled with static register
-# variables
-op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
-# Note: this is a workaround. The real fix is to avoid compiling
-# cpu_signal_handler() in user-exec.c.
-signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
 #########################################################
 # Linux user emulator target
 
@@ -387,6 +379,15 @@ obj-y += $(addprefix ../, $(trace-obj-y))
 
 endif # CONFIG_SOFTMMU
 
+# HELPER_CFLAGS is used for all the code compiled with static register
+# variables
+# NOTE: Must be after the last QEMU_CFLAGS assignment
+op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
+
+# Note: this is a workaround. The real fix is to avoid compiling
+# cpu_signal_handler() in user-exec.c.
+signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
+
 ifndef CONFIG_LINUX_USER
 ifndef CONFIG_BSD_USER
 # libcacard needs qemu-thread support, and besides is only needed by devices
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka
@ 2011-09-26 11:33   ` Peter Maydell
  2011-09-26 11:43     ` Jan Kiszka
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2011-09-26 11:33 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa,
	qemu-devel

On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This increases the overhead of frequently executed helpers. We need to
> move rule past QEMU_CFLAGS assignment to ensure that the required simple
> assignment picks up all bits. The signal workaround is moved just for
> the sake of consistency.

> +# NOTE: Must be after the last QEMU_CFLAGS assignment
> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

Why also user-exec.o ? Why not the other source files with helpers in?
This doesn't seem very consistent. Maybe the right answer is to have
some of the offending helper functions inline instead? (Or to not
have -fstack-protector-all globally?)

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 11:33   ` Peter Maydell
@ 2011-09-26 11:43     ` Jan Kiszka
  2011-09-26 11:56       ` Peter Maydell
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2011-09-26 11:43 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa,
	qemu-devel

On 2011-09-26 13:33, Peter Maydell wrote:
> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This increases the overhead of frequently executed helpers. We need to
>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>> assignment picks up all bits. The signal workaround is moved just for
>> the sake of consistency.
> 
>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
> 
> Why also user-exec.o ? 

That's a good question. It doesn't look like it's deserving this.

> Why not the other source files with helpers in?

Name them and I add them.

> This doesn't seem very consistent. Maybe the right answer is to have
> some of the offending helper functions inline instead? 

I can't imagine that this could be a short- or even mid-term answer.
Inlining is a huge work.

> (Or to not
> have -fstack-protector-all globally?)

Opt-in instead of opt-out, that might be some approach, though I bet the
out-out set still bets the opt-in crowed by some orders of magnitude.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 11:43     ` Jan Kiszka
@ 2011-09-26 11:56       ` Peter Maydell
  2011-09-26 17:22         ` Blue Swirl
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2011-09-26 11:56 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa,
	qemu-devel

On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 13:33, Peter Maydell wrote:
>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> This increases the overhead of frequently executed helpers. We need to
>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>> assignment picks up all bits. The signal workaround is moved just for
>>> the sake of consistency.
>>
>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>
>> Why also user-exec.o ?
>
> That's a good question. It doesn't look like it's deserving this.
>
>> Why not the other source files with helpers in?
>
> Name them and I add them.

target-*/*helper.c ?

But mostly I think what I'm trying to say is that this is making
a tradeoff between safety and speed, so it ought to come with a
rationale for why it is OK to remove the safety checks for these
files. Given that rationale you can identify other files that are
also safe/worthwhile to flip the flag for.

>> (Or to not
>> have -fstack-protector-all globally?)
>
> Opt-in instead of opt-out, that might be some approach, though I bet the
> out-out set still bets the opt-in crowed by some orders of magnitude.

Have you looked at whether using plain -fstack-protector for all files
(rather than the -all version) helps? Presumably we had some reason for
picking the -all version though...

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 11:56       ` Peter Maydell
@ 2011-09-26 17:22         ` Blue Swirl
  2011-09-26 17:33           ` Jan Kiszka
  0 siblings, 1 reply; 22+ messages in thread
From: Blue Swirl @ 2011-09-26 17:22 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Laurent Desnogues, Jan Kiszka, Anthony Liguori, Mulyadi Santosa,
	qemu-devel

On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
<peter.maydell@linaro.org> wrote:
> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-09-26 13:33, Peter Maydell wrote:
>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> This increases the overhead of frequently executed helpers. We need to
>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>> assignment picks up all bits. The signal workaround is moved just for
>>>> the sake of consistency.
>>>
>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>
>>> Why also user-exec.o ?
>>
>> That's a good question. It doesn't look like it's deserving this.
>>
>>> Why not the other source files with helpers in?
>>
>> Name them and I add them.
>
> target-*/*helper.c ?
>
> But mostly I think what I'm trying to say is that this is making
> a tradeoff between safety and speed, so it ought to come with a
> rationale for why it is OK to remove the safety checks for these
> files. Given that rationale you can identify other files that are
> also safe/worthwhile to flip the flag for.

I wouldn't remove -fstack-protector-all by default. Especially op code
interfaces with the guest.

For max performance version, I'd check if -fomit-frame-pointer and -O3
makes sense. See also this article:
https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

>>> (Or to not
>>> have -fstack-protector-all globally?)
>>
>> Opt-in instead of opt-out, that might be some approach, though I bet the
>> out-out set still bets the opt-in crowed by some orders of magnitude.
>
> Have you looked at whether using plain -fstack-protector for all files
> (rather than the -all version) helps? Presumably we had some reason for
> picking the -all version though...
>
> -- PMM
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 17:22         ` Blue Swirl
@ 2011-09-26 17:33           ` Jan Kiszka
  2011-09-26 18:20             ` Blue Swirl
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2011-09-26 17:33 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori,
	Mulyadi Santosa, qemu-devel

On 2011-09-26 19:22, Blue Swirl wrote:
> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
> <peter.maydell@linaro.org> wrote:
>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>> the sake of consistency.
>>>>
>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>
>>>> Why also user-exec.o ?
>>>
>>> That's a good question. It doesn't look like it's deserving this.
>>>
>>>> Why not the other source files with helpers in?
>>>
>>> Name them and I add them.
>>
>> target-*/*helper.c ?
>>
>> But mostly I think what I'm trying to say is that this is making
>> a tradeoff between safety and speed, so it ought to come with a
>> rationale for why it is OK to remove the safety checks for these
>> files. Given that rationale you can identify other files that are
>> also safe/worthwhile to flip the flag for.
> 
> I wouldn't remove -fstack-protector-all by default. Especially op code
> interfaces with the guest.

I'd love to have some function attribute for this, because a stack
protector for rather simple arithmetic operations or something like
helper_cli/sti are pointlessly burned cycles.

Maybe we can introduce op_helper_simple.c.

> 
> For max performance version, I'd check if -fomit-frame-pointer and -O3
> makes sense. See also this article:
> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

We already run without frame pointers, -O3 might be worth exploring in
addition. Still, that won't take the protector overhead away.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26  8:15   ` Laurent Desnogues
@ 2011-09-26 17:41     ` Avi Kivity
  2011-09-26 19:43       ` Richard Henderson
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2011-09-26 17:41 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: Jan Kiszka, Mulyadi Santosa, qemu-devel

On 09/26/2011 11:15 AM, Laurent Desnogues wrote:
> On Mon, Sep 26, 2011 at 10:01 AM, Mulyadi Santosa
> <mulyadi.santosa@gmail.com>  wrote:
> >  Hi...
> >
> >  On Mon, Sep 26, 2011 at 14:46, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
> >>  This increases the overhead of frequently executed helpers.
> >>
> >>  Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
> >
> >  IMHO, stack protector setup put more stuffs during epilogue, but quite
> >  likely it is negligible unless it cause too much L1 cache misses. So,
> >  I think this micro tuning is somewhat unnecessary but still okay.
>
> The impact of stack protection is very high for instance running
> FFmpeg ARM with NEON optimizations:  a few months ago I
> measured that removing stack protection improved the run time
> by more than 10%.  Of course it's extreme since the proportion
> of NEON instructions (and hence of helper calls) is very high.

I saw a lot of helper calls for sse in ordinary x86_64 code, likely for 
memcpy/cmp and friends.  Native tcg ops for common vector instructions 
would probably be quite a speedup.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 17:33           ` Jan Kiszka
@ 2011-09-26 18:20             ` Blue Swirl
  2011-09-26 18:25               ` Jan Kiszka
  2011-09-26 19:08               ` Peter Maydell
  0 siblings, 2 replies; 22+ messages in thread
From: Blue Swirl @ 2011-09-26 18:20 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori,
	Mulyadi Santosa, qemu-devel

On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 19:22, Blue Swirl wrote:
>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>> <peter.maydell@linaro.org> wrote:
>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>> the sake of consistency.
>>>>>
>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>
>>>>> Why also user-exec.o ?
>>>>
>>>> That's a good question. It doesn't look like it's deserving this.
>>>>
>>>>> Why not the other source files with helpers in?
>>>>
>>>> Name them and I add them.
>>>
>>> target-*/*helper.c ?
>>>
>>> But mostly I think what I'm trying to say is that this is making
>>> a tradeoff between safety and speed, so it ought to come with a
>>> rationale for why it is OK to remove the safety checks for these
>>> files. Given that rationale you can identify other files that are
>>> also safe/worthwhile to flip the flag for.
>>
>> I wouldn't remove -fstack-protector-all by default. Especially op code
>> interfaces with the guest.
>
> I'd love to have some function attribute for this, because a stack
> protector for rather simple arithmetic operations or something like
> helper_cli/sti are pointlessly burned cycles.

In order to avoid burning the cycles, there is a certain kernel module
which gives almost native performance.

> Maybe we can introduce op_helper_simple.c.
>
>>
>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>> makes sense. See also this article:
>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>
> We already run without frame pointers, -O3 might be worth exploring in
> addition. Still, that won't take the protector overhead away.

It would be interesting to have some benchmarks. I'd expect that most
of the run time is spent within generated code, the next largest item
should be the translator and any helpers should be marginal.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 18:20             ` Blue Swirl
@ 2011-09-26 18:25               ` Jan Kiszka
  2011-09-26 18:40                 ` Blue Swirl
  2011-09-26 19:08               ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2011-09-26 18:25 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori,
	Mulyadi Santosa, qemu-devel

On 2011-09-26 20:20, Blue Swirl wrote:
> On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-09-26 19:22, Blue Swirl wrote:
>>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>>> <peter.maydell@linaro.org> wrote:
>>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>>> the sake of consistency.
>>>>>>
>>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>>
>>>>>> Why also user-exec.o ?
>>>>>
>>>>> That's a good question. It doesn't look like it's deserving this.
>>>>>
>>>>>> Why not the other source files with helpers in?
>>>>>
>>>>> Name them and I add them.
>>>>
>>>> target-*/*helper.c ?
>>>>
>>>> But mostly I think what I'm trying to say is that this is making
>>>> a tradeoff between safety and speed, so it ought to come with a
>>>> rationale for why it is OK to remove the safety checks for these
>>>> files. Given that rationale you can identify other files that are
>>>> also safe/worthwhile to flip the flag for.
>>>
>>> I wouldn't remove -fstack-protector-all by default. Especially op code
>>> interfaces with the guest.
>>
>> I'd love to have some function attribute for this, because a stack
>> protector for rather simple arithmetic operations or something like
>> helper_cli/sti are pointlessly burned cycles.
> 
> In order to avoid burning the cycles, there is a certain kernel module
> which gives almost native performance.

Well, even in 2011 there are cases remaining where VT-x/AMV-V is not
available to your favorite hypervisor.

> 
>> Maybe we can introduce op_helper_simple.c.
>>
>>>
>>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>>> makes sense. See also this article:
>>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>>
>> We already run without frame pointers, -O3 might be worth exploring in
>> addition. Still, that won't take the protector overhead away.
> 
> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

At least we've a rather static, long-running guest, not much is
recompiled once the system has settled.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 18:25               ` Jan Kiszka
@ 2011-09-26 18:40                 ` Blue Swirl
  0 siblings, 0 replies; 22+ messages in thread
From: Blue Swirl @ 2011-09-26 18:40 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori,
	Mulyadi Santosa, qemu-devel

On Mon, Sep 26, 2011 at 6:25 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 20:20, Blue Swirl wrote:
>> On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-09-26 19:22, Blue Swirl wrote:
>>>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>>>> <peter.maydell@linaro.org> wrote:
>>>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>>>> the sake of consistency.
>>>>>>>
>>>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>>>
>>>>>>> Why also user-exec.o ?
>>>>>>
>>>>>> That's a good question. It doesn't look like it's deserving this.
>>>>>>
>>>>>>> Why not the other source files with helpers in?
>>>>>>
>>>>>> Name them and I add them.
>>>>>
>>>>> target-*/*helper.c ?
>>>>>
>>>>> But mostly I think what I'm trying to say is that this is making
>>>>> a tradeoff between safety and speed, so it ought to come with a
>>>>> rationale for why it is OK to remove the safety checks for these
>>>>> files. Given that rationale you can identify other files that are
>>>>> also safe/worthwhile to flip the flag for.
>>>>
>>>> I wouldn't remove -fstack-protector-all by default. Especially op code
>>>> interfaces with the guest.
>>>
>>> I'd love to have some function attribute for this, because a stack
>>> protector for rather simple arithmetic operations or something like
>>> helper_cli/sti are pointlessly burned cycles.
>>
>> In order to avoid burning the cycles, there is a certain kernel module
>> which gives almost native performance.
>
> Well, even in 2011 there are cases remaining where VT-x/AMV-V is not
> available to your favorite hypervisor.
>
>>
>>> Maybe we can introduce op_helper_simple.c.
>>>
>>>>
>>>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>>>> makes sense. See also this article:
>>>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>>>
>>> We already run without frame pointers, -O3 might be worth exploring in
>>> addition. Still, that won't take the protector overhead away.
>>
>> It would be interesting to have some benchmarks. I'd expect that most
>> of the run time is spent within generated code, the next largest item
>> should be the translator and any helpers should be marginal.
>
> At least we've a rather static, long-running guest, not much is
> recompiled once the system has settled.

Optimizing the translator or improving TCG optimizer could also
improve performance.

By the way, -fomit-frame-pointer is not enabled globally for some
reason, it should be the default for non-debug builds.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions
  2011-09-26 18:20             ` Blue Swirl
  2011-09-26 18:25               ` Jan Kiszka
@ 2011-09-26 19:08               ` Peter Maydell
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2011-09-26 19:08 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Laurent Desnogues, Jan Kiszka, Anthony Liguori, Mulyadi Santosa,
	qemu-devel

On 26 September 2011 19:20, Blue Swirl <blauwirbel@gmail.com> wrote:
> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

Depends a lot on the target, I suspect. On ARM at the moment
we spend huge amounts of time (30%+ of runtime) in a handful
of helpers like sub_cc because we haven't optimised the handling
of subtract and test to do things inline. (On my todo list but
performance in general is behind feature work. I have a half a
patchset that I haven't got back to yet.)

Which is kind of echoing your remarks about benchmarks, really.

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 17:41     ` Avi Kivity
@ 2011-09-26 19:43       ` Richard Henderson
  2011-09-26 19:52         ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2011-09-26 19:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel

On 09/26/2011 10:41 AM, Avi Kivity wrote:
> Native tcg ops for common vector instructions would probably be quite a speedup.

It's very possible to simply open-code many of the vector operations.

I've done a port of qemu to the SPU (aka Cell) processor.  This core
has no scalar operations; all operations are on vectors.  It turned
out fairly well for the basic arithmetic.  I only have to fall back
on helpers for the more esoteric operations.

That said, all FP vector operations should of course continue to be
done completely via helpers, since one would need helpers for the
individual FP operations anyway.


r~

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 19:43       ` Richard Henderson
@ 2011-09-26 19:52         ` Avi Kivity
  2011-09-26 19:53           ` Richard Henderson
  2011-09-26 20:19           ` Peter Maydell
  0 siblings, 2 replies; 22+ messages in thread
From: Avi Kivity @ 2011-09-26 19:52 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel

On 09/26/2011 10:43 PM, Richard Henderson wrote:
> On 09/26/2011 10:41 AM, Avi Kivity wrote:
> >  Native tcg ops for common vector instructions would probably be quite a speedup.
>
> It's very possible to simply open-code many of the vector operations.
>
> I've done a port of qemu to the SPU (aka Cell) processor.  This core
> has no scalar operations; all operations are on vectors.  It turned
> out fairly well for the basic arithmetic.  I only have to fall back
> on helpers for the more esoteric operations.
>
> That said, all FP vector operations should of course continue to be
> done completely via helpers, since one would need helpers for the
> individual FP operations anyway.

Why do floating point ops need helpers?  At least if all the edge cases 
match? (i.e. NaNs and denormals)

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 19:52         ` Avi Kivity
@ 2011-09-26 19:53           ` Richard Henderson
  2011-09-26 20:20             ` Avi Kivity
  2011-09-26 20:19           ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2011-09-26 19:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel

On 09/26/2011 12:52 PM, Avi Kivity wrote:
> Why do floating point ops need helpers?

Because TCG doesn't do any native floating point.


r~

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 19:52         ` Avi Kivity
  2011-09-26 19:53           ` Richard Henderson
@ 2011-09-26 20:19           ` Peter Maydell
  2011-09-26 20:26             ` Avi Kivity
  2011-09-27  4:29             ` Andi Kleen
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Maydell @ 2011-09-26 20:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel,
	Richard Henderson

On 26 September 2011 20:52, Avi Kivity <avi@redhat.com> wrote:
> Why do floating point ops need helpers?  At least if all the edge cases
> match? (i.e. NaNs and denormals)

The answer is that the edge cases basically never match. No CPU
architecture does handling of NaNs and input denormals and output
denormals and underflow checks and all the rest of it in exactly
the same way as anybody else. (In particular x86 is pretty crazy,
which is unfortunate given that it's the most significant host
arch at the moment.) So any kind of TCG native floating point
support would probably have to translate to "check if either
input is a special case; if not, try the op; check if the output
was a special case; if any of those checks fired, back off to
the softfloat helper function". Which is quite a lot of inline
code, and also annoyingly bouncing between fp ops and integer
bitpattern checks on the fp values.

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 19:53           ` Richard Henderson
@ 2011-09-26 20:20             ` Avi Kivity
  0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2011-09-26 20:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel

On 09/26/2011 10:53 PM, Richard Henderson wrote:
> On 09/26/2011 12:52 PM, Avi Kivity wrote:
> >  Why do floating point ops need helpers?
>
> Because TCG doesn't do any native floating point.
>

Well, it could be made to do it.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 20:19           ` Peter Maydell
@ 2011-09-26 20:26             ` Avi Kivity
  2011-09-27  4:29             ` Andi Kleen
  1 sibling, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2011-09-26 20:26 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel,
	Richard Henderson

On 09/26/2011 11:19 PM, Peter Maydell wrote:
> On 26 September 2011 20:52, Avi Kivity<avi@redhat.com>  wrote:
> >  Why do floating point ops need helpers?  At least if all the edge cases
> >  match? (i.e. NaNs and denormals)
>
> The answer is that the edge cases basically never match.

Surely they do when host == target.  Although there you can virtualize.

>   No CPU
> architecture does handling of NaNs and input denormals and output
> denormals and underflow checks and all the rest of it in exactly
> the same way as anybody else. (In particular x86 is pretty crazy,
> which is unfortunate given that it's the most significant host
> arch at the moment.) So any kind of TCG native floating point
> support would probably have to translate to "check if either
> input is a special case; if not, try the op; check if the output
> was a special case; if any of those checks fired, back off to
> the softfloat helper function". Which is quite a lot of inline
> code, and also annoyingly bouncing between fp ops and integer
> bitpattern checks on the fp values.
>

Alternatively, configure the fpu to trap on these cases, and handle them 
in a slow path.  At least x86 sse allows this (though perhaps not for 
"quiet NaN"s?

Does it matter in practice?  Perhaps we can have a fast-and-loose mode 
for the fpu (gcc does).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-26 20:19           ` Peter Maydell
  2011-09-26 20:26             ` Avi Kivity
@ 2011-09-27  4:29             ` Andi Kleen
  2011-09-27  7:58               ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2011-09-27  4:29 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jan Kiszka, qemu-devel, Avi Kivity, Laurent Desnogues,
	Mulyadi Santosa, Richard Henderson

Peter Maydell <peter.maydell@linaro.org> writes:
>
> The answer is that the edge cases basically never match. No CPU
> architecture does handling of NaNs and input denormals and output
> denormals and underflow checks and all the rest of it in exactly
> the same way as anybody else. (In particular x86 is pretty crazy,

Can you clarify this? 

IEEE754 is pretty strict on how all these things are handled
and to my knowledge all serious x86 are fully IEEE compliant.
Or are you refering to the x87 80bits expansion? While useful
that's not used anymore with SSE.

On the other hand qemu is not very good at it, e.g. with x87
it doesn't even pass paranoia.

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions
  2011-09-27  4:29             ` Andi Kleen
@ 2011-09-27  7:58               ` Peter Maydell
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2011-09-27  7:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Kiszka, qemu-devel, Avi Kivity, Laurent Desnogues,
	Mulyadi Santosa, Richard Henderson

On 27 September 2011 05:29, Andi Kleen <andi@firstfloor.org> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
>> The answer is that the edge cases basically never match. No CPU
>> architecture does handling of NaNs and input denormals and output
>> denormals and underflow checks and all the rest of it in exactly
>> the same way as anybody else. (In particular x86 is pretty crazy,
>
> Can you clarify this?
>
> IEEE754 is pretty strict on how all these things are handled
> and to my knowledge all serious x86 are fully IEEE compliant.
> Or are you refering to the x87 80bits expansion? While useful
> that's not used anymore with SSE.

IEEE leaves some leeway for implementations. Just off the top
of my head:
 * if two NaNs are passed to an op then which one is propagated
   is implementation defined
 * value of the 'default NaN' is imp-def
 * whether the msbit of the significand is 1 or 0 to indicate
   an SNaN is imp-def
 * how an SNaN is converted to a QNaN is imp-def
 * tininess can be detected either before or after rounding

which different architectures vary on (and some are better at
documenting their choices than others).

Also implementations often have extra non-IEEE modes (which
may even be the default, for speed):
 * squashing denormal outputs to zero
 * squashing denormal inputs to zero
and there's even less agreement here.

Common-but-not-officially-ieee ops like 'max' and 'min'
can also vary: for instance Intel's SSE MAXSD/MINSD etc
have very weird behaviour for the special cases: if both
operands are 0.0s of any sign you always get the second
operand, so max(-0,+0) != max(+0,-0), and if only one operand
is a NaN then the second operand is returned whether it is
the NaN or not (so max(NaN, 3) != max(3, NaN).

> On the other hand qemu is not very good at it, e.g. with x87
> it doesn't even pass paranoia.

This is only because nobody cares much about x86 TCG. ARM floating
point emulation is (now) bit-for-bit correct apart from a handful
of operations which don't set the right set of exception flags.

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-09-27  7:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-26  7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka
2011-09-26  8:01 ` Mulyadi Santosa
2011-09-26  8:15   ` Laurent Desnogues
2011-09-26 17:41     ` Avi Kivity
2011-09-26 19:43       ` Richard Henderson
2011-09-26 19:52         ` Avi Kivity
2011-09-26 19:53           ` Richard Henderson
2011-09-26 20:20             ` Avi Kivity
2011-09-26 20:19           ` Peter Maydell
2011-09-26 20:26             ` Avi Kivity
2011-09-27  4:29             ` Andi Kleen
2011-09-27  7:58               ` Peter Maydell
2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka
2011-09-26 11:33   ` Peter Maydell
2011-09-26 11:43     ` Jan Kiszka
2011-09-26 11:56       ` Peter Maydell
2011-09-26 17:22         ` Blue Swirl
2011-09-26 17:33           ` Jan Kiszka
2011-09-26 18:20             ` Blue Swirl
2011-09-26 18:25               ` Jan Kiszka
2011-09-26 18:40                 ` Blue Swirl
2011-09-26 19:08               ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.