* [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions @ 2011-09-26 7:46 Jan Kiszka 2011-09-26 8:01 ` Mulyadi Santosa 2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka 0 siblings, 2 replies; 22+ messages in thread From: Jan Kiszka @ 2011-09-26 7:46 UTC (permalink / raw) To: Anthony Liguori; +Cc: Blue Swirl, Peter Maydell, qemu-devel This increases the overhead of frequently executed helpers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- Maybe this should be applied to more hot-path functions, but I haven't done any thorough analysis yet. Makefile.target | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile.target b/Makefile.target index 88d2f1f..1cc758d 100644 --- a/Makefile.target +++ b/Makefile.target @@ -91,7 +91,7 @@ tcg/tcg.o: cpu.h # HELPER_CFLAGS is used for all the code compiled with static register # variables -op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS) +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) # Note: this is a workaround. The real fix is to avoid compiling # cpu_signal_handler() in user-exec.c. ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka @ 2011-09-26 8:01 ` Mulyadi Santosa 2011-09-26 8:15 ` Laurent Desnogues 2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka 1 sibling, 1 reply; 22+ messages in thread From: Mulyadi Santosa @ 2011-09-26 8:01 UTC (permalink / raw) To: Jan Kiszka; +Cc: qemu-devel Hi... On Mon, Sep 26, 2011 at 14:46, Jan Kiszka <jan.kiszka@siemens.com> wrote: > This increases the overhead of frequently executed helpers. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> IMHO, stack protector setup put more stuffs during epilogue, but quite likely it is negligible unless it cause too much L1 cache misses. So, I think this micro tuning is somewhat unnecessary but still okay. Security wise, I think it's better to just leave it as is like now. -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 8:01 ` Mulyadi Santosa @ 2011-09-26 8:15 ` Laurent Desnogues 2011-09-26 17:41 ` Avi Kivity 0 siblings, 1 reply; 22+ messages in thread From: Laurent Desnogues @ 2011-09-26 8:15 UTC (permalink / raw) To: Mulyadi Santosa; +Cc: Jan Kiszka, qemu-devel On Mon, Sep 26, 2011 at 10:01 AM, Mulyadi Santosa <mulyadi.santosa@gmail.com> wrote: > Hi... > > On Mon, Sep 26, 2011 at 14:46, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> This increases the overhead of frequently executed helpers. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > > IMHO, stack protector setup put more stuffs during epilogue, but quite > likely it is negligible unless it cause too much L1 cache misses. So, > I think this micro tuning is somewhat unnecessary but still okay. The impact of stack protection is very high for instance running FFmpeg ARM with NEON optimizations: a few months ago I measured that removing stack protection improved the run time by more than 10%. Of course it's extreme since the proportion of NEON instructions (and hence of helper calls) is very high. Laurent ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 8:15 ` Laurent Desnogues @ 2011-09-26 17:41 ` Avi Kivity 2011-09-26 19:43 ` Richard Henderson 0 siblings, 1 reply; 22+ messages in thread From: Avi Kivity @ 2011-09-26 17:41 UTC (permalink / raw) To: Laurent Desnogues; +Cc: Jan Kiszka, Mulyadi Santosa, qemu-devel On 09/26/2011 11:15 AM, Laurent Desnogues wrote: > On Mon, Sep 26, 2011 at 10:01 AM, Mulyadi Santosa > <mulyadi.santosa@gmail.com> wrote: > > Hi... > > > > On Mon, Sep 26, 2011 at 14:46, Jan Kiszka<jan.kiszka@siemens.com> wrote: > >> This increases the overhead of frequently executed helpers. > >> > >> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com> > > > > IMHO, stack protector setup put more stuffs during epilogue, but quite > > likely it is negligible unless it cause too much L1 cache misses. So, > > I think this micro tuning is somewhat unnecessary but still okay. > > The impact of stack protection is very high for instance running > FFmpeg ARM with NEON optimizations: a few months ago I > measured that removing stack protection improved the run time > by more than 10%. Of course it's extreme since the proportion > of NEON instructions (and hence of helper calls) is very high. I saw a lot of helper calls for sse in ordinary x86_64 code, likely for memcpy/cmp and friends. Native tcg ops for common vector instructions would probably be quite a speedup. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 17:41 ` Avi Kivity @ 2011-09-26 19:43 ` Richard Henderson 2011-09-26 19:52 ` Avi Kivity 0 siblings, 1 reply; 22+ messages in thread From: Richard Henderson @ 2011-09-26 19:43 UTC (permalink / raw) To: Avi Kivity; +Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel On 09/26/2011 10:41 AM, Avi Kivity wrote: > Native tcg ops for common vector instructions would probably be quite a speedup. It's very possible to simply open-code many of the vector operations. I've done a port of qemu to the SPU (aka Cell) processor. This core has no scalar operations; all operations are on vectors. It turned out fairly well for the basic arithmetic. I only have to fall back on helpers for the more esoteric operations. That said, all FP vector operations should of course continue to be done completely via helpers, since one would need helpers for the individual FP operations anyway. r~ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 19:43 ` Richard Henderson @ 2011-09-26 19:52 ` Avi Kivity 2011-09-26 19:53 ` Richard Henderson 2011-09-26 20:19 ` Peter Maydell 0 siblings, 2 replies; 22+ messages in thread From: Avi Kivity @ 2011-09-26 19:52 UTC (permalink / raw) To: Richard Henderson Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel On 09/26/2011 10:43 PM, Richard Henderson wrote: > On 09/26/2011 10:41 AM, Avi Kivity wrote: > > Native tcg ops for common vector instructions would probably be quite a speedup. > > It's very possible to simply open-code many of the vector operations. > > I've done a port of qemu to the SPU (aka Cell) processor. This core > has no scalar operations; all operations are on vectors. It turned > out fairly well for the basic arithmetic. I only have to fall back > on helpers for the more esoteric operations. > > That said, all FP vector operations should of course continue to be > done completely via helpers, since one would need helpers for the > individual FP operations anyway. Why do floating point ops need helpers? At least if all the edge cases match? (i.e. NaNs and denormals) -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 19:52 ` Avi Kivity @ 2011-09-26 19:53 ` Richard Henderson 2011-09-26 20:20 ` Avi Kivity 2011-09-26 20:19 ` Peter Maydell 1 sibling, 1 reply; 22+ messages in thread From: Richard Henderson @ 2011-09-26 19:53 UTC (permalink / raw) To: Avi Kivity; +Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel On 09/26/2011 12:52 PM, Avi Kivity wrote: > Why do floating point ops need helpers? Because TCG doesn't do any native floating point. r~ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 19:53 ` Richard Henderson @ 2011-09-26 20:20 ` Avi Kivity 0 siblings, 0 replies; 22+ messages in thread From: Avi Kivity @ 2011-09-26 20:20 UTC (permalink / raw) To: Richard Henderson Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel On 09/26/2011 10:53 PM, Richard Henderson wrote: > On 09/26/2011 12:52 PM, Avi Kivity wrote: > > Why do floating point ops need helpers? > > Because TCG doesn't do any native floating point. > Well, it could be made to do it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 19:52 ` Avi Kivity 2011-09-26 19:53 ` Richard Henderson @ 2011-09-26 20:19 ` Peter Maydell 2011-09-26 20:26 ` Avi Kivity 2011-09-27 4:29 ` Andi Kleen 1 sibling, 2 replies; 22+ messages in thread From: Peter Maydell @ 2011-09-26 20:19 UTC (permalink / raw) To: Avi Kivity Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel, Richard Henderson On 26 September 2011 20:52, Avi Kivity <avi@redhat.com> wrote: > Why do floating point ops need helpers? At least if all the edge cases > match? (i.e. NaNs and denormals) The answer is that the edge cases basically never match. No CPU architecture does handling of NaNs and input denormals and output denormals and underflow checks and all the rest of it in exactly the same way as anybody else. (In particular x86 is pretty crazy, which is unfortunate given that it's the most significant host arch at the moment.) So any kind of TCG native floating point support would probably have to translate to "check if either input is a special case; if not, try the op; check if the output was a special case; if any of those checks fired, back off to the softfloat helper function". Which is quite a lot of inline code, and also annoyingly bouncing between fp ops and integer bitpattern checks on the fp values. -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 20:19 ` Peter Maydell @ 2011-09-26 20:26 ` Avi Kivity 2011-09-27 4:29 ` Andi Kleen 1 sibling, 0 replies; 22+ messages in thread From: Avi Kivity @ 2011-09-26 20:26 UTC (permalink / raw) To: Peter Maydell Cc: Laurent Desnogues, Jan Kiszka, Mulyadi Santosa, qemu-devel, Richard Henderson On 09/26/2011 11:19 PM, Peter Maydell wrote: > On 26 September 2011 20:52, Avi Kivity<avi@redhat.com> wrote: > > Why do floating point ops need helpers? At least if all the edge cases > > match? (i.e. NaNs and denormals) > > The answer is that the edge cases basically never match. Surely they do when host == target. Although there you can virtualize. > No CPU > architecture does handling of NaNs and input denormals and output > denormals and underflow checks and all the rest of it in exactly > the same way as anybody else. (In particular x86 is pretty crazy, > which is unfortunate given that it's the most significant host > arch at the moment.) So any kind of TCG native floating point > support would probably have to translate to "check if either > input is a special case; if not, try the op; check if the output > was a special case; if any of those checks fired, back off to > the softfloat helper function". Which is quite a lot of inline > code, and also annoyingly bouncing between fp ops and integer > bitpattern checks on the fp values. > Alternatively, configure the fpu to trap on these cases, and handle them in a slow path. At least x86 sse allows this (though perhaps not for "quiet NaN"s? Does it matter in practice? Perhaps we can have a fast-and-loose mode for the fpu (gcc does). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-26 20:19 ` Peter Maydell 2011-09-26 20:26 ` Avi Kivity @ 2011-09-27 4:29 ` Andi Kleen 2011-09-27 7:58 ` Peter Maydell 1 sibling, 1 reply; 22+ messages in thread From: Andi Kleen @ 2011-09-27 4:29 UTC (permalink / raw) To: Peter Maydell Cc: Jan Kiszka, qemu-devel, Avi Kivity, Laurent Desnogues, Mulyadi Santosa, Richard Henderson Peter Maydell <peter.maydell@linaro.org> writes: > > The answer is that the edge cases basically never match. No CPU > architecture does handling of NaNs and input denormals and output > denormals and underflow checks and all the rest of it in exactly > the same way as anybody else. (In particular x86 is pretty crazy, Can you clarify this? IEEE754 is pretty strict on how all these things are handled and to my knowledge all serious x86 are fully IEEE compliant. Or are you refering to the x87 80bits expansion? While useful that's not used anymore with SSE. On the other hand qemu is not very good at it, e.g. with x87 it doesn't even pass paranoia. -Andi ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions 2011-09-27 4:29 ` Andi Kleen @ 2011-09-27 7:58 ` Peter Maydell 0 siblings, 0 replies; 22+ messages in thread From: Peter Maydell @ 2011-09-27 7:58 UTC (permalink / raw) To: Andi Kleen Cc: Jan Kiszka, qemu-devel, Avi Kivity, Laurent Desnogues, Mulyadi Santosa, Richard Henderson On 27 September 2011 05:29, Andi Kleen <andi@firstfloor.org> wrote: > Peter Maydell <peter.maydell@linaro.org> writes: >> The answer is that the edge cases basically never match. No CPU >> architecture does handling of NaNs and input denormals and output >> denormals and underflow checks and all the rest of it in exactly >> the same way as anybody else. (In particular x86 is pretty crazy, > > Can you clarify this? > > IEEE754 is pretty strict on how all these things are handled > and to my knowledge all serious x86 are fully IEEE compliant. > Or are you refering to the x87 80bits expansion? While useful > that's not used anymore with SSE. IEEE leaves some leeway for implementations. Just off the top of my head: * if two NaNs are passed to an op then which one is propagated is implementation defined * value of the 'default NaN' is imp-def * whether the msbit of the significand is 1 or 0 to indicate an SNaN is imp-def * how an SNaN is converted to a QNaN is imp-def * tininess can be detected either before or after rounding which different architectures vary on (and some are better at documenting their choices than others). Also implementations often have extra non-IEEE modes (which may even be the default, for speed): * squashing denormal outputs to zero * squashing denormal inputs to zero and there's even less agreement here. Common-but-not-officially-ieee ops like 'max' and 'min' can also vary: for instance Intel's SSE MAXSD/MINSD etc have very weird behaviour for the special cases: if both operands are 0.0s of any sign you always get the second operand, so max(-0,+0) != max(+0,-0), and if only one operand is a NaN then the second operand is returned whether it is the NaN or not (so max(NaN, 3) != max(3, NaN). > On the other hand qemu is not very good at it, e.g. with x87 > it doesn't even pass paranoia. This is only because nobody cares much about x86 TCG. ARM floating point emulation is (now) bit-for-bit correct apart from a handful of operations which don't set the right set of exception flags. -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka 2011-09-26 8:01 ` Mulyadi Santosa @ 2011-09-26 10:51 ` Jan Kiszka 2011-09-26 11:33 ` Peter Maydell 1 sibling, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2011-09-26 10:51 UTC (permalink / raw) To: Anthony Liguori Cc: Blue Swirl, Peter Maydell, Laurent Desnogues, Mulyadi Santosa, qemu-devel This increases the overhead of frequently executed helpers. We need to move rule past QEMU_CFLAGS assignment to ensure that the required simple assignment picks up all bits. The signal workaround is moved just for the sake of consistency. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- Changes in v2: - unbreak qemu-user build Maybe some real make guru has a nicer solution for removing the switch. Makefile.target | 17 +++++++++-------- 1 files changed, 9 insertions(+), 8 deletions(-) diff --git a/Makefile.target b/Makefile.target index 88d2f1f..b545161 100644 --- a/Makefile.target +++ b/Makefile.target @@ -89,14 +89,6 @@ translate-all.o: translate-all.c cpu.h tcg/tcg.o: cpu.h -# HELPER_CFLAGS is used for all the code compiled with static register -# variables -op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS) - -# Note: this is a workaround. The real fix is to avoid compiling -# cpu_signal_handler() in user-exec.c. -signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS) - ######################################################### # Linux user emulator target @@ -387,6 +379,15 @@ obj-y += $(addprefix ../, $(trace-obj-y)) endif # CONFIG_SOFTMMU +# HELPER_CFLAGS is used for all the code compiled with static register +# variables +# NOTE: Must be after the last QEMU_CFLAGS assignment +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) + +# Note: this is a workaround. The real fix is to avoid compiling +# cpu_signal_handler() in user-exec.c. +signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS) + ifndef CONFIG_LINUX_USER ifndef CONFIG_BSD_USER # libcacard needs qemu-thread support, and besides is only needed by devices -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka @ 2011-09-26 11:33 ` Peter Maydell 2011-09-26 11:43 ` Jan Kiszka 0 siblings, 1 reply; 22+ messages in thread From: Peter Maydell @ 2011-09-26 11:33 UTC (permalink / raw) To: Jan Kiszka Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa, qemu-devel On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: > This increases the overhead of frequently executed helpers. We need to > move rule past QEMU_CFLAGS assignment to ensure that the required simple > assignment picks up all bits. The signal workaround is moved just for > the sake of consistency. > +# NOTE: Must be after the last QEMU_CFLAGS assignment > +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) Why also user-exec.o ? Why not the other source files with helpers in? This doesn't seem very consistent. Maybe the right answer is to have some of the offending helper functions inline instead? (Or to not have -fstack-protector-all globally?) -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 11:33 ` Peter Maydell @ 2011-09-26 11:43 ` Jan Kiszka 2011-09-26 11:56 ` Peter Maydell 0 siblings, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2011-09-26 11:43 UTC (permalink / raw) To: Peter Maydell Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa, qemu-devel On 2011-09-26 13:33, Peter Maydell wrote: > On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> This increases the overhead of frequently executed helpers. We need to >> move rule past QEMU_CFLAGS assignment to ensure that the required simple >> assignment picks up all bits. The signal workaround is moved just for >> the sake of consistency. > >> +# NOTE: Must be after the last QEMU_CFLAGS assignment >> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) > > Why also user-exec.o ? That's a good question. It doesn't look like it's deserving this. > Why not the other source files with helpers in? Name them and I add them. > This doesn't seem very consistent. Maybe the right answer is to have > some of the offending helper functions inline instead? I can't imagine that this could be a short- or even mid-term answer. Inlining is a huge work. > (Or to not > have -fstack-protector-all globally?) Opt-in instead of opt-out, that might be some approach, though I bet the out-out set still bets the opt-in crowed by some orders of magnitude. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 11:43 ` Jan Kiszka @ 2011-09-26 11:56 ` Peter Maydell 2011-09-26 17:22 ` Blue Swirl 0 siblings, 1 reply; 22+ messages in thread From: Peter Maydell @ 2011-09-26 11:56 UTC (permalink / raw) To: Jan Kiszka Cc: Blue Swirl, Laurent Desnogues, Anthony Liguori, Mulyadi Santosa, qemu-devel On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2011-09-26 13:33, Peter Maydell wrote: >> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>> This increases the overhead of frequently executed helpers. We need to >>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>> assignment picks up all bits. The signal workaround is moved just for >>> the sake of consistency. >> >>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >> >> Why also user-exec.o ? > > That's a good question. It doesn't look like it's deserving this. > >> Why not the other source files with helpers in? > > Name them and I add them. target-*/*helper.c ? But mostly I think what I'm trying to say is that this is making a tradeoff between safety and speed, so it ought to come with a rationale for why it is OK to remove the safety checks for these files. Given that rationale you can identify other files that are also safe/worthwhile to flip the flag for. >> (Or to not >> have -fstack-protector-all globally?) > > Opt-in instead of opt-out, that might be some approach, though I bet the > out-out set still bets the opt-in crowed by some orders of magnitude. Have you looked at whether using plain -fstack-protector for all files (rather than the -all version) helps? Presumably we had some reason for picking the -all version though... -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 11:56 ` Peter Maydell @ 2011-09-26 17:22 ` Blue Swirl 2011-09-26 17:33 ` Jan Kiszka 0 siblings, 1 reply; 22+ messages in thread From: Blue Swirl @ 2011-09-26 17:22 UTC (permalink / raw) To: Peter Maydell Cc: Laurent Desnogues, Jan Kiszka, Anthony Liguori, Mulyadi Santosa, qemu-devel On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> On 2011-09-26 13:33, Peter Maydell wrote: >>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>> This increases the overhead of frequently executed helpers. We need to >>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>>> assignment picks up all bits. The signal workaround is moved just for >>>> the sake of consistency. >>> >>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >>> >>> Why also user-exec.o ? >> >> That's a good question. It doesn't look like it's deserving this. >> >>> Why not the other source files with helpers in? >> >> Name them and I add them. > > target-*/*helper.c ? > > But mostly I think what I'm trying to say is that this is making > a tradeoff between safety and speed, so it ought to come with a > rationale for why it is OK to remove the safety checks for these > files. Given that rationale you can identify other files that are > also safe/worthwhile to flip the flag for. I wouldn't remove -fstack-protector-all by default. Especially op code interfaces with the guest. For max performance version, I'd check if -fomit-frame-pointer and -O3 makes sense. See also this article: https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags >>> (Or to not >>> have -fstack-protector-all globally?) >> >> Opt-in instead of opt-out, that might be some approach, though I bet the >> out-out set still bets the opt-in crowed by some orders of magnitude. > > Have you looked at whether using plain -fstack-protector for all files > (rather than the -all version) helps? Presumably we had some reason for > picking the -all version though... > > -- PMM > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 17:22 ` Blue Swirl @ 2011-09-26 17:33 ` Jan Kiszka 2011-09-26 18:20 ` Blue Swirl 0 siblings, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2011-09-26 17:33 UTC (permalink / raw) To: Blue Swirl Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori, Mulyadi Santosa, qemu-devel On 2011-09-26 19:22, Blue Swirl wrote: > On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell > <peter.maydell@linaro.org> wrote: >> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>> On 2011-09-26 13:33, Peter Maydell wrote: >>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>> This increases the overhead of frequently executed helpers. We need to >>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>>>> assignment picks up all bits. The signal workaround is moved just for >>>>> the sake of consistency. >>>> >>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >>>> >>>> Why also user-exec.o ? >>> >>> That's a good question. It doesn't look like it's deserving this. >>> >>>> Why not the other source files with helpers in? >>> >>> Name them and I add them. >> >> target-*/*helper.c ? >> >> But mostly I think what I'm trying to say is that this is making >> a tradeoff between safety and speed, so it ought to come with a >> rationale for why it is OK to remove the safety checks for these >> files. Given that rationale you can identify other files that are >> also safe/worthwhile to flip the flag for. > > I wouldn't remove -fstack-protector-all by default. Especially op code > interfaces with the guest. I'd love to have some function attribute for this, because a stack protector for rather simple arithmetic operations or something like helper_cli/sti are pointlessly burned cycles. Maybe we can introduce op_helper_simple.c. > > For max performance version, I'd check if -fomit-frame-pointer and -O3 > makes sense. See also this article: > https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags We already run without frame pointers, -O3 might be worth exploring in addition. Still, that won't take the protector overhead away. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 17:33 ` Jan Kiszka @ 2011-09-26 18:20 ` Blue Swirl 2011-09-26 18:25 ` Jan Kiszka 2011-09-26 19:08 ` Peter Maydell 0 siblings, 2 replies; 22+ messages in thread From: Blue Swirl @ 2011-09-26 18:20 UTC (permalink / raw) To: Jan Kiszka Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori, Mulyadi Santosa, qemu-devel On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2011-09-26 19:22, Blue Swirl wrote: >> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell >> <peter.maydell@linaro.org> wrote: >>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>> On 2011-09-26 13:33, Peter Maydell wrote: >>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>>> This increases the overhead of frequently executed helpers. We need to >>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>>>>> assignment picks up all bits. The signal workaround is moved just for >>>>>> the sake of consistency. >>>>> >>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >>>>> >>>>> Why also user-exec.o ? >>>> >>>> That's a good question. It doesn't look like it's deserving this. >>>> >>>>> Why not the other source files with helpers in? >>>> >>>> Name them and I add them. >>> >>> target-*/*helper.c ? >>> >>> But mostly I think what I'm trying to say is that this is making >>> a tradeoff between safety and speed, so it ought to come with a >>> rationale for why it is OK to remove the safety checks for these >>> files. Given that rationale you can identify other files that are >>> also safe/worthwhile to flip the flag for. >> >> I wouldn't remove -fstack-protector-all by default. Especially op code >> interfaces with the guest. > > I'd love to have some function attribute for this, because a stack > protector for rather simple arithmetic operations or something like > helper_cli/sti are pointlessly burned cycles. In order to avoid burning the cycles, there is a certain kernel module which gives almost native performance. > Maybe we can introduce op_helper_simple.c. > >> >> For max performance version, I'd check if -fomit-frame-pointer and -O3 >> makes sense. See also this article: >> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags > > We already run without frame pointers, -O3 might be worth exploring in > addition. Still, that won't take the protector overhead away. It would be interesting to have some benchmarks. I'd expect that most of the run time is spent within generated code, the next largest item should be the translator and any helpers should be marginal. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 18:20 ` Blue Swirl @ 2011-09-26 18:25 ` Jan Kiszka 2011-09-26 18:40 ` Blue Swirl 2011-09-26 19:08 ` Peter Maydell 1 sibling, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2011-09-26 18:25 UTC (permalink / raw) To: Blue Swirl Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori, Mulyadi Santosa, qemu-devel On 2011-09-26 20:20, Blue Swirl wrote: > On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> On 2011-09-26 19:22, Blue Swirl wrote: >>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell >>> <peter.maydell@linaro.org> wrote: >>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>> On 2011-09-26 13:33, Peter Maydell wrote: >>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>>>> This increases the overhead of frequently executed helpers. We need to >>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>>>>>> assignment picks up all bits. The signal workaround is moved just for >>>>>>> the sake of consistency. >>>>>> >>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >>>>>> >>>>>> Why also user-exec.o ? >>>>> >>>>> That's a good question. It doesn't look like it's deserving this. >>>>> >>>>>> Why not the other source files with helpers in? >>>>> >>>>> Name them and I add them. >>>> >>>> target-*/*helper.c ? >>>> >>>> But mostly I think what I'm trying to say is that this is making >>>> a tradeoff between safety and speed, so it ought to come with a >>>> rationale for why it is OK to remove the safety checks for these >>>> files. Given that rationale you can identify other files that are >>>> also safe/worthwhile to flip the flag for. >>> >>> I wouldn't remove -fstack-protector-all by default. Especially op code >>> interfaces with the guest. >> >> I'd love to have some function attribute for this, because a stack >> protector for rather simple arithmetic operations or something like >> helper_cli/sti are pointlessly burned cycles. > > In order to avoid burning the cycles, there is a certain kernel module > which gives almost native performance. Well, even in 2011 there are cases remaining where VT-x/AMV-V is not available to your favorite hypervisor. > >> Maybe we can introduce op_helper_simple.c. >> >>> >>> For max performance version, I'd check if -fomit-frame-pointer and -O3 >>> makes sense. See also this article: >>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags >> >> We already run without frame pointers, -O3 might be worth exploring in >> addition. Still, that won't take the protector overhead away. > > It would be interesting to have some benchmarks. I'd expect that most > of the run time is spent within generated code, the next largest item > should be the translator and any helpers should be marginal. At least we've a rather static, long-running guest, not much is recompiled once the system has settled. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 18:25 ` Jan Kiszka @ 2011-09-26 18:40 ` Blue Swirl 0 siblings, 0 replies; 22+ messages in thread From: Blue Swirl @ 2011-09-26 18:40 UTC (permalink / raw) To: Jan Kiszka Cc: Laurent Desnogues, Peter Maydell, Anthony Liguori, Mulyadi Santosa, qemu-devel On Mon, Sep 26, 2011 at 6:25 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2011-09-26 20:20, Blue Swirl wrote: >> On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>> On 2011-09-26 19:22, Blue Swirl wrote: >>>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell >>>> <peter.maydell@linaro.org> wrote: >>>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>>> On 2011-09-26 13:33, Peter Maydell wrote: >>>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>>>>>>> This increases the overhead of frequently executed helpers. We need to >>>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple >>>>>>>> assignment picks up all bits. The signal workaround is moved just for >>>>>>>> the sake of consistency. >>>>>>> >>>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment >>>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS) >>>>>>> >>>>>>> Why also user-exec.o ? >>>>>> >>>>>> That's a good question. It doesn't look like it's deserving this. >>>>>> >>>>>>> Why not the other source files with helpers in? >>>>>> >>>>>> Name them and I add them. >>>>> >>>>> target-*/*helper.c ? >>>>> >>>>> But mostly I think what I'm trying to say is that this is making >>>>> a tradeoff between safety and speed, so it ought to come with a >>>>> rationale for why it is OK to remove the safety checks for these >>>>> files. Given that rationale you can identify other files that are >>>>> also safe/worthwhile to flip the flag for. >>>> >>>> I wouldn't remove -fstack-protector-all by default. Especially op code >>>> interfaces with the guest. >>> >>> I'd love to have some function attribute for this, because a stack >>> protector for rather simple arithmetic operations or something like >>> helper_cli/sti are pointlessly burned cycles. >> >> In order to avoid burning the cycles, there is a certain kernel module >> which gives almost native performance. > > Well, even in 2011 there are cases remaining where VT-x/AMV-V is not > available to your favorite hypervisor. > >> >>> Maybe we can introduce op_helper_simple.c. >>> >>>> >>>> For max performance version, I'd check if -fomit-frame-pointer and -O3 >>>> makes sense. See also this article: >>>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags >>> >>> We already run without frame pointers, -O3 might be worth exploring in >>> addition. Still, that won't take the protector overhead away. >> >> It would be interesting to have some benchmarks. I'd expect that most >> of the run time is spent within generated code, the next largest item >> should be the translator and any helpers should be marginal. > > At least we've a rather static, long-running guest, not much is > recompiled once the system has settled. Optimizing the translator or improving TCG optimizer could also improve performance. By the way, -fomit-frame-pointer is not enabled globally for some reason, it should be the default for non-debug builds. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH v2] tcg: Remove stack protection from helper functions 2011-09-26 18:20 ` Blue Swirl 2011-09-26 18:25 ` Jan Kiszka @ 2011-09-26 19:08 ` Peter Maydell 1 sibling, 0 replies; 22+ messages in thread From: Peter Maydell @ 2011-09-26 19:08 UTC (permalink / raw) To: Blue Swirl Cc: Laurent Desnogues, Jan Kiszka, Anthony Liguori, Mulyadi Santosa, qemu-devel On 26 September 2011 19:20, Blue Swirl <blauwirbel@gmail.com> wrote: > It would be interesting to have some benchmarks. I'd expect that most > of the run time is spent within generated code, the next largest item > should be the translator and any helpers should be marginal. Depends a lot on the target, I suspect. On ARM at the moment we spend huge amounts of time (30%+ of runtime) in a handful of helpers like sub_cc because we haven't optimised the handling of subtract and test to do things inline. (On my todo list but performance in general is behind feature work. I have a half a patchset that I haven't got back to yet.) Which is kind of echoing your remarks about benchmarks, really. -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2011-09-27 7:58 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-09-26 7:46 [Qemu-devel] [PATCH] tcg: Remove stack protection from helper functions Jan Kiszka 2011-09-26 8:01 ` Mulyadi Santosa 2011-09-26 8:15 ` Laurent Desnogues 2011-09-26 17:41 ` Avi Kivity 2011-09-26 19:43 ` Richard Henderson 2011-09-26 19:52 ` Avi Kivity 2011-09-26 19:53 ` Richard Henderson 2011-09-26 20:20 ` Avi Kivity 2011-09-26 20:19 ` Peter Maydell 2011-09-26 20:26 ` Avi Kivity 2011-09-27 4:29 ` Andi Kleen 2011-09-27 7:58 ` Peter Maydell 2011-09-26 10:51 ` [Qemu-devel] [PATCH v2] " Jan Kiszka 2011-09-26 11:33 ` Peter Maydell 2011-09-26 11:43 ` Jan Kiszka 2011-09-26 11:56 ` Peter Maydell 2011-09-26 17:22 ` Blue Swirl 2011-09-26 17:33 ` Jan Kiszka 2011-09-26 18:20 ` Blue Swirl 2011-09-26 18:25 ` Jan Kiszka 2011-09-26 18:40 ` Blue Swirl 2011-09-26 19:08 ` Peter Maydell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.