* [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG
@ 2019-05-29 9:13 Laszlo Ersek (Red Hat)
2019-05-29 12:08 ` [Qemu-devel] [Bug 1830872] " Philippe Mathieu-Daudé
` (9 more replies)
0 siblings, 10 replies; 27+ messages in thread
From: Laszlo Ersek (Red Hat) @ 2019-05-29 9:13 UTC (permalink / raw)
To: qemu-devel
Public bug reported:
The following guest code:
https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S
implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
Development Kit II) library function. (CopyMem() basically has memmove()
semantics, to provide a standard C analog here.) The relevant functions
are InternalMemCopyMem() and __memcpy().
When TCG translates this aarch64 code to x86_64, everything works fine.
When TCG translates this aarch64 code to ARMv7, the destination area of
the translated CopyMem() function becomes corrupted -- it differs from
the intended source contents. Namely, in every 4096 byte block, the
8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
instead of receiving the intended source value.
I'm attaching two hexdumps of the same destination area:
- "good.txt" is a hexdump of the destination area when CopyMem() was
translated to x86_64,
- "bad.txt" is a hexdump of the destination area when CopyMem() was
translated to ARMv7.
In order to assist with the analysis of this issue, I disassembled the
aarch64 binary with "objdump". Please find the listing in
"DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.
And, I ran the guest on the ARMv7 host with "-d
in_asm,op,op_opt,op_ind,out_asm". Please find the log in
"tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.
The TBs that correspond to (parts of) the InternalMemCopyMem() and
__memcpy() functions are scattered over the TCG log file, but the offset
between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
TBs in the TCG log, can be determined from the fact that there is a
single prfm instruction in the entire binary. The instruction's offset
is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
difference (= the load address of DxeCore.efi) is 0x472a7000.
QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).
The reproducer command line is (on an ARMv7 host):
qemu-system-aarch64 \
-display none \
-machine virt,accel=tcg \
-nodefaults \
-nographic \
-drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
-drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
-cpu cortex-a57 \
-chardev stdio,signal=off,mux=on,id=char0 \
-mon chardev=char0,mode=readline \
-serial chardev:char0
The apparent symptom is an assertion failure *in the guest*, such as
> ASSERT [DxeCore]
> /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
> Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength
but that is only a (distant) consequence of the CopyMem()
mistranslation, and resultant destination area corruption.
Originally reported in the following two mailing list messages:
- http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com
- http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com
** Affects: qemu
Importance: Undecided
Status: New
** Attachment added: "aarch64-to-arm-mistranslation.tar.xz"
https://bugs.launchpad.net/bugs/1830872/+attachment/5267323/+files/aarch64-to-arm-mistranslation.tar.xz
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1830872
Title:
AARCH64 to ARMv7 mistranslation in TCG
Status in QEMU:
New
Bug description:
The following guest code:
https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S
implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
Development Kit II) library function. (CopyMem() basically has memmove()
semantics, to provide a standard C analog here.) The relevant functions
are InternalMemCopyMem() and __memcpy().
When TCG translates this aarch64 code to x86_64, everything works
fine.
When TCG translates this aarch64 code to ARMv7, the destination area of
the translated CopyMem() function becomes corrupted -- it differs from
the intended source contents. Namely, in every 4096 byte block, the
8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
instead of receiving the intended source value.
I'm attaching two hexdumps of the same destination area:
- "good.txt" is a hexdump of the destination area when CopyMem() was
translated to x86_64,
- "bad.txt" is a hexdump of the destination area when CopyMem() was
translated to ARMv7.
In order to assist with the analysis of this issue, I disassembled the
aarch64 binary with "objdump". Please find the listing in
"DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.
And, I ran the guest on the ARMv7 host with "-d
in_asm,op,op_opt,op_ind,out_asm". Please find the log in
"tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.
The TBs that correspond to (parts of) the InternalMemCopyMem() and
__memcpy() functions are scattered over the TCG log file, but the offset
between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
TBs in the TCG log, can be determined from the fact that there is a
single prfm instruction in the entire binary. The instruction's offset
is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
difference (= the load address of DxeCore.efi) is 0x472a7000.
QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).
The reproducer command line is (on an ARMv7 host):
qemu-system-aarch64 \
-display none \
-machine virt,accel=tcg \
-nodefaults \
-nographic \
-drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
-drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
-cpu cortex-a57 \
-chardev stdio,signal=off,mux=on,id=char0 \
-mon chardev=char0,mode=readline \
-serial chardev:char0
The apparent symptom is an assertion failure *in the guest*, such as
> ASSERT [DxeCore]
> /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
> Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength
but that is only a (distant) consequence of the CopyMem()
mistranslation, and resultant destination area corruption.
Originally reported in the following two mailing list messages:
- http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com
- http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions
^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) @ 2019-05-29 12:08 ` Philippe Mathieu-Daudé 2019-06-02 10:19 ` Laszlo Ersek (Red Hat) ` (8 subsequent siblings) 9 siblings, 0 replies; 27+ messages in thread From: Philippe Mathieu-Daudé @ 2019-05-29 12:08 UTC (permalink / raw) To: qemu-devel ** Tags added: arm -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) 2019-05-29 12:08 ` [Qemu-devel] [Bug 1830872] " Philippe Mathieu-Daudé @ 2019-06-02 10:19 ` Laszlo Ersek (Red Hat) 2019-06-02 13:30 ` Alex Bennée 2019-06-02 14:54 ` Alex Bennée ` (7 subsequent siblings) 9 siblings, 1 reply; 27+ messages in thread From: Laszlo Ersek (Red Hat) @ 2019-06-02 10:19 UTC (permalink / raw) To: qemu-devel Possibly related: [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64 https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when QEMU is built for i686) Note to self: try to reprodouce the present issue with QEMU built at eed5664238ea^ -- this LP has originally been filed about the tree at a4f667b67149, and that commit contains eed5664238ea. So checking at eed5664238ea^ might reveal a difference. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG @ 2019-06-02 13:30 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-02 13:30 UTC (permalink / raw) To: Bug 1830872; +Cc: qemu-devel Laszlo Ersek (Red Hat) <lersek@redhat.com> writes: > Possibly related: > [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64 > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html > > (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when > QEMU is built for i686) > > Note to self: try to reprodouce the present issue with QEMU built at > eed5664238ea^ -- this LP has originally been filed about the tree at > a4f667b67149, and that commit contains eed5664238ea. So checking at > eed5664238ea^ might reveal a difference. Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough cases. -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG @ 2019-06-02 13:30 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-02 13:30 UTC (permalink / raw) To: qemu-devel Laszlo Ersek (Red Hat) <lersek@redhat.com> writes: > Possibly related: > [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64 > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html > > (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when > QEMU is built for i686) > > Note to self: try to reprodouce the present issue with QEMU built at > eed5664238ea^ -- this LP has originally been filed about the tree at > a4f667b67149, and that commit contains eed5664238ea. So checking at > eed5664238ea^ might reveal a difference. Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough cases. -- Alex Bennée -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG @ 2019-06-03 11:56 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-03 11:56 UTC (permalink / raw) To: Bug 1830872; +Cc: qemu-devel Alex Bennée <alex.bennee@linaro.org> writes: > Laszlo Ersek (Red Hat) <lersek@redhat.com> writes: > >> Possibly related: >> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64 >> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html >> >> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when >> QEMU is built for i686) >> >> Note to self: try to reprodouce the present issue with QEMU built at >> eed5664238ea^ -- this LP has originally been filed about the tree at >> a4f667b67149, and that commit contains eed5664238ea. So checking at >> eed5664238ea^ might reveal a difference. > > Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough > cases. Actually I do see something with i386 host running the aarch64 memory test (although so far not with a armv7 host): ./qemu-system-aarch64 -monitor none -display none -M virt -cpu max -display none -semihosting -kernel tests/memory Gives: Reading u64 from 0x40213004 (offset 4):....Error 0, 0, 0, 0, 250, 249, 248, 255Test complete: FAILED -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG @ 2019-06-03 11:56 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-03 11:56 UTC (permalink / raw) To: qemu-devel Alex Bennée <alex.bennee@linaro.org> writes: > Laszlo Ersek (Red Hat) <lersek@redhat.com> writes: > >> Possibly related: >> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64 >> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html >> >> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when >> QEMU is built for i686) >> >> Note to self: try to reprodouce the present issue with QEMU built at >> eed5664238ea^ -- this LP has originally been filed about the tree at >> a4f667b67149, and that commit contains eed5664238ea. So checking at >> eed5664238ea^ might reveal a difference. > > Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough > cases. Actually I do see something with i386 host running the aarch64 memory test (although so far not with a armv7 host): ./qemu-system-aarch64 -monitor none -display none -M virt -cpu max -display none -semihosting -kernel tests/memory Gives: Reading u64 from 0x40213004 (offset 4):....Error 0, 0, 0, 0, 250, 249, 248, 255Test complete: FAILED -- Alex Bennée -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) 2019-05-29 12:08 ` [Qemu-devel] [Bug 1830872] " Philippe Mathieu-Daudé 2019-06-02 10:19 ` Laszlo Ersek (Red Hat) @ 2019-06-02 14:54 ` Alex Bennée 2019-06-03 15:01 ` [Qemu-devel] [Bug 1830872] " Alex Bennée ` (6 subsequent siblings) 9 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-02 14:54 UTC (permalink / raw) To: qemu-devel ** Tags added: testcase -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 15:01 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-03 15:01 UTC (permalink / raw) To: qemu-devel Cc: 1830872, qemu-arm, Paolo Bonzini, Alex Bennée, randrianasulu, Richard Henderson When running on 32 bit TCG backends a wide unaligned load ends up truncating data before returning to the guest. We specifically have the return type as uint64_t to avoid any premature truncation so we should use the same for the interim types. Hopefully fixes #1830872 Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- accel/tcg/cputlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index cdcc3771020..b796ab1cbea 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >= TARGET_PAGE_SIZE)) { target_ulong addr1, addr2; - tcg_target_ulong r1, r2; + uint64_t r1, r2; unsigned shift; do_unaligned_access: addr1 = addr & ~(size - 1); -- 2.20.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 15:01 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-03 15:01 UTC (permalink / raw) To: qemu-devel When running on 32 bit TCG backends a wide unaligned load ends up truncating data before returning to the guest. We specifically have the return type as uint64_t to avoid any premature truncation so we should use the same for the interim types. Hopefully fixes #1830872 Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- accel/tcg/cputlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index cdcc3771020..b796ab1cbea 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >= TARGET_PAGE_SIZE)) { target_ulong addr1, addr2; - tcg_target_ulong r1, r2; + uint64_t r1, r2; unsigned shift; do_unaligned_access: addr1 = addr & ~(size - 1); -- 2.20.1 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 15:35 ` Andrew Randrianasulu 0 siblings, 0 replies; 27+ messages in thread From: Andrew Randrianasulu @ 2019-06-03 15:35 UTC (permalink / raw) To: Alex Bennée Cc: 1830872, qemu-arm, qemu-devel, Paolo Bonzini, Richard Henderson В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а): > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my bug is separate from #1830872 ? I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler .. probably x86-64 testing on i686 requires either docker (I don't have this ) or 'real' cross-compiler (build with glibc support). ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 15:35 ` Andrew Randrianasulu 0 siblings, 0 replies; 27+ messages in thread From: Andrew Randrianasulu @ 2019-06-03 15:35 UTC (permalink / raw) To: qemu-devel В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а): > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my bug is separate from #1830872 ? I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler .. probably x86-64 testing on i686 requires either docker (I don't have this ) or 'real' cross-compiler (build with glibc support). -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-04 9:43 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-04 9:43 UTC (permalink / raw) To: Andrew Randrianasulu Cc: 1830872, qemu-arm, qemu-devel, Paolo Bonzini, Richard Henderson Andrew Randrianasulu <randrianasulu@gmail.com> writes: > В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а): >> When running on 32 bit TCG backends a wide unaligned load ends up >> truncating data before returning to the guest. We specifically have >> the return type as uint64_t to avoid any premature truncation so we >> should use the same for the interim types. >> >> Hopefully fixes #1830872 >> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> --- >> accel/tcg/cputlb.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c >> index cdcc3771020..b796ab1cbea 100644 >> --- a/accel/tcg/cputlb.c >> +++ b/accel/tcg/cputlb.c >> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, >> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >> >= TARGET_PAGE_SIZE)) { >> target_ulong addr1, addr2; >> - tcg_target_ulong r1, r2; >> + uint64_t r1, r2; >> unsigned shift; >> do_unaligned_access: >> addr1 = addr & ~(size - 1); > > Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my > bug is separate from #1830872 ? I think you've hit two - one of which we have just fixed. With my expanded memory test on i386 I'm seeing a hang but it's ok @ pull-demacro-softmmu-100519-1. Unfortunately bisecting through the slirp move and other i386 Werror stuff is proving painful. > > I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler .. > probably x86-64 testing on i686 requires either docker (I don't have this > ) or 'real' cross-compiler (build with glibc support). -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-04 9:43 ` Alex Bennée 0 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-04 9:43 UTC (permalink / raw) To: qemu-devel Andrew Randrianasulu <randrianasulu@gmail.com> writes: > В сообщении от Monday 03 June 2019 18:01:20 Alex Bennée написал(а): >> When running on 32 bit TCG backends a wide unaligned load ends up >> truncating data before returning to the guest. We specifically have >> the return type as uint64_t to avoid any premature truncation so we >> should use the same for the interim types. >> >> Hopefully fixes #1830872 >> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> --- >> accel/tcg/cputlb.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c >> index cdcc3771020..b796ab1cbea 100644 >> --- a/accel/tcg/cputlb.c >> +++ b/accel/tcg/cputlb.c >> @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, >> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >> >= TARGET_PAGE_SIZE)) { >> target_ulong addr1, addr2; >> - tcg_target_ulong r1, r2; >> + uint64_t r1, r2; >> unsigned shift; >> do_unaligned_access: >> addr1 = addr & ~(size - 1); > > Unfortunatly, this doesn't fix 32-bit qemu-system-x86_64 .... so, my > bug is separate from #1830872 ? I think you've hit two - one of which we have just fixed. With my expanded memory test on i386 I'm seeing a hang but it's ok @ pull-demacro-softmmu-100519-1. Unfortunately bisecting through the slirp move and other i386 Werror stuff is proving painful. > > I also was unable to convince qemu to use my kernel-only x86_64 gcc 6.5.0 cross-compiler .. > probably x86-64 testing on i686 requires either docker (I don't have this > ) or 'real' cross-compiler (build with glibc support). -- Alex Bennée -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 18:29 ` Laszlo Ersek 0 siblings, 0 replies; 27+ messages in thread From: Laszlo Ersek (Red Hat) @ 2019-06-03 18:29 UTC (permalink / raw) To: qemu-devel (+Igor) On 06/03/19 17:01, Alex Bennée wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); > Applied on top of commit ad88e4252f09c2956b99c90de39e95bab2e8e7af: Tested-by: Laszlo Ersek <lersek@redhat.com> Thanks! Laszlo -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-03 18:29 ` Laszlo Ersek 0 siblings, 0 replies; 27+ messages in thread From: Laszlo Ersek @ 2019-06-03 18:29 UTC (permalink / raw) To: Alex Bennée, qemu-devel Cc: 1830872, qemu-arm, Paolo Bonzini, Igor Mammedov, randrianasulu, Richard Henderson (+Igor) On 06/03/19 17:01, Alex Bennée wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); > Applied on top of commit ad88e4252f09c2956b99c90de39e95bab2e8e7af: Tested-by: Laszlo Ersek <lersek@redhat.com> Thanks! Laszlo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load 2019-06-03 15:01 ` [Qemu-devel] [Bug 1830872] " Alex Bennée ` (2 preceding siblings ...) (?) @ 2019-06-03 22:01 ` Richard Henderson -1 siblings, 0 replies; 27+ messages in thread From: Richard Henderson @ 2019-06-03 22:01 UTC (permalink / raw) To: Alex Bennée, qemu-devel Cc: 1830872, Richard Henderson, qemu-arm, randrianasulu, Paolo Bonzini On 6/3/19 10:01 AM, Alex Bennée wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load 2019-06-03 15:01 ` [Qemu-devel] [Bug 1830872] " Alex Bennée ` (3 preceding siblings ...) (?) @ 2019-06-04 6:52 ` Philippe Mathieu-Daudé -1 siblings, 0 replies; 27+ messages in thread From: Philippe Mathieu-Daudé @ 2019-06-04 6:52 UTC (permalink / raw) To: Alex Bennée, qemu-devel Cc: 1830872, Richard Henderson, qemu-arm, randrianasulu, Paolo Bonzini On 6/3/19 5:01 PM, Alex Bennée wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 Maybe clearer as: Fixes: https://bugs.launchpad.net/qemu/+bug/1830872 Fixes: eed5664238e > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); > Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-04 11:42 ` Igor 0 siblings, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2019-06-04 11:42 UTC (permalink / raw) To: Alex Bennée Cc: qemu-devel, 1830872, qemu-arm, Paolo Bonzini, randrianasulu, Richard Henderson On Mon, 3 Jun 2019 16:01:20 +0100 Alex Bennée <alex.bennee@linaro.org> wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Fixes arm/virt bios-tables-test for me, so Tested-by: Igor Mammedov <imammedo@redhat.com> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load @ 2019-06-04 11:42 ` Igor 0 siblings, 0 replies; 27+ messages in thread From: Igor @ 2019-06-04 11:42 UTC (permalink / raw) To: qemu-devel On Mon, 3 Jun 2019 16:01:20 +0100 Alex Bennée <alex.bennee@linaro.org> wrote: > When running on 32 bit TCG backends a wide unaligned load ends up > truncating data before returning to the guest. We specifically have > the return type as uint64_t to avoid any premature truncation so we > should use the same for the interim types. > > Hopefully fixes #1830872 > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Fixes arm/virt bios-tables-test for me, so Tested-by: Igor Mammedov <imammedo@redhat.com> > --- > accel/tcg/cputlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index cdcc3771020..b796ab1cbea 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, > && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 > >= TARGET_PAGE_SIZE)) { > target_ulong addr1, addr2; > - tcg_target_ulong r1, r2; > + uint64_t r1, r2; > unsigned shift; > do_unaligned_access: > addr1 = addr & ~(size - 1); -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (3 preceding siblings ...) 2019-06-03 15:01 ` [Qemu-devel] [Bug 1830872] " Alex Bennée @ 2019-06-03 15:27 ` Laszlo Ersek (Red Hat) 2019-06-03 15:45 ` Laszlo Ersek (Red Hat) ` (4 subsequent siblings) 9 siblings, 0 replies; 27+ messages in thread From: Laszlo Ersek (Red Hat) @ 2019-06-03 15:27 UTC (permalink / raw) To: qemu-devel I confirm that QEMU works fine (for the use case originally reported in this LP ticket) when built at commit a6ae23831b, i.e. at the parent of eed5664238ea. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (4 preceding siblings ...) 2019-06-03 15:27 ` [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) @ 2019-06-03 15:45 ` Laszlo Ersek (Red Hat) 2019-06-03 15:51 ` Alex Bennée ` (3 subsequent siblings) 9 siblings, 0 replies; 27+ messages in thread From: Laszlo Ersek (Red Hat) @ 2019-06-03 15:45 UTC (permalink / raw) To: qemu-devel Sorry the patch in comment #5 wasn't visible when I wrote what would end up as comment #6. I'll test the patch later. Thanks! -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (5 preceding siblings ...) 2019-06-03 15:45 ` Laszlo Ersek (Red Hat) @ 2019-06-03 15:51 ` Alex Bennée 2019-06-03 16:53 ` Andrew Randrianasulu 2019-06-03 17:03 ` Andrew Randrianasulu ` (2 subsequent siblings) 9 siblings, 1 reply; 27+ messages in thread From: Alex Bennée @ 2019-06-03 15:51 UTC (permalink / raw) To: qemu-devel I managed to tweak the memory test enough to detect the failure on aarch64-on-armv7 and I the attached patch fixes it. Could you please double check with your test case? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-06-03 15:51 ` Alex Bennée @ 2019-06-03 16:53 ` Andrew Randrianasulu 0 siblings, 0 replies; 27+ messages in thread From: Andrew Randrianasulu @ 2019-06-03 16:53 UTC (permalink / raw) To: qemu-devel В сообщении от Monday 03 June 2019 18:51:40 Alex Bennée написал(а): > I managed to tweak the memory test enough to detect the failure on > aarch64-on-armv7 and I the attached patch fixes it. Could you please > double check with your test case? > Hm, I manually applied path from LP(git diff disliked copypasted patch), so for now git diff in qemu tree shows: diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index cdcc377102..b796ab1cbe 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -1303,7 +1303,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi, && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >= TARGET_PAGE_SIZE)) { target_ulong addr1, addr2; - tcg_target_ulong r1, r2; + uint64_t r1, r2; unsigned shift; do_unaligned_access: addr1 = addr & ~(size - 1); lines 1-13/13 (END) --------- but x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg still hangs at 'booting the kernel" (it decompress OK) I make distclean'ed source tree and reconfigured it: ./configure --target-list=x86_64-softmmu --disable-werror --enable-debug-tcg --cross-cc-x86_64="/opt/kgcc64/bin/x86_64-unknown-linux-gnu-gcc-6.5.0" next, make -j 5 and test. Hm. I tried debug switches, it seems to hang a bit differently for two runs: x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64 -accel tcg -nographic -d in_asm,op,op_opt,op_ind,out_asm ===================== IN: 0xffffffff810e8a63: 48 83 c3 64 addq $0x64, %rbx 0xffffffff810e8a67: eb c2 jmp 0xffffffff810e8a2b OP: ld_i32 tmp18,env,$0xfffffff0 movi_i32 tmp19,$0x0 brcond_i32 tmp18,tmp19,lt,$L0 ---- ffffffff810e8a63 0000000000000000 movi_i32 tmp2,$0x64 movi_i32 tmp3,$0x0 mov_i32 tmp0,rbx_0 mov_i32 tmp1,rbx_1 add2_i32 tmp0,tmp1,tmp0,tmp1,tmp2,tmp3 mov_i32 rbx_0,tmp0 mov_i32 rbx_1,tmp1 mov_i32 cc_src_0,tmp2 mov_i32 cc_src_1,tmp3 mov_i32 cc_dst_0,tmp0 mov_i32 cc_dst_1,tmp1 discard cc_src2_0 discard cc_src2_1 discard cc_op ---- ffffffff810e8a67 0000000000000009 movi_i32 cc_op,$0x9 goto_tb $0x0 movi_i32 tmp6,$0x810e8a2b movi_i32 tmp7,$0xffffffff st_i32 tmp6,env,$0x80 st_i32 tmp7,env,$0x84 exit_tb $0xf2f1c080 set_label $L0 exit_tb $0xf2f1c083 OP after optimization and liveness analysis: ld_i32 tmp18,env,$0xfffffff0 dead: 1 pref=0xff movi_i32 tmp19,$0x0 pref=0xff brcond_i32 tmp18,tmp19,lt,$L0 dead: 0 1 ---- ffffffff810e8a63 0000000000000000 movi_i32 tmp2,$0x64 pref=0xff movi_i32 tmp3,$0x0 pref=0xff add2_i32 tmp0,tmp1,rbx_0,rbx_1,tmp2,tmp3 dead: 2 3 pref=0xff,0xff mov_i32 rbx_0,tmp0 sync: 0 dead: 1 pref=0xff mov_i32 rbx_1,tmp1 sync: 0 dead: 1 pref=0xff mov_i32 cc_src_0,tmp2 sync: 0 dead: 0 1 pref=0xff mov_i32 cc_src_1,tmp3 sync: 0 dead: 0 1 pref=0xff mov_i32 cc_dst_0,rbx_0 sync: 0 dead: 0 1 pref=0xff mov_i32 cc_dst_1,rbx_1 sync: 0 dead: 0 1 pref=0xff discard cc_src2_0 pref=0xff discard cc_src2_1 pref=0xff discard cc_op pref=0xff mov_i32 cc_dst_0,tmp0 mov_i32 cc_dst_1,tmp1 discard cc_src2_0 discard cc_src2_1 discard cc_op ---- ffffffff810e8a55 0000000000000021 movi_i32 cc_op,$0x21 movi_i32 tmp20,$0x0 movi_i32 tmp21,$0x0 brcond2_i32 cc_dst_0,cc_dst_1,tmp20,tmp21,eq,$L1 goto_tb $0x0 movi_i32 tmp6,$0x810e8a57 movi_i32 tmp7,$0xffffffff st_i32 tmp6,env,$0x80 st_i32 tmp7,env,$0x84 exit_tb $0xf2f1c180 set_label $L1 goto_tb $0x1 movi_i32 tmp6,$0x810e8a63 movi_i32 tmp7,$0xffffffff st_i32 tmp6,env,$0x80 st_i32 tmp7,env,$0x84 exit_tb $0xf2f1c181 set_label $L0 exit_tb $0xf2f1c183 OP after optimization and liveness analysis: ld_i32 tmp18,env,$0xfffffff0 dead: 1 pref=0xff movi_i32 tmp19,$0x0 pref=0xff brcond_i32 tmp18,tmp19,lt,$L0 dead: 0 1 ---- ffffffff810e8a4c 0000000000000000 ---- ffffffff810e8a52 0000000000000000 movi_i32 tmp1,$0x0 pref=0xff movi_i32 tmp0,$0x64 pref=0xff mov_i32 r14_0,tmp0 sync: 0 dead: 1 pref=0xf8 mov_i32 r14_1,tmp1 sync: 0 dead: 1 pref=0xf8 call cc_compute_c,$0x5,$2,cc_src_0,cc_src_1,cc_dst_0,cc_dst_1,cc_src_0,cc_src_1,cc_src2_0,cc_src2_1,cc_op sync: 0 1 dead: 0 1 2 3 4 5 6 7 8 pref=none,none mov_i32 cc_dst_0,r14_0 sync: 0 dead: 0 1 pref=0xff mov_i32 cc_dst_1,r14_1 sync: 0 dead: 0 1 pref=0xffУбито (killed by me) ================== IN: 0xffffffff810e8a61: eb ef jmp 0xffffffff810e8a52 OP: ld_i32 tmp18,env,$0xfffffff0 movi_i32 tmp19,$0x0 brcond_i32 tmp18,tmp19,lt,$L0 ---- ffffffff810e8a61 0000000000000000 goto_tb $0x0 movi_i32 tmp6,$0x810e8a52 movi_i32 tmp7,$0xffffffff st_i32 tmp6,env,$0x80 st_i32 tmp7,env,$0x84 exit_tb $0xf2f22900 set_label $L0 exit_tb $0xf2f22903 OP after optimization and liveness analysis: ld_i32 tmp18,env,$0xfffffff0 dead: 1 pref=0xff movi_i32 tmp19,$0x0 pref=0xff brcond_i32 tmp18,tmp19,lt,$L0 dead: 0 1 ---- ffffffff810e8a61 0000000000000000 goto_tb $0x0 movi_i32 tmp6,$0x810e8a52 pref=0xff movi_i32 tmp7,$0xffffffff pref=0xff st_i32 tmp6,env,$0x80 dead: 0 st_i32 tmp7,env,$0x84 dead: 0 1 exit_tb $0xf2f22900 set_label $L0 exit_tb $0xf2f22903 OUT: [size=56] 0xf2f22980: 8b 5d f0 movl -0x10(%ebp), %ebx 0xf2f22983: 85 db testl %ebx, %ebx 0xf2f22985: 0f 8c 23 00 00 00 jl 0xf2f229ae 0xf2f2298b: e9 00 00 00 00 jmp 0xf2f22990 0xf2f22990: c7 85 80 00 00 00 52 8a movl $0x810e8a52, 0x80(%ebp) 0xf2f22998: 0e 81 0xf2f2299a: c7 85 84 00 00 00 ff ff movl $0xffffffff, 0x84(%ebp) 0xf2f229a2: ff ff 0xf2f229a4: b8 00 29 f2 f2 movl $0xf2f22900, %eax 0xf2f229a9: e9 69 46 c9 ff jmp 0xf2bb7017 0xf2f229ae: b8 03 29 f2 f2 movl $0xf2f22903, %eax 0xf2f229b3: e9 5f 46 c9 ff jmp 0xf2bb7017 ---------------- IN: 0xffffffff810e8a52: 49 ff ce decq %r14 0xffffffff810e8a55: 74 0c je 0xffffffff810e8a63 OP: ld_i32 tmp18,env,$0xfffffff0 movi_i32 tmp19,$0x0 brcond_i32 tmp18,tmp19,lt,$L0 ---- ffffffff810e8a52 0000000000000000 mov_i32 tmp0,r14_0 mov_i32 tmp1,r14_1 mov_i32 tmp0,r14_0 mov_i32 tmp1,r14_1 movi_i32 tmp20,$0xffffffff movi_i32 tmp21,$0xffffffff add2_i32 tmp0,tmp1,tmp0,tmp1,tmp20,tmp21 mov_i32 r14_0,tmp0 mov_i32 r14_1,tmp1 call cc_compute_c,$0x5,$2,cc_src_0,cc_src_1,cc_dst_0,cc_dst_1,cc_src_0,cc_src_1,cc_src2_0,cc_src2_1,cc_op mov_i32 cc_dst_0,tmp0 mov_i32 cc_dst_1,tmp1 discard cc_src2_0 discard cc_src2_1 discard cc_op ---- ffffffff810e8a55 0000000000000021 mov_i32 cc_dst_0,r14_0 sync: 0 dead: 1 pref=0xff mov_i32 cc_dst_1,r14_1 sync: 0 dead: 1 pref=0xff discard cc_src2_0 pref=0xff discard cc_src2_1 pref=0xff discard cc_op pref=0xff ---- ffffffff810e8a55 0000000000000021 movi_i32 cc_op,$0x21 sync: 0 dead: 0 pref=0xff movi_i32 tmp20,$0x0 pref=0xff movi_i32 tmp21,$0x0 pref=0xff brcond2_i32 cc_dst_0,cc_dst_1,tmp20,tmp21,eq,$L1 dead: 0 1 2 3 goto_tb $0x0 movi_i32 tmp6,$0x810e8a57 pref=0xff movi_i32 tmp7,$0xffffffff pref=0xff st_i32 tmp6,env,$0x80 dead: 0 st_i32 tmp7,env,$0x84 dead: 0 1 exit_tb $0xf2f229c0 set_label $L1 goto_tb $0x1 movi_i32 tmp6,$0x810e8a63 pref=0xff movi_i32 cc_op,$0x9 sync: 0 dead: 0 pref=0xff goto_tb $0x0 movi_i32 tmp6,$0x810e8a2b pref=0xff movi_i32 tmp7,$0xffffffff pref=0xff st_i32 tmp6,env,$0x80 dead: 0 st_i32 tmp7,env,$0x84 dead: 0 1 exit_tb $0xf2f22b40 set_label $L0 exit_tb $0xf2f22b43 OUT: [size=116] 0xf2f22bc0: 8b 5d f0 movl -0x10(%ebp), %ebx 0xf2f22bc3: 85 db testl %ebx, %ebx 0xf2f22bc5: 0f 8c 5f 00 00 00 jl 0xf2f22c2a 0xf2f22bcb: 8b 5d 18 movl 0x18(%ebp), %ebx 0xf2f22bce: 8b 75 1c movl 0x1c(%ebp), %esi 0xf2f22bd1: 83 c3 64 addl $0x64, %ebx 0xf2f22bd4: 83 d6 00 adcl $0, %esi 0xf2f22bd7: 89 5d 18 movl %ebx, 0x18(%ebp) Убито ============================= try kernel I use (it works with qemu compiiled under 64-bit Slackware, and also with kvm on 32-bit x86) sha256sum /boot/bzImage-4.12.0-x64 b4183376de17e8ea7a25094b7a526e99bcb8339b8703090684c93e0e0a50d284 /boot/bzImage-4.12.0-x64 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (6 preceding siblings ...) 2019-06-03 15:51 ` Alex Bennée @ 2019-06-03 17:03 ` Andrew Randrianasulu 2019-06-17 18:21 ` Alex Bennée 2019-08-16 5:04 ` Thomas Huth 9 siblings, 0 replies; 27+ messages in thread From: Andrew Randrianasulu @ 2019-06-03 17:03 UTC (permalink / raw) To: qemu-devel ** Attachment added: "bzImage-4.12.0-x64" https://bugs.launchpad.net/bugs/1830872/+attachment/5268560/+files/bzImage-4.12.0-x64 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: New Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (7 preceding siblings ...) 2019-06-03 17:03 ` Andrew Randrianasulu @ 2019-06-17 18:21 ` Alex Bennée 2019-08-16 5:04 ` Thomas Huth 9 siblings, 0 replies; 27+ messages in thread From: Alex Bennée @ 2019-06-17 18:21 UTC (permalink / raw) To: qemu-devel ** Changed in: qemu Status: New => Fix Committed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: Fix Committed Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
* [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) ` (8 preceding siblings ...) 2019-06-17 18:21 ` Alex Bennée @ 2019-08-16 5:04 ` Thomas Huth 9 siblings, 0 replies; 27+ messages in thread From: Thomas Huth @ 2019-08-16 5:04 UTC (permalink / raw) To: qemu-devel https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8c79b288513587e960b ** Changed in: qemu Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1830872 Title: AARCH64 to ARMv7 mistranslation in TCG Status in QEMU: Fix Released Bug description: The following guest code: https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI Development Kit II) library function. (CopyMem() basically has memmove() semantics, to provide a standard C analog here.) The relevant functions are InternalMemCopyMem() and __memcpy(). When TCG translates this aarch64 code to x86_64, everything works fine. When TCG translates this aarch64 code to ARMv7, the destination area of the translated CopyMem() function becomes corrupted -- it differs from the intended source contents. Namely, in every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the destination, instead of receiving the intended source value. I'm attaching two hexdumps of the same destination area: - "good.txt" is a hexdump of the destination area when CopyMem() was translated to x86_64, - "bad.txt" is a hexdump of the destination area when CopyMem() was translated to ARMv7. In order to assist with the analysis of this issue, I disassembled the aarch64 binary with "objdump". Please find the listing in "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. And, I ran the guest on the ARMv7 host with "-d in_asm,op,op_opt,op_ind,out_asm". Please find the log in "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached. The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the TCG log file, but the offset between the "nice" disassembly from "DxeCore.objdump", and the in-RAM TBs in the TCG log, can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in the TCG log. Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21). The reproducer command line is (on an ARMv7 host): qemu-system-aarch64 \ -display none \ -machine virt,accel=tcg \ -nodefaults \ -nographic \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \ -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \ -cpu cortex-a57 \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline \ -serial chardev:char0 The apparent symptom is an assertion failure *in the guest*, such as > ASSERT [DxeCore] > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength but that is only a (distant) consequence of the CopyMem() mistranslation, and resultant destination area corruption. Originally reported in the following two mailing list messages: - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2019-08-16 5:20 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-29 9:13 [Qemu-devel] [Bug 1830872] [NEW] AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) 2019-05-29 12:08 ` [Qemu-devel] [Bug 1830872] " Philippe Mathieu-Daudé 2019-06-02 10:19 ` Laszlo Ersek (Red Hat) 2019-06-02 13:30 ` Alex Bennée 2019-06-02 13:30 ` Alex Bennée 2019-06-03 11:56 ` Alex Bennée 2019-06-03 11:56 ` Alex Bennée 2019-06-02 14:54 ` Alex Bennée 2019-06-03 15:01 ` [Qemu-devel] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load Alex Bennée 2019-06-03 15:01 ` [Qemu-devel] [Bug 1830872] " Alex Bennée 2019-06-03 15:35 ` [Qemu-devel] " Andrew Randrianasulu 2019-06-03 15:35 ` [Qemu-devel] [Bug 1830872] " Andrew Randrianasulu 2019-06-04 9:43 ` [Qemu-devel] " Alex Bennée 2019-06-04 9:43 ` [Qemu-devel] [Bug 1830872] " Alex Bennée 2019-06-03 18:29 ` Laszlo Ersek (Red Hat) 2019-06-03 18:29 ` [Qemu-devel] " Laszlo Ersek 2019-06-03 22:01 ` Richard Henderson 2019-06-04 6:52 ` Philippe Mathieu-Daudé 2019-06-04 11:42 ` Igor Mammedov 2019-06-04 11:42 ` [Qemu-devel] [Bug 1830872] " Igor 2019-06-03 15:27 ` [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG Laszlo Ersek (Red Hat) 2019-06-03 15:45 ` Laszlo Ersek (Red Hat) 2019-06-03 15:51 ` Alex Bennée 2019-06-03 16:53 ` Andrew Randrianasulu 2019-06-03 17:03 ` Andrew Randrianasulu 2019-06-17 18:21 ` Alex Bennée 2019-08-16 5:04 ` Thomas Huth
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.