* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-07 9:16 ` Jan Beulich
@ 2017-03-07 4:24 ` Chao Gao
2017-03-07 14:11 ` Jan Beulich
2017-03-08 3:16 ` Xuquan (Quan Xu)
0 siblings, 2 replies; 20+ messages in thread
From: Chao Gao @ 2017-03-07 4:24 UTC
To: Jan Beulich; +Cc: xen-devel, Kevin Tian, osstest-admin, Xuquan
On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>> On 07.03.17 at 06:52, <osstest-admin@xenproject.org> wrote:
>> flight 106504 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/106504/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> [...]
>> test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs.
>> 106482
>
>Here we go:
>
>(XEN) d15v0: intack: 02:48 pt: 38
>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>(XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>(XEN) CPU: 0
>(XEN) RIP: e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>(XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
>(XEN) rax: ffff82d0804754a8 rbx: ffff83007f375680 rcx: 0000000000000000
>(XEN) rdx: ffff83007cd3ffff rsi: 000000000000000a rdi: ffff82d0803316d8
>(XEN) rbp: ffff83007cd3ff08 rsp: ffff83007cd3fea8 r8: ffff830277db8000
>(XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
>(XEN) r12: 00000000ffffffff r13: ffff82d0802b5b02 r14: ffff82d0802b5b02
>(XEN) r15: ffff83027d82e000 cr0: 0000000080050033 cr4: 00000000001526e0
>(XEN) cr3: 0000000259135000 cr2: 000000000164f034
>(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>(XEN) fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>(XEN) ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>(XEN) ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>(XEN) ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>(XEN) ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>(XEN) 0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>(XEN) 0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>(XEN) 000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>(XEN) ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>(XEN) 0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>(XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
>(XEN) ffff83007f375000 0000000000000000 00000000001526e0
>(XEN) Xen call trace:
>(XEN) [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>(XEN) [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>(XEN)
>(XEN)
>(XEN) ****************************************
>(XEN) Panic on CPU 0:
>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>(XEN) ****************************************
>
>I didn't make an attempt at interpreting this yet, but I wonder if it
>is more than coincidence that - just like the first time the ASSERT()
>triggered - this is again a guest-stop of a qemuu-debianhvm.
>
Cc: xuquan.
Exciting! I have been monitoring osstest for about a month with
a Python script, but it only crawls the flights once a day.

From the output, pt_vector is 0x38 and intack.vector is 0x30, the
same two values as the first time. Only one bit, 0x30, is set in
vIRR, and PIR is all zeroes. So our suspicion that PIR is not synced
to vIRR may be wrong. It is strange that the 0x38 bit is not present
in vIRR. Is it possible that we clear the 0x38 bit just after we
return from pt_update_irq()? Or, as I suspected, pt_update_irq()
sets 0x30 but wrongly returns 0x38.

Do you think it is worth trying to disable the guest's LAPIC timer and
force it to use IRQ0, while changing the RTE very frequently?
If yes, I am glad to do it.
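
For reference, the reading of the dump given here can be reproduced with a
short Python sketch. It rests on two assumptions that are chosen to match
the values discussed in this mail and are not taken from the Xen source:
the eight 32-bit words are printed from the highest word (vectors 224-255)
down to the lowest (vectors 0-31), and the vector in "intack: 02:48" is
decimal while "pt: 38" is hexadecimal. The helper name set_vectors() is
made up for this illustration.

```python
# Sketch: decode a 256-bit vIRR/PIR dump line into the set vector numbers.
# Assumption: words are printed highest first (vectors 224-255 ... 0-31).

def set_vectors(dump_words):
    """Return the sorted list of vector numbers whose bit is set."""
    words = [int(w, 16) for w in dump_words.split()]
    if len(words) != 8:
        raise ValueError("expected eight 32-bit words")
    vectors = []
    # Reverse so that index 0 is the word covering vectors 0-31.
    for index, word in enumerate(reversed(words)):
        for bit in range(32):
            if word & (1 << bit):
                vectors.append(index * 32 + bit)
    return vectors


VIRR = "00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000"
PIR = "00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000"

pending = set_vectors(VIRR)
print([hex(v) for v in pending])   # only vector 0x30 is pending
print(set_vectors(PIR))            # nothing is posted in PIR

# Assumption: intack vector is 48 decimal (0x30), pt vector is 0x38.
intack_vector = 48
pt_vector = 0x38
print(intack_vector >= pt_vector)  # the asserted condition, False here
```

With these assumptions the only pending vector is 0x30, PIR is empty, and
the asserted condition intack.vector >= pt_vector does not hold.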
Thanks,
Chao
>Jan
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* [xen-unstable test] 106504: regressions - FAIL
@ 2017-03-07 5:52 osstest service owner
2017-03-07 9:16 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: osstest service owner @ 2017-03-07 5:52 UTC
To: xen-devel, osstest-admin
flight 106504 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/106504/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-xtf-amd64-amd64-5 34 xtf/test-hvm32pae-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-5 38 xtf/test-hvm32pse-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-4 34 xtf/test-hvm32pae-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-4 38 xtf/test-hvm32pse-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-5 46 xtf/test-hvm64-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-4 46 xtf/test-hvm64-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-1 34 xtf/test-hvm32pae-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-1 38 xtf/test-hvm32pse-swint-emulation fail REGR. vs. 106482
test-xtf-amd64-amd64-1 46 xtf/test-hvm64-swint-emulation fail REGR. vs. 106482
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail REGR. vs. 106482
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs. 106482
Regressions which are regarded as allowable (not blocking):
test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 106482
test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 106482
test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 106482
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 106482
test-armhf-armhf-libvirt 13 saverestore-support-check fail like 106482
test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail like 106482
test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail like 106482
test-amd64-amd64-xl-rtds 9 debian-install fail like 106482
Tests which did not succeed, but are not blocking:
test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a
test-arm64-arm64-xl 1 build-check(1) blocked n/a
build-arm64-libvirt 1 build-check(1) blocked n/a
test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a
test-arm64-arm64-libvirt 1 build-check(1) blocked n/a
test-arm64-arm64-xl-credit2 1 build-check(1) blocked n/a
test-arm64-arm64-xl-rtds 1 build-check(1) blocked n/a
test-arm64-arm64-xl-multivcpu 1 build-check(1) blocked n/a
test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a
test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
build-arm64 5 xen-build fail never pass
build-arm64-xsm 5 xen-build fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
build-arm64-pvops 5 kernel-build fail never pass
test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-xl 12 migrate-support-check fail never pass
test-armhf-armhf-xl 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass
version targeted for testing:
xen 06857f3436b987fb4942288d8f750c0a1854976c
baseline version:
xen 6d55c0c316357a412526b9dccd45d3c3abb75227
Last test of basis 106482 2017-03-06 01:57:48 Z 1 days
Testing same since 106504 2017-03-06 19:44:53 Z 0 days 1 attempts
------------------------------------------------------------
People who touched revisions under test:
Jan Beulich <jbeulich@suse.com>
Razvan Cojocaru <rcojocaru@bitdefender.com>
Tamas K Lengyel <tamas@tklengyel.com>
jobs:
build-amd64-xsm pass
build-arm64-xsm fail
build-armhf-xsm pass
build-i386-xsm pass
build-amd64-xtf pass
build-amd64 pass
build-arm64 fail
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-arm64-libvirt blocked
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-prev pass
build-i386-prev pass
build-amd64-pvops pass
build-arm64-pvops fail
build-armhf-pvops pass
build-i386-pvops pass
build-amd64-rumprun pass
build-i386-rumprun pass
test-xtf-amd64-amd64-1 pass
test-xtf-amd64-amd64-2 pass
test-xtf-amd64-amd64-3 pass
test-xtf-amd64-amd64-4 pass
test-xtf-amd64-amd64-5 pass
test-amd64-amd64-xl pass
test-arm64-arm64-xl blocked
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm fail
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm fail
test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-xsm pass
test-arm64-arm64-libvirt-xsm blocked
test-armhf-armhf-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-arm64-arm64-xl-xsm blocked
test-armhf-armhf-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvh-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-rumprun-amd64 pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit2 pass
test-arm64-arm64-xl-credit2 blocked
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-i386-rumprun-i386 pass
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvh-intel fail
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-arm64-arm64-libvirt blocked
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-migrupgrade pass
test-amd64-i386-migrupgrade pass
test-amd64-amd64-xl-multivcpu pass
test-arm64-arm64-xl-multivcpu blocked
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-pygrub pass
test-arm64-arm64-libvirt-qcow2 blocked
test-amd64-amd64-xl-qcow2 pass
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds fail
test-arm64-arm64-xl-rtds blocked
test-armhf-armhf-xl-rtds pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 pass
test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
test-amd64-amd64-xl-qemut-winxpsp3 pass
test-amd64-i386-xl-qemut-winxpsp3 pass
test-amd64-amd64-xl-qemuu-winxpsp3 pass
test-amd64-i386-xl-qemuu-winxpsp3 pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
commit 06857f3436b987fb4942288d8f750c0a1854976c
Author: Razvan Cojocaru <rcojocaru@bitdefender.com>
Date: Mon Mar 6 17:51:15 2017 +0100
x86/mem_access: fix vm_event emulation check with altp2m enabled
Currently, p2m_mem_access_emulate_check() uses p2m_get_mem_access()
to check if the page restrictions have been lifted between the time
of sending the vm_event out and the reception of the reply - in
which case emulation is no longer required. Unfortunately,
p2m_get_mem_access() uses p2m_get_hostp2m(d) which only checks the
default EPT (view 0 in altp2m parlance). This patch fixes this by
checking the active altp2m view instead, whenever applicable.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
commit c86b899597dccff002313d1ce9bd32b0f4325c62
Author: Jan Beulich <jbeulich@suse.com>
Date: Mon Mar 6 17:49:45 2017 +0100
ditch redundant integer types
The very few uses can easily be replaced by more standard ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(qemu changes not included)
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-07 5:52 [xen-unstable test] 106504: regressions - FAIL osstest service owner
@ 2017-03-07 9:16 ` Jan Beulich
2017-03-07 4:24 ` Chao Gao
0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-03-07 9:16 UTC
To: Chao Gao, Kevin Tian; +Cc: xen-devel, osstest-admin
>>> On 07.03.17 at 06:52, <osstest-admin@xenproject.org> wrote:
> flight 106504 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/106504/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> [...]
> test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs.
> 106482
Here we go:
(XEN) d15v0: intack: 02:48 pt: 38
(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
(XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
(XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
(XEN) rax: ffff82d0804754a8 rbx: ffff83007f375680 rcx: 0000000000000000
(XEN) rdx: ffff83007cd3ffff rsi: 000000000000000a rdi: ffff82d0803316d8
(XEN) rbp: ffff83007cd3ff08 rsp: ffff83007cd3fea8 r8: ffff830277db8000
(XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
(XEN) r12: 00000000ffffffff r13: ffff82d0802b5b02 r14: ffff82d0802b5b02
(XEN) r15: ffff83027d82e000 cr0: 0000000080050033 cr4: 00000000001526e0
(XEN) cr3: 0000000259135000 cr2: 000000000164f034
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
(XEN) fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
(XEN) ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
(XEN) ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
(XEN) ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
(XEN) ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
(XEN) 0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
(XEN) 0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
(XEN) 000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
(XEN) ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
(XEN) 0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
(XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
(XEN) ffff83007f375000 0000000000000000 00000000001526e0
(XEN) Xen call trace:
(XEN) [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
(XEN) [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) ****************************************
I didn't make an attempt at interpreting this yet, but I wonder if it
is more than coincidence that - just like the first time the ASSERT()
triggered - this is again a guest-stop of a qemuu-debianhvm.
Jan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-07 4:24 ` Chao Gao
@ 2017-03-07 14:11 ` Jan Beulich
2017-03-22 4:53 ` Chao Gao
2017-03-08 3:16 ` Xuquan (Quan Xu)
1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-03-07 14:11 UTC
To: Chao Gao; +Cc: xen-devel, Xuquan, Kevin Tian, osstest-admin
>>> On 07.03.17 at 05:24, <chao.gao@intel.com> wrote:
> On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>>> On 07.03.17 at 06:52, <osstest-admin@xenproject.org> wrote:
>>> flight 106504 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/106504/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>> [...]
>>> test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs.
>>> 106482
>>
>>Here we go:
>>
>>(XEN) d15v0: intack: 02:48 pt: 38
>>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>(XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>>(XEN) CPU: 0
>>(XEN) RIP: e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
>>(XEN) rax: ffff82d0804754a8 rbx: ffff83007f375680 rcx: 0000000000000000
>>(XEN) rdx: ffff83007cd3ffff rsi: 000000000000000a rdi: ffff82d0803316d8
>>(XEN) rbp: ffff83007cd3ff08 rsp: ffff83007cd3fea8 r8: ffff830277db8000
>>(XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
>>(XEN) r12: 00000000ffffffff r13: ffff82d0802b5b02 r14: ffff82d0802b5b02
>>(XEN) r15: ffff83027d82e000 cr0: 0000000080050033 cr4: 00000000001526e0
>>(XEN) cr3: 0000000259135000 cr2: 000000000164f034
>>(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>>(XEN) fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
>>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>>(XEN) ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>>(XEN) ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>>(XEN) ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>>(XEN) ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>>(XEN) 0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>>(XEN) 0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>>(XEN) 000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>>(XEN) ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>>(XEN) 0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>>(XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
>>(XEN) ffff83007f375000 0000000000000000 00000000001526e0
>>(XEN) Xen call trace:
>>(XEN) [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN) [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>>(XEN)
>>(XEN)
>>(XEN) ****************************************
>>(XEN) Panic on CPU 0:
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ****************************************
>>
>>I didn't make an attempt at interpreting this yet, but I wonder if it
>>is more than coincidence that - just like the first time the ASSERT()
>>triggered - this is again a guest-stop of a qemuu-debianhvm.
>>
>
> Cc: xuquan.
>
> Exciting! I have been monitoring osstest for about one months through
> a python script. But I always crawl the flights one time a day.
>
> From the output, the pt_vector is 0x38 and the intack.vector is
> 0x30. these two values are same with they were in the first time.
> And only one bit 0x30 is set in vIRR. PIR is NULL. So maybe
> our suspicion that PIR is not synced to vIRR is wrong. The 0x38 bit
> is not present in vIRR is strange. Is it possible that we clear the 0x38 bit
> just after we return from pt_update_irq()?
That would be done how?
> Or, just like I suspected that
> it is caused by pt_update_irq() sets 0x30 but wrongly returns 0x38.
Same here, and as expressed earlier: I'm lacking a plausible theory
on how this could be happening. In particular ...
> Do you think it worths a try to disable guest's LAPIC timer and
> force it use IRQ0 along with changing RTE very frequently?
... if this is the LAPIC timer, then the RTE isn't being read afaics
(pt_irq_vector() should be taking its very first return path in that
case). Nor am I aware that any Linux version would move around
one of its timer interrupts very frequently. But then again 0x30
or 0x38 wouldn't be used for the LAPIC timer anyway, but rather
a vector in the fixed range (0xEF on 4.10). So I think part of the
problem is to understand which timer's vector we're dealing with
here.
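
The distinction drawn above, between a timer whose vector is taken
directly and one whose vector depends on the RTE, can be illustrated with
a small Python model. This is only a sketch and not Xen's code: the class,
the constants and the function names are all made up for illustration, the
PIC delivery path is ignored, and the mapping of ISA IRQ0 to IO-APIC pin 2
is an assumption (the usual PC wiring).

```python
# Simplified model, NOT Xen's code: every name here is illustrative.
from dataclasses import dataclass

SOURCE_LAPIC = "lapic"  # timer injected directly as a LAPIC vector
SOURCE_ISA = "isa"      # timer raised as an ISA IRQ, routed via the IO-APIC


@dataclass
class PeriodicTimer:
    source: str
    irq: int  # a vector for SOURCE_LAPIC, an ISA IRQ number for SOURCE_ISA


def isa_irq_to_gsi(isa_irq):
    # Assumption: ISA IRQ0 is wired to IO-APIC pin 2, others map 1:1.
    return 2 if isa_irq == 0 else isa_irq


def timer_vector(timer, rte_vectors):
    """Return the vector a timer interrupt would be delivered on."""
    if timer.source == SOURCE_LAPIC:
        # First return path: the redirection table is never consulted.
        return timer.irq
    # ISA-sourced timer: the vector comes from the current RTE contents.
    return rte_vectors[isa_irq_to_gsi(timer.irq)]


rte_vectors = {2: 0x38}                       # hypothetical RTE 2 contents
lapic_timer = PeriodicTimer(SOURCE_LAPIC, 0xEF)
pit_timer = PeriodicTimer(SOURCE_ISA, 0)

before = (timer_vector(lapic_timer, rte_vectors),
          timer_vector(pit_timer, rte_vectors))
rte_vectors[2] = 0x30                         # guest rewrites RTE 2
after = (timer_vector(lapic_timer, rte_vectors),
         timer_vector(pit_timer, rte_vectors))
print([hex(v) for v in before], [hex(v) for v in after])
```

In this model only the ISA-sourced timer changes vector when the RTE is
rewritten, which is why knowing which timer is involved matters.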
Jan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-07 4:24 ` Chao Gao
2017-03-07 14:11 ` Jan Beulich
@ 2017-03-08 3:16 ` Xuquan (Quan Xu)
1 sibling, 0 replies; 20+ messages in thread
From: Xuquan (Quan Xu) @ 2017-03-08 3:16 UTC
To: Chao Gao, Jan Beulich; +Cc: xen-devel, Kevin Tian, osstest-admin, Andrew Cooper
On March 07, 2017 12:24 PM, Chao Gao wrote:
>On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>>> On 07.03.17 at 06:52, <osstest-admin@xenproject.org> wrote:
>>> flight 106504 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/106504/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking, including tests which
>>> could not be run:
>>> [...]
>>> test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs.
>>> 106482
>>
>>Here we go:
>>
>>(XEN) d15v0: intack: 02:48 pt: 38
>>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>(XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>>(XEN) CPU: 0
>>(XEN) RIP: e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
>>(XEN) rax: ffff82d0804754a8 rbx: ffff83007f375680 rcx: 0000000000000000
>>(XEN) rdx: ffff83007cd3ffff rsi: 000000000000000a rdi: ffff82d0803316d8
>>(XEN) rbp: ffff83007cd3ff08 rsp: ffff83007cd3fea8 r8: ffff830277db8000
>>(XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
>>(XEN) r12: 00000000ffffffff r13: ffff82d0802b5b02 r14: ffff82d0802b5b02
>>(XEN) r15: ffff83027d82e000 cr0: 0000000080050033 cr4: 00000000001526e0
>>(XEN) cr3: 0000000259135000 cr2: 000000000164f034
>>(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>>(XEN) fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
>>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>>(XEN) ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>>(XEN) ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>>(XEN) ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>>(XEN) ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>>(XEN) 0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>>(XEN) 0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>>(XEN) 000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>>(XEN) ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>>(XEN) 0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>>(XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
>>(XEN) ffff83007f375000 0000000000000000 00000000001526e0
>>(XEN) Xen call trace:
>>(XEN) [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN) [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>>(XEN)
>>(XEN)
>>(XEN) ****************************************
>>(XEN) Panic on CPU 0:
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ****************************************
>>
>>I didn't make an attempt at interpreting this yet, but I wonder if it
>>is more than coincidence that - just like the first time the ASSERT()
>>triggered - this is again a guest-stop of a qemuu-debianhvm.
>>
>
>Cc: xuquan.
Also Cc: Andrew, who is really a debugging expert :)
>
>Exciting! I have been monitoring osstest for about one months through a
>python script. But I always crawl the flights one time a day.
>
>From the output, the pt_vector is 0x38 and the intack.vector is 0x30. these
>two values are same with they were in the first time.
>And only one bit 0x30 is set in vIRR. PIR is NULL. So maybe our suspicion that
>PIR is not synced to vIRR is wrong. The 0x38 bit is not present in vIRR is
>strange. Is it possible that we clear the 0x38 bit just after we return from
>pt_update_irq()? Or, just like I suspected that it is caused by pt_update_irq()
>sets 0x30 but wrongly returns 0x38.
>Do you think it worths a try to disable guest's LAPIC timer and force it use
>IRQ0 along with changing RTE very frequently?
>If yes, I am glad to do it.
>
I can't find a reasonable explanation for this regression. However, I found that self-IPI virtualization also sets 'VIRR[vector]' to 1.
It might be a corner case with some particular guest, such as the Debian one mentioned above.
Quan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-07 14:11 ` Jan Beulich
@ 2017-03-22 4:53 ` Chao Gao
2017-03-22 12:47 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: Chao Gao @ 2017-03-22 4:53 UTC
To: Jan Beulich; +Cc: xen-devel, Xuquan, Andrew Cooper, Kevin Tian, osstest-admin
[-- Attachment #1: Type: text/plain, Size: 9282 bytes --]
On Tue, Mar 07, 2017 at 07:11:22AM -0700, Jan Beulich wrote:
>>>> On 07.03.17 at 05:24, <chao.gao@intel.com> wrote:
>> On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>>>> On 07.03.17 at 06:52, <osstest-admin@xenproject.org> wrote:
>>>> flight 106504 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/106504/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> [...]
>>>> test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs.
>>>> 106482
>>>
>>>Here we go:
>>>
>>>(XEN) d15v0: intack: 02:48 pt: 38
>>>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>>(XEN) PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>>(XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>>>(XEN) CPU: 0
>>>(XEN) RIP: e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>>(XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d15v0)
>>>(XEN) rax: ffff82d0804754a8 rbx: ffff83007f375680 rcx: 0000000000000000
>>>(XEN) rdx: ffff83007cd3ffff rsi: 000000000000000a rdi: ffff82d0803316d8
>>>(XEN) rbp: ffff83007cd3ff08 rsp: ffff83007cd3fea8 r8: ffff830277db8000
>>>(XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
>>>(XEN) r12: 00000000ffffffff r13: ffff82d0802b5b02 r14: ffff82d0802b5b02
>>>(XEN) r15: ffff83027d82e000 cr0: 0000000080050033 cr4: 00000000001526e0
>>>(XEN) cr3: 0000000259135000 cr2: 000000000164f034
>>>(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>>>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>>>(XEN) fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
>>>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>>>(XEN) ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>>>(XEN) ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>>>(XEN) ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>>>(XEN) ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>>>(XEN) 0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>>>(XEN) 0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>>>(XEN) 000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>>>(XEN) ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>>>(XEN) 0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>>>(XEN) 000000000000beef 000000000000beef 000000000000beef 0000000000000000
>>>(XEN) ffff83007f375000 0000000000000000 00000000001526e0
>>>(XEN) Xen call trace:
>>>(XEN) [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>>(XEN) [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>>>(XEN)
>>>(XEN)
>>>(XEN) ****************************************
>>>(XEN) Panic on CPU 0:
>>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>>(XEN) ****************************************
>>>
>>>I didn't make an attempt at interpreting this yet, but I wonder if it
>>>is more than coincidence that - just like the first time the ASSERT()
>>>triggered - this is again a guest-stop of a qemuu-debianhvm.
>>>
>>
>> Cc: xuquan.
>>
>> Exciting! I have been monitoring osstest for about one months through
>> a python script. But I always crawl the flights one time a day.
>>
>> From the output, the pt_vector is 0x38 and the intack.vector is
>> 0x30. these two values are same with they were in the first time.
>> And only one bit 0x30 is set in vIRR. PIR is NULL. So maybe
>> our suspicion that PIR is not synced to vIRR is wrong. The 0x38 bit
>> is not present in vIRR is strange. Is it possible that we clear the 0x38 bit
>> just after we return from pt_update_irq()?
>
>That would be done how?
>
>> Or, just like I suspected that
>> it is caused by pt_update_irq() sets 0x30 but wrongly returns 0x38.
>
>Same here, and as expressed earlier: I'm lacking a plausible theory
>on how this could be happening. In particular ...
>
>> Do you think it worths a try to disable guest's LAPIC timer and
>> force it use IRQ0 along with changing RTE very frequently?
>
>... if this is the LAPIC timer, then the RTE isn't being read afaics
>(pt_irq_vector() should be taking its very first return path in that
>case). Nor am I aware that any Linux version would move around
>one of its timer interrupts very frequently. But then again 0x30
>or 0x38 wouldn't be use for the LAPIC timer anyway, but rather
>a vector in the fixed range (0xEF on 4.10). So I think part of the
>problem is to understand which timer's vector we're dealing with
>here.
>
I have written an XTF test case (much of the code is taken from hvmloader)
that triggers this assertion. The test case is attached, and its output is at
the bottom of this mail. The test programs PIT channel 0 to generate a
periodic timer interrupt at 1000 Hz. The timer interrupt is delivered to
vCPU0, while vCPU1 keeps rewriting the vector in IOAPIC RTE 2.
So the assertion can be triggered by a guest. To fix the assertion failure,
I propose removing the assertion, for the reasons below:
1. The operations in this test case are very intrusive and abnormal: the RTE
is updated frequently without disabling the interrupt source. In this case,
I think software can't assume the hardware works correctly.
2. If we remove this assertion (i.e. we accept that, in rare cases, pt_vector
may differ from (or be bigger than) the vector we set in vIRR), the side
effect is that we won't decrement the counter pt->pending_intr_nr in
pt_intr_post(), so one extra timer interrupt is injected into the guest.
3. We read the RTE three times: first when we set vIRR, second when
pt_update_irq() returns, and third in pt_intr_post(). If the guest changes
the vector in the RTE during this window, it can likewise lose periodic
timer interrupts or get extra ones.
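
As a quick sanity check on the numbers the test programs, here is a small
sketch that mirrors the arithmetic in init_pit() and write_IOAPIC_entry() from
the attached main.c. The helper names pit_reload(), rte_lo_reg() and
rte_hi_reg() are made up for illustration and are not part of the test:

```c
#include <assert.h>
#include <stdint.h>

#define COUNTER_FREQ 1193181u /* PIT input clock in Hz, same value as in main.c */

/* Reload value written to PIT channel 0 for the requested interrupt rate. */
static uint16_t pit_reload(unsigned int freq_hz)
{
    unsigned int reload;

    assert(freq_hz != 0);
    reload = COUNTER_FREQ / freq_hz;   /* integer division, as in init_pit() */
    assert(reload <= 0xffff);          /* must fit the 16-bit counter */
    return (uint16_t)reload;
}

/* IOAPIC register index holding the low 32 bits of the RTE for a pin. */
static uint32_t rte_lo_reg(unsigned int pin)
{
    return 0x10 + 2 * pin;
}

/* IOAPIC register index holding the high 32 bits of the RTE for a pin. */
static uint32_t rte_hi_reg(unsigned int pin)
{
    return 0x11 + 2 * pin;
}
```

For the values used in the test, pit_reload(1000) gives 1193, and pin 2 maps
to register indexes 0x14 (low half, containing the vector) and 0x15 (high
half, containing the destination).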
(d1) [ 1409.741660] --- Xen Test Framework ---
(d1) [ 1409.741869] Environment: HVM 32bit (No paging)
(d1) [ 1409.741964] Test periodic-timer
(d1) [ 1409.742077] activate cpu1
(XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38
(XEN) [ 1423.581234] vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
(XEN) [ 1423.581246] PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) [ 1423.581286] Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) [ 1423.581294] ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
(XEN) [ 1423.581370] CPU: 58
(XEN) [ 1423.581375] RIP: e008:[<ffff82d0801fe405>] vmx_intr_assist+0x605/0x625
(XEN) [ 1423.581389] RFLAGS: 0000000000010296 CONTEXT: hypervisor (d1v0)
(XEN) [ 1423.581398] rax: ffff830837e0402c rbx: ffff83006a093680 rcx: 0000000000000000
(XEN) [ 1423.581404] rdx: ffff831075e17fff rsi: 000000000000000a rdi: ffff82d08032f6b8
(XEN) [ 1423.581410] rbp: ffff831075e17f08 rsp: ffff831075e17e98 r8: ffff83083e000000
(XEN) [ 1423.581416] r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000001
(XEN) [ 1423.581422] r12: 00000000ffffffff r13: ffff82d0802a4c31 r14: ffff82c000408200
(XEN) [ 1423.581427] r15: 0000000000004016 cr0: 000000008005003b cr4: 00000000003526e0
(XEN) [ 1423.581432] cr3: 000000081e2bf000 cr2: 0000000000000000
(XEN) [ 1423.581437] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) [ 1423.581446] Xen code around <ffff82d0801fe405> (vmx_intr_assist+0x605/0x625):
(XEN) [ 1423.581450] fb ff ff e9 5e fc ff ff <0f> 0b 89 ce 48 89 df e8 03 21 00 00 e9 62 fe ff
(XEN) [ 1423.581470] Xen stack trace from rsp=ffff831075e17e98:
(XEN) [ 1423.581473] ffff831075e17f08 ffff82d08034c700 ffff82d000000030 ffffffffffffffff
(XEN) [ 1423.581483] ffff831075e17fff 0000000000000000 ffff831075e17ef8 ffff82d0801340ff
(XEN) [ 1423.581491] ffff83006a093000 ffff83006a093000 ffff83006a093000 ffff830837e04148
(XEN) [ 1423.581500] 0000014b740caab6 0000000001c9c380 ffff831075e17e28 ffff82d08020da51
(XEN) [ 1423.581509] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 1423.581515] 0000000000000000 00000000fee00000 0000000000000000 0000000000000000
(XEN) [ 1423.581522] 0000000000000000 0000000000000000 0000000000000004 000000000010260d
(XEN) [ 1423.581529] 0000000000000001 0000000000000000 0000000000000000 0000beef0000beef
(XEN) [ 1423.581536] 0000000000102928 000000bf0000beef 0000000000000206 0000000000115fa0
(XEN) [ 1423.581544] 000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN) [ 1423.581551] 000000000000beef 000000000000003a ffff83006a093000 00000037b7a91900
(XEN) [ 1423.581559] 00000000003526e0
(XEN) [ 1423.581564] Xen call trace:
(XEN) [ 1423.581570] [<ffff82d0801fe405>] vmx_intr_assist+0x605/0x625
(XEN) [ 1423.581580] [<ffff82d08020da51>] vmx_asm_vmexit_handler+0x41/0x120
(XEN) [ 1423.581584]
(XEN) [ 1423.827761]
(XEN) [ 1423.829753] ****************************************
(XEN) [ 1423.835210] Panic on CPU 58:
(XEN) [ 1423.838591] Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) [ 1423.845698] ****************************************
[-- Attachment #2: periodic-timer-test --]
[-- Type: text/plain, Size: 8522 bytes --]
diff --git a/tests/periodic-timer/Makefile b/tests/periodic-timer/Makefile
new file mode 100644
index 0000000..56c42ea
--- /dev/null
+++ b/tests/periodic-timer/Makefile
@@ -0,0 +1,11 @@
+include $(ROOT)/build/common.mk
+
+NAME := periodic-timer
+CATEGORY := special
+TEST-ENVS := hvm32
+
+TEST-EXTRA-CFG := extra.cfg.in
+
+obj-perenv += main.o entry.o
+
+include $(ROOT)/build/gen.mk
diff --git a/tests/periodic-timer/entry.S b/tests/periodic-timer/entry.S
new file mode 100644
index 0000000..8a32f76
--- /dev/null
+++ b/tests/periodic-timer/entry.S
@@ -0,0 +1,15 @@
+#include <arch/idt.h>
+#include <arch/page.h>
+#include <arch/processor.h>
+#include <arch/segment.h>
+#include <xtf/asm_macros.h>
+#include <arch/msr-index.h>
+
+ .align 16
+ .code32
+
+ENTRY(handle_external_int)
+ SAVE_ALL
+ call pt_interrupt_handler
+ RESTORE_ALL
+ iret
diff --git a/tests/periodic-timer/extra.cfg.in b/tests/periodic-timer/extra.cfg.in
new file mode 100644
index 0000000..8cfbab9
--- /dev/null
+++ b/tests/periodic-timer/extra.cfg.in
@@ -0,0 +1 @@
+vcpus=2
diff --git a/tests/periodic-timer/main.c b/tests/periodic-timer/main.c
new file mode 100644
index 0000000..0098b89
--- /dev/null
+++ b/tests/periodic-timer/main.c
@@ -0,0 +1,283 @@
+/**
+ * @file tests/periodic-timer/main.c
+ * @ref test-periodic-timer
+ *
+ * @page test-periodic-timer periodic-timer
+ *
+ * @todo Docs for test-periodic-timer
+ *
+ * @see tests/periodic-timer/main.c
+ */
+#include <xtf.h>
+#include <arch/barrier.h>
+#include <arch/idt.h>
+#include <xtf/asm_macros.h>
+#include <arch/msr-index.h>
+
+#define COUNTER_FREQ 1193181
+#define MAX_PIT_HZ COUNTER_FREQ
+#define MIN_PIT_HZ 18
+#define PIT_CTRL_PORT 0x43
+#define PIT_CHANNEL0 0x40
+
+#define IOAPIC_REGSEL ((uint32_t *)0xfec00000)
+#define IOAPIC_IOWIN ((uint32_t *)0xfec00010)
+
+#define AP_START_EIP 0x1000UL
+extern char ap_boot_start[], ap_boot_end[];
+
+asm (
+ " .text \n"
+ " .code16 \n"
+ "ap_boot_start: .code16 \n"
+ " mov %cs,%ax \n"
+ " mov %ax,%ds \n"
+ " lgdt gdt_desr-ap_boot_start\n"
+ " xor %ax, %ax \n"
+ " inc %ax \n"
+ " lmsw %ax \n"
+ " ljmpl $0x08,$1f \n"
+ "gdt_desr: \n"
+ " .word gdt_end - gdt - 1 \n"
+ " .long gdt \n"
+ "ap_boot_end: .code32 \n"
+ "1: mov $0x10,%eax \n"
+ " mov %eax,%ds \n"
+ " mov %eax,%es \n"
+ " mov %eax,%ss \n"
+ " movl $stack_top,%esp \n"
+ " movl %esp,%ebp \n"
+ " call test_ap_main \n"
+ "1: hlt \n"
+ " jmp 1b \n"
+ " \n"
+ " .align 8 \n"
+ "gdt: \n"
+ " .quad 0x0000000000000000 \n"
+ " .quad 0x00cf9a000000ffff \n" /* 0x08: Flat code segment */
+ " .quad 0x00cf92000000ffff \n" /* 0x10: Flat data segment */
+ "gdt_end: \n"
+ " \n"
+ " .bss \n"
+ " .align 8 \n"
+ "stack: \n"
+ " .skip 0x4000 \n"
+ "stack_top: \n"
+ " .text \n"
+ );
+
+const char test_title[] = "Test periodic-timer";
+
+int init_pit(int freq)
+{
+ uint16_t reload;
+
+ if ( (freq < MIN_PIT_HZ) || (freq > MAX_PIT_HZ) )
+ return -1;
+
+ reload = COUNTER_FREQ / freq;
+
+ asm volatile("cli");
+ outb(0x34, PIT_CTRL_PORT);
+ outb(reload & 0xff, PIT_CHANNEL0);
+ outb(reload >> 8, PIT_CHANNEL0);
+ asm volatile("sti");
+ return 0;
+}
+
+struct ioapic_entry {
+ union {
+ struct {
+ uint32_t vector : 8,
+ dlm : 3,
+ dm : 1,
+ dls : 1,
+ pol : 1,
+ irr : 1,
+ tri : 1,
+ mask : 1,
+ rsvd1 : 15;
+ uint32_t rsvd2 : 24,
+ dest : 8;
+ } fields;
+ struct {
+ uint32_t lo;
+ uint32_t hi;
+ } bits;
+ };
+} __attribute__ ((packed));
+
+void writel(uint32_t data, uint32_t *addr)
+{
+ *addr = data;
+}
+
+#define readl(data, addr) (data) = *(addr)
+
+int write_IOAPIC_entry(struct ioapic_entry *ent, int pin)
+{
+ asm volatile("cli");
+ writel(0x11 + 2*pin, IOAPIC_REGSEL);
+ writel(ent->bits.hi, IOAPIC_IOWIN);
+ wmb();
+ writel(0x10 + 2*pin, IOAPIC_REGSEL);
+ writel(ent->bits.lo, IOAPIC_IOWIN);
+ wmb();
+ asm volatile("sti");
+ return 0;
+}
+
+void handle_external_int(void);
+
+#define rdmsr(msr, val1, val2) \
+ __asm__ __volatile__("rdmsr" \
+ : "=a" (val1), "=d" (val2) \
+ : "c" (msr))
+
+#define wrmsr(msr, val1, val2) \
+ __asm__ __volatile__("wrmsr" \
+ : \
+ : "c" (msr), "a" (val1), "d" (val2))
+
+static inline void wrmsrl(unsigned int msr, uint64_t val)
+{
+ uint32_t lo, hi;
+ lo = (uint32_t)val;
+ hi = (uint32_t)(val >> 32);
+ wrmsr(msr, lo, hi);
+}
+
+#define APIC_BASE_ADDR_MASK 0xfffff000
+#define APIC_BASE_ADDR(a) (a & APIC_BASE_ADDR_MASK)
+#define APIC_BASE_MSR 0x1b
+#define APIC_GLOBAL_ENABLE_MASK 0x800
+#define APIC_EOI 0xB0
+#define APIC_SVR 0xF0
+#define APIC_SOFT_ENABLE_MASK 0x100
+
+uint32_t apic_base_addr;
+
+void enable_lapic(void)
+{
+ uint32_t lo, hi;
+ uint64_t apic_base_msr;
+ uint32_t svr;
+ rdmsr(APIC_BASE_MSR, lo, hi);
+ apic_base_msr = lo | ((uint64_t) hi <<32);
+ apic_base_addr = APIC_BASE_ADDR(apic_base_msr);
+ wrmsrl(APIC_BASE_MSR, apic_base_msr | APIC_GLOBAL_ENABLE_MASK);
+ readl(svr, (uint32_t *)(apic_base_addr + APIC_SVR));
+ writel(svr | APIC_SOFT_ENABLE_MASK, (uint32_t *)(apic_base_addr + APIC_SVR));
+}
+
+void ack_APIC_irq(unsigned long apic_base)
+{
+ writel(0, (uint32_t *)(apic_base + APIC_EOI));
+}
+
+uint32_t lapic_read(uint32_t reg)
+{
+ return *(volatile uint32_t *)(apic_base_addr + reg);
+}
+
+void lapic_write(uint32_t reg, uint32_t val)
+{
+ *(volatile uint32_t *)(apic_base_addr + reg) = val;
+}
+
+#define APIC_ICR 0x300
+#define APIC_ICR2 0x310
+#define APIC_ICR_BUSY 0x01000
+#define APIC_DM_INIT 0x500
+#define APIC_DM_STARTUP 0x600
+
+static inline void cpu_relax(void)
+{
+ asm volatile ( "rep; nop" ::: "memory" );
+}
+
+static void lapic_wait_ready(void)
+{
+ while ( lapic_read(APIC_ICR) & APIC_ICR_BUSY )
+ cpu_relax();
+}
+
+void pt_interrupt_handler(void)
+{
+ ack_APIC_irq(apic_base_addr);
+}
+
+static void boot_cpu(int cpu)
+{
+ unsigned int icr2 = (cpu * 2) << 24;
+ lapic_wait_ready();
+ lapic_write(APIC_ICR2, icr2);
+ lapic_write(APIC_ICR, APIC_DM_INIT);
+ lapic_wait_ready();
+ lapic_write(APIC_ICR2, icr2);
+ lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_START_EIP >> 12));
+ lapic_wait_ready();
+ lapic_write(APIC_ICR2, icr2);
+ lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_START_EIP >> 12));
+ lapic_wait_ready();
+}
+
+void smp_initialize(void)
+{
+ memcpy((void*)AP_START_EIP, ap_boot_start,
+ ap_boot_end - ap_boot_start);
+ boot_cpu(1);
+}
+
+struct ioapic_entry ent;
+void test_main(void)
+{
+ struct xtf_idte idte =
+ {
+ .addr = (unsigned long)handle_external_int,
+ .cs = __KERN_CS,
+ .dpl = 0,
+ };
+
+ /* setup idt entry */
+ xtf_set_idte(0x30, &idte);
+ xtf_set_idte(0x38, &idte);
+
+ //asm volatile(".byte 0xcd,0x30\n");
+
+ memset(&ent, 0, sizeof(ent));
+ ent.fields.vector = 0x38;
+ write_IOAPIC_entry(&ent, 2);
+ enable_lapic();
+
+ printk("activate cpu1\n");
+ smp_initialize();
+ init_pit(1000);
+
+ while (1)
+ cpu_relax();
+
+ xtf_success(NULL);
+}
+
+void test_ap_main(void)
+{
+ struct ioapic_entry ent2;
+ memcpy(&ent2, &ent, sizeof(ent2));
+ ent2.fields.vector = 0x30;
+ while (1)
+ {
+ write_IOAPIC_entry(&ent2, 2);
+ write_IOAPIC_entry(&ent, 2);
+ }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 12:47 ` Jan Beulich
@ 2017-03-22 6:13 ` Chao Gao
2017-03-22 13:40 ` Jan Beulich
2017-03-29 3:28 ` Xuquan (Quan Xu)
2017-03-24 7:48 ` Tian, Kevin
2017-04-04 23:57 ` Chao Gao
2 siblings, 2 replies; 20+ messages in thread
From: Chao Gao @ 2017-03-22 6:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>> On 22.03.17 at 05:53, <chao.gao@intel.com> wrote:
>> I have written a xtf test case (many codes are from hvmloader) to
>> trigger this assertion. The test case is in attachments.
>
>Thanks for doing this.
>
>> Bottom is the output
>> of this test. This test initializes PIT channel0 to generate periodic timer
>> interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. And
>> vCPU1 is used to change IOAPIC RTE 2 frequently.
>
>Well, this is certainly helpful (due to some of the conclusions you
>draw below), but it is very likely not what has caused the assertion
>to trigger in osstest. So by removing the assertion (as you suggest
>below) we then will have a silent, non-understood misbehavior.
Agree.
>
>> The assertion can be triggered by guest. To fix assertion failure,
>> I propose to remove this assertion for the reason below:
>
>Of course I agree that a guest triggerable assertion is bad, and
>hence needs a correction somewhere.
>
>> 1. Operations in this test case are very intrusive and abnormal. It updates
>> RTE frequently without disabling interrupt source. In this case, I think
>> software can't assume hardware works correctly.
>
>I guess hardware behavior simply is unspecified in such a case, so
>it's hard to judge whether it works "correctly".
agree.
>
>> 2. If we remove this assertion(means we admit pt_vector may be different
>> from (or bigger than) the vector we set in vIRR in a rare case), the side
>> effect is that we won't decrease the counter pt->ending_intr_nr in
>> pt_intr_post() and one more timer interrupt in number is injected to guest.
>
>Which is clearly wrong, afaict, as that may drive the guest clock
>off (depending on how the guest OS does its accounting).
Yes.
>
>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
>> the vector in RTE during the window, it will also incur losing or getting
>> more periodic timer interrupt.
>
>Which raises the question whether latching the value read the first
>time would address the issue you demonstrate with the test case.
>Or alternatively deferring writes to take effect only once readers
>are done with their perhaps multiple accesses?
I think your solution is better.
>
>Can you get in touch with your chipset folks to find out whether
>hardware has cases where multiple reads occur during the
>processing of a single event?
Yes, I will get back to you once I find out how they handle similar cases.
>
>> (d1) [ 1409.741660] --- Xen Test Framework ---
>> (d1) [ 1409.741869] Environment: HVM 32bit (No paging)
>> (d1) [ 1409.741964] Test periodic-timer
>> (d1) [ 1409.742077] activate cpu1
>> (XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38
>
>I keep getting confused by my own mistake of getting the format
>string wrong here (the above should be intack: 2:30 pt: 38). I.e.
>I was about to complain that there's no use vector 48 in your
>test code, when I remembered that it's being wrongly printed in
>decimal.
Sorry for my fault.
>
>Jan
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 4:53 ` Chao Gao
@ 2017-03-22 12:47 ` Jan Beulich
2017-03-22 6:13 ` Chao Gao
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jan Beulich @ 2017-03-22 12:47 UTC (permalink / raw)
To: Chao Gao; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
>>> On 22.03.17 at 05:53, <chao.gao@intel.com> wrote:
> I have written a xtf test case (many codes are from hvmloader) to
> trigger this assertion. The test case is in attachments.
Thanks for doing this.
> Bottom is the output
> of this test. This test initializes PIT channel0 to generate periodic timer
> interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. And
> vCPU1 is used to change IOAPIC RTE 2 frequently.
Well, this is certainly helpful (due to some of the conclusions you
draw below), but it is very likely not what has caused the assertion
to trigger in osstest. So by removing the assertion (as you suggest
below) we then will have a silent, non-understood misbehavior.
> The assertion can be triggered by guest. To fix assertion failure,
> I propose to remove this assertion for the reason below:
Of course I agree that a guest triggerable assertion is bad, and
hence needs a correction somewhere.
> 1. Operations in this test case are very intrusive and abnormal. It updates
> RTE frequently without disabling interrupt source. In this case, I think
> software can't assume hardware works correctly.
I guess hardware behavior simply is unspecified in such a case, so
it's hard to judge whether it works "correctly".
> 2. If we remove this assertion(means we admit pt_vector may be different
> from (or bigger than) the vector we set in vIRR in a rare case), the side
> effect is that we won't decrease the counter pt->ending_intr_nr in
> pt_intr_post() and one more timer interrupt in number is injected to guest.
Which is clearly wrong, afaict, as that may drive the guest clock
off (depending on how the guest OS does its accounting).
> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
> the vector in RTE during the window, it will also incur losing or getting
> more periodic timer interrupt.
Which raises the question whether latching the value read the first
time would address the issue you demonstrate with the test case.
Or alternatively deferring writes to take effect only once readers
are done with their perhaps multiple accesses?
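
To make the latching idea concrete, here is a toy model only, not Xen code.
The struct and function names are hypothetical, and the "guest" is simulated
deterministically by swapping the vector after every read, which stands in for
vCPU1 rewriting the RTE in the test case:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest-writable vector field of an RTE. */
struct fake_rte {
    uint8_t cur;   /* vector currently in the RTE */
    uint8_t next;  /* vector the simulated guest writes next */
};

/* Read the vector, then let the simulated guest swap in the other one. */
static uint8_t read_vector_racy(struct fake_rte *rte)
{
    uint8_t v;

    assert(rte);
    v = rte->cur;
    rte->cur = rte->next;
    rte->next = v;
    return v;
}

/* Vector as seen by each of the three stages named in the thread. */
struct delivery {
    uint8_t injected;  /* set in vIRR */
    uint8_t reported;  /* returned by pt_update_irq() */
    uint8_t posted;    /* looked up again in pt_intr_post() */
};

/* Re-reading pattern: every stage goes back to the RTE. */
static struct delivery deliver_rereading(struct fake_rte *rte)
{
    struct delivery d;

    d.injected = read_vector_racy(rte);
    d.reported = read_vector_racy(rte);
    d.posted   = read_vector_racy(rte);
    return d;
}

/* Latching pattern: read once, reuse the value for all stages. */
static struct delivery deliver_latched(struct fake_rte *rte)
{
    uint8_t v = read_vector_racy(rte);
    struct delivery d = { v, v, v };

    return d;
}
```

Starting from cur = 0x30 and next = 0x38, the re-reading variant ends up with
0x30 injected but 0x38 reported, which is the same mismatch the assertion
caught, while the latched variant uses 0x30 throughout.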
Can you get in touch with your chipset folks to find out whether
hardware has cases where multiple reads occur during the
processing of a single event?
> (d1) [ 1409.741660] --- Xen Test Framework ---
> (d1) [ 1409.741869] Environment: HVM 32bit (No paging)
> (d1) [ 1409.741964] Test periodic-timer
> (d1) [ 1409.742077] activate cpu1
> (XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38
I keep getting confused by my own mistake of getting the format
string wrong here (the above should be intack: 2:30 pt: 38). I.e.
I was about to complain that there's no use of vector 48 in your
test code, when I remembered that it's being wrongly printed in
decimal.
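
To illustrate the confusion, here is a standalone sketch. The two format
strings below are guesses chosen to reproduce the logged and the intended
output; they are not copied from the actual debugging patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Vector printed in decimal, reproducing the line as it appears in the log. */
static int fmt_as_logged(char *buf, size_t len, unsigned int source,
                         unsigned int vector, unsigned int pt_vector)
{
    assert(buf);
    return snprintf(buf, len, "intack: %02x:%02d pt: %02x",
                    source, vector, pt_vector);
}

/* Everything in hex, which is what was intended. */
static int fmt_all_hex(char *buf, size_t len, unsigned int source,
                       unsigned int vector, unsigned int pt_vector)
{
    assert(buf);
    return snprintf(buf, len, "intack: %x:%02x pt: %02x",
                    source, vector, pt_vector);
}
```

With source 2, vector 0x30 and pt_vector 0x38, the first helper produces
"intack: 02:48 pt: 38" and the second "intack: 2:30 pt: 38".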
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 6:13 ` Chao Gao
@ 2017-03-22 13:40 ` Jan Beulich
2017-03-29 3:28 ` Xuquan (Quan Xu)
1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2017-03-22 13:40 UTC (permalink / raw)
To: Chao Gao; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
>>> On 22.03.17 at 07:13, <chao.gao@intel.com> wrote:
> On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>>> On 22.03.17 at 05:53, <chao.gao@intel.com> wrote:
>>> (d1) [ 1409.741660] --- Xen Test Framework ---
>>> (d1) [ 1409.741869] Environment: HVM 32bit (No paging)
>>> (d1) [ 1409.741964] Test periodic-timer
>>> (d1) [ 1409.742077] activate cpu1
>>> (XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38
>>
>>I keep getting confused by my own mistake of getting the format
>>string wrong here (the above should be intack: 2:30 pt: 38). I.e.
>>I was about to complain that there's no use vector 48 in your
>>test code, when I remembered that it's being wrongly printed in
>>decimal.
>
> Sorry for my fault.
Hmm? It was pretty obviously me who screwed it up when editing
the patch while committing.
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 12:47 ` Jan Beulich
2017-03-22 6:13 ` Chao Gao
@ 2017-03-24 7:48 ` Tian, Kevin
2017-03-24 8:17 ` Jan Beulich
2017-04-04 23:57 ` Chao Gao
2 siblings, 1 reply; 20+ messages in thread
From: Tian, Kevin @ 2017-03-24 7:48 UTC (permalink / raw)
To: Jan Beulich, Gao, Chao; +Cc: Andrew Cooper, Xuquan, osstest-admin, xen-devel
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, March 22, 2017 8:48 PM
>
> > 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
> > pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
> > changes the vector in RTE during the window, it will also incur losing
> > or getting more periodic timer interrupt.
>
> Which raises the question whether latching the value read the first time
> would address the issue you demonstrate with the test case.
> Or alternatively deferring writes to take effect only once readers are done
> with their perhaps multiple accesses?
>
> Can you get in touch with your chipset folks to find out whether hardware
> has cases where multiple reads occur during the processing of a single event?
>
There is a similar case. For a level-triggered interrupt, there is a "remote IRR"
bit in the RTE which is set to 1 when the LAPIC accepts the level interrupt sent
by the IOAPIC. It is cleared later by the EOI broadcast from the LAPIC, based on
matching interrupt vectors. If software happens to change the vector of
the said RTE in between, the "remote IRR" bit will never be cleared (the IOAPIC
now expects an EOI with the new vector, while the actual EOI for the previous
injection carries the old vector).
Of course in our case the pt timer is an edge-triggered interrupt, which shouldn't
hit such a multi-read issue on real hardware. But anyway it's not good behavior
to change the RTE vector without stopping the interrupt source first...
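
The remote IRR behavior described above can be modelled in a few lines. This
is a simplified sketch with made-up names (struct model_rte, rte_accept(),
rte_eoi_broadcast()), not real IOAPIC or Xen code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the RTE fields that matter here. */
struct model_rte {
    uint8_t vector;
    bool level;       /* true for level-triggered, false for edge */
    bool remote_irr;  /* read-only to software in the real device */
};

/* The LAPIC has accepted the interrupt sent for this RTE. */
static void rte_accept(struct model_rte *rte)
{
    assert(rte);
    if (rte->level)
        rte->remote_irr = true;
}

/* EOI broadcast from the LAPIC, carrying the vector that was serviced. */
static void rte_eoi_broadcast(struct model_rte *rte, uint8_t eoi_vector)
{
    assert(rte);
    /* Remote IRR is cleared only if the EOI vector matches the RTE's vector. */
    if (rte->level && rte->vector == eoi_vector)
        rte->remote_irr = false;
}
```

If the vector is changed from 0x38 to 0x30 between rte_accept() and
rte_eoi_broadcast(rte, 0x38), remote_irr stays set; without the vector change
the same EOI clears it.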
Thanks
Kevin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-24 7:48 ` Tian, Kevin
@ 2017-03-24 8:17 ` Jan Beulich
2017-03-24 8:25 ` Tian, Kevin
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jan Beulich @ 2017-03-24 8:17 UTC (permalink / raw)
To: Kevin Tian; +Cc: Andrew Cooper, Xuquan, osstest-admin, xen-devel, Chao Gao
>>> On 24.03.17 at 08:48, <kevin.tian@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, March 22, 2017 8:48 PM
>>
>> > 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>> > pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
>> > changes the vector in RTE during the window, it will also incur losing
>> > or getting more periodic timer interrupt.
>>
>> Which raises the question whether latching the value read the first time
>> would address the issue you demonstrate with the test case.
>> Or alternatively deferring writes to take effect only once readers are done
>> with their perhaps multiple accesses?
>>
>> Can you get in touch with your chipset folks to find out whether hardware
>> has cases where multiple reads occur during the processing of a single event?
>>
>
> There is a similar case. For level-triggered interrupt, there is a "remote IRR"
> bit in RTE which is set to 1 when LAPIC accepts the level interrupt sent by
> IOAPIC. It's then cleared by EOI broadcast from LAPIC later, based on
> matching interrupt vectors. If software happens to change the vector of
> the said RTE in-between, "remote IRR" bit will never be cleared (it
> expects an EOI with new vector now while actual EOI for previous injection
> contains old vector).
Hmm, I'd expect such a write to clear IRR at once, if somebody
really wrote code this way. Or is the bit wrongly documented R/W?
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-24 8:17 ` Jan Beulich
@ 2017-03-24 8:25 ` Tian, Kevin
[not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D190C7CFB9@SHSMSX101.ccr.corp.intel.com>
2017-03-24 9:00 ` Andrew Cooper
2 siblings, 0 replies; 20+ messages in thread
From: Tian, Kevin @ 2017-03-24 8:25 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xuquan, osstest-admin, xen-devel, Gao, Chao
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, March 24, 2017 4:18 PM
>
> >>> On 24.03.17 at 08:48, <kevin.tian@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, March 22, 2017 8:48 PM
> >>
> >> > 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
> >> > pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
> >> > changes the vector in RTE during the window, it will also incur
> >> > losing or getting more periodic timer interrupt.
> >>
> >> Which raises the question whether latching the value read the first
> >> time would address the issue you demonstrate with the test case.
> >> Or alternatively deferring writes to take effect only once readers
> >> are done with their perhaps multiple accesses?
> >>
> >> Can you get in touch with your chipset folks to find out whether
> >> hardware has cases where multiple reads occur during the processing
> >> of a single event?
> >>
> >
> > There is a similar case. For level-triggered interrupt, there is a
> > "remote IRR"
> > bit in RTE which is set to 1 when LAPIC accepts the level interrupt
> > sent by IOAPIC. It's then cleared by EOI broadcast from LAPIC later,
> > based on matching interrupt vectors. If software happens to change the
> > vector of the said RTE in-between, "remote IRR" bit will never be
> > cleared (it expects an EOI with new vector now while actual EOI for
> > previous injection contains old vector).
>
> Hmm, I'd expect such a write to clear IRR at once, if somebody really wrote
> code this way. Or is the bit wrongly documented R/W?
>
It's read-only to software, but cleared only when accepting EOI.
Thanks
Kevin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
[not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D190C7CFB9@SHSMSX101.ccr.corp.intel.com>
@ 2017-03-24 8:49 ` Tian, Kevin
0 siblings, 0 replies; 20+ messages in thread
From: Tian, Kevin @ 2017-03-24 8:49 UTC (permalink / raw)
To: 'Jan Beulich'
Cc: Andrew Cooper, Xuquan, osstest-admin, xen-devel, Gao, Chao
> From: Tian, Kevin
> Sent: Friday, March 24, 2017 4:26 PM
>
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Friday, March 24, 2017 4:18 PM
> >
> > >>> On 24.03.17 at 08:48, <kevin.tian@intel.com> wrote:
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Wednesday, March 22, 2017 8:48 PM
> > >>
> > >> > 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
> > >> > pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
> > >> > changes the vector in RTE during the window, it will also incur
> > >> > losing or getting more periodic timer interrupt.
> > >>
> > >> Which raises the question whether latching the value read the first
> > >> time would address the issue you demonstrate with the test case.
> > >> Or alternatively deferring writes to take effect only once readers
> > >> are done with their perhaps multiple accesses?
> > >>
> > >> Can you get in touch with your chipset folks to find out whether
> > >> hardware has cases where multiple reads occur during the processing
> > >> of a single event?
> > >>
> > >
> > > There is a similar case. For level-triggered interrupt, there is a
> > > "remote IRR"
> > > bit in RTE which is set to 1 when LAPIC accepts the level interrupt
> > > sent by IOAPIC. It's then cleared by EOI broadcast from LAPIC
> > > later, based on matching interrupt vectors. If software happens to
> > > change the vector of the said RTE in-between, "remote IRR" bit will
> > > never be cleared (it expects an EOI with new vector now while actual
> > > EOI for previous injection contains old vector).
> >
> > Hmm, I'd expect such a write to clear IRR at once, if somebody really
> > wrote code this way. Or is the bit wrongly documented R/W?
> >
>
> It's read-only to software, but cleared only when accepting EOI.
>
btw it's also the reason why we set the EOI exit bitmap for level-triggered
interrupts in APICv, to allow proper emulation of the above behavior. :-)
Thanks
Kevin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-24 8:17 ` Jan Beulich
2017-03-24 8:25 ` Tian, Kevin
[not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D190C7CFB9@SHSMSX101.ccr.corp.intel.com>
@ 2017-03-24 9:00 ` Andrew Cooper
2 siblings, 0 replies; 20+ messages in thread
From: Andrew Cooper @ 2017-03-24 9:00 UTC (permalink / raw)
To: Jan Beulich, Kevin Tian; +Cc: xen-devel, Xuquan, osstest-admin, Chao Gao
On 24/03/2017 08:17, Jan Beulich wrote:
>>>> On 24.03.17 at 08:48, <kevin.tian@intel.com> wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Wednesday, March 22, 2017 8:48 PM
>>>
>>>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>>>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
>>>> changes the vector in RTE during the window, it will also incur losing
>>>> or getting more periodic timer interrupt.
>>> Which raises the question whether latching the value read the first time
>>> would address the issue you demonstrate with the test case.
>>> Or alternatively deferring writes to take effect only once readers are done
>>> with their perhaps multiple accesses?
>>>
>>> Can you get in touch with your chipset folks to find out whether hardware
>>> has cases where multiple reads occur during the processing of a single event?
>> There is a similar case. For level-triggered interrupt, there is a "remote IRR"
>> bit in RTE which is set to 1 when LAPIC accepts the level interrupt sent by
>> IOAPIC. It's then cleared by EOI broadcast from LAPIC later, based on
>> matching interrupt vectors. If software happens to change the vector of
>> the said RTE in-between, "remote IRR" bit will never be cleared (it
>> expects an EOI with new vector now while actual EOI for previous injection
>> contains old vector).
> Hmm, I'd expect such a write to clear IRR at once, if somebody
> really wrote code this way. Or is the bit wrongly documented R/W?
The IRR is read only. This behaviour is the root cause of one of the
earliest bugs I fixed in the hypervisor, c/s 12b6ea528f.
(And on re-reading the commit message, I still haven't found time to
fulfil "This fix is distinctly a temporary hack, waiting on a cleanup
of the irq code.")
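As a rough illustration of the remote IRR behaviour discussed above, here is
a toy model in C. It is only a sketch: the struct and the toy_* function
names are made up for this example and are not Xen or hardware interfaces,
and the vectors 0x30 and 0x38 are arbitrary example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one IOAPIC redirection table entry (not the real layout). */
struct toy_rte {
    uint8_t vector;      /* writable by software */
    bool    remote_irr;  /* read-only to software */
};

/* The LAPIC accepted a level-triggered interrupt sent for this entry. */
static void toy_accept_level(struct toy_rte *rte)
{
    assert(rte);
    rte->remote_irr = true;
}

/* Software rewrites the vector field; remote IRR is left untouched. */
static void toy_write_vector(struct toy_rte *rte, uint8_t vector)
{
    rte->vector = vector;
}

/* EOI broadcast: remote IRR is cleared only if the vectors match. */
static void toy_eoi_broadcast(struct toy_rte *rte, uint8_t eoi_vector)
{
    if (rte->vector == eoi_vector)
        rte->remote_irr = false;
}

/* Returns true if remote IRR is still set after the EOI arrives. */
static bool toy_remote_irr_stuck(bool rewrite_in_between)
{
    struct toy_rte rte = { .vector = 0x30, .remote_irr = false };

    toy_accept_level(&rte);
    if (rewrite_in_between)
        toy_write_vector(&rte, 0x38);   /* vector changed before EOI */
    toy_eoi_broadcast(&rte, 0x30);      /* EOI carries the old vector */

    return rte.remote_irr;
}
```

In this model toy_remote_irr_stuck(true) reports a stuck bit, while
toy_remote_irr_stuck(false) does not.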
~Andrew
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-29 3:28 ` Xuquan (Quan Xu)
@ 2017-03-28 20:48 ` Chao Gao
0 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2017-03-28 20:48 UTC (permalink / raw)
To: Xuquan (Quan Xu)
Cc: Andrew Cooper, Kevin Tian, osstest-admin, Jan Beulich, xen-devel
On Wed, Mar 29, 2017 at 03:28:43AM +0000, Xuquan (Quan Xu) wrote:
>On March 22, 2017 2:14 PM, Chao Gao wrote:
>>On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens
>>>> when
>>>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
>>>> changes the vector in RTE during the window, it will also incur
>>>> losing or getting more periodic timer interrupt.
>>>
>>>Which raises the question whether latching the value read the first
>>>time would address the issue you demonstrate with the test case.
>>>Or alternatively deferring writes to take effect only once readers are
>>>done with their perhaps multiple accesses?
>>
>>I think your solution is better.
>>
>>>
>>>Can you get in touch with your chipset folks to find out whether
>>>hardware has cases where multiple reads occur during the processing of
>>>a single event?
>>
>>Yes, I will come back once I get how they handle similar processes.
>>
>>>
>
>Chao,
>Based on Jan's suggestion, a rcu lock may be helpful to you..
>Specifically, you can refer to rcu_read_lock() in kvm code..
Thanks for your advice. I will read this.
>
>btw, I still can't get how this caused the assertion clearly.. could you describe it in short? :)
In short, it is caused by reading the IOAPIC RTE twice. The first read is for setting a bit in vIRR.
The second read is to return the bit we set, for setting EOI later. But the vector in the RTE can be
changed by the guest during that time window. If the new vector is bigger than the old one (such as
0x38 > 0x30), the assertion may fail.
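A minimal sketch of that window, written as a single-threaded toy model in C.
The toy_* names are hypothetical and only stand in for the real code paths;
the guest write is simulated inline between the two reads, and "latch" models
the idea of reusing the value read the first time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint8_t toy_rte_vector;   /* guest-writable vector field of the RTE */
static bool    toy_virr[256];    /* one pending flag per vector */

/* Highest vector currently pending in the toy vIRR, or -1 if none. */
static int toy_highest_pending(void)
{
    for (int v = 255; v >= 0; v--)
        if (toy_virr[v])
            return v;
    return -1;
}

/*
 * Returns true if the condition "intack.vector >= pt_vector" holds.
 * old_vec is in the RTE at the first read, new_vec is written by the
 * guest before the second read.
 */
static bool toy_inject(uint8_t old_vec, uint8_t new_vec, bool latch)
{
    for (int v = 0; v < 256; v++)
        toy_virr[v] = false;
    toy_rte_vector = old_vec;

    uint8_t first = toy_rte_vector;     /* read 1: set the bit in vIRR */
    toy_virr[first] = true;

    toy_rte_vector = new_vec;           /* guest rewrites the RTE */

    /* read 2: the vector reported back as pt_vector */
    uint8_t pt_vector = latch ? first : toy_rte_vector;
    int intack = toy_highest_pending();

    return intack >= pt_vector;
}
```

With these example values, toy_inject(0x30, 0x38, false) violates the
condition, while toy_inject(0x30, 0x38, true) satisfies it.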
>
>
>
>Quan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 6:13 ` Chao Gao
2017-03-22 13:40 ` Jan Beulich
@ 2017-03-29 3:28 ` Xuquan (Quan Xu)
2017-03-28 20:48 ` Chao Gao
1 sibling, 1 reply; 20+ messages in thread
From: Xuquan (Quan Xu) @ 2017-03-29 3:28 UTC (permalink / raw)
To: Chao Gao, Jan Beulich; +Cc: Andrew Cooper, Kevin Tian, osstest-admin, xen-devel
On March 22, 2017 2:14 PM, Chao Gao wrote:
>On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens
>>> when
>>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest
>>> changes the vector in RTE during the window, it will also incur
>>> losing or getting more periodic timer interrupt.
>>
>>Which raises the question whether latching the value read the first
>>time would address the issue you demonstrate with the test case.
>>Or alternatively deferring writes to take effect only once readers are
>>done with their perhaps multiple accesses?
>
>I think your solution is better.
>
>>
>>Can you get in touch with your chipset folks to find out whether
>>hardware has cases where multiple reads occur during the processing of
>>a single event?
>
>Yes, I will come back once I get how they handle similar processes.
>
>>
Chao,
Based on Jan's suggestion, an RCU lock may be helpful to you.
Specifically, you can refer to rcu_read_lock() in the kvm code.
btw, I still don't clearly understand how this caused the assertion. Could you describe it briefly? :)
Quan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-03-22 12:47 ` Jan Beulich
2017-03-22 6:13 ` Chao Gao
2017-03-24 7:48 ` Tian, Kevin
@ 2017-04-04 23:57 ` Chao Gao
2017-04-05 7:48 ` Jan Beulich
2017-04-07 8:56 ` Xuquan (Quan Xu)
2 siblings, 2 replies; 20+ messages in thread
From: Chao Gao @ 2017-04-04 23:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>> On 22.03.17 at 05:53, <chao.gao@intel.com> wrote:
>> I have written a xtf test case (many codes are from hvmloader) to
>> trigger this assertion. The test case is in attachments.
>
>Thanks for doing this.
>
>> Bottom is the output
>> of this test. This test initializes PIT channel0 to generate periodic timer
>> interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. And
>> vCPU1 is used to change IOAPIC RTE 2 frequently.
>
>Well, this is certainly helpful (due to some of the conclusions you
>draw below), but it is very likely not what has caused the assertion
>to trigger in osstest. So by removing the assertion (as you suggest
>below) we then will have a silent, non-understood misbehavior.
>
>> The assertion can be triggered by guest. To fix assertion failure,
>> I propose to remove this assertion for the reason below:
>
>Of course I agree that a guest triggerable assertion is bad, and
>hence needs a correction somewhere.
>
>> 1. Operations in this test case are very intrusive and abnormal. It updates
>> RTE frequently without disabling interrupt source. In this case, I think
>> software can't assume hardware works correctly.
>
>I guess hardware behavior simply is unspecified in such a case, so
>it's hard to judge whether it works "correctly".
>
>> 2. If we remove this assertion(means we admit pt_vector may be different
>> from (or bigger than) the vector we set in vIRR in a rare case), the side
>> effect is that we won't decrease the counter pt->pending_intr_nr in
>> pt_intr_post() and one more timer interrupt in number is injected to guest.
>
>Which is clearly wrong, afaict, as that may drive the guest clock
>off (depending on how the guest OS does its accounting).
>
>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
>> the vector in RTE during the window, it will also incur losing or getting
>> more periodic timer interrupt.
>
>Which raises the question whether latching the value read the first
>time would address the issue you demonstrate with the test case.
>Or alternatively deferring writes to take effect only once readers
>are done with their perhaps multiple accesses?
Hi, Jan.
I plan to make the following changes:
1. Get the vector set in vIRR, to avoid getting a wrong interrupt vector.
I think there are two approaches. One is to extend hvm_isa_irq_assert()
to return the vector set in vIRR. Several functions in the call trees are
also involved. The other is to make vIOAPIC support disabling
write operations to the RTE. In this case, a rwlock_t is introduced to
protect the RTE. pt_update_irq() will first disable write operations,
then get the vector and assert the vector, and finally re-enable
write operations. Which one do you think is better?
2. Let pt_update_irq() pass the periodic timer
whose interrupt is to be injected to vmx_intr_assist(), which
in turn can pass it to pt_intr_post(). After this, pt_intr_post()
needn't search for the periodic timer that matches the interrupt that
has been injected. This way, we can avoid reading the RTE there.
Do you think the above changes would be a clean solution, or do you have
a different suggestion on how to fix it?
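To make the second approach in item 1 a bit more concrete, here is a toy,
single-threaded sketch in C. It is not the proposed rwlock_t: a plain reader
count stands in for the lock, and a guest write that arrives while writes are
disabled is deferred instead of blocking. All toy_* names and fields are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_vioapic {
    uint8_t  vector;          /* vector field of the RTE */
    unsigned readers;         /* > 0 while writes are disabled */
    bool     write_pending;   /* a guest write was deferred */
    uint8_t  pending_vector;  /* the deferred value */
};

static void toy_writes_disable(struct toy_vioapic *v)
{
    v->readers++;
}

/* Re-enable writes; a deferred write takes effect once readers are done. */
static void toy_writes_enable(struct toy_vioapic *v)
{
    assert(v->readers > 0);
    if (--v->readers == 0 && v->write_pending) {
        v->vector = v->pending_vector;
        v->write_pending = false;
    }
}

/* Guest write to the RTE: applied at once only if nobody is reading. */
static void toy_guest_write(struct toy_vioapic *v, uint8_t vec)
{
    if (v->readers) {
        v->pending_vector = vec;
        v->write_pending = true;
    } else {
        v->vector = vec;
    }
}

/* Returns true if the asserted vector and the returned vector agree. */
static bool toy_update_irq(struct toy_vioapic *v, uint8_t guest_vec)
{
    toy_writes_disable(v);
    uint8_t asserted = v->vector;    /* vector set in vIRR */
    toy_guest_write(v, guest_vec);   /* simulated racing guest write */
    uint8_t returned = v->vector;    /* vector handed back to the caller */
    toy_writes_enable(v);

    return asserted == returned;
}
```

In this model both reads see the same vector, and the guest's value only
becomes visible after writes are re-enabled.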
Thanks
Chao
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-04-05 7:48 ` Jan Beulich
@ 2017-04-05 1:49 ` Chao Gao
0 siblings, 0 replies; 20+ messages in thread
From: Chao Gao @ 2017-04-05 1:49 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
On Wed, Apr 05, 2017 at 01:48:22AM -0600, Jan Beulich wrote:
>>>> On 05.04.17 at 01:57, <chao.gao@intel.com> wrote:
>> On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>
>> Hi, Jan.
>>
>> I plan to do the following changes:
>> 1. get the vector set in vIRR to avoid getting a wrong interrupt vector
>> I think there are two approaches. One is to extend hvm_isa_irq_assert()
>> to return the vector set in vIRR. Several functions in call trees are
>> also involved. The other is to make vIOAPIC support disabling
>> write operations to RTE. In this case, a rwlock_t is introduced to
>> protect RTE. pt_update_irq() will disable write operations
>> at first, then get the vector and assert the vector, at last enable
>> write operations. Which one do you think is better?
>
>That's hard to tell without seeing the changes each actually involves.
>On the surface I'd probably prefer the 2nd, provided the locking can
>be got into a shape where there's no meaningful risk of missing an
>unlock on some path.
Thanks for your opinion. I will try to add the lock.
>
>> 2. let pt_update_irq() pass the periodic timer
>> whose interrupt is to be injected to vmx_intr_assist() which
>> in turn can pass it to pt_intr_post(). After this, pt_intr_post()
>> needn't search the periodic timer that matches the interrupt has
>> been injected. Through this, we can avoid reading the RTE there.
>
>If the RTE can't be changed behind your back, why would you
>need this?
Yes. With the first change described above, the second change is theoretically
unnecessary. But if the lock is acquired in vmx_intr_assist(), it is
also acquired in the majority of cases (those using the LAPIC timer), where
taking the lock is useless. If the lock is acquired in pt_update_irq(), it had
better be released there, so the lock can't protect pt_intr_post(). Also, the
second change would reduce the time spent in the locked region. I am worried
that adding a lock here (a very critical path) may hurt performance.
Also, I admit this change should be made very carefully. Changing less decreases
the possibility of introducing errors.
Thanks
Chao
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-04-04 23:57 ` Chao Gao
@ 2017-04-05 7:48 ` Jan Beulich
2017-04-05 1:49 ` Chao Gao
2017-04-07 8:56 ` Xuquan (Quan Xu)
1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2017-04-05 7:48 UTC (permalink / raw)
To: Chao Gao; +Cc: Andrew Cooper, Xuquan, osstest-admin, Kevin Tian, xen-devel
>>> On 05.04.17 at 01:57, <chao.gao@intel.com> wrote:
> On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>>> On 22.03.17 at 05:53, <chao.gao@intel.com> wrote:
>>> I have written a xtf test case (many codes are from hvmloader) to
>>> trigger this assertion. The test case is in attachments.
>>
>>Thanks for doing this.
>>
>>> Bottom is the output
>>> of this test. This test initializes PIT channel0 to generate periodic timer
>>> interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. And
>>> vCPU1 is used to change IOAPIC RTE 2 frequently.
>>
>>Well, this is certainly helpful (due to some of the conclusions you
>>draw below), but it is very likely not what has caused the assertion
>>to trigger in osstest. So by removing the assertion (as you suggest
>>below) we then will have a silent, non-understood misbehavior.
>>
>>> The assertion can be triggered by guest. To fix assertion failure,
>>> I propose to remove this assertion for the reason below:
>>
>>Of course I agree that a guest triggerable assertion is bad, and
>>hence needs a correction somewhere.
>>
>>> 1. Operations in this test case are very intrusive and abnormal. It updates
>>> RTE frequently without disabling interrupt source. In this case, I think
>>> software can't assume hardware works correctly.
>>
>>I guess hardware behavior simply is unspecified in such a case, so
>>it's hard to judge whether it works "correctly".
>>
>>> 2. If we remove this assertion(means we admit pt_vector may be different
>>> from (or bigger than) the vector we set in vIRR in a rare case), the side
>>> effect is that we won't decrease the counter pt->pending_intr_nr in
>>> pt_intr_post() and one more timer interrupt in number is injected to guest.
>>
>>Which is clearly wrong, afaict, as that may drive the guest clock
>>off (depending on how the guest OS does its accounting).
>>
>>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
>>> the vector in RTE during the window, it will also incur losing or getting
>>> more periodic timer interrupt.
>>
>>Which raises the question whether latching the value read the first
>>time would address the issue you demonstrate with the test case.
>>Or alternatively deferring writes to take effect only once readers
>>are done with their perhaps multiple accesses?
>
> Hi, Jan.
>
> I plan to do the following changes:
> 1. get the vector set in vIRR to avoid getting a wrong interrupt vector
> I think there are two approaches. One is to extend hvm_isa_irq_assert()
> to return the vector set in vIRR. Several functions in call trees are
> also involved. The other is to make vIOAPIC support disabling
> write operations to RTE. In this case, a rwlock_t is introduced to
> protect RTE. pt_update_irq() will disable write operations
> at first, then get the vector and assert the vector, at last enable
> write operations. Which one do you think is better?
That's hard to tell without seeing the changes each actually involves.
On the surface I'd probably prefer the 2nd, provided the locking can
be got into a shape where there's no meaningful risk of missing an
unlock on some path.
> 2. let pt_update_irq() pass the periodic timer
> whose interrupt is to be injected to vmx_intr_assist() which
> in turn can pass it to pt_intr_post(). After this, pt_intr_post()
> needn't search the periodic timer that matches the interrupt has
> been injected. Through this, we can avoid reading the RTE there.
If the RTE can't be changed behind your back, why would you
need this?
Jan
* Re: [xen-unstable test] 106504: regressions - FAIL
2017-04-04 23:57 ` Chao Gao
2017-04-05 7:48 ` Jan Beulich
@ 2017-04-07 8:56 ` Xuquan (Quan Xu)
1 sibling, 0 replies; 20+ messages in thread
From: Xuquan (Quan Xu) @ 2017-04-07 8:56 UTC (permalink / raw)
To: Chao Gao; +Cc: Andrew Cooper, Kevin Tian, osstest-admin, Jan Beulich, xen-devel
On April 05, 2017 7:58 AM, Chao Gao wrote:
>2. let pt_update_irq() pass the periodic timer whose interrupt is to be
>injected to vmx_intr_assist() which in turn can pass it to pt_intr_post(). After
>this, pt_intr_post() needn't search the periodic timer that matches the
>interrupt has been injected. Through this, we can avoid reading the RTE
>there.
>
The key point of pt_intr_post() is to decrease the count (pending_intr_nr) of pending periodic
timer interrupts; otherwise Xen will deliver one more periodic timer interrupt.
I'm curious: even if you can pass the pt interrupt to pt_intr_post(), how would you use it?
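One possible reading of the quoted proposal, sketched as a toy model in C. The
struct and the toy_* functions are hypothetical and much simpler than the real
periodic timer code: the update step hands back a pointer to the timer it
picked, and the post step decrements pending_intr_nr through that pointer
without looking at the RTE again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_periodic_time {
    unsigned pending_intr_nr;   /* interrupts still owed to the guest */
    uint8_t  irq;               /* source IRQ of this timer */
};

/* Pick the first timer with a pending interrupt; NULL if there is none. */
static struct toy_periodic_time *
toy_pt_update_irq(struct toy_periodic_time *pts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pts[i].pending_intr_nr)
            return &pts[i];
    return NULL;
}

/* Account for the injected interrupt using the timer passed in. */
static void toy_pt_intr_post(struct toy_periodic_time *pt)
{
    if (pt && pt->pending_intr_nr)
        pt->pending_intr_nr--;
}

/* Returns the remaining count of the timer that was serviced, or 0. */
static unsigned toy_intr_assist(struct toy_periodic_time *pts, size_t n)
{
    struct toy_periodic_time *pt = toy_pt_update_irq(pts, n);

    /* Injection would happen here; an RTE rewrite no longer matters. */
    toy_pt_intr_post(pt);

    return pt ? pt->pending_intr_nr : 0;
}
```

Each call accounts for exactly one interrupt of the selected timer, so the
count cannot drift because of a vector mismatch.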
Quan
Thread overview: 20+ messages
2017-03-07 5:52 [xen-unstable test] 106504: regressions - FAIL osstest service owner
2017-03-07 9:16 ` Jan Beulich
2017-03-07 4:24 ` Chao Gao
2017-03-07 14:11 ` Jan Beulich
2017-03-22 4:53 ` Chao Gao
2017-03-22 12:47 ` Jan Beulich
2017-03-22 6:13 ` Chao Gao
2017-03-22 13:40 ` Jan Beulich
2017-03-29 3:28 ` Xuquan (Quan Xu)
2017-03-28 20:48 ` Chao Gao
2017-03-24 7:48 ` Tian, Kevin
2017-03-24 8:17 ` Jan Beulich
2017-03-24 8:25 ` Tian, Kevin
[not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D190C7CFB9@SHSMSX101.ccr.corp.intel.com>
2017-03-24 8:49 ` Tian, Kevin
2017-03-24 9:00 ` Andrew Cooper
2017-04-04 23:57 ` Chao Gao
2017-04-05 7:48 ` Jan Beulich
2017-04-05 1:49 ` Chao Gao
2017-04-07 8:56 ` Xuquan (Quan Xu)
2017-03-08 3:16 ` Xuquan (Quan Xu)