* [xen-unstable test] 104131: regressions - FAIL @ 2017-01-12 6:46 osstest service owner 2017-01-12 9:14 ` Andrew Cooper 0 siblings, 1 reply; 32+ messages in thread From: osstest service owner @ 2017-01-12 6:46 UTC (permalink / raw) To: xen-devel, osstest-admin flight 104131 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/104131/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail REGR. vs. 104119 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail REGR. vs. 104119 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail like 104104 test-armhf-armhf-libvirt 13 saverestore-support-check fail like 104119 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104119 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 104119 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 104119 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 104119 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check fail like 104119 test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail like 104119 test-amd64-amd64-xl-rtds 9 debian-install fail like 104119 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-i386-libvirt 12 migrate-support-check fail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass test-amd64-amd64-libvirt 12 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass test-armhf-armhf-libvirt 12 migrate-support-check fail never pass test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass test-armhf-armhf-libvirt-qcow2 11 migrate-support-check fail never pass test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass test-armhf-armhf-xl 12 migrate-support-check fail never pass test-armhf-armhf-xl 13 saverestore-support-check fail never pass version targeted for testing: xen 0d045d65c19ac48b31344b566cbf82a0270e6e44 baseline version: xen ffc103c223a6d12e5221f66b7e96396a61ba1b20 Last test of basis 104119 2017-01-11 06:45:46 Z 0 days Failing since 104126 2017-01-11 16:44:54 Z 0 days 2 attempts Testing same since 104131 2017-01-11 22:43:41 Z 0 days 1 attempts ------------------------------------------------------------ People who touched revisions under test: Andrew Cooper <andrew.cooper3@citrix.com> Jan Beulich <jbeulich@suse.com> Kevin Tian <kevin.tian@intel.com> Stefano Stabellini <sstabellini@kernel.org> Wei Liu <wei.liu2@citrix.com> jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-oldkern pass build-i386-oldkern pass build-amd64-prev pass build-i386-prev pass build-amd64-pvops pass build-armhf-pvops pass build-i386-pvops pass build-amd64-rumprun pass build-i386-rumprun pass test-xtf-amd64-amd64-1 pass test-xtf-amd64-amd64-2 pass test-xtf-amd64-amd64-3 pass test-xtf-amd64-amd64-4 pass test-xtf-amd64-amd64-5 pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm pass test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-xl-xsm pass test-armhf-armhf-xl-xsm pass test-amd64-i386-xl-xsm pass test-amd64-amd64-qemuu-nested-amd fail test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64 pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-xl-qemuu-debianhvm-amd64 fail test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-rumprun-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-armhf-armhf-xl-arndale pass test-amd64-amd64-xl-credit2 pass test-armhf-armhf-xl-credit2 pass test-armhf-armhf-xl-cubietruck pass test-amd64-i386-freebsd10-i386 pass test-amd64-i386-rumprun-i386 pass test-amd64-amd64-qemuu-nested-intel pass test-amd64-amd64-xl-pvh-intel fail test-amd64-i386-qemut-rhel6hvm-intel pass test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-amd64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-migrupgrade pass test-amd64-i386-migrupgrade pass test-amd64-amd64-xl-multivcpu pass test-armhf-armhf-xl-multivcpu pass test-amd64-amd64-pair pass test-amd64-i386-pair pass test-amd64-amd64-libvirt-pair pass test-amd64-i386-libvirt-pair pass test-amd64-amd64-amd64-pvgrub pass test-amd64-amd64-i386-pvgrub pass test-amd64-amd64-pygrub pass test-armhf-armhf-libvirt-qcow2 pass test-amd64-amd64-xl-qcow2 pass test-armhf-armhf-libvirt-raw pass test-amd64-i386-xl-raw pass test-amd64-amd64-xl-rtds fail test-armhf-armhf-xl-rtds fail test-amd64-i386-xl-qemut-winxpsp3-vcpus1 pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass test-amd64-amd64-libvirt-vhd pass test-armhf-armhf-xl-vhd pass test-amd64-amd64-xl-qemut-winxpsp3 pass test-amd64-i386-xl-qemut-winxpsp3 pass test-amd64-amd64-xl-qemuu-winxpsp3 pass test-amd64-i386-xl-qemuu-winxpsp3 pass ------------------------------------------------------------ sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 528 lines long.) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 6:46 [xen-unstable test] 104131: regressions - FAIL osstest service owner @ 2017-01-12 9:14 ` Andrew Cooper 2017-01-12 12:07 ` Xuquan (Quan Xu) 0 siblings, 1 reply; 32+ messages in thread From: Andrew Cooper @ 2017-01-12 9:14 UTC (permalink / raw) Cc: Kevin Tian, Quan Xu, osstest service owner, Jan Beulich, xen-devel, Chao Gao On 12/01/2017 06:46, osstest service owner wrote: > flight 104131 xen-unstable real [real] > http://logs.test-lab.xenproject.org/osstest/logs/104131/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail REGR. vs. 104119 Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:321 Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]---- Jan 12 01:25:37.141577 (XEN) CPU: 14 Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] vmx_intr_assist+0x35e/0x51d Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d15v0) Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: ffff830079e1e000 rcx: 0000000000000030 Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: 0000000000000030 rdi: ffff830079e1e000 Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 r8: ffff82c00022f000 Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: ffff830176386560 r11: 000001955ee79bd0 Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: 0000000000003002 r14: 0000000000000030 Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: 0000000080050033 cr4: 00000000003526e0 Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: 0000000002487034 Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> (vmx_intr_assist+0x35e/0x51d): Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace from rsp=ffff83047de2fea8: Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff ffff83047de2ffff ffff83023fec2000 Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 ffff830079e1e000 ffff830079e1e000 Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e ffff830233117000 ffff83023fec2000 Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 0000000000000004 00000000000000c2 Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 ffff8800e8d28000 ffffffff81add0a0 Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 ffff880142400008 0000000000000004 Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 00000000000003f8 00000000000003f8 Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef ffffffff813899a4 000000bf0000beef Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 000000000000beef 1cec835356e5beef Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef 8914458d3174beef 2444c7100000000e Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 00000000003526e0 Jan 12 01:25:37.309607 (XEN) Xen call trace: Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] vmx_intr_assist+0x35e/0x51d Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] vmx_asm_vmexit_handler+0x41/0x120 Jan 12 01:25:37.325598 (XEN) Jan 12 01:25:37.325624 (XEN) Jan 12 01:25:37.325647 (XEN) **************************************** Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:321 Jan 12 01:25:37.341571 (XEN) **************************************** Jan 12 01:25:37.341603 (XEN) Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O RESET_REG. This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is not a deterministic issue, as it appears to have survived a week of testing already, but there is clearly something still problematic with the code. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 9:14 ` Andrew Cooper @ 2017-01-12 12:07 ` Xuquan (Quan Xu) 2017-01-12 9:04 ` Chao Gao 2017-01-12 12:15 ` Andrew Cooper 0 siblings, 2 replies; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-01-12 12:07 UTC (permalink / raw) To: Andrew Cooper Cc: Kevin Tian, Chao Gao, osstest service owner, Jan Beulich, xen-devel On January 12, 2017 5:14 PM, Andrew Cooper wrote: >On 12/01/2017 06:46, osstest service owner wrote: >> flight 104131 xen-unstable real [real] >> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >> >> Regressions :-( >> >> Tests which did not succeed and are blocking, including tests which >> could not be run: >> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >REGR. vs. 104119 > >Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at >intr.c:321 >Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y >Not tainted ]---- >Jan 12 01:25:37.141577 (XEN) CPU: 14 >Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >vmx_intr_assist+0x35e/0x51d >Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: >hypervisor (d15v0) >Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >ffff830079e1e000 rcx: 0000000000000030 >Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >0000000000000030 rdi: ffff830079e1e000 >Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 >r8: ffff82c00022f000 >Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >ffff830176386560 r11: 000001955ee79bd0 >Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >0000000000003002 r14: 0000000000000030 >Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >0000000080050033 cr4: 00000000003526e0 >Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >0000000002487034 >Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 >ss: 0000 cs: e008 >Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >(vmx_intr_assist+0x35e/0x51d): >Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 >20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace >from rsp=ffff83047de2fea8: >Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >ffff83047de2ffff ffff83023fec2000 >Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >ffff830079e1e000 ffff830079e1e000 >Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e >ffff830233117000 ffff83023fec2000 >Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >0000000000000004 00000000000000c2 >Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 >ffff8800e8d28000 ffffffff81add0a0 >Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 >ffff880142400008 0000000000000004 >Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 >00000000000003f8 00000000000003f8 >Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >ffffffff813899a4 000000bf0000beef >Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 >000000000000beef 1cec835356e5beef >Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef >8914458d3174beef 2444c7100000000e >Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 >00000000003526e0 >Jan 12 01:25:37.309607 (XEN) Xen call trace: >Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >vmx_intr_assist+0x35e/0x51d >Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >vmx_asm_vmexit_handler+0x41/0x120 >Jan 12 01:25:37.325598 (XEN) >Jan 12 01:25:37.325624 (XEN) >Jan 12 01:25:37.325647 (XEN) >**************************************** >Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at >intr.c:321 Jan 12 01:25:37.341571 (XEN) >**************************************** >Jan 12 01:25:37.341603 (XEN) >Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >RESET_REG. > >This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is >not a deterministic issue, as it appears to have survived a week of testing >already, but there is clearly something still problematic with the code. > Andrew, If you have, could you give more information? Such as the value of intack.vector / pt_vector.. I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'.. Or there is a corner case that intack.vector is __not__ the highest priority vector.. Kevin / Jan, any thoughts? Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 12:07 ` Xuquan (Quan Xu) @ 2017-01-12 9:04 ` Chao Gao 2017-01-12 12:15 ` Andrew Cooper 1 sibling, 0 replies; 32+ messages in thread From: Chao Gao @ 2017-01-12 9:04 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: Andrew Cooper, Kevin Tian, osstest service owner, Jan Beulich, xen-devel According the code around the assert: movzbl %r14b, %esi 41 0f b6 f6 cmp %esi, %eax 39 f0 jle ... 7e 02 ud2 <0f> 0b mov %rbx, %rdi 48 89 df callq ... e8 51 20 00 00 mov $0x810, %eax b8 10 08 00 00 so I think one is 0x38 %eax, the other is 0x30 %esi On Thu, Jan 12, 2017 at 12:07:53PM +0000, Xuquan (Quan Xu) wrote: >On January 12, 2017 5:14 PM, Andrew Cooper wrote: >>On 12/01/2017 06:46, osstest service owner wrote: >>> flight 104131 xen-unstable real [real] >>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >>> >>> Regressions :-( >>> >>> Tests which did not succeed and are blocking, including tests which >>> could not be run: >>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >>REGR. vs. 104119 >> >>Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at >>intr.c:321 >>Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y >>Not tainted ]---- >>Jan 12 01:25:37.141577 (XEN) CPU: 14 >>Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >>vmx_intr_assist+0x35e/0x51d >>Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: >>hypervisor (d15v0) >>Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >>ffff830079e1e000 rcx: 0000000000000030 >>Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >>0000000000000030 rdi: ffff830079e1e000 >>Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 >>r8: ffff82c00022f000 >>Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >>ffff830176386560 r11: 000001955ee79bd0 >>Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >>0000000000003002 r14: 0000000000000030 >>Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >>0000000080050033 cr4: 00000000003526e0 >>Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >>0000000002487034 >>Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 >>ss: 0000 cs: e008 >>Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >>(vmx_intr_assist+0x35e/0x51d): >>Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 >>20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace >>from rsp=ffff83047de2fea8: >>Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >>ffff83047de2ffff ffff83023fec2000 >>Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >>ffff830079e1e000 ffff830079e1e000 >>Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e >>ffff830233117000 ffff83023fec2000 >>Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >>0000000000000004 00000000000000c2 >>Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 >>ffff8800e8d28000 ffffffff81add0a0 >>Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 >>ffff880142400008 0000000000000004 >>Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 >>00000000000003f8 00000000000003f8 >>Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >>ffffffff813899a4 000000bf0000beef >>Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 >>000000000000beef 1cec835356e5beef >>Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef >>8914458d3174beef 2444c7100000000e >>Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 >>00000000003526e0 >>Jan 12 01:25:37.309607 (XEN) Xen call trace: >>Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >>vmx_intr_assist+0x35e/0x51d >>Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >>vmx_asm_vmexit_handler+0x41/0x120 >>Jan 12 01:25:37.325598 (XEN) >>Jan 12 01:25:37.325624 (XEN) >>Jan 12 01:25:37.325647 (XEN) >>**************************************** >>Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >>Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at >>intr.c:321 Jan 12 01:25:37.341571 (XEN) >>**************************************** >>Jan 12 01:25:37.341603 (XEN) >>Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >>Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >>RESET_REG. >> >>This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is >>not a deterministic issue, as it appears to have survived a week of testing >>already, but there is clearly something still problematic with the code. >> > > >Andrew, >If you have, could you give more information? Such as the value of intack.vector / pt_vector.. >I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'.. > >Or there is a corner case that intack.vector is __not__ the highest priority vector.. > >Kevin / Jan, any thoughts? > >Quan > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 12:07 ` Xuquan (Quan Xu) 2017-01-12 9:04 ` Chao Gao @ 2017-01-12 12:15 ` Andrew Cooper 2017-01-12 12:25 ` Jan Beulich 1 sibling, 1 reply; 32+ messages in thread From: Andrew Cooper @ 2017-01-12 12:15 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: Kevin Tian, Chao Gao, osstest service owner, Jan Beulich, xen-devel On 12/01/17 12:07, Xuquan (Quan Xu) wrote: > On January 12, 2017 5:14 PM, Andrew Cooper wrote: >> On 12/01/2017 06:46, osstest service owner wrote: >>> flight 104131 xen-unstable real [real] >>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >>> >>> Regressions :-( >>> >>> Tests which did not succeed and are blocking, including tests which >>> could not be run: >>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >> REGR. vs. 104119 >> >> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at >> intr.c:321 >> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y >> Not tainted ]---- >> Jan 12 01:25:37.141577 (XEN) CPU: 14 >> Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >> vmx_intr_assist+0x35e/0x51d >> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: >> hypervisor (d15v0) >> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >> ffff830079e1e000 rcx: 0000000000000030 >> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >> 0000000000000030 rdi: ffff830079e1e000 >> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 >> r8: ffff82c00022f000 >> Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >> ffff830176386560 r11: 000001955ee79bd0 >> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >> 0000000000003002 r14: 0000000000000030 >> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >> 0000000080050033 cr4: 00000000003526e0 >> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >> 0000000002487034 >> Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 >> ss: 0000 cs: e008 >> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >> (vmx_intr_assist+0x35e/0x51d): >> Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 >> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace > >from rsp=ffff83047de2fea8: >> Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >> ffff83047de2ffff ffff83023fec2000 >> Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >> ffff830079e1e000 ffff830079e1e000 >> Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e >> ffff830233117000 ffff83023fec2000 >> Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >> 0000000000000004 00000000000000c2 >> Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 >> ffff8800e8d28000 ffffffff81add0a0 >> Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 >> ffff880142400008 0000000000000004 >> Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 >> 00000000000003f8 00000000000003f8 >> Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >> ffffffff813899a4 000000bf0000beef >> Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 >> 000000000000beef 1cec835356e5beef >> Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef >> 8914458d3174beef 2444c7100000000e >> Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 >> 00000000003526e0 >> Jan 12 01:25:37.309607 (XEN) Xen call trace: >> Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >> vmx_intr_assist+0x35e/0x51d >> Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >> vmx_asm_vmexit_handler+0x41/0x120 >> Jan 12 01:25:37.325598 (XEN) >> Jan 12 01:25:37.325624 (XEN) >> Jan 12 01:25:37.325647 (XEN) >> **************************************** >> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at >> intr.c:321 Jan 12 01:25:37.341571 (XEN) >> **************************************** >> Jan 12 01:25:37.341603 (XEN) >> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >> RESET_REG. >> >> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is >> not a deterministic issue, as it appears to have survived a week of testing >> already, but there is clearly something still problematic with the code. >> > > Andrew, > If you have, could you give more information? No further information sorry. This was found by the automated test system. Full logs are available from http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-qemuu-debianhvm-amd64/ but I doubt any of them will help in diagnosing the issue any further. > Such as the value of intack.vector / pt_vector.. > I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'.. > > Or there is a corner case that intack.vector is __not__ the highest priority vector.. > > Kevin / Jan, any thoughts? It happened during domain shutdown. It might be an edge case interaction with qemu-raised interrupts via hypercall? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 12:15 ` Andrew Cooper @ 2017-01-12 12:25 ` Jan Beulich 2017-01-16 5:25 ` Tian, Kevin 0 siblings, 1 reply; 32+ messages in thread From: Jan Beulich @ 2017-01-12 12:25 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: Andrew Cooper, Kevin Tian, xen-devel, osstest service owner, Chao Gao >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote: > On 12/01/17 12:07, Xuquan (Quan Xu) wrote: >> On January 12, 2017 5:14 PM, Andrew Cooper wrote: >>> On 12/01/2017 06:46, osstest service owner wrote: >>>> flight 104131 xen-unstable real [real] >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >>>> >>>> Regressions :-( >>>> >>>> Tests which did not succeed and are blocking, including tests which >>>> could not be run: >>>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >>> REGR. vs. 104119 >>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at >>> intr.c:321 >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y >>> Not tainted ]---- >>> Jan 12 01:25:37.141577 (XEN) CPU: 14 >>> Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >>> vmx_intr_assist+0x35e/0x51d >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: >>> hypervisor (d15v0) >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >>> ffff830079e1e000 rcx: 0000000000000030 >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >>> 0000000000000030 rdi: ffff830079e1e000 >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 >>> r8: ffff82c00022f000 >>> Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >>> ffff830176386560 r11: 000001955ee79bd0 >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >>> 0000000000003002 r14: 0000000000000030 >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >>> 0000000080050033 cr4: 00000000003526e0 >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >>> 0000000002487034 >>> Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 >>> ss: 0000 cs: e008 >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >>> (vmx_intr_assist+0x35e/0x51d): >>> Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace >> >from rsp=ffff83047de2fea8: >>> Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >>> ffff83047de2ffff ffff83023fec2000 >>> Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >>> ffff830079e1e000 ffff830079e1e000 >>> Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e >>> ffff830233117000 ffff83023fec2000 >>> Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >>> 0000000000000004 00000000000000c2 >>> Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 >>> ffff8800e8d28000 ffffffff81add0a0 >>> Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 >>> ffff880142400008 0000000000000004 >>> Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 >>> 00000000000003f8 00000000000003f8 >>> Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >>> ffffffff813899a4 000000bf0000beef >>> Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 >>> 000000000000beef 1cec835356e5beef >>> Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef >>> 8914458d3174beef 2444c7100000000e >>> Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 >>> 00000000003526e0 >>> Jan 12 01:25:37.309607 (XEN) Xen call trace: >>> Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >>> vmx_intr_assist+0x35e/0x51d >>> Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >>> vmx_asm_vmexit_handler+0x41/0x120 >>> Jan 12 01:25:37.325598 (XEN) >>> Jan 12 01:25:37.325624 (XEN) >>> Jan 12 01:25:37.325647 (XEN) >>> **************************************** >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at >>> intr.c:321 Jan 12 01:25:37.341571 (XEN) >>> **************************************** >>> Jan 12 01:25:37.341603 (XEN) >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >>> RESET_REG. >>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is >>> not a deterministic issue, as it appears to have survived a week of testing >>> already, but there is clearly something still problematic with the code. >>> >> >> Andrew, >> If you have, could you give more information? > > No further information sorry. This was found by the automated test system. But some can be gathered: > Full logs are available from > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q > emuu-debianhvm-amd64/ > but I doubt any of them will help in diagnosing the issue any further. > >> Such as the value of intack.vector / pt_vector.. At leastb one of the two values is likely to live in a register, and hence its value would be available in the dump. Just takes looking at the disassembly. >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'.. That would be odd. >> Or there is a corner case that intack.vector is __not__ the highest priority vector.. That's what I'm afraid of, and why I had asked to add the ASSERT(). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-12 12:25 ` Jan Beulich @ 2017-01-16 5:25 ` Tian, Kevin 2017-01-16 6:27 ` Xuquan (Quan Xu) ` (3 more replies) 0 siblings, 4 replies; 32+ messages in thread From: Tian, Kevin @ 2017-01-16 5:25 UTC (permalink / raw) To: Jan Beulich, Xuquan (Quan Xu) Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Thursday, January 12, 2017 8:26 PM > > >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote: > > On 12/01/17 12:07, Xuquan (Quan Xu) wrote: > >> On January 12, 2017 5:14 PM, Andrew Cooper wrote: > >>> On 12/01/2017 06:46, osstest service owner wrote: > >>>> flight 104131 xen-unstable real [real] > >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ > >>>> > >>>> Regressions :-( > >>>> > >>>> Tests which did not succeed and are blocking, including tests which > >>>> could not be run: > >>>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail > >>> REGR. vs. 104119 > >>> > >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at > >>> intr.c:321 > >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 debug=y > >>> Not tainted ]---- > >>> Jan 12 01:25:37.141577 (XEN) CPU: 14 > >>> Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] > >>> vmx_intr_assist+0x35e/0x51d > >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 CONTEXT: > >>> hypervisor (d15v0) > >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: > >>> ffff830079e1e000 rcx: 0000000000000030 > >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: > >>> 0000000000000030 rdi: ffff830079e1e000 > >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: ffff83047de2fea8 > >>> r8: ffff82c00022f000 > >>> Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: > >>> ffff830176386560 r11: 000001955ee79bd0 > >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: > >>> 0000000000003002 r14: 0000000000000030 > >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: > >>> 0000000080050033 cr4: 00000000003526e0 > >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: > >>> 0000000002487034 > >>> Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 > >>> ss: 0000 cs: e008 > >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> > >>> (vmx_intr_assist+0x35e/0x51d): > >>> Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 > >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace > >> >from rsp=ffff83047de2fea8: > >>> Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff > >>> ffff83047de2ffff ffff83023fec2000 > >>> Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 > >>> ffff830079e1e000 ffff830079e1e000 > >>> Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 000000000000000e > >>> ffff830233117000 ffff83023fec2000 > >>> Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 > >>> 0000000000000004 00000000000000c2 > >>> Jan 12 01:25:37.261584 (XEN) 0000000000000020 0000000000000007 > >>> ffff8800e8d28000 ffffffff81add0a0 > >>> Jan 12 01:25:37.269607 (XEN) 0000000000000246 0000000000000000 > >>> ffff880142400008 0000000000000004 > >>> Jan 12 01:25:37.277580 (XEN) 0000000000000036 0000000000000000 > >>> 00000000000003f8 00000000000003f8 > >>> Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef > >>> ffffffff813899a4 000000bf0000beef > >>> Jan 12 01:25:37.293567 (XEN) 0000000000000002 ffff880147c03e08 > >>> 000000000000beef 1cec835356e5beef > >>> Jan 12 01:25:37.293606 (XEN) 085d8b002674beef 01dcb38b000cbeef > >>> 8914458d3174beef 2444c7100000000e > >>> Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 00000031bfc37600 > >>> 00000000003526e0 > >>> Jan 12 01:25:37.309607 (XEN) Xen call trace: > >>> Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] > >>> vmx_intr_assist+0x35e/0x51d > >>> Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] > >>> vmx_asm_vmexit_handler+0x41/0x120 > >>> Jan 12 01:25:37.325598 (XEN) > >>> Jan 12 01:25:37.325624 (XEN) > >>> Jan 12 01:25:37.325647 (XEN) > >>> **************************************** > >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: > >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at > >>> intr.c:321 Jan 12 01:25:37.341571 (XEN) > >>> **************************************** > >>> Jan 12 01:25:37.341603 (XEN) > >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... > >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O > >>> RESET_REG. > >>> > >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue". It is > >>> not a deterministic issue, as it appears to have survived a week of testing > >>> already, but there is clearly something still problematic with the code. > >>> > >> > >> Andrew, > >> If you have, could you give more information? > > > > No further information sorry. This was found by the automated test system. > > But some can be gathered: > > > Full logs are available from > > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q > > emuu-debianhvm-amd64/ > > but I doubt any of them will help in diagnosing the issue any further. > > > >> Such as the value of intack.vector / pt_vector.. > > At leastb one of the two values is likely to live in a register, and > hence its value would be available in the dump. Just takes looking > at the disassembly. > > >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'.. > > That would be odd. > > >> Or there is a corner case that intack.vector is __not__ the highest priority vector.. > > That's what I'm afraid of, and why I had asked to add the ASSERT(). > I cannot come up a valid reason for such situation (intack.vector is 0x30 while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before checking highest pending IRRs so pt_vector should be honored anyway. One possible reason is that being some reason pt_vector is not in vIRR at that point (due to some bug in the path from PIR to vIRR). However I didn't catch such bug simply by looking at code. We need reproduce this problem in developer side to find out actual reason. Andrew it'd be helpful if you can help Quan/Chao to find out more test environment info. One thing noted though. The original patch from Quan is actually orthogonal to this ASSERT. Regardless of whether intack.vector is larger or smaller than pt_vector, we always require the trick as long as pt_vector is not the one being currently programmed to RVI. Then do we want to revert the whole commit until the problem is finally fixed, or OK to just remove ASSERT (or replace with WARN_ON with more debug info) to unblock test system before the fix is ready? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 5:25 ` Tian, Kevin @ 2017-01-16 6:27 ` Xuquan (Quan Xu) 2017-01-16 0:06 ` Chao Gao 2017-01-16 11:00 ` Jan Beulich ` (2 subsequent siblings) 3 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-01-16 6:27 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich, Gao, Chao Cc: Andrew Cooper, osstest service owner, xen-devel On January 16, 2017 1:26 PM, Tian, Kevin wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Thursday, January 12, 2017 8:26 PM >> >> >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote: >> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote: >> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote: >> >>> On 12/01/2017 06:46, osstest service owner wrote: >> >>>> flight 104131 xen-unstable real [real] >> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >> >>>> >> >>>> Regressions :-( >> >>>> >> >>>> Tests which did not succeed and are blocking, including tests >> >>>> which could not be run: >> >>>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >> >>> REGR. vs. 104119 >> >>> >> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= >> >>> pt_vector' failed at >> >>> intr.c:321 >> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 >> >>> debug=y Not tainted ]---- >> >>> Jan 12 01:25:37.141577 (XEN) CPU: 14 >> >>> Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >> >>> vmx_intr_assist+0x35e/0x51d >> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 >CONTEXT: >> >>> hypervisor (d15v0) >> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >> >>> ffff830079e1e000 rcx: 0000000000000030 >> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >> >>> 0000000000000030 rdi: ffff830079e1e000 >> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: >ffff83047de2fea8 >> >>> r8: ffff82c00022f000 >> >>> Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >> >>> ffff830176386560 r11: 000001955ee79bd0 >> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >> >>> 0000000000003002 r14: 0000000000000030 >> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >> >>> 0000000080050033 cr4: 00000000003526e0 >> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >> >>> 0000000002487034 >> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: >0000 >> >>> ss: 0000 cs: e008 >> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >> >>> (vmx_intr_assist+0x35e/0x51d): >> >>> Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 >> >>> 89 df e8 51 >> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack >> >>> trace >> >> >from rsp=ffff83047de2fea8: >> >>> Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >> >>> ffff83047de2ffff ffff83023fec2000 >> >>> Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >> >>> ffff830079e1e000 ffff830079e1e000 >> >>> Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 >000000000000000e >> >>> ffff830233117000 ffff83023fec2000 >> >>> Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >> >>> 0000000000000004 00000000000000c2 >> >>> Jan 12 01:25:37.261584 (XEN) 0000000000000020 >0000000000000007 >> >>> ffff8800e8d28000 ffffffff81add0a0 >> >>> Jan 12 01:25:37.269607 (XEN) 0000000000000246 >0000000000000000 >> >>> ffff880142400008 0000000000000004 >> >>> Jan 12 01:25:37.277580 (XEN) 0000000000000036 >0000000000000000 >> >>> 00000000000003f8 00000000000003f8 >> >>> Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >> >>> ffffffff813899a4 000000bf0000beef >> >>> Jan 12 01:25:37.293567 (XEN) 0000000000000002 >ffff880147c03e08 >> >>> 000000000000beef 1cec835356e5beef >> >>> Jan 12 01:25:37.293606 (XEN) 085d8b002674beef >01dcb38b000cbeef >> >>> 8914458d3174beef 2444c7100000000e >> >>> Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 >00000031bfc37600 >> >>> 00000000003526e0 >> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace: >> >>> Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >> >>> vmx_intr_assist+0x35e/0x51d >> >>> Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >> >>> vmx_asm_vmexit_handler+0x41/0x120 >> >>> Jan 12 01:25:37.325598 (XEN) >> >>> Jan 12 01:25:37.325624 (XEN) >> >>> Jan 12 01:25:37.325647 (XEN) >> >>> **************************************** >> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= >> >>> pt_vector' failed at >> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN) >> >>> **************************************** >> >>> Jan 12 01:25:37.341603 (XEN) >> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >> >>> RESET_REG. >> >>> >> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv >> >>> issue". It is not a deterministic issue, as it appears to have >> >>> survived a week of testing already, but there is clearly something still >problematic with the code. >> >>> >> >> >> >> Andrew, >> >> If you have, could you give more information? >> > >> > No further information sorry. This was found by the automated test >system. >> >> But some can be gathered: >> >> > Full logs are available from >> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i >> > 386-xl-q >> > emuu-debianhvm-amd64/ >> > but I doubt any of them will help in diagnosing the issue any further. >> > >> >> Such as the value of intack.vector / pt_vector.. >> >> At leastb one of the two values is likely to live in a register, and >> hence its value would be available in the dump. Just takes looking at >> the disassembly. >> >> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the >pt_vector is 'int'.. >> >> That would be odd. >> >> >> Or there is a corner case that intack.vector is __not__ the highest >priority vector.. >> >> That's what I'm afraid of, and why I had asked to add the ASSERT(). >> > >I cannot come up a valid reason for such situation (intack.vector is 0x30 >while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before >checking highest pending IRRs so pt_vector should be honored anyway. >One possible reason is that being some reason pt_vector is not in vIRR at >that point (due to some bug in the path from PIR to vIRR). However I didn't >catch such bug simply by looking at code. We need reproduce this problem >in developer side to find out actual reason. Andrew it'd be helpful if you >can help Quan/Chao to find out more test environment info. > I'll continue to follow up this issue.. However I don't have enough CPU-v3 machine for test it(occupied by another project).. I hope Chao could build some test environment.. Quan >One thing noted though. The original patch from Quan is actually >orthogonal to this ASSERT. Regardless of whether intack.vector is larger or >smaller than pt_vector, we always require the trick as long as pt_vector is >not the one being currently programmed to RVI. Then do we want to revert >the whole commit until the problem is finally fixed, or OK to just remove >ASSERT (or replace with WARN_ON with more debug info) to unblock test >system before the fix is ready? > >Thanks >Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 6:27 ` Xuquan (Quan Xu) @ 2017-01-16 0:06 ` Chao Gao 0 siblings, 0 replies; 32+ messages in thread From: Chao Gao @ 2017-01-16 0:06 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: Andrew Cooper, Tian, Kevin, osstest service owner, Jan Beulich, xen-devel On Mon, Jan 16, 2017 at 06:27:23AM +0000, Xuquan (Quan Xu) wrote: >On January 16, 2017 1:26 PM, Tian, Kevin wrote: >>> From: Jan Beulich [mailto:JBeulich@suse.com] >>> Sent: Thursday, January 12, 2017 8:26 PM >>> >>> >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote: >>> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote: >>> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote: >>> >>> On 12/01/2017 06:46, osstest service owner wrote: >>> >>>> flight 104131 xen-unstable real [real] >>> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ >>> >>>> >>> >>>> Regressions :-( >>> >>>> >>> >>>> Tests which did not succeed and are blocking, including tests >>> >>>> which could not be run: >>> >>>> test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop fail >>> >>> REGR. vs. 104119 >>> >>> >>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= >>> >>> pt_vector' failed at >>> >>> intr.c:321 >>> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable x86_64 >>> >>> debug=y Not tainted ]---- >>> >>> Jan 12 01:25:37.141577 (XEN) CPU: 14 >>> >>> Jan 12 01:25:37.141607 (XEN) RIP: e008:[<ffff82d0801ef7fc>] >>> >>> vmx_intr_assist+0x35e/0x51d >>> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202 >>CONTEXT: >>> >>> hypervisor (d15v0) >>> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038 rbx: >>> >>> ffff830079e1e000 rcx: 0000000000000030 >>> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000 rsi: >>> >>> 0000000000000030 rdi: ffff830079e1e000 >>> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08 rsp: >>ffff83047de2fea8 >>> >>> r8: ffff82c00022f000 >>> >>> Jan 12 01:25:37.173579 (XEN) r9: ffff8301b63ede80 r10: >>> >>> ffff830176386560 r11: 000001955ee79bd0 >>> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002 r13: >>> >>> 0000000000003002 r14: 0000000000000030 >>> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000 cr0: >>> >>> 0000000080050033 cr4: 00000000003526e0 >>> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000 cr2: >>> >>> 0000000002487034 >>> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000 es: 0000 fs: 0000 gs: >>0000 >>> >>> ss: 0000 cs: e008 >>> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> >>> >>> (vmx_intr_assist+0x35e/0x51d): >>> >>> Jan 12 01:25:37.213575 (XEN) 41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 >>> >>> 89 df e8 51 >>> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack >>> >>> trace >>> >> >from rsp=ffff83047de2fea8: >>> >>> Jan 12 01:25:37.229600 (XEN) ffff82d08031aa80 00000038ffffffff >>> >>> ffff83047de2ffff ffff83023fec2000 >>> >>> Jan 12 01:25:37.237594 (XEN) ffff83047de2fef8 ffff82d080130cb6 >>> >>> ffff830079e1e000 ffff830079e1e000 >>> >>> Jan 12 01:25:37.245588 (XEN) ffff83007bae2000 >>000000000000000e >>> >>> ffff830233117000 ffff83023fec2000 >>> >>> Jan 12 01:25:37.253594 (XEN) ffff83047de2fdc0 ffff82d0801fdeb1 >>> >>> 0000000000000004 00000000000000c2 >>> >>> Jan 12 01:25:37.261584 (XEN) 0000000000000020 >>0000000000000007 >>> >>> ffff8800e8d28000 ffffffff81add0a0 >>> >>> Jan 12 01:25:37.269607 (XEN) 0000000000000246 >>0000000000000000 >>> >>> ffff880142400008 0000000000000004 >>> >>> Jan 12 01:25:37.277580 (XEN) 0000000000000036 >>0000000000000000 >>> >>> 00000000000003f8 00000000000003f8 >>> >>> Jan 12 01:25:37.285584 (XEN) ffffffff81add0a0 0000beef0000beef >>> >>> ffffffff813899a4 000000bf0000beef >>> >>> Jan 12 01:25:37.293567 (XEN) 0000000000000002 >>ffff880147c03e08 >>> >>> 000000000000beef 1cec835356e5beef >>> >>> Jan 12 01:25:37.293606 (XEN) 085d8b002674beef >>01dcb38b000cbeef >>> >>> 8914458d3174beef 2444c7100000000e >>> >>> Jan 12 01:25:37.301586 (XEN) ffff830079e1e000 >>00000031bfc37600 >>> >>> 00000000003526e0 >>> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace: >>> >>> Jan 12 01:25:37.309639 (XEN) [<ffff82d0801ef7fc>] >>> >>> vmx_intr_assist+0x35e/0x51d >>> >>> Jan 12 01:25:37.317591 (XEN) [<ffff82d0801fdeb1>] >>> >>> vmx_asm_vmexit_handler+0x41/0x120 >>> >>> Jan 12 01:25:37.325598 (XEN) >>> >>> Jan 12 01:25:37.325624 (XEN) >>> >>> Jan 12 01:25:37.325647 (XEN) >>> >>> **************************************** >>> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14: >>> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= >>> >>> pt_vector' failed at >>> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN) >>> >>> **************************************** >>> >>> Jan 12 01:25:37.341603 (XEN) >>> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds... >>> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O >>> >>> RESET_REG. >>> >>> >>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv >>> >>> issue". It is not a deterministic issue, as it appears to have >>> >>> survived a week of testing already, but there is clearly something still >>problematic with the code. >>> >>> >>> >> >>> >> Andrew, >>> >> If you have, could you give more information? >>> > >>> > No further information sorry. This was found by the automated test >>system. >>> >>> But some can be gathered: >>> >>> > Full logs are available from >>> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i >>> > 386-xl-q >>> > emuu-debianhvm-amd64/ >>> > but I doubt any of them will help in diagnosing the issue any further. >>> > >>> >> Such as the value of intack.vector / pt_vector.. >>> >>> At leastb one of the two values is likely to live in a register, and >>> hence its value would be available in the dump. Just takes looking at >>> the disassembly. >>> >>> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the >>pt_vector is 'int'.. >>> >>> That would be odd. >>> >>> >> Or there is a corner case that intack.vector is __not__ the highest >>priority vector.. >>> >>> That's what I'm afraid of, and why I had asked to add the ASSERT(). >>> >> >>I cannot come up a valid reason for such situation (intack.vector is 0x30 >>while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before >>checking highest pending IRRs so pt_vector should be honored anyway. >>One possible reason is that being some reason pt_vector is not in vIRR at >>that point (due to some bug in the path from PIR to vIRR). However I didn't >>catch such bug simply by looking at code. We need reproduce this problem >>in developer side to find out actual reason. Andrew it'd be helpful if you >>can help Quan/Chao to find out more test environment info. >> > >I'll continue to follow up this issue.. >However I don't have enough CPU-v3 machine for test it(occupied by another project).. I hope Chao could build some test environment.. > No problem. When you come up with ideas or find some clues, I can verify them for you. thanks Chao > >Quan > > > > >>One thing noted though. The original patch from Quan is actually >>orthogonal to this ASSERT. Regardless of whether intack.vector is larger or >>smaller than pt_vector, we always require the trick as long as pt_vector is >>not the one being currently programmed to RVI. Then do we want to revert >>the whole commit until the problem is finally fixed, or OK to just remove >>ASSERT (or replace with WARN_ON with more debug info) to unblock test >>system before the fix is ready? >> >>Thanks >>Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 5:25 ` Tian, Kevin 2017-01-16 6:27 ` Xuquan (Quan Xu) @ 2017-01-16 11:00 ` Jan Beulich 2017-01-18 4:57 ` Tian, Kevin 2017-01-20 9:08 ` Xuquan (Quan Xu) 2017-01-23 10:56 ` Xuquan (Quan Xu) 3 siblings, 1 reply; 32+ messages in thread From: Jan Beulich @ 2017-01-16 11:00 UTC (permalink / raw) To: Xuquan (Quan Xu), Kevin Tian Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao >>> On 16.01.17 at 06:25, <kevin.tian@intel.com> wrote: > One thing noted though. The original patch from Quan is actually orthogonal > to this ASSERT. Regardless of whether intack.vector is larger or smaller > than pt_vector, we always require the trick as long as pt_vector is not the > one being currently programmed to RVI. I don't think the ASSERT() addition is orthogonal: It exchanges intack.vector for pt_vector in the invocation of vmx_set_eoi_exit_bitmap(), and during discussion of the patch there at least intermediately was max() of the two used instead. It was - iirc - one of you who suggested that the use of max() there is unnecessary, which the ASSERT() triggering has now shown is wrong. > Then do we want to revert the whole > commit until the problem is finally fixed, or OK to just remove ASSERT > (or replace with WARN_ON with more debug info) to unblock test system > before the fix is ready? Well, as the VMX maintainer I think the proposal of whether to revert or wait should really come from you. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 11:00 ` Jan Beulich @ 2017-01-18 4:57 ` Tian, Kevin 2017-01-18 9:38 ` Jan Beulich 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-01-18 4:57 UTC (permalink / raw) To: Jan Beulich, Xuquan (Quan Xu) Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Monday, January 16, 2017 7:00 PM > > >>> On 16.01.17 at 06:25, <kevin.tian@intel.com> wrote: > > One thing noted though. The original patch from Quan is actually orthogonal > > to this ASSERT. Regardless of whether intack.vector is larger or smaller > > than pt_vector, we always require the trick as long as pt_vector is not the > > one being currently programmed to RVI. > > I don't think the ASSERT() addition is orthogonal: It exchanges > intack.vector for pt_vector in the invocation of > vmx_set_eoi_exit_bitmap(), and during discussion of the patch > there at least intermediately was max() of the two used instead. > It was - iirc - one of you who suggested that the use of max() > there is unnecessary, which the ASSERT() triggering has now > shown is wrong. Attached was my earlier comment: -- > >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: > >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> Sent: Friday, December 16, 2016 5:40 PM > >> - if (pt_vector != -1) > >> - vmx_set_eoi_exit_bitmap(v, pt_vector); > >> + if ( pt_vector != -1 ) { > >> + if ( intack.vector > pt_vector ) > >> + vmx_set_eoi_exit_bitmap(v, intack.vector); > >> + else > >> + vmx_set_eoi_exit_bitmap(v, pt_vector); > >> + } > > > > Above can be simplified as one line change: > > if ( pt_vector != -1 ) > > vmx_set_eoi_exit_bitmap(v, intack.vector); > > Hmm, I don't understand. Did you mean to use max() here? Or > else how is this an equivalent of the originally proposed code? > Original code is not 100% correct. The purpose is to set EOI exit bitmap for any vector which may block injection of pt_vector - give chance to recognize pt_vector in future intack and then do pt intr post. The simplified code achieves this effect same as original code if intack.vector >= vector. I cannot come up a case why intack.vector might be smaller than vector. If this case happens, we still need enable exit bitmap for intack.vector instead of pt_vector for said purpose while original code did it wrong. Thanks Kevin -- Using intack.vector is always expected here regardless of the comparison result between intack.vector and pt_vector. The reason why I was OK adding an ASSERT was simply to test whether intack.vecor<pt_vector does happen which is orthogonal to the fix itself. > > > Then do we want to revert the whole > > commit until the problem is finally fixed, or OK to just remove ASSERT > > (or replace with WARN_ON with more debug info) to unblock test system > > before the fix is ready? > > Well, as the VMX maintainer I think the proposal of whether to > revert or wait should really come from you. > > Jan Andrew, how long do you usually tolerate a failure case in osstest? I'm not sure how long it may take for developer to reproduce this situation. If it has blocking impact in your side, I'd suggest go replacing ASSERT with more informative warn info before final root-cause, if Quan&Chao cannot reproduce in a short time (say 1 or 2wk or so). Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-18 4:57 ` Tian, Kevin @ 2017-01-18 9:38 ` Jan Beulich 2017-01-18 10:23 ` Tian, Kevin 2017-01-20 8:47 ` Xuquan (Quan Xu) 0 siblings, 2 replies; 32+ messages in thread From: Jan Beulich @ 2017-01-18 9:38 UTC (permalink / raw) To: Xuquan (Quan Xu), Kevin Tian Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote: > Attached was my earlier comment: > > -- >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> >> Sent: Friday, December 16, 2016 5:40 PM >> >> - if (pt_vector != -1) >> >> - vmx_set_eoi_exit_bitmap(v, pt_vector); >> >> + if ( pt_vector != -1 ) { >> >> + if ( intack.vector > pt_vector ) >> >> + vmx_set_eoi_exit_bitmap(v, intack.vector); >> >> + else >> >> + vmx_set_eoi_exit_bitmap(v, pt_vector); >> >> + } >> > >> > Above can be simplified as one line change: >> > if ( pt_vector != -1 ) >> > vmx_set_eoi_exit_bitmap(v, intack.vector); >> >> Hmm, I don't understand. Did you mean to use max() here? Or >> else how is this an equivalent of the originally proposed code? >> > > Original code is not 100% correct. The purpose is to set EOI exit > bitmap for any vector which may block injection of pt_vector - > give chance to recognize pt_vector in future intack and then do pt > intr post. The simplified code achieves this effect same as original > code if intack.vector >= vector. I cannot come up a case why > intack.vector might be smaller than vector. If this case happens, > we still need enable exit bitmap for intack.vector instead of > pt_vector for said purpose while original code did it wrong. > > Thanks > Kevin > -- > > Using intack.vector is always expected here regardless of the > comparison result between intack.vector and pt_vector. The > reason why I was OK adding an ASSERT was simply to test > whether intack.vecor<pt_vector does happen which is > orthogonal to the fix itself. Well, a vector lower than pt_vector can't block delivery. Or wait: Don't we need to consider vector classes here, i.e. ASSERT((intack.vector >> 4) >= (pt_vector >> 4)); ? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-18 9:38 ` Jan Beulich @ 2017-01-18 10:23 ` Tian, Kevin 2017-01-20 11:48 ` Jan Beulich 2017-01-20 8:47 ` Xuquan (Quan Xu) 1 sibling, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-01-18 10:23 UTC (permalink / raw) To: Jan Beulich, Xuquan (Quan Xu) Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Wednesday, January 18, 2017 5:38 PM > > >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote: > > Attached was my earlier comment: > > > > -- > >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: > >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> >> Sent: Friday, December 16, 2016 5:40 PM > >> >> - if (pt_vector != -1) > >> >> - vmx_set_eoi_exit_bitmap(v, pt_vector); > >> >> + if ( pt_vector != -1 ) { > >> >> + if ( intack.vector > pt_vector ) > >> >> + vmx_set_eoi_exit_bitmap(v, intack.vector); > >> >> + else > >> >> + vmx_set_eoi_exit_bitmap(v, pt_vector); > >> >> + } > >> > > >> > Above can be simplified as one line change: > >> > if ( pt_vector != -1 ) > >> > vmx_set_eoi_exit_bitmap(v, intack.vector); > >> > >> Hmm, I don't understand. Did you mean to use max() here? Or > >> else how is this an equivalent of the originally proposed code? > >> > > > > Original code is not 100% correct. The purpose is to set EOI exit > > bitmap for any vector which may block injection of pt_vector - > > give chance to recognize pt_vector in future intack and then do pt > > intr post. The simplified code achieves this effect same as original > > code if intack.vector >= vector. I cannot come up a case why > > intack.vector might be smaller than vector. If this case happens, > > we still need enable exit bitmap for intack.vector instead of > > pt_vector for said purpose while original code did it wrong. > > > > Thanks > > Kevin > > -- > > > > Using intack.vector is always expected here regardless of the > > comparison result between intack.vector and pt_vector. The > > reason why I was OK adding an ASSERT was simply to test > > whether intack.vecor<pt_vector does happen which is > > orthogonal to the fix itself. > > Well, a vector lower than pt_vector can't block delivery. Or wait: There are two points here: a) We need enable EOI exit bitmap when pt_vector is blocked. b) As you said, ideally a vector lower than pt_vecotr cannot block The patch fixed a) and then added an ASSERT to verify b). Strictly speaking they are separate issues. > Don't we need to consider vector classes here, i.e. > > ASSERT((intack.vector >> 4) >= (pt_vector >> 4)); > > ? > However it still doesn't explain why original ASSERT is triggered. vlapic_find_highest_vector actually finds the highest vector, instead of highest class... static int vlapic_find_highest_vector(const void *bitmap) { const uint32_t *word = bitmap; unsigned int word_offset = NR_VECTORS / 32; /* Work backwards through the bitmap (first 32-bit word in every four). */ while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) ) continue; return (fls(word[word_offset*4]) - 1) + (word_offset * 32); } Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-18 10:23 ` Tian, Kevin @ 2017-01-20 11:48 ` Jan Beulich 2017-01-22 4:35 ` Tian, Kevin 0 siblings, 1 reply; 32+ messages in thread From: Jan Beulich @ 2017-01-20 11:48 UTC (permalink / raw) To: Kevin Tian Cc: Andrew Cooper, Xuquan (Quan Xu), xen-devel, osstest service owner, Chao Gao >>> On 18.01.17 at 11:23, <kevin.tian@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Wednesday, January 18, 2017 5:38 PM >> >> >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote: >> > Attached was my earlier comment: >> > >> > -- >> >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: >> >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> >> >> Sent: Friday, December 16, 2016 5:40 PM >> >> >> - if (pt_vector != -1) >> >> >> - vmx_set_eoi_exit_bitmap(v, pt_vector); >> >> >> + if ( pt_vector != -1 ) { >> >> >> + if ( intack.vector > pt_vector ) >> >> >> + vmx_set_eoi_exit_bitmap(v, intack.vector); >> >> >> + else >> >> >> + vmx_set_eoi_exit_bitmap(v, pt_vector); >> >> >> + } >> >> > >> >> > Above can be simplified as one line change: >> >> > if ( pt_vector != -1 ) >> >> > vmx_set_eoi_exit_bitmap(v, intack.vector); >> >> >> >> Hmm, I don't understand. Did you mean to use max() here? Or >> >> else how is this an equivalent of the originally proposed code? >> >> >> > >> > Original code is not 100% correct. The purpose is to set EOI exit >> > bitmap for any vector which may block injection of pt_vector - >> > give chance to recognize pt_vector in future intack and then do pt >> > intr post. The simplified code achieves this effect same as original >> > code if intack.vector >= vector. I cannot come up a case why >> > intack.vector might be smaller than vector. If this case happens, >> > we still need enable exit bitmap for intack.vector instead of >> > pt_vector for said purpose while original code did it wrong. >> > >> > Thanks >> > Kevin >> > -- >> > >> > Using intack.vector is always expected here regardless of the >> > comparison result between intack.vector and pt_vector. The >> > reason why I was OK adding an ASSERT was simply to test >> > whether intack.vecor<pt_vector does happen which is >> > orthogonal to the fix itself. >> >> Well, a vector lower than pt_vector can't block delivery. Or wait: > > There are two points here: > > a) We need enable EOI exit bitmap when pt_vector is blocked. > > b) As you said, ideally a vector lower than pt_vecotr cannot block > > The patch fixed a) and then added an ASSERT to verify b). Strictly > speaking they are separate issues. Okay, I think I finally understand your argumentation here. >> Don't we need to consider vector classes here, i.e. >> >> ASSERT((intack.vector >> 4) >= (pt_vector >> 4)); >> >> ? >> > > However it still doesn't explain why original ASSERT is triggered. > vlapic_find_highest_vector actually finds the highest vector, instead > of highest class... > > static int vlapic_find_highest_vector(const void *bitmap) > { > const uint32_t *word = bitmap; > unsigned int word_offset = NR_VECTORS / 32; > > /* Work backwards through the bitmap (first 32-bit word in every four). */ > while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) ) > continue; > > return (fls(word[word_offset*4]) - 1) + (word_offset * 32); > } Well, perhaps a PIR -> IRR syncing issue then (I in particular note the early bailing from vmx_sync_pir_to_irr())? I guess we'd want to see the entire IRR array (and perhaps also PI state) if the check in the assertion fails. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-20 11:48 ` Jan Beulich @ 2017-01-22 4:35 ` Tian, Kevin 2017-01-23 10:33 ` Jan Beulich 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-01-22 4:35 UTC (permalink / raw) To: Jan Beulich Cc: Andrew Cooper, Xuquan (Quan Xu), xen-devel, osstest service owner, Gao, Chao > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Friday, January 20, 2017 7:49 PM > > >>> On 18.01.17 at 11:23, <kevin.tian@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Wednesday, January 18, 2017 5:38 PM > >> > >> >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote: > >> > Attached was my earlier comment: > >> > > >> > -- > >> >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: > >> >> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > >> >> >> Sent: Friday, December 16, 2016 5:40 PM > >> >> >> - if (pt_vector != -1) > >> >> >> - vmx_set_eoi_exit_bitmap(v, pt_vector); > >> >> >> + if ( pt_vector != -1 ) { > >> >> >> + if ( intack.vector > pt_vector ) > >> >> >> + vmx_set_eoi_exit_bitmap(v, intack.vector); > >> >> >> + else > >> >> >> + vmx_set_eoi_exit_bitmap(v, pt_vector); > >> >> >> + } > >> >> > > >> >> > Above can be simplified as one line change: > >> >> > if ( pt_vector != -1 ) > >> >> > vmx_set_eoi_exit_bitmap(v, intack.vector); > >> >> > >> >> Hmm, I don't understand. Did you mean to use max() here? Or > >> >> else how is this an equivalent of the originally proposed code? > >> >> > >> > > >> > Original code is not 100% correct. The purpose is to set EOI exit > >> > bitmap for any vector which may block injection of pt_vector - > >> > give chance to recognize pt_vector in future intack and then do pt > >> > intr post. The simplified code achieves this effect same as original > >> > code if intack.vector >= vector. I cannot come up a case why > >> > intack.vector might be smaller than vector. If this case happens, > >> > we still need enable exit bitmap for intack.vector instead of > >> > pt_vector for said purpose while original code did it wrong. > >> > > >> > Thanks > >> > Kevin > >> > -- > >> > > >> > Using intack.vector is always expected here regardless of the > >> > comparison result between intack.vector and pt_vector. The > >> > reason why I was OK adding an ASSERT was simply to test > >> > whether intack.vecor<pt_vector does happen which is > >> > orthogonal to the fix itself. > >> > >> Well, a vector lower than pt_vector can't block delivery. Or wait: > > > > There are two points here: > > > > a) We need enable EOI exit bitmap when pt_vector is blocked. > > > > b) As you said, ideally a vector lower than pt_vecotr cannot block > > > > The patch fixed a) and then added an ASSERT to verify b). Strictly > > speaking they are separate issues. > > Okay, I think I finally understand your argumentation here. > > >> Don't we need to consider vector classes here, i.e. > >> > >> ASSERT((intack.vector >> 4) >= (pt_vector >> 4)); > >> > >> ? > >> > > > > However it still doesn't explain why original ASSERT is triggered. > > vlapic_find_highest_vector actually finds the highest vector, instead > > of highest class... > > > > static int vlapic_find_highest_vector(const void *bitmap) > > { > > const uint32_t *word = bitmap; > > unsigned int word_offset = NR_VECTORS / 32; > > > > /* Work backwards through the bitmap (first 32-bit word in every four). */ > > while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) ) > > continue; > > > > return (fls(word[word_offset*4]) - 1) + (word_offset * 32); > > } > > Well, perhaps a PIR -> IRR syncing issue then (I in particular note > the early bailing from vmx_sync_pir_to_irr())? I guess we'd want > to see the entire IRR array (and perhaps also PI state) if the check > in the assertion fails. > Yes, I asked Chao to add some debug info in that case. The problem now is when we will reproduce the bug to see meaningful hint... Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-22 4:35 ` Tian, Kevin @ 2017-01-23 10:33 ` Jan Beulich 0 siblings, 0 replies; 32+ messages in thread From: Jan Beulich @ 2017-01-23 10:33 UTC (permalink / raw) To: Chao Gao, Kevin Tian Cc: Andrew Cooper, Xuquan (Quan Xu), osstest service owner, xen-devel >>> On 22.01.17 at 05:35, <kevin.tian@intel.com> wrote: > Yes, I asked Chao to add some debug info in that case. The problem now > is when we will reproduce the bug to see meaningful hint... If written reasonably, feel free to submit the patch for considering to put it into staging for a while. The bigger problem is that we'd then need to constantly scan logs for incidents. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-18 9:38 ` Jan Beulich 2017-01-18 10:23 ` Tian, Kevin @ 2017-01-20 8:47 ` Xuquan (Quan Xu) 2017-01-20 8:56 ` Jan Beulich 1 sibling, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-01-20 8:47 UTC (permalink / raw) To: Jan Beulich, Kevin Tian Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao On January 18, 2017 5:38 PM, Jan Beulich wrote: >>>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote: >> Attached was my earlier comment: >> >> -- >>> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote: >>> >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >>> >> Sent: Friday, December 16, 2016 5:40 PM >>> >> - if (pt_vector != -1) >>> >> - vmx_set_eoi_exit_bitmap(v, pt_vector); >>> >> + if ( pt_vector != -1 ) { >>> >> + if ( intack.vector > pt_vector ) >>> >> + vmx_set_eoi_exit_bitmap(v, intack.vector); >>> >> + else >>> >> + vmx_set_eoi_exit_bitmap(v, pt_vector); >>> >> + } >>> > >>> > Above can be simplified as one line change: >>> > if ( pt_vector != -1 ) >>> > vmx_set_eoi_exit_bitmap(v, intack.vector); >>> >>> Hmm, I don't understand. Did you mean to use max() here? Or else how >>> is this an equivalent of the originally proposed code? >>> >> >> Original code is not 100% correct. The purpose is to set EOI exit >> bitmap for any vector which may block injection of pt_vector - give >> chance to recognize pt_vector in future intack and then do pt intr >> post. The simplified code achieves this effect same as original code >> if intack.vector >= vector. I cannot come up a case why intack.vector >> might be smaller than vector. If this case happens, we still need >> enable exit bitmap for intack.vector instead of pt_vector for said >> purpose while original code did it wrong. >> >> Thanks >> Kevin >> -- >> >> Using intack.vector is always expected here regardless of the >> comparison result between intack.vector and pt_vector. The reason why >> I was OK adding an ASSERT was simply to test whether >> intack.vecor<pt_vector does happen which is orthogonal to the fix >> itself. > >Well, a vector lower than pt_vector can't block delivery. Or wait: >Don't we need to consider vector classes here, i.e. > > ASSERT((intack.vector >> 4) >= (pt_vector >> 4)); > Jan, I can't follow vector classes.. could you explain more? Thanks.. Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-20 8:47 ` Xuquan (Quan Xu) @ 2017-01-20 8:56 ` Jan Beulich 0 siblings, 0 replies; 32+ messages in thread From: Jan Beulich @ 2017-01-20 8:56 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: Andrew Cooper, Kevin Tian, xen-devel, osstest service owner, Chao Gao >>> On 20.01.17 at 09:47, <xuquan8@huawei.com> wrote: > Jan, I can't follow vector classes.. could you explain more? Thanks.. For determining vector priority, the LAPIC uses only the high 4 bits. Iirc this is well documented in the SDM. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 5:25 ` Tian, Kevin 2017-01-16 6:27 ` Xuquan (Quan Xu) 2017-01-16 11:00 ` Jan Beulich @ 2017-01-20 9:08 ` Xuquan (Quan Xu) 2017-01-23 10:56 ` Xuquan (Quan Xu) 3 siblings, 0 replies; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-01-20 9:08 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao On January 16, 2017 1:26 PM, Tian, Kevin wrote: >I cannot come up a valid reason for such situation (intack.vector is 0x30 while >pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before checking >highest pending IRRs so pt_vector should be honored anyway. >One possible reason is that being some reason pt_vector is not in vIRR at that >point (due to some bug in the path from PIR to vIRR). btw, for PIR.. I find that there might be a bug in __vmx_deliver_posted_interrupt()... why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ?? static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ... if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ... } Suppose that vCPUx is in guest mode, there are two (even more) interrupts to vCPUx.. As the bit is set when delivers the first interrupt... the second interrupt is pending until next VM entry -- by PIR to vIRR.. Quan However I didn't catch >such bug simply by looking at code. We need reproduce this problem in >developer side to find out actual reason. Andrew it'd be helpful if you can help >Quan/Chao to find out more test environment info. > >One thing noted though. The original patch from Quan is actually orthogonal to >this ASSERT. Regardless of whether intack.vector is larger or smaller than >pt_vector, we always require the trick as long as pt_vector is not the one being >currently programmed to RVI. Then do we want to revert the whole commit >until the problem is finally fixed, or OK to just remove ASSERT (or replace with >WARN_ON with more debug info) to unblock test system before the fix is >ready? > >Thanks >Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-16 5:25 ` Tian, Kevin ` (2 preceding siblings ...) 2017-01-20 9:08 ` Xuquan (Quan Xu) @ 2017-01-23 10:56 ` Xuquan (Quan Xu) 2017-02-08 6:51 ` Tian, Kevin 3 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-01-23 10:56 UTC (permalink / raw) To: Xuquan (Quan Xu), Tian, Kevin, Jan Beulich Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao On January 20, 2017 5:09 PM, Quan Xu wrote: >btw, for PIR.. I find that there might be a bug in >__vmx_deliver_posted_interrupt()... >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ?? > >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ... > if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) ... >} > >Suppose that vCPUx is in guest mode, there are two (even more) interrupts >to vCPUx.. >As the bit is set when delivers the first interrupt... the second interrupt is >pending until next VM entry -- by PIR to vIRR.. > Jan , Kevin Correct me if I am wrong... Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-01-23 10:56 ` Xuquan (Quan Xu) @ 2017-02-08 6:51 ` Tian, Kevin 2017-02-08 8:27 ` Xuquan (Quan Xu) 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-02-08 6:51 UTC (permalink / raw) To: Xuquan (Quan Xu), Jan Beulich Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao > From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, January 23, 2017 6:57 PM > > On January 20, 2017 5:09 PM, Quan Xu wrote: > >btw, for PIR.. I find that there might be a bug in > >__vmx_deliver_posted_interrupt()... > >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ?? > > > >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ... > > if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, > >&softirq_pending(cpu)) ... > >} > > > >Suppose that vCPUx is in guest mode, there are two (even more) interrupts > >to vCPUx.. > >As the bit is set when delivers the first interrupt... the second interrupt is > >pending until next VM entry -- by PIR to vIRR.. > > > > Jan , Kevin > Correct me if I am wrong... > > Quan I don't quite understand the point here. Can you elaborate? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 6:51 ` Tian, Kevin @ 2017-02-08 8:27 ` Xuquan (Quan Xu) 2017-02-08 8:52 ` Jan Beulich 0 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-02-08 8:27 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao On February 08, 2017 2:51 PM, Tian, Kevin wrote: >> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] >> Sent: Monday, January 23, 2017 6:57 PM >> >> On January 20, 2017 5:09 PM, Quan Xu wrote: >> >btw, for PIR.. I find that there might be a bug in >> >__vmx_deliver_posted_interrupt()... >> >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ?? >> > >> >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ... >> > if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >> >&softirq_pending(cpu)) ... >> >} >> > >> >Suppose that vCPUx is in guest mode, there are two (even more) >> >interrupts to vCPUx.. >> >As the bit is set when delivers the first interrupt... the second >> >interrupt is pending until next VM entry -- by PIR to vIRR.. >> > >> >> Jan , Kevin >> Correct me if I am wrong... >> >> Quan > >I don't quite understand the point here. Can you elaborate? > Assumed vCPU is in guest_mode.. When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no vcpu_kick() ).. In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted interrupt. if posted interrupt is not delivered, the posted interrupt is pending until next VM entry -- by PIR to vIRR.. one condition is : In __vmx_deliver_posted_interrupt(), ' if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. Specifically, we did verify it by RES interrupt, which is used for smp_reschedule_interrupt.. We even cost more time to deliver RES interrupt than no-apicv in average.. If RES interrupt (no. 1) is delivered by posted way (the vcpu is still guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by PIR to vIRR.. We can fix it as below(I don't think this is a best one, it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it): --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) { unsigned int cpu = v->processor; - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) && (cpu != smp_processor_id()) ) send_IPI_mask(cpumask_of(cpu), posted_intr_vector); } To be honest, I really spent several days for the original awkward-description:).. Happy Spring Festival!! Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 8:27 ` Xuquan (Quan Xu) @ 2017-02-08 8:52 ` Jan Beulich 2017-02-08 10:15 ` Xuquan (Quan Xu) ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Jan Beulich @ 2017-02-08 8:52 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, xen-devel, Chao Gao >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > Assumed vCPU is in guest_mode.. > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > vcpu_kick() ).. > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > interrupt. if posted interrupt is not delivered, the posted interrupt is > pending until next VM entry -- by PIR to vIRR.. > > one condition is : > In __vmx_deliver_posted_interrupt(), ' if ( > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > Specifically, we did verify it by RES interrupt, which is used for > smp_reschedule_interrupt.. > We even cost more time to deliver RES interrupt than no-apicv in average.. > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > posted way, > The next-coming RES interrupt (no. 2) is not delivered, as we set the > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > PIR to vIRR.. > > > We can fix it as below(I don't think this is a best one, it is better to set > the VCPU_KICK_SOFTIRQ bit, but not test it): > > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > { > unsigned int cpu = v->processor; > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > && (cpu != smp_processor_id()) ) > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > } While I don't think I fully understand your description, the line you change here has always been puzzling me: If we were to raise a softirq here, we ought to call cpu_raise_softirq() instead of partly open coding what it does. So I think not marking that softirq pending (but doing this incompletely) is a valid change in any case. But I'll have to defer to Kevin in the hopes that he fully understands what you explain above as well as him knowing why this was a test-and-set here in the first place. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 8:52 ` Jan Beulich @ 2017-02-08 10:15 ` Xuquan (Quan Xu) 2017-02-08 8:22 ` Chao Gao 2017-02-13 8:21 ` Tian, Kevin 2017-02-13 8:23 ` Tian, Kevin 2 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-02-08 10:15 UTC (permalink / raw) To: Jan Beulich, Kevin Tian; +Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Chao Gao On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> Assumed vCPU is in guest_mode.. >> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> then >> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> (also no >> vcpu_kick() ).. >> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >> posted interrupt. if posted interrupt is not delivered, the posted >> interrupt is pending until next VM entry -- by PIR to vIRR.. >> >> one condition is : >> In __vmx_deliver_posted_interrupt(), ' if ( >> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> >> Specifically, we did verify it by RES interrupt, which is used for >> smp_reschedule_interrupt.. >> We even cost more time to deliver RES interrupt than no-apicv in >average.. >> >> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >> 1).. >> >> Then the next-coming RES interrupt (no. 2) is pending until next VM >> entry -- by PIR to vIRR.. >> >> >> We can fix it as below(I don't think this is a best one, it is better >> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> { >> unsigned int cpu = v->processor; >> >> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> && (cpu != smp_processor_id()) ) >> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> } > >While I don't think I fully understand your description, Sorry!! >the line you change >here has always been puzzling me: If we were to raise a softirq here, we >ought to call cpu_raise_softirq() instead of partly open coding what it does. >So I think not marking that softirq pending (but doing this incompletely) is >a valid change in any case. As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c (((( * * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will * be synced to vIRR before VM-Exit in time. * )))) I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. >But I'll have to defer to Kevin in the hopes that he fully understands what >you explain above as well as him knowing why this was a test-and-set here >in the first place. > To me, this test-and-set is a bug. Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 10:15 ` Xuquan (Quan Xu) @ 2017-02-08 8:22 ` Chao Gao 2017-02-09 8:51 ` Xuquan (Quan Xu) 0 siblings, 1 reply; 32+ messages in thread From: Chao Gao @ 2017-02-08 8:22 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>> Assumed vCPU is in guest_mode.. >>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>> then >>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>> (also no >>> vcpu_kick() ).. >>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver >>> posted interrupt. if posted interrupt is not delivered, the posted >>> interrupt is pending until next VM entry -- by PIR to vIRR.. >>> >>> one condition is : >>> In __vmx_deliver_posted_interrupt(), ' if ( >>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>> >>> Specifically, we did verify it by RES interrupt, which is used for >>> smp_reschedule_interrupt.. >>> We even cost more time to deliver RES interrupt than no-apicv in >>average.. >>> >>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still >>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) >>> by posted way, The next-coming RES interrupt (no. 2) is not delivered, >>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. >>> 1).. >>> >>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>> entry -- by PIR to vIRR.. >>> >>> >>> We can fix it as below(I don't think this is a best one, it is better >>> to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>> @@ -1846,7 +1846,7 @@ static void >>__vmx_deliver_posted_interrupt(struct vcpu *v) >>> { >>> unsigned int cpu = v->processor; >>> >>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>&softirq_pending(cpu)) >>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>> && (cpu != smp_processor_id()) ) >>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>> } >> >>While I don't think I fully understand your description, > >Sorry!! > >>the line you change >>here has always been puzzling me: If we were to raise a softirq here, we >>ought to call cpu_raise_softirq() instead of partly open coding what it does. >>So I think not marking that softirq pending (but doing this incompletely) is >>a valid change in any case. > >As comments in pi_notification_interrupt() -- xen/arch/x86/hvm/vmx/vmx.c >(((( > * > * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like > * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will > * be synced to vIRR before VM-Exit in time. > * >)))) > >I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. >That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it.. > I think there is a typo. It should be "before VM-Entry in time". It set VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of entering guest directly. Jumping to vmx_do_vmentry again can re-sync the PIR to vIRR in vmx_intr_assist(). In root-mode, cpu treat the pi notification interrupt as normal interrupt, so cpu will run the interrupt handler pi_notification_interrupt() instead of syncing PIR to vIRR automatically. Receiving a pi notificatio interrupt means some interrupts have been posted in PIR. Setting that bit is to deliver these new arrival interrupts before this VM-entry. Thanks chao > >>But I'll have to defer to Kevin in the hopes that he fully understands what >>you explain above as well as him knowing why this was a test-and-set here >>in the first place. >> > >To me, this test-and-set is a bug. > >Quan > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 8:22 ` Chao Gao @ 2017-02-09 8:51 ` Xuquan (Quan Xu) 2017-02-09 3:41 ` Chao Gao 0 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-02-09 8:51 UTC (permalink / raw) To: Chao Gao; +Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel On February 08, 2017 4:22 PM, Chao Gao wrote: >On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>> Assumed vCPU is in guest_mode.. >>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>> then >>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>> (also no >>>> vcpu_kick() ).. >>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>> >>>> one condition is : >>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>> >>>> Specifically, we did verify it by RES interrupt, which is used for >>>> smp_reschedule_interrupt.. >>>> We even cost more time to deliver RES interrupt than no-apicv in >>>average.. >>>> >>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >interrupt (no. >>>> 1).. >>>> >>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>> entry -- by PIR to vIRR.. >>>> >>>> >>>> We can fix it as below(I don't think this is a best one, it is >>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>> >>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>> @@ -1846,7 +1846,7 @@ static void >>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>> { >>>> unsigned int cpu = v->processor; >>>> >>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>&softirq_pending(cpu)) >>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>> && (cpu != smp_processor_id()) ) >>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>> } >>> >>>While I don't think I fully understand your description, >> >>Sorry!! >> >>>the line you change >>>here has always been puzzling me: If we were to raise a softirq here, >>>we ought to call cpu_raise_softirq() instead of partly open coding what it >does. >>>So I think not marking that softirq pending (but doing this >>>incompletely) is a valid change in any case. >> >>As comments in pi_notification_interrupt() -- >>xen/arch/x86/hvm/vmx/vmx.c (((( >> * >> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >PIRR will >> * be synced to vIRR before VM-Exit in time. >> * >>)))) >> >>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >be synced to vIRR before VM-Exit in time. >>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >not test it.. >> > >I think there is a typo. It should be "before VM-Entry in time". It set >VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >PIR to vIRR in vmx_intr_assist(). impressive analysis.. chao, could you show the related code? Quan >In root-mode, cpu treat the pi notification >interrupt as normal interrupt, so cpu will run the interrupt handler >pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >Receiving a pi notificatio interrupt means some interrupts have been >posted in PIR. Setting that bit is to deliver these new arrival interrupts >before this VM-entry. > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-09 8:51 ` Xuquan (Quan Xu) @ 2017-02-09 3:41 ` Chao Gao 0 siblings, 0 replies; 32+ messages in thread From: Chao Gao @ 2017-02-09 3:41 UTC (permalink / raw) To: Xuquan (Quan Xu) Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel On Thu, Feb 09, 2017 at 08:51:46AM +0000, Xuquan (Quan Xu) wrote: >On February 08, 2017 4:22 PM, Chao Gao wrote: >>On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote: >>>On February 08, 2017 4:52 PM, Jan Beulich wrote: >>>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >>>>> Assumed vCPU is in guest_mode.. >>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >>>>> then >>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >>>>> (also no >>>>> vcpu_kick() ).. >>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to >>>>> deliver posted interrupt. if posted interrupt is not delivered, the >>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR.. >>>>> >>>>> one condition is : >>>>> In __vmx_deliver_posted_interrupt(), ' if ( >>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >>>>> >>>>> Specifically, we did verify it by RES interrupt, which is used for >>>>> smp_reschedule_interrupt.. >>>>> We even cost more time to deliver RES interrupt than no-apicv in >>>>average.. >>>>> >>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is >>>>> still guest_mode).. when tries to deliver next-coming RES interrupt >>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >>interrupt (no. >>>>> 1).. >>>>> >>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM >>>>> entry -- by PIR to vIRR.. >>>>> >>>>> >>>>> We can fix it as below(I don't think this is a best one, it is >>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >>>>> >>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>>>> @@ -1846,7 +1846,7 @@ static void >>>>__vmx_deliver_posted_interrupt(struct vcpu *v) >>>>> { >>>>> unsigned int cpu = v->processor; >>>>> >>>>> - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >>>>&softirq_pending(cpu)) >>>>> + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >>>>> && (cpu != smp_processor_id()) ) >>>>> send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >>>>> } >>>> >>>>While I don't think I fully understand your description, >>> >>>Sorry!! >>> >>>>the line you change >>>>here has always been puzzling me: If we were to raise a softirq here, >>>>we ought to call cpu_raise_softirq() instead of partly open coding what it >>does. >>>>So I think not marking that softirq pending (but doing this >>>>incompletely) is a valid change in any case. >>> >>>As comments in pi_notification_interrupt() -- >>>xen/arch/x86/hvm/vmx/vmx.c (((( >>> * >>> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like >>> * __vmx_deliver_posted_interrupt(). So the pending interrupt in >>PIRR will >>> * be synced to vIRR before VM-Exit in time. >>> * >>>)))) >>> >>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will >>be synced to vIRR before VM-Exit in time. >>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but >>not test it.. >>> >> >>I think there is a typo. It should be "before VM-Entry in time". It set >>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of >>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the >>PIR to vIRR in vmx_intr_assist(). > >impressive analysis.. >chao, could you show the related code? > In xen/arch/x86/vmx/entry.S, .Lvmx_do_vmentry: call vmx_intr_assist ... cli cmp %ecx,(%rdx,%rax,1) jnz .Lvmx_process_softirqs and .Lvmx_process_softirqs: sti call do_softirq jmp .Lvmx_do_vmentry In vmx_intr_assist(), PIR is synced to vIRR. After vmx_intr_assist(), if some interrupts are posted in PIR, VCPU_KICK_SOFTIRQ is set in pi_nofitication_interrupt() and it will jump to vmx_process_softirqs and jump to vmx_do_vmentry again. Thanks Chao >Quan > > >>In root-mode, cpu treat the pi notification >>interrupt as normal interrupt, so cpu will run the interrupt handler >>pi_notification_interrupt() instead of syncing PIR to vIRR automatically. >>Receiving a pi notificatio interrupt means some interrupts have been >>posted in PIR. Setting that bit is to deliver these new arrival interrupts >>before this VM-entry. >> > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 8:52 ` Jan Beulich 2017-02-08 10:15 ` Xuquan (Quan Xu) @ 2017-02-13 8:21 ` Tian, Kevin 2017-02-20 12:04 ` Xuquan (Quan Xu) 2017-02-13 8:23 ` Tian, Kevin 2 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-02-13 8:21 UTC (permalink / raw) To: Jan Beulich, Xuquan (Quan Xu) Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Wednesday, February 08, 2017 4:52 PM > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > Assumed vCPU is in guest_mode.. > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > vcpu_kick() ).. > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > pending until next VM entry -- by PIR to vIRR.. > > > > one condition is : > > In __vmx_deliver_posted_interrupt(), ' if ( > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > Specifically, we did verify it by RES interrupt, which is used for > > smp_reschedule_interrupt.. > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > posted way, > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > PIR to vIRR.. > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > { > > unsigned int cpu = v->processor; > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > && (cpu != smp_processor_id()) ) > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > } > > While I don't think I fully understand your description, the line you > change here has always been puzzling me: If we were to raise a > softirq here, we ought to call cpu_raise_softirq() instead of partly > open coding what it does. We require posted_intr_vector for target CPU to ack/deliver virtual interrupt in non-root mode. cpu_raise_softirq uses a different vector, which cannot trigger such effect. > So I think not marking that softirq > pending (but doing this incompletely) is a valid change in any case. > But I'll have to defer to Kevin in the hopes that he fully > understands what you explain above as well as him knowing why > this was a test-and-set here in the first place. > I agree we have a misuse of softirq mechanism here. If guest is already in non-root mode, the 1st posted interrupt will be directly delivered to guest (leaving softirq being set w/o actually incurring a VM-exit - breaking desired softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR and not noted until another VM-exit happens. Looks Quan observes such delay of delivery in his experiments. I'm OK to remove the set here. Actually since it's an optimization for less IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to one bit only. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-13 8:21 ` Tian, Kevin @ 2017-02-20 12:04 ` Xuquan (Quan Xu) 2017-02-21 3:17 ` Tian, Kevin 0 siblings, 1 reply; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-02-20 12:04 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao On February 13, 2017 4:21 PM, Tian, Kevin wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Wednesday, February 08, 2017 4:52 PM >> >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > Assumed vCPU is in guest_mode.. >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > then >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit >> > (also no >> > vcpu_kick() ).. >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > deliver posted interrupt. if posted interrupt is not delivered, the >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > >> > one condition is : >> > In __vmx_deliver_posted_interrupt(), ' if ( >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > >> > Specifically, we did verify it by RES interrupt, which is used for >> > smp_reschedule_interrupt.. >> > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > still guest_mode).. when tries to deliver next-coming RES interrupt >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES >> > interrupt (no. 1).. >> > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM >> > entry -- by PIR to vIRR.. >> > >> > >> > We can fix it as below(I don't think this is a best one, it is >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > { >> > unsigned int cpu = v->processor; >> > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > && (cpu != smp_processor_id()) ) >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > } >> >> While I don't think I fully understand your description, the line you >> change here has always been puzzling me: If we were to raise a softirq >> here, we ought to call cpu_raise_softirq() instead of partly open >> coding what it does. > >We require posted_intr_vector for target CPU to ack/deliver virtual >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which >cannot trigger such effect. > Kevin, I can't follow this 'to ack'.. As I understand, the posted_intr_vector is to call event_check_interrupt() [ or pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- this dismisses the interrupt with the posted interrupt notification vector from the local APIC. What does this ack refer to? >> So I think not marking that softirq >> pending (but doing this incompletely) is a valid change in any case. >> But I'll have to defer to Kevin in the hopes that he fully understands >> what you explain above as well as him knowing why this was a >> test-and-set here in the first place. >> > >I agree we have a misuse of softirq mechanism here. If guest is already in >non-root mode, the 1st posted interrupt will be directly delivered to guest >(leaving softirq being set w/o actually incurring a VM-exit - breaking desired >softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR >and not noted until another VM-exit happens. Looks Quan observes such >delay of delivery in his experiments. > >I'm OK to remove the set here. Actually since it's an optimization for less >IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to >one bit only. > >Thanks >Kevin > > > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-20 12:04 ` Xuquan (Quan Xu) @ 2017-02-21 3:17 ` Tian, Kevin 0 siblings, 0 replies; 32+ messages in thread From: Tian, Kevin @ 2017-02-21 3:17 UTC (permalink / raw) To: Xuquan (Quan Xu), Jan Beulich Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao > From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com] > Sent: Monday, February 20, 2017 8:04 PM > > On February 13, 2017 4:21 PM, Tian, Kevin wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Wednesday, February 08, 2017 4:52 PM > >> > >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > >> > Assumed vCPU is in guest_mode.. > >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), > >> > then > >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit > >> > (also no > >> > vcpu_kick() ).. > >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to > >> > deliver posted interrupt. if posted interrupt is not delivered, the > >> > posted interrupt is pending until next VM entry -- by PIR to vIRR.. > >> > > >> > one condition is : > >> > In __vmx_deliver_posted_interrupt(), ' if ( > >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > >> > > >> > Specifically, we did verify it by RES interrupt, which is used for > >> > smp_reschedule_interrupt.. > >> > We even cost more time to deliver RES interrupt than no-apicv in > >average.. > >> > > >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is > >> > still guest_mode).. when tries to deliver next-coming RES interrupt > >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not > >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES > >> > interrupt (no. 1).. > >> > > >> > Then the next-coming RES interrupt (no. 2) is pending until next VM > >> > entry -- by PIR to vIRR.. > >> > > >> > > >> > We can fix it as below(I don't think this is a best one, it is > >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): > >> > > >> > --- a/xen/arch/x86/hvm/vmx/vmx.c > >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c > >> > @@ -1846,7 +1846,7 @@ static void > >__vmx_deliver_posted_interrupt(struct vcpu *v) > >> > { > >> > unsigned int cpu = v->processor; > >> > > >> > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, > >&softirq_pending(cpu)) > >> > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > >> > && (cpu != smp_processor_id()) ) > >> > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > >> > } > >> > >> While I don't think I fully understand your description, the line you > >> change here has always been puzzling me: If we were to raise a softirq > >> here, we ought to call cpu_raise_softirq() instead of partly open > >> coding what it does. > > > >We require posted_intr_vector for target CPU to ack/deliver virtual > >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which > >cannot trigger such effect. > > > > > Kevin, > > I can't follow this 'to ack'.. > As I understand, the posted_intr_vector is to call event_check_interrupt() [ or > pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC -- > this dismisses the interrupt with the posted interrupt notification vector from the local > APIC. > > What does this ack refer to? > Please look at SDM. 'ack' means evaluation of pending vIRRs when CPU is in non-root mode which results in direct virtual interrupt delivery w/o incurring VM-exit. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-08 8:52 ` Jan Beulich 2017-02-08 10:15 ` Xuquan (Quan Xu) 2017-02-13 8:21 ` Tian, Kevin @ 2017-02-13 8:23 ` Tian, Kevin 2017-02-14 1:22 ` Xuquan (Quan Xu) 2 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2017-02-13 8:23 UTC (permalink / raw) To: Jan Beulich, Xuquan (Quan Xu) Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao > From: Tian, Kevin > Sent: Monday, February 13, 2017 4:21 PM > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: Wednesday, February 08, 2017 4:52 PM > > > > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: > > > Assumed vCPU is in guest_mode.. > > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then > > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no > > > vcpu_kick() ).. > > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted > > > interrupt. if posted interrupt is not delivered, the posted interrupt is > > > pending until next VM entry -- by PIR to vIRR.. > > > > > > one condition is : > > > In __vmx_deliver_posted_interrupt(), ' if ( > > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. > > > > > > Specifically, we did verify it by RES interrupt, which is used for > > > smp_reschedule_interrupt.. > > > We even cost more time to deliver RES interrupt than no-apicv in average.. > > > > > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still > > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by > > > posted way, > > > The next-coming RES interrupt (no. 2) is not delivered, as we set the > > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1).. > > > > > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by > > > PIR to vIRR.. > > > > > > > > > We can fix it as below(I don't think this is a best one, it is better to set > > > the VCPU_KICK_SOFTIRQ bit, but not test it): > > > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > > { > > > unsigned int cpu = v->processor; > > > > > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) > > > && (cpu != smp_processor_id()) ) > > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); > > > } > > > > While I don't think I fully understand your description, the line you > > change here has always been puzzling me: If we were to raise a > > softirq here, we ought to call cpu_raise_softirq() instead of partly > > open coding what it does. > > We require posted_intr_vector for target CPU to ack/deliver virtual > interrupt in non-root mode. cpu_raise_softirq uses a different vector, > which cannot trigger such effect. > > > So I think not marking that softirq > > pending (but doing this incompletely) is a valid change in any case. > > But I'll have to defer to Kevin in the hopes that he fully > > understands what you explain above as well as him knowing why > > this was a test-and-set here in the first place. > > > > I agree we have a misuse of softirq mechanism here. If guest > is already in non-root mode, the 1st posted interrupt will be directly > delivered to guest (leaving softirq being set w/o actually incurring a > VM-exit - breaking desired softirq behavior). Then further posted > interrupts will skip the IPI, stay in PIR and not noted until another > VM-exit happens. Looks Quan observes such delay of delivery in > his experiments. > > I'm OK to remove the set here. Actually since it's an optimization > for less IPIs, we'd better check softirq_pending(cpu) directly > instead of sticking to one bit only. > sent too fast... Quan, can you work out a patch following this suggestion and see whether your slow-delivery issue is solved? (hope I understand your issue correctly here). Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [xen-unstable test] 104131: regressions - FAIL 2017-02-13 8:23 ` Tian, Kevin @ 2017-02-14 1:22 ` Xuquan (Quan Xu) 0 siblings, 0 replies; 32+ messages in thread From: Xuquan (Quan Xu) @ 2017-02-14 1:22 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao On February 13, 2017 4:24 PM, Tian, Kevin wrote: >> From: Tian, Kevin >> Sent: Monday, February 13, 2017 4:21 PM >> >> > From: Jan Beulich [mailto:JBeulich@suse.com] >> > Sent: Wednesday, February 08, 2017 4:52 PM >> > >> > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote: >> > > Assumed vCPU is in guest_mode.. >> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), >> > > then >> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no >> > > vmexit (also no >> > > vcpu_kick() ).. >> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to >> > > deliver posted interrupt. if posted interrupt is not delivered, >> > > the posted interrupt is pending until next VM entry -- by PIR to vIRR.. >> > > >> > > one condition is : >> > > In __vmx_deliver_posted_interrupt(), ' if ( >> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' .. >> > > >> > > Specifically, we did verify it by RES interrupt, which is used for >> > > smp_reschedule_interrupt.. >> > > We even cost more time to deliver RES interrupt than no-apicv in >average.. >> > > >> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is >> > > still guest_mode).. when tries to deliver next-coming RES >> > > interrupt (no. 2) by posted way, The next-coming RES interrupt >> > > (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when >> > > we deliver RES interrupt (no. 1).. >> > > >> > > Then the next-coming RES interrupt (no. 2) is pending until next >> > > VM entry -- by PIR to vIRR.. >> > > >> > > >> > > We can fix it as below(I don't think this is a best one, it is >> > > better to set the VCPU_KICK_SOFTIRQ bit, but not test it): >> > > >> > > --- a/xen/arch/x86/hvm/vmx/vmx.c >> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c >> > > @@ -1846,7 +1846,7 @@ static void >__vmx_deliver_posted_interrupt(struct vcpu *v) >> > > { >> > > unsigned int cpu = v->processor; >> > > >> > > - if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, >&softirq_pending(cpu)) >> > > + if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) >> > > && (cpu != smp_processor_id()) ) >> > > send_IPI_mask(cpumask_of(cpu), posted_intr_vector); >> > > } >> > >> > While I don't think I fully understand your description, the line >> > you change here has always been puzzling me: If we were to raise a >> > softirq here, we ought to call cpu_raise_softirq() instead of partly >> > open coding what it does. >> >> We require posted_intr_vector for target CPU to ack/deliver virtual >> interrupt in non-root mode. cpu_raise_softirq uses a different vector, >> which cannot trigger such effect. >> >> > So I think not marking that softirq >> > pending (but doing this incompletely) is a valid change in any case. >> > But I'll have to defer to Kevin in the hopes that he fully >> > understands what you explain above as well as him knowing why this >> > was a test-and-set here in the first place. >> > >> >> I agree we have a misuse of softirq mechanism here. If guest is >> already in non-root mode, the 1st posted interrupt will be directly >> delivered to guest (leaving softirq being set w/o actually incurring a >> VM-exit - breaking desired softirq behavior). Then further posted >> interrupts will skip the IPI, stay in PIR and not noted until another >> VM-exit happens. Looks Quan observes such delay of delivery in his >> experiments. >> >> I'm OK to remove the set here. Actually since it's an optimization for >> less IPIs, we'd better check softirq_pending(cpu) directly instead of >> sticking to one bit only. >> > >sent too fast... Quan, can you work out a patch following this suggestion >and see whether your slow-delivery issue is solved? >(hope I understand your issue correctly here). > Cool, Very Correct!! Sure, I will send out a patch in coming days.. Quan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2017-02-21 3:17 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-01-12 6:46 [xen-unstable test] 104131: regressions - FAIL osstest service owner 2017-01-12 9:14 ` Andrew Cooper 2017-01-12 12:07 ` Xuquan (Quan Xu) 2017-01-12 9:04 ` Chao Gao 2017-01-12 12:15 ` Andrew Cooper 2017-01-12 12:25 ` Jan Beulich 2017-01-16 5:25 ` Tian, Kevin 2017-01-16 6:27 ` Xuquan (Quan Xu) 2017-01-16 0:06 ` Chao Gao 2017-01-16 11:00 ` Jan Beulich 2017-01-18 4:57 ` Tian, Kevin 2017-01-18 9:38 ` Jan Beulich 2017-01-18 10:23 ` Tian, Kevin 2017-01-20 11:48 ` Jan Beulich 2017-01-22 4:35 ` Tian, Kevin 2017-01-23 10:33 ` Jan Beulich 2017-01-20 8:47 ` Xuquan (Quan Xu) 2017-01-20 8:56 ` Jan Beulich 2017-01-20 9:08 ` Xuquan (Quan Xu) 2017-01-23 10:56 ` Xuquan (Quan Xu) 2017-02-08 6:51 ` Tian, Kevin 2017-02-08 8:27 ` Xuquan (Quan Xu) 2017-02-08 8:52 ` Jan Beulich 2017-02-08 10:15 ` Xuquan (Quan Xu) 2017-02-08 8:22 ` Chao Gao 2017-02-09 8:51 ` Xuquan (Quan Xu) 2017-02-09 3:41 ` Chao Gao 2017-02-13 8:21 ` Tian, Kevin 2017-02-20 12:04 ` Xuquan (Quan Xu) 2017-02-21 3:17 ` Tian, Kevin 2017-02-13 8:23 ` Tian, Kevin 2017-02-14 1:22 ` Xuquan (Quan Xu)
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.