All of lore.kernel.org
 help / color / mirror / Atom feed
* [xen-unstable test] 104131: regressions - FAIL
@ 2017-01-12  6:46 osstest service owner
  2017-01-12  9:14 ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: osstest service owner @ 2017-01-12  6:46 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 104131 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104131/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail REGR. vs. 104119

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds    15 guest-start/debian.repeat fail REGR. vs. 104119
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check    fail  like 104104
 test-armhf-armhf-libvirt     13 saverestore-support-check    fail  like 104119
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop             fail like 104119
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop             fail like 104119
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop            fail like 104119
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop            fail like 104119
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 104119
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check    fail  like 104119
 test-amd64-amd64-xl-rtds      9 debian-install               fail  like 104119

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                  fail  never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check        fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-check        fail never pass
 test-armhf-armhf-xl-rtds     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     13 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check    fail  never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  0d045d65c19ac48b31344b566cbf82a0270e6e44
baseline version:
 xen                  ffc103c223a6d12e5221f66b7e96396a61ba1b20

Last test of basis   104119  2017-01-11 06:45:46 Z    0 days
Failing since        104126  2017-01-11 16:44:54 Z    0 days    2 attempts
Testing same since   104131  2017-01-11 22:43:41 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Kevin Tian <kevin.tian@intel.com>
  Stefano Stabellini <sstabellini@kernel.org>
  Wei Liu <wei.liu2@citrix.com>

jobs:
 build-amd64-xsm                                              pass    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          pass    
 build-i386-rumprun                                           pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-armhf-armhf-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-armhf-armhf-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvh-amd                                  fail    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     fail    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 pass    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvh-intel                                fail    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-pygrub                                      pass    
 test-armhf-armhf-libvirt-qcow2                               pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     fail    
 test-armhf-armhf-xl-rtds                                     fail    
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     pass    
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                     pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-amd64-xl-qemut-winxpsp3                           pass    
 test-amd64-i386-xl-qemut-winxpsp3                            pass    
 test-amd64-amd64-xl-qemuu-winxpsp3                           pass    
 test-amd64-i386-xl-qemuu-winxpsp3                            pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 528 lines long.)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12 12:07   ` Xuquan (Quan Xu)
@ 2017-01-12  9:04     ` Chao Gao
  2017-01-12 12:15     ` Andrew Cooper
  1 sibling, 0 replies; 32+ messages in thread
From: Chao Gao @ 2017-01-12  9:04 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: Andrew Cooper, Kevin Tian, osstest service owner, Jan Beulich, xen-devel

According the code around the assert: 
movzbl %r14b, %esi  41 0f b6 f6 
cmp %esi, %eax      39 f0
jle ...             7e 02
ud2                 <0f> 0b 
mov %rbx, %rdi      48 89 df
callq ...           e8 51 20 00 00 
mov $0x810, %eax    b8 10 08 00 00 

so I think one is 0x38 %eax, the other is 0x30 %esi

On Thu, Jan 12, 2017 at 12:07:53PM +0000, Xuquan (Quan Xu) wrote:
>On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>On 12/01/2017 06:46, osstest service owner wrote:
>>> flight 104131 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking, including tests which
>>> could not be run:
>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>REGR. vs. 104119
>>
>>Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>intr.c:321
>>Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
>>Not tainted ]----
>>Jan 12 01:25:37.141577 (XEN) CPU:    14
>>Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>>vmx_intr_assist+0x35e/0x51d
>>Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
>>hypervisor (d15v0)
>>Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>>ffff830079e1e000   rcx: 0000000000000030
>>Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>>0000000000000030   rdi: ffff830079e1e000
>>Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
>>r8:  ffff82c00022f000
>>Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>>ffff830176386560   r11: 000001955ee79bd0
>>Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>>0000000000003002   r14: 0000000000000030
>>Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>>0000000080050033   cr4: 00000000003526e0
>>Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>>0000000002487034
>>Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
>>ss: 0000   cs: e008
>>Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>>(vmx_intr_assist+0x35e/0x51d):
>>Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>>20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>>from rsp=ffff83047de2fea8:
>>Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>>ffff83047de2ffff ffff83023fec2000
>>Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>>ffff830079e1e000 ffff830079e1e000
>>Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
>>ffff830233117000 ffff83023fec2000
>>Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>>0000000000000004 00000000000000c2
>>Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
>>ffff8800e8d28000 ffffffff81add0a0
>>Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
>>ffff880142400008 0000000000000004
>>Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
>>00000000000003f8 00000000000003f8
>>Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>>ffffffff813899a4 000000bf0000beef
>>Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
>>000000000000beef 1cec835356e5beef
>>Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
>>8914458d3174beef 2444c7100000000e
>>Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
>>00000000003526e0
>>Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>>vmx_intr_assist+0x35e/0x51d
>>Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>>vmx_asm_vmexit_handler+0x41/0x120
>>Jan 12 01:25:37.325598 (XEN)
>>Jan 12 01:25:37.325624 (XEN)
>>Jan 12 01:25:37.325647 (XEN)
>>****************************************
>>Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>****************************************
>>Jan 12 01:25:37.341603 (XEN)
>>Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>RESET_REG.
>>
>>This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>>not a deterministic issue, as it appears to have survived a week of testing
>>already, but there is clearly something still problematic with the code.
>>
>
>
>Andrew,
>If you have, could you give more information? Such as the value of intack.vector / pt_vector..
>I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'..
>
>Or there is a corner case that intack.vector is __not__ the highest priority vector..
>
>Kevin / Jan,  any thoughts?
>
>Quan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12  6:46 [xen-unstable test] 104131: regressions - FAIL osstest service owner
@ 2017-01-12  9:14 ` Andrew Cooper
  2017-01-12 12:07   ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2017-01-12  9:14 UTC (permalink / raw)
  Cc: Kevin Tian, Quan Xu, osstest service owner, Jan Beulich,
	xen-devel, Chao Gao

On 12/01/2017 06:46, osstest service owner wrote:
> flight 104131 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail REGR. vs. 104119

Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:321
Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
Jan 12 01:25:37.141577 (XEN) CPU:    14
Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>] vmx_intr_assist+0x35e/0x51d
Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d15v0)
Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx: ffff830079e1e000   rcx: 0000000000000030
Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi: 0000000000000030   rdi: ffff830079e1e000
Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8   r8:  ffff82c00022f000
Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10: ffff830176386560   r11: 000001955ee79bd0
Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13: 0000000000003002   r14: 0000000000000030
Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0: 0000000080050033   cr4: 00000000003526e0
Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2: 0000000002487034
Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc> (vmx_intr_assist+0x35e/0x51d):
Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 20 00 00 b8 10 08 00 00 0f
Jan 12 01:25:37.221561 (XEN) Xen stack trace from rsp=ffff83047de2fea8:
Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff ffff83047de2ffff ffff83023fec2000
Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6 ffff830079e1e000 ffff830079e1e000
Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e ffff830233117000 ffff83023fec2000
Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1 0000000000000004 00000000000000c2
Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007 ffff8800e8d28000 ffffffff81add0a0
Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000 ffff880142400008 0000000000000004
Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000 00000000000003f8 00000000000003f8
Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef ffffffff813899a4 000000bf0000beef
Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08 000000000000beef 1cec835356e5beef
Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef 8914458d3174beef 2444c7100000000e
Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600 00000000003526e0
Jan 12 01:25:37.309607 (XEN) Xen call trace:
Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>] vmx_intr_assist+0x35e/0x51d
Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>] vmx_asm_vmexit_handler+0x41/0x120
Jan 12 01:25:37.325598 (XEN) 
Jan 12 01:25:37.325624 (XEN) 
Jan 12 01:25:37.325647 (XEN) ****************************************
Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:321
Jan 12 01:25:37.341571 (XEN) ****************************************
Jan 12 01:25:37.341603 (XEN) 
Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is not a deterministic issue, as it appears to have survived a week of testing already, but there is clearly something still problematic with the code.

~Andrew


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12  9:14 ` Andrew Cooper
@ 2017-01-12 12:07   ` Xuquan (Quan Xu)
  2017-01-12  9:04     ` Chao Gao
  2017-01-12 12:15     ` Andrew Cooper
  0 siblings, 2 replies; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-01-12 12:07 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Chao Gao, osstest service owner, Jan Beulich, xen-devel

On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>On 12/01/2017 06:46, osstest service owner wrote:
>> flight 104131 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking, including tests which
>> could not be run:
>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>REGR. vs. 104119
>
>Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>intr.c:321
>Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
>Not tainted ]----
>Jan 12 01:25:37.141577 (XEN) CPU:    14
>Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>vmx_intr_assist+0x35e/0x51d
>Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
>hypervisor (d15v0)
>Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>ffff830079e1e000   rcx: 0000000000000030
>Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>0000000000000030   rdi: ffff830079e1e000
>Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
>r8:  ffff82c00022f000
>Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>ffff830176386560   r11: 000001955ee79bd0
>Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>0000000000003002   r14: 0000000000000030
>Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>0000000080050033   cr4: 00000000003526e0
>Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>0000000002487034
>Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
>ss: 0000   cs: e008
>Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>(vmx_intr_assist+0x35e/0x51d):
>Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>from rsp=ffff83047de2fea8:
>Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>ffff83047de2ffff ffff83023fec2000
>Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>ffff830079e1e000 ffff830079e1e000
>Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
>ffff830233117000 ffff83023fec2000
>Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>0000000000000004 00000000000000c2
>Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
>ffff8800e8d28000 ffffffff81add0a0
>Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
>ffff880142400008 0000000000000004
>Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
>00000000000003f8 00000000000003f8
>Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>ffffffff813899a4 000000bf0000beef
>Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
>000000000000beef 1cec835356e5beef
>Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
>8914458d3174beef 2444c7100000000e
>Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
>00000000003526e0
>Jan 12 01:25:37.309607 (XEN) Xen call trace:
>Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>vmx_intr_assist+0x35e/0x51d
>Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>vmx_asm_vmexit_handler+0x41/0x120
>Jan 12 01:25:37.325598 (XEN)
>Jan 12 01:25:37.325624 (XEN)
>Jan 12 01:25:37.325647 (XEN)
>****************************************
>Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>intr.c:321 Jan 12 01:25:37.341571 (XEN)
>****************************************
>Jan 12 01:25:37.341603 (XEN)
>Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>RESET_REG.
>
>This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>not a deterministic issue, as it appears to have survived a week of testing
>already, but there is clearly something still problematic with the code.
>


Andrew,
If you have, could you give more information? Such as the value of intack.vector / pt_vector..
I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'..

Or there is a corner case that intack.vector is __not__ the highest priority vector..

Kevin / Jan,  any thoughts?

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12 12:07   ` Xuquan (Quan Xu)
  2017-01-12  9:04     ` Chao Gao
@ 2017-01-12 12:15     ` Andrew Cooper
  2017-01-12 12:25       ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2017-01-12 12:15 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: Kevin Tian, Chao Gao, osstest service owner, Jan Beulich, xen-devel

On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>> On 12/01/2017 06:46, osstest service owner wrote:
>>> flight 104131 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking, including tests which
>>> could not be run:
>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>> REGR. vs. 104119
>>
>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>> intr.c:321
>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
>> Not tainted ]----
>> Jan 12 01:25:37.141577 (XEN) CPU:    14
>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>> vmx_intr_assist+0x35e/0x51d
>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
>> hypervisor (d15v0)
>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>> ffff830079e1e000   rcx: 0000000000000030
>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>> 0000000000000030   rdi: ffff830079e1e000
>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
>> r8:  ffff82c00022f000
>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>> ffff830176386560   r11: 000001955ee79bd0
>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>> 0000000000003002   r14: 0000000000000030
>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>> 0000000080050033   cr4: 00000000003526e0
>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>> 0000000002487034
>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
>> ss: 0000   cs: e008
>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>> (vmx_intr_assist+0x35e/0x51d):
>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
> >from rsp=ffff83047de2fea8:
>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>> ffff83047de2ffff ffff83023fec2000
>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>> ffff830079e1e000 ffff830079e1e000
>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
>> ffff830233117000 ffff83023fec2000
>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>> 0000000000000004 00000000000000c2
>> Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
>> ffff8800e8d28000 ffffffff81add0a0
>> Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
>> ffff880142400008 0000000000000004
>> Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
>> 00000000000003f8 00000000000003f8
>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>> ffffffff813899a4 000000bf0000beef
>> Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
>> 000000000000beef 1cec835356e5beef
>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
>> 8914458d3174beef 2444c7100000000e
>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
>> 00000000003526e0
>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>> vmx_intr_assist+0x35e/0x51d
>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>> vmx_asm_vmexit_handler+0x41/0x120
>> Jan 12 01:25:37.325598 (XEN)
>> Jan 12 01:25:37.325624 (XEN)
>> Jan 12 01:25:37.325647 (XEN)
>> ****************************************
>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>> ****************************************
>> Jan 12 01:25:37.341603 (XEN)
>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>> RESET_REG.
>>
>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>> not a deterministic issue, as it appears to have survived a week of testing
>> already, but there is clearly something still problematic with the code.
>>
>
> Andrew,
> If you have, could you give more information?

No further information sorry.  This was found by the automated test system.

Full logs are available from
http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-qemuu-debianhvm-amd64/
but I doubt any of them will help in diagnosing the issue any further.

> Such as the value of intack.vector / pt_vector..
> I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'..
>
> Or there is a corner case that intack.vector is __not__ the highest priority vector..
>
> Kevin / Jan,  any thoughts?

It happened during domain shutdown.  It might be an edge case
interaction with qemu-raised interrupts via hypercall?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12 12:15     ` Andrew Cooper
@ 2017-01-12 12:25       ` Jan Beulich
  2017-01-16  5:25         ` Tian, Kevin
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2017-01-12 12:25 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: Andrew Cooper, Kevin Tian, xen-devel, osstest service owner, Chao Gao

>>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote:
> On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>> On 12/01/2017 06:46, osstest service owner wrote:
>>>> flight 104131 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/ 
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking, including tests which
>>>> could not be run:
>>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>> REGR. vs. 104119
>>>
>>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>> intr.c:321
>>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
>>> Not tainted ]----
>>> Jan 12 01:25:37.141577 (XEN) CPU:    14
>>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>>> vmx_intr_assist+0x35e/0x51d
>>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
>>> hypervisor (d15v0)
>>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>>> ffff830079e1e000   rcx: 0000000000000030
>>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>>> 0000000000000030   rdi: ffff830079e1e000
>>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
>>> r8:  ffff82c00022f000
>>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>>> ffff830176386560   r11: 000001955ee79bd0
>>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>>> 0000000000003002   r14: 0000000000000030
>>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>>> 0000000080050033   cr4: 00000000003526e0
>>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>>> 0000000002487034
>>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
>>> ss: 0000   cs: e008
>>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>>> (vmx_intr_assist+0x35e/0x51d):
>>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>> >from rsp=ffff83047de2fea8:
>>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>>> ffff83047de2ffff ffff83023fec2000
>>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>>> ffff830079e1e000 ffff830079e1e000
>>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
>>> ffff830233117000 ffff83023fec2000
>>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>>> 0000000000000004 00000000000000c2
>>> Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
>>> ffff8800e8d28000 ffffffff81add0a0
>>> Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
>>> ffff880142400008 0000000000000004
>>> Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
>>> 00000000000003f8 00000000000003f8
>>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>>> ffffffff813899a4 000000bf0000beef
>>> Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
>>> 000000000000beef 1cec835356e5beef
>>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
>>> 8914458d3174beef 2444c7100000000e
>>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
>>> 00000000003526e0
>>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>>> vmx_intr_assist+0x35e/0x51d
>>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>>> vmx_asm_vmexit_handler+0x41/0x120
>>> Jan 12 01:25:37.325598 (XEN)
>>> Jan 12 01:25:37.325624 (XEN)
>>> Jan 12 01:25:37.325647 (XEN)
>>> ****************************************
>>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>> ****************************************
>>> Jan 12 01:25:37.341603 (XEN)
>>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>> RESET_REG.
>>>
>>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>>> not a deterministic issue, as it appears to have survived a week of testing
>>> already, but there is clearly something still problematic with the code.
>>>
>>
>> Andrew,
>> If you have, could you give more information?
> 
> No further information sorry.  This was found by the automated test system.

But some can be gathered:

> Full logs are available from
> http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q 
> emuu-debianhvm-amd64/
> but I doubt any of them will help in diagnosing the issue any further.
> 
>> Such as the value of intack.vector / pt_vector..

At leastb one of the two values is likely to live in a register, and
hence its value would be available in the dump. Just takes looking
at the disassembly.

>> I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'..

That would be odd.

>> Or there is a corner case that intack.vector is __not__ the highest priority vector..

That's what I'm afraid of, and why I had asked to add the ASSERT().

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16  6:27           ` Xuquan (Quan Xu)
@ 2017-01-16  0:06             ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2017-01-16  0:06 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: Andrew Cooper, Tian, Kevin, osstest service owner, Jan Beulich,
	xen-devel

On Mon, Jan 16, 2017 at 06:27:23AM +0000, Xuquan (Quan Xu) wrote:
>On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Thursday, January 12, 2017 8:26 PM
>>>
>>> >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote:
>>> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>>> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>> >>> On 12/01/2017 06:46, osstest service owner wrote:
>>> >>>> flight 104131 xen-unstable real [real]
>>> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>> >>>>
>>> >>>> Regressions :-(
>>> >>>>
>>> >>>> Tests which did not succeed and are blocking, including tests
>>> >>>> which could not be run:
>>> >>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>> >>> REGR. vs. 104119
>>> >>>
>>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >=
>>> >>> pt_vector' failed at
>>> >>> intr.c:321
>>> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64
>>> >>> debug=y Not tainted ]----
>>> >>> Jan 12 01:25:37.141577 (XEN) CPU:    14
>>> >>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>>> >>> vmx_intr_assist+0x35e/0x51d
>>> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202
>>CONTEXT:
>>> >>> hypervisor (d15v0)
>>> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>>> >>> ffff830079e1e000   rcx: 0000000000000030
>>> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>>> >>> 0000000000000030   rdi: ffff830079e1e000
>>> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp:
>>ffff83047de2fea8
>>> >>> r8:  ffff82c00022f000
>>> >>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>>> >>> ffff830176386560   r11: 000001955ee79bd0
>>> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>>> >>> 0000000000003002   r14: 0000000000000030
>>> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>>> >>> 0000000080050033   cr4: 00000000003526e0
>>> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>>> >>> 0000000002487034
>>> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs:
>>0000
>>> >>> ss: 0000   cs: e008
>>> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>>> >>> (vmx_intr_assist+0x35e/0x51d):
>>> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48
>>> >>> 89 df e8 51
>>> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack
>>> >>> trace
>>> >> >from rsp=ffff83047de2fea8:
>>> >>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>>> >>> ffff83047de2ffff ffff83023fec2000
>>> >>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>>> >>> ffff830079e1e000 ffff830079e1e000
>>> >>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000
>>000000000000000e
>>> >>> ffff830233117000 ffff83023fec2000
>>> >>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>>> >>> 0000000000000004 00000000000000c2
>>> >>> Jan 12 01:25:37.261584 (XEN)    0000000000000020
>>0000000000000007
>>> >>> ffff8800e8d28000 ffffffff81add0a0
>>> >>> Jan 12 01:25:37.269607 (XEN)    0000000000000246
>>0000000000000000
>>> >>> ffff880142400008 0000000000000004
>>> >>> Jan 12 01:25:37.277580 (XEN)    0000000000000036
>>0000000000000000
>>> >>> 00000000000003f8 00000000000003f8
>>> >>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>>> >>> ffffffff813899a4 000000bf0000beef
>>> >>> Jan 12 01:25:37.293567 (XEN)    0000000000000002
>>ffff880147c03e08
>>> >>> 000000000000beef 1cec835356e5beef
>>> >>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef
>>01dcb38b000cbeef
>>> >>> 8914458d3174beef 2444c7100000000e
>>> >>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000
>>00000031bfc37600
>>> >>> 00000000003526e0
>>> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>> >>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>>> >>> vmx_intr_assist+0x35e/0x51d
>>> >>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>>> >>> vmx_asm_vmexit_handler+0x41/0x120
>>> >>> Jan 12 01:25:37.325598 (XEN)
>>> >>> Jan 12 01:25:37.325624 (XEN)
>>> >>> Jan 12 01:25:37.325647 (XEN)
>>> >>> ****************************************
>>> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >=
>>> >>> pt_vector' failed at
>>> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>> >>> ****************************************
>>> >>> Jan 12 01:25:37.341603 (XEN)
>>> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>> >>> RESET_REG.
>>> >>>
>>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv
>>> >>> issue".  It is not a deterministic issue, as it appears to have
>>> >>> survived a week of testing already, but there is clearly something still
>>problematic with the code.
>>> >>>
>>> >>
>>> >> Andrew,
>>> >> If you have, could you give more information?
>>> >
>>> > No further information sorry.  This was found by the automated test
>>system.
>>>
>>> But some can be gathered:
>>>
>>> > Full logs are available from
>>> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i
>>> > 386-xl-q
>>> > emuu-debianhvm-amd64/
>>> > but I doubt any of them will help in diagnosing the issue any further.
>>> >
>>> >> Such as the value of intack.vector / pt_vector..
>>>
>>> At leastb one of the two values is likely to live in a register, and
>>> hence its value would be available in the dump. Just takes looking at
>>> the disassembly.
>>>
>>> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the
>>pt_vector is 'int'..
>>>
>>> That would be odd.
>>>
>>> >> Or there is a corner case that intack.vector is __not__ the highest
>>priority vector..
>>>
>>> That's what I'm afraid of, and why I had asked to add the ASSERT().
>>>
>>
>>I cannot come up a valid reason for such situation (intack.vector is 0x30
>>while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before
>>checking highest pending IRRs so pt_vector should be honored anyway.
>>One possible reason is that being some reason pt_vector is not in vIRR at
>>that point (due to some bug in the path from PIR to vIRR). However I didn't
>>catch such bug simply by looking at code. We need reproduce this problem
>>in developer side to find out actual reason. Andrew it'd be helpful if you
>>can help Quan/Chao to find out more test environment info.
>>
>
>I'll continue to follow up this issue..
>However I don't have enough CPU-v3 machine for test it(occupied by another project).. I hope Chao could build some test environment.. 
>
No problem. When you come up with ideas or find some clues, I can verify them for you.

thanks
Chao
>
>Quan 
>
>
>
>
>>One thing noted though. The original patch from Quan is actually
>>orthogonal to this ASSERT. Regardless of whether intack.vector is larger or
>>smaller than pt_vector, we always require the trick as long as pt_vector is
>>not the one being currently programmed to RVI. Then do we want to revert
>>the whole commit until the problem is finally fixed, or OK to just remove
>>ASSERT (or replace with WARN_ON with more debug info) to unblock test
>>system before the fix is ready?
>>
>>Thanks
>>Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-12 12:25       ` Jan Beulich
@ 2017-01-16  5:25         ` Tian, Kevin
  2017-01-16  6:27           ` Xuquan (Quan Xu)
                             ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Tian, Kevin @ 2017-01-16  5:25 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Quan Xu)
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, January 12, 2017 8:26 PM
> 
> >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote:
> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
> >>> On 12/01/2017 06:46, osstest service owner wrote:
> >>>> flight 104131 xen-unstable real [real]
> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking, including tests which
> >>>> could not be run:
> >>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
> >>> REGR. vs. 104119
> >>>
> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
> >>> intr.c:321
> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y
> >>> Not tainted ]----
> >>> Jan 12 01:25:37.141577 (XEN) CPU:    14
> >>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202   CONTEXT:
> >>> hypervisor (d15v0)
> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
> >>> ffff830079e1e000   rcx: 0000000000000030
> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
> >>> 0000000000000030   rdi: ffff830079e1e000
> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp: ffff83047de2fea8
> >>> r8:  ffff82c00022f000
> >>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
> >>> ffff830176386560   r11: 000001955ee79bd0
> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
> >>> 0000000000003002   r14: 0000000000000030
> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
> >>> 0000000080050033   cr4: 00000000003526e0
> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
> >>> 0000000002487034
> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000
> >>> ss: 0000   cs: e008
> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
> >>> (vmx_intr_assist+0x35e/0x51d):
> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
> >> >from rsp=ffff83047de2fea8:
> >>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
> >>> ffff83047de2ffff ffff83023fec2000
> >>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
> >>> ffff830079e1e000 ffff830079e1e000
> >>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000 000000000000000e
> >>> ffff830233117000 ffff83023fec2000
> >>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
> >>> 0000000000000004 00000000000000c2
> >>> Jan 12 01:25:37.261584 (XEN)    0000000000000020 0000000000000007
> >>> ffff8800e8d28000 ffffffff81add0a0
> >>> Jan 12 01:25:37.269607 (XEN)    0000000000000246 0000000000000000
> >>> ffff880142400008 0000000000000004
> >>> Jan 12 01:25:37.277580 (XEN)    0000000000000036 0000000000000000
> >>> 00000000000003f8 00000000000003f8
> >>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
> >>> ffffffff813899a4 000000bf0000beef
> >>> Jan 12 01:25:37.293567 (XEN)    0000000000000002 ffff880147c03e08
> >>> 000000000000beef 1cec835356e5beef
> >>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef 01dcb38b000cbeef
> >>> 8914458d3174beef 2444c7100000000e
> >>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000 00000031bfc37600
> >>> 00000000003526e0
> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
> >>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
> >>> vmx_asm_vmexit_handler+0x41/0x120
> >>> Jan 12 01:25:37.325598 (XEN)
> >>> Jan 12 01:25:37.325624 (XEN)
> >>> Jan 12 01:25:37.325647 (XEN)
> >>> ****************************************
> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
> >>> ****************************************
> >>> Jan 12 01:25:37.341603 (XEN)
> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
> >>> RESET_REG.
> >>>
> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
> >>> not a deterministic issue, as it appears to have survived a week of testing
> >>> already, but there is clearly something still problematic with the code.
> >>>
> >>
> >> Andrew,
> >> If you have, could you give more information?
> >
> > No further information sorry.  This was found by the automated test system.
> 
> But some can be gathered:
> 
> > Full logs are available from
> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q
> > emuu-debianhvm-amd64/
> > but I doubt any of them will help in diagnosing the issue any further.
> >
> >> Such as the value of intack.vector / pt_vector..
> 
> At leastb one of the two values is likely to live in a register, and
> hence its value would be available in the dump. Just takes looking
> at the disassembly.
> 
> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the pt_vector is 'int'..
> 
> That would be odd.
> 
> >> Or there is a corner case that intack.vector is __not__ the highest priority vector..
> 
> That's what I'm afraid of, and why I had asked to add the ASSERT().
> 

I cannot come up a valid reason for such situation (intack.vector is 0x30 
while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before
checking highest pending IRRs so pt_vector should be honored anyway. 
One possible reason is that being some reason pt_vector is not in vIRR at 
that point (due to some bug in the path from PIR to vIRR). However I 
didn't catch such bug simply by looking at code. We need reproduce this 
problem in developer side to find out actual reason. Andrew it'd be helpful
if you can help Quan/Chao to find out more test environment info.

One thing noted though. The original patch from Quan is actually orthogonal
to this ASSERT. Regardless of whether intack.vector is larger or smaller
than pt_vector, we always require the trick as long as pt_vector is not the
one being currently programmed to RVI. Then do we want to revert the whole
commit until the problem is finally fixed, or OK to just remove ASSERT 
(or replace with WARN_ON with more debug info) to unblock test system
before the fix is ready?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16  5:25         ` Tian, Kevin
@ 2017-01-16  6:27           ` Xuquan (Quan Xu)
  2017-01-16  0:06             ` Chao Gao
  2017-01-16 11:00           ` Jan Beulich
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-01-16  6:27 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Gao, Chao
  Cc: Andrew Cooper, osstest service owner, xen-devel

On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, January 12, 2017 8:26 PM
>>
>> >>> On 12.01.17 at 13:15, <andrew.cooper3@citrix.com> wrote:
>> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>> >>> On 12/01/2017 06:46, osstest service owner wrote:
>> >>>> flight 104131 xen-unstable real [real]
>> >>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>> >>>>
>> >>>> Regressions :-(
>> >>>>
>> >>>> Tests which did not succeed and are blocking, including tests
>> >>>> which could not be run:
>> >>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>> >>> REGR. vs. 104119
>> >>>
>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >=
>> >>> pt_vector' failed at
>> >>> intr.c:321
>> >>> Jan 12 01:25:37.133596 (XEN) ----[ Xen-4.9-unstable  x86_64
>> >>> debug=y Not tainted ]----
>> >>> Jan 12 01:25:37.141577 (XEN) CPU:    14
>> >>> Jan 12 01:25:37.141607 (XEN) RIP:    e008:[<ffff82d0801ef7fc>]
>> >>> vmx_intr_assist+0x35e/0x51d
>> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 0000000000010202
>CONTEXT:
>> >>> hypervisor (d15v0)
>> >>> Jan 12 01:25:37.149655 (XEN) rax: 0000000000000038   rbx:
>> >>> ffff830079e1e000   rcx: 0000000000000030
>> >>> Jan 12 01:25:37.157582 (XEN) rdx: 0000000000000000   rsi:
>> >>> 0000000000000030   rdi: ffff830079e1e000
>> >>> Jan 12 01:25:37.165584 (XEN) rbp: ffff83047de2ff08   rsp:
>ffff83047de2fea8
>> >>> r8:  ffff82c00022f000
>> >>> Jan 12 01:25:37.173579 (XEN) r9:  ffff8301b63ede80   r10:
>> >>> ffff830176386560   r11: 000001955ee79bd0
>> >>> Jan 12 01:25:37.181582 (XEN) r12: 0000000000003002   r13:
>> >>> 0000000000003002   r14: 0000000000000030
>> >>> Jan 12 01:25:37.189584 (XEN) r15: ffff83023fec2000   cr0:
>> >>> 0000000080050033   cr4: 00000000003526e0
>> >>> Jan 12 01:25:37.197572 (XEN) cr3: 0000000232edb000   cr2:
>> >>> 0000000002487034
>> >>> Jan 12 01:25:37.205569 (XEN) ds: 0000   es: 0000   fs: 0000   gs:
>0000
>> >>> ss: 0000   cs: e008
>> >>> Jan 12 01:25:37.205606 (XEN) Xen code around <ffff82d0801ef7fc>
>> >>> (vmx_intr_assist+0x35e/0x51d):
>> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48
>> >>> 89 df e8 51
>> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack
>> >>> trace
>> >> >from rsp=ffff83047de2fea8:
>> >>> Jan 12 01:25:37.229600 (XEN)    ffff82d08031aa80 00000038ffffffff
>> >>> ffff83047de2ffff ffff83023fec2000
>> >>> Jan 12 01:25:37.237594 (XEN)    ffff83047de2fef8 ffff82d080130cb6
>> >>> ffff830079e1e000 ffff830079e1e000
>> >>> Jan 12 01:25:37.245588 (XEN)    ffff83007bae2000
>000000000000000e
>> >>> ffff830233117000 ffff83023fec2000
>> >>> Jan 12 01:25:37.253594 (XEN)    ffff83047de2fdc0 ffff82d0801fdeb1
>> >>> 0000000000000004 00000000000000c2
>> >>> Jan 12 01:25:37.261584 (XEN)    0000000000000020
>0000000000000007
>> >>> ffff8800e8d28000 ffffffff81add0a0
>> >>> Jan 12 01:25:37.269607 (XEN)    0000000000000246
>0000000000000000
>> >>> ffff880142400008 0000000000000004
>> >>> Jan 12 01:25:37.277580 (XEN)    0000000000000036
>0000000000000000
>> >>> 00000000000003f8 00000000000003f8
>> >>> Jan 12 01:25:37.285584 (XEN)    ffffffff81add0a0 0000beef0000beef
>> >>> ffffffff813899a4 000000bf0000beef
>> >>> Jan 12 01:25:37.293567 (XEN)    0000000000000002
>ffff880147c03e08
>> >>> 000000000000beef 1cec835356e5beef
>> >>> Jan 12 01:25:37.293606 (XEN)    085d8b002674beef
>01dcb38b000cbeef
>> >>> 8914458d3174beef 2444c7100000000e
>> >>> Jan 12 01:25:37.301586 (XEN)    ffff830079e1e000
>00000031bfc37600
>> >>> 00000000003526e0
>> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>> >>> Jan 12 01:25:37.309639 (XEN)    [<ffff82d0801ef7fc>]
>> >>> vmx_intr_assist+0x35e/0x51d
>> >>> Jan 12 01:25:37.317591 (XEN)    [<ffff82d0801fdeb1>]
>> >>> vmx_asm_vmexit_handler+0x41/0x120
>> >>> Jan 12 01:25:37.325598 (XEN)
>> >>> Jan 12 01:25:37.325624 (XEN)
>> >>> Jan 12 01:25:37.325647 (XEN)
>> >>> ****************************************
>> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >=
>> >>> pt_vector' failed at
>> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>> >>> ****************************************
>> >>> Jan 12 01:25:37.341603 (XEN)
>> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>> >>> RESET_REG.
>> >>>
>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv
>> >>> issue".  It is not a deterministic issue, as it appears to have
>> >>> survived a week of testing already, but there is clearly something still
>problematic with the code.
>> >>>
>> >>
>> >> Andrew,
>> >> If you have, could you give more information?
>> >
>> > No further information sorry.  This was found by the automated test
>system.
>>
>> But some can be gathered:
>>
>> > Full logs are available from
>> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i
>> > 386-xl-q
>> > emuu-debianhvm-amd64/
>> > but I doubt any of them will help in diagnosing the issue any further.
>> >
>> >> Such as the value of intack.vector / pt_vector..
>>
>> At leastb one of the two values is likely to live in a register, and
>> hence its value would be available in the dump. Just takes looking at
>> the disassembly.
>>
>> >> I guess, the reason may be that the intack.vector is ' uint8_t ' and the
>pt_vector is 'int'..
>>
>> That would be odd.
>>
>> >> Or there is a corner case that intack.vector is __not__ the highest
>priority vector..
>>
>> That's what I'm afraid of, and why I had asked to add the ASSERT().
>>
>
>I cannot come up a valid reason for such situation (intack.vector is 0x30
>while pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before
>checking highest pending IRRs so pt_vector should be honored anyway.
>One possible reason is that being some reason pt_vector is not in vIRR at
>that point (due to some bug in the path from PIR to vIRR). However I didn't
>catch such bug simply by looking at code. We need reproduce this problem
>in developer side to find out actual reason. Andrew it'd be helpful if you
>can help Quan/Chao to find out more test environment info.
>

I'll continue to follow up this issue..
However I don't have enough CPU-v3 machine for test it(occupied by another project).. I hope Chao could build some test environment.. 


Quan 




>One thing noted though. The original patch from Quan is actually
>orthogonal to this ASSERT. Regardless of whether intack.vector is larger or
>smaller than pt_vector, we always require the trick as long as pt_vector is
>not the one being currently programmed to RVI. Then do we want to revert
>the whole commit until the problem is finally fixed, or OK to just remove
>ASSERT (or replace with WARN_ON with more debug info) to unblock test
>system before the fix is ready?
>
>Thanks
>Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16  5:25         ` Tian, Kevin
  2017-01-16  6:27           ` Xuquan (Quan Xu)
@ 2017-01-16 11:00           ` Jan Beulich
  2017-01-18  4:57             ` Tian, Kevin
  2017-01-20  9:08           ` Xuquan (Quan Xu)
  2017-01-23 10:56           ` Xuquan (Quan Xu)
  3 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2017-01-16 11:00 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Kevin Tian
  Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao

>>> On 16.01.17 at 06:25, <kevin.tian@intel.com> wrote:
> One thing noted though. The original patch from Quan is actually orthogonal
> to this ASSERT. Regardless of whether intack.vector is larger or smaller
> than pt_vector, we always require the trick as long as pt_vector is not the
> one being currently programmed to RVI.

I don't think the ASSERT() addition is orthogonal: It exchanges
intack.vector for pt_vector in the invocation of
vmx_set_eoi_exit_bitmap(), and during discussion of the patch
there at least intermediately was max() of the two used instead.
It was - iirc - one of you who suggested that the use of max()
there is unnecessary, which the ASSERT() triggering has now
shown is wrong.

> Then do we want to revert the whole
> commit until the problem is finally fixed, or OK to just remove ASSERT 
> (or replace with WARN_ON with more debug info) to unblock test system
> before the fix is ready?

Well, as the VMX maintainer I think the proposal of whether to
revert or wait should really come from you.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16 11:00           ` Jan Beulich
@ 2017-01-18  4:57             ` Tian, Kevin
  2017-01-18  9:38               ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-01-18  4:57 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Quan Xu)
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 16, 2017 7:00 PM
> 
> >>> On 16.01.17 at 06:25, <kevin.tian@intel.com> wrote:
> > One thing noted though. The original patch from Quan is actually orthogonal
> > to this ASSERT. Regardless of whether intack.vector is larger or smaller
> > than pt_vector, we always require the trick as long as pt_vector is not the
> > one being currently programmed to RVI.
> 
> I don't think the ASSERT() addition is orthogonal: It exchanges
> intack.vector for pt_vector in the invocation of
> vmx_set_eoi_exit_bitmap(), and during discussion of the patch
> there at least intermediately was max() of the two used instead.
> It was - iirc - one of you who suggested that the use of max()
> there is unnecessary, which the ASSERT() triggering has now
> shown is wrong.

Attached was my earlier comment:

--
> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> Sent: Friday, December 16, 2016 5:40 PM
> >> -        if (pt_vector != -1)
> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        if ( pt_vector != -1 ) {
> >> +            if ( intack.vector > pt_vector )
> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> +            else
> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +        }
> >
> > Above can be simplified as one line change:
> > 	if ( pt_vector != -1 )
> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
> 
> Hmm, I don't understand. Did you mean to use max() here? Or
> else how is this an equivalent of the originally proposed code?
> 

Original code is not 100% correct. The purpose is to set EOI exit
bitmap for any vector which may block injection of pt_vector - 
give chance to recognize pt_vector in future intack and then do pt 
intr post. The simplified code achieves this effect same as original
code if intack.vector >= vector. I cannot come up a case why
intack.vector might be smaller than vector. If this case happens,
we still need enable exit bitmap for intack.vector instead of
pt_vector for said purpose while original code did it wrong.

Thanks
Kevin
--

Using intack.vector is always expected here regardless of the 
comparison result between intack.vector and pt_vector. The 
reason why I was OK adding an ASSERT was simply to test 
whether intack.vecor<pt_vector does happen which is
orthogonal to the fix itself.

> 
> > Then do we want to revert the whole
> > commit until the problem is finally fixed, or OK to just remove ASSERT
> > (or replace with WARN_ON with more debug info) to unblock test system
> > before the fix is ready?
> 
> Well, as the VMX maintainer I think the proposal of whether to
> revert or wait should really come from you.
> 
> Jan

Andrew, how long do you usually tolerate a failure case in osstest?
I'm not sure how long it may take for developer to reproduce this
situation. If it has blocking impact in your side, I'd suggest go 
replacing ASSERT with more informative warn info before final 
root-cause, if Quan&Chao cannot reproduce in a short time (say
1 or 2wk or so).

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-18  4:57             ` Tian, Kevin
@ 2017-01-18  9:38               ` Jan Beulich
  2017-01-18 10:23                 ` Tian, Kevin
  2017-01-20  8:47                 ` Xuquan (Quan Xu)
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Beulich @ 2017-01-18  9:38 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Kevin Tian
  Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao

>>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote:
> Attached was my earlier comment:
> 
> --
>> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> -        if (pt_vector != -1)
>> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        if ( pt_vector != -1 ) {
>> >> +            if ( intack.vector > pt_vector )
>> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> +            else
>> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +        }
>> >
>> > Above can be simplified as one line change:
>> > 	if ( pt_vector != -1 )
>> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
>> 
>> Hmm, I don't understand. Did you mean to use max() here? Or
>> else how is this an equivalent of the originally proposed code?
>> 
> 
> Original code is not 100% correct. The purpose is to set EOI exit
> bitmap for any vector which may block injection of pt_vector - 
> give chance to recognize pt_vector in future intack and then do pt 
> intr post. The simplified code achieves this effect same as original
> code if intack.vector >= vector. I cannot come up a case why
> intack.vector might be smaller than vector. If this case happens,
> we still need enable exit bitmap for intack.vector instead of
> pt_vector for said purpose while original code did it wrong.
> 
> Thanks
> Kevin
> --
> 
> Using intack.vector is always expected here regardless of the 
> comparison result between intack.vector and pt_vector. The 
> reason why I was OK adding an ASSERT was simply to test 
> whether intack.vecor<pt_vector does happen which is
> orthogonal to the fix itself.

Well, a vector lower than pt_vector can't block delivery. Or wait:
Don't we need to consider vector classes here, i.e.

            ASSERT((intack.vector >> 4) >= (pt_vector >> 4));

?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-18  9:38               ` Jan Beulich
@ 2017-01-18 10:23                 ` Tian, Kevin
  2017-01-20 11:48                   ` Jan Beulich
  2017-01-20  8:47                 ` Xuquan (Quan Xu)
  1 sibling, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-01-18 10:23 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Quan Xu)
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, January 18, 2017 5:38 PM
> 
> >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote:
> > Attached was my earlier comment:
> >
> > --
> >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
> >> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> >> Sent: Friday, December 16, 2016 5:40 PM
> >> >> -        if (pt_vector != -1)
> >> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> +        if ( pt_vector != -1 ) {
> >> >> +            if ( intack.vector > pt_vector )
> >> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >> +            else
> >> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> +        }
> >> >
> >> > Above can be simplified as one line change:
> >> > 	if ( pt_vector != -1 )
> >> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
> >>
> >> Hmm, I don't understand. Did you mean to use max() here? Or
> >> else how is this an equivalent of the originally proposed code?
> >>
> >
> > Original code is not 100% correct. The purpose is to set EOI exit
> > bitmap for any vector which may block injection of pt_vector -
> > give chance to recognize pt_vector in future intack and then do pt
> > intr post. The simplified code achieves this effect same as original
> > code if intack.vector >= vector. I cannot come up a case why
> > intack.vector might be smaller than vector. If this case happens,
> > we still need enable exit bitmap for intack.vector instead of
> > pt_vector for said purpose while original code did it wrong.
> >
> > Thanks
> > Kevin
> > --
> >
> > Using intack.vector is always expected here regardless of the
> > comparison result between intack.vector and pt_vector. The
> > reason why I was OK adding an ASSERT was simply to test
> > whether intack.vecor<pt_vector does happen which is
> > orthogonal to the fix itself.
> 
> Well, a vector lower than pt_vector can't block delivery. Or wait:

There are two points here:

a) We need enable EOI exit bitmap when pt_vector is blocked.

b) As you said, ideally a vector lower than pt_vecotr cannot block

The patch fixed a) and then added an ASSERT to verify b). Strictly
speaking they are separate issues.

> Don't we need to consider vector classes here, i.e.
> 
>             ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
> 
> ?
> 

However it still doesn't explain why original ASSERT is triggered.
vlapic_find_highest_vector actually finds the highest vector, instead
of highest class...

static int vlapic_find_highest_vector(const void *bitmap)
{
    const uint32_t *word = bitmap;
    unsigned int word_offset = NR_VECTORS / 32;

    /* Work backwards through the bitmap (first 32-bit word in every four). */
    while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
        continue;

    return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
}

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-18  9:38               ` Jan Beulich
  2017-01-18 10:23                 ` Tian, Kevin
@ 2017-01-20  8:47                 ` Xuquan (Quan Xu)
  2017-01-20  8:56                   ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-01-20  8:47 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian
  Cc: Andrew Cooper, xen-devel, osstest service owner, Chao Gao

On January 18, 2017 5:38 PM, Jan Beulich wrote:
>>>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote:
>> Attached was my earlier comment:
>>
>> --
>>> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>>> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>>> >> Sent: Friday, December 16, 2016 5:40 PM
>>> >> -        if (pt_vector != -1)
>>> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>>> >> +        if ( pt_vector != -1 ) {
>>> >> +            if ( intack.vector > pt_vector )
>>> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>>> >> +            else
>>> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>>> >> +        }
>>> >
>>> > Above can be simplified as one line change:
>>> > 	if ( pt_vector != -1 )
>>> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
>>>
>>> Hmm, I don't understand. Did you mean to use max() here? Or else how
>>> is this an equivalent of the originally proposed code?
>>>
>>
>> Original code is not 100% correct. The purpose is to set EOI exit
>> bitmap for any vector which may block injection of pt_vector - give
>> chance to recognize pt_vector in future intack and then do pt intr
>> post. The simplified code achieves this effect same as original code
>> if intack.vector >= vector. I cannot come up a case why intack.vector
>> might be smaller than vector. If this case happens, we still need
>> enable exit bitmap for intack.vector instead of pt_vector for said
>> purpose while original code did it wrong.
>>
>> Thanks
>> Kevin
>> --
>>
>> Using intack.vector is always expected here regardless of the
>> comparison result between intack.vector and pt_vector. The reason why
>> I was OK adding an ASSERT was simply to test whether
>> intack.vecor<pt_vector does happen which is orthogonal to the fix
>> itself.
>
>Well, a vector lower than pt_vector can't block delivery. Or wait:
>Don't we need to consider vector classes here, i.e.
>
>            ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
>

Jan, I can't follow vector classes.. could you explain more? Thanks..


Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-20  8:47                 ` Xuquan (Quan Xu)
@ 2017-01-20  8:56                   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2017-01-20  8:56 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: Andrew Cooper, Kevin Tian, xen-devel, osstest service owner, Chao Gao

>>> On 20.01.17 at 09:47, <xuquan8@huawei.com> wrote:
> Jan, I can't follow vector classes.. could you explain more? Thanks..

For determining vector priority, the LAPIC uses only the high 4 bits.
Iirc this is well documented in the SDM.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16  5:25         ` Tian, Kevin
  2017-01-16  6:27           ` Xuquan (Quan Xu)
  2017-01-16 11:00           ` Jan Beulich
@ 2017-01-20  9:08           ` Xuquan (Quan Xu)
  2017-01-23 10:56           ` Xuquan (Quan Xu)
  3 siblings, 0 replies; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-01-20  9:08 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>I cannot come up a valid reason for such situation (intack.vector is 0x30 while
>pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before checking
>highest pending IRRs so pt_vector should be honored anyway.
>One possible reason is that being some reason pt_vector is not in vIRR at that
>point (due to some bug in the path from PIR to vIRR). 

btw, for PIR.. I find that there might be a bug in __vmx_deliver_posted_interrupt()...
why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??

static void __vmx_deliver_posted_interrupt(struct vcpu *v)
{
...
        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
...
}

Suppose that vCPUx is in guest mode, there are two (even more) interrupts to vCPUx..
As the bit is set when delivers the first interrupt... the second interrupt is pending until next VM entry -- by PIR to vIRR..


Quan




However I didn't catch
>such bug simply by looking at code. We need reproduce this problem in
>developer side to find out actual reason. Andrew it'd be helpful if you can help
>Quan/Chao to find out more test environment info.
>
>One thing noted though. The original patch from Quan is actually orthogonal to
>this ASSERT. Regardless of whether intack.vector is larger or smaller than
>pt_vector, we always require the trick as long as pt_vector is not the one being
>currently programmed to RVI. Then do we want to revert the whole commit
>until the problem is finally fixed, or OK to just remove ASSERT (or replace with
>WARN_ON with more debug info) to unblock test system before the fix is
>ready?
>
>Thanks
>Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-18 10:23                 ` Tian, Kevin
@ 2017-01-20 11:48                   ` Jan Beulich
  2017-01-22  4:35                     ` Tian, Kevin
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2017-01-20 11:48 UTC (permalink / raw)
  To: Kevin Tian
  Cc: Andrew Cooper, Xuquan (Quan Xu),
	xen-devel, osstest service owner, Chao Gao

>>> On 18.01.17 at 11:23, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, January 18, 2017 5:38 PM
>> 
>> >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote:
>> > Attached was my earlier comment:
>> >
>> > --
>> >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
>> >> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> >> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> >> -        if (pt_vector != -1)
>> >> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> >> +        if ( pt_vector != -1 ) {
>> >> >> +            if ( intack.vector > pt_vector )
>> >> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> >> +            else
>> >> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> >> +        }
>> >> >
>> >> > Above can be simplified as one line change:
>> >> > 	if ( pt_vector != -1 )
>> >> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >>
>> >> Hmm, I don't understand. Did you mean to use max() here? Or
>> >> else how is this an equivalent of the originally proposed code?
>> >>
>> >
>> > Original code is not 100% correct. The purpose is to set EOI exit
>> > bitmap for any vector which may block injection of pt_vector -
>> > give chance to recognize pt_vector in future intack and then do pt
>> > intr post. The simplified code achieves this effect same as original
>> > code if intack.vector >= vector. I cannot come up a case why
>> > intack.vector might be smaller than vector. If this case happens,
>> > we still need enable exit bitmap for intack.vector instead of
>> > pt_vector for said purpose while original code did it wrong.
>> >
>> > Thanks
>> > Kevin
>> > --
>> >
>> > Using intack.vector is always expected here regardless of the
>> > comparison result between intack.vector and pt_vector. The
>> > reason why I was OK adding an ASSERT was simply to test
>> > whether intack.vecor<pt_vector does happen which is
>> > orthogonal to the fix itself.
>> 
>> Well, a vector lower than pt_vector can't block delivery. Or wait:
> 
> There are two points here:
> 
> a) We need enable EOI exit bitmap when pt_vector is blocked.
> 
> b) As you said, ideally a vector lower than pt_vecotr cannot block
> 
> The patch fixed a) and then added an ASSERT to verify b). Strictly
> speaking they are separate issues.

Okay, I think I finally understand your argumentation here.

>> Don't we need to consider vector classes here, i.e.
>> 
>>             ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
>> 
>> ?
>> 
> 
> However it still doesn't explain why original ASSERT is triggered.
> vlapic_find_highest_vector actually finds the highest vector, instead
> of highest class...
> 
> static int vlapic_find_highest_vector(const void *bitmap)
> {
>     const uint32_t *word = bitmap;
>     unsigned int word_offset = NR_VECTORS / 32;
> 
>     /* Work backwards through the bitmap (first 32-bit word in every four). */
>     while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
>         continue;
> 
>     return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
> }

Well, perhaps a PIR -> IRR syncing issue then (I in particular note
the early bailing from vmx_sync_pir_to_irr())? I guess we'd want
to see the entire IRR array (and perhaps also PI state) if the check
in the assertion fails.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-20 11:48                   ` Jan Beulich
@ 2017-01-22  4:35                     ` Tian, Kevin
  2017-01-23 10:33                       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-01-22  4:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Xuquan (Quan Xu),
	xen-devel, osstest service owner, Gao, Chao

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, January 20, 2017 7:49 PM
> 
> >>> On 18.01.17 at 11:23, <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, January 18, 2017 5:38 PM
> >>
> >> >>> On 18.01.17 at 05:57, <kevin.tian@intel.com> wrote:
> >> > Attached was my earlier comment:
> >> >
> >> > --
> >> >> >>> On 20.12.16 at 06:37, <kevin.tian@intel.com> wrote:
> >> >> >>  From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> >> >> >> Sent: Friday, December 16, 2016 5:40 PM
> >> >> >> -        if (pt_vector != -1)
> >> >> >> -            vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> >> +        if ( pt_vector != -1 ) {
> >> >> >> +            if ( intack.vector > pt_vector )
> >> >> >> +                vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >> >> +            else
> >> >> >> +                vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> >> +        }
> >> >> >
> >> >> > Above can be simplified as one line change:
> >> >> > 	if ( pt_vector != -1 )
> >> >> > 		vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >>
> >> >> Hmm, I don't understand. Did you mean to use max() here? Or
> >> >> else how is this an equivalent of the originally proposed code?
> >> >>
> >> >
> >> > Original code is not 100% correct. The purpose is to set EOI exit
> >> > bitmap for any vector which may block injection of pt_vector -
> >> > give chance to recognize pt_vector in future intack and then do pt
> >> > intr post. The simplified code achieves this effect same as original
> >> > code if intack.vector >= vector. I cannot come up a case why
> >> > intack.vector might be smaller than vector. If this case happens,
> >> > we still need enable exit bitmap for intack.vector instead of
> >> > pt_vector for said purpose while original code did it wrong.
> >> >
> >> > Thanks
> >> > Kevin
> >> > --
> >> >
> >> > Using intack.vector is always expected here regardless of the
> >> > comparison result between intack.vector and pt_vector. The
> >> > reason why I was OK adding an ASSERT was simply to test
> >> > whether intack.vecor<pt_vector does happen which is
> >> > orthogonal to the fix itself.
> >>
> >> Well, a vector lower than pt_vector can't block delivery. Or wait:
> >
> > There are two points here:
> >
> > a) We need enable EOI exit bitmap when pt_vector is blocked.
> >
> > b) As you said, ideally a vector lower than pt_vecotr cannot block
> >
> > The patch fixed a) and then added an ASSERT to verify b). Strictly
> > speaking they are separate issues.
> 
> Okay, I think I finally understand your argumentation here.
> 
> >> Don't we need to consider vector classes here, i.e.
> >>
> >>             ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
> >>
> >> ?
> >>
> >
> > However it still doesn't explain why original ASSERT is triggered.
> > vlapic_find_highest_vector actually finds the highest vector, instead
> > of highest class...
> >
> > static int vlapic_find_highest_vector(const void *bitmap)
> > {
> >     const uint32_t *word = bitmap;
> >     unsigned int word_offset = NR_VECTORS / 32;
> >
> >     /* Work backwards through the bitmap (first 32-bit word in every four). */
> >     while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
> >         continue;
> >
> >     return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
> > }
> 
> Well, perhaps a PIR -> IRR syncing issue then (I in particular note
> the early bailing from vmx_sync_pir_to_irr())? I guess we'd want
> to see the entire IRR array (and perhaps also PI state) if the check
> in the assertion fails.
> 

Yes, I asked Chao to add some debug info in that case. The problem now
is when we will reproduce the bug to see meaningful hint...

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-22  4:35                     ` Tian, Kevin
@ 2017-01-23 10:33                       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2017-01-23 10:33 UTC (permalink / raw)
  To: Chao Gao, Kevin Tian
  Cc: Andrew Cooper, Xuquan (Quan Xu), osstest service owner, xen-devel

>>> On 22.01.17 at 05:35, <kevin.tian@intel.com> wrote:
> Yes, I asked Chao to add some debug info in that case. The problem now
> is when we will reproduce the bug to see meaningful hint...

If written reasonably, feel free to submit the patch for considering to
put it into staging for a while. The bigger problem is that we'd then
need to constantly scan logs for incidents.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-16  5:25         ` Tian, Kevin
                             ` (2 preceding siblings ...)
  2017-01-20  9:08           ` Xuquan (Quan Xu)
@ 2017-01-23 10:56           ` Xuquan (Quan Xu)
  2017-02-08  6:51             ` Tian, Kevin
  3 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-01-23 10:56 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Tian, Kevin, Jan Beulich
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

On January 20, 2017 5:09 PM, Quan Xu wrote:
>btw, for PIR.. I find that there might be a bug in
>__vmx_deliver_posted_interrupt()...
>why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
>
>static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
>        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>&softirq_pending(cpu)) ...
>}
>
>Suppose that vCPUx is in guest mode, there are two (even more) interrupts
>to vCPUx..
>As the bit is set when delivers the first interrupt... the second interrupt is
>pending until next VM entry -- by PIR to vIRR..
>

Jan , Kevin
Correct me if I am wrong...

Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-01-23 10:56           ` Xuquan (Quan Xu)
@ 2017-02-08  6:51             ` Tian, Kevin
  2017-02-08  8:27               ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-02-08  6:51 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: Andrew Cooper, xen-devel, osstest service owner, Gao, Chao

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Monday, January 23, 2017 6:57 PM
> 
> On January 20, 2017 5:09 PM, Quan Xu wrote:
> >btw, for PIR.. I find that there might be a bug in
> >__vmx_deliver_posted_interrupt()...
> >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
> >
> >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
> >        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
> >&softirq_pending(cpu)) ...
> >}
> >
> >Suppose that vCPUx is in guest mode, there are two (even more) interrupts
> >to vCPUx..
> >As the bit is set when delivers the first interrupt... the second interrupt is
> >pending until next VM entry -- by PIR to vIRR..
> >
> 
> Jan , Kevin
> Correct me if I am wrong...
> 
> Quan

I don't quite understand the point here. Can you elaborate?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08 10:15                   ` Xuquan (Quan Xu)
@ 2017-02-08  8:22                     ` Chao Gao
  2017-02-09  8:51                       ` Xuquan (Quan Xu)
  0 siblings, 1 reply; 32+ messages in thread
From: Chao Gao @ 2017-02-08  8:22 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel

On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote:
>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>>> Assumed vCPU is in guest_mode..
>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>> then
>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>> (also no
>>> vcpu_kick() )..
>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver
>>> posted interrupt. if posted interrupt is not delivered, the posted
>>> interrupt is pending until next VM entry -- by PIR to vIRR..
>>>
>>> one condition is :
>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>
>>> Specifically, we did verify it by RES interrupt, which is used for
>>> smp_reschedule_interrupt..
>>> We even cost more time to deliver RES interrupt than no-apicv in
>>average..
>>>
>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
>>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2)
>>> by posted way, The next-coming RES interrupt (no. 2) is not delivered,
>>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no.
>>> 1)..
>>>
>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>> entry -- by PIR to vIRR..
>>>
>>>
>>> We can fix it as below(I don't think this is a best one, it is better
>>> to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1846,7 +1846,7 @@ static void
>>__vmx_deliver_posted_interrupt(struct vcpu *v)
>>>      {
>>>          unsigned int cpu = v->processor;
>>>
>>> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>&softirq_pending(cpu))
>>> +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>               && (cpu != smp_processor_id()) )
>>>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>      }
>>
>>While I don't think I fully understand your description, 
>
>Sorry!!
>
>>the line you change
>>here has always been puzzling me: If we were to raise a softirq here, we
>>ought to call cpu_raise_softirq() instead of partly open coding what it does.
>>So I think not marking that softirq pending (but doing this incompletely) is
>>a valid change in any case.
>
>As comments in pi_notification_interrupt()  -- xen/arch/x86/hvm/vmx/vmx.c
>((((
>     *
>     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
>     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
>     * be synced to vIRR before VM-Exit in time.
>     *
>))))
>
>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time.
>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it..
>

I think there is a typo. It should be "before VM-Entry in time". It set 
VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of
entering guest directly. Jumping to vmx_do_vmentry again can re-sync the PIR to
vIRR in vmx_intr_assist(). In root-mode, cpu treat the pi notification interrupt
as normal interrupt, so cpu will run the interrupt handler pi_notification_interrupt()
instead of syncing PIR to vIRR automatically. Receiving a pi notificatio interrupt
means some interrupts have been posted in PIR. Setting that bit is to deliver 
these new arrival interrupts before this VM-entry. 

Thanks
chao

>
>>But I'll have to defer to Kevin in the hopes that he fully understands what
>>you explain above as well as him knowing why this was a test-and-set here
>>in the first place.
>>
>
>To me, this test-and-set is a bug.
>
>Quan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  6:51             ` Tian, Kevin
@ 2017-02-08  8:27               ` Xuquan (Quan Xu)
  2017-02-08  8:52                 ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-08  8:27 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

On February 08, 2017 2:51 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
>> Sent: Monday, January 23, 2017 6:57 PM
>>
>> On January 20, 2017 5:09 PM, Quan Xu wrote:
>> >btw, for PIR.. I find that there might be a bug in
>> >__vmx_deliver_posted_interrupt()...
>> >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
>> >
>> >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
>> >        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>> >&softirq_pending(cpu)) ...
>> >}
>> >
>> >Suppose that vCPUx is in guest mode, there are two (even more)
>> >interrupts to vCPUx..
>> >As the bit is set when delivers the first interrupt... the second
>> >interrupt is pending until next VM entry -- by PIR to vIRR..
>> >
>>
>> Jan , Kevin
>> Correct me if I am wrong...
>>
>> Quan
>
>I don't quite understand the point here. Can you elaborate?
>

Assumed vCPU is in guest_mode..
When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no vcpu_kick() )..
In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted interrupt. if posted interrupt is not delivered, the posted interrupt is 
pending until next VM entry -- by PIR to vIRR.. 

one condition is :
In __vmx_deliver_posted_interrupt(),  ' if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..

Specifically, we did verify it by RES interrupt, which is used for smp_reschedule_interrupt..
We even cost more time to deliver RES interrupt than no-apicv in average..

If RES interrupt (no. 1) is delivered by posted way (the vcpu is still guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by posted way,
The next-coming RES interrupt (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..

Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by PIR to vIRR..


We can fix it as below(I don't think this is a best one, it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it):

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
     {
         unsigned int cpu = v->processor;

-        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
+        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
              && (cpu != smp_processor_id()) )
             send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
     }



To be honest, I really spent several days for the original awkward-description:).. 

Happy Spring Festival!!
Quan






_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  8:27               ` Xuquan (Quan Xu)
@ 2017-02-08  8:52                 ` Jan Beulich
  2017-02-08 10:15                   ` Xuquan (Quan Xu)
                                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jan Beulich @ 2017-02-08  8:52 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, xen-devel, Chao Gao

>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
> Assumed vCPU is in guest_mode..
> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then 
> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no 
> vcpu_kick() )..
> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted 
> interrupt. if posted interrupt is not delivered, the posted interrupt is 
> pending until next VM entry -- by PIR to vIRR.. 
> 
> one condition is :
> In __vmx_deliver_posted_interrupt(),  ' if ( 
> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> 
> Specifically, we did verify it by RES interrupt, which is used for 
> smp_reschedule_interrupt..
> We even cost more time to deliver RES interrupt than no-apicv in average..
> 
> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still 
> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by 
> posted way,
> The next-coming RES interrupt (no. 2) is not delivered, as we set the 
> VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> 
> Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by 
> PIR to vIRR..
> 
> 
> We can fix it as below(I don't think this is a best one, it is better to set 
> the VCPU_KICK_SOFTIRQ bit, but not test it):
> 
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>      {
>          unsigned int cpu = v->processor;
> 
> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>               && (cpu != smp_processor_id()) )
>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>      }

While I don't think I fully understand your description, the line you
change here has always been puzzling me: If we were to raise a
softirq here, we ought to call cpu_raise_softirq() instead of partly
open coding what it does. So I think not marking that softirq
pending (but doing this incompletely) is a valid change in any case.
But I'll have to defer to Kevin in the hopes that he fully
understands what you explain above as well as him knowing why
this was a test-and-set here in the first place.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  8:52                 ` Jan Beulich
@ 2017-02-08 10:15                   ` Xuquan (Quan Xu)
  2017-02-08  8:22                     ` Chao Gao
  2017-02-13  8:21                   ` Tian, Kevin
  2017-02-13  8:23                   ` Tian, Kevin
  2 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-08 10:15 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian; +Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Chao Gao

On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>> Assumed vCPU is in guest_mode..
>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> then
>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>> (also no
>> vcpu_kick() )..
>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver
>> posted interrupt. if posted interrupt is not delivered, the posted
>> interrupt is pending until next VM entry -- by PIR to vIRR..
>>
>> one condition is :
>> In __vmx_deliver_posted_interrupt(),  ' if (
>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>
>> Specifically, we did verify it by RES interrupt, which is used for
>> smp_reschedule_interrupt..
>> We even cost more time to deliver RES interrupt than no-apicv in
>average..
>>
>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2)
>> by posted way, The next-coming RES interrupt (no. 2) is not delivered,
>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no.
>> 1)..
>>
>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>> entry -- by PIR to vIRR..
>>
>>
>> We can fix it as below(I don't think this is a best one, it is better
>> to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1846,7 +1846,7 @@ static void
>__vmx_deliver_posted_interrupt(struct vcpu *v)
>>      {
>>          unsigned int cpu = v->processor;
>>
>> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>&softirq_pending(cpu))
>> +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>               && (cpu != smp_processor_id()) )
>>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>      }
>
>While I don't think I fully understand your description, 

Sorry!!

>the line you change
>here has always been puzzling me: If we were to raise a softirq here, we
>ought to call cpu_raise_softirq() instead of partly open coding what it does.
>So I think not marking that softirq pending (but doing this incompletely) is
>a valid change in any case.

As comments in pi_notification_interrupt()  -- xen/arch/x86/hvm/vmx/vmx.c
((((
     *
     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
     * be synced to vIRR before VM-Exit in time.
     *
))))

I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time.
That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not test it..


>But I'll have to defer to Kevin in the hopes that he fully understands what
>you explain above as well as him knowing why this was a test-and-set here
>in the first place.
>

To me, this test-and-set is a bug.

Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-09  8:51                       ` Xuquan (Quan Xu)
@ 2017-02-09  3:41                         ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2017-02-09  3:41 UTC (permalink / raw)
  To: Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel

On Thu, Feb 09, 2017 at 08:51:46AM +0000, Xuquan (Quan Xu) wrote:
>On February 08, 2017 4:22 PM, Chao Gao wrote:
>>On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote:
>>>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>>>>> Assumed vCPU is in guest_mode..
>>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>>>> then
>>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>>>> (also no
>>>>> vcpu_kick() )..
>>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>>>>> deliver posted interrupt. if posted interrupt is not delivered, the
>>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR..
>>>>>
>>>>> one condition is :
>>>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>>>
>>>>> Specifically, we did verify it by RES interrupt, which is used for
>>>>> smp_reschedule_interrupt..
>>>>> We even cost more time to deliver RES interrupt than no-apicv in
>>>>average..
>>>>>
>>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>>>>> still guest_mode).. when tries to deliver next-coming RES interrupt
>>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>>interrupt (no.
>>>>> 1)..
>>>>>
>>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>>>> entry -- by PIR to vIRR..
>>>>>
>>>>>
>>>>> We can fix it as below(I don't think this is a best one, it is
>>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>>>
>>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> @@ -1846,7 +1846,7 @@ static void
>>>>__vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>>      {
>>>>>          unsigned int cpu = v->processor;
>>>>>
>>>>> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>>>&softirq_pending(cpu))
>>>>> +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>>>               && (cpu != smp_processor_id()) )
>>>>>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>>>      }
>>>>
>>>>While I don't think I fully understand your description,
>>>
>>>Sorry!!
>>>
>>>>the line you change
>>>>here has always been puzzling me: If we were to raise a softirq here,
>>>>we ought to call cpu_raise_softirq() instead of partly open coding what it
>>does.
>>>>So I think not marking that softirq pending (but doing this
>>>>incompletely) is a valid change in any case.
>>>
>>>As comments in pi_notification_interrupt()  --
>>>xen/arch/x86/hvm/vmx/vmx.c ((((
>>>     *
>>>     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
>>>     * __vmx_deliver_posted_interrupt(). So the pending interrupt in
>>PIRR will
>>>     * be synced to vIRR before VM-Exit in time.
>>>     *
>>>))))
>>>
>>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will
>>be synced to vIRR before VM-Exit in time.
>>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but
>>not test it..
>>>
>>
>>I think there is a typo. It should be "before VM-Entry in time". It set
>>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of
>>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the
>>PIR to vIRR in vmx_intr_assist(). 
>
>impressive analysis..
>chao, could you show the related code?
>

In xen/arch/x86/vmx/entry.S, 
.Lvmx_do_vmentry:
	call vmx_intr_assist
	... 
	cli
	cmp %ecx,(%rdx,%rax,1)
	jnz .Lvmx_process_softirqs
and 
.Lvmx_process_softirqs:
	sti
	call do_softirq
	jmp .Lvmx_do_vmentry	

In vmx_intr_assist(), PIR is synced to vIRR. After vmx_intr_assist(), 
if some interrupts are posted in PIR, VCPU_KICK_SOFTIRQ is set 
in pi_nofitication_interrupt() and it will jump to vmx_process_softirqs
and jump to vmx_do_vmentry again.

Thanks
Chao

>Quan
>
>
>>In root-mode, cpu treat the pi notification
>>interrupt as normal interrupt, so cpu will run the interrupt handler
>>pi_notification_interrupt() instead of syncing PIR to vIRR automatically.
>>Receiving a pi notificatio interrupt means some interrupts have been
>>posted in PIR. Setting that bit is to deliver these new arrival interrupts
>>before this VM-entry.
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  8:22                     ` Chao Gao
@ 2017-02-09  8:51                       ` Xuquan (Quan Xu)
  2017-02-09  3:41                         ` Chao Gao
  0 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-09  8:51 UTC (permalink / raw)
  To: Chao Gao; +Cc: yang.zhang.wz, Andrew Cooper, Kevin Tian, Jan Beulich, xen-devel

On February 08, 2017 4:22 PM, Chao Gao wrote:
>On Wed, Feb 08, 2017 at 10:15:28AM +0000, Xuquan (Quan Xu) wrote:
>>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>>>> Assumed vCPU is in guest_mode..
>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>>> then
>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>>> (also no
>>>> vcpu_kick() )..
>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>>>> deliver posted interrupt. if posted interrupt is not delivered, the
>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR..
>>>>
>>>> one condition is :
>>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>>
>>>> Specifically, we did verify it by RES interrupt, which is used for
>>>> smp_reschedule_interrupt..
>>>> We even cost more time to deliver RES interrupt than no-apicv in
>>>average..
>>>>
>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>>>> still guest_mode).. when tries to deliver next-coming RES interrupt
>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>interrupt (no.
>>>> 1)..
>>>>
>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>>> entry -- by PIR to vIRR..
>>>>
>>>>
>>>> We can fix it as below(I don't think this is a best one, it is
>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>>
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -1846,7 +1846,7 @@ static void
>>>__vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>      {
>>>>          unsigned int cpu = v->processor;
>>>>
>>>> -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>>&softirq_pending(cpu))
>>>> +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>>               && (cpu != smp_processor_id()) )
>>>>              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>>      }
>>>
>>>While I don't think I fully understand your description,
>>
>>Sorry!!
>>
>>>the line you change
>>>here has always been puzzling me: If we were to raise a softirq here,
>>>we ought to call cpu_raise_softirq() instead of partly open coding what it
>does.
>>>So I think not marking that softirq pending (but doing this
>>>incompletely) is a valid change in any case.
>>
>>As comments in pi_notification_interrupt()  --
>>xen/arch/x86/hvm/vmx/vmx.c ((((
>>     *
>>     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
>>     * __vmx_deliver_posted_interrupt(). So the pending interrupt in
>PIRR will
>>     * be synced to vIRR before VM-Exit in time.
>>     *
>>))))
>>
>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will
>be synced to vIRR before VM-Exit in time.
>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but
>not test it..
>>
>
>I think there is a typo. It should be "before VM-Entry in time". It set
>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of
>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the
>PIR to vIRR in vmx_intr_assist(). 

impressive analysis..
chao, could you show the related code?

Quan


>In root-mode, cpu treat the pi notification
>interrupt as normal interrupt, so cpu will run the interrupt handler
>pi_notification_interrupt() instead of syncing PIR to vIRR automatically.
>Receiving a pi notificatio interrupt means some interrupts have been
>posted in PIR. Setting that bit is to deliver these new arrival interrupts
>before this VM-entry.
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  8:52                 ` Jan Beulich
  2017-02-08 10:15                   ` Xuquan (Quan Xu)
@ 2017-02-13  8:21                   ` Tian, Kevin
  2017-02-20 12:04                     ` Xuquan (Quan Xu)
  2017-02-13  8:23                   ` Tian, Kevin
  2 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-02-13  8:21 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, February 08, 2017 4:52 PM
> 
> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
> > Assumed vCPU is in guest_mode..
> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then
> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no
> > vcpu_kick() )..
> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted
> > interrupt. if posted interrupt is not delivered, the posted interrupt is
> > pending until next VM entry -- by PIR to vIRR..
> >
> > one condition is :
> > In __vmx_deliver_posted_interrupt(),  ' if (
> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> >
> > Specifically, we did verify it by RES interrupt, which is used for
> > smp_reschedule_interrupt..
> > We even cost more time to deliver RES interrupt than no-apicv in average..
> >
> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
> > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by
> > posted way,
> > The next-coming RES interrupt (no. 2) is not delivered, as we set the
> > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> >
> > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by
> > PIR to vIRR..
> >
> >
> > We can fix it as below(I don't think this is a best one, it is better to set
> > the VCPU_KICK_SOFTIRQ bit, but not test it):
> >
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >      {
> >          unsigned int cpu = v->processor;
> >
> > -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> > +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> >               && (cpu != smp_processor_id()) )
> >              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >      }
> 
> While I don't think I fully understand your description, the line you
> change here has always been puzzling me: If we were to raise a
> softirq here, we ought to call cpu_raise_softirq() instead of partly
> open coding what it does. 

We require posted_intr_vector for target CPU to ack/deliver virtual 
interrupt in non-root mode. cpu_raise_softirq uses a different vector,
which cannot trigger such effect. 

> So I think not marking that softirq
> pending (but doing this incompletely) is a valid change in any case.
> But I'll have to defer to Kevin in the hopes that he fully
> understands what you explain above as well as him knowing why
> this was a test-and-set here in the first place.
> 

I agree we have a misuse of softirq mechanism here. If guest 
is already in non-root mode, the 1st posted interrupt will be directly 
delivered to guest (leaving softirq being set w/o actually incurring a 
VM-exit - breaking desired softirq behavior). Then further posted 
interrupts will skip the IPI, stay in PIR and not noted until another 
VM-exit happens. Looks Quan observes such delay of delivery in
his experiments.

I'm OK to remove the set here. Actually since it's an optimization
for less IPIs, we'd better check softirq_pending(cpu) directly 
instead of sticking to one bit only.

Thanks
Kevin





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-08  8:52                 ` Jan Beulich
  2017-02-08 10:15                   ` Xuquan (Quan Xu)
  2017-02-13  8:21                   ` Tian, Kevin
@ 2017-02-13  8:23                   ` Tian, Kevin
  2017-02-14  1:22                     ` Xuquan (Quan Xu)
  2 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2017-02-13  8:23 UTC (permalink / raw)
  To: Jan Beulich, Xuquan (Quan Xu)
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

> From: Tian, Kevin
> Sent: Monday, February 13, 2017 4:21 PM
> 
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Wednesday, February 08, 2017 4:52 PM
> >
> > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
> > > Assumed vCPU is in guest_mode..
> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then
> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no
> > > vcpu_kick() )..
> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted
> > > interrupt. if posted interrupt is not delivered, the posted interrupt is
> > > pending until next VM entry -- by PIR to vIRR..
> > >
> > > one condition is :
> > > In __vmx_deliver_posted_interrupt(),  ' if (
> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> > >
> > > Specifically, we did verify it by RES interrupt, which is used for
> > > smp_reschedule_interrupt..
> > > We even cost more time to deliver RES interrupt than no-apicv in average..
> > >
> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
> > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by
> > > posted way,
> > > The next-coming RES interrupt (no. 2) is not delivered, as we set the
> > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> > >
> > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- by
> > > PIR to vIRR..
> > >
> > >
> > > We can fix it as below(I don't think this is a best one, it is better to set
> > > the VCPU_KICK_SOFTIRQ bit, but not test it):
> > >
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >      {
> > >          unsigned int cpu = v->processor;
> > >
> > > -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> > > +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> > >               && (cpu != smp_processor_id()) )
> > >              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> > >      }
> >
> > While I don't think I fully understand your description, the line you
> > change here has always been puzzling me: If we were to raise a
> > softirq here, we ought to call cpu_raise_softirq() instead of partly
> > open coding what it does.
> 
> We require posted_intr_vector for target CPU to ack/deliver virtual
> interrupt in non-root mode. cpu_raise_softirq uses a different vector,
> which cannot trigger such effect.
> 
> > So I think not marking that softirq
> > pending (but doing this incompletely) is a valid change in any case.
> > But I'll have to defer to Kevin in the hopes that he fully
> > understands what you explain above as well as him knowing why
> > this was a test-and-set here in the first place.
> >
> 
> I agree we have a misuse of softirq mechanism here. If guest
> is already in non-root mode, the 1st posted interrupt will be directly
> delivered to guest (leaving softirq being set w/o actually incurring a
> VM-exit - breaking desired softirq behavior). Then further posted
> interrupts will skip the IPI, stay in PIR and not noted until another
> VM-exit happens. Looks Quan observes such delay of delivery in
> his experiments.
> 
> I'm OK to remove the set here. Actually since it's an optimization
> for less IPIs, we'd better check softirq_pending(cpu) directly
> instead of sticking to one bit only.
>

sent too fast... Quan, can you work out a patch following this
suggestion and see whether your slow-delivery issue is solved?
(hope I understand your issue correctly here).

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-13  8:23                   ` Tian, Kevin
@ 2017-02-14  1:22                     ` Xuquan (Quan Xu)
  0 siblings, 0 replies; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-14  1:22 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

On February 13, 2017 4:24 PM, Tian, Kevin wrote:
>> From: Tian, Kevin
>> Sent: Monday, February 13, 2017 4:21 PM
>>
>> > From: Jan Beulich [mailto:JBeulich@suse.com]
>> > Sent: Wednesday, February 08, 2017 4:52 PM
>> >
>> > >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>> > > Assumed vCPU is in guest_mode..
>> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> > > then
>> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no
>> > > vmexit (also no
>> > > vcpu_kick() )..
>> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>> > > deliver posted interrupt. if posted interrupt is not delivered,
>> > > the posted interrupt is pending until next VM entry -- by PIR to vIRR..
>> > >
>> > > one condition is :
>> > > In __vmx_deliver_posted_interrupt(),  ' if (
>> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>> > >
>> > > Specifically, we did verify it by RES interrupt, which is used for
>> > > smp_reschedule_interrupt..
>> > > We even cost more time to deliver RES interrupt than no-apicv in
>average..
>> > >
>> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>> > > still guest_mode).. when tries to deliver next-coming RES
>> > > interrupt (no. 2) by posted way, The next-coming RES interrupt
>> > > (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when
>> > > we deliver RES interrupt (no. 1)..
>> > >
>> > > Then the next-coming RES interrupt (no. 2) is pending until next
>> > > VM entry -- by PIR to vIRR..
>> > >
>> > >
>> > > We can fix it as below(I don't think this is a best one, it is
>> > > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>> > >
>> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > > @@ -1846,7 +1846,7 @@ static void
>__vmx_deliver_posted_interrupt(struct vcpu *v)
>> > >      {
>> > >          unsigned int cpu = v->processor;
>> > >
>> > > -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>&softirq_pending(cpu))
>> > > +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> > >               && (cpu != smp_processor_id()) )
>> > >              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>> > >      }
>> >
>> > While I don't think I fully understand your description, the line
>> > you change here has always been puzzling me: If we were to raise a
>> > softirq here, we ought to call cpu_raise_softirq() instead of partly
>> > open coding what it does.
>>
>> We require posted_intr_vector for target CPU to ack/deliver virtual
>> interrupt in non-root mode. cpu_raise_softirq uses a different vector,
>> which cannot trigger such effect.
>>
>> > So I think not marking that softirq
>> > pending (but doing this incompletely) is a valid change in any case.
>> > But I'll have to defer to Kevin in the hopes that he fully
>> > understands what you explain above as well as him knowing why this
>> > was a test-and-set here in the first place.
>> >
>>
>> I agree we have a misuse of softirq mechanism here. If guest is
>> already in non-root mode, the 1st posted interrupt will be directly
>> delivered to guest (leaving softirq being set w/o actually incurring a
>> VM-exit - breaking desired softirq behavior). Then further posted
>> interrupts will skip the IPI, stay in PIR and not noted until another
>> VM-exit happens. Looks Quan observes such delay of delivery in his
>> experiments.
>>
>> I'm OK to remove the set here. Actually since it's an optimization for
>> less IPIs, we'd better check softirq_pending(cpu) directly instead of
>> sticking to one bit only.
>>
>
>sent too fast... Quan, can you work out a patch following this suggestion
>and see whether your slow-delivery issue is solved?
>(hope I understand your issue correctly here).
>

Cool, Very Correct!! 
Sure, I will send out a patch in coming days..

Quan










_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-13  8:21                   ` Tian, Kevin
@ 2017-02-20 12:04                     ` Xuquan (Quan Xu)
  2017-02-21  3:17                       ` Tian, Kevin
  0 siblings, 1 reply; 32+ messages in thread
From: Xuquan (Quan Xu) @ 2017-02-20 12:04 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

On February 13, 2017 4:21 PM, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, February 08, 2017 4:52 PM
>>
>> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
>> > Assumed vCPU is in guest_mode..
>> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> > then
>> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>> > (also no
>> > vcpu_kick() )..
>> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>> > deliver posted interrupt. if posted interrupt is not delivered, the
>> > posted interrupt is pending until next VM entry -- by PIR to vIRR..
>> >
>> > one condition is :
>> > In __vmx_deliver_posted_interrupt(),  ' if (
>> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>> >
>> > Specifically, we did verify it by RES interrupt, which is used for
>> > smp_reschedule_interrupt..
>> > We even cost more time to deliver RES interrupt than no-apicv in
>average..
>> >
>> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>> > still guest_mode).. when tries to deliver next-coming RES interrupt
>> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>> > interrupt (no. 1)..
>> >
>> > Then the next-coming RES interrupt (no. 2) is pending until next VM
>> > entry -- by PIR to vIRR..
>> >
>> >
>> > We can fix it as below(I don't think this is a best one, it is
>> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>> >
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -1846,7 +1846,7 @@ static void
>__vmx_deliver_posted_interrupt(struct vcpu *v)
>> >      {
>> >          unsigned int cpu = v->processor;
>> >
>> > -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>&softirq_pending(cpu))
>> > +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> >               && (cpu != smp_processor_id()) )
>> >              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>> >      }
>>
>> While I don't think I fully understand your description, the line you
>> change here has always been puzzling me: If we were to raise a softirq
>> here, we ought to call cpu_raise_softirq() instead of partly open
>> coding what it does.
>
>We require posted_intr_vector for target CPU to ack/deliver virtual
>interrupt in non-root mode. cpu_raise_softirq uses a different vector, which
>cannot trigger such effect.
>


Kevin,

I can't follow this 'to ack'..
As I understand, the posted_intr_vector is to call event_check_interrupt() [ or pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC --
this dismisses the interrupt with the posted interrupt notification vector from the local APIC.

What does this ack refer to?






>> So I think not marking that softirq
>> pending (but doing this incompletely) is a valid change in any case.
>> But I'll have to defer to Kevin in the hopes that he fully understands
>> what you explain above as well as him knowing why this was a
>> test-and-set here in the first place.
>>
>
>I agree we have a misuse of softirq mechanism here. If guest is already in
>non-root mode, the 1st posted interrupt will be directly delivered to guest
>(leaving softirq being set w/o actually incurring a VM-exit - breaking desired
>softirq behavior). Then further posted interrupts will skip the IPI, stay in PIR
>and not noted until another VM-exit happens. Looks Quan observes such
>delay of delivery in his experiments.
>
>I'm OK to remove the set here. Actually since it's an optimization for less
>IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to
>one bit only.
>
>Thanks
>Kevin
>
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [xen-unstable test] 104131: regressions - FAIL
  2017-02-20 12:04                     ` Xuquan (Quan Xu)
@ 2017-02-21  3:17                       ` Tian, Kevin
  0 siblings, 0 replies; 32+ messages in thread
From: Tian, Kevin @ 2017-02-21  3:17 UTC (permalink / raw)
  To: Xuquan (Quan Xu), Jan Beulich
  Cc: yang.zhang.wz, Andrew Cooper, xen-devel, Gao, Chao

> From: Xuquan (Quan Xu) [mailto:xuquan8@huawei.com]
> Sent: Monday, February 20, 2017 8:04 PM
> 
> On February 13, 2017 4:21 PM, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, February 08, 2017 4:52 PM
> >>
> >> >>> On 08.02.17 at 09:27, <xuquan8@huawei.com> wrote:
> >> > Assumed vCPU is in guest_mode..
> >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
> >> > then
> >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
> >> > (also no
> >> > vcpu_kick() )..
> >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
> >> > deliver posted interrupt. if posted interrupt is not delivered, the
> >> > posted interrupt is pending until next VM entry -- by PIR to vIRR..
> >> >
> >> > one condition is :
> >> > In __vmx_deliver_posted_interrupt(),  ' if (
> >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> >> >
> >> > Specifically, we did verify it by RES interrupt, which is used for
> >> > smp_reschedule_interrupt..
> >> > We even cost more time to deliver RES interrupt than no-apicv in
> >average..
> >> >
> >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
> >> > still guest_mode).. when tries to deliver next-coming RES interrupt
> >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
> >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
> >> > interrupt (no. 1)..
> >> >
> >> > Then the next-coming RES interrupt (no. 2) is pending until next VM
> >> > entry -- by PIR to vIRR..
> >> >
> >> >
> >> > We can fix it as below(I don't think this is a best one, it is
> >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
> >> >
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -1846,7 +1846,7 @@ static void
> >__vmx_deliver_posted_interrupt(struct vcpu *v)
> >> >      {
> >> >          unsigned int cpu = v->processor;
> >> >
> >> > -        if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
> >&softirq_pending(cpu))
> >> > +        if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> >> >               && (cpu != smp_processor_id()) )
> >> >              send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >> >      }
> >>
> >> While I don't think I fully understand your description, the line you
> >> change here has always been puzzling me: If we were to raise a softirq
> >> here, we ought to call cpu_raise_softirq() instead of partly open
> >> coding what it does.
> >
> >We require posted_intr_vector for target CPU to ack/deliver virtual
> >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which
> >cannot trigger such effect.
> >
> 
> 
> Kevin,
> 
> I can't follow this 'to ack'..
> As I understand, the posted_intr_vector is to call event_check_interrupt() [ or
> pi_notification_interrupt() ] to writes zero to the EOI register in the local APIC --
> this dismisses the interrupt with the posted interrupt notification vector from the local
> APIC.
> 
> What does this ack refer to?
> 

Please look at SDM. 'ack' means evaluation of pending vIRRs when CPU is
in non-root mode which results in direct virtual interrupt delivery w/o incurring
VM-exit.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-02-21  3:17 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12  6:46 [xen-unstable test] 104131: regressions - FAIL osstest service owner
2017-01-12  9:14 ` Andrew Cooper
2017-01-12 12:07   ` Xuquan (Quan Xu)
2017-01-12  9:04     ` Chao Gao
2017-01-12 12:15     ` Andrew Cooper
2017-01-12 12:25       ` Jan Beulich
2017-01-16  5:25         ` Tian, Kevin
2017-01-16  6:27           ` Xuquan (Quan Xu)
2017-01-16  0:06             ` Chao Gao
2017-01-16 11:00           ` Jan Beulich
2017-01-18  4:57             ` Tian, Kevin
2017-01-18  9:38               ` Jan Beulich
2017-01-18 10:23                 ` Tian, Kevin
2017-01-20 11:48                   ` Jan Beulich
2017-01-22  4:35                     ` Tian, Kevin
2017-01-23 10:33                       ` Jan Beulich
2017-01-20  8:47                 ` Xuquan (Quan Xu)
2017-01-20  8:56                   ` Jan Beulich
2017-01-20  9:08           ` Xuquan (Quan Xu)
2017-01-23 10:56           ` Xuquan (Quan Xu)
2017-02-08  6:51             ` Tian, Kevin
2017-02-08  8:27               ` Xuquan (Quan Xu)
2017-02-08  8:52                 ` Jan Beulich
2017-02-08 10:15                   ` Xuquan (Quan Xu)
2017-02-08  8:22                     ` Chao Gao
2017-02-09  8:51                       ` Xuquan (Quan Xu)
2017-02-09  3:41                         ` Chao Gao
2017-02-13  8:21                   ` Tian, Kevin
2017-02-20 12:04                     ` Xuquan (Quan Xu)
2017-02-21  3:17                       ` Tian, Kevin
2017-02-13  8:23                   ` Tian, Kevin
2017-02-14  1:22                     ` Xuquan (Quan Xu)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.