* vfio_pin_map_dma cause synchronize_sched wait too long
@ 2019-12-02  9:10 Longpeng (Mike)
From: Longpeng (Mike)
To: Alex Williamson, pbonzini
Cc: qemu-devel, kvm, linux-kernel, Longpeng(Mike), Gonglei, Huangzhichao

Hi guys,

Suppose there are two VMs: VM1 is bound to node-0 and is calling
vfio_pin_map_dma(); VM2 is an incoming migration VM bound to node-1.
We found that vm_start() (a QEMU function) of VM2 occasionally takes
too long; the reason is as follows.

- VM2 -
qemu:
  vm_start
    vm_start_notify
      virtio_vmstate_change
        virtio_pci_vmstate_change
          virtio_pci_start_ioeventfd
            virtio_device_start_ioeventfd_impl
              event_notifier_init
                eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC)  <-- too long
kern:
  sys_eventfd2
    get_unused_fd_flags
      __alloc_fd
        expand_files
          expand_fdtable
            synchronize_sched  <-- too long

- VM1 -
VM1 is doing vfio_pin_map_dma() at the same time. The CPU must finish
vfio_pin_map_dma() before the rcu-sched grace period can elapse, so
synchronize_sched() waits for a long time.

Is there any solution to this? Any suggestion would be greatly
appreciated, thanks!

--
Regards,
Longpeng(Mike)
* Re: vfio_pin_map_dma cause synchronize_sched wait too long
@ 2019-12-02  9:31 Paolo Bonzini
From: Paolo Bonzini
To: Longpeng (Mike), Alex Williamson
Cc: qemu-devel, kvm, linux-kernel, Longpeng(Mike), Gonglei, Huangzhichao

On 02/12/19 10:10, Longpeng (Mike) wrote:
>
> Suppose there are two VMs: VM1 is bound to node-0 and is calling
> vfio_pin_map_dma(); VM2 is an incoming migration VM bound to node-1.
> We found that vm_start() (a QEMU function) of VM2 occasionally takes
> too long; the reason is as follows.

Which part of vfio_pin_map_dma is running?  There is already a
cond_resched in vfio_iommu_map.  Perhaps you could add one to
vfio_pin_pages_remote and/or use vfio_pgsize_bitmap to cap the number
of pages that it returns.

Paolo
* Re: vfio_pin_map_dma cause synchronize_sched wait too long
@ 2019-12-02  9:42 Longpeng (Mike)
From: Longpeng (Mike)
To: Paolo Bonzini, Alex Williamson
Cc: qemu-devel, kvm, linux-kernel, Longpeng(Mike), Gonglei, Huangzhichao

On 2019/12/2 17:31, Paolo Bonzini wrote:
> On 02/12/19 10:10, Longpeng (Mike) wrote:
>>
>> Suppose there are two VMs: VM1 is bound to node-0 and is calling
>> vfio_pin_map_dma(); VM2 is an incoming migration VM bound to node-1.
>> We found that vm_start() (a QEMU function) of VM2 occasionally takes
>> too long; the reason is as follows.
>
> Which part of vfio_pin_map_dma is running?  There is already a

I need more analysis to find which part.

> cond_resched in vfio_iommu_map.  Perhaps you could add one to
> vfio_pin_pages_remote and/or use vfio_pgsize_bitmap to cap the number
> of pages that it returns.

Um ... there is only one runnable task (the qemu-kvm process of VM1)
on that CPU, so maybe the cond_resched() is ineffective?

--
Regards,
Longpeng(Mike)
* Re: vfio_pin_map_dma cause synchronize_sched wait too long
@ 2019-12-02 10:06 Paolo Bonzini
From: Paolo Bonzini
To: Longpeng (Mike), Alex Williamson
Cc: qemu-devel, kvm, linux-kernel, Longpeng(Mike), Gonglei, Huangzhichao

On 02/12/19 10:42, Longpeng (Mike) wrote:
>> cond_resched in vfio_iommu_map.  Perhaps you could add one to
>> vfio_pin_pages_remote and/or use vfio_pgsize_bitmap to cap the
>> number of pages that it returns.
> Um ... there is only one runnable task (the qemu-kvm process of VM1)
> on that CPU, so maybe the cond_resched() is ineffective?

Note that synchronize_sched() these days is just a synonym of
synchronize_rcu(), so this makes me wonder if you're running on an
older kernel and whether you are missing this commit:

commit 92aa39e9dc77481b90cbef25e547d66cab901496
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Jul 9 13:47:30 2018 -0700

    rcu: Make need_resched() respond to urgent RCU-QS needs

    The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
    need for an RCU quiescent state from the force-quiescent-state processing
    within the grace-period kthread to context switches and to cond_resched().
    Unfortunately, such urgent needs are not communicated to need_resched(),
    which is sometimes used to decide when to invoke cond_resched(), for
    but one example, within the KVM vcpu_run() function.  As of v4.15, this
    can result in synchronize_sched() being delayed by up to ten seconds,
    which can be problematic, to say nothing of annoying.

    This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
    rcu_check_callbacks(), which is invoked from the scheduling-clock
    interrupt handler.  If the current task is not an idle task and is
    not executing in usermode, a context switch is forced, and either way,
    the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
    task is an idle task, then RCU's dyntick-idle code will detect the
    quiescent state, so no further action is required.  Similarly, if the
    task is executing in usermode, other code in rcu_check_callbacks() and
    its called functions will report the corresponding quiescent state.

    Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
    Reported-by: David Woodhouse <dwmw2@infradead.org>
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Thanks,

Paolo
* Re: vfio_pin_map_dma cause synchronize_sched wait too long
@ 2019-12-02 10:47 Longpeng (Mike)
From: Longpeng (Mike)
To: Paolo Bonzini, Alex Williamson
Cc: qemu-devel, kvm, linux-kernel, Longpeng(Mike), Gonglei, Huangzhichao

On 2019/12/2 18:06, Paolo Bonzini wrote:
> On 02/12/19 10:42, Longpeng (Mike) wrote:
>>> cond_resched in vfio_iommu_map.  Perhaps you could add one to
>>> vfio_pin_pages_remote and/or use vfio_pgsize_bitmap to cap the
>>> number of pages that it returns.
>> Um ... there is only one runnable task (the qemu-kvm process of VM1)
>> on that CPU, so maybe the cond_resched() is ineffective?
>
> Note that synchronize_sched() these days is just a synonym of
> synchronize_rcu(), so this makes me wonder if you're running on an
> older kernel and whether you are missing this commit:

Yep. I'm running on an older kernel and I've missed this patchset.
Thanks a lot :)

> commit 92aa39e9dc77481b90cbef25e547d66cab901496
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Jul 9 13:47:30 2018 -0700
>
>     rcu: Make need_resched() respond to urgent RCU-QS needs
> [...]

--
Regards,
Longpeng(Mike)
Thread overview: 5 messages
2019-12-02  9:10 vfio_pin_map_dma cause synchronize_sched wait too long  Longpeng (Mike)
2019-12-02  9:31 ` Paolo Bonzini
2019-12-02  9:42 ` Longpeng (Mike)
2019-12-02 10:06 ` Paolo Bonzini
2019-12-02 10:47 ` Longpeng (Mike)