* [PATCH] driver/pci: reduce the single block time in pci_read_config @ 2020-08-24 5:20 Jiang Biao 2020-08-27 9:49 ` Jiang Biao ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Jiang Biao @ 2020-08-24 5:20 UTC (permalink / raw) To: bhelgaas; +Cc: linux-pci, linux-kernel, Jiang Biao, Bin Lai From: Jiang Biao <benbjiang@tencent.com> pci_read_config() could block several ms in kernel space, mainly caused by the while loop to call pci_user_read_config_dword(). Singel pci_user_read_config_dword() loop could consume 130us+, | pci_user_read_config_dword() { | _raw_spin_lock_irq() { ! 136.698 us | native_queued_spin_lock_slowpath(); ! 137.582 us | } | pci_read() { | raw_pci_read() { | pci_conf1_read() { 0.230 us | _raw_spin_lock_irqsave(); 0.035 us | _raw_spin_unlock_irqrestore(); 8.476 us | } 8.790 us | } 9.091 us | } ! 147.263 us | } and dozens of the loop could consume ms+. If we execute some lspci commands concurrently, ms+ scheduling latency could be detected. Add scheduling chance in the loop to improve the latency. Reported-by: Bin Lai <robinlai@tencent.com> Signed-off-by: Jiang Biao <benbjiang@tencent.com> --- drivers/pci/pci-sysfs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 6d78df9..3b9f63d 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -708,6 +708,7 @@ static ssize_t pci_read_config(struct file *filp, struct kobject *kobj, data[off - init_off + 3] = (val >> 24) & 0xff; off += 4; size -= 4; + cond_resched(); } if (size >= 2) { -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-08-24 5:20 [PATCH] driver/pci: reduce the single block time in pci_read_config Jiang Biao @ 2020-08-27 9:49 ` Jiang Biao 2020-09-10 1:25 ` Bjorn Helgaas 2020-09-17 17:32 ` Bjorn Helgaas 2 siblings, 0 replies; 10+ messages in thread From: Jiang Biao @ 2020-08-27 9:49 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Jiang Biao, Bin Lai kindly ping :) On Mon, 24 Aug 2020 at 13:20, Jiang Biao <benbjiang@gmail.com> wrote: > > From: Jiang Biao <benbjiang@tencent.com> > > pci_read_config() could block several ms in kernel space, mainly > caused by the while loop to call pci_user_read_config_dword(). > Singel pci_user_read_config_dword() loop could consume 130us+, > | pci_user_read_config_dword() { > | _raw_spin_lock_irq() { > ! 136.698 us | native_queued_spin_lock_slowpath(); > ! 137.582 us | } > | pci_read() { > | raw_pci_read() { > | pci_conf1_read() { > 0.230 us | _raw_spin_lock_irqsave(); > 0.035 us | _raw_spin_unlock_irqrestore(); > 8.476 us | } > 8.790 us | } > 9.091 us | } > ! 147.263 us | } > and dozens of the loop could consume ms+. > > If we execute some lspci commands concurrently, ms+ scheduling > latency could be detected. > > Add scheduling chance in the loop to improve the latency. > > Reported-by: Bin Lai <robinlai@tencent.com> > Signed-off-by: Jiang Biao <benbjiang@tencent.com> > --- > drivers/pci/pci-sysfs.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 6d78df9..3b9f63d 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -708,6 +708,7 @@ static ssize_t pci_read_config(struct file *filp, struct kobject *kobj, > data[off - init_off + 3] = (val >> 24) & 0xff; > off += 4; > size -= 4; > + cond_resched(); > } > > if (size >= 2) { > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-08-24 5:20 [PATCH] driver/pci: reduce the single block time in pci_read_config Jiang Biao 2020-08-27 9:49 ` Jiang Biao @ 2020-09-10 1:25 ` Bjorn Helgaas 2020-09-10 1:54 ` Jiang Biao 2020-09-17 17:32 ` Bjorn Helgaas 2 siblings, 1 reply; 10+ messages in thread From: Bjorn Helgaas @ 2020-09-10 1:25 UTC (permalink / raw) To: Jiang Biao; +Cc: bhelgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > From: Jiang Biao <benbjiang@tencent.com> > > pci_read_config() could block several ms in kernel space, mainly > caused by the while loop to call pci_user_read_config_dword(). > Singel pci_user_read_config_dword() loop could consume 130us+, > | pci_user_read_config_dword() { > | _raw_spin_lock_irq() { > ! 136.698 us | native_queued_spin_lock_slowpath(); > ! 137.582 us | } > | pci_read() { > | raw_pci_read() { > | pci_conf1_read() { > 0.230 us | _raw_spin_lock_irqsave(); > 0.035 us | _raw_spin_unlock_irqrestore(); > 8.476 us | } > 8.790 us | } > 9.091 us | } > ! 147.263 us | } > and dozens of the loop could consume ms+. > > If we execute some lspci commands concurrently, ms+ scheduling > latency could be detected. > > Add scheduling chance in the loop to improve the latency. Thanks for the patch, this makes a lot of sense. Shouldn't we do the same in pci_write_config()? > Reported-by: Bin Lai <robinlai@tencent.com> > Signed-off-by: Jiang Biao <benbjiang@tencent.com> > --- > drivers/pci/pci-sysfs.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 6d78df9..3b9f63d 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -708,6 +708,7 @@ static ssize_t pci_read_config(struct file *filp, struct kobject *kobj, > data[off - init_off + 3] = (val >> 24) & 0xff; > off += 4; > size -= 4; > + cond_resched(); > } > > if (size >= 2) { > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-10 1:25 ` Bjorn Helgaas @ 2020-09-10 1:54 ` Jiang Biao 2020-09-10 1:59 ` Bjorn Helgaas 0 siblings, 1 reply; 10+ messages in thread From: Jiang Biao @ 2020-09-10 1:54 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai Hi, On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > From: Jiang Biao <benbjiang@tencent.com> > > > > pci_read_config() could block several ms in kernel space, mainly > > caused by the while loop to call pci_user_read_config_dword(). > > Singel pci_user_read_config_dword() loop could consume 130us+, > > | pci_user_read_config_dword() { > > | _raw_spin_lock_irq() { > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > ! 137.582 us | } > > | pci_read() { > > | raw_pci_read() { > > | pci_conf1_read() { > > 0.230 us | _raw_spin_lock_irqsave(); > > 0.035 us | _raw_spin_unlock_irqrestore(); > > 8.476 us | } > > 8.790 us | } > > 9.091 us | } > > ! 147.263 us | } > > and dozens of the loop could consume ms+. > > > > If we execute some lspci commands concurrently, ms+ scheduling > > latency could be detected. > > > > Add scheduling chance in the loop to improve the latency. > > Thanks for the patch, this makes a lot of sense. > > Shouldn't we do the same in pci_write_config()? Yes, IMHO, that could be helpful too. I'll send v2 to include that. :) Thanks a lot for your comment. Regards, Jiang ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-10 1:54 ` Jiang Biao @ 2020-09-10 1:59 ` Bjorn Helgaas 2020-09-10 2:18 ` Jiang Biao 2020-09-13 4:27 ` Jiang Biao 0 siblings, 2 replies; 10+ messages in thread From: Bjorn Helgaas @ 2020-09-10 1:59 UTC (permalink / raw) To: Jiang Biao; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai On Thu, Sep 10, 2020 at 09:54:02AM +0800, Jiang Biao wrote: > Hi, > > On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > > From: Jiang Biao <benbjiang@tencent.com> > > > > > > pci_read_config() could block several ms in kernel space, mainly > > > caused by the while loop to call pci_user_read_config_dword(). > > > Singel pci_user_read_config_dword() loop could consume 130us+, > > > | pci_user_read_config_dword() { > > > | _raw_spin_lock_irq() { > > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > > ! 137.582 us | } > > > | pci_read() { > > > | raw_pci_read() { > > > | pci_conf1_read() { > > > 0.230 us | _raw_spin_lock_irqsave(); > > > 0.035 us | _raw_spin_unlock_irqrestore(); > > > 8.476 us | } > > > 8.790 us | } > > > 9.091 us | } > > > ! 147.263 us | } > > > and dozens of the loop could consume ms+. > > > > > > If we execute some lspci commands concurrently, ms+ scheduling > > > latency could be detected. > > > > > > Add scheduling chance in the loop to improve the latency. > > > > Thanks for the patch, this makes a lot of sense. > > > > Shouldn't we do the same in pci_write_config()? > Yes, IMHO, that could be helpful too. If it's feasible, it would be nice to actually verify that it makes a difference. I know config writes should be faster than reads, but they're certainly not as fast as a CPU can pump out data, so there must be *some* mechanism that slows the CPU down. Bjorn ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-10 1:59 ` Bjorn Helgaas @ 2020-09-10 2:18 ` Jiang Biao 2020-09-13 4:27 ` Jiang Biao 1 sibling, 0 replies; 10+ messages in thread From: Jiang Biao @ 2020-09-10 2:18 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai Hi, On Thu, 10 Sep 2020 at 09:59, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Sep 10, 2020 at 09:54:02AM +0800, Jiang Biao wrote: > > Hi, > > > > On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > > > From: Jiang Biao <benbjiang@tencent.com> > > > > > > > > pci_read_config() could block several ms in kernel space, mainly > > > > caused by the while loop to call pci_user_read_config_dword(). > > > > Singel pci_user_read_config_dword() loop could consume 130us+, > > > > | pci_user_read_config_dword() { > > > > | _raw_spin_lock_irq() { > > > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > > > ! 137.582 us | } > > > > | pci_read() { > > > > | raw_pci_read() { > > > > | pci_conf1_read() { > > > > 0.230 us | _raw_spin_lock_irqsave(); > > > > 0.035 us | _raw_spin_unlock_irqrestore(); > > > > 8.476 us | } > > > > 8.790 us | } > > > > 9.091 us | } > > > > ! 147.263 us | } > > > > and dozens of the loop could consume ms+. > > > > > > > > If we execute some lspci commands concurrently, ms+ scheduling > > > > latency could be detected. > > > > > > > > Add scheduling chance in the loop to improve the latency. > > > > > > Thanks for the patch, this makes a lot of sense. > > > > > > Shouldn't we do the same in pci_write_config()? > > Yes, IMHO, that could be helpful too. > > If it's feasible, it would be nice to actually verify that it makes a > difference. I know config writes should be faster than reads, but > they're certainly not as fast as a CPU can pump out data, so there > must be *some* mechanism that slows the CPU down. We did catch 5ms+ latency caused by pci_read_config() triggered by concurrent lspcis, and latency disappeared after this patch. For pci_write_config path, we have not met the case actually.:) I'll have some tries to verify that, and would send another patch if verified. Thx. Regards, Jiang ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-10 1:59 ` Bjorn Helgaas 2020-09-10 2:18 ` Jiang Biao @ 2020-09-13 4:27 ` Jiang Biao 2020-09-16 16:56 ` Bjorn Helgaas 1 sibling, 1 reply; 10+ messages in thread From: Jiang Biao @ 2020-09-13 4:27 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai Hi, Bjorn On Thu, 10 Sep 2020 at 09:59, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Sep 10, 2020 at 09:54:02AM +0800, Jiang Biao wrote: > > Hi, > > > > On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > > > From: Jiang Biao <benbjiang@tencent.com> > > > > > > > > pci_read_config() could block several ms in kernel space, mainly > > > > caused by the while loop to call pci_user_read_config_dword(). > > > > Singel pci_user_read_config_dword() loop could consume 130us+, > > > > | pci_user_read_config_dword() { > > > > | _raw_spin_lock_irq() { > > > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > > > ! 137.582 us | } > > > > | pci_read() { > > > > | raw_pci_read() { > > > > | pci_conf1_read() { > > > > 0.230 us | _raw_spin_lock_irqsave(); > > > > 0.035 us | _raw_spin_unlock_irqrestore(); > > > > 8.476 us | } > > > > 8.790 us | } > > > > 9.091 us | } > > > > ! 147.263 us | } > > > > and dozens of the loop could consume ms+. > > > > > > > > If we execute some lspci commands concurrently, ms+ scheduling > > > > latency could be detected. > > > > > > > > Add scheduling chance in the loop to improve the latency. > > > > > > Thanks for the patch, this makes a lot of sense. > > > > > > Shouldn't we do the same in pci_write_config()? > > Yes, IMHO, that could be helpful too. > > If it's feasible, it would be nice to actually verify that it makes a > difference. I know config writes should be faster than reads, but > they're certainly not as fast as a CPU can pump out data, so there > must be *some* mechanism that slows the CPU down. > > Bjorn We failed to build a test case to produce the latency by setpci command, AFAIU, setpci could be much less frequently realistically used than lspci. So, the latency from pci_write_config() path could not be verified for now, could we apply this patch alone to erase the verified latency introduced by pci_read_config() path? :) Thanks a lot. Regards, Jiang ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-13 4:27 ` Jiang Biao @ 2020-09-16 16:56 ` Bjorn Helgaas 2020-09-17 0:22 ` Jiang Biao 0 siblings, 1 reply; 10+ messages in thread From: Bjorn Helgaas @ 2020-09-16 16:56 UTC (permalink / raw) To: Jiang Biao; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai On Sun, Sep 13, 2020 at 12:27:09PM +0800, Jiang Biao wrote: > On Thu, 10 Sep 2020 at 09:59, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Sep 10, 2020 at 09:54:02AM +0800, Jiang Biao wrote: > > > On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > > > > From: Jiang Biao <benbjiang@tencent.com> > > > > > > > > > > pci_read_config() could block several ms in kernel space, mainly > > > > > caused by the while loop to call pci_user_read_config_dword(). > > > > > Singel pci_user_read_config_dword() loop could consume 130us+, > > > > > | pci_user_read_config_dword() { > > > > > | _raw_spin_lock_irq() { > > > > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > > > > ! 137.582 us | } > > > > > | pci_read() { > > > > > | raw_pci_read() { > > > > > | pci_conf1_read() { > > > > > 0.230 us | _raw_spin_lock_irqsave(); > > > > > 0.035 us | _raw_spin_unlock_irqrestore(); > > > > > 8.476 us | } > > > > > 8.790 us | } > > > > > 9.091 us | } > > > > > ! 147.263 us | } > > > > > and dozens of the loop could consume ms+. > > > > > > > > > > If we execute some lspci commands concurrently, ms+ scheduling > > > > > latency could be detected. > > > > > > > > > > Add scheduling chance in the loop to improve the latency. > > > > > > > > Thanks for the patch, this makes a lot of sense. > > > > > > > > Shouldn't we do the same in pci_write_config()? > > > > > > Yes, IMHO, that could be helpful too. > > > > If it's feasible, it would be nice to actually verify that it makes a > > difference. I know config writes should be faster than reads, but > > they're certainly not as fast as a CPU can pump out data, so there > > must be *some* mechanism that slows the CPU down. > > > We failed to build a test case to produce the latency by setpci command, > AFAIU, setpci could be much less frequently realistically used than lspci. > So, the latency from pci_write_config() path could not be verified for now, > could we apply this patch alone to erase the verified latency introduced > by pci_read_config() path? :) Thanks for trying! I'll apply the patch as-is. I'd like to include a note in the commit log about the user-visible effect of this. I looked through recent similar commits: 928da37a229f ("RDMA/umem: Add a schedule point in ib_umem_get()") 47aaabdedf36 ("fanotify: Avoid softlockups when reading many events") 9f47eb5461aa ("fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls") 0a3b3c253a1e ("mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls") b7e3debdd040 ("mm/memory_hotplug.c: fix false softlockup during pfn range removal") d35bd764e689 ("dm writecache: add cond_resched to loop in persistent_memory_claim()") da97f2d56bbd ("mm: call cond_resched() from deferred_init_memmap()") ab8b65be1831 ("KVM: PPC: Book3S: Fix some RCU-list locks") 48c963e31bc6 ("KVM: arm/arm64: Release kvm->mmu_lock in loop to prevent starvation") e84fe99b68ce ("mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()") 4005f5c3c9d0 ("wireguard: send/receive: cond_resched() when processing worker ringbuffers") 3fd44c86711f ("io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()") 7979457b1d3a ("net: bridge: vlan: Add a schedule point during VLAN processing") 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()") 1edaa447d958 ("dm writecache: add cond_resched to avoid CPU hangs") ce9a4186f9ac ("macvlan: add cond_resched() during multicast processing") 7be1b9b8e9d1 ("drm/mm: Break long searches in fragmented address spaces") bb699a793110 ("drm/i915/gem: Break up long lists of object reclaim") 9424ef56e13a ("ext4: add cond_resched() to __ext4_find_entry()") and many of them mention softlockups, RCU CPU stall warnings, or watchdogs triggering. Are you seeing one of those, or are you measuring latency some other way? Bjorn ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-09-16 16:56 ` Bjorn Helgaas @ 2020-09-17 0:22 ` Jiang Biao 0 siblings, 0 replies; 10+ messages in thread From: Jiang Biao @ 2020-09-17 0:22 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai Hi, On Thu, 17 Sep 2020 at 00:56, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Sun, Sep 13, 2020 at 12:27:09PM +0800, Jiang Biao wrote: > > On Thu, 10 Sep 2020 at 09:59, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Thu, Sep 10, 2020 at 09:54:02AM +0800, Jiang Biao wrote: > > > > On Thu, 10 Sep 2020 at 09:25, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > > > > > > From: Jiang Biao <benbjiang@tencent.com> > > > > > > > > > > > > pci_read_config() could block several ms in kernel space, mainly > > > > > > caused by the while loop to call pci_user_read_config_dword(). > > > > > > Singel pci_user_read_config_dword() loop could consume 130us+, > > > > > > | pci_user_read_config_dword() { > > > > > > | _raw_spin_lock_irq() { > > > > > > ! 136.698 us | native_queued_spin_lock_slowpath(); > > > > > > ! 137.582 us | } > > > > > > | pci_read() { > > > > > > | raw_pci_read() { > > > > > > | pci_conf1_read() { > > > > > > 0.230 us | _raw_spin_lock_irqsave(); > > > > > > 0.035 us | _raw_spin_unlock_irqrestore(); > > > > > > 8.476 us | } > > > > > > 8.790 us | } > > > > > > 9.091 us | } > > > > > > ! 147.263 us | } > > > > > > and dozens of the loop could consume ms+. > > > > > > > > > > > > If we execute some lspci commands concurrently, ms+ scheduling > > > > > > latency could be detected. > > > > > > > > > > > > Add scheduling chance in the loop to improve the latency. > > > > > > > > > > Thanks for the patch, this makes a lot of sense. > > > > > > > > > > Shouldn't we do the same in pci_write_config()? > > > > > > > > Yes, IMHO, that could be helpful too. > > > > > > If it's feasible, it would be nice to actually verify that it makes a > > > difference. I know config writes should be faster than reads, but > > > they're certainly not as fast as a CPU can pump out data, so there > > > must be *some* mechanism that slows the CPU down. > > > > > We failed to build a test case to produce the latency by setpci command, > > AFAIU, setpci could be much less frequently realistically used than lspci. > > So, the latency from pci_write_config() path could not be verified for now, > > could we apply this patch alone to erase the verified latency introduced > > by pci_read_config() path? :) > > Thanks for trying! I'll apply the patch as-is. I'd like to include a Thanks. :) > note in the commit log about the user-visible effect of this. I > looked through recent similar commits: > > 928da37a229f ("RDMA/umem: Add a schedule point in ib_umem_get()") > 47aaabdedf36 ("fanotify: Avoid softlockups when reading many events") > 9f47eb5461aa ("fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls") > 0a3b3c253a1e ("mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls") > b7e3debdd040 ("mm/memory_hotplug.c: fix false softlockup during pfn range removal") > d35bd764e689 ("dm writecache: add cond_resched to loop in persistent_memory_claim()") > da97f2d56bbd ("mm: call cond_resched() from deferred_init_memmap()") > ab8b65be1831 ("KVM: PPC: Book3S: Fix some RCU-list locks") > 48c963e31bc6 ("KVM: arm/arm64: Release kvm->mmu_lock in loop to prevent starvation") > e84fe99b68ce ("mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()") > 4005f5c3c9d0 ("wireguard: send/receive: cond_resched() when processing worker ringbuffers") > 3fd44c86711f ("io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()") > 7979457b1d3a ("net: bridge: vlan: Add a schedule point during VLAN processing") > 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()") > 1edaa447d958 ("dm writecache: add cond_resched to avoid CPU hangs") > ce9a4186f9ac ("macvlan: add cond_resched() during multicast processing") > 7be1b9b8e9d1 ("drm/mm: Break long searches in fragmented address spaces") > bb699a793110 ("drm/i915/gem: Break up long lists of object reclaim") > 9424ef56e13a ("ext4: add cond_resched() to __ext4_find_entry()") > > and many of them mention softlockups, RCU CPU stall warnings, or > watchdogs triggering. Are you seeing one of those, or are you No softlockup or RCU stall warnings. > measuring latency some other way? We test the scheduling latency by cyclictest benchmark, the max latency could be more than 5ms without this patch. The ftrace log shows pci_read_config is the main cause, and the 5ms+ latency disappeared with this patch applied. Thanks a lot. Regards, Jiang ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] driver/pci: reduce the single block time in pci_read_config 2020-08-24 5:20 [PATCH] driver/pci: reduce the single block time in pci_read_config Jiang Biao 2020-08-27 9:49 ` Jiang Biao 2020-09-10 1:25 ` Bjorn Helgaas @ 2020-09-17 17:32 ` Bjorn Helgaas 2 siblings, 0 replies; 10+ messages in thread From: Bjorn Helgaas @ 2020-09-17 17:32 UTC (permalink / raw) To: Jiang Biao; +Cc: bhelgaas, linux-pci, linux-kernel, Jiang Biao, Bin Lai On Mon, Aug 24, 2020 at 01:20:25PM +0800, Jiang Biao wrote: > From: Jiang Biao <benbjiang@tencent.com> > > pci_read_config() could block several ms in kernel space, mainly > caused by the while loop to call pci_user_read_config_dword(). > Singel pci_user_read_config_dword() loop could consume 130us+, > | pci_user_read_config_dword() { > | _raw_spin_lock_irq() { > ! 136.698 us | native_queued_spin_lock_slowpath(); > ! 137.582 us | } > | pci_read() { > | raw_pci_read() { > | pci_conf1_read() { > 0.230 us | _raw_spin_lock_irqsave(); > 0.035 us | _raw_spin_unlock_irqrestore(); > 8.476 us | } > 8.790 us | } > 9.091 us | } > ! 147.263 us | } > and dozens of the loop could consume ms+. > > If we execute some lspci commands concurrently, ms+ scheduling > latency could be detected. > > Add scheduling chance in the loop to improve the latency. > > Reported-by: Bin Lai <robinlai@tencent.com> > Signed-off-by: Jiang Biao <benbjiang@tencent.com> Applied to pci/enumeration for v5.10, thanks! > --- > drivers/pci/pci-sysfs.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 6d78df9..3b9f63d 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -708,6 +708,7 @@ static ssize_t pci_read_config(struct file *filp, struct kobject *kobj, > data[off - init_off + 3] = (val >> 24) & 0xff; > off += 4; > size -= 4; > + cond_resched(); > } > > if (size >= 2) { > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-09-17 17:34 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-24 5:20 [PATCH] driver/pci: reduce the single block time in pci_read_config Jiang Biao 2020-08-27 9:49 ` Jiang Biao 2020-09-10 1:25 ` Bjorn Helgaas 2020-09-10 1:54 ` Jiang Biao 2020-09-10 1:59 ` Bjorn Helgaas 2020-09-10 2:18 ` Jiang Biao 2020-09-13 4:27 ` Jiang Biao 2020-09-16 16:56 ` Bjorn Helgaas 2020-09-17 0:22 ` Jiang Biao 2020-09-17 17:32 ` Bjorn Helgaas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.