* [PATCH net-next 1/2] sched: introduce nr_running_this_cpu()
@ 2014-08-21 8:05 Jason Wang
  2014-08-21 8:05 ` [PATCH net-next 2/2] net: exit busy loop when another process is runnable Jason Wang
  2014-08-21 13:52 ` [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() Ingo Molnar
  0 siblings, 2 replies; 54+ messages in thread
From: Jason Wang @ 2014-08-21 8:05 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: mst, Jason Wang, Ingo Molnar, Peter Zijlstra

This patch introduces a helper, nr_running_this_cpu(), to return the
number of runnable processes on the current cpu. The first user will be
net rx busy polling, which will use it to exit the busy loop when it
finds more than one process runnable on the current cpu. This gives us
better busy-polling performance under heavy load.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/sched.h | 1 +
 kernel/sched/core.c   | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..e34020a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -167,6 +167,7 @@ extern int nr_threads;
 DECLARE_PER_CPU(unsigned long, process_counts);
 extern int nr_processes(void);
 extern unsigned long nr_running(void);
+extern unsigned long nr_running_this_cpu(void);
 extern unsigned long nr_iowait(void);
 extern unsigned long nr_iowait_cpu(int cpu);
 extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..87fa7b5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2366,6 +2366,12 @@ unsigned long nr_running(void)
 	return sum;
 }
 
+unsigned long nr_running_this_cpu(void)
+{
+	return this_rq()->nr_running;
+}
+EXPORT_SYMBOL(nr_running_this_cpu);
+
 unsigned long long nr_context_switches(void)
 {
 	int i;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread
* [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-21 8:05 [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() Jason Wang
@ 2014-08-21 8:05 ` Jason Wang
  2014-08-21 8:11   ` Michael S. Tsirkin
  ` (2 more replies)
  2014-08-21 13:52 ` [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() Ingo Molnar
  1 sibling, 3 replies; 54+ messages in thread
From: Jason Wang @ 2014-08-21 8:05 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: mst, Jason Wang

Rx busy loop does not scale well when several parallel sessions are
active. This is because we keep looping even if another process is
runnable. For example, if that process is about to send a packet,
keeping busy polling in the current process brings extra delay and
damages performance.

This patch solves the issue by exiting the busy loop when another
process is runnable on the current cpu. A simple test that pins two
netperf sessions to the same cpu on the receiving side shows an obvious
improvement:

Before:
netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
16384  87380  1        1       10.00    15513.74
16384  87380
16384  87380  1        1       10.00    15092.78
16384  87380

After:
netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
16384  87380  1        1       10.00    23334.53
16384  87380
16384  87380  1        1       10.00    23327.58
16384  87380

The benchmark was done with two 8-core Xeon machines connected back to
back through mlx4 NICs, using the netperf TCP_RR test (busy_read was
set to 50):

sessions/bytes/before/after/+improvement%/busy_read=0/
1/1/30062.10/30034.72/+0%/20228.96/
16/1/214719.83/307669.01/+43%/268997.71/
32/1/231252.81/345845.16/+49%/336157.442/
64/512/212467.39/373464.93/+75%/397449.375/

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/net/busy_poll.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 1d67fb6..8a33fb2 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
 		cpu_relax();
 
 	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
-		 !need_resched() && !busy_loop_timeout(end_time));
+		 !need_resched() && !busy_loop_timeout(end_time) &&
+		 nr_running_this_cpu() < 2);
 
 	rc = !skb_queue_empty(&sk->sk_receive_queue);
 out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-21 8:05 ` [PATCH net-next 2/2] net: exit busy loop when another process is runnable Jason Wang
@ 2014-08-21 8:11   ` Michael S. Tsirkin
  2014-08-22 2:53     ` Jason Wang
  2014-08-21 19:03   ` Amos Kong
  2014-08-22 5:01   ` Mike Galbraith
  2 siblings, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2014-08-21 8:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel

On Thu, Aug 21, 2014 at 04:05:10PM +0800, Jason Wang wrote:
> Rx busy loop does not scale well when several parallel sessions are
> active. This is because we keep looping even if another process is
> runnable. For example, if that process is about to send a packet,
> keeping busy polling in the current process brings extra delay and
> damages performance.
>
> This patch solves the issue by exiting the busy loop when another
> process is runnable on the current cpu. A simple test that pins two
> netperf sessions to the same cpu on the receiving side shows an
> obvious improvement:
>
> Before:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    15513.74
> 16384  87380
> 16384  87380  1        1       10.00    15092.78
> 16384  87380
>
> After:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    23334.53
> 16384  87380
> 16384  87380  1        1       10.00    23327.58
> 16384  87380
>
> The benchmark was done with two 8-core Xeon machines connected back to
> back through mlx4 NICs, using the netperf TCP_RR test (busy_read was
> set to 50):
>
> sessions/bytes/before/after/+improvement%/busy_read=0/
> 1/1/30062.10/30034.72/+0%/20228.96/
> 16/1/214719.83/307669.01/+43%/268997.71/
> 32/1/231252.81/345845.16/+49%/336157.442/
> 64/512/212467.39/373464.93/+75%/397449.375/
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/net/busy_poll.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 1d67fb6..8a33fb2 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>  		cpu_relax();
>
>  	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
> -		 !need_resched() && !busy_loop_timeout(end_time));
> +		 !need_resched() && !busy_loop_timeout(end_time) &&
> +		 nr_running_this_cpu() < 2);

<= 1 would be a bit clearer? We want at most one process here.

>
>  	rc = !skb_queue_empty(&sk->sk_receive_queue);
>  out:
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-21 8:11 ` Michael S. Tsirkin
@ 2014-08-22 2:53   ` Jason Wang
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Wang @ 2014-08-22 2:53 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel

On 08/21/2014 04:11 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 21, 2014 at 04:05:10PM +0800, Jason Wang wrote:
>> Rx busy loop does not scale well when several parallel sessions are
>> active. This is because we keep looping even if another process is
>> runnable. For example, if that process is about to send a packet,
>> keeping busy polling in the current process brings extra delay and
>> damages performance.
>>
>> This patch solves the issue by exiting the busy loop when another
>> process is runnable on the current cpu. A simple test that pins two
>> netperf sessions to the same cpu on the receiving side shows an
>> obvious improvement:
>>
>> Before:
>> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
>> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
>> 16384  87380  1        1       10.00    15513.74
>> 16384  87380
>> 16384  87380  1        1       10.00    15092.78
>> 16384  87380
>>
>> After:
>> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
>> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
>> 16384  87380  1        1       10.00    23334.53
>> 16384  87380
>> 16384  87380  1        1       10.00    23327.58
>> 16384  87380
>>
>> The benchmark was done with two 8-core Xeon machines connected back to
>> back through mlx4 NICs, using the netperf TCP_RR test (busy_read was
>> set to 50):
>>
>> sessions/bytes/before/after/+improvement%/busy_read=0/
>> 1/1/30062.10/30034.72/+0%/20228.96/
>> 16/1/214719.83/307669.01/+43%/268997.71/
>> 32/1/231252.81/345845.16/+49%/336157.442/
>> 64/512/212467.39/373464.93/+75%/397449.375/
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  include/net/busy_poll.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
>> index 1d67fb6..8a33fb2 100644
>> --- a/include/net/busy_poll.h
>> +++ b/include/net/busy_poll.h
>> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>>  		cpu_relax();
>>
>>  	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
>> -		 !need_resched() && !busy_loop_timeout(end_time));
>> +		 !need_resched() && !busy_loop_timeout(end_time) &&
>> +		 nr_running_this_cpu() < 2);
> <= 1 would be a bit clearer? We want at most one process here.

Ok, will change it in next version.

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-21 8:05 ` [PATCH net-next 2/2] net: exit busy loop when another process is runnable Jason Wang
  2014-08-21 8:11   ` Michael S. Tsirkin
@ 2014-08-21 19:03   ` Amos Kong
  2014-08-22 5:01   ` Mike Galbraith
  2 siblings, 0 replies; 54+ messages in thread
From: Amos Kong @ 2014-08-21 19:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, mst

On Thu, Aug 21, 2014 at 04:05:10PM +0800, Jason Wang wrote:
> Rx busy loop does not scale well when several parallel sessions are
> active. This is because we keep looping even if another process is
> runnable. For example, if that process is about to send a packet,
> keeping busy polling in the current process brings extra delay and
> damages performance.
>
> This patch solves the issue by exiting the busy loop when another
> process is runnable on the current cpu. A simple test that pins two
> netperf sessions to the same cpu on the receiving side shows an
> obvious improvement:
>
> Before:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    15513.74
> 16384  87380
> 16384  87380  1        1       10.00    15092.78
> 16384  87380
>
> After:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    23334.53
> 16384  87380
> 16384  87380  1        1       10.00    23327.58
> 16384  87380
>
> The benchmark was done with two 8-core Xeon machines connected back to
> back through mlx4 NICs, using the netperf TCP_RR test (busy_read was
> set to 50):
>
> sessions/bytes/before/after/+improvement%/busy_read=0/
> 1/1/30062.10/30034.72/+0%/20228.96/
> 16/1/214719.83/307669.01/+43%/268997.71/
> 32/1/231252.81/345845.16/+49%/336157.442/
> 64/512/212467.39/373464.93/+75%/397449.375/
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/net/busy_poll.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 1d67fb6..8a33fb2 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>  		cpu_relax();
>
>  	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
> -		 !need_resched() && !busy_loop_timeout(end_time));
> +		 !need_resched() && !busy_loop_timeout(end_time) &&
> +		 nr_running_this_cpu() < 2);

Reviewed-by: Amos Kong <akong@redhat.com>

>  	rc = !skb_queue_empty(&sk->sk_receive_queue);
>  out:
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-- 
Amos.

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-21 8:05 ` [PATCH net-next 2/2] net: exit busy loop when another process is runnable Jason Wang
  2014-08-21 8:11   ` Michael S. Tsirkin
  2014-08-21 19:03  ` Amos Kong
@ 2014-08-22 5:01   ` Mike Galbraith
  2014-08-22 7:29     ` Jason Wang
  ` (2 more replies)
  2 siblings, 3 replies; 54+ messages in thread
From: Mike Galbraith @ 2014-08-22 5:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar

On Thu, 2014-08-21 at 16:05 +0800, Jason Wang wrote:
> Rx busy loop does not scale well when several parallel sessions are
> active. This is because we keep looping even if another process is
> runnable. For example, if that process is about to send a packet,
> keeping busy polling in the current process brings extra delay and
> damages performance.
>
> This patch solves the issue by exiting the busy loop when another
> process is runnable on the current cpu. A simple test that pins two
> netperf sessions to the same cpu on the receiving side shows an
> obvious improvement:

That patch says to me it's a bad idea to spin when someone (anyone) else
can get some work done on a CPU, which intuitively makes sense. But..

(ponders net goop: with silly 1 byte ping-pong load, throughput is bound
by fastpath latency, net plus sched plus fixable nohz and governor crud
if not polling, so you can't get a lot of data moved byte at a time no
matter how sexy the pipe whether polling or not due to bound. If OTOH
net hardware is a blazing fast large bore packet cannon, net overhead
per unit payload drops, sched+crud is a constant)

Seems the only time it's a good idea to poll is if blasting big packets
on sexy hardware, and if you're doing that, you want to poll regardless
of whether somebody else is waiting, or?

> Before:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    15513.74
> 16384  87380
> 16384  87380  1        1       10.00    15092.78
> 16384  87380
>
> After:
> netperf -H 192.168.100.2 -T 0,0 -t TCP_RR -P 0 & \
> netperf -H 192.168.100.2 -T 1,0 -t TCP_RR -P 0
> 16384  87380  1        1       10.00    23334.53
> 16384  87380
> 16384  87380  1        1       10.00    23327.58
> 16384  87380
>
> The benchmark was done with two 8-core Xeon machines connected back to
> back through mlx4 NICs, using the netperf TCP_RR test (busy_read was
> set to 50):
>
> sessions/bytes/before/after/+improvement%/busy_read=0/
> 1/1/30062.10/30034.72/+0%/20228.96/
> 16/1/214719.83/307669.01/+43%/268997.71/
> 32/1/231252.81/345845.16/+49%/336157.442/
> 64/512/212467.39/373464.93/+75%/397449.375/
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/net/busy_poll.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 1d67fb6..8a33fb2 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
>  		cpu_relax();
>
>  	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
> -		 !need_resched() && !busy_loop_timeout(end_time));
> +		 !need_resched() && !busy_loop_timeout(end_time) &&
> +		 nr_running_this_cpu() < 2);
>
>  	rc = !skb_queue_empty(&sk->sk_receive_queue);
>  out:

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-22 5:01 ` Mike Galbraith
@ 2014-08-22 7:29   ` Jason Wang
  2014-08-22 7:42     ` Ingo Molnar
  2014-08-22 7:36   ` Ingo Molnar
  2014-09-01 9:31   ` Peter Zijlstra
  2 siblings, 1 reply; 54+ messages in thread
From: Jason Wang @ 2014-08-22 7:29 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar

On 08/22/2014 01:01 PM, Mike Galbraith wrote:
> On Thu, 2014-08-21 at 16:05 +0800, Jason Wang wrote:
>> Rx busy loop does not scale well when several parallel sessions are
>> active. This is because we keep looping even if another process is
>> runnable. For example, if that process is about to send a packet,
>> keeping busy polling in the current process brings extra delay and
>> damages performance.
>>
>> This patch solves the issue by exiting the busy loop when another
>> process is runnable on the current cpu. A simple test that pins two
>> netperf sessions to the same cpu on the receiving side shows an
>> obvious improvement:
> That patch says to me it's a bad idea to spin when someone (anyone) else
> can get some work done on a CPU, which intuitively makes sense. But..
>
> (ponders net goop: with silly 1 byte ping-pong load, throughput is bound
> by fastpath latency, net plus sched plus fixable nohz and governor crud
> if not polling, so you can't get a lot of data moved byte at a time no
> matter how sexy the pipe whether polling or not due to bound. If OTOH
> net hardware is a blazing fast large bore packet cannon, net overhead
> per unit payload drops, sched+crud is a constant)

Polling could be done either by the rx busy loop in process context or
by NAPI in softirq. The rx busy loop only spins and polls when no
packet is found in the socket receive queue. It spins in the hope that
at least one packet will arrive in a short while (in which case the
process exits the busy loop). In this way, it eliminates the overhead
of NAPI, wakeup and scheduling.

This patch just makes the busy polling less aggressive: since the
process finds nothing to receive while still spinning in this loop,
there's no need to waste cpu cycles (or even call cpu_relax()) if other
work could be done by the current CPU.

For a stream workload like you mention here, if the card is fast
enough, the socket receive queue is not easily drained. The rx busy
loop won't help, or even won't be triggered, in this case.

> Seems the only time it's a good idea to poll is if blasting big packets
> on sexy hardware, and if you're doing that, you want to poll regardless
> of whether somebody else is waiting, or?

NAPI will work instead of the rx busy loop in this case. It will poll
and try to drain the nic's rx ring in softirq regardless of anybody
else. Btw, the current rx busy loop does not perform well on stream
workloads since it bypasses GRO to reduce latency, but that issue is
beyond the scope of this patch.

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-22 7:29 ` Jason Wang
@ 2014-08-22 7:42   ` Ingo Molnar
  2014-08-29 3:08     ` Jason Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2014-08-22 7:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

* Jason Wang <jasowang@redhat.com> wrote:

> Polling could be done either by the rx busy loop in process
> context or by NAPI in softirq. [...]

Note that this shows another reason why it's a bad idea to
query nr_running directly: depending on the softirq processing
method, a softirq might run:

  - directly in process context
  - in an idle thread's context
  - or in a ksoftirqd context.

'nr_running' will have different values in these cases, causing
asymmetries in busy-poll handling!

Another class of asymmetry is when there are other softirq bits
pending, beyond NET_RX (or NET_TX): a nr_running check misses
them.

The solution I outlined in the previous mail (using a
sched_expected_runtime() method) would be able to avoid most of
these artifacts.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-22 7:42 ` Ingo Molnar
@ 2014-08-29 3:08   ` Jason Wang
  2014-09-01 6:39     ` Eliezer Tamir
  0 siblings, 1 reply; 54+ messages in thread
From: Jason Wang @ 2014-08-29 3:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

On 08/22/2014 03:42 PM, Ingo Molnar wrote:
> * Jason Wang <jasowang@redhat.com> wrote:
>
>> Polling could be done either by the rx busy loop in process
>> context or by NAPI in softirq. [...]
> Note that this shows another reason why it's a bad idea to
> query nr_running directly: depending on the softirq processing
> method, a softirq might run:
>
>   - directly in process context
>   - in an idle thread's context
>   - or in a ksoftirqd context.
>
> 'nr_running' will have different values in these cases, causing
> asymmetries in busy-poll handling!

The current rx busy polling code only works in process context; we can
add a BUG or warning to make sure the helper is only called from
process context.

> Another class of asymmetry is when there are other softirq bits
> pending, beyond NET_RX (or NET_TX): a nr_running check misses
> them.

Yes, but rx busy polling only works in process context and does not
disable bh, so it may not be an issue.

> The solution I outlined in the previous mail (using a
> sched_expected_runtime() method) would be able to avoid most of
> these artifacts.

It only takes care of the remaining runnable time of the current
process. This is a good hint for sk_busy_loop_end_time(), which
returns the maximum time a process could spend busy polling, but it is
not what this patch needs. We need to check the state of other
runnable processes on the current cpu to avoid damaging their
performance.

> Thanks,
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-08-29 3:08 ` Jason Wang
@ 2014-09-01 6:39   ` Eliezer Tamir
  2014-09-02 3:29     ` Jason Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Eliezer Tamir @ 2014-09-01 6:39 UTC (permalink / raw)
  To: Jason Wang, Ingo Molnar
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

On 29/08/2014 06:08, Jason Wang wrote:
> Yes, but rx busy polling only works in process context and does not
> disable bh, so it may not be an issue.

sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.

-Eliezer

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-09-01 6:39 ` Eliezer Tamir
@ 2014-09-02 3:29   ` Jason Wang
  2014-09-02 6:15     ` Eliezer Tamir
  0 siblings, 1 reply; 54+ messages in thread
From: Jason Wang @ 2014-09-02 3:29 UTC (permalink / raw)
  To: Eliezer Tamir, Ingo Molnar
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

On 09/01/2014 02:39 PM, Eliezer Tamir wrote:
> On 29/08/2014 06:08, Jason Wang wrote:
>> Yes, but rx busy polling only works in process context and does not
>> disable bh, so it may not be an issue.
> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.
>
> -Eliezer

True, so we probably also need to exit the loop when there are pending bhs.

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-09-02 3:29 ` Jason Wang
@ 2014-09-02 6:15   ` Eliezer Tamir
  2014-09-02 7:37     ` Jason Wang
  2014-09-02 8:31     ` Michael S. Tsirkin
  0 siblings, 2 replies; 54+ messages in thread
From: Eliezer Tamir @ 2014-09-02 6:15 UTC (permalink / raw)
  To: Jason Wang, Ingo Molnar
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

On 02/09/2014 06:29, Jason Wang wrote:
> On 09/01/2014 02:39 PM, Eliezer Tamir wrote:
>> On 29/08/2014 06:08, Jason Wang wrote:
>>>> Yes, but rx busy polling only works in process context and does not
>>>> disable bh, so it may not be an issue.
>> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.
>
> True, so we probably also need to exit the loop when there are pending bhs.

I'm not so sure; in the typical busy poll scenario, the incoming
traffic is the most time-critical thing in the system.
It's so important that you are willing to trade lots of CPU power
for better latency. The user has decided that he wants to dedicate
this CPU mostly for that. This is not something that plays nice with
other apps, but this is what the user wants.

So, you definitely don't want to starve any bh, and you should
regularly re-enable bh's, but you also don't want to stop everything
any time a bh is scheduled.

You also want network processing on the queues that are busy polled
to come through busy polling and not through NAPI, which is run in bh
context.

-Eliezer

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-09-02 6:15 ` Eliezer Tamir
@ 2014-09-02 7:37   ` Jason Wang
  2014-09-02 8:31   ` Michael S. Tsirkin
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Wang @ 2014-09-02 7:37 UTC (permalink / raw)
  To: Eliezer Tamir, Ingo Molnar
  Cc: Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra,
	Ingo Molnar

On 09/02/2014 02:15 PM, Eliezer Tamir wrote:
> On 02/09/2014 06:29, Jason Wang wrote:
>> On 09/01/2014 02:39 PM, Eliezer Tamir wrote:
>>> On 29/08/2014 06:08, Jason Wang wrote:
>>>>> Yes, but rx busy polling only works in process context and does not
>>>>> disable bh, so it may not be an issue.
>>> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.
>> True, so we probably also need to exit the loop when there are pending bhs.
> I'm not so sure; in the typical busy poll scenario, the incoming
> traffic is the most time-critical thing in the system.
> It's so important that you are willing to trade lots of CPU power
> for better latency. The user has decided that he wants to dedicate
> this CPU mostly for that.

But the user should increase the process priority or adjust its cgroup
in this case.

> This is not something that plays nice with
> other apps, but this is what the user wants.

So busy polling somehow gets a higher priority than other processes.

> So, you definitely don't want to starve any bh, and you should
> regularly re-enable bh's, but you also don't want to stop everything
> any time a bh is scheduled.

If I get your meaning, you may want to call rcu_read_lock_bh() and get
the socket napi id inside the do {} loop? This seems able to prevent
bhs from being starved and can also handle the case where the packets
come from different NAPIs.

> You also want network processing on the queues that are busy polled
> to come through busy polling and not through NAPI, which is run in bh
> context.
>
> -Eliezer
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-09-02 6:15 ` Eliezer Tamir
  2014-09-02 7:37   ` Jason Wang
@ 2014-09-02 8:31   ` Michael S. Tsirkin
  2014-09-03 6:49     ` Eliezer Tamir
  1 sibling, 1 reply; 54+ messages in thread
From: Michael S. Tsirkin @ 2014-09-02 8:31 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Jason Wang, Ingo Molnar, Mike Galbraith, davem, netdev,
	linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote:
> On 02/09/2014 06:29, Jason Wang wrote:
> > On 09/01/2014 02:39 PM, Eliezer Tamir wrote:
> >> On 29/08/2014 06:08, Jason Wang wrote:
> >>>> Yes, but rx busy polling only works in process context and does not
> >>>> disable bh, so it may not be an issue.
> >> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.
> >
> > True, so we probably also need to exit the loop when there are pending bhs.
>
> I'm not so sure; in the typical busy poll scenario, the incoming
> traffic is the most time-critical thing in the system.
> It's so important that you are willing to trade lots of CPU power
> for better latency. The user has decided that he wants to dedicate
> this CPU mostly for that. This is not something that plays nice with
> other apps, but this is what the user wants.

I think most applications wouldn't interpret this flag as "burn up CPU,
I don't care what the result is"; what apps want is more of "maximise
throughput and minimise latency even if the throughput/CPU ratio goes
down". Jason posted benchmarks that show throughput going up because
other processes get more of a chance to run, so this seems consistent
with that goal.

> So, you definitely don't want to starve any bh, and you should
> regularly re-enable bh's, but you also don't want to stop everything
> any time a bh is scheduled.
>
> You also want network processing on the queues that are busy polled
> to come through busy polling and not through NAPI, which is run in bh
> context.
>
> -Eliezer

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
  2014-09-02 8:31 ` Michael S. Tsirkin
@ 2014-09-03 6:49   ` Eliezer Tamir
  2014-09-03 7:33     ` Jason Wang
  2014-09-03 7:51     ` Michael S. Tsirkin
  0 siblings, 2 replies; 54+ messages in thread
From: Eliezer Tamir @ 2014-09-03 6:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Ingo Molnar, Mike Galbraith, davem, netdev,
	linux-kernel, Peter Zijlstra, Ingo Molnar

On 02/09/2014 11:31, Michael S. Tsirkin wrote:
> On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote:
>> On 02/09/2014 06:29, Jason Wang wrote:
>>> On 09/01/2014 02:39 PM, Eliezer Tamir wrote:
>>>> On 29/08/2014 06:08, Jason Wang wrote:
>>>>>> Yes, but rx busy polling only works in process context and does not
>>>>>> disable bh, so it may not be an issue.
>>>> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled.
>>>
>>> True, so we probably also need to exit the loop when there are pending bhs.
>>
>> I'm not so sure; in the typical busy poll scenario, the incoming
>> traffic is the most time-critical thing in the system.
>> It's so important that you are willing to trade lots of CPU power
>> for better latency. The user has decided that he wants to dedicate
>> this CPU mostly for that. This is not something that plays nice with
>> other apps, but this is what the user wants.
>
> I think most applications wouldn't interpret this flag as "burn up CPU,
> I don't care what the result is"; what apps want is more of "maximise
> throughput and minimise latency even if the throughput/CPU ratio goes
> down". Jason posted benchmarks that show throughput going up because
> other processes get more of a chance to run, so this seems consistent
> with that goal.

Busy polling is not a general purpose feature; it's not something you
can casually turn on that will "just work". Most applications should
not be using busy polling. Currently it is used by multiserver
applications that you spend days tuning to specific platforms.

What the user wants is to lower both avg and maximum latencies, at the
expense of everything else, including power efficiency and sometimes
even throughput. The only exception is making the system crash ;)

While letting other things take precedence over busy polling might not
hurt the avg latency much, it will kill your maximum latency.

-Eliezer

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 6:49 ` Eliezer Tamir @ 2014-09-03 7:33 ` Jason Wang 2014-09-03 9:36 ` Peter Zijlstra 2014-09-03 7:51 ` Michael S. Tsirkin 1 sibling, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-09-03 7:33 UTC (permalink / raw) To: Eliezer Tamir, Michael S. Tsirkin Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar On 09/03/2014 02:49 PM, Eliezer Tamir wrote: > On 02/09/2014 11:31, Michael S. Tsirkin wrote: >> On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote: >>> On 02/09/2014 06:29, Jason Wang wrote: >>>> On 09/01/2014 02:39 PM, Eliezer Tamir wrote: >>>>> On 29/08/2014 06:08, Jason Wang wrote: >>>>>>> Yes, but rx busy polling only works in process context and does not >>>>>>> disable bh, so it may be not an issue. >>>>> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled. >>>> True, so we need probably also exit the loop when there are pending bhs. >>> I'm not so sure, in the typical busy poll scenario, the incoming >>> traffic is the most time-critical thing in the system. >>> It's so important that you are willing to trade lots of CPU power >>> for better latency. The user has decided that he wants to dedicate >>> this CPU mostly for that. This is not something that plays nice with >>> other apps, but this is what the user wants. >> I think most applications wouldn't interpret this flag as "burn up CPU I don't >> care what is the result", what apps want is more of "maximise throughput >> and minimise latency even if throughput/CPU ratio goes down". >> Jason posted benchmarks that show throughput going up because other >> processes get more of a chance to run, so this seems consistent >> with that goal. > Busy polling is not a general purpose feature, it's not something you > can casually turn on and will "just work". Most applications should not > be using busy polling. How about busy read? 
It is enabled only through a global sysctl, so applications can make use of it without being rewritten. > Currently it is used by multiserver applications > that you spend days tuning to specific platforms. I agree the polling code needs more thought, but this patch only changes busy read. A new issue is for virt users. I implemented busy polling for virtio-net, but we don't want one vcpu to monopolize the cpu if there are runnable tasks on other vcpus. We may need some hint from host to guest to let it exit the loop when needed. > > What the user wants is to lower both avg and maximum latencies, at the > expense of everything else including power efficiency and sometimes > even throughput. The only exception is making the system crash ;) > > While letting other things take precedence over busy polling might not > hurt the avg latency much, it will kill your maximum latency. > > -Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
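For context, the "busy read" Jason refers to is the global net.core.busy_read sysctl (microseconds to busy poll on blocking reads; 0, the default, disables it), which is also how the benchmarks earlier in this thread set busy_read to 50. A minimal sketch of enabling it system-wide:

```shell
# Busy poll blocking reads on all sockets for up to 50 microseconds
# before sleeping (0, the default, disables busy read entirely).
sysctl -w net.core.busy_read=50

# Separate knob for sockets waited on via poll()/select():
sysctl -w net.core.busy_poll=50
```

A process can also raise or lower the read budget per socket with the SO_BUSY_POLL socket option, without touching the global knobs.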
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 7:33 ` Jason Wang @ 2014-09-03 9:36 ` Peter Zijlstra 2014-09-03 9:59 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2014-09-03 9:36 UTC (permalink / raw) To: Jason Wang Cc: Eliezer Tamir, Michael S. Tsirkin, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar On Wed, Sep 03, 2014 at 03:33:23PM +0800, Jason Wang wrote: > A new issue is for virt users. I implement busy polling for virtio-net > but we don't want one vcpu monopoly the cpu if there's some tasks on > other vcpus. We may need some hint from host to guest to let it exit the > loop if needed. Aw god.. virt nonsense is wrecking things again. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 9:36 ` Peter Zijlstra @ 2014-09-03 9:59 ` Michael S. Tsirkin 0 siblings, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-03 9:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Jason Wang, Eliezer Tamir, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar On Wed, Sep 03, 2014 at 11:36:05AM +0200, Peter Zijlstra wrote: > On Wed, Sep 03, 2014 at 03:33:23PM +0800, Jason Wang wrote: > > A new issue is for virt users. I implement busy polling for virtio-net > > but we don't want one vcpu monopoly the cpu if there's some tasks on > > other vcpus. We may need some hint from host to guest to let it exit the > > loop if needed. > > Aw god.. virt nonsense is wrecking things again. I don't see a reason to bring virt up here. Original patch shows improvement with a simple loopback netperf test. Some people like the linux kernel so much, they want to run many instances of it on the same CPU. Understandable. So of course everything is then magnified x2 as you are running two schedulers and two networking stacks, but let's focus on the important thing which is better interaction between busy polling and the linux scheduler. -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 6:49 ` Eliezer Tamir 2014-09-03 7:33 ` Jason Wang @ 2014-09-03 7:51 ` Michael S. Tsirkin 2014-09-04 6:51 ` Eliezer Tamir 1 sibling, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-03 7:51 UTC (permalink / raw) To: Eliezer Tamir Cc: Jason Wang, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar On Wed, Sep 03, 2014 at 09:49:10AM +0300, Eliezer Tamir wrote: > On 02/09/2014 11:31, Michael S. Tsirkin wrote: > > On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote: > >> On 02/09/2014 06:29, Jason Wang wrote: > >>> On 09/01/2014 02:39 PM, Eliezer Tamir wrote: > >>>> On 29/08/2014 06:08, Jason Wang wrote: > >>>>>> Yes, but rx busy polling only works in process context and does not > >>>>>> disable bh, so it may be not an issue. > >>>> sk_busy_loop() uses rcu_read_lock_bh(), so it does run with bh disabled. > >>> > >>> True, so we need probably also exit the loop when there are pending bhs. > >> > >> I'm not so sure, in the typical busy poll scenario, the incoming > >> traffic is the most time-critical thing in the system. > >> It's so important that you are willing to trade lots of CPU power > >> for better latency. The user has decided that he wants to dedicate > >> this CPU mostly for that. This is not something that plays nice with > >> other apps, but this is what the user wants. > > > > I think most applications wouldn't interpret this flag as "burn up CPU I don't > > care what is the result", what apps want is more of "maximise throughput > > and minimise latency even if throughput/CPU ratio goes down". > > Jason posted benchmarks that show throughput going up because other > > processes get more of a chance to run, so this seems consistent > > with that goal. > > Busy polling is not a general purpose feature, it's not something you > can casually turn on and will "just work". 
Most applications should not > be using busy polling. Currently it is used by multiserver applications > that you spend days tuning to specific platforms. > > What the user wants is to lower both avg and maximum latencies, at the > expense of everything else including power efficiency and sometimes > even throughput. The only exception is making the system crash ;) > > While letting other things take precedence over busy polling might not > hurt the avg latency much, it will kill your maximum latency. > > -Eliezer If scheduler happens to run both server and client on the same CPU, polling will hurt maximum latency even more. So I guess different users want different things. How about applications giving us a hint what they prefer? For example, a new flag that says "I don't have anything useful to do so let's do busy polling but my server is on the local system, so please only poll if CPU is otherwise idle". -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
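As a point of comparison, per-socket opt-in already exists via the SO_BUSY_POLL socket option (a budget in microseconds); the "only poll if the CPU is otherwise idle" hint suggested above would presumably be a similar option, but no such flag exists today. A minimal userspace sketch that only reads the current per-socket value (raising it above the busy_read sysctl requires CAP_NET_ADMIN, so the sketch does not set it):

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46 /* value from asm-generic/socket.h, for older headers */
#endif

/* Read back a socket's busy-poll budget in microseconds (-1 on error).
 * By default this reflects the net.core.busy_read sysctl. */
static int get_busy_poll_usecs(int fd)
{
	int usecs = -1;
	socklen_t len = sizeof(usecs);

	if (getsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, &len) < 0)
		return -1;
	return usecs;
}
```
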
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 7:51 ` Michael S. Tsirkin @ 2014-09-04 6:51 ` Eliezer Tamir 2014-09-04 8:25 ` Michael S. Tsirkin 0 siblings, 1 reply; 54+ messages in thread From: Eliezer Tamir @ 2014-09-04 6:51 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar On 03/09/2014 10:51, Michael S. Tsirkin wrote: > On Wed, Sep 03, 2014 at 09:49:10AM +0300, Eliezer Tamir wrote: >> On 02/09/2014 11:31, Michael S. Tsirkin wrote: >>> On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote: >> Busy polling is not a general purpose feature, it's not something you >> can casually turn on and will "just work". Most applications should not >> be using busy polling. Currently it is used by multiserver applications >> that you spend days tuning to specific platforms. >> >> What the user wants is to lower both avg and maximum latencies, at the >> expense of everything else including power efficiency and sometimes >> even throughput. The only exception is making the system crash ;) >> >> While letting other things take precedence over busy polling might not >> hurt the avg latency much, it will kill your maximum latency. > > If scheduler happens to run both server and client on the > same CPU, polling will hurt maximum latency even more. > So I guess different users want different things. > > How about applications giving us a hint what they prefer? > For example, a new flag that says "I don't have anything useful to do so > let's do busy polling but my server is on the local system, so please > only poll if CPU is otherwise idle". I'm sorry for being ambiguous, when I said multi-server application, I meant an app that runs on more than one server machine. The loopback test is as far as I know not interesting. 
Of course if busypoll becomes interesting for virtualization over loopback, I have no problem with that, provided that there is a way to get the old behavior and that it is well documented. -Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-04 6:51 ` Eliezer Tamir @ 2014-09-04 8:25 ` Michael S. Tsirkin 0 siblings, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-04 8:25 UTC (permalink / raw) To: Eliezer Tamir Cc: Jason Wang, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar On Thu, Sep 04, 2014 at 09:51:33AM +0300, Eliezer Tamir wrote: > On 03/09/2014 10:51, Michael S. Tsirkin wrote: > > On Wed, Sep 03, 2014 at 09:49:10AM +0300, Eliezer Tamir wrote: > >> On 02/09/2014 11:31, Michael S. Tsirkin wrote: > >>> On Tue, Sep 02, 2014 at 09:15:18AM +0300, Eliezer Tamir wrote: > > >> Busy polling is not a general purpose feature, it's not something you > >> can casually turn on and will "just work". Most applications should not > >> be using busy polling. Currently it is used by multiserver applications > >> that you spend days tuning to specific platforms. > >> > >> What the user wants is to lower both avg and maximum latencies, at the > >> expense of everything else including power efficiency and sometimes > >> even throughput. The only exception is making the system crash ;) > >> > >> While letting other things take precedence over busy polling might not > >> hurt the avg latency much, it will kill your maximum latency. > > > > If scheduler happens to run both server and client on the > > same CPU, polling will hurt maximum latency even more. > > So I guess different users want different things. > > > > How about applications giving us a hint what they prefer? > > For example, a new flag that says "I don't have anything useful to do so > > let's do busy polling but my server is on the local system, so please > > only poll if CPU is otherwise idle". > > I'm sorry for being ambiguous, when I said multi-server application, I > meant an app that runs on more than one server machine. > > The loopback test is as far as I know not interesting. 
It's not interesting for ethernet NIC vendors :). But people do deploy servers and clients on the same box, and improving latency for them would be beneficial. Loopback is widely used by desktops, containers, vpns, and so on. Getting more theoretical, even with external networking, any application processing traffic in more than one thread behaves the same. For example, it's not uncommon to have a thread get messages from a socket, and pass them out to other threads for processing. At the moment the polling thread would sometimes delay the processing if it lands on the same CPU. > Of course if busypoll becomes interesting for virtualization over > loopback, IMO bringing up virtualization just muddies the waters, it's a general problem that arises whenever you run a lot of stuff on your CPU. > I have no problem with that, provided that there is a way to > get the old behavior and that it is well documented. > > -Eliezer I agree here, since some applications might be relying on the old behavior. -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 5:01 ` Mike Galbraith 2014-08-22 7:29 ` Jason Wang @ 2014-08-22 7:36 ` Ingo Molnar 2014-08-22 9:08 ` Jason Wang ` (2 more replies) 2014-09-01 9:31 ` Peter Zijlstra 2 siblings, 3 replies; 54+ messages in thread From: Ingo Molnar @ 2014-08-22 7:36 UTC (permalink / raw) To: Mike Galbraith, Jason Wang Cc: Jason Wang, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar > > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h > > index 1d67fb6..8a33fb2 100644 > > --- a/include/net/busy_poll.h > > +++ b/include/net/busy_poll.h > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > cpu_relax(); > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > - !need_resched() && !busy_loop_timeout(end_time)); > > + !need_resched() && !busy_loop_timeout(end_time) && > > + nr_running_this_cpu() < 2); So it's generally a bad idea to couple to the scheduler through such a low level, implementation dependent value like 'nr_running', causing various problems: - It misses important work that might be pending on this CPU, like RCU callbacks. - It will also over-credit task contexts that might be runnable, but which are less important than the currently running one: such as a SCHED_IDLE task - It will also over-credit even regular SCHED_NORMAL tasks, if this current task is more important than them: say SCHED_FIFO. A SCHED_FIFO workload should run just as fast with SCHED_NORMAL tasks around, as a SCHED_NORMAL workload on an otherwise idle system. So what you want is a more sophisticated query to the scheduler, a sched_expected_runtime() method that returns the number of nsecs this task is expected to run in the future, which returns 0 if you will be scheduled away on the next schedule(), and returns infinity for a high prio SCHED_FIFO task, or if this SCHED_NORMAL task is on an otherwise idle CPU. 
It will return a regular time slice value in other cases, when there's some load on the CPU. The polling logic can then do its decision based on that time value. All this can be done reasonably fast and lockless in most cases, so that it can be called from busy-polling code. An added advantage would be that this approach consolidates the somewhat random need_resched() checks into this method as well. In any case I don't agree with the nr_running_this_cpu() method. (Please Cc: me and lkml to future iterations of this patchset.) Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
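Ingo's proposed decision table can be sketched as a plain userspace C model. Everything below is hypothetical: sched_expected_runtime() is not a real kernel API, and the inputs and slice value are illustrative stand-ins for state the real helper would read locklessly from the runqueue.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative model of the proposed sched_expected_runtime(): how many
 * nsecs the current task can expect to keep running. */

enum policy { POLICY_NORMAL, POLICY_FIFO, POLICY_IDLE };

struct task {
	enum policy policy;
	int resched_pending;    /* need_resched() would return true */
};

struct cpu {
	int nr_runnable_others; /* runnable tasks besides the current one */
};

#define RUNTIME_INFINITY ULLONG_MAX
#define DEFAULT_SLICE_NS 4000000ULL /* illustrative ~4ms time slice */

static unsigned long long
sched_expected_runtime(const struct task *t, const struct cpu *c)
{
	if (t->resched_pending)
		return 0;                /* scheduled away on next schedule() */
	if (t->policy == POLICY_FIFO)
		return RUNTIME_INFINITY; /* high prio, nothing displaces it */
	if (c->nr_runnable_others == 0)
		return RUNTIME_INFINITY; /* otherwise idle CPU */
	return DEFAULT_SLICE_NS;         /* regular slice under load */
}
```

The busy-poll loop would then compare the returned value against its remaining poll budget instead of checking nr_running directly.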
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 7:36 ` Ingo Molnar @ 2014-08-22 9:08 ` Jason Wang 2014-08-22 14:16 ` Eric Dumazet 2014-09-03 8:09 ` Michael S. Tsirkin 2016-04-11 16:31 ` Michael S. Tsirkin 2 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-08-22 9:08 UTC (permalink / raw) To: Ingo Molnar, Mike Galbraith Cc: davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar On 08/22/2014 03:36 PM, Ingo Molnar wrote: >>> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h >>> index 1d67fb6..8a33fb2 100644 >>> --- a/include/net/busy_poll.h >>> +++ b/include/net/busy_poll.h >>> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) >>> cpu_relax(); >>> >>> } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && >>> - !need_resched() && !busy_loop_timeout(end_time)); >>> + !need_resched() && !busy_loop_timeout(end_time) && >>> + nr_running_this_cpu() < 2); > So it's generally a bad idea to couple to the scheduler through > such a low level, implementation dependent value like > 'nr_running', causing various problems: > > - It misses important work that might be pending on this CPU, > like RCU callbacks. > > - It will also over-credit task contexts that might be > runnable, but which are less important than the currently > running one: such as a SCHED_IDLE task > > - It will also over-credit even regular SCHED_NORMAL tasks, if > this current task is more important than them: say > SCHED_FIFO. A SCHED_FIFO workload should run just as fast > with SCHED_NORMAL tasks around, as a SCHED_NORMAL workload > on an otherwise idle system. I see. 
> So what you want is a more sophisticated query to the > scheduler, a sched_expected_runtime() method that returns the > number of nsecs this task is expected to run in the future, > which returns 0 if you will be scheduled away on the next > schedule(), and returns infinity for a high prio SCHED_FIFO > task, or if this SCHED_NORMAL task is on an otherwise idle CPU. > > It will return a regular time slice value in other cases, when > there's some load on the CPU. > > The polling logic can then do its decision based on that time > value. But this is just for the current process. We want to determine whether it is worth busy looping in the current process by checking whether there are any other runnable processes or pending callbacks. What we need here is just a simple, lockless hint for exiting the busy loop; it may be inaccurate without causing incorrect behavior, since the net code does not depend on this hint for scheduling or yielding. How about just introducing a boolean helper like current_can_busy_loop() that returns true in one of the following conditions: - The current task is SCHED_FIFO - The current task is neither SCHED_FIFO nor SCHED_IDLE and there are no other runnable processes or pending RCU callbacks on the current cpu And add warnings to make sure it can only be called in process context. Thanks > > All this can be done reasonably fast and lockless in most > cases, so that it can be called from busy-polling code. > > An added advantage would be that this approach consolidates the > somewhat random need_resched() checks into this method as well. > > In any case I don't agree with the nr_running_this_cpu() > method. > > (Please Cc: me and lkml to future iterations of this patchset.) > > Thanks, > > Ingo > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
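The boolean helper proposed above could look roughly like the following userspace model. All names and inputs are illustrative assumptions: the real helper would query the runqueue and RCU state directly and would warn if called outside process context.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the proposed current_can_busy_loop() hint. */

enum policy { POLICY_NORMAL, POLICY_FIFO, POLICY_IDLE };

static bool current_can_busy_loop(enum policy p,
				  int nr_runnable_others,
				  bool rcu_callbacks_pending)
{
	if (p == POLICY_FIFO)
		return true;  /* nothing of lower prio can displace it */
	if (p == POLICY_IDLE)
		return false; /* should always yield */
	/* SCHED_NORMAL: keep polling only if the CPU has nothing else to do */
	return nr_runnable_others == 0 && !rcu_callbacks_pending;
}
```

Being a hint, a stale answer here only costs one extra or one skipped loop iteration; schedule() still makes the real decision.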
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 9:08 ` Jason Wang @ 2014-08-22 14:16 ` Eric Dumazet 2014-08-25 2:54 ` Jason Wang 2014-08-25 13:16 ` Eliezer Tamir 0 siblings, 2 replies; 54+ messages in thread From: Eric Dumazet @ 2014-08-22 14:16 UTC (permalink / raw) To: Jason Wang Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar On Fri, 2014-08-22 at 17:08 +0800, Jason Wang wrote: > But this is just for current process. We want to determine whether or > not it was worth to loop busily in current process by checking if > there's any another runnable processes or callbacks. And what we need > here is just a simple and lockless hint which can't be wrong but may be > inaccurate to exit the busy loop. The net code does not depends on this > hint to do scheduling or yielding. > > How about just introducing a boolean helper like current_can_busy_loop() > and return true in one of the following conditions: > > - Current task is SCHED_FIFO > - Current task is neither SCHED_FIFO nor SCHED_IDLE and no other > runnable processes or pending RCU callbacks in current cpu > > And add warns to make sure it can only be called in process context. 1) Any reason Eliezer Tamir is not included in the CC list? He is the busypoll author after all, and did nothing wrong to be banned from these patches ;) 2) It looks like sk_busy_loop() should not be inlined, it is already too big. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 14:16 ` Eric Dumazet @ 2014-08-25 2:54 ` Jason Wang 2014-08-25 13:16 ` Eliezer Tamir 1 sibling, 0 replies; 54+ messages in thread From: Jason Wang @ 2014-08-25 2:54 UTC (permalink / raw) To: Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar, eliezer.tamir On 08/22/2014 10:16 PM, Eric Dumazet wrote: > On Fri, 2014-08-22 at 17:08 +0800, Jason Wang wrote: > >> > But this is just for current process. We want to determine whether or >> > not it was worth to loop busily in current process by checking if >> > there's any another runnable processes or callbacks. And what we need >> > here is just a simple and lockless hint which can't be wrong but may be >> > inaccurate to exit the busy loop. The net code does not depends on this >> > hint to do scheduling or yielding. >> > >> > How about just introducing a boolean helper like current_can_busy_loop() >> > and return true in one of the following conditions: >> > >> > - Current task is SCHED_FIFO >> > - Current task is neither SCHED_FIFO nor SCHED_IDLE and no other >> > runnable processes or pending RCU callbacks in current cpu >> > >> > And add warns to make sure it can only be called in process context. > 1) Any reasons Eliezer Tamir is not included in the CC list ? > > He is the busypoll author after all, and did nothing wrong to be > banned from these patches ;) CC Eliezer Tamir No intentional, I just generate the CC list through get_maintainer.pl :( > > 2) It looks like sk_buy_loop() should not be inlined, its is already too > big. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 14:16 ` Eric Dumazet 2014-08-25 2:54 ` Jason Wang @ 2014-08-25 13:16 ` Eliezer Tamir 2014-08-26 7:16 ` Jason Wang 1 sibling, 1 reply; 54+ messages in thread From: Eliezer Tamir @ 2014-08-25 13:16 UTC (permalink / raw) To: Eric Dumazet, Jason Wang Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 22/08/2014 17:16, Eric Dumazet wrote: > On Fri, 2014-08-22 at 17:08 +0800, Jason Wang wrote: > >> But this is just for current process. We want to determine whether or >> not it was worth to loop busily in current process by checking if >> there's any another runnable processes or callbacks. And what we need >> here is just a simple and lockless hint which can't be wrong but may be >> inaccurate to exit the busy loop. The net code does not depends on this >> hint to do scheduling or yielding. >> >> How about just introducing a boolean helper like current_can_busy_loop() >> and return true in one of the following conditions: >> >> - Current task is SCHED_FIFO >> - Current task is neither SCHED_FIFO nor SCHED_IDLE and no other >> runnable processes or pending RCU callbacks in current cpu >> >> And add warns to make sure it can only be called in process context. > > > 1) Any reasons Eliezer Tamir is not included in the CC list ? Thanks for remembering me, Eric ;) Here are my 2 cents: I think Ingo's suggestion of only yielding to tasks with same or higher priority makes sense. IF you change the current behavior, please update the documentation. You are going to make people scratch their head and ask "what changed?" you owe them a clue. I also would like to have some way to keep track of when/if/how much this yield happens. > 2) It looks like sk_buy_loop() should not be inlined, its is already too > big. 
You made this comment in the past; my response was that it's inlined so it can be optimized when nonblock is known at compile time to be true (e.g. when called from sock_poll). IFF you think that's less important, then I will defer to your opinion. -Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
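The inlining point above can be illustrated with a toy loop: when nonblock is a compile-time constant at the (inlined) call site, the compiler can fold the `!nonblock` test away and, for the nonblocking case, drop the loop entirely. The polling work is a stub counter; only the control flow mirrors sk_busy_loop().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for sk_busy_loop(): "polls" by counting iterations.
 * With nonblock == true the do/while body runs exactly once, which is
 * what the inlined sock_poll call path gets for free at compile time. */
static inline int busy_loop_sketch(bool nonblock, int budget)
{
	int iterations = 0;

	do {
		iterations++; /* stands in for one ndo_busy_poll() pass */
	} while (!nonblock && iterations < budget);

	return iterations;
}
```
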
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-25 13:16 ` Eliezer Tamir @ 2014-08-26 7:16 ` Jason Wang 2014-09-01 6:55 ` Eliezer Tamir 0 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-08-26 7:16 UTC (permalink / raw) To: Eliezer Tamir, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 08/25/2014 09:16 PM, Eliezer Tamir wrote: > On 22/08/2014 17:16, Eric Dumazet wrote: >> > On Fri, 2014-08-22 at 17:08 +0800, Jason Wang wrote: >> > >>> >> But this is just for current process. We want to determine whether or >>> >> not it was worth to loop busily in current process by checking if >>> >> there's any another runnable processes or callbacks. And what we need >>> >> here is just a simple and lockless hint which can't be wrong but may be >>> >> inaccurate to exit the busy loop. The net code does not depends on this >>> >> hint to do scheduling or yielding. >>> >> >>> >> How about just introducing a boolean helper like current_can_busy_loop() >>> >> and return true in one of the following conditions: >>> >> >>> >> - Current task is SCHED_FIFO >>> >> - Current task is neither SCHED_FIFO nor SCHED_IDLE and no other >>> >> runnable processes or pending RCU callbacks in current cpu >>> >> >>> >> And add warns to make sure it can only be called in process context. >> > >> > >> > 1) Any reasons Eliezer Tamir is not included in the CC list ? > Thanks for remembering me, Eric ;) > > Here are my 2 cents: > I think Ingo's suggestion of only yielding to tasks with same or higher > priority makes sense. I'm not sure I get your meaning. Do you mean calling yield_to() directly in sk_busy_loop? Schedule() which will be called later should handle all cases such as priority and rt process. And this patch just want the schedule() to do this decision earlier by exiting the busy loop earlier. 
This will improve the latency under both heavy and light load. Checking the number of nsecs this task is expected to run in the future sounds like work that sk_busy_loop_end_time() should consider. It is not the issue this patch wants to address. > > IF you change the current behavior, please update the documentation. > You are going to make people scratch their head and ask "what changed?" > you owe them a clue. Thanks for the reminder. But this patch itself does not change user-noticeable behaviour. > I also would like to have some way to keep track of when/if/how much > this yield happens. > Ok, not very hard to add, maybe just another statistics counter. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-26 7:16 ` Jason Wang @ 2014-09-01 6:55 ` Eliezer Tamir 2014-09-02 3:35 ` Jason Wang 0 siblings, 1 reply; 54+ messages in thread From: Eliezer Tamir @ 2014-09-01 6:55 UTC (permalink / raw) To: Jason Wang, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 26/08/2014 10:16, Jason Wang wrote: > On 08/25/2014 09:16 PM, Eliezer Tamir wrote: >> Here are my 2 cents: >> I think Ingo's suggestion of only yielding to tasks with same or higher >> priority makes sense. > > I'm not sure I get your meaning. Do you mean calling yield_to() directly > in sk_busy_loop? Think about the case where two processes are busy polling on the same CPU and the same device queue. Since busy polling processes incoming packets on the queue from any process, this scenario works well currently, and will not work at all when polling yields to other processes that are of the same priority that are running on the same CPU. As a side note, there is a lot of room for improvement when two processes on the same CPU want to busy poll on different device queues. The RFC code I published for epoll support showed one possible way of solving this, but I'm sure that there are other possibilities. Maybe the networking subsystem should maintain a list of device queues that need busypolling and have a thread that would poll all of them when there's nothing better to do. I'm aware of similar work on busy polling on NVMe devices, so maybe there should be a global busypoll thread for all devices that support it. BTW, I have someone inside Intel that wants to test future patches. Feel free to send me patches for testing, even if they are not ready for publishing yet. Cheers, Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 6:55 ` Eliezer Tamir @ 2014-09-02 3:35 ` Jason Wang 2014-09-02 6:03 ` Eliezer Tamir 0 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-09-02 3:35 UTC (permalink / raw) To: Eliezer Tamir, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 09/01/2014 02:55 PM, Eliezer Tamir wrote: > On 26/08/2014 10:16, Jason Wang wrote: >> On 08/25/2014 09:16 PM, Eliezer Tamir wrote: >>> Here are my 2 cents: >>> I think Ingo's suggestion of only yielding to tasks with same or higher >>> priority makes sense. >> I'm not sure I get your meaning. Do you mean calling yield_to() directly >> in sk_busy_loop? > Think about the case where two processes are busy polling on the > same CPU and the same device queue. Since busy polling processes > incoming packets on the queue from any process, this scenario works > well currently, I see, but it looks like we can simply handle this by exiting the busy loop when ndo_busy_poll() finds something that is not for the current socket? > and will not work at all when polling yields to other > processes that are of the same priority that are running on the same > CPU. So yielding has its limitations; we need to let the scheduler make the choice instead. > > As a side note, there is a lot of room for improvement when two > processes on the same CPU want to busy poll on different device > queues. > The RFC code I published for epoll support showed one possible > way of solving this, but I'm sure that there are other possibilities. > > Maybe the networking subsystem should maintain a list of device > queues that need busypolling and have a thread that would poll > all of them when there's nothing better to do. I'm not sure whether this method will scale considering thousands of sockets and processes. 
> > I'm aware of similar work on busy polling on NVMe devices, so > maybe there should be a global busypoll thread for all devices > that support it. > > BTW, I have someone inside Intel that wants to test future patches. Feel > free to send me patches for testing, even if they are not ready for > publishing yet. > > Cheers, > Eliezer Ok, will do it, thanks a lot. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 3:35 ` Jason Wang @ 2014-09-02 6:03 ` Eliezer Tamir 2014-09-02 6:31 ` Jason Wang 0 siblings, 1 reply; 54+ messages in thread From: Eliezer Tamir @ 2014-09-02 6:03 UTC (permalink / raw) To: Jason Wang, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 02/09/2014 06:35, Jason Wang wrote: > On 09/01/2014 02:55 PM, Eliezer Tamir wrote: >> On 26/08/2014 10:16, Jason Wang wrote: >>> On 08/25/2014 09:16 PM, Eliezer Tamir wrote: >> Think about the case where two processes are busy polling on the >> same CPU and the same device queue. Since busy polling processes >> incoming packets on the queue from any process, this scenario works >> well currently, > > I see, but looks like we can simply do this by exiting the busy loop > when ndo_busy_poll() finds something but not for current socket? I don't think there is a need for that. When ndo_busy_poll() finds something it feeds it to the stack, which will process the packet, just as if it came from NAPI polling. So, if this is data that someone is blocked waiting on, the stack will wake them up, and then you presumably can decide which app should get the cpu. Note, that there is no easy way to know, when looking at the incoming traffic, whether it is important, or even if you are seeing a full message. (Maybe you only have 9 packets out of 10?) The only place this knowledge might exist is in the layers of the stack closer to the user. >> and will not work at all when polling yields to other >> processes that are of the same priority that are running on the same >> CPU. > >> >> Maybe the networking subsystem should maintain a list of device >> queues that need busypolling and have a thread that would poll >> all of them when there's nothing better to do. > > Not sure whether this method will scale considering thousands of sockets > and processes. 
There may be millions of sockets, but in most cases only a handful of device queues per CPU to busy poll on. I have tested the epoll rfc code with hundreds of thousands of sockets and one or two device queues and it scales pretty well. The part I don't like in that code is the cumbersome mechanism I used to track the socket -> queue relationship. I think that if I had more time to work on it, I would instead look into extending the epoll interface so that libevent can tell the kernel what it wants, instead of having the busypoll code try and learn it. Cheers, Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 6:03 ` Eliezer Tamir @ 2014-09-02 6:31 ` Jason Wang 2014-09-03 6:21 ` Eliezer Tamir 2016-04-14 0:55 ` Peter Zijlstra 0 siblings, 2 replies; 54+ messages in thread From: Jason Wang @ 2014-09-02 6:31 UTC (permalink / raw) To: Eliezer Tamir, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 09/02/2014 02:03 PM, Eliezer Tamir wrote: > On 02/09/2014 06:35, Jason Wang wrote: >> On 09/01/2014 02:55 PM, Eliezer Tamir wrote: >>> On 26/08/2014 10:16, Jason Wang wrote: >>>> On 08/25/2014 09:16 PM, Eliezer Tamir wrote: >>> Think about the case where two processes are busy polling on the >>> same CPU and the same device queue. Since busy polling processes >>> incoming packets on the queue from any process, this scenario works >>> well currently, >> I see, but looks like we can simply do this by exiting the busy loop >> when ndo_busy_poll() finds something but not for current socket? > I don't think there is a need for that. > > When ndo_busy_poll() finds something it feeds it to the stack, which > will process the packet, just as if it came from NAPI polling. > So, if this is data that someone is blocked waiting on, the stack will > wake them up, and then you presumably can decide which app should get > the cpu. Yes, but the current code cannot do this. In most cases, the newly woken process has no chance to run if another process is busy looping on the same CPU. > > Note, that there is no easy way to know, when looking at the > incoming traffic, whether it is important, or even if you are seeing > a full message. (Maybe you only have 9 packets out of 10?) > The only place this knowledge might exist is in the layers of the > stack closer to the user. > >>> and will not work at all when polling yields to other >>> processes that are of the same priority that are running on the same >>> CPU. 
>>> Maybe the networking subsystem should maintain a list of device >>> queues that need busypolling and have a thread that would poll >>> all of them when there's nothing better to do. >> Not sure whether this method will scale considering thousands of sockets >> and processes. > There may be millions of sockets, but in most cases only a handful of > device queues per CPU to busy poll on. I have tested the epoll rfc > code with hundreds of thousands of sockets and one or two device > queues and it scales pretty well. > > The part I don't like in that code is the cumbersome mechanism I used > to track the socket -> queue relationship. I think that if I had more > time to work on it, I would instead look into extending the epoll > interface so that libevent can tell the kernel what it wants, instead > of having the busypoll code try and learn it. I'd like to have a look at this RFC. Could you please give me a pointer? I've done a quick search on the kernel mailing list but didn't find it. Thanks > > Cheers, > Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 6:31 ` Jason Wang @ 2014-09-03 6:21 ` Eliezer Tamir 2014-09-03 6:59 ` Jason Wang 2016-04-14 0:55 ` Peter Zijlstra 1 sibling, 1 reply; 54+ messages in thread From: Eliezer Tamir @ 2014-09-03 6:21 UTC (permalink / raw) To: Jason Wang, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 02/09/2014 09:31, Jason Wang wrote: > On 09/02/2014 02:03 PM, Eliezer Tamir wrote: >> On 02/09/2014 06:35, Jason Wang wrote: >>> Not sure whether this method will scale considering thousands of sockets >>> and processes. >> There may be millions of sockets, but in most cases only a handful of >> device queues per CPU to busy poll on. I have tested the epoll rfc >> code with hundreds of thousands of sockets and one or two device >> queues and is scales pretty well. >> >> The part I don't like in that code is the cumbersome mechanism I used >> to track the socket -> queue relationship. I think that if I had more >> time to work on it, I would instead look into extending the epoll >> interface so that libevent can tell the kernel what it wants, instead >> of having the busypoll code try and learn it. > > I'd like to have a look at this rfc. Could you please give me a pointer? > I've done a quick search on kernel mailing list but didn't find it. https://lkml.org/lkml/2013/8/21/192 Cheers, Eliezer ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 6:21 ` Eliezer Tamir @ 2014-09-03 6:59 ` Jason Wang 0 siblings, 0 replies; 54+ messages in thread From: Jason Wang @ 2014-09-03 6:59 UTC (permalink / raw) To: Eliezer Tamir, Eric Dumazet Cc: Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Peter Zijlstra, Ingo Molnar jacob.e.keller@intel.com On 09/03/2014 02:21 PM, Eliezer Tamir wrote: > On 02/09/2014 09:31, Jason Wang wrote: >> On 09/02/2014 02:03 PM, Eliezer Tamir wrote: >>> On 02/09/2014 06:35, Jason Wang wrote: >>>> Not sure whether this method will scale considering thousands of sockets >>>> and processes. >>> There may be millions of sockets, but in most cases only a handful of >>> device queues per CPU to busy poll on. I have tested the epoll rfc >>> code with hundreds of thousands of sockets and one or two device >>> queues and is scales pretty well. >>> >>> The part I don't like in that code is the cumbersome mechanism I used >>> to track the socket -> queue relationship. I think that if I had more >>> time to work on it, I would instead look into extending the epoll >>> interface so that libevent can tell the kernel what it wants, instead >>> of having the busypoll code try and learn it. >> I'd like to have a look at this rfc. Could you please give me a pointer? >> I've done a quick search on kernel mailing list but didn't find it. > https://lkml.org/lkml/2013/8/21/192 > > Cheers, > Eliezer > -- Thanks. I will have a look at this series. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 6:31 ` Jason Wang 2014-09-03 6:21 ` Eliezer Tamir @ 2016-04-14 0:55 ` Peter Zijlstra 1 sibling, 0 replies; 54+ messages in thread From: Peter Zijlstra @ 2016-04-14 0:55 UTC (permalink / raw) To: Jason Wang Cc: Eliezer Tamir, Eric Dumazet, Ingo Molnar, Mike Galbraith, davem, netdev, linux-kernel, mst, Ingo Molnar jacob.e.keller@intel.com On Tue, Sep 02, 2014 at 02:31:53PM +0800, Jason Wang wrote: > Yes, but the current code cannot do this. In most cases, the newly > woken process has no chance to run if another process is busy looping > on the same CPU. That sounds weird; a process that slept will have an effective vruntime deficit vs one that has been running for a while. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 7:36 ` Ingo Molnar 2014-08-22 9:08 ` Jason Wang @ 2014-09-03 8:09 ` Michael S. Tsirkin 2016-04-11 16:31 ` Michael S. Tsirkin 2 siblings, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-03 8:09 UTC (permalink / raw) To: Ingo Molnar Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar, Eliezer Tamir On Fri, Aug 22, 2014 at 09:36:53AM +0200, Ingo Molnar wrote: > > > > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h > > > index 1d67fb6..8a33fb2 100644 > > > --- a/include/net/busy_poll.h > > > +++ b/include/net/busy_poll.h > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > cpu_relax(); > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > + nr_running_this_cpu() < 2); > > So it's generally a bad idea to couple to the scheduler through > such a low level, implementation dependent value like > 'nr_running', causing various problems: > > - It misses important work that might be pending on this CPU, > like RCU callbacks. > > - It will also over-credit task contexts that might be > runnable, but which are less important than the currently > running one: such as a SCHED_IDLE task > > - It will also over-credit even regular SCHED_NORMAL tasks, if > this current task is more important than them: say > SCHED_FIFO. A SCHED_FIFO workload should run just as fast > with SCHED_NORMAL tasks around, as a SCHED_NORMAL workload > on an otherwise idle system. 
> > So what you want is a more sophisticated query to the > scheduler, a sched_expected_runtime() method that returns the > number of nsecs this task is expected to run in the future, > which returns 0 if you will be scheduled away on the next > schedule(), and returns infinity for a high prio SCHED_FIFO > task, or if this SCHED_NORMAL task is on an otherwise idle CPU. > > It will return a regular time slice value in other cases, when > there's some load on the CPU. > > The polling logic can then do its decision based on that time > value. > > All this can be done reasonably fast and lockless in most > cases, so that it can be called from busy-polling code. > > An added advantage would be that this approach consolidates the > somewhat random need_resched() checks into this method as well. > > In any case I don't agree with the nr_running_this_cpu() > method. > > (Please Cc: me and lkml to future iterations of this patchset.) > > Thanks, > > Ingo This sounds very nice. We could then have applications say "I want to poll only if no one else has urgent work" where urgent is defined as "has to run within next N nseconds". Really just a bit more flexibility added to busy polling. Peter, Ingo, does this sound good? -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 7:36 ` Ingo Molnar 2014-08-22 9:08 ` Jason Wang 2014-09-03 8:09 ` Michael S. Tsirkin @ 2016-04-11 16:31 ` Michael S. Tsirkin 2016-04-13 7:20 ` Ingo Molnar 2016-04-13 13:28 ` Peter Zijlstra 2 siblings, 2 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2016-04-11 16:31 UTC (permalink / raw) To: Ingo Molnar Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar On Fri, Aug 22, 2014 at 09:36:53AM +0200, Ingo Molnar wrote: > > > > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h > > > index 1d67fb6..8a33fb2 100644 > > > --- a/include/net/busy_poll.h > > > +++ b/include/net/busy_poll.h > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > cpu_relax(); > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > + nr_running_this_cpu() < 2); > > So it's generally a bad idea to couple to the scheduler through > such a low level, implementation dependent value like > 'nr_running', causing various problems: > > - It misses important work that might be pending on this CPU, > like RCU callbacks. > > - It will also over-credit task contexts that might be > runnable, but which are less important than the currently > running one: such as a SCHED_IDLE task > > - It will also over-credit even regular SCHED_NORMAL tasks, if > this current task is more important than them: say > SCHED_FIFO. A SCHED_FIFO workload should run just as fast > with SCHED_NORMAL tasks around, as a SCHED_NORMAL workload > on an otherwise idle system. 
> > So what you want is a more sophisticated query to the > scheduler, a sched_expected_runtime() method that returns the > number of nsecs this task is expected to run in the future, > which returns 0 if you will be scheduled away on the next > schedule(), and returns infinity for a high prio SCHED_FIFO > task, or if this SCHED_NORMAL task is on an otherwise idle CPU. > > It will return a regular time slice value in other cases, when > there's some load on the CPU. > > The polling logic can then do its decision based on that time > value. > > All this can be done reasonably fast and lockless in most > cases, so that it can be called from busy-polling code. > > An added advantage would be that this approach consolidates the > somewhat random need_resched() checks into this method as well. > > In any case I don't agree with the nr_running_this_cpu() > method. > > (Please Cc: me and lkml to future iterations of this patchset.) > > Thanks, > > Ingo

I tried to look into this: it might be even nicer to add sched_expected_to_run(time), which tells us whether we expect the current task to keep running for the next XX nsecs. For the fair scheduler, it seems that it could be as simple as:

+static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t)
+{
+	struct sched_entity *left;
+	struct sched_entity *curr = cfs_rq->curr;
+
+	if (!curr || !curr->on_rq)
+		return false;
+
+	left = __pick_first_entity(cfs_rq);
+	if (!left)
+		return true;
+
+	return (s64)(curr->vruntime + calc_delta_fair(t, curr) -
+		     left->vruntime) < 0;
+}

The reason it seems easier is that this way we can reuse calc_delta_fair() and don't have to do the reverse translation from vruntime to nsecs. And I guess if we do this with interrupts disabled, and only poke at the current CPU's rq, we know the first entity won't go away, so we don't need locks? Is this close to what you had in mind? Thanks, -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2016-04-11 16:31 ` Michael S. Tsirkin @ 2016-04-13 7:20 ` Ingo Molnar 2016-04-13 13:28 ` Peter Zijlstra 1 sibling, 0 replies; 54+ messages in thread From: Ingo Molnar @ 2016-04-13 7:20 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Peter Zijlstra, Ingo Molnar * Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Aug 22, 2014 at 09:36:53AM +0200, Ingo Molnar wrote: > > > > > > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h > > > > index 1d67fb6..8a33fb2 100644 > > > > --- a/include/net/busy_poll.h > > > > +++ b/include/net/busy_poll.h > > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > > cpu_relax(); > > > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > > + nr_running_this_cpu() < 2); > > > > So it's generally a bad idea to couple to the scheduler through > > such a low level, implementation dependent value like > > 'nr_running', causing various problems: > > > > - It misses important work that might be pending on this CPU, > > like RCU callbacks. > > > > - It will also over-credit task contexts that might be > > runnable, but which are less important than the currently > > running one: such as a SCHED_IDLE task > > > > - It will also over-credit even regular SCHED_NORMAL tasks, if > > this current task is more important than them: say > > SCHED_FIFO. A SCHED_FIFO workload should run just as fast > > with SCHED_NORMAL tasks around, as a SCHED_NORMAL workload > > on an otherwise idle system. 
> > > > So what you want is a more sophisticated query to the > > scheduler, a sched_expected_runtime() method that returns the > > number of nsecs this task is expected to run in the future, > > which returns 0 if you will be scheduled away on the next > > schedule(), and returns infinity for a high prio SCHED_FIFO > > task, or if this SCHED_NORMAL task is on an otherwise idle CPU. > > > > It will return a regular time slice value in other cases, when > > there's some load on the CPU. > > > > The polling logic can then do its decision based on that time > > value. > > > > All this can be done reasonably fast and lockless in most > > cases, so that it can be called from busy-polling code. > > > > An added advantage would be that this approach consolidates the > > somewhat random need_resched() checks into this method as well. > > > > In any case I don't agree with the nr_running_this_cpu() > > method. > > > > (Please Cc: me and lkml to future iterations of this patchset.) > > > > Thanks, > > > > Ingo > > I tried to look into this: it might be even nicer to add > sched_expected_to_run(time) which tells us whether we expect the current > task to keep running for the next XX nsecs. > > For the fair scheduler, it seems that it could be as simple as > > +static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t) > +{ > + struct sched_entity *left; > + struct sched_entity *curr = cfs_rq->curr; > + > + if (!curr || !curr->on_rq) > + return false; > + > + left = __pick_first_entity(cfs_rq); > + if (!left) > + return true; > + > + return (s64)(curr->vruntime + calc_delta_fair(t, curr) - > + left->vruntime) < 0; > +} > > The reason it seems easier is because that way we can reuse > calc_delta_fair and don't have to do the reverse translation > from vruntime to nsec. > > And I guess if we do this with interrupts disabled, and only poke > at the current CPU's rq, we know first entity > won't go away so we don't need locks? > > Is this close to what you had in mind? 
Yeah, fair enough ;-) I'm not 100% convinced about the interface, but the model looks good to me. Let's try it - I don't have fundamental objections anymore. I also agree that it could be done lockless - although I'd suggest two steps: first do the dumb thing with the proper scheduler lock(s) held, then another patch which removes the locks for a bit more performance. That will make any subtle crashes/races bisectable. Thanks, Ingo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2016-04-11 16:31 ` Michael S. Tsirkin 2016-04-13 7:20 ` Ingo Molnar @ 2016-04-13 13:28 ` Peter Zijlstra 2016-04-13 13:51 ` Michael S. Tsirkin 1 sibling, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2016-04-13 13:28 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Ingo Molnar, Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Mon, Apr 11, 2016 at 07:31:57PM +0300, Michael S. Tsirkin wrote: > +static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t) > +{ > + struct sched_entity *left; > + struct sched_entity *curr = cfs_rq->curr; > + > + if (!curr || !curr->on_rq) > + return false; > + > + left = __pick_first_entity(cfs_rq); > + if (!left) > + return true; > + > + return (s64)(curr->vruntime + calc_delta_fair(t, curr) - > + left->vruntime) < 0; > +} > > The reason it seems easier is because that way we can reuse > calc_delta_fair and don't have to do the reverse translation > from vruntime to nsec. > > And I guess if we do this with interrupts disabled, and only poke > at the current CPU's rq, we know first entity > won't go away so we don't need locks? Nope, not true. Current isn't actually in the tree, and any other task is subject to being moved at any time. Even if current was in the tree, there is no guarantee it is ->rb_leftmost; imagine a task being migrated in that has a smaller vruntime. So this really cannot work without locks :/ I've not thought about the actual problem you're trying to solve; but I figured I'd let you know this before you continue down this path. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2016-04-13 13:28 ` Peter Zijlstra @ 2016-04-13 13:51 ` Michael S. Tsirkin 2016-04-14 0:58 ` Peter Zijlstra 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2016-04-13 13:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Wed, Apr 13, 2016 at 03:28:03PM +0200, Peter Zijlstra wrote: > On Mon, Apr 11, 2016 at 07:31:57PM +0300, Michael S. Tsirkin wrote: > > +static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t) > > +{ > > + struct sched_entity *left; > > + struct sched_entity *curr = cfs_rq->curr; > > + > > + if (!curr || !curr->on_rq) > > + return false; > > + > > + left = __pick_first_entity(cfs_rq); > > + if (!left) > > + return true; > > + > > + return (s64)(curr->vruntime + calc_delta_fair(t, curr) - > > + left->vruntime) < 0; > > +} > > > > The reason it seems easier is because that way we can reuse > > calc_delta_fair and don't have to do the reverse translation > > from vruntime to nsec. > > > > And I guess if we do this with interrupts disabled, and only poke > > at the current CPU's rq, we know first entity > > won't go away so we don't need locks? > > Nope, not true. Current isn't actually in the tree, and any other task > is subject to being moved at any time. > Even if current was in the tree, there is no guarantee it is > ->rb_leftmost; imagine a task being migrated in that has a smaller > vruntime. > > So this really cannot work without locks :/ > > I've not thought about the actual problem you're trying to solve; but I > figured I'd let you know this before you continue down this path. Hmm. 
This is in fact called in the context of a given task, so maybe I should just change the API and use &task->se instead of cfs_rq->curr:

static bool expected_to_run_fair(struct cfs_rq *cfs_rq, struct task_struct *task, s64 t)
{
	struct sched_entity *left;
	struct sched_entity *curr = &task->se;

	if (!curr || !curr->on_rq)
		return false;

	left = __pick_first_entity(cfs_rq);
	if (!left)
		return true;

	return (s64)(curr->vruntime + calc_delta_fair(t, curr) -
		     left->vruntime) < 0;
}

This way it is the caller's responsibility to make sure the task is not going away. left here is on the tree, so it's not going away, right? -- MST ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2016-04-13 13:51 ` Michael S. Tsirkin @ 2016-04-14 0:58 ` Peter Zijlstra 0 siblings, 0 replies; 54+ messages in thread From: Peter Zijlstra @ 2016-04-14 0:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Ingo Molnar, Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Wed, Apr 13, 2016 at 04:51:36PM +0300, Michael S. Tsirkin wrote: > On Wed, Apr 13, 2016 at 03:28:03PM +0200, Peter Zijlstra wrote: > > Nope, not true. Current isn't actually in the tree, and any other task > > is subject to being moved at any time. > > Even if current was in the tree, there is no guarantee it is > > ->rb_leftmost; imagine a task being migrated in that has a smaller > > vruntime. > > > > So this really cannot work without locks :/ > > > > I've not thought about the actual problem you're trying to solve; but I > > figured I'd let you know this before you continue down this path. > static bool expected_to_run_fair(struct cfs_rq *cfs_rq, struct task_struct *task, s64 t) > { > struct sched_entity *left; > struct sched_entity *curr = &task->se; > if (!curr || !curr->on_rq) > return false; > > left = __pick_first_entity(cfs_rq); > if (!left) > return true; > > return (s64)(curr->vruntime + calc_delta_fair(t, curr) - > left->vruntime) < 0; > } > > Left here is on tree so it's not going away, right? No, left is the leftmost entry on the tree, it can go away at any time. You cannot touch the tree without locks. Someone can come an migrate the task to another CPU, start executing it and it can die between you finding the left pointer and dereferencing it. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-08-22 5:01 ` Mike Galbraith 2014-08-22 7:29 ` Jason Wang 2014-08-22 7:36 ` Ingo Molnar @ 2014-09-01 9:31 ` Peter Zijlstra 2014-09-01 9:52 ` Michael S. Tsirkin 2 siblings, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2014-09-01 9:31 UTC (permalink / raw) To: Mike Galbraith; +Cc: Jason Wang, davem, netdev, linux-kernel, mst, Ingo Molnar On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: > > +++ b/include/net/busy_poll.h > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > cpu_relax(); > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > - !need_resched() && !busy_loop_timeout(end_time)); > > + !need_resched() && !busy_loop_timeout(end_time) && > > + nr_running_this_cpu() < 2); > > So as has been said by now; this is horrible. We should not export nr_running like this ever. Your usage of < 2 implies this can be hit with nr_running == 0, and therefore you can also hit it with nr_running == 1 where the one is not network related and you get random delays. Worse still, you have BH (and thereby preemption) disabled, you should not _ever_ have undefined and indefinite waits like that. You also destroy any hope of dropping into lower power states; even when there's never going to be a packet ever again, also bad. All in all, a complete trainwreck. NAK. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 9:31 ` Peter Zijlstra @ 2014-09-01 9:52 ` Michael S. Tsirkin 2014-09-01 10:04 ` Peter Zijlstra 0 siblings, 1 reply; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-01 9:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: > On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: > > > +++ b/include/net/busy_poll.h > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > cpu_relax(); > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > + nr_running_this_cpu() < 2); > > > > > So as has been said by now; this is horrible. > > We should not export nr_running like this ever. Your usage of < 2 > implies this can be hit with nr_running == 0, and therefore you can also > hit it with nr_running == 1 where the one is not network related and you > get random delays. > > Worse still, you have BH (and thereby preemption) disabled, you should > not _ever_ have undefined and indefinite waits like that. > > You also destroy any hope of dropping into lower power states; even when > there's never going to be a packet ever again, also bad. Hmm this patch sometimes makes us exit from the busy loop *earlier*. How can this interfere with dropping into lower power states? > All in all, a complete trainwreck. > > NAK. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 9:52 ` Michael S. Tsirkin @ 2014-09-01 10:04 ` Peter Zijlstra 2014-09-01 10:19 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 54+ messages in thread From: Peter Zijlstra @ 2014-09-01 10:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Mon, Sep 01, 2014 at 12:52:19PM +0300, Michael S. Tsirkin wrote: > On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: > > On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: > > > > +++ b/include/net/busy_poll.h > > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > > cpu_relax(); > > > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > > + nr_running_this_cpu() < 2); > > > > > > > > So as has been said by now; this is horrible. > > > > We should not export nr_running like this ever. Your usage of < 2 > > implies this can be hit with nr_running == 0, and therefore you can also > > hit it with nr_running == 1 where the one is not network related and you > > get random delays. > > > > Worse still, you have BH (and thereby preemption) disabled, you should > > not _ever_ have undefined and indefinite waits like that. > > > > You also destroy any hope of dropping into lower power states; even when > > there's never going to be a packet ever again, also bad. > > Hmm this patch sometimes makes us exit from the busy loop *earlier*. > How can this interfere with dropping into lower power states? Ah.. jetlag.. :/ I read it like it owuld indefinitely spin if there was only the 'one' task, not avoid the spin unless there was the one task. 
The nr_running thing is still horrible, but let me reread this patch description to see if it explains why that is a good thing. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 10:04 ` Peter Zijlstra @ 2014-09-01 10:19 ` Peter Zijlstra 2014-09-02 4:03 ` Jason Wang 2014-09-01 10:22 ` Michael S. Tsirkin 2014-09-02 3:38 ` Jason Wang 2 siblings, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2014-09-01 10:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar, Eliezer Tamir On Mon, Sep 01, 2014 at 12:04:34PM +0200, Peter Zijlstra wrote: > On Mon, Sep 01, 2014 at 12:52:19PM +0300, Michael S. Tsirkin wrote: > > On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: > > > On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: > > > > > +++ b/include/net/busy_poll.h > > > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > > > cpu_relax(); > > > > > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > > > + nr_running_this_cpu() < 2); > > > > > > > > > > > So as has been said by now; this is horrible. > > > > > > We should not export nr_running like this ever. Your usage of < 2 > > > implies this can be hit with nr_running == 0, and therefore you can also > > > hit it with nr_running == 1 where the one is not network related and you > > > get random delays. > > > > > > Worse still, you have BH (and thereby preemption) disabled, you should > > > not _ever_ have undefined and indefinite waits like that. > > > > > > You also destroy any hope of dropping into lower power states; even when > > > there's never going to be a packet ever again, also bad. > > > > Hmm this patch sometimes makes us exit from the busy loop *earlier*. > > How can this interfere with dropping into lower power states? > > Ah.. jetlag.. 
:/ I read it like it would indefinitely spin if there was > only the 'one' task, not avoid the spin unless there was the one task. > > The nr_running thing is still horrible, but let me reread this patch > description to see if it explains why that is a good thing. OK I suppose that more or less makes sense, the contextual behaviour is of course tedious in that it makes behaviour less predictable. The 'other' tasks might not want to generate data and you then destroy throughput by not spinning. I'm not entirely sure I see how it's all supposed to work though; the various poll functions call sk_busy_poll(), and do_select() also loops. The patch only kills the sk_busy_poll() loop, but then do_select() will still loop and not sleep, so how is this helping? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 10:19 ` Peter Zijlstra @ 2014-09-02 4:03 ` Jason Wang 2014-09-02 10:24 ` Peter Zijlstra 0 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-09-02 4:03 UTC (permalink / raw) To: Peter Zijlstra, Michael S. Tsirkin Cc: Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar, Eliezer Tamir On 09/01/2014 06:19 PM, Peter Zijlstra wrote: > On Mon, Sep 01, 2014 at 12:04:34PM +0200, Peter Zijlstra wrote: >> > On Mon, Sep 01, 2014 at 12:52:19PM +0300, Michael S. Tsirkin wrote: >>> > > On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: >>>> > > > On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: >>>>>> > > > > > +++ b/include/net/busy_poll.h >>>>>> > > > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) >>>>>> > > > > > cpu_relax(); >>>>>> > > > > > >>>>>> > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && >>>>>> > > > > > - !need_resched() && !busy_loop_timeout(end_time)); >>>>>> > > > > > + !need_resched() && !busy_loop_timeout(end_time) && >>>>>> > > > > > + nr_running_this_cpu() < 2); >>>>>> > > > > > >>>> > > > >>>> > > > So as has been said by now; this is horrible. >>>> > > > >>>> > > > We should not export nr_running like this ever. Your usage of < 2 >>>> > > > implies this can be hit with nr_running == 0, and therefore you can also >>>> > > > hit it with nr_running == 1 where the one is not network related and you >>>> > > > get random delays. >>>> > > > >>>> > > > Worse still, you have BH (and thereby preemption) disabled, you should >>>> > > > not _ever_ have undefined and indefinite waits like that. >>>> > > > >>>> > > > You also destroy any hope of dropping into lower power states; even when >>>> > > > there's never going to be a packet ever again, also bad. >>> > > >>> > > Hmm this patch sometimes makes us exit from the busy loop *earlier*. 
>>> > > How can this interfere with dropping into lower power states? >> > >> > Ah.. jetlag.. :/ I read it like it would indefinitely spin if there was >> > only the 'one' task, not avoid the spin unless there was the one task. >> > >> > The nr_running thing is still horrible, but let me reread this patch >> > description to see if it explains why that is a good thing. > OK I suppose that more or less makes sense, the contextual behaviour is > of course tedious in that it makes behaviour less predictable. The > 'other' tasks might not want to generate data and you then destroy > throughput by not spinning. The patch tries to make sure: - the performance of busy read is not worse than when it is disabled, in any case. - the performance improvement of a single socket is not achieved by sacrificing the total performance (all other processes) of the system. If 'other' tasks are also CPU- or I/O-intensive jobs, we switch to them so the total performance is kept or even increased, and the performance of the current process is guaranteed to be no worse than when busy read is disabled (or even better, since it may still do busy read sometimes when it is the only runnable process). If 'other' tasks are not intensive, they just do a little work and sleep soon, so busy read can still work most of the time during future reads, and we may still get obvious improvements. > I'm not entirely sure I see how it's all supposed to work though; the > various poll functions call sk_busy_poll() and do_select() also loops. > > The patch only kills the sk_busy_poll() loop, but then do_select() will > still loop and not sleep, so how is this helping? Yes, the patch only helps processes that do a blocking read (busy read). For select(), maybe we can do the same thing, but that needs more testing and thought.
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 4:03 ` Jason Wang @ 2014-09-02 10:24 ` Peter Zijlstra 2014-09-03 6:58 ` Jason Wang 0 siblings, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2014-09-02 10:24 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar, Eliezer Tamir On Tue, Sep 02, 2014 at 12:03:42PM +0800, Jason Wang wrote: > On 09/01/2014 06:19 PM, Peter Zijlstra wrote: > > OK I suppose that more or less makes sense, the contextual behaviour is > > of course tedious in that it makes behaviour less predictable. The > > 'other' tasks might not want to generate data and you then destroy > > throughput by not spinning. > > The patch tries to make sure: > - the performance of busy read is not worse than when it is disabled, in > any case. > - the performance improvement of a single socket is not achieved by > sacrificing the total performance (all other processes) of the system. > > If 'other' tasks are also CPU- or I/O-intensive jobs, we switch to > them so the total performance is kept or even increased, and the > performance of the current process is guaranteed to be no worse than when busy > read is disabled (or even better, since it may still do busy read > sometimes when it is the only runnable process). If 'other' tasks are > not intensive, they just do a little work and sleep soon, so busy > read can still work most of the time during future reads, and we may > still get obvious improvements. Not entirely true; the select/poll whatever will now block, which means we need a wakeup, which increases the latency immensely. > > I'm not entirely sure I see how it's all supposed to work though; the > > various poll functions call sk_busy_poll() and do_select() also loops. > > > > The patch only kills the sk_busy_poll() loop, but then do_select() will > > still loop and not sleep, so how is this helping?
> > Yes, the patch only helps processes that do a blocking read (busy > read). For select(), maybe we can do the same thing, but that needs more > testing and thought. What's the blocking read callgraph, how do we end up in sk_busy_poll() there? > > But that's another reason the patch is wrong.
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 10:24 ` Peter Zijlstra @ 2014-09-03 6:58 ` Jason Wang 2014-09-03 9:30 ` Peter Zijlstra 0 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-09-03 6:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Michael S. Tsirkin, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar, Eliezer Tamir On 09/02/2014 06:24 PM, Peter Zijlstra wrote: > On Tue, Sep 02, 2014 at 12:03:42PM +0800, Jason Wang wrote: >> > On 09/01/2014 06:19 PM, Peter Zijlstra wrote: >>> > > OK I suppose that more or less makes sense, the contextual behaviour is >>> > > of course tedious in that it makes behaviour less predictable. The >>> > > 'other' tasks might not want to generate data and you then destroy >>> > > throughput by not spinning. >> > >> > The patch tries to make sure: >> > - the performance of busy read is not worse than when it is disabled, in >> > any case. >> > - the performance improvement of a single socket is not achieved by >> > sacrificing the total performance (all other processes) of the system. >> > >> > If 'other' tasks are also CPU- or I/O-intensive jobs, we switch to >> > them so the total performance is kept or even increased, and the >> > performance of the current process is guaranteed to be no worse than when busy >> > read is disabled (or even better, since it may still do busy read >> > sometimes when it is the only runnable process). If 'other' tasks are >> > not intensive, they just do a little work and sleep soon, so busy >> > read can still work most of the time during future reads, and we may >> > still get obvious improvements. > Not entirely true; the select/poll whatever will now block, which means > we need a wakeup, which increases the latency immensely. I'm not sure I get your meaning. This patch does not change the logic or dynamics of select/poll, since sock_poll() always calls sk_busy_loop() with nonblock set to true.
This means sk_busy_loop() will only try ndo_busy_poll() once, whatever the result of the other checks. The busy polling is in fact done by its caller. >>> > > I'm not entirely sure I see how it's all supposed to work though; the >>> > > various poll functions call sk_busy_poll() and do_select() also loops. >>> > > >>> > > The patch only kills the sk_busy_poll() loop, but then do_select() will >>> > > still loop and not sleep, so how is this helping? >> > >> > Yes, the patch only helps processes that do a blocking read (busy >> > read). For select(), maybe we can do the same thing, but that needs more >> > testing and thought. > What's the blocking read callgraph, how do we end up in sk_busy_poll() there? > > But that's another reason the patch is wrong. The patch only tries to improve the performance of busy read (and the test results show impressive changes). It does not change anything for busy poll. Consider that there may be two processes on one CPU, one doing busy read and one doing busy polling; this patch may in fact help the busy polling performance in that case. It's good to discuss the ideas around busy poll together, but that is out of the scope of this patch. We can try to do optimization on top.
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-03 6:58 ` Jason Wang @ 2014-09-03 9:30 ` Peter Zijlstra 0 siblings, 0 replies; 54+ messages in thread From: Peter Zijlstra @ 2014-09-03 9:30 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar, Eliezer Tamir On Wed, Sep 03, 2014 at 02:58:33PM +0800, Jason Wang wrote: > On 09/02/2014 06:24 PM, Peter Zijlstra wrote: > The patch only tries to improve the performance of busy read (and the test > results show impressive changes). It does not change anything for busy > poll. Consider that there may be two processes on one CPU, one doing > busy read and one doing busy polling; this patch may in fact help the > busy polling performance in that case. > > It's good to discuss the ideas around busy poll together, but that is out of > the scope of this patch. We can try to do optimization on top. No, that's just wrong; blocking read and blocking select should behave the same.
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 10:04 ` Peter Zijlstra 2014-09-01 10:19 ` Peter Zijlstra @ 2014-09-01 10:22 ` Michael S. Tsirkin 2014-09-02 3:38 ` Jason Wang 2 siblings, 0 replies; 54+ messages in thread From: Michael S. Tsirkin @ 2014-09-01 10:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Mike Galbraith, Jason Wang, davem, netdev, linux-kernel, Ingo Molnar On Mon, Sep 01, 2014 at 12:04:34PM +0200, Peter Zijlstra wrote: > On Mon, Sep 01, 2014 at 12:52:19PM +0300, Michael S. Tsirkin wrote: > > On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: > > > On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: > > > > > +++ b/include/net/busy_poll.h > > > > > @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) > > > > > cpu_relax(); > > > > > > > > > > } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && > > > > > - !need_resched() && !busy_loop_timeout(end_time)); > > > > > + !need_resched() && !busy_loop_timeout(end_time) && > > > > > + nr_running_this_cpu() < 2); > > > > > > > > > > > So as has been said by now; this is horrible. > > > > > > We should not export nr_running like this ever. Your usage of < 2 > > > implies this can be hit with nr_running == 0, and therefore you can also > > > hit it with nr_running == 1 where the one is not network related and you > > > get random delays. > > > > > > Worse still, you have BH (and thereby preemption) disabled, you should > > > not _ever_ have undefined and indefinite waits like that. > > > > > > You also destroy any hope of dropping into lower power states; even when > > > there's never going to be a packet ever again, also bad. > > > > Hmm this patch sometimes makes us exit from the busy loop *earlier*. > > How can this interfere with dropping into lower power states? > > Ah.. jetlag.. 
:/ I read it like it would indefinitely spin if there was > only the 'one' task, not avoid the spin unless there was the one task. > > The nr_running thing is still horrible, Yea, it's a kludge, but busy waiting is a heuristic thing anyway, so it boils down to whether it's mostly effective. I agree it would be better to make it more robust/consistent if we can do it without a lot of complexity. > but let me reread this patch > description to see if it explains why that is a good thing.
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-01 10:04 ` Peter Zijlstra 2014-09-01 10:19 ` Peter Zijlstra 2014-09-01 10:22 ` Michael S. Tsirkin @ 2014-09-02 3:38 ` Jason Wang 2014-09-02 6:12 ` Peter Zijlstra 2 siblings, 1 reply; 54+ messages in thread From: Jason Wang @ 2014-09-02 3:38 UTC (permalink / raw) To: Peter Zijlstra, Michael S. Tsirkin Cc: Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar On 09/01/2014 06:04 PM, Peter Zijlstra wrote: > On Mon, Sep 01, 2014 at 12:52:19PM +0300, Michael S. Tsirkin wrote: >> On Mon, Sep 01, 2014 at 11:31:59AM +0200, Peter Zijlstra wrote: >>> On Fri, Aug 22, 2014 at 07:01:05AM +0200, Mike Galbraith wrote: >>>>> +++ b/include/net/busy_poll.h >>>>> @@ -109,7 +109,8 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock) >>>>> cpu_relax(); >>>>> >>>>> } while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) && >>>>> - !need_resched() && !busy_loop_timeout(end_time)); >>>>> + !need_resched() && !busy_loop_timeout(end_time) && >>>>> + nr_running_this_cpu() < 2); >>>>> >>> So as has been said by now; this is horrible. >>> >>> We should not export nr_running like this ever. Your usage of < 2 >>> implies this can be hit with nr_running == 0, and therefore you can also >>> hit it with nr_running == 1 where the one is not network related and you >>> get random delays. >>> >>> Worse still, you have BH (and thereby preemption) disabled, you should >>> not _ever_ have undefined and indefinite waits like that. >>> >>> You also destroy any hope of dropping into lower power states; even when >>> there's never going to be a packet ever again, also bad. >> Hmm this patch sometimes makes us exit from the busy loop *earlier*. >> How can this interfere with dropping into lower power states? > Ah.. jetlag.. :/ I read it like it would indefinitely spin if there was > only the 'one' task, not avoid the spin unless there was the one task.
> > The nr_running thing is still horrible, but let me reread this patch > description to see if it explains why that is a good thing. I see; how about just exporting a boolean helper like current_can_busy_loop() and taking care of all of the conditions (pending BHs and RCU callbacks, runnable processes) in the scheduler code itself?
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 3:38 ` Jason Wang @ 2014-09-02 6:12 ` Peter Zijlstra 2014-09-02 7:19 ` Jason Wang 0 siblings, 1 reply; 54+ messages in thread From: Peter Zijlstra @ 2014-09-02 6:12 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar On Tue, Sep 02, 2014 at 11:38:40AM +0800, Jason Wang wrote: > I see; how about just exporting a boolean helper like > current_can_busy_loop() and taking care of all of the conditions (pending BHs > and RCU callbacks, runnable processes) in the scheduler code itself? How is that going to help the cases that are hurt by not spinning for a packet?
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable 2014-09-02 6:12 ` Peter Zijlstra @ 2014-09-02 7:19 ` Jason Wang 0 siblings, 0 replies; 54+ messages in thread From: Jason Wang @ 2014-09-02 7:19 UTC (permalink / raw) To: Peter Zijlstra Cc: Michael S. Tsirkin, Mike Galbraith, davem, netdev, linux-kernel, Ingo Molnar On 09/02/2014 02:12 PM, Peter Zijlstra wrote: > On Tue, Sep 02, 2014 at 11:38:40AM +0800, Jason Wang wrote: >> > I see; how about just exporting a boolean helper like >> > current_can_busy_loop() and taking care of all of the conditions (pending BHs >> > and RCU callbacks, runnable processes) in the scheduler code itself? > How is that going to help the cases that are hurt by not spinning for a > packet? The patch does not help in that case. Spinning there only helps a single process but hurts all the others. Those other processes may not use busy polling, or may not even be network related. Spinning still gives somewhat higher priority to the process that uses busy polling or busy reading, which is unfair. And how much we can gain for a single process by continuing to spin when several other tasks could be run is still questionable. It's quite possible that we would get only one packet, or zero, after wasting lots of CPU cycles, when there are thousands of sockets or more.
* Re: [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() 2014-08-21 8:05 [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() Jason Wang 2014-08-21 8:05 ` [PATCH net-next 2/2] net: exit busy loop when another process is runnable Jason Wang @ 2014-08-21 13:52 ` Ingo Molnar 2014-08-22 7:27 ` Jason Wang 1 sibling, 1 reply; 54+ messages in thread From: Ingo Molnar @ 2014-08-21 13:52 UTC (permalink / raw) To: Jason Wang; +Cc: davem, netdev, linux-kernel, mst, Ingo Molnar, Peter Zijlstra * Jason Wang <jasowang@redhat.com> wrote: > This patch introduces a helper nr_running_this_cpu() to return the > number of runnable processes in current cpu. > > The first user will be net rx busy polling. It will use this to exit > the busy loop when it finds more than one processes is runnable in > current cpu. This can give us better performance of busy polling under > heavy load. s/one processes/one process More importantly, please Cc: scheduler maintainers and lkml to both patches, so that the whole intent of the change can be reviewed, in full context. Thanks, Ingo
* Re: [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() 2014-08-21 13:52 ` [PATCH net-next 1/2] sched: introduce nr_running_this_cpu() Ingo Molnar @ 2014-08-22 7:27 ` Jason Wang 0 siblings, 0 replies; 54+ messages in thread From: Jason Wang @ 2014-08-22 7:27 UTC (permalink / raw) To: Ingo Molnar; +Cc: davem, netdev, linux-kernel, mst, Ingo Molnar, Peter Zijlstra On 08/21/2014 09:52 PM, Ingo Molnar wrote: > * Jason Wang <jasowang@redhat.com> wrote: > >> This patch introduces a helper nr_running_this_cpu() to return the >> number of runnable processes in current cpu. >> >> The first user will be net rx busy polling. It will use this to exit >> the busy loop when it finds more than one processes is runnable in >> current cpu. This can give us better performance of busy polling under >> heavy load. > s/one processes/one process > > More importantly, please Cc: scheduler maintainers and lkml to > both patches, so that the whole intent of the change can be > reviewed, in full context. > > Thanks, > > Ingo Will do this. Thanks