rcu.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed
@ 2019-07-01  4:04 Joel Fernandes (Google)
  2019-07-01  4:04 ` [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section Joel Fernandes (Google)
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Joel Fernandes (Google) @ 2019-07-01  4:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Josh Triplett, kernel-team, Lai Jiangshan, linux-kselftest,
	Mathieu Desnoyers, Paul E. McKenney, rcu,
	Sebastian Andrzej Siewior, Shuah Khan, Steven Rostedt

The t->rcu_read_unlock_special union's need_qs bit can be set by the
scheduler tick (in rcu_flavor_sched_clock_irq) to indicate that help is
needed from the rcu_read_unlock path. When this help arrives however, we
can do better to speed up the quiescent state reporting which if
rcu_read_unlock_special::need_qs is set might be quite urgent. Make use
of this information in deciding when to do heavy-weight softirq raising
where possible.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c588ef98efd3..bff6410fac06 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		t->rcu_read_unlock_special.b.exp_hint = false;
 		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
 		      (rdp->grpmask & rnp->expmask) ||
-		      tick_nohz_full_cpu(rdp->cpu);
+		      tick_nohz_full_cpu(rdp->cpu)  ||
+		      t->rcu_read_unlock_special.b.need_qs;
 		// Need to defer quiescent state until everything is enabled.
 		if (irqs_were_disabled && use_softirq &&
 		    (in_interrupt() ||
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section
  2019-07-01  4:04 [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes (Google)
@ 2019-07-01  4:04 ` Joel Fernandes (Google)
  2019-07-01 20:03   ` Paul E. McKenney
  2019-07-01  4:04 ` [RFC 3/3] Revert "rcutorture: Tweak kvm options" Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes (Google) @ 2019-07-01  4:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rcu, kernel-team, Josh Triplett, Lai Jiangshan, linux-kselftest,
	Mathieu Desnoyers, Paul E. McKenney, Sebastian Andrzej Siewior,
	Shuah Khan, Steven Rostedt

The rcu_preempt_note_context_switch() tries to handle cases where
__rcu_read_unlock() got preempted and then the context switch path does
the reporting of the quiscent state along with clearing any bits in the
rcu_read_unlock_special union.

This can be handled by just calling rcu_deferred_qs() which was added
during the RCU consolidation work and already does these checks.

Tested RCU config TREE03 for an hour which succeeds.

Cc: rcu@vger.kernel.org
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_plugin.h | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index bff6410fac06..ebb4d46a6267 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt)
 				       ? rnp->gp_seq
 				       : rcu_seq_snap(&rnp->gp_seq));
 		rcu_preempt_ctxt_queue(rnp, rdp);
-	} else if (t->rcu_read_lock_nesting < 0 &&
-		   t->rcu_read_unlock_special.s) {
-
-		/*
-		 * Complete exit from RCU read-side critical section on
-		 * behalf of preempted instance of __rcu_read_unlock().
-		 */
-		rcu_read_unlock_special(t);
-		rcu_preempt_deferred_qs(t);
 	} else {
 		rcu_preempt_deferred_qs(t);
 	}
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01  4:04 [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes (Google)
  2019-07-01  4:04 ` [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section Joel Fernandes (Google)
@ 2019-07-01  4:04 ` Joel Fernandes (Google)
  2019-07-01 12:23   ` Sebastian Andrzej Siewior
  2019-07-01 13:53 ` [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes
  2019-07-02  3:47 ` Paul E. McKenney
  3 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes (Google) @ 2019-07-01  4:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Josh Triplett, kernel-team, Lai Jiangshan, linux-kselftest,
	Mathieu Desnoyers, Paul E. McKenney, rcu,
	Sebastian Andrzej Siewior, Shuah Khan, Steven Rostedt

This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
runs but does nothing.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
I am Ok if we want to drop this patch but it is in my tree because
without it I can't run the tests.

 tools/testing/selftests/rcutorture/bin/functions.sh | 13 +------------
 .../selftests/rcutorture/configs/rcu/CFcommon       |  3 ---
 2 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh
index c3a49fb4d6f6..6bcb8b5b2ff2 100644
--- a/tools/testing/selftests/rcutorture/bin/functions.sh
+++ b/tools/testing/selftests/rcutorture/bin/functions.sh
@@ -172,7 +172,7 @@ identify_qemu_append () {
 	local console=ttyS0
 	case "$1" in
 	qemu-system-x86_64|qemu-system-i386)
-		echo selinux=0 initcall_debug debug
+		echo noapic selinux=0 initcall_debug debug
 		;;
 	qemu-system-aarch64)
 		console=ttyAMA0
@@ -191,19 +191,8 @@ identify_qemu_append () {
 # Output arguments for qemu arguments based on the TORTURE_QEMU_MAC
 # and TORTURE_QEMU_INTERACTIVE environment variables.
 identify_qemu_args () {
-	local KVM_CPU=""
-	case "$1" in
-	qemu-system-x86_64)
-		KVM_CPU=kvm64
-		;;
-	qemu-system-i386)
-		KVM_CPU=kvm32
-		;;
-	esac
 	case "$1" in
 	qemu-system-x86_64|qemu-system-i386)
-		echo -machine q35,accel=kvm
-		echo -cpu ${KVM_CPU}
 		;;
 	qemu-system-aarch64)
 		echo -machine virt,gic-version=host -cpu host
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
index e19a444a0684..d2d2a86139db 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
@@ -1,5 +1,2 @@
 CONFIG_RCU_TORTURE_TEST=y
 CONFIG_PRINTK_TIME=y
-CONFIG_HYPERVISOR_GUEST=y
-CONFIG_PARAVIRT=y
-CONFIG_KVM_GUEST=y
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01  4:04 ` [RFC 3/3] Revert "rcutorture: Tweak kvm options" Joel Fernandes (Google)
@ 2019-07-01 12:23   ` Sebastian Andrzej Siewior
  2019-07-01 14:14     ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-01 12:23 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, kernel-team, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Shuah Khan, Steven Rostedt

On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> runs but does nothing.

Nope. I would like to know *why* you need 'noapic' to work. Is it a
brand new or old qemu-system-x86_64?

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed
  2019-07-01  4:04 [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes (Google)
  2019-07-01  4:04 ` [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section Joel Fernandes (Google)
  2019-07-01  4:04 ` [RFC 3/3] Revert "rcutorture: Tweak kvm options" Joel Fernandes (Google)
@ 2019-07-01 13:53 ` Joel Fernandes
  2019-07-02  3:47 ` Paul E. McKenney
  3 siblings, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2019-07-01 13:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, kernel-team, Lai Jiangshan, linux-kselftest,
	Mathieu Desnoyers, Paul E. McKenney, rcu,
	Sebastian Andrzej Siewior, Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
> The t->rcu_read_unlock_special union's need_qs bit can be set by the
> scheduler tick (in rcu_flavor_sched_clock_irq) to indicate that help is
> needed from the rcu_read_unlock path. When this help arrives however, we
> can do better to speed up the quiescent state reporting which if
> rcu_read_unlock_special::need_qs is set might be quite urgent. Make use
> of this information in deciding when to do heavy-weight softirq raising
> where possible.

Just fyi, TREE01-06, SRCU-N and SRCU-t passed overnight testing with this series.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 12:23   ` Sebastian Andrzej Siewior
@ 2019-07-01 14:14     ` Joel Fernandes
  2019-07-01 14:48       ` Dmitry Vyukov
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2019-07-01 14:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Josh Triplett, kernel-team, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Shuah Khan, Steven Rostedt, dvyukov

On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > runs but does nothing.
> 
> Nope. I would like to know *why* you need 'noapic' to work. Is it a
> brand new or old qemu-system-x86_64?

I did not have time to debug yesterday and I posted this particular revert as
an 'RFC' just to make aware of this problem.

I spent some more time just now, it looks like this has nothing to do with
'noapic' and appears to be a problem on debian distros with the e1000e NIC.
May be this NIC was added to the virtual hardware because of -machine in the
patch?

Any if I add the following to the qemu command that kvm.sh runs, it works again:
-net nic,model=e1000

Without it I get:
 qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"

Seems to be mentioned here:
https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211

And in syzkaller as well:
https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88

Adding Dmitry who is syzkaller owner for any thoughts as well.

I'm happy to write a patch to set the nic model as e1000 and send it out if
we agree this solution is good enough.

 - Joel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 14:14     ` Joel Fernandes
@ 2019-07-01 14:48       ` Dmitry Vyukov
  2019-07-01 14:59         ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2019-07-01 14:48 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Sebastian Andrzej Siewior, LKML, Josh Triplett, kernel-team,
	Lai Jiangshan, open list:KERNEL SELFTEST FRAMEWORK,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Shuah Khan,
	Steven Rostedt

On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > runs but does nothing.
> >
> > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > brand new or old qemu-system-x86_64?
>
> I did not have time to debug yesterday and I posted this particular revert as
> an 'RFC' just to make aware of this problem.
>
> I spent some more time just now, it looks like this has nothing to do with
> 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> May be this NIC was added to the virtual hardware because of -machine in the
> patch?
>
> Any if I add the following to the qemu command that kvm.sh runs, it works again:
> -net nic,model=e1000
>
> Without it I get:
>  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
>
> Seems to be mentioned here:
> https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
>
> And in syzkaller as well:
> https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
>
> Adding Dmitry who is syzkaller owner for any thoughts as well.

I don't have many thoughts on this. That particular error looked like
a bug in the package in the particular distro/version.


> I'm happy to write a patch to set the nic model as e1000 and send it out if
> we agree this solution is good enough.
>
>  - Joel
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 14:48       ` Dmitry Vyukov
@ 2019-07-01 14:59         ` Joel Fernandes
  2019-07-01 20:27           ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2019-07-01 14:59 UTC (permalink / raw)
  To: Paul E. McKenney, rcu

On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > runs but does nothing.
> > >
> > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > brand new or old qemu-system-x86_64?
> >
> > I did not have time to debug yesterday and I posted this particular revert as
> > an 'RFC' just to make aware of this problem.
> >
> > I spent some more time just now, it looks like this has nothing to do with
> > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > May be this NIC was added to the virtual hardware because of -machine in the
> > patch?
> >
> > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > -net nic,model=e1000
> >
> > Without it I get:
> >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> >
> > Seems to be mentioned here:
> > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> >
> > And in syzkaller as well:
> > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> >
> > Adding Dmitry who is syzkaller owner for any thoughts as well.
> 
> I don't have many thoughts on this. That particular error looked like
> a bug in the package in the particular distro/version.

Paul, what is your preference here? Can we add the -net nic,model=e1000 to
fix it for the benefit of any other Debian folks running kvm.sh?

Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
this machine unfortunately. But may be I can build my own.

I prefer the -net option since I can save the time for something else. ;) Let
me know what you prefer and I'll fix it accordingly.

thanks,

 J.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section
  2019-07-01  4:04 ` [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section Joel Fernandes (Google)
@ 2019-07-01 20:03   ` Paul E. McKenney
  2019-07-01 21:33     ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-01 20:03 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, rcu, kernel-team, Josh Triplett, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
> The rcu_preempt_note_context_switch() tries to handle cases where
> __rcu_read_unlock() got preempted and then the context switch path does
> the reporting of the quiscent state along with clearing any bits in the
> rcu_read_unlock_special union.
> 
> This can be handled by just calling rcu_deferred_qs() which was added
> during the RCU consolidation work and already does these checks.
> 
> Tested RCU config TREE03 for an hour which succeeds.
> 
> Cc: rcu@vger.kernel.org
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

My first reaction was "that cannot possibly work", but after a bit of
digging, it really does appear to work just fine.  I therefore expanded
the commit log a bit, so please check it to catch any messups on my part.

Very cool, thank you very much!  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Mon Jul 1 00:04:14 2019 -0400

    rcu: Simplify rcu_note_context_switch exit from critical section
    
    Because __rcu_read_unlock() can be preempted just before the call to
    rcu_read_unlock_special(), it is possible for a task to be preempted just
    before it would have fully exited its RCU read-side critical section.
    This would result in a needless extension of that critical section until
    that task was resumed, which might in turn result in a needlessly
    long grace period, needless RCU priority boosting, and needless
    force-quiescent-state actions.  Therefore, rcu_note_context_switch()
    invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
    it detects this situation.  This action by rcu_note_context_switch()
    ends the RCU read-side critical section immediately.
    
    Of course, once the task resumes, it will invoke rcu_read_unlock_special()
    redundantly.  This is harmless because the fact that a preemption
    happened means that interrupts, preemption, and softirqs cannot
    have been disabled, so there would be no deferred quiescent state.
    While ->rcu_read_lock_nesting remains less than zero, none of the
    ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
    the call to rcu_note_context_switch() at task-preemption time.  Therefore,
    setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
    
    Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
    would return immediately.  With one possible exception, which is
    if an expedited grace period started just as the task was being
    resumed, which could leave ->exp_deferred_qs set.  This will cause
    rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
    reporting the quiescent state, just as it should.  (Such an expedited
    grace period won't affect the preemption code path due to interrupts
    having already been disabled.)
    
    But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
    is doing so with preemption disabled, hence __rcu_read_unlock() will
    unconditionally defer the quiescent state, only to immediately invoke
    rcu_preempt_deferred_qs(), thus immediately reporting the deferred
    quiescent state.  It turns out to be safe (and faster) to instead
    just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
    middleman.
    
    Because this is the invocation during the preemption (as opposed to
    the invocation just after the resume), at least one of the bits in
    ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
    must be negative.  This means that rcu_preempt_need_deferred_qs() must
    return true, avoiding the early exit from rcu_preempt_deferred_qs().
    Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
    as required.
    
    This commit therefore simplifies the CONFIG_PREEMPT=y version of
    rcu_note_context_switch() by removing the "else if" branch of its
    "if" statement.  This change means that all callers that would have
    invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
    will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
    rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
    
    Cc: rcu@vger.kernel.org
    Cc: kernel-team@android.com
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 187dc076c497..214e4689c29d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt)
 				       ? rnp->gp_seq
 				       : rcu_seq_snap(&rnp->gp_seq));
 		rcu_preempt_ctxt_queue(rnp, rdp);
-	} else if (t->rcu_read_lock_nesting < 0 &&
-		   t->rcu_read_unlock_special.s) {
-
-		/*
-		 * Complete exit from RCU read-side critical section on
-		 * behalf of preempted instance of __rcu_read_unlock().
-		 */
-		rcu_read_unlock_special(t);
-		rcu_preempt_deferred_qs(t);
 	} else {
 		rcu_preempt_deferred_qs(t);
 	}


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 14:59         ` Joel Fernandes
@ 2019-07-01 20:27           ` Paul E. McKenney
  2019-07-01 21:31             ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-01 20:27 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Mon, Jul 01, 2019 at 10:59:20AM -0400, Joel Fernandes wrote:
> On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> > On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > > runs but does nothing.
> > > >
> > > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > > brand new or old qemu-system-x86_64?
> > >
> > > I did not have time to debug yesterday and I posted this particular revert as
> > > an 'RFC' just to make aware of this problem.
> > >
> > > I spent some more time just now, it looks like this has nothing to do with
> > > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > > May be this NIC was added to the virtual hardware because of -machine in the
> > > patch?
> > >
> > > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > > -net nic,model=e1000
> > >
> > > Without it I get:
> > >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> > >
> > > Seems to be mentioned here:
> > > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> > >
> > > And in syzkaller as well:
> > > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> > >
> > > Adding Dmitry who is syzkaller owner for any thoughts as well.
> > 
> > I don't have many thoughts on this. That particular error looked like
> > a bug in the package in the particular distro/version.
> 
> Paul, what is your preference here? Can we add the -net nic,model=e1000 to
> fix it for the benefit of any other Debian folks running kvm.sh?
> 
> Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
> this machine unfortunately. But may be I can build my own.
> 
> I prefer the -net option since I can save the time for something else. ;) Let
> me know what you prefer and I'll fix it accordingly.

Why not just add the following to your kvm.sh command line?

	--qemu-args "-net nic,model=e1000"

Easy workaround and no need to wait for changes to hit mainline.

Yes, I am perhaps naively assuming that the qemu bug in Debian will
be fixed in about the time that it would take for any change to the
rcutorture scripting to make its way into mainline.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 20:27           ` Paul E. McKenney
@ 2019-07-01 21:31             ` Joel Fernandes
  2019-07-01 21:50               ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2019-07-01 21:31 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu

On Mon, Jul 01, 2019 at 01:27:27PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 01, 2019 at 10:59:20AM -0400, Joel Fernandes wrote:
> > On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> > > On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > > > runs but does nothing.
> > > > >
> > > > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > > > brand new or old qemu-system-x86_64?
> > > >
> > > > I did not have time to debug yesterday and I posted this particular revert as
> > > > an 'RFC' just to make aware of this problem.
> > > >
> > > > I spent some more time just now, it looks like this has nothing to do with
> > > > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > > > May be this NIC was added to the virtual hardware because of -machine in the
> > > > patch?
> > > >
> > > > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > > > -net nic,model=e1000
> > > >
> > > > Without it I get:
> > > >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> > > >
> > > > Seems to be mentioned here:
> > > > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> > > >
> > > > And in syzkaller as well:
> > > > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> > > >
> > > > Adding Dmitry who is syzkaller owner for any thoughts as well.
> > > 
> > > I don't have many thoughts on this. That particular error looked like
> > > a bug in the package in the particular distro/version.
> > 
> > Paul, what is your preference here? Can we add the -net nic,model=e1000 to
> > fix it for the benefit of any other Debian folks running kvm.sh?
> > 
> > Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
> > this machine unfortunately. But may be I can build my own.
> > 
> > I prefer the -net option since I can save the time for something else. ;) Let
> > me know what you prefer and I'll fix it accordingly.
> 
> Why not just add the following to your kvm.sh command line?
> 
> 	--qemu-args "-net nic,model=e1000"
> 
> Easy workaround and no need to wait for changes to hit mainline.
> 
> Yes, I am perhaps naively assuming that the qemu bug in Debian will
> be fixed in about the time that it would take for any change to the
> rcutorture scripting to make its way into mainline.  ;-)
> 

Thanks for the pointer to the args! Just what I wanted.

Only down side is then we run into the risk of other developers running
kvm.sh on debian as well and wondering why torture is broken. Perhaps the kvm
tweaks commit hasn't hit that many machines yet but it might. At least we
have this discussion archive here to guide such poor souls, if we don't want
to append these arguments by default ;-)

The other fix, could be I just build and run a new custom qemu instance but I
am happy with qemu-args as well.

thanks,

 J.

> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section
  2019-07-01 20:03   ` Paul E. McKenney
@ 2019-07-01 21:33     ` Joel Fernandes
  2019-07-01 21:42       ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2019-07-01 21:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, kernel-team, Josh Triplett, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 01:03:10PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
> > The rcu_preempt_note_context_switch() tries to handle cases where
> > __rcu_read_unlock() got preempted and then the context switch path does
> > the reporting of the quiscent state along with clearing any bits in the
> > rcu_read_unlock_special union.
> > 
> > This can be handled by just calling rcu_deferred_qs() which was added
> > during the RCU consolidation work and already does these checks.
> > 
> > Tested RCU config TREE03 for an hour which succeeds.
> > 
> > Cc: rcu@vger.kernel.org
> > Cc: kernel-team@android.com
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> My first reaction was "that cannot possibly work", but after a bit of
> digging, it really does appear to work just fine.  I therefore expanded
> the commit log a bit, so please check it to catch any messups on my part.
> 
> Very cool, thank you very much!  ;-)

Awesome! Thanks. I am glad you agree with the change and I agree with your
changes to the commit log.

thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Mon Jul 1 00:04:14 2019 -0400
> 
>     rcu: Simplify rcu_note_context_switch exit from critical section
>     
>     Because __rcu_read_unlock() can be preempted just before the call to
>     rcu_read_unlock_special(), it is possible for a task to be preempted just
>     before it would have fully exited its RCU read-side critical section.
>     This would result in a needless extension of that critical section until
>     that task was resumed, which might in turn result in a needlessly
>     long grace period, needless RCU priority boosting, and needless
>     force-quiescent-state actions.  Therefore, rcu_note_context_switch()
>     invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
>     it detects this situation.  This action by rcu_note_context_switch()
>     ends the RCU read-side critical section immediately.
>     
>     Of course, once the task resumes, it will invoke rcu_read_unlock_special()
>     redundantly.  This is harmless because the fact that a preemption
>     happened means that interrupts, preemption, and softirqs cannot
>     have been disabled, so there would be no deferred quiescent state.
>     While ->rcu_read_lock_nesting remains less than zero, none of the
>     ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
>     the call to rcu_note_context_switch() at task-preemption time.  Therefore,
>     setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
>     
>     Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
>     would return immediately.  With one possible exception, which is
>     if an expedited grace period started just as the task was being
>     resumed, which could leave ->exp_deferred_qs set.  This will cause
>     rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
>     reporting the quiescent state, just as it should.  (Such an expedited
>     grace period won't affect the preemption code path due to interrupts
>     having already been disabled.)
>     
>     But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
>     is doing so with preemption disabled, hence __rcu_read_unlock() will
>     unconditionally defer the quiescent state, only to immediately invoke
>     rcu_preempt_deferred_qs(), thus immediately reporting the deferred
>     quiescent state.  It turns out to be safe (and faster) to instead
>     just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
>     middleman.
>     
>     Because this is the invocation during the preemption (as opposed to
>     the invocation just after the resume), at least one of the bits in
>     ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
>     must be negative.  This means that rcu_preempt_need_deferred_qs() must
>     return true, avoiding the early exit from rcu_preempt_deferred_qs().
>     Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
>     as required.
>     
>     This commit therefore simplifies the CONFIG_PREEMPT=y version of
>     rcu_note_context_switch() by removing the "else if" branch of its
>     "if" statement.  This change means that all callers that would have
>     invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
>     will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
>     rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
>     
>     Cc: rcu@vger.kernel.org
>     Cc: kernel-team@android.com
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 187dc076c497..214e4689c29d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt)
>  				       ? rnp->gp_seq
>  				       : rcu_seq_snap(&rnp->gp_seq));
>  		rcu_preempt_ctxt_queue(rnp, rdp);
> -	} else if (t->rcu_read_lock_nesting < 0 &&
> -		   t->rcu_read_unlock_special.s) {
> -
> -		/*
> -		 * Complete exit from RCU read-side critical section on
> -		 * behalf of preempted instance of __rcu_read_unlock().
> -		 */
> -		rcu_read_unlock_special(t);
> -		rcu_preempt_deferred_qs(t);
>  	} else {
>  		rcu_preempt_deferred_qs(t);
>  	}
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section
  2019-07-01 21:33     ` Joel Fernandes
@ 2019-07-01 21:42       ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-01 21:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rcu, kernel-team, Josh Triplett, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 05:33:28PM -0400, Joel Fernandes wrote:
> On Mon, Jul 01, 2019 at 01:03:10PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 01, 2019 at 12:04:14AM -0400, Joel Fernandes (Google) wrote:
> > > The rcu_preempt_note_context_switch() tries to handle cases where
> > > __rcu_read_unlock() got preempted and then the context switch path does
> > > the reporting of the quiscent state along with clearing any bits in the
> > > rcu_read_unlock_special union.
> > > 
> > > This can be handled by just calling rcu_deferred_qs() which was added
> > > during the RCU consolidation work and already does these checks.
> > > 
> > > Tested RCU config TREE03 for an hour which succeeds.
> > > 
> > > Cc: rcu@vger.kernel.org
> > > Cc: kernel-team@android.com
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > My first reaction was "that cannot possibly work", but after a bit of
> > digging, it really does appear to work just fine.  I therefore expanded
> > the commit log a bit, so please check it to catch any messups on my part.
> > 
> > Very cool, thank you very much!  ;-)
> 
> Awesome! Thanks. I am glad you agree with the change and I agree with your
> changes to the commit log.

Very good, I will push it to -rcu shortly.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit ce547cb41ed7662f70d0b07d4c7f7555ba130c61
> > Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Date:   Mon Jul 1 00:04:14 2019 -0400
> > 
> >     rcu: Simplify rcu_note_context_switch exit from critical section
> >     
> >     Because __rcu_read_unlock() can be preempted just before the call to
> >     rcu_read_unlock_special(), it is possible for a task to be preempted just
> >     before it would have fully exited its RCU read-side critical section.
> >     This would result in a needless extension of that critical section until
> >     that task was resumed, which might in turn result in a needlessly
> >     long grace period, needless RCU priority boosting, and needless
> >     force-quiescent-state actions.  Therefore, rcu_note_context_switch()
> >     invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
> >     it detects this situation.  This action by rcu_note_context_switch()
> >     ends the RCU read-side critical section immediately.
> >     
> >     Of course, once the task resumes, it will invoke rcu_read_unlock_special()
> >     redundantly.  This is harmless because the fact that a preemption
> >     happened means that interrupts, preemption, and softirqs cannot
> >     have been disabled, so there would be no deferred quiescent state.
> >     While ->rcu_read_lock_nesting remains less than zero, none of the
> >     ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
> >     the call to rcu_note_context_switch() at task-preemption time.  Therefore,
> >     setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
> >     
> >     Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
> >     would return immediately.  With one possible exception, which is
> >     if an expedited grace period started just as the task was being
> >     resumed, which could leave ->exp_deferred_qs set.  This will cause
> >     rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
> >     reporting the quiescent state, just as it should.  (Such an expedited
> >     grace period won't affect the preemption code path due to interrupts
> >     having already been disabled.)
> >     
> >     But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
> >     is doing so with preemption disabled, hence __rcu_read_unlock() will
> >     unconditionally defer the quiescent state, only to immediately invoke
> >     rcu_preempt_deferred_qs(), thus immediately reporting the deferred
> >     quiescent state.  It turns out to be safe (and faster) to instead
> >     just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
> >     middleman.
> >     
> >     Because this is the invocation during the preemption (as opposed to
> >     the invocation just after the resume), at least one of the bits in
> >     ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
> >     must be negative.  This means that rcu_preempt_need_deferred_qs() must
> >     return true, avoiding the early exit from rcu_preempt_deferred_qs().
> >     Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
> >     as required.
> >     
> >     This commit therefore simplifies the CONFIG_PREEMPT=y version of
> >     rcu_note_context_switch() by removing the "else if" branch of its
> >     "if" statement.  This change means that all callers that would have
> >     invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
> >     will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
> >     rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
> >     
> >     Cc: rcu@vger.kernel.org
> >     Cc: kernel-team@android.com
> >     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 187dc076c497..214e4689c29d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -313,15 +313,6 @@ void rcu_note_context_switch(bool preempt)
> >  				       ? rnp->gp_seq
> >  				       : rcu_seq_snap(&rnp->gp_seq));
> >  		rcu_preempt_ctxt_queue(rnp, rdp);
> > -	} else if (t->rcu_read_lock_nesting < 0 &&
> > -		   t->rcu_read_unlock_special.s) {
> > -
> > -		/*
> > -		 * Complete exit from RCU read-side critical section on
> > -		 * behalf of preempted instance of __rcu_read_unlock().
> > -		 */
> > -		rcu_read_unlock_special(t);
> > -		rcu_preempt_deferred_qs(t);
> >  	} else {
> >  		rcu_preempt_deferred_qs(t);
> >  	}
> > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 21:31             ` Joel Fernandes
@ 2019-07-01 21:50               ` Paul E. McKenney
  2019-07-02  2:56                 ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-01 21:50 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Mon, Jul 01, 2019 at 05:31:07PM -0400, Joel Fernandes wrote:
> On Mon, Jul 01, 2019 at 01:27:27PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 01, 2019 at 10:59:20AM -0400, Joel Fernandes wrote:
> > > On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> > > > On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > >
> > > > > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > > > > runs but does nothing.
> > > > > >
> > > > > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > > > > brand new or old qemu-system-x86_64?
> > > > >
> > > > > I did not have time to debug yesterday and I posted this particular revert as
> > > > > an 'RFC' just to make aware of this problem.
> > > > >
> > > > > I spent some more time just now, it looks like this has nothing to do with
> > > > > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > > > > May be this NIC was added to the virtual hardware because of -machine in the
> > > > > patch?
> > > > >
> > > > > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > > > > -net nic,model=e1000
> > > > >
> > > > > Without it I get:
> > > > >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> > > > >
> > > > > Seems to be mentioned here:
> > > > > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> > > > >
> > > > > And in syzkaller as well:
> > > > > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> > > > >
> > > > > Adding Dmitry who is syzkaller owner for any thoughts as well.
> > > > 
> > > > I don't have many thoughts on this. That particular error looked like
> > > > a bug in the package in the particular distro/version.
> > > 
> > > Paul, what is your preference here? Can we add the -net nic,model=e1000 to
> > > fix it for the benefit of any other Debian folks running kvm.sh?
> > > 
> > > Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
> > > this machine unfortunately. But may be I can build my own.
> > > 
> > > I prefer the -net option since I can save the time for something else. ;) Let
> > > me know what you prefer and I'll fix it accordingly.
> > 
> > Why not just add the following to your kvm.sh command line?
> > 
> > 	--qemu-args "-net nic,model=e1000"
> > 
> > Easy workaround and no need to wait for changes to hit mainline.
> > 
> > Yes, I am perhaps naively assuming that the qemu bug in Debian will
> > be fixed in about the time that it would take for any change to the
> > rcutorture scripting to make its way into mainline.  ;-)
> 
> Thanks for the pointer to the args! Just what I wanted.
> 
> Only down side is then we run into the risk of other developers running
> kvm.sh on debian as well and wondering why torture is broken. Perhaps the kvm
> tweaks commit hasn't hit that many machines yet but it might. At least we
> have this discussion archive here to guide such poor souls, if we don't want
> to append these arguments by default ;-)

What is the likely timeframe of a qemu fix in Debian?

This does raise a question of rcutorture workarounds in general.
There will be times when some failure or another is expected behavior.
For example, in current -rcu getting an rcutorture_oom_notify() splat or
the occasional RCU CPU stall warning from TREE04 is expected behavior.
This is because rcutorture's forward-progress testing is more aggressive
that the TREE04 configuration can currently handle (but I am working
on it).

For internal-to-RCU issues, I could make rcutorture warn of the likely
problem.  for external-to-RCU issues, such as your fun with Debian qemu,
that won't work all that well.  I could blog about the external-to-RCU
issues, if that would help.  Or we could trust people to either do
the websearch or ask for help on the rcu email list.

Thoughts?

> The other fix, could be I just build and run a new custom qemu instance but I
> am happy with qemu-args as well.

My concern is that applying the workaround to the rcutorture scripts might
break someone else.  Plus it is hard to synchronize with qemu releases,
to say nothing of distro releases containing qemu.  Hence my preference
for use of kvm.sh's --qemu-args for this purpose.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-01 21:50               ` Paul E. McKenney
@ 2019-07-02  2:56                 ` Joel Fernandes
  2019-07-02  3:34                   ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2019-07-02  2:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu

On Mon, Jul 01, 2019 at 02:50:20PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 01, 2019 at 05:31:07PM -0400, Joel Fernandes wrote:
> > On Mon, Jul 01, 2019 at 01:27:27PM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 01, 2019 at 10:59:20AM -0400, Joel Fernandes wrote:
> > > > On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> > > > > On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > >
> > > > > > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > > > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > > > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > > > > > runs but does nothing.
> > > > > > >
> > > > > > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > > > > > brand new or old qemu-system-x86_64?
> > > > > >
> > > > > > I did not have time to debug yesterday and I posted this particular revert as
> > > > > > an 'RFC' just to make aware of this problem.
> > > > > >
> > > > > > I spent some more time just now, it looks like this has nothing to do with
> > > > > > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > > > > > May be this NIC was added to the virtual hardware because of -machine in the
> > > > > > patch?
> > > > > >
> > > > > > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > > > > > -net nic,model=e1000
> > > > > >
> > > > > > Without it I get:
> > > > > >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> > > > > >
> > > > > > Seems to be mentioned here:
> > > > > > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> > > > > >
> > > > > > And in syzkaller as well:
> > > > > > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> > > > > >
> > > > > > Adding Dmitry who is syzkaller owner for any thoughts as well.
> > > > > 
> > > > > I don't have many thoughts on this. That particular error looked like
> > > > > a bug in the package in the particular distro/version.
> > > > 
> > > > Paul, what is your preference here? Can we add the -net nic,model=e1000 to
> > > > fix it for the benefit of any other Debian folks running kvm.sh?
> > > > 
> > > > Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
> > > > this machine unfortunately. But may be I can build my own.
> > > > 
> > > > I prefer the -net option since I can save the time for something else. ;) Let
> > > > me know what you prefer and I'll fix it accordingly.
> > > 
> > > Why not just add the following to your kvm.sh command line?
> > > 
> > > 	--qemu-args "-net nic,model=e1000"
> > > 
> > > Easy workaround and no need to wait for changes to hit mainline.
> > > 
> > > Yes, I am perhaps naively assuming that the qemu bug in Debian will
> > > be fixed in about the time that it would take for any change to the
> > > rcutorture scripting to make its way into mainline.  ;-)
> > 
> > Thanks for the pointer to the args! Just what I wanted.
> > 
> > Only down side is then we run into the risk of other developers running
> > kvm.sh on debian as well and wondering why torture is broken. Perhaps the kvm
> > tweaks commit hasn't hit that many machines yet but it might. At least we
> > have this discussion archive here to guide such poor souls, if we don't want
> > to append these arguments by default ;-)
> 
> What is the likely timeframe of a qemu fix in Debian?

I dug through debian's archive. Indeed they have already fixed it
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884240

I may need to upgrade my debian release or pull patches from debian testing,
or something. It might not be worth the trouble and I could just as well have a
custom kvm.sh command in my bashrc.

> This does raise a question of rcutorture workarounds in general.
> There will be times when some failure or another is expected behavior.
> For example, in current -rcu getting an rcutorture_oom_notify() splat or
> the occasional RCU CPU stall warning from TREE04 is expected behavior.
> This is because rcutorture's forward-progress testing is more aggressive
> that the TREE04 configuration can currently handle (but I am working
> on it).

I see, thanks good to know. I would love try out those tests as well.

> 
> For internal-to-RCU issues, I could make rcutorture warn of the likely
> problem.  for external-to-RCU issues, such as your fun with Debian qemu,
> that won't work all that well.  I could blog about the external-to-RCU
> issues, if that would help.  Or we could trust people to either do
> the websearch or ask for help on the rcu email list.

Yes this discussion is here. I also feel it may not be worth working around
since it is a distro bug which is already fixed it. Another approach could be
to look for that error message in qemu-output and tell people to use a custom
command or upgrade when they run the script. But that also may not be worth
it unless more people are running into this due to the default distro package
they are using.

> Thoughts?
> 
> > The other fix, could be I just build and run a new custom qemu instance but I
> > am happy with qemu-args as well.
> 
> My concern is that applying the workaround to the rcutorture scripts might
> break someone else.  Plus it is hard to synchronize with qemu releases,
> to say nothing of distro releases containing qemu.  Hence my preference
> for use of kvm.sh's --qemu-args for this purpose.

Yes, that works for me.

Thanks a lot for the discussion!

 - Joel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 3/3] Revert "rcutorture: Tweak kvm options"
  2019-07-02  2:56                 ` Joel Fernandes
@ 2019-07-02  3:34                   ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-02  3:34 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu

On Mon, Jul 01, 2019 at 10:56:45PM -0400, Joel Fernandes wrote:
> On Mon, Jul 01, 2019 at 02:50:20PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 01, 2019 at 05:31:07PM -0400, Joel Fernandes wrote:
> > > On Mon, Jul 01, 2019 at 01:27:27PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jul 01, 2019 at 10:59:20AM -0400, Joel Fernandes wrote:
> > > > > On Mon, Jul 01, 2019 at 04:48:43PM +0200, Dmitry Vyukov wrote:
> > > > > > On Mon, Jul 1, 2019 at 4:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > > >
> > > > > > > On Mon, Jul 01, 2019 at 02:23:58PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > > > > On 2019-07-01 00:04:15 [-0400], Joel Fernandes (Google) wrote:
> > > > > > > > > This reverts commit a6fda6dab93c2c06ef4b8cb4b9258df6674d2438 which
> > > > > > > > > causes kvm.sh to not run on my machines. The qemu-system-x86_64 command
> > > > > > > > > runs but does nothing.
> > > > > > > >
> > > > > > > > Nope. I would like to know *why* you need 'noapic' to work. Is it a
> > > > > > > > brand new or old qemu-system-x86_64?
> > > > > > >
> > > > > > > I did not have time to debug yesterday and I posted this particular revert as
> > > > > > > an 'RFC' just to make aware of this problem.
> > > > > > >
> > > > > > > I spent some more time just now, it looks like this has nothing to do with
> > > > > > > 'noapic' and appears to be a problem on debian distros with the e1000e NIC.
> > > > > > > May be this NIC was added to the virtual hardware because of -machine in the
> > > > > > > patch?
> > > > > > >
> > > > > > > Any if I add the following to the qemu command that kvm.sh runs, it works again:
> > > > > > > -net nic,model=e1000
> > > > > > >
> > > > > > > Without it I get:
> > > > > > >  qemu-system-x86_64: Initialization of device e1000e failed: failed to find romfile "efi-e1000e.rom"
> > > > > > >
> > > > > > > Seems to be mentioned here:
> > > > > > > https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1737211
> > > > > > >
> > > > > > > And in syzkaller as well:
> > > > > > > https://github.com/google/syzkaller/blob/master/vm/qemu/qemu.go#L88
> > > > > > >
> > > > > > > Adding Dmitry who is syzkaller owner for any thoughts as well.
> > > > > > 
> > > > > > I don't have many thoughts on this. That particular error looked like
> > > > > > a bug in the package in the particular distro/version.
> > > > > 
> > > > > Paul, what is your preference here? Can we add the -net nic,model=e1000 to
> > > > > fix it for the benefit of any other Debian folks running kvm.sh?
> > > > > 
> > > > > Or do you prefer if I just built my own custom Qemu? I can't upgrade Qemu on
> > > > > this machine unfortunately. But may be I can build my own.
> > > > > 
> > > > > I prefer the -net option since I can save the time for something else. ;) Let
> > > > > me know what you prefer and I'll fix it accordingly.
> > > > 
> > > > Why not just add the following to your kvm.sh command line?
> > > > 
> > > > 	--qemu-args "-net nic,model=e1000"
> > > > 
> > > > Easy workaround and no need to wait for changes to hit mainline.
> > > > 
> > > > Yes, I am perhaps naively assuming that the qemu bug in Debian will
> > > > be fixed in about the time that it would take for any change to the
> > > > rcutorture scripting to make its way into mainline.  ;-)
> > > 
> > > Thanks for the pointer to the args! Just what I wanted.
> > > 
> > > Only down side is then we run into the risk of other developers running
> > > kvm.sh on debian as well and wondering why torture is broken. Perhaps the kvm
> > > tweaks commit hasn't hit that many machines yet but it might. At least we
> > > have this discussion archive here to guide such poor souls, if we don't want
> > > to append these arguments by default ;-)
> > 
> > What is the likely timeframe of a qemu fix in Debian?
> 
> I dug through debian's archive. Indeed they have already fixed it
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884240

Whew!!!  ;-)

> I may need to upgrade my debian release or pull patches from debian testing,
> or something. It might not be worth the trouble and I could just as well have a
> custom kvm.sh command in my bashrc.

It is good to have the flexibility!

> > This does raise a question of rcutorture workarounds in general.
> > There will be times when some failure or another is expected behavior.
> > For example, in current -rcu getting an rcutorture_oom_notify() splat or
> > the occasional RCU CPU stall warning from TREE04 is expected behavior.
> > This is because rcutorture's forward-progress testing is more aggressive
> > that the TREE04 configuration can currently handle (but I am working
> > on it).
> 
> I see, thanks good to know. I would love try out those tests as well.

Just do "--config TREE04" on -rcu's "dev" branch.  ;-)

> > For internal-to-RCU issues, I could make rcutorture warn of the likely
> > problem.  for external-to-RCU issues, such as your fun with Debian qemu,
> > that won't work all that well.  I could blog about the external-to-RCU
> > issues, if that would help.  Or we could trust people to either do
> > the websearch or ask for help on the rcu email list.
> 
> Yes this discussion is here. I also feel it may not be worth working around
> since it is a distro bug which is already fixed it. Another approach could be
> to look for that error message in qemu-output and tell people to use a custom
> command or upgrade when they run the script. But that also may not be worth
> it unless more people are running into this due to the default distro package
> they are using.

True enough, got lucky this time.

> > Thoughts?
> > 
> > > The other fix, could be I just build and run a new custom qemu instance but I
> > > am happy with qemu-args as well.
> > 
> > My concern is that applying the workaround to the rcutorture scripts might
> > break someone else.  Plus it is hard to synchronize with qemu releases,
> > to say nothing of distro releases containing qemu.  Hence my preference
> > for use of kvm.sh's --qemu-args for this purpose.
> 
> Yes, that works for me.
> 
> Thanks a lot for the discussion!

Happy rcutorturing!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed
  2019-07-01  4:04 [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2019-07-01 13:53 ` [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes
@ 2019-07-02  3:47 ` Paul E. McKenney
  2019-07-02 11:40   ` Joel Fernandes
  3 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2019-07-02  3:47 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, kernel-team, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, rcu,
	Sebastian Andrzej Siewior, Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
> The t->rcu_read_unlock_special union's need_qs bit can be set by the
> scheduler tick (in rcu_flavor_sched_clock_irq) to indicate that help is
> needed from the rcu_read_unlock path. When this help arrives however, we
> can do better to speed up the quiescent state reporting which if
> rcu_read_unlock_special::need_qs is set might be quite urgent. Make use
> of this information in deciding when to do heavy-weight softirq raising
> where possible.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Cute thought, but I am going to have to pass on this one.  The reason
is that by the time that ->rcu_read_unlock_special.b.need_qs gets set,
the grace period is already one full second old.  At that point, the
extra tick of waiting is down in the noise.

Right now, we do the extra work if we really are blocking an expedited
grace period (the first two lines of the original condition) or we are
running on a nohz_full CPU (which might never execute a scheduling clock
tick, thus potentially delaying forever).  And expedited grace periods
are supposed to complete in tens or maybe hundreds of microseconds,
assuming the RCU readers are being cooperative, which is a whole
different level of urgent.

Nevertheless, thank you for looking into this!

							Thanx, Paul

> ---
>  kernel/rcu/tree_plugin.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c588ef98efd3..bff6410fac06 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		t->rcu_read_unlock_special.b.exp_hint = false;
>  		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
>  		      (rdp->grpmask & rnp->expmask) ||
> -		      tick_nohz_full_cpu(rdp->cpu);
> +		      tick_nohz_full_cpu(rdp->cpu)  ||
> +		      t->rcu_read_unlock_special.b.need_qs;
>  		// Need to defer quiescent state until everything is enabled.
>  		if (irqs_were_disabled && use_softirq &&
>  		    (in_interrupt() ||
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed
  2019-07-02  3:47 ` Paul E. McKenney
@ 2019-07-02 11:40   ` Joel Fernandes
  0 siblings, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2019-07-02 11:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, kernel-team, Lai Jiangshan,
	linux-kselftest, Mathieu Desnoyers, rcu,
	Sebastian Andrzej Siewior, Shuah Khan, Steven Rostedt

On Mon, Jul 01, 2019 at 08:47:30PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 01, 2019 at 12:04:13AM -0400, Joel Fernandes (Google) wrote:
> > The t->rcu_read_unlock_special union's need_qs bit can be set by the
> > scheduler tick (in rcu_flavor_sched_clock_irq) to indicate that help is
> > needed from the rcu_read_unlock path. When this help arrives however, we
> > can do better to speed up the quiescent state reporting which if
> > rcu_read_unlock_special::need_qs is set might be quite urgent. Make use
> > of this information in deciding when to do heavy-weight softirq raising
> > where possible.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Cute thought, but I am going to have to pass on this one.  The reason
> is that by the time that ->rcu_read_unlock_special.b.need_qs gets set,
> the grace period is already one full second old.  At that point, the
> extra tick of waiting is down in the noise.
> 
> Right now, we do the extra work if we really are blocking an expedited
> grace period (the first two lines of the original condition) or we are
> running on a nohz_full CPU (which might never execute a scheduling clock
> tick, thus potentially delaying forever).  And expedited grace periods
> are supposed to complete in tens or maybe hundreds of microseconds,
> assuming the RCU readers are being cooperative, which is a whole
> different level of urgent.

Makes sense, I agree the patch may not be that helpful right now. I mixed up
the different levels or urgencies. No problem dropping it.

> 
> Nevertheless, thank you for looking into this!

My pleasure! Will keep them coming.

 - Joel

> 
> 							Thanx, Paul
> 
> > ---
> >  kernel/rcu/tree_plugin.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c588ef98efd3..bff6410fac06 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >  		t->rcu_read_unlock_special.b.exp_hint = false;
> >  		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
> >  		      (rdp->grpmask & rnp->expmask) ||
> > -		      tick_nohz_full_cpu(rdp->cpu);
> > +		      tick_nohz_full_cpu(rdp->cpu)  ||
> > +		      t->rcu_read_unlock_special.b.need_qs;
> >  		// Need to defer quiescent state until everything is enabled.
> >  		if (irqs_were_disabled && use_softirq &&
> >  		    (in_interrupt() ||
> > -- 
> > 2.22.0.410.gd8fdbe21b5-goog
> > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-07-02 11:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-01  4:04 [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes (Google)
2019-07-01  4:04 ` [RFC 2/3] rcu: Simplify rcu_note_context_switch exit from critical section Joel Fernandes (Google)
2019-07-01 20:03   ` Paul E. McKenney
2019-07-01 21:33     ` Joel Fernandes
2019-07-01 21:42       ` Paul E. McKenney
2019-07-01  4:04 ` [RFC 3/3] Revert "rcutorture: Tweak kvm options" Joel Fernandes (Google)
2019-07-01 12:23   ` Sebastian Andrzej Siewior
2019-07-01 14:14     ` Joel Fernandes
2019-07-01 14:48       ` Dmitry Vyukov
2019-07-01 14:59         ` Joel Fernandes
2019-07-01 20:27           ` Paul E. McKenney
2019-07-01 21:31             ` Joel Fernandes
2019-07-01 21:50               ` Paul E. McKenney
2019-07-02  2:56                 ` Joel Fernandes
2019-07-02  3:34                   ` Paul E. McKenney
2019-07-01 13:53 ` [RFC 1/3] rcu: Expedite the rcu quiescent state reporting if help needed Joel Fernandes
2019-07-02  3:47 ` Paul E. McKenney
2019-07-02 11:40   ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).