All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Mackerras <paulus@ozlabs.org>
To: Lijun Pan <ljp@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 2/2] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race
Date: Tue, 13 Aug 2019 14:56:39 +1000	[thread overview]
Message-ID: <20190813045639.lm443zsouievhtne@oak.ozlabs.ibm.com> (raw)
In-Reply-To: <E547965E-CC31-470F-8849-0F2A899A121F@linux.vnet.ibm.com>

On Mon, Aug 12, 2019 at 10:52:11PM -0500, Lijun Pan wrote:
> 
> 
> > On Aug 12, 2019, at 12:07 AM, Paul Mackerras <paulus@ozlabs.org> wrote:

[snip]

> > +static void cleanup_single_escalation(struct kvm_vcpu *vcpu,
> > +				      struct kvmppc_xive_vcpu *xc, int irq)
> > +{
> > +	struct irq_data *d = irq_get_irq_data(irq);
> > +	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
> > +
> > +	/*
> > +	 * This slightly odd sequence gives the right result
> > +	 * (i.e. stale_p set if xive_esc_on is false) even if
> > +	 * we race with xive_esc_irq() and xive_irq_eoi().
> > +	 */
> 
> Hi Paul,
> 
> I don’t quite understand the logic here.
> Are you saying the code sequence is
> vcpu->arch.xive_esc_on = false; (xive_esc_irq)
> then
> xd->stale_p = true; (cleanup_single_escaltion)
> 
> > +	xd->stale_p = false;
> > +	smp_mb();		/* paired with smb_wmb in xive_esc_irq */
> > +	if (!vcpu->arch.xive_esc_on)
> > +		xd->stale_p = true;

The natural sequence would be just
	xd->stale_p = !vcpu->arch.xive_esc_on;

However, imagine that vcpu->arch.xive_esc_on is true, and the
escalation interrupt gets handled on another CPU between the load of
vcpu->arch.xive_esc_on and the store to xd->stale_p.  The interrupt
code calls xive_esc_irq(), which clears vcpu->arch.xive_esc_on, and
then xive_irq_eoi(), which sets xd->stale_p.  The natural sequence
could read vcpu->arch.xive_esc_on before the interrupt and see 1, then
write 0 to xd->stale_p after xive_irq_eoi() has set it.  That would
mean the final value of xd->stale_p was 0, which is wrong, since in
fact the queue entry has been removed.

With the code I wrote, because the clearing of xd->stale_p comes
before the load from vcpu->arch.xive_esc_on (with a barrier to make
sure they don't get reordered), then if a racing escalation interrupt
on another CPU stores 1 to xd->stale_p before we clear it, then we
must see vcpu->arch.xive_esc_on as 0, and we will set xd->stale_p
again, giving the correct final state (xd->stale_p == 1).  If the
racing interrupt clears vcpu->arch.xive_esc_on after we load it and
see 1, then its store to set xd->stale_p must come after our store to
clear it because of the barrier that I added to xive_esc_irq, so the
final result is once again correct.

[snip]

> > @@ -397,11 +411,16 @@ static void xive_do_source_set_mask(struct xive_irq_data *xd,
> > 	 */
> > 	if (mask) {
> > 		val = xive_esb_read(xd, XIVE_ESB_SET_PQ_01);
> > -		xd->saved_p = !!(val & XIVE_ESB_VAL_P);
> > -	} else if (xd->saved_p)
> > +		if (!xd->stale_p && !!(val & XIVE_ESB_VAL_P))
> > +			xd->saved_p = true;
> > +		xd->stale_p = false;
> > +	} else if (xd->saved_p) {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_10);
> > -	else
> > +		xd->saved_p = false;
> 
> Should we also explicitly set xd->stale_p = true; here?

We don't need to because xd->saved_p and xd->stale_p can never be both
set, and we just saw that xd->saved_p was set.

> > +	} else {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_00);
> > +		xd->stale_p = false;
> 
> Should we also explicitly set xd->saved_p = true; here?

No, that would be incorrect.  This is the case where we are unmasking
the interrupt and there is no queue entry currently.  Setting saved_p
would imply that there is a queue entry, which there isn't.

Paul.

WARNING: multiple messages have this Message-ID (diff)
From: Paul Mackerras <paulus@ozlabs.org>
To: Lijun Pan <ljp@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 2/2] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race
Date: Tue, 13 Aug 2019 14:56:39 +1000	[thread overview]
Message-ID: <20190813045639.lm443zsouievhtne@oak.ozlabs.ibm.com> (raw)
In-Reply-To: <E547965E-CC31-470F-8849-0F2A899A121F@linux.vnet.ibm.com>

On Mon, Aug 12, 2019 at 10:52:11PM -0500, Lijun Pan wrote:
> 
> 
> > On Aug 12, 2019, at 12:07 AM, Paul Mackerras <paulus@ozlabs.org> wrote:

[snip]

> > +static void cleanup_single_escalation(struct kvm_vcpu *vcpu,
> > +				      struct kvmppc_xive_vcpu *xc, int irq)
> > +{
> > +	struct irq_data *d = irq_get_irq_data(irq);
> > +	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
> > +
> > +	/*
> > +	 * This slightly odd sequence gives the right result
> > +	 * (i.e. stale_p set if xive_esc_on is false) even if
> > +	 * we race with xive_esc_irq() and xive_irq_eoi().
> > +	 */
> 
> Hi Paul,
> 
> I don’t quite understand the logic here.
> Are you saying the code sequence is
> vcpu->arch.xive_esc_on = false; (xive_esc_irq)
> then
> xd->stale_p = true; (cleanup_single_escaltion)
> 
> > +	xd->stale_p = false;
> > +	smp_mb();		/* paired with smb_wmb in xive_esc_irq */
> > +	if (!vcpu->arch.xive_esc_on)
> > +		xd->stale_p = true;

The natural sequence would be just
	xd->stale_p = !vcpu->arch.xive_esc_on;

However, imagine that vcpu->arch.xive_esc_on is true, and the
escalation interrupt gets handled on another CPU between the load of
vcpu->arch.xive_esc_on and the store to xd->stale_p.  The interrupt
code calls xive_esc_irq(), which clears vcpu->arch.xive_esc_on, and
then xive_irq_eoi(), which sets xd->stale_p.  The natural sequence
could read vcpu->arch.xive_esc_on before the interrupt and see 1, then
write 0 to xd->stale_p after xive_irq_eoi() has set it.  That would
mean the final value of xd->stale_p was 0, which is wrong, since in
fact the queue entry has been removed.

With the code I wrote, because the clearing of xd->stale_p comes
before the load from vcpu->arch.xive_esc_on (with a barrier to make
sure they don't get reordered), then if a racing escalation interrupt
on another CPU stores 1 to xd->stale_p before we clear it, then we
must see vcpu->arch.xive_esc_on as 0, and we will set xd->stale_p
again, giving the correct final state (xd->stale_p == 1).  If the
racing interrupt clears vcpu->arch.xive_esc_on after we load it and
see 1, then its store to set xd->stale_p must come after our store to
clear it because of the barrier that I added to xive_esc_irq, so the
final result is once again correct.

[snip]

> > @@ -397,11 +411,16 @@ static void xive_do_source_set_mask(struct xive_irq_data *xd,
> > 	 */
> > 	if (mask) {
> > 		val = xive_esb_read(xd, XIVE_ESB_SET_PQ_01);
> > -		xd->saved_p = !!(val & XIVE_ESB_VAL_P);
> > -	} else if (xd->saved_p)
> > +		if (!xd->stale_p && !!(val & XIVE_ESB_VAL_P))
> > +			xd->saved_p = true;
> > +		xd->stale_p = false;
> > +	} else if (xd->saved_p) {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_10);
> > -	else
> > +		xd->saved_p = false;
> 
> Should we also explicitly set xd->stale_p = true; here?

We don't need to because xd->saved_p and xd->stale_p can never be both
set, and we just saw that xd->saved_p was set.

> > +	} else {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_00);
> > +		xd->stale_p = false;
> 
> Should we also explicitly set xd->saved_p = true; here?

No, that would be incorrect.  This is the case where we are unmasking
the interrupt and there is no queue entry currently.  Setting saved_p
would imply that there is a queue entry, which there isn't.

Paul.

WARNING: multiple messages have this Message-ID (diff)
From: Paul Mackerras <paulus@ozlabs.org>
To: Lijun Pan <ljp@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 2/2] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race
Date: Tue, 13 Aug 2019 04:56:39 +0000	[thread overview]
Message-ID: <20190813045639.lm443zsouievhtne@oak.ozlabs.ibm.com> (raw)
In-Reply-To: <E547965E-CC31-470F-8849-0F2A899A121F@linux.vnet.ibm.com>

On Mon, Aug 12, 2019 at 10:52:11PM -0500, Lijun Pan wrote:
> 
> 
> > On Aug 12, 2019, at 12:07 AM, Paul Mackerras <paulus@ozlabs.org> wrote:

[snip]

> > +static void cleanup_single_escalation(struct kvm_vcpu *vcpu,
> > +				      struct kvmppc_xive_vcpu *xc, int irq)
> > +{
> > +	struct irq_data *d = irq_get_irq_data(irq);
> > +	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
> > +
> > +	/*
> > +	 * This slightly odd sequence gives the right result
> > +	 * (i.e. stale_p set if xive_esc_on is false) even if
> > +	 * we race with xive_esc_irq() and xive_irq_eoi().
> > +	 */
> 
> Hi Paul,
> 
> I don’t quite understand the logic here.
> Are you saying the code sequence is
> vcpu->arch.xive_esc_on = false; (xive_esc_irq)
> then
> xd->stale_p = true; (cleanup_single_escaltion)
> 
> > +	xd->stale_p = false;
> > +	smp_mb();		/* paired with smb_wmb in xive_esc_irq */
> > +	if (!vcpu->arch.xive_esc_on)
> > +		xd->stale_p = true;

The natural sequence would be just
	xd->stale_p = !vcpu->arch.xive_esc_on;

However, imagine that vcpu->arch.xive_esc_on is true, and the
escalation interrupt gets handled on another CPU between the load of
vcpu->arch.xive_esc_on and the store to xd->stale_p.  The interrupt
code calls xive_esc_irq(), which clears vcpu->arch.xive_esc_on, and
then xive_irq_eoi(), which sets xd->stale_p.  The natural sequence
could read vcpu->arch.xive_esc_on before the interrupt and see 1, then
write 0 to xd->stale_p after xive_irq_eoi() has set it.  That would
mean the final value of xd->stale_p was 0, which is wrong, since in
fact the queue entry has been removed.

With the code I wrote, because the clearing of xd->stale_p comes
before the load from vcpu->arch.xive_esc_on (with a barrier to make
sure they don't get reordered), then if a racing escalation interrupt
on another CPU stores 1 to xd->stale_p before we clear it, then we
must see vcpu->arch.xive_esc_on as 0, and we will set xd->stale_p
again, giving the correct final state (xd->stale_p = 1).  If the
racing interrupt clears vcpu->arch.xive_esc_on after we load it and
see 1, then its store to set xd->stale_p must come after our store to
clear it because of the barrier that I added to xive_esc_irq, so the
final result is once again correct.

[snip]

> > @@ -397,11 +411,16 @@ static void xive_do_source_set_mask(struct xive_irq_data *xd,
> > 	 */
> > 	if (mask) {
> > 		val = xive_esb_read(xd, XIVE_ESB_SET_PQ_01);
> > -		xd->saved_p = !!(val & XIVE_ESB_VAL_P);
> > -	} else if (xd->saved_p)
> > +		if (!xd->stale_p && !!(val & XIVE_ESB_VAL_P))
> > +			xd->saved_p = true;
> > +		xd->stale_p = false;
> > +	} else if (xd->saved_p) {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_10);
> > -	else
> > +		xd->saved_p = false;
> 
> Should we also explicitly set xd->stale_p = true; here?

We don't need to because xd->saved_p and xd->stale_p can never be both
set, and we just saw that xd->saved_p was set.

> > +	} else {
> > 		xive_esb_read(xd, XIVE_ESB_SET_PQ_00);
> > +		xd->stale_p = false;
> 
> Should we also explicitly set xd->saved_p = true; here?

No, that would be incorrect.  This is the case where we are unmasking
the interrupt and there is no queue entry currently.  Setting saved_p
would imply that there is a queue entry, which there isn't.

Paul.

  reply	other threads:[~2019-08-13  4:59 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-12  5:06 [PATCH 0/2] powerpc/xive: Fix race condition leading to host crashes and hangs Paul Mackerras
2019-08-12  5:06 ` Paul Mackerras
2019-08-12  5:07 ` [PATCH 1/2] KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts Paul Mackerras
2019-08-12  5:07   ` Paul Mackerras
2019-08-13  4:59   ` Paul Mackerras
2019-08-13  4:59     ` Paul Mackerras
2019-08-12  5:07 ` [PATCH 2/2] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race Paul Mackerras
2019-08-12  5:07   ` Paul Mackerras
2019-08-13  3:52   ` Lijun Pan
2019-08-13  3:52     ` Lijun Pan
2019-08-13  3:52     ` Lijun Pan
2019-08-13  4:56     ` Paul Mackerras [this message]
2019-08-13  4:56       ` Paul Mackerras
2019-08-13  4:56       ` Paul Mackerras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190813045639.lm443zsouievhtne@oak.ozlabs.ibm.com \
    --to=paulus@ozlabs.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=ljp@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.