All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris J Arges <chris.j.arges@canonical.com>,
	Rafael David Tinoco <inaddy@ubuntu.com>,
	Peter Anvin <hpa@zytor.com>,
	Jiang Liu <jiang.liu@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Gema Gomez <gema.gomez-solano@canonical.com>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: [debug PATCHes] Re: smp_call_function_single lockups
Date: Tue, 31 Mar 2015 12:56:56 +0200	[thread overview]
Message-ID: <20150331105656.GA25180@gmail.com> (raw)
In-Reply-To: <CA+55aFyk9nnStb9sNm0xt=j7K+eGES=m_dDm9E49QSnW7jFgkQ@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ok, interesting. So the whole "we try to do an APIC ACK with the ISR 
> bit clear" seems to be a real issue.

It's interesting in particular when it happens with an edge-triggered 
interrupt source: it's much harder to miss level triggered IRQs, which 
stay around until actively handled. Edge triggered irqs are more 
fragile to loss of event processing.

> > Anyway, maybe this sheds some more light on this issue. I can 
> > reproduce this at will, so let me know of other experiments to do.

Btw., could you please describe (again) what your current best method 
for reproduction is? It's been a long discussion ...

> Somebody else who knows the apic needs to also take a look, but I'd 
> love to hear what the actual hardware interrupt is (from that 
> "vector_irq[vector]" thing above.
> 
> I'm not recognizing 0xe1 as any of the hardcoded SMP vectors (they 
> are 0xf0-0xff), so it sounds like an external one. But that then 
> requires the whole mapping table thing.
> 
> Ingo/Peter/Jiang - is there anything else useful we could print out? 
> I worry about the irq movement code. Can we add printk's to when an 
> irq is chasing from one CPU to another and doing that 
> "move_in_progress" thing? I've always been scared of that code.

1)

So the irq_cfg::move_in_progress field originates from:

  610142927b5b ("[PATCH] x86_64 irq: Safely cleanup an irq after moving it.")

and I agree that it looks fragile, so it would be nice to see how much 
(if at all?) it gets used, by sticking a few pr_info()s in it.

Debug patch for the vector movement state machine attached below. 
Untested.

2)

But ... having taken a quick look at the vector movement handling, I 
am scared more than ever,and I cannot convince myself that it's race 
free. It's possibly harmless, but still, famous last words ...

For example, most ->move_in_progress transitions happen with the 
vector_lock held - with the exception of send_cleanup_vector(): there 
we are only holding the desc->lock, but that's not enough! For 
example, this codepath:

static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
{
        unsigned me;

        if (likely(!cfg->move_in_progress))
                return;

        me = smp_processor_id();

        if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
                send_cleanup_vector(cfg);

...

void send_cleanup_vector(struct irq_cfg *cfg)
{
...

        cfg->move_in_progress = 0;
}

is blatantly racy, unless I missed something obvious?

2b)

Initially I thought this only affects slowpaths like driver unregister 
or CPU hotplug, but I think this would be relevant for the fastpath of 
edge triggered irqs as well:

void apic_ack_edge(struct irq_data *data)
{
        irq_complete_move(irqd_cfg(data));
        irq_move_irq(data);
        ack_APIC_irq();
}

and irq_complete_move() checks and clears cfg->move_in_progress as 
listed above.

So in most scenarios this is probably harmless, because it can in the 
worst case result in the missing of a ->move_in_progress flag setting.

But it does not look harmless in the apic_set_affinity() code path: 
there we call into assign_irq_vector() without the desc->lock held, 
due to this gem in the IRQ core:

int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
{
        unsigned long flags;
        struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);

        if (!desc)
                return -EINVAL;
        desc->affinity_hint = m;
        irq_put_desc_unlock(desc, flags);
        /* set the initial affinity to prevent every interrupt being on CPU0 */
        if (m)
                __irq_set_affinity(irq, m, false);
        return 0;
}

argh!

Now if this ever crosses path with an assign_irq_vector() call on 
another CPU, then I really cannot see what would save us from deep 
trouble: we use cfg->vector while that is possibly being modified, 
right?

So I'd suggest to put a printk into irq_set_affinity_hint() as well, 
to make sure it's not used during this test. In particular:

drivers/virtio/virtio_pci_common.c:                     irq_set_affinity_hint(irq, NULL);
drivers/virtio/virtio_pci_common.c:                     irq_set_affinity_hint(irq, mask);

might be triggering it in the virtualization code...

The second patch below adds the relevant pr_info(). (untested)

Now this too might be unrelated, because the affinity hint was added 
ages ago, in:

e7a297b0d7d6 ("genirq: Add CPU mask affinity hint")

and the potentially racy nature of calling into set_affinity without 
the desc lock held was probably not realized back then.

VirtIO's use of the affinity-hint was added a while ago as well, four 
years ago, in:

  75a0a52be3c2 ("virtio: introduce an API to set affinity for a virtqueue")

2c)

The other thing that worries me here is that we apparently send an 
IRQ-move IPI while having an un-acked local APIC (!). I have vague 
memories that this was problematic and fragile before.

To ease my worries in this area I've attached a third patch below that 
changes the order around.

This too is ancient behavior - but might be more prominent on certain 
hardware (and virtualization) variants, so it has a chance to be 
relevant.

3)

Furthermore, I also noticed that the whole vector_lock()/unlock() 
business is actively misleading IMHO, the vector_lock looks local to 
vector.c, while we have these two calls that essentially export it ...

Some of that damage was inflicted in this ancient commit:

  d388e5fdc461 ("x86: Restore proper vector locking during cpu hotplug")

the proper approach would be to protect against hotplug events in 
assign_irq_vector(), not the other way around!

Probably not a functional problem, yet somewhat disgusting.

-------------------------------

So ... all in one, these various problems might be related to the 
regression or not, but they sure need addressing.

Thanks,

	Ingo

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..1dd1de9dd690 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -143,6 +143,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 			cpumask_andnot(cfg->old_domain, cfg->domain, tmp_mask);
 			cfg->move_in_progress =
 			   cpumask_intersects(cfg->old_domain, cpu_online_mask);
+			if (cfg->move_in_progress)
+				pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
 			cpumask_and(cfg->domain, cfg->domain, tmp_mask);
 			break;
 		}
@@ -178,6 +180,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 			cpumask_copy(cfg->old_domain, cfg->domain);
 			cfg->move_in_progress =
 			   cpumask_intersects(cfg->old_domain, cpu_online_mask);
+			if (cfg->move_in_progress)
+				pr_info("apic: vector %02x, new-domain move in progress\n", cfg->vector);
 		}
 		for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
 			per_cpu(vector_irq, new_cpu)[vector] = irq;
@@ -232,6 +236,7 @@ void clear_irq_vector(int irq, struct irq_cfg *cfg)
 		}
 	}
 	cfg->move_in_progress = 0;
+	pr_info("apic: vector %02x, vector cleared, move completed\n", cfg->vector);
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 }
 
@@ -391,6 +396,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
 		free_cpumask_var(cleanup_mask);
 	}
 	cfg->move_in_progress = 0;
+	pr_info("apic: vector %02x, sent cleanup vector, move completed\n", cfg->vector);
 }
 
 asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c0a1100d911f..18300a7b8ed2 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -250,8 +250,10 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 	desc->affinity_hint = m;
 	irq_put_desc_unlock(desc, flags);
 	/* set the initial affinity to prevent every interrupt being on CPU0 */
-	if (m)
+	if (m) {
+		pr_info("irq/core: Called irq_set_affinity_hint(%d)\n", irq);
 		__irq_set_affinity(irq, m, false);
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..833a981c5420 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -335,9 +340,11 @@ int apic_retrigger_irq(struct irq_data *data)
 
 void apic_ack_edge(struct irq_data *data)
 {
+	ack_APIC_irq();
+
+	/* Might generate IPIs, so do this after having ACKed the APIC: */
 	irq_complete_move(irqd_cfg(data));
 	irq_move_irq(data);
-	ack_APIC_irq();
 }
 
 /*

  reply	other threads:[~2015-03-31 10:57 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59   ` Linus Torvalds
2015-02-11 20:42     ` Linus Torvalds
2015-02-12 16:38       ` Rafael David Tinoco
2015-02-18 22:25       ` Peter Zijlstra
2015-02-19 15:42         ` Rafael David Tinoco
2015-02-19 16:14           ` Linus Torvalds
2015-02-23 14:01             ` Rafael David Tinoco
2015-02-23 19:32               ` Linus Torvalds
2015-02-23 20:50                 ` Peter Zijlstra
2015-02-23 21:02                   ` Rafael David Tinoco
2015-02-19 16:16           ` Peter Zijlstra
2015-02-19 16:26           ` Linus Torvalds
2015-02-19 16:32             ` Rafael David Tinoco
2015-02-19 16:59               ` Linus Torvalds
2015-02-19 17:30                 ` Rafael David Tinoco
2015-02-19 17:39                 ` Linus Torvalds
2015-02-19 20:29                   ` Linus Torvalds
2015-02-19 21:59                     ` Linus Torvalds
2015-02-19 22:45                       ` Linus Torvalds
2015-03-31  3:15                         ` Chris J Arges
2015-03-31  4:28                           ` Linus Torvalds
2015-03-31 10:56                             ` Ingo Molnar [this message]
2015-03-31 22:38                               ` [debug PATCHes] " Chris J Arges
2015-04-01 12:39                                 ` Ingo Molnar
2015-04-01 14:10                                   ` Chris J Arges
2015-04-01 14:55                                     ` Ingo Molnar
2015-03-31  4:46                           ` Linus Torvalds
2015-03-31 15:08                           ` Linus Torvalds
2015-03-31 22:23                             ` Chris J Arges
2015-03-31 23:07                               ` Linus Torvalds
2015-04-01 14:32                                 ` Chris J Arges
2015-04-01 15:36                                   ` Linus Torvalds
2015-04-02  9:55                                     ` Ingo Molnar
2015-04-02 17:35                                       ` Linus Torvalds
2015-04-01 12:43                               ` Ingo Molnar
2015-04-01 16:10                                 ` Chris J Arges
2015-04-01 16:14                                   ` Linus Torvalds
2015-04-01 21:59                                     ` Chris J Arges
2015-04-02 17:31                                       ` Linus Torvalds
2015-04-02 18:26                                         ` Ingo Molnar
2015-04-02 18:51                                           ` Chris J Arges
2015-04-02 19:07                                             ` Ingo Molnar
2015-04-02 20:57                                               ` Linus Torvalds
2015-04-02 21:13                                               ` Chris J Arges
2015-04-03  5:43                                                 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03  5:47                                                   ` Ingo Molnar
2015-04-06 16:58                                                   ` Chris J Arges
2015-04-06 17:32                                                     ` Linus Torvalds
2015-04-07  9:21                                                       ` Ingo Molnar
2015-04-07 20:59                                                         ` Chris J Arges
2015-04-07 21:15                                                           ` Linus Torvalds
2015-04-08  6:47                                                           ` Ingo Molnar
2015-04-13  3:56                                                             ` Chris J Arges
2015-04-13  6:14                                                               ` Ingo Molnar
2015-04-15 19:54                                                                 ` Chris J Arges
2015-04-16 11:04                                                                   ` Ingo Molnar
2015-04-16 15:58                                                                     ` Chris J Arges
2015-04-16 16:31                                                                       ` Ingo Molnar
2015-04-29 21:08                                                                         ` Chris J Arges
2015-05-11 14:00                                                                           ` Ingo Molnar
2015-05-20 18:19                                                                             ` Chris J Arges
2015-04-03  5:45                                                 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23                                         ` Chris J Arges
2015-02-20  9:30                     ` Ingo Molnar
2015-02-20 16:49                       ` Linus Torvalds
2015-02-20 19:41                         ` Ingo Molnar
2015-02-20 20:03                           ` Linus Torvalds
2015-02-20 20:11                             ` Ingo Molnar
2015-03-20 10:15       ` Peter Zijlstra
2015-03-20 16:26         ` Linus Torvalds
2015-03-20 17:14           ` Mike Galbraith
2015-04-01 14:22       ` Frederic Weisbecker
2015-04-18 10:13       ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds
2015-04-01  2:00 [debug PATCHes] Re: smp_call_function_single lockups Daniel J Blueman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150331105656.GA25180@gmail.com \
    --to=mingo@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=chris.j.arges@canonical.com \
    --cc=fweisbec@gmail.com \
    --cc=gema.gomez-solano@canonical.com \
    --cc=hpa@zytor.com \
    --cc=inaddy@ubuntu.com \
    --cc=jiang.liu@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.