LKML Archive on lore.kernel.org
 help / Atom feed
* [patch 1/8] x86/apic/vector: Prevent hlist corruption and leaks
@ 2018-06-04 15:33 Thomas Gleixner
  2018-06-05  7:06 ` Song Liu
  2018-06-06 13:30 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 3+ messages in thread
From: Thomas Gleixner @ 2018-06-04 15:33 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Borislav Petkov, Dmitry Safonov,
	Tariq Toukan, Song Liu, Joerg Roedel, Mike Travis, stable

[-- Attachment #0: x86-apic-vector--Prevent-hlist-corruption-and-leaks.patch --]
[-- Type: text/plain, Size: 2386 bytes --]

Several people observed the WARN_ON() in irq_matrix_free() which triggers
when the caller tries to free an vector which is not in the allocation
range. Song provided the trace information which allowed to decode the root
cause.

The rework of the vector allocation mechanism failed to preserve a sanity
check, which prevents setting a new target vector/CPU when the previous
affinity change has not fully completed.

As a result a half finished affinity change can be overwritten, which can
cause the leak of a irq descriptor pointer on the previous target CPU and
double enqueue of the hlist head into the cleanup lists of two or more
CPUs. After one CPU cleaned up its vector the next CPU will invoke the
cleanup handler with vector 0, which triggers the out of range warning in
the matrix allocator.

Prevent this by checking the apic_data of the interrupt whether the
move_in_progress flag is false and the hlist node is not hashed. Return
-EBUSY if not.

This prevents the damage and restores the behaviour before the vector
allocation rework, but due to other changes in that area it also widens the
chance that user space can observe -EBUSY. In theory this should be fine,
but actually not all user space tools handle -EBUSY correctly. Addressing
that is not part of this fix, but will be addressed in follow up patches.

Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/apic/vector.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -235,6 +235,15 @@ static int allocate_vector(struct irq_da
 	if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
 		return 0;
 
+	/*
+	 * Careful here. @apicd might either have move_in_progress set or
+	 * be enqueued for cleanup. Assigning a new vector would either
+	 * leave a stale vector on some CPU around or in case of a pending
+	 * cleanup corrupt the hlist.
+	 */
+	if (apicd->move_in_progress || !hlist_unhashed(&apicd->clist))
+		return -EBUSY;
+
 	vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
 	if (vector > 0)
 		apic_update_vector(irqd, vector, cpu);

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [patch 1/8] x86/apic/vector: Prevent hlist corruption and leaks
  2018-06-04 15:33 [patch 1/8] x86/apic/vector: Prevent hlist corruption and leaks Thomas Gleixner
@ 2018-06-05  7:06 ` Song Liu
  2018-06-06 13:30 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 3+ messages in thread
From: Song Liu @ 2018-06-05  7:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Dmitry Safonov, Tariq Toukan, Joerg Roedel, Mike Travis, stable

On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Several people observed the WARN_ON() in irq_matrix_free() which triggers
> when the caller tries to free an vector which is not in the allocation
> range. Song provided the trace information which allowed to decode the root
> cause.
>
> The rework of the vector allocation mechanism failed to preserve a sanity
> check, which prevents setting a new target vector/CPU when the previous
> affinity change has not fully completed.
>
> As a result a half finished affinity change can be overwritten, which can
> cause the leak of a irq descriptor pointer on the previous target CPU and
> double enqueue of the hlist head into the cleanup lists of two or more
> CPUs. After one CPU cleaned up its vector the next CPU will invoke the
> cleanup handler with vector 0, which triggers the out of range warning in
> the matrix allocator.
>
> Prevent this by checking the apic_data of the interrupt whether the
> move_in_progress flag is false and the hlist node is not hashed. Return
> -EBUSY if not.
>
> This prevents the damage and restores the behaviour before the vector
> allocation rework, but due to other changes in that area it also widens the
> chance that user space can observe -EBUSY. In theory this should be fine,
> but actually not all user space tools handle -EBUSY correctly. Addressing
> that is not part of this fix, but will be addressed in follow up patches.
>
> Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
> Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
> Reported-by: Tariq Toukan <tariqt@mellanox.com>
> Reported-by: Song Liu <liu.song.a23@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org

Thanks Thomas!

This patch alone fixes my test: ethtool -L in a loop.

I also run the same test for the full set, and it works well.

Tested-by: Song Liu <songliubraving@fb.com>


> ---
>  arch/x86/kernel/apic/vector.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
>
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -235,6 +235,15 @@ static int allocate_vector(struct irq_da
>         if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
>                 return 0;
>
> +       /*
> +        * Careful here. @apicd might either have move_in_progress set or
> +        * be enqueued for cleanup. Assigning a new vector would either
> +        * leave a stale vector on some CPU around or in case of a pending
> +        * cleanup corrupt the hlist.
> +        */
> +       if (apicd->move_in_progress || !hlist_unhashed(&apicd->clist))
> +               return -EBUSY;
> +
>         vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
>         if (vector > 0)
>                 apic_update_vector(irqd, vector, cpu);
>
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip:x86/urgent] x86/apic/vector: Prevent hlist corruption and leaks
  2018-06-04 15:33 [patch 1/8] x86/apic/vector: Prevent hlist corruption and leaks Thomas Gleixner
  2018-06-05  7:06 ` Song Liu
@ 2018-06-06 13:30 ` " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-06-06 13:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: liu.song.a23, peterz, jroedel, hpa, linux-kernel, songliubraving,
	tglx, 0x7f454c46, bp, tariqt, mingo, mike.travis

Commit-ID:  80ae7b1a918e78b0bae88b0c0ad413d3fdced968
Gitweb:     https://git.kernel.org/tip/80ae7b1a918e78b0bae88b0c0ad413d3fdced968
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 4 Jun 2018 17:33:53 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 6 Jun 2018 15:18:19 +0200

x86/apic/vector: Prevent hlist corruption and leaks

Several people observed the WARN_ON() in irq_matrix_free() which triggers
when the caller tries to free an vector which is not in the allocation
range. Song provided the trace information which allowed to decode the root
cause.

The rework of the vector allocation mechanism failed to preserve a sanity
check, which prevents setting a new target vector/CPU when the previous
affinity change has not fully completed.

As a result a half finished affinity change can be overwritten, which can
cause the leak of a irq descriptor pointer on the previous target CPU and
double enqueue of the hlist head into the cleanup lists of two or more
CPUs. After one CPU cleaned up its vector the next CPU will invoke the
cleanup handler with vector 0, which triggers the out of range warning in
the matrix allocator.

Prevent this by checking the apic_data of the interrupt whether the
move_in_progress flag is false and the hlist node is not hashed. Return
-EBUSY if not.

This prevents the damage and restores the behaviour before the vector
allocation rework, but due to other changes in that area it also widens the
chance that user space can observe -EBUSY. In theory this should be fine,
but actually not all user space tools handle -EBUSY correctly. Addressing
that is not part of this fix, but will be addressed in follow up patches.

Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de

---
 arch/x86/kernel/apic/vector.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2148d7..72b575a0b662 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -235,6 +235,15 @@ static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
 	if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
 		return 0;
 
+	/*
+	 * Careful here. @apicd might either have move_in_progress set or
+	 * be enqueued for cleanup. Assigning a new vector would either
+	 * leave a stale vector on some CPU around or in case of a pending
+	 * cleanup corrupt the hlist.
+	 */
+	if (apicd->move_in_progress || !hlist_unhashed(&apicd->clist))
+		return -EBUSY;
+
 	vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
 	if (vector > 0)
 		apic_update_vector(irqd, vector, cpu);

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-04 15:33 [patch 1/8] x86/apic/vector: Prevent hlist corruption and leaks Thomas Gleixner
2018-06-05  7:06 ` Song Liu
2018-06-06 13:30 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox