All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xen/events: Fix race in set_evtchn_to_irq
@ 2021-08-11 14:08 ` Maximilian Heyne
  0 siblings, 0 replies; 4+ messages in thread
From: Maximilian Heyne @ 2021-08-11 14:08 UTC (permalink / raw)
  Cc: Amit Shah, Maximilian Heyne, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Wei Liu, Thomas Gleixner, Jan Beulich,
	Malcolm Crossley, David Vrabel, Konrad Rzeszutek Wilk, xen-devel,
	linux-kernel

There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization is not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to the their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:

  INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
  INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
  INFO: nvme nvme77: 1/0/0 default/read/poll queues
  CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
  WARN: invalid opcode: 0000 [#1] SMP NOPTI
  WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
  WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
  WARN: Call Trace:
  WARN:  set_affinity_irq+0x121/0x150
  WARN:  irq_do_set_affinity+0x37/0xe0
  WARN:  irq_setup_affinity+0xf6/0x170
  WARN:  irq_startup+0x64/0xe0
  WARN:  __setup_irq+0x69e/0x740
  WARN:  ? request_threaded_irq+0xad/0x160
  WARN:  request_threaded_irq+0xf5/0x160
  WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
  WARN:  pci_request_irq+0xa9/0xf0
  WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
  WARN:  queue_request_irq+0x4c/0x70 [nvme]
  WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
  WARN:  ? check_preempt_wakeup+0x14f/0x230
  WARN:  ? check_preempt_curr+0x29/0x80
  WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
  WARN:  process_one_work+0x18e/0x3c0
  WARN:  worker_thread+0x30/0x3a0
  WARN:  ? process_one_work+0x3c0/0x3c0
  WARN:  kthread+0x113/0x130
  WARN:  ? kthread_park+0x90/0x90
  WARN:  ret_from_fork+0x3a/0x50

This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. Clearing the row was moved up before writing the
row to evtchn_to_irq in order to not create a race once the row is
visible for other threads. Accesses to the rows are now guarded by
READ_ONCE and WRITE_ONCE just as for the columns in the data structure.

Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
---
 drivers/xen/events/events_base.c | 35 ++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index d7e361fb0548..7582a7f52313 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -198,22 +198,24 @@ static void disable_dynirq(struct irq_data *data);
 
 static DEFINE_PER_CPU(unsigned int, irq_epoch);
 
-static void clear_evtchn_to_irq_row(unsigned row)
+static void clear_evtchn_to_irq_row(int *evtchn_row)
 {
 	unsigned col;
 
 	for (col = 0; col < EVTCHN_PER_ROW; col++)
-		WRITE_ONCE(evtchn_to_irq[row][col], -1);
+		WRITE_ONCE(evtchn_row[col], -1);
 }
 
 static void clear_evtchn_to_irq_all(void)
 {
 	unsigned row;
+	int *evtchn_row;
 
 	for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) {
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		if (evtchn_row == NULL)
 			continue;
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
 	}
 }
 
@@ -221,36 +223,47 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq)
 {
 	unsigned row;
 	unsigned col;
+	int *evtchn_row;
 
 	if (evtchn >= xen_evtchn_max_channels())
 		return -EINVAL;
 
 	row = EVTCHN_ROW(evtchn);
 	col = EVTCHN_COL(evtchn);
+	evtchn_row = READ_ONCE(evtchn_to_irq[row]);
 
-	if (evtchn_to_irq[row] == NULL) {
+	if (evtchn_row == NULL) {
 		/* Unallocated irq entries return -1 anyway */
 		if (irq == -1)
 			return 0;
 
-		evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = (int *) get_zeroed_page(GFP_KERNEL);
+		if (evtchn_row == NULL)
 			return -ENOMEM;
 
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
+
+		if (cmpxchg(&evtchn_to_irq[row], NULL, evtchn_row) != NULL) {
+			free_page((unsigned long) evtchn_row);
+			evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		}
 	}
 
-	WRITE_ONCE(evtchn_to_irq[row][col], irq);
+	WRITE_ONCE(evtchn_row[col], irq);
 	return 0;
 }
 
 int get_evtchn_to_irq(evtchn_port_t evtchn)
 {
+	int *evtchn_row;
+
 	if (evtchn >= xen_evtchn_max_channels())
 		return -1;
-	if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL)
+
+	evtchn_row = READ_ONCE(evtchn_to_irq[EVTCHN_ROW(evtchn)]);
+	if (evtchn_row == NULL)
 		return -1;
-	return READ_ONCE(evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)]);
+	return READ_ONCE(evtchn_row[EVTCHN_COL(evtchn)]);
 }
 
 /* Get info for IRQ */
-- 
2.32.0




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH] xen/events: Fix race in set_evtchn_to_irq
@ 2021-08-11 14:08 ` Maximilian Heyne
  0 siblings, 0 replies; 4+ messages in thread
From: Maximilian Heyne @ 2021-08-11 14:08 UTC (permalink / raw)
  Cc: Amit Shah, Maximilian Heyne, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Wei Liu, Thomas Gleixner, Jan Beulich,
	Malcolm Crossley, David Vrabel, Konrad Rzeszutek Wilk, xen-devel,
	linux-kernel

There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization is not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to the their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:

  INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
  INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
  INFO: nvme nvme77: 1/0/0 default/read/poll queues
  CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
  WARN: invalid opcode: 0000 [#1] SMP NOPTI
  WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
  WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
  WARN: Call Trace:
  WARN:  set_affinity_irq+0x121/0x150
  WARN:  irq_do_set_affinity+0x37/0xe0
  WARN:  irq_setup_affinity+0xf6/0x170
  WARN:  irq_startup+0x64/0xe0
  WARN:  __setup_irq+0x69e/0x740
  WARN:  ? request_threaded_irq+0xad/0x160
  WARN:  request_threaded_irq+0xf5/0x160
  WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
  WARN:  pci_request_irq+0xa9/0xf0
  WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
  WARN:  queue_request_irq+0x4c/0x70 [nvme]
  WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
  WARN:  ? check_preempt_wakeup+0x14f/0x230
  WARN:  ? check_preempt_curr+0x29/0x80
  WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
  WARN:  process_one_work+0x18e/0x3c0
  WARN:  worker_thread+0x30/0x3a0
  WARN:  ? process_one_work+0x3c0/0x3c0
  WARN:  kthread+0x113/0x130
  WARN:  ? kthread_park+0x90/0x90
  WARN:  ret_from_fork+0x3a/0x50

This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. Clearing the row was moved up before writing the
row to evtchn_to_irq in order to not create a race once the row is
visible for other threads. Accesses to the rows are now guarded by
READ_ONCE and WRITE_ONCE just as for the columns in the data structure.

Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
---
 drivers/xen/events/events_base.c | 35 ++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index d7e361fb0548..7582a7f52313 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -198,22 +198,24 @@ static void disable_dynirq(struct irq_data *data);
 
 static DEFINE_PER_CPU(unsigned int, irq_epoch);
 
-static void clear_evtchn_to_irq_row(unsigned row)
+static void clear_evtchn_to_irq_row(int *evtchn_row)
 {
 	unsigned col;
 
 	for (col = 0; col < EVTCHN_PER_ROW; col++)
-		WRITE_ONCE(evtchn_to_irq[row][col], -1);
+		WRITE_ONCE(evtchn_row[col], -1);
 }
 
 static void clear_evtchn_to_irq_all(void)
 {
 	unsigned row;
+	int *evtchn_row;
 
 	for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) {
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		if (evtchn_row == NULL)
 			continue;
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
 	}
 }
 
@@ -221,36 +223,47 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq)
 {
 	unsigned row;
 	unsigned col;
+	int *evtchn_row;
 
 	if (evtchn >= xen_evtchn_max_channels())
 		return -EINVAL;
 
 	row = EVTCHN_ROW(evtchn);
 	col = EVTCHN_COL(evtchn);
+	evtchn_row = READ_ONCE(evtchn_to_irq[row]);
 
-	if (evtchn_to_irq[row] == NULL) {
+	if (evtchn_row == NULL) {
 		/* Unallocated irq entries return -1 anyway */
 		if (irq == -1)
 			return 0;
 
-		evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = (int *) get_zeroed_page(GFP_KERNEL);
+		if (evtchn_row == NULL)
 			return -ENOMEM;
 
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
+
+		if (cmpxchg(&evtchn_to_irq[row], NULL, evtchn_row) != NULL) {
+			free_page((unsigned long) evtchn_row);
+			evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		}
 	}
 
-	WRITE_ONCE(evtchn_to_irq[row][col], irq);
+	WRITE_ONCE(evtchn_row[col], irq);
 	return 0;
 }
 
 int get_evtchn_to_irq(evtchn_port_t evtchn)
 {
+	int *evtchn_row;
+
 	if (evtchn >= xen_evtchn_max_channels())
 		return -1;
-	if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL)
+
+	evtchn_row = READ_ONCE(evtchn_to_irq[EVTCHN_ROW(evtchn)]);
+	if (evtchn_row == NULL)
 		return -1;
-	return READ_ONCE(evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)]);
+	return READ_ONCE(evtchn_row[EVTCHN_COL(evtchn)]);
 }
 
 /* Get info for IRQ */
-- 
2.32.0




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xen/events: Fix race in set_evtchn_to_irq
  2021-08-11 14:08 ` Maximilian Heyne
  (?)
@ 2021-08-11 15:05 ` Boris Ostrovsky
  2021-08-12 12:02   ` Heyne, Maximilian
  -1 siblings, 1 reply; 4+ messages in thread
From: Boris Ostrovsky @ 2021-08-11 15:05 UTC (permalink / raw)
  To: Maximilian Heyne
  Cc: Amit Shah, Juergen Gross, Stefano Stabellini, Wei Liu,
	Thomas Gleixner, Jan Beulich, Malcolm Crossley, David Vrabel,
	Konrad Rzeszutek Wilk, xen-devel, linux-kernel


On 8/11/21 10:08 AM, Maximilian Heyne wrote:
>
> This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
> will be set only once. Clearing the row was moved up before writing the
> row to evtchn_to_irq in order to not create a race once the row is
> visible for other threads. Accesses to the rows are now guarded by
> READ_ONCE and WRITE_ONCE just as for the columns in the data structure.


Is this last part really needed? We needed to do that for array elements to avoid an interrupt handler from seeing a partially updated entry but I am not sure I see how this can happen to the row pointer. The only place where it might be important is when we update the pointer to the new page but you are using cmpxchg there already.


>  
> -		evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
> -		if (evtchn_to_irq[row] == NULL)
> +		evtchn_row = (int *) get_zeroed_page(GFP_KERNEL);


Not directly related to this patch but I don't think we need to get a zeroed page --- we will initialize it to -1 immediately below.



-boris


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xen/events: Fix race in set_evtchn_to_irq
  2021-08-11 15:05 ` Boris Ostrovsky
@ 2021-08-12 12:02   ` Heyne, Maximilian
  0 siblings, 0 replies; 4+ messages in thread
From: Heyne, Maximilian @ 2021-08-12 12:02 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: jbeulich, jgross, sstabellini, linux-kernel, tglx, Shah, Amit,
	david.vrabel, malcolm.crossley, wei.liu, konrad.wilk, xen-devel

On Wed, 2021-08-11 at 11:05 -0400, Boris Ostrovsky wrote:
> On 8/11/21 10:08 AM, Maximilian Heyne wrote:
> > This patch sets evtchn_to_irq rows via a cmpxchg operation so that
> > they
> > will be set only once. Clearing the row was moved up before writing
> > the
> > row to evtchn_to_irq in order to not create a race once the row is
> > visible for other threads. Accesses to the rows are now guarded by
> > READ_ONCE and WRITE_ONCE just as for the columns in the data
> > structure.
> 
> Is this last part really needed? We needed to do that for array
> elements to avoid an interrupt handler from seeing a partially
> updated entry but I am not sure I see how this can happen to the row
> pointer. The only place where it might be important is when we update
> the pointer to the new page but you are using cmpxchg there already.

I think you are right. I will remove the changes related to the
READ_ONCE.

> 
> 
> > -             evtchn_to_irq[row] = (int
> > *)get_zeroed_page(GFP_KERNEL);
> > -             if (evtchn_to_irq[row] == NULL)
> > +             evtchn_row = (int *) get_zeroed_page(GFP_KERNEL);
> 
> Not directly related to this patch but I don't think we need to get a
> zeroed page --- we will initialize it to -1 immediately below.

That is correct. I will just fix this in the next version of the patch.

Max



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-12 12:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 14:08 [PATCH] xen/events: Fix race in set_evtchn_to_irq Maximilian Heyne
2021-08-11 14:08 ` Maximilian Heyne
2021-08-11 15:05 ` Boris Ostrovsky
2021-08-12 12:02   ` Heyne, Maximilian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.