linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] xen: harden blkfront against malicious backends
@ 2021-07-30 10:38 Juergen Gross
  2021-07-30 10:38 ` [PATCH v3 1/3] xen/blkfront: read response from backend only once Juergen Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Juergen Gross @ 2021-07-30 10:38 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe

Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately blkfront in the Linux kernel is fully trusting its
backend. This series is fixing blkfront in this regard.

It was discussed to handle this as a security problem, but the topic
was discussed in public before, so it isn't a real secret.

It should be mentioned that a similar series has been posted some years
ago by Marek Marczykowski-Górecki, but this series has not been applied
due to a Xen header not having been available in the Xen git repo at
that time. Additionally my series is fixing some more DoS cases.

Changes in V3:
- patch 3: insert missing unlock in error case (kernel test robot)
- patch 3: use %#x as format for printing wrong operation value
  (Roger Pau Monné)

Changes in V2:
- put blkfront patches into own series
- some minor comments addressed

Juergen Gross (3):
  xen/blkfront: read response from backend only once
  xen/blkfront: don't take local copy of a request from the ring page
  xen/blkfront: don't trust the backend response data blindly

 drivers/block/xen-blkfront.c | 126 +++++++++++++++++++++++------------
 1 file changed, 84 insertions(+), 42 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] xen/blkfront: read response from backend only once
  2021-07-30 10:38 [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
@ 2021-07-30 10:38 ` Juergen Gross
  2021-08-02 14:06   ` Oleksandr Andrushchenko
  2021-07-30 10:38 ` [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page Juergen Gross
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Juergen Gross @ 2021-07-30 10:38 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel
  Cc: Juergen Gross, Boris Ostrovsky, Stefano Stabellini,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Jens Axboe, Jan Beulich

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
 drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index d83fee21f6c5..15e840287734 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 {
 	struct request *req;
-	struct blkif_response *bret;
+	struct blkif_response bret;
 	RING_IDX i, rp;
 	unsigned long flags;
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
@@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
 
-		bret = RING_GET_RESPONSE(&rinfo->ring, i);
-		id   = bret->id;
+		RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
+		id = bret.id;
+
 		/*
 		 * The backend has messed up and given us an id that we would
 		 * never have given to it (we stamp it up to BLK_RING_SIZE -
@@ -1522,39 +1523,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		 */
 		if (id >= BLK_RING_SIZE(info)) {
 			WARN(1, "%s: response to %s has incorrect id (%ld)\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     info->gd->disk_name, op_name(bret.operation), id);
 			/* We can't safely get the 'struct request' as
 			 * the id is busted. */
 			continue;
 		}
 		req  = rinfo->shadow[id].request;
 
-		if (bret->operation != BLKIF_OP_DISCARD) {
+		if (bret.operation != BLKIF_OP_DISCARD) {
 			/*
 			 * We may need to wait for an extra response if the
 			 * I/O request is split in 2
 			 */
-			if (!blkif_completion(&id, rinfo, bret))
+			if (!blkif_completion(&id, rinfo, &bret))
 				continue;
 		}
 
 		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     info->gd->disk_name, op_name(bret.operation), id);
 			continue;
 		}
 
-		if (bret->status == BLKIF_RSP_OKAY)
+		if (bret.status == BLKIF_RSP_OKAY)
 			blkif_req(req)->error = BLK_STS_OK;
 		else
 			blkif_req(req)->error = BLK_STS_IOERR;
 
-		switch (bret->operation) {
+		switch (bret.operation) {
 		case BLKIF_OP_DISCARD:
-			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
 				struct request_queue *rq = info->rq;
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-					   info->gd->disk_name, op_name(bret->operation));
+					   info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 				info->feature_discard = 0;
 				info->feature_secdiscard = 0;
@@ -1564,15 +1565,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			break;
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
-			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 			}
-			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+			if (unlikely(bret.status == BLKIF_RSP_ERROR &&
 				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 			}
 			if (unlikely(blkif_req(req)->error)) {
@@ -1585,9 +1586,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			fallthrough;
 		case BLKIF_OP_READ:
 		case BLKIF_OP_WRITE:
-			if (unlikely(bret->status != BLKIF_RSP_OKAY))
+			if (unlikely(bret.status != BLKIF_RSP_OKAY))
 				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
-					"request: %x\n", bret->status);
+					"request: %x\n", bret.status);
 
 			break;
 		default:
-- 
2.26.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page
  2021-07-30 10:38 [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
  2021-07-30 10:38 ` [PATCH v3 1/3] xen/blkfront: read response from backend only once Juergen Gross
@ 2021-07-30 10:38 ` Juergen Gross
  2021-09-10 10:14   ` Marek Marczykowski-Górecki
  2021-07-30 10:38 ` [PATCH v3 3/3] xen/blkfront: don't trust the backend response data blindly Juergen Gross
  2021-08-30 10:16 ` [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
  3 siblings, 1 reply; 10+ messages in thread
From: Juergen Gross @ 2021-07-30 10:38 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel
  Cc: Juergen Gross, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, Stefano Stabellini, Jens Axboe, Jan Beulich

In order to avoid a malicious backend being able to influence the local
copy of a request build the request locally first and then copy it to
the ring page instead of doing it the other way round as today.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
V2:
- init variable to avoid potential compiler warning (Jan Beulich)
---
 drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 15e840287734..b7301006fb28 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -533,7 +533,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
 	rinfo->shadow[id].status = REQ_WAITING;
 	rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
 
-	(*ring_req)->u.rw.id = id;
+	rinfo->shadow[id].req.u.rw.id = id;
 
 	return id;
 }
@@ -541,11 +541,12 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
 static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
-	struct blkif_request *ring_req;
+	struct blkif_request *ring_req, *final_ring_req;
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	id = blkif_ring_get_request(rinfo, req, &ring_req);
+	id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+	ring_req = &rinfo->shadow[id].req;
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -556,8 +557,8 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	else
 		ring_req->u.discard.flag = 0;
 
-	/* Keep a private copy so we can reissue requests when recovering. */
-	rinfo->shadow[id].req = *ring_req;
+	/* Copy the request to the ring page. */
+	*final_ring_req = *ring_req;
 
 	return 0;
 }
@@ -690,6 +691,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 {
 	struct blkfront_info *info = rinfo->dev_info;
 	struct blkif_request *ring_req, *extra_ring_req = NULL;
+	struct blkif_request *final_ring_req, *final_extra_ring_req = NULL;
 	unsigned long id, extra_id = NO_ASSOCIATED_ID;
 	bool require_extra_req = false;
 	int i;
@@ -734,7 +736,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	}
 
 	/* Fill out a communications ring structure. */
-	id = blkif_ring_get_request(rinfo, req, &ring_req);
+	id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+	ring_req = &rinfo->shadow[id].req;
 
 	num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 	num_grant = 0;
@@ -785,7 +788,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 		ring_req->u.rw.nr_segments = num_grant;
 		if (unlikely(require_extra_req)) {
 			extra_id = blkif_ring_get_request(rinfo, req,
-							  &extra_ring_req);
+							  &final_extra_ring_req);
+			extra_ring_req = &rinfo->shadow[extra_id].req;
+
 			/*
 			 * Only the first request contains the scatter-gather
 			 * list.
@@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	/* Keep a private copy so we can reissue requests when recovering. */
-	rinfo->shadow[id].req = *ring_req;
+	/* Copy request(s) to the ring page. */
+	*final_ring_req = *ring_req;
 	if (unlikely(require_extra_req))
-		rinfo->shadow[extra_id].req = *extra_ring_req;
+		*final_extra_ring_req = *extra_ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 3/3] xen/blkfront: don't trust the backend response data blindly
  2021-07-30 10:38 [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
  2021-07-30 10:38 ` [PATCH v3 1/3] xen/blkfront: read response from backend only once Juergen Gross
  2021-07-30 10:38 ` [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page Juergen Gross
@ 2021-07-30 10:38 ` Juergen Gross
  2021-08-30 10:16 ` [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
  3 siblings, 0 replies; 10+ messages in thread
From: Juergen Gross @ 2021-07-30 10:38 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel
  Cc: Juergen Gross, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, Stefano Stabellini, Jens Axboe, Jan Beulich

Today blkfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Introduce a new state of the ring BLKIF_STATE_ERROR which will be
switched to in case an inconsistency is being detected. Recovering from
this state is possible only via removing and adding the virtual device
again (e.g. via a suspend/resume cycle).

Make all warning messages issued due to valid error responses rate
limited in order to avoid message floods being triggered by a malicious
backend.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
V2:
- use READ_ONCE() for reading the producer index
- check validity of producer index only after memory barrier (Jan Beulich)
- use virt_rmb() as barrier (Jan Beulich)
V3:
- insert missing unlock in error case (kernel test robot)
- use %#x as format for printing wrong operation value (Roger Pau Monné)
---
 drivers/block/xen-blkfront.c | 70 +++++++++++++++++++++++++++---------
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b7301006fb28..7a5e62fae171 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -80,6 +80,7 @@ enum blkif_state {
 	BLKIF_STATE_DISCONNECTED,
 	BLKIF_STATE_CONNECTED,
 	BLKIF_STATE_SUSPENDED,
+	BLKIF_STATE_ERROR,
 };
 
 struct grant {
@@ -89,6 +90,7 @@ struct grant {
 };
 
 enum blk_req_status {
+	REQ_PROCESSING,
 	REQ_WAITING,
 	REQ_DONE,
 	REQ_ERROR,
@@ -530,7 +532,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
 
 	id = get_id_from_freelist(rinfo);
 	rinfo->shadow[id].request = req;
-	rinfo->shadow[id].status = REQ_WAITING;
+	rinfo->shadow[id].status = REQ_PROCESSING;
 	rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
 
 	rinfo->shadow[id].req.u.rw.id = id;
@@ -559,6 +561,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 
 	/* Copy the request to the ring page. */
 	*final_ring_req = *ring_req;
+	rinfo->shadow[id].status = REQ_WAITING;
 
 	return 0;
 }
@@ -834,8 +837,11 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 
 	/* Copy request(s) to the ring page. */
 	*final_ring_req = *ring_req;
-	if (unlikely(require_extra_req))
+	rinfo->shadow[id].status = REQ_WAITING;
+	if (unlikely(require_extra_req)) {
 		*final_extra_ring_req = *extra_ring_req;
+		rinfo->shadow[extra_id].status = REQ_WAITING;
+	}
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
@@ -1359,8 +1365,8 @@ static enum blk_req_status blkif_rsp_to_req_status(int rsp)
 static int blkif_get_final_status(enum blk_req_status s1,
 				  enum blk_req_status s2)
 {
-	BUG_ON(s1 == REQ_WAITING);
-	BUG_ON(s2 == REQ_WAITING);
+	BUG_ON(s1 < REQ_DONE);
+	BUG_ON(s2 < REQ_DONE);
 
 	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
 		return BLKIF_RSP_ERROR;
@@ -1393,7 +1399,7 @@ static bool blkif_completion(unsigned long *id,
 		s->status = blkif_rsp_to_req_status(bret->status);
 
 		/* Wait the second response if not yet here. */
-		if (s2->status == REQ_WAITING)
+		if (s2->status < REQ_DONE)
 			return false;
 
 		bret->status = blkif_get_final_status(s->status,
@@ -1512,11 +1518,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 	spin_lock_irqsave(&rinfo->ring_lock, flags);
  again:
-	rp = rinfo->ring.sring->rsp_prod;
-	rmb(); /* Ensure we see queued responses up to 'rp'. */
+	rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
+	virt_rmb(); /* Ensure we see queued responses up to 'rp'. */
+	if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
+		pr_alert("%s: illegal number of responses %u\n",
+			 info->gd->disk_name, rp - rinfo->ring.rsp_cons);
+		goto err;
+	}
 
 	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
+		unsigned int op;
 
 		RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
 		id = bret.id;
@@ -1527,14 +1539,28 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		 * look in get_id_from_freelist.
 		 */
 		if (id >= BLK_RING_SIZE(info)) {
-			WARN(1, "%s: response to %s has incorrect id (%ld)\n",
-			     info->gd->disk_name, op_name(bret.operation), id);
-			/* We can't safely get the 'struct request' as
-			 * the id is busted. */
-			continue;
+			pr_alert("%s: response has incorrect id (%ld)\n",
+				 info->gd->disk_name, id);
+			goto err;
+		}
+		if (rinfo->shadow[id].status != REQ_WAITING) {
+			pr_alert("%s: response references no pending request\n",
+				 info->gd->disk_name);
+			goto err;
 		}
+
+		rinfo->shadow[id].status = REQ_PROCESSING;
 		req  = rinfo->shadow[id].request;
 
+		op = rinfo->shadow[id].req.operation;
+		if (op == BLKIF_OP_INDIRECT)
+			op = rinfo->shadow[id].req.u.indirect.indirect_op;
+		if (bret.operation != op) {
+			pr_alert("%s: response has wrong operation (%u instead of %u)\n",
+				 info->gd->disk_name, bret.operation, op);
+			goto err;
+		}
+
 		if (bret.operation != BLKIF_OP_DISCARD) {
 			/*
 			 * We may need to wait for an extra response if the
@@ -1559,7 +1585,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		case BLKIF_OP_DISCARD:
 			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
 				struct request_queue *rq = info->rq;
-				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+
+				pr_warn_ratelimited("blkfront: %s: %s op failed\n",
 					   info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 				info->feature_discard = 0;
@@ -1571,13 +1598,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
-				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+				pr_warn_ratelimited("blkfront: %s: %s op failed\n",
 				       info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 			}
 			if (unlikely(bret.status == BLKIF_RSP_ERROR &&
 				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
-				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
+				pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
 				       info->gd->disk_name, op_name(bret.operation));
 				blkif_req(req)->error = BLK_STS_NOTSUPP;
 			}
@@ -1592,8 +1619,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		case BLKIF_OP_READ:
 		case BLKIF_OP_WRITE:
 			if (unlikely(bret.status != BLKIF_RSP_OKAY))
-				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
-					"request: %x\n", bret.status);
+				dev_dbg_ratelimited(&info->xbdev->dev,
+					"Bad return from blkdev data request: %#x\n",
+					bret.status);
 
 			break;
 		default:
@@ -1619,6 +1647,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 
 	return IRQ_HANDLED;
+
+ err:
+	info->connected = BLKIF_STATE_ERROR;
+
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+
+	pr_alert("%s disabled for further use\n", info->gd->disk_name);
+	return IRQ_HANDLED;
 }
 
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once
  2021-07-30 10:38 ` [PATCH v3 1/3] xen/blkfront: read response from backend only once Juergen Gross
@ 2021-08-02 14:06   ` Oleksandr Andrushchenko
  2021-08-02 19:26     ` Julien Grall
  0 siblings, 1 reply; 10+ messages in thread
From: Oleksandr Andrushchenko @ 2021-08-02 14:06 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-block, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Jens Axboe, Jan Beulich

Hi, Juergen!

On 30.07.21 13:38, Juergen Gross wrote:
> In order to avoid problems in case the backend is modifying a response
> on the ring page while the frontend has already seen it, just read the
> response into a local buffer in one go and then operate on that buffer
> only.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>   drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
>   1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index d83fee21f6c5..15e840287734 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
>   static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>   {
>   	struct request *req;
> -	struct blkif_response *bret;
> +	struct blkif_response bret;
>   	RING_IDX i, rp;
>   	unsigned long flags;
>   	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>   	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
>   		unsigned long id;
>   
> -		bret = RING_GET_RESPONSE(&rinfo->ring, i);
> -		id   = bret->id;
> +		RING_COPY_RESPONSE(&rinfo->ring, i, &bret);

As per my understanding copying is still not an atomic operation as the request/response

are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while

we are copying?

Thanks,

Oleksandr

> +		id = bret.id;
> +
>   		/*
>   		 * The backend has messed up and given us an id that we would
>   		 * never have given to it (we stamp it up to BLK_RING_SIZE -
> @@ -1522,39 +1523,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>   		 */
>   		if (id >= BLK_RING_SIZE(info)) {
>   			WARN(1, "%s: response to %s has incorrect id (%ld)\n",
> -			     info->gd->disk_name, op_name(bret->operation), id);
> +			     info->gd->disk_name, op_name(bret.operation), id);
>   			/* We can't safely get the 'struct request' as
>   			 * the id is busted. */
>   			continue;
>   		}
>   		req  = rinfo->shadow[id].request;
>   
> -		if (bret->operation != BLKIF_OP_DISCARD) {
> +		if (bret.operation != BLKIF_OP_DISCARD) {
>   			/*
>   			 * We may need to wait for an extra response if the
>   			 * I/O request is split in 2
>   			 */
> -			if (!blkif_completion(&id, rinfo, bret))
> +			if (!blkif_completion(&id, rinfo, &bret))
>   				continue;
>   		}
>   
>   		if (add_id_to_freelist(rinfo, id)) {
>   			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
> -			     info->gd->disk_name, op_name(bret->operation), id);
> +			     info->gd->disk_name, op_name(bret.operation), id);
>   			continue;
>   		}
>   
> -		if (bret->status == BLKIF_RSP_OKAY)
> +		if (bret.status == BLKIF_RSP_OKAY)
>   			blkif_req(req)->error = BLK_STS_OK;
>   		else
>   			blkif_req(req)->error = BLK_STS_IOERR;
>   
> -		switch (bret->operation) {
> +		switch (bret.operation) {
>   		case BLKIF_OP_DISCARD:
> -			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> +			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
>   				struct request_queue *rq = info->rq;
>   				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> -					   info->gd->disk_name, op_name(bret->operation));
> +					   info->gd->disk_name, op_name(bret.operation));
>   				blkif_req(req)->error = BLK_STS_NOTSUPP;
>   				info->feature_discard = 0;
>   				info->feature_secdiscard = 0;
> @@ -1564,15 +1565,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>   			break;
>   		case BLKIF_OP_FLUSH_DISKCACHE:
>   		case BLKIF_OP_WRITE_BARRIER:
> -			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> +			if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
>   				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> -				       info->gd->disk_name, op_name(bret->operation));
> +				       info->gd->disk_name, op_name(bret.operation));
>   				blkif_req(req)->error = BLK_STS_NOTSUPP;
>   			}
> -			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
> +			if (unlikely(bret.status == BLKIF_RSP_ERROR &&
>   				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
>   				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
> -				       info->gd->disk_name, op_name(bret->operation));
> +				       info->gd->disk_name, op_name(bret.operation));
>   				blkif_req(req)->error = BLK_STS_NOTSUPP;
>   			}
>   			if (unlikely(blkif_req(req)->error)) {
> @@ -1585,9 +1586,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>   			fallthrough;
>   		case BLKIF_OP_READ:
>   		case BLKIF_OP_WRITE:
> -			if (unlikely(bret->status != BLKIF_RSP_OKAY))
> +			if (unlikely(bret.status != BLKIF_RSP_OKAY))
>   				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
> -					"request: %x\n", bret->status);
> +					"request: %x\n", bret.status);
>   
>   			break;
>   		default:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once
  2021-08-02 14:06   ` Oleksandr Andrushchenko
@ 2021-08-02 19:26     ` Julien Grall
  2021-08-03  7:00       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-08-02 19:26 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Juergen Gross, xen-devel, linux-block,
	linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Jens Axboe, Jan Beulich

Hi,

On 02/08/2021 15:06, Oleksandr Andrushchenko wrote:
> On 30.07.21 13:38, Juergen Gross wrote:
>> In order to avoid problems in case the backend is modifying a response
>> on the ring page while the frontend has already seen it, just read the
>> response into a local buffer in one go and then operate on that buffer
>> only.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>>    drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
>>    1 file changed, 18 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index d83fee21f6c5..15e840287734 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
>>    static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>    {
>>    	struct request *req;
>> -	struct blkif_response *bret;
>> +	struct blkif_response bret;
>>    	RING_IDX i, rp;
>>    	unsigned long flags;
>>    	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>    	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
>>    		unsigned long id;
>>    
>> -		bret = RING_GET_RESPONSE(&rinfo->ring, i);
>> -		id   = bret->id;
>> +		RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
> 
> As per my understanding copying is still not an atomic operation as the request/response
> 
> are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while
> 
> we are copying?

Nothing and, I believe, you are never going to be able to ensure 
atomicity with large structure (at least between entity that doesn't 
trust each other).

However, what you can do is copying the response once, check that it is 
consistent and then use it. If it is not consistent, then you can report 
an error.

This is better than what's currently in tree. IOW we may have multiple 
read so the code is prone to TOCTOU.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once
  2021-08-02 19:26     ` Julien Grall
@ 2021-08-03  7:00       ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 10+ messages in thread
From: Oleksandr Andrushchenko @ 2021-08-03  7:00 UTC (permalink / raw)
  To: Julien Grall, Juergen Gross, xen-devel, linux-block, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Jens Axboe, Jan Beulich


On 02.08.21 22:26, Julien Grall wrote:
> Hi,
>
> On 02/08/2021 15:06, Oleksandr Andrushchenko wrote:
>> On 30.07.21 13:38, Juergen Gross wrote:
>>> In order to avoid problems in case the backend is modifying a response
>>> on the ring page while the frontend has already seen it, just read the
>>> response into a local buffer in one go and then operate on that buffer
>>> only.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>>    drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
>>>    1 file changed, 18 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index d83fee21f6c5..15e840287734 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
>>>    static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>    {
>>>        struct request *req;
>>> -    struct blkif_response *bret;
>>> +    struct blkif_response bret;
>>>        RING_IDX i, rp;
>>>        unsigned long flags;
>>>        struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>>> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>        for (i = rinfo->ring.rsp_cons; i != rp; i++) {
>>>            unsigned long id;
>>>    -        bret = RING_GET_RESPONSE(&rinfo->ring, i);
>>> -        id   = bret->id;
>>> +        RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
>>
>> As per my understanding copying is still not an atomic operation as the request/response
>>
>> are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while
>>
>> we are copying?
>
> Nothing and, I believe, you are never going to be able to ensure atomicity with large structure (at least between entity that doesn't trust each other).
>
> However, what you can do is copying the response once, check that it is consistent and then use it. If it is not consistent, then you can report an error.
>
> This is better than what's currently in tree. IOW we may have multiple read so the code is prone to TOCTOU.

Agree,

Thanks

>
> Cheers,
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/3] xen: harden blkfront against malicious backends
  2021-07-30 10:38 [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
                   ` (2 preceding siblings ...)
  2021-07-30 10:38 ` [PATCH v3 3/3] xen/blkfront: don't trust the backend response data blindly Juergen Gross
@ 2021-08-30 10:16 ` Juergen Gross
  3 siblings, 0 replies; 10+ messages in thread
From: Juergen Gross @ 2021-08-30 10:16 UTC (permalink / raw)
  To: xen-devel, linux-block, linux-kernel
  Cc: Boris Ostrovsky, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Jens Axboe


[-- Attachment #1.1.1: Type: text/plain, Size: 1731 bytes --]

On 30.07.21 12:38, Juergen Gross wrote:
> Xen backends of para-virtualized devices can live in dom0 kernel, dom0
> user land, or in a driver domain. This means that a backend might
> reside in a less trusted environment than the Xen core components, so
> a backend should not be able to do harm to a Xen guest (it can still
> mess up I/O data, but it shouldn't be able to e.g. crash a guest by
> other means or cause a privilege escalation in the guest).
> 
> Unfortunately blkfront in the Linux kernel is fully trusting its
> backend. This series is fixing blkfront in this regard.
> 
> It was discussed to handle this as a security problem, but the topic
> was discussed in public before, so it isn't a real secret.
> 
> It should be mentioned that a similar series has been posted some years
> ago by Marek Marczykowski-Górecki, but this series has not been applied
> due to a Xen header not having been available in the Xen git repo at
> that time. Additionally my series is fixing some more DoS cases.
> 
> Changes in V3:
> - patch 3: insert missing unlock in error case (kernel test robot)
> - patch 3: use %#x as format for printing wrong operation value
>    (Roger Pau Monné)
> 
> Changes in V2:
> - put blkfront patches into own series
> - some minor comments addressed
> 
> Juergen Gross (3):
>    xen/blkfront: read response from backend only once
>    xen/blkfront: don't take local copy of a request from the ring page
>    xen/blkfront: don't trust the backend response data blindly
> 
>   drivers/block/xen-blkfront.c | 126 +++++++++++++++++++++++------------
>   1 file changed, 84 insertions(+), 42 deletions(-)
> 

Series pushed to xen/tip.git for-linus-5.15


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page
  2021-07-30 10:38 ` [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page Juergen Gross
@ 2021-09-10 10:14   ` Marek Marczykowski-Górecki
  2021-09-10 10:36     ` Juergen Gross
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-09-10 10:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-block, linux-kernel, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Boris Ostrovsky, Stefano Stabellini, Jens Axboe, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 1681 bytes --]

On Fri, Jul 30, 2021 at 12:38:53PM +0200, Juergen Gross wrote:
> In order to avoid a malicious backend being able to influence the local
> copy of a request build the request locally first and then copy it to
> the ring page instead of doing it the other way round as today.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> V2:
> - init variable to avoid potential compiler warning (Jan Beulich)
> ---
>  drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 15e840287734..b7301006fb28 100644

(...)

> @@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  	if (setup.segments)
>  		kunmap_atomic(setup.segments);
>  
> -	/* Keep a private copy so we can reissue requests when recovering. */
> -	rinfo->shadow[id].req = *ring_req;
> +	/* Copy request(s) to the ring page. */
> +	*final_ring_req = *ring_req;

Is this guaranteed to not be optimized by the compiler in an unsafe way
(like, do the operation the other way around)?
My version of the patch had "wmb()" just before, maybe a good idea to
add it here too?

>  	if (unlikely(require_extra_req))
> -		rinfo->shadow[extra_id].req = *extra_ring_req;
> +		*final_extra_ring_req = *extra_ring_req;
>  
>  	if (new_persistent_gnts)
>  		gnttab_free_grant_references(setup.gref_head);
> -- 
> 2.26.2
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page
  2021-09-10 10:14   ` Marek Marczykowski-Górecki
@ 2021-09-10 10:36     ` Juergen Gross
  0 siblings, 0 replies; 10+ messages in thread
From: Juergen Gross @ 2021-09-10 10:36 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, linux-block, linux-kernel, Konrad Rzeszutek Wilk,
	Roger Pau Monné,
	Boris Ostrovsky, Stefano Stabellini, Jens Axboe, Jan Beulich


[-- Attachment #1.1.1: Type: text/plain, Size: 1589 bytes --]

On 10.09.21 12:14, Marek Marczykowski-Górecki wrote:
> On Fri, Jul 30, 2021 at 12:38:53PM +0200, Juergen Gross wrote:
>> In order to avoid a malicious backend being able to influence the local
>> copy of a request build the request locally first and then copy it to
>> the ring page instead of doing it the other way round as today.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> V2:
>> - init variable to avoid potential compiler warning (Jan Beulich)
>> ---
>>   drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
>>   1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 15e840287734..b7301006fb28 100644
> 
> (...)
> 
>> @@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>>   	if (setup.segments)
>>   		kunmap_atomic(setup.segments);
>>   
>> -	/* Keep a private copy so we can reissue requests when recovering. */
>> -	rinfo->shadow[id].req = *ring_req;
>> +	/* Copy request(s) to the ring page. */
>> +	*final_ring_req = *ring_req;
> 
> Is this guaranteed to not be optimized by the compiler in an unsafe way
> (like, do the operation the other way around)?

I don't think the C standard allows that. AFAIK reordering writes is
allowed only between sequence points. And each external function call is
a sequence point, making such an optimization in our case illegal.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-10 10:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30 10:38 [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross
2021-07-30 10:38 ` [PATCH v3 1/3] xen/blkfront: read response from backend only once Juergen Gross
2021-08-02 14:06   ` Oleksandr Andrushchenko
2021-08-02 19:26     ` Julien Grall
2021-08-03  7:00       ` Oleksandr Andrushchenko
2021-07-30 10:38 ` [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page Juergen Gross
2021-09-10 10:14   ` Marek Marczykowski-Górecki
2021-09-10 10:36     ` Juergen Gross
2021-07-30 10:38 ` [PATCH v3 3/3] xen/blkfront: don't trust the backend response data blindly Juergen Gross
2021-08-30 10:16 ` [PATCH v3 0/3] xen: harden blkfront against malicious backends Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).