From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-eopbgr10079.outbound.protection.outlook.com
	([40.107.1.79]:48608 "EHLO EUR02-HE1-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1751288AbeCEKJ6 (ORCPT );
	Mon, 5 Mar 2018 05:09:58 -0500
Subject: Re: [PATCH] net/mlx4_en: fix potential use-after-free with
 dma_unmap_page
To: Sarah Newman , Tariq Toukan , Yishai Hadas 
Cc: netdev@vger.kernel.org
References: <1520223607-17560-1-git-send-email-srn@prgmr.com>
From: Tariq Toukan 
Message-ID: <1cee1792-52ae-f408-15f0-ee42c1fb2ea3@mellanox.com>
Date: Mon, 5 Mar 2018 12:09:48 +0200
MIME-Version: 1.0
In-Reply-To: <1520223607-17560-1-git-send-email-srn@prgmr.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: netdev-owner@vger.kernel.org
List-ID: 

On 05/03/2018 6:20 AM, Sarah Newman wrote:
> Take an additional reference to a page whenever it is placed
> into the rx ring and put the page again after running
> dma_unmap_page.
> 
> When swiotlb is in use, calling dma_unmap_page means that
> the original page mapped with dma_map_page must still be valid,
> as swiotlb will copy data from its internal cache back to the
> originally requested DMA location.
> 
> When GRO is enabled, before this patch all references to the
> original frag may be put and the page freed before dma_unmap_page
> in mlx4_en_free_frag is called.
> 
> It is possible there is a path where the use-after-free occurs
> even with GRO disabled, but this has not been observed so far.
> 
> The bug can be trivially detected by doing the following:
> 
> * Compile the kernel with DEBUG_PAGEALLOC
> * Run the kernel as a Xen Dom0
> * Leave GRO enabled on the interface
> * Run a 10 second or more test with iperf over the interface.
> 

Hi Sarah, thanks for your patch!
> This bug was likely introduced in
> commit 4cce66cdd14a ("mlx4_en: map entire pages to increase throughput"),
> first part of v3.6.
> 
> It was incidentally fixed in
> commit 34db548bfb95 ("mlx4: add page recycling in receive path"),
> first part of v4.12.
> 
> This version applies to the v4.9 series.
> 
> Signed-off-by: Sarah Newman 
> Tested-by: Sarah Newman 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 39 +++++++++++++++++++++-------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  3 ++-
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  1 +
>  3 files changed, 32 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index bcbb80f..d1fb087 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -80,10 +80,14 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
>  	page_alloc->page = page;
>  	page_alloc->dma = dma;
>  	page_alloc->page_offset = 0;
> +	page_alloc->page_owner = true;

Do we really need this boolean? I believe the issue can be fixed without it.

We need to make sure we hold the correct refcnt at every stage, and 
maintain symmetry between a flow and its inverse.

Upon alloc, refcnt is 1. This alloc refcnt should be inverted by a call 
to put_page.
We might want to introduce a page free API (symmetric to 
mlx4_alloc_pages) that does the following: dma-unmap the page, call 
put_page, and nullify the pointer.

Once allocated, the page refcnt is bumped by the number of possible 
frags populating it, which is (page_size / frag_stride), as you do here.

>  	/* Not doing get_page() for each frag is a big win
>  	 * on asymetric workloads. Note we can not use atomic_set().
>  	 */
> -	page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1);
> +	/* Since the page must be valid until after dma_unmap_page is called,
> +	 * take an additional reference we would not have otherwise.
> +	 */
> +	page_ref_add(page, page_alloc->page_size / frag_info->frag_stride);
>  	return 0;
>  }
>  
> @@ -105,9 +109,13 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>  		page_alloc[i].page_offset += frag_info->frag_stride;
>  
>  		if (page_alloc[i].page_offset + frag_info->frag_stride <=
> -		    ring_alloc[i].page_size)
> -			continue;
> -
> +		    ring_alloc[i].page_size) {
> +			WARN_ON(!page_alloc[i].page);
> +			WARN_ON(!page_alloc[i].page_owner);

Why WARN before the likely() check? Move it after the check, for better 
performance.

> +			if (likely(page_alloc[i].page &&
> +				   page_alloc[i].page_owner))
> +				continue;
> +		}
>  		if (unlikely(mlx4_alloc_pages(priv, &page_alloc[i],
>  					      frag_info, gfp)))
>  			goto out;
> @@ -131,7 +139,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>  		page = page_alloc[i].page;
>  		/* Revert changes done by mlx4_alloc_pages */
>  		page_ref_sub(page, page_alloc[i].page_size /
> -				   priv->frag_info[i].frag_stride - 1);
> +				   priv->frag_info[i].frag_stride);
>  		put_page(page);
>  	}
>  }
> @@ -146,11 +154,13 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
>  	u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride;
>  
>  
> -	if (next_frag_end > frags[i].page_size)
> +	if (next_frag_end > frags[i].page_size) {
>  		dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
>  			       frag_info->dma_dir);
> +		put_page(frags[i].page);
> +	}
>  
> -	if (frags[i].page)
> +	if (frags[i].page_owner)
>  		put_page(frags[i].page);
>  }
>  
> @@ -184,9 +194,10 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
>  		page = page_alloc->page;
>  		/* Revert changes done by mlx4_alloc_pages */
>  		page_ref_sub(page, page_alloc->page_size /
> -				   priv->frag_info[i].frag_stride - 1);
> +				   priv->frag_info[i].frag_stride);
>  		put_page(page);
>  		page_alloc->page = NULL;
> +		page_alloc->page_owner = false;
>  	}
>  	return -ENOMEM;
>  }
> @@ -206,12 +217,14 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
>  
>  		dma_unmap_page(priv->ddev, page_alloc->dma,
> 			       page_alloc->page_size, frag_info->dma_dir);
> +		put_page(page_alloc->page);

For symmetry, I'd move this after the while loop.

>  		while (page_alloc->page_offset + frag_info->frag_stride <
>  		       page_alloc->page_size) {
>  			put_page(page_alloc->page);
>  			page_alloc->page_offset += frag_info->frag_stride;
>  		}
>  		page_alloc->page = NULL;
> +		page_alloc->page_owner = false;
>  	}
>  }
>  
> @@ -251,6 +264,11 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
>  	if (ring->page_cache.index > 0) {
>  		frags[0] = ring->page_cache.buf[--ring->page_cache.index];
>  		rx_desc->data[0].addr = cpu_to_be64(frags[0].dma);
> +		WARN_ON(frags[0].page_owner);
> +		if (likely(!frags[0].page_owner)) {
> +			page_ref_inc(frags[0].page);
> +			frags[0].page_owner = true;
> +		}

Why?
If I'm not mistaken, the page is cached with refcnt == 2. No?

>  		return 0;
>  	}
>  
> @@ -569,6 +587,7 @@ void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
>  
>  		dma_unmap_page(priv->ddev, frame->dma, frame->page_size,
>  			       priv->frag_info[0].dma_dir);
> +		WARN_ON(frame->page_owner);
>  		put_page(frame->page);
>  	}
>  	ring->page_cache.index = 0;
> @@ -595,7 +614,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>  		frag_info = &priv->frag_info[nr];
>  		if (length <= frag_info->frag_prefix_size)
>  			break;
> -		if (unlikely(!frags[nr].page))
> +		if (unlikely(!frags[nr].page_owner))
>  			goto fail;
>  
>  		dma = be64_to_cpu(rx_desc->data[nr].addr);
> @@ -607,7 +626,7 @@
>  		skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
>  		skb_frags_rx[nr].page_offset = frags[nr].page_offset;
>  		skb->truesize += frag_info->frag_stride;
> -		frags[nr].page = NULL;
> +		frags[nr].page_owner = false;
>  	}
>  	/* Adjust size of last fragment to match actual length */
>  	if (nr > 0)
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index e2509bb..25f7f9e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -356,6 +356,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
>  		.dma = tx_info->map0_dma,
>  		.page_offset = 0,
>  		.page_size = PAGE_SIZE,
> +		.page_owner = false,

I don't understand why this is needed.

>  	};
>  
>  	if (!mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
> @@ -1128,7 +1129,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_alloc *frame,
>  	dma = frame->dma;
>  
>  	tx_info->page = frame->page;
> -	frame->page = NULL;
> +	frame->page_owner = false;
>  	tx_info->map0_dma = dma;
>  	tx_info->map0_byte_count = length;
>  	tx_info->nr_txbb = nr_txbb;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index df0f396..2c9d9a6 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -261,6 +261,7 @@ struct mlx4_en_rx_alloc {
>  	dma_addr_t	dma;
>  	u32		page_offset;
>  	u32		page_size;
> +	bool		page_owner;
>  };
>  
>  #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
> 

Thanks,
Tariq
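P.S. To make the suggestion above concrete, the page free API symmetric to 
mlx4_alloc_pages() could look roughly like this against the v4.9 structures 
quoted in the patch. This is only a sketch; the helper name 
mlx4_en_free_page is made up and not part of the driver:

```c
/* Hypothetical inverse of mlx4_alloc_pages(): dma-unmap the page, drop
 * the allocation reference, and nullify the pointer. References taken
 * per frag are dropped separately by whoever consumed the frags. */
static void mlx4_en_free_page(struct mlx4_en_priv *priv,
			      struct mlx4_en_rx_alloc *page_alloc,
			      const struct mlx4_en_frag_info *frag_info)
{
	dma_unmap_page(priv->ddev, page_alloc->dma,
		       page_alloc->page_size, frag_info->dma_dir);
	put_page(page_alloc->page);
	page_alloc->page = NULL;
}
```

With such a helper, mlx4_en_free_frag() and mlx4_en_destroy_allocator() 
could share one unmap-and-release path instead of open-coding it.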