From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tariq Toukan via iovisor-dev
Subject: Re: [PATCH RFC 03/11] net/mlx5e: Implement RX mapped page cache for page recycle
Date: Tue, 13 Sep 2016 13:16:29 +0300
Message-ID: <549ee0e2-b76b-ec62-4287-e63c4320e7c6@mellanox.com>
References: <1473252152-11379-1-git-send-email-saeedm@mellanox.com>
 <1473252152-11379-4-git-send-email-saeedm@mellanox.com>
 <20160907204501.08cc4ede@redhat.com>
Reply-To: Tariq Toukan
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, iovisor-dev,
 Jamal Hadi Salim, Eric Dumazet, Tom Herbert
To: Jesper Dangaard Brouer, Saeed Mahameed
In-Reply-To: <20160907204501.08cc4ede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sender: iovisor-dev-bounces-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
Errors-To: iovisor-dev-bounces-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
List-Id: netdev.vger.kernel.org

On 07/09/2016 9:45 PM, Jesper Dangaard Brouer wrote:
> On Wed, 7 Sep 2016 15:42:24 +0300 Saeed Mahameed wrote:
>
>> From: Tariq Toukan
>>
>> Instead of reallocating and mapping pages for RX data-path,
>> recycle already used pages in a per ring cache.
>>
>> We ran pktgen single-stream benchmarks, with iptables-raw-drop:
>>
>> Single stride, 64 bytes:
>> * 4,739,057 - baseline
>> * 4,749,550 - order0 no cache
>> * 4,786,899 - order0 with cache
>> 1% gain
>>
>> Larger packets, no page cross, 1024 bytes:
>> * 3,982,361 - baseline
>> * 3,845,682 - order0 no cache
>> * 4,127,852 - order0 with cache
>> 3.7% gain
>>
>> Larger packets, every 3rd packet crosses a page, 1500 bytes:
>> * 3,731,189 - baseline
>> * 3,579,414 - order0 no cache
>> * 3,931,708 - order0 with cache
>> 5.4% gain
>>
>> Signed-off-by: Tariq Toukan
>> Signed-off-by: Saeed Mahameed
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/en.h       | 16 ++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 15 ++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 57 ++++++++++++++++++++--
>>  drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 16 ++++++
>>  4 files changed, 99 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> index 075cdfc..afbdf70 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> @@ -287,6 +287,18 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
>>  	u8 tired;
>>  };
>>
>> +/* a single cache unit is capable to serve one napi call (for non-striding rq)
>> + * or a MPWQE (for striding rq).
>> + */
>> +#define MLX5E_CACHE_UNIT	(MLX5_MPWRQ_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? \
>> +				 MLX5_MPWRQ_PAGES_PER_WQE : NAPI_POLL_WEIGHT)
>> +#define MLX5E_CACHE_SIZE	(2 * roundup_pow_of_two(MLX5E_CACHE_UNIT))
>> +struct mlx5e_page_cache {
>> +	u32 head;
>> +	u32 tail;
>> +	struct mlx5e_dma_info page_cache[MLX5E_CACHE_SIZE];
>> +};
>> +
> [...]
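As a side note on the sizing above: the power-of-two MLX5E_CACHE_SIZE is
what lets the head/tail indices in the en_rx.c hunk below wrap with a
simple mask. Here is a small standalone model of that arithmetic, not
driver code: NAPI_POLL_WEIGHT is 64 in mainline, the pages-per-WQE value
is only an assumed example, and this roundup_pow_of_two is a local
stand-in for the kernel helper.

#include <stdio.h>

/* local stand-in for the kernel's roundup_pow_of_two() */
static unsigned int roundup_pow_of_two(unsigned int x)
{
	unsigned int p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned int napi_poll_weight = 64;    /* NAPI_POLL_WEIGHT in mainline */
	unsigned int mpwrq_pages_per_wqe = 32; /* assumed example value only */
	unsigned int unit = mpwrq_pages_per_wqe > napi_poll_weight ?
			    mpwrq_pages_per_wqe : napi_poll_weight;
	unsigned int size = 2 * roundup_pow_of_two(unit);

	/* size is a power of two, so ring indices can wrap with "& (size - 1)" */
	printf("cache unit %u, cache size %u, index mask 0x%x\n",
	       unit, size, size - 1);
	return 0;
}

With these example values the cache holds 128 entries and the index mask
is 0x7f, which is the "& (MLX5E_CACHE_SIZE - 1)" pattern used below.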
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index c1cb510..8e02af3 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> @@ -305,11 +305,55 @@ static inline void mlx5e_post_umr_wqe(struct mlx5e_rq *rq, u16 ix)
>>  	mlx5e_tx_notify_hw(sq, &wqe->ctrl, 0);
>>  }
>>
>> +static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
>> +				      struct mlx5e_dma_info *dma_info)
>> +{
>> +	struct mlx5e_page_cache *cache = &rq->page_cache;
>> +	u32 tail_next = (cache->tail + 1) & (MLX5E_CACHE_SIZE - 1);
>> +
>> +	if (tail_next == cache->head) {
>> +		rq->stats.cache_full++;
>> +		return false;
>> +	}
>> +
>> +	cache->page_cache[cache->tail] = *dma_info;
>> +	cache->tail = tail_next;
>> +	return true;
>> +}
>> +
>> +static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq,
>> +				      struct mlx5e_dma_info *dma_info)
>> +{
>> +	struct mlx5e_page_cache *cache = &rq->page_cache;
>> +
>> +	if (unlikely(cache->head == cache->tail)) {
>> +		rq->stats.cache_empty++;
>> +		return false;
>> +	}
>> +
>> +	if (page_ref_count(cache->page_cache[cache->head].page) != 1) {
>> +		rq->stats.cache_busy++;
>> +		return false;
>> +	}
> Hmmm... doesn't this cause "blocking" of the page_cache recycle
> facility until the page at the head of the queue gets (page) refcnt
> decremented? Real use-case could fairly easily block/cause this...

Hi Jesper,

That's right. We are aware of this issue.
We considered ways of solving it, but decided to keep the current
implementation for now.

One way of solving this is to look deeper into the cache.
Cons:
- This will consume time, and the chance of finding an available page is
  not that high: if the page at the head of the queue is busy, there is a
  good chance that all the others are too (because of FIFO). In other
  words, you have already checked all the pages and you are going to
  allocate a new one anyway (a higher penalty for the same decision).
- This will leave holes in the array, causing complex accounting when
  looking for an available page (this can easily be fixed by swapping the
  page at the head with the available one; a rough sketch of this idea
  appears after the quoted hunk below).

Another way is sharing pages between different RQs.
- For now we are not doing this, for simplicity and to avoid the extra
  synchronization.

What do you think?

Anyway, we are looking forward to using your page-pool API, which solves
these issues.

Regards,
Tariq

>
>> +
>> +	*dma_info = cache->page_cache[cache->head];
>> +	cache->head = (cache->head + 1) & (MLX5E_CACHE_SIZE - 1);
>> +	rq->stats.cache_reuse++;
>> +
>> +	dma_sync_single_for_device(rq->pdev, dma_info->addr, PAGE_SIZE,
>> +				   DMA_FROM_DEVICE);
>> +	return true;
>> +}
>> +
>>  static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
>>  					  struct mlx5e_dma_info *dma_info)
>>  {
>> -	struct page *page = dev_alloc_page();
>> +	struct page *page;
>> +
>> +	if (mlx5e_rx_cache_get(rq, dma_info))
>> +		return 0;
>>
>> +	page = dev_alloc_page();
>>  	if (unlikely(!page))
>>  		return -ENOMEM;
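To make the "look deeper and swap with the head" alternative discussed
above concrete, here is a rough sketch, not part of the patch: it reuses
the names from the quoted code, and the MLX5E_CACHE_SCAN_DEPTH bound is
made up for illustration.

/* Rough sketch only, not part of the patch: scan a few entries past the
 * head and swap a free one into the head slot so no hole is left.
 * MLX5E_CACHE_SCAN_DEPTH is a made-up bound for illustration.
 */
#define MLX5E_CACHE_SCAN_DEPTH 4

static inline bool mlx5e_rx_cache_get_deep(struct mlx5e_rq *rq,
					   struct mlx5e_dma_info *dma_info)
{
	struct mlx5e_page_cache *cache = &rq->page_cache;
	u32 avail = (cache->tail - cache->head) & (MLX5E_CACHE_SIZE - 1);
	u32 depth = min_t(u32, avail, MLX5E_CACHE_SCAN_DEPTH);
	u32 i;

	if (unlikely(!avail)) {
		rq->stats.cache_empty++;
		return false;
	}

	for (i = 0; i < depth; i++) {
		u32 idx = (cache->head + i) & (MLX5E_CACHE_SIZE - 1);

		if (page_ref_count(cache->page_cache[idx].page) != 1)
			continue;

		/* bring the free entry to the head, then consume the head
		 * exactly as the original mlx5e_rx_cache_get() does
		 */
		swap(cache->page_cache[idx], cache->page_cache[cache->head]);

		*dma_info = cache->page_cache[cache->head];
		cache->head = (cache->head + 1) & (MLX5E_CACHE_SIZE - 1);
		rq->stats.cache_reuse++;

		dma_sync_single_for_device(rq->pdev, dma_info->addr,
					   PAGE_SIZE, DMA_FROM_DEVICE);
		return true;
	}

	rq->stats.cache_busy++;
	return false;
}

The scan bound keeps the worst-case cost small, which addresses the first
con above (checking many pages and then allocating anyway), while the swap
avoids leaving holes in the array.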