From: Thorsten Leemhuis <regressions@leemhuis.info>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, Alaa Hleihel <alaa@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Sasha Levin <sashal@kernel.org>
Subject: Re: [PATCH 5.15 04/42] RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
Date: Sat, 1 Jan 2022 11:56:26 +0100
Message-ID: <bbb587b1-4555-ba8d-fe43-d56d41a3c652@leemhuis.info>
In-Reply-To: <20211215172026.789963312@linuxfoundation.org>

Hi, this is your Linux kernel regression tracker speaking.

On 15.12.21 18:20, Greg Kroah-Hartman wrote:
> From: Alaa Hleihel <alaa@nvidia.com>
> 
> [ Upstream commit f0ae4afe3d35e67db042c58a52909e06262b740f ]
> 
> For the case of IB_MR_TYPE_DM the mr doesn't have a umem, even though
> it is a user MR. This causes function mlx5_free_priv_descs() to think that
> it is a kernel MR, leading to wrongly accessing mr->descs, which will hold
> wrong values from the union. That in turn leads to an attempt to release
> resources that were not allocated in the first place.

TWIMC, that commit made it into 5.15.y, but is known to cause a
regression in v5.16-rc:

https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.1640079962.git.leonro@nvidia.com/
https://lore.kernel.org/all/EEBA2D1C-F29C-4237-901C-587B60CEE113@oracle.com/

A fix for mainline was posted, but got stuck afaics:
https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.1640079962.git.leonro@nvidia.com/

A revert was also discussed, but not performed:
https://lore.kernel.org/all/20211222101312.1358616-1-maorg@nvidia.com/

Ciao, Thorsten

> For example:
>  DMA-API: mlx5_core 0000:08:00.1: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes]
>  WARNING: CPU: 8 PID: 1021 at kernel/dma/debug.c:961 check_unmap+0x54f/0x8b0
>  RIP: 0010:check_unmap+0x54f/0x8b0
>  Call Trace:
>   debug_dma_unmap_page+0x57/0x60
>   mlx5_free_priv_descs+0x57/0x70 [mlx5_ib]
>   mlx5_ib_dereg_mr+0x1fb/0x3d0 [mlx5_ib]
>   ib_dereg_mr_user+0x60/0x140 [ib_core]
>   uverbs_destroy_uobject+0x59/0x210 [ib_uverbs]
>   uobj_destroy+0x3f/0x80 [ib_uverbs]
>   ib_uverbs_cmd_verbs+0x435/0xd10 [ib_uverbs]
>   ? uverbs_finalize_object+0x50/0x50 [ib_uverbs]
>   ? lock_acquire+0xc4/0x2e0
>   ? lock_acquired+0x12/0x380
>   ? lock_acquire+0xc4/0x2e0
>   ? lock_acquire+0xc4/0x2e0
>   ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
>   ? lock_release+0x28a/0x400
>   ib_uverbs_ioctl+0xc0/0x140 [ib_uverbs]
>   ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
>   __x64_sys_ioctl+0x7f/0xb0
>   do_syscall_64+0x38/0x90
> 
> Fix it by reorganizing the dereg flow and mlx5_ib_mr structure:
>  - Move the ib_umem field into the user MRs structure in the union as it's
>    applicable only there.
>  - Function mlx5_ib_dereg_mr() will now call mlx5_free_priv_descs() only
>    in case there isn't udata, which indicates that this isn't a user MR.
> 
> Fixes: f18ec4223117 ("RDMA/mlx5: Use a union inside mlx5_ib_mr")
> Link: https://lore.kernel.org/r/66bb1dd253c1fd7ceaa9fc411061eefa457b86fb.1637581144.git.leonro@nvidia.com
> Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 +++---
>  drivers/infiniband/hw/mlx5/mr.c      | 26 ++++++++++++--------------
>  2 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index bf20a388eabe1..6204ae2caef58 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -641,7 +641,6 @@ struct mlx5_ib_mr {
>  
>  	/* User MR data */
>  	struct mlx5_cache_ent *cache_ent;
> -	struct ib_umem *umem;
>  
>  	/* This is zero'd when the MR is allocated */
>  	union {
> @@ -653,7 +652,7 @@ struct mlx5_ib_mr {
>  			struct list_head list;
>  		};
>  
> -		/* Used only by kernel MRs (umem == NULL) */
> +		/* Used only by kernel MRs */
>  		struct {
>  			void *descs;
>  			void *descs_alloc;
> @@ -675,8 +674,9 @@ struct mlx5_ib_mr {
>  			int data_length;
>  		};
>  
> -		/* Used only by User MRs (umem != NULL) */
> +		/* Used only by User MRs */
>  		struct {
> +			struct ib_umem *umem;
>  			unsigned int page_shift;
>  			/* Current access_flags */
>  			int access_flags;
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 22e2f4d79743d..69b2ce4c292ae 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1911,19 +1911,18 @@ mlx5_alloc_priv_descs(struct ib_device *device,
>  	return ret;
>  }
>  
> -static void
> -mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
> +static void mlx5_free_priv_descs(struct mlx5_ib_mr *mr)
>  {
> -	if (!mr->umem && mr->descs) {
> -		struct ib_device *device = mr->ibmr.device;
> -		int size = mr->max_descs * mr->desc_size;
> -		struct mlx5_ib_dev *dev = to_mdev(device);
> +	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
> +	int size = mr->max_descs * mr->desc_size;
>  
> -		dma_unmap_single(&dev->mdev->pdev->dev, mr->desc_map, size,
> -				 DMA_TO_DEVICE);
> -		kfree(mr->descs_alloc);
> -		mr->descs = NULL;
> -	}
> +	if (!mr->descs)
> +		return;
> +
> +	dma_unmap_single(&dev->mdev->pdev->dev, mr->desc_map, size,
> +			 DMA_TO_DEVICE);
> +	kfree(mr->descs_alloc);
> +	mr->descs = NULL;
>  }
>  
>  int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> @@ -1999,7 +1998,8 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>  	if (mr->cache_ent) {
>  		mlx5_mr_cache_free(dev, mr);
>  	} else {
> -		mlx5_free_priv_descs(mr);
> +		if (!udata)
> +			mlx5_free_priv_descs(mr);
>  		kfree(mr);
>  	}
>  	return 0;
> @@ -2086,7 +2086,6 @@ static struct mlx5_ib_mr *mlx5_ib_alloc_pi_mr(struct ib_pd *pd,
>  	if (err)
>  		goto err_free_in;
>  
> -	mr->umem = NULL;
>  	kfree(in);
>  
>  	return mr;
> @@ -2213,7 +2212,6 @@ static struct ib_mr *__mlx5_ib_alloc_mr(struct ib_pd *pd,
>  	}
>  
>  	mr->ibmr.device = pd->device;
> -	mr->umem = NULL;
>  
>  	switch (mr_type) {
>  	case IB_MR_TYPE_MEM_REG:
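
For readers following along, the aliasing problem described in the commit
message and the udata gating that the patch introduces can be modelled with
a minimal sketch. Everything in it (struct toy_mr, user_ctx, the helper
names) is hypothetical and far simpler than the real struct mlx5_ib_mr. It
only assumes what the patch comments state: kernel-only and user-only fields
share storage in a union, and before the patch umem lived outside of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mlx5_ib_mr (pre-patch shape). */
struct toy_mr {
	void *umem;			/* pre-patch: outside the union */
	union {
		struct {
			void *descs;	/* only meaningful for kernel MRs */
		} kernel;
		struct {
			void *user_ctx;	/* invented user-MR field, aliases descs */
		} user;
	};
};

/* Build a user MR without a umem, in the spirit of IB_MR_TYPE_DM. */
static struct toy_mr make_dm_user_mr(void *user_ctx)
{
	struct toy_mr mr = { .umem = NULL };

	assert(user_ctx != NULL);
	mr.user.user_ctx = user_ctx;	/* also changes what kernel.descs reads */
	return mr;
}

/* Pre-patch heuristic: "no umem" was taken to mean "kernel MR". */
static bool old_should_free_descs(const struct toy_mr *mr)
{
	return !mr->umem && mr->kernel.descs;
}

/*
 * Post-patch flow: the caller skips the free when udata is set (user MR),
 * and the free routine itself returns early when descs is NULL.
 */
static bool new_should_free_descs(const struct toy_mr *mr, const void *udata)
{
	return !udata && mr->kernel.descs;
}
```

With a DM-style user MR from make_dm_user_mr(), old_should_free_descs()
returns true because kernel.descs reads back the user field, while
new_should_free_descs() with a non-NULL udata returns false. A kernel MR
with descs set and udata == NULL is still freed under the new check.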

