linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>,
	Aharon Landau <aharonl@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	"Michael S. Tsirkin" <mst@redhat.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Shay Drory <shayd@nvidia.com>,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH mlx5-next 4/5] RDMA/mlx5: Change the cache structure to an rbtree
Date: Thu, 29 Jul 2021 16:45:43 -0300	[thread overview]
Message-ID: <20210729194543.GA2484190@nvidia.com> (raw)
In-Reply-To: <f4dbb2d090b2d97ac6ba3d88605069fa2e83fff8.1624362290.git.leonro@nvidia.com>


> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index ffb6f1d41f3d..e22eeceae9eb 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -749,7 +749,7 @@ struct mlx5_cache_ent {
>  
>  
>  	char                    name[4];
> -	u32                     order;
> +	u32			order;
>  	u32			xlt;
>  	u32			access_mode;
>  	u32			page;

Looking at this, it looks like it will be a lot simpler to just store
the reference mkc here and use that whole blob as the rb key and write
a slightly special compare function.. Maybe only the ndescs needs to
be stored loose.

Then all the weirdness about special ents disappears, they are
naturally handled by having the required bits in their mkc.

And all the random encode/decode/recode scattered all over the place
goes away. Anyone working with mkeys needs to build a mkc on their
stack, then check if allocation of that mkc can be satisfied with the
cache, otherwise pass that same mkc to the alloc cmd. The one case
that uses the PAS will have to alloc a new mkc and memcpy, but that is
OK.

> +static struct mlx5_cache_ent *mkey_cache_ent_from_size(struct mlx5_ib_dev *dev,
> +						       int ent_flags, int size)
> +{
> +	struct rb_node *node = dev->cache.cache_root.rb_node;
> +	struct mlx5_cache_ent *cur, *prev = NULL;
> +
> +	WARN_ON(!mutex_is_locked(&dev->cache.cache_lock));

Yikes, no, use lockdep.

> @@ -616,13 +739,18 @@ struct mlx5_ib_mr *mlx5_alloc_special_mkey(struct mlx5_ib_dev *dev,
>  static struct mlx5r_cache_mkey *get_cache_mkey(struct mlx5_cache_ent *req_ent)
>  {
>  	struct mlx5_ib_dev *dev = req_ent->dev;
> -	struct mlx5_cache_ent *ent = req_ent;
>  	struct mlx5r_cache_mkey *cmkey;
> +	struct mlx5_cache_ent *ent;
> +	struct rb_node *node;
>  
>  	/* Try larger Mkey pools from the cache to satisfy the allocation */
> -	for (; ent != &dev->cache.ent[MKEY_CACHE_LAST_STD_ENTRY + 1]; ent++) {
> -		mlx5_ib_dbg(dev, "order %u, cache index %zu\n", ent->order,
> -			    ent - dev->cache.ent);
> +	mutex_lock(&dev->cache.cache_lock);
> +	for (node = &req_ent->node; node; node = rb_next(node)) {
> +		ent = container_of(node, struct mlx5_cache_ent, node);

See, this should be 'search for the mkc I have for the lowest entry with size+1'

> -int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
> +int mlx5_mkey_cache_tree_init(struct mlx5_ib_dev *dev)
>  {
> -	struct mlx5_mkey_cache *cache = &dev->cache;
> +	struct mlx5_mkey_cache_tree *cache = &dev->cache;
>  	struct mlx5_cache_ent *ent;
> +	int err;
>  	int i;
>  
>  	mutex_init(&dev->slow_path_mutex);
> +	mutex_init(&cache->cache_lock);
> +	cache->cache_root = RB_ROOT;
>  	cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM);
>  	if (!cache->wq) {
>  		mlx5_ib_warn(dev, "failed to create work queue\n");
> @@ -745,28 +882,25 @@ int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev)
>  
>  	mlx5_cmd_init_async_ctx(dev->mdev, &dev->async_ctx);
>  	timer_setup(&dev->delay_timer, delay_time_func, 0);
> -	for (i = 0; i < MAX_MKEY_CACHE_ENTRIES; i++) {
> -		ent = &cache->ent[i];
> -		INIT_LIST_HEAD(&ent->head);
> -		spin_lock_init(&ent->lock);
> -		ent->order = i + 2;
> -		ent->dev = dev;
> -		ent->limit = 0;
> -
> -		INIT_WORK(&ent->work, cache_work_func);
> -		INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
> +	for (i = 0; i < MAX_MKEY_CACHE_DEFAULT_ENTRIES; i++) {
> +		u8 order = i + 2;
> +		u32 xlt_size = (1 << order) * sizeof(struct mlx5_mtt) /
> +			       MLX5_IB_UMR_OCTOWORD;

This should be written saner

for (xlt_size = MKEY_CACHE_DEFAULT_MIN_DESCS * sizeof(struct mlx5_mtt) / MLX5_IB_UMR_OCTOWORD; 
     xlt_size <= MKEY_CACHE_DEFAULT_MAX_DESCS *  sizeof(struct mlx5_mtt) / MLX5_IB_UMR_OCTOWORD; 
     xlt_size *= 2)

>  
>  		if (i > MKEY_CACHE_LAST_STD_ENTRY) {

The index in the cache should be meaningless now, so don't put this
code here.

> -			mlx5_odp_init_mkey_cache_entry(ent);
> +			err = mlx5_odp_init_mkey_cache_entry(dev, i);
> +			if (err)
> +				return err;
>  			continue;
>  		}

> -		if (ent->order > mkey_cache_max_order(dev))
> +		ent = mlx5_ib_create_cache_ent(dev, 0, xlt_size, order);

And why do we need to pass in order, why is it stored in the
cache_ent? Looks like it should be removed

The debugfs looks like it might need some rethink as is it can only
control the original buckets, the new buckets don't get exposed. Seems
like trouble.

If just exposing the legacy things is the idea then it should have the
same sweep over the parameter space as above, not just assume that the
rb tree is in order and only contains debugfs entries.

Probably change it to create the debugfs nodes at the same time the
cache entry itself is created.

> @@ -973,14 +1100,16 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
>  						     0, iova);
>  	if (WARN_ON(!page_size))
>  		return ERR_PTR(-EINVAL);
> -	ent = mkey_cache_ent_from_order(
> -		dev, order_base_2(ib_umem_num_dma_blocks(umem, page_size)));
> +	ent_flags = mlx5_ent_access_flags(dev, access_flags);
> +	xlt_size = get_octo_len(iova, umem->length, order_base_2(page_size));
> +	mutex_lock(&dev->cache.cache_lock);
> +	ent = mkey_cache_ent_from_size(dev, ent_flags, xlt_size);

See here is where I wonder if it is just better to build the mkc on
the stack in one place instead of having all this stuff open coded all
over..

> -void mlx5_odp_init_mkey_cache_entry(struct mlx5_cache_ent *ent)
> +int mlx5_odp_init_mkey_cache_entry(struct mlx5_ib_dev *dev, int ent_num)
>  {
> -	if (!(ent->dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
> -		return;
> +	struct mlx5_cache_ent *ent;
> +	int ent_flags;
> +	u32 xlt_size;
> +
> +	if (!(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
> +		return 0;
>  
> -	switch (ent->order - 2) {
> +	switch (ent_num) {
>  	case MLX5_IMR_MTT_CACHE_ENTRY:

Don't do stuff like this either. The mkc scheme will fix this too as
this will just create two mkcs for these unique usages and store them
normally.

Jason

  reply	other threads:[~2021-07-29 19:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1624362290.git.leonro@nvidia.com>
2021-06-22 12:08 ` [PATCH mlx5-next 1/5] RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key Leon Romanovsky
2021-07-29 15:28   ` Jason Gunthorpe
2021-07-29 17:27     ` Leon Romanovsky
2021-07-29 18:08   ` Jason Gunthorpe
2021-06-22 12:08 ` [PATCH mlx5-next 2/5] RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib Leon Romanovsky
2021-07-29 18:39   ` Jason Gunthorpe
2021-06-22 12:08 ` [PATCH mlx5-next 3/5] RDMA/mlx5: Change the cache to hold mkeys instead of MRs Leon Romanovsky
2021-07-29 19:08   ` Jason Gunthorpe
2021-06-22 12:08 ` [PATCH mlx5-next 4/5] RDMA/mlx5: Change the cache structure to an rbtree Leon Romanovsky
2021-07-29 19:45   ` Jason Gunthorpe [this message]
2021-06-22 12:08 ` [PATCH rdma-next 5/5] RDMA/mlx5: Delay the deregistration of a non-cache mkey Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210729194543.GA2484190@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=aharonl@nvidia.com \
    --cc=dledford@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=shayd@nvidia.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).