All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: airlied@linux.ie, sroland@vmware.com,
	dri-devel@lists.freedesktop.org, ray.huang@amd.com,
	linux-graphics-maintainer@vmware.com
Subject: Re: [PATCH 1/2] drm/ttm: rework ttm_tt page limit v2
Date: Wed, 16 Dec 2020 16:09:48 +0100	[thread overview]
Message-ID: <X9ojPDJv1ybEJT0k@phenom.ffwll.local> (raw)
In-Reply-To: <20201216140427.1253-1-christian.koenig@amd.com>

On Wed, Dec 16, 2020 at 03:04:26PM +0100, Christian König wrote:
> TTM implements a rather extensive accounting of allocated memory.
> 
> There are two reasons for this:
> 1. It tries to block userspace allocating a huge number of very small
>    BOs without accounting for the kmalloced memory.
> 
> 2. Make sure we don't over allocate and run into an OOM situation
>    during swapout while trying to handle the memory shortage.
> 
> This is only partially a good idea. First of all it is perfectly
> valid for an application to use all of system memory, limiting it to
> 50% is not really acceptable.
> 
> What we need to take care of is that the application is held
> accountable for the memory it allocated. This is what control
> mechanisms like memcg and the normal Linux page accounting already do.
> 
> Making sure that we don't run into an OOM situation while trying to
> cope with a memory shortage is still a good idea, but this is also
> not very well implemented since it means another opportunity of
> recursion from the driver back into TTM.
> 
> So start to rework all of this by implementing a shrinker callback which
> allows for TT object to be swapped out if necessary.
> 
> v2: Switch from limit to shrinker callback.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c        |  4 +-
>  drivers/gpu/drm/ttm/ttm_memory.c    |  7 ++-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 82 +++++++++++++++++++++++++++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  2 +-
>  include/drm/ttm/ttm_bo_api.h        |  2 +-
>  include/drm/ttm/ttm_tt.h            |  6 ++-
>  6 files changed, 91 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 31e8b3da5563..f1f3fd085465 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1412,7 +1412,7 @@ EXPORT_SYMBOL(ttm_bo_wait);
>   * A buffer object shrink method that tries to swap out the first
>   * buffer object on the bo_global::swap_lru list.
>   */
> -int ttm_bo_swapout(struct ttm_operation_ctx *ctx)
> +int ttm_bo_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
>  {
>  	struct ttm_bo_global *glob = &ttm_bo_glob;
>  	struct ttm_buffer_object *bo;
> @@ -1495,7 +1495,7 @@ int ttm_bo_swapout(struct ttm_operation_ctx *ctx)
>  	if (bo->bdev->driver->swap_notify)
>  		bo->bdev->driver->swap_notify(bo);
>  
> -	ret = ttm_tt_swapout(bo->bdev, bo->ttm);
> +	ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
>  out:
>  
>  	/**
> diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
> index a3bfbd9cea68..3d2479d0ce2c 100644
> --- a/drivers/gpu/drm/ttm/ttm_memory.c
> +++ b/drivers/gpu/drm/ttm/ttm_memory.c
> @@ -37,6 +37,7 @@
>  #include <linux/slab.h>
>  #include <linux/swap.h>
>  #include <drm/ttm/ttm_pool.h>
> +#include <drm/ttm/ttm_tt.h>
>  
>  #include "ttm_module.h"
>  
> @@ -276,9 +277,9 @@ static void ttm_shrink(struct ttm_mem_global *glob, bool from_wq,
>  
>  	while (ttm_zones_above_swap_target(glob, from_wq, extra)) {
>  		spin_unlock(&glob->lock);
> -		ret = ttm_bo_swapout(ctx);
> +		ret = ttm_bo_swapout(ctx, 0);

General we don't treat gfp_mask as a set of additional flags, but the full
thing. So here we should have GFP_KERNEL.

Also having both the shrinker and the ttm_shrink_work is a bit much, the
shrink work should get deleted completely I think.

>  		spin_lock(&glob->lock);
> -		if (unlikely(ret != 0))
> +		if (unlikely(ret < 0))
>  			break;
>  	}
>  
> @@ -453,6 +454,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
>  			zone->name, (unsigned long long)zone->max_mem >> 10);
>  	}
>  	ttm_pool_mgr_init(glob->zone_kernel->max_mem/(2*PAGE_SIZE));
> +	ttm_tt_mgr_init();
>  	return 0;
>  out_no_zone:
>  	ttm_mem_global_release(glob);
> @@ -466,6 +468,7 @@ void ttm_mem_global_release(struct ttm_mem_global *glob)
>  
>  	/* let the page allocator first stop the shrink work. */
>  	ttm_pool_mgr_fini();
> +	ttm_tt_mgr_fini();
>  
>  	flush_workqueue(glob->swap_queue);
>  	destroy_workqueue(glob->swap_queue);
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 7f75a13163f0..d454c428c56a 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -38,6 +38,8 @@
>  #include <drm/drm_cache.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  
> +static struct shrinker mm_shrinker;
> +
>  /*
>   * Allocates a ttm structure for the given BO.
>   */
> @@ -223,13 +225,23 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
>  	return ret;
>  }
>  
> -int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
> +/**
> + * ttm_tt_swapout - swap out tt object
> + *
> + * @bdev: TTM device structure.
> + * @ttm: The struct ttm_tt.
> + * @gfp_flags: Flags to use for memory allocation.
> + *
> + * Swapout a TT object to a shmem_file, return number of pages swapped out or
> + * negative error code.
> + */
> +int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm,
> +		   gfp_t gfp_flags)
>  {
>  	struct address_space *swap_space;
>  	struct file *swap_storage;
>  	struct page *from_page;
>  	struct page *to_page;
> -	gfp_t gfp_mask;
>  	int i, ret;
>  
>  	swap_storage = shmem_file_setup("ttm swap",
> @@ -241,14 +253,14 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
>  	}
>  
>  	swap_space = swap_storage->f_mapping;
> -	gfp_mask = mapping_gfp_mask(swap_space);
> +	gfp_flags |= mapping_gfp_mask(swap_space);

I don't think this combines flags correctly. mapping_gfp_mask is most
likely GFP_KERNEL or something like that, so __GFP_FS (the thing you want
to check for to avoid recursion) is always set.

I think we need an & here, and maybe screaming if the gfp flags for the
swapout space are funny.

>  
>  	for (i = 0; i < ttm->num_pages; ++i) {
>  		from_page = ttm->pages[i];
>  		if (unlikely(from_page == NULL))
>  			continue;
>  
> -		to_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_mask);
> +		to_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_flags);
>  		if (IS_ERR(to_page)) {
>  			ret = PTR_ERR(to_page);
>  			goto out_err;
> @@ -263,7 +275,7 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
>  	ttm->swap_storage = swap_storage;
>  	ttm->page_flags |= TTM_PAGE_FLAG_SWAPPED;
>  
> -	return 0;
> +	return ttm->num_pages;
>  
>  out_err:
>  	fput(swap_storage);
> @@ -341,3 +353,63 @@ void ttm_tt_unpopulate(struct ttm_bo_device *bdev,
>  		ttm_pool_free(&bdev->pool, ttm);
>  	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
>  }
> +
> +/* As long as pages are available make sure to release at least one */
> +static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> +					  struct shrink_control *sc)
> +{
> +	struct ttm_operation_ctx ctx = {
> +		.no_wait_gpu = true

Iirc there's an eventual shrinker limit where it gets desperate. I think
once we hit that, we should allow gpu waits. But it's not passed to
shrinkers for reasons, so maybe we should have a second round that tries
to more actively shrink objects if we fell substantially short of what
reclaim expected us to do?

Also don't we have a trylock_only flag here to make sure drivers don't do
something stupid?

> +	};
> +	int ret;
> +
> +	if (sc->gfp_mask & GFP_NOFS)
> +		return 0;
> +
> +	ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> +	return ret < 0 ? SHRINK_EMPTY : ret;
> +}
> +
> +/* Return the number of pages available or SHRINK_EMPTY if we have none */
> +static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> +					   struct shrink_control *sc)
> +{
> +	struct ttm_buffer_object *bo;
> +	unsigned long num_pages = 0;
> +	unsigned int i;
> +
> +	if (sc->gfp_mask & GFP_NOFS)
> +		return 0;

The count function should always count, and I'm not seeing a reason why
you couldn't do that here ... Also my understanding is that GFP_NOFS never
goes into shrinkers (the NOFS comes from shrinkers originally only being
used for filesystem objects), so this is double redundant.

My understanding is that gfp_mask is just to convey the right zones and
stuff, so that your shrinker can try to shrink objects in the right zones.
Hence I think the check in the _scan() function should also be removed.

Also the non __ prefixed flags are the combinations callers are supposed
to look at. Memory reclaim code needs to look at the __GFP flags, see e.g. 
gfpflags_allow_blocking() or fs_reclaim_acquire().

> +
> +	/* TODO: Is that to much overhead? */
> +	spin_lock(&ttm_bo_glob.lru_lock);
> +	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
> +		list_for_each_entry(bo, &ttm_bo_glob.swap_lru[i], swap)
> +			num_pages += bo->ttm->num_pages;
> +	spin_unlock(&ttm_bo_glob.lru_lock);
> +
> +	return num_pages ? num_pages : SHRINK_EMPTY;
> +}
> +
> +/**
> + * ttm_tt_mgr_init - register with the MM shrinker
> + *
> + * Register with the MM shrinker for swapping out BOs.
> + */
> +int ttm_tt_mgr_init(void)
> +{
> +	mm_shrinker.count_objects = ttm_tt_shrinker_count;
> +	mm_shrinker.scan_objects = ttm_tt_shrinker_scan;
> +	mm_shrinker.seeks = 1;
> +	return register_shrinker(&mm_shrinker);
> +}
> +
> +/**
> + * ttm_tt_mgr_fini - unregister our MM shrinker
> + *
> + * Unregisters the MM shrinker.
> + */
> +void ttm_tt_mgr_fini(void)
> +{
> +	unregister_shrinker(&mm_shrinker);
> +}
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 0008be02d31c..e141fa9f616d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -1389,7 +1389,7 @@ static int vmw_pm_freeze(struct device *kdev)
>  	vmw_execbuf_release_pinned_bo(dev_priv);
>  	vmw_resource_evict_all(dev_priv);
>  	vmw_release_device_early(dev_priv);
> -	while (ttm_bo_swapout(&ctx) == 0);
> +	while (ttm_bo_swapout(&ctx, 0) > 0);

I think again GFP_KERNEL.

>  	if (dev_priv->enable_fb)
>  		vmw_fifo_resource_dec(dev_priv);
>  	if (atomic_read(&dev_priv->num_fifo_resources) != 0) {
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index e17be324d95f..7c7539d76e1d 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -569,7 +569,7 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp,
>  		  const char __user *wbuf, char __user *rbuf,
>  		  size_t count, loff_t *f_pos, bool write);
>  
> -int ttm_bo_swapout(struct ttm_operation_ctx *ctx);
> +int ttm_bo_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
>  
>  /**
>   * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index 6c8eb9a4de81..4bc3a08644fc 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -135,7 +135,8 @@ void ttm_tt_destroy_common(struct ttm_bo_device *bdev, struct ttm_tt *ttm);
>   * Swap in a previously swap out ttm_tt.
>   */
>  int ttm_tt_swapin(struct ttm_tt *ttm);
> -int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm);
> +int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm,
> +		   gfp_t gfp_flags);
>  
>  /**
>   * ttm_tt_populate - allocate pages for a ttm
> @@ -155,6 +156,9 @@ int ttm_tt_populate(struct ttm_bo_device *bdev, struct ttm_tt *ttm, struct ttm_o
>   */
>  void ttm_tt_unpopulate(struct ttm_bo_device *bdev, struct ttm_tt *ttm);
>  
> +int ttm_tt_mgr_init(void);
> +void ttm_tt_mgr_fini(void);
> +
>  #if IS_ENABLED(CONFIG_AGP)
>  #include <linux/agp_backend.h>

For testing I strongly recommend a debugfs file to trigger this shrinker
completely, from the right lockdep context (i.e. using
fs_reclaim_acquire/release()). Much easier to test that way. See
i915_drop_caches_set() in i915_debugfs.c.

That way you can fully test it all without hitting anything remotely
resembling actual OOM, which tends to kill all kinds of things.

Aside from the detail work I think this is going in the right direction.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  parent reply	other threads:[~2020-12-16 15:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 14:04 [PATCH 1/2] drm/ttm: rework ttm_tt page limit v2 Christian König
2020-12-16 14:04 ` [PATCH 2/2] drm/ttm: move memory accounting into vmwgfx Christian König
2020-12-16 15:26   ` Daniel Vetter
2020-12-16 16:55   ` kernel test robot
2020-12-16 16:55     ` kernel test robot
2020-12-16 18:19   ` kernel test robot
2020-12-16 18:19     ` kernel test robot
2020-12-16 15:09 ` Daniel Vetter [this message]
2020-12-17 13:46   ` [PATCH 1/2] drm/ttm: rework ttm_tt page limit v2 Christian König
2020-12-17 14:36     ` Daniel Vetter
2020-12-17 15:10       ` Christian König
2020-12-17 15:26         ` Daniel Vetter
2020-12-17 15:35           ` Christian König
2020-12-17 15:45             ` Daniel Vetter
2020-12-17 18:09               ` Jerome Glisse
2020-12-17 18:19                 ` Daniel Vetter
2020-12-17 18:26                   ` Daniel Vetter
2020-12-17 19:22                     ` Daniel Vetter
2020-12-17 18:40                   ` Jerome Glisse
2020-12-17 19:20                     ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X9ojPDJv1ybEJT0k@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@linux.ie \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-graphics-maintainer@vmware.com \
    --cc=ray.huang@amd.com \
    --cc=sroland@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.