* [PATCH 0/3] drm/ttm: hard to swim in an empty pool
@ 2014-08-13  3:52 Jérôme Glisse
  2014-08-13  3:52 ` [PATCH 1/3] drm/ttm: set sensible pool size limit Jérôme Glisse
  ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jérôme Glisse @ 2014-08-13  3:52 UTC (permalink / raw)
  To: dri-devel

So it seems there were several issues with the various ttm pools. The obvious
one is fixed in patch 2, which addresses the always-empty-pool syndrome.
However, the pool sizes were kind of crazy, and because some pools were never
actually filled before, we might never have experienced the ill effects of
the crazy maximum limit. This is what is addressed by the first patch.

The last patch cooks it up further so that under memory pressure the pool
size is divided by 2 each time a shrinker is run on a pool. There is a
timeout to restore the pool size on the next allocation. The idea here is
that memory pressure should not last, and if it does last, the shrinker will
keep minimizing the pool size; anyway, things are probably already sluggish
once we hit the shrinker path.

Of course, because this fixes things in ttm memory allocation, it needs
careful testing. So before pushing anything I would like to see more people
testing this.

Cheers,
Jérôme
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread
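[Editorial note: the shrink-halve / timed-restore policy the cover letter describes can be modeled in a few lines of plain C. This is a hedged userspace sketch of the idea only, not the kernel patch: `struct pool_model`, the integer tick counter, and the 100-tick window are illustrative stand-ins for the real `ttm_page_pool`, `jiffies`, and `HZ`.]

```c
#include <assert.h>

/* Model of the patch 3 policy: each shrinker pass halves the pool's
 * current maximum; once a quiet period has elapsed, the next allocation
 * grows it back, clamped between 512 pages and the configured ceiling. */
struct pool_model {
	unsigned cur_max;      /* current maximum pool size, in pages */
	unsigned hard_max;     /* configured ceiling (options.max_size) */
	unsigned long timeout; /* tick before which cur_max may not grow */
};

void pool_shrunk(struct pool_model *p, unsigned long now)
{
	p->cur_max >>= 1;       /* halve on every shrinker invocation */
	p->timeout = now + 100; /* no growth for the next 100 ticks (~1s) */
}

void pool_alloc_seen(struct pool_model *p, unsigned long now)
{
	if (now < p->timeout)
		return;          /* still under memory pressure, stay small */
	p->cur_max <<= 1;        /* grow back... */
	if (p->cur_max < 512)
		p->cur_max = 512; /* ...bouncing back from zero to 512 pages */
	if (p->cur_max > p->hard_max)
		p->cur_max = p->hard_max; /* ...but never past the ceiling */
}
```

A shrinker run immediately caps growth for the window; once the window expires, allocations restore the limit step by step.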
* [PATCH 1/3] drm/ttm: set sensible pool size limit.
  2014-08-13  3:52 [PATCH 0/3] drm/ttm: hard to swim in an empty pool Jérôme Glisse
@ 2014-08-13  3:52 ` Jérôme Glisse
  2014-08-13  6:24   ` Michel Dänzer
  2014-08-13  3:52 ` [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool Jérôme Glisse
  2014-08-13  3:52 ` [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool Jérôme Glisse
  2 siblings, 1 reply; 29+ messages in thread
From: Jérôme Glisse @ 2014-08-13  3:52 UTC (permalink / raw)
  To: dri-devel
  Cc: Michel Dänzer, Jérôme Glisse, Konrad Rzeszutek Wilk, Thomas Hellstrom

From: Jérôme Glisse <jglisse@redhat.com>

Due to a bug in the code, it appears that some of the pools were never
properly used and were always empty. Before fixing that bug, this patch sets
a sensible limit on pool size. The magic 64MB number was nominated.

This is obviously a somewhat arbitrary number, but the intent of the ttm pool
is to minimize page allocation cost, especially when allocating pages that
will be marked to be excluded from CPU cache mechanisms. We assume that
mostly small buffers that are constantly allocated/deallocated might suffer
from core memory allocation overhead as well as cache status changes. These
are the assumptions behind the 64MB value.

This obviously needs some serious testing, including monitoring pool size.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_memory.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index dbc2def..73b2ded 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -38,6 +38,16 @@
 #include <linux/slab.h>

 #define TTM_MEMORY_ALLOC_RETRIES 4

+/* Have a maximum of 64MB of memory inside the pool.
+ *
+ * This is obviously a somewhat arbitrary number, but the intent of the ttm
+ * pool is to minimize page allocation cost, especially when allocating
+ * pages that will be marked to be excluded from CPU cache mechanisms. We
+ * assume that mostly small buffers that are constantly allocated/deallocated
+ * might suffer from core memory allocation overhead as well as cache status
+ * changes. These are the assumptions behind the 64MB value.
+ */
+#define MAX_POOL_SIZE (64UL << 20UL)

 struct ttm_mem_zone {
 	struct kobject kobj;
@@ -363,6 +373,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 	int ret;
 	int i;
 	struct ttm_mem_zone *zone;
+	unsigned long max_pool_size;

 	spin_lock_init(&glob->lock);
 	glob->swap_queue = create_singlethread_workqueue("ttm_swap");
@@ -393,8 +404,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
 		pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
 			zone->name, (unsigned long long)zone->max_mem >> 10);
 	}
-	ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
-	ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
+	max_pool_size = min(glob->zone_kernel->max_mem >> 3UL, MAX_POOL_SIZE);
+	ttm_page_alloc_init(glob, max_pool_size / (2 * PAGE_SIZE));
+	ttm_dma_page_alloc_init(glob, max_pool_size / (2 * PAGE_SIZE));
 	return 0;
 out_no_zone:
 	ttm_mem_global_release(glob);
--
1.9.3
* Re: [PATCH 1/3] drm/ttm: set sensible pool size limit.
  2014-08-13  3:52 ` [PATCH 1/3] drm/ttm: set sensible pool size limit Jérôme Glisse
@ 2014-08-13  6:24   ` Michel Dänzer
  0 siblings, 0 replies; 29+ messages in thread
From: Michel Dänzer @ 2014-08-13  6:24 UTC (permalink / raw)
  To: Jérôme Glisse, dri-devel
  Cc: Jérôme Glisse, Thomas Hellstrom, Konrad Rzeszutek Wilk

On 13.08.2014 12:52, Jérôme Glisse wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Due to a bug in the code, it appears that some of the pools were never
> properly used and were always empty. Before fixing that bug, this patch
> sets a sensible limit on pool size. The magic 64MB number was nominated.
>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
[...]
> @@ -393,8 +404,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
>  		pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
>  			zone->name, (unsigned long long)zone->max_mem >> 10);
>  	}
> -	ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
> -	ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
> +	max_pool_size = min(glob->zone_kernel->max_mem >> 3UL, MAX_POOL_SIZE);

This introduces a 'comparison of distinct pointer types lacks a cast'
warning for me.

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
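[Editorial note: the warning Michel reports comes from the kernel's type-checking `min()` macro — `zone->max_mem` is a 64-bit quantity while `MAX_POOL_SIZE` is an `unsigned long`, and the kernel's `min()` rejects mixed-type operands. The usual fix is `min_t(type, a, b)`, which casts both sides. Below is a hedged userspace sketch: `pool_limit_pages()` is an illustrative helper, not a TTM function, and this `min_t` is a simplified stand-in for the kernel macro (it double-evaluates its arguments, which the real one avoids).]

```c
#include <assert.h>
#include <stdint.h>

#define MAX_POOL_SIZE (64UL << 20) /* 64MB, as in patch 1 */

/* Simplified stand-in for the kernel's min_t(): force both operands to one
 * named type before comparing, which is how the distinct-types warning from
 * the plain min() macro is normally silenced. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Illustrative helper mirroring the patch: each of the two pools gets half
 * of min(zone_max_mem / 8, 64MB), expressed in pages. */
unsigned long pool_limit_pages(uint64_t zone_max_mem, unsigned long page_size)
{
	unsigned long max_pool_size =
		min_t(unsigned long, zone_max_mem >> 3, MAX_POOL_SIZE);
	return max_pool_size / (2 * page_size);
}
```

With a 4GB kernel zone and 4KB pages, the eighth-of-zone term (512MB) is capped at 64MB, giving each pool a 8192-page budget.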
* [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2014-08-13  3:52 [PATCH 0/3] drm/ttm: hard to swim in an empty pool Jérôme Glisse
  2014-08-13  3:52 ` [PATCH 1/3] drm/ttm: set sensible pool size limit Jérôme Glisse
@ 2014-08-13  3:52 ` Jérôme Glisse
  2015-03-25 19:06   ` Konrad Rzeszutek Wilk
  2015-07-06  9:11   ` Michel Dänzer
  2014-08-13  3:52 ` [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool Jérôme Glisse
  2 siblings, 2 replies; 29+ messages in thread
From: Jérôme Glisse @ 2014-08-13  3:52 UTC (permalink / raw)
  To: dri-devel
  Cc: Jérôme Glisse, Thomas Hellstrom, Konrad Rzeszutek Wilk

From: Jérôme Glisse <jglisse@redhat.com>

Current code never allowed the page pool to actually fill up. This fixes it
and also allows the pool to grow over its limit until it grows beyond the
batch size for allocation and deallocation.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Michel Dänzer <michel@daenzer.net>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index c96db43..a076ff3 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -953,14 +953,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
-		npages = count;
-		if (pool->npages_free > _manager->options.max_size) {
+		if (pool->npages_free >= (_manager->options.max_size +
+					  NUM_PAGES_TO_ALLOC))
 			npages = pool->npages_free - _manager->options.max_size;
-			/* free at least NUM_PAGES_TO_ALLOC number of pages
-			 * to reduce calls to set_memory_wb */
-			if (npages < NUM_PAGES_TO_ALLOC)
-				npages = NUM_PAGES_TO_ALLOC;
-		}
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
--
1.9.3
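[Editorial note: the behavioral change in this patch is a simple hysteresis — the free list may overshoot the limit by up to one allocation batch, and trimming only starts once a full batch can be freed, so every expensive `set_memory_wb()` pass frees at least `NUM_PAGES_TO_ALLOC` pages. A hedged userspace model of just that condition; `pages_to_trim` is an illustrative name and 64 an assumed batch size.]

```c
#include <assert.h>

#define NUM_PAGES_TO_ALLOC 64 /* assumed batch size for this sketch */

/* Model of the fixed ttm_dma_unpopulate() condition: return how many pages
 * to trim from the pool's free list. Nothing is trimmed until the overshoot
 * above max_size reaches a full batch; then trim back down to max_size. */
unsigned pages_to_trim(unsigned npages_free, unsigned max_size)
{
	if (npages_free >= max_size + NUM_PAGES_TO_ALLOC)
		return npages_free - max_size;
	return 0; /* below threshold: keep the pages pooled for reuse */
}
```

Contrast with the pre-fix code, which unconditionally set `npages = count` and so drained every deallocation straight back out of the pool, leaving it permanently empty.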
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2014-08-13  3:52 ` [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool Jérôme Glisse
@ 2015-03-25 19:06   ` Konrad Rzeszutek Wilk
  2015-07-06  9:11   ` Michel Dänzer
  1 sibling, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 19:06 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Jérôme Glisse, Thomas Hellstrom, dri-devel

On Tue, Aug 12, 2014 at 11:52:05PM -0400, Jérôme Glisse wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Current code never allowed the page pool to actually fill up. This fixes
> it and also allows the pool to grow over its limit until it grows beyond
> the batch size for allocation and deallocation.
>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Tested-by: Michel Dänzer <michel@daenzer.net>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

[...]
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2014-08-13  3:52 ` [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool Jérôme Glisse
  2015-03-25 19:06   ` Konrad Rzeszutek Wilk
@ 2015-07-06  9:11   ` Michel Dänzer
  2015-07-06 16:10     ` Jerome Glisse
  1 sibling, 1 reply; 29+ messages in thread
From: Michel Dänzer @ 2015-07-06  9:11 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Thomas Hellstrom, Jack.Xiao, dri-devel, Konrad Rzeszutek Wilk

Hi Jérôme,

On 13.08.2014 12:52, Jérôme Glisse wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Current code never allowed the page pool to actually fill up. This fixes
> it and also allows the pool to grow over its limit until it grows beyond
> the batch size for allocation and deallocation.
>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
[...]

Colleagues of mine have measured significant performance gains for some
workloads with this patch. Without it, a lot of CPU cycles are spent
changing the caching attributes of pages on allocation.

Note that the performance effect seems to mostly disappear when applying
patch 1 as well, so apparently 64MB is too small for the max pool size.

Is there any chance this patch could be applied without the
controversial patch 3? If not, do you have ideas for addressing the
concerns raised against patch 3?

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2015-07-06  9:11 ` Michel Dänzer
@ 2015-07-06 16:10   ` Jerome Glisse
  2015-07-07  6:39     ` Michel Dänzer
  0 siblings, 1 reply; 29+ messages in thread
From: Jerome Glisse @ 2015-07-06 16:10 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Thomas Hellstrom, Jack.Xiao, dri-devel, Konrad Rzeszutek Wilk

On Mon, Jul 06, 2015 at 06:11:29PM +0900, Michel Dänzer wrote:
> Hi Jérôme,
>
> On 13.08.2014 12:52, Jérôme Glisse wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> >
> > Current code never allowed the page pool to actually fill up. [...]
[...]
>
> Colleagues of mine have measured significant performance gains for some
> workloads with this patch. Without it, a lot of CPU cycles are spent
> changing the caching attributes of pages on allocation.
>
> Note that the performance effect seems to mostly disappear when applying
> patch 1 as well, so apparently 64MB is too small for the max pool size.
>
> Is there any chance this patch could be applied without the
> controversial patch 3? If not, do you have ideas for addressing the
> concerns raised against patch 3?

Whoa, now I need to find the keys to the DeLorean to travel back in time.

This patch is a fix and should be applied without 1 or 3, because today
the pool is basically always emptied and never fills up. But as Thomas
pointed out, there are already a bit too many pools across the stack. The
proper solution would be to work something out at the mm level or the
architecture level (I assume AMD is mostly interested in x86 on that front).

Here the whole issue is really about allocating pages with WC/UC cache
properties. Changing cache properties on a page is really costly on several
levels; for instance, the kernel needs to break the huge linear mapping and
populate lower page-table levels to remap the page with the proper cache
attribute inside the kernel mapping.

As far as I remember, the kernel never goes back to a huge page mapping
when restoring a page's cache attributes, meaning that little by little
with uptime you lose the whole huge page mapping benefit for everything,
and you waste more memory.

Anyway, I just wanted to dump here my recollection and how I think patches
1 & 3 should be replaced, by simply moving this allocation infrastructure
down into core mm code, where it should always have been. In the meantime
I think we need to apply this patch, as it is really a fix to make the code
actually do what the comment and design pretend it does.

Cheers,
Jérôme
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2015-07-06 16:10 ` Jerome Glisse
@ 2015-07-07  6:39   ` Michel Dänzer
  2015-07-07 17:41     ` Jerome Glisse
  0 siblings, 1 reply; 29+ messages in thread
From: Michel Dänzer @ 2015-07-07  6:39 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Thomas Hellstrom, Jack.Xiao, dri-devel, Konrad Rzeszutek Wilk

On 07.07.2015 01:10, Jerome Glisse wrote:
> On Mon, Jul 06, 2015 at 06:11:29PM +0900, Michel Dänzer wrote:
[...]
>
> Here the whole issue is really about allocating pages with WC/UC cache
> properties. Changing cache properties on a page is really costly on several
> levels; for instance, the kernel needs to break the huge linear mapping and
> populate lower page-table levels to remap the page with the proper cache
> attribute inside the kernel mapping.
>
> As far as I remember, the kernel never goes back to a huge page mapping
> when restoring a page's cache attributes, meaning that little by little
> with uptime you lose the whole huge page mapping benefit for everything,
> and you waste more memory.

That sounds pretty bad, but this patch might actually help a little with
that by reducing the number of huge page mappings that need to be broken
up?

> In the meantime I think we need to apply this patch, as it is really a
> fix to make the code actually do what the comment and design pretend it
> does.

I agree.

BTW, maybe it should be split up into the actual fix (removing the
npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2015-07-07  6:39 ` Michel Dänzer
@ 2015-07-07 17:41   ` Jerome Glisse
  2015-07-08  2:34     ` Michel Dänzer
  0 siblings, 1 reply; 29+ messages in thread
From: Jerome Glisse @ 2015-07-07 17:41 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Thomas Hellstrom, Jack.Xiao, dri-devel, Konrad Rzeszutek Wilk

On Tue, Jul 07, 2015 at 03:39:29PM +0900, Michel Dänzer wrote:
> On 07.07.2015 01:10, Jerome Glisse wrote:
[...]
>
> That sounds pretty bad, but this patch might actually help a little with
> that by reducing the number of huge page mappings that need to be broken
> up?

Not really. To limit huge page mapping breakage you would need to allocate
pages in the same physical cluster, so that only one huge page mapping
needs to be broken. It would be a bit like the DMA32 or DMA16 physical
ranges, so this would obviously need some work at the MM level. At the ttm
level this can be more or less implemented using the GFP_DMA32 flag on page
allocation, but at the same time doing that puts more pressure on the first
4GB of memory, and I think nowadays, with web browsers consuming several GB
of textures, you probably do not want to do that.

> > In the meantime I think we need to apply this patch, as it is really a
> > fix to make the code actually do what the comment and design pretend it
> > does.
>
> I agree.
>
> BTW, maybe it should be split up into the actual fix (removing the
> npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?

This would make 2 really small patches; the patch is small as it is :)
But why not.

Cheers,
Jérôme
* Re: [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool.
  2015-07-07 17:41 ` Jerome Glisse
@ 2015-07-08  2:34   ` Michel Dänzer
  0 siblings, 0 replies; 29+ messages in thread
From: Michel Dänzer @ 2015-07-08  2:34 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Thomas Hellstrom, Jack.Xiao, dri-devel, Konrad Rzeszutek Wilk

On 08.07.2015 02:41, Jerome Glisse wrote:
> On Tue, Jul 07, 2015 at 03:39:29PM +0900, Michel Dänzer wrote:
> > On 07.07.2015 01:10, Jerome Glisse wrote:
> > >
> > > In the meantime I think we need to apply this patch, as it is really
> > > a fix to make the code actually do what the comment and design
> > > pretend it does.
> >
> > I agree.
> >
> > BTW, maybe it should be split up into the actual fix (removing the
> > npages assignment) and the NUM_PAGES_TO_ALLOC related simplification?
>
> This would make 2 really small patches; the patch is small as it is :)
> But why not.

It's not about size but about having one commit for each logical change.
It took me a while to realize that the NUM_PAGES_TO_ALLOC changes aren't
directly related to the fix, and I've seen the same thing happen to
others.

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
* [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool
  2014-08-13  3:52 [PATCH 0/3] drm/ttm: hard to swim in an empty pool Jérôme Glisse
  2014-08-13  3:52 ` [PATCH 1/3] drm/ttm: set sensible pool size limit Jérôme Glisse
  2014-08-13  3:52 ` [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool Jérôme Glisse
@ 2014-08-13  3:52 ` Jérôme Glisse
  2014-08-13  6:32   ` Michel Dänzer
  2014-08-13  9:06   ` Thomas Hellstrom
  2 siblings, 2 replies; 29+ messages in thread
From: Jérôme Glisse @ 2014-08-13  3:52 UTC (permalink / raw)
  To: dri-devel
  Cc: Michel Dänzer, Jérôme Glisse, Konrad Rzeszutek Wilk, Thomas Hellstrom

From: Jérôme Glisse <jglisse@redhat.com>

When experiencing memory pressure we want to minimize the pool size so that
memory we just shrank is not added right back again. This divides the
maximum pool size of each device by 2 each time the pool has to shrink. The
limit is bumped again if the next allocation happens more than one second
after the last shrink. The one second delay is obviously an arbitrary
choice.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c     | 35 +++++++++++++++++++++++++-------
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++--
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 09874d6..ab41adf 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -68,6 +68,8 @@
  * @list: Pool of free uc/wc pages for fast reuse.
  * @gfp_flags: Flags to pass for alloc_page.
  * @npages: Number of pages in pool.
+ * @cur_max_size: Current maximum size for the pool.
+ * @shrink_timeout: Timeout for pool maximum size restriction.
  */
 struct ttm_page_pool {
 	spinlock_t lock;
@@ -76,6 +78,8 @@ struct ttm_page_pool {
 	gfp_t gfp_flags;
 	unsigned npages;
 	char *name;
+	unsigned cur_max_size;
+	unsigned long shrink_timeout;
 	unsigned long nfrees;
 	unsigned long nrefills;
 };
@@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool,
 	pool->nfrees += freed_pages;
 }

+static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
+{
+	if (time_before(jiffies, pool->shrink_timeout))
+		return;
+	/* In case we reached zero bounce back to 512 pages. */
+	pool->cur_max_size = max(pool->cur_max_size << 1, 512u);
+	pool->cur_max_size = min(pool->cur_max_size,
+				 _manager->options.max_size);
+}
+
 /**
  * Free pages from pool.
  *
@@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		if (shrink_pages == 0)
 			break;
 		pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
+		/* No matter what, make sure the pool does not grow in the
+		 * next second. */
+		pool->cur_max_size = pool->cur_max_size >> 1;
+		pool->shrink_timeout = jiffies + HZ;
 		shrink_pages = ttm_page_pool_free(pool, nr_free,
 						  sc->gfp_mask);
 		freed += nr_free - shrink_pages;
@@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags,
 	}
 	/* Check that we don't go over the pool limit */
 	npages = 0;
-	if (pool->npages > _manager->options.max_size) {
-		npages = pool->npages - _manager->options.max_size;
-		/* free at least NUM_PAGES_TO_ALLOC number of pages
-		 * to reduce calls to set_memory_wb */
-		if (npages < NUM_PAGES_TO_ALLOC)
-			npages = NUM_PAGES_TO_ALLOC;
-	}
+	/*
+	 * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
+	 * set_memory_wb.
+	 */
+	if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
+		npages = pool->npages - pool->cur_max_size;
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
 	if (npages)
 		ttm_page_pool_free(pool, npages, GFP_KERNEL);
@@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
 		return 0;
 	}

+	/* Update pool size in case shrinker limited it. */
+	ttm_pool_update_max_size(pool);
+
 	/* combine zero flag to pool flags */
 	gfp_flags |= pool->gfp_flags;

@@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags,
 	pool->npages = pool->nfrees = 0;
 	pool->gfp_flags = flags;
 	pool->name = name;
+	pool->cur_max_size = _manager->options.max_size;
+	pool->shrink_timeout = jiffies;
 }

 int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index a076ff3..80b10aa 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -93,6 +93,8 @@ enum pool_type {
  * @size: Size used during DMA allocation.
  * @npages_free: Count of available pages for re-use.
  * @npages_in_use: Count of pages that are in use.
+ * @cur_max_size: Current maximum size for the pool.
+ * @shrink_timeout: Timeout for pool maximum size restriction.
  * @nfrees: Stats when pool is shrinking.
  * @nrefills: Stats when the pool is grown.
  * @gfp_flags: Flags to pass for alloc_page.
@@ -110,6 +112,8 @@ struct dma_pool {
 	unsigned size;
 	unsigned npages_free;
 	unsigned npages_in_use;
+	unsigned cur_max_size;
+	unsigned long shrink_timeout;
 	unsigned long nfrees; /* Stats when shrunk. */
 	unsigned long nrefills; /* Stats when grown. */
 	gfp_t gfp_flags;
@@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page)
 	kfree(d_page);
 	d_page = NULL;
 }
+
+static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool)
+{
+	if (time_before(jiffies, pool->shrink_timeout))
+		return;
+	/* In case we reached zero bounce back to 512 pages. */
+	pool->cur_max_size = max(pool->cur_max_size << 1, 512u);
+	pool->cur_max_size = min(pool->cur_max_size,
+				 _manager->options.max_size);
+}
+
 static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
 {
 	struct dma_page *d_page;
@@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags,
 	pool->size = PAGE_SIZE;
 	pool->type = type;
 	pool->nrefills = 0;
+	pool->cur_max_size = _manager->options.max_size;
+	pool->shrink_timeout = jiffies;
 	p = pool->name;
 	for (i = 0; i < 5; i++) {
 		if (type & t[i]) {
@@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 		}
 	}

+	/* Update pool size in case shrinker limited it. */
+	ttm_dma_pool_update_max_size(pool);
+
 	INIT_LIST_HEAD(&ttm_dma->pages_list);
 	for (i = 0; i < ttm->num_pages; ++i) {
 		ret = ttm_dma_pool_get_pages(pool, ttm_dma, i);
@@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
-		if (pool->npages_free >= (_manager->options.max_size +
+		if (pool->npages_free >= (pool->cur_max_size +
 					  NUM_PAGES_TO_ALLOC))
-			npages = pool->npages_free - _manager->options.max_size;
+			npages = pool->npages_free - pool->cur_max_size;
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);

@@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		/* Do it in round-robin fashion. */
 		if (++idx < pool_offset)
 			continue;
+		/* No matter what, make sure the pool does not grow in the
+		 * next second. */
*/ + p->pool->cur_max_size = p->pool->cur_max_size >> 1; + p->pool->shrink_timeout = jiffies + HZ; nr_free = shrink_pages; shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, sc->gfp_mask); -- 1.9.3 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool 2014-08-13 3:52 ` [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool Jérôme Glisse @ 2014-08-13 6:32 ` Michel Dänzer 2014-08-13 9:06 ` Thomas Hellstrom 1 sibling, 0 replies; 29+ messages in thread From: Michel Dänzer @ 2014-08-13 6:32 UTC (permalink / raw) To: Jérôme Glisse, dri-devel Cc: Jérôme Glisse, Thomas Hellstrom, Konrad Rzeszutek Wilk On 13.08.2014 12:52, Jérôme Glisse wrote: > From: Jérôme Glisse <jglisse@redhat.com> > > When experiencing memory pressure we want to minimize pool size so that > memory we just shrinked is not added back again just as the next thing. > > This will divide by 2 the maximum pool size for each device each time > the pool have to shrink. The limit is bumped again is next allocation > happen after one second since the last shrink. The one second delay is > obviously an arbitrary choice. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> [...] > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c > index 09874d6..ab41adf 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > @@ -68,6 +68,8 @@ > * @list: Pool of free uc/wc pages for fast reuse. > * @gfp_flags: Flags to pass for alloc_page. > * @npages: Number of pages in pool. > + * @cur_max_size: Current maximum size for the pool. > + * @shrink_timeout: Timeout for pool maximum size restriction. > */ > struct ttm_page_pool { > spinlock_t lock; > @@ -76,6 +78,8 @@ struct ttm_page_pool { > gfp_t gfp_flags; > unsigned npages; > char *name; > + unsigned cur_max_size; > + unsigned long last_shrink; s/last_shrink/shrink_timeout/ Looks like maybe you posted an untested stale set of patches? 
> @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, > pool->nfrees += freed_pages; > } > > +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) > +{ > + if (time_before(jiffies, pool->shrink_timeout)) > + return; > + /* In case we reached zero bounce back to 512 pages. */ > + pool->cur_max_size = max(pool->cur_max_size << 1, 512); Another 'comparison of distinct pointer types lacks a cast' warning. Both issues apply to ttm_page_alloc_dma.c as well. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool 2014-08-13 3:52 ` [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool Jérôme Glisse 2014-08-13 6:32 ` Michel Dänzer @ 2014-08-13 9:06 ` Thomas Hellstrom 2014-08-13 10:42 ` Daniel Vetter 1 sibling, 1 reply; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 9:06 UTC (permalink / raw) To: Jérôme Glisse Cc: Michel Dänzer, Jérôme Glisse, dri-devel, Konrad Rzeszutek Wilk On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > From: Jérôme Glisse <jglisse@redhat.com> > > When experiencing memory pressure we want to minimize pool size so that > memory we just shrinked is not added back again just as the next thing. > > This will divide by 2 the maximum pool size for each device each time > the pool have to shrink. The limit is bumped again is next allocation > happen after one second since the last shrink. The one second delay is > obviously an arbitrary choice. Jérôme, I don't like this patch. It adds extra complexity and its usefulness is highly questionable. There are a number of caches in the system, and if all of them added some sort of voluntary shrink heuristics like this, we'd end up with impossible-to-debug unpredictable performance issues. We should let the memory subsystem decide when to reclaim pages from caches and what caches to reclaim them from. 
/Thomas > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > Cc: Mario Kleiner <mario.kleiner.de@gmail.com> > Cc: Michel Dänzer <michel@daenzer.net> > Cc: Thomas Hellstrom <thellstrom@vmware.com> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- > 2 files changed, 53 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c > index 09874d6..ab41adf 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > @@ -68,6 +68,8 @@ > * @list: Pool of free uc/wc pages for fast reuse. > * @gfp_flags: Flags to pass for alloc_page. > * @npages: Number of pages in pool. > + * @cur_max_size: Current maximum size for the pool. > + * @shrink_timeout: Timeout for pool maximum size restriction. > */ > struct ttm_page_pool { > spinlock_t lock; > @@ -76,6 +78,8 @@ struct ttm_page_pool { > gfp_t gfp_flags; > unsigned npages; > char *name; > + unsigned cur_max_size; > + unsigned long last_shrink; > unsigned long nfrees; > unsigned long nrefills; > }; > @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, > pool->nfrees += freed_pages; > } > > +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) > +{ > + if (time_before(jiffies, pool->shrink_timeout)) > + return; > + /* In case we reached zero bounce back to 512 pages. */ > + pool->cur_max_size = max(pool->cur_max_size << 1, 512); > + pool->cur_max_size = min(pool->cur_max_size, > + _manager->options.max_size); > +} > + > /** > * Free pages from pool. > * > @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) > if (shrink_pages == 0) > break; > pool = &_manager->pools[(i + pool_offset)%NUM_POOLS]; > + /* No matter what make sure the pool do not grow in next second. 
*/ > + pool->cur_max_size = pool->cur_max_size >> 1; > + pool->shrink_timeout = jiffies + HZ; > shrink_pages = ttm_page_pool_free(pool, nr_free, > sc->gfp_mask); > freed += nr_free - shrink_pages; > @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, > } > /* Check that we don't go over the pool limit */ > npages = 0; > - if (pool->npages > _manager->options.max_size) { > - npages = pool->npages - _manager->options.max_size; > - /* free at least NUM_PAGES_TO_ALLOC number of pages > - * to reduce calls to set_memory_wb */ > - if (npages < NUM_PAGES_TO_ALLOC) > - npages = NUM_PAGES_TO_ALLOC; > - } > + /* > + * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to > + * set_memory_wb. > + */ > + if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC)) > + npages = pool->npages - pool->cur_max_size; > spin_unlock_irqrestore(&pool->lock, irq_flags); > if (npages) > ttm_page_pool_free(pool, npages, GFP_KERNEL); > @@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, > return 0; > } > > + /* Update pool size in case shrinker limited it. */ > + ttm_pool_update_max_size(pool); > + > /* combine zero flag to pool flags */ > gfp_flags |= pool->gfp_flags; > > @@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, > pool->npages = pool->nfrees = 0; > pool->gfp_flags = flags; > pool->name = name; > + pool->cur_max_size = _manager->options.max_size; > + pool->shrink_timeout = jiffies; > } > > int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > index a076ff3..80b10aa 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > @@ -93,6 +93,8 @@ enum pool_type { > * @size: Size used during DMA allocation. > * @npages_free: Count of available pages for re-use. 
> * @npages_in_use: Count of pages that are in use. > + * @cur_max_size: Current maximum size for the pool. > + * @shrink_timeout: Timeout for pool maximum size restriction. > * @nfrees: Stats when pool is shrinking. > * @nrefills: Stats when the pool is grown. > * @gfp_flags: Flags to pass for alloc_page. > @@ -110,6 +112,8 @@ struct dma_pool { > unsigned size; > unsigned npages_free; > unsigned npages_in_use; > + unsigned cur_max_size; > + unsigned long last_shrink; > unsigned long nfrees; /* Stats when shrunk. */ > unsigned long nrefills; /* Stats when grown. */ > gfp_t gfp_flags; > @@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) > kfree(d_page); > d_page = NULL; > } > + > +static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) > +{ > + if (time_before(jiffies, pool->shrink_timeout)) > + return; > + /* In case we reached zero bounce back to 512 pages. */ > + pool->cur_max_size = max(pool->cur_max_size << 1, 512); > + pool->cur_max_size = min(pool->cur_max_size, > + _manager->options.max_size); > +} > + > static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) > { > struct dma_page *d_page; > @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, > pool->size = PAGE_SIZE; > pool->type = type; > pool->nrefills = 0; > + pool->cur_max_size = _manager->options.max_size; > + pool->shrink_timeout = jiffies; > p = pool->name; > for (i = 0; i < 5; i++) { > if (type & t[i]) { > @@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) > } > } > > + /* Update pool size in case shrinker limited it. 
*/ > + ttm_dma_pool_update_max_size(pool); > + > INIT_LIST_HEAD(&ttm_dma->pages_list); > for (i = 0; i < ttm->num_pages; ++i) { > ret = ttm_dma_pool_get_pages(pool, ttm_dma, i); > @@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) > } else { > pool->npages_free += count; > list_splice(&ttm_dma->pages_list, &pool->free_list); > - if (pool->npages_free >= (_manager->options.max_size + > + if (pool->npages_free >= (pool->cur_max_size + > NUM_PAGES_TO_ALLOC)) > - npages = pool->npages_free - _manager->options.max_size; > + npages = pool->npages_free - pool->cur_max_size; > } > spin_unlock_irqrestore(&pool->lock, irq_flags); > > @@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) > /* Do it in round-robin fashion. */ > if (++idx < pool_offset) > continue; > + /* No matter what make sure the pool do not grow in next second. */ > + p->pool->cur_max_size = p->pool->cur_max_size >> 1; > + p->pool->shrink_timeout = jiffies + HZ; > nr_free = shrink_pages; > shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, > sc->gfp_mask); _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool 2014-08-13 9:06 ` Thomas Hellstrom @ 2014-08-13 10:42 ` Daniel Vetter 2014-08-13 12:35 ` GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) Thomas Hellstrom 0 siblings, 1 reply; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 10:42 UTC (permalink / raw) To: Thomas Hellstrom Cc: Michel Dänzer, Jérôme Glisse, dri-devel, Konrad Rzeszutek Wilk On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > > On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > > From: Jérôme Glisse <jglisse@redhat.com> > > > > When experiencing memory pressure we want to minimize pool size so that > > memory we just shrinked is not added back again just as the next thing. > > > > This will divide by 2 the maximum pool size for each device each time > > the pool have to shrink. The limit is bumped again is next allocation > > happen after one second since the last shrink. The one second delay is > > obviously an arbitrary choice. > > Jérôme, > > I don't like this patch. It adds extra complexity and its usefulness is > highly questionable. > There are a number of caches in the system, and if all of them added > some sort of voluntary shrink heuristics like this, we'd end up with > impossible-to-debug unpredictable performance issues. > > We should let the memory subsystem decide when to reclaim pages from > caches and what caches to reclaim them from. Yeah, artificially limiting your cache from growing when your shrinker gets called will just break the equal-memory pressure the core mm uses to rebalance between all caches when workload changes. In i915 we let everything grow without artificial bounds and only rely upon the shrinker callbacks to ensure we don't consume more than our fair share of available memory overall. 
-Daniel > > /Thomas > > > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > > Cc: Mario Kleiner <mario.kleiner.de@gmail.com> > > Cc: Michel Dänzer <michel@daenzer.net> > > Cc: Thomas Hellstrom <thellstrom@vmware.com> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > --- > > drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- > > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- > > 2 files changed, 53 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c > > index 09874d6..ab41adf 100644 > > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > > @@ -68,6 +68,8 @@ > > * @list: Pool of free uc/wc pages for fast reuse. > > * @gfp_flags: Flags to pass for alloc_page. > > * @npages: Number of pages in pool. > > + * @cur_max_size: Current maximum size for the pool. > > + * @shrink_timeout: Timeout for pool maximum size restriction. > > */ > > struct ttm_page_pool { > > spinlock_t lock; > > @@ -76,6 +78,8 @@ struct ttm_page_pool { > > gfp_t gfp_flags; > > unsigned npages; > > char *name; > > + unsigned cur_max_size; > > + unsigned long last_shrink; > > unsigned long nfrees; > > unsigned long nrefills; > > }; > > @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, > > pool->nfrees += freed_pages; > > } > > > > +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) > > +{ > > + if (time_before(jiffies, pool->shrink_timeout)) > > + return; > > + /* In case we reached zero bounce back to 512 pages. */ > > + pool->cur_max_size = max(pool->cur_max_size << 1, 512); > > + pool->cur_max_size = min(pool->cur_max_size, > > + _manager->options.max_size); > > +} > > + > > /** > > * Free pages from pool. 
> > * > > @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) > > if (shrink_pages == 0) > > break; > > pool = &_manager->pools[(i + pool_offset)%NUM_POOLS]; > > + /* No matter what make sure the pool do not grow in next second. */ > > + pool->cur_max_size = pool->cur_max_size >> 1; > > + pool->shrink_timeout = jiffies + HZ; > > shrink_pages = ttm_page_pool_free(pool, nr_free, > > sc->gfp_mask); > > freed += nr_free - shrink_pages; > > @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, > > } > > /* Check that we don't go over the pool limit */ > > npages = 0; > > - if (pool->npages > _manager->options.max_size) { > > - npages = pool->npages - _manager->options.max_size; > > - /* free at least NUM_PAGES_TO_ALLOC number of pages > > - * to reduce calls to set_memory_wb */ > > - if (npages < NUM_PAGES_TO_ALLOC) > > - npages = NUM_PAGES_TO_ALLOC; > > - } > > + /* > > + * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to > > + * set_memory_wb. > > + */ > > + if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC)) > > + npages = pool->npages - pool->cur_max_size; > > spin_unlock_irqrestore(&pool->lock, irq_flags); > > if (npages) > > ttm_page_pool_free(pool, npages, GFP_KERNEL); > > @@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, > > return 0; > > } > > > > + /* Update pool size in case shrinker limited it. 
*/ > > + ttm_pool_update_max_size(pool); > > + > > /* combine zero flag to pool flags */ > > gfp_flags |= pool->gfp_flags; > > > > @@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, > > pool->npages = pool->nfrees = 0; > > pool->gfp_flags = flags; > > pool->name = name; > > + pool->cur_max_size = _manager->options.max_size; > > + pool->shrink_timeout = jiffies; > > } > > > > int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > > index a076ff3..80b10aa 100644 > > --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > > @@ -93,6 +93,8 @@ enum pool_type { > > * @size: Size used during DMA allocation. > > * @npages_free: Count of available pages for re-use. > > * @npages_in_use: Count of pages that are in use. > > + * @cur_max_size: Current maximum size for the pool. > > + * @shrink_timeout: Timeout for pool maximum size restriction. > > * @nfrees: Stats when pool is shrinking. > > * @nrefills: Stats when the pool is grown. > > * @gfp_flags: Flags to pass for alloc_page. > > @@ -110,6 +112,8 @@ struct dma_pool { > > unsigned size; > > unsigned npages_free; > > unsigned npages_in_use; > > + unsigned cur_max_size; > > + unsigned long last_shrink; > > unsigned long nfrees; /* Stats when shrunk. */ > > unsigned long nrefills; /* Stats when grown. */ > > gfp_t gfp_flags; > > @@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) > > kfree(d_page); > > d_page = NULL; > > } > > + > > +static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) > > +{ > > + if (time_before(jiffies, pool->shrink_timeout)) > > + return; > > + /* In case we reached zero bounce back to 512 pages. 
*/ > > + pool->cur_max_size = max(pool->cur_max_size << 1, 512); > > + pool->cur_max_size = min(pool->cur_max_size, > > + _manager->options.max_size); > > +} > > + > > static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) > > { > > struct dma_page *d_page; > > @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, > > pool->size = PAGE_SIZE; > > pool->type = type; > > pool->nrefills = 0; > > + pool->cur_max_size = _manager->options.max_size; > > + pool->shrink_timeout = jiffies; > > p = pool->name; > > for (i = 0; i < 5; i++) { > > if (type & t[i]) { > > @@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) > > } > > } > > > > + /* Update pool size in case shrinker limited it. */ > > + ttm_dma_pool_update_max_size(pool); > > + > > INIT_LIST_HEAD(&ttm_dma->pages_list); > > for (i = 0; i < ttm->num_pages; ++i) { > > ret = ttm_dma_pool_get_pages(pool, ttm_dma, i); > > @@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) > > } else { > > pool->npages_free += count; > > list_splice(&ttm_dma->pages_list, &pool->free_list); > > - if (pool->npages_free >= (_manager->options.max_size + > > + if (pool->npages_free >= (pool->cur_max_size + > > NUM_PAGES_TO_ALLOC)) > > - npages = pool->npages_free - _manager->options.max_size; > > + npages = pool->npages_free - pool->cur_max_size; > > } > > spin_unlock_irqrestore(&pool->lock, irq_flags); > > > > @@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) > > /* Do it in round-robin fashion. */ > > if (++idx < pool_offset) > > continue; > > + /* No matter what make sure the pool do not grow in next second. 
*/ > > + p->pool->cur_max_size = p->pool->cur_max_size >> 1; > > + p->pool->shrink_timeout = jiffies + HZ; > > nr_free = shrink_pages; > > shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, > > sc->gfp_mask); > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 10:42 ` Daniel Vetter @ 2014-08-13 12:35 ` Thomas Hellstrom 2014-08-13 12:40 ` David Herrmann 2014-08-13 13:01 ` Daniel Vetter 0 siblings, 2 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 12:35 UTC (permalink / raw) To: Daniel Vetter Cc: Michel Dänzer, Jérôme Glisse, dri-devel, Konrad Rzeszutek Wilk On 08/13/2014 12:42 PM, Daniel Vetter wrote: > On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>> From: Jérôme Glisse <jglisse@redhat.com> >>> >>> When experiencing memory pressure we want to minimize pool size so that >>> memory we just shrinked is not added back again just as the next thing. >>> >>> This will divide by 2 the maximum pool size for each device each time >>> the pool have to shrink. The limit is bumped again is next allocation >>> happen after one second since the last shrink. The one second delay is >>> obviously an arbitrary choice. >> Jérôme, >> >> I don't like this patch. It adds extra complexity and its usefulness is >> highly questionable. >> There are a number of caches in the system, and if all of them added >> some sort of voluntary shrink heuristics like this, we'd end up with >> impossible-to-debug unpredictable performance issues. >> >> We should let the memory subsystem decide when to reclaim pages from >> caches and what caches to reclaim them from. > Yeah, artificially limiting your cache from growing when your shrinker > gets called will just break the equal-memory pressure the core mm uses to > rebalance between all caches when workload changes. In i915 we let > everything grow without artificial bounds and only rely upon the shrinker > callbacks to ensure we don't consume more than our fair share of available > memory overall. 
> -Daniel Now when you bring i915 memory usage up, Daniel, I can't refrain from bringing up the old user-space unreclaimable kernel memory issue, for which gem open is a good example ;) Each time user-space opens a gem handle, some un-reclaimable kernel memory is allocated, for which there is no accounting, so theoretically I think a user can bring a system to unusability this way. Typically there are various limits on unreclaimable objects like this, like open file descriptors, and IIRC the kernel even has an internal limit on the number of struct files you initialize, based on the available system memory, so dma-buf / prime should already have some sort of protection. /Thomas >> /Thomas >>> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> >>> Cc: Mario Kleiner <mario.kleiner.de@gmail.com> >>> Cc: Michel Dänzer <michel@daenzer.net> >>> Cc: Thomas Hellstrom <thellstrom@vmware.com> >>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >>> --- >>> drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++------- >>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++++++++++++++++++++++-- >>> 2 files changed, 53 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c >>> index 09874d6..ab41adf 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c >>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c >>> @@ -68,6 +68,8 @@ >>> * @list: Pool of free uc/wc pages for fast reuse. >>> * @gfp_flags: Flags to pass for alloc_page. >>> * @npages: Number of pages in pool. >>> + * @cur_max_size: Current maximum size for the pool. >>> + * @shrink_timeout: Timeout for pool maximum size restriction. 
>>> */ >>> struct ttm_page_pool { >>> spinlock_t lock; >>> @@ -76,6 +78,8 @@ struct ttm_page_pool { >>> gfp_t gfp_flags; >>> unsigned npages; >>> char *name; >>> + unsigned cur_max_size; >>> + unsigned long last_shrink; >>> unsigned long nfrees; >>> unsigned long nrefills; >>> }; >>> @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, >>> pool->nfrees += freed_pages; >>> } >>> >>> +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool) >>> +{ >>> + if (time_before(jiffies, pool->shrink_timeout)) >>> + return; >>> + /* In case we reached zero bounce back to 512 pages. */ >>> + pool->cur_max_size = max(pool->cur_max_size << 1, 512); >>> + pool->cur_max_size = min(pool->cur_max_size, >>> + _manager->options.max_size); >>> +} >>> + >>> /** >>> * Free pages from pool. >>> * >>> @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) >>> if (shrink_pages == 0) >>> break; >>> pool = &_manager->pools[(i + pool_offset)%NUM_POOLS]; >>> + /* No matter what make sure the pool do not grow in next second. */ >>> + pool->cur_max_size = pool->cur_max_size >> 1; >>> + pool->shrink_timeout = jiffies + HZ; >>> shrink_pages = ttm_page_pool_free(pool, nr_free, >>> sc->gfp_mask); >>> freed += nr_free - shrink_pages; >>> @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, >>> } >>> /* Check that we don't go over the pool limit */ >>> npages = 0; >>> - if (pool->npages > _manager->options.max_size) { >>> - npages = pool->npages - _manager->options.max_size; >>> - /* free at least NUM_PAGES_TO_ALLOC number of pages >>> - * to reduce calls to set_memory_wb */ >>> - if (npages < NUM_PAGES_TO_ALLOC) >>> - npages = NUM_PAGES_TO_ALLOC; >>> - } >>> + /* >>> + * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to >>> + * set_memory_wb. 
>>> + */ >>> + if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC)) >>> + npages = pool->npages - pool->cur_max_size; >>> spin_unlock_irqrestore(&pool->lock, irq_flags); >>> if (npages) >>> ttm_page_pool_free(pool, npages, GFP_KERNEL); >>> @@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, >>> return 0; >>> } >>> >>> + /* Update pool size in case shrinker limited it. */ >>> + ttm_pool_update_max_size(pool); >>> + >>> /* combine zero flag to pool flags */ >>> gfp_flags |= pool->gfp_flags; >>> >>> @@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, gfp_t flags, >>> pool->npages = pool->nfrees = 0; >>> pool->gfp_flags = flags; >>> pool->name = name; >>> + pool->cur_max_size = _manager->options.max_size; >>> + pool->shrink_timeout = jiffies; >>> } >>> >>> int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages) >>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c >>> index a076ff3..80b10aa 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c >>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c >>> @@ -93,6 +93,8 @@ enum pool_type { >>> * @size: Size used during DMA allocation. >>> * @npages_free: Count of available pages for re-use. >>> * @npages_in_use: Count of pages that are in use. >>> + * @cur_max_size: Current maximum size for the pool. >>> + * @shrink_timeout: Timeout for pool maximum size restriction. >>> * @nfrees: Stats when pool is shrinking. >>> * @nrefills: Stats when the pool is grown. >>> * @gfp_flags: Flags to pass for alloc_page. >>> @@ -110,6 +112,8 @@ struct dma_pool { >>> unsigned size; >>> unsigned npages_free; >>> unsigned npages_in_use; >>> + unsigned cur_max_size; >>> + unsigned long last_shrink; >>> unsigned long nfrees; /* Stats when shrunk. */ >>> unsigned long nrefills; /* Stats when grown. 
*/ >>> gfp_t gfp_flags; >>> @@ -331,6 +335,17 @@ static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) >>> kfree(d_page); >>> d_page = NULL; >>> } >>> + >>> +static inline void ttm_dma_pool_update_max_size(struct dma_pool *pool) >>> +{ >>> + if (time_before(jiffies, pool->shrink_timeout)) >>> + return; >>> + /* In case we reached zero bounce back to 512 pages. */ >>> + pool->cur_max_size = max(pool->cur_max_size << 1, 512); >>> + pool->cur_max_size = min(pool->cur_max_size, >>> + _manager->options.max_size); >>> +} >>> + >>> static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) >>> { >>> struct dma_page *d_page; >>> @@ -606,6 +621,8 @@ static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, >>> pool->size = PAGE_SIZE; >>> pool->type = type; >>> pool->nrefills = 0; >>> + pool->cur_max_size = _manager->options.max_size; >>> + pool->shrink_timeout = jiffies; >>> p = pool->name; >>> for (i = 0; i < 5; i++) { >>> if (type & t[i]) { >>> @@ -892,6 +909,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev) >>> } >>> } >>> >>> + /* Update pool size in case shrinker limited it. 
*/ >>> + ttm_dma_pool_update_max_size(pool); >>> + >>> INIT_LIST_HEAD(&ttm_dma->pages_list); >>> for (i = 0; i < ttm->num_pages; ++i) { >>> ret = ttm_dma_pool_get_pages(pool, ttm_dma, i); >>> @@ -953,9 +973,9 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) >>> } else { >>> pool->npages_free += count; >>> list_splice(&ttm_dma->pages_list, &pool->free_list); >>> - if (pool->npages_free >= (_manager->options.max_size + >>> + if (pool->npages_free >= (pool->cur_max_size + >>> NUM_PAGES_TO_ALLOC)) >>> - npages = pool->npages_free - _manager->options.max_size; >>> + npages = pool->npages_free - pool->cur_max_size; >>> } >>> spin_unlock_irqrestore(&pool->lock, irq_flags); >>> >>> @@ -1024,6 +1044,9 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) >>> /* Do it in round-robin fashion. */ >>> if (++idx < pool_offset) >>> continue; >>> + /* No matter what, make sure the pool does not grow in the next second. */ >>> + p->pool->cur_max_size = p->pool->cur_max_size >> 1; >>> + p->pool->shrink_timeout = jiffies + HZ; >>> nr_free = shrink_pages; >>> shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, >>> sc->gfp_mask); >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 29+ messages in thread
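[Editor's note] Stripped of the kernel plumbing, the heuristic in the quoted patch is small: halve the pool cap each time the shrinker runs, refuse to grow it again for one second, then let subsequent allocations double it back up to the configured maximum. A standalone sketch of that policy (simplified names, jiffies/HZ replaced by a plain tick counter, and a 4096-page maximum assumed purely for illustration):

```c
#include <assert.h>

#define POOL_MAX_SIZE   4096u  /* stand-in for _manager->options.max_size */
#define POOL_MIN_BOUNCE 512u   /* floor the patch bounces back to from zero */

struct pool_limit {
	unsigned cur_max_size;        /* current cap, halved under pressure */
	unsigned long shrink_timeout; /* tick until which the cap may not grow */
};

/* Shrinker path: halve the cap and block growth for one "second" (hz ticks). */
static void pool_shrink(struct pool_limit *p, unsigned long now, unsigned long hz)
{
	p->cur_max_size >>= 1;
	p->shrink_timeout = now + hz;
}

/* Allocation path: once the timeout has elapsed, double the cap again,
 * bouncing back to 512 if it had reached zero, clamped to the maximum. */
static void pool_update_max_size(struct pool_limit *p, unsigned long now)
{
	if (now < p->shrink_timeout) /* plays the role of time_before() */
		return;
	p->cur_max_size <<= 1;
	if (p->cur_max_size < POOL_MIN_BOUNCE)
		p->cur_max_size = POOL_MIN_BOUNCE;
	if (p->cur_max_size > POOL_MAX_SIZE)
		p->cur_max_size = POOL_MAX_SIZE;
}
```

One observable property of the scheme: the cap recovers in doublings, one per allocation after the timeout, rather than jumping straight back to the maximum.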
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 12:35 ` GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) Thomas Hellstrom @ 2014-08-13 12:40 ` David Herrmann 2014-08-13 12:48 ` Thomas Hellstrom 2014-08-13 13:01 ` Daniel Vetter 1 sibling, 1 reply; 29+ messages in thread From: David Herrmann @ 2014-08-13 12:40 UTC (permalink / raw) To: Thomas Hellstrom Cc: dri-devel, Michel Dänzer, Jérôme Glisse, Konrad Rzeszutek Wilk Hi On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote: > On 08/13/2014 12:42 PM, Daniel Vetter wrote: >> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>> From: Jérôme Glisse <jglisse@redhat.com> >>>> >>>> When experiencing memory pressure we want to minimize pool size so that >>>> memory we just shrinked is not added back again just as the next thing. >>>> >>>> This will divide by 2 the maximum pool size for each device each time >>>> the pool have to shrink. The limit is bumped again is next allocation >>>> happen after one second since the last shrink. The one second delay is >>>> obviously an arbitrary choice. >>> Jérôme, >>> >>> I don't like this patch. It adds extra complexity and its usefulness is >>> highly questionable. >>> There are a number of caches in the system, and if all of them added >>> some sort of voluntary shrink heuristics like this, we'd end up with >>> impossible-to-debug unpredictable performance issues. >>> >>> We should let the memory subsystem decide when to reclaim pages from >>> caches and what caches to reclaim them from. >> Yeah, artificially limiting your cache from growing when your shrinker >> gets called will just break the equal-memory pressure the core mm uses to >> rebalance between all caches when workload changes. 
In i915 we let >> everything grow without artificial bounds and only rely upon the shrinker >> callbacks to ensure we don't consume more than our fair share of available >> memory overall. >> -Daniel > > Now when you bring i915 memory usage up, Daniel, > I can't refrain from bringing up the old user-space unreclaimable kernel > memory issue, for which gem open is a good example ;) Each time > user-space opens a gem handle, some un-reclaimable kernel memory is > allocated, for which there is no accounting, so theoretically I think a > user can bring a system to unusability this way. > > Typically there are various limits on unreclaimable objects like this, > like open file descriptors, and IIRC the kernel even has an internal > limit on the number of struct files you initialize, based on the > available system memory, so dma-buf / prime should already have some > sort of protection. gem->filp points to a fresh shmem file, which itself is limited like dmabuf. That should suffice, right? Thanks David _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 12:40 ` David Herrmann @ 2014-08-13 12:48 ` Thomas Hellstrom 0 siblings, 0 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 12:48 UTC (permalink / raw) To: David Herrmann Cc: dri-devel, Michel Dänzer, Jérôme Glisse, Konrad Rzeszutek Wilk On 08/13/2014 02:40 PM, David Herrmann wrote: > Hi > > On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote: >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>> >>>>> When experiencing memory pressure we want to minimize pool size so that >>>>> memory we just shrinked is not added back again just as the next thing. >>>>> >>>>> This will divide by 2 the maximum pool size for each device each time >>>>> the pool have to shrink. The limit is bumped again is next allocation >>>>> happen after one second since the last shrink. The one second delay is >>>>> obviously an arbitrary choice. >>>> Jérôme, >>>> >>>> I don't like this patch. It adds extra complexity and its usefulness is >>>> highly questionable. >>>> There are a number of caches in the system, and if all of them added >>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>> impossible-to-debug unpredictable performance issues. >>>> >>>> We should let the memory subsystem decide when to reclaim pages from >>>> caches and what caches to reclaim them from. >>> Yeah, artificially limiting your cache from growing when your shrinker >>> gets called will just break the equal-memory pressure the core mm uses to >>> rebalance between all caches when workload changes. 
In i915 we let >>> everything grow without artificial bounds and only rely upon the shrinker >>> callbacks to ensure we don't consume more than our fair share of available >>> memory overall. >>> -Daniel >> Now when you bring i915 memory usage up, Daniel, >> I can't refrain from bringing up the old user-space unreclaimable kernel >> memory issue, for which gem open is a good example ;) Each time >> user-space opens a gem handle, some un-reclaimable kernel memory is >> allocated, for which there is no accounting, so theoretically I think a >> user can bring a system to unusability this way. >> >> Typically there are various limits on unreclaimable objects like this, >> like open file descriptors, and IIRC the kernel even has an internal >> limit on the number of struct files you initialize, based on the >> available system memory, so dma-buf / prime should already have some >> sort of protection. > gem->filp points to a fresh shmem file, which itself is limited like > dmabuf. That should suffice, right? > > Thanks > David I'm thinking of situations where you have a gem name and open a new handle. It allocates a new unaccounted idr object. Admittedly you'd have to open a hell of a lot of new handles to stress the system, but that's an example of the situation I'm thinking of. Similarly perhaps if you create a gem handle from a prime file-descriptor but I haven't looked at that code in detail. Thanks /Thomas _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 12:35 ` GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) Thomas Hellstrom 2014-08-13 12:40 ` David Herrmann @ 2014-08-13 13:01 ` Daniel Vetter 2014-08-13 14:09 ` Oded Gabbay 2014-08-13 15:13 ` Thomas Hellstrom 1 sibling, 2 replies; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 13:01 UTC (permalink / raw) To: Thomas Hellstrom Cc: Konrad Rzeszutek Wilk, Michel Dänzer, dri-devel, Jérôme Glisse On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: > On 08/13/2014 12:42 PM, Daniel Vetter wrote: > > On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > >> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > >>> From: Jérôme Glisse <jglisse@redhat.com> > >>> > >>> When experiencing memory pressure we want to minimize pool size so that > >>> memory we just shrinked is not added back again just as the next thing. > >>> > >>> This will divide by 2 the maximum pool size for each device each time > >>> the pool have to shrink. The limit is bumped again is next allocation > >>> happen after one second since the last shrink. The one second delay is > >>> obviously an arbitrary choice. > >> Jérôme, > >> > >> I don't like this patch. It adds extra complexity and its usefulness is > >> highly questionable. > >> There are a number of caches in the system, and if all of them added > >> some sort of voluntary shrink heuristics like this, we'd end up with > >> impossible-to-debug unpredictable performance issues. > >> > >> We should let the memory subsystem decide when to reclaim pages from > >> caches and what caches to reclaim them from. > > Yeah, artificially limiting your cache from growing when your shrinker > > gets called will just break the equal-memory pressure the core mm uses to > > rebalance between all caches when workload changes. 
In i915 we let > > everything grow without artificial bounds and only rely upon the shrinker > > callbacks to ensure we don't consume more than our fair share of available > > memory overall. > > -Daniel > > Now when you bring i915 memory usage up, Daniel, > I can't refrain from bringing up the old user-space unreclaimable kernel > memory issue, for which gem open is a good example ;) Each time > user-space opens a gem handle, some un-reclaimable kernel memory is > allocated, for which there is no accounting, so theoretically I think a > user can bring a system to unusability this way. > > Typically there are various limits on unreclaimable objects like this, > like open file descriptors, and IIRC the kernel even has an internal > limit on the number of struct files you initialize, based on the > available system memory, so dma-buf / prime should already have some > sort of protection. Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, so there's not really a way to isolate gpu memory usage in a sane way for specific processes. But there's also zero limits on actual gpu usage itself (timeslices or whatever) so I guess no one asked for this yet. My comment really was about balancing mm users under the assumption that they're all unlimited. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 13:01 ` Daniel Vetter @ 2014-08-13 14:09 ` Oded Gabbay 2014-08-13 15:19 ` Thomas Hellstrom 2014-08-13 16:30 ` Daniel Vetter 2014-08-13 15:13 ` Thomas Hellstrom 1 sibling, 2 replies; 29+ messages in thread From: Oded Gabbay @ 2014-08-13 14:09 UTC (permalink / raw) To: Daniel Vetter, Thomas Hellstrom, Jérôme Glisse Cc: Michel Dänzer, dri-devel, Konrad Rzeszutek Wilk On 13/08/14 16:01, Daniel Vetter wrote: > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>> >>>>> When experiencing memory pressure we want to minimize pool size so that >>>>> memory we just shrinked is not added back again just as the next thing. >>>>> >>>>> This will divide by 2 the maximum pool size for each device each time >>>>> the pool have to shrink. The limit is bumped again is next allocation >>>>> happen after one second since the last shrink. The one second delay is >>>>> obviously an arbitrary choice. >>>> Jérôme, >>>> >>>> I don't like this patch. It adds extra complexity and its usefulness is >>>> highly questionable. >>>> There are a number of caches in the system, and if all of them added >>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>> impossible-to-debug unpredictable performance issues. >>>> >>>> We should let the memory subsystem decide when to reclaim pages from >>>> caches and what caches to reclaim them from. >>> Yeah, artificially limiting your cache from growing when your shrinker >>> gets called will just break the equal-memory pressure the core mm uses to >>> rebalance between all caches when workload changes. 
In i915 we let >>> everything grow without artificial bounds and only rely upon the shrinker >>> callbacks to ensure we don't consume more than our fair share of available >>> memory overall. >>> -Daniel >> >> Now when you bring i915 memory usage up, Daniel, >> I can't refrain from bringing up the old user-space unreclaimable kernel >> memory issue, for which gem open is a good example ;) Each time >> user-space opens a gem handle, some un-reclaimable kernel memory is >> allocated, for which there is no accounting, so theoretically I think a >> user can bring a system to unusability this way. >> >> Typically there are various limits on unreclaimable objects like this, >> like open file descriptors, and IIRC the kernel even has an internal >> limit on the number of struct files you initialize, based on the >> available system memory, so dma-buf / prime should already have some >> sort of protection. > > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > so there's not really a way to isolate gpu memory usage in a sane way for > specific processes. But there's also zero limits on actual gpu usage > itself (timeslices or whatever) so I guess no one asked for this yet. > > My comment really was about balancing mm users under the assumption that > they're all unlimited. > -Daniel > I think the point you brought up becomes very important for compute (HSA) processes. I still don't know how to distinguish between legitimate use of GPU local memory and misbehaving/malicious processes. We have a requirement that HSA processes will be allowed to allocate and pin GPU local memory. They do it through an ioctl. In the kernel driver, we have an accounting of those memory allocations, meaning that I can print a list of all the objects that were allocated by a certain process, per device. Therefore, in theory, I can reclaim any object, but that will probably break the userspace app. If the app is misbehaving/malicious than that's ok, I guess. 
But how do I know that? And what prevents that malicious app from re-spawning and doing the same allocation again? Oded ^ permalink raw reply [flat|nested] 29+ messages in thread
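[Editor's note] The per-process accounting Oded describes can be pictured as a simple ledger keyed by pid. This is only an illustrative sketch (all names invented, a fixed-size array standing in for real driver bookkeeping), not the actual HSA driver code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALLOCS 64

/* One pinned GPU-local allocation charged to a process. */
struct pinned_alloc {
	int pid;
	size_t bytes;
};

/* Per-device ledger of every pinned allocation made via the ioctl. */
struct ledger {
	struct pinned_alloc allocs[MAX_ALLOCS];
	int count;
};

/* Record an allocation; returns -1 when the toy ledger is full. */
static int ledger_add(struct ledger *l, int pid, size_t bytes)
{
	if (l->count >= MAX_ALLOCS)
		return -1;
	l->allocs[l->count].pid = pid;
	l->allocs[l->count].bytes = bytes;
	l->count++;
	return 0;
}

/* Total pinned bytes charged to one process on this device. */
static size_t ledger_total(const struct ledger *l, int pid)
{
	size_t sum = 0;
	for (int i = 0; i < l->count; i++)
		if (l->allocs[i].pid == pid)
			sum += l->allocs[i].bytes;
	return sum;
}
```

Such a ledger answers "who holds what" per device, which is exactly what makes reclaim (or victim selection) possible; it does not by itself answer Oded's question of which holder is malicious.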
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 14:09 ` Oded Gabbay @ 2014-08-13 15:19 ` Thomas Hellstrom 2014-08-13 16:30 ` Daniel Vetter 1 sibling, 0 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 15:19 UTC (permalink / raw) To: Oded Gabbay Cc: dri-devel, Jérôme Glisse, Michel Dänzer, Konrad Rzeszutek Wilk On 08/13/2014 04:09 PM, Oded Gabbay wrote: > > > On 13/08/14 16:01, Daniel Vetter wrote: >> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >>> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>>> >>>>>> When experiencing memory pressure we want to minimize pool size >>>>>> so that >>>>>> memory we just shrinked is not added back again just as the next >>>>>> thing. >>>>>> >>>>>> This will divide by 2 the maximum pool size for each device each >>>>>> time >>>>>> the pool have to shrink. The limit is bumped again is next >>>>>> allocation >>>>>> happen after one second since the last shrink. The one second >>>>>> delay is >>>>>> obviously an arbitrary choice. >>>>> Jérôme, >>>>> >>>>> I don't like this patch. It adds extra complexity and its >>>>> usefulness is >>>>> highly questionable. >>>>> There are a number of caches in the system, and if all of them added >>>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>>> impossible-to-debug unpredictable performance issues. >>>>> >>>>> We should let the memory subsystem decide when to reclaim pages from >>>>> caches and what caches to reclaim them from. >>>> Yeah, artificially limiting your cache from growing when your shrinker >>>> gets called will just break the equal-memory pressure the core mm >>>> uses to >>>> rebalance between all caches when workload changes. 
In i915 we let >>>> everything grow without artificial bounds and only rely upon the >>>> shrinker >>>> callbacks to ensure we don't consume more than our fair share of >>>> available >>>> memory overall. >>>> -Daniel >>> >>> Now when you bring i915 memory usage up, Daniel, >>> I can't refrain from bringing up the old user-space unreclaimable >>> kernel >>> memory issue, for which gem open is a good example ;) Each time >>> user-space opens a gem handle, some un-reclaimable kernel memory is >>> allocated, for which there is no accounting, so theoretically I think a >>> user can bring a system to unusability this way. >>> >>> Typically there are various limits on unreclaimable objects like this, >>> like open file descriptors, and IIRC the kernel even has an internal >>> limit on the number of struct files you initialize, based on the >>> available system memory, so dma-buf / prime should already have some >>> sort of protection. >> >> Oh yeah, we have zero cgroups limits or similar stuff for gem >> allocations, >> so there's not really a way to isolate gpu memory usage in a sane way >> for >> specific processes. But there's also zero limits on actual gpu usage >> itself (timeslices or whatever) so I guess no one asked for this yet. >> >> My comment really was about balancing mm users under the assumption that >> they're all unlimited. >> -Daniel >> > I think the point you brought up becomes very important for compute > (HSA) processes. I still don't know how to distinguish between > legitimate use of GPU local memory and misbehaving/malicious processes. > > We have a requirement that HSA processes will be allowed to allocate > and pin GPU local memory. They do it through an ioctl. > In the kernel driver, we have an accounting of those memory > allocations, meaning that I can print a list of all the objects that > were allocated by a certain process, per device. > Therefore, in theory, I can reclaim any object, but that will probably > break the userspace app. 
If the app is misbehaving/malicious than > that's ok, I guess. But how do I know that ? And what prevents that > malicious app to re-spawn and do the same allocation again ? If you have per-process accounting of all those memory allocations and you need to reclaim and there's no nice way of doing it, you should probably do it like the kernel OOM killer: You simply kill the process that is most likely to bring back most memory (or use any other heuristic). Typically the kernel OOM killer does that when all swap space is exhausted, probably assuming that the process that uses most swap space is most likely to be malicious, if there are any malicious processes. If the process respawns, and you run out of resources again, repeat the process. /Thomas > > Oded ^ permalink raw reply [flat|nested] 29+ messages in thread
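[Editor's note] Thomas's suggestion amounts to the OOM killer's selection step applied to the driver's own accounting: pick the process whose death would bring back the most memory. A minimal sketch of that selection, with hypothetical names (real code would walk the driver's bookkeeping under a lock):

```c
#include <assert.h>
#include <stddef.h>

/* Accounted pinned GPU memory for one process. */
struct proc_usage {
	int pid;
	size_t pinned_bytes;
};

/* OOM-killer-style heuristic: the victim is the process holding the
 * most accounted memory. Returns -1 if nothing is accounted at all. */
static int pick_victim(const struct proc_usage *procs, int n)
{
	int victim = -1;
	size_t best = 0;

	for (int i = 0; i < n; i++) {
		if (procs[i].pinned_bytes > best) {
			best = procs[i].pinned_bytes;
			victim = procs[i].pid;
		}
	}
	return victim;
}
```

As Thomas notes, if the victim respawns and exhausts resources again, the same selection simply runs again; the heuristic needs no notion of "malicious", only of "largest".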
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 14:09 ` Oded Gabbay 2014-08-13 15:19 ` Thomas Hellstrom @ 2014-08-13 16:30 ` Daniel Vetter 1 sibling, 0 replies; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 16:30 UTC (permalink / raw) To: Oded Gabbay Cc: Thomas Hellstrom, Konrad Rzeszutek Wilk, Michel Dänzer, dri-devel, Jérôme Glisse On Wed, Aug 13, 2014 at 05:09:49PM +0300, Oded Gabbay wrote: > > > On 13/08/14 16:01, Daniel Vetter wrote: > >On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: > >>On 08/13/2014 12:42 PM, Daniel Vetter wrote: > >>>On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > >>>>On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > >>>>>From: Jérôme Glisse <jglisse@redhat.com> > >>>>> > >>>>>When experiencing memory pressure we want to minimize pool size so that > >>>>>memory we just shrinked is not added back again just as the next thing. > >>>>> > >>>>>This will divide by 2 the maximum pool size for each device each time > >>>>>the pool have to shrink. The limit is bumped again is next allocation > >>>>>happen after one second since the last shrink. The one second delay is > >>>>>obviously an arbitrary choice. > >>>>Jérôme, > >>>> > >>>>I don't like this patch. It adds extra complexity and its usefulness is > >>>>highly questionable. > >>>>There are a number of caches in the system, and if all of them added > >>>>some sort of voluntary shrink heuristics like this, we'd end up with > >>>>impossible-to-debug unpredictable performance issues. > >>>> > >>>>We should let the memory subsystem decide when to reclaim pages from > >>>>caches and what caches to reclaim them from. > >>>Yeah, artificially limiting your cache from growing when your shrinker > >>>gets called will just break the equal-memory pressure the core mm uses to > >>>rebalance between all caches when workload changes. 
In i915 we let > >>>everything grow without artificial bounds and only rely upon the shrinker > >>>callbacks to ensure we don't consume more than our fair share of available > >>>memory overall. > >>>-Daniel > >> > >>Now when you bring i915 memory usage up, Daniel, > >>I can't refrain from bringing up the old user-space unreclaimable kernel > >>memory issue, for which gem open is a good example ;) Each time > >>user-space opens a gem handle, some un-reclaimable kernel memory is > >>allocated, for which there is no accounting, so theoretically I think a > >>user can bring a system to unusability this way. > >> > >>Typically there are various limits on unreclaimable objects like this, > >>like open file descriptors, and IIRC the kernel even has an internal > >>limit on the number of struct files you initialize, based on the > >>available system memory, so dma-buf / prime should already have some > >>sort of protection. > > > >Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > >so there's not really a way to isolate gpu memory usage in a sane way for > >specific processes. But there's also zero limits on actual gpu usage > >itself (timeslices or whatever) so I guess no one asked for this yet. > > > >My comment really was about balancing mm users under the assumption that > >they're all unlimited. > >-Daniel > > > I think the point you brought up becomes very important for compute (HSA) > processes. I still don't know how to distinguish between legitimate use of > GPU local memory and misbehaving/malicious processes. > > We have a requirement that HSA processes will be allowed to allocate and pin > GPU local memory. They do it through an ioctl. > In the kernel driver, we have an accounting of those memory allocations, > meaning that I can print a list of all the objects that were allocated by a > certain process, per device. > Therefore, in theory, I can reclaim any object, but that will probably break > the userspace app. 
If the app is misbehaving/malicious than that's ok, I > guess. But how do I know that ? And what prevents that malicious app to > re-spawn and do the same allocation again ? You can't do that in the kernel, this is policy decisions which is userspaces job. But what we instead need to allow is to properly track memory allocations so that memory limits can be set with cgroups. With SVM you get that for free. Without SVM we need some work in that area since currently the memory accounting for gem/ttm drivers is broken. The other bit is limits for wasting gpu time, and I guess for that we want a new gpu time cgroup system so that users can set soft/hard limits for different gpgpu tasks on servers. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 13:01 ` Daniel Vetter 2014-08-13 14:09 ` Oded Gabbay @ 2014-08-13 15:13 ` Thomas Hellstrom 2014-08-13 16:24 ` Daniel Vetter 2014-08-14 22:29 ` Jesse Barnes 1 sibling, 2 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 15:13 UTC (permalink / raw) To: Daniel Vetter Cc: Michel Dänzer, Jérôme Glisse, dri-devel, Konrad Rzeszutek Wilk On 08/13/2014 03:01 PM, Daniel Vetter wrote: > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>> >>>>> When experiencing memory pressure we want to minimize pool size so that >>>>> memory we just shrinked is not added back again just as the next thing. >>>>> >>>>> This will divide by 2 the maximum pool size for each device each time >>>>> the pool have to shrink. The limit is bumped again is next allocation >>>>> happen after one second since the last shrink. The one second delay is >>>>> obviously an arbitrary choice. >>>> Jérôme, >>>> >>>> I don't like this patch. It adds extra complexity and its usefulness is >>>> highly questionable. >>>> There are a number of caches in the system, and if all of them added >>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>> impossible-to-debug unpredictable performance issues. >>>> >>>> We should let the memory subsystem decide when to reclaim pages from >>>> caches and what caches to reclaim them from. >>> Yeah, artificially limiting your cache from growing when your shrinker >>> gets called will just break the equal-memory pressure the core mm uses to >>> rebalance between all caches when workload changes. 
In i915 we let >>> everything grow without artificial bounds and only rely upon the shrinker >>> callbacks to ensure we don't consume more than our fair share of available >>> memory overall. >>> -Daniel >> Now when you bring i915 memory usage up, Daniel, >> I can't refrain from bringing up the old user-space unreclaimable kernel >> memory issue, for which gem open is a good example ;) Each time >> user-space opens a gem handle, some un-reclaimable kernel memory is >> allocated, for which there is no accounting, so theoretically I think a >> user can bring a system to unusability this way. >> >> Typically there are various limits on unreclaimable objects like this, >> like open file descriptors, and IIRC the kernel even has an internal >> limit on the number of struct files you initialize, based on the >> available system memory, so dma-buf / prime should already have some >> sort of protection. > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > so there's not really a way to isolate gpu memory usage in a sane way for > specific processes. But there's also zero limits on actual gpu usage > itself (timeslices or whatever) so I guess no one asked for this yet. In its simplest form (like in TTM if correctly implemented by drivers) this type of accounting stops non-privileged malicious GPU-users from exhausting all system physical memory causing grief for other kernel systems but not from causing grief for other GPU users. I think that's the minimum level that's intended also for example also for the struct file accounting. > My comment really was about balancing mm users under the assumption that > they're all unlimited. Yeah, sorry for stealing the thread. I usually bring this up now and again but nowadays with an exponential backoff. > -Daniel Thomas ^ permalink raw reply [flat|nested] 29+ messages in thread
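[Editor's note] The "simplest form" of accounting Thomas attributes to TTM is a single system-wide charge/uncharge gate: every allocation is debited against one global limit, so no single unprivileged user can exhaust physical memory. A toy sketch of that shape (invented names; the real mechanism lives in ttm_memory.c and is considerably more involved, with per-zone limits and reclaim):

```c
#include <assert.h>
#include <stddef.h>

/* Global accounting state: one limit shared by all GPU users. */
struct mem_global {
	size_t limit;
	size_t used;
};

/* Charge an allocation; refuse when it would exceed the limit
 * (or overflow the counter). */
static int mem_global_charge(struct mem_global *g, size_t bytes)
{
	if (g->used + bytes < g->used || g->used + bytes > g->limit)
		return -1;
	g->used += bytes;
	return 0;
}

/* Release a previously charged allocation. */
static void mem_global_uncharge(struct mem_global *g, size_t bytes)
{
	g->used -= bytes;
}
```

This is exactly the "minimum level" in the discussion: it protects the rest of the kernel from GPU users as a group, but does nothing to arbitrate between GPU users, which is where per-process accounting or cgroups would come in.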
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 15:13 ` Thomas Hellstrom @ 2014-08-13 16:24 ` Daniel Vetter 2014-08-13 16:30 ` Alex Deucher 2014-08-13 17:20 ` Thomas Hellstrom 2014-08-14 22:29 ` Jesse Barnes 1 sibling, 2 replies; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 16:24 UTC (permalink / raw) To: Thomas Hellstrom Cc: Konrad Rzeszutek Wilk, Michel Dänzer, dri-devel, Jérôme Glisse On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote: > On 08/13/2014 03:01 PM, Daniel Vetter wrote: > > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: > >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: > >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > >>>>> From: Jérôme Glisse <jglisse@redhat.com> > >>>>> > >>>>> When experiencing memory pressure we want to minimize pool size so that > >>>>> memory we just shrinked is not added back again just as the next thing. > >>>>> > >>>>> This will divide by 2 the maximum pool size for each device each time > >>>>> the pool have to shrink. The limit is bumped again is next allocation > >>>>> happen after one second since the last shrink. The one second delay is > >>>>> obviously an arbitrary choice. > >>>> Jérôme, > >>>> > >>>> I don't like this patch. It adds extra complexity and its usefulness is > >>>> highly questionable. > >>>> There are a number of caches in the system, and if all of them added > >>>> some sort of voluntary shrink heuristics like this, we'd end up with > >>>> impossible-to-debug unpredictable performance issues. > >>>> > >>>> We should let the memory subsystem decide when to reclaim pages from > >>>> caches and what caches to reclaim them from. 
> >>> Yeah, artificially limiting your cache from growing when your shrinker > >>> gets called will just break the equal-memory pressure the core mm uses to > >>> rebalance between all caches when workload changes. In i915 we let > >>> everything grow without artificial bounds and only rely upon the shrinker > >>> callbacks to ensure we don't consume more than our fair share of available > >>> memory overall. > >>> -Daniel > >> Now when you bring i915 memory usage up, Daniel, > >> I can't refrain from bringing up the old user-space unreclaimable kernel > >> memory issue, for which gem open is a good example ;) Each time > >> user-space opens a gem handle, some un-reclaimable kernel memory is > >> allocated, for which there is no accounting, so theoretically I think a > >> user can bring a system to unusability this way. > >> > >> Typically there are various limits on unreclaimable objects like this, > >> like open file descriptors, and IIRC the kernel even has an internal > >> limit on the number of struct files you initialize, based on the > >> available system memory, so dma-buf / prime should already have some > >> sort of protection. > > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > > so there's not really a way to isolate gpu memory usage in a sane way for > > specific processes. But there's also zero limits on actual gpu usage > > itself (timeslices or whatever) so I guess no one asked for this yet. > > In its simplest form (like in TTM if correctly implemented by drivers) > this type of accounting stops non-privileged malicious GPU-users from > exhausting all system physical memory causing grief for other kernel > systems but not from causing grief for other GPU users. I think that's > the minimum level that's intended also for example also for the struct > file accounting. I think in i915 we're fairly close on that minimal standard - interactions with shrinkers and oom logic work decently. 
It starts to fall apart though when we've actually run out of memory - if the real memory hog is a gpu process the oom killer won't notice all that memory since it's not accounted against processes correctly. I don't agree that gpu process should be punished in general compared to other subsystems in the kernel. If the user wants to use 90% of all memory for gpu tasks then I want to make that possible, even if it means that everything else thrashes horribly. And as long as the system recovers and rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a fairly arbitrary (tunable) setting to limit system memory consumption, but I might be wrong on that. > > My comment really was about balancing mm users under the assumption that > > they're all unlimited. > > Yeah, sorry for stealing the thread. I usually bring this up now and > again but nowadays with an exponential backoff. Oh I'd love to see some cgroups or similar tracking so that server users could set sane per-process/user/task limits on how much memory/gpu time that group is allowed to consume. It's just that I haven't seen real demand for this and so couldn't make the time available to implement it. So thus far my goal is to make everything work nicely for unlimited tasks right up to the point where the OOM killer needs to step in. Past that everything starts to fall apart, but thus far that was good enough for desktop usage. Maybe WebGL will finally make this important enough so that we can fix it for real ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
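As an aside for readers of the archive: the heuristic the thread is debating (halve the pool's maximum size each time the shrinker runs, restore the full limit once allocations resume after a quiet second) can be modeled in userspace. The sketch below follows the patch description quoted above, not the actual TTM code; the class and parameter names are invented for illustration.

```python
import time

class PoolLimit:
    """Model of the debated heuristic: halve the pool's maximum size each
    time the shrinker runs, and restore the full limit if the next
    allocation happens more than `delay` seconds after the last shrink."""

    def __init__(self, max_pages, delay=1.0, clock=time.monotonic):
        self.default_max = max_pages
        self.max_pages = max_pages
        self.delay = delay
        self.clock = clock
        self.last_shrink = None

    def on_shrink(self):
        # Memory pressure: halve the limit so pages we just freed are not
        # immediately cached again by the pool.
        self.max_pages = max(1, self.max_pages // 2)
        self.last_shrink = self.clock()

    def on_alloc(self):
        # If the pool has been quiet for more than `delay` seconds since
        # the last shrink, restore the full limit.
        if self.last_shrink is not None and \
                self.clock() - self.last_shrink > self.delay:
            self.max_pages = self.default_max
            self.last_shrink = None
        return self.max_pages
```

Injecting a fake clock makes the time-dependent behavior easy to exercise; Thomas's objection is precisely that ad-hoc decay rules like this fight the core mm's own rebalancing.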
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 16:24 ` Daniel Vetter @ 2014-08-13 16:30 ` Alex Deucher 2014-08-13 16:38 ` Daniel Vetter 2014-08-13 17:38 ` Thomas Hellstrom 2014-08-13 17:20 ` Thomas Hellstrom 1 sibling, 2 replies; 29+ messages in thread From: Alex Deucher @ 2014-08-13 16:30 UTC (permalink / raw) To: Daniel Vetter Cc: Jérôme Glisse, Thomas Hellstrom, Michel Dänzer, Maling list - DRI developers, Konrad Rzeszutek Wilk On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote: >> On 08/13/2014 03:01 PM, Daniel Vetter wrote: >> > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >> >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >> >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >> >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >> >>>>> From: Jérôme Glisse <jglisse@redhat.com> >> >>>>> >> >>>>> When experiencing memory pressure we want to minimize pool size so that >> >>>>> memory we just shrinked is not added back again just as the next thing. >> >>>>> >> >>>>> This will divide by 2 the maximum pool size for each device each time >> >>>>> the pool have to shrink. The limit is bumped again is next allocation >> >>>>> happen after one second since the last shrink. The one second delay is >> >>>>> obviously an arbitrary choice. >> >>>> Jérôme, >> >>>> >> >>>> I don't like this patch. It adds extra complexity and its usefulness is >> >>>> highly questionable. >> >>>> There are a number of caches in the system, and if all of them added >> >>>> some sort of voluntary shrink heuristics like this, we'd end up with >> >>>> impossible-to-debug unpredictable performance issues. >> >>>> >> >>>> We should let the memory subsystem decide when to reclaim pages from >> >>>> caches and what caches to reclaim them from. 
>> >>> Yeah, artificially limiting your cache from growing when your shrinker >> >>> gets called will just break the equal-memory pressure the core mm uses to >> >>> rebalance between all caches when workload changes. In i915 we let >> >>> everything grow without artificial bounds and only rely upon the shrinker >> >>> callbacks to ensure we don't consume more than our fair share of available >> >>> memory overall. >> >>> -Daniel >> >> Now when you bring i915 memory usage up, Daniel, >> >> I can't refrain from bringing up the old user-space unreclaimable kernel >> >> memory issue, for which gem open is a good example ;) Each time >> >> user-space opens a gem handle, some un-reclaimable kernel memory is >> >> allocated, for which there is no accounting, so theoretically I think a >> >> user can bring a system to unusability this way. >> >> >> >> Typically there are various limits on unreclaimable objects like this, >> >> like open file descriptors, and IIRC the kernel even has an internal >> >> limit on the number of struct files you initialize, based on the >> >> available system memory, so dma-buf / prime should already have some >> >> sort of protection. >> > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, >> > so there's not really a way to isolate gpu memory usage in a sane way for >> > specific processes. But there's also zero limits on actual gpu usage >> > itself (timeslices or whatever) so I guess no one asked for this yet. >> >> In its simplest form (like in TTM if correctly implemented by drivers) >> this type of accounting stops non-privileged malicious GPU-users from >> exhausting all system physical memory causing grief for other kernel >> systems but not from causing grief for other GPU users. I think that's >> the minimum level that's intended also for example also for the struct >> file accounting. > > I think in i915 we're fairly close on that minimal standard - interactions > with shrinkers and oom logic work decently. 
It starts to fall apart though > when we've actually run out of memory - if the real memory hog is a gpu > process the oom killer won't notice all that memory since it's not > accounted against processes correctly. > > I don't agree that gpu process should be punished in general compared to > other subsystems in the kernel. If the user wants to use 90% of all memory > for gpu tasks then I want to make that possible, even if it means that > everything else thrashes horribly. And as long as the system recovers and > rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a > fairly arbitrary (tunable) setting to limit system memory consumption, but > I might be wrong on that. Yes, it currently limits you to half of memory, but at least we would like to make it tuneable since there are a lot of user cases where the user wants to use 90% of memory for GPU tasks at the expense of everything else. Alex > >> > My comment really was about balancing mm users under the assumption that >> > they're all unlimited. >> >> Yeah, sorry for stealing the thread. I usually bring this up now and >> again but nowadays with an exponential backoff. > > Oh I'd love to see some cgroups or similar tracking so that server users > could set sane per-process/user/task limits on how much memory/gpu time > that group is allowed to consume. It's just that I haven't seen real > demand for this and so couldn't make the time available to implement it. > So thus far my goal is to make everything work nicely for unlimited tasks > right up to the point where the OOM killer needs to step in. Past that > everything starts to fall apart, but thus far that was good enough for > desktop usage. > > Maybe WebGL will finally make this important enough so that we can fix it > for real ... 
> -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 16:30 ` Alex Deucher @ 2014-08-13 16:38 ` Daniel Vetter 2014-08-13 16:45 ` Daniel Vetter 2014-08-13 17:38 ` Thomas Hellstrom 1 sibling, 1 reply; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 16:38 UTC (permalink / raw) To: Alex Deucher Cc: Thomas Hellstrom, Konrad Rzeszutek Wilk, Michel Dänzer, Maling list - DRI developers, Jérôme Glisse On Wed, Aug 13, 2014 at 12:30:45PM -0400, Alex Deucher wrote: > On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote: > >> On 08/13/2014 03:01 PM, Daniel Vetter wrote: > >> > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: > >> >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: > >> >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > >> >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > >> >>>>> From: Jérôme Glisse <jglisse@redhat.com> > >> >>>>> > >> >>>>> When experiencing memory pressure we want to minimize pool size so that > >> >>>>> memory we just shrinked is not added back again just as the next thing. > >> >>>>> > >> >>>>> This will divide by 2 the maximum pool size for each device each time > >> >>>>> the pool have to shrink. The limit is bumped again is next allocation > >> >>>>> happen after one second since the last shrink. The one second delay is > >> >>>>> obviously an arbitrary choice. > >> >>>> Jérôme, > >> >>>> > >> >>>> I don't like this patch. It adds extra complexity and its usefulness is > >> >>>> highly questionable. > >> >>>> There are a number of caches in the system, and if all of them added > >> >>>> some sort of voluntary shrink heuristics like this, we'd end up with > >> >>>> impossible-to-debug unpredictable performance issues. 
> >> >>>> > >> >>>> We should let the memory subsystem decide when to reclaim pages from > >> >>>> caches and what caches to reclaim them from. > >> >>> Yeah, artificially limiting your cache from growing when your shrinker > >> >>> gets called will just break the equal-memory pressure the core mm uses to > >> >>> rebalance between all caches when workload changes. In i915 we let > >> >>> everything grow without artificial bounds and only rely upon the shrinker > >> >>> callbacks to ensure we don't consume more than our fair share of available > >> >>> memory overall. > >> >>> -Daniel > >> >> Now when you bring i915 memory usage up, Daniel, > >> >> I can't refrain from bringing up the old user-space unreclaimable kernel > >> >> memory issue, for which gem open is a good example ;) Each time > >> >> user-space opens a gem handle, some un-reclaimable kernel memory is > >> >> allocated, for which there is no accounting, so theoretically I think a > >> >> user can bring a system to unusability this way. > >> >> > >> >> Typically there are various limits on unreclaimable objects like this, > >> >> like open file descriptors, and IIRC the kernel even has an internal > >> >> limit on the number of struct files you initialize, based on the > >> >> available system memory, so dma-buf / prime should already have some > >> >> sort of protection. > >> > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > >> > so there's not really a way to isolate gpu memory usage in a sane way for > >> > specific processes. But there's also zero limits on actual gpu usage > >> > itself (timeslices or whatever) so I guess no one asked for this yet. > >> > >> In its simplest form (like in TTM if correctly implemented by drivers) > >> this type of accounting stops non-privileged malicious GPU-users from > >> exhausting all system physical memory causing grief for other kernel > >> systems but not from causing grief for other GPU users. 
I think that's > >> the minimum level that's intended also for example also for the struct > >> file accounting. > > > > I think in i915 we're fairly close on that minimal standard - interactions > > with shrinkers and oom logic work decently. It starts to fall apart though > > when we've actually run out of memory - if the real memory hog is a gpu > > process the oom killer won't notice all that memory since it's not > > accounted against processes correctly. > > > > I don't agree that gpu process should be punished in general compared to > > other subsystems in the kernel. If the user wants to use 90% of all memory > > for gpu tasks then I want to make that possible, even if it means that > > everything else thrashes horribly. And as long as the system recovers and > > rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a > > fairly arbitrary (tunable) setting to limit system memory consumption, but > > I might be wrong on that. > > Yes, it currently limits you to half of memory, but at least we would > like to make it tuneable since there are a lot of user cases where the > user wants to use 90% of memory for GPU tasks at the expense of > everything else. Ime a lot of fun stuff starts to happen when you go there. We have piles of memory thrashing testcases and generally had lots of fun with our shrinker, so I think until you've really beaten onto those paths in ttm+radeon I'd keep the limit where it is. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 16:38 ` Daniel Vetter @ 2014-08-13 16:45 ` Daniel Vetter 0 siblings, 0 replies; 29+ messages in thread From: Daniel Vetter @ 2014-08-13 16:45 UTC (permalink / raw) To: Alex Deucher Cc: Thomas Hellstrom, Konrad Rzeszutek Wilk, Michel Dänzer, Maling list - DRI developers, Jérôme Glisse On Wed, Aug 13, 2014 at 6:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >> Yes, it currently limits you to half of memory, but at least we would >> like to make it tuneable since there are a lot of user cases where the >> user wants to use 90% of memory for GPU tasks at the expense of >> everything else. > > Ime a lot of fun stuff starts to happen when you go there. We have piles > of memory thrashing testcases and generally had lots of fun with our > shrinker, so I think until you've really beaten onto those paths in > ttm+radeon I'd keep the limit where it is. One example that already starts if you go above 50% is that by default the dirty pagecache is limited to 40% of memory. Above that you start to stall in writeback, but gpus are really fast at re-dirtying memory. So we've seen cases where the core mm OOMed with essentially 90% of memory on writeback and piles of free swap. Waiting a few seconds for the SSD to catch up would have gotten it out of that tight spot without killing any process. One side-effect of such fun is that memory allocations start to fail in really interesting places, and you need to pile in hacks so make it all a bit more synchronous to avoid the core mm freaking out. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 16:30 ` Alex Deucher 2014-08-13 16:38 ` Daniel Vetter @ 2014-08-13 17:38 ` Thomas Hellstrom 1 sibling, 0 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 17:38 UTC (permalink / raw) To: Alex Deucher Cc: Jérôme Glisse, Michel Dänzer, Maling list - DRI developers, Konrad Rzeszutek Wilk On 08/13/2014 06:30 PM, Alex Deucher wrote: > On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote: >>> On 08/13/2014 03:01 PM, Daniel Vetter wrote: >>>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >>>>> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>>>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>>>>> >>>>>>>> When experiencing memory pressure we want to minimize pool size so that >>>>>>>> memory we just shrinked is not added back again just as the next thing. >>>>>>>> >>>>>>>> This will divide by 2 the maximum pool size for each device each time >>>>>>>> the pool have to shrink. The limit is bumped again is next allocation >>>>>>>> happen after one second since the last shrink. The one second delay is >>>>>>>> obviously an arbitrary choice. >>>>>>> Jérôme, >>>>>>> >>>>>>> I don't like this patch. It adds extra complexity and its usefulness is >>>>>>> highly questionable. >>>>>>> There are a number of caches in the system, and if all of them added >>>>>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>>>>> impossible-to-debug unpredictable performance issues. >>>>>>> >>>>>>> We should let the memory subsystem decide when to reclaim pages from >>>>>>> caches and what caches to reclaim them from. 
>>>>>> Yeah, artificially limiting your cache from growing when your shrinker >>>>>> gets called will just break the equal-memory pressure the core mm uses to >>>>>> rebalance between all caches when workload changes. In i915 we let >>>>>> everything grow without artificial bounds and only rely upon the shrinker >>>>>> callbacks to ensure we don't consume more than our fair share of available >>>>>> memory overall. >>>>>> -Daniel >>>>> Now when you bring i915 memory usage up, Daniel, >>>>> I can't refrain from bringing up the old user-space unreclaimable kernel >>>>> memory issue, for which gem open is a good example ;) Each time >>>>> user-space opens a gem handle, some un-reclaimable kernel memory is >>>>> allocated, for which there is no accounting, so theoretically I think a >>>>> user can bring a system to unusability this way. >>>>> >>>>> Typically there are various limits on unreclaimable objects like this, >>>>> like open file descriptors, and IIRC the kernel even has an internal >>>>> limit on the number of struct files you initialize, based on the >>>>> available system memory, so dma-buf / prime should already have some >>>>> sort of protection. >>>> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, >>>> so there's not really a way to isolate gpu memory usage in a sane way for >>>> specific processes. But there's also zero limits on actual gpu usage >>>> itself (timeslices or whatever) so I guess no one asked for this yet. >>> In its simplest form (like in TTM if correctly implemented by drivers) >>> this type of accounting stops non-privileged malicious GPU-users from >>> exhausting all system physical memory causing grief for other kernel >>> systems but not from causing grief for other GPU users. I think that's >>> the minimum level that's intended also for example also for the struct >>> file accounting. 
>> I think in i915 we're fairly close on that minimal standard - interactions >> with shrinkers and oom logic work decently. It starts to fall apart though >> when we've actually run out of memory - if the real memory hog is a gpu >> process the oom killer won't notice all that memory since it's not >> accounted against processes correctly. >> >> I don't agree that gpu process should be punished in general compared to >> other subsystems in the kernel. If the user wants to use 90% of all memory >> for gpu tasks then I want to make that possible, even if it means that >> everything else thrashes horribly. And as long as the system recovers and >> rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a >> fairly arbitrary (tunable) setting to limit system memory consumption, but >> I might be wrong on that. > Yes, it currently limits you to half of memory, but at least we would > like to make it tuneable since there are a lot of user cases where the > user wants to use 90% of memory for GPU tasks at the expense of > everything else. > > Alex > It's in /sys/devices/virtual/drm/ttm/memory_accounting/*

Run-time tunable, but you should probably write an app to tune if you want to hand out to users, since if you up the limit, you probably want to modify a number of values.

zone_memory: ro: Total memory in the zone.
used_memory: ro: Currently pinned memory.
available_memory: rw: Allocation limit.
emergency_memory: rw: Allocation limit for CAP_SYS_ADMIN.
swap_limit: rw: Swapper thread starts at this limit.

/Thomas ^ permalink raw reply [flat|nested] 29+ messages in thread
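The sysfs knobs Thomas lists can be read programmatically. The sketch below is a hypothetical userspace helper, not an existing tool: the path is the one quoted in his mail, and the file layout may differ across kernel versions, so the reader takes the directory as a parameter.

```python
from pathlib import Path

# Path quoted from Thomas's mail; the exact layout varies by kernel version.
TTM_DIR = "/sys/devices/virtual/drm/ttm/memory_accounting"

# Field meanings, as described in the mail above.
FIELDS = {
    "zone_memory": "Total memory in the zone (ro)",
    "used_memory": "Currently pinned memory (ro)",
    "available_memory": "Allocation limit (rw)",
    "emergency_memory": "Allocation limit for CAP_SYS_ADMIN (rw)",
    "swap_limit": "Swapper thread starts at this limit (rw)",
}

def read_ttm_accounting(base=TTM_DIR):
    """Return {field: int_value} for whichever accounting files exist under
    `base`; each sysfs file is assumed to hold a single number."""
    out = {}
    for name in FIELDS:
        f = Path(base) / name
        if f.is_file():
            out[name] = int(f.read_text().split()[0])
    return out
```

As Thomas notes, raising `available_memory` alone is probably not enough; a real tuning app would adjust the related limits together.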
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 16:24 ` Daniel Vetter 2014-08-13 16:30 ` Alex Deucher @ 2014-08-13 17:20 ` Thomas Hellstrom 1 sibling, 0 replies; 29+ messages in thread From: Thomas Hellstrom @ 2014-08-13 17:20 UTC (permalink / raw) To: Daniel Vetter Cc: Michel Dänzer, Jérôme Glisse, dri-devel, Konrad Rzeszutek Wilk On 08/13/2014 06:24 PM, Daniel Vetter wrote: > On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote: >> On 08/13/2014 03:01 PM, Daniel Vetter wrote: >>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: >>>> On 08/13/2014 12:42 PM, Daniel Vetter wrote: >>>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: >>>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: >>>>>>> From: Jérôme Glisse <jglisse@redhat.com> >>>>>>> >>>>>>> When experiencing memory pressure we want to minimize pool size so that >>>>>>> memory we just shrinked is not added back again just as the next thing. >>>>>>> >>>>>>> This will divide by 2 the maximum pool size for each device each time >>>>>>> the pool have to shrink. The limit is bumped again is next allocation >>>>>>> happen after one second since the last shrink. The one second delay is >>>>>>> obviously an arbitrary choice. >>>>>> Jérôme, >>>>>> >>>>>> I don't like this patch. It adds extra complexity and its usefulness is >>>>>> highly questionable. >>>>>> There are a number of caches in the system, and if all of them added >>>>>> some sort of voluntary shrink heuristics like this, we'd end up with >>>>>> impossible-to-debug unpredictable performance issues. >>>>>> >>>>>> We should let the memory subsystem decide when to reclaim pages from >>>>>> caches and what caches to reclaim them from. >>>>> Yeah, artificially limiting your cache from growing when your shrinker >>>>> gets called will just break the equal-memory pressure the core mm uses to >>>>> rebalance between all caches when workload changes. 
In i915 we let >>>>> everything grow without artificial bounds and only rely upon the shrinker >>>>> callbacks to ensure we don't consume more than our fair share of available >>>>> memory overall. >>>>> -Daniel >>>> Now when you bring i915 memory usage up, Daniel, >>>> I can't refrain from bringing up the old user-space unreclaimable kernel >>>> memory issue, for which gem open is a good example ;) Each time >>>> user-space opens a gem handle, some un-reclaimable kernel memory is >>>> allocated, for which there is no accounting, so theoretically I think a >>>> user can bring a system to unusability this way. >>>> >>>> Typically there are various limits on unreclaimable objects like this, >>>> like open file descriptors, and IIRC the kernel even has an internal >>>> limit on the number of struct files you initialize, based on the >>>> available system memory, so dma-buf / prime should already have some >>>> sort of protection. >>> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, >>> so there's not really a way to isolate gpu memory usage in a sane way for >>> specific processes. But there's also zero limits on actual gpu usage >>> itself (timeslices or whatever) so I guess no one asked for this yet. >> In its simplest form (like in TTM if correctly implemented by drivers) >> this type of accounting stops non-privileged malicious GPU-users from >> exhausting all system physical memory causing grief for other kernel >> systems but not from causing grief for other GPU users. I think that's >> the minimum level that's intended also for example also for the struct >> file accounting. > I think in i915 we're fairly close on that minimal standard - interactions > with shrinkers and oom logic work decently. It starts to fall apart though > when we've actually run out of memory - if the real memory hog is a gpu > process the oom killer won't notice all that memory since it's not > accounted against processes correctly. 
> > I don't agree that gpu process should be punished in general compared to > other subsystems in the kernel. If the user wants to use 90% of all memory > for gpu tasks then I want to make that possible, even if it means that > everything else thrashes horribly. And as long as the system recovers and > rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a > fairly arbitrary (tunable) setting to limit system memory consumption, but > I might be wrong on that. No, that's correct, or rather it's intended to limit pinned unreclaimable system memory (though part of what's unreclaimable could actually be made reclaimable if we'd implement another shrinker level). >>> My comment really was about balancing mm users under the assumption that >>> they're all unlimited. >> Yeah, sorry for stealing the thread. I usually bring this up now and >> again but nowadays with an exponential backoff. > Oh I'd love to see some cgroups or similar tracking so that server users > could set sane per-process/user/task limits on how much memory/gpu time > that group is allowed to consume. It's just that I haven't seen real > demand for this and so couldn't make the time available to implement it. > So thus far my goal is to make everything work nicely for unlimited tasks > right up to the point where the OOM killer needs to step in. Past that > everything starts to fall apart, but thus far that was good enough for > desktop usage. Well I'm not sure if things have changed but last time (a couple of years ago) I looked at this situation (kernel out of physical memory but a fair amount of swap space left) the OOM killer was never invoked, so a number of more or less critical kernel systems (disk I/O, paging, networking) where getting -ENOMEM and hitting rarely tested error paths. A state you don't want to have the kernel in. Now the OOM algorithm may of course have changed since then. 
My point is that with unaccounted constructs like gem-open-from-name it should be easy for any unpriviliged authenticated gem client to pin all kernel physical memory, put the kernel in that state and keep it there, and IMO a kernel-user space interface shouldn't allow that. /Thomas > > Maybe WebGL will finally make this important enough so that we can fix it > for real ... > -Daniel ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) 2014-08-13 15:13 ` Thomas Hellstrom 2014-08-13 16:24 ` Daniel Vetter @ 2014-08-14 22:29 ` Jesse Barnes 1 sibling, 0 replies; 29+ messages in thread From: Jesse Barnes @ 2014-08-14 22:29 UTC (permalink / raw) To: Thomas Hellstrom Cc: dri-devel, Michel Dänzer, Jérôme Glisse, Konrad Rzeszutek Wilk On Wed, 13 Aug 2014 17:13:56 +0200 Thomas Hellstrom <thellstrom@vmware.com> wrote: > On 08/13/2014 03:01 PM, Daniel Vetter wrote: > > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote: > >> On 08/13/2014 12:42 PM, Daniel Vetter wrote: > >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote: > >>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote: > >>>>> From: Jérôme Glisse <jglisse@redhat.com> > >>>>> > >>>>> When experiencing memory pressure we want to minimize pool size so that > >>>>> memory we just shrinked is not added back again just as the next thing. > >>>>> > >>>>> This will divide by 2 the maximum pool size for each device each time > >>>>> the pool have to shrink. The limit is bumped again is next allocation > >>>>> happen after one second since the last shrink. The one second delay is > >>>>> obviously an arbitrary choice. > >>>> Jérôme, > >>>> > >>>> I don't like this patch. It adds extra complexity and its usefulness is > >>>> highly questionable. > >>>> There are a number of caches in the system, and if all of them added > >>>> some sort of voluntary shrink heuristics like this, we'd end up with > >>>> impossible-to-debug unpredictable performance issues. > >>>> > >>>> We should let the memory subsystem decide when to reclaim pages from > >>>> caches and what caches to reclaim them from. > >>> Yeah, artificially limiting your cache from growing when your shrinker > >>> gets called will just break the equal-memory pressure the core mm uses to > >>> rebalance between all caches when workload changes. 
In i915 we let > >>> everything grow without artificial bounds and only rely upon the shrinker > >>> callbacks to ensure we don't consume more than our fair share of available > >>> memory overall. > >>> -Daniel > >> Now when you bring i915 memory usage up, Daniel, > >> I can't refrain from bringing up the old user-space unreclaimable kernel > >> memory issue, for which gem open is a good example ;) Each time > >> user-space opens a gem handle, some un-reclaimable kernel memory is > >> allocated, for which there is no accounting, so theoretically I think a > >> user can bring a system to unusability this way. > >> > >> Typically there are various limits on unreclaimable objects like this, > >> like open file descriptors, and IIRC the kernel even has an internal > >> limit on the number of struct files you initialize, based on the > >> available system memory, so dma-buf / prime should already have some > >> sort of protection. > > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations, > > so there's not really a way to isolate gpu memory usage in a sane way for > > specific processes. But there's also zero limits on actual gpu usage > > itself (timeslices or whatever) so I guess no one asked for this yet. > > In its simplest form (like in TTM if correctly implemented by drivers) > this type of accounting stops non-privileged malicious GPU-users from > exhausting all system physical memory causing grief for other kernel > systems but not from causing grief for other GPU users. I think that's > the minimum level that's intended also for example also for the struct > file accounting. > > > My comment really was about balancing mm users under the assumption that > > they're all unlimited. > > Yeah, sorry for stealing the thread. I usually bring this up now and > again but nowadays with an exponential backoff. Yeah I agree we're missing some good limits stuff in i915 and DRM in general probably. 
Chris started looking at this a while back, but I haven't seen anything recently. Tying into the ulimits/rlimits might make sense, and at the very least we need to account for things properly so we can add new limits where needed. -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
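The rlimit machinery Jesse and Thomas point to already exposes exactly the shape of cap the thread wants for GPU memory: a per-process soft and hard bound that unprivileged code can query and lower but not raise. A minimal sketch using the open-file limit as the example (`tighten_nofile` is an invented helper name):

```python
import resource

def tighten_nofile(new_soft):
    """Lower this process's soft open-file limit to `new_soft` (clamped to
    the hard limit) and return the resulting soft limit. Unprivileged
    processes may lower the soft limit but not raise the hard one, which
    is the accounting model discussed for GPU allocations."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Nothing comparable is enforced for gem handle allocations, which is Thomas's point: there is no RLIMIT-style bound a user hits before pinning unreclaimable kernel memory.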
end of thread, other threads: [~2015-07-08 2:34 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-13  3:52 [PATCH 0/3] drm/ttm: hard to swim in an empty pool Jérôme Glisse
2014-08-13  3:52 ` [PATCH 1/3] drm/ttm: set sensible pool size limit Jérôme Glisse
2014-08-13  6:24   ` Michel Dänzer
2014-08-13  3:52 ` [PATCH 2/3] drm/ttm: fix object deallocation to properly fill in the page pool Jérôme Glisse
2015-03-25 19:06   ` Konrad Rzeszutek Wilk
2015-07-06  9:11     ` Michel Dänzer
2015-07-06 16:10       ` Jerome Glisse
2015-07-07  6:39         ` Michel Dänzer
2015-07-07 17:41           ` Jerome Glisse
2015-07-08  2:34             ` Michel Dänzer
2014-08-13  3:52 ` [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool Jérôme Glisse
2014-08-13  6:32   ` Michel Dänzer
2014-08-13  9:06   ` Thomas Hellstrom
2014-08-13 10:42     ` Daniel Vetter
2014-08-13 12:35       ` GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool) Thomas Hellstrom
2014-08-13 12:40         ` David Herrmann
2014-08-13 12:48           ` Thomas Hellstrom
2014-08-13 13:01         ` Daniel Vetter
2014-08-13 14:09           ` Oded Gabbay
2014-08-13 15:19             ` Thomas Hellstrom
2014-08-13 16:30               ` Daniel Vetter
2014-08-13 15:13           ` Thomas Hellstrom
2014-08-13 16:24             ` Daniel Vetter
2014-08-13 16:30               ` Alex Deucher
2014-08-13 16:38                 ` Daniel Vetter
2014-08-13 16:45                   ` Daniel Vetter
2014-08-13 17:38                 ` Thomas Hellstrom
2014-08-13 17:20               ` Thomas Hellstrom
2014-08-14 22:29             ` Jesse Barnes