From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Hellstrom <thellstrom@vmware.com>
Subject: Re: GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure
 minimize the size of memory pool)
Date: Wed, 13 Aug 2014 17:13:56 +0200
Message-ID: <53EB80B4.207@vmware.com>
References: <1407901926-24516-1-git-send-email-j.glisse@gmail.com>
 <1407901926-24516-4-git-send-email-j.glisse@gmail.com>
 <53EB2A91.3000804@vmware.com> <20140813104246.GP10500@phenom.ffwll.local>
 <53EB5BA8.3010206@vmware.com> <20140813130108.GA10500@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from smtp-outbound-1.vmware.com (smtp-outbound-1.vmware.com
 [208.91.2.12])
 by gabe.freedesktop.org (Postfix) with ESMTP id 161006E599
 for <dri-devel@lists.freedesktop.org>; Wed, 13 Aug 2014 08:14:08 -0700 (PDT)
In-Reply-To: <20140813130108.GA10500@phenom.ffwll.local>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: =?ISO-8859-1?Q?Michel_D=E4nzer?= <michel@daenzer.net>, =?ISO-8859-1?Q?J=E9r=F4me_Glisse?= <jglisse@redhat.com>, dri-devel@lists.freedesktop.org, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: dri-devel@lists.freedesktop.org

On 08/13/2014 03:01 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>>>> From: Jérôme Glisse <jglisse@redhat.com>
>>>>>
>>>>> When experiencing memory pressure we want to minimize pool size so that
>>>>> memory we just shrinked is not added back again just as the next thing.
>>>>>
>>>>> This will divide by 2 the maximum pool size for each device each time
>>>>> the pool have to shrink. The limit is bumped again is next allocation
>>>>> happen after one second since the last shrink. The one second delay is
>>>>> obviously an arbitrary choice.
>>>> Jérôme,
>>>>
>>>> I don't like this patch. It adds extra complexity and its usefulness is
>>>> highly questionable.
>>>> There are a number of caches in the system, and if all of them added
>>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>>> impossible-to-debug unpredictable performance issues.
>>>>
>>>> We should let the memory subsystem decide when to reclaim pages from
>>>> caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> so there's not really a way to isolate gpu memory usage in a sane way for
> specific processes. But there's also zero limits on actual gpu usage
> itself (timeslices or whatever) so I guess no one asked for this yet.

In its simplest form (like in TTM, if correctly implemented by drivers),
this type of accounting stops non-privileged malicious GPU users from
exhausting all system physical memory and causing grief for other kernel
systems, but not from causing grief for other GPU users. I think that's
also the minimum level intended by, for example, the struct file
accounting.
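
As a rough illustration of that simplest form (a sketch only, not the
actual TTM code; the struct, the function names and the half-of-RAM limit
are assumptions made up for this example), every allocation is charged
against one global budget and refused once the budget is exhausted:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical global accounting state; not the real TTM structures. */
struct mem_account {
	uint64_t limit;	/* maximum bytes of unreclaimable memory allowed */
	uint64_t used;	/* bytes currently charged against the limit */
};

/* Assumption for illustration: cap unreclaimable memory at half of RAM. */
static void mem_account_init(struct mem_account *acc, uint64_t total_ram)
{
	acc->limit = total_ram / 2;
	acc->used = 0;
}

/*
 * Charge an allocation before it is made. Returns 0 on success or
 * -ENOMEM if the allocation would push usage past the limit.
 * No locking here; a real kernel implementation would need it.
 */
static int mem_account_charge(struct mem_account *acc, uint64_t size)
{
	/* used never exceeds limit, so the subtraction cannot wrap. */
	if (size > acc->limit - acc->used)
		return -ENOMEM;
	acc->used += size;
	return 0;
}

/* Give the memory back to the budget when the object is freed. */
static void mem_account_uncharge(struct mem_account *acc, uint64_t size)
{
	acc->used -= (size < acc->used) ? size : acc->used;
}
```

A user who hits the limit only gets -ENOMEM back, so the rest of the
kernel keeps its memory. Since the budget is global rather than per-user,
one user can still use it all up and starve the other GPU users.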

> My comment really was about balancing mm users under the assumption that
> they're all unlimited.

Yeah, sorry for stealing the thread. I usually bring this up now and
again but nowadays with an exponential backoff.


> -Daniel

Thomas