linux-kernel.vger.kernel.org archive mirror
* Possible deny of service with memfd_create()
@ 2021-02-04 16:32 Christian König
  2021-02-04 17:12 ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-02-04 16:32 UTC
  To: Michal Hocko, LKML

Hi Michal,

as requested in the other mail thread, the following sample code brings my
test system down within seconds.

The issue is that the memory allocated for the file descriptor is not
accounted to the process allocating it, so the OOM killer picks whatever
process it thinks is good but never my small test program.

Since memfd_create() doesn't need any special permission, this is a rather
nice denial of service, and as far as I can see it also works with a
standard Ubuntu 5.4.0-65-generic kernel.

Cheers,
Christian.

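#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>	/* memfd_create(), glibc >= 2.27 */
#include <unistd.h>

unsigned char page[4096];

int main(void)
{
	int i, fd;

	for (i = 0; i < 4096; ++i)
		page[i] = i;

	/* No special privileges are needed to create the memfd. */
	fd = memfd_create("test", 0);
	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}

	/*
	 * Every write() allocates more shmem pages for the file. They are
	 * never mapped, so they do not count toward this process's RSS and
	 * the OOM killer does not select this process.
	 */
	while (write(fd, page, sizeof(page)) > 0)
		;

	perror("write");
	return 1;
}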



* Re: Possible deny of service with memfd_create()
  2021-02-04 16:32 Possible deny of service with memfd_create() Christian König
@ 2021-02-04 17:12 ` Michal Hocko
  2021-02-05  0:32   ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2021-02-04 17:12 UTC
  To: Christian König; +Cc: LKML

On Thu 04-02-21 17:32:20, Christian König wrote:
> Hi Michal,
> 
> as requested in the other mail thread, the following sample code brings my
> test system down within seconds.
> 
> The issue is that the memory allocated for the file descriptor is not
> accounted to the process allocating it, so the OOM killer picks whatever
> process it thinks is good but never my small test program.
> 
> Since memfd_create() doesn't need any special permission, this is a rather
> nice denial of service, and as far as I can see it also works with a
> standard Ubuntu 5.4.0-65-generic kernel.

Thanks for following up. This is really nasty, but now that I am looking
at it more closely, this is not really different from tmpfs in general.
You are free to create files and eat the memory without being accounted
for that memory, because it is not seen as your memory from the system
POV. You would have to map that memory for it to be part of your RSS.

The only existing protection right now is to use the memory cgroup
controller, because the tmpfs memory is accounted to the process which
faults the memory in (or writes to the file).

I am not sure there is a good way to handle this in general,
unfortunately. Shmem is just tricky (e.g. how do you deal with leftovers
after the fd is closed?). Maybe memfd_create can be more clever and
account memory to all owners of the fd, but even that sounds far from
trivial from the accounting POV. It is true that tmpfs can at least
control who can write to it, which is not the case for memfd, but then
we hit the backward compatibility wall.
-- 
Michal Hocko
SUSE Labs
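
Michal's point that written-but-unmapped shmem does not count toward the writer's RSS can be checked from userspace. The sketch below is an editor's illustration, not code from the thread: it fills a memfd with write(), samples the RssShmem field of /proc/self/status, then maps the file and reads one byte per page. It assumes Linux 4.5 or newer (for RssShmem) and glibc 2.27 or newer (for the memfd_create() wrapper); status_kb() and rss_growth() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return a "Field:  <n> kB" value from /proc/self/status, or -1. */
long status_kb(const char *field)
{
	char line[256];
	long kb = -1;
	size_t n = strlen(field);
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, field, n) == 0 && line[n] == ':') {
			if (sscanf(line + n + 1, "%ld", &kb) != 1)
				kb = -1;
			break;
		}
	}
	fclose(f);
	return kb;
}

/*
 * Write len bytes into a new memfd, then map it and read one byte per
 * page. Stores the RssShmem growth (in kB) observed after each step.
 */
int rss_growth(size_t len, long *after_write_kb, long *after_map_kb)
{
	static char chunk[4096];
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	long base = status_kb("RssShmem");
	volatile const char *p;
	size_t done = 0, off;
	void *map;
	int fd;

	if (base < 0)
		return -1;
	fd = memfd_create("rss-demo", 0);
	if (fd < 0)
		return -1;

	/* Step 1: populate the file through write() only; nothing is mapped. */
	memset(chunk, 0xaa, sizeof(chunk));
	while (done < len) {
		size_t want = len - done < sizeof(chunk) ? len - done : sizeof(chunk);
		ssize_t w = write(fd, chunk, want);

		if (w <= 0) {
			close(fd);
			return -1;
		}
		done += (size_t)w;
	}
	*after_write_kb = status_kb("RssShmem") - base;

	/* Step 2: map the same pages and fault each one in by reading it. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	p = map;
	for (off = 0; off < len; off += pagesz)
		(void)p[off];
	*after_map_kb = status_kb("RssShmem") - base;

	munmap(map, len);
	close(fd);
	return 0;
}
```

On a stock kernel, rss_growth(16 << 20, &w, &m) should leave w near zero and m close to 16384 kB. The kernel updates RSS counters lazily, so treat both figures as approximate.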


* Re: Possible deny of service with memfd_create()
  2021-02-04 17:12 ` Michal Hocko
@ 2021-02-05  0:32   ` Hugh Dickins
  2021-02-05  7:54     ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2021-02-05  0:32 UTC
  To: Michal Hocko; +Cc: Christian Koenig, LKML

On Thu, 4 Feb 2021, Michal Hocko wrote:
> On Thu 04-02-21 17:32:20, Christian Koenig wrote:
> > Hi Michal,
> > 
> > as requested in the other mail thread, the following sample code brings my
> > test system down within seconds.
> > 
> > The issue is that the memory allocated for the file descriptor is not
> > accounted to the process allocating it, so the OOM killer picks whatever
> > process it thinks is good but never my small test program.
> > 
> > Since memfd_create() doesn't need any special permission, this is a rather
> > nice denial of service, and as far as I can see it also works with a
> > standard Ubuntu 5.4.0-65-generic kernel.
> 
> Thanks for following up. This is really nasty, but now that I am looking
> at it more closely, this is not really different from tmpfs in general.
> You are free to create files and eat the memory without being accounted
> for that memory, because it is not seen as your memory from the system
> POV. You would have to map that memory for it to be part of your RSS.
> 
> The only existing protection right now is to use the memory cgroup
> controller, because the tmpfs memory is accounted to the process which
> faults the memory in (or writes to the file).
> 
> I am not sure there is a good way to handle this in general,
> unfortunately. Shmem is just tricky (e.g. how do you deal with leftovers
> after the fd is closed?). Maybe memfd_create can be more clever and
> account memory to all owners of the fd, but even that sounds far from
> trivial from the accounting POV. It is true that tmpfs can at least
> control who can write to it, which is not the case for memfd, but then
> we hit the backward compatibility wall.

Yes, no solution is satisfactory, and memcg is best, but don't forget
echo 2 >/proc/sys/vm/overcommit_memory

Hugh
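
Hugh's knob selects strict overcommit accounting (mode 2). As we read mm/shmem.c, memfd pages are then charged against CommitLimit as they are allocated, so the reproducer's write() should start failing once the limit is reached instead of the OOM killer having to step in. The sketch below, with hypothetical helper names, only inspects the current state: the mode from /proc/sys/vm/overcommit_memory (0 heuristic, 1 always overcommit, 2 strict) and the CommitLimit and Committed_AS fields from /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/* vm.overcommit_memory: 0 = heuristic, 1 = always, 2 = strict; -1 on error. */
int overcommit_mode(void)
{
	int mode = -1;
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}

/* Return a "Key:  <n> kB" value from /proc/meminfo, or -1 if absent. */
long meminfo_kb(const char *key)
{
	char line[256];
	long kb = -1;
	size_t n = strlen(key);
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, key, n) == 0 && line[n] == ':') {
			if (sscanf(line + n + 1, "%ld", &kb) != 1)
				kb = -1;
			break;
		}
	}
	fclose(f);
	return kb;
}
```

In mode 2, new commitments are refused once Committed_AS would exceed CommitLimit, which by default is roughly swap plus vm.overcommit_ratio percent of RAM. The trade-off is that every other allocation on the system becomes subject to the same limit.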


* Re: Possible deny of service with memfd_create()
  2021-02-05  0:32   ` Hugh Dickins
@ 2021-02-05  7:54     ` Christian König
  2021-02-05 10:50       ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-02-05  7:54 UTC
  To: Hugh Dickins, Michal Hocko; +Cc: LKML

Am 05.02.21 um 01:32 schrieb Hugh Dickins:
> On Thu, 4 Feb 2021, Michal Hocko wrote:
>> On Thu 04-02-21 17:32:20, Christian Koenig wrote:
>>> Hi Michal,
>>>
>>> as requested in the other mail thread, the following sample code brings my
>>> test system down within seconds.
>>>
>>> The issue is that the memory allocated for the file descriptor is not
>>> accounted to the process allocating it, so the OOM killer picks whatever
>>> process it thinks is good but never my small test program.
>>>
>>> Since memfd_create() doesn't need any special permission, this is a rather
>>> nice denial of service, and as far as I can see it also works with a
>>> standard Ubuntu 5.4.0-65-generic kernel.
>> Thanks for following up. This is really nasty, but now that I am looking
>> at it more closely, this is not really different from tmpfs in general.
>> You are free to create files and eat the memory without being accounted
>> for that memory, because it is not seen as your memory from the system
>> POV. You would have to map that memory for it to be part of your RSS.

I mostly agree. The big difference is that tmpfs is only available when 
mounted.

And tmpfs can be restricted in size per mount point as well as by per-user
quotas IIRC. Looking at my desktop system, those restrictions are exactly
what I see there.

But memfd_create() is just a free-for-all: you don't have any size limit
or access restriction as far as I can see.

>> The only existing protection right now is to use the memory cgroup
>> controller, because the tmpfs memory is accounted to the process which
>> faults the memory in (or writes to the file).

Agreed, but having to rely on cgroups is not really satisfying when you
have to maintain a hardened server.

>> I am not sure there is a good way to handle this in general,
>> unfortunately. Shmem is just tricky (e.g. how do you deal with leftovers
>> after the fd is closed?). Maybe memfd_create can be more clever and
>> account memory to all owners of the fd, but even that sounds far from
>> trivial from the accounting POV. It is true that tmpfs can at least
>> control who can write to it, which is not the case for memfd, but then
>> we hit the backward compatibility wall.
> Yes, no solution is satisfactory, and memcg is best, but don't forget
> echo 2 >/proc/sys/vm/overcommit_memory

Good point as well.

Regards,
Christian.

>
> Hugh
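
Christian's observation about size limits is visible through fstatfs(2). A memfd lives on the kernel-internal tmpfs instance, which mm/shmem.c leaves without a block limit, so its f_blocks should come back as 0, whereas a mounted tmpfs such as /dev/shm reports its size= cap (half of RAM by default). A minimal sketch; memfd_size_limit() and path_size_limit() are hypothetical names, TMPFS_MAGIC comes from <linux/magic.h>, and a limit of 0 means "no block limit".

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/vfs.h>
#include <linux/magic.h>
#include <unistd.h>

/*
 * Report the size limit (in bytes) of the tmpfs instance backing a new
 * memfd. Returns 0 and sets *limit (0 = no block limit) on success,
 * -1 if the memfd cannot be created or is not on tmpfs.
 */
int memfd_size_limit(unsigned long long *limit)
{
	struct statfs st;
	int fd = memfd_create("statfs-probe", 0);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (fstatfs(fd, &st) == 0 && st.f_type == TMPFS_MAGIC) {
		*limit = (unsigned long long)st.f_blocks *
			 (unsigned long long)st.f_bsize;
		ret = 0;
	}
	close(fd);
	return ret;
}

/* Same figure for a mounted path, e.g. "/dev/shm"; -1 on error. */
int path_size_limit(const char *path, unsigned long long *limit)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	*limit = (unsigned long long)st.f_blocks *
		 (unsigned long long)st.f_bsize;
	return 0;
}
```

On a stock kernel, memfd_size_limit() should yield 0 while path_size_limit("/dev/shm", ...) yields the mount's configured size.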



* Re: Possible deny of service with memfd_create()
  2021-02-05  7:54     ` Christian König
@ 2021-02-05 10:50       ` Michal Hocko
  2021-02-05 10:57         ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2021-02-05 10:50 UTC
  To: Christian König; +Cc: Hugh Dickins, LKML

On Fri 05-02-21 08:54:31, Christian König wrote:
> Am 05.02.21 um 01:32 schrieb Hugh Dickins:
> > On Thu, 4 Feb 2021, Michal Hocko wrote:
> > > On Thu 04-02-21 17:32:20, Christian Koenig wrote:
> > > > Hi Michal,
> > > > 
> > > > as requested in the other mail thread, the following sample code brings my
> > > > test system down within seconds.
> > > > 
> > > > The issue is that the memory allocated for the file descriptor is not
> > > > accounted to the process allocating it, so the OOM killer picks whatever
> > > > process it thinks is good but never my small test program.
> > > > 
> > > > Since memfd_create() doesn't need any special permission, this is a rather
> > > > nice denial of service, and as far as I can see it also works with a
> > > > standard Ubuntu 5.4.0-65-generic kernel.
> > > Thanks for following up. This is really nasty, but now that I am looking
> > > at it more closely, this is not really different from tmpfs in general.
> > > You are free to create files and eat the memory without being accounted
> > > for that memory, because it is not seen as your memory from the system
> > > POV. You would have to map that memory for it to be part of your RSS.
> 
> I mostly agree. The big difference is that tmpfs is only available when
> mounted.
>
> And tmpfs can be restricted in size per mount point as well as by per-user
> quotas IIRC. Looking at my desktop system, those restrictions are exactly
> what I see there.

I cannot find anything about per-user quotas for tmpfs in the tmpfs man
page. Or maybe I am looking at the wrong layer and there is generic
handling somewhere in the VFS core?

> But memfd_create() is just a free-for-all: you don't have any size limit
> or access restriction as far as I can see.

Yes, this is unfortunate, and a design decision that should have been
considered when the syscall was introduced. But that boat sailed long
ago; changing it now would risk breaking userspace.

> > > The only existing protection right now is to use the memory cgroup
> > > controller, because the tmpfs memory is accounted to the process which
> > > faults the memory in (or writes to the file).
> 
> Agreed, but having to rely on cgroups is not really satisfying when you
> have to maintain a hardened server.

Yes, I do recognize the pain. The only other way to mitigate the risk is
to deny the syscall to untrusted users in a hardened environment.
You should already be very strict about tmpfs usage there.

-- 
Michal Hocko
SUSE Labs
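
Michal's remaining option, denying the syscall to untrusted users, is usually implemented with a seccomp filter (container runtimes such as Docker and systemd's SystemCallFilter= work this way). The sketch below installs a classic-BPF filter through prctl(2) that makes memfd_create() fail with EPERM for the calling thread and any children it creates afterwards, including across execve(). It assumes x86-64 or arm64 (other architectures need their AUDIT_ARCH_* value), rejects x32 syscalls outright, and does nothing about tmpfs mounts such as /dev/shm; deny_memfd_create() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/*
 * Make memfd_create() return EPERM for the calling thread and any
 * children it creates afterwards. Returns 0 on success, -1 on error.
 */
int deny_memfd_create(void)
{
	struct sock_filter filter[] = {
		/* Kill on a foreign ABI: its syscall numbers mean something else. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
		/* x32 syscalls share AUDIT_ARCH_X86_64 but set bit 30: refuse them. */
		BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
#endif
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_memfd_create, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Required so that an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}
```

A launcher would call this before exec()ing untrusted code; a real deployment would more likely express the same rule in its container seccomp profile or a systemd unit.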


* Re: Possible deny of service with memfd_create()
  2021-02-05 10:50       ` Michal Hocko
@ 2021-02-05 10:57         ` Christian König
  2021-02-05 12:26           ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-02-05 10:57 UTC
  To: Michal Hocko; +Cc: Hugh Dickins, LKML

Am 05.02.21 um 11:50 schrieb Michal Hocko:
> On Fri 05-02-21 08:54:31, Christian König wrote:
>> Am 05.02.21 um 01:32 schrieb Hugh Dickins:
>>> On Thu, 4 Feb 2021, Michal Hocko wrote:
>>>> On Thu 04-02-21 17:32:20, Christian Koenig wrote:
>>>>> Hi Michal,
>>>>>
>>>>> as requested in the other mail thread, the following sample code brings my
>>>>> test system down within seconds.
>>>>>
>>>>> The issue is that the memory allocated for the file descriptor is not
>>>>> accounted to the process allocating it, so the OOM killer picks whatever
>>>>> process it thinks is good but never my small test program.
>>>>>
>>>>> Since memfd_create() doesn't need any special permission, this is a rather
>>>>> nice denial of service, and as far as I can see it also works with a
>>>>> standard Ubuntu 5.4.0-65-generic kernel.
>>>> Thanks for following up. This is really nasty, but now that I am looking
>>>> at it more closely, this is not really different from tmpfs in general.
>>>> You are free to create files and eat the memory without being accounted
>>>> for that memory, because it is not seen as your memory from the system
>>>> POV. You would have to map that memory for it to be part of your RSS.
>> I mostly agree. The big difference is that tmpfs is only available when
>> mounted.
>>
>> And tmpfs can be restricted in size per mount point as well as by per-user
>> quotas IIRC. Looking at my desktop system, those restrictions are exactly
>> what I see there.
> I cannot find anything about per-user quotas for tmpfs in the tmpfs man
> page. Or maybe I am looking at the wrong layer and there is generic
> handling somewhere in the VFS core?

I think so, yes. I vaguely remember a discussion about how to implement
quotas for tmpfs, but that was a really long time ago and I didn't
follow it to the end.

>> But memfd_create() is just a free-for-all: you don't have any size limit
>> or access restriction as far as I can see.
> Yes, this is unfortunate, and a design decision that should have been
> considered when the syscall was introduced. But that boat sailed long
> ago; changing it now would risk breaking userspace.
>
>>>> The only existing protection right now is to use the memory cgroup
>>>> controller, because the tmpfs memory is accounted to the process which
>>>> faults the memory in (or writes to the file).
>> Agreed, but having to rely on cgroups is not really satisfying when you
>> have to maintain a hardened server.
> Yes, I do recognize the pain. The only other way to mitigate the risk is
> to deny the syscall to untrusted users in a hardened environment.
> You should already be very strict about tmpfs usage there.
>

Well, it is perfectly valid for a process to use as much memory as it
wants; the problem is that we are not holding the process accountable
for it.

As I said, we have similar problems with GPU drivers, and I think we
just need a way to do this.

Let me think about it a bit; maybe we can somehow use the file owner
for this.

Thanks,
Christian.
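
The memory cgroup protection discussed above amounts to placing untrusted processes in a cgroup whose memory.max bounds the tmpfs and memfd memory they write, since (per Michal) that memory is accounted to whoever faults it in or writes it. A minimal sketch for cgroup v2; it assumes the unified hierarchy is mounted at /sys/fs/cgroup, that a child directory such as /sys/fs/cgroup/untrusted (a hypothetical name) already exists with the memory controller enabled in the parent's cgroup.subtree_control, and that the caller may write to it. The function names are illustrative.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write a string value into a cgroup interface file such as memory.max. */
static int cg_write(const char *cgroup, const char *file, const char *value)
{
	char path[4096];
	FILE *f;
	int ok;

	if (snprintf(path, sizeof(path), "%s/%s", cgroup, file) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* cgroupfs rejects invalid values when the buffered write is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a numeric cgroup file (e.g. memory.current); -1 on error or "max". */
long long cg_read(const char *cgroup, const char *file)
{
	char path[4096], buf[64], *end;
	long long v;
	FILE *f;

	if (snprintf(path, sizeof(path), "%s/%s", cgroup, file) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	v = strtoll(buf, &end, 10);
	return end == buf ? -1 : v;
}

/*
 * Cap the cgroup at max_bytes, then move the calling process into it.
 * Shmem pages this process writes afterwards are charged to that cgroup.
 */
int confine_to_memcg(const char *cgroup, long long max_bytes)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%lld\n", max_bytes);
	if (cg_write(cgroup, "memory.max", buf) != 0)
		return -1;
	snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
	return cg_write(cgroup, "cgroup.procs", buf);
}
```

A launcher would call confine_to_memcg("/sys/fs/cgroup/untrusted", 512LL << 20) and then exec the untrusted program. With the reproducer from the first message, we would expect the writer to be reclaimed against, swapped, or OOM-killed inside that cgroup rather than taking the rest of the system down.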


* Re: Possible deny of service with memfd_create()
  2021-02-05 10:57         ` Christian König
@ 2021-02-05 12:26           ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2021-02-05 12:26 UTC
  To: Christian König; +Cc: Hugh Dickins, LKML

On Fri 05-02-21 11:57:09, Christian König wrote:
> Am 05.02.21 um 11:50 schrieb Michal Hocko:
> > On Fri 05-02-21 08:54:31, Christian König wrote:
> > > Am 05.02.21 um 01:32 schrieb Hugh Dickins:
> > > > On Thu, 4 Feb 2021, Michal Hocko wrote:
[...]
> > > > > The only existing protection right now is to use the memory cgroup
> > > > > controller, because the tmpfs memory is accounted to the process which
> > > > > faults the memory in (or writes to the file).
> > > Agreed, but having to rely on cgroups is not really satisfying when you
> > > have to maintain a hardened server.
> > Yes, I do recognize the pain. The only other way to mitigate the risk is
> > to deny the syscall to untrusted users in a hardened environment.
> > You should already be very strict about tmpfs usage there.
> > 
> 
> Well, it is perfectly valid for a process to use as much memory as it
> wants; the problem is that we are not holding the process accountable
> for it.
> 
> As I said, we have similar problems with GPU drivers, and I think we
> just need a way to do this.
> 
> Let me think about it a bit; maybe we can somehow use the file owner
> for this.

There are some land mines along the way to watch out for. The most
obvious one would be to avoid double accounting a populated file and its
mapping. Those two might live in separate processes, so you would need an
rmap walk just to evaluate oom_badness. You also need to consider files
which are no longer open or which have been passed to another process.
And then the question is what to do about them: killing their owner
doesn't help anything, because the file is still left behind. I expect
you will run into more problems along the way, but I definitely do not
want to discourage you from this endeavor.
-- 
Michal Hocko
SUSE Labs
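
One of Michal's land mines, a memfd outliving its creator's descriptor after being passed to another process, takes only a few lines to set up. The sketch below sends a memfd over a UNIX socket with SCM_RIGHTS, after which the sender can close its copy while the receiver still reads the contents; in a real exchange the two ends would live in different processes, and the shmem pages would stay allocated with no obvious "owner" left to charge. make_filled_memfd(), send_fd() and recv_fd() are hypothetical helper names; the ancillary-data layout follows cmsg(3).

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a memfd holding len bytes of data; returns the fd or -1. */
int make_filled_memfd(const char *name, const void *data, size_t len)
{
	int fd = memfd_create(name, 0);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one file descriptor over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on error. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

After send_fd() succeeds the creator's descriptor is no longer needed; closing it frees nothing, because the in-flight message and then the receiver hold their own references to the same file.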


end of thread

Thread overview: 7+ messages
2021-02-04 16:32 Possible deny of service with memfd_create() Christian König
2021-02-04 17:12 ` Michal Hocko
2021-02-05  0:32   ` Hugh Dickins
2021-02-05  7:54     ` Christian König
2021-02-05 10:50       ` Michal Hocko
2021-02-05 10:57         ` Christian König
2021-02-05 12:26           ` Michal Hocko
