* Is reading from /proc/self/smaps thread-safe?
From: Marcin Ślusarz @ 2016-07-26 12:44 UTC
  To: linux-mm, LKML

Hey

I have a simple program that mmaps 8MB of anonymous memory, spawns 16
threads, reads /proc/self/smaps in each thread, and checks whether the
mapped address can be found in smaps. From time to time it's not there.

Is this supposed to work reliably?

My guess is that libc functions allocate memory internally using mmap
and modify the process' address space while another thread is iterating
over the vmas.

I see that reading from smaps takes mmap_sem in read mode. I'm guessing
vm modifications are done under mmap_sem in write mode.

Documentation/filesystems/proc.txt says reading from smaps is "slow
but very precise" (although in the context of RSS).

Example program below.

smaps_test.c:
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define N 16
#define SZ (8 * 1024 * 1024)

void *addr;
char addrstr[20];
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    char tmp[100000];
    int ret;
    int off = 0;

    int fd = open("/proc/self/smaps", O_RDONLY);
    if (fd < 0)
        abort();

    do {
        ret = read(fd, tmp + off, sizeof(tmp) - off);
        if (ret < 0)
            abort();
        off += ret;
        if (off == sizeof(tmp))
            abort();
    } while (ret != 0);

    /* NUL-terminate before searching - read() does not do it for us */
    tmp[off] = '\0';

    char *found = strstr(tmp, addrstr);

    /* lock to prevent multiple threads from
       writing to stdout at the same time */
    pthread_mutex_lock(&mtx);
    printf("%d\n", found ? 1 : 0);
    if (!found) {
        printf("%s\n", tmp);
        printf("address %p not found in smaps\n", addr);
        fflush(stdout);
        abort();
    }
    pthread_mutex_unlock(&mtx);

    close(fd);
    return NULL;
}

int main()
{
    pthread_t t[N];

    addr = mmap(NULL, SZ, PROT_READ|PROT_WRITE,
                MAP_SHARED|MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        abort();

    sprintf(addrstr, "%lx-", (unsigned long)addr);

    for (int i = 0; i < N; ++i)
        if (pthread_create(&t[i], NULL, worker, NULL))
            abort();
    for (int i = 0; i < N; ++i)
        if (pthread_join(t[i], NULL))
            abort();

    munmap(addr, SZ);

    return 0;
}

Makefile:
LDFLAGS=-pthread

smaps_test: smaps_test.c

run: smaps_test
    while ./smaps_test; do echo; done | grep -v ': '


Failing run:
$ make run
while ./smaps_test; do echo; done | grep -v ': '
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
00400000-00401000 r-xp 00000000 08:02 19006749
  /home/mslusarz/smaps_test/smaps_test
00601000-00602000 rw-p 00001000 08:02 19006749
  /home/mslusarz/smaps_test/smaps_test
020f8000-02119000 rw-p 00000000 00:00 0                                  [heap]
7f0dfdffc000-7f0dfe7fd000 rw-p 00000000 00:00 0
7f0dfe7fd000-7f0dfe7fe000 ---p 00000000 00:00 0
7f0dfe7fe000-7f0dfeffe000 rw-p 00000000 00:00 0
7f0dfeffe000-7f0dfefff000 ---p 00000000 00:00 0
7f0dfefff000-7f0dff7ff000 rw-p 00000000 00:00 0
7f0dff7ff000-7f0dff800000 ---p 00000000 00:00 0
7f0dff800000-7f0e00000000 rw-p 00000000 00:00 0
7f0e00000000-7f0e00022000 rw-p 00000000 00:00 0
7f0e00022000-7f0e04000000 ---p 00000000 00:00 0
7f0e04595000-7f0e04596000 ---p 00000000 00:00 0
7f0e04596000-7f0e04d96000 rw-p 00000000 00:00 0
7f0e04d96000-7f0e04d97000 ---p 00000000 00:00 0
7f0e04d97000-7f0e05597000 rw-p 00000000 00:00 0
7f0e05597000-7f0e05598000 ---p 00000000 00:00 0
7f0e05598000-7f0e05d98000 rw-p 00000000 00:00 0
7f0e05d98000-7f0e05d99000 ---p 00000000 00:00 0
7f0e05d99000-7f0e06599000 rw-p 00000000 00:00 0
7f0e06599000-7f0e0659a000 ---p 00000000 00:00 0
7f0e0659a000-7f0e06d9a000 rw-p 00000000 00:00 0
7f0e06d9a000-7f0e06d9b000 ---p 00000000 00:00 0
7f0e06d9b000-7f0e0759b000 rw-p 00000000 00:00 0
7f0e0759b000-7f0e0759c000 ---p 00000000 00:00 0
7f0e0759c000-7f0e07d9c000 rw-p 00000000 00:00 0
7f0e07d9c000-7f0e07d9d000 ---p 00000000 00:00 0
7f0e07d9d000-7f0e0859d000 rw-p 00000000 00:00 0
7f0e0859d000-7f0e0859e000 ---p 00000000 00:00 0
7f0e0859e000-7f0e08d9e000 rw-p 00000000 00:00 0
7f0e08d9e000-7f0e08d9f000 ---p 00000000 00:00 0
7f0e08d9f000-7f0e0959f000 rw-p 00000000 00:00 0
7f0e0959f000-7f0e095a0000 ---p 00000000 00:00 0
7f0e095a0000-7f0e09da0000 rw-p 00000000 00:00 0
7f0e09da0000-7f0e09da1000 ---p 00000000 00:00 0
(should be here)
7f0e0ada1000-7f0e0af38000 r-xp 00000000 08:02 9699508
  /lib/x86_64-linux-gnu/libc-2.23.so
7f0e0af38000-7f0e0b138000 ---p 00197000 08:02 9699508
  /lib/x86_64-linux-gnu/libc-2.23.so
7f0e0b138000-7f0e0b13c000 r--p 00197000 08:02 9699508
  /lib/x86_64-linux-gnu/libc-2.23.so
7f0e0b13c000-7f0e0b13e000 rw-p 0019b000 08:02 9699508
  /lib/x86_64-linux-gnu/libc-2.23.so
7f0e0b13e000-7f0e0b142000 rw-p 00000000 00:00 0
7f0e0b142000-7f0e0b15a000 r-xp 00000000 08:02 9699869
  /lib/x86_64-linux-gnu/libpthread-2.23.so
7f0e0b15a000-7f0e0b359000 ---p 00018000 08:02 9699869
  /lib/x86_64-linux-gnu/libpthread-2.23.so
7f0e0b359000-7f0e0b35a000 r--p 00017000 08:02 9699869
  /lib/x86_64-linux-gnu/libpthread-2.23.so
7f0e0b35a000-7f0e0b35b000 rw-p 00018000 08:02 9699869
  /lib/x86_64-linux-gnu/libpthread-2.23.so
7f0e0b35b000-7f0e0b35f000 rw-p 00000000 00:00 0
7f0e0b35f000-7f0e0b383000 r-xp 00000000 08:02 9699378
  /lib/x86_64-linux-gnu/ld-2.23.so
7f0e0b557000-7f0e0b55a000 rw-p 00000000 00:00 0
7f0e0b580000-7f0e0b582000 rw-p 00000000 00:00 0
7f0e0b582000-7f0e0b583000 r--p 00023000 08:02 9699378
  /lib/x86_64-linux-gnu/ld-2.23.so
7f0e0b583000-7f0e0b584000 rw-p 00024000 08:02 9699378
  /lib/x86_64-linux-gnu/ld-2.23.so
7f0e0b584000-7f0e0b585000 rw-p 00000000 00:00 0
7fff48fe4000-7fff49005000 rw-p 00000000 00:00 0                          [stack]
7fff4908e000-7fff49090000 r--p 00000000 00:00 0                          [vvar]
7fff49090000-7fff49092000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
  [vsyscall]

address 0x7f0e0a5a1000 not found in smaps
Aborted

Cheers,
Marcin

* Re: Is reading from /proc/self/smaps thread-safe?
From: Mateusz Guzik @ 2016-07-31  6:22 UTC
  To: Marcin Ślusarz; +Cc: linux-mm, LKML

On Tue, Jul 26, 2016 at 02:44:48PM +0200, Marcin Ślusarz wrote:
> Hey
> 
> I have a simple program that mmaps 8MB of anonymous memory, spawns 16
> threads, reads /proc/self/smaps in each thread, and checks whether the
> mapped address can be found in smaps. From time to time it's not there.
> 
> Is this supposed to work reliably?
> 
> My guess is that libc functions allocate memory internally using mmap
> and modify the process' address space while another thread is iterating
> over the vmas.
> 
> I see that reading from smaps takes mmap_sem in read mode. I'm guessing
> vm modifications are done under mmap_sem in write mode.
> 
> Documentation/filesystems/proc.txt says reading from smaps is "slow
> but very precise" (although in the context of RSS).
> 

Address space modification definitely happens as threads get their
stacks mmapped and unmapped.

If you run your program under strace, you will see all threads perform
multiple read()s to get the content, as the kernel keeps returning
short reads (below one page in size). In particular, seq_read imposes
the limit artificially.
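
You can see the same thing without strace; a minimal sketch (untested)
that just prints how much each read() hands back:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    static char buf[1 << 20];   /* ask for 1 MB at a time */
    ssize_t n;
    int fd = open("/proc/self/smaps", O_RDONLY);

    if (fd < 0)
        abort();
    /* the chunks come back well below the requested size */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        printf("read() returned %zd bytes\n", n);
    close(fd);
    return 0;
}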

Since there are multiple trips to the kernel, locks are dropped and
special measures are needed to maintain consistency of the result.

In m_start you can see there is a best-effort attempt: it remembers
which vma was accessed by the previous run. But that vma can be
unmapped before we get there next time.

So no, reading the file when the content is bigger than 4k is not
guaranteed to give consistent results across reads.
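
As an aside, if all you need is to check whether one known address is
currently mapped, you don't have to parse smaps at all. A minimal
sketch (untested) using mincore(), which fails with ENOMEM when the
range contains unmapped pages:

#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* returns 1 if [addr, addr+len) is fully mapped, 0 if any part is not;
   addr must be page-aligned (mmap's return value is) */
static int range_is_mapped(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    int ret, saved_errno;

    if (!vec)
        abort();
    /* mincore() fails with ENOMEM if the range contains unmapped pages */
    ret = mincore(addr, len, vec);
    saved_errno = errno;
    free(vec);
    if (ret == 0)
        return 1;
    if (saved_errno == ENOMEM)
        return 0;
    abort();
}

int main(void)
{
    size_t sz = 8 * 1024 * 1024;
    void *a = mmap(NULL, sz, PROT_READ|PROT_WRITE,
                   MAP_SHARED|MAP_ANONYMOUS, -1, 0);

    if (a == MAP_FAILED)
        abort();
    printf("%d\n", range_is_mapped(a, sz));   /* expect 1 */
    munmap(a, sz);
    printf("%d\n", range_is_mapped(a, sz));   /* expect 0 */
    return 0;
}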

I don't have a good idea how to fix it, and it is likely not worth
working on. This is not the only place which is unable to return
reliable information for a sufficiently large dataset.

The obvious thing to try out is just storing all the necessary
information and generating the text form on read. Unfortunately even
that data is quite big -- over 100 bytes per vma. This could be shrunk
down significantly by encoding which pieces of information are present,
as opposed to keeping full records. But with thousands of entries per
application this translates into kilobytes of memory which would have
to be allocated just to hold it, which sounds like a non-starter to me.
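
(Back of the envelope, just multiplying the numbers above: 2000 vmas *
100 bytes is already about 200 kB that would have to sit around for
every open smaps file.)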

-- 
Mateusz Guzik

* Re: Is reading from /proc/self/smaps thread-safe?
From: Marcin Ślusarz @ 2016-08-02 17:20 UTC
  To: Mateusz Guzik; +Cc: linux-mm, LKML

2016-07-31 8:22 GMT+02:00 Mateusz Guzik <mguzik@redhat.com>:
> On Tue, Jul 26, 2016 at 02:44:48PM +0200, Marcin Ślusarz wrote:
>> Hey
>>
>> I have a simple program that mmaps 8MB of anonymous memory, spawns 16
>> threads, reads /proc/self/smaps in each thread, and checks whether the
>> mapped address can be found in smaps. From time to time it's not there.
>>
>> Is this supposed to work reliably?
>>
>> My guess is that libc functions allocate memory internally using mmap
>> and modify the process' address space while another thread is iterating
>> over the vmas.
>>
>> I see that reading from smaps takes mmap_sem in read mode. I'm guessing
>> vm modifications are done under mmap_sem in write mode.
>>
>> Documentation/filesystems/proc.txt says reading from smaps is "slow
>> but very precise" (although in the context of RSS).
>>
>
> Address space modification definitely happens as threads get their
> stacks mmapped and unmapped.
>
> If you run your program under strace, you will see all threads perform
> multiple read()s to get the content, as the kernel keeps returning
> short reads (below one page in size). In particular, seq_read imposes
> the limit artificially.
>
> Since there are multiple trips to the kernel, locks are dropped and
> special measures are needed to maintain consistency of the result.
>
> In m_start you can see there is a best-effort attempt: it remembers
> which vma was accessed by the previous run. But that vma can be
> unmapped before we get there next time.

I added printks to m_start and I see that when last_addr is non-zero,
find_vma succeeds, even in cases where my test can't find its mapping.
So it seems the problem is not that simple.

Just for testing I commented out the m_next_vma call in m_start and now
my test always succeeds (of course at the expense of duplicated entries).
Maybe it's just because of changed timing, or maybe the problem is
deeper...

>
> So no, reading the file when the content is bigger than 4k is not
> guaranteed to give consistent results across reads.
>
> I don't have a good idea how to fix it, and it is likely not worth
> working on. This is not the only place which is unable to return
> reliable information for a sufficiently large dataset.
>
> The obvious thing to try out is just storing all the necessary
> information and generating the text form on read. Unfortunately even
> that data is quite big -- over 100 bytes per vma. This could be shrunk
> down significantly by encoding which pieces of information are present,
> as opposed to keeping full records. But with thousands of entries per
> application this translates into kilobytes of memory which would have
> to be allocated just to hold it, which sounds like a non-starter to me.

Another idea is to change seq_read to flush data to the user buffer
every time it's full, without "stopping/starting" the seq_file. The
logic of seq_read is quite hairy, so I didn't try this. And I'm not
sure whether a page fault in copy_to_user could retake mmap_sem in
write mode.

Another (crazy) idea is to implement a write operation for smaps where
the written buffer contents would pick the right VMA for the next read.
Too crazy? :)

Marcin
