All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Yonghong Song <yhs@fb.com>
Cc: Matt Pallissard <matt@pallissard.net>, bpf <bpf@vger.kernel.org>
Subject: Re: Accessing mm_rss_stat fields with btf/BPF_CORE_READ_INTO
Date: Tue, 23 Jun 2020 10:58:20 -0700	[thread overview]
Message-ID: <CAEf4BzbgQoi=NC6hM0j=49iGeexUEeuJFciMfipV+VDt+Luadg@mail.gmail.com> (raw)
In-Reply-To: <8ffec8ff-664d-fd3e-12eb-49eac339b612@fb.com>

On Tue, Jun 23, 2020 at 9:36 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 6/23/20 7:54 AM, Matt Pallissard wrote:
> >
> > On 2020-06-22T15:09:57 -0700, Andrii Nakryiko wrote:
> >> On Mon, Jun 22, 2020 at 10:19 AM Matt Pallissard <matt@pallissard.net> wrote:
> >>>
> >>> On 2020-06-22T09:20:03 -0700, Andrii Nakryiko wrote:
> >>>> On Mon, Jun 22, 2020 at 8:01 AM Matt Pallissard <matt@pallissard.net> wrote:
> >>>>> On 2020-06-21T08:44:28 -0700, Matt Pallissard wrote:
> >>>>>> On 2020-06-20T20:29:43 -0700, Andrii Nakryiko wrote:
> >>>>>>> On Sat, Jun 20, 2020 at 1:07 PM Matt Pallissard <matt@pallissard.net> wrote:
> >>>>>>>> On 2020-06-20T11:11:55 -0700, Yonghong Song wrote:
> >>>>>>>>> On 6/20/20 9:22 AM, Matt Pallissard wrote:
> >>>>>>>>>> New to bpf here.
> >>>>>>>>>>
> >>>>>>>>>> I'm trying to read values out of of mm_struct.  I have code like this;
> >>>>>>>>>>
> >>>>>>>>>> unsigned long i[10] = {};
> >>>>>>>>>> struct task_struct *t;
> >>>>>>>>>> struct mm_rss_stat *rss;
> >>>>>>>>>>
> >>>>>>>>>> t = (struct task_struct *)bpf_get_current_task();
> >>>>>>>>>> BPF_CORE_READ_INTO(&rss, t, mm, rss_stat);
> >>>>>>>>>> BPF_CORE_READ_INTO(i, rss, count);
> >>>>>>>>>>
> >>>>>>>>>> However, all values in `i` appear to be 0 (i[MM_FILEPAGES], etc), as if no data gets copied.  I'm about 100% confident that this is caused by a glaring oversight on my part.
> >>>>>>>>>
> >>>>>>>>> Maybe you want to check the return value of BPF_CORE_READ_INTO.
> >>>>>>>>> Underlying it is using bpf_probe_read and bpf_probe_read may fail e.g., due
> >>>>>>>>> to major fault.
> >>>>>>>>
> >>>>>>>> Doh, I should have known to check the return codes!  Yes, it was failing.  I knew I was overlooking something trivial.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I wrote exactly such piece of code a while ago. Here's part of it for
> >>>>>>> reference, I think it will be helpful:
> >>>>>>>
> >>>>>>>    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
> >>>>>>>    const struct mm_struct *mm = BPF_CORE_READ(task, mm);
> >>>>>>>
> >>>>>>>    if (mm) {
> >>>>>>>        u64 hiwater_rss = BPF_CORE_READ(mm, hiwater_rss);
> >>>>>>>        u64 file_pages = BPF_CORE_READ(mm, rss_stat.count[MM_FILEPAGES].counter);
> >>>>>>>        u64 anon_pages = BPF_CORE_READ(mm, rss_stat.count[MM_ANONPAGES].counter);
> >>>>>>>        u64 shmem_pages = BPF_CORE_READ(mm,
> >>>>>>> rss_stat.count[MM_SHMEMPAGES].counter);
> >>>>>>>        u64 active_rss = file_pages + anon_pages + shmem_pages;
> >>>>>>>        /* ... */
> >>>>>>
> >>>>>> Thank you,
> >>>>>>
> >>>>>> After realizing that I was referencing the struct incorrectly, I wound up with a similar block of code.  However, as I started testing it against /proc/pid/smaps[,_rollup] I noticed that my numbers didn't match up.  Always smaller.
> >>>>>>
> >>>>>> I took a quick glance at fs/proc/task_mmu.c.  I think I'll have to walk some sort of accounting structure.
> >>>>>
> >>>>>
> >>>>> I started to take a hard look at fs/proc/task_mmu.c.  With all the locking, globals, and compile-time constants, I'm not sure that it's even possible to correctly walk `vm_area_struct` in bpf.
> >>>>
> >>>> Yes, you can't take all those locks from BPF. But reading atomic
> >>>> counters from BPF should be no problem. You might get a slightly out
> >>>> of sync readings, but whatever you are doing shouldn't expect to have
> >>>> 100% correct values anyways, because they might change so fast after
> >>>> you read them.
> >>>
> >>> That was my initial thought.  I didn't care to much about stale data, my only real concern was walking vm_area_struct and having memory freed.  I wasn't sure if that could break the list underneath me.  Although, that shouldn't be too difficult to get to the bottom of.
> >>>
> >>
> >> Not sure about vm_area_struct (where is it in the example above?), but
> >> mm_struct won't go away, because current task won't go away, because
> >> BPF program is running in the context of current. Similarly for
> >> bpf_iter, bpf_iter will actually take a refcnt on tast_struct. So I
> >> think you don't have to worry about that.
> >
> > I didn't mention it explicitly in the example above.  But when I originally mentioned walking an accounting structure, as procfs does, it winds up being `mm_struct->mmap,vm_[next,prev]`, with mmap being a `vm_area_struct`.  But, it sounds like I should be abandoning that path and iterating over all the tasks.
> >
> >
> >>>>> If anyone has suggestions for getting memory numbers from an entire process, not just a task/thread, I'd love to hear them.  If not, I'll pursue this on my own.
> >>>>
> >>>> For this, you'd need to iterate across many tasks and aggregate their
> >>>> results based on tasks's tgid. Check iter/task programs in selftests
> >>>> (progs/bpf_iter_task.c, I think).
> >
> >
> > When I try to replicate some of the selftest task logic. I run into some errors when I call bpf_object__load.  `libbpf: task is not found in vmlinux BTF.`  I'll try matching the selftest code more closely and digging into that further.
>
> Somehow libbpf did not prepend `task` with `bpf_iter_` prefix. Not sure
> what is the exact issue. Yes, please mimic what selftests did.
>

It's just an artifact of how libbpf logs error in such case. It did
search for "bpf_iter_task" type, though. But Matt probably doesn't
have a recent enough kernel or didn't build it with
CONFIG_DEBUG_INFO_BTF=y and pahole 1.16+?

> >
> > As an aside; is there any documentation for bpf_iter outside of the selftests?
>
> Unfortunately, no. The commit messages of the original patch set might help.
> https://lore.kernel.org/bpf/20200507053916.1542319-1-yhs@fb.com/T/#mf973843af65fc51ac9b3e3673962cd3e87f705e8
>
> >
> > Matt Pallissard
> >

  reply	other threads:[~2020-06-23 17:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-20 16:22 Accessing mm_rss_stat fields with btf/BPF_CORE_READ_INTO Matt Pallissard
2020-06-20 18:11 ` Yonghong Song
2020-06-20 20:06   ` Matt Pallissard
2020-06-21  3:29     ` Andrii Nakryiko
2020-06-21 15:44       ` Matt Pallissard
2020-06-22 15:01         ` Matt Pallissard
2020-06-22 16:20           ` Andrii Nakryiko
2020-06-22 17:19             ` Matt Pallissard
2020-06-22 22:09               ` Andrii Nakryiko
2020-06-23 14:54                 ` Matt Pallissard
2020-06-23 16:35                   ` Yonghong Song
2020-06-23 17:58                     ` Andrii Nakryiko [this message]
2020-06-23 18:11                       ` Matt Pallissard
2020-06-23 18:36                         ` Andrii Nakryiko
2020-06-23 22:05                           ` Matt Pallissard
2020-06-23 22:13                             ` Andrii Nakryiko
2020-06-24 15:51                               ` Matt Pallissard
2020-06-23 22:16                             ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzbgQoi=NC6hM0j=49iGeexUEeuJFciMfipV+VDt+Luadg@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=matt@pallissard.net \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.