* Re: HugePages shared memory support in LTTng
       [not found] <CAO+PNdHdLFk=Q0L2BLGnz8xCvdgMw3aYpZuAZumBOWKraKTnAw@mail.gmail.com>
@ 2019-07-15 14:33 ` Jonathan Rajotte-Julien
       [not found] ` <20190715143302.GA2017@joraj-alpa>
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-15 14:33 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev

Hi Yiteng,

On Fri, Jul 12, 2019 at 06:18:44PM -0400, Yiteng Guo wrote:
> Hello,
> 
> I am wondering if there is any way for lttng-ust to create its shm on
> hugepages. I noticed that there was an option `--shm-path` which can be
> used to change the location of shm. However, if I specified the path to a
> `hugetlbfs` such as /dev/hugepages, I would get errors in lttng-sessiond
> and no trace data were generated.
> 
> The error I got was
> ```
> PERROR - 17:54:56.740674 [8163/8168]: Error appending to metadata file:
> Invalid argument (in lttng_metadata_printf() at ust-metadata.c:176)
> Error: Failed to generate session metadata (errno = -1)
> ```
> I took a look at lttng code base and found that lttng used `write` to
> generate a metadata file under `--shm-path`. However, it looks like
> `hugetlbfs` does not support `write` operation. I did a simple patch with
> `mmap` to get around this problem. Then, I got another error:

Would you be interested in sharing this patch so we can help you figure out the
problem?

A github branch would be perfect.

> ```
> Error: Error creating UST channel "my-channel" on the consumer daemon
> ```

Make sure to pass "--verbose-consumer" to lttng-sessiond. It will ensure that
the lttng-consumerd output is present in the lttng-sessiond logs. It should
help a bit.

I suspect that we fail on buffer allocation.

> This time, I could not locate the problem anymore :(. Do you have any idea
> of how to get hugepages shm work in lttng?
> 
> To give you more context here, I was tracing a performance sensitive
> program. I didn't want to suffer from the sub-buffer switch cost so I
> created a very large sub-buffer (1MB).

If you don't mind, how many cores are present? How much memory is available on
the host?

Could you share with us the complete sequence of commands you used to set up
your tracing session?

If it is not too much trouble, could you also share the steps you took to
set up/mount your hugetlbfs path?

> I did a benchmark on my tracepoint
> and noticed that after running a certain number of tracepoints, I got a
> noticeably larger overhead (1200ns larger than other) for every ~130
> tracepoints. It turned out that this large overhead was due to a page
> fault. The numbers were matched up (130 * 32 bytes = 4160 bytes, which is
> approximately the size of a normal page 4kB) and I also used lttng perf
> page fault counters to verify it. Therefore, I am looking for a solution to
> have lttng create shm on hugepages.

Quite interesting!

> 
> Thank you very much! I look forward to hearing from you.
> 
> Best,
> Yiteng

> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


-- 
Jonathan Rajotte-Julien
EfficiOS


* Re: HugePages shared memory support in LTTng
       [not found] ` <20190715143302.GA2017@joraj-alpa>
@ 2019-07-15 19:21   ` Yiteng Guo
       [not found]   ` <CAO+PNdHW7O98QSdWyA5U6e=gtLmdFt77wHOT=eHb-Py1W3A-oQ@mail.gmail.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Yiteng Guo @ 2019-07-15 19:21 UTC (permalink / raw)
  To: Jonathan Rajotte-Julien; +Cc: lttng-dev

Hi Jonathan,

Thank you for your prompt reply.

On Mon, Jul 15, 2019 at 10:33 AM Jonathan Rajotte-Julien
<jonathan.rajotte-julien@efficios.com> wrote:
>
> Hi Yiteng,
>
> On Fri, Jul 12, 2019 at 06:18:44PM -0400, Yiteng Guo wrote:
> > Hello,
> >
> > I am wondering if there is any way for lttng-ust to create its shm on
> > hugepages. I noticed that there was an option `--shm-path` which can be
> > used to change the location of shm. However, if I specified the path to a
> > `hugetlbfs` such as /dev/hugepages, I would get errors in lttng-sessiond
> > and no trace data were generated.
> >
> > The error I got was
> > ```
> > PERROR - 17:54:56.740674 [8163/8168]: Error appending to metadata file:
> > Invalid argument (in lttng_metadata_printf() at ust-metadata.c:176)
> > Error: Failed to generate session metadata (errno = -1)
> > ```
> > I took a look at lttng code base and found that lttng used `write` to
> > generate a metadata file under `--shm-path`. However, it looks like
> > `hugetlbfs` does not support `write` operation. I did a simple patch with
> > `mmap` to get around this problem. Then, I got another error:
>
> Would you be interested in sharing this patch so we can help you figure out the
> problem?
>
> A github branch would be perfect.
>

You can check out my patch here:
https://github.com/guoyiteng/lttng-tools/compare/master...guoyiteng:hugepage
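
For illustration, here is a rough, hypothetical sketch of the general
write()-to-mmap substitution described in the quoted text above (this is
not the actual patch; the helper name, its signature, and the 2 MiB huge
page size are assumptions):

```
/*
 * Hypothetical sketch: append a text blob to a file living on hugetlbfs,
 * where write(2) fails with EINVAL.  Grow the file with ftruncate() and
 * copy through an mmap() window instead of calling write().
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* assumed 2 MiB huge pages */

static int append_via_mmap(int fd, off_t current_len, const char *buf, size_t len)
{
	/* Round the new file size up to a whole number of huge pages. */
	size_t map_len = (current_len + len + HUGE_PAGE_SIZE - 1)
			 & ~(HUGE_PAGE_SIZE - 1);
	char *map;

	if (ftruncate(fd, map_len) < 0)
		return -1;
	/* hugetlbfs requires huge-page-aligned offsets, so map from 0. */
	map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;
	memcpy(map + current_len, buf, len);
	munmap(map, map_len);
	return 0;
}
```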

> > ```
> > Error: Error creating UST channel "my-channel" on the consumer daemon
> > ```
>
> Make sure to pass "--verbose-consumer" to lttng-sessiond. It will ensure that
> the lttng-consumerd output is present in lttng-sesssiond logs. It should help a
> bit
>
> I suspect that we fail on buffers allocation.
>

After I passed "--verbose-consumer", I got the following logs.

DEBUG1 - 18:59:55.773387304 [8844/8844]: Health check time delta in
seconds set to 20 (in health_init() at health.c:73)
DEBUG3 - 18:59:55.773544396 [8844/8844]: Created hashtable size 4 at
0x5625c427d4c0 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG3 - 18:59:55.773560630 [8844/8844]: Created hashtable size 4 at
0x5625c427dc00 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG3 - 18:59:55.773566277 [8844/8844]: Created hashtable size 4 at
0x5625c427df30 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG3 - 18:59:55.773572450 [8844/8844]: Created hashtable size 4 at
0x5625c427ead0 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG3 - 18:59:55.773576515 [8844/8844]: Created hashtable size 4 at
0x5625c427f210 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG3 - 18:59:55.773582290 [8844/8844]: Created hashtable size 4 at
0x5625c427f950 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG1 - 18:59:55.773605669 [8844/8844]: TCP inet operation timeout
set to 216 sec (in lttcomm_inet_init() at inet.c:546)
DEBUG1 - 18:59:55.773627249 [8844/8844]: Connecting to error socket
/home/vagrant/.lttng/ustconsumerd64/error (in main() at
lttng-consumerd.c:464)
DEBUG1 - 18:59:55.773767487 [8844/8848]: [thread] Manage health check
started (in thread_manage_health() at health-consumerd.c:167)
DEBUG1 - 18:59:55.773849241 [8844/8848]: epoll set max size is 1659863
(in compat_epoll_set_max_size() at compat-epoll.c:337)
DEBUG1 - 18:59:55.773884368 [8844/8848]: Health check ready (in
thread_manage_health() at health-consumerd.c:247)
DEBUG3 - 18:59:55.883547291 [8844/8850]: Created hashtable size 4 at
0x7ff5ec000b40 of type 2 (in lttng_ht_new() at hashtable.c:145)
DEBUG1 - 18:59:55.884682278 [8844/8850]: Thread channel poll started
(in consumer_thread_channel_poll() at consumer.c:2941)
DEBUG1 - 18:59:55.883573028 [8844/8853]: Creating command socket
/home/vagrant/.lttng/ustconsumerd64/command (in
consumer_thread_sessiond_poll() at consumer.c:3204)
DEBUG1 - 18:59:55.885435478 [8844/8853]: Sending ready command to
lttng-sessiond (in consumer_thread_sessiond_poll() at consumer.c:3217)
DEBUG1 - 18:59:55.883646301 [8844/8852]: Updating poll fd array (in
update_poll_array() at consumer.c:1103)
DEBUG1 - 18:59:55.885574718 [8844/8853]: Connection on client_socket
(in consumer_thread_sessiond_poll() at consumer.c:3239)
DEBUG1 - 18:59:55.885583183 [8844/8852]: polling on 2 fd (in
consumer_thread_data_poll() at consumer.c:2630)
DEBUG1 - 18:59:55.885596572 [8844/8853]: Metadata connection on
client_socket (in set_metadata_socket() at consumer.c:3165)
DEBUG1 - 18:59:55.885612073 [8844/8853]: Incoming command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3285)
DEBUG1 - 18:59:55.883553158 [8844/8851]: Thread metadata poll started
(in consumer_thread_metadata_poll() at consumer.c:2351)
DEBUG1 - 18:59:55.885714717 [8844/8851]: Metadata main loop started
(in consumer_thread_metadata_poll() at consumer.c:2367)
DEBUG1 - 18:59:55.885726270 [8844/8851]: Metadata poll wait (in
consumer_thread_metadata_poll() at consumer.c:2373)
DEBUG1 - 18:59:55.885781919 [8844/8853]: Received channel monitor pipe
(29) (in lttng_ustconsumer_recv_cmd() at ust-consumer.c:1903)
DEBUG1 - 18:59:55.885803340 [8844/8853]: Channel monitor pipe set as
non-blocking (in lttng_ustconsumer_recv_cmd() at ust-consumer.c:1924)
DEBUG1 - 18:59:55.885810860 [8844/8853]: received command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3301)
DEBUG1 - 18:59:55.887146328 [8844/8850]: Channel main loop started (in
consumer_thread_channel_poll() at consumer.c:2956)
DEBUG1 - 18:59:55.887497303 [8844/8850]: Channel poll wait (in
consumer_thread_channel_poll() at consumer.c:2961)
DEBUG1 - 18:59:55.892440821 [8844/8853]: Incoming command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3285)
DEBUG1 - 18:59:55.892479711 [8844/8853]: Consumer mkdir
/home/vagrant/lttng-traces/auto-20190715-185955//ust in session 0 (in
lttng_ustconsumer_recv_cmd() at ust-consumer.c:2093)
DEBUG3 - 18:59:55.892486547 [8844/8853]: mkdirat() recursive fd = -100
(AT_FDCWD), path =
/home/vagrant/lttng-traces/auto-20190715-185955//ust, mode = 504, uid
= 1000, gid = 1000 (in run_as_mkdirat_recursive() at runas.c:1147)
DEBUG1 - 18:59:55.892500964 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG1 - 18:59:55.892852801 [8844/8853]: received command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3301)
DEBUG1 - 18:59:57.964977091 [8844/8853]: Incoming command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3285)
DEBUG1 - 18:59:57.965041124 [8844/8853]: Allocated channel (key 1) (in
consumer_allocate_channel() at consumer.c:1043)
DEBUG3 - 18:59:57.965052309 [8844/8853]: Creating channel to ustctl
with attr: [overwrite: 0, subbuf_size: 524288, num_subbuf: 4,
switch_timer_interval: 0, read_timer_interval: 0, output: 0, type: 0
(in create_ust_channel() at ust-consumer.c:457)
DEBUG3 - 18:59:57.965104805 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_0
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.965120609 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.965317517 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_1
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.965335148 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.965445811 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_2
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.965461438 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.966116363 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_3
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.966145191 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.966341799 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_4
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.966420313 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.966548533 [8844/8853]: open()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_5
with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
runas.c:1212)
DEBUG1 - 18:59:57.966567778 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.966932907 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_5
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.966950256 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.967061802 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_4
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.967081332 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.967366982 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_3
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.967419957 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.967562353 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_2
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.967587355 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.968008237 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_1
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.968104447 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.968327138 [8844/8853]: unlink()
/dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_0
with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
DEBUG1 - 18:59:57.968349750 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG3 - 18:59:57.968562473 [8844/8853]: rmdir_recursive()
/dev/hugepages/auto-20190715-185955 with for uid 1000 and gid 1000 (in
run_as_rmdir_recursive() at runas.c:1251)
DEBUG1 - 18:59:57.968582498 [8844/8853]: Using run_as worker (in
run_as() at runas.c:1100)
DEBUG1 - 18:59:57.968934753 [8844/8853]: UST consumer cleaning stream
list (in destroy_channel() at ust-consumer.c:67)
DEBUG1 - 18:59:57.969019502 [8844/8853]: received command on sock (in
consumer_thread_sessiond_poll() at consumer.c:3301)
Error: ask_channel_creation consumer command failed
Error: Error creating UST channel "channel0" on the consumer daemon

> > This time, I could not locate the problem anymore :(. Do you have any idea
> > of how to get hugepages shm work in lttng?
> >
> > To give you more context here, I was tracing a performance sensitive
> > program. I didn't want to suffer from the sub-buffer switch cost so I
> > created a very large sub-buffer (1MB).
>
> If you don't mind, how many core are present? How much memory is available on
> the host?

I compiled and played around with the lttng source code in my Vagrant VM
environment. I assigned 6 cores and 7.8G of memory to it. The VM OS is
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-51-generic x86_64).

>
> Could you share with us the complete sequence of command you use to setup your
> tracing session?
>

I used the following commands to test if lttng works with hugepages.
```
lttng create --shm-path=/dev/hugepages
lttng enable-event --userspace hello_world:my_first_tracepoint
lttng start
```
The binary program I traced was the hello_world example from the lttng
documentation page.

> If it is not much trouble could you also share the step you took to setup/mount
> your hugetlbfs path?
>

I followed the first section in https://wiki.debian.org/Hugepages to
set up my hugetlbfs, except I used /dev/hugepages instead of
/hugepages.

> > I did a benchmark on my tracepoint
> > and noticed that after running a certain number of tracepoints, I got a
> > noticeably larger overhead (1200ns larger than other) for every ~130
> > tracepoints. It turned out that this large overhead was due to a page
> > fault. The numbers were matched up (130 * 32 bytes = 4160 bytes, which is
> > approximately the size of a normal page 4kB) and I also used lttng perf
> > page fault counters to verify it. Therefore, I am looking for a solution to
> > have lttng create shm on hugepages.
>
> Quite interesting!
>
> >
> > Thank you very much! I look forward to hearing from you.
> >
> > Best,
> > Yiteng
>
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev@lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
> --
> Jonathan Rajotte-Julien
> EfficiOS

Best,
Yiteng


* Re: HugePages shared memory support in LTTng
       [not found]   ` <CAO+PNdHW7O98QSdWyA5U6e=gtLmdFt77wHOT=eHb-Py1W3A-oQ@mail.gmail.com>
@ 2019-07-22 18:44     ` Yiteng Guo
       [not found]     ` <CAO+PNdFotFk6uCF1dySZi9dV6PYpAazWoQpsnU+N58F2b-73FQ@mail.gmail.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Yiteng Guo @ 2019-07-22 18:44 UTC (permalink / raw)
  To: Jonathan Rajotte-Julien; +Cc: lttng-dev

Hi Jonathan,

I spent the last few days on this problem and finally figured it out. Here
are the patches I've written.

https://github.com/lttng/lttng-ust/compare/master...guoyiteng:hugepages
https://github.com/lttng/lttng-tools/compare/master...guoyiteng:hugepages

These two patches are just ad-hoc support for hugepages and are not
intended as a pull request. If you want to support hugepages in
future lttng releases, I am glad to help you with that. What I did
here was replace `shm_open` with `open` on a hugetlbfs directory. I
also modified other parts of the code (such as memory alignment) to
make them compatible with huge pages. I didn't use the `--shm-path`
option because I noticed that this option would relocate not only the
shm of the ring buffer but also the other shm and metadata files.
However, we only want to use huge pages for the ring buffer here.
Here are the commands I used to launch an lttng session.

```
lttng create
lttng enable-channel --userspace --subbuf-size=4M --num-subbuf=2
--buffers-pid my-channel
lttng add-context --userspace --type=perf:thread:page-fault
lttng enable-event --userspace -c my-channel hello_world:my_first_tracepoint
lttng start
```
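
To make the idea above more concrete, here is a minimal, hypothetical
sketch of backing a buffer with a file under a hugetlbfs mount via `open`
instead of `shm_open` (the names, the mount point, and the 2 MiB huge
page size are assumptions, not the actual patch):

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* assumed 2 MiB huge pages */

/* Create and map a buffer backed by a file under a hugetlbfs mount. */
static void *alloc_hugetlb_buffer(const char *name, size_t size, int *fd_out)
{
	char path[256];
	/* Sizes must be rounded up to a multiple of the huge page size. */
	size_t len = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
	void *p;
	int fd;

	/* shm_open("/name", ...) becomes open() below the hugetlbfs mount. */
	snprintf(path, sizeof(path), "/dev/hugepages/%s", name);
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, len) < 0 ||
	    (p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		      fd, 0)) == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*fd_out = fd;
	return p;
}
```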

My patches worked very well and I didn't get page faults anymore.
However, the only caveat of these patches is that ring buffers are
not destroyed correctly. This leads to a problem where every new
lttng session acquires some hugepages but never releases them. After
I created and destroyed several sessions, I would get an error
telling me there were not enough hugepages available. I worked
around this problem by restarting the session daemon, but there
should be some way to have the ring buffers (or their channel)
destroyed cleanly when their session is destroyed.

In the meantime, I am also trying another way to get rid of these
page faults, which is to prefault the ring buffer shared memory in my
program. This solution does not need any modification of the lttng
source code, which, I think, is a safer way to go. However, to
prefault the ring buffer shm, I need to know the address (and size)
of the ring buffer. Is there any way to learn this piece of
information from the user program?
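
As a rough sketch of what I mean by prefaulting (hypothetical code; it
assumes the mapping's address and length are somehow known, which is
exactly the missing piece):

```
#include <stddef.h>
#include <unistd.h>

/*
 * Touch every page of an already-mapped region once, before tracing
 * starts, so the faults happen at setup time rather than on the hot
 * path.  mlock(2) or mmap(..., MAP_POPULATE, ...) are alternative ways
 * to get a similar effect.
 */
static void prefault(volatile char *addr, size_t len)
{
	size_t page = (size_t) sysconf(_SC_PAGESIZE);
	size_t off;

	for (off = 0; off < len; off += page)
		addr[off] = addr[off];  /* read + write back: fault in and dirty */
}
```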

I hope you will have a plan to support hugepages in the future; I am
more than happy to help you with that. Thank you very much and I
look forward to hearing from you.

Best,
Yiteng

On Mon, Jul 15, 2019 at 3:21 PM Yiteng Guo <guoyiteng@gmail.com> wrote:
>
> Hi Jonathan,
>
> Thank you for your prompt reply.
>
> On Mon, Jul 15, 2019 at 10:33 AM Jonathan Rajotte-Julien
> <jonathan.rajotte-julien@efficios.com> wrote:
> >
> > Hi Yiteng,
> >
> > On Fri, Jul 12, 2019 at 06:18:44PM -0400, Yiteng Guo wrote:
> > > Hello,
> > >
> > > I am wondering if there is any way for lttng-ust to create its shm on
> > > hugepages. I noticed that there was an option `--shm-path` which can be
> > > used to change the location of shm. However, if I specified the path to a
> > > `hugetlbfs` such as /dev/hugepages, I would get errors in lttng-sessiond
> > > and no trace data were generated.
> > >
> > > The error I got was
> > > ```
> > > PERROR - 17:54:56.740674 [8163/8168]: Error appending to metadata file:
> > > Invalid argument (in lttng_metadata_printf() at ust-metadata.c:176)
> > > Error: Failed to generate session metadata (errno = -1)
> > > ```
> > > I took a look at lttng code base and found that lttng used `write` to
> > > generate a metadata file under `--shm-path`. However, it looks like
> > > `hugetlbfs` does not support `write` operation. I did a simple patch with
> > > `mmap` to get around this problem. Then, I got another error:
> >
> > Would you be interested in sharing this patch so we can help you figure out the
> > problem?
> >
> > A github branch would be perfect.
> >
>
> You can check out my patch here:
> https://github.com/guoyiteng/lttng-tools/compare/master...guoyiteng:hugepage
>
> > > ```
> > > Error: Error creating UST channel "my-channel" on the consumer daemon
> > > ```
> >
> > Make sure to pass "--verbose-consumer" to lttng-sessiond. It will ensure that
> > the lttng-consumerd output is present in lttng-sesssiond logs. It should help a
> > bit
> >
> > I suspect that we fail on buffers allocation.
> >
>
> After I passed "--verbose-consumer", I got the following logs.
>
> DEBUG1 - 18:59:55.773387304 [8844/8844]: Health check time delta in
> seconds set to 20 (in health_init() at health.c:73)
> DEBUG3 - 18:59:55.773544396 [8844/8844]: Created hashtable size 4 at
> 0x5625c427d4c0 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG3 - 18:59:55.773560630 [8844/8844]: Created hashtable size 4 at
> 0x5625c427dc00 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG3 - 18:59:55.773566277 [8844/8844]: Created hashtable size 4 at
> 0x5625c427df30 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG3 - 18:59:55.773572450 [8844/8844]: Created hashtable size 4 at
> 0x5625c427ead0 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG3 - 18:59:55.773576515 [8844/8844]: Created hashtable size 4 at
> 0x5625c427f210 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG3 - 18:59:55.773582290 [8844/8844]: Created hashtable size 4 at
> 0x5625c427f950 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG1 - 18:59:55.773605669 [8844/8844]: TCP inet operation timeout
> set to 216 sec (in lttcomm_inet_init() at inet.c:546)
> DEBUG1 - 18:59:55.773627249 [8844/8844]: Connecting to error socket
> /home/vagrant/.lttng/ustconsumerd64/error (in main() at
> lttng-consumerd.c:464)
> DEBUG1 - 18:59:55.773767487 [8844/8848]: [thread] Manage health check
> started (in thread_manage_health() at health-consumerd.c:167)
> DEBUG1 - 18:59:55.773849241 [8844/8848]: epoll set max size is 1659863
> (in compat_epoll_set_max_size() at compat-epoll.c:337)
> DEBUG1 - 18:59:55.773884368 [8844/8848]: Health check ready (in
> thread_manage_health() at health-consumerd.c:247)
> DEBUG3 - 18:59:55.883547291 [8844/8850]: Created hashtable size 4 at
> 0x7ff5ec000b40 of type 2 (in lttng_ht_new() at hashtable.c:145)
> DEBUG1 - 18:59:55.884682278 [8844/8850]: Thread channel poll started
> (in consumer_thread_channel_poll() at consumer.c:2941)
> DEBUG1 - 18:59:55.883573028 [8844/8853]: Creating command socket
> /home/vagrant/.lttng/ustconsumerd64/command (in
> consumer_thread_sessiond_poll() at consumer.c:3204)
> DEBUG1 - 18:59:55.885435478 [8844/8853]: Sending ready command to
> lttng-sessiond (in consumer_thread_sessiond_poll() at consumer.c:3217)
> DEBUG1 - 18:59:55.883646301 [8844/8852]: Updating poll fd array (in
> update_poll_array() at consumer.c:1103)
> DEBUG1 - 18:59:55.885574718 [8844/8853]: Connection on client_socket
> (in consumer_thread_sessiond_poll() at consumer.c:3239)
> DEBUG1 - 18:59:55.885583183 [8844/8852]: polling on 2 fd (in
> consumer_thread_data_poll() at consumer.c:2630)
> DEBUG1 - 18:59:55.885596572 [8844/8853]: Metadata connection on
> client_socket (in set_metadata_socket() at consumer.c:3165)
> DEBUG1 - 18:59:55.885612073 [8844/8853]: Incoming command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3285)
> DEBUG1 - 18:59:55.883553158 [8844/8851]: Thread metadata poll started
> (in consumer_thread_metadata_poll() at consumer.c:2351)
> DEBUG1 - 18:59:55.885714717 [8844/8851]: Metadata main loop started
> (in consumer_thread_metadata_poll() at consumer.c:2367)
> DEBUG1 - 18:59:55.885726270 [8844/8851]: Metadata poll wait (in
> consumer_thread_metadata_poll() at consumer.c:2373)
> DEBUG1 - 18:59:55.885781919 [8844/8853]: Received channel monitor pipe
> (29) (in lttng_ustconsumer_recv_cmd() at ust-consumer.c:1903)
> DEBUG1 - 18:59:55.885803340 [8844/8853]: Channel monitor pipe set as
> non-blocking (in lttng_ustconsumer_recv_cmd() at ust-consumer.c:1924)
> DEBUG1 - 18:59:55.885810860 [8844/8853]: received command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3301)
> DEBUG1 - 18:59:55.887146328 [8844/8850]: Channel main loop started (in
> consumer_thread_channel_poll() at consumer.c:2956)
> DEBUG1 - 18:59:55.887497303 [8844/8850]: Channel poll wait (in
> consumer_thread_channel_poll() at consumer.c:2961)
> DEBUG1 - 18:59:55.892440821 [8844/8853]: Incoming command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3285)
> DEBUG1 - 18:59:55.892479711 [8844/8853]: Consumer mkdir
> /home/vagrant/lttng-traces/auto-20190715-185955//ust in session 0 (in
> lttng_ustconsumer_recv_cmd() at ust-consumer.c:2093)
> DEBUG3 - 18:59:55.892486547 [8844/8853]: mkdirat() recursive fd = -100
> (AT_FDCWD), path =
> /home/vagrant/lttng-traces/auto-20190715-185955//ust, mode = 504, uid
> = 1000, gid = 1000 (in run_as_mkdirat_recursive() at runas.c:1147)
> DEBUG1 - 18:59:55.892500964 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG1 - 18:59:55.892852801 [8844/8853]: received command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3301)
> DEBUG1 - 18:59:57.964977091 [8844/8853]: Incoming command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3285)
> DEBUG1 - 18:59:57.965041124 [8844/8853]: Allocated channel (key 1) (in
> consumer_allocate_channel() at consumer.c:1043)
> DEBUG3 - 18:59:57.965052309 [8844/8853]: Creating channel to ustctl
> with attr: [overwrite: 0, subbuf_size: 524288, num_subbuf: 4,
> switch_timer_interval: 0, read_timer_interval: 0, output: 0, type: 0
> (in create_ust_channel() at ust-consumer.c:457)
> DEBUG3 - 18:59:57.965104805 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_0
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.965120609 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.965317517 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_1
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.965335148 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.965445811 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_2
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.965461438 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.966116363 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_3
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.966145191 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.966341799 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_4
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.966420313 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.966548533 [8844/8853]: open()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_5
> with flags C2 mode 384 for uid 1000 and gid 1000 (in run_as_open() at
> runas.c:1212)
> DEBUG1 - 18:59:57.966567778 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.966932907 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_5
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.966950256 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.967061802 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_4
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.967081332 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.967366982 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_3
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.967419957 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.967562353 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_2
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.967587355 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.968008237 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_1
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.968104447 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.968327138 [8844/8853]: unlink()
> /dev/hugepages/auto-20190715-185955/ust/uid/1000/64-bit/channel0_0
> with for uid 1000 and gid 1000 (in run_as_unlink() at runas.c:1233)
> DEBUG1 - 18:59:57.968349750 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG3 - 18:59:57.968562473 [8844/8853]: rmdir_recursive()
> /dev/hugepages/auto-20190715-185955 with for uid 1000 and gid 1000 (in
> run_as_rmdir_recursive() at runas.c:1251)
> DEBUG1 - 18:59:57.968582498 [8844/8853]: Using run_as worker (in
> run_as() at runas.c:1100)
> DEBUG1 - 18:59:57.968934753 [8844/8853]: UST consumer cleaning stream
> list (in destroy_channel() at ust-consumer.c:67)
> DEBUG1 - 18:59:57.969019502 [8844/8853]: received command on sock (in
> consumer_thread_sessiond_poll() at consumer.c:3301)
> Error: ask_channel_creation consumer command failed
> Error: Error creating UST channel "channel0" on the consumer daemon
>
> > > This time, I could not locate the problem anymore :(. Do you have any idea
> > > of how to get hugepages shm work in lttng?
> > >
> > > To give you more context here, I was tracing a performance sensitive
> > > program. I didn't want to suffer from the sub-buffer switch cost so I
> > > created a very large sub-buffer (1MB).
> >
> > If you don't mind, how many core are present? How much memory is available on
> > the host?
>
> I compiled and played around with lttng source codes on my vagrant vm
> environment. I assigned 6 cores and 7.8G memory to it. My vm OS is
> Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-51-generic x86_64).
>
> >
> > Could you share with us the complete sequence of command you use to setup your
> > tracing session?
> >
>
> I used the following commands to test if lttng works with hugepages.
> ```
> lttng create --shm-path=/dev/hugepages
> lttng enable-event --userspace hello_world:my_first_tracepoint
> lttng start
> ```
> And the binary program I traced was the hello_world example in lttng
> documentation page.
>
> > If it is not much trouble could you also share the step you took to setup/mount
> > your hugetlbfs path?
> >
>
> I followed the first section in https://wiki.debian.org/Hugepages to
> set up my hugetlbfs, except I used /dev/hugepages instead of
> /hugepages.
>
> > > I did a benchmark on my tracepoint
> > > and noticed that after running a certain number of tracepoints, I got a
> > > noticeably larger overhead (1200ns larger than other) for every ~130
> > > tracepoints. It turned out that this large overhead was due to a page
> > > fault. The numbers were matched up (130 * 32 bytes = 4160 bytes, which is
> > > approximately the size of a normal page 4kB) and I also used lttng perf
> > > page fault counters to verify it. Therefore, I am looking for a solution to
> > > have lttng create shm on hugepages.
> >
> > Quite interesting!
> >
> > >
> > > Thank you very much! I look forward to hearing from you.
> > >
> > > Best,
> > > Yiteng
> >
> > > _______________________________________________
> > > lttng-dev mailing list
> > > lttng-dev@lists.lttng.org
> > > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >
> >
> > --
> > Jonathan Rajotte-Julien
> > EfficiOS
>
> Best,
> Yiteng


* Re: HugePages shared memory support in LTTng
       [not found]     ` <CAO+PNdFotFk6uCF1dySZi9dV6PYpAazWoQpsnU+N58F2b-73FQ@mail.gmail.com>
@ 2019-07-22 19:23       ` Jonathan Rajotte-Julien
       [not found]       ` <20190722192308.GA803@joraj-alpa>
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-22 19:23 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev

Hi Yiteng,

On Mon, Jul 22, 2019 at 02:44:09PM -0400, Yiteng Guo wrote:
> Hi Jonathan,
> 
> I spent these days on this problem and finally figured it out. Here
> are patches I've written.

Sorry for that, I had other stuff ongoing.

I had a brief discussion about this with Mathieu Desnoyers.

Mathieu mentioned that the page faults you are seeing might be related to
qemu/kvm usage of KSM [1]. I did not have time to play around with it and see
if this indeed has an effect. You might be better off trying it since you are
already all set up. You might want to disable it and retry your experiment (if
you are only doing this in a VM).

[1] https://www.linux-kvm.org/page/KSM

> 
> https://github.com/lttng/lttng-ust/compare/master...guoyiteng:hugepages
> https://github.com/lttng/lttng-tools/compare/master...guoyiteng:hugepages

I'll have a look as soon as possible.

> 
> These two patches are just ad-hoc supports for hugepages, which are
> not intended to be a pull request. If you want to support hugepages in
> future lttng releases, I am glad to help you with that. What I did
> here is to replace `shm_open` with `open` on a hugetlbfs directory. I
> also modified other parts of code (such as memory alignment) to make
> them compatible with huge pages. I didn't use `shm-path` option
> because I noticed that this option would not only relocate the shm of
> ring buffer but also other shm and metadata files. However, we only
> wanted to use huge pages for ring buffer here. Here are commands I
> used to launch an lttng session.
> 
> ```
> lttng create
> lttng enable-channel --userspace --subbuf-size=4M --num-subbuf=2
> --buffers-pid my-channel

Any particular reason to use per-pid buffering?

We normally recommend per-uid tracing + lttng track when possible. It depends on
the final use case.

> lttng add-context --userspace --type=perf:thread:page-fault
> lttng enable-event --userspace -c my-channel hello_world:my_first_tracepoint
> lttng start
> ```
> 
> My patches worked very well and I didn't get page faults anymore.
> However, the only caveat of this patch is that ringbuffers are not
> destroyed correctly. It leads to a problem that every new lttng
> session acquires some hugepages but never releases them. After I
> created and destroyed several sessions, I would get an error that told
> me there were not enough hugepages to be used. I get around this
> problem by restarting the session daemon. But there should be some way
> to have ringbuffers (or its channel) destroyed elegently when its
> session is destroyed.

That is weird. I would expect the cleanup code to get rid of the ringbuffers as
needed. Or at least try and fail.

> 
> In the meantime, I am also trying another way to get rid of these page
> faults, which is to prefault the ringbuffer shared memory in my
> program. This solution does not need any modification on lttng souce
> codes, which, I think, is a safer way to go. However, to prefault the
> ringbuffer shm, I need to know the address (and size) of the
> ringbuffer. Is there any way to learn this piece of information from
> the user program?

AFAIK, we do not expose the address. I might be wrong here.

How do you plan on prefaulting the pages?

MAP_POPULATE?

Cheers


* Re: HugePages shared memory support in LTTng
       [not found]       ` <20190722192308.GA803@joraj-alpa>
@ 2019-07-23 15:07         ` Jonathan Rajotte-Julien
       [not found]         ` <20190723150744.GC803@joraj-alpa>
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-23 15:07 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev


Hi Yiteng,

On Mon, Jul 22, 2019 at 03:23:08PM -0400, Jonathan Rajotte-Julien wrote:
> Hi Yiteng,
> 
> On Mon, Jul 22, 2019 at 02:44:09PM -0400, Yiteng Guo wrote:
> > Hi Jonathan,
> > 
> > I spent these days on this problem and finally figured it out. Here
> > are patches I've written.
> 
> Sorry for that, I had other stuff ongoing.
> 
> I had a brief discussion about this with Mathieu Desnoyers.
> 
> Mathieu mentioned that the page faults you are seeing might be related to
> qemu/kvm usage of KSM [1]. I did not have time to play around with it and see if
> this indeed have an effect. You might be better off trying it since you are
> already all setup. Might want to disable it and retry your experiment (if only
> doing this on a vm).

Disregard all of this for now. I think we misunderstood the first email and got
too far too fast.

I modified lttng-ust to use MAP_POPULATE and, based on the results from the
page_fault perf counter, it seems to achieve what you are looking for.

See attached patch. Let me know if this helps.

Cheers.
-- 
Jonathan Rajotte-Julien
EfficiOS

[-- Attachment #2: 0001-Use-MAP_POPULATE-to-reduce-pagefault.patch --]
[-- Type: text/x-diff, Size: 1222 bytes --]

From 2b065e5988067291e3367f413571248f4551acb2 Mon Sep 17 00:00:00 2001
From: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Date: Mon, 22 Jul 2019 17:37:43 -0400
Subject: [PATCH lttng-ust] Use MAP_POPULATE to reduce pagefault

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
---
 libringbuffer/shm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libringbuffer/shm.c b/libringbuffer/shm.c
index 10b3bcef..489322e6 100644
--- a/libringbuffer/shm.c
+++ b/libringbuffer/shm.c
@@ -154,7 +154,7 @@ struct shm_object *_shm_object_table_alloc_shm(struct shm_object_table *table,
 
 	/* memory_map: mmap */
 	memory_map = mmap(NULL, memory_map_size, PROT_READ | PROT_WRITE,
-			  MAP_SHARED, shmfd, 0);
+			  MAP_SHARED | MAP_POPULATE, shmfd, 0);
 	if (memory_map == MAP_FAILED) {
 		PERROR("mmap");
 		goto error_mmap;
@@ -341,7 +341,7 @@ struct shm_object *shm_object_table_append_shm(struct shm_object_table *table,
 
 	/* memory_map: mmap */
 	memory_map = mmap(NULL, memory_map_size, PROT_READ | PROT_WRITE,
-			  MAP_SHARED, shm_fd, 0);
+			  MAP_SHARED | MAP_POPULATE, shm_fd, 0);
 	if (memory_map == MAP_FAILED) {
 		PERROR("mmap");
 		goto error_mmap;
-- 
2.17.1




* Re: HugePages shared memory support in LTTng
       [not found]           ` <CAO+PNdGhEgeTo35du4ysMcCOUQ0PKE4tuyGg593AE5feZZ4_JQ@mail.gmail.com>
@ 2019-07-23 20:27             ` Jonathan Rajotte-Julien
       [not found]             ` <20190723202723.GD803@joraj-alpa>
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-23 20:27 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev

CC'ing the mailing list back.

On Tue, Jul 23, 2019 at 03:58:09PM -0400, Yiteng Guo wrote:
> Hi Jonathan,
> 
> Thank you for the patch! It is really helpful.

Were you able to observe a positive impact?

This is something we might be interested in upstreaming if we have good
feedback.

> 
> Is there any disadvantage of per-pid buffering? I don't want to have
> processes interfere with each other so I choose per-pid buffering.

The main downside is that each registered application will get its own
sub-buffers, resulting in a lot of memory usage depending on your session
configuration. This can get out of hand quickly, especially on systems with
lots of cores and an unknown number of instrumented applications.

If you completely control the runtime, for example when doing benchmarking or
simple analysis, feel free to use whatever makes more sense to you as long as
you understand the pitfalls of each mode.

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS


* Re: HugePages shared memory support in LTTng
       [not found]             ` <20190723202723.GD803@joraj-alpa>
@ 2019-07-24 15:54               ` Yiteng Guo
       [not found]               ` <CAO+PNdEfTq5vAqWJAoWK_hyxdjUuQgPPf0sqJXNO9jw1J6RoNg@mail.gmail.com>
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Yiteng Guo @ 2019-07-24 15:54 UTC (permalink / raw)
  To: Jonathan Rajotte-Julien; +Cc: lttng-dev

(Forgot to cc mailing list in the previous email)

Hi Jonathan,

On Tue, Jul 23, 2019 at 4:27 PM Jonathan Rajotte-Julien
<jonathan.rajotte-julien@efficios.com> wrote:
>
> CC'ing the mailing list back.
>
> On Tue, Jul 23, 2019 at 03:58:09PM -0400, Yiteng Guo wrote:
> > Hi Jonathan,
> >
> > Thank you for the patch! It is really helpful.
>
> Were you able to observe a positive impact?
>
> This is something we might be interested in upstreaming if we have good
> feedback.

Yes, page faults disappeared and I didn't get those periodic overheads anymore.

I also solved the problem in my patch where hugepages were not
released correctly. It was my fault: I forgot to clean up the mmap'd
mapping. I updated the patch here:
https://github.com/lttng/lttng-ust/compare/master...guoyiteng:hugepages
The current prefault solution works well for me and I will use it for
now. In my opinion, using hugepages could further reduce TLB misses,
but that involves more changes to the source code than the prefault
solution.
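
For context, the cleanup in question boils down to something like the
following (a hypothetical helper, not the real code path):

```
#include <sys/mman.h>
#include <unistd.h>

/*
 * Huge pages backing a mapping are only returned to the pool once every
 * mapping is munmap()ed and the backing file is closed and unlinked.
 * Note: for hugetlb mappings, the munmap() length must be a multiple of
 * the huge page size.
 */
static void release_hugetlb_buffer(void *map, size_t map_len, int fd,
				   const char *path)
{
	munmap(map, map_len);
	close(fd);
	if (path)
		unlink(path);  /* drop the backing file under the hugetlbfs mount */
}
```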

Best,
Yiteng


* Re: HugePages shared memory support in LTTng
       [not found]               ` <CAO+PNdEfTq5vAqWJAoWK_hyxdjUuQgPPf0sqJXNO9jw1J6RoNg@mail.gmail.com>
@ 2019-07-24 15:59                 ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-24 15:59 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev

Hi Yiteng,

Make sure to always CC the mailing list.

> > Were you able to observe a positive impact?
> >
> > This is something we might be interested in upstreaming if we have good
> > feedback.
> 
> Yes, page faults disappeared and I didn't get those periodic overheads anymore.

Good. We will have to discuss this with Mathieu Desnoyers when he is back from
vacation and see if always using MAP_POPULATE makes sense.

> 
> And I also solved the problem that hugepages are not closed correctly
> in my patch. It is my fault that I forgot to close the mmap pointer. I
> updated the patch here:
> https://github.com/lttng/lttng-ust/compare/master...guoyiteng:hugepages

Good. Would you be interested in posting those patches as an RFC on the mailing
list so that we have a trace of this work in the future? GitHub cannot give us
the persistence needed for this. It might also lead to a broader discussion.

> The current prefault solution works well for me and I will use that
> for now. In my opinion, using hugepages could further reduce the TLB
> misses, but that involved more changes in source codes than the
> prefault solution.
> 
> Best,
> Yiteng

-- 
Jonathan Rajotte-Julien
EfficiOS


* Re: HugePages shared memory support in LTTng
       [not found]             ` <20190723202723.GD803@joraj-alpa>
  2019-07-24 15:54               ` Yiteng Guo
       [not found]               ` <CAO+PNdEfTq5vAqWJAoWK_hyxdjUuQgPPf0sqJXNO9jw1J6RoNg@mail.gmail.com>
@ 2019-07-25 15:40               ` Mathieu Desnoyers
       [not found]               ` <1962899258.11638.1564069223526.JavaMail.zimbra@efficios.com>
  3 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2019-07-25 15:40 UTC (permalink / raw)
  To: Jonathan Rajotte, Yiteng Guo; +Cc: lttng-dev

----- On Jul 23, 2019, at 9:27 PM, Jonathan Rajotte jonathan.rajotte-julien@efficios.com wrote:

> CC'ing the mailing list back.
> 
> On Tue, Jul 23, 2019 at 03:58:09PM -0400, Yiteng Guo wrote:
>> Hi Jonathan,
>> 
>> Thank you for the patch! It is really helpful.
> 
> Were you able to observe a positive impact?
> 
> This is something we might be interested in upstreaming if we have good
> feedback.
> 
>> 
>> Is there any disadvantage of per-pid buffering? I don't want to have
>> processes interfere with each other so I choose per-pid buffering.
> 
> The main downside is that each registered applications will get their own
> subbuffers resulting in a lot of memory usage depending on your session
> configuration. This can get out of hand quickly, especially on systems withs
> lots
> of cores and unknown number of instrumented applications.

I can add 2 extra cents (or actually a few more) to this answer:

There are a few reasons for using per-uid buffers over per-pid:

- Lower memory consumption for use-cases with many processes,
- Faster process launch time: no need to allocate buffers for each process.
  Useful for use-cases with short-lived processes.
- Keep a flight recorder "snapshot" available for all processes, including
  those which recently exited. Indeed, the per-pid buffers don't stay around
  for snapshot after a process exits or is killed.

There are however a few advantages for per-pid buffers:

- Isolation: if one PID generates corrupted trace data, it does not interfere
  with other PIDs' buffers,
- If one PID is killed between reserve and commit, it does not make that specific
  per-cpu ring buffer unusable for the rest of the tracing session lifetime.

Hoping this information helps you make the right choice for your deployment!

Thanks,

Mathieu


> 
> If you completely control the runtime, for example when doing benchmarking or
> simple analysis, feel free to use what make more sense to you as long as you
> understand the pitfalls of each mode.
> 
> Cheers
> 
> --
> Jonathan Rajotte-Julien
> EfficiOS
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: HugePages shared memory support in LTTng
       [not found]               ` <1962899258.11638.1564069223526.JavaMail.zimbra@efficios.com>
@ 2019-07-25 17:59                 ` Trent Piepho via lttng-dev
       [not found]                 ` <1564077561.2343.121.camel@impinj.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Trent Piepho via lttng-dev @ 2019-07-25 17:59 UTC (permalink / raw)
  To: mathieu.desnoyers, guoyiteng, jonathan.rajotte-julien; +Cc: lttng-dev

On Thu, 2019-07-25 at 11:40 -0400, Mathieu Desnoyers wrote:
> There are a few reasons for using per-uid buffers over per-pid:
> 
> - Lower memory consumption for use-cases with many processes,
> - Faster process launch time: no need to allocate buffers for each process.
> Useful for use-cases with short-lived processes.
> - Keep a flight recorder "snapshot" available for all processes, including
> those which recently exited. Indeed, the per-pid buffers don't stay around
> for snapshot after a process exits or is killed.
> 
> There are however a few advantages for per-pid buffers:
> 
> - Isolation: if one PID generates corrupted trace data, it does not interfere
> with other PIDs buffers,
> - If one PID is killed between reserve and commit, it does not make that specific
> per-cpu ring buffer unusable for the rest of the tracing session lifetime.
> 
> Hoping this information helps making the right choice for your deployment!

We recently had this discussion for an embedded product that uses LTTng
to gather trace data during operation.  In our case, we want to have a
flight recorder of the last X seconds of trace data for the entire
device.  X seconds times a Y byte/sec data generation rate ends up being
a very large portion (~30%) of the total memory available.  This has to
be in RAM; using flash memory for this is not a good idea.

If we use per-PID buffers, then the buffer size needed for the largest
producer of trace data times the total number of processes is too
large: far larger than the device's memory size.  Some processes
produce trace data at a much higher rate than others.  A buffer for X
seconds of data on one process ends up being a buffer for 10*X
seconds of data on another.  There's not enough RAM for 10*X second
buffers.

If we use per-UID buffers, then we must run everything as one UID,
which, on an embedded system, is not that bad, but negatively impacts
the security of the software.  Now all processes, which generate data
at different rates, can share one buffer.  Much more efficient than
having to reserve the same space for the largest and smallest
producers.

But there ends up being another problem: the flight recorder data needs
to be saved somewhere to be of use.  To tmpfs in RAM, since the device's
flash is not suitable and is used elsewhere anyway.  So one needs 2x the
RAM: one copy for the ring buffer and one for the dump of the ring
buffer's trace data in tmpfs.

So what we did was not use flight recorder mode.  We configured lttng
to use a limited number of smaller trace files with trace file rotation,
and used small ring buffers, which ended up not needing to be very
large to avoid overflow (I imagine saving the data to tmpfs is fast).

The trace files are in effect a per-session buffer, which is what we
want for greatest efficiency in space utilization.  And we can archive
those and download them when "something happens" without paying extra
cost for space.


* Re: HugePages shared memory support in LTTng
       [not found]                 ` <1564077561.2343.121.camel@impinj.com>
@ 2019-07-26  4:37                   ` Yiteng Guo
       [not found]                   ` <CAO+PNdGRD3BkfEOgjCLo+kgXreZD_GqYnW6LB7-ELjnowk+GjQ@mail.gmail.com>
  1 sibling, 0 replies; 13+ messages in thread
From: Yiteng Guo @ 2019-07-26  4:37 UTC (permalink / raw)
  To: Trent Piepho, mathieu.desnoyers, jonathan.rajotte-julien; +Cc: lttng-dev

Hello,

Thank you very much for all the information about per-uid and per-pid
buffers. It is really helpful for making a decision.

@Jonathan: This is my first time getting involved in an open-source
project on a mailing list. I don't quite know how an RFC works. Should
I just run `git format-patch` and copy-paste the diff into an email? Is
there any specific format for the RFC email and its title?

Best,
Yiteng


* Re: HugePages shared memory support in LTTng
       [not found]                   ` <CAO+PNdGRD3BkfEOgjCLo+kgXreZD_GqYnW6LB7-ELjnowk+GjQ@mail.gmail.com>
@ 2019-07-29 19:05                     ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-07-29 19:05 UTC (permalink / raw)
  To: Yiteng Guo; +Cc: lttng-dev

> @Jonathan: This is my first time to get involved in an open-source
> project on the mailing list. I don't quite know how RFC works. Should
> I just do `git format-patch` and copy-paste the diff to an email? Is
> there any specific format for the RFC email and its title?

Copy-pasting into an email is the way to go. You can also look up "git
send-email" if you want [1].

Make sure to have the following prefix in the email title: [RFC PATCH <project name>] Hugepages ...

<project name> would be either lttng-ust or lttng-tools for your patches.

Also make sure to give all the necessary details to test the patches, and also
the pitfalls/advantages you know regarding the use of hugepages. We do not
expect the patchset to be "integrated" into lttng (command line switch, etc.),
but one should be able to take your patch and at least get a version of lttng
working. Make sure to indicate the current commit of each project you are
basing this work on. As I said before, this is more a way of archiving the work
you have done than actively working on making lttng support hugepages.

[1] https://git-send-email.io

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS


* HugePages shared memory support in LTTng
@ 2019-07-12 22:18 Yiteng Guo
  0 siblings, 0 replies; 13+ messages in thread
From: Yiteng Guo @ 2019-07-12 22:18 UTC (permalink / raw)
  To: lttng-dev



Hello,

I am wondering if there is any way for lttng-ust to create its shm on
hugepages. I noticed that there is an option `--shm-path` which can be
used to change the location of the shm. However, if I specified a path on
a `hugetlbfs` mount such as /dev/hugepages, I would get errors in
lttng-sessiond and no trace data were generated.

The error I got was
```
PERROR - 17:54:56.740674 [8163/8168]: Error appending to metadata file:
Invalid argument (in lttng_metadata_printf() at ust-metadata.c:176)
Error: Failed to generate session metadata (errno = -1)
```
I took a look at the lttng code base and found that lttng uses `write` to
generate a metadata file under `--shm-path`. However, it looks like
`hugetlbfs` does not support the `write` operation. I wrote a simple patch
using `mmap` to get around this problem. Then I got another error:
```
Error: Error creating UST channel "my-channel" on the consumer daemon
```
This time, I could not locate the problem anymore :(. Do you have any idea
how to get the hugepages shm working in lttng?

To give you more context: I was tracing a performance-sensitive
program. I didn't want to suffer the sub-buffer switch cost, so I
created a very large sub-buffer (1MB). I ran a benchmark on my tracepoint
and noticed that after running a certain number of tracepoints, I got a
noticeably larger overhead (1200 ns larger than the others) for every ~130
tracepoints. It turned out that this large overhead was due to a page
fault. The numbers matched up (130 * 32 bytes = 4160 bytes, which is
approximately the size of a normal 4 kB page), and I also used the lttng perf
page-fault counters to verify it. Therefore, I am looking for a solution to
have lttng create its shm on hugepages.
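
Roughly, the benchmark loop looks like the following (an illustrative
skeleton only; `traced_operation()` stands in for the actual tracepoint()
call, and the 1000 ns threshold is arbitrary):

```
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for the real tracepoint(hello_world, my_first_tracepoint, ...). */
static void traced_operation(int i)
{
	(void) i;
}

int main(void)
{
	struct timespec a, b;
	int i;

	for (i = 0; i < 100000; i++) {
		clock_gettime(CLOCK_MONOTONIC, &a);
		traced_operation(i);
		clock_gettime(CLOCK_MONOTONIC, &b);

		int64_t ns = (b.tv_sec - a.tv_sec) * 1000000000LL
				+ (b.tv_nsec - a.tv_nsec);
		if (ns > 1000)  /* flag the periodic outliers */
			printf("call %d took %lld ns\n", i, (long long) ns);
	}
	return 0;
}
```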

Thank you very much! I look forward to hearing from you.

Best,
Yiteng


