fs: Improve eventpoll logging to stop indicting timerfd
diff mbox series

Message ID 20210302034928.3761098-1-varmam@google.com
State New, archived
Headers show
Series
  • fs: Improve eventpoll logging to stop indicting timerfd
Related show

Commit Message

Manish Varma March 2, 2021, 3:49 a.m. UTC
timerfd doesn't create any wakelocks, but eventpoll can.  When it does,
it names them after the underlying file descriptor, and since all
timerfd file descriptors are named "[timerfd]" (which saves memory on
systems like desktops with potentially many timerfd instances), all
wakesources created as a result of using the eventpoll-on-timerfd idiom
are called... "[timerfd]".

However, it becomes impossible to tell which "[timerfd]" wakesource is
affliated with which process and hence troubleshooting is difficult.

This change addresses this problem by changing the way eventpoll and
timerfd file descriptors (and thus the eventpoll-on-timerfd wakesources)
are named:

1) timerfd descriptors are now named "[timerfdN:P]", where N is a
monotonically increasing integer for each timerfd instance created and P
is the command string of the creating process.
2) the top-level per-process eventpoll wakesource is now named "epoll:P"
(instead of just "eventpoll"), where P, again, is the command string of
the creating process
3) individual per-underlying-filedescriptor eventpoll wakesources are
now named "epollitem:P.F", where P is the command string of the creating
process and F is the name of the underlying file descriptor.

All together, that will give us names like the following:

1) timerfd file descriptor: [timerfd14:system_server]
2) eventpoll top-level per-process wakesource: epoll:system_server
3) eventpoll-on-timerfd per-descriptor wakesource:
epollitem:system_server.[timerfd14:system_server]

Co-developed-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Manish Varma <varmam@google.com>
---
 fs/eventpoll.c | 10 ++++++++--
 fs/timerfd.c   | 11 ++++++++++-
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Thomas Gleixner March 18, 2021, 1:04 p.m. UTC | #1
Manish,

On Mon, Mar 01 2021 at 19:49, Manish Varma wrote:

> All together, that will give us names like the following:
>
> 1) timerfd file descriptor: [timerfd14:system_server]
> 2) eventpoll top-level per-process wakesource: epoll:system_server
> 3) eventpoll-on-timerfd per-descriptor wakesource:
> epollitem:system_server.[timerfd14:system_server]

All together that should be splitted up into a change to eventpoll and
timerfd.

> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index c5509d2448e3..4249e8c9a38c 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -46,6 +46,8 @@ struct timerfd_ctx {
>  	bool might_cancel;
>  };
>  
> +static atomic_t instance_count = ATOMIC_INIT(0);

instance_count is misleading as it does not do any accounting of
instances as the name suggests.

>  static LIST_HEAD(cancel_list);
>  static DEFINE_SPINLOCK(cancel_lock);
>  
> @@ -391,6 +393,9 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
>  {
>  	int ufd;
>  	struct timerfd_ctx *ctx;
> +	char task_comm_buf[sizeof(current->comm)];
> +	char file_name_buf[32];
> +	int instance;
>  
>  	/* Check the TFD_* constants for consistency.  */
>  	BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
> @@ -427,7 +432,11 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
>  
>  	ctx->moffs = ktime_mono_to_real(0);
>  
> -	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
> +	instance = atomic_inc_return(&instance_count);
> +	get_task_comm(task_comm_buf, current);

How is current->comm supposed to be unique? And with a wrapping counter
like the above you can end up with identical file descriptor names.

What's wrong with simply using the PID which is guaranteed to be unique
for the life time of a process/task?

> +	snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
> +		 instance, task_comm_buf);
> +	ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
>  			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
>  	if (ufd < 0)
>  		kfree(ctx);

I actually wonder, whether this should be part of anon_inode_get*().

Aside of that this is a user space visible change both for eventpoll and
timerfd.

Have you carefully investigated whether there is existing user space
which might depend on the existing naming conventions?

The changelog is silent about this...

Thanks,

        tglx
Manish Varma March 22, 2021, 5:15 p.m. UTC | #2
Hi Thomas,

On Thu, Mar 18, 2021 at 6:04 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Manish,
>
> On Mon, Mar 01 2021 at 19:49, Manish Varma wrote:
>
> > All together, that will give us names like the following:
> >
> > 1) timerfd file descriptor: [timerfd14:system_server]
> > 2) eventpoll top-level per-process wakesource: epoll:system_server
> > 3) eventpoll-on-timerfd per-descriptor wakesource:
> > epollitem:system_server.[timerfd14:system_server]
>
> All together that should be splitted up into a change to eventpoll and
> timerfd.
>

Noted.

> > diff --git a/fs/timerfd.c b/fs/timerfd.c
> > index c5509d2448e3..4249e8c9a38c 100644
> > --- a/fs/timerfd.c
> > +++ b/fs/timerfd.c
> > @@ -46,6 +46,8 @@ struct timerfd_ctx {
> >       bool might_cancel;
> >  };
> >
> > +static atomic_t instance_count = ATOMIC_INIT(0);
>
> instance_count is misleading as it does not do any accounting of
> instances as the name suggests.
>

Not sure if I am missing a broader point here, but the objective of this
patch is to:
A. To help find the process a given timerfd associated with, and
B. one step further, if there are multiple fds created by a single
process then label each instance using monotonically increasing integer
i.e. "instance_count" to help identify each of them separately.

So, instance_count in my mind helps with "B", i.e. to keep track and
identify each instance of timerfd individually.

> >  static LIST_HEAD(cancel_list);
> >  static DEFINE_SPINLOCK(cancel_lock);
> >
> > @@ -391,6 +393,9 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
> >  {
> >       int ufd;
> >       struct timerfd_ctx *ctx;
> > +     char task_comm_buf[sizeof(current->comm)];
> > +     char file_name_buf[32];
> > +     int instance;
> >
> >       /* Check the TFD_* constants for consistency.  */
> >       BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
> > @@ -427,7 +432,11 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
> >
> >       ctx->moffs = ktime_mono_to_real(0);
> >
> > -     ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
> > +     instance = atomic_inc_return(&instance_count);
> > +     get_task_comm(task_comm_buf, current);
>
> How is current->comm supposed to be unique? And with a wrapping counter
> like the above you can end up with identical file descriptor names.
>
> What's wrong with simply using the PID which is guaranteed to be unique
> for the life time of a process/task?
>

Thanks, and Yes, on a second thought, PID sounds like a better option.
I will address in v2 patch.

> > +     snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
> > +              instance, task_comm_buf);
> > +     ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
> >                              O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> >       if (ufd < 0)
> >               kfree(ctx);
>
> I actually wonder, whether this should be part of anon_inode_get*().
>

I am curious (and open at the same time) if that will be helpful..
In the case of timerfd, I could see it adds up value by stuffing more
context to the file descriptor name as eventpoll is using the same file
descriptor names as wakesource name, and hence the cost of slightly
longer file descriptor name justifies. But I don't have a solid reason
if this additional cost (of longer file descriptor names) will be
helpful in general with other file descriptors.

> Aside of that this is a user space visible change both for eventpoll and
> timerfd.
>
> Have you carefully investigated whether there is existing user space
> which might depend on the existing naming conventions?
>

I am not sure how I can confirm that for all userspace, but open for
suggestions if you can share some ideas.

However, I have verified and can confirm for Android userspace that
there is no dependency on existing naming conventions for timerfd and
eventpoll wakesource names, if that helps.

> The changelog is silent about this...
>

Noted - will add this into the revised patch.

> Thanks,
>
>         tglx

Thanks,
Manish
Thomas Gleixner March 22, 2021, 9:40 p.m. UTC | #3
Manish,

On Mon, Mar 22 2021 at 10:15, Manish Varma wrote:
> On Thu, Mar 18, 2021 at 6:04 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> > +static atomic_t instance_count = ATOMIC_INIT(0);
>>
>> instance_count is misleading as it does not do any accounting of
>> instances as the name suggests.
>>
>
> Not sure if I am missing a broader point here, but the objective of this
> patch is to:
> A. To help find the process a given timerfd associated with, and
> B. one step further, if there are multiple fds created by a single
> process then label each instance using monotonically increasing integer
> i.e. "instance_count" to help identify each of them separately.
>
> So, instance_count in my mind helps with "B", i.e. to keep track and
> identify each instance of timerfd individually.

I know what you want to do. The point is that instance_count is the
wrong name as it suggests that it counts instances, and that in most
cases implies active instances.

It's not a counter, it's a token generator which allows you to create
unique ids. The fact that it is just incrementing once per created file
descriptor does not matter. That's just an implementation detail.

Name it something like timerfd_create_id or timerfd_session_id which
clearly tells that this is not counting any thing. It immediately tells
the purpose of generating an id.

Naming matters when reading code, really.

>> > +     snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
>> > +              instance, task_comm_buf);
>> > +     ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
>> >                              O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
>> >       if (ufd < 0)
>> >               kfree(ctx);
>>
>> I actually wonder, whether this should be part of anon_inode_get*().
>>
>
> I am curious (and open at the same time) if that will be helpful..
> In the case of timerfd, I could see it adds up value by stuffing more
> context to the file descriptor name as eventpoll is using the same file
> descriptor names as wakesource name, and hence the cost of slightly
> longer file descriptor name justifies. But I don't have a solid reason
> if this additional cost (of longer file descriptor names) will be
> helpful in general with other file descriptors.

Obviously you want to make that depend on a flag handed to anon_...().

The point is that there will be the next anonfd usecase which needs
unique identification at some point. That is going to copy&pasta that
timerfd code and then make it slightly different just because and then
userspace needs to parse yet another format.

>> Aside of that this is a user space visible change both for eventpoll and
>> timerfd.

Not when done right.

>> Have you carefully investigated whether there is existing user space
>> which might depend on the existing naming conventions?
>>
> I am not sure how I can confirm that for all userspace, but open for
> suggestions if you can share some ideas.
>
> However, I have verified and can confirm for Android userspace that
> there is no dependency on existing naming conventions for timerfd and
> eventpoll wakesource names, if that helps.

Well, there is a world outside Android and you're working for a company
which should have tools to search for '[timerfd]' usage in a gazillion of
projects. The obvious primary targets are distros of all sorts. I'm sure
there are ways to figure this out without doing it manually.

Not that I expect any real dependencies on it, but as always the devil
is in the details.

Thanks,

        tglx
Manish Varma March 25, 2021, 5:18 a.m. UTC | #4
Hi Thomas,

On Mon, Mar 22, 2021 at 2:40 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Manish,
>
> On Mon, Mar 22 2021 at 10:15, Manish Varma wrote:
> > On Thu, Mar 18, 2021 at 6:04 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > +static atomic_t instance_count = ATOMIC_INIT(0);
> >>
> >> instance_count is misleading as it does not do any accounting of
> >> instances as the name suggests.
> >>
> >
> > Not sure if I am missing a broader point here, but the objective of this
> > patch is to:
> > A. To help find the process a given timerfd associated with, and
> > B. one step further, if there are multiple fds created by a single
> > process then label each instance using monotonically increasing integer
> > i.e. "instance_count" to help identify each of them separately.
> >
> > So, instance_count in my mind helps with "B", i.e. to keep track and
> > identify each instance of timerfd individually.
>
> I know what you want to do. The point is that instance_count is the
> wrong name as it suggests that it counts instances, and that in most
> cases implies active instances.
>
> It's not a counter, it's a token generator which allows you to create
> unique ids. The fact that it is just incrementing once per created file
> descriptor does not matter. That's just an implementation detail.
>
> Name it something like timerfd_create_id or timerfd_session_id which
> clearly tells that this is not counting any thing. It immediately tells
> the purpose of generating an id.
>
> Naming matters when reading code, really.
>

Noted, and thanks for the clarification!

> >> > +     snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
> >> > +              instance, task_comm_buf);
> >> > +     ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
> >> >                              O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> >> >       if (ufd < 0)
> >> >               kfree(ctx);
> >>
> >> I actually wonder, whether this should be part of anon_inode_get*().
> >>
> >
> > I am curious (and open at the same time) if that will be helpful..
> > In the case of timerfd, I could see it adds up value by stuffing more
> > context to the file descriptor name as eventpoll is using the same file
> > descriptor names as wakesource name, and hence the cost of slightly
> > longer file descriptor name justifies. But I don't have a solid reason
> > if this additional cost (of longer file descriptor names) will be
> > helpful in general with other file descriptors.
>
> Obviously you want to make that depend on a flag handed to anon_...().

Unfortunately, changing file descriptor names does not seem to be a viable
option here (more details in my answer in the next section), and
hence changes in anon_...() does not seem to be required.

>
> The point is that there will be the next anonfd usecase which needs
> unique identification at some point. That is going to copy&pasta that
> timerfd code and then make it slightly different just because and then
> userspace needs to parse yet another format.
>
> >> Aside of that this is a user space visible change both for eventpoll and
> >> timerfd.
>
> Not when done right.
>
> >> Have you carefully investigated whether there is existing user space
> >> which might depend on the existing naming conventions?
> >>
> > I am not sure how I can confirm that for all userspace, but open for
> > suggestions if you can share some ideas.
> >
> > However, I have verified and can confirm for Android userspace that
> > there is no dependency on existing naming conventions for timerfd and
> > eventpoll wakesource names, if that helps.
>
> Well, there is a world outside Android and you're working for a company
> which should have tools to search for '[timerfd]' usage in a gazillion of
> projects. The obvious primary targets are distros of all sorts. I'm sure
> there are ways to figure this out without doing it manually.
>
> Not that I expect any real dependencies on it, but as always the devil
> is in the details.
>

Right, there are some userspace which depends on "[timerfd]" string
https://codesearch.debian.net/search?q=%22%5Btimerfd%5D%22&literal=1

So, modifying file descriptor names at-least for timerfd will definitely
break those.

With that said, I am now thinking about leaving alone the file descriptor
names as is, and instead, adding those extra information about the
associated processes (i.e. process name or rather PID of the
process) along with token ID directly into wakesource name, at the
time of creating new wakesource i.e. in ep_create_wakeup_source().

So, the wakesource names, that currently named as "[timerfd]", will be
named something like:
"epollitem<N>:<PID>.[timerfd]"

Where N is the number of wakesource created since boot.

This way we can still associate the process with the wakesource
name and also distinguish multiple instances of wakesources using
the integer identifier.

Please share your thoughts!

> Thanks,
>
>         tglx

Thanks,
Manish
--
Manish Varma | Software Engineer | varmam@google.com | 650-686-0858
Thomas Gleixner March 25, 2021, 7:06 a.m. UTC | #5
Manish,

On Wed, Mar 24 2021 at 22:18, Manish Varma wrote:
> On Mon, Mar 22, 2021 at 2:40 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Not that I expect any real dependencies on it, but as always the devil
>> is in the details.
>
> Right, there are some userspace which depends on "[timerfd]" string
> https://codesearch.debian.net/search?q=%22%5Btimerfd%5D%22&literal=1

Details :)

> So, modifying file descriptor names at-least for timerfd will definitely
> break those.
>
> With that said, I am now thinking about leaving alone the file descriptor
> names as is, and instead, adding those extra information about the
> associated processes (i.e. process name or rather PID of the
> process) along with token ID directly into wakesource name, at the
> time of creating new wakesource i.e. in ep_create_wakeup_source().
>
> So, the wakesource names, that currently named as "[timerfd]", will be
> named something like:
> "epollitem<N>:<PID>.[timerfd]"
>
> Where N is the number of wakesource created since boot.

Where N is a unique ID token. :)

> This way we can still associate the process with the wakesource
> name and also distinguish multiple instances of wakesources using
> the integer identifier.

If that solves your problem and does not make anything else breaks which
relies on the exisitng epollitem naming convention, then I don't see a
problem with that.

Thanks,

        tglx

Patch
diff mbox series

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 185252d8f4c7..af28be4285f8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1451,15 +1451,21 @@  static int ep_create_wakeup_source(struct epitem *epi)
 {
 	struct name_snapshot n;
 	struct wakeup_source *ws;
+	char task_comm_buf[sizeof(current->comm)];
+	char buf[64];
+
+	get_task_comm(task_comm_buf, current);
 
 	if (!epi->ep->ws) {
-		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
+		snprintf(buf, sizeof(buf), "epoll:%s", task_comm_buf);
+		epi->ep->ws = wakeup_source_register(NULL, buf);
 		if (!epi->ep->ws)
 			return -ENOMEM;
 	}
 
 	take_dentry_name_snapshot(&n, epi->ffd.file->f_path.dentry);
-	ws = wakeup_source_register(NULL, n.name.name);
+	snprintf(buf, sizeof(buf), "epollitem:%s.%s", task_comm_buf, n.name.name);
+	ws = wakeup_source_register(NULL, buf);
 	release_dentry_name_snapshot(&n);
 
 	if (!ws)
diff --git a/fs/timerfd.c b/fs/timerfd.c
index c5509d2448e3..4249e8c9a38c 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -46,6 +46,8 @@  struct timerfd_ctx {
 	bool might_cancel;
 };
 
+static atomic_t instance_count = ATOMIC_INIT(0);
+
 static LIST_HEAD(cancel_list);
 static DEFINE_SPINLOCK(cancel_lock);
 
@@ -391,6 +393,9 @@  SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
 	int ufd;
 	struct timerfd_ctx *ctx;
+	char task_comm_buf[sizeof(current->comm)];
+	char file_name_buf[32];
+	int instance;
 
 	/* Check the TFD_* constants for consistency.  */
 	BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
@@ -427,7 +432,11 @@  SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 
 	ctx->moffs = ktime_mono_to_real(0);
 
-	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
+	instance = atomic_inc_return(&instance_count);
+	get_task_comm(task_comm_buf, current);
+	snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
+		 instance, task_comm_buf);
+	ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
 			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
 	if (ufd < 0)
 		kfree(ctx);