All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: igt-dev@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic
Date: Wed, 27 Jan 2021 09:12:00 +0000	[thread overview]
Message-ID: <5c3f792f-9b87-ba90-69ee-4c2c982d8aa5@linux.intel.com> (raw)
In-Reply-To: <161169867584.2943.4917889020493124389@build.alporthouse.com>


On 26/01/2021 22:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>> On 26/01/2021 13:05, Chris Wilson wrote:
>>> The client id used is a cyclic allocator as that reduces the likelihood
>>> of userspace seeing the same id used again (and so confusing the new
>>> client as the old). Verify that each new client has an id greater than
>>> the last.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>    tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
>>>    1 file changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
>>> index a3a1f81e1..d2c1ebc5f 100644
>>> --- a/tests/i915/sysfs_clients.c
>>> +++ b/tests/i915/sysfs_clients.c
>>> @@ -8,6 +8,7 @@
>>>    #include <errno.h>
>>>    #include <fcntl.h>
>>>    #include <inttypes.h>
>>> +#include <limits.h>
>>>    #include <math.h>
>>>    #include <sched.h>
>>>    #include <sys/socket.h>
>>> @@ -47,6 +48,8 @@
>>>    #define assert_within_epsilon(x, ref, tolerance) \
>>>        __assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
>>>    
>>> +#define BUFSZ 280
>>> +
>>>    #define MI_BATCH_BUFFER_START (0x31 << 23)
>>>    #define MI_BATCH_BUFFER_END (0xa << 23)
>>>    #define MI_ARB_CHECK (0x5 << 23)
>>> @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
>>>    {
>>>        struct dirent *de;
>>>        int sv[2], rv[2];
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        int me = -1;
>>>        long count;
>>>        pid_t pid;
>>> @@ -180,7 +183,7 @@ static long count_clients(int clients)
>>>    {
>>>        struct dirent *de;
>>>        long count = 0;
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        DIR *dir;
>>>    
>>>        dir = fdopendir(dup(clients));
>>> @@ -229,32 +232,113 @@ static void create(int i915, int clients)
>>>        igt_assert_eq(count_clients(clients), 1);
>>>    }
>>>    
>>> +static const char *find_client(int clients, pid_t pid, char *buf)
>>> +{
>>> +     DIR *dir = fdopendir(dup(clients));
>>> +
>>> +     /* Reading a dir as it changes does not appear to be stable, SEP */
>>
>> Oh yes, it is POSIX undefined as far as I could figure out. You are
>> creating/destroying clients while reading the dir from different
>> threads?
> 
> Different processes on different directory fd. man readdir(3) does say
> that glibc is threadsafe, but that is not a requirement of POSIX. The
> problem we are seeing is that the directory contents are not stable
> around the readdir loop as clients are being created/destroyed, causing
> dirents to be missed.
> 
> stracing the problem shows that glibc used a 32KiB buffer for getdents
> and only a single syscall was required to grab the entire directory. No
> errors were seen before the missed dirent. It just seemed to be missing.
> 
> Afaict there is no delay in creating the sysfs entry, and I think there
> should also be no delay in creating the kernfs node and inodes, so my
> initial assumption is that it's something not quite happy in the
> kernfs directory that shows up under stress. A second loop ran for a
> long time without incident locally, but CI seems much more adept at
> finding it.

I think we are talking about the same thing..

>> I think best might be to introduce explicit sync points between
>> those two entities to make this reliable. In another review for the same
>> problem I suggested pipes for implementing this handshake. Along the
>> lines of "let child open/close some processes, make it wait for parent;
>> parent scans sysfs, signals child to open/close some more; repeat for a
>> while".
> 
> Sadly, I don't expect userspace to do that :)
> 
> I do expect userspace to search for its own client/ upon creation, and I
> expect there to be many clients being opened at once. So I think so long
> as the use of the library readdir is correct (distinct processes with
> distinct directory fd, so I think there's no accidental sharing) this
> represents. Hmm. fsync(dirfd).

...and I think it is just not guaranteed that reading the dir in 
parallel to dentries being created/deleted is guaranteed to work (no 
guarantee for discovering newly addded or deleted clients since the 
iteration started). Or you are saying that a client opens itself starts 
readdir loop and then fails to find itself?

Otherwise, I was thinking, if we wanted to allow clients to inspect 
themselves, we should really add a getrusage(2) like ioctl.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

WARNING: multiple messages have this Message-ID (diff)
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: igt-dev@lists.freedesktop.org
Subject: Re: [igt-dev] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic
Date: Wed, 27 Jan 2021 09:12:00 +0000	[thread overview]
Message-ID: <5c3f792f-9b87-ba90-69ee-4c2c982d8aa5@linux.intel.com> (raw)
In-Reply-To: <161169867584.2943.4917889020493124389@build.alporthouse.com>


On 26/01/2021 22:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>> On 26/01/2021 13:05, Chris Wilson wrote:
>>> The client id used is a cyclic allocator as that reduces the likelihood
>>> of userspace seeing the same id used again (and so confusing the new
>>> client as the old). Verify that each new client has an id greater than
>>> the last.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>    tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
>>>    1 file changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
>>> index a3a1f81e1..d2c1ebc5f 100644
>>> --- a/tests/i915/sysfs_clients.c
>>> +++ b/tests/i915/sysfs_clients.c
>>> @@ -8,6 +8,7 @@
>>>    #include <errno.h>
>>>    #include <fcntl.h>
>>>    #include <inttypes.h>
>>> +#include <limits.h>
>>>    #include <math.h>
>>>    #include <sched.h>
>>>    #include <sys/socket.h>
>>> @@ -47,6 +48,8 @@
>>>    #define assert_within_epsilon(x, ref, tolerance) \
>>>        __assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
>>>    
>>> +#define BUFSZ 280
>>> +
>>>    #define MI_BATCH_BUFFER_START (0x31 << 23)
>>>    #define MI_BATCH_BUFFER_END (0xa << 23)
>>>    #define MI_ARB_CHECK (0x5 << 23)
>>> @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
>>>    {
>>>        struct dirent *de;
>>>        int sv[2], rv[2];
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        int me = -1;
>>>        long count;
>>>        pid_t pid;
>>> @@ -180,7 +183,7 @@ static long count_clients(int clients)
>>>    {
>>>        struct dirent *de;
>>>        long count = 0;
>>> -     char buf[280];
>>> +     char buf[BUFSZ];
>>>        DIR *dir;
>>>    
>>>        dir = fdopendir(dup(clients));
>>> @@ -229,32 +232,113 @@ static void create(int i915, int clients)
>>>        igt_assert_eq(count_clients(clients), 1);
>>>    }
>>>    
>>> +static const char *find_client(int clients, pid_t pid, char *buf)
>>> +{
>>> +     DIR *dir = fdopendir(dup(clients));
>>> +
>>> +     /* Reading a dir as it changes does not appear to be stable, SEP */
>>
>> Oh yes, it is POSIX undefined as far as I could figure out. You are
>> creating/destroying clients while reading the dir from different
>> threads?
> 
> Different processes on different directory fd. man readdir(3) does say
> that glibc is threadsafe, but that is not a requirement of POSIX. The
> problem we are seeing is that the directory contents are not stable
> around the readdir loop as clients are being created/destroyed, causing
> dirents to be missed.
> 
> stracing the problem shows that glibc used a 32KiB buffer for getdents
> and only a single syscall was required to grab the entire directory. No
> errors were seen before the missed dirent. It just seemed to be missing.
> 
> Afaict there is no delay in creating the sysfs entry, and I think there
> should also be no delay in creating the kernfs node and inodes, so my
> initial assumption is that it's something not quite happy in the
> kernfs directory that shows up under stress. A second loop ran for a
> long time without incident locally, but CI seems much more adept at
> finding it.

I think we are talking about the same thing..

>> I think best might be to introduce explicit sync points between
>> those two entities to make this reliable. In another review for the same
>> problem I suggested pipes for implementing this handshake. Along the
>> lines of "let child open/close some processes, make it wait for parent;
>> parent scans sysfs, signals child to open/close some more; repeat for a
>> while".
> 
> Sadly, I don't expect userspace to do that :)
> 
> I do expect userspace to search for its own client/ upon creation, and I
> expect there to be many clients being opened at once. So I think so long
> as the use of the library readdir is correct (distinct processes with
> distinct directory fd, so I think there's no accidental sharing) this
> represents. Hmm. fsync(dirfd).

...and I think it is just not guaranteed that reading the dir in 
parallel to dentries being created/deleted is guaranteed to work (no 
guarantee for discovering newly addded or deleted clients since the 
iteration started). Or you are saying that a client opens itself starts 
readdir loop and then fails to find itself?

Otherwise, I was thinking, if we wanted to allow clients to inspect 
themselves, we should really add a getrusage(2) like ioctl.

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

  parent reply	other threads:[~2021-01-27  9:12 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26 13:05 [Intel-gfx] [PATCH i-g-t 1/2] i915/sysfs_client: Ignore clients being closed as we read their sysfs Chris Wilson
2021-01-26 13:05 ` [igt-dev] " Chris Wilson
2021-01-26 13:05 ` [Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic Chris Wilson
2021-01-26 13:05   ` [igt-dev] " Chris Wilson
2021-01-26 16:47   ` [Intel-gfx] " Tvrtko Ursulin
2021-01-26 16:47     ` [igt-dev] " Tvrtko Ursulin
2021-01-26 22:04     ` [Intel-gfx] " Chris Wilson
2021-01-26 22:07       ` Chris Wilson
2021-01-26 22:07         ` [igt-dev] " Chris Wilson
2021-01-27  9:12       ` Tvrtko Ursulin [this message]
2021-01-27  9:12         ` Tvrtko Ursulin
2021-01-27  9:51         ` [Intel-gfx] " Chris Wilson
2021-01-27  9:51           ` [igt-dev] " Chris Wilson
2021-01-29  9:18   ` [Intel-gfx] " Tvrtko Ursulin
2021-01-29  9:52     ` Chris Wilson
2021-01-29  9:52       ` [igt-dev] " Chris Wilson
2021-01-29 18:30       ` [Intel-gfx] " Tvrtko Ursulin
2021-01-26 16:33 ` [Intel-gfx] [igt-dev] [PATCH i-g-t 1/2] i915/sysfs_client: Ignore clients being closed as we read their sysfs Tvrtko Ursulin
2021-01-26 18:06 ` [igt-dev] ✗ GitLab.Pipeline: failure for series starting with [i-g-t,1/2] " Patchwork
2021-01-26 18:18 ` [igt-dev] ✓ Fi.CI.BAT: success " Patchwork
2021-01-26 23:20 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2021-01-26 10:30 [Intel-gfx] [PATCH i-g-t 1/2] " Chris Wilson
2021-01-26 10:30 ` [Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c3f792f-9b87-ba90-69ee-4c2c982d8aa5@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.