From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: igt-dev@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH i-g-t 2/2] i915/sysfs_clients: Check that client ids are cyclic
Date: Wed, 27 Jan 2021 09:12:00 +0000
Message-ID: <5c3f792f-9b87-ba90-69ee-4c2c982d8aa5@linux.intel.com>
In-Reply-To: <161169867584.2943.4917889020493124389@build.alporthouse.com>

On 26/01/2021 22:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-26 16:47:14)
>> On 26/01/2021 13:05, Chris Wilson wrote:
>>> The client id used is a cyclic allocator as that reduces the likelihood
>>> of userspace seeing the same id used again (and so confusing the new
>>> client as the old). Verify that each new client has an id greater than
>>> the last.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>  tests/i915/sysfs_clients.c | 129 +++++++++++++++++++++++++++++++------
>>>  1 file changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tests/i915/sysfs_clients.c b/tests/i915/sysfs_clients.c
>>> index a3a1f81e1..d2c1ebc5f 100644
>>> --- a/tests/i915/sysfs_clients.c
>>> +++ b/tests/i915/sysfs_clients.c
>>> @@ -8,6 +8,7 @@
>>>  #include <errno.h>
>>>  #include <fcntl.h>
>>>  #include <inttypes.h>
>>> +#include <limits.h>
>>>  #include <math.h>
>>>  #include <sched.h>
>>>  #include <sys/socket.h>
>>> @@ -47,6 +48,8 @@
>>>  #define assert_within_epsilon(x, ref, tolerance) \
>>>  	__assert_within_epsilon(x, ref, tolerance / 100., tolerance / 100.)
>>>
>>> +#define BUFSZ 280
>>> +
>>>  #define MI_BATCH_BUFFER_START (0x31 << 23)
>>>  #define MI_BATCH_BUFFER_END (0xa << 23)
>>>  #define MI_ARB_CHECK (0x5 << 23)
>>> @@ -75,7 +78,7 @@ static void pidname(int i915, int clients)
>>>  {
>>>  	struct dirent *de;
>>>  	int sv[2], rv[2];
>>> -	char buf[280];
>>> +	char buf[BUFSZ];
>>>  	int me = -1;
>>>  	long count;
>>>  	pid_t pid;
>>> @@ -180,7 +183,7 @@ static long count_clients(int clients)
>>>  {
>>>  	struct dirent *de;
>>>  	long count = 0;
>>> -	char buf[280];
>>> +	char buf[BUFSZ];
>>>  	DIR *dir;
>>>
>>>  	dir = fdopendir(dup(clients));
>>> @@ -229,32 +232,113 @@ static void create(int i915, int clients)
>>>  	igt_assert_eq(count_clients(clients), 1);
>>>  }
>>>
>>> +static const char *find_client(int clients, pid_t pid, char *buf)
>>> +{
>>> +	DIR *dir = fdopendir(dup(clients));
>>> +
>>> +	/* Reading a dir as it changes does not appear to be stable, SEP */
>>
>> Oh yes, it is POSIX undefined as far as I could figure out. You are
>> creating/destroying clients while reading the dir from different
>> threads?
>
> Different processes on different directory fd. man readdir(3) does say
> that glibc is threadsafe, but that is not a requirement of POSIX. The
> problem we are seeing is that the directory contents are not stable
> around the readdir loop as clients are being created/destroyed, causing
> dirents to be missed.
>
> stracing the problem shows that glibc used a 32KiB buffer for getdents
> and only a single syscall was required to grab the entire directory. No
> errors were seen before the missed dirent. It just seemed to be missing.
>
> Afaict there is no delay in creating the sysfs entry, and I think there
> should also be no delay in creating the kernfs node and inodes, so my
> initial assumption is that it's something not quite happy in the
> kernfs directory that shows up under stress. A second loop ran for a
> long time without incident locally, but CI seems much more adept at
> finding it.
I think we are talking about the same thing..

>> I think best might be to introduce explicit sync points between
>> those two entities to make this reliable. In another review for the same
>> problem I suggested pipes for implementing this handshake. Along the
>> lines of "let child open/close some processes, make it wait for parent;
>> parent scans sysfs, signals child to open/close some more; repeat for a
>> while".
>
> Sadly, I don't expect userspace to do that :)
>
> I do expect userspace to search for its own client/ upon creation, and I
> expect there to be many clients being opened at once. So I think so long
> as the use of the library readdir is correct (distinct processes with
> distinct directory fd, so I think there's no accidental sharing) this
> represents. Hmm. fsync(dirfd).

...and I think it is just not guaranteed that reading the dir in parallel
to dentries being created/deleted works (no guarantee of discovering newly
added or deleted clients since the iteration started).

Or you are saying that a client opens itself, starts a readdir loop and
then fails to find itself?

Otherwise, I was thinking, if we wanted to allow clients to inspect
themselves, we should really add a getrusage(2)-like ioctl.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
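The pipe handshake suggested in the thread above could be sketched roughly as follows. This is a synchronization skeleton only: run_handshake() is an illustrative name, and the actual drm client open/close and sysfs scan are stubbed out as comments, not proposed test code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Parent/child lockstep over two pipes: the child performs a batch of
 * (simulated) client opens/closes, signals the parent, then waits for
 * the parent to finish its (simulated) sysfs scan before starting the
 * next batch.  Returns the number of completed rounds, or -1 on error.
 */
static int run_handshake(int rounds)
{
	int c2p[2], p2c[2]; /* child->parent and parent->child pipes */
	char tok = 'x';
	pid_t child;

	if (pipe(c2p) || pipe(p2c))
		return -1;

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) { /* child */
		for (int i = 0; i < rounds; i++) {
			/* ... open/close some clients here ... */
			write(c2p[1], &tok, 1); /* tell parent: batch done */
			read(p2c[0], &tok, 1);  /* wait for parent's scan */
		}
		_exit(0);
	}

	for (int i = 0; i < rounds; i++) {
		read(c2p[0], &tok, 1);  /* wait for the child's batch */
		/* ... scan the sysfs clients/ directory here ... */
		write(p2c[1], &tok, 1); /* tell child: scanned, go again */
	}

	waitpid(child, NULL, 0);
	return rounds;
}
```

Each side blocks on a one-byte read until the other has finished its step, so the scan never races the open/close batch.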