* getxattr() on cifs sometimes hangs since kernel 5.14
From: Forest @ 2022-05-17 20:48 UTC
  To: linux-cifs

When running on recent kernel versions, this system call on a cifs-mounted
file sometimes takes an unusually long time:

getxattr("/cifsmount/dir/image.jpg", "user.baloo.rating", NULL, 0)

The call normally returns in under 10 milliseconds, but on kernel 5.14+, it
sometimes takes over 30 seconds with no significant client or server load.

I discovered this while using Gwenview to browse 100+ images (about 1.5 MiB
each) on a Samba share mounted via /etc/fstab. When flipping quickly through
the images, the problem often occurs within 20 seconds. Gwenview freezes until
the call completes.

Client:
  kernel versions 5.14 and later
  mount.cifs 6.11
  Gwenview 20.12.3
  Debian Bullseye
  4-core amd64
Server:
  Samba 4.13.13-Debian
  Debian Bullseye
  6-core arm64 

A git bisect identified kernel commit 9e992755be8f as the problematic change.
The problem does not occur when any of the following are true:
- Client is running a kernel from before that commit.
- The nouser_xattr mount option is used on the cifs share.
- Gwenview accesses the files via smb:// URL instead of a cifs mount.
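
To confirm whether the nouser_xattr workaround is actually in effect on a
given mount, one option is to scan the mount table. The following is a minimal
sketch, not part of the original report. The function name
cifs_mount_has_nouser_xattr() is hypothetical, and it assumes the cifs client
lists nouser_xattr among the options shown in the mount table (for example
/proc/mounts), which should be verified on the kernel in question.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
Hypothetical helper: look up a cifs mount by its mount point in a mount
table file such as /proc/mounts.

Returns 1 if the mount lists the nouser_xattr option, 0 if it does not,
and -1 if the table cannot be opened or no matching cifs mount is found.
*/
int cifs_mount_has_nouser_xattr(const char *mtab_path, const char *mountpoint)
    {
    FILE *mtab = setmntent(mtab_path, "r");
    struct mntent *ent;
    int result = -1;

    if (!mtab)
        return -1;

    while ((ent = getmntent(mtab)) != NULL)
        {
        if (strcmp(ent->mnt_type, "cifs") != 0)
            continue;
        if (strcmp(ent->mnt_dir, mountpoint) != 0)
            continue;
        /* keep scanning: a later entry for the same directory wins */
        result = hasmntopt(ent, "nouser_xattr") != NULL;
        }

    endmntent(mtab);
    return result;
    }
```

Calling cifs_mount_has_nouser_xattr("/proc/mounts", "/cifsmount") would then
report whether xattr support is disabled on the share used in the example
path above.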

I don't know Gwenview's internals, but using its strace output as a guide, I
have written a potential reproducer. It succeeds at triggering slow getxattr()
calls, though not nearly as slow as those triggered by Gwenview. I can post it
if that would be helpful.


* Re: getxattr() on cifs sometimes hangs since kernel 5.14
From: ronnie sahlberg @ 2022-05-18  3:18 UTC
  To: Forest; +Cc: linux-cifs

On Wed, 18 May 2022 at 13:15, Forest <forestix@sonic.net> wrote:
> I don't know Gwenview's internals, but using its strace output as a guide, I
> have written a potential reproducer. It succeeds at triggering slow getxattr()
> calls, though not nearly as slow as those triggered by Gwenview. I can post it
> if that would be helpful.


Please post the reproducer. It will be useful for testing as well as
verifying a potential fix.
If the reproducer is simple enough we might add it to our buildbot.


* Re: getxattr() on cifs sometimes hangs since kernel 5.14
From: Forest @ 2022-05-18  3:56 UTC
  To: Steve French; +Cc: Paulo Alcantara, ronnie sahlberg, linux-cifs

/*
Attempt to reproduce a cifs xattr problem from kernel commit 9e992755be8f.

When running on recent kernel versions, this system call on a cifs-mounted
file sometimes takes an unusually long time:

getxattr("/cifsmount/dir/image.jpg", "user.baloo.rating", NULL, 0)

The call normally returns in under 10 milliseconds, but on kernel 5.14+, it
sometimes takes over 30 seconds with no significant client or server load.

Discovered while using gwenview to browse 100+ 1.5 MiB images on a samba share
mounted via /etc/fstab. While quickly flipping through the images, the problem
often occurs within 20 seconds. Gwenview freezes until the call completes.

Client:
  kernel versions 5.14 and later
  mount.cifs 6.11
  Gwenview 20.12.3
  Debian Bullseye
  4-core amd64
Server:
  Samba 4.13.13-Debian
  Debian Bullseye
  6-core arm64 

A git bisect identified kernel commit 9e992755be8f as the problematic change.
The problem does not occur when any of the following are true:
- Client is running a kernel from before that commit.
- The nouser_xattr mount option is used on the cifs share.
- Gwenview accesses the files via smb:// URL instead of a cifs mount.

This program tries to reproduce the problem by making system calls seen in
strace output from a stuck gwenview instance. It expects its arguments to be
file paths on a cifs mount. It will loop over the named files, applying the
system calls to each one in sequence. The -i option is available to run
several iterations of the loop. For example, with -i 2 and 10 files, the system
calls will be made 20 times. This normally completes quickly.

The -t option runs the same loop in multiple threads, which seems to trigger
the problem: getxattr() takes over 100 times as long when more than one thread
is running.

Curiously, the call never seems to be as slow in this reproducer (~1 second) as
it sometimes is in gwenview (30+ seconds), so perhaps this code does not model
gwenview's triggering behavior well. Nevertheless, it reproduces a significant
delay under the same conditions, so it might still help track down the problem.

Build with:
gcc -pthread

*/

#include <alloca.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/xattr.h>
#include <unistd.h>


int test_file(char *path)
    {
    int fd;

    fd = openat(AT_FDCWD, path, O_RDONLY);
    if (fd == -1)
        {
        perror("openat");
        return -1;
        }
    close(fd);
    getxattr(path, "user.baloo.rating", NULL, 0); /* sometimes slow */

    return 0;
    }


int test_files(char **paths)
    {
    for (; *paths; paths++)
        if (test_file(*paths))
            return -1;
    return 0;
    }


int test_files_repeatedly(char **paths, int itercount)
    {
    while (itercount--)
        if (test_files(paths))
            return -1;
    return 0;
    }


struct thread_params
    {
    char **paths;
    int itercount;
    };


void *thread_main(void *thread_arg)
    {
    struct thread_params params = *(struct thread_params *)thread_arg;

    while (params.itercount--)
        if (test_files(params.paths))
            return "failure in test thread";

    return 0;
    }


int test_files_threaded(char **paths, int itercount, int threadcount)
    {
    struct thread_params params = {paths, itercount};
    pthread_t *threadids;
    int i;

    threadcount--; /* the main thread will do one thread's work */

    threadids = alloca(sizeof(*threadids) * threadcount);

    for (i = 0; i < threadcount; i++)
        if (pthread_create(&threadids[i], NULL, thread_main, &params))
            {
            printf("pthread_create failed\n");
            return -1;
            }

    /* do one thread's work in the main thread */
    if (test_files_repeatedly(paths, itercount))
        {
        printf("failure in main thread\n");
        return -1;
        }

    for (i = 0; i < threadcount; i++)
        {
        void *thread_result;
        if (pthread_join(threadids[i], &thread_result))
            {
            printf("pthread_join failed\n");
            return -1;
            }
        if (thread_result)
            {
            printf("%s\n", (char *)thread_result);
            return -1;
            }
        }

    return 0;
    }


void usage(const char *cmd)
    {
    printf("usage: %s [-i iterations] [-t threads] <files>\n", cmd);
    }


int main(int argc, char *argv[])
    {
    int itercount = 1, threadcount=1, opt;
    char **paths;

    while ((opt = getopt(argc, argv, "i:t:h")) != -1)
        {
        switch (opt)
            {
            case 'i':
                itercount = atoi(optarg);
                break;
            case 't':
                threadcount = atoi(optarg);
                break;
            default:
                usage(argv[0]);
                return 2;
            }
        }
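    /* reject missing files and non-positive counts (e.g. -t 0 or -i -1) */
    if (optind == argc || itercount < 1 || threadcount < 1)
        {
        usage(argv[0]);
        return 2;
        }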
    paths = &argv[optind];

    return test_files_threaded(paths, itercount, threadcount);
    }


* Re: getxattr() on cifs sometimes hangs since kernel 5.14
From: Forest @ 2022-07-15 21:29 UTC
  To: linux-cifs; +Cc: ronnie sahlberg, Steve French, Paulo Alcantara

On Wed, 18 May 2022 13:18:02 +1000, ronnie sahlberg wrote:

>Please post the reproducer. It will be useful for testing as well as
>verifying a potential fix.

I sent the reproducer to you guys back in May, but forgot to cc: the list.
There is now a report in bugzilla, with the reproducer attached:

https://bugzilla.samba.org/show_bug.cgi?id=15123

I'm dropping off the mailing list, but updates to the bug report should
still reach me.

