All of lore.kernel.org
 help / color / mirror / Atom feed
* Strange NFS/async write issue
@ 2013-07-02 17:02 William Dauchy
  2013-07-17  6:47 ` William Dauchy
  0 siblings, 1 reply; 2+ messages in thread
From: William Dauchy @ 2013-07-02 17:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: pascal, linux-nfs


[-- Attachment #1.1: Type: text/plain, Size: 3138 bytes --]

Hello Trond,

We wrote the attached tester to reproduce an issue we have with mysql on
a 3.9.8 kernel using a nfsv4 mount. 

mount options are "classic":

rw,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none

When handling a temporary file, mysql open(), then writes a bunch (say ~16MB) of data
to that file, lseeks to the start and re-reads file contents.

With the code as attached, we see a lot of small NFS writes (1400 bytes) and
a lot of getattrs [1] and the test takes a little more than 2 seconds.

It seems the lseek/read are serialized to the last write actually sent to the filer - 
and nothing aggregates the big writes.

If we uncomment the fsync() call, or simply close() and reopen the file, the test
completes in a few milliseconds. Same if we call sync() from any process while this
test is running. Dump shows lots of big tcp pushes, few NFS commands/answers - no
getattr in between writes.

Is this an issue worth a dig or a known behaviour?
Can we debug this a bit further for you?

Thanks for your time,

[1]
(extract from the dump:
15:56:43.681766 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14399980:14401384, ack 1038253, win 24576, options [nop,nop,TS val 928349822 ecr 259987202], length 1404
15:56:43.685189 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14399980, win 32715, options [nop,nop,TS val 259987203 ecr 928349822], length 0
15:56:43.689131 IP 10.0.41.39.2049 > 10.0.46.65.263281194: reply ok 132 getattr NON 3 ids 0/38 sz 0
15:56:43.689292 IP 10.0.46.65.280058410 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
15:56:43.689330 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14404280:14405684, ack 1038389, win 24576, options [nop,nop,TS val 928349824 ecr 259987203], length 1404
15:56:43.690127 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14404280, win 32715, options [nop,nop,TS val 259987203 ecr 928349824], length 0
15:56:43.690651 IP 10.0.41.39.2049 > 10.0.46.65.280058410: reply ok 132 getattr NON 3 ids 0/38 sz 0
15:56:43.690769 IP 10.0.46.65.296835626 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
15:56:43.690782 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14408580:14409984, ack 1038525, win 24576, options [nop,nop,TS val 928349824 ecr 259987203], length 1404
15:56:43.691496 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14408580, win 32715, options [nop,nop,TS val 259987203 ecr 928349824], length 0
15:56:43.692970 IP 10.0.41.39.2049 > 10.0.46.65.296835626: reply ok 132 getattr NON 3 ids 0/38 sz 0
15:56:43.693194 IP 10.0.46.65.313612842 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
15:56:43.693249 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14412880:14414284, ack 1038661, win 24576, options [nop,nop,TS val 928349825 ecr 259987204], length 1404
15:56:43.693395 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14412880, win 32715, options [nop,nop,TS val 259987204 ecr 928349825], length 0
15:56:43.693644 IP 10.0.41.39.2049 > 10.0.46.65.313612842: reply ok 132 getattr NON 3 ids 0/38 sz 0
)

-- 
William

[-- Attachment #1.2: nfssync.c --]
[-- Type: text/x-csrc, Size: 1213 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int i, fd;
    time_t ts;
    char *fakebuf;

    if (argc < 2)
        exit(1);

    fd = open(argv[1], O_RDWR|O_CREAT|O_EXCL|O_TRUNC|O_NOFOLLOW, 0755);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    fakebuf = malloc(256*1024);
    if (fakebuf == NULL) {
        perror("malloc");
        exit(1);
    }

    ts = time(NULL);
    printf("writing..");
    memset(fakebuf, 'A', 256*1024);
    for (i = 0; i < 68; i++)
        write(fd, fakebuf, 256*1024);
    printf("done in %lu secs\n", time(NULL) - ts);

    ts = time(NULL);
    printf("not syncing..");
    // fsync(fd);
    printf("synced in %lu secs\n", time(NULL) - ts);

    printf("reading..");

    ts = time(NULL);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    // just mimic mysql stuff
    lseek(fd, 0, SEEK_CUR);
    lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);
    for (i = 0; i < 68; i++)
        read(fd, fakebuf, 256*1024);
    printf("done in %lu secs\n", time(NULL) - ts);

    return 0;
}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Strange NFS/async write issue
  2013-07-02 17:02 Strange NFS/async write issue William Dauchy
@ 2013-07-17  6:47 ` William Dauchy
  0 siblings, 0 replies; 2+ messages in thread
From: William Dauchy @ 2013-07-17  6:47 UTC (permalink / raw)
  To: William Dauchy
  Cc: Trond Myklebust, Pascal Bouchareine, Linux NFS mailing list

Hi Trond,

On Tue, Jul 2, 2013 at 7:02 PM, William Dauchy <william@gandi.net> wrote:
> We wrote the attached tester to reproduce an issue we have with mysql on
> a 3.9.8 kernel using a nfsv4 mount.
>
> mount options are "classic":
>
> rw,nosuid,nodev,noexec,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none
>
> When handling a temporary file, mysql open(), then writes a bunch (say ~16MB) of data
> to that file, lseeks to the start and re-reads file contents.
>
> With the code as attached, we see a lot of small NFS writes (1400 bytes) and
> a lot of getattrs [1] and the test takes a little more than 2 seconds.
>
> It seems the lseek/read are serialized to the last write actually sent to the filer -
> and nothing aggregates the big writes.
>
> If we uncomment the fsync() call, or simply close() and reopen the file, the test
> completes in a few milliseconds. Same if we call sync() from any process while this
> test is running. Dump shows lots of big tcp pushes, few NFS commands/answers - no
> getattr in between writes.
>
> Is this an issue worth a dig or a known behaviour?
> Can we debug this a bit further for you?
>
> Thanks for your time,
>
> [1]
> (extract from the dump:
> 15:56:43.681766 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14399980:14401384, ack 1038253, win 24576, options [nop,nop,TS val 928349822 ecr 259987202], length 1404
> 15:56:43.685189 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14399980, win 32715, options [nop,nop,TS val 259987203 ecr 928349822], length 0
> 15:56:43.689131 IP 10.0.41.39.2049 > 10.0.46.65.263281194: reply ok 132 getattr NON 3 ids 0/38 sz 0
> 15:56:43.689292 IP 10.0.46.65.280058410 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
> 15:56:43.689330 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14404280:14405684, ack 1038389, win 24576, options [nop,nop,TS val 928349824 ecr 259987203], length 1404
> 15:56:43.690127 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14404280, win 32715, options [nop,nop,TS val 259987203 ecr 928349824], length 0
> 15:56:43.690651 IP 10.0.41.39.2049 > 10.0.46.65.280058410: reply ok 132 getattr NON 3 ids 0/38 sz 0
> 15:56:43.690769 IP 10.0.46.65.296835626 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
> 15:56:43.690782 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14408580:14409984, ack 1038525, win 24576, options [nop,nop,TS val 928349824 ecr 259987203], length 1404
> 15:56:43.691496 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14408580, win 32715, options [nop,nop,TS val 259987203 ecr 928349824], length 0
> 15:56:43.692970 IP 10.0.41.39.2049 > 10.0.46.65.296835626: reply ok 132 getattr NON 3 ids 0/38 sz 0
> 15:56:43.693194 IP 10.0.46.65.313612842 > 10.0.41.39.2049: 2892 getattr fh 0,0/22
> 15:56:43.693249 IP 10.0.46.65.813 > 10.0.41.39.2049: Flags [P.], seq 14412880:14414284, ack 1038661, win 24576, options [nop,nop,TS val 928349825 ecr 259987204], length 1404
> 15:56:43.693395 IP 10.0.41.39.2049 > 10.0.46.65.813: Flags [.], ack 14412880, win 32715, options [nop,nop,TS val 259987204 ecr 928349825], length 0
> 15:56:43.693644 IP 10.0.41.39.2049 > 10.0.46.65.313612842: reply ok 132 getattr NON 3 ids 0/38 sz 0
> )

Any thoughts about that?

Best regards,
--
William

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2013-07-17  6:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-02 17:02 Strange NFS/async write issue William Dauchy
2013-07-17  6:47 ` William Dauchy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.