* [PATCH v3 01/11] tests/virtio-9p: add terminating null in v9fs_string_read()
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
@ 2020-01-13 22:20 ` Christian Schoenebeck
  2020-01-13 22:21 ` [PATCH v3 02/11] 9pfs: require msize >= 4096 Christian Schoenebeck
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 22:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

The 9p protocol generally sends strings without null termination over
the wire. However, for future uses of this function it is beneficial
for the delivered string to be null terminated, so that it can be used
with standard C functions, which often rely on strings being null
terminated.
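
The idea can be sketched outside the test code as follows (this is an
illustrative stand-in, not the actual `v9fs_string_read()` helper):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: extract a length-prefixed, non-terminated 9p wire
 * string into a heap buffer, appending the terminating null byte so that
 * standard C string functions can operate on the result. */
static char *read_9p_string(const uint8_t *wire, uint16_t len)
{
    char *s = malloc(len + 1);   /* one extra byte for the terminator */
    memcpy(s, wire, len);
    s[len] = '\0';               /* 9p does not send this byte itself */
    return s;
}
```

The `+ 1` in the allocation and the explicit `s[len] = '\0'` correspond
to the two hunks of the patch below.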

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 tests/virtio-9p-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index e7b58e3a0c..06263edb53 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -130,8 +130,9 @@ static void v9fs_string_read(P9Req *req, uint16_t *len, char **string)
         *len = local_len;
     }
     if (string) {
-        *string = g_malloc(local_len);
+        *string = g_malloc(local_len + 1);
         v9fs_memread(req, *string, local_len);
+        (*string)[local_len] = 0;
     } else {
         v9fs_memskip(req, local_len);
     }
-- 
2.20.1




* [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
  2020-01-13 22:20 ` [PATCH v3 01/11] tests/virtio-9p: add terminating null in v9fs_string_read() Christian Schoenebeck
@ 2020-01-13 22:21 ` Christian Schoenebeck
  2020-01-16 13:15   ` Greg Kurz
  2020-01-13 22:22 ` [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir Christian Schoenebeck
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 22:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

A client establishes a session by sending a Tversion request along with
a 'msize' parameter, which the client uses to suggest to the server a
maximum message size ever to be used for communication (for both
requests and replies) between client and server during that session. If
the client suggests an 'msize' smaller than 4096, the server now denies
the session immediately with an error response (Rlerror for "9P2000.L"
clients or Rerror for "9P2000.u" clients) instead of replying with
Rversion.

Introducing a minimum msize is required to prevent issues in responses
to numerous individual request types. For instance, with an
msize < P9_IOHDRSZ no useful operations would be possible at all.
Furthermore, some responses must not be truncated according to the 9p
protocol, e.g. Rreadlink, which may yield up to PATH_MAX bytes (usually
4096). Hence this size was chosen as the server's minimum msize, which
is also the minimum msize of the Linux kernel's 9pfs client. By
enforcing a minimum msize already at session start (when handling
Tversion), we don't have to check for a minimum msize on a
per-request-type basis later on during the session, which would be much
harder and more error-prone to maintain.

This is a user-visible change which should be documented as such
(i.e. in the public QEMU release changelog).
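
The negotiation rule added by this patch can be sketched as follows
(an illustrative model, not the actual `v9fs_version()` code; the enum
and function names are made up for this sketch):

```c
#include <stdint.h>
#include <string.h>

#define P9_MIN_MSIZE 4096  /* matches the value introduced by this patch */

typedef enum {
    SESSION_OK,              /* reply with Rversion */
    SESSION_UNKNOWN_VERSION, /* reply with Rversion "unknown" */
    SESSION_DENIED           /* reply with Rlerror/Rerror */
} SessionResult;

/* Sketch of the Tversion handling: an unknown protocol version is
 * reported via Rversion "unknown" regardless of msize (reporting the
 * invalid version has priority), while a known version with
 * msize < 4096 is rejected with an error response. */
static SessionResult check_tversion(const char *version, uint32_t msize)
{
    if (strcmp(version, "9P2000.L") != 0 &&
        strcmp(version, "9P2000.u") != 0) {
        return SESSION_UNKNOWN_VERSION;
    }
    if (msize < P9_MIN_MSIZE) {
        return SESSION_DENIED;
    }
    return SESSION_OK;
}
```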

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p.c | 12 ++++++++++++
 hw/9pfs/9p.h | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 520177f40c..a5fbe821d4 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
         s->proto_version = V9FS_PROTO_2000L;
     } else {
         v9fs_string_sprintf(&version, "unknown");
+        /* skip min. msize check, reporting invalid version has priority */
+        goto marshal;
     }
 
+    if (s->msize < P9_MIN_MSIZE) {
+        err = -EMSGSIZE;
+        error_report(
+            "9pfs: Client requested msize < minimum msize ("
+            stringify(P9_MIN_MSIZE) ") supported by this server."
+        );
+        goto out;
+    }
+
+marshal:
     err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
     if (err < 0) {
         goto out;
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index 3904f82901..6fffe44f5a 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
     V9FS_PROTO_2000L = 0x02,
 } P9ProtoVersion;
 
+/**
+ * @brief Minimum message size supported by this 9pfs server.
+ *
+ * A client establishes a session by sending a Tversion request along with a
+ * 'msize' parameter which suggests the server a maximum message size ever to be
+ * used for communication (for both requests and replies) between client and
+ * server during that session. If client suggests a 'msize' smaller than this
+ * value then session is denied by server with an error response.
+ */
+#define P9_MIN_MSIZE    4096
+
 #define P9_NOTAG    UINT16_MAX
 #define P9_NOFID    UINT32_MAX
 #define P9_MAXWELEM 16
-- 
2.20.1




* [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
  2020-01-13 22:20 ` [PATCH v3 01/11] tests/virtio-9p: add terminating null in v9fs_string_read() Christian Schoenebeck
  2020-01-13 22:21 ` [PATCH v3 02/11] 9pfs: require msize >= 4096 Christian Schoenebeck
@ 2020-01-13 22:22 ` Christian Schoenebeck
  2020-01-16 13:33   ` Greg Kurz
  2020-01-13 22:23 ` [PATCH v3 04/11] hw/9pfs/9p-synth: added directory for readdir test Christian Schoenebeck
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 22:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

A good 9p client sends T_readdir with a "count" parameter that's
sufficiently smaller than the client's initially negotiated msize
(maximum message size). We perform a check for that nevertheless to
avoid the server being interrupted with a "Failed to encode VirtFS
reply type 41" error message by bad clients.
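
The sanity check boils down to clamping the client-supplied value, as in
this sketch (the P9_IOHDRSZ value of 24 is assumed here for
illustration; QEMU defines the actual constant in 9p.h):

```c
#include <stdint.h>

#define P9_IOHDRSZ 24  /* assumed value for illustration */

/* Sketch of the check added by this patch: clamp a client-supplied
 * T_readdir 'count' so that the R_readdir reply (payload plus I/O
 * header) always fits into the negotiated msize. The real code also
 * emits warn_report_once() when clamping. */
static uint32_t clamp_readdir_count(uint32_t max_count, uint32_t msize)
{
    if (max_count > msize - P9_IOHDRSZ) {
        max_count = msize - P9_IOHDRSZ;
    }
    return max_count;
}
```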

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index a5fbe821d4..30da2fedf3 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2426,6 +2426,7 @@ static void coroutine_fn v9fs_readdir(void *opaque)
     int32_t count;
     uint32_t max_count;
     V9fsPDU *pdu = opaque;
+    V9fsState *s = pdu->s;
 
     retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
                            &initial_offset, &max_count);
@@ -2434,6 +2435,13 @@ static void coroutine_fn v9fs_readdir(void *opaque)
     }
     trace_v9fs_readdir(pdu->tag, pdu->id, fid, initial_offset, max_count);
 
+    if (max_count > s->msize - P9_IOHDRSZ) {
+        max_count = s->msize - P9_IOHDRSZ;
+        warn_report_once(
+            "9p: bad client: T_readdir with count > msize - P9_IOHDRSZ"
+        );
+    }
+
     fidp = get_fid(pdu, fid);
     if (fidp == NULL) {
         retval = -EINVAL;
-- 
2.20.1




* [PATCH v3 04/11] hw/9pfs/9p-synth: added directory for readdir test
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (2 preceding siblings ...)
  2020-01-13 22:22 ` [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir Christian Schoenebeck
@ 2020-01-13 22:23 ` Christian Schoenebeck
  2020-01-13 23:08 ` [PATCH v3 05/11] tests/virtio-9p: added " Christian Schoenebeck
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

This makes the 9pfs synth driver provide the following virtual
files:

  - /ReadDirDir/ReadDirFile99
  - /ReadDirDir/ReadDirFile98
  ...
  - /ReadDirDir/ReadDirFile1
  - /ReadDirDir/ReadDirFile0

This virtual directory and its 100 virtual files will be
used by the upcoming 9pfs readdir tests.
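
The file names are derived from the `QTEST_V9FS_SYNTH_READDIR_FILE`
format string declared in 9p-synth.h, roughly like this (using snprintf
instead of glib's g_strdup_printf for a dependency-free sketch):

```c
#include <stdio.h>

/* Sketch: produce the i-th simulated file name ("ReadDirFile0" ..
 * "ReadDirFile99") from the "ReadDirFile%d" format string. */
static void readdir_file_name(char *buf, size_t size, int i)
{
    snprintf(buf, size, "ReadDirFile%d", i);
}
```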

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 hw/9pfs/9p-synth.c | 19 +++++++++++++++++++
 hw/9pfs/9p-synth.h |  5 +++++
 2 files changed, 24 insertions(+)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 54239c9bbf..7eb210ffa8 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -578,6 +578,25 @@ static int synth_init(FsContext *ctx, Error **errp)
                                        NULL, v9fs_synth_qtest_flush_write,
                                        ctx);
         assert(!ret);
+
+        /* Directory for READDIR test */
+        {
+            V9fsSynthNode *dir = NULL;
+            ret = qemu_v9fs_synth_mkdir(
+                NULL, 0700, QTEST_V9FS_SYNTH_READDIR_DIR, &dir
+            );
+            assert(!ret);
+            for (i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
+                char *name = g_strdup_printf(
+                    QTEST_V9FS_SYNTH_READDIR_FILE, i
+                );
+                ret = qemu_v9fs_synth_add_file(
+                    dir, 0, name, NULL, NULL, ctx
+                );
+                assert(!ret);
+                g_free(name);
+            }
+        }
     }
 
     return 0;
diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index af7a993a1e..036d7e4a5b 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -55,6 +55,11 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 #define QTEST_V9FS_SYNTH_LOPEN_FILE "LOPEN"
 #define QTEST_V9FS_SYNTH_WRITE_FILE "WRITE"
 
+/* for READDIR test */
+#define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
+#define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
  * otherwise it forces the server to try again forever. Thus allowing
-- 
2.20.1




* [PATCH v3 05/11] tests/virtio-9p: added readdir test
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (3 preceding siblings ...)
  2020-01-13 22:23 ` [PATCH v3 04/11] hw/9pfs/9p-synth: added directory for readdir test Christian Schoenebeck
@ 2020-01-13 23:08 ` Christian Schoenebeck
  2020-01-17 15:51   ` Greg Kurz
  2020-01-13 23:11 ` [PATCH v3 06/11] tests/virtio-9p: added splitted " Christian Schoenebeck
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

The first readdir test simply checks the number of directory entries
returned by the 9pfs server against the number of virtual files created
on the 9pfs synth driver side. The subsequent readdir test additionally
checks whether all directory entries have the expected file names (as
created on the 9pfs synth driver side), though ignoring their precise
order in the result list.
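
The order-insensitive name check works on the singly linked list of
received R_readdir entries, along these lines (a simplified stand-in
for the test's `fs_dirents_contain_name()` helper):

```c
#include <stdbool.h>
#include <string.h>

/* Simplified sketch of the entry list built from an R_readdir reply. */
struct Dirent {
    const char *name;
    struct Dirent *next;
};

/* Scan the list linearly for one expected file name; the test calls
 * this once per expected name, so entry order does not matter. */
static bool dirents_contain_name(const struct Dirent *e, const char *name)
{
    for (; e; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return true;
        }
    }
    return false;
}
```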

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 tests/virtio-9p-test.c | 149 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index 06263edb53..721f13c1fb 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -68,6 +68,11 @@ static void v9fs_memread(P9Req *req, void *addr, size_t len)
     req->r_off += len;
 }
 
+static void v9fs_uint8_read(P9Req *req, uint8_t *val)
+{
+    v9fs_memread(req, val, 1);
+}
+
 static void v9fs_uint16_write(P9Req *req, uint16_t val)
 {
     uint16_t le_val = cpu_to_le16(val);
@@ -101,6 +106,12 @@ static void v9fs_uint32_read(P9Req *req, uint32_t *val)
     le32_to_cpus(val);
 }
 
+static void v9fs_uint64_read(P9Req *req, uint64_t *val)
+{
+    v9fs_memread(req, val, 8);
+    le64_to_cpus(val);
+}
+
 /* len[2] string[len] */
 static uint16_t v9fs_string_size(const char *string)
 {
@@ -191,6 +202,7 @@ static const char *rmessage_name(uint8_t id)
         id == P9_RLOPEN ? "RLOPEN" :
         id == P9_RWRITE ? "RWRITE" :
         id == P9_RFLUSH ? "RFLUSH" :
+        id == P9_RREADDIR ? "READDIR" :
         "<unknown>";
 }
 
@@ -348,6 +360,82 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid, v9fs_qid **wqid)
     v9fs_req_free(req);
 }
 
+/* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
+static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
+                            uint32_t count, uint16_t tag)
+{
+    P9Req *req;
+
+    req = v9fs_req_init(v9p, 4 + 8 + 4, P9_TREADDIR, tag);
+    v9fs_uint32_write(req, fid);
+    v9fs_uint64_write(req, offset);
+    v9fs_uint32_write(req, count);
+    v9fs_req_send(req);
+    return req;
+}
+
+struct V9fsDirent {
+    v9fs_qid qid;
+    uint64_t offset;
+    uint8_t type;
+    char *name;
+    struct V9fsDirent *next;
+};
+
+/* size[4] Rreaddir tag[2] count[4] data[count] */
+static void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
+                          struct V9fsDirent **entries)
+{
+    uint32_t local_count;
+    struct V9fsDirent *e = NULL;
+    uint16_t slen;
+    uint32_t n = 0;
+
+    v9fs_req_recv(req, P9_RREADDIR);
+    v9fs_uint32_read(req, &local_count);
+
+    if (count) {
+        *count = local_count;
+    }
+
+    for (int32_t togo = (int32_t)local_count;
+         togo >= 13 + 8 + 1 + 2;
+         togo -= 13 + 8 + 1 + 2 + slen, ++n)
+    {
+        if (!e) {
+            e = g_malloc(sizeof(struct V9fsDirent));
+            if (entries) {
+                *entries = e;
+            }
+        } else {
+            e = e->next = g_malloc(sizeof(struct V9fsDirent));
+        }
+        e->next = NULL;
+        /* qid[13] offset[8] type[1] name[s] */
+        v9fs_memread(req, &e->qid, 13);
+        v9fs_uint64_read(req, &e->offset);
+        v9fs_uint8_read(req, &e->type);
+        v9fs_string_read(req, &slen, &e->name);
+    }
+
+    if (nentries) {
+        *nentries = n;
+    }
+
+    v9fs_req_free(req);
+}
+
+static void v9fs_free_dirents(struct V9fsDirent *e)
+{
+    struct V9fsDirent *next = NULL;
+
+    for (; e; e = next) {
+        next = e->next;
+        g_free(e->name);
+        g_free(e);
+    }
+}
+
 /* size[4] Tlopen tag[2] fid[4] flags[4] */
 static P9Req *v9fs_tlopen(QVirtio9P *v9p, uint32_t fid, uint32_t flags,
                           uint16_t tag)
@@ -480,6 +568,66 @@ static void fs_walk(void *obj, void *data, QGuestAllocator *t_alloc)
     g_free(wqid);
 }
 
+static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name)
+{
+    for (; e; e = e->next) {
+        if (!strcmp(e->name, name)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QVirtio9P *v9p = obj;
+    alloc = t_alloc;
+    char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DIR) };
+    uint16_t nqid;
+    v9fs_qid qid;
+    uint32_t count, nentries;
+    struct V9fsDirent *entries = NULL;
+    P9Req *req;
+
+    fs_attach(v9p, NULL, t_alloc);
+    req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0);
+    v9fs_req_wait_for_reply(req, NULL);
+    v9fs_rwalk(req, &nqid, NULL);
+    g_assert_cmpint(nqid, ==, 1);
+
+    req = v9fs_tlopen(v9p, 1, O_DIRECTORY, 0);
+    v9fs_req_wait_for_reply(req, NULL);
+    v9fs_rlopen(req, &qid, NULL);
+
+    req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - P9_IOHDRSZ, 0);
+    v9fs_req_wait_for_reply(req, NULL);
+    v9fs_rreaddir(req, &count, &nentries, &entries);
+
+    /*
+     * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
+     * dir entries with only one readdir request.
+     */
+    g_assert_cmpint(
+        nentries, ==,
+        QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
+    );
+
+    /*
+     * Check all file names exist in returned entries, ignore their order
+     * though.
+     */
+    g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
+    g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
+    for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
+        char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
+        g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
+        g_free(name);
+    }
+
+    v9fs_free_dirents(entries);
+    g_free(wnames[0]);
+}
+
 static void fs_walk_no_slash(void *obj, void *data, QGuestAllocator *t_alloc)
 {
     QVirtio9P *v9p = obj;
@@ -658,6 +806,7 @@ static void register_virtio_9p_test(void)
                  NULL);
     qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored,
                  NULL);
+    qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL);
 }
 
 libqos_init(register_virtio_9p_test);
-- 
2.20.1




* [PATCH v3 06/11] tests/virtio-9p: added splitted readdir test
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (4 preceding siblings ...)
  2020-01-13 23:08 ` [PATCH v3 05/11] tests/virtio-9p: added " Christian Schoenebeck
@ 2020-01-13 23:11 ` Christian Schoenebeck
  2020-01-13 23:13 ` [PATCH v3 07/11] tests/virtio-9p: failing " Christian Schoenebeck
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

The previous, already existing readdir test simply used a 'count'
parameter big enough to retrieve all directory entries with a
single Treaddir request.

In this new 'split' readdir test, directory entries are retrieved split
over several Treaddir requests, by picking small 'count' parameters
which force the server to truncate the response. The test client
therefore sends as many Treaddir requests as necessary to get all
directory entries. This test currently covers two cases: a sequence of
Treaddir requests with count=512, followed by a subsequent sequence of
Treaddir requests with count=256.
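
The request loop can be modeled like this (the real test talks to the
server via v9fs_treaddir()/v9fs_rreaddir(); here a fake in-memory
directory stands in for that round trip):

```c
#include <stdint.h>

#define N_ENTRIES 102  /* 100 files plus "." and ".." */

/* Fake server side: return at most 'per_req' entries starting at
 * 'offset', and report the offset of the last returned entry, which is
 * what a 9p client passes into the next Treaddir. */
static int fake_treaddir(uint64_t offset, int per_req, uint64_t *last_offset)
{
    int n = 0;
    while (n < per_req && offset + n < N_ENTRIES) {
        n++;
    }
    *last_offset = offset + n;
    return n;
}

/* Client loop as described above: keep issuing Treaddir, resuming at
 * the last entry's offset, until an empty response arrives; accumulate
 * the partial entry counts (the test chains the partial lists). */
static int read_whole_dir(int per_req)
{
    uint64_t offset = 0;
    int total = 0, n;
    while ((n = fake_treaddir(offset, per_req, &offset)) > 0) {
        total += n;
    }
    return total;
}
```

Regardless of how small the per-request budget is, the loop must end up
with all entries exactly once.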

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 tests/virtio-9p-test.c | 91 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index 721f13c1fb..55bfe41dfd 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -578,6 +578,7 @@ static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name)
     return false;
 }
 
+/* basic readdir test where reply fits into a single response message */
 static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
 {
     QVirtio9P *v9p = obj;
@@ -628,6 +629,95 @@ static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
     g_free(wnames[0]);
 }
 
+/* readdir test where the overall request is split over several messages */
+static void fs_readdir_split(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QVirtio9P *v9p = obj;
+    alloc = t_alloc;
+    char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DIR) };
+    uint16_t nqid;
+    v9fs_qid qid;
+    uint32_t count, nentries, npartialentries;
+    struct V9fsDirent *entries, *tail, *partialentries;
+    P9Req *req;
+    int subtest;
+    int fid;
+    uint64_t offset;
+    /* the Treaddir 'count' parameter values to be tested */
+    const uint32_t vcount[] = { 512, 256 };
+    const int nvcount = sizeof(vcount) / sizeof(uint32_t);
+
+    fs_attach(v9p, NULL, t_alloc);
+
+    /* iterate over all 'count' parameter values to be tested with Treaddir */
+    for (subtest = 0; subtest < nvcount; ++subtest) {
+        fid = subtest + 1;
+        offset = 0;
+        entries = NULL;
+        nentries = 0;
+        tail = NULL;
+
+        req = v9fs_twalk(v9p, 0, fid, 1, wnames, 0);
+        v9fs_req_wait_for_reply(req, NULL);
+        v9fs_rwalk(req, &nqid, NULL);
+        g_assert_cmpint(nqid, ==, 1);
+
+        req = v9fs_tlopen(v9p, fid, O_DIRECTORY, 0);
+        v9fs_req_wait_for_reply(req, NULL);
+        v9fs_rlopen(req, &qid, NULL);
+
+        /*
+         * send as many Treaddir requests as required to get all directory
+         * entries
+         */
+        while (true) {
+            npartialentries = 0;
+            partialentries = NULL;
+
+            req = v9fs_treaddir(v9p, fid, offset, vcount[subtest], 0);
+            v9fs_req_wait_for_reply(req, NULL);
+            v9fs_rreaddir(req, &count, &npartialentries, &partialentries);
+            if (npartialentries > 0 && partialentries) {
+                if (!entries) {
+                    entries = partialentries;
+                    nentries = npartialentries;
+                    tail = partialentries;
+                } else {
+                    tail->next = partialentries;
+                    nentries += npartialentries;
+                }
+                while (tail->next) {
+                    tail = tail->next;
+                }
+                offset = tail->offset;
+            } else {
+                break;
+            }
+        }
+
+        g_assert_cmpint(
+            nentries, ==,
+            QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
+        );
+
+        /*
+         * Check all file names exist in returned entries, ignore their order
+         * though.
+         */
+        g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
+        g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
+        for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
+            char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
+            g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
+            g_free(name);
+        }
+
+        v9fs_free_dirents(entries);
+    }
+
+    g_free(wnames[0]);
+}
+
 static void fs_walk_no_slash(void *obj, void *data, QGuestAllocator *t_alloc)
 {
     QVirtio9P *v9p = obj;
@@ -807,6 +897,7 @@ static void register_virtio_9p_test(void)
     qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored,
                  NULL);
     qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL);
+    qos_add_test("fs/readdir/split", "virtio-9p", fs_readdir_split, NULL);
 }
 
 libqos_init(register_virtio_9p_test);
-- 
2.20.1




* [PATCH v3 07/11] tests/virtio-9p: failing splitted readdir test
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (5 preceding siblings ...)
  2020-01-13 23:11 ` [PATCH v3 06/11] tests/virtio-9p: added splitted " Christian Schoenebeck
@ 2020-01-13 23:13 ` Christian Schoenebeck
  2020-01-13 23:16 ` [PATCH v3 08/11] 9pfs: readdir benchmark Christian Schoenebeck
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

This patch is not intended to be merged. It demonstrates an issue (with
debug messages) where the split readdir test fails because the server
is interrupted with the transport error "Failed to decode VirtFS
request type 40", which BTW occurs both with the unoptimized and with
the optimized 9p readdir code.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 tests/virtio-9p-test.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index 55bfe41dfd..f5a8a192b5 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -644,13 +644,14 @@ static void fs_readdir_split(void *obj, void *data, QGuestAllocator *t_alloc)
     int fid;
     uint64_t offset;
     /* the Treaddir 'count' parameter values to be tested */
-    const uint32_t vcount[] = { 512, 256 };
+    const uint32_t vcount[] = { 512, 256, 128 };
     const int nvcount = sizeof(vcount) / sizeof(uint32_t);
 
     fs_attach(v9p, NULL, t_alloc);
 
     /* iterate over all 'count' parameter values to be tested with Treaddir */
     for (subtest = 0; subtest < nvcount; ++subtest) {
+        printf("\nsubtest[%d] with count=%d\n", subtest, vcount[subtest]);
         fid = subtest + 1;
         offset = 0;
         entries = NULL;
@@ -671,12 +672,16 @@ static void fs_readdir_split(void *obj, void *data, QGuestAllocator *t_alloc)
          * entries
          */
         while (true) {
+            printf("\toffset=%ld\n", offset);
             npartialentries = 0;
             partialentries = NULL;
 
+            printf("Treaddir fid=%d offset=%ld count=%d\n",
+                   fid, offset, vcount[subtest]);
             req = v9fs_treaddir(v9p, fid, offset, vcount[subtest], 0);
             v9fs_req_wait_for_reply(req, NULL);
             v9fs_rreaddir(req, &count, &npartialentries, &partialentries);
+            printf("\t\tnpartial=%d nentries=%d\n", npartialentries, nentries);
             if (npartialentries > 0 && partialentries) {
                 if (!entries) {
                     entries = partialentries;
@@ -713,6 +718,8 @@ static void fs_readdir_split(void *obj, void *data, QGuestAllocator *t_alloc)
         }
 
         v9fs_free_dirents(entries);
+
+        printf("PASSED subtest[%d]\n", subtest);
     }
 
     g_free(wnames[0]);
-- 
2.20.1




* [PATCH v3 08/11] 9pfs: readdir benchmark
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (6 preceding siblings ...)
  2020-01-13 23:13 ` [PATCH v3 07/11] tests/virtio-9p: failing " Christian Schoenebeck
@ 2020-01-13 23:16 ` Christian Schoenebeck
  2020-01-13 23:16 ` [PATCH v3 09/11] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir() Christian Schoenebeck
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

This patch is not intended to be merged. It just provides a
temporary benchmark foundation for conveniently A/B comparing
the subsequent 9p readdir optimization patches:

* hw/9pfs/9p-synth: increase the number of simulated files for the
  readdir test to 2000 files.

* tests/virtio-9p: measure the wall time that elapses between
  sending the T_readdir request and arrival of the R_readdir response,
  and print that measured duration as well as the number of
  directory entries received and the number of bytes of the
  response message.

* tests/virtio-9p: increase msize to 256 kiB to allow retrieving
  all 2000 files (simulated by the 9pfs synth driver) with only
  one T_readdir request.

Running this benchmark is fairly quick & simple and does not
require any guest OS installation or other prerequisites:

cd build
make && make tests/qos-test
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
tests/qos-test -p $(tests/qos-test -l | grep readdir/basic)

Since this benchmark uses the 9pfs synth driver, the host machine's
I/O hardware (SSDs/HDDs) is irrelevant to the benchmark result,
because the synth backend's readdir implementation returns immediately
(without any blocking I/O that would occur with a real-life fs driver)
and serves already prepared, simulated directory entries directly from
RAM. So this benchmark focuses on the efficiency of the 9pfs
controller code (the "top half") for readdir request handling.
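
The timing primitive the benchmark relies on is a plain
gettimeofday()-based wall clock, essentially the wall_time() helper
added by the diff below:

```c
#include <stddef.h>
#include <sys/time.h>

/* Return the current wall-clock time (not CPU time) in seconds as a
 * double, so elapsed durations between two calls can be printed with
 * microsecond granularity. */
static double wall_time(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec * 0.000001;
}
```

Taking a sample before sending T_readdir and another after the reply is
decoded yields the per-phase durations the benchmark prints.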

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p-synth.h     |  2 +-
 tests/virtio-9p-test.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index 036d7e4a5b..7d6cedcdac 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -58,7 +58,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 /* for READDIR test */
 #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
 #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
-#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 2000
 
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index f5a8a192b5..1c901c889f 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -15,6 +15,18 @@
 #include "libqos/virtio-9p.h"
 #include "libqos/qgraph.h"
 
+/*
+ * to benchmark the real time (not CPU time) that elapsed between start of
+ * a request and arrival of its response
+ */
+static double wall_time(void)
+{
+    struct timeval t;
+    struct timezone tz;
+    gettimeofday(&t, &tz);
+    return t.tv_sec + t.tv_usec * 0.000001;
+}
+
 #define QVIRTIO_9P_TIMEOUT_US (10 * 1000 * 1000)
 static QGuestAllocator *alloc;
 
@@ -36,7 +48,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
     g_free(tag);
 }
 
-#define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */
+#define P9_MAX_SIZE (256 * 1024) /* Max size of a T-message or R-message */
 
 typedef struct {
     QTestState *qts;
@@ -600,9 +612,32 @@ static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
     v9fs_req_wait_for_reply(req, NULL);
     v9fs_rlopen(req, &qid, NULL);
 
+    const double start = wall_time();
+
     req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - P9_IOHDRSZ, 0);
+    const double treaddir = wall_time();
     v9fs_req_wait_for_reply(req, NULL);
+    const double waitforreply = wall_time();
     v9fs_rreaddir(req, &count, &nentries, &entries);
+    const double end = wall_time();
+
+    printf("\nTime client spent on sending T_readdir: %fs\n\n",
+           treaddir - start);
+
+    printf("Time client spent for waiting for reply from server: %fs "
+           "[MOST IMPORTANT]\n", waitforreply - start);
+    printf("(This is the most important value, because it reflects the time\n"
+           "the 9p server required to process and return the result of the\n"
+           "T_readdir request.)\n\n");
+
+    printf("Total client time: %fs\n", end - start);
+    printf("(NOTE: this time is not relevant; this huge time comes from\n"
+           "inefficient qtest_memread() calls. So you can discard this\n"
+           "value as a problem of this test client implementation while\n"
+           "processing the received server T_readdir reply.)\n\n");
+
+    printf("Details of response message data: R_readdir nentries=%d "
+           "rbytes=%d\n", nentries, count);
 
     /*
      * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
-- 
2.20.1




* [PATCH v3 09/11] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir()
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (7 preceding siblings ...)
  2020-01-13 23:16 ` [PATCH v3 08/11] 9pfs: readdir benchmark Christian Schoenebeck
@ 2020-01-13 23:16 ` Christian Schoenebeck
  2020-01-13 23:17 ` [PATCH v3 10/11] 9pfs: T_readdir latency optimization Christian Schoenebeck
  2020-01-13 23:17 ` [PATCH v3 11/11] hw/9pfs/9p.c: benchmark time on T_readdir request Christian Schoenebeck
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

This patch is just a temporary benchmark hack, not intended
to be merged!

The 9pfs synth driver's readdir() implementation has a severe
n-square performance problem. This patch is a quick and dirty
hack to prevent that performance problem from tainting the
readdir() benchmark results. In its current form, this patch
is not useful for anything other than an isolated readdir
benchmark.

NOTE: This patch would break the new readdir/split test,
because it would alter the behaviour of seekdir() required
for retrieving directory entries split over several
requests.
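To illustrate the difference, here is a reduced stand-alone sketch (the Node type and helper names are hypothetical simplifications, not the actual QEMU synth driver code): the unpatched lookup walks the chained list from the head for every offset, which makes a full directory scan O(n^2), while the hack caches the previously returned node so each subsequent call is O(1):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the synth driver's chained node list. */
typedef struct Node {
    int id;
    struct Node *next;
} Node;

/* O(n) per call: walk from the head for every requested offset, as the
 * unpatched synth_get_dentry() effectively does -> O(n^2) for a full scan. */
static Node *get_nth(Node *head, long off)
{
    Node *n = head;
    for (long i = 0; n && i < off; i++) {
        n = n->next;
    }
    return n;
}

/* O(1) per call: remember the previously returned node, as the hack does.
 * Global state, so only usable for one isolated directory scan. */
static Node *cached;

static Node *read_next(Node *head)
{
    cached = cached ? cached->next : head;
    return cached;
}
```

As the commit message notes, such global caching is only acceptable for an isolated benchmark, never for production code.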

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p-synth.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 7eb210ffa8..54dc30f37b 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -225,7 +225,8 @@ static void synth_direntry(V9fsSynthNode *node,
 }
 
 static struct dirent *synth_get_dentry(V9fsSynthNode *dir,
-                                            struct dirent *entry, off_t off)
+                                       struct dirent *entry, off_t off,
+                                       V9fsSynthNode **hack)
 {
     int i = 0;
     V9fsSynthNode *node;
@@ -243,16 +244,38 @@ static struct dirent *synth_get_dentry(V9fsSynthNode *dir,
         /* end of directory */
         return NULL;
     }
+    *hack = node;
     synth_direntry(node, entry, off);
     return entry;
 }
 
 static struct dirent *synth_readdir(FsContext *ctx, V9fsFidOpenState *fs)
 {
-    struct dirent *entry;
+    struct dirent *entry = NULL;
     V9fsSynthOpenState *synth_open = fs->private;
     V9fsSynthNode *node = synth_open->node;
-    entry = synth_get_dentry(node, &synth_open->dent, synth_open->offset);
+
+    /*
+     * HACK: This is just intended for the benchmark, to avoid the severe
+     * n-square performance problem of the synth driver's readdir
+     * implementation, which would otherwise unnecessarily taint the
+     * benchmark results. By caching (globally) the node of the previous
+     * synth_readdir() call, we can proceed to the next node efficiently.
+     *
+     * not a good idea for any production code ;-)
+     */
+    static struct V9fsSynthNode *cachedNode;
+
+    if (!cachedNode) {
+        entry = synth_get_dentry(node, &synth_open->dent, synth_open->offset,
+                                 &cachedNode);
+    } else {
+        cachedNode = cachedNode->sibling.le_next;
+        if (cachedNode) {
+            entry = &synth_open->dent;
+            synth_direntry(cachedNode, entry, synth_open->offset + 1);
+        }
+    }
     if (entry) {
         synth_open->offset++;
     }
-- 
2.20.1




* [PATCH v3 10/11] 9pfs: T_readdir latency optimization
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (8 preceding siblings ...)
  2020-01-13 23:16 ` [PATCH v3 09/11] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir() Christian Schoenebeck
@ 2020-01-13 23:17 ` Christian Schoenebeck
  2020-01-13 23:17 ` [PATCH v3 11/11] hw/9pfs/9p.c: benchmark time on T_readdir request Christian Schoenebeck
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

Make top half really top half and bottom half really bottom half:

Handling of a T_readdir request currently hops between threads (the
main I/O thread and background I/O driver threads) several times for
every individual directory entry, which adds up to huge latencies
for handling just a single T_readdir request.

Instead of doing that, now collect all required directory entries
(including any stat buffers potentially required for each entry) in
one go on a background I/O thread from the fs driver, then assemble
the entire resulting network response message for the readdir
request on the main I/O thread. The fs driver still aborts the
directory entry retrieval loop (on the background I/O thread) as
soon as it would exceed the client's requested maximum R_readdir
response size, so we should not incur any performance penalty by
doing this.
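The size-bounded collection loop described above can be sketched as follows (a simplified illustration of the bottom-half logic; collect() is a hypothetical helper, and the per-entry wire size uses strlen as a stand-in for v9fs_string_size() in the 24 + name-length formula):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wire size of one Rreaddir entry:
 * qid[13] + offset[8] + type[1] + name length prefix[2] + name bytes. */
static size_t entry_size(const char *name)
{
    return 24 + strlen(name);
}

/* Collect as many entries as fit into maxsize, in one go, as the
 * background-thread loop does; returns the number of entries that fit
 * and writes the accumulated response body size to *size_out. */
static int collect(const char **names, int n, int maxsize, int *size_out)
{
    int size = 0, count = 0;
    for (int i = 0; i < n; i++) {
        int len = (int)entry_size(names[i]);
        if (size + len > maxsize) {
            break; /* not an error: just stop before exceeding the limit */
        }
        size += len;
        count++;
    }
    *size_out = size;
    return count;
}
```

Because the loop stops exactly where the old per-entry implementation would have stopped, batching does not change which entries are returned, only how many thread hops are needed to return them.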

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p.c    | 124 +++++++++++++++-----------------
 hw/9pfs/9p.h    |  23 ++++++
 hw/9pfs/codir.c | 183 +++++++++++++++++++++++++++++++++++++++++++++---
 hw/9pfs/coth.h  |   3 +
 4 files changed, 254 insertions(+), 79 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 30da2fedf3..5ebd424af9 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -971,30 +971,6 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
     return 0;
 }
 
-static int coroutine_fn dirent_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
-                                      struct dirent *dent, V9fsQID *qidp)
-{
-    struct stat stbuf;
-    V9fsPath path;
-    int err;
-
-    v9fs_path_init(&path);
-
-    err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path);
-    if (err < 0) {
-        goto out;
-    }
-    err = v9fs_co_lstat(pdu, &path, &stbuf);
-    if (err < 0) {
-        goto out;
-    }
-    err = stat_to_qid(pdu, &stbuf, qidp);
-
-out:
-    v9fs_path_free(&path);
-    return err;
-}
-
 V9fsPDU *pdu_alloc(V9fsState *s)
 {
     V9fsPDU *pdu = NULL;
@@ -2314,7 +2290,7 @@ out_nofid:
     pdu_complete(pdu, err);
 }
 
-static size_t v9fs_readdir_data_size(V9fsString *name)
+size_t v9fs_readdir_response_size(V9fsString *name)
 {
     /*
      * Size of each dirent on the wire: size of qid (13) + size of offset (8)
@@ -2323,6 +2299,18 @@ static size_t v9fs_readdir_data_size(V9fsString *name)
     return 24 + v9fs_string_size(name);
 }
 
+static void v9fs_free_dirents(struct V9fsDirEnt *e)
+{
+    struct V9fsDirEnt *next = NULL;
+
+    for (; e; e = next) {
+        next = e->next;
+        g_free(e->dent);
+        g_free(e->st);
+        g_free(e);
+    }
+}
+
 static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
                                         int32_t max_count)
 {
@@ -2331,54 +2319,53 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
     V9fsString name;
     int len, err = 0;
     int32_t count = 0;
-    off_t saved_dir_pos;
     struct dirent *dent;
+    struct stat *st;
+    struct V9fsDirEnt *entries = NULL;
 
-    /* save the directory position */
-    saved_dir_pos = v9fs_co_telldir(pdu, fidp);
-    if (saved_dir_pos < 0) {
-        return saved_dir_pos;
-    }
-
-    while (1) {
-        v9fs_readdir_lock(&fidp->fs.dir);
+    /*
+     * inode remapping requires the device id, which in turn might be
+     * different for different directory entries, so if inode remapping is
+     * enabled we have to make a full stat for each directory entry
+     */
+    const bool dostat = pdu->s->ctx.export_flags & V9FS_REMAP_INODES;
 
-        err = v9fs_co_readdir(pdu, fidp, &dent);
-        if (err || !dent) {
-            break;
-        }
-        v9fs_string_init(&name);
-        v9fs_string_sprintf(&name, "%s", dent->d_name);
-        if ((count + v9fs_readdir_data_size(&name)) > max_count) {
-            v9fs_readdir_unlock(&fidp->fs.dir);
+    /*
+     * Fetch all required directory entries in one go on a background IO
+     * thread from the fs driver. We don't want to do that for each entry
+     * individually, because hopping between threads (this main IO thread
+     * and a background IO driver thread) would add up to huge latencies.
+     */
+    count = v9fs_co_readdir_lowlat(pdu, fidp, &entries, max_count, dostat);
+    if (count < 0) {
+        err = count;
+        count = 0;
+        goto out;
+    }
+    count = 0;
 
-            /* Ran out of buffer. Set dir back to old position and return */
-            v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
-            v9fs_string_free(&name);
-            return count;
-        }
+    for (struct V9fsDirEnt *e = entries; e; e = e->next) {
+        dent = e->dent;
 
         if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
-            /*
-             * dirent_to_qid() implies expensive stat call for each entry,
-             * we must do that here though since inode remapping requires
-             * the device id, which in turn might be different for
-             * different entries; we cannot make any assumption to avoid
-             * that here.
-             */
-            err = dirent_to_qid(pdu, fidp, dent, &qid);
+            st = e->st;
+            /* e->st should never be NULL, but just to be sure */
+            if (!st) {
+                err = -1;
+                break;
+            }
+
+            /* remap inode */
+            err = stat_to_qid(pdu, st, &qid);
             if (err < 0) {
-                v9fs_readdir_unlock(&fidp->fs.dir);
-                v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
-                v9fs_string_free(&name);
-                return err;
+                break;
             }
         } else {
             /*
              * Fill up just the path field of qid because the client uses
              * only that. To fill the entire qid structure we will have
              * to stat each dirent found, which is expensive. For the
-             * latter reason we don't call dirent_to_qid() here. Only drawback
+             * latter reason we don't call stat_to_qid() here. Only drawback
              * is that no multi-device export detection of stat_to_qid()
              * would be done and provided as error to the user here. But
              * user would get that error anyway when accessing those
@@ -2391,25 +2378,26 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
             qid.version = 0;
         }
 
+        v9fs_string_init(&name);
+        v9fs_string_sprintf(&name, "%s", dent->d_name);
+
         /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */
         len = pdu_marshal(pdu, 11 + count, "Qqbs",
                           &qid, dent->d_off,
                           dent->d_type, &name);
 
-        v9fs_readdir_unlock(&fidp->fs.dir);
+        v9fs_string_free(&name);
 
         if (len < 0) {
-            v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
-            v9fs_string_free(&name);
-            return len;
+            err = len;
+            break;
         }
+
         count += len;
-        v9fs_string_free(&name);
-        saved_dir_pos = dent->d_off;
     }
 
-    v9fs_readdir_unlock(&fidp->fs.dir);
-
+out:
+    v9fs_free_dirents(entries);
     if (err < 0) {
         return err;
     }
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index 6fffe44f5a..1dbb0ad189 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -215,6 +215,28 @@ static inline void v9fs_readdir_init(V9fsDir *dir)
     qemu_mutex_init(&dir->readdir_mutex);
 }
 
+/*
+ * Type for 9p fs drivers' (a.k.a. 9p backends) result of readdir requests,
+ * which is a chained list of directory entries.
+ */
+typedef struct V9fsDirEnt {
+    /* mandatory (must not be NULL) information for all readdir requests */
+    struct dirent *dent;
+    /*
+     * optional (may be NULL): A full stat of each directory entry is just
+     * done if explicitly told to fs driver.
+     */
+    struct stat *st;
+    /*
+     * Instead of an array, directory entries are always returned as a
+     * chained list; that's because the number of entries retrieved by fs
+     * drivers depends on the individual entries' names (since response
+     * messages are size limited), so the final number cannot be
+     * estimated beforehand.
+     */
+    struct V9fsDirEnt *next;
+} V9fsDirEnt;
+
 /*
  * Filled by fs driver on open and other
  * calls.
@@ -419,6 +441,7 @@ void v9fs_path_init(V9fsPath *path);
 void v9fs_path_free(V9fsPath *path);
 void v9fs_path_sprintf(V9fsPath *path, const char *fmt, ...);
 void v9fs_path_copy(V9fsPath *dst, const V9fsPath *src);
+size_t v9fs_readdir_response_size(V9fsString *name);
 int v9fs_name_to_path(V9fsState *s, V9fsPath *dirpath,
                       const char *name, V9fsPath *path);
 int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
index 73f9a751e1..6ce7dc8cde 100644
--- a/hw/9pfs/codir.c
+++ b/hw/9pfs/codir.c
@@ -18,28 +18,189 @@
 #include "qemu/main-loop.h"
 #include "coth.h"
 
+/*
+ * This is solely executed on a background IO thread.
+ */
+static int do_readdir(V9fsPDU *pdu, V9fsFidState *fidp, struct dirent **dent)
+{
+    int err = 0;
+    V9fsState *s = pdu->s;
+    struct dirent *entry;
+
+    errno = 0;
+    entry = s->ops->readdir(&s->ctx, &fidp->fs);
+    if (!entry && errno) {
+        *dent = NULL;
+        err = -errno;
+    } else {
+        *dent = entry;
+    }
+    return err;
+}
+
+/*
+ * TODO: This will be removed for performance reasons.
+ * Use v9fs_co_readdir_lowlat() instead.
+ */
 int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
                                  struct dirent **dent)
 {
     int err;
-    V9fsState *s = pdu->s;
 
     if (v9fs_request_cancelled(pdu)) {
         return -EINTR;
     }
-    v9fs_co_run_in_worker(
-        {
-            struct dirent *entry;
+    v9fs_co_run_in_worker({
+        err = do_readdir(pdu, fidp, dent);
+    });
+    return err;
+}
+
+/*
+ * This is solely executed on a background IO thread.
+ *
+ * See v9fs_co_readdir_lowlat() (as its only user) below for details.
+ */
+static int do_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
+                             struct V9fsDirEnt **entries,
+                             int32_t maxsize, bool dostat)
+{
+    V9fsState *s = pdu->s;
+    V9fsString name;
+    int len, err = 0;
+    int32_t size = 0;
+    off_t saved_dir_pos;
+    struct dirent *dent;
+    struct V9fsDirEnt *e = NULL;
+    V9fsPath path;
+    struct stat stbuf;
+
+    *entries = NULL;
+    v9fs_path_init(&path);
+
+    /*
+     * TODO: There should be a warn_report_once() here if taking the lock
+     * blocked.
+     *
+     * With a well-behaved 9p client we should not run into concurrency
+     * here, because such a client would not use the same fid for concurrent
+     * requests. We still take the lock for safety; however a contending
+     * client would suffer performance issues, so it is better to log that.
+     */
+    v9fs_readdir_lock(&fidp->fs.dir);
+
+    /* save the directory position */
+    saved_dir_pos = s->ops->telldir(&s->ctx, &fidp->fs);
+    if (saved_dir_pos < 0) {
+        err = saved_dir_pos;
+        goto out;
+    }
+
+    while (true) {
+        /* get directory entry from fs driver */
+        err = do_readdir(pdu, fidp, &dent);
+        if (err || !dent) {
+            break;
+        }
+
+        /*
+         * stop this loop as soon as it would exceed the allowed maximum
+         * response message size for the directory entries collected so far,
+         * because anything beyond that size would need to be discarded by
+         * the 9p controller (main thread / top half) anyway
+         */
+        v9fs_string_init(&name);
+        v9fs_string_sprintf(&name, "%s", dent->d_name);
+        len = v9fs_readdir_response_size(&name);
+        v9fs_string_free(&name);
+        if (size + len > maxsize) {
+            /* this is not an error case actually */
+            break;
+        }
+
+        /* append next node to result chain */
+        if (!e) {
+            *entries = e = g_malloc0(sizeof(V9fsDirEnt));
+        } else {
+            e = e->next = g_malloc0(sizeof(V9fsDirEnt));
+        }
+        e->dent = g_malloc0(sizeof(struct dirent));
+        memcpy(e->dent, dent, sizeof(struct dirent));
 
-            errno = 0;
-            entry = s->ops->readdir(&s->ctx, &fidp->fs);
-            if (!entry && errno) {
+        /* perform a full stat() for directory entry if requested by caller */
+        if (dostat) {
+            err = s->ops->name_to_path(
+                &s->ctx, &fidp->path, dent->d_name, &path
+            );
+            if (err < 0) {
                 err = -errno;
-            } else {
-                *dent = entry;
-                err = 0;
+                break;
             }
-        });
+
+            err = s->ops->lstat(&s->ctx, &path, &stbuf);
+            if (err < 0) {
+                err = -errno;
+                break;
+            }
+
+            e->st = g_malloc0(sizeof(struct stat));
+            memcpy(e->st, &stbuf, sizeof(struct stat));
+        }
+
+        size += len;
+        saved_dir_pos = dent->d_off;
+    }
+
+    /* restore (last) saved position */
+    s->ops->seekdir(&s->ctx, &fidp->fs, saved_dir_pos);
+
+out:
+    v9fs_readdir_unlock(&fidp->fs.dir);
+    v9fs_path_free(&path);
+    if (err < 0) {
+        return err;
+    }
+    return size;
+}
+
+/**
+ * @brief Low latency variant of fs driver readdir handling.
+ *
+ * Retrieves the requested (max. amount of) directory entries from the fs
+ * driver. This function must only be called by the main IO thread (top half).
+ * Internally this function call will be dispatched to a background IO thread
+ * (bottom half) where it is eventually executed by the fs driver.
+ *
+ * The old readdir implementation above always retrieves just one directory
+ * entry per call. The problem with that implementation is that latency is
+ * added for the retrieval of each directory entry, which in practice led to
+ * latencies of several hundred ms for a readdir of only one directory.
+ *
+ * This is avoided in this function by letting the fs driver retrieve all
+ * required directory entries with only one call of this function and hence
+ * with only a single fs driver request.
+ *
+ * @param pdu - the causing 9p (T_readdir) client request
+ * @param fidp - already opened directory where readdir shall be performed on
+ * @param entries - output for directory entries (must not be NULL)
+ * @param maxsize - maximum result message body size (in bytes)
+ * @param dostat - whether a stat() should be performed and returned for
+ *                 each directory entry
+ * @returns resulting response message body size (in bytes) on success,
+ *          negative error code otherwise
+ */
+int coroutine_fn v9fs_co_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
+                                        struct V9fsDirEnt **entries,
+                                        int32_t maxsize, bool dostat)
+{
+    int err = 0;
+
+    if (v9fs_request_cancelled(pdu)) {
+        return -EINTR;
+    }
+    v9fs_co_run_in_worker({
+        err = do_readdir_lowlat(pdu, fidp, entries, maxsize, dostat);
+    });
     return err;
 }
 
diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h
index c2cdc7a9ea..1249dbe6df 100644
--- a/hw/9pfs/coth.h
+++ b/hw/9pfs/coth.h
@@ -49,6 +49,9 @@
 void co_run_in_worker_bh(void *);
 int coroutine_fn v9fs_co_readlink(V9fsPDU *, V9fsPath *, V9fsString *);
 int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent **);
+int coroutine_fn v9fs_co_readdir_lowlat(V9fsPDU *, V9fsFidState *,
+                                        struct V9fsDirEnt **,
+                                        int32_t, bool);
 off_t coroutine_fn v9fs_co_telldir(V9fsPDU *, V9fsFidState *);
 void coroutine_fn v9fs_co_seekdir(V9fsPDU *, V9fsFidState *, off_t);
 void coroutine_fn v9fs_co_rewinddir(V9fsPDU *, V9fsFidState *);
-- 
2.20.1




* [PATCH v3 11/11] hw/9pfs/9p.c: benchmark time on T_readdir request
  2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
                   ` (9 preceding siblings ...)
  2020-01-13 23:17 ` [PATCH v3 10/11] 9pfs: T_readdir latency optimization Christian Schoenebeck
@ 2020-01-13 23:17 ` Christian Schoenebeck
  10 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

This patch is not intended to be merged; it measures
and prints the time the 9p server spends on handling
a T_readdir request. It prints the total time spent
on handling the request, as well as the time spent
on I/O (fs driver) only.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 5ebd424af9..7cdf18080f 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2311,6 +2311,15 @@ static void v9fs_free_dirents(struct V9fsDirEnt *e)
     }
 }
 
+static double wall_time(void)
+{
+    struct timeval t;
+    struct timezone tz;
+    gettimeofday(&t, &tz);
+    return t.tv_sec + t.tv_usec * 0.000001;
+}
+
+
 static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
                                         int32_t max_count)
 {
@@ -2330,6 +2339,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
      */
     const bool dostat = pdu->s->ctx.export_flags & V9FS_REMAP_INODES;
 
+    const double start = wall_time();
+
     /*
      * Fetch all required directory entries altogether on a background IO
      * thread from fs driver. We don't want to do that for each entry
@@ -2344,6 +2355,10 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
     }
     count = 0;
 
+    const double end = wall_time();
+    printf("\n\nTime 9p server spent on synth_readdir() I/O only (synth "
+           "driver): %fs\n", end - start);
+
     for (struct V9fsDirEnt *e = entries; e; e = e->next) {
         dent = e->dent;
 
@@ -2416,6 +2431,8 @@ static void coroutine_fn v9fs_readdir(void *opaque)
     V9fsPDU *pdu = opaque;
     V9fsState *s = pdu->s;
 
+    const double start = wall_time();
+
     retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
                            &initial_offset, &max_count);
     if (retval < 0) {
@@ -2459,6 +2476,10 @@ out:
     put_fid(pdu, fidp);
 out_nofid:
     pdu_complete(pdu, retval);
+
+    const double end = wall_time();
+    printf("Time 9p server spent on entire T_readdir request: %fs "
+           "[IMPORTANT]\n", end - start);
 }
 
 static int v9fs_xattr_write(V9fsState *s, V9fsPDU *pdu, V9fsFidState *fidp,
-- 
2.20.1




* [PATCH v3 00/11] 9pfs: readdir optimization
@ 2020-01-13 23:18 Christian Schoenebeck
  2020-01-13 22:20 ` [PATCH v3 01/11] tests/virtio-9p: add terminating null in v9fs_string_read() Christian Schoenebeck
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-13 23:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

As previously mentioned, I was investigating performance issues with 9pfs.
Raw file read/write performance of 9pfs is actually quite good, provided
that the client picked a reasonably high msize (maximum message size). I
would recommend logging a warning on the 9p server side if a client
attaches with a small msize that would cause performance issues for that
reason.

However there are other aspects where 9pfs currently performs suboptimally;
in particular, readdir handling is extremely slow: a simple readdir request
from a guest typically blocks for several hundred milliseconds or even
several seconds, no matter how powerful the underlying hardware is.
The reason for this performance issue is latency.
Currently 9pfs dispatches a T_readdir request back and forth between the
main I/O thread and a background I/O thread numerous times; in fact it hops
between threads multiple times for every single directory entry during
T_readdir request handling, which in total leads to huge latencies for a
single T_readdir request.

This patch series aims to address this severe performance issue of 9pfs
T_readdir request handling. The actual performance fix is patch 10. I also
provide a convenient benchmark for comparing the performance improvements
using the 9pfs "synth" driver (see patch 8 for instructions on how to run
the benchmark), so no guest OS installation is required to perform this
benchmark A/B comparison. With patch 10 I achieved a performance improvement
of factor 40 on my test machine.

** NOTE: ** As outlined in patch 7 there seems to be an outstanding issue
(both with the current, unoptimized readdir code and with the new, optimized
readdir code) causing a transport error with split readdir requests. This
issue only occurs if patch 7 is applied. I haven't investigated the cause of
this issue yet; it looks like a memory issue though. I am not sure whether it
is a problem with the actual 9p server or rather "just" with the test
environment. Apart from that issue, split readdir seems to work well with
the new performance-optimized readdir code too.

v2->v3:

  * NEW patch: require msize >= 4096 [patch 2].

  * Shortened commit log message [patch 3]
    (since previously mentioned issue now addressed by new patch 2).

  * Merged previous 2 test case patches into one -> [patch 5]
    (since trivial enough for one patch).

  * Fixed code style issue [patch 5].

  * Fixed memory leak in test case [patch 5]
    (missing v9fs_req_free() in v9fs_rreaddir()).

  * NEW patch: added split readdir test [patch 6].

  * NEW patch: failing split readdir issue [patch 7]
    (see issue description above).

  * Adjusted commit log message [patch 9]
    (noting that this patch would break the new split readdir test).

  * Fixed comment in code [patch 10].

Christian Schoenebeck (11):
  tests/virtio-9p: add terminating null in v9fs_string_read()
  9pfs: require msize >= 4096
  9pfs: validate count sent by client with T_readdir
  hw/9pfs/9p-synth: added directory for readdir test
  tests/virtio-9p: added readdir test
  tests/virtio-9p: added splitted readdir test
  tests/virtio-9p: failing splitted readdir test
  9pfs: readdir benchmark
  hw/9pfs/9p-synth: avoid n-square issue in synth_readdir()
  9pfs: T_readdir latency optimization
  hw/9pfs/9p.c: benchmark time on T_readdir request

 hw/9pfs/9p-synth.c     |  48 ++++++-
 hw/9pfs/9p-synth.h     |   5 +
 hw/9pfs/9p.c           | 163 +++++++++++++----------
 hw/9pfs/9p.h           |  34 +++++
 hw/9pfs/codir.c        | 183 ++++++++++++++++++++++++--
 hw/9pfs/coth.h         |   3 +
 tests/virtio-9p-test.c | 287 ++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 640 insertions(+), 83 deletions(-)

-- 
2.20.1




* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-13 22:21 ` [PATCH v3 02/11] 9pfs: require msize >= 4096 Christian Schoenebeck
@ 2020-01-16 13:15   ` Greg Kurz
  2020-01-16 16:16     ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-16 13:15 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Mon, 13 Jan 2020 23:21:04 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> A client establishes a session by sending a Tversion request along with
> a 'msize' parameter which client uses to suggest server a maximum
> message size ever to be used for communication (for both requests and
> replies) between client and server during that session. If client
> suggests a 'msize' smaller than 4096 then deny session by server
> immediately with an error response (Rlerror for "9P2000.L" clients or
> Rerror for "9P2000.u" clients) instead of replying with Rversion.
> 
> Introduction of a minimum msize is required to prevent issues in
> responses to numerous individual request types. For instance with a
> msize of < P9_IOHDRSZ no useful operations would be possible at all.

P9_IOHDRSZ is really specific to write/read operations. 

/*
 * ample room for Twrite/Rread header
 * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
 */
#define P9_IOHDRSZ 24

As you see P9_IOHDRSZ is approximately the size of a Twrite header.
Its primary use is to inform the client about the 'count' to use for
Twrite/Tread messages (see get_iounit()).
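Greg's point about P9_IOHDRSZ can be sketched numerically (a hypothetical reduction with an illustrative iounit_for() helper; the real get_iounit() also takes the host filesystem's block size into account):

```c
#include <assert.h>
#include <stdint.h>

/* Ample room for a Twrite/Rread header:
 * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4] */
#define P9_IOHDRSZ 24

/* The 'count' advertised for Tread/Twrite payloads must leave room for
 * the message header within the negotiated msize. */
static uint32_t iounit_for(uint32_t msize)
{
    return msize - P9_IOHDRSZ;
}
```

This is why a header-size constant specific to read/write operations is not the natural reference point for a session-wide minimum msize.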

Not sure it helps to mention P9_IOHDRSZ since we're going to choose
something much greater. I'd personally drop this sentence.

> Furthermore there are some responses which are not allowed by the 9p
> protocol to be truncated like e.g. Rreadlink which may yield up to

No message may be truncated in any way actually. The spec just allows
an exception with the string part of Rerror.

Maybe just mention that and say we choose 4096 to be able to send
big Rreadlink messages.

> a size of PATH_MAX which is usually 4096. Hence this size was chosen
> as min. msize for server, which is already the minimum msize of the
> Linux kernel's 9pfs client. By forcing a min. msize already at
> session start (when handling Tversion) we don't have to check for a
> minimum msize on a per request type basis later on during session,
> which would be much harder and more error prone to maintain.
> 
> This is a user visible change which should be documented as such
> (i.e. in public QEMU release changelog).
> 

This last sentence isn't informative in the commit message. This
kind of indication should be added after the --- below.

> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---

LGTM

With an updated changelog,

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p.c | 12 ++++++++++++
>  hw/9pfs/9p.h | 11 +++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 520177f40c..a5fbe821d4 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
>          s->proto_version = V9FS_PROTO_2000L;
>      } else {
>          v9fs_string_sprintf(&version, "unknown");
> +        /* skip min. msize check, reporting invalid version has priority */
> +        goto marshal;
>      }
>  
> +    if (s->msize < P9_MIN_MSIZE) {
> +        err = -EMSGSIZE;
> +        error_report(
> +            "9pfs: Client requested msize < minimum msize ("
> +            stringify(P9_MIN_MSIZE) ") supported by this server."
> +        );
> +        goto out;
> +    }
> +
> +marshal:
>      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
>      if (err < 0) {
>          goto out;
> diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> index 3904f82901..6fffe44f5a 100644
> --- a/hw/9pfs/9p.h
> +++ b/hw/9pfs/9p.h
> @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
>      V9FS_PROTO_2000L = 0x02,
>  } P9ProtoVersion;
>  
> +/**
> + * @brief Minimum message size supported by this 9pfs server.
> + *
> + * A client establishes a session by sending a Tversion request along with a
> + * 'msize' parameter which suggests the server a maximum message size ever to be
> + * used for communication (for both requests and replies) between client and
> + * server during that session. If client suggests a 'msize' smaller than this
> + * value then session is denied by server with an error response.
> + */
> +#define P9_MIN_MSIZE    4096
> +
>  #define P9_NOTAG    UINT16_MAX
>  #define P9_NOFID    UINT32_MAX
>  #define P9_MAXWELEM 16




* Re: [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir
  2020-01-13 22:22 ` [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir Christian Schoenebeck
@ 2020-01-16 13:33   ` Greg Kurz
  2020-01-16 16:51     ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-16 13:33 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Mon, 13 Jan 2020 23:22:08 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> A good 9p client sends T_readdir with "count" parameter that's
> sufficiently smaller than client's initially negotiated msize
> (maximum message size). We perform a check for that though to
> avoid the server to be interrupted with a "Failed to encode
> VirtFS reply type 41" error message by bad clients.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
>  hw/9pfs/9p.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index a5fbe821d4..30da2fedf3 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -2426,6 +2426,7 @@ static void coroutine_fn v9fs_readdir(void *opaque)
>      int32_t count;
>      uint32_t max_count;
>      V9fsPDU *pdu = opaque;
> +    V9fsState *s = pdu->s;
>  
>      retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
>                             &initial_offset, &max_count);
> @@ -2434,6 +2435,13 @@ static void coroutine_fn v9fs_readdir(void *opaque)
>      }
>      trace_v9fs_readdir(pdu->tag, pdu->id, fid, initial_offset, max_count);
>  
> +    if (max_count > s->msize - P9_IOHDRSZ) {

P9_IOHDRSZ relates to Twrite. The Rreaddir message has a smaller header
of size 11:

size[4] Rreaddir tag[2] count[4]

> +        max_count = s->msize - P9_IOHDRSZ;
> +        warn_report_once(
> +            "9p: bad client: T_readdir with count > msize - P9_IOHDRSZ"
> +        );
> +    }
> +
>      fidp = get_fid(pdu, fid);
>      if (fidp == NULL) {
>          retval = -EINVAL;



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-16 13:15   ` Greg Kurz
@ 2020-01-16 16:16     ` Christian Schoenebeck
  2020-01-16 18:07       ` Greg Kurz
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-16 16:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Christian Schoenebeck

On Thursday, 16 January 2020 14:15:03 CET Greg Kurz wrote:

> On Mon, 13 Jan 2020 23:21:04 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > A client establishes a session by sending a Tversion request along with
> > a 'msize' parameter which client uses to suggest server a maximum
> > message size ever to be used for communication (for both requests and
> > replies) between client and server during that session. If client
> > suggests a 'msize' smaller than 4096 then deny session by server
> > immediately with an error response (Rlerror for "9P2000.L" clients or
> > Rerror for "9P2000.u" clients) instead of replying with Rversion.
> > 
> > Introduction of a minimum msize is required to prevent issues in
> > responses to numerous individual request types. For instance with a
> > msize of < P9_IOHDRSZ no useful operations would be possible at all.
> 
> P9_IOHDRSZ is really specific to write/read operations.
> 
> /*
>  * ample room for Twrite/Rread header
>  * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
>  */
> #define P9_IOHDRSZ 24
> 
> As you see P9_IOHDRSZ is approximately the size of a Twrite header.
> Its primary use is to inform the client about the 'count' to use for
> Twrite/Tread messages (see get_iounit()).
> 
> Not sure it helps to mention P9_IOHDRSZ since we're going to choose
> something much greater. I'd personally drop this sentence.

The point here was that there must be some kind of absolute minimum msize, 
even though the protocol specs officially don't mandate one. And if a client 
chooses a msize < P9_IOHDRSZ, what useful operations would it be able to 
perform? That's why I mentioned P9_IOHDRSZ, just in case somebody might later 
come up asking to lower that minimum msize value for whatever reason.

But np; IMO this sentence makes the fundamental requirement for this patch 
clearer, but I have no problem dropping it.

> > Furthermore there are some responses which are not allowed by the 9p
> > protocol to be truncated like e.g. Rreadlink which may yield up to
> 
> No message may be truncated in any way actually. The spec just allows
> an exception with the string part of Rerror.

Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just 
change that to s/some/most/ semantically.

> Maybe just mention that and say we choose 4096 to be able to send
> big Rreadlink messages.

That's not the point of this patch. The main purpose is to avoid having to 
maintain individual, error-prone minimum msize checks all over the code base.
If we allowed very small msizes (much smaller than 4096), then we would need 
to add numerous error handlers all over the code base, each one checking 
for different values (depending on the specific message type).

> > a size of PATH_MAX which is usually 4096. Hence this size was chosen
> > as min. msize for server, which is already the minimum msize of the
> > Linux kernel's 9pfs client. By forcing a min. msize already at
> > session start (when handling Tversion) we don't have to check for a
> > minimum msize on a per request type basis later on during session,
> > which would be much harder and more error prone to maintain.
> > 
> > This is a user visible change which should be documented as such
> > (i.e. in public QEMU release changelog).
> 
> This last sentence isn't informative in the commit message. This
> kind of indication should be added after the --- below.
> 
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---

np

> 
> LGTM
> 
> With an updated changelog,
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> >  hw/9pfs/9p.c | 12 ++++++++++++
> >  hw/9pfs/9p.h | 11 +++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 520177f40c..a5fbe821d4 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
> > 
> >          s->proto_version = V9FS_PROTO_2000L;
> >      
> >      } else {
> >      
> >          v9fs_string_sprintf(&version, "unknown");
> > 
> > +        /* skip min. msize check, reporting invalid version has priority */
> > +        goto marshal;
> > 
> >      }
> > 
> > +    if (s->msize < P9_MIN_MSIZE) {
> > +        err = -EMSGSIZE;
> > +        error_report(
> > +            "9pfs: Client requested msize < minimum msize ("
> > +            stringify(P9_MIN_MSIZE) ") supported by this server."
> > +        );
> > +        goto out;
> > +    }
> > +
> > 
> > +marshal:
> >      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
> >      if (err < 0) {
> >      
> >          goto out;
> > 
> > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> > index 3904f82901..6fffe44f5a 100644
> > --- a/hw/9pfs/9p.h
> > +++ b/hw/9pfs/9p.h
> > @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
> > 
> >      V9FS_PROTO_2000L = 0x02,
> >  
> >  } P9ProtoVersion;
> > 
> > +/**
> > + * @brief Minimum message size supported by this 9pfs server.
> > + *
> > + * A client establishes a session by sending a Tversion request along with a
> > + * 'msize' parameter which suggests the server a maximum message size ever to be
> > + * used for communication (for both requests and replies) between client and
> > + * server during that session. If client suggests a 'msize' smaller than this
> > + * value then session is denied by server with an error response.
> > + */
> > +#define P9_MIN_MSIZE    4096
> > +
> > 
> >  #define P9_NOTAG    UINT16_MAX
> >  #define P9_NOFID    UINT32_MAX
> >  #define P9_MAXWELEM 16

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir
  2020-01-16 13:33   ` Greg Kurz
@ 2020-01-16 16:51     ` Christian Schoenebeck
  2020-01-17 15:50       ` Greg Kurz
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-16 16:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Christian Schoenebeck

On Thursday, 16 January 2020 14:33:42 CET Greg Kurz wrote:
> On Mon, 13 Jan 2020 23:22:08 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > A good 9p client sends T_readdir with "count" parameter that's
> > sufficiently smaller than client's initially negotiated msize
> > (maximum message size). We perform a check for that though to
> > avoid the server to be interrupted with a "Failed to encode
> > VirtFS reply type 41" error message by bad clients.
> > 
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---
> > 
> >  hw/9pfs/9p.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index a5fbe821d4..30da2fedf3 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -2426,6 +2426,7 @@ static void coroutine_fn v9fs_readdir(void *opaque)
> > 
> >      int32_t count;
> >      uint32_t max_count;
> >      V9fsPDU *pdu = opaque;
> > 
> > +    V9fsState *s = pdu->s;
> > 
> >      retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
> >      
> >                             &initial_offset, &max_count);
> > 
> > @@ -2434,6 +2435,13 @@ static void coroutine_fn v9fs_readdir(void *opaque)
> > 
> >      }
> >      trace_v9fs_readdir(pdu->tag, pdu->id, fid, initial_offset,
> >      max_count);
> > 
> > +    if (max_count > s->msize - P9_IOHDRSZ) {
> 
> P9_IOHDRSZ relates to Twrite. The Rreaddir message has a smaller header
> of size 11:
> 
> size[4] Rreaddir tag[2] count[4]

Right, looks like I have falsely picked P9_IOHDRSZ after looking at:

static size_t v9fs_readdir_data_size(V9fsString *name)
{
    /*
     * Size of each dirent on the wire: size of qid (13) + size of offset (8)
     * size of type (1) + size of name.size (2) + strlen(name.data)
     */
    return 24 + v9fs_string_size(name);
}

I'll have to correct that in the test cases as well. So no need to comment on 
them for now.

But if you have an idea about the issue mentioned in the cover letter 
(patch 7), let me know. I have a feeling that there is some problem with the 
test environment, because I also get strange error messages when I just add 
some more 9pfs test cases (e.g. empty noop test cases doing nothing) or 
copy-paste existing tests and then run 

tests/qos-test -l

which obviously should just list the test cases, not execute any of them. 
I end up with "cannot push stack" error messages for some reason.

> > +        max_count = s->msize - P9_IOHDRSZ;
> > +        warn_report_once(
> > +            "9p: bad client: T_readdir with count > msize - P9_IOHDRSZ"
> > +        );
> > +    }
> > +
> > 
> >      fidp = get_fid(pdu, fid);
> >      if (fidp == NULL) {
> >      
> >          retval = -EINVAL;

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-16 16:16     ` Christian Schoenebeck
@ 2020-01-16 18:07       ` Greg Kurz
  2020-01-16 21:39         ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-16 18:07 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Thu, 16 Jan 2020 17:16:07 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 16. Januar 2020 14:15:03 CET Greg Kurz wrote:
> > On Mon, 13 Jan 2020 23:21:04 +0100
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > A client establishes a session by sending a Tversion request along with
> > > a 'msize' parameter which client uses to suggest server a maximum
> > > message size ever to be used for communication (for both requests and
> > > replies) between client and server during that session. If client
> > > suggests a 'msize' smaller than 4096 then deny session by server
> > > immediately with an error response (Rlerror for "9P2000.L" clients or
> > > Rerror for "9P2000.u" clients) instead of replying with Rversion.
> > > 
> > > Introduction of a minimum msize is required to prevent issues in
> > > responses to numerous individual request types. For instance with a
> > > msize of < P9_IOHDRSZ no useful operations would be possible at all.
> > 
> > P9_IOHDRSZ is really specific to write/read operations.
> > 
> > /*
> >  * ample room for Twrite/Rread header
> >  * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
> >  */
> > #define P9_IOHDRSZ 24
> > 
> > As you see P9_IOHDRSZ is approximately the size of a Twrite header.
> > Its primary use is to inform the client about the 'count' to use for
> > Twrite/Tread messages (see get_iounit()).
> > 
> > Not sure it helps to mention P9_IOHDRSZ since we're going to choose
> > something much greater. I'd personally drop this sentence.
> 
> The point here was that there must be some kind of absolute minimum msize, 

Then the absolute minimum size is 7, the size of the header part that is
common to all messages:

size[4] Message tag[2]

> even though the protocol specs officially don't mandate one. And if a client 
> choses a msize < P9_IOHDRSZ, what useful actions shall it be able to do? 

I haven't checked but I guess some messages can fit in 24 bytes.

> That's why I mentioned P9_IOHDRSZ just in case somebody might come up later 
> asking to lower that minimum msize value for whatever reason.
> 

That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
that it is the header size of a Twrite message and as such it is used
to compute the 'iounit' which the guest later uses to compute a
suitable 'count' for Tread/Twrite messages only. It shouldn't be
involved in msize considerations IMHO.

> But np, IMO this sentence makes the fundamental requirement for this patch 
> more clear, but I have no problem dropping this sentence.
> 

Then you can mention 7 as the true minimal size.

> > > Furthermore there are some responses which are not allowed by the 9p
> > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > 
> > No message may be truncated in any way actually. The spec just allows
> > an exception with the string part of Rerror.
> 
> Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just 
> change that to s/some/most/ semantically.
> 

I mean truncate with loss of data. Rreaddir can return less than 'count'
but it won't truncate an individual entry. It rewinds to the previous
one and returns its offset for the next Treaddir. Same goes with Rread
and Twrite, we always return a 'count' that can be used by the client
to continue reading or writing.

Rerror is really the only message where we can deliberately drop
data (but we actually don't do that).

> > Maybe just mention that and say we choose 4096 to be able to send
> > big Rreadlink messages.
> 
> That's not the point for this patch. The main purpose is to avoid having to 
> maintain individual error prone minimum msize checks all over the code base.
> If we would allow very small msizes (much smaller than 4096), then we would 
> need to add numerous error handlers all over the code base, each one checking 
> for different values (depending on the specific message type).
> 

I'm not sure what you mean by 'minimum msize checks all over the code base'...
Please provide an example.

> > > a size of PATH_MAX which is usually 4096. Hence this size was chosen
> > > as min. msize for server, which is already the minimum msize of the
> > > Linux kernel's 9pfs client. By forcing a min. msize already at
> > > session start (when handling Tversion) we don't have to check for a
> > > minimum msize on a per request type basis later on during session,
> > > which would be much harder and more error prone to maintain.
> > > 
> > > This is a user visible change which should be documented as such
> > > (i.e. in public QEMU release changelog).
> > 
> > This last sentence isn't informative in the commit message. This
> > kind of indication should be added after the --- below.
> > 
> > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > ---
> 
> np
> 
> > 
> > LGTM
> > 
> > With an updated changelog,
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > 
> > >  hw/9pfs/9p.c | 12 ++++++++++++
> > >  hw/9pfs/9p.h | 11 +++++++++++
> > >  2 files changed, 23 insertions(+)
> > > 
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index 520177f40c..a5fbe821d4 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > > @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
> > > 
> > >          s->proto_version = V9FS_PROTO_2000L;
> > >      
> > >      } else {
> > >      
> > >          v9fs_string_sprintf(&version, "unknown");
> > > 
> > > +        /* skip min. msize check, reporting invalid version has priority */
> > > +        goto marshal;
> > > 
> > >      }
> > > 
> > > +    if (s->msize < P9_MIN_MSIZE) {
> > > +        err = -EMSGSIZE;
> > > +        error_report(
> > > +            "9pfs: Client requested msize < minimum msize ("
> > > +            stringify(P9_MIN_MSIZE) ") supported by this server."
> > > +        );
> > > +        goto out;
> > > +    }
> > > +
> > > 
> > > +marshal:
> > >      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
> > >      if (err < 0) {
> > >      
> > >          goto out;
> > > 
> > > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> > > index 3904f82901..6fffe44f5a 100644
> > > --- a/hw/9pfs/9p.h
> > > +++ b/hw/9pfs/9p.h
> > > @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
> > > 
> > >      V9FS_PROTO_2000L = 0x02,
> > >  
> > >  } P9ProtoVersion;
> > > 
> > > +/**
> > > + * @brief Minimum message size supported by this 9pfs server.
> > > + *
> > > + * A client establishes a session by sending a Tversion request along with a
> > > + * 'msize' parameter which suggests the server a maximum message size ever to be
> > > + * used for communication (for both requests and replies) between client and
> > > + * server during that session. If client suggests a 'msize' smaller than this
> > > + * value then session is denied by server with an error response.
> > > + */
> > > +#define P9_MIN_MSIZE    4096
> > > +
> > > 
> > >  #define P9_NOTAG    UINT16_MAX
> > >  #define P9_NOFID    UINT32_MAX
> > >  #define P9_MAXWELEM 16
> 
> Best regards,
> Christian Schoenebeck
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-16 18:07       ` Greg Kurz
@ 2020-01-16 21:39         ` Christian Schoenebeck
  2020-01-17 10:24           ` Greg Kurz
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-16 21:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Christian Schoenebeck

On Thursday, 16 January 2020 19:07:48 CET Greg Kurz wrote:
> > The point here was that there must be some kind of absolute minimum msize,
> 
> Then the absolute minimum size is 7, the size of the header part that is
> common to all messages:
> 
> size[4] Message tag[2]
> 
> > even though the protocol specs officially don't mandate one. And if a
> > client choses a msize < P9_IOHDRSZ, what useful actions shall it be able
> > to do?
> I haven't checked but I guess some messages can fit in 24 bytes.
> 
> > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > later
> > asking to lower that minimum msize value for whatever reason.
> 
> That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> that it is the header size of a Twrite message and as such it is used
> to compute the 'iounit' which the guest later uses to compute a
> suitable 'count' for Tread/Twrite messages only. It shouldn't be
> involved in msize considerations IMHO.
> 
> > But np, IMO this sentence makes the fundamental requirement for this patch
> > more clear, but I have no problem dropping this sentence.
> 
> Then you can mention 7 as the true minimal size.

Ok, I'll drop P9_IOHDRSZ completely and mention the literal 7 as the absolute 
theoretical minimum instead.

> > > > Furthermore there are some responses which are not allowed by the 9p
> > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > 
> > > No message may be truncated in any way actually. The spec just allows
> > > an exception with the string part of Rerror.
> > 
> > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just
> > change that to s/some/most/ semantically.
> 
> I mean truncate with loss of data. Rreaddir can return less than 'count'
> but it won't truncate an individual entry. It rewinds to the previous
> one and returns its offset for the next Treaddir. Same goes with Rread
> and Twrite, we always return a 'count' that can be used by the client
> to continue reading or writing.

Individual entries are not truncated, correct, but there is still data loss. 
Example: you have this directory, and say the server's fs delivers entries 
via readdir() in this order (from top down):

	.
	..
	a
	1234567890
	b
	c
	d

and say 37 <= msize < 45, then the client would receive 3 entries ".", "..", 
"a" and that's it.

> Rerror is really the only message where we can deliberately drop
> data (but we actually don't do that).
> 
> > > Maybe just mention that and say we choose 4096 to be able to send
> > > big Rreadlink messages.
> > 
> > That's not the point for this patch. The main purpose is to avoid having
> > to
> > maintain individual error prone minimum msize checks all over the code
> > base. If we would allow very small msizes (much smaller than 4096), then
> > we would need to add numerous error handlers all over the code base, each
> > one checking for different values (depending on the specific message
> > type).
> 
> I'm not sure what you mean by 'minimum msize checks all over the code
> base'... Please provide an example.

1. Like done in this patch series: Treaddir has to limit the client's 'count'
   parameter according to msize (by clamping it to msize minus the reply
   header size).

2. get_iounit() does this:

		iounit = stbuf.f_bsize;
		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;

   without checking anywhere for a potential negative outcome
   (i.e. when msize < P9_IOHDRSZ)

3. Example of directory entry name length with Rreaddir above.

These are individual issues that can easily be overlooked, but are also easily 
avoided by not allowing small msizes in the first place.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-16 21:39         ` Christian Schoenebeck
@ 2020-01-17 10:24           ` Greg Kurz
  2020-01-17 12:01             ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-17 10:24 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Thu, 16 Jan 2020 22:39:19 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > The point here was that there must be some kind of absolute minimum msize,
> > 
> > Then the absolute minimum size is 7, the size of the header part that is
> > common to all messages:
> > 
> > size[4] Message tag[2]
> > 
> > > even though the protocol specs officially don't mandate one. And if a
> > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be able
> > > to do?
> > I haven't checked but I guess some messages can fit in 24 bytes.
> > 
> > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > later
> > > asking to lower that minimum msize value for whatever reason.
> > 
> > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > that it is the header size of a Twrite message and as such it is used
> > to compute the 'iounit' which the guest later uses to compute a
> > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > involved in msize considerations IMHO.
> > 
> > > But np, IMO this sentence makes the fundamental requirement for this patch
> > > more clear, but I have no problem dropping this sentence.
> > 
> > Then you can mention 7 as the true minimal size.
> 
> Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute 
> theoretical minimum instead.
> 
> > > > > Furthermore there are some responses which are not allowed by the 9p
> > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > 
> > > > No message may be truncated in any way actually. The spec just allows
> > > > an exception with the string part of Rerror.
> > > 
> > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just
> > > change that to s/some/most/ semantically.
> > 
> > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > but it won't truncate an individual entry. It rewinds to the previous
> > one and returns its offset for the next Treaddir. Same goes with Rread
> > and Twrite, we always return a 'count' that can be used by the client
> > to continue reading or writing.
> 
> Individual entries are not truncated, correct, but data loss: Example, you 
> have this directory and say server's fs would deliver entries by readdir() in 
> this order (from top down):
> 
> 	.
> 	..
> 	a
> 	1234567890
> 	b
> 	c
> 	d
> 
> and say 37 <= msize < 45, then client would receive 3 entries ".", "..", "a" 
> and that's it.
> 

Yes, msize < 45 isn't enough to send the Rreaddir for "1234567890", and since 
it is forbidden to truncate, we bail out... but we still did not send a 
truncated message.

> > Rerror is really the only message where we can deliberately drop
> > data (but we actually don't do that).
> > 
> > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > big Rreadlink messages.
> > > 
> > > That's not the point for this patch. The main purpose is to avoid having
> > > to
> > > maintain individual error prone minimum msize checks all over the code
> > > base. If we would allow very small msizes (much smaller than 4096), then
> > > we would need to add numerous error handlers all over the code base, each
> > > one checking for different values (depending on the specific message
> > > type).
> > 
> > I'm not sure what you mean by 'minimum msize checks all over the code
> > base'... Please provide an example.
> 
> 1. Like done in this patch series: Treaddir has to limit client's 'count'
>    parameter according to msize (by subtracting with msize).
> 

Hmm... this patch does a sanity check on 'count', not on 'msize'... I
mean no matter what msize is, clipping count to msize - 11 gives a
chance to stop processing the entries before overflowing the transport
buffer.

> 2. get_iounit() does this:
> 
> 		iounit = stbuf.f_bsize;
> 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> 
>    without checking anywhere for a potential negative outcome
>    (i.e. when msize < P9_IOHDRSZ)
> 

This function should even have an assert() for that, just to be
sure no one tries to call it before s->msize is even set, which
would clearly be a bug in QEMU. But this can be done in a
follow-up patch.

> 3. Example of directory entry name length with Rreaddir above.
> 

msize being too small to accommodate an Rreaddir with a single
entry is a different problem as we cannot do anything about it
at this point but fail... That's why the minimum msize should
rather be chosen with file names in mind, which are likely to
be longer than any message header. Rreadlink has the highest
potential since it will usually return a string containing a
path name (while Rreaddir entries only contain a single path
element).

> Individual issues that can easily be overlooked but also easily avoided by not 
> allowing small msizes in the first place.
> 

My point is that we're not going to check msize in Tversion in
order to avoid multiple checks everywhere. We're going to do
it there because it is the only place where it makes sense to
do it.

> Best regards,
> Christian Schoenebeck
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-17 10:24           ` Greg Kurz
@ 2020-01-17 12:01             ` Christian Schoenebeck
  2020-01-17 15:15               ` Greg Kurz
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-17 12:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Christian Schoenebeck

On Friday, 17 January 2020 11:24:21 CET Greg Kurz wrote:
> On Thu, 16 Jan 2020 22:39:19 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > > The point here was that there must be some kind of absolute minimum
> > > > msize,
> > > 
> > > Then the absolute minimum size is 7, the size of the header part that is
> > > common to all messages:
> > > 
> > > size[4] Message tag[2]
> > > 
> > > > even though the protocol specs officially don't mandate one. And if a
> > > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be
> > > > able
> > > > to do?
> > > 
> > > I haven't checked but I guess some messages can fit in 24 bytes.
> > > 
> > > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > > later
> > > > asking to lower that minimum msize value for whatever reason.
> > > 
> > > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > > that it is the header size of a Twrite message and as such it is used
> > > to compute the 'iounit' which the guest later uses to compute a
> > > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > > involved in msize considerations IMHO.
> > > 
> > > > But np, IMO this sentence makes the fundamental requirement for this
> > > > patch
> > > > more clear, but I have no problem dropping this sentence.
> > > 
> > > Then you can mention 7 as the true minimal size.
> > 
> > Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute
> > theoretical minimum instead.
> > 
> > > > > > Furthermore there are some responses which are not allowed by the
> > > > > > 9p
> > > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > > 
> > > > > No message may be truncated in any way actually. The spec just
> > > > > allows
> > > > > an exception with the string part of Rerror.
> > > > 
> > > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll
> > > > just
> > > > change that to s/some/most/ semantically.
> > > 
> > > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > > but it won't truncate an individual entry. It rewinds to the previous
> > > one and returns its offset for the next Treaddir. Same goes with Rread
> > > and Twrite, we always return a 'count' that can be used by the client
> > > to continue reading or writing.
> > 
> > Individual entries are not truncated, correct, but data loss: Example, you
> > have this directory and say server's fs would deliver entries by readdir()
> > in this order (from top down):
> > 	.
> > 	..
> > 	a
> > 	1234567890
> > 	b
> > 	c
> > 	d
> > 
> > and say 37 <= msize < 45, then client would receive 3 entries ".", "..",
> > "a" and that's it.
> 
> Yes msize < 45 isn't enough to send the Rreaddir for "1234567890"
> and since it is forbidden to truncate, we bail out... but we
> did not send a truncated message still.

I really think we're discussing semantic interpretation details here that 
get us nowhere.

> > > Rerror is really the only message where we can deliberately drop
> > > data (but we actually don't do that).
> > > 
> > > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > > big Rreadlink messages.
> > > > 
> > > > That's not the point for this patch. The main purpose is to avoid
> > > > having
> > > > to
> > > > maintain individual error prone minimum msize checks all over the code
> > > > base. If we would allow very small msizes (much smaller than 4096),
> > > > then
> > > > we would need to add numerous error handlers all over the code base,
> > > > each
> > > > one checking for different values (depending on the specific message
> > > > type).
> > > 
> > > I'm not sure what you mean by 'minimum msize checks all over the code
> > > base'... Please provide an example.
> > 
> > 1. Like done in this patch series: Treaddir has to limit client's 'count'
> > 
> >    parameter according to msize (by subtracting with msize).
> 
> Hmm... this patch does a sanity check on 'count', not on 'msize'... 

Yes ... :)

> I mean no matter what msize is, clipping count to msize - 11 gives a
> chance to stop processing the entries before overflowing the transport
> buffer.

... and no, this cannot happen if a minimum msize of 4096 is already enforced 
by Tversion. Maybe now you get my point -> it is about avoiding exactly this 
kind of issue in the first place. Most file systems have a name length limit 
of 255 bytes:

https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

So by forcing a minimum 'msize' of 4096 you avoid having to deal with this 
issue (and similar ones) at Treaddir request level (and in other request type 
handlers), including ReiserFS BTW because 4032+35 < 4096.

If you would allow smaller 'msize' values at Tversion, then you would need to 
suffer some kind of death when handling a Treaddir for a sufficiently long file 
name: either a transport error (with an error message that a normal user would 
not be able to understand at all), or an incomplete Treaddir response sequence 
ending with { Rreaddir count=0 }, or ... any other kind of death.

No matter which death you would choose here, it would be horrible from a 
usability perspective, because the system would work flawlessly most of the 
time, and the error case would only be triggered when the guest hits a file/dir 
beyond a certain name length. It is not worth it! Force 4 KiB already at 
Tversion and that's it.

> > 2. get_iounit() does this:
> > 		iounit = stbuf.f_bsize;
> > 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> > 		
> >    without checking anywhere for a potential negative outcome
> >    (i.e. when msize < P9_IOHDRSZ)
> 
> This function should even have an assert() for that, just to be
> sure no one tries to call it before s->msize is even set, which
> would clearly be a bug in QEMU. But this can be done in a
> follow-up patch.
> 
> > 3. Example of directory entry name length with Rreaddir above.
> 
> msize being too small to accommodate an Rreaddir with a single
> entry is a different problem as we cannot do anything about it
> at this point but fail... That's why the minimum msize should
> rather be chosen with file names in mind, which are likely to
> be longer than any message header. Rreadlink being the one with
> the higher potential since it will usually return a string
> containing a path name (while Rreaddir entries only contain
> a single path element).
> 
> > Individual issues that can easily be overlooked but also easily avoided by
> > not allowing small msizes in the first place.
> 
> My point is that we're not going to check msize in Tversion in
> order to avoid multiple checks everywhere. We're going to do
> it there because it is the only place where it makes sense to
> do it.

Also yes and no. Of course it makes sense to handle it already at 
Tversion. But no: you could theoretically also allow a much smaller minimum 
'msize' << 4096 (i.e. somewhere closely above 7, as we discussed), and then you 
would indeed still need to add msize checks at other places in the code base, 
as you just found out. So forcing a minimum 'msize' that is high enough avoids 
having to add such individual checks and having to deal with them in some kind 
of unpleasant way.

I hope this makes it more clear now.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-17 12:01             ` Christian Schoenebeck
@ 2020-01-17 15:15               ` Greg Kurz
  2020-01-17 16:41                 ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-17 15:15 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Fri, 17 Jan 2020 13:01:30 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Freitag, 17. Januar 2020 11:24:21 CET Greg Kurz wrote:
> > On Thu, 16 Jan 2020 22:39:19 +0100
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > > > The point here was that there must be some kind of absolute minimum
> > > > > msize,
> > > > 
> > > > Then the absolute minimum size is 7, the size of the header part that is
> > > > common to all messages:
> > > > 
> > > > size[4] Message tag[2]
> > > > 
> > > > > even though the protocol specs officially don't mandate one. And if a
> > > > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be
> > > > > able
> > > > > to do?
> > > > 
> > > > I haven't checked but I guess some messages can fit in 24 bytes.
> > > > 
> > > > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > > > later
> > > > > asking to lower that minimum msize value for whatever reason.
> > > > 
> > > > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > > > that it is the header size of a Twrite message and as such it is used
> > > > to compute the 'iounit' which the guest later uses to compute a
> > > > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > > > involved in msize considerations IMHO.
> > > > 
> > > > > But np, IMO this sentence makes the fundamental requirement for this
> > > > > patch
> > > > > more clear, but I have no problem dropping this sentence.
> > > > 
> > > > Then you can mention 7 has the true minimal size.
> > > 
> > > Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute
> > > theoretical minimum instead.
> > > 
> > > > > > > Furthermore there are some responses which are not allowed by the
> > > > > > > 9p
> > > > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > > > 
> > > > > > No message may be truncated in any way actually. The spec just
> > > > > > allows
> > > > > > an exception with the string part of Rerror.
> > > > > 
> > > > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll
> > > > > just
> > > > > change that to s/some/most/ semantically.
> > > > 
> > > > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > > > but it won't truncate an individual entry. It rewinds to the previous
> > > > one and returns its offset for the next Treaddir. Same goes with Rread
> > > > and Twrite, we always return a 'count' that can be used by the client
> > > > to continue reading or writing.
> > > 
> > > Individual entries are not truncated, correct, but there is data loss. Example: you
> > > have this directory and say the server's fs would deliver entries by readdir()
> > > in this order (from top down):
> > > 	.
> > > 	..
> > > 	a
> > > 	1234567890
> > > 	b
> > > 	c
> > > 	d
> > > 
> > > and say 37 <= msize < 45, then client would receive 3 entries ".", "..",
> > > "a" and that's it.
> > 
> > Yes msize < 45 isn't enough to send the Rreaddir for "1234567890"
> > and since it is forbidden to truncate, we bail out... but we
> > did not send a truncated message still.
> 
> I really think we're discussing semantical interpretation details here which 
> bring us nowhere.
> 

Agreed.

> > > > Rerror is really the only message where we can deliberately drop
> > > > data (but we actually don't do that).
> > > > 
> > > > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > > > big Rreadlink messages.
> > > > > 
> > > > > That's not the point for this patch. The main purpose is to avoid
> > > > > having
> > > > > to
> > > > > maintain individual error prone minimum msize checks all over the code
> > > > > base. If we would allow very small msizes (much smaller than 4096),
> > > > > then
> > > > > we would need to add numerous error handlers all over the code base,
> > > > > each
> > > > > one checking for different values (depending on the specific message
> > > > > type).
> > > > 
> > > > I'm not sure what you mean by 'minimum msize checks all over the code
> > > > base'... Please provide an example.
> > > 
> > > 1. Like done in this patch series: Treaddir has to limit client's 'count'
> > > 
> > >    parameter according to msize (by subtracting with msize).
> > 
> > Hmm... this patch does a sanity check on 'count', not on 'msize'... 
> 
> Yes ... :)
> 
> > I mean no matter what msize is, clipping count to msize - 11 gives a
> > chance to stop processing the entries before overflowing the transport
> > buffer.
> 
> ... and no, this cannot happen if minimum msize of 4096 is forced already by 
> Tversion. Maybe you now get my point -> It is about avoiding exactly such kind 

I'm not sure I see how setting a minimum msize of 4096 at Tversion would
prevent the client from passing a higher 'count' argument and luring the server
into generating a bigger-than-msize response, since it does not check
count < msize - 11 without patch 3.

> of issues in the first place. Most file systems have a name limit of 255 
> bytes:
> 
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
> 
> So by forcing a minimum 'msize' of 4096 you avoid having to deal with this  
> issue (and similar ones) on Treaddir request level (and other request type 
> handlers), including ReiserFS BTW because 4032+35 < 4096.
> 

Good to know for ReiserFS.

> If you would allow smaller 'msize' values by Tversion, then you would need to 
> suffer some kind of death when handling Treaddir with certain high file name 
> length. Either a transport error (with an error message that a normal user 
> would not be able to understand at all) or by returning an incomplete Treaddir 
> response sequence with { Rreaddir count=0 }, or ... any other kind of death.
> 

Ahh, I now understand at last your argument about Rreaddir losing data.
We may end up sending { Rreaddir count=0 } because the next entry is too
large... and thus end the readdir sequence. Mentioning this explicitly
from the start would have been more clear for me ;-)

This looks like yet another bug to me. It looks wrong to return this
special response if we have more entries to go. Also this could be the
client's _fault_ if it provides a ridiculously small value for count.
The current code will return count=0 all the same.

In any case, I think v9fs_do_readdir() should only return 0 if there
are no more entries to read. It should error out otherwise, but I'm
not sure how...

> No matter which death you would choose here, it would be horrible from 
> usability perspective, because the system would most of the time work 
> flawlessy, and an error case would just be triggered if guest hits a file/dir 
> beyond some certain name length. It is not worth it! Force 4kiB already at 
> Tversion and that's it.
> 
> > > 2. get_iounit() does this:
> > > 		iounit = stbuf.f_bsize;
> > > 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> > > 		
> > >    without checking anywhere for a potential negative outcome
> > >    (i.e. when msize < P9_IOHDRSZ)
> > 
> > This function should even have an assert() for that, just to be
> > sure no one tries to call it before s->msize is even set, which
> > would clearly be a bug in QEMU. But this can be done in a
> > follow-up patch.
> > 
> > > 3. Example of directory entry name length with Rreaddir above.
> > 
> > msize being too small to accommodate an Rreaddir with a single
> > entry is a different problem as we cannot do anything about it
> > at this point but fail... That's why the minimum msize should
> > rather be chosen with file names in mind, which are likely to
> > be longer than any message header. Rreadlink being the one with
> > the higher potential since it will usually return a string
> > containing a path name (while Rreaddir entries only contain
> > a single path element).
> > 
> > > Individual issues that can easily be overlooked but also easily avoided by
> > > not allowing small msizes in the first place.
> > 
> > My point is that we're not going to check msize in Tversion in
> > order to avoid multiple checks everywhere. We're going to do
> > it there because it is the only place where it makes sense to
> > do it.
> 
> Also yes and no. Of course it just makes sense to handle it already at 
> Tversion. But no, you could theoretically also allow much smaller minimum 
> 'msize' value << 4096 (i.e. somewhere closely >7 as we discussed), then you 
> would indeed still need to add msize checks at other places of the code base 
> as you just found out now. So forcing a minimum 'msize' which is high enough, 
> avoids having to add such individual checks and having to deal with them in 
> some kind of unpleasant way.
> 

We still don't understand each other I'm afraid... we actually have
implicit 'msize' checks already for every single thing we write on
the wire: v9fs_packunpack() which detects when we're trying to write
past the buffer. When this happens, it is propagated to the transport
which then disconnects, which is the painful thing you've been
experiencing with your readdir experiments. In the case of Rreaddir, it
really does make sense to try to avoid the disconnection like you do in
patch 3 because the readdir sequence allows _partial_ reads. Same goes
for Rread. But that's it. No other message in the protocol allows that,
so I've never thought of adding individual 'msize' checks anywhere else.
What would they do better than v9fs_packunpack() already does ?

> I hope this makes it more clear now.
> 
> Best regards,
> Christian Schoenebeck
> 
> 




* Re: [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir
  2020-01-16 16:51     ` Christian Schoenebeck
@ 2020-01-17 15:50       ` Greg Kurz
  0 siblings, 0 replies; 25+ messages in thread
From: Greg Kurz @ 2020-01-17 15:50 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Thu, 16 Jan 2020 17:51:10 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 16. Januar 2020 14:33:42 CET Greg Kurz wrote:
> > On Mon, 13 Jan 2020 23:22:08 +0100
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > A good 9p client sends T_readdir with "count" parameter that's
> > > sufficiently smaller than client's initially negotiated msize
> > > (maximum message size). We perform a check for that though to
> > > avoid the server being interrupted with a "Failed to encode
> > > VirtFS reply type 41" error message by bad clients.
> > > 
> > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > ---
> > > 
> > >  hw/9pfs/9p.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index a5fbe821d4..30da2fedf3 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > > @@ -2426,6 +2426,7 @@ static void coroutine_fn v9fs_readdir(void *opaque)
> > > 
> > >      int32_t count;
> > >      uint32_t max_count;
> > >      V9fsPDU *pdu = opaque;
> > > 
> > > +    V9fsState *s = pdu->s;
> > > 
> > >      retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
> > >      
> > >                             &initial_offset, &max_count);
> > > 
> > > @@ -2434,6 +2435,13 @@ static void coroutine_fn v9fs_readdir(void *opaque)
> > > 
> > >      }
> > >      trace_v9fs_readdir(pdu->tag, pdu->id, fid, initial_offset,
> > >      max_count);
> > > 
> > > +    if (max_count > s->msize - P9_IOHDRSZ) {
> > 
> > P9_IOHDRSZ relates to Twrite. The Rreaddir message has a smaller header
> > of size 11:
> > 
> > size[4] Rreaddir tag[2] count[4]
> 
Right, looks like I mistakenly picked P9_IOHDRSZ after looking at:
> 
> static size_t v9fs_readdir_data_size(V9fsString *name)
> {
>     /*
>      * Size of each dirent on the wire: size of qid (13) + size of offset (8)
>      * size of type (1) + size of name.size (2) + strlen(name.data)
>      */
>     return 24 + v9fs_string_size(name);
> }
> 
> I'll have to correct that in the test cases as well. So no need to comment on 
> them for now.
> 
But if you have an idea about the issue mentioned in the cover letter (patch 7), 
let me know. I have a feeling that there is some problem with the test 
environment, because I also get strange error messages when I just add e.g. 
some more no-op 9pfs test cases (empty test cases doing nothing) or copy-paste 
existing tests and then run 
> 
> tests/qos-test -l
> 
which obviously should just list the test cases, not execute any of 
them. I'd end up with "cannot push stack" error messages for some reason.
> 

No idea. I'll have to look more.

> > > +        max_count = s->msize - P9_IOHDRSZ;
> > > +        warn_report_once(
> > > +            "9p: bad client: T_readdir with count > msize - P9_IOHDRSZ"
> > > +        );
> > > +    }
> > > +
> > > 
> > >      fidp = get_fid(pdu, fid);
> > >      if (fidp == NULL) {
> > >      
> > >          retval = -EINVAL;
> 
> Best regards,
> Christian Schoenebeck
> 
> 




* Re: [PATCH v3 05/11] tests/virtio-9p: added readdir test
  2020-01-13 23:08 ` [PATCH v3 05/11] tests/virtio-9p: added " Christian Schoenebeck
@ 2020-01-17 15:51   ` Greg Kurz
  2020-01-17 16:44     ` Christian Schoenebeck
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Kurz @ 2020-01-17 15:51 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel

On Tue, 14 Jan 2020 00:08:51 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> The first readdir test simply checks the amount of directory
> entries returned by 9pfs server, according to the created amount
> of virtual files on 9pfs synth driver side. Then the subsequent
> readdir test also checks whether all directory entries have the
> expected file names (as created on 9pfs synth driver side),
> ignoring their precise order in result list though.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---

LGTM but I'm afraid this needs rebasing because virtio-9p-test.c got moved
to tests/qtest/ by this commit (merged earlier this week):

commit 1e8a1fae7464ef79c9e50aa0f807d2c511be3c8e
Author: Thomas Huth <thuth@redhat.com>
Date:   Mon Sep 9 12:04:01 2019 +0200

    test: Move qtests to a separate directory
    
    The tests directory itself is pretty overcrowded, and it's hard to
    see which test belongs to which test subsystem (unit, qtest, ...).
    Let's move the qtests to a separate folder for more clarity.
    
    Message-Id: <20191218103059.11729-6-thuth@redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Thomas Huth <thuth@redhat.com>

>  tests/virtio-9p-test.c | 149 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 149 insertions(+)
> 
> diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
> index 06263edb53..721f13c1fb 100644
> --- a/tests/virtio-9p-test.c
> +++ b/tests/virtio-9p-test.c
> @@ -68,6 +68,11 @@ static void v9fs_memread(P9Req *req, void *addr, size_t len)
>      req->r_off += len;
>  }
>  
> +static void v9fs_uint8_read(P9Req *req, uint8_t *val)
> +{
> +    v9fs_memread(req, val, 1);
> +}
> +
>  static void v9fs_uint16_write(P9Req *req, uint16_t val)
>  {
>      uint16_t le_val = cpu_to_le16(val);
> @@ -101,6 +106,12 @@ static void v9fs_uint32_read(P9Req *req, uint32_t *val)
>      le32_to_cpus(val);
>  }
>  
> +static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> +{
> +    v9fs_memread(req, val, 8);
> +    le64_to_cpus(val);
> +}
> +
>  /* len[2] string[len] */
>  static uint16_t v9fs_string_size(const char *string)
>  {
> @@ -191,6 +202,7 @@ static const char *rmessage_name(uint8_t id)
>          id == P9_RLOPEN ? "RLOPEN" :
>          id == P9_RWRITE ? "RWRITE" :
>          id == P9_RFLUSH ? "RFLUSH" :
> +        id == P9_RREADDIR ? "READDIR" :
>          "<unknown>";
>  }
>  
> @@ -348,6 +360,82 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid, v9fs_qid **wqid)
>      v9fs_req_free(req);
>  }
>  
> +/* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
> +static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
> +                            uint32_t count, uint16_t tag)
> +{
> +    P9Req *req;
> +
> +    req = v9fs_req_init(v9p, 4 + 8 + 4, P9_TREADDIR, tag);
> +    v9fs_uint32_write(req, fid);
> +    v9fs_uint64_write(req, offset);
> +    v9fs_uint32_write(req, count);
> +    v9fs_req_send(req);
> +    return req;
> +}
> +
> +struct V9fsDirent {
> +    v9fs_qid qid;
> +    uint64_t offset;
> +    uint8_t type;
> +    char *name;
> +    struct V9fsDirent *next;
> +};
> +
> +/* size[4] Rreaddir tag[2] count[4] data[count] */
> +static void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> +                          struct V9fsDirent **entries)
> +{
> +    uint32_t local_count;
> +    struct V9fsDirent *e = NULL;
> +    uint16_t slen;
> +    uint32_t n = 0;
> +
> +    v9fs_req_recv(req, P9_RREADDIR);
> +    v9fs_uint32_read(req, &local_count);
> +
> +    if (count) {
> +        *count = local_count;
> +    }
> +
> +    for (int32_t togo = (int32_t)local_count;
> +         togo >= 13 + 8 + 1 + 2;
> +         togo -= 13 + 8 + 1 + 2 + slen, ++n)
> +    {
> +        if (!e) {
> +            e = g_malloc(sizeof(struct V9fsDirent));
> +            if (entries) {
> +                *entries = e;
> +            }
> +        } else {
> +            e = e->next = g_malloc(sizeof(struct V9fsDirent));
> +        }
> +        e->next = NULL;
> +        /* qid[13] offset[8] type[1] name[s] */
> +        v9fs_memread(req, &e->qid, 13);
> +        v9fs_uint64_read(req, &e->offset);
> +        v9fs_uint8_read(req, &e->type);
> +        v9fs_string_read(req, &slen, &e->name);
> +    }
> +
> +    if (nentries) {
> +        *nentries = n;
> +    }
> +
> +    v9fs_req_free(req);
> +}
> +
> +static void v9fs_free_dirents(struct V9fsDirent *e)
> +{
> +    struct V9fsDirent *next = NULL;
> +
> +    for (; e; e = next) {
> +        next = e->next;
> +        g_free(e->name);
> +        g_free(e);
> +    }
> +}
> +
>  /* size[4] Tlopen tag[2] fid[4] flags[4] */
>  static P9Req *v9fs_tlopen(QVirtio9P *v9p, uint32_t fid, uint32_t flags,
>                            uint16_t tag)
> @@ -480,6 +568,66 @@ static void fs_walk(void *obj, void *data, QGuestAllocator *t_alloc)
>      g_free(wqid);
>  }
>  
> +static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name)
> +{
> +    for (; e; e = e->next) {
> +        if (!strcmp(e->name, name)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
> +{
> +    QVirtio9P *v9p = obj;
> +    alloc = t_alloc;
> +    char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DIR) };
> +    uint16_t nqid;
> +    v9fs_qid qid;
> +    uint32_t count, nentries;
> +    struct V9fsDirent *entries = NULL;
> +    P9Req *req;
> +
> +    fs_attach(v9p, NULL, t_alloc);
> +    req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0);
> +    v9fs_req_wait_for_reply(req, NULL);
> +    v9fs_rwalk(req, &nqid, NULL);
> +    g_assert_cmpint(nqid, ==, 1);
> +
> +    req = v9fs_tlopen(v9p, 1, O_DIRECTORY, 0);
> +    v9fs_req_wait_for_reply(req, NULL);
> +    v9fs_rlopen(req, &qid, NULL);
> +
> +    req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - P9_IOHDRSZ, 0);
> +    v9fs_req_wait_for_reply(req, NULL);
> +    v9fs_rreaddir(req, &count, &nentries, &entries);
> +
> +    /*
> +     * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
> +     * dir entries with only one readdir request.
> +     */
> +    g_assert_cmpint(
> +        nentries, ==,
> +        QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
> +    );
> +
> +    /*
> +     * Check all file names exist in returned entries, ignore their order
> +     * though.
> +     */
> +    g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
> +    g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
> +    for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
> +        char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
> +        g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
> +        g_free(name);
> +    }
> +
> +    v9fs_free_dirents(entries);
> +    g_free(wnames[0]);
> +}
> +
>  static void fs_walk_no_slash(void *obj, void *data, QGuestAllocator *t_alloc)
>  {
>      QVirtio9P *v9p = obj;
> @@ -658,6 +806,7 @@ static void register_virtio_9p_test(void)
>                   NULL);
>      qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored,
>                   NULL);
> +    qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL);
>  }
>  
>  libqos_init(register_virtio_9p_test);




* Re: [PATCH v3 02/11] 9pfs: require msize >= 4096
  2020-01-17 15:15               ` Greg Kurz
@ 2020-01-17 16:41                 ` Christian Schoenebeck
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-17 16:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

On Freitag, 17. Januar 2020 16:15:37 CET Greg Kurz wrote:
> > > Hmm... this patch does a sanity check on 'count', not on 'msize'...
> > 
> > Yes ... :)
> > 
> > > I mean no matter what msize is, clipping count to msize - 11 gives a
> > > chance to stop processing the entries before overflowing the transport
> > > buffer.
> > 
> > ... and no, this cannot happen if minimum msize of 4096 is forced already
> > by Tversion. Maybe you now get my point -> It is about avoiding exactly
> > such kind
> I'm not sure I see how setting a minimum msize of 4096 at Tversion would
> prevent the client to pass a higher 'count' argument and lure the server
> into generating a bigger than msize response since it does not check
> count < msize - 11 without patch 3.

That's correct; it requires patch 3 as well to prevent that. Without patch 3, 
if a (i.e. bad) client sends a 'count' parameter >> msize, then the Treaddir 
request is processed by the server to full extent according to 'count' and 
finally aborted by a transport error, since the server's response would exceed 
msize.

> > of issues in the first place. Most file systems have a name limit of 255
> > bytes:
> > 
> > https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
> > 
> > So by forcing a minimum 'msize' of 4096 you avoid having to deal with this
> > issue (and similar ones) on Treaddir request level (and other request type
> > handlers), including ReiserFS BTW because 4032+35 < 4096.
> 
> Good to know for ReiserFS.
> 
> > If you would allow smaller 'msize' values by Tversion, then you would need
> > to suffer some kind of death when handling Treaddir with certain high
> > file name length. Either a transport error (with an error message that a
> > normal user would not be able to understand at all) or by returning an
> > incomplete Treaddir response sequence with { Rreaddir count=0 }, or ...
> > any other kind of death.
> Ahh, I now understand at last your argument about Rreaddir losing data.
> We may end up sending { Rreaddir count=0 } because the next entry is too
> large... and thus end the readdir sequence.

Yep.

> Mentioning this explicitly
> from the start would have been more clear for me ;-)

Sorry for that. :) I thought I made it clear with the directory entries 
example. I'll try to be clearer next time.

> This looks like yet another bug to me. It looks wrong to return this
> special response if we have more entries to go. Also this could be the
> client's _fault_ if it provides a ridiculously small value for count.
> The current code will return count=0 all the same.
> 
> In any case, I think v9fs_do_readdir() should only return 0 if there
> are no more entries to read. It should error out otherwise, but I'm
> not sure how...

Patience please. I have to limit the scope of this patch series somewhere. I 
am aware of these issues, but if I add fixes for more and more edge cases 
(which already exist) as part of this patch series, it will become a 
never-ending story.

I just added those particular fixes to this series, because they were directly 
related to things I've changed here for the actual purpose of this patch set, 
which was and is: readdir latency optimization.

> > > My point is that we're not going to check msize in Tversion in
> > > order to to avoid multiple checks everywhere. We're going to do
> > > it there because it is the only place where it makes sense to
> > > do it.
> > 
> > Also yes and no. Of course it just makes sense to handle it already at
> > Tversion. But no, you could theoretically also allow much smaller minimum
> > 'msize' value << 4096 (i.e. somewhere closely >7 as we discussed), then
> > you
> > would indeed still need to add msize checks at other places of the code
> > base as you just found out now. So forcing a minimum 'msize' which is
> > high enough, avoids having to add such individual checks and having to
> > deal with them in some kind of unpleasant way.
> 
> We still don't understand each other I'm afraid... we actually have
> implicit 'msize' checks already for every single thing we write on
> the wire: v9fs_packunpack() which detects when we're trying to write
> past the buffer. When this happens, it is propagated to the transport
> which then disconnects, which is the painful thing you've been
> experiencing with your readdir experiments. In the case of Rreaddir, it
> really does make sense to try to avoid the disconnection like you do in
> patch 3 because the readdir sequence allows _partial_ reads. Same goes
> for Rread. But that's it. No other message in the protocol allows that,
> so I've never thought of adding individual 'msize' checks anywhere else.
> What would they do better than v9fs_packunpack() already does ?

Right, but you realized that a min. msize of 4096 (in combination with
patch 3) prevents the readdir data loss issue we discussed here (provided we 
have a "good" client sending count=msize-11), right?

If so, I suggest I "try" to address the concerns you raised here in the 
commit log message as far as I can, and would ask you to adjust the 
message later on according to your personal preference if required.

Best regards,
Christian Schoenebeck





* Re: [PATCH v3 05/11] tests/virtio-9p: added readdir test
  2020-01-17 15:51   ` Greg Kurz
@ 2020-01-17 16:44     ` Christian Schoenebeck
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Schoenebeck @ 2020-01-17 16:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz

On Freitag, 17. Januar 2020 16:51:29 CET Greg Kurz wrote:
> On Tue, 14 Jan 2020 00:08:51 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > The first readdir test simply checks the amount of directory
> > entries returned by 9pfs server, according to the created amount
> > of virtual files on 9pfs synth driver side. Then the subsequent
> > readdir test also checks whether all directory entries have the
> > expected file names (as created on 9pfs synth driver side),
> > ignoring their precise order in result list though.
> > 
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---
> 
> LGTM but I'm afraid this needs rebasing because virtio-9p-test.c got moved
> to tests/qtest/ by this commit (merged earlier this week):
> 
> commit 1e8a1fae7464ef79c9e50aa0f807d2c511be3c8e
> Author: Thomas Huth <thuth@redhat.com>
> Date:   Mon Sep 9 12:04:01 2019 +0200
> 
>     test: Move qtests to a separate directory
> 
>     The tests directory itself is pretty overcrowded, and it's hard to
>     see which test belongs to which test subsystem (unit, qtest, ...).
>     Let's move the qtests to a separate folder for more clarity.
> 
>     Message-Id: <20191218103059.11729-6-thuth@redhat.com>
>     Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>     Signed-off-by: Thomas Huth <thuth@redhat.com>

Sure, I'll take care of that, and like previously discussed ...

> >  tests/virtio-9p-test.c | 149 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 149 insertions(+)
> > 
> > diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
> > index 06263edb53..721f13c1fb 100644
> > --- a/tests/virtio-9p-test.c
> > +++ b/tests/virtio-9p-test.c
> > @@ -68,6 +68,11 @@ static void v9fs_memread(P9Req *req, void *addr, size_t
> > len)> 
> >      req->r_off += len;
> >  
> >  }
> > 
> > +static void v9fs_uint8_read(P9Req *req, uint8_t *val)
> > +{
> > +    v9fs_memread(req, val, 1);
> > +}
> > +
> > 
> >  static void v9fs_uint16_write(P9Req *req, uint16_t val)
> >  {
> >      uint16_t le_val = cpu_to_le16(val);
> > @@ -101,6 +106,12 @@ static void v9fs_uint32_read(P9Req *req, uint32_t *val)
> >      le32_to_cpus(val);
> >  }
> > 
> > +static void v9fs_uint64_read(P9Req *req, uint64_t *val)
> > +{
> > +    v9fs_memread(req, val, 8);
> > +    le64_to_cpus(val);
> > +}
> > +
> > 
> >  /* len[2] string[len] */
> >  static uint16_t v9fs_string_size(const char *string)
> >  {
> > @@ -191,6 +202,7 @@ static const char *rmessage_name(uint8_t id)
> >          id == P9_RLOPEN ? "RLOPEN" :
> >          id == P9_RWRITE ? "RWRITE" :
> >          id == P9_RFLUSH ? "RFLUSH" :
> > +        id == P9_RREADDIR ? "READDIR" :
> >          "<unknown>";
> >  }
> > 
> > @@ -348,6 +360,82 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid, v9fs_qid **wqid)
> >      v9fs_req_free(req);
> >  }
> > 
> > +/* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
> > +static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
> > +                            uint32_t count, uint16_t tag)
> > +{
> > +    P9Req *req;
> > +
> > +    req = v9fs_req_init(v9p, 4 + 8 + 4, P9_TREADDIR, tag);
> > +    v9fs_uint32_write(req, fid);
> > +    v9fs_uint64_write(req, offset);
> > +    v9fs_uint32_write(req, count);
> > +    v9fs_req_send(req);
> > +    return req;
> > +}
> > +
> > +struct V9fsDirent {
> > +    v9fs_qid qid;
> > +    uint64_t offset;
> > +    uint8_t type;
> > +    char *name;
> > +    struct V9fsDirent *next;
> > +};
> > +
> > +/* size[4] Rreaddir tag[2] count[4] data[count] */
> > +static void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
> > +                          struct V9fsDirent **entries)
> > +{
> > +    uint32_t local_count;
> > +    struct V9fsDirent *e = NULL;
> > +    uint16_t slen;
> > +    uint32_t n = 0;
> > +
> > +    v9fs_req_recv(req, P9_RREADDIR);
> > +    v9fs_uint32_read(req, &local_count);
> > +
> > +    if (count) {
> > +        *count = local_count;
> > +    }
> > +
> > +    for (int32_t togo = (int32_t)local_count;
> > +         togo >= 13 + 8 + 1 + 2;
> > +         togo -= 13 + 8 + 1 + 2 + slen, ++n)
> > +    {
> > +        if (!e) {
> > +            e = g_malloc(sizeof(struct V9fsDirent));
> > +            if (entries) {
> > +                *entries = e;
> > +            }
> > +        } else {
> > +            e = e->next = g_malloc(sizeof(struct V9fsDirent));
> > +        }
> > +        e->next = NULL;
> > +        /* qid[13] offset[8] type[1] name[s] */
> > +        v9fs_memread(req, &e->qid, 13);
> > +        v9fs_uint64_read(req, &e->offset);
> > +        v9fs_uint8_read(req, &e->type);
> > +        v9fs_string_read(req, &slen, &e->name);
> > +    }
> > +
> > +    if (nentries) {
> > +        *nentries = n;
> > +    }
> > +
> > +    v9fs_req_free(req);
> > +}
> > +
> > +static void v9fs_free_dirents(struct V9fsDirent *e)
> > +{
> > +    struct V9fsDirent *next = NULL;
> > +
> > +    for (; e; e = next) {
> > +        next = e->next;
> > +        g_free(e->name);
> > +        g_free(e);
> > +    }
> > +}
> > +
> > 
> >  /* size[4] Tlopen tag[2] fid[4] flags[4] */
> >  static P9Req *v9fs_tlopen(QVirtio9P *v9p, uint32_t fid, uint32_t flags,
> >                            uint16_t tag)
> > 
> > @@ -480,6 +568,66 @@ static void fs_walk(void *obj, void *data, QGuestAllocator *t_alloc)
> >      g_free(wqid);
> >  }
> > 
> > +static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name)
> > +{
> > +    for (; e; e = e->next) {
> > +        if (!strcmp(e->name, name)) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> > +static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
> > +{
> > +    QVirtio9P *v9p = obj;
> > +    alloc = t_alloc;
> > +    char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DIR) };
> > +    uint16_t nqid;
> > +    v9fs_qid qid;
> > +    uint32_t count, nentries;
> > +    struct V9fsDirent *entries = NULL;
> > +    P9Req *req;
> > +
> > +    fs_attach(v9p, NULL, t_alloc);
> > +    req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0);
> > +    v9fs_req_wait_for_reply(req, NULL);
> > +    v9fs_rwalk(req, &nqid, NULL);
> > +    g_assert_cmpint(nqid, ==, 1);
> > +
> > +    req = v9fs_tlopen(v9p, 1, O_DIRECTORY, 0);
> > +    v9fs_req_wait_for_reply(req, NULL);
> > +    v9fs_rlopen(req, &qid, NULL);
> > +
> > +    req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - P9_IOHDRSZ, 0);

... I'll have to change that to:

	req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - 11, 0);


> > +    v9fs_req_wait_for_reply(req, NULL);
> > +    v9fs_rreaddir(req, &count, &nentries, &entries);
> > +
> > +    /*
> > +     * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve
> > all
> > +     * dir entries with only one readdir request.
> > +     */
> > +    g_assert_cmpint(
> > +        nentries, ==,
> > +        QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
> > +    );
> > +
> > +    /*
> > +     * Check all file names exist in returned entries, ignore their order
> > +     * though.
> > +     */
> > +    g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
> > +    g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
> > +    for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
> > +        char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
> > +        g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
> > +        g_free(name);
> > +    }
> > +
> > +    v9fs_free_dirents(entries);
> > +    g_free(wnames[0]);
> > +}
> > +
> > 
> >  static void fs_walk_no_slash(void *obj, void *data, QGuestAllocator *t_alloc)
> >  {
> >      QVirtio9P *v9p = obj;
> > 
> > @@ -658,6 +806,7 @@ static void register_virtio_9p_test(void)
> >                   NULL);
> >      qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored,
> >                   NULL);
> > +    qos_add_test("fs/readdir/basic", "virtio-9p", fs_readdir, NULL);
> >  }
> > 
> >  libqos_init(register_virtio_9p_test);

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-01-17 16:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-13 23:18 [PATCH v3 00/11] 9pfs: readdir optimization Christian Schoenebeck
2020-01-13 22:20 ` [PATCH v3 01/11] tests/virtio-9p: add terminating null in v9fs_string_read() Christian Schoenebeck
2020-01-13 22:21 ` [PATCH v3 02/11] 9pfs: require msize >= 4096 Christian Schoenebeck
2020-01-16 13:15   ` Greg Kurz
2020-01-16 16:16     ` Christian Schoenebeck
2020-01-16 18:07       ` Greg Kurz
2020-01-16 21:39         ` Christian Schoenebeck
2020-01-17 10:24           ` Greg Kurz
2020-01-17 12:01             ` Christian Schoenebeck
2020-01-17 15:15               ` Greg Kurz
2020-01-17 16:41                 ` Christian Schoenebeck
2020-01-13 22:22 ` [PATCH v3 03/11] 9pfs: validate count sent by client with T_readdir Christian Schoenebeck
2020-01-16 13:33   ` Greg Kurz
2020-01-16 16:51     ` Christian Schoenebeck
2020-01-17 15:50       ` Greg Kurz
2020-01-13 22:23 ` [PATCH v3 04/11] hw/9pfs/9p-synth: added directory for readdir test Christian Schoenebeck
2020-01-13 23:08 ` [PATCH v3 05/11] tests/virtio-9p: added " Christian Schoenebeck
2020-01-17 15:51   ` Greg Kurz
2020-01-17 16:44     ` Christian Schoenebeck
2020-01-13 23:11 ` [PATCH v3 06/11] tests/virtio-9p: added splitted " Christian Schoenebeck
2020-01-13 23:13 ` [PATCH v3 07/11] tests/virtio-9p: failing " Christian Schoenebeck
2020-01-13 23:16 ` [PATCH v3 08/11] 9pfs: readdir benchmark Christian Schoenebeck
2020-01-13 23:16 ` [PATCH v3 09/11] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir() Christian Schoenebeck
2020-01-13 23:17 ` [PATCH v3 10/11] 9pfs: T_readdir latency optimization Christian Schoenebeck
2020-01-13 23:17 ` [PATCH v3 11/11] hw/9pfs/9p.c: benchmark time on T_readdir request Christian Schoenebeck
