qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: Greg Kurz <groug@kaod.org>, Paolo Bonzini <pbonzini@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>
Subject: [PATCH 1/4] docs/devel: add 9p.rst
Date: Tue, 23 Mar 2021 19:36:13 +0100	[thread overview]
Message-ID: <afe044a589742961a06563200eec591cb3df8ca6.1616528420.git.qemu_oss@crudebyte.com> (raw)
In-Reply-To: <cover.1616528420.git.qemu_oss@crudebyte.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 29617 bytes --]

This page has been auto converted from:


by manually grabbing the page's mediawiki source from:


and by using 'pandoc' for the actual conversion:

  pandoc -f mediawiki -t rst foo.txt > docs/devel/9p.rst

Unfortunately Mediawiki's API does not allow to grab the wiki source text
automatically in an easy way (i.e. without having to use parse tools).

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
 docs/devel/9p.rst | 544 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 544 insertions(+)
 create mode 100644 docs/devel/9p.rst

diff --git a/docs/devel/9p.rst b/docs/devel/9p.rst
new file mode 100644
index 0000000000..e2a57b77b2
--- /dev/null
+++ b/docs/devel/9p.rst
@@ -0,0 +1,544 @@
+9pfs Developers Documentation
+This page is intended for developers who want to put their hands on the
+9p passthrough filesystem implementation in QEMU. For regular user
+aspects you rather want to look at the separate page
+`Documentation/9psetup <Documentation/9psetup>`__ instead.
+9p Protocol
+9pfs uses the `Plan 9 Filesystem
+Protocol <https://en.wikipedia.org/wiki/9P_(protocol)>`__ for
+communicating the file I/O operations between guest systems (clients)
+and the `9p server (see below) <#9P_Server>`__. There are a bunch of
+separate documents specifying different variants of the protocol, which
+might be a bit confusing at first, so here is a summary of the
+individual protocol flavours.
+If this is your first time getting in touch with the 9p protocol then
+you might have a look at this introduction by Eric Van Hensbergen which
+is an easy understandable text explaining how the protocol works,
+including examples of individual requests and their response messages:
+`Using 9P2000 Under
+Linux <https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html>`__
+There are currently 3 dialects of the 9p network protocol called
+"9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs
+implementation only supports either "9p2000.u" or "9p2000.L".
+This is the basis of the 9p protocol the other two dialects derive from.
+This is the specification of the protocol: `9p2000
+Protocol <http://ericvh.github.io/9p-rfc/rfc9p2000.html>`__
+The "9p2000.u" dialect adds extensions and minor adjustments to the
+protocol for Unix systems, especially for common data types available on
+a Unix system. For instance the basic "9p2000" protocol version only
+returns an error text if some error occurred on server side, whereas
+"9p2000.u" also returns an appropriate, common POSIX error code for the
+individual error. `9p2000.u
+Protocol <http://ericvh.github.io/9p-rfc/rfc9p2000.u.html>`__
+Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds
+extensions and minor adjustments of the protocol specifically for Linux
+systems. Again this is mostly targeted at specializing for data types of
+system calls available on a Linux system. `9p2000.L
+Protocol <https://github.com/chaos/diod/blob/master/protocol.md>`__
+The following figure shows the basic structure of the 9pfs
+implementation in QEMU.
+`frameless|upright=3.0 <File:9pfs_topology.png>`__
+The implementation consists of 3 modular components: 9p server, 9p
+filesystem drivers and 9p transport drivers. The 9p client on guest OS
+side is not part of the QEMU code base. There are a bunch of 9p client
+implementations e.g. for individual OSes. The most commonly used one is
+the client that comes with the stock Linux kernel. `Linux 9p
+Client <https://github.com/torvalds/linux/tree/master/fs/9p>`__
+9p Server
+This is the controller portion of the 9pfs code base which handles the
+raw 9p network protocol handling, and the general high-level control
+flow of 9p clients' (the guest systems) 9p requests. The 9p server is
+basically a full-fledged file server and accordingly it has the highest
+code complexity in the 9pfs code base, most of this is in
+`hw/9pfs/9p.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c>`__
+source file.
+9p Filesystem Drivers
+The 9p server uses a
+`VFS <https://en.wikipedia.org/wiki/Virtual_file_system>`__ layer for
+the actual file operations, which makes it flexible from where the file
+storage data comes from and how exactly that data is actually accessed.
+There are currently 3 different 9p file system driver implementations
+1. local fs driver
+This is the most common fs driver which is used most often with 9p in
+practice. It basically just maps the individual VFS functions (more or
+less) directly to the host system's file system functions like open(),
+read(), write(), etc. You find this fs driver implementation in
+`hw/9pfs/9p-local.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c>`__
+source file.
+Most of the "local" driver's code deals with remapping of permissions,
+which solves a fundamental problem: a high privileged user like "root"
+(and the kernel itself) on the guest system expects to have full control
+over its filesystems. For instance it needs to be able to change the
+owning user and group of files and directories, be able to add, change
+and remove attributes, changing any file permissions and so forth.
+Without these assumed permissions, it would nearly be impossible to run
+any useful service on guest side ontop of a 9pfs filesystem. The QEMU
+binary on the host system however is usually not running as privileged
+user for security reasons, so the 9pfs server can actually not do all
+those things on the file system it has access to on host side.
+For that reason the "local" driver supports remapping of file
+permissions and owners. So when the "remap" driver option of the "local"
+driver is used (like it's usually the case on a production system), then
+the "local" driver pretends to the guest system it could do all those
+things, but in reality it just maps things like permissions and owning
+users and groups as additional data on the filesystem, either as some
+hidden files, or as extended attributes (the latter being recommended)
+which are not directly exposed to the guest OS. With remapping enabled,
+you can actually run an entire guest OS on a single 9pfs root filesystem
+2. proxy fs driver
+This fs driver was supposed to dispatch the VFS functions to be called
+from a separate process (by
+`fsdev/virtfs-proxy-helper <https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c>`__)
+and increasing security by that separation, however the "proxy" driver
+is currently not considered to be production grade.
+`hw/9pfs/9p-proxy.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c>`__
+However the "proxy" fs driver shows some potential of 9pfs. As a fs
+driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs
+data, it would for instance be considerable to implement a fs driver
+that allows the actual filesystem to be kept entirely on a separate
+storage system and therefore increasing security and availability. If an
+attacker would then e.g. be able to gain full control over the 9pfs host
+system, the attacker would still not have access to the raw filesystem.
+So with a separate `COW <https://en.wikipedia.org/wiki/Copy-on-write>`__
+storage system, an attacker might be able to temporarily command data
+changes on storage side, but the uncompromised data before the attack
+would remain available and an immediate rollback would therefore be
+possible. And due to not having direct raw access to the storage
+filesystem, the attack could then be audited later on in detail as the
+attacker would not be able to wipe its traces on the storage logs.
+3. synth fs driver
+The original ambition for this driver was to allow QEMU subsystems to
+expose a synthetic API to the client, i.e. to expose some stats,
+information or any knob you can think of to the guest *à la* linux
+kernel /sys. This never gained momentum and remained totally unused for
+years, until a new use case was found : use it to implement 9p protocol
+validation tests. This fs driver is now exclusively used for development
+purposes. It just simulates individual filesystem operations with
+specific test scenarios in mind, and therefore is not useful for
+anything on a production system. The main purpose of the "synth" fs
+driver is to simulate certain fs behaviours that would be hard to
+trigger with a regular (production) fs driver like the "local" fs driver
+for instance. Right now the synth fs driver is used by the automated
+`9pfs test cases <#Synth_Tests>`__ and by the automated 9pfs fuzzing
+code. The automated test cases use the "synth" fs driver for instance to
+check the 9p server's correct behaviour on 9p
+`Tflush <http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28>`__
+requests, which a client may send to abort a file I/O operation that
+might already be blocking for a long time. In general the "synth" driver
+is very useful for effectively simulating any multi-threaded use case
+`hw/9pfs/9p-synth.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c>`__
+9p Transport Drivers
+The third component of the 9pfs implementation in QEMU is the
+"transport" driver, which is the communication channel between host
+system and guest system used by the 9p server. There are currently two
+9p transport driver implementations available in QEMU:
+1. virtio transport driver
+The 9p "virtio" transport driver uses e.g. a virtual PCI device and
+ontop the
+`virtio <https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html>`__
+protocol to transfer the 9p messages between clients (guest systems) and
+9p server (host system).
+`hw/9pfs/virtio-9p-device.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c>`__
+2. Xen transport driver
+`hw/9pfs/xen-9p-backend.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c>`__
+Threads and Coroutines
+The 9pfs implementation in QEMU heavily uses
+`Coroutines <https://en.wikipedia.org/wiki/Coroutine>`__ to handle
+individual 9p requests.
+If you haven't used Coroutines before, simply put: a Coroutine manages
+its own stack memory. That's it. So when a thread enters the scope of a
+Coroutine then everything that is usually put on the thread's own stack
+memory (and the latter being always firmly tied to that thread) is
+rather put on the Coroutine's stack memory instead. The advantage is, as
+Coroutines are just data structures, they can be passed from one thread
+to another. So Coroutines allow to use memory stacks that are decoupled
+from specific threads.
+Another important aspect to know is that once a thread leaves the scope
+of a Coroutine, then that thread is back at using its own thread-owned
+stack again.
+`frameless|upright=2.4 <File:Coroutines_stacks.png>`__
+Each coroutine instance usually handles a certain "collaborative" task,
+where "collaborative" means that individual parts of the task usually
+need to be executed by different threads before the overall task
+eventually can be considered as fulfilled. So if a thread knows it has
+to start a new task that may also require other threads to process parts
+of that task, then that thread allocates a Coroutine instance. The
+thread then "enters" the Coroutine scope, which means starting at this
+point every local variable and all following function calls (function
+call stack, including function arguments and their return values) are
+put on the Coroutine's stack memory instead of the thread's own memory
+stack (as it would usually). So now the thread would call arbitrary
+functions, run loops, create local variables inside them, etc. and then
+at a certain point the thread realizes that something of the task needs
+to be handled by a different thread next. At this point the thread
+leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it
+then passes the Coroutine instance to another thread which in turn
+enters the Coroutine scope and finds the call stack and all local
+variables exactly as it was left by the previous thread using the
+Coroutine instance before.
+It is important to understand that Coroutines are really just covering
+memory stack aspects. They are not dealing with any multi-threading
+aspects by themselves. Which has the advantage that Coroutines can be
+combined with any multi-threading concept & framework (e.g. POSIX
+threads, Grand Central Dispatch, ...).
+Control Flow
+The following figure shows the control flow and relationship of Threads
+and Coroutines of the 9pfs implementation.
+`frameless|upright=3.5 <File:9pfs_control_flow.png>`__
+Getting back to 9pfs as concrete user of Coroutines, every 9P client
+request that comes in on 9P server side is a task the 9P server needs to
+fulfill on behalf of the client / guest OS. So for every 9P request a
+Coroutine instance is allocated. Then the 9P server's main thread
+"enters" the Coroutine scope to start processing the client's 9P
+request. At a certain point something of that request usually needs to
+be handled by the fs driver which means the fs driver needs to call file
+I/O syscall(s) which might block for a long time. Therefore the 9P
+server leaves the Coroutine at that point and dispatches the Coroutine
+instance to a QEMU worker thread which then executes the fs driver
+function(s) for fulfilling the actual file system I/O task(s). Once the
+worker thread is done with the fs I/O task portion it leaves the
+Coroutine scope and dispatches the Coroutine data structure back to the
+server's main thread, which in turn would re-enter the Coroutine and
+continue processing the request with the result as provided by the
+worker thread. So yet again, main thread finds the call stack and local
+variables exactly as it was left by the worker thread when it
+re-rentered the Coroutine.
+The primary major advantages of this design is that the 9P server's main
+thread can continue handling another 9P request while a worker thread
+would do the (maybe long taking) fs driver I/O subtask(s), and yet code
+complexity is reduced substantially in comparison to other
+multi-threaded task handling concepts, which also improves safety.
+Main Thread
+Almost the entire 9p server is running on the QEMU main thread, with the
+exception of some worker threads handling fs driver file I/O tasks as
+described above. So basically everything in
+`hw/9pfs/9p.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c>`__
+you can assume to run on main thread, except of function calls there
+with the naming scheme \*_co_*(). So if you find a call with such a
+function name pattern you can know immediately that this function
+dispatches the Coroutine at this point to a worker thread (by using the
+macro v9fs_co_run_in_worker(...) inside its function implementation),
+and when the \*_co_*() function call returned, it already dispatched the
+Coroutine back to main thread.
+Incoming 9p requests are processed by the 9p server's main thread in the
+order they arrived. However while 9p requests (i.e. their coroutine) are
+dispatched for filesystem I/O to a worker thread, the 9p server's main
+thread would handle another 9p request (if any) in the meantime. Each 9p
+request (i.e. coroutine) might be dispatched between main thread and
+some worker thread several times (for the same 9p request that is)
+before the 9p request is completed by the server and a 9p response
+eventually been sent to client. So pending 9p requests are therefore
+handled in parallel by the 9p server, and there is no guarantee that 9p
+replies are transmitted in the exact same order as their 9p requests
+originally came in.
+Carrying out several 9p requests simultaniously allows higher
+performance, provided that the 9p client implementation supports
+parallelism as well. Apart from performance aspects, the 9p protocol
+requires parallel handling of
+`Tflush <http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28>`__
+requests, to allow aborting I/O requests that might be blocking for a
+long time, e.g. to prevent them from hanging for good on server side. We
+do have a test case for this Tflush behaviour by the way.
+Test Cases
+Whatever you are doing there on the 9pfs code base, please run the
+automated test cases after you modified the source code to ensure that
+your changes did not break the expected behaviour of 9pfs. Running the
+tests is very simple and does not require any guest OS installation, nor
+is any guest OS booted, and for that reason you can run them in few
+seconds. The test cases are also a very efficient way to check whether
+your 9pfs changes are actually doing what you want them to while still
+To run the 9pfs tests e.g. on a x86 system, all you need to do is
+executing the following two commands:
+| ``    export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64``
+| ``    tests/qtest/qos-test -m slow``
+All 9pfs test cases are in
+`tests/qtest/virtio-9p-test.c <https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c>`__
+source file. If all runs well and all tests pass, you should see an
+output like this:
+| ``   ...``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK``
+| ``   /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK``
+| ``   ...``
+If you don't see all test cases appearing on screen, or if some problem
+occurs, try adding --verbose to the command line:
+``   tests/qtest/qos-test -m slow --verbose``
+Keep in mind that QEMU's qtest framework automatically enables just
+those test cases that are supported by your machine and configuration.
+With the --verbose switch you will see exactly which individual tests
+are enabled and which not at the beginning of the output:
+| ``   ...``
+| ``   # ALL QGRAPH NODES: {``
+| ``   #        name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]``
+| ``   #        name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]``
+| ``   #        name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]``
+| ``   #        name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]``
+| ``   #        name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]``
+| ``   #        name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]``
+| ``   ...``
+And for each test case being executed, you can see the precise QEMU
+command line that is used for that individual test:
+| ``   ...``
+| ``   GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir``
+| ``   # Run QEMU with: '-M pc  -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'``
+| ``   GTest: result: OK``
+| ``   ...``
+You can also just run one or a smaller list of tests to concentrate on
+whatever you are working on. To get a list of all test cases:
+``   tests/qtest/qos-test -l``
+Then pass the respective test case name(s) as argument -p to run them as
+"partial" tests, e.g.:
+``   tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128``
+Synth Tests
+As you can see at the end of the virtio-9p-test.c file, the 9pfs test
+cases are split into two groups of tests. The first group of tests use
+the "synth" fs driver, so all file I/O operations are simulated and
+basically you can add all kinds of hacks into the synth driver to
+simulate whatever you need to test certain fs behaviours, no matter how
+exotic that behaviour might be. This is the place to validate that the
+9p server in
+`hw/9pfs/9p.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c>`__
+honors the 9p protocol, e.g.
+`Tflush <http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28>`__
+actually cancels a pending request. Testing of *real life* scenarios
+doesn't belong here : they should be performed with the "local" fs
+driver because this is what is used in production.
+Local Tests
+The second group of tests use the "local" fs driver, so they are
+actually operating on real dirs and files in a test directory on the
+host filesystem. Some issues that happened in the past were caused by a
+combination of the 9p server and the actual "local" fs driver that's
+usually used on production machines. For that reason this group of tests
+are covering issues thay may happen across these two components of 9pfs.
+Again, this works without any guest OS, which has the advantage that you
+can test the behaviour independent of third-party 9p client
+This is a rough list of things that are planned to be changed in future.
+Implementation Plans
+-  Optimizations:
+   -  Reducing thread hops: Right now in
+      `hw/9pfs/9p.c <https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c>`__
+      almost every request (its coroutine that is) is dispatched
+      multiple times between 9p server's main thread and some worker
+      thread back and forth. Every thread hop adds latency to the
+      overall completion time of a request. The desired plan is to
+      reduce the amount of thread hops to a minimum, ideally one 9p
+      request would be dispatched exactly one time to a worker thread
+      for all required filesystem related I/O subtasks and then
+      dispatched back exactly one time back to main thread. Some work on
+      this has already been done for
+      `Treaddir <https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory>`__
+      request handling, as this was the request type suffering the most
+      under large amount of thread hops, and reduction of those hops
+      provided `significant performance improvements for
+      Treaddir <https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html>`__
+      handling. For other request types similar changes should be
+      applied.
+   -  Making Tflush non-blocking: When handling a
+      `Tflush <http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28>`__
+      request, server currently blocks the Tflush request's coroutine
+      until the requested other I/O request was actually aborted. From
+      the specs though Tflush should return immediately, and currently
+      this blocking behaviour has a negative performance impact
+      especially with 9p clients that do not support handling parallel
+      requests.
+Protocol Plans
+These are some of the things that we might want to change on 9p protocol
+level in future. Right now this list just serves for roughly collecting
+some ideas for future protocol changes. Don't expect protocol changes in
+near future though, this will definitely take a long time.
+-  Fixes:
+   -  Increase qid.path Size: The
+      `qid.path <http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32>`__
+      (which should not be confused with a filesystem path like
+      "/foo/bar/") is an integer supposed to uniquely identify a file,
+      which is currently a 64-bit number. A filesystem on host often has
+      things like hard links which means different pathes on the
+      filesystem might actually point to the same file and a numeric
+      file ID in general is used to detect that by systems. Certain
+      services like Samba are using this information, and incorrect
+      handling (i.e. collisions) of unique file IDs can cause
+      misbehaviours. The problem though is that 9p might share more than
+      one filesystem anywhere under its 9p share's root path. So a truly
+      unique file ID under Linux for instance is the combination of the
+      mounted filesystem's device ID and the individual file's inode
+      number, which is larger than 64-bit combined and hence would
+      exceed 9p protocol's qid.path field. By default we only pass the
+      file's inode number via qid.path, so we are assuming that only one
+      filesystem is shared per 9p share. If multiple filesystems are
+      detected, a warning is logged at runtime noting that file ID
+      collisions are possible, and suggesting to enable the
+      multidevs=remap option, which (if enabled) remaps file IDs from
+      host to guest in a way that would prevent such collisions. In
+      practice this remapping should happen with no noticable overhead,
+      but obviously in a future protocol change this should be addressed
+      by simply increasing the qid.path e.g. to 128 bits so that we
+      won't need to remap file IDs in future anymore.
+-  Cleanup:
+   -  Merge Dialects: It might make sense merging the individual 9p
+      dialects to just one protocol version for all systems to reduce
+      complexity and confusion.
+-  Optimizations:
+   -  Extend Treaddir: To retrieve a list of directory entries a
+      `Treaddir <https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory>`__
+      request is sent by clients. In practice, this request is followed
+      by a large amount of individual requests for getting more detailed
+      information about each directory entry like permissions, ownership
+      and so forth. For that reason it might make sense for allowing to
+      optionally return such common detailed information already with a
+      single Rreaddir response to avoid overhead.
+Please refer to `Contribute/SubmitAPatch <Contribute/SubmitAPatch>`__
+for instructions about how to send your patches.
+On doubt, just send a message to
+`qemu-devel <https://lists.nongnu.org/mailman/listinfo/qemu-devel>`__
+first; but as this is a high traffic mailing list, don't forget to add
+"9p" to the subject line to prevent your message from ending up unseen.
+`Category:Developer documentation <Category:Developer_documentation>`__

  reply	other threads:[~2021-03-23 20:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23 19:40 [PATCH 0/4] add in-tree 9pfs developers documentation Christian Schoenebeck
2021-03-23 18:36 ` Christian Schoenebeck [this message]
2021-03-23 18:52 ` [PATCH 2/4] docs/devel: add directory for pictures Christian Schoenebeck
2021-03-23 18:55 ` [PATCH 3/4] docs/devel/9p: fix references to pictures Christian Schoenebeck
2021-03-23 19:31 ` [PATCH 4/4] MAINTAINERS: add responsibility for docs/devel/9p.rst Christian Schoenebeck
2021-04-06 12:27 ` [PATCH 0/4] add in-tree 9pfs developers documentation Christian Schoenebeck
2021-04-07  7:32   ` Greg Kurz
2021-04-07  8:52     ` Christian Schoenebeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afe044a589742961a06563200eec591cb3df8ca6.1616528420.git.qemu_oss@crudebyte.com \
    --to=qemu_oss@crudebyte.com \
    --cc=groug@kaod.org \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --subject='Re: [PATCH 1/4] docs/devel: add 9p.rst' \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).