On Tue, Mar 23, 2021 at 01:54:46PM +0100, Christian Schoenebeck wrote:
> On Monday, March 22, 2021 11:54:56 CET Stefan Hajnoczi wrote:
> > > > Thanks, Christian. I am still trying to figure out the details of
> > > > the ROP attacks.
> > > >
> > > > However, QEMU's vhost-user reconnection is based on chardev socket
> > > > reconnection. The socket reconnection can be enabled with
> > > > "--chardev socket,...,reconnect=N" in the QEMU command options,
> > > > where N means QEMU will try to reconnect the disconnected socket
> > > > every N seconds. We can increase N to increase the reconnect delay.
> > > > If we want to change the reconnect delay dynamically, I think we
> > > > should change the chardev socket reconnection code. It is a more
> > > > generic mechanism than vhost-user-fs and the vhost-user backend.
> > > >
> > > > By the way, I also considered the socket reconnection delay from
> > > > the performance aspect. As the reconnection delay increases, an
> > > > application doing I/O in the guest will suffer larger tail
> > > > latency. And for now, the smallest delay is 1 second, which is
> > > > rather large for today's high-performance virtual I/O devices. I
> > > > think a more performant and safer reconnect delay adjustment
> > > > mechanism should be considered in the future. What are your
> > > > thoughts?
> > >
> > > So with N=1 an attacker could e.g. bypass a 16-bit PAC by brute force
> > > in ~18 hours (e.g. on Arm if PAC + MTE was enabled). With a 24-bit PAC
> > > (no MTE) it would be ~194 days. Independent of which architecture and
> > > defense mechanism is used, though, there is always the possibility
> > > that some kind of side-channel attack exists that requires far fewer
> > > attempts. So in an untrusted environment I would personally limit the
> > > number of automatic reconnects and rather accept downtime for further
> > > investigation if a suspiciously high number of crashes happened.
> > >
> > > And yes, if a dynamic delay scheme was deployed in the future, then
> > > starting with a value smaller than 1 second would make sense.
> >
> > If we're talking about repeatedly crashing the process to find out its
> > memory map, shouldn't each process have a different randomized memory
> > layout?
> >
> > Stefan
>
> Yes, ASLR has been enabled by default on Linux and other OSes for more
> than 10 years. But ASLR does not prevent ROP attacks, which commonly use
> relative offsets, stack tweaking, indirect jumps, and heap spraying.
> Plus, side channels exist to gain access to direct addresses.
>
> The situation might improve significantly when shadow stacks (e.g. Intel
> CET) become widely used in the future. But in the meantime I would be
> cautious if something is crashing too often within a certain time frame.

It's a good point for performance as well. A broken service should not
hog CPU by constantly restarting itself. If it's badly broken it may
never come back up and should be throttled.

Stefan
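
To make the numbers in this thread concrete, here is a rough Python
sketch. It is not QEMU code and does not reflect how the chardev
reconnect logic is implemented. The first helper reproduces the ~18 hour
and ~194 day estimates, assuming one guess per reconnect and a worst case
of trying every PAC value. The ReconnectThrottle class is a hypothetical
policy of the kind discussed above: start below 1 second, back off
exponentially, and stop reconnecting once too many crashes land in a time
window. All names and default values are assumptions for illustration.

```python
from collections import deque


def worst_case_seconds(pac_bits, delay_s):
    """Worst-case time to try every PAC value at one guess per reconnect."""
    return (2 ** pac_bits) * delay_s


class ReconnectThrottle:
    """Hypothetical policy: exponential backoff plus a crash budget."""

    def __init__(self, base_delay_s=0.1, max_delay_s=60.0,
                 max_crashes=5, window_s=600.0):
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.max_crashes = max_crashes
        self.window_s = window_s
        self._crashes = deque()  # timestamps of recent crashes

    def next_delay(self, now_s):
        """Record a crash at now_s.

        Returns the delay in seconds before the next reconnect attempt,
        or None if the crash budget for the window is exhausted and an
        operator should investigate instead.
        """
        # Forget crashes that have aged out of the window.
        while self._crashes and now_s - self._crashes[0] > self.window_s:
            self._crashes.popleft()
        self._crashes.append(now_s)
        if len(self._crashes) > self.max_crashes:
            return None
        # Double the delay for each crash still inside the window.
        delay = self.base_delay_s * 2 ** (len(self._crashes) - 1)
        return min(delay, self.max_delay_s)


if __name__ == "__main__":
    # 2**16 guesses at 1 s each: 65536 s, about 18.2 hours.
    print(worst_case_seconds(16, 1) / 3600)
    # 2**24 guesses at 1 s each: 16777216 s, about 194.2 days.
    print(worst_case_seconds(24, 1) / 86400)

    throttle = ReconnectThrottle(base_delay_s=0.5, max_delay_s=8.0,
                                 max_crashes=4, window_s=100.0)
    for t in range(5):
        print(t, throttle.next_delay(float(t)))
```

With a crash budget like this, the number of guesses an attacker gets is
bounded per window rather than by the reconnect interval alone, and a
badly broken backend stops consuming CPU on restarts. The thresholds
would need tuning for a real deployment.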