El jue., 18 nov. 2021 16:31, Hanna Reitz <hreitz@redhat.com> escribió:
On 18.11.21 14:50, Paolo Bonzini wrote:
> On 11/15/21 17:03, Hanna Reitz wrote:
>>
>> I only really see four solutions for this:
>> (1) We somehow make the amend job run in the main context under the
>> BQL and have it prevent all concurrent I/O access (seems bad)
>> (2) We can make the permission functions part of the I/O path (seems
>> wrong and probably impossible?)
>> (3) We can drop the permissions update and permanently require the
>> permissions that we need when updating keys (I think this might break
>> existing use cases)
>> (4) We can acquire the BQL around the permission update call and
>> perhaps that works?
>>
>> I don’t know how (4) would work but it’s basically the only
>> reasonable solution I can come up with.  Would this be a way to call
>> a BQL function from an I/O function?
>
> I think that would deadlock:
>
>     main                I/O thread
>     --------            -----
>     start bdrv_co_amend
>                     take BQL
>     bdrv_drain
>     ... hangs ...

:/

Is there really nothing we can do?  Forgive me if I’m talking complete
nonsense here (because frankly I don’t even really know what a bottom
half is exactly), but can’t we schedule some coroutine in the main
thread to do the perm notifications and wait for them in the I/O thread?

I think you still get a deadlock, just one with a longer chain. You still have a cycle of things depending on each other, but one of them is now the I/O thread waiting for the bottom half.

Hmm...  Perhaps.  We would need to undo the permission change when the
job finishes, though, i.e. in JobDriver.prepare() or JobDriver.clean(). 
Doing the change in qmp_x_blockdev_amend() would be asymmetric then, so
we’d probably want a new JobDriver method that runs in the main thread
before .run() is invoked. (Unfortunately, “.prepare()” is now taken
already...)

Ok at least it's feasible.

Doesn’t solve the FUSE problem, but there we could try to just take the
RESIZE permission permanently and if that fails, we just don’t allow
truncates for that export.  Not nice, but should work for common cases.

Yeah definitely not nice. Probably permissions could be protected by their own mutex, even a global one like the one we have for jobs. For now I suggest just ignoring the problem and adding a comment, since it's not really something that didn't exist.

Paolo