On 4/5/22 02:33, Bean Huo wrote:
> For performance improvement, according to my test, if we abandon SCSI
> command parsing, we can get 3%~5% performance improvement. Maybe this
> is little or no improvement? Yes, reliability issues outweigh this
> performance improvement. Error handling and UFS probes should also be
> rebuilt. But most importantly, it makes UFS more scalable. How do you
> think about adding an immature development driver to drever/staging
> first? name it driver/staging/lightweight-ufs?

I do not understand the interest in bypassing the SCSI core. The
scsi_debug driver supports more than a million IOPS on a single CPU
core (see also the attached script). I think this shows that the SCSI
core is not a performance bottleneck. Additionally, I think it will
take a while until UFS devices scale from the current performance
level (100K IOPS?) to more than a million IOPS.

Please note that bypassing the SCSI core would make it much harder than
necessary to introduce zoned storage support and also that this would
lead to plenty of duplicated code.

>> For other SCSI LLDs the cost of atomic operations and memory
>> barriers in the LLD outweighs the cost of the operations in the
>> SCSI core and sd drivers. I'm not sure whether that's also the case
>> for the UFS driver.
>
> I didn't take this into account, maybe it's not a big deal, since the
> UFS driver might use its own lock/serialization lock.

Every single atomic operation in the hot path has a measurable
performance impact. That includes locking operations.

Thanks,

Bart.