linux-cxl.vger.kernel.org archive mirror
* CXL Vendor Specific Capabilities
@ 2021-05-19 16:45 Ariel.Sibley
  2021-05-20  9:53 ` Christoph Hellwig
From: Ariel.Sibley @ 2021-05-19 16:45 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ahmad.Danesh, cpetersen, mahesh.natu, dan.j.williams,
	Vincent.Hache, Ariel.Sibley, Sanjay.Goyal, scolee, jhinkle,
	guru.anbalagane, kevin.marks, bhirano, chet.r.douglas, dcaracci,
	jgroves, andy.rudoff

Re-opening the discussion on vendor-specific capabilities on the linux-cxl mailing list.

Background:
===========

The default (and recommended) value of the CONFIG_CXL_MEM_RAW_COMMANDS Kconfig knob blocks passthrough of vendor-defined mailbox commands.  Even if this Kconfig knob is enabled, passthrough of vendor-defined commands still taints the kernel.
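
A rough model of the behaviour described above, written as plain Python rather than kernel code. The names `Decision` and `current_raw_policy` are hypothetical, and treating opcodes C000h through FFFFh as the vendor-specific range is an assumption taken from my reading of the CXL spec; the real driver applies further checks that are not modelled here.

```python
from dataclasses import dataclass

# Assumption: opcodes C000h-FFFFh form the vendor-specific mailbox range.
VENDOR_OPCODE_MIN = 0xC000
VENDOR_OPCODE_MAX = 0xFFFF


@dataclass(frozen=True)
class Decision:
    allowed: bool  # is the command passed through to the device?
    taints: bool   # does issuing it taint the kernel?


def is_vendor_opcode(opcode: int) -> bool:
    return VENDOR_OPCODE_MIN <= opcode <= VENDOR_OPCODE_MAX


def current_raw_policy(opcode: int, raw_commands_enabled: bool) -> Decision:
    """Model today's handling of a vendor-defined passthrough command."""
    if not is_vendor_opcode(opcode):
        raise ValueError("this sketch only models vendor-defined opcodes")
    if not raw_commands_enabled:
        # CONFIG_CXL_MEM_RAW_COMMANDS=n: passthrough is blocked outright.
        return Decision(allowed=False, taints=False)
    # CONFIG_CXL_MEM_RAW_COMMANDS=y: passthrough works but always taints.
    return Decision(allowed=True, taints=True)
```

In this model there is no input that distinguishes a read-only diagnostic from a destructive command: both end up either blocked or tainting.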

There is a consensus among device and system manufacturers that vendor-specific capabilities are a necessary component of specifications and we need a solution to support these capabilities. Within reason, these capabilities need to be enabled without tainting the kernel.

Key points to consider regarding handling of vendor-specific capabilities:
==========================================================================

* Tainting the kernel during harmless operations is not desirable. While some users have full control over the kernel and could patch this behavior, those users would prefer to run an upstream kernel rather than carry patches.

* There is agreement that some capabilities are candidates for standardization, but we need a path for enabling that class of functionality until it makes its way into the spec.  It takes a long time to get anything through the full process of acceptance into the spec and upstream. Without an interim path, adoption will slow and innovation will be stifled.

* The reality is that we can’t put everything into the spec. The primary cases are vendor-specific implementations which are not suited for standardization, and proprietary/differentiating features.

* Limiting the usefulness of the driver's path for vendor capabilities will likely have undesirable side effects: suppliers and others won’t accept the limitation, and vendors will be forced to develop workarounds that bypass the driver. If the driver provides a usable pathway that does not taint the kernel, vendors are less likely to take that route. The driver can then perform some policing, for instance based on the contents of the Command Effects Log (CEL), and CXL device providers can be good citizens of the Linux community.

Recommendation:
===============

Allow root to issue vendor-defined passthrough commands by default, and use the CEL to determine whether a given command should taint the kernel.  Commands that have no effects, or only harmless effects, should not taint the kernel.  Commands that have potentially harmful effects should taint the kernel.
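
One way the recommended check could look, sketched in plain Python rather than kernel code. The Command Effect bit positions are my reading of the CEL entry format in the CXL 2.0 spec and should be verified against the spec; `should_taint`, `example_cel`, and especially the choice of which bits count as "harmless" are hypothetical and only illustrate the shape of the policy.

```python
from typing import Mapping

# Command Effect bits as I read the CEL entry format in the CXL 2.0 spec
# (assumption: verify against the spec revision you target).
CONFIG_CHANGE_COLD_RESET = 1 << 0
CONFIG_CHANGE_IMMEDIATE = 1 << 1
DATA_CHANGE_IMMEDIATE = 1 << 2
POLICY_CHANGE_IMMEDIATE = 1 << 3
LOG_CHANGE_IMMEDIATE = 1 << 4
SECURITY_STATE_CHANGE = 1 << 5
BACKGROUND_OPERATION = 1 << 6

# Illustrative only: which effects count as "harmless" is a policy
# question this sketch does not settle.
DEFAULT_HARMLESS_EFFECTS = LOG_CHANGE_IMMEDIATE


def should_taint(opcode: int, cel: Mapping[int, int],
                 harmless: int = DEFAULT_HARMLESS_EFFECTS) -> bool:
    """Return True if issuing `opcode` should taint the kernel."""
    effects = cel.get(opcode)
    if effects is None:
        # Not advertised in the CEL: nothing to judge it by, so be conservative.
        return True
    # Any effect bit outside the harmless set (including reserved bits) taints.
    return (effects & ~harmless) != 0


# Hypothetical CEL contents: opcode -> Command Effect field.
example_cel = {
    0xC001: 0,                      # read-only diagnostics, no effects
    0xC002: LOG_CHANGE_IMMEDIATE,   # only touches a log
    0xC003: DATA_CHANGE_IMMEDIATE,  # stress test that alters memory contents
}
```

With these made-up entries, `should_taint(0xC001, example_cel)` and `should_taint(0xC002, example_cel)` return False, while 0xC003 and any opcode missing from the CEL return True.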

Examples:
=========

Case 1: A vendor-defined command is needed, and its operations do not affect the normal operation of the device. The CEL contents could be used to determine that there is no impact and thereby permit these commands without tainting the kernel.

* Example 1.1: Interfacing with common diagnostic features that may have wide-ranging implementations and capabilities across vendors, e.g. signal integrity diagnostics and debug counters.

* Example 1.2: Retrieving diagnostic data from proprietary value-add features.

Case 2: A vendor-defined command is needed, but its operations affect the normal operation of the device. The CEL contents can be used to determine the impact and taint the kernel.

* Example 2.1: Vendor-specific diagnostic features that are intended to stress the device or memory for qualification and validation purposes.

Q & A:
======

Q: Why is it a problem to have CONFIG_CXL_MEM_RAW_COMMANDS disabled by default?
A: This pushes the problem to the distros. Distros are likely to follow the Kconfig recommendation by default, so the industry would need to have this discussion with each major distro maintainer. If we can reach a consensus in this forum to enable this knob by default, distros are more likely to enable the functionality as well.

Q: Why is it a problem to taint the kernel for all passthrough commands when CONFIG_CXL_MEM_RAW_COMMANDS is enabled?
A: Some functionality is harmless. If the kernel is tainted when executing basic read-only diagnostics, it may be harder to get support for real issues as the state of the system will be suspect. The driver should be able to use the CEL to determine when to taint.

Q: Why can't we use Vendor Log for this?
A: Several reasons:
* There is no ability to control functionality. Many diagnostic features require input to configure what is being read.
* The device may require time to collect the diagnostic information, but must respond to Get Log in under 2 seconds (preferably much faster).
* It is a single blob that can be accessed at arbitrary offsets. The limited set of use cases this model could cover would require driver changes so that the Vendor Log can be dumped with finer granularity. Even then, because the host could read anywhere in the Vendor Log at any time, it is difficult to robustly encode debug forensics that are populated on demand (in response to Get Log) from the underlying hardware at vendor-defined offsets in the Vendor Log.
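
To make the blob model concrete, here is a small Python sketch of how a host might split a log read into Get Log requests. It assumes the Get Log input payload is a 16-byte log UUID followed by a 32-bit offset and a 32-bit length, little-endian, which is my understanding of the spec; the UUID value and the function name are placeholders made up for the example.

```python
import struct
import uuid
from typing import Iterator

# Assumed Get Log input layout: 16-byte UUID, u32 offset, u32 length.
GET_LOG_INPUT = struct.Struct("<16sII")

# Placeholder identifier for illustration; not a real log UUID.
EXAMPLE_LOG_UUID = uuid.UUID("00000000-0000-0000-0000-000000000001")


def get_log_requests(log_uuid: uuid.UUID, log_size: int,
                     max_payload: int) -> Iterator[bytes]:
    """Yield Get Log input payloads that cover a log of `log_size` bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    offset = 0
    while offset < log_size:
        length = min(max_payload, log_size - offset)
        yield GET_LOG_INPUT.pack(log_uuid.bytes, offset, length)
        offset += length
```

Under this layout the only inputs are the UUID, the offset, and the length. There is no field for selecting or configuring a diagnostic, and nothing tells the device which offset the host will ask for next.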

Q: What about security?
A: While vendor commands have the potential to be a security risk, it should not be the driver’s responsibility to solve that problem. The in-band path through the driver is just one path into the device. For a device to be secure, vendors and their partner customers need to lock out anything that is a security risk at the device level for production. Given that this device-level lockout is necessary, a catch-all lockout at the driver level is redundant from a device security point of view.

Q: How about we add commands to UAPI instead?
A: Vendors should not be adding hooks into a class driver that is supposed to be generic. The details of anything that goes into UAPI would need to be fully public, and any proposed addition would need a full explanation of why it cannot be added to the spec. This limits the viability of UAPI as a method for unlocking value-add functionality.

Conclusion:
===========

I look forward to a healthy discussion on how we can move this forward, and hope that we can reach an answer that is acceptable for the kernel maintainers, CXL device vendors, and end users.

Regards,
Ariel
