Hi Kenny,

Thanks for the info. Do you mind forwarding the existing discussion to me or
having me cc'ed in that thread?

Best,
Yiwei

On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho wrote:
> Hi Yiwei,
>
> I am not sure if you are aware, but there is an ongoing RFC on adding drm
> support in cgroup for the purpose of resource tracking. One of the
> resources is GPU memory. It's not exactly the same as what you are
> proposing (it doesn't track API usage, but it tracks the type of GPU
> memory from the kmd perspective), but perhaps it would be of interest to
> you. There is no consensus on it at this point.
>
> (Sorry for being late to the discussion. I only noticed this thread
> when one of the emails got lucky and escaped the spam folder.)
>
> Regards,
> Kenny
>
> On Wed, Oct 30, 2019 at 4:14 AM Yiwei Zhang wrote:
> >
> > Hi Jerome and all folks,
> >
> > In addition to my last reply, I just want to get some more information
> > regarding this on the upstream side.
> >
> > 1. Do you think this (standardizing a way to report GPU private
> > allocations) is going to be a useful thing upstream as well? It grants
> > a lot of benefits for Android, but I'd like to get an idea for the
> > non-Android world.
> >
> > 2. There might be some worries that the upstream kernel driver has no
> > idea regarding the API. However, to achieve good fidelity around memory
> > reporting, we'd have to pass down certain metadata which is known only
> > by the userland. Consider this use case: on the upstream side, freedreno
> > for example, some memory buffer object (BO) during its own lifecycle
> > could represent totally different things, and the kmd is not aware of
> > that. When we'd like to take memory snapshots at a certain granularity,
> > we have to know what that buffer represents so that the snapshot can be
> > meaningful and useful.
> >
> > If we just keep this Android specific, I'd worry that some day upstream
> > standardizes a way to report this and Android vendors then have to take
> > extra effort to migrate over. This is one of the main reasons we'd like
> > to do this on the upstream side.
> >
> > Timeline wise, Android has explicit deadlines for the next release and
> > we have to push hard towards those. Any prompt responses are very much
> > appreciated!
> >
> > Best regards,
> > Yiwei
> >
> > On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang wrote:
> >>
> >> On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse wrote:
> >>>
> >>> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
> >>> > Hi folks,
> >>> >
> >>> > This is the plain text version of the previous email in case that
> >>> > was considered as spam.
> >>> >
> >>> > --- Background ---
> >>> > On downstream Android, vendors used to report GPU private memory
> >>> > allocations with debugfs nodes in their own formats. However,
> >>> > debugfs nodes are getting deprecated in the next Android release.
> >>>
> >>> Maybe explain why it is useful first?
> >>
> >>
> >> Memory is precious on Android mobile platforms. Apps that use a large
> >> amount of memory, such as games, tend to maintain a table of the memory
> >> on different devices with different prediction models. Private GPU
> >> memory allocations are currently semi-blind to the apps and to the
> >> platform as well.
> >>
> >> By having the data, the platform can do:
> >> (1) GPU memory profiling as part of the huge Android profiler in
> >> progress.
> >> (2) Android system health team can enrich the performance test
> >> coverage.
> >> (3) We can collect field metrics to detect any regression on the GPU
> >> private memory allocations in the production population.
> >> (4) Shell users can easily dump the allocations in a uniform way across
> >> vendors.
> >> (5) Platform can feed the data to the apps so that apps can do memory
> >> allocations in a more predictable way.
> >>
> >>> >
> >>> > --- Proposal ---
> >>> > We are taking the chance to unify all the vendors to migrate their
> >>> > existing debugfs nodes into a standardized sysfs node structure.
> >>> > Then the platform is able to do a bunch of useful things: memory
> >>> > profiling, system health coverage, field metrics, local shell dump,
> >>> > in-app api, etc. This proposal is better served upstream, as all GPU
> >>> > vendors can standardize on a gpu memory structure that clients can
> >>> > rely on and reduce fragmentation across Android and Linux.
> >>> >
> >>> > --- Detailed design ---
> >>> > The sysfs node structure looks like below:
> >>> > /sys/devices/<root>/<pid>/<type_name>
> >>> > e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer", where the gl_buffer
> >>> > node holds the comma separated size values: "4096,81920,...,4096".
> >>>
> >>> How does the kernel know what API the allocation is used for? With the
> >>> open source driver you never specify what API is creating a gem object
> >>> (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
> >>
> >>
> >> Oh, is it a hard requirement for the open source drivers to not
> >> bookkeep any data from userland? I think the API is just some
> >> additional metadata passed down.
> >>
> >>> > For the top level root, vendors can choose their own names based on
> >>> > the value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple
> >>> > gpu driver cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the
> >>> > 2nd and 3rd KMDs. (2) It's also allowed to put a sub-dir, for
> >>> > example "kgsl/gpu_mem" or "mali0/gpu_mem", in the ro.gfx.sysfs.<index>
> >>> > property if the root name under /sys/devices/ is already created and
> >>> > used for other purposes.
> >>>
> >>> On one side you want to standardize, on the other you want to give
> >>> complete freedom on the top level naming scheme. I would rather see a
> >>> consistent naming scheme (i.e. something more restrained, with little
> >>> place for interpretation by individual drivers).
> >>
> >>
> >> Thanks for commenting on this. We definitely need some suggestions on
> >> the root directory. In the multi-gpu case on desktop, is there some
> >> existing consumer that queries "some data" from all the GPUs? How does
> >> the tool find all GPUs and differentiate between them? Is this already
> >> standardized?
> >>
> >>> > For the 2nd level "pid", there are usually just a couple of them per
> >>> > snapshot, since we only take snapshots for the active ones.
> >>>
> >>> ? Do not understand here, you can have any number of applications with
> >>> GPU objects? And thus there is no bound on the number of PIDs. Please
> >>> consider desktop too, i do not know what kind of limitation android
> >>> imposes.
> >>
> >>
> >> We are only interested in tracking *active* GPU private allocations. So
> >> yes, any application currently holding an active GPU context will
> >> probably have a node here. Since we want to do profiling for specific
> >> apps, the data has to be per application. I don't get your concerns
> >> here. If it's about the tracking overhead, it's rare to see tons of
> >> applications doing private gpu allocations at the same time.
> >> Could you help elaborate a bit?
> >>
> >>> > For the 3rd level "type_name", the type name will be one of the GPU
> >>> > memory object types in lower case, and the value will be a comma
> >>> > separated sequence of size values for all the allocations under that
> >>> > specific type.
> >>> >
> >>> > We especially would like some comments on this part. For the GPU
> >>> > memory object types, we defined 9 different types for Android:
> >>> > (1) UNKNOWN // not accounted for in any other category
> >>> > (2) SHADER // shader binaries
> >>> > (3) COMMAND // allocations which have a lifetime similar to a
> >>> > VkCommandBuffer
> >>> > (4) VULKAN // backing for VkDeviceMemory
> >>> > (5) GL_TEXTURE // GL Texture and RenderBuffer
> >>> > (6) GL_BUFFER // GL Buffer
> >>> > (7) QUERY // backing for query
> >>> > (8) DESCRIPTOR // allocations which have a lifetime similar to a
> >>> > VkDescriptorSet
> >>> > (9) TRANSIENT // random transient things that the driver needs
> >>> >
> >>> > We are wondering if those type enumerations make sense to the
> >>> > upstream side as well, or maybe we should just deal with our own
> >>> > type set, because on the Android side we'll just read those nodes
> >>> > named after the types we defined in the sysfs node structure.
> >>>
> >>> See my above point of the open source driver and kernel being unaware
> >>> of the allocation purpose and use.
> >>>
> >>> Cheers,
> >>> Jérôme
> >>>
> >>
> >> Many thanks for the reply!
> >> Yiwei
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
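
For illustration, the nine memory object types proposed in the thread map
naturally onto a C enum. The sketch below only mirrors the list from the mail;
the identifiers are hypothetical and not part of any existing kernel UAPI.

/* Hypothetical enumeration of the nine proposed Android GPU memory
 * object types; names and comments mirror the proposal above. */
enum gpu_mem_type {
	GPU_MEM_UNKNOWN,	/* not accounted for in any other category */
	GPU_MEM_SHADER,		/* shader binaries */
	GPU_MEM_COMMAND,	/* lifetime similar to a VkCommandBuffer */
	GPU_MEM_VULKAN,		/* backing for VkDeviceMemory */
	GPU_MEM_GL_TEXTURE,	/* GL Texture and RenderBuffer */
	GPU_MEM_GL_BUFFER,	/* GL Buffer */
	GPU_MEM_QUERY,		/* backing for query */
	GPU_MEM_DESCRIPTOR,	/* lifetime similar to a VkDescriptorSet */
	GPU_MEM_TRANSIENT,	/* random transient things the driver needs */
};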
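
And a minimal consumer-side sketch of how a shell tool or the platform might
read one of the proposed per-pid, per-type nodes, assuming the comma separated
size format described in the thread. The node path is the illustrative
mali0/gpu_mem example from the mail; none of this is an existing interface.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the comma separated allocation sizes in one proposed sysfs node,
 * e.g. "4096,81920,...,4096" -> total bytes for that memory type. */
static unsigned long long sum_node(const char *path)
{
	char buf[4096];
	unsigned long long total = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f)) {
		for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ","))
			total += strtoull(tok, NULL, 10);
	}
	fclose(f);
	return total;
}

int main(void)
{
	/* Illustrative path only: root "mali0/gpu_mem", pid 606, type
	 * "gl_buffer", as in the example from the proposal. */
	printf("gl_buffer bytes: %llu\n",
	       sum_node("/sys/devices/mali0/gpu_mem/606/gl_buffer"));
	return 0;
}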