From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 268757AA for ; Tue, 2 Aug 2016 00:56:56 +0000 (UTC) Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 204C7217 for ; Tue, 2 Aug 2016 00:56:55 +0000 (UTC) Date: Tue, 2 Aug 2016 02:56:50 +0200 From: "Luis R. Rodriguez" To: "Rafael J. Wysocki" Message-ID: <20160802005650.GF3296@wotan.suse.de> References: <20160729111303.GA10376@sirena.org.uk> <20160801190309.GX3296@wotan.suse.de> <1555444.RIYpEbA2F7@vostro.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1555444.RIYpEbA2F7@vostro.rjw.lan> Cc: Oded Gabbay , =?iso-8859-1?Q?J=F6rg_R=F6del?= , ksummit-discuss@lists.linuxfoundation.org, Mauro Carvalho Chehab , "vegard.nossum@gmail.com" , "rafael.j.wysocki" , Cristina Moraru , Roberto Di Cosmo , Marek Szyprowski , Stefano Zacchiroli , Valentin Rothberg Subject: Re: [Ksummit-discuss] [TECH TOPIC] Addressing complex dependencies and semantics (v2) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Tue, Aug 02, 2016 at 02:01:56AM +0200, Rafael J. Wysocki wrote: > On Monday, August 01, 2016 09:03:09 PM Luis R. Rodriguez wrote: > > On Fri, Jul 29, 2016 at 12:13:03PM +0100, Mark Brown wrote: > > > On Fri, Jul 29, 2016 at 09:45:55AM +0200, Hans Verkuil wrote: > > > > > > > My main problem is not so much with deferred probe (esp. for cyclic dependencies > > > > it is a simple method of solving this, and simple is good). My main problem is > > > > that you can't tell the system that driver A needs to be probed after drivers B, > > > > C and D are probed first. > > > > > > > That would allow us to get rid of v4l2-async.c which is a horrible hack. > > > > I'd like to understand the requirement for this a bit better, so someone explaining > > this would be good if this moves forward as a tech session. > > One case I'm familiar with is when a device has two (or more) device IDs, where > one is more specific than the other. The idea being that if the OS has a driver > for that particular device, it will use the more specific device ID (say A) to > look for it, but otherwise it will use the other "generic" ID (say B) to match > against a "generic" driver. > > Of course, that only works if the driver for A is probed before the driver for B. That's a good case to keep in mind which is indeed complex. > > > > That code allows a bridge driver to wait until all dependent drivers are probed. > > > > This really should be core functionality. > > > > > > > Do other subsystems do something similar like drivers/media/v4l2-core/v4l2-async.c? > > > > Does anyone know? > > > > > > ASoC does, it has an explicit card driver to join things together and > > > that just defers probe until everything it needs is present. This was > > > originally open coded in ASoC but once deferred probe was implemented we > > > converted to that. > > > > OK that's 2. > > > > Another piece of code that deals with complex dependencies I've recently > > ventured into was the x86 IOMMU stuff. The complexities here are both at > > the built-in init level and also later at probe. > > > > For the built-in code we have a code run time sort where based on a simple set > > of semantics used to declare dependencies final code that was compiled in is > > sorted out so that the code that needs to initialize first triggers first. hpa > > had suggested long ago that generalizing this was desirable, so I've taken that > > task and have some basic building blocks which are now being proposed in their > > RFC v3 series [0]. If it seems that the run time sort mechanism is left out, > > its correct, upon review with Andy on RFC v2 we don't *yet* need a run time > > sort, even though I expanded on the existing one for IOMMU and strengthened the > > semantics there, its still available on the userspace mockup solution for > > linker tables though [1]. I should note that I did at least determine that > > generalizing a sort for dependency maps which shuffles code at run time > > did not make sense due to the fact that you'd either need to generalize the > > building blocks used for defining a dependency map. If you move one item used > > for init from the end to the front, your iteration function must use the same > > structure in the same way. Run time sorts for code sections then are left up > > to each subsystem to implement. On x86, I evaluated sorting out dependencies > > further, for instance on setup_arch() -- however it just seemed not needed > > at this point. For other subsystems this may make more sense. > > > > At the probe level IOMMUs face another issue when dealing with GPU drivers. > > The kernel has 7 levels of initialization possible (pure_initcall() == 0, > > core_initcall() == 1, ..., fs_initcall() == 5, device_initcall() == 6, > > late_initcall() == 7), module_init() maps to device_initcall(), but modules may > > also use late_initcall() if they know a driver built-in needs to definitely run > > very late. Naturally if you're not a piece of code running very early on, and > > you want the option to be a module as well (device_initcall == 6) then you > > really only have at your disposal two levels available before you get to an end > > device driver. It turns out that for GPU drivers this is not enough to map out > > enough dependencies at the higher level, so the only next best solution > > available is to implicitly rely on linker order. This is the case for ordering > > logistics between AMD IOMMU v1, AMD IOMMUv2 (depends on AMD IOMMU v1), and > > AMD KFD driver, and the AMD radeon driver: > > > > 0. AMD IOMMUv1: arch/x86/kernel/pci-dma.c > > 1. AMD IOMMUv2: drivers/iommu/amd_iommu_v2.c > > 2. AMD KFD: drivers/gpu/drm/amd/amdkfd/kfd_module.c > > 3. AMD Radeon: drivers/gpu/drm/radeon/radeon_drv.c > > > > For details refer to a recent discussion on this [2]. > > > > Using linker tables for initialization should enable us to scale initialization > > levels well beyond 7 levels, in fact this could be however long an ASCII string > > is acceptable in C as the order level is stored as a string (part of the section > > name) and also enable each subsystem to have its own set of levels / heuristics > > for initialization as well. Ordering would actually be done at link time, so > > the code is already ordered *iff* all dependency heuristics are available at > > compilation time. If dependency information can only be available at run time, > > a run time sort is optional but using the existing x86 IOMMU code as an example > > and precedent (and considering my expanded stricter semantics) -- in the future > > this can be an option for any subsystem as well as a supplement. > > > > My interest in the media controller and the SAT solver is I consider the run > > time sort nothing but a sloppy hack (see my userspace tree for tons of > > semantics fixes), and formalizing this to something more concrete that can be > > generic should be more useful to other parts of the kernel. This is why I > > wanted to discuss these things. Is there a generic common goal here and > > can we share code ? If so what would that look like ? > > > > My impression at first is that linker-time dependencies should suffice to cover > > a lot cases, but that should still mean then that binutils ld SORT() could > > potentially be enhanced further in the future, specially if a simple dependency > > map could be expressed. For instance can we make it run faster, or would > > having it interpret certain other criteria enable us to use a SAT solver > > to optimize dependency resolution / ordering. Then there is run time sorting > > and optimization -- but the same questions apply here. > > > > I think a fruitful session would be to: > > > > o review / explain each of the complex ordering problems from different > > kernel subsystems > > o review / explain existing solutions to these problems > > o finally try to figure out if a common solution is possible, both > > short term and long term > > > > [0] https://lkml.kernel.org/r/1469222687-1600-1-git-send-email-mcgrof@kernel.org > > [1] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linker-tables.git/ > > [2] https://lkml.kernel.org/r/1464311916-10065-1-git-send-email-mcgrof@kernel.org > > I'm still wondering why probe ordering is special and suspend-resume and runtime > PM ordering isn't (and shutdown ordering too, for that matter). > > In the majority of cases when probe ordering matters, suspend/resume and runtime > PM ordering matters too, so considering one of them alone doesn't seem very > useful to me. > > Now, even if you can address probe ordering issues by using some link-time > methods and similar, runtime PM is async by nature and suspend/resume is > async for many devices too (for efficiency reasons) and the only approach > that works here is to have the dependencies represented as data somehow > and use those when you need to carry out the operation. Apologies, its why I Cc'd you -- I did mean to implicate both, given that as I also had reviewed your work, I figured similar issues were being dealt with there. So indeed, what I am suggesting is perhaps we can knock a few birds with one stone here. Luis