On 23/04/2021 00:49, Florian Fainelli wrote: > On 4/22/2021 1:10 PM, Joe Burmeister wrote: >>> On 4/20/2021 1:34 AM, Joe Burmeister wrote: >>>> It was previoulsy possible to have a device tree with more chips than >>>> the driver supports and go off the end of CS arrays. >>> Do you mind walking me through the code how that could have happened? We >>> have spi_register_controller() call of_spi_get_gpio_numbers() which has >>> the following: >>> >>> ctlr->num_chipselect = max_t(int, nb, ctlr->num_chipselect); >>> >>> such that what the controller has is the maximum between the number of >>> 'cs-gpios' properties parsed and what was already populated in >>> ctrl->num_chipselect during bcm2835_spi_probe(), which for this driver >>> is BCM2835_SPI_NUM_CS (3). >> If you make a initial device tree (or add overlay in the rpi's >> config.txt) with more on the bus than BCM2835_SPI_NUM_CS (in my case 8 >> devices), you get into this trampling memory state. As the devices are >> added, once the chip_select is equal to or greater than >> BCM2835_SPI_NUM_CS, it's writing off the end of the arrays. > OK. > >> There is no protection from this happening. By the looks of it, this >> isn't the only driver this could happen with, but it is the one I have >> hardware for to test. There are also drivers that look like they don't >> have a problem going well beyond the limit they gave. > Right, which means that we should probably seek a solution within the > SPI core itself, even if you can only test with spi-bcm2835.c chances > are that the fix would be applicable for other controllers if done in > the core. I'm not sure it's possible to do in the core alone. The numb of the issue is the core changes ctlr->num_chipselect to what is in the device tree and some drivers are cool with that overs quietly stomp memory. If we stop the core changing ctlr->num_chipselect then sod's law says that we'd break existing devices which exceed the drivers num_chipselect without a problem. I've got a simple little patch to warn when the core expands ctlr->num_chipselect. This warning won't go off in bcm2835 with my patch because I am also extending ctlr->num_chipselect to the amount in the device tree before the core does that expansion. Hopefully that new warning would make people investigate and fix problem drivers. >> There is protection in spi_add_device, which will catch extra added >> later, but not ones in the device tree when the spi controller was >> registered. > Not sure I follow you, if we have the overlay before > spi_register_controller() is called, how can the check there not > trigger? And if we load the overlay later when the SPI controller is > already registered, why does not spi_add_device()'s check work? I think it might be a RPI thing. I think it is merging in the overlay and giving Linux one already merged. > How would I go about reproducing this on a Pi4? Attached is a device tree overlay. If you compile that up and stick it in /boot/overlays and add dtoverlay=rpi-bug to your config.txt, you can get into this state. If you do dtoverlay, you don't see anything, but if you do: ls /dev/spi* You can see all the spidev added by this are added. 2 of which go beyond the drivers CS arrays. >>>> This patches inforces CS limit but sets that limit to the max of the >>>> default limit and what is in the device tree when driver is loaded. >>>> >>>> Signed-off-by: Joe Burmeister >>> You have changed many more things that just enforcing a limit on >>> BCM2835_SPI_NUM_CS you have now made all chip-select related data >>> structuresd dynamically allocated and you have changed a number of >>> prints to use the shorthand "dev" instead of &pdev->dev. >> The change to dynamic allocated arrays is just to support what is given >> in the deviceĀ  tree rather than increase and enforce the CS limit just >> for my case. >> >> The shorthand is of course not required. I'll drop it on resubmitting. >> >> Regards, >> >> Joe >> >>