From: Matthias Weißer
Date: Tue, 25 Jan 2011 11:55:22 +0100
Subject: [U-Boot] [PATCH] arm: Use optimized memcpy and memset from linux
In-Reply-To: <20110124200729.2134CB187@gemini.denx.de>
References: <1295884607-9044-1-git-send-email-weisserm@arcor.de> <20110124161338.B0345D42A89@gemini.denx.de> <4D3DD1EC.7010506@arcor.de> <20110124200729.2134CB187@gemini.denx.de>
Message-ID: <4D3EAC1A.5030707@arcor.de>
To: u-boot@lists.denx.de

On 24.01.2011 21:07, Wolfgang Denk wrote:
> OK - so which results do you see in real-life use, say when loading and
> booting an OS? How much boot time can be saved?

All tests were done with jadecpu.

                       | HEAD(1)| HEAD(1)| HEAD(2)| HEAD(2)|
                       |        | +patch |        | +patch |
-----------------------+--------+--------+--------+--------+
Reset to prompt        |  438ms |  330ms |  228ms |  120ms |
                       |        |        |        |        |
TFTP a 3MB img         | 4782ms | 3428ms | 3245ms | 2820ms |
                       |        |        |        |        |
FATLOAD USB a 3MB img* | 8515ms | 8510ms | ------ | ------ |
                       |        |        |        |        |
BOOTM LZO img in RAM   | 3473ms | 3168ms |  592ms |  592ms |
  where CRC is         |  615ms |  615ms |   54ms |   54ms |
  uncompress           | 2460ms | 2462ms |  450ms |  451ms |
  final boot_elf       |  376ms |   68ms |   65ms |   65ms |
                       |        |        |        |        |
BOOTM LZO img in FLASH | 3207ms | 2902ms | 1050ms | 1050ms |
  where CRC is         |  600ms |  600ms |  135ms |  135ms |
  uncompress           | 2209ms | 2211ms |  828ms |  828ms |
  final boot_elf       |  376ms |   68ms |   65ms |   65ms |

(1) No dcache
(2) dcache enabled in board_init
*Does not work when dcache is on

I think we can see that these patches have no negative impact when only
execution speed is taken into consideration. The gain is noticeable when
caching is not used or not activated. For pure RAM-to-RAM copies with
caching activated, the patch didn't change anything.

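To put the table into relative terms, here is a small host-side sketch of the arithmetic. It is not part of the patch; the helper names saved_ms and saved_permille are made up for this illustration, and the inputs are simply the millisecond values from the table above.

```c
#include <assert.h>

/* Hypothetical helper: absolute time saved by the patch, in ms. */
static int saved_ms(int before_ms, int after_ms)
{
	return before_ms - after_ms;
}

/*
 * Hypothetical helper: relative saving in tenths of a percent
 * (per mille), using integer math that truncates toward zero.
 */
static int saved_permille(int before_ms, int after_ms)
{
	assert(before_ms > 0);	/* guard against division by zero */
	return (before_ms - after_ms) * 1000 / before_ms;
}
```

For the TFTP row without dcache, saved_permille(4782, 3428) evaluates to 283, i.e. roughly 28%. For the final boot_elf step without dcache (376ms to 68ms) it is 819, roughly 82%.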
Here are some additional numbers for copying a 1.4MB image from NOR to
RAM:

HEAD                  : 134ms
HEAD + patch          :  72ms
HEAD + dcache         : 120ms
HEAD + dcache + patch :  70ms

So, for copy actions from flash to RAM there is also an improvement. As
boot times are a bit critical for us, every improvement > 10ms is
interesting for us.

> I guess the speed improvement you see for a few large copy operations
> is just one side - probably there will be slower execution (due to the
> effort to set up the operations) for the (many more frequent) small
> operations. In addition, there is an increase of the memory footprint
> of nearly 1 kB.
>
> I think additional measurements need to be done - for example, we
> should check how the execution times change for typical operations
> like TFTP download, reading from NAND flash and MMC/SDcard, booting a
> Linux kernel etc.

As the tests above show, there is no negative performance impact in the
test cases I have run. As we don't use Linux here, I can't test booting a
Linux kernel. Maybe someone else can jump in here.

> Also, it should be possible to enable this feature conditionally, so
> users can decide whether speed or size is more important in their
> configurations.

Would it be an option to use the CONFIG entries CONFIG_USE_ARCH_MEMCPY
and CONFIG_USE_ARCH_MEMSET to enable that feature? If that is OK I can
send a new version of the patch. The only problem I see with this
approach is that there are architectures which already have their own
implementations, which are then not affected by these config options.

Regards
Matthias
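
As a rough illustration of the CONFIG_USE_ARCH_MEMCPY idea above, the selection could look something like the following. This is only a host-buildable sketch under stated assumptions, not code from the patch: arch_memcpy, generic_memcpy, sketch_memcpy and copy_selftest are made-up names, and the option is left undefined here so that the generic fallback is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: a board config header would opt in with
 *   #define CONFIG_USE_ARCH_MEMCPY
 * It is left undefined in this sketch so it builds without the
 * optimized assembly routine.
 */

#ifdef CONFIG_USE_ARCH_MEMCPY
/* Hypothetical optimized routine, provided by an assembly file. */
void *arch_memcpy(void *dest, const void *src, size_t count);
#define sketch_memcpy arch_memcpy
#else
/* Small generic byte-wise fallback: slower, but smaller. */
static void *generic_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}
#define sketch_memcpy generic_memcpy
#endif

/* Sanity check for whichever implementation was selected. */
static void copy_selftest(void)
{
	const char src[] = "u-boot";
	char dst[sizeof(src)] = { 0 };

	sketch_memcpy(dst, src, sizeof(src));
	assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

A CONFIG_USE_ARCH_MEMSET switch could follow the same pattern. Boards that care more about size than speed would leave both options undefined and keep the generic routines.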