From mboxrd@z Thu Jan 1 00:00:00 1970 From: Graeme Russ Date: Fri, 26 Oct 2012 10:04:28 +1100 Subject: [U-Boot] [PATCH v9] [RFC] Add dmmalloc module for DM. In-Reply-To: References: <1346069529-27397-1-git-send-email-tmshlvck@gmail.com> <1351122590-15068-1-git-send-email-tmshlvck@gmail.com> Message-ID: List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de Hi Tomas, On Fri, Oct 26, 2012 at 6:16 AM, Tomas Hlavacek wrote: > Hello Graeme, > > On Thu, Oct 25, 2012 at 3:40 AM, Graeme Russ wrote: > >>> diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h >>> index 2b9af93..9045829 100644 >>> --- a/arch/arm/include/asm/global_data.h >>> +++ b/arch/arm/include/asm/global_data.h >>> @@ -82,6 +82,9 @@ typedef struct global_data { >>> unsigned long post_log_res; /* success of POST test */ >>> unsigned long post_init_f_time; /* When post_init_f started */ >>> #endif >>> +#ifdef CONFIG_SYS_EARLY_MALLOC >>> + void *early_heap; /* heap for early_malloc */ >>> +#endif >> >> Why not early_heap_header *early_heap; ? >> > > It might be. > > Actually it is a good point because I am using 3 different ways of > dealing with addresses: 1) struct early_heap header * or struct > early_block_header * - I am using this when I want to access members > of the stucture or compute address past the structure (which is where > the heap or block starts); 2) phys_addr_t - which is plain integer and > I use this for simple computations when I do not want to worry about > pointer arithmetic; 3) void * when I have just plain address, > especially when I want to pass an addres among logical parts of the > mallocator or outside. This may a bit controversial and perhaps I > should replace it by specific strucutre pointers internally. > > I am unable to decide: Should I remove struct early_heap_header from > dmmalloc.h making it publicly unavailable or should I rather change > the void * to struct early_heap_header * in the GD structure? What do > you think is better? I think struct early_heap_header * in the GD structure is the better way to go as that is exactly what it is. [snip] >>> + h = (struct early_heap_header *)CONFIG_SYS_EARLY_HEAP_ADDR; >>> + b = (struct early_block_header *)(h + 1); >> >> Hmmm, does this really work? I would have thought: >> >> b = (struct early_block_header *)(h + sizeof(struct early_heap_header)); >> >> but I could be mistaken > > It seems that it works as it is (at least I wrote bunch of tests and I > inspected resulting heaps and it was all right). I believe that since > h is a pointer to the struct early_heap_header then pointer arithmetic > is in effect and h+1 actually means "next element in the array of > struct early_heap_header". Which is the address past the header that > equals beginning of the heap data block. (?) As I said, I could be mistaken - it appears I am :) >>> +int early_malloc_active(void) >>> +{ >>> + return ((gd->flags & GD_FLG_RELOC) != GD_FLG_RELOC); >>> +} >> >> I think we need another flag - GD_FLG_RELOC gets set before the permanent >> heap is initialised, so there is a window of opportunity where things may >> break > > Well, as I wrote in the commit message - this is only a temporary > implementation. I suppose I am going to change this when we have more > coarse initialization flags wired into DM (which I believe we are > going to have it anyway). So now I am just working around "forward > dependency" here. > >> >>> + >>> +void early_heap_dump(struct early_heap_header *h) >>> +{ >>> + struct early_block_header *b; >>> + >>> + debug("heap: h=%p, h->size=%d\n", h, h->size); >>> + >>> + b = (struct early_block_header *)(h+1); >>> + while ((phys_addr_t)b + sizeof(struct early_block_header) >>> + < (phys_addr_t)h + h->size) { >>> + debug("block: h=%p h->size=%d b=%p b->size=%d b->(used)=%d\n", >>> + h, h->size, b, BLOCK_SIZE(b->size), >>> + BLOCK_USED(b->size)); >>> + assert(BLOCK_SIZE(b->size) > 0); >>> + b = (struct early_block_header *)((phys_addr_t)b + >>> + sizeof(struct early_block_header) + >>> + BLOCK_SIZE(b->size)); >>> + } >>> + debug("--- heap dump end ---\n"); >>> +} >> >> Nice touch, but could we just iterate through all ealry heap chunks starting >> from gd->early_heap? > > Or I can have two functions. One heap specific and one for all heaps. > I think both might be useful when somebody needs to debug early_malloc > or memory usage etc. in the early stage. Thanks. True, just adding another function which iterates through the heaps and calls this function would be fine. [snip] >>> +static inline void *dmrealloc(void *oldaddr, size_t bytes) >>> +{ >>> +#ifdef CONFIG_SYS_EARLY_MALLOC >>> + char *addr; >>> + struct early_block_header *h; >>> + if (early_malloc_active()) { >>> + addr = dmmalloc(bytes); >>> + if (addr == NULL) >>> + return NULL; >>> + >>> + h = BLOCK_HEADER(oldaddr); >>> + if (BLOCK_FREE(h->size)) >>> + return NULL; >>> + >>> + if (bytes > BLOCK_SIZE(h->size)) >>> + bytes = BLOCK_SIZE(h->size); >>> + >>> + memcpy(addr, oldaddr, bytes); >>> + dmfree(oldaddr); >>> + return addr; >>> + } >>> +#endif /* CONFIG_SYS_EARLY_MALLOC */ >>> + return realloc(oldaddr, bytes); >>> +} >> >> Hmmm, we have a very intersting corner case to deal with. What if we use >> dmrealloc() to move allocated data from the early malloc pool to the final >> malloc pool? >> >> At some point, we need to assume the developer is only ever going to pass >> early malloc'd memory to dmrealloc() > > Good point! The current code would do the job assuming that the > early_malloc can still access the proper early_heap (or a copy, but in > that case some additional magic is needed) and the real malloc is > already initialized. > > As you can see I am still sticking with the double-copy method. It is > maybe due to lack of insight. But the discussion here was not > absolutely conclusive last time. I have even some experimental code > (not ready for submitting at all) for double copy method but I would > prefer to discuss it separately when the early_malloc() is done. Well the double-copy approach to reallocating early-malloc'd memory and how dmrealloc() is implemented are tightly coupled. I understand the reason for the double copy is performance related (getting into a position to enable cache as early as possible), but I wonder how much will really be gained. It seems to me that the only performance gain will be for the execution of the driver 'relocation fixup' code wich, I assume, would not really consume that many CPU cycles. There is a point where code simplicity outweighs performance gains. >> The other option is to use the gd->flags... >> >> Define two flags - something like: >> >> GD_FLG_HEAP_INIT -> Final heap has been initialised >> GD_FLG_EH_DONE -> free(), realloc() refer to final heap >> >> (I don't like the names, I'm just not up to thinking of anything better) >> >> This way we can use dmalloc() prior to the heap being initialised and then >> set GD_FLG_HEAP_INIT. Once GD_FLG_HEAP_INIT has been set, do a bunch of >> dmrealloc() calls to essentially move the data into the permanent heap (of >> course pointers int the structures need to be fixed up by the drivers) and >> finally set GD_FLG_EH_DONE (and call early_heap_dump) >> >> One problem I see is what happens of you call malloc() and free() on the >> same block between the setting of GD_FLG_HEAP_INIT and GD_FLG_EH_DONE? The >> code will try to reference the block as if it is in the early heap, but it >> won't be. >> >> One solution is, on detection of GD_FLG_HEAP_INIT being set, dmfree() and >> dmrealloc() could search the early heap chunks instead of just assuming >> that the referenced block is in the early heap (if GD_FLG_HEAP_INIT has not >> been set, it will be safe to assume the memory is on the early heap) > > Sure we will have that flags. But I think we can use them as well for > switching from DM driver instance list to tree for example. Or other > way around: I can use DM flags for early_malloc. Therefore I would > like to synchronize with DM cores for PCI and another low-level things > which are certainly going to start in early stage. It would be best to > use the same flags and switch on/off early_malloc based on DM internal > state. Ah, I see where more performance gains are to be made by switching on cache earlier - During the reallocation phase, you are switching from the list based structure (fast for the small number of pre-SDRAM drivers) into the final tree based structure. I'm looking at this early malloc code from a much more generic point of view - I think there are use-cases outside the driver model, so I don't see a need (rather the opposite) to tie early malloc to the driver model Regards, Graeme