From: Cyril Chemparathy
Date: Mon, 24 Sep 2012 10:49:34 -0400
To: Dave Martin
Subject: Re: [PATCH v3 RESEND 01/17] ARM: add mechanism for late code patching
In-Reply-To: <20120924120637.GB2122@linaro.org>

Hi Dave, Thanks for the detailed review... On 9/24/2012 8:06 AM, Dave Martin wrote: > On Fri, Sep 21, 2012 at 11:55:59AM -0400, Cyril Chemparathy wrote: >> The original phys_to_virt/virt_to_phys patching implementation relied on early >> patching prior to MMU initialization. On PAE systems running out of >4G >> address space, this would have entailed an additional round of patching after >> switching over to the high address space. >> >> The approach implemented here conceptually extends the original PHYS_OFFSET >> patching implementation with the introduction of "early" patch stubs. Early >> patch code is required to be functional out of the box, even before the patch >> is applied. This is implemented by inserting functional (but inefficient) >> load code into the .runtime.patch.code init section. 
Having functional code >> out of the box then allows us to defer the init time patch application until >> later in the init sequence. > > There are currently a few different patching mechanisms in the kernel, and > it would be good if we could collect more of them under some common > framework. > That would be great! I could use some pointers here. I've looked at the kprobes code, and there doesn't appear to be much of an intersection there... > For example, it might be possible to do the SMP-on-UP fixups using the same > framework you propose. Best not to attempt that yet, though. > ... and I've looked at the ALT_SMP/ALT_UP stuff as well. The problem here appears to be that ALT_SMP is needed way early in boot - up in head.S assembly-land. The third place is probably the jump label mechanism. That may be a fit with some work, but I'm not sure yet. > Overall, this looks well thought out and useful, though it looks like it > has a few issues that need attention. > > Comments below. > > Cheers > ---Dave > [...] >> Note: the obtuse use of stringified symbols in patch_stub() and >> early_patch_stub() is intentional. Theoretically this should have been >> accomplished with formal operands passed into the asm block, but this requires >> the use of the 'c' modifier for instantiating the long (e.g. .long %c0). >> However, the 'c' modifier has been found to ICE certain versions of GCC, and >> therefore we resort to stringified symbols here. > > You might find that the "n" constraint works. The explanation in the > GCC docs is pretty incomprehensible, but at least it exists. > > __stringify hacks are not uncommon anyway though, so it's not a problem > either way. > The stringify hack is nasty, and I'd like to get rid of it if I could... I looked up constraints.md, and couldn't find "n" defined as a constraint. For that matter, I couldn't find "n" being handled as a modifier in arm_print_operand() either. BTW, I'm looking at GCC 4.6 for these. 
Could you please point me to the doc in question that lists this constraint/modifier? Is this specific to newer (or older) versions of GCC? [...] >> +#define early_patch_stub(type, code, pad, patch_data, ...) \ >> + __asm__("@ patch stub\n" \ >> + "1:\n" \ >> + " b 6f\n" \ >> + " .fill " __stringify(pad) ", 1, 0\n" \ > > What is the pad argument for? It never seems to be set to anything other > than 0 in your series. > The pad argument is used when we need more than one 32-bit instruction slot in the straight-line (post-patch) code path. Please look at patch 05/17 of the series (ARM: LPAE: support 64-bit virt_to_phys patching). When we patch 64-bit virt_to_phys(), we need that extra slot to fill up the upper 32 bits of the result. > The compiler uses a pretty dumb heuristic to guess the size of asms: > 4 * (number of ; or \n in the string) > > Directives that the compiler can't predict the size of are not safe if > they output into any segment that the compiler uses. .fill/.skip are > obvious candidates, but macro expansions, .rept, .irp etc. can cause > these problems too. > > For example: > > void g(int); > void f(void) > { > g(0xd00dfeed); > asm(".skip 0x1000"); > } > > If you try building this with gcc -marm -Os for example: > > /tmp/ccXYm1uP.s: Assembler messages: > /tmp/ccXYm1uP.s:21: Error: bad immediate value for offset (4100) > > ...because the assembler assumes that it can dump a literal at the end > of the function and reference it from the g() callsite. > > > It may be that you have some intended future use for pad (such as > pasting one instruction sequence in place of another possibly > differently-sized sequence at fixup time), in which case this might > require a bit more thought. > Good point. Thanks. I'm not sure if this helps, but I don't think pad should be used to insert more than a couple of instruction words into the code. 
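To make that pad use case concrete, here is a plain-C model of why the 64-bit case needs a second patched slot (this is an illustrative sketch, not the kernel code; model_phys64 and its argument names are made up for the example): one slot adds the low word of the offset, and the extra slot propagates the carry into the upper word, the way an adds/adc pair would.

```c
#include <stdint.h>

/* Illustrative sketch only -- not the kernel implementation.  Models
 * why 64-bit virt_to_phys needs a second patched instruction slot:
 * the first slot adds the low word of the offset, and the extra (pad)
 * slot adds the high word plus the carry, like an adds/adc pair.  In
 * the real series the offset words are patched in as immediates
 * rather than passed as arguments. */
static uint64_t model_phys64(uint32_t vaddr, uint32_t off_lo, uint32_t off_hi)
{
	uint32_t lo = vaddr + off_lo;	/* first slot: adds (sets carry) */
	uint32_t carry = lo < vaddr;	/* carry out of the low-word add */
	uint32_t hi = off_hi + carry;	/* second slot: adc-style add    */

	return ((uint64_t)hi << 32) | lo;
}
```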
>> + "2:\n" \ >> + " .pushsection .runtime.patch.table, \"a\"\n" \ >> + "3:\n" \ >> + " .word 1b\n" \ >> + " .hword (" __stringify(type) ")\n" \ >> + " .byte (2b-1b)\n" \ >> + " .byte (5f-4f)\n" \ >> + "4:\n" \ >> + patch_data \ >> + " .align\n" \ >> + "5:\n" \ >> + " .popsection\n" \ >> + " .pushsection .runtime.patch.code, \"ax\"\n" \ > > I have a vague feeling that this should have .text in the name somewhere, > since it is code that gets executed in place. > Fair enough. Will change to .runtime.patch.text instead. >> + "6:\n" \ >> + code \ >> + " b 2b\n" \ > > .ltorg > > (See [1] below.) > >> + " .popsection\n" \ >> + __VA_ARGS__) >> + >> +/* constant used to force encoding */ >> +#define __IMM8 (0x81 << 24) >> + >> +/* >> + * patch_imm8() - init-time specialized binary operation (imm8 operand) >> + * This effectively does: to = from "insn" sym, >> + * where the value of sym is fixed at init-time, and is patched > > If I've understood correctly, then this description is a bit misleading. > sym is not an absolute operand, but rather *(sym + ofs) is a > variable containing the fixup operand. > Correct. I'll clarify in the text. >> + * in as an immediate operand. This value must be >> + * representible as an 8-bit quantity with an optional >> + * rotation. >> + * >> + * The stub code produced by this variant is non-functional >> + * prior to patching. Use early_patch_imm8() if you need the >> + * code to be functional early on in the init sequence. >> + */ >> +#define patch_imm8(_insn, _to, _from, _sym, _ofs) \ > > Why are the _sym and _ofs parameters separate? Nothing in your series > seems to requre _sym to be a symbol or _ofs to be a number; and nothing > passes _ofs != 0, but anyway I don't see why the caller can't add those > two values together in the macro argument. > I could (should) really get rid of ofs. 
This is a holdover from earlier versions of the patch, where we were using the ofs to point to the upper 32 bits of the phys_offset (= 4 for LE, 0 for BE). I will clean this up in the next rev. Thanks. >> + patch_stub( \ >> + /* type */ \ >> + PATCH_IMM8, \ >> + /* code */ \ >> + _insn " %[to], %[from], %[imm]\n", \ >> + /* patch_data */ \ >> + ".long " __stringify(_sym + _ofs) "\n" \ > > If _sym or _ofs is a complex expression, the + may mis-bind. If the > _ofs parameter is needed at all, it would probably be a good idea to > have parentheses around _sym and _ofs. > I'm guessing this doesn't apply once we get rid of ofs? >> + _insn " %[to], %[from], %[imm]\n", \ >> + /* operands */ \ >> + : [to] "=r" (_to) \ >> + : [from] "r" (_from), \ >> + [imm] "I" (__IMM8), \ >> + "i" (&(_sym)) \ >> + : "cc") >> + >> +/* >> + * patch_imm8_mov() - same as patch_imm8(), but for mov/mvn instructions >> + */ >> +#define patch_imm8_mov(_insn, _to, _sym, _ofs) \ >> + patch_stub( \ >> + /* type */ \ >> + PATCH_IMM8, \ >> + /* code */ \ >> + _insn " %[to], %[imm]\n", \ >> + /* patch_data */ \ >> + ".long " __stringify(_sym + _ofs) "\n" \ >> + _insn " %[to], %[imm]\n", \ >> + /* operands */ \ >> + : [to] "=r" (_to) \ >> + : [imm] "I" (__IMM8), \ >> + "i" (&(_sym)) \ >> + : "cc") >> + >> +/* >> + * early_patch_imm8() - early functional variant of patch_imm8() above. The >> + * same restrictions on the constant apply here. This >> + * version emits workable (albeit inefficient) code at >> + * compile-time, and therefore functions even prior to >> + * patch application. >> + */ >> +#define early_patch_imm8(_insn, _to, _from, _sym, _ofs) \ >> +do { \ >> + unsigned long __tmp; \ >> + early_patch_stub( \ >> + /* type */ \ >> + PATCH_IMM8, \ >> + /* code */ \ >> + "ldr %[tmp], =" __stringify(_sym + _ofs) "\n"\ > > This would not be OK if assembled into the .text section, for the reasons > described above. 
(The compiler doesn't know about the extra data word > injected by the ldr= pseudo-instruction.) > > early_patch_stub puts into a custom section, so that's OK. > > However, there's still nothing to ensure that the literal word is in > range of the load. That can be fixed with an .ltorg to dump out the > literal(s) right after the branch following in early_patch_stub. > >> + "ldr %[tmp], [%[tmp]]\n" \ >> + _insn " %[to], %[from], %[tmp]\n", \ >> + /* pad */ \ >> + 0, \ >> + /* patch_data */ \ >> + ".long " __stringify(_sym + _ofs) "\n" \ >> + _insn " %[to], %[from], %[imm]\n", \ >> + /* operands */ \ >> + : [to] "=r" (_to), \ >> + [tmp] "=&r" (__tmp) \ >> + : [from] "r" (_from), \ >> + [imm] "I" (__IMM8), \ >> + "i" (&(_sym)) \ >> + : "cc"); \ > > Should we have "cc" here? > > Since this macro is only used with a single instruction _insn, there > would be no way to make use of a condition set by that instruction. > > Therefore, flag-setting instructions don't really make any sense here, > and "cc" could be removed. > > If so, it could make sense for the apply_patch_imm8() implementation > to check for and reject flag-setting encodings. > That makes sense. I've modified the do_patch_imm8() functions to explicitly check for attempts to set condition codes, and removed the "cc". [...] >> diff --git a/arch/arm/kernel/runtime-patch.c b/arch/arm/kernel/runtime-patch.c >> new file mode 100644 >> index 0000000..28a6367 >> --- /dev/null >> +++ b/arch/arm/kernel/runtime-patch.c [...] >> +static inline void flush_icache_insn(void *insn_ptr, int bytes) >> +{ >> + unsigned long insn_addr = (unsigned long)insn_ptr; >> + flush_icache_range(insn_addr, insn_addr + bytes - 1); >> +} > > This function appears unused. > Indeed. This function should have been removed. Thanks. > Do we actually do the cache flushing anywhere? > Yes, in __patch_text(). [...] >> +static int do_patch_imm8(u32 insn, u32 imm, u32 *ninsn) >> +{ [...] 
>> + *ninsn = insn & ~(BIT(26) | 0x7 << 12 | 0xff); >> + *ninsn |= (rot >> 3) << 26; /* field "i" */ >> + *ninsn |= (rot & 0x7) << 12; /* field "imm3" */ >> + *ninsn |= val; > > You need to convert this back to memory order. If fixups might be > applied while the MMU is off, misaligned 32-bit accesses will fail. > It's better to store the two halfwords separately: > > __opcode_to_mem_thumb16(__opcode_thumb32_first(foo)) > __opcode_to_mem_thumb16(__opcode_thumb32_second(foo)) > > If the MMU is on, you can use __opcode_to_mem_thumb32(foo) and do > a possibly misaligned store, though. > > This may be a good idea even if the MMU is on, because fixup is a > once-only process and we don't expect a meaningful performance > impact from that, especially when caching is enabled. But splitting > the access may also make it easier to reuse the framework in > situations where the cache and MMU are off. > > Because of all this, I suggest you don't repeatedly modify *ninsn. > Preparing the value and then writing it (or each half) once is probably > cleaner. > [...] >> +static int do_patch_imm8(u32 insn, u32 imm, u32 *ninsn) [...] >> + /* patch in new immediate and rotation */ >> + *ninsn = (insn & ~0xfff) | (rot << 8) | val; > > You need __opcode_to_mem_arm() to convert this back to memory order. > The do_patch_imm8() functions do not write to the instruction. The ninsn pointer here is not a pointer to the instruction that is being patched; it is simply a pointer used to return a value to the caller (apply_patch_imm8()). The instruction is written in apply_patch_imm8(): u32 ninsn; ... err = do_patch_imm8(info->insn, *info->imm, &ninsn); if (err) return err; __patch_text(insn_ptr, ninsn); Here the __patch_text() call converts the instruction and performs the actual write. If I read this correctly, the __patch_text() code takes care of splitting up Thumb-2 instructions as you've indicated. [...] 
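As an aside, the constraint that patch_imm8() relies on (the patched value must be representable as an 8-bit constant rotated right by an even amount, per the A32 data-processing immediate encoding) can be sketched in plain C. This is an illustrative model of the encoding check, not the kernel's do_patch_imm8(); encode_arm_imm is a hypothetical name:

```c
#include <stdint.h>

/* Illustrative sketch -- not the kernel's do_patch_imm8().  An A32
 * data-processing immediate must equal an 8-bit value rotated right
 * by an even amount; the encoded field is ((rot / 2) << 8) | imm8.
 * Returns 0 on success and fills *field, or -1 if not encodable. */
static int encode_arm_imm(uint32_t x, uint32_t *field)
{
	uint32_t rot;

	for (rot = 0; rot < 32; rot += 2) {
		/* rotating x left by rot undoes a rotate-right by rot */
		uint32_t v = rot ? ((x << rot) | (x >> (32 - rot))) : x;

		if (v <= 0xff) {
			*field = ((rot / 2) << 8) | v;
			return 0;
		}
	}
	return -1;	/* more than 8 significant bits under any rotation */
}
```

For instance, the __IMM8 placeholder (0x81 << 24) encodes with a rotation of 8, giving field 0x481, while a value like 0x101 is rejected.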
>> +int runtime_patch(const void *table, unsigned size) > > Minor nits: the type-unsafety could be dealt with outside this function; > table could be struct patch_info const *. > > Also, why do we not just pass end to this function instead of subtracting > the table base and then adding it back again? > In addition to the internal usage within runtime-patch.c, this function is called from module.c. In the module load case, the base + size form is more convenient. Also, based on Nico's comments about keeping the namespace clean, I've now moved the structure definitions to runtime-patch.c. These types are no longer exposed, and so we have to keep them opaque in this interface. [...] -- Thanks - Cyril