ARM Cortex A53
==============

Some personal architectural notes on the A53.

ARM specific info
-----------------

source: Arm Cortex A53 MPCore Processor Technical Reference Manual

MMU roles:
 * controls table walk hardware
 * translates addresses, virtual->physical

MMU configuration and management happens through system control registers.
See section 4.1.

ASID - Address Space IDentifier
    The MMU uses an ASID to distinguish, within a TLB (see below), between
    memory pages having the same virtual address. Assigned by the OS.

= Privileges:

source: https://developer.arm.com/documentation/102412/0102/Privilege-and-Exception-levels

- Execution States: AArch32 and AArch64.

- Exception Levels
    example: EL3 (firmware), EL2 (hypervisor), EL1 (kernel), EL0 (application)
    EL3: can change Security State (see below)
    EL2: can handle virtualization features

- Security State
    Being in a non-secure state limits the access to {address space},
    {system registers} and {interrupts}.
    Being in a secure state opens up additional resources of the classes
    above, besides those available in non-secure state.

    Realm/Root (RME - Realm Management Extension, see later)

    EL3 has a fixed Security State, privileged (e.g. Secure State)

- Exceptions: synchronous and asynchronous.
    - Synchronous
        Served immediately (e.g. MMU permission fail, or special
        instructions to change exception level).
    - Asynchronous
        Can be temporarily masked, but they are required to be served in a
        "finite time".
        - IRQ
        - FIQ (fast interrupt request, used to be high prio)
        - NMI (non-maskable interrupts)
        - SError (system errors, internal to the CPU, e.g. bus error)
        - V(irtual) {IRQ,FIQ,SError}

= Memory Management Guide

source: https://developer.arm.com/documentation/101811/0102/?lang=en

The virtual address is handled by the TLB (see below) within the MMU.
The address must be translated before the cache lookup (physically tagged).

Multi-level table: the lookup is hierarchical (as in a generic page table
walk, see below). Bounded number of levels (e.g.
ARMv8-A -> 4 levels max). The OS decides how to organise the tables (e.g.
larger blocks = shorter walks, smaller blocks = finer control, but longer
walks).

"Translation regimes":
    Each item of the {Exception Level} x {Security State} matrix has its own
    virtual address space (settings and tables).
    e.g. NS.EL2:0x8000 => non-secure state, exception level 2, address 0x8000.

= Trustzone

source: https://developer.arm.com/architectures/learn-the-architecture/trustzone-for-aarch64

= Realm Management

source: https://developer.arm.com/documentation/den0126/latest

= Memory Management Guide

Special interest info
---------------------

= MMU

Configuration via system registers.

Failed permission checks cause synchronous exceptions.
[?] implied: regular and permitted access is handled transparently

= Data Caching

Generic info
------------

= TLB

Translation Lookaside Buffer

Caches the information needed for the conversion of virtual pages to
physical pages. The upper bits of a virtual address identify the virtual
page; the TLB lookup replaces them with the physical page number, while the
offset within the page is left unchanged.

The TLB has a fixed size, so it is subject to misses. In case of a miss, the
system refers to the page table in physical memory.

= Cache Policies

source: https://en.wikipedia.org/wiki/Cache_placement_policies

== Direct mapped

Single line per set: each memory block can occupy a single line.
Cheap, fast, but low hit rate (conflicts result in content being replaced).

== Fully associative

Single set with multiple lines. Each memory block can be anywhere in the
set, so the tag of every line has to be checked on each lookup.
No conflict misses, but expensive and power hungry because of the number of
tag comparisons.

== N ways associative cache

N-way set associative: provide N lines in each set, so a memory block can be
placed in any of the N lines of its set. This reduces conflict misses
compared to direct mapped, at the cost of N tag comparisons per lookup. For
a fixed cache size, the number of sets is divided by N.

Direct mapped corresponds to 1-way associative; fully associative
corresponds to a single set containing all the lines.

= Cache indexing and tagging

{P,V}I{P,V}T (Physically/Virtually Indexed, Physically/Virtually Tagged)

Indexing determines the cache set, tagging determines what line of the set
contains the data. Each of them can be based on the Physical or the Virtual
address, with various pros and cons.
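
A minimal sketch of the index/tag split described above, not taken from any
of the sources. The geometry (32 KiB, 4 ways, 64-byte lines) and the name
split_address are assumptions picked for illustration, and the sketch ignores
whether the address is virtual or physical. Setting ways=1 gives the direct
mapped case, setting ways to the total number of lines gives the fully
associative one.

```python
def split_address(addr, cache_size=32 * 1024, ways=4, line_size=64):
    """Split an address into (tag, set_index, offset).

    Assumes cache_size, ways and line_size are powers of two
    (hypothetical geometry, for illustration only).
    """
    num_sets = cache_size // (ways * line_size)
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)

    offset = addr & (line_size - 1)                     # byte within the line
    set_index = (addr >> offset_bits) & (num_sets - 1)  # selects the set
    tag = addr >> (offset_bits + index_bits)            # compared against each way
    return tag, set_index, offset


if __name__ == "__main__":
    print(split_address(0x12345))            # 4-way: (9, 13, 5)
    print(split_address(0x12345, ways=1))    # direct mapped: (2, 141, 5)
    print(split_address(0x12345, ways=512))  # fully associative: (1165, 0, 5)
```

With more ways the index gets shorter and the tag longer: in the fully
associative case there is no index at all and the whole line address becomes
the tag.
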
= Page walk

source: https://cs.stackexchange.com/questions/102834/what-is-happening-during-table-walk

With a 64-bit architecture, a 4k page size (12 bits of displacement) would
require a flat lookup table with an insane number of entries: 2^(64 - 12)

Using a multi-step page table allows it to fit in memory, since tables
covering unmapped regions need not exist, but implies multiple memory
accesses (hence the term "walk").

Walks are reduced by the use of a TLB, which however may have misses.

Useful references
-----------------

FIQ (Fast Interrupt reQuest) vs IRQ
    A matter of priority, FIQs can interrupt IRQs.
    https://stackoverflow.com/questions/973933/what-is-the-difference-between-fiq-and-irq-interrupt-system/14212234#14212234
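
Going back to the page walk notes: a small sketch of the arithmetic, under
assumptions that are not in the sources quoted above. It assumes 4 KiB pages,
48-bit virtual addresses and 4 levels resolving 9 bits each (512 entries per
table); the names flat_table_entries and walk_indices are made up for the
example. It only splits the address, it does not read any table.

```python
PAGE_SHIFT = 12       # 4 KiB pages -> 12 bits of displacement
BITS_PER_LEVEL = 9    # assumption: 512 entries per table
LEVELS = 4            # assumption: 48-bit VA = 4 * 9 + 12


def flat_table_entries(va_bits=64, page_shift=PAGE_SHIFT):
    """Entries a single-level table would need: 2^(va_bits - page_shift)."""
    return 1 << (va_bits - page_shift)


def walk_indices(va):
    """Return ([index at level 0..3], page offset) for a virtual address."""
    mask = (1 << BITS_PER_LEVEL) - 1
    indices = []
    for level in range(LEVELS):
        # Level 0 uses the topmost 9 bits, the last level the 9 bits
        # just above the page offset.
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
        indices.append((va >> shift) & mask)
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return indices, offset


if __name__ == "__main__":
    print(flat_table_entries())          # 2^52 entries for a flat table
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x56
    print(walk_indices(va))              # ([1, 2, 3, 4], 86)
```

Each index selects one entry in the table of its level, so a full walk under
these assumptions costs up to four memory accesses before the data access
itself, which is what a TLB hit avoids.
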