Logo SoCRocket

Transaction-Level Modeling Framework for Space Applications

mmu_cache - Memory Management Unit and Cache

Table of Contents

Functionality and Features

Overview

The MMU_CACHE SystemC IP models behaviour and timing of the Gaisler GRLIB Harvard L1 Cache, Localrams and GRLIB Memory Management Unit (MMU). The respective VHDL reference is the mmu_cache entity of the GRLIB hardware library.

The structure of the Cache Sub-System is depicted in Figure 4. The top-level class mmu_cache provides two TLM 2.0 simple_target_sockets ( icio , dcio ) for communication with the LEON ISS and one Carbon/GreenSocs amba_master_socket for the connection to the AHBCTRL. All the sub-components, such as the mmu, the caches and the localrams are implemented in plain C++.

Equivalent to the hardware model the caches can be directly mapped, 2-way, 3-way or 4-way set associative. For multi-set configurations LRU, LRR and pseudo-random replacement are supported. The size of the cache sets can be between 1 and 64 kBytes, with 16 or 32 bytes per line. The caches can be flushed, frozen or locked on a line-by-line basis. The write policy of the data cache is write-through with no-allocate on write miss. The caches can be separately disabled. In that case requests from the ISS are directly forwarded to the AHB master or the MMU (if enabled).

The model also provides instruction and data scratchpads (localrams), with zero-waitstate access to up to 512 kByte of memory.

The MMU can also be optionally enabled. The MMU page size is 4, 8, 16 or 32 kByte. The TLBs can hold between 2 and 32 page descriptors. In case of a page miss a 3-level table walk is carried out on main memory. Similar to the localrams, instantiation of the mmu is done by late binding depending on configuration parameters. The caches connect to the mmu through tlb_adaptor objects. The tlb_adaptors present a unified memory interface towards the caches ( mem_if ). The same memory interface is used to provide access to the AHB master socket on top-level. This way it can be dynamically decided whether a request from one of the caches shall be forwarded to a shared or common TLB (virtual addressing), or directly go to the AHB interface (physical addressing).

Todo:
insert missing picture

Figure 4 - Structure of Cache Sub-System

Address Space Identifiers (ASI)

SPARC processors generate an 8-bit address space identifier (ASI), to provide access to up to 256 separate 32-bit address spaces. A big share of the ASIs is used for control of the cache sub-system. A list of the ASIs supported by the TLM model is given in Table 19.

Table 19 - Supported ASIs
ASIAddressUsage
0x01anyForced cache miss
0x020x00Cache control register
0x04Reserved
0x08Instruction cache configuration register
0x0cData cache configuration register
0xffTrigger debug output
0x08,0x09,0x0A,0x0BanyNormal cache access
0x0csee Diagnostic AccessAccess instruction cache tags
0x0dsee Diagnostic AccessAccess instruction cache data
0x0esee Diagnostic AccessAccess data cache tags
0x0fsee Diagnostic AccessAccess data cache data
0x15see Diagnostic AccessFlush instruction cache
0x16see Diagnostic AccessFlush data cache
0x190x000MMU control register
0x100MMU Context pointer register
0x200MMU Context register
0x300MMU Fault status register
0x400MMU Fault address register

ASIs are emitted by the data interface of the processor. For this purpose an extension has been linked to the data cache payload object ( dcio_payload_extensions ). For more information about payload extensions see Payload Extensions.

The ASIs are decoded in the exec_data function of class mmu_cache. The decoder maps the ASIs to API functions of the corresponding sub-components (caches, mmu). The API functions are described in section Interface.

System and Control Registers

The cache sub-system is controlled by a set of system registers, which can be accessed using ASIs.

Three of the mentioned registers are dedicated to the caches (ASI 0x02). The Cache Control Register (CCR, see below) effects both, data and instruction cache. Therefore, it is implemented on top-level ( mmu_cache ). Moreover, each of the caches has its own private Configuration Register (CR, see below). The CRs describe structure and size of the caches and are read-only.

CACHE CONTROL REGISTER
31 24 23 22 21 20 17 16 15 14 13 6 5 4 3 2 1 0
empty DS FD FI empty IB IP DP empty DF IF DCS ICS
CACHE CONTROL REGISTER Description
BitsIdDescription
31 - 24emptyempty
23DSData cache snoop enable: If set, will enable data cache snooping (todo).
22FDFlush data cache: If set, will flush the instruction cache. Always reads zero.
21FIFlush instruction cache: If set, will flush the instruction cache. Always reads zero.
20 - 17emptyempty
16IBInstruction burst fetch: This bit enables burst fill during instruction fetch.
15IPInstruction cache flush pending (not supported)
14DPData cache freeze on interrupt (not supported)
13 - 6emptyempty
5DF
4IFInstruction cache freeze on interrupt (not supported)
3 - 2DCSData cache state: Indicates the current data cache state according to the following: X0 = disable, 01 = frozen, 11 = enabled.
1 - 0ICSInstruction cache state: Indicates the current instruction cache state according to the following: X0 = disabled, 01 = frozen, 11 = enabled.

ICACHE & DCACHE Configuration Register
31 29 28 27 26 24 23 20 19 18 16 15 12 11 4 3 0
CL   REPL SN SETS SSIZE LR LSIZE LRSIZE LRSTART M  
ICACHE & DCACHE Configuration Register Description
BitsIdDescription
31CLCache looking: If set, cache locking is implemented 30]()
29 - 28REPLCache replacement policy: 00 = non (direct mapped), 01 = least recently used (LRU), 10 = least recently used (LRR), 11 = random
27SNData cache snooping: Set if snooping is implemented
26 - 24SETSNumber of sets in the cache: 000 = direct mapped, 001 = 2-way associative, 010 = 3-way associative, 011 = 4-way associative
23 - 20SSIZESet size: Indicates the size (Kbytes) of each cache set (Size = 2^SSIZE).
19LRLocal RAM: Set if local scratchpad is present.
18 - 16LSIZELine size: Indicates the size (words) of each cache line (Line size = 2^LSIZE).
15 - 12LRSIZELocal RAM size: Indicates the size (Kbytes) of the implemented scratchpad RAM (Size = 2^LRSIZE).
11 - 4LRSTARTLocal RAM start address: Indicates the 8 most significant bits of the local RAM start address.
3MMMU present: Set if MMU is present 2:0]()

The MMU is controlled by five 32-bit registers, which can be accessed with ASI 0x19 (see below). All of them are implemented in class mmu.

MMU Control Register
31 1 0
not used E
MMU Control Register Description
BitsIdDescription
31 - 1not used
0EFrom the MMU Control Register only one bit is implemented in the TLM model. It is used to enable and disable the MMU.

Context Pointer Register
31 2 0
Context Table Pointer  
Context Pointer Register Description
BitsIdDescription
31 - 2Context Table PointerThe Context Pointer Register points to the Context Table in main memory. It forms bit 35 – 6 of the physical address. The table is indexed by the contents of the Context Register.

Context Register
31 0
Context Number
Context Register Description
BitsIdDescription
31 - 0Context NumberThe Context Register contains the number of the current context and defines which of the possible address spaces is used for address translation.

Fault Status Register
31 0
Not implemented yet
Fault Status Register Description
BitsIdDescription
31 - 0Not implemented yetThe Fault Status Register provides information on exceptions (faults) issued by the MMU. It is currently not implemented and reads as 0.
Todo:
is this implemented now?
Todo:
fix issue fix bracket handling in register table

Fault Address Register
31 0
Fault address (not implemented yet
Fault Address Register Description
BitsIdDescription
31 - 0Fault address (not implemented yet) The Fault Address Register contains the virtual memory address of the fault recorded in the Fault Status Register. It is currently not implemented and reads as 0.

Data Cache Snooping

The mmu_cache IP supports data cache snooping. Snooping can be enabled by setting bit 23 of the Cache Control Register. The model provides a SignalKit input snoop. The constructor of mmu_cache registers a callback function ( snoopingCallBack ). The AHBCTRL triggers this function on every write operation. Via the snoop input the callback receives the bus id of the responsible master, the target address and the length of the write operation. If the master id does not equal the own id and dcache as well as dcache snooping are enabled, mmu_cache calls the snoop_invalidate function of dcache. The latter checks, whether the access is directed to a locally cached address. If the address is cached, the affected entries are invalidated.

Instruction burst fetch

Instruction burst fetch can be enabled by setting the IB bit of the Cache Control Register (CCR). In burst fetch mode the respective cache line is filled from main memory starting at the missed address until the end of the line. For this purpose the AHB master executes a burst transfer. The RTL reference model forwards the incoming instructions directly to the processor (streaming). In case of internal dependencies or multi-cycle instructions, the processor stalls until the whole line is cached. The TLM model has a slightly simplified behavour. It will always complete the burst, before sending instructions to the ISS. Burst fetch with disabled instruction cache is not supported. If the instruction cache is disabled, instructions are always being fetch using single transfers (NONSEQ).

Cache Flushing

The instruction cache and the data cache can be flushed in multiple ways. If the processor sends a flush instruction, both caches are flushed simultaneously. The mmu_cache recognizes a flush via the flush payload extension. The hardware model of the mmu_cache additionally provides two more input fields flushl and fline. They seems to be intended to flush certain cache lines, but are currently not used. The TLM model provides respective payload extension, to be future-proof.

A flush of only the instruction cache can be triggered by setting bit 21 (FI) of the Cache Control Register or by any write operation with ASI 0x15. Equally, the data cache may be flushed by setting bit 22 (FD) of the Cache Control Register or by any write operation with ASI 0x16.

Freezing and Locking

The instruction cache and/or the data cache can be frozen by setting the ICS and/or the DCS field of the Cache Control Register to "01" (CCR). A cache in frozen state is accessed and kept in sync with the main memory as if it was enabled, but no new lines are allocated on read misses.

Bit 31 of the two Cache Configuration Registers configures cache line locking. A cache line can be locked by setting the lock bit of a line. This can be done by a diagnostic write to the cache tag (Diagnostic Access). Locked cache lines will be updated on read-miss and will remain in the cache until the line is unlocked.

Diagnostic Access

Most of the internal data structures of cache and mmu can be accessed for diagnostic purpose via dedicated ASIs.

The tag and data RAMs of instruction and data cache can be read and written using ASI 0x0C – 0x0F (see Address Space Identifiers (ASI)). Addressing and alignment of data are equivalent to the mechanism described in section 55.5.2. of GRLIB IP Library User’s Manual.

ADDRESS = WAY & LINE & DATA & "00"

Payload Extensions

The communication between the processor and the Cache Sub-System requires additional information to be attached to the TLM 2.0 generic payload. The extensions are modeled in two classes:

Both classes declare a debug extension, which is modeled as a 32bit unsigned integer. The usage of the debug extension is explained in Debug Mechanism.

The dcio extension additionally contains fields for cache flushing, bus locking and the address space identifier ( asi ). All are represented by 32 bit unsigned integers.

The mmu_cache checks all incoming transactions for the presence of the payload extensions. For data transactions this is done in function exec_data , for instruction transactions in function exec_instr. An error message is generated, if the extensions are not available.

Debug Mechanism

The cache sub-system is a rather complex model. Hence, for assertion-based verification, it is not sufficient to simply check whether the data response on a request is correct. It is also important to know in which way the result was produced (e.g. cache hit/miss).

For this purpose a 32bit unsigned integer extension has been attached to the generic payload of the icio / dcio sockets. A set of macros is provided for handling the debug extension. The encoding the debug field is shown below. The macro definitions can be found in the file defines.h of the mmu_cache library.

Debug Extension
31 22 21 20 16 15 14 13 12 11 4 3 2 1 0
reserved MMUS TLBN reserved FM CB SP reserved CST CS
Debug Extension Description
BitsIdDescription
31 - 22reserved
21MMUSMMU state: 0 – TLB hit; 1- TLB miss
20 - 16TLBNTLB number: for TLB hit – number of the TLB that delivered the hit for TLB miss – number of the TLB that was refilled by the miss
15reserved
14FMFrozen miss: If the cache is frozen, no new lines are allocated on a read miss. However, unvalid data will be replaced as long the tag of the line does not change. In case the results of a read miss is not cached, the FM bit is switched on.
13CBCache bypass: Is set to 1, if cache bypass was used (cache disabled in CCR)
12SPScratchpad: Is set to 1, if the request was answered by the local scratchpad RAM
11 - 4reserved
3 - 2CSTCache State: 00 – read hit, 01 – read miss, 10 – write hit, 11 – write miss
1 - 0CSCache Set: for read/write hit – number of set which produced the hit for read miss – number of set refilled by miss processing for write miss – 0b00 (no allocate on write miss)

Power Monitoring

Power monitoring can be enabled by setting the constructor parameter pow_mon to true. Enabling power monitoring for the top-level mmu_cache instance also enables power monitoring for all sub-components. The model is annotated with default power information that has been gathered using a generic 90nm Standard-Cell Library and statistical power estimation at Gate-Level.

The accuracy of the build-in power models and the default switching energy settings cannot be guaranteed. In order to achieve the best possible results the user is recommended to annotate the design with custom target-technology dependent power information.

The power models of the mmu_cache, the i/d caches, the localrams and the mmu; all required parameters, and default settings are explained in the SoCRocket Power Modeling Report [RD11].

Todo:
power modeling report?

Interface

The GRLIB VHDL model of the MMU_CACHE is configured using Generics. For the implementation of the TLM model most of these Generics were refactored to constructor parameters of class mmu_cache (Table 24). The parameters of the top-level class are used for the configuration of all sub-components (caches, localrams, mmu).

Table 24 - Constructor Configuration Parameters
ParameterDescription
icenEnable instruction cache
ireplIcache replacement strategy 00 = non, 01 = LRU, 10 = LRR, 11 = random
isetsNumber of instruction cache sets (1-4)
ilinesizeIndicates size of instruction cache line in words (line size = 2^ilinesize, ilinesize <= 3)
isetsizeIndicates size (kbytes) of instruction cache set(set size = 2^isetsize, isetsize <= 6 (max 64 kbytes))
isetlockEnable instruction cache locking
dcenEnable data cache
dreplDcache replacement strategy00 = non, 01 = LRU, 10 = LRR, 11 = random
dsetsNumber of data cache sets (1-4)
dlinesizeIndicates size of data cache line in words(line size = 2^dlinesize, dlinesize <= 3 )
dsetsizeIndicates size (kbytes) of data cache set(set size = 2^dsetsize, dsetsize <= 6 (max. 64 kbytes))
dsetlockEnable data cache locking
dsnoopEnable data cache snooping
ilramEnable instruction scratchpad
ilramsizeIndicates size of instruction scratchpad in kbytes(size = 2^ilramsize, ilramsize <= 9 (max. 512 kbytes))
ilramstart8 MSB bits used to decode local instruction RAM area (16 MB segm.)
dlramEnable data scratchpad
dlramsizeIndicates size of data scratchpad in kbytes(size = 2^dlramsize, dlramsize <= 9 (max. 512 kbytes))
dlramstart8 MSB bits used to decode local data RAM area (16 MB segment)
cachedFixed cacheability mask (overrides AMBA Plug & Play settings)
mmu_enEnable MMU
itlb_numIndicates number of instruction TLBs(tlb number = 2^itlb_num, itlb_num <= 5 (max. 32))
dtlb_numIndicates number of data TLBs(tlb number = 2^dtlb_num, dtlb_num <= 5 (max. 32))
tlb_typeTLB implementation type0 = separate, 1 = shared instruction and data TLB
tlb_repTLB replacement policy0 = LRU, 1 = random
mmupgszMMU page size0, 2 = 4kbytes, 3 = 8kbytes, 4 = 16kbytes, 5 = 32kbytes
nameSystemC name of module
idID of the AHB bus master
pow_monEnable power monitoring
abstractionLayerAbstraction/Coding style of the model (LT or AT)

The system-level interface of the model comprises two TLM 2.0 simple_target_sockets ( icio , dcio ) and one GreenSocs/Carbon AHB master socket ( ahb_master ).

tlm_utils::simple_target_socket<mmu_cache> icio / bind to CPU instruction socket
tlm_utils::simple_target_socket<mmu_cache> dcio / bind to CPU data socket
amba::amba_master_socket<32> ahb_master / bind to AMBA system bus

Depending on the constructor parameter abstractionLayer the sockets are configured for blocking (LT) or non-blocking (AT) communication. In the LT case the module registers one TLM blocking transport function for the dcio and one for the icio socket. In the AT case the model registers one TLM non-blocking forward transport function for the dcio and one for the icio socket, and one TLM non-blocking backward transport function for the ahb_master socket. Additionally, the model comes with debug transport functions, for non-intrusive code execution (TRAP) and checking. The signatures of all transport functions are compliant with the TLM2.0 standard. Next to the TLM sockets the model contains SignalKit inputs for data bus snooping ( snoop ), clock cycle time ( clk ) and reset ( rst ). The clk and rst inputs are inherited from class CLKDevice , while snoop is directly defined in mmu_cache.

Internal Structure

Todo:
move this section to the respective files?

This section describes the internal structure and the behavior of the MMU_CACHE SystemC IP. The model consists of multiple classes, which are spread over a number of source files, all of which can be found in the models/leon3/mmucache directory.

Files of the mmu_cache library

The defines.h file

This file contains data type definitions and macros, and is included by almost all the other files of the library.

It defines the structure of the cachelines ( t_cache_line ), the data cache entries ( t_cache_data ), the cache tags ( t_cache_tag ), the mmu page table entries ( t_PTE_context ) and the virtual address tags ( t_VAT ). Moreover, the file contains macros for handling the debug payload extension (Debug Mechanism).

The payload_extension files

The TLM MMU_CACHE owns two tlm::simple_target_sockets for connection to the instruction and data sockets of the processor simulator. These connections implement a simple point-to-point communication, which can be widely realized relying on TLM 2.0 generic payload. Only a few optional payload extensions are required.

The payload extensions for the instruction cache input/output socket ( icio ) are implemented in the files icio_payload_extensions.h/cpp :

// extensions
// ----------
unsigned int flush;
unsigned int flushl;
unsigned int fline;
unsigned int * debug;

The payload extensions for the data cache input/output socket ( dcio ) are implemented in the files dcio_payload_extensions.h/cpp :

// extensions
// ----------
unsigned int asi;
unsigned int flush;
unsigned int flushl;
unsigned int lock;
unsigned int * debug;

The mem_if.h file

The mem_if.h file defines a generic memory interface that is directly or indirectly implemented by almost all the classes of the MMU_CACHE (Figure 5).

The class mem_if is an abstract class with two virtual member functions:

virtual void mem_write(unsigned int addr, unsigned char * data, unsigned int length, sc_core::sc_time * t, unsigned int * debug, bool is_dbg) {};
virtual void mem_read(unsigned int addr, unsigned char * data, unsigned int length, sc_core::sc_time * t, unsigned int * debug, bool is_dbg) {};

The interface is implemented by the top-level class mmu_cache , the caches, the localrams and the mmu ( tlb_adapters ). As a consequence the modules of the mmu_cache library can be bound to each other like building blocks. For example, depending on the dcen and mmu_en constructor parameters, transactions from the data socket ( dcio ) can be directed to the cache, to the mmu or to the ahb master. Transactions from the dcache or icache are directly forwarded to the ahb master or to the mmu.

Todo:
inlucde image from word doc

Figure 5 - Generic Memory Interface / Dependencies

The mmu_cache_if.h file

The mmu_cache_if class extends the mem_if class by two functions for reading and writing the Cache Control Register (CCR).

virtual unsigned int read_ccr();
virtual void write_ccr(unsigned char * data, unsigned int len,
sc_time *delay, bool is_dbg);

The CCR is implemented at the top-level of class mmu_cache _._ The caches and the mmu require access to the CCR at runtime. Therefore, they receive a pointer of type mmu_cache_if as a constructor argument.

The cache_if.h file

The cache_if class is another extension of class mem_if. It describes the interface of all cache models in the system. Next to reading or writing data ( mem_if ), caches must allow to flush data, to read/write cache tags/entries, to access configuration registers and to handle snooping:

virtual void flush(sc_core::sc_time * t, unsigned int * debug, bool is_dbg) = 0;
virtual void read_cache_tag(unsigned int address, unsigned int * data,
sc_core::sc_time *t) = 0;
virtual void write_cache_tag(unsigned int address, unsigned int * data,
sc_core::sc_time *t) = 0;
virtual void read_cache_entry(unsigned int address, unsigned int * data,
sc_core::sc_time *t) = 0;
virtual void write_cache_entry(unsigned int address, unsigned int * data,
sc_core::sc_time *t) = 0;
virtual unsigned int read_config_reg(sc_core::sc_time *t) = 0;
virtual unsigned int check_mode() = 0;
virtual void snoop_invalidate(const t_snoop &snoop,
const sc_core::sc_time& delay) = 0;
virtual void clkcng(sc_core::sc_time &clk) = 0;
virtual void dbg_out(unsigned int line) = 0;

Next to the data cache ( dvectorcache ) and the instruction cache ( ivectorcache ), the interface is implemented by the plain structural module nocache. In case one of the caches is not present in the system (disabled via i/dcen constructor parameters), the mmu_cache binds one or two instances of nocache. The nocache class implements stubs for all cache_if functions. Forbidden operations generate an error message.

The mmu_cache.h/cpp files

The files declare and implement the top-level class of the MMU_CACHE. The class mmu_cache implements the mmu_cache_if interface and instantiates all sub-modules depending on the selected configuration. All sub-components are dynamically created in the constructor of the class. The instantiation depends on parametrization options (see Fehler! Verweisquelle konnte nicht gefunden werden.). In case a certain module is not required, a NULL pointer will be assigned. If the mmu is enabled, the caches use the memory interfaces ( mem_if ) of the instruction tlb_adapters and data tlb_adapters for miss processing, otherwise they are directly connect to the ahb master.

The following pointers provide access to the APIs of all subordinate components:

All sub-components are implemented in plain C++, for highest possible simulation speed.

MMU_CACHE also inherits from class AHBDevice and class CLKDevice. From AHBDevice mmu_cache receives a PNP configuration record for identification as an AHB master. Class CLKDevice provides an unified interface for clock and reset distribution, that is shared with most of the components in the SoCRocket library. The timing information received via the clk SignalKit input is distributed to all sub-components by function clkcng.

As the top-level class, mmu_cache implements the interface to the outside TLM world. Next to a GreenSocs/Carbon AHB master socket ( ahb_master ), the class contains two TLM 2.0 simple_target_sockets for connection to the instruction and data ports of the processor simulator (Table 25).

Table 25 - TLM Sockets Cache Sub-System
NameTypeDescription
icioTLM2 / simple_target_socket (LT)Instruction cache in/out
dcioTLM2 / simple_target_socket (LT)Data cache in/out
ahb_masterGreenSocs / amba_master_socket (LT)AHB bus master

With respect to the TLM 2.0 standard the TLM interface of the mmu_cache supports two levels of accuracy: loosely timed (LT) and approximately timed (AT). The abstraction level can be selected via the abstractionLayer constructor parameter. The LT interface is described in LT Behaviour and the AT interface in AT Behaviour. Both interfaces map the incoming transactions to functions that encapsulate the behaviour of the model. Instruction transactions invoke exec_instr and data transactions exec_data.

The exec_instr function is very compact. After extracting the payload object and verifying the extension, the tlm_command attribute is checked. TLM read commands are translated into calls to the read ( mem_if) function of the icache or ilocalram. Because the instruction cache is read only, TLM write requests cause a TLM_COMMAND_ERROR_RESPONSE and an error message to be printed on the screen.

The exec_data function is more deeply structured. The main reason is the decoder for the address space identifiers (ASIs – see Address Space Identifiers (ASI)). The ASI is implemented as a mandatory extension to the TLM 2.0 generic payload. Depending on the ASI the exec_data function maps the incoming transactions to the APIs of the different sub-components. Default cache access is performed for ASIs 0x8, 0x9, 0xa and 0xb. Other modes are used to access system registers (0x2), tag rams (0xc, 0xe), cache data blocks (0xd, 0xf), mmu internal registers (ASI 0x19) and more. For every transaction the payload extensions are checked. TLM read commands are translated into calls to the read ( mem_if ) function of the dcache or the dlocalram. If the dcache is disabled the transactions are forwarded to the mmu or to the ahb_master socket.

The class mmu_cache also contains the Cache Control Register (CCR) and its access functions read_ccr and write_ccr (see Table 20).

The vectorcache.h/cpp files

The files vectorcache.h and vectorcache.cpp form the base class for the implementation of the instruction cache (ivectorcache) and the data cache (dvectorcache). Class vectorcache implements the cache_if API and provides almost all the functionality required by both caches. In the following these functions are briefly described:

void read(unsigned int address, unsigned char * data, unsigned int len, sc_core::sc_time * t, unsigned int * debug, bool is_dbg);

The read ( mem_if ) function is called for any type of load operation (byte, short, word, dword). The length of the access in bytes is given by the len parameter. The address is split into a cache tag and a cache index portion. The respective line is loaded from all sets and compared against the index. If one of the tags equals the index and the valid bit is set, the cache entry is copied to the *data pointer (read hit). In case the tags do not match or the valid bit is not set, the request is forwarded to the ahb interface or to the mmu (read miss). After miss processing, the fresh data is filled into the cache and copied to the *data pointer.

The is_dbg flag signals that the read function was called in a TLM debug transport.

void write(unsigned int address, unsigned char \* data, unsigned int len, sc_core::sc_time \* t, unsigned int \* debug, bool is_dbg);

The write ( mem_if ) function is called for any type of store operation (byte, short, word, dword). The length of the access in bytes is given by the len parameter. The address is split into a cache tag and a cache index portion. The respective line is loaded from all sets and compared against the index. If one of the tags equals the index and the valid bit is set, the respective data entry is updated and the request is forwarded to the mmu or the ahb interface (write hit). If the tag does not match or the valid bit is not set the request directly goes to mmu or ahb interface (write miss). The cache will not be updated on a write miss. The write policy is write-through with no-allocate on write miss.

void flush(sc_core::sc_time * t, unsigned int * debug);
Flushes the cache_. _During a cache flush all valid data in the cache is transferred to main memory for synchronization.
void read_cache_tag(unsigned int address, unsigned int * data, sc_time *t);
void write_cache_tag(unsigned int address, unsigned int * data, sc_time *t);
void read_cache_entry(unsigned int address, unsigned int * data, sc_core::sc_time *t);
void write_cache_entry(unsigned int address, unsigned int * data, sc_time *t);

These functions are used for diagnostic access to cache tags and cache entries (see Diagnostic Access).

unsigned int read_config_reg(sc_core::sc_time \*t);

Returns the configuration register of the cache. The Cache Configuration Register is initialized in the constructor of class mmu_cache. The register is read only.

virtual unsigned int check_mode() = 0;

A cache can be in one of three different modes of operation: enabled, disabled or frozen. The current mode can be deterimend by checking the Cache Control Register, which is implemented in the top-level class mmu_cache. Depending on the type of cache (instruction or data) the DCS or ICS bits of the CCR must be checked. Therefore, the check_mode function is plain virtual. The function must be overwritten by the actual icache or dcache implementation.

The ivectorcache.h/cpp files

The class ivectorcache contains the actual implementation of the instruction cache. The class inherits from class vectorcache. The write function is overwritten, because the instruction cache is not writable. A call to the write function produces an error message and stops the simulation.

The class implements the virtual function check_mode. For checking the mode of operation the ICS bits of the Cache Control Register are used.

The dvectorcache.h/cpp files

The class dvectorcache contains the actual implementation of the data cache. The class inherits from class vectorcache.

The virtual check_mode function is implemented. For checking the mode of operation the DCS bits of the Cache Control Register are used.

The localram.h/cpp files

The class localram models a fast scratchpad memory that can be attached to both instruction and data cache controllers. It implements the generic memory interface mem_if. The actual memory is implemented as a character array ( scratchpad ).

The mmu.h/cpp files

Todo:
what is RD08?

The files implement the memory management unit of the MMU_CACHE. The component was modeled following the recommendations for the SparcV8 reference MMU given in [RD08]. The class mmu receives the number of instruction TLBs, the number of data TLBs, the TLB type, the TLB replacement policy and the mmu page size as constructor arguments. Depending on the TLB type two split TLBs or one shared TLB is generated for instructions and data. The TLBs are implemented as a std::map. The key for a TLB lookup is a virtual address tag ( t_VAT ). The caches connect to the mmu through tlb_adapter objects (The tlb_adaptor.h file). In shared TLB mode only one adapter is generated. Next to the adapter objects the class mmu offers a set of API functions. The most important of these functions is:

unsigned int tlb_lookup(unsigned int addr, std::map<t_VAT, t_PTE_context> * tlb, unsigned int tlb_size, sc_core::sc_time * t, unsigned int * debug);

The tlb_lookup function is responsible for translating virtual addresses into physical addresses. It receives the virtual address and a TLB pointer as input arguments. In the body of the function the virtual address is split into three indices. The bit width of these indices depends on the virtual page size. The following page sizes and index combinations are supported (Table 26):

Table 26 - Page size / index combinations
virt. page sizeidx 1idx 2idx 3
4kb8 bit6 bit6 bit
8kb7 bit6 bit6 bit
16kb6 bit6 bit6 bit
32kb4 bit7 bit6 bit

In case of a TLB miss the indices are used for addressing the page tables in main memory. A successful read of a page table returns either a page table descriptor (PTD) or a page table entry (PTE). A PDC is a pointer to the next-level page table, while a PTE corresponds to an actual TLB entry. Up to three page table levels are supported.

The mmu contains a set of internal control registers. These registers can be accessed through ASI 0x19 (Table 22). Respectivly read and write requests are translated into calls to following functions:

// read mmu internal registers (ASI 0x19)
unsigned int read_mcr();
unsigned int read_mctpr();
unsigned int read_mctxr();
unsigned int read_mfsr();
unsigned int read_mfar();
// write mmu internal registers (ASI 0x19)
void write_mcr(unsigned int * data);
void write_mctpr(unsigned int * data);
void write_mctxr(unsigned int * data);

Another group of member functions is dedicated to diagnostic TLB access. The addressing of the different bit fields can be taken from [RD08].

// diagnostic read/write of instruction PDC (ASI 0x5)
void diag_read_itlb(unsigned int addr, unsigned int * data);
void diag_write_itlb(unsigned int addr, unsigned int * data);
// diagno. read/write of data PDC or shared instruction and data PDC (ASI 0x6)
void diag_read_dctlb(unsigned int addr, unsigned int * data);
void diag_write_dctlb(unsigned int addr, unsigned int * data);

The tlb_adaptor.h file

The class tlb_adapter implements the generic memory interface mem_if. Depending on the configuration the mmu creates one or two objects of type tlb_adapter , which provide access to the instruction and/or data tlb. Pointers to these objects can be obtained by calling the mmu API functions get_itlb_if and get_dtlb_if.

LT Behaviour

The LT mode of the MMU_CACHE is intended for fast register accurate simulation (programmers view).

Instruction transactions

For instruction fetch the model provides the iciosimple_target_socket. In LT mode this socket is bound to theicio_b_transportblocking transport function. Incoming transactions are directly forwarded to theexec_instrfunction (The mmu_cache.h/cpp files), which models the functional interface of themmu_cacheIP. Depending on the configurationexec_instrperforms a lookup of the instruction cache or loads data from the instruction scratchpad. Cache misses or bypass operations create a transaction on theahb_mastersocket. Ifmmu_enis set all addresses are considered virtual and will be translated to physical addresses by the mmu. In the meantime the processor is blocked. Theexec_instrfunction returns the accumulated delay of all involved sub-components. Before unblocking the master theicio_b_transport function calls wait to consume the component delay.

Data transactions

For data load/store the model provides the dcio simple_target_socket. In LT mode this socket is bound to the dcio_b_transport blocking transport function. Similar to instruction fetch, incoming transactions are directly forwarded to a function encapsulating the behaviour of the data cache. Depending on the configuration and the settings contained in the payload extensions the exec_data function performs a lookup of the data cache, loads/store of the instruction or data scratchpad or read/writes of internal registers. Cache misses or bypass operations create a transaction on the ahb_master socket. If mmu_en is set all addresses are considered virtual and will be translated to physical addresses by the mmu. In the meantime the processor is blocked. The exec_data function returns the accumulated delay of all involved sub-components. Before unblocking the master the dcio_b_transport function calls wait to consume the component delay.

AT Behaviour

The AT mode of the MMU_CACHE is intended for architecture exploration and RTL co-simulation. It contains multiple parallel threads, which are not present in LT mode.

Instruction transactions

Instruction transactions arrive in the icio_nb_transport_fw function with phase BEGIN_REQ. The function enters the transaction in the icio_PEQ payload event queue and returns to the master with END_REQ and TLM_UPDATED. The SC_THREAD icio_service_thread is sensitive to the default event of icio_PEQ. It invokes the exec_instr function for every transaction from the queue. As already mentioned, exec_instr encapsulates the functional part of the model. Within exec_instr the payload is processed in the same way as described for LT mode. After return from exec_instr the icio_service_thread consumes the accumulated delay of all involved mmu_cache sub-components. Afterwards, the master is notified by sending BEGIN_RESP on the backward path. The master may reply with TLM_COMPLETED or TLM_ACCEPTED. A final END_RESP from the master will be accepted, but is not required.

Data transactions

In AT mode the constructor of mmu_cache registers a non-blocking transport function at the dcio simple_target_socket (dcio_nb_transport_fw). The dcio_nb_transport_fw function is called at every phase change of a data transaction. New transactions arrive with phaseBEGIN_REQ. The transport function enters the transaction in thedcio_PEQpayload event queue and returns to the master withEND_REQandTLM_UPDATED. Thedcio_PEQis used to forward the transaction to the SC_THREADdcio_service_thread. It invokes theexec_datafunction for every transaction from the queue. Withinexec_datathe payload is processed in the same way as described for LT mode. After return fromexec_datathedcio_service_threadconsumes the accumulated delay of all involvedmmu_cachesub-components. Afterwards, the master is notified by sendingBEGIN_RESPon the backward path. The master may reply withTLM_COMPLETEDorTLM_ACCEPTED. Similar to instruction transactions, a finalEND_RESP from the master will be accepted, but is not required.

The AHB master

Class mmu_cache implements the mem_if memory interface, to provide access to the ahb_master socket, for all modules of the library.

For read transaction this invokes function mem_read. Every call to mem_read creates a new payload object. The payload is taken from the transaction pool provided by the GreenSocs/Carbon ahb socket. Target address and payload pointer of the original transaction are copied. Next to the default payload attributes, the function initializes a set of ahb specific extensions:

After setting up the payload the mem_read function checks the is_dbg flag and the abstractionLayer parameter. In debug mode the transaction is send using the untimed TLM debug transport interface:

ahb->transport_dbg(*trans)

In LT mode mem_read invokes a blocking transport:

ahb->b_transport(*trans, delay)

After returning from b_transport the model synchronizes with the SystemC scheduler by calling wait. This consumes the accumulated delay of the AHB transfer.

In AT mode the bus transfer is modeled using multiple phases. This requires a non-blocking backward transport function ( ahb_nb_transport_bw ) to be bound to the ahb_master socket and a number of SC_THREADs. If mem_read is called in AT mode the AHB transfer is initialized by sending BEGIN_REQ on the forward path:

ahb->nb_transport_fw(*trans, phase, delay);

In the AHBCTRL this causes the transaction to be scheduled for arbitration. The bus model will reply with TLM_ACCEPTED. The signal for successful arbitration is END_REQ being received on the backward path. At the time of END_REQ the ahb_nb_transport_bw function notifies the mEndRequestEvent, which unblocks the mem_read function. For read operations END_REQ is directly followed by BEGIN_RESP. A BEGIN_RESP from the AHBCTRL triggers the ResponseThread (via mResponsePEQ). The ResponseThread is responsible for sending END_RESP to the AHBCTRL. Moreover, it forwards the transaction to the cleanUP thread. The latter returns the transaction to the memory pool with a delay of 100 clock_cycles. The additional lifetime of the transaction guarantees that the data pointer can be savely copied by the master.

Write transactions are processed in a very similar way. Modules writing to the ahb_master socket use the mem_write ( mem_if ) interface function. The mem_write function obtains a payload object from the memory pool and initializes all data members including the mentioned ahb specific extensions. The mem_write function also distinguishes between debug, blocking (LT) and non-blocking (AT) communication. While debug and blocking communication are trivial, the non-blocking communication differs from the standard TLM protocol. This is due to the pipelined nature of AHB.

AHB communication is split into two phases: address and data. RTL slaves sample the address at the first clock edge and the data at the second. The data phase of the first transaction equals the address phase of the second (succeeding) one. Especially for write transactions from RTL masters to TLM slaves, the TLM standard protocol is insufficient. The slave can never know, when the data pointer of a transaction becomes valid. Therefore, the BEGIN_RESP phase of the standard protocol has been replaced by phase BEGIN_DATA , which is directed from the master to the slave. The END_RESP phase is replaced by phase END_DATA. END_DATA is send by the slave and indicates the end of a write operation.

The mem_write function initiates a bus transfer by sending BEGIN_REQ on the forward path. Equal to read operations the AHBCTRL will reply with END_REQ (backward path), as soon the master has won arbitration. After receiving END_REQ the ahb_nb_transport_bw function notifies the mEndRequestEven t. Moreover, the transaction is forwarded to SC_THREAD DataThread (via mDataPEQ ). The DataThread sends BEGIN_DATA to the AHBCTRL. As soon as the bus has sent END_DATA (via backward or return path), the transaction is considered complete. To make sure all pointers can be properly saved, the payload is returned to the memory pool with a delay of 100 clock_cycles ( mEndTransactionPEQ ).

Compilation

For the compilation of the MMU_CACHE IP, a WAF wscript is provided and integrated in the superordinate build mechanism of the library.

All required objects for simulating the MMU_CACHE on platform level are compiled in a sub-library named mmu_cache using following command:

$ ./waf –target=mmu_cache

To utilize mmu_cache in simulations with other components, add mmu_cache to the use list of your wscript.

Example Instantiation

The example below demonstrates the instantiation of the MMU_CACHE inside an sc_main or an arbitrary top-level class. The instantiating module needs to include at least mmu_cache.h and amba.h. The MMU_CACHE is created in line 3 – 32. In lines 39 – 40 the icio and dcio slave sockets are bound to the master sockets of the testbench (or processor). The ahb master port is bound to the AHBCTRL in line 43. Line 46 shows how to connect the snooping. Since MMU_CACHE has internal storage, it needs a notion of time. In this example the clock cycle time is set in line 49.

// CREATE MMU Cache
// ----------------
mmu_cache mmu_cache(1, // int icen = 1 (icache enabled)
2, // int irepl = 2 (icache random replacement)
4, // int isets = 4 (4 instruction cache sets)
4, // int ilinesize = 4 (4 words per icache line)
1, // int isetsize = 1 (1kB per icache set)
0, // int isetlock = 0 (no icache locking)
1, // int dcen = 1 (dcache enabled)
2, // int drepl = 2 (dcache random replacement)
4, // int dsets = 4 (4 data cache sets)
4, // int dlinesize = 4 (4 words per dcache line)
1, // int dsetsize = 1 (1kB per dcache set)
0, // int dsetlock = 0 (no dcache locking)
0, // int dsnoop = 0 (no cache snooping)
0, // int ilram = 0 (instr. localram disable)
1, // int ilramsize = 1 (1kB ilram size - disabled)
0x0000008e, // int ilramstart = 8e (0x8e000000 default ilram start address)
0, // int dlram = 0 (data localram disable)
1, // int dlramsize = 1 (1kB dlram size - disabled)
0x0000008f, // int dlramstart = 8f (0x8f000000 default dlram start address)
0xffff, // int cached = 0 (fixed cacheability mask)
0, // int mmu_en = 0 (mmu not present)
8, // int itlb_num = 8 (8 itlbs - not present)
8, // int dtlb_num = 8 (8 dtlbs - not present)
0, // int tlb_type = 0 (split tlb mode - not present)
1, // int tlb_rep = 1 (random replacement)
0, // int mmupgsz = 0 (4kB mmu page size)>
"mmu_cache", // name of sysc module
0, // id of the AHB master
0, // bool pow_mon = 1 (disable power monitoring)
amba::amba_LT); // select LT or AT abstraction
...
// *** BIND SOCKETS
// Connect testbench (cpu) to mmu-cache
tbm.icio(mmu_cache.icio);
tbm.dcio(mmu_cache.dcio);
// Connect mmu_cache to TLM bus
mmu_cache.ahb(ahbctrl.ahbIN);
// Connect snooping
ahbctrl.snoop(mmu_cache.snoop);
// Set timing (clock cycle)
mmu_cache.set_clk(10, SC_NS);