ST20-C1 Core Instruction Set Reference Manual
72-TRN-274-01

Contents

Introduction
    ST20-C1 features
    Manual structure
Notation
    Instruction listings
    Instruction definitions
    Operators used in the definitions
    Data structures and constants
Architecture
    Values
    Memory
    Registers
    Instruction encoding
Using ST20-C1 instructions
    Manipulating the evaluation stack
    Loading and storing
    Expression evaluation
    Arithmetic
    Forming addresses
    Comparisons and jumps
    Evaluation of boolean expressions
    Bitwise logic operations
    Shifting and byte swapping
    Function and procedure calls
    Peripherals
    Status register
Multiply accumulate
    Data formats
    umac
    Short multiply accumulate loop
    Biquad filter
    Data vectors
    Scaling
    Data formats
Exceptions
    Exception levels
    Exception vector table
    Exception control block and the saved state
    Initial exception handler state
    Restrictions on exception handlers
    Interrupts
    Traps
    Setting up the exception handler
Multi-tasking
    Processes
    Descheduled processes
    Queues
    Timeslicing
    Inactive processes
    Descheduled process state
    Initializing multi-tasking
    Scheduling kernels
    Semaphores
    Sleep
Instruction Set Reference
Appendices
    Constants and data structures
    Instruction set summary
    Compiling for the ST20-C1
    Glossary

Introduction

This manual provides a summary of and reference to the ST20 architecture and instruction set for the ST20-C1 core.

ST20 technology is intended for building successful embedded VLSI designs. ST20 devices comprise a collection of VLSI macro-cells connected through a high-performance on-chip bus. This architecture allows easy construction of both general purpose devices (e.g. the ST20-MC1 micro-controller) and application specific devices (e.g. the ST20-TPx digital family).

The ST20 macro-cell library includes micro-cores, on-chip memories and a wide range of digital and analogue devices. SGS-THOMSON offers a range of ST20 micro-cores, allowing the best cost and performance trade-off to be achieved for each application area. This manual describes the ST20-C1 micro-core.
ST20 devices are available from SGS-THOMSON and licensed second source vendors.

ST20-C1 features

The ST20-C1 has the following features:

- It is implemented as a 2-way superscalar, 3-stage pipeline, with an internal 16-word register cache. This architecture can sustain [...] instructions in progress, with a maximum of [...] instructions completing per cycle.
- It uses a variable length instruction coding scheme based on 8-bit units which gives excellent static and dynamic code size. Instructions take between [...] units to code, with an average of 1.25 units (10 bits) per instruction.
- It provides flexible prioritized vectored interrupt capabilities. The worst case interrupt latency is [...] microseconds (at [...] operating frequency).
- It provides extensive instruction level support for 16-bit digital signal processing (DSP) algorithms.
- It is particularly suitable for low power and battery-powered applications, with low core operating power and sophisticated power management facilities.
- It provides extensive real-time debugging capability through the optional ST20 diagnostic controller unit (DCU) macro-cell, which supports fully non-intrusive breakpoints, watchpoints and code tracing.
- It has a flexible and powerful built-in hardware scheduler. This is a light-weight real-time operating system (RTOS) directly implemented in the microcode of the ST20-C1 processor. The hardware scheduler can be customized and provides support for software schedulers.
- It provides a built-in user-programmable 32-bit input/output register providing system control and communication capability directly from the CPU.

Manual structure

The manual is divided into the following chapters:

- This introduction chapter, which explains the structure of the book;
- A notation chapter, which explains the layout and notation conventions used in the instruction definitions and elsewhere;
- An architecture chapter, which explains the structure of the ST20-C1 core, the registers, memory addressing, the format of the instructions and the exception handling and process models;
- Four chapters on using the instructions, showing how the instructions can be used to achieve certain useful outcomes: general instructions; multiply-accumulate; interrupts and traps; processes and support for multi-tasking;
- An alphabetical listing of the instructions, one to a page. Descriptions and formal definitions are presented in a standard format with the instruction mnemonic and full name of the instruction at the top of the page. The notation used is explained in detail in the notation chapter.

In addition there are appendices listing constants and structures, covering issues related to compiling for the ST20-C1 core and listing the instruction set, plus a glossary of ST20-C1 terminology.

Notation

This chapter describes the notation used throughout this manual, including the meaning of the instruction listings and the meanings of values and constants.

2.1 Instruction listings

The instructions are listed in alphabetical order, one to a page. Descriptions are presented in a standard format with the instruction mnemonic and full name of the instruction at the top of the page, followed by these categories of information:

- Code: the instruction code;
- Description: a brief summary of the purpose and behavior of the instruction;
- Definition: a more formal and complete description of the instruction, using the notation described below in section 2.2;
- Status Register: a list of errors and other changes to the Status Register which can occur;
- Comments: a list of other important features of the instruction;
- See also: for some instructions, cross references are provided to other instructions with related functions.

These categories are explained in more detail below, using the and instruction as an example.

2.1.1 Instruction name

The header at the top of each page shows the instruction mnemonic and, on the right, the full name of the instruction. For primary instructions the mnemonic is followed by `n' to indicate the operand to the instruction; the same notation is used in the description to show how the operand is used. An explanation of the primary and secondary instruction formats is given in section 3.4.

2.1.2 Code

The code of an instruction is the value that would appear in memory to represent the instruction.

For secondary instructions the instruction `operation code' is shown as the memory code: the actual bytes, including any prefixes, which are stored in memory. The value is given as a sequence of bytes in hexadecimal, decoded left to right. The codes are stored in memory in `little-endian' format, with the first byte at the lowest address.

For example, the entry for the and instruction is:

    Code: [...]

This means that the hexadecimal byte value [...] would appear in memory for an and.
For primary instructions the code stored in memory is determined partly by the value of the operand to the instruction. In this case the op-code is shown as `Function x', where x is the function code in the last byte of the instruction. For example, adc (add constant) is shown as:

    Code: Function [...]

This means that adc [...] would appear in memory as the hexadecimal byte value [...]. For an operand in the range [...], an adc would appear in memory as [...].

2.1.3 Description

The description section provides an indication of the purpose of the instruction as well as a summary of the behavior. This includes details of the use of registers, whose initial values may be used as parameters and into which results may be stored.

For example, the and instruction contains the following description:

    Description: Bitwise AND of Areg and Breg.

2.1.4 Definition

The definition section provides a formal description of the behavior of the instruction. The behavior is defined in terms of its effect on the state of the processor, i.e. the changes in the values of registers and memory between the start and the end of the instruction.

The effects of the instruction on registers, etc. are given as statements of the following form:

    register'        ←  expression involving registers, etc.
    memory_location' ←  expression involving registers, etc.

Primed names (e.g. Areg') represent values after instruction execution, while names without primes represent values when instruction execution starts. For example, Areg represents the value in Areg before the execution of the instruction while Areg' represents the value in Areg afterwards. So the form above states that, after the instruction has been executed, the register or memory location on the left hand side holds the value of the expression on the right hand side.

Only the changed registers and memory locations are given on the left hand side of the statements. If the new value of a register or memory location is not given then the value is unchanged by the instruction.

The description is written with the main function of the instruction stated first (e.g. the main function of the example instruction is to combine Areg and Breg into Areg). This is followed by the other effects of the instruction, such as rotating the stack. There is no temporal ordering implied by the order in which the statements are written.

For example, the and instruction contains the following definition:

    Definition:
        Areg' ← Breg ∧ Areg
        Breg' ← Creg
        Creg' ← Areg

This says that the integer stack is rotated and Areg is assigned the bitwise AND of the values that were initially in Breg and Areg.
After the instruction has executed, Breg contains the value that was originally in Creg, and Creg has the value that was in Areg.

The notation is described more fully in section 2.2.

2.1.5 Status Register

This section of the instruction definitions lists any changes to bits of the Status register which can occur. The Status register is described in more detail in section 3.3.2.

2.1.6 Comments

This section is used for listing other information about the instructions that may be of interest. This includes an indication of the type of the instruction:

- "Primary instruction" indicates one of the functions which are directly encoded with an operand in a single byte instruction.
- "Secondary instruction" indicates an instruction which is encoded using opr.

An explanation of the primary and secondary instruction formats is given in section 3.4.

The Comments section also describes any situations where the operation of the instruction is undefined or invalid, and any limits to parameter values.

For example, the only comment listed for the and instruction is:

    Comments: Secondary instruction.

This says that and is a secondary instruction.

2.2 Instruction definitions

The following sections give a full description of the notation used in the formal definition section of the instruction descriptions.

2.2.1 The process state

The process state consists of the registers (Areg, Breg, Creg, Iptr, Tdesc, Wptr and Status) and the contents of memory. A description of the meanings and uses of the registers, special memory locations and data structures is given in section 3.3.

2.2.2 General

The instruction descriptions are not intended to describe the way the instructions are implemented, but only their effect on the state of the processor. So, for example, the result may be shown in terms of an intermediate result calculated to infinite precision, although such an intermediate result is not used in the implementation.

Comments (in italics) are used both to clarify the description and to describe actions or values that cannot easily be represented by the notation used here; e.g. take timeslice trap. Some of these actions and values are described in more detail in other chapters.

An ellipsis is used to show a range of values; e.g. `0..31' means the values from 0 to 31 inclusive.

Subscripts are used to indicate particular bits in a word; e.g. Areg_i is bit i of Areg, and Areg_0..7 is the least significant byte of Areg. Note that bit 0 is the least significant bit in a word, and bit 31 is the most significant bit.
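The bit-subscript and range notation can be mirrored in a few lines of Python. This is only an illustrative sketch: the helper names `bit` and `bits` are not part of the manual, and a register is modelled as a plain 32-bit unsigned integer.

```python
WORD_MASK = 0xFFFFFFFF  # a 32-bit word; bit 0 is the least significant bit


def bit(value: int, i: int) -> int:
    """Return bit i of a 32-bit value (the manual's Areg_i)."""
    return ((value & WORD_MASK) >> i) & 1


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi inclusive (the manual's Areg_lo..hi)."""
    width = hi - lo + 1
    return ((value & WORD_MASK) >> lo) & ((1 << width) - 1)


areg = 0x80000012
low_byte = bits(areg, 0, 7)   # least significant byte of areg
top_bit = bit(areg, 31)       # most significant bit of areg
```

With this convention, `bits(areg, 0, 7)` corresponds to the least significant byte and `bit(areg, 31)` to the sign bit of a signed word.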
Except for Iptr, certain reserved words of memory, and when taking exceptions or switching processes, if the description does not mention the state of a register or memory location after the instruction, then the value will not be changed by the instruction.

Iptr is assigned the address of the next instruction in the code before the instruction execution starts. Iptr is included in the description only when there are additional effects of the instruction (e.g. in the jump instruction). In these cases the address of the next instruction is indicated by the comment `next instruction'.

2.2.3 Undefined values

Some instructions in some circumstances leave the contents of a register or memory location in an undefined state. This means that the value of the location may be changed by the instruction, but the new value cannot be easily defined, or is not a meaningful result of the instruction. For example, when a division by zero is attempted, Breg and Creg become undefined, i.e. they do not contain any meaningful data. An undefined value is represented by the name undefined. The values of registers which become undefined as a result of executing an instruction are implementation dependent and are not guaranteed to be the same on different members or revisions of the ST20 family of processors.

2.2.4 Data types

The instruction set includes operations on three sizes of data: 8-bit, 16-bit and 32-bit objects. 8-bit and 16-bit data can represent signed or unsigned integers, and 32-bit data can represent addresses, signed or unsigned integers. Generally the arithmetic is signed. In some cases it is clear from the context (e.g. from the operators used) whether a particular object represents a signed or unsigned number. A subscripted label is added (e.g. Areg_unsigned) to clarify where necessary.

2.2.5 Representing memory

The memory is represented by arrays of each data type. These are indexed by a value representing a byte address. Access to the three data types is represented in the instruction descriptions in the following way:

    byte[address]      references a byte in memory at the given address
    sixteen[address]   references a 16-bit half word in memory
    word[address]      references a 32-bit word in memory

For all of these, the state of the machine referenced is that before the instruction if the function is used without a prime (e.g. word[address]), and that after the instruction if the function is used with a prime (e.g. word'[address]).
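The three memory views can be modelled as accessors over one shared byte array. The following is a minimal sketch, not the manual's definition: the class and method names are illustrative, addresses are treated as non-negative indexes rather than signed pointers, and the alignment checks follow the manual's rule that 16-bit accesses use even addresses and 32-bit accesses use word-aligned addresses.

```python
import struct


class Memory:
    """Little-endian byte-addressed memory with byte, sixteen and word views."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def byte(self, addr: int) -> int:
        return self.data[addr]

    def set_byte(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFF

    def sixteen(self, addr: int) -> int:
        if addr % 2:
            raise ValueError("16-bit objects need an even byte address")
        return struct.unpack_from("<H", self.data, addr)[0]

    def word(self, addr: int) -> int:
        if addr % 4:
            raise ValueError("32-bit objects must be word aligned")
        return struct.unpack_from("<I", self.data, addr)[0]

    def set_word(self, addr: int, value: int) -> None:
        if addr % 4:
            raise ValueError("32-bit objects must be word aligned")
        struct.pack_into("<I", self.data, addr, value & 0xFFFFFFFF)


mem = Memory(16)
mem.set_word(4, 0x11223344)
mem.set_byte(4, 0xAA)  # also changes the least significant byte of word 4
```

Because all three views read the same bytes, a write through one view is immediately visible through the others, which is the consistency property the notation relies on.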
Writing a value given by an expression, expr, to the word in memory at address addr is represented by:

    word'[addr] ← expr

and reading a word from a memory location is achieved by:

    register' ← word[addr]

Writing to memory in any of these ways will update the contents of memory, and these updates will be consistently visible to the other representations of the memory. For example, writing a byte at a word-aligned address will modify the least significant byte of the word at that address.

Data alignment

Generally, word and half word data items have restrictions on their alignment in memory. Byte values can be accessed at any byte address, i.e. they are byte aligned. 16-bit objects can only be accessed at even byte addresses, i.e. the least significant bit of the address must be 0. 32-bit objects must be word aligned, i.e. the two least significant bits of the address must be zero.

Address calculation

An address identifies a particular byte in memory. Addresses are frequently calculated from a base address and an offset. For different instructions the offset may be given in units of bytes or words depending on the data type being accessed. In order to calculate the address of the data, a word offset must be converted to a byte offset before being added to the base address. This is done by multiplying the offset by the number of bytes in a word, i.e. 4.

As there are many accesses to memory at word offsets, a shorthand notation is used to represent the calculation of a word address. The notation register @ x is used to represent an address which is offset by x words (4x bytes) from the address in register. For example, in the specification of load non-local there is:

    Areg' ← word[Areg @ n]

Here, Areg is loaded with the contents of the word that is n words from the address pointed to by Areg, i.e. the word at address Areg + 4n.

In all cases, if the given base address has the correct alignment then any offset used will also give a correctly aligned address.

2.3 Operators used in the definitions

A full list of the operators used in the instruction definitions is given in Table 2.1. Unless otherwise stated, all arithmetic is signed.

    Symbol            Meaning

    Unchecked (modulo) integer arithmetic
    `+', `-', etc.    Signed integer add, subtract, multiply, divide and
                      remainder. If the computation overflows, the result of
                      the operation is truncated to the word length. If a
                      divide or remainder by zero occurs, the result of the
                      operation is undefined. No errors are signalled. The
                      `-' operator is also used as a monadic operator.

    Signed comparison operators
    [...]             Comparisons of signed integer values: `less than',
                      `greater than', `less than or equal', `greater than or
                      equal', `equal' and `not equal'.

    Bitwise operators
    [...], >>arith    `Not', `and', `or', `exclusive or', logical left and
                      right shift, and arithmetic right shift operations on
                      bits in words.

    Boolean operators
    [...]             Boolean combination in conditionals.

    Table 2.1 Operators used in the instruction descriptions

Modulo operators

Arithmetic is done using modulo arithmetic, i.e. there is no checking for errors and, if the calculation overflows, the result `wraps around' the range of values representable in the word length of the processor; e.g. adding 1 to the address at the top of the address map produces the address of the byte at the bottom of the address map. These operators are represented by the symbols `+', `-', etc.

Error conditions

Any errors that can occur in instructions which are defined in terms of the modulo operators are indicated explicitly in the instruction description. For example, an addition instruction indicates the cases that can cause overflow or underflow, independently of the actual addition:

    if (sum > MostPos)
    {
        Areg' ← sum - 2^BitsPerWord
        Status'underflow ← clear
        Status'overflow ← set
    }
    else if (sum < MostNeg)
    {
        Areg' ← sum + 2^BitsPerWord
        Status'underflow ← set
        Status'overflow ← clear
    }
    else
    {
        Areg' ← sum
        Status'underflow ← clear
        Status'overflow ← clear
    }

2.3.1 Functions

Type conversions

The int16 function is used to indicate a type cast of its argument to a 16-bit integer. If the argument is too large or too small to fit into a 16-bit integer then the result of the instruction is undefined.

Double word splitting

Where a calculation is performed using a 48-bit or 64-bit value, the value is split into words. The function low_word returns the least significant word and the function high_word returns the most significant word.

2.3.2 Conditions to instructions

In many cases, the action of an instruction depends on the current state of the processor. In these cases the conditions are shown by an if clause; this can take one of the following forms:

    if condition statement
    if condition statement else statement
    if condition statement else if condition statement else statement

These conditions can be nested. Braces, {}, are used to group statements which are dependent on a condition.
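The three-way overflow test given for modulo arithmetic, written in the if/else form just described, can be sketched in Python. This is a hedged illustration: `checked_add` is a hypothetical helper name, the flags it returns describe this one addition only, and how those flags are merged into the Status register is not modelled.

```python
BITS_PER_WORD = 32
MOST_POS = 0x7FFFFFFF    # most positive signed word
MOST_NEG = -0x80000000   # most negative signed word (bit pattern 0x80000000)


def checked_add(a: int, b: int) -> tuple[int, bool, bool]:
    """Add two signed words; return (result, overflow, underflow)."""
    total = a + b  # Python ints give the infinite-precision intermediate
    if total > MOST_POS:
        return total - (1 << BITS_PER_WORD), True, False
    elif total < MOST_NEG:
        return total + (1 << BITS_PER_WORD), False, True
    else:
        return total, False, False


wrapped, overflow, underflow = checked_add(MOST_POS, 1)
```

The intermediate `total` plays the role of the infinite-precision value mentioned in section 2.2.2; only the wrapped result would ever be held in a register.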
As an example of a conditional definition, the cj (conditional jump) instruction contains the following lines:

    if (Areg = 0)
        Iptr' ← next instruction + n
    else
    {
        Iptr' ← next instruction
        Areg' ← Breg
        Breg' ← Creg
        Creg' ← Areg
    }

This says that if the value in Areg is zero, then the jump is taken (the instruction operand, n, is added to the instruction pointer); otherwise the stack is popped and execution continues with the next instruction.

Data structures and constants

A number of data structures have been defined in this manual. Each comprises a number of data slots that are referenced by name in the text and in the instruction descriptions.

These data structures are listed in tables in the appendix on constants and data structures. Each table gives the name of each slot in the structure and the word offsets from the base address of the structure.

A slot in a data structure is identified using the offset notation described in section 2.2.5:

    word[base_address @ word_offset]

For example, the back pointer of a semaphore structure at address sem would be:

    word[sem @ s.Back]

In addition, several constants are used to identify fixed values for the ST20-C1 processor. The constants are listed in the same appendix.

Product identity value

This is a value returned by the ldprodid instruction. For a specific product in the ST20 family, refer to SGS-THOMSON.

Architecture

This chapter describes the general architectural features of the ST20-C1 core which are relevant to more than one instruction or group of instructions. Interrupts and traps are described in the exceptions chapter, and support for multi-tasking is described in the multi-tasking chapter. Other features which are related to specific tasks are described in the chapter on using the instructions. A full list of constants and data structures is given in the appendices.

The ST20-C1 instruction set covers:

- control flow
- arithmetic and logical operations
- field manipulations
- shifting and byte-swapping
- register manipulations
- memory access with various addressing modes and data sizes
- task scheduling
- direct input/output

3.1 Values

The ST20-C1 core supports data objects of different sizes, either signed or unsigned. The sizes directly supported are bytes (8-bit), half words (16-bit), words (32-bit) and multiple words (64-bit, 96-bit etc.). Bytes, half-words and words can be loaded and stored. Arithmetic operations are provided for signed words and multiple words. A half word is called a sixteen in instruction names.
The most negative integer (0x80000000) is known as MostNeg and the most positive (0x7FFFFFFF) as MostPos.

Boolean objects, taking the values true or false, are also used by some instructions. False is represented by the value 0 and true by the value 1. Section [...] describes how other values can be implemented for language compilation.

Several data structures are defined in this manual. Each comprises a number of data words (sometimes called slots) that are referenced by name in the text and instruction descriptions and addressed as offsets from the base of the data structure. A full list of these data structures and other constants is given in the appendices.

3.1.1 Ordering of information

The ST20 is little-endian, i.e. less significant data is always held in lower addresses. This applies to bits in bytes, bytes in words and words in memory. Hence, in a word of data representing an integer, one byte is more significant than another if its byte selector is larger. Figure 3.1 shows the ordering of bytes in words for the ST20.

    [Figure 3.1 Bytes and bits in words: bytes within a word and bits within a word, each ordered from most significant to least significant]

For example, the most significant bit of a word is bit 31, in the most significant byte, byte 3, consisting of bits 24 to 31.

This ordering is compatible with Intel processors, but not with Motorola or SPARC. For compatibility with other devices, the swap32 instruction is provided to reverse the order of bytes within a word.

3.1.2 Signed integers and sign extension

A signed object is stored in twos-complement format. A signed value may be represented by an object of any size. Most commonly a signed integer is represented by a single word, but it may also be stored, for example, in a 64-bit object, a 16-bit object, or an 8-bit object. In each of these formats, all the bits within the object contain useful information.

The length of the object that stores a signed value can be increased, so that the object size is increased without changing the value that is represented. This operation is known as sign extension. All the extra bits that are allocated for the larger object are meaningful to the value of the signed integer; they must therefore be set to the appropriate value. The value for all these extra bits is the same as the value of the most significant bit, i.e. the sign bit, of the smaller object. The ST20-C1 provides instructions that sign extend byte and half-word objects to words.

The example in Figure 3.2 shows how the value -10 is stored in a 32-bit register, either as an 8-bit object or as a 32-bit object.
In the Figure 3.2 example, bits 31 to 8 are meaningful for a 32-bit object but not for an 8-bit object. These bits are set to 1 in the 32-bit object.

    [Figure 3.2 Storing a signed integer in different length objects: the signed integer value (-10) stored as an 8-bit object (byte) and as a 32-bit object (word)]

3.2 Memory

The ST20 processor is a 32-bit word machine, with byte addressing and a 4 Gbyte address space. This section explains how data is arranged in that address space.

The address of an object is the address of its base, i.e. the byte with the lowest address.

3.2.1 Word address and byte selector

A machine address, or pointer, is a single word of data which identifies a byte in memory, i.e. a byte address. It comprises two parts, a word address and a byte selector. The byte selector occupies the two least significant bits of the word; the word address occupies the thirty most significant bits.

An address is treated as a signed value, the range of which starts at the most negative integer and continues, through zero, to the most positive integer. This enables the standard arithmetic and comparison functions to be used on pointer values in the same way that they are used on numerical values.

Certain values can never be used as pointers because they represent reserved addresses at the bottom of the memory space. They are reserved for use by the processor and for initialization. A full list of names and values of constants used in this manual is given in the appendices. In particular, the null process pointer (known as NotProcess) has the value MostNeg, since zero could be a valid process address.

3.2.2 Alignment

A data object is said to be word-aligned if it is at an address with a byte selector of zero, i.e. the full address of the object is divisible by 4. Similarly, a data object is said to be half-word-aligned if it is at an address with an even byte selector, i.e. the full address of the object is divisible by 2.

Word objects, including addresses, are normally stored word-aligned in memory. This is usually desirable to make the best use of any 32-bit wide memory. Also, most instructions that involve fetching data from or storing data into memory use word-aligned addresses and load or store four contiguous bytes.

However, there are some instructions that can manipulate part of a word. A half-word object is normally half-word-aligned, so it can be stored either in the least significant 16 bits of a word or in the most significant 16 bits.
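Sign extension (section 3.1.2) and the split of a pointer into word address and byte selector (section 3.2.1) both reduce to short bit manipulations. The sketch below is illustrative only; the function names are assumptions and are not ST20-C1 instructions.

```python
WORD_MASK = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign extend the low `bits` bits of value to a 32-bit pattern."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):       # sign bit of the smaller object
        value |= WORD_MASK & ~low_mask  # copy it into all the extra bits
    return value


def split_address(addr: int) -> tuple[int, int]:
    """Return (word address, byte selector) for a 32-bit pointer."""
    addr &= WORD_MASK
    return addr >> 2, addr & 0x3


def is_word_aligned(addr: int) -> bool:
    return (addr & 0x3) == 0


def is_half_word_aligned(addr: int) -> bool:
    return (addr & 0x1) == 0


minus_ten_byte = 0xF6                         # -10 as an 8-bit object
minus_ten_word = sign_extend(minus_ten_byte, 8)
```

For the Figure 3.2 value, the byte 0xF6 extends to the word 0xFFFFFFF6, with bits 31 to 8 all copies of the sign bit.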
A data item that is represented in two contiguous words is called a double word object and is normally word-aligned.

3.2.3 Ordering of information in memory

Data is stored in memory using the little-endian rule. Objects consisting of more than one byte are stored in consecutive bytes, with the least significant byte at the lowest address and the most significant at the highest address.

Figure 3.3 shows the ordering of bytes in words in memory. If X is a word-aligned address then the word at X consists of the bytes at addresses X to X+3, where byte X is the least significant byte and byte X+3 is the most significant byte of the word.

    [Figure 3.3 Bytes and words in memory: memory as bytes alongside memory as 32-bit words, starting from a word-aligned byte address]

3.2.4 Work space

The ST20-C1 uses a stack-based data structure in memory to hold the local working data of a program, called the work space. The work space is a word-aligned collection of 32-bit words pointed to by the work space pointer register (Wptr).

The programmer's model is that local data is held in work space, i.e. in memory, and must be brought into the evaluation stack to be operated on and then written back from the evaluation stack to the work space.

The implementation of the ST20-C1 core may include a register cache. This provides a mechanism to accelerate access to local work space without changing the programmer's model of how the work space operates, and without impacting either the excellent code density or the interrupt latency associated with a stack-based instruction set.

3.3 Registers

This section introduces the ST20-C1 core registers that are visible to the programmer. Seven registers, known as the process state registers, define the local state of the executing process. These registers are preserved through exceptions. One other register is provided for performing input/output, and is not preserved through exceptions. All registers are 32-bit. Each instruction explicitly refers to specific registers, as described in the instruction definitions.

The state of an executing process at any instant is defined by the contents of the machine registers listed in Table 3.1. The registers are illustrated in Figure 3.4 and described in the rest of this section.
    Register   Description

    Areg       Evaluation stack register A
    Breg       Evaluation stack register B
    Creg       Evaluation stack register C
    Iptr       Instruction pointer register, pointing to the next instruction
               to be executed
    Status     Status register
    Wptr       Work space pointer, pointing to the stack of the currently
               executing process
    Tdesc      Task descriptor
    IOreg      Input and output register

    Table 3.1 Processor registers

    [Figure 3.4 Register set: the ST20-C1 core holds the evaluation stack (Areg, Breg, Creg), Status, IOreg, the task descriptor (Tdesc), the instruction pointer (Iptr) and the work space pointer (Wptr); in memory, Tdesc points to the task control block, Iptr to program code, and Wptr (base plus offset) to local program data]

3.3.1 Evaluation stack

The registers Areg, Breg and Creg are organized as a three register evaluation stack, with Areg at the top. The evaluation stack is used for expression evaluation and to hold operands and results of instructions. Generally, instructions may pop values from or push values onto the evaluation stack, or both, and do not address individual evaluation stack registers.

Pushing a value onto the stack means that the value initially in Breg is pushed into Creg, the value in Areg is pushed into Breg, and the new value is put in Areg. Popping a value from the stack means that a value is taken from Areg, the value initially in Breg is popped into Areg, and the value in Creg is popped into Breg. The value left in Creg varies between instructions, but is generally the value initially in Areg. These actions are illustrated in Figure 3.5 and Figure 3.6.

    [Figure 3.5 Pushing a value onto the evaluation stack: Areg, Breg, Creg before and after]

    [Figure 3.6 Popping a value from the evaluation stack: Areg, Breg, Creg before and after]

3.3.2 Status register

The status register contains status bits which describe the current state of the executing process and any errors which have been detected. Initially the status register is set to the value given in Table 7.3. The contents of the status register are summarized in Table 3.2 and described in more detail in the following paragraphs.
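The push and pop behaviour of section 3.3.1 can be sketched as a small Python class. This is an illustrative model only: the class and method names are assumptions, `pop` uses the general case in which Creg receives the old Areg, and `and_` follows the and definition quoted in section 2.1.4.

```python
class EvalStack:
    """Three-register evaluation stack with Areg at the top."""

    def __init__(self, areg: int = 0, breg: int = 0, creg: int = 0) -> None:
        self.areg, self.breg, self.creg = areg, breg, creg

    def push(self, value: int) -> None:
        # Creg' <- Breg, Breg' <- Areg, Areg' <- value (old Creg is lost)
        self.areg, self.breg, self.creg = value, self.areg, self.breg

    def pop(self) -> int:
        # Areg' <- Breg, Breg' <- Creg, Creg' <- Areg (the general case)
        value = self.areg
        self.areg, self.breg, self.creg = self.breg, self.creg, value
        return value

    def and_(self) -> None:
        # Areg' <- Breg AND Areg, Breg' <- Creg, Creg' <- Areg
        self.areg, self.breg, self.creg = (
            self.breg & self.areg, self.creg, self.areg)


stack = EvalStack()
for v in (1, 2, 3):
    stack.push(v)
top = stack.pop()  # takes the value from Areg and rotates the stack
```

The tuple assignments evaluate every right hand side before storing, which matches the manual's rule that unprimed names refer to values at the start of the instruction.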
Generally the status register is local to a process, except for global interrupt enable and timeslice enable, which are global and carried from one process to another across a context switch.

    Full name                 Meaning when set, or meaning of value

    mac_count                 Multiply-accumulate number of steps.
    mac_buffer                Multiply-accumulate data buffer size code.
    mac_scale                 Multiply-accumulate scaling code.
    mac_mode                  Multiply-accumulate accumulator format code.
    global_interrupt_enable   Enable external interrupts until explicitly
                              disabled.
    local_interrupt_enable    Enable external interrupts. Clearing this bit
                              disables interrupts until the current process
                              is descheduled.
    overflow                  An arithmetic operation gave a positive
                              overflow.
    underflow                 An arithmetic operation gave a negative
                              overflow.
    carry                     An arithmetic operation produced a carry.
    user_mode                 A user process is executing.
    interrupt_mode            An interrupt handler is executing or trapped.
    trap_mode                 A trap handler is executing.
    sleep                     The processor is due to go to sleep.
    reserved                  Reserved.
    start_next_task           The CPU must start executing a new process.
    timeslice_enable          Timeslicing is enabled.
    timeslice_count           Timeslice counter.

    Table 3.2 Status register bits

The mac_count, mac_buffer, mac_scale and mac_mode fields are used by the multiply-accumulate instructions to hold initialization data which must be saved when an exception occurs. See the multiply accumulate chapter for details of multiply accumulation.

global_interrupt_enable enables external interrupts. Interrupts remain enabled or disabled until explicitly disabled or enabled again. This bit is global and is maintained when a process is descheduled.

local_interrupt_enable enables external interrupts. Clearing this bit disables external interrupts until the current process is descheduled. This is needed when a process delegates part of its processing to a peripheral and then deschedules until completion, as described in section 4.11.3.

Overflow, underflow and carry are bits relating to arithmetic state kept in the status word. The ST20-C1 maintains two "sticky" bits in the status word which indicate whether overflow or underflow has occurred. This allows a complete expression to be evaluated before testing whether overflow has occurred.
Overflow and underflow are chosen so that they apply to both addition and multiplication, as opposed to the more traditional method of replicating bits of the carry chain. In addition, they allow saturated arithmetic to be implemented relatively easily.

A (non-sticky) carry bit is provided to allow efficient implementation of long addition and subtraction. The carry bit is only manipulated by the addc and subc instructions, allowing other instructions to be used for address formation for multi-word values where carry propagation is required, so that the carry is not lost by the address formation evaluations.

user_mode indicates when the machine is handling a user process, i.e. a process which is not an exception handler. interrupt_mode indicates when the machine is handling an interrupt, or a trap from an interrupt handler, and trap_mode indicates when the machine is executing a trap handler. The operating system may need to distinguish between the modes to allow it to perform scheduling activities from a trap handler. These bits are also required to enable the eret instruction to determine whether a signal to the interrupt controller is required.

sleep indicates that the CPU is due to sleep, i.e. to turn off its clocks and go into low power mode. This bit is set when the CPU detects there is no user process to execute, and is cleared when the CPU goes to sleep.

start_next_task, when set, causes the processor to attempt to run the next process from the scheduling queue.

The timeslice_enable bit and timeslice_count field are used for timeslicing, as described in section 7.4.

The instructions that use the status register are described in section 4.12.

3.3.3 The work space pointer

Most programs need somewhere to store local working data, e.g. the local variables of the application code. In the ST20 architecture, this local storage is termed the work space of the program.

The Wptr register is the local work space pointer, which holds the address of the stack of the executing process. The stack is downward pointing, so space is allocated by moving Wptr to a lower address. This address is word aligned and therefore has the two least significant bits set to zero. When a process is descheduled, Wptr is stored as part of the process descriptor block, which is pointed to by Tdesc.

Wptr is used as a base for addressing local variables. A word offset from Wptr is the operand for the instructions ldl (load local), stl (store local) and ldlp (load local pointer).
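Addressing locals relative to Wptr combines the word-offset notation of section 2.2.5 with the three instructions just named. The sketch below is a simplified model under stated assumptions: the `Machine` class and the `at` helper are hypothetical, memory is a dictionary of words keyed by byte address, and the methods return values directly instead of pushing them onto the evaluation stack as the real instructions would.

```python
BYTES_PER_WORD = 4


def at(base: int, word_offset: int) -> int:
    """Byte address that is word_offset words from base (base @ n)."""
    return base + BYTES_PER_WORD * word_offset


class Machine:
    def __init__(self, wptr: int) -> None:
        if wptr % BYTES_PER_WORD:
            raise ValueError("Wptr must be word aligned")
        self.wptr = wptr
        self.words: dict[int, int] = {}  # byte address -> 32-bit word

    def stl(self, n: int, value: int) -> None:
        """Store local: word'[Wptr @ n] <- value."""
        self.words[at(self.wptr, n)] = value & 0xFFFFFFFF

    def ldl(self, n: int) -> int:
        """Load local: the word at Wptr @ n (0 if never written)."""
        return self.words.get(at(self.wptr, n), 0)

    def ldlp(self, n: int) -> int:
        """Load local pointer: the address Wptr @ n itself."""
        return at(self.wptr, n)


m = Machine(wptr=0x1000)
m.stl(2, 99)
local_value = m.ldl(2)
local_address = m.ldlp(2)
```

Because Wptr is word aligned and offsets are scaled by the word size, every address produced this way is also word aligned.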
The ST20-C1 simplifies the stack scheme by decoupling the load/store action from the pointer update:

- Load-local and store-local instructions access values in the work space with addresses relative to Wptr, but do not change the value of Wptr.
- Separate instructions (ajw, gajw) are provided to update the work space pointer by any amount in one step, without needing a series of increments or decrements.

On calling a function or procedure, Wptr is normally decreased to a lower address to allocate space for parameters and local variables of the function. This is performed using the instruction ajw. Wptr is returned to its initial value before returning from the function, to free the local work space.

3.3.4 The task descriptor

The task descriptor Tdesc points to the process descriptor block of the currently executing process. The value held in Tdesc also serves as the identifier of the process.

The process descriptor block is a block of memory whose contents depend on the state of the process. It will generally hold the saved Wptr and Iptr of the process, and may hold a link to the next process in a queue of waiting processes. The process descriptor block is described in section 7.2.

3.3.5 The IO register

The bits of the IOreg are mapped to external connections on the ST20-C1 core. They may be used to signal to, or read signals from, peripherals on or off chip. The instruction used to read and write the IOreg is described in section 4.11. The IOreg is global, so it remains unchanged by a context switch. The bits of IOreg are defined in Table 3.3.

    Bits     Purpose

    0-15     Output data
    16-31    Input data

    Table 3.3 IOreg bits

On some ST20 variants, some bits of the IO register may be reserved for system use. The reserved bits will be the most significant bits of the appropriate half word. The number of such bits is given in the data sheet for each variant.

3.4 Instruction encoding

The ST20-C1 is a zero-address machine. Instruction operands are always implicit, and no bits are needed in the instruction representation to carry address or operand location information. This results in very short instructions and exceptionally high code density.

The instruction encoding is designed so that the most commonly executed instructions occupy the least number of bytes. This reduces the size of the code, which saves memory and reduces the memory bandwidth needed for instruction fetching. This section describes the encoding mechanism.
A sequence of single byte instruction components is used to encode an instruction. The ST20 interprets this sequence at the instruction fetch stage of execution. Most programmers, working at the level of microprocessor assembly language or a high-level language, need not be aware of the existence of instruction components and do not generally need to consider the encoding.

This section has been included to provide background. The appendix on compiling discusses some consequential issues which need to be considered in order to implement a code generator.

3.4.1 An instruction component

Each instruction component is one byte long, and is divided into two 4-bit parts. The four most significant bits of the byte form a function code, and the four least significant bits are used to build an instruction data value, as shown in Figure 3.7.

    [Figure 3.7 Instruction format: one byte holding the function code (most significant four bits) and the data (least significant four bits)]

This representation provides for sixteen function code values (one for each function), each with a data field ranging from 0 to 15.

Instructions that specify the instruction directly in the function code are called primary instructions or functions. There are 13 primary instructions, and the other three possible function code values are used to build larger data values and other instructions. Two function code values, pfix and nfix, are used to extend the instruction data value by prefixing. One function code, operate (opr), is used to specify an instruction indirectly using the instruction data value. opr is used to implement secondary instructions or operations.

3.4.2 The instruction data value and prefixing

The data field of an instruction component is used to create an instruction data value. Primary instructions interpret the instruction data value as the operand of the instruction. Secondary instructions interpret it as the operation code for the instruction itself.

    Mnemonic   Name

    pfix       prefix
    nfix       negative prefix

    Table 3.4 Prefixing instruction components

The instruction data value is a signed integer that is represented as a 32-bit word. For each new instruction sequence, the initial value of this integer is zero. Since there are only 4 bits in the data field of a single instruction component, it is only possible for most instruction components to initially assign an instruction data value in the range 0 to 15. Prefix components are used to extend the range of the instruction data value.
One or more prefixing components may be needed to create the full instruction data value. The prefixes are shown in Table 3.4 and explained below.

All instruction components initially load their four data bits into the least significant four bits of the instruction data value. pfix loads its four data bits into the instruction data value, and then shifts this value up four places. Consequently, a sequence of one or more prefixes can be used to extend the data value of the following instruction to any positive value. Instruction data values in the range 16 to 255 can be represented using one pfix.

nfix is similar, except that it complements the bits of the instruction data value before shifting it up, thus changing the sign of the instruction data value. Consequently, a sequence of zero or more pfixes with one nfix can be used to extend the data value of a following instruction to any negative value. Instruction data values in the range -256 to -1 can be represented using one nfix.

When the processor encounters an instruction component other than pfix or nfix, it loads the 4-bit data field into the instruction data value. The instruction encoding is now complete and the instruction can be executed. The instruction data value is then cleared so that the processor is ready to fetch the next instruction component and build a new instruction data value.

For example, to load the constant 0x11, the instruction ldc 0x11 is encoded with the sequence:

    pfix 0x1; ldc 0x1

The instruction ldc 0x2A68 is encoded with the sequence:

    pfix 0x2; pfix 0xA; pfix 0x6; ldc 0x8

An instruction with a negative operand is likewise encoded with a sequence that contains nfix.

3.4.3 Primary instructions

Research has shown that computers spend most of their time executing a small number of instructions, such as: instructions to load and store from a small number of 'local' variables; instructions to add and compare with small constants; and instructions to jump to or call other parts of the program. For efficiency, in the ST20 these are encoded directly as primary instructions using the function field of an instruction component.

Thirteen of the instruction components are used to encode the most important operations performed by any computer executing a high level language. These are used (in conjunction with zero or more prefixes) to implement the primary instructions. Primary instructions interpret the instruction data value as an operand for the instruction.
The mnemonic for a primary instruction always includes this operand when it is shown in this manual. The mnemonics and names of the primary instructions are listed in Table 3.5.

    mnemonic    name
    adc         add constant
    ajw         adjust work space
    fcall       function call
    cj          conditional jump
    eqc         equals constant
    j           jump
    ldc         load constant
    ldl         load local
    ldlp        load local pointer
    ldnl        load non-local
    ldnlp       load non-local pointer
    stl         store local
    stnl        store non-local

    Table 3.5 Primary instructions

3.4.4 Secondary instructions

The ST20 encodes all other instructions, known as secondary instructions, indirectly using the instruction data value.

    mnemonic    name
    opr         operate

    Table: Operate instruction

The function code opr causes the instruction data value to be interpreted as the operation code of the instruction to be executed. This selects an operation to be performed on the values held in the evaluation stack, so that a further 16 operations can be encoded in a single byte instruction. The pfix instruction component can be used to extend the instruction data value, allowing any number of operations to be encoded.

Secondary instructions do not have an operand specified by the encoding, because the instruction data value has been used to specify the operation.

To ensure that programs are represented as compactly as possible, the operations are encoded in such a way that the most frequently used secondary instructions are represented without using prefix instructions. For example, the instruction add is encoded by a single opr component, while the instruction and is encoded by an opr whose operation code is in turn encoded with a sequence that begins with pfix.

3.4.5 Summary

The encoding mechanism has important consequences. It produces very compact code. It simplifies language compilation, by providing a completely uniform way of allowing a primary instruction to take an operand of any size up to the processor word-length. It allows these operands to be represented in a form independent of the word-length of the processor. It enables any number of secondary instructions to be implemented.

For clarity and brevity, prefix sequences and opr are not explicitly shown in this guide. Each instruction is represented by a mnemonic, and for primary instructions an item of data, which stands for the appropriate instruction component sequence. Hence the examples above would be shown as just add and and. Where appropriate, an expression may be placed in a code sequence to represent the code needed to evaluate that expression.
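The prefixing rules of section 3.4.2 can be sketched in a few lines of Python. This is an illustrative model only, not part of the manual: the function names `encode` and `decode` are hypothetical, components are represented as (mnemonic, 4-bit data) pairs rather than real function code values, and the 32-bit width of the instruction data value is not modelled.

```python
def encode(mnemonic, value):
    """Return the component sequence for `mnemonic` with signed operand `value`.

    Each component is a (mnemonic, nibble) pair; real function code values
    are deliberately not modelled here.
    """
    if 0 <= value < 16:
        # Fits in the 4-bit data field: no prefix needed.
        return [(mnemonic, value)]
    if value >= 16:
        # Positive: emit prefixes for the upper bits, then the low nibble.
        return encode("pfix", value >> 4) + [(mnemonic, value & 0xF)]
    # Negative: nfix complements the data value before shifting it up.
    return encode("nfix", (~value) >> 4) + [(mnemonic, value & 0xF)]


def decode(components):
    """Apply the prefixing rules and return a list of (mnemonic, operand)."""
    decoded = []
    data = 0
    for mnemonic, nibble in components:
        data |= nibble              # load into the least significant 4 bits
        if mnemonic == "pfix":
            data <<= 4              # shift up four places
        elif mnemonic == "nfix":
            data = (~data) << 4     # complement, then shift up four places
        else:
            decoded.append((mnemonic, data))
            data = 0                # cleared, ready for the next instruction
    return decoded
```

With this model, `encode("ldc", 0x11)` yields pfix 1 followed by ldc 1, matching the example in section 3.4.2, and decoding any encoded sequence returns the original operand.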
4 Using ST20-C1 instructions

This chapter describes the purpose for which the sequential instructions are intended, except for the multiply-accumulate instructions, which are described in their own chapter. These instructions are described in the context of their intended use. Some instructions are designed for use in a particular sequence of instructions, and this chapter describes those sequences. Instructions for exceptions and for multi-tasking are described in the chapters on those subjects. The architecture of the ST20-C1, including registers and memory arrangement, is described in the Architecture chapter.

4.1 Manipulating the evaluation stack

The evaluation stack consists of the registers Areg, Breg and Creg. The general action of the evaluation stack is described in section 3.3.1. Instructions are provided for shuffling and re-ordering values on the evaluation stack, as listed in Table 4.1.

    Mnemonic    Name
    rot         rotate stack
    arot        anti-rotate stack
    dup         duplicate stack
    rev         reverse stack

    Table 4.1 Evaluation stack manipulation instructions

rot pops the value from Areg off the evaluation stack and rotates it into Creg, and arot pushes the value from Creg onto the stack. rev swaps Areg and Breg, and dup pushes a copy of Areg onto the stack. Table 4.2 shows how each of these affects the evaluation stack. Each row shows the contents of the evaluation stack after one of these instructions is executed with initial values a, b and c in Areg, Breg and Creg respectively.

    Instruction    Areg    Breg    Creg
    rot            b       c       a
    arot           c       a       b
    dup            a       a       b
    rev            b       a       c

    Table 4.2 Evaluation stack manipulation

Many instructions leave the initial Areg in Creg. This value can be restored into Areg using arot.

4.2 Loading and storing

The loading and storing instructions are listed in Table 4.3.

    Mnemonic    Name                                        Description
    ldc         load constant                               Load constant.
    ldl         load local                                  Load value from n words above Wptr.
    stl         store local                                 Store value n words above Wptr.
    ldnl        load non-local                              Load value from n words above Areg.
    stnl        store non-local                             Store value n words above Areg.
    lbinc       load byte and increment                     Load byte and increment address by 1 byte.
    sbinc       store byte and increment                    Store byte and increment address by 1 byte.
    lsinc       load sixteen and increment                  Load half word and increment address by 2 bytes.
    lsxinc      load sixteen sign extended and increment    Load half word, sign extend to 32 bits and increment address by 2 bytes.
    ssinc       store sixteen and increment                 Store half word and increment address by 2 bytes.
    lwinc       load word and increment                     Load word and increment address by 4 bytes.
    swinc       store word and increment                    Store word and increment address by 4 bytes.

    Table 4.3 Loading and storing instructions

In the ST20, the term loading means pushing a value onto the evaluation stack. The value loaded may be a value read from memory, a constant, a copy of another register or a calculated value. Storing means popping a value from the evaluation stack. The value may be written into memory or into another register. The evaluation stack is described in section 3.3, and evaluation of expressions is described in section 4.3.

Relative addresses are used for accessing memory in order to reduce code size, as the operand values are smaller than full machine addresses. Data structures are word-aligned, so relative addresses are word offsets, reducing the operand size further.

The most common operations performed by a program are loading and storing a small number of variables, and loading small literal values.

4.2.1 Loading constants

The primary instruction ldc is provided for loading a general constant, or for initializing a variable or register to a constant expression.

4.2.2 Local and non-local variables

When loading from or storing to memory, the ST20 distinguishes between local and non-local addressing. Local addressing means that the address is given as a word offset from Wptr. Non-local addressing means that the address is given as a word offset from Areg. In practice, Wptr points to the stack, so local addressing is normally used for local variables on the stack while non-local addressing is normally used for other variables.

The primary instructions ldl and stl perform loading and storing of local variables. For example, to load the value n words above Wptr and write it to the location m words above Wptr:

    ldl n; stl m

The primary instructions ldnl and stnl perform loading and storing of non-local variables. For example, to load a value n words above the base address x_base and store it to the location m words above y_base, where x_base and y_base are held in local variables:

    ldl x_base; ldnl n; ldl y_base; stnl m

Note that for the purposes of this manual, ld followed by a variable name denotes loading a value from that variable, where the variable may be local or non-local, so either ldl or ldnl is used as appropriate.
Similarly st followed by a variable name denotes storing a value into that variable, where the variable may be local or non-local, so either stl or stnl is used.

4.2.3 Byte and half-word values

Instructions are provided for loading and storing byte and half-word variables. In each case, the address initially in Areg is incremented by the size of the object, so that repeated loads and stores can be used to copy a block of memory.

The load instructions place the loaded value in Areg and the incremented address in Creg, and leave Breg unaffected. The store instructions write the initial Breg into memory at the address in Areg, leaving the incremented address in Breg, the initial Creg in Areg and the initial Breg pushed down into Creg.

Byte loading and storing

lbinc loads the byte at the address in Areg into the evaluation stack. lbinc replaces the address in Areg with the byte stored at that address, treating it as an unsigned integer and setting the twenty-four most significant bits of Areg to zero. The incremented address is left in Creg.

sbinc writes the least significant byte of Breg to the location addressed by Areg. The incremented address is left in Breg.

Half-word loading and storing

lsinc and lsxinc load the half-word object at the address in Areg into the evaluation stack. lsinc replaces the address in Areg with the half word, treating it as an unsigned integer and setting the sixteen most significant bits of Areg to zero. lsxinc is similar to lsinc, but treats the half word as a signed integer in twos-complement format, and hence sign extends the representation by setting the sixteen most significant bits of Areg to the same value as the most significant bit of the half-word object. Sign extension is discussed in section 4.4.6.

ssinc writes the half word in the least significant two bytes of Breg to the location addressed by Areg.

4.2.4 Memory block copy

A block memory copy can be implemented using the instructions lwinc and swinc. These instructions load and store a word, and increment the addresses used. To copy a number of bytes from source to destination, where source and destination are both word-aligned, a loop should be written, using a temporary variable limit, as in the following code:

    source; destination add; limit LOOP: lwinc; rev; swinc limit; arot END; LOOP; END:

This is the most efficient method of copying, since it reads and writes full words, making best use of 32-bit memory. However, this is not always possible if the alignment of the source and destination blocks is different.
In that case byte or half-word load and store should be used.

4.3 Expression evaluation

Expression evaluation and address calculation are performed using the evaluation stack. For example, the evaluation of operations with two integer operands is performed by instructions that operate on the values in Areg and Breg. The result is left in Areg. Arithmetic and boolean calculations are considered in sections 4.4 and 4.7 respectively. This section describes how the evaluation stack is used. Loading and storing instructions are described in section 4.2.

In this and subsequent sections, in examples of assembly code, a single letter identifier written as an instruction is either an expression or a segment of code. An expression then means 'evaluate the expression and leave the result in Areg'.

4.3.1 Using the evaluation stack

A compiler normally loads a constant expression using ldc. Loading from a constant table is described in section 4.3.3. An expression consisting of a single local variable is loaded using ldl. Methods of loading non-local variables are discussed in section 4.2, and array elements in section 4.5.

Evaluation of expressions sometimes requires the use of temporary variables in the process work space, but the number of these can be minimized by careful choice of the evaluation order. The details of how this is achieved by a compiler are described in Appendix C, section C.3.

4.3.2 Loading operands

The three registers of the evaluation stack are used to hold operands of instructions. Evaluation of an operand or parameter may involve the use of more than one register. Care is needed when evaluating such operands to ensure that the first operand to be loaded is not pushed off the bottom of the evaluation stack by the evaluation of later operands. The processor does not detect evaluation stack overflow.

Three registers are available for loading the first operand, two registers for the second and one for the third. Consequently, the instructions are designed so that Creg holds the operand which, on average, is the most complex, and Areg the operand which is the least complex.

In some cases, it is necessary to evaluate the Areg and Breg operands in advance, and to store the results in temporary variables. This can sometimes be avoided using the reverse instruction. The following sequences can be used to load the operands into Areg, Breg and Creg respectively.
    rev; rev; rev; rev;

The choice of loading sequence, and of which operands should be evaluated in advance, is determined by the number of registers required to evaluate each of the operands. The algorithm used by compilers is given in Appendix C, section C.4.

4.3.3 Tables of constants

The ST20-C1 instruction set has been optimized so that loading small constants can be coded compactly; for example, ldc allows the loading of constants between 0 and 15 to be coded in a single byte. Analysis of programs shows that such small constants occur markedly more frequently than large constants. However, when a large constant does need to be loaded, the necessary prefix sequence can be long. Other techniques may be more efficient in these cases, and a simple mechanism to increase code compactness is a table of constants.

This is implemented by storing the long constants in a look-up table. This table of constant entries must be aligned on a word boundary. The address of this table is held in a local variable, which is used to index the array. Then, to load a constant from entry n of a constant table stored at address constants_ptr, the following code would be used:

    ldl constants_ptr; ldnl n

where the instruction ldnl is explained in section 4.2.2. This code sequence only takes two bytes, provided constants_ptr is less than 16 words from the work space pointer address and there are no more than 16 word-length constants. At worst it is unlikely to take more than a few bytes. Hence, if a constant takes more bytes to load using ldc, then this sequence often improves code compactness, especially if the constant is used more than once.

4.3.4 Assignment

Single words, half words and bytes may be assigned using the load and store instructions described in section 4.2.

Word assignment

If a and b are both single word variables and e is a word valued expression, then word assignments are compiled as follows:

    b = a    compiles to    ld a; st b
    b = e    compiles to    e; st b

Byte assignment

If a and b are both single byte variables and e is a byte valued expression, then byte assignments are compiled as follows:

    b = a    compiles to    address(a); lbinc; address(b); sbinc;
    b = e    compiles to    e; address(b); sbinc;

where address(variable) is the address of the variable. Forming addresses is discussed in section 4.5.
Half word assignment

If a and b are both half-word variables and e is a half-word valued expression, then half-word assignments are compiled as follows:

    b = a    compiles to    address(a); lsinc; address(b); ssinc;
    b = e    compiles to    e; address(b); ssinc;

where address(variable) is the address of the variable. Forming addresses is discussed in section 4.5.

4.4 Arithmetic

This section describes the arithmetic instructions, except for the multiply-accumulate instructions, which are described in their own chapter, and forming addresses, which is described in section 4.5. Boolean expression evaluation is discussed in section 4.7, and the general principles of expression evaluation are described in section 4.3.

4.4.1 Addition, subtraction and multiplication

Single length signed arithmetic is provided by the operations listed in Table 4.4.

    Mnemonic    Name
    adc         add constant
    add         add
    sub         subtract
    mul         multiply
    smul        short multiply

    Table 4.4 Single length signed integer arithmetic instructions

Each of these instructions except smul can signal overflow or underflow by setting the appropriate status register bit. Overflow occurs if the result is greater than MostPos, and underflow if it is less than MostNeg. If overflow or underflow occurs, then the least significant 32 bits of the full result are left in Areg. The overflow and underflow bits are 'sticky': when one has been set, it is not cleared, and the other cannot be set, by subsequent arithmetic. The overflow and underflow bits are used for saturated arithmetic, as described in section 4.4.3.

The primary instruction adc adds a constant value to Areg. Breg and Creg are unaffected. This can be used for incrementing and decrementing variables and counters.

If op is one of add, sub, mul or smul, then the instruction sequence that loads the left hand operand, loads the right hand operand and executes op evaluates the corresponding expression, i.e. it takes the value in Breg as the left hand operand and the value in Areg as the right hand operand, and loads the result into Areg. The content of Creg is popped into Breg and the initial Areg is rotated into Creg.

smul multiplies two half-word values, producing a 32-bit result. It cannot overflow or underflow and is faster than mul.

4.4.2 Division and remainder

Division and remainder are performed using the operations listed in Table 4.5.
    Mnemonic    Name
    divstep     divide step
    unsign      unsign argument

    Table 4.5 Division and remainder instructions

Each divstep generates four bits of the unsigned quotient, so eight divsteps are needed for a full 32-bit unsigned division, which will also generate the remainder. The result of a division is an integer division rounded towards zero (truncated). The quotient is left in Breg and the remainder in Creg, so a rotation pops the quotient into Areg.

unsign is used to separate the sign from the magnitude of the operands before performing the division. Division is then performed on the magnitudes, and the signs of the results are derived from the signs of the operands.

Overflow can occur only if the divisor (Areg) is zero, or if the dividend (Breg) is MostNeg and the divisor is -1. divstep does not detect these cases, and does not set any status bits, so a check should be applied before performing the division.

The following code sequence performs the integer division a/b. The signed quotient is left in Areg.

    POS: arot; unsign; arot; unsign; POS; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; rot; not; END; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; rot; END:

The following code sequence performs the remainder operation. The signed remainder is left in Areg.

    POS: rev; unsign; arot; unsign; POS; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; arot; not; END; rot; divstep; divstep; divstep; divstep; divstep; divstep; divstep; divstep; arot; END:

4.4.3 Saturated arithmetic

In saturated arithmetic, when overflow or underflow occurs the result is the most positive or most negative possible result respectively, instead of the least significant bits of the full result. This ensures that the result is as near as possible to the real value, and prevents glitches caused by wrap-around.

Saturated arithmetic is achieved on the ST20-C1 by evaluating the expression and then performing a saturate instruction. If overflow or underflow occurred, then the corresponding status bit will have been set, which will cause saturate to change the value in Areg to the most positive or most negative value respectively. saturate clears the overflow and underflow bits.
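The interaction between the sticky status bits and saturate can be modelled with a short Python sketch. This is a hedged illustration only: the `Status` class and the function names are hypothetical, registers are passed as plain integers rather than as a modelled evaluation stack, and checking the overflow bit before the underflow bit is an assumption, since the text does not say what happens if both are set.

```python
MOST_POS = 0x7FFFFFFF    # most positive 32-bit signed value
MOST_NEG = -0x80000000   # most negative 32-bit signed value


class Status:
    """Hypothetical holder for the sticky overflow and underflow bits."""

    def __init__(self):
        self.overflow = False
        self.underflow = False


def wrap32(value):
    """Keep the least significant 32 bits, read as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def mul(breg, areg, status):
    """Model of mul: set a sticky bit on overflow or underflow, wrap the result."""
    full = breg * areg
    if full > MOST_POS:
        status.overflow = True
    elif full < MOST_NEG:
        status.underflow = True
    return wrap32(full)


def saturate(areg, status):
    """Model of saturate: clamp Areg if a status bit is set, then clear both bits."""
    if status.overflow:        # assumption: overflow is checked first
        areg = MOST_POS
    elif status.underflow:
        areg = MOST_NEG
    status.overflow = False
    status.underflow = False
    return areg
```

In this model, multiplying 0x10000 by 0x10000 wraps to zero and sets the overflow bit, so a following saturate replaces the zero with MostPos.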
For example, to perform a saturated multiply, the two operands are loaded and then the following is executed:

    mul; saturate;

4.4.4 Unary minus

The expression (-e) can be evaluated with overflow signalling either by a sequence using not or by a sequence using sub. The first sequence, using not, requires one less stack register than the second. not is bitwise inversion, which is described in section 4.8.

4.4.5 Long arithmetic

The long arithmetic instructions are listed in Table 4.6.

    Mnemonic    Name
    addc        add with carry
    subc        subtract with carry
    umac        unsigned multiply accumulate

    Table 4.6 Long arithmetic instructions

Multiple length addition and subtraction

Multiple length addition and subtraction can be performed using addc and subc, executed once for each word of the result. For both instructions, the carry (or borrow) is held in the carry bit of the status register. This keeps carrying separate from overflow, and address calculations can be safely performed using instructions such as add and wsub without affecting the carry.

The addc instruction forms (Breg + Areg) + Status_carry, leaving the least significant word of the result in Areg and the most significant (carry) bit in the carry bit of the status register. The initial Areg is rotated into Creg.

Similarly, the subc instruction forms (Breg - Areg) - Status_carry, leaving the least significant word of the result in Areg and the borrow bit in the carry bit of the status register. The initial Areg is rotated into Creg.

Addition of two double length unsigned values, X and Y giving Z, without overflow signalling can therefore be compiled as follows:

    ld Xlo; ld Ylo; addc; st Zlo; ld Xhi; ld Yhi; addc; st Zhi

The subscripts 'lo' and 'hi', used here and in subsequent text, specify the least and most significant word respectively of the double word variable with which they are associated.

Subtraction of double length values, Y from X giving Z, without overflow signalling can be compiled as:

    ld Xlo; ld Ylo; subc; st Zlo; ld Xhi; ld Yhi; subc; st Zhi

Overflow signalling for signed arithmetic can be added by performing an extra addc or subc to produce a final word which contains only the sign (positive or negative) unless overflow has occurred.
For example, the following code could be used to perform double length signed addition with overflow signalling (the carry, overflow and underflow status bits are cleared first):

    Xlo; Ylo; addc; Zlo; Xhi; Yhi; addc; Zhi; dup; addc; dup; #7ffffff; overflows only carry word rev; #8000001; underflows only carry word

Multiple length multiplication

The umac instruction multiplies the single word unsigned operands in Areg and Breg, and adds the single word carry operand in Creg, to form a double length unsigned result. The more significant (carry) word of the result is left in Breg, and the less significant in Areg. No overflow is signalled by this instruction.

Multiplication of a single length unsigned value by a double length unsigned value (leaving the 'carry' in Areg) can be performed by:

    Ylo; umac; Zlo; Yhi; umac;

Double length unsigned multiplication is more complex. The product of two unsigned double length words can be expressed as:

    (Xhi*2^32 + Xlo) * (Yhi*2^32 + Ylo) = (Xhi*Yhi)*2^64 + (Xhi*Ylo + Xlo*Yhi)*2^32 + (Xlo*Ylo)

This can be coded as follows:

    Xlo; Ylo; umac; Xlo; Yhi; umac; rev; Xhi; Ylo; umac; Xhi; Yhi; umac; rev; rev; addc; addc;

This gives a quadruple length unsigned result.

4.4.6 Object length conversion

Object length conversion operations are provided by the instructions listed in Table 4.7.

    Mnemonic    Name
    xbword      sign extend byte to word
    xsword      sign extend sixteen to word

    Table 4.7 Object length conversion instructions

An earlier section explains that data is represented as data objects of various sizes. This section describes the instructions that can be used to convert between these representations.

Most ST20-C1 integer arithmetic instructions operate on signed integers held in the evaluation stack registers as 32-bit objects, and produce results in this form. Object length conversion is important for the conversion of high level language data types.

The ST20-C1 therefore provides instructions that allow a byte or half-word signed integer to be sign extended to 32 bits by copying the sign bit into the bits that were previously not significant, as shown in Figure 3.2. The sign extension is performed on the value in Areg and the result is placed in Areg. The other registers are not affected.
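Sign extension as described above can be illustrated with a small Python model of xbword and xsword. This is a sketch under stated assumptions, not the manual's definition: registers are modelled as unsigned 32-bit integers, the sign bit is taken to be bit 7 for a byte and bit 15 for a half word, and any bits of Areg above the object being extended are assumed to be ignored.

```python
def xbword(areg):
    """Sign extend the byte in the low 8 bits of Areg to a 32-bit word."""
    byte = areg & 0xFF                      # assumption: upper bits are ignored
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def xsword(areg):
    """Sign extend the half word in the low 16 bits of Areg to a 32-bit word."""
    half = areg & 0xFFFF                    # assumption: upper bits are ignored
    return (half | 0xFFFF0000) if half & 0x8000 else half


def as_signed(word):
    """Read an unsigned 32-bit register value as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word
```

For instance, in this model the byte 0xFF extends to 0xFFFFFFFF, which reads as -1, while 0x7F is left unchanged.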
xbword extends a signed byte to a word by copying bit 7 into bits 8 to 31. xsword extends a signed half word to a word by copying bit 15 into bits 16 to 31.

lsxinc loads a sixteen bit value, sign extends it to 32 bits and increments the address by two bytes. This is the same as:

    lsinc; xsword;

4.5 Forming addresses

The addressing instructions provide access to items in data structures using short sequences of single byte instructions. These instructions are listed in Table 4.8.

    Mnemonic    Name                           Meaning
    ldlp        load local pointer             Load value of Wptr + 4n.
    ldnlp       load non-local pointer         Load value of Areg + 4n.
    ldpi        load pointer to instruction    Load value of Iptr + Areg.
    wsub        word subscript                 Load value of Areg + 4.Breg.

    Table 4.8 Addressing instructions

4.5.1 The address of a variable

The absolute address of a local work space location can be loaded using the ldlp primary instruction. ldlp 0 can be used to load the value of Wptr.

The ldnlp primary instruction is provided to calculate the absolute address of a non-local variable. The meaning of local and non-local is described in section 4.2.

4.5.2 The address of an instruction

The address of a location in the program being executed can be obtained by the ldpi operation as follows. The address of a location n bytes past the next instruction (which is itself pointed to by the instruction pointer register) is pushed onto the evaluation stack by:

    ldc n; ldpi

For example, the address of label L can be loaded by:

    ldc (L-M); ldpi
    M:

where the label M is the address of the instruction that follows the ldpi instruction. First the offset in bytes from M to L is loaded into Areg. ldpi then uses this offset value and the instruction pointer register (which will hold the address of label M) to load the address of label L into Areg. This technique is useful for generating relocatable code. Breg and Creg are unaffected.

4.5.3 Arrays

The wsub instruction interprets Areg as the address of the beginning of a vector of word-sized data objects, and Breg as an index into that vector. After execution, Areg holds the address of the indexed element, Creg is popped into Breg, and the initial Areg is rotated into Creg. The operation performed by wsub is to multiply the integer in Breg by four and add this to the address in Areg (without overflow checking).

Access to a component of an array is split into two sections: first the address of the component must be constructed, then the transfer of data to or from that component must be performed.
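The address arithmetic and stack movement of wsub described in section 4.5.3 can be sketched as follows. This is an illustrative Python model with a hypothetical function name; the evaluation stack is passed in and returned as a tuple (Areg, Breg, Creg), and wrapping the address to 32 bits is this sketch's reading of 'without overflow checking'.

```python
def wsub(areg, breg, creg):
    """Model of wsub: Areg = base address, Breg = word index.

    Returns the new (Areg, Breg, Creg) tuple.
    """
    address = (areg + 4 * breg) & 0xFFFFFFFF   # no overflow check, just wrap
    # Creg is popped into Breg; the initial Areg is rotated into Creg.
    return address, creg, areg


# Address of element 3 of a word vector based at 0x1000.
new_areg, new_breg, new_creg = wsub(0x1000, 3, 7)
```

Here `new_areg` is 0x100C, `new_breg` holds the old Creg and `new_creg` holds the old base address, which arot could bring back into Areg.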
Evaluating the subscript

Array subscripts can be evaluated efficiently using the smul instruction. If an array A has been declared as

    A[S1] ... [Sn];

where Si (i = 1..n) are the dimensions, then the usual way of arranging this in memory is to have the elements of the array in a contiguous block. For the purposes of this section, suppose that the elements of the last dimension are stored adjacently; otherwise change the order of the dimension subscripts. For example, Figure 4.1 shows how the elements of a particular three dimensional array (Array) are stored in this way.

    [Figure 4.1 A possible method of storing an array of integers: for the declaration Array[2][2][3], the elements Array[0][0][0], Array[0][0][1], Array[0][0][2], Array[0][1][0], ..., Array[1][1][2] occupy contiguous word locations at increasing memory addresses]

If access is required to the array element A[e1] ... [en], then the code to evaluate the subscript takes each subscript in turn, multiplying the running total by the next dimension and adding the next subscript:

    mul; add; mul; add; ... mul; add;

For example, to evaluate the subscript of the element Array[x][y][z] (where Array is declared as in Figure 4.1), the code sequence has the form:

    mul; add; mul; add;

This evaluates (x*2 + y)*3 + z which, as can be seen from Figure 4.1, is the correct offset from the base of the array.

Accessing a word addressed array

If Wa_ptr is a pointer to an array (Wa) that starts on a word boundary, in which the component types are measured in words, and e is a subscript expression, then the address of the component is given by:

    e; ld Wa_ptr; wsub;

If e is a constant expression this can be optimized to:

    ld Wa_ptr; ldnlp e

Accessing a byte addressed array

Similarly, if Ba_ptr is a pointer to an array (Ba) which may start at any byte location, in which each component type is measured in bytes, and e is a subscript expression, then the address of the component is given by:

    e; ld Ba_ptr; add;

4.6 Comparisons and jumps

This section describes the arithmetical comparison instructions and their use in conditional program behavior. Unconditional jumps are also described. Functions and procedures are described in section 4.10, and evaluation of boolean expressions is described in section 4.7.

Comparisons, conditional behavior and jumps are provided by the instructions listed in Table 4.9.
    Mnemonic    Name
    eqc         equal constant
    gt          greater than
    gtu         greater than unsigned
    order       order
    orderu      order unsigned
    cj          conditional jump
    j           jump
    jab         jump absolute

    Table 4.9 Comparison and jump instructions

4.6.1 Representation of true and false

The ST20 uses 0 for false and 1 for true. These values are generated by predicate operations (for example comparisons). They can be loaded with single byte load constant instructions.

Implementation of languages with different representations of true and false

It is easy to implement programming languages that use a different representation of true and false. For example, following a comparison with a short sequence that uses not does not affect the representation of false in the result, but changes the representation of true to a value which is used by some programming languages.

4.6.2 Comparison

The primary instruction eqc loads Areg with the truth value true if Areg is initially equal to the instruction operand (n), and false otherwise. Breg and Creg are unaffected.

gt and gtu take two integer operands in Areg and Breg and produce a boolean result which is loaded into Areg. They also load the value in Creg into Breg, saving a copy of the initial Areg in Creg. The gt instruction loads Areg with true if Breg is greater than Areg, and false otherwise, treating Areg and Breg as signed values. Similarly gtu loads Areg with true if the unsigned value of Breg is greater than the unsigned value of Areg, and false otherwise.

4.6.3 Jump and conditional jump

There are two relative jump instructions; both are primary instructions. The unconditional jump instruction, j, adds its operand to the address of the instruction immediately following and puts the result into Iptr, thus transferring execution to another part of the program.

The conditional jump instruction, cj, performs a jump if the value in Areg is 0, and in that case does not affect the evaluation stack. If the value in Areg is not 0, cj rotates the value in Areg to the bottom of the evaluation stack and continues with the next instruction. Consequently cj serves as 'jump if false', provided that the language being implemented interprets 0 as false (see section 4.6.1).

4.6.4 Conditional transfer of control

Conditional expressions used in a conditional branch construct are compiled using conditional jump. An if statement compiles to code that evaluates the condition, executes cj ENDIF, and then executes the conditional statement, where the label ENDIF: is at the end of the code for the construct.

The compilation of a while loop is shown in the following example.
A while loop compiles to code that evaluates the condition, uses cj to jump to the label ENDWHILE: when the condition is false, executes the loop body followed by timeslice, and then jumps back to re-evaluate the condition. The label ENDWHILE: is at the end of the code for the loop.

Note that this loop includes a timeslice instruction. This causes the current process to be descheduled if a timeslice is due and timeslicing is enabled. The presence of this instruction ensures that no process can occupy the processor for too long, provided timeslicing is enabled. It is good practice in multi-tasking programs to include a timeslice instruction in every loop. Timeslicing is described in section 7.4. Single task programs do not need timeslice, and should have timeslicing disabled, so that the timeslice instruction has no effect.

A repeat until loop is compiled in a similar way: the loop body and a timeslice instruction are followed by the evaluation of the condition and a conditional jump back to the start of the loop.

4.6.5 Ordering instructions

Two instructions are provided to select the smaller of two values. If Breg is smaller than Areg as a signed integer, order will swap Areg and Breg; otherwise order will have no effect. This can be used to find the minimum and maximum of two signed variables, by loading both values and then executing:

    order; st minimum; st maximum

Similarly orderu can be used to find the minimum and maximum of two unsigned values.

4.7 Evaluation of boolean expressions

This section describes operations using the logical true and false values, as used with conditional jump. Conditional behavior and comparisons are described in section 4.6. Bitwise boolean operations are described in section 4.8. General issues concerning expression evaluation are discussed in section 4.3.

The manual gives a table of the correspondence between logical expressions and ST20-C1 instructions, in which letters represent expressions and a constant, and the symbol for logical not is explained in section 4.7.1:

    true false sub; sub;

Further optimizations can be made for a 'not equals' comparison when it is followed by a conditional jump.

4.7.1 Evaluation of not

If zero represents false and 1 represents true, then logical not can be performed by eqc 0.

4.7.2 Evaluation of and and or

In the evaluation of logical operations, the instruction sequence depends on whether strict or non-strict evaluation is used, i.e. whether both operands are always evaluated. This is important if side-effects can occur, such as a trap, or if the second operand is not always defined, as in

    ((ptr != NULL) && (ptr->tag == TAG_VAL))

In this example, ptr->tag is not defined if ptr is NULL.
In languages such as ANSI C, where non-strict evaluation is required, short-cut code sequences must be used, so that the second operand is evaluated only when the first operand does not already determine the result.

For non-strict evaluation, the laws of boolean algebra should be applied in the compilation of conditional expressions before code is generated, to ensure that a jump is taken as early as possible.

In other languages, evaluation of boolean expressions is strict (some languages give the programmer the choice) and both expressions in dyadic logical operations need to be evaluated. Where false is represented by 0 and true is represented by a fixed bit pattern other than 0 (e.g. true is always 1 or true is always -1), then logical or can be transformed to BITOR and logical and to BITAND, and the bitwise instructions given in section 4.8 can be used.

Note that even for some non-strict evaluations, the above sequence may be preferable. Where the second operand is a simple boolean expression such as a local variable, and its evaluation does not cause side-effects, it does no harm to implement the non-strict evaluation using the bitwise operation.

4.8 Bitwise logic operations

Bitwise logic operations are provided by the instructions listed in Table 4.10.

    Mnemonic    Name
    and         and
    or          or
    xor         exclusive or
    not         bitwise not
    bitld       bit load
    bitst       bit store
    bitmask     bit mask
    rmw         memory read modify write

    Table 4.10 Bitwise logic instructions

The not operation has only one operand, which is taken from Areg. The result, which is the bitwise inversion of the bits of the operand, is loaded into Areg, leaving Breg and Creg unaffected.

and, or and xor are bitwise logical operations on two operands that are taken from Areg and Breg. For each, the result is loaded into Areg. The data previously held in Creg is popped into Breg and the initial Areg is left in Creg. These operations are commutative.

bitld, bitst and bitmask are used for setting, clearing and testing bits in a word. bitld returns the value of a single bit from the value in Breg, bitst sets or clears a single bit, and bitmask creates a mask with a single bit set. In each case the bit number is initially in Areg and the result is left in Areg. For bitld and bitst, the value containing the bit to be tested, set or cleared is initially in Breg.

4.8.1 Memory bit test, set and clear

Bits in a word of memory can be tested, set and cleared by the instruction rmw. The address of the memory word is held in Areg, and two masks in Breg and Creg. rmw clears the bits of the memory word that are set in Creg, then sets the bits of the memory word that are set in Breg. The initial memory word is loaded into Areg, with Areg pushed down to Breg and Breg pushed down to Creg.
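The effect of rmw on memory and on the evaluation stack can be sketched in Python. This is a hedged model only: the function name and the use of a dictionary for memory are illustrative choices, and it assumes the reading given above, in which bits set in the Creg mask are cleared before bits set in the Breg mask are set. It does not model any bus-level atomicity of the read-modify-write.

```python
def rmw(memory, areg, breg, creg):
    """Model of rmw: Areg = word address, Breg = set mask, Creg = clear mask.

    `memory` maps word addresses to unsigned 32-bit values.
    Returns the new (Areg, Breg, Creg) tuple.
    """
    old_word = memory[areg]
    cleared = old_word & ~creg & 0xFFFFFFFF   # clear the bits set in Creg
    memory[areg] = cleared | breg             # then set the bits set in Breg
    # Initial memory word goes to Areg; Areg and Breg are pushed down.
    return old_word, areg, breg


mem = {0x100: 0b1010}
regs = rmw(mem, 0x100, 0b0001, 0b1000)
```

After this call `mem[0x100]` is 0b0011 and Areg holds the original word 0b1010, so the previous state of any bit can still be tested.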
4.9 Shifting and byte swapping

Shift and byte swapping operations are provided by the instructions listed in Table 4.11.

    Mnemonic    Name
    shl         shift left
    shr         shift right
    ashr        arithmetic shift right
    swap32      byte swap

    Table 4.11 Shifting and byte swapping instructions

The shift operations (shl, shr and ashr) shift the operand in Breg by the number of bits specified by the unsigned integer in Areg, and put the result in Areg. shl and shr fill the vacated bit positions with zero bits, while ashr fills the vacated bits with copies of the bit which was the original sign bit. If Areg is zero, the result is the initial value of Breg. When the value in Areg is greater than the number of bits in the object being shifted, the result of the operation is undefined. The data previously held in Creg is popped into Breg, and the initial Breg is left in Creg.

swap32 reverses the order of the bytes in Areg, swapping byte 0 with byte 3 and byte 1 with byte 2.

4.10 Function and procedure calls

The function and procedure call operations are provided by the instructions listed in Table 4.12.

    Mnemonic    Name
    fcall       function call
    jab         jump absolute
    ajw         adjust work space
    gajw        general adjust workspace

    Table 4.12 Function and procedure instructions

The primary instruction fcall calls a function or procedure. It stores the instruction pointer (which holds the return address) in the word pointed to by Wptr. The operand of the call is added to the address of the next instruction to produce the address of the first instruction of the procedure or function being called. Since the call address is relative, the code is relocatable.

A function called using fcall must have a fixed offset at compile time. jab can be used for calling functions and procedures at dynamically calculated addresses, for example when using function pointers.

The jab instruction is also used to perform a return. The return address must have been restored from the stack into Areg.

A procedure or function that requires local work space will normally include ajw instructions to allocate and deallocate the space. When the returning jab instruction is executed, the programmer must ensure that:

    Areg holds the return address;
    any workspace claimed by the procedure has been released, so that Wptr is returned to the value it held at the start of the procedure.

The jab instruction uses one word of the evaluation stack, so the other two words can be used to return values to the calling code, including a pointer to a block of additional data to be returned.
4.10.1 Adjusting work space

The primary instruction ajw is used to perform a relative adjustment of the stack pointer Wptr, to create work space on the stack at the beginning of a function and to return the work space pointer at the end of the function. ajw increases the value of the workspace pointer by the number of words given by its operand value. Work space is created at the beginning of a function or procedure with a negative operand and released before returning with a positive operand.

The amount of extra work space needed will normally include:

  space to save parameters passed on the evaluation stack;
  space for local variables and temporaries;
  space for hidden system variables such as a static chain.

For example, a function with some words of local work space might be:

  myfunction (param_1, param_2)
    local variable declarations;
    return (E);

This could be compiled to (as it survives in this copy):

  param_1; param_2; jab;

ST20-C1 processors have a workspace cache which holds a copy of the words at the bottom of the work space. This cache is transparent to the programmer and can substantially improve performance. It is refilled whenever Wptr is adjusted, so the ajw instruction should not be used excessively.

4.10.2 Parameters

It is convenient to load the first three parameters of a procedure or function into the evaluation stack registers, and to arrange the work space of the calling code so that additional parameters are stored in locations of the work space before the procedure is called. Location zero of the work space is used for the return address. This is illustrated in Figure 4.2, which shows a possible work space layout for a function or procedure with parameters and four local variables.

[Figure 4.2: Example function or procedure workspace, showing Wptr in the calling code, Wptr in the function or procedure, the parameters, the return Iptr and the local variables]

To enable a procedure to access non-local variables, the parameters to the procedure can include a link to the environment in which the procedure was declared.

4.10.3 Returning results

Results of size less than or equal to the word length of the processor can be returned from a function on the evaluation stack. The returning instruction uses the third register. Further results, or results larger than the word length, can be returned by passing into the function the addresses of locations in which to store these results, as extra parameters.

The function above is used for purposes of illustration.
For simplicity, it is assumed that a single result is returned on the evaluation stack:

  -local_variables-1; local_variables; local_variables+1; jab;

The loading sequences described earlier are required if the expressions to be returned in registers contain evaluations.

4.10.4 Calling a function

The first three parameters should be loaded into the evaluation stack before the fcall instruction. These parameters can be stored as local variables after the workspace pointer has been moved down, to make the best use of the work space cache. The remainder of the parameters to be passed should be loaded into the work space before the fcall is executed. When the function returns, any results whose addresses were passed will already have been stored, so all that remains is to store any results returned on the evaluation stack.

For example, the function call F(E1, ..., En) could be compiled to (as it survives in this copy):

  (n-3); static_link; fcall

The compiler must have already allocated sufficient workspace for the parameters, so that they are not stacked explicitly.

Single result functions

In most programming languages, a function that returns a single result can be used in an expression as well as in an assignment. The common form of function returns a single value contained in one word, and the mechanism described above will return this in Areg.

When compiling expressions (using the algorithm described in the appendix), the depth of such a function call should be taken as being infinite, i.e. deeper than any other form of expression. This is because a function call will always lose any other information in the registers. By giving it infinite depth, the expression compilation algorithm will never call a function while another expression result is being held in a register.

4.10.5 Other work space allocation techniques

The gajw instruction exchanges the contents of Wptr and Areg, allowing new work spaces to be allocated dynamically and allowing dynamic switching between existing work spaces. If a process work space holds a pointer to a new work space, then the following code changes to the new work space and stores a pointer to the old work space:

  Wnew; gajw; Wold;

The old work space can be restored by:

  Wold; gajw;

In addition, the old work space can be accessed from the new work space, using:

  Wold; ldnl
  Wold; stnl
  Wold; ldnlp

4.11 Peripherals

The peripheral instructions are listed in Table 4.13.
  Mnemonic   Name
  io         input output
  bitmask    bit mask
  ldtdesc    load task descriptor
  stop       stop process

  Table 4.13 Peripheral instructions

4.11.1 Using the IO register

The IO register is a 32-bit register used for simple control of devices outside the core. The bits of the register are directly mapped onto external connections of the ST20-C1 core. The connections from the register may be to on-chip or external peripherals, depending on the particular chip design. The bits of the register are defined in Table 3.3. Some bits, the most significant of each half word, are reserved for system use on some ST20 variants; see the data sheet for the variant.

Setting an output bit will cause the corresponding connection to be driven high, and clearing the bit will drive the connection low. Similarly, an input or output bit can be tested for the state of the connection; a connection that is high will set the bit and a connection that is low will clear it.

The io instruction sets and clears bits of the IO register and loads a copy of the initial IO register. The sequences below are shown as they survive in this copy.

To set a bit in the bottom half-word of the register:

  bit_number; bitmask;

To clear a bit in the bottom half-word of the register:

  bit_number; bitmask;

To read a bit from the register:

  dup; bit_number; bitld;

The IO register is global and is not changed or saved on a context switch. If more than one process accesses the register then it may need to be protected by a semaphore. On reset the register is set to zeros.

4.11.2 Memory-mapped peripherals

On-chip peripherals have memory-mapped registers in the address space. Access to these registers is performed in the same way as accessing memory. If a peripheral has a block of word-aligned registers with base address peripheral, then the register with word offset register can be read by:

  peripheral; ldnl register;

and a value can be written to the register by:

  value; peripheral; stnl register;

4.11.3 Channel-type peripherals

Some peripherals, for example peripherals using DMA (direct memory access), use a channel-type control model. This section describes the use of such peripherals, which use a micro-interrupt to notify the CPU that an assigned job is completed.

This type of peripheral works best with a multi-tasking program, so that other processes can execute while the peripheral is busy. However, if multi-tasking is not otherwise required, then the interrupt model can be used.
Multi-tasking is described in the Multi-tasking chapter, and interrupts and the exception vector table are described in the Exceptions chapter.

Multi-tasking

The principle of using the channel model with multi-tasking is that the CPU tells the peripheral to start a job and then deschedules the current process. The job might be a peripheral input/output transfer. This allows the CPU to continue executing other processes while the job is in progress. When the peripheral completes the job it signals the CPU, which reschedules the process.

To enable this to happen, the task descriptor of the user process is entered into the exception vector table. This entry is called the peripheral channel. The peripheral signals the CPU with a micro-interrupt, which interrupts the CPU with the exception level associated with the user process. The CPU recognizes that the exception vector table entry is a user process because its type is UserProcessType, and either adds the process to the scheduling queue or takes the schedule exception trap if one is installed. The scheduling exception trap allows a scheduling kernel to control the rescheduling of the process.

In more detail, the steps to perform a job using this model are:

  1. The CPU saves the task descriptor of the current process in the exception vector table at the exception level of the peripheral.
  2. The CPU tells the peripheral the number of bytes to be read or written, and the address of the start of the data or the input buffer.
  3. The CPU signals to the peripheral that it can start.
  4. The CPU deschedules the process, using the stop instruction, to wait for the peripheral to complete the job.
  5. The CPU executes other processes while the peripheral performs the job.
  6. The peripheral completes the job and sends a micro-interrupt to the CPU, with the exception level.
  7. The CPU reads the exception vector table and recognises the entry as a user process.
  8. The process is added back to the scheduling queue or, if a schedule exception trap is enabled, the trap is taken.

The code that executes the steps up to and including the stop instruction must not be interrupted, since otherwise the peripheral could have completed the job before the process is descheduled by the stop instruction, which could crash the processor. Interrupts can be temporarily disabled by clearing local_interrupt_enable in the status register, which is set again automatically when the process is descheduled. The remaining steps happen automatically and need no coding.

The code to drive the peripheral will depend on the peripheral interface. Typically the parameters (e.g. byte count and buffer address) would be written to memory-mapped registers, and then a further write would be needed to start the peripheral job.
The code to transmit param_count bytes from param_buffer using exception level except_level, where the registers periph_count, periph_buffer and periph_start are in the block peripheral, would be similar to the following (shown as it survives in this copy):

  ldtdesc; ExceptionBase; stnl except_level;
  param_count; peripheral; stnl periph_count;
  rev; param_buffer; stnl periph_buffer;
  local_interrupt_enable; bitmask; statusclr;
  rot; start_value; arot; stnl periph_start;
  stop;

The process will resume at the next instruction after the stop when the peripheral job is complete.

Single tasking

In programs which do not use multi-tasking, it is not desirable to deschedule the program while the peripheral is busy. The main program should continue with other jobs while the peripheral is busy. An interrupt handler can be written to signal to the main program that the peripheral job is complete.

The descriptor of the interrupt handler exception control block is placed in the exception vector table. The descriptor is the address of the control block ORed with the type flag ExceptionProcessType.

The code to transmit param_count bytes from param_buffer using exception level except_level, where the registers periph_count, periph_buffer and periph_start are in the block peripheral and the interrupt handler control block is except_control_block, would be similar to the following (shown as it survives in this copy):

  except_control_block; ExceptionProcessType; ExceptionBase; stnl except_level;
  param_count; peripheral; stnl periph_count;
  param_buffer; arot; stnl periph_buffer;
  start_value; arot; stnl periph_start;

When the interrupt handler starts execution, the peripheral job will be complete.

4.12 Status register

The status register can be manipulated using the instructions listed in Table 4.14.

  Mnemonic    Name
  statusclr   clear bits in status register
  statusset   set bits in status register
  statustst   test status register

  Table 4.14 Status register instructions

For each of these instructions, Areg holds a mask. statusclr copies the initial status register into Areg and clears the bits of the status register that are set in the initial Areg. For example, to clear the bit bit_number:

  bit_number; bitmask; statusclr

statusset is similar, but sets the bits of the status register that are set in the initial Areg. statustst returns in Areg the status bits masked by the initial Areg. Breg and Creg are unaffected.
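The three status register instructions can be modelled in a few lines of Python. This is a sketch under stated assumptions: the class name, the method names and the use of a plain integer for the register are illustrative choices, and the bit position used in the usage lines is arbitrary. Only the mask semantics come from the description above.

```python
MASK32 = 0xFFFFFFFF  # status register modelled as a 32-bit word


class StatusRegisterModel:
    """Hypothetical model of statusclr, statusset and statustst."""

    def __init__(self, value=0):
        self.value = value & MASK32

    def statusclr(self, areg):
        """Clear the bits set in the mask; return the initial register."""
        initial = self.value
        self.value = initial & ~areg & MASK32
        return initial

    def statusset(self, areg):
        """Set the bits set in the mask; return the initial register."""
        initial = self.value
        self.value = (initial | areg) & MASK32
        return initial

    def statustst(self, areg):
        """Return the status bits selected by the mask; no change."""
        return self.value & areg


status = StatusRegisterModel(0b1010)
previous = status.statusclr(1 << 3)   # clear bit 3, keep the old value
print(bin(previous), bin(status.value), status.statustst(1 << 1))
```

Because statusclr and statusset return the initial register, a caller can restore the earlier state afterwards, for example after temporarily clearing an interrupt enable bit.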
The status register is described in section 3.3.2.

5 Multiply accumulate

This section describes the multiply-accumulate instructions and their use. These instructions are described in the context of their intended use. Instructions for general use (arithmetic, loading, storing etc.) are described in the chapter Using ST20-C1 instructions. Instructions for exceptions are described in the Exceptions chapter, and the multi-tasking instructions are described in the Multi-tasking chapter. The architecture of the ST20-C1, including registers and memory arrangement, is described in the Architecture chapter.

Multiply accumulate operations are provided by the signal processing instructions listed in Table 5.1.

  Mnemonic   Name
  mac        multiply accumulate
  umac       unsigned multiply accumulate
  smacinit   initialize short multiply accumulate loop
  smacloop   short multiply accumulate loop
  biquad     biquad filter step

  Table 5.1 Multiply accumulate instructions

5.1 Data formats

A signed fractional number of N bits is described as x.y, where x+y=N. This means the number is made up of x bits before the binary point, an implied binary point, and y fractional bits. More details of data formats are given in section 5.7.

5.2 mac and umac

mac and umac are general purpose multiply accumulate instructions, multiplying two 32-bit values and adding the product to a 32-bit unsigned initial accumulator, giving a 64-bit accumulator. mac treats the multiplicands as signed and umac treats them as unsigned.

Initially Areg and Breg hold the values to be multiplied and Creg holds the initial accumulator. On completion, Areg holds the least significant word of the result accumulator, Breg holds the most significant word and Creg holds a copy of the initial Areg.

5.3 Short multiply accumulate loop

The smacloop instruction performs a multiply-accumulate operation on two vectors of 16-bit values held in memory. It takes an initial accumulator value and two pointers, one to each of the data vectors.

The vector of data values is normally considered to reside within a circular buffer of programmable size, but this can be turned off. When data fetches reach the end of this buffer, the pointer wraps around back to the start of the buffer and continues. The vector must be word aligned.

The vector of coefficients is always in a flat address space and never wraps around. The vector must be half-word aligned.
The data items from each vector are read from memory in turn, and products are formed between corresponding pairs from the two vectors. Each of these products is added into a running accumulator value. The instruction completes with values on the stack for the final accumulator value and the updated data pointers.

Four control values are held in the status register, as shown in Table 5.2.

  Field        Meaning
  mac_count    The number of steps.
  mac_buffer   The size code of the data buffer within which the data vector lies.
  mac_scale    Shift control for scaling coefficient values.
  mac_mode     Accumulator format: indicates a 16-bit (short) or a 32-bit (long) value.

  Table 5.2 smacloop status register fields

These values are initialized by the smacinit instruction. The smacinit instruction takes a packed control word in Areg, extracts the control fields and loads these into the status register. For smacinit, Areg is organized as shown in Table 5.3.

  Field        Position
  mac_count    Least significant
  mac_buffer
  mac_scale
  mac_mode     Most significant

  Table 5.3 smacinit Areg format

These status register values are global and are not saved when a process is timesliced or descheduled. If more than one process is performing short multiply accumulate loops, then the values should be reloaded by the process code using smacinit after each timeslice or stop instruction.

5.3.1 Buffer size

The vector buffer size is determined by the mac_buffer control field. For one value of mac_buffer no address wrapping takes place, i.e. the buffer is assumed to be of infinite size. Otherwise, the buffer size is 2^(mac_buffer+2), as shown in Table 5.4. The buffer must be aligned to a multiple of its size, so a buffer of a given number of bytes must start at an address whose value is a multiple of that number of bytes.

[Table 5.4 mac_buffer coding: columns are mac_buffer, buffer size (data items) and buffer size (bytes); the first entry is infinite in both size columns]

5.3.2 Number of steps

The mac_count control field of the status register determines the number of multiply-accumulate steps in the smacloop. This is an unsigned 8-bit integer. A value of zero has a special meaning; otherwise the value of mac_count is the number of steps.

5.3.3 Scaling

Scaling of the input data vector is controlled by the mac_scale field of the status register, as described in section 5.6.
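The addressing pattern of smacloop (a data vector that wraps inside a circular buffer, multiplied against a flat coefficient vector) can be illustrated with a simplified Python model. This is only a sketch: the function name and parameters are hypothetical, the data buffer is a Python list indexed by item rather than a byte address, and the scaling, rounding and saturation of section 5.6 are deliberately left out.

```python
def smac_sketch(data, coeffs, start, steps, wrap=True):
    """Simplified multiply-accumulate over a circular data buffer.

    data   : list standing in for the whole circular buffer
    coeffs : flat coefficient vector, never wraps
    start  : index of the first data item to use
    steps  : number of multiply-accumulate steps
    wrap   : False models the 'infinite buffer' setting
    Returns (accumulator, updated data index).
    """
    accumulator = 0
    index = start
    for step in range(steps):
        accumulator += data[index] * coeffs[step]
        index += 1
        if wrap and index == len(data):
            index = 0  # data pointer wraps back to the buffer start
    return accumulator, index


samples = [1, 2, 3, 4]          # four-item circular buffer
taps = [10, 20, 30]
print(smac_sketch(samples, taps, start=2, steps=3))
```

In the usage line the loop starts at the third item, so the final product uses the first item of the buffer after the pointer has wrapped, which is the behaviour a FIR filter over a ring of recent samples relies on.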
5.3.4 Accumulator format mode

smacloop supports two data formats for the initial and final accumulator value, selected by mac_mode. When the short format (sign extended) is used, the mode is said to be ShortMode. When the long format is used, the mode is said to be LongMode.

5.4 Biquad filter

The biquad instruction performs a fixed sequence of multiply-accumulates. Figure 5.1 shows an example.

The parameters to the instruction are pointers to three vectors of 16-bit values: the input data vector X, the results vector Y and the coefficient vector C. The coefficient vector must be word-aligned.

biquad calculates the next item of the Y vector according to the following formula, writes this to memory, and increments the X and Y pointers:

  Y[2] = X[0].C[0] + X[1].C[1] + X[2].C[2] + Y[0].C[3] + Y[1].C[4]

The X and Y vectors must either both be word-aligned or neither be word-aligned. The C pointer is left unchanged. This allows successive biquad instructions to be executed back-to-back to generate filter outputs with no additional overhead.

biquad scales the input data according to the mac_scale field in the status register, as described in section 5.6. The mac_scale field can be set using smacinit, as described in section 5.3.

5.5 Data vectors

Both biquad and smacloop operate on arrays of 16-bit values, packed two to a word. This allows the ST20-C1 to read two values per cycle from memory, which is fundamental to the high performance of the multiply-accumulate instructions. In all cases, data values must be half-word aligned.

[Figure 5.1: ST20-C1 biquad instruction example, showing inputs x[0] to x[2], outputs y[0] to y[2], coefficients c[0] to c[4], the shifts and the accumulator]

5.6 Scaling

biquad and smacloop operations are performed with an oversize accumulator. The accumulator value is always sign-extended to the full width of the accumulator. During a multiply-accumulate sequence the value in the accumulator can be temporarily outside the representable range of the final result, but it can never overflow the accumulator in a single biquad or smacloop.

5.6.1 Accumulator scaling

The user-visible accumulator is either LongMode (Q31) or ShortMode (Q15). For smacloop, the mode is defined by the mac_mode status register field. biquad only supports ShortMode.

Pre-scaling converts the user-visible accumulator to the internal format accumulator, as shown in Figure 5.2.
The inverse operation is post-scaling, which is converting the internal format accumulator to the user-visible accumulator.

[Figure 5.2: Accumulator scaling, showing the user-visible accumulator and the internal accumulator for ShortMode and biquad (with the initial rounding bit and the implied binary point) and for LongMode]

The accumulator is left-shifted so that the assumed binary point is moved up, and the accumulator value is saturated from the new sign position upwards. At the end of the multiply-accumulate sequence the accumulator value is shifted down (right) by the extra bits, to compensate for the left shift of the coefficient inputs.

5.6.2 Coefficient scaling

The standard data format for data and coefficients is 1.15 (Q15). The product of two 1.15 numbers is 2.30. The coefficient value for each multiply-accumulate operation is pre-scaled before being fed into the multiplier. The shift distances are controlled by the mac_scale field of the status register, as shown in Table 5.5.

[Table 5.5 mac_scale values: columns are mac_scale and the coefficient left shift distance]

The standard behavior for 1.15 (Q15) values is to shift the coefficients by the distance which exactly compensates for the extra right shift of the final accumulator. This is shown in Figure 5.3.

[Figure 5.3: Coefficient scaling, showing the two data values and the internal accumulator]

Shifting the coefficient by one extra place can be used to normalize a 2.14 (Q14) coefficient to the correct position of the binary point. This is shown in Figure 5.4.

[Figure 5.4: Coefficient scaling, showing the two data values and the internal accumulator]

Under-shifting (shifting by less than the standard distance) can be used for magnitude reduction (most suitable for smacloop): by 4 bits (x16) using mac_scale=1, or by 8 bits (x256) using mac_scale=0. Under-shifting by 4 bits is shown in Figure 5.5, and by 8 bits in Figure 5.6.

[Figure 5.5: Coefficient scaling, showing the two data values and the internal accumulator]

[Figure 5.6: Coefficient scaling, showing the two data values and the internal accumulator]

5.6.3 Pre-scaling and rounding for smacloop

The initial accumulator value must be loaded into the accumulator at the start of the smacloop instruction. The initial accumulator can be in either the long format (LongMode) or the short format (ShortMode), and is handled accordingly.
In LongMode:

  left shift the value into position;
  sign extend from the most significant bit;
  set the rounding bit.

In ShortMode:

  left shift the value into position;
  sign extend from the most significant bit;
  set the rounding bit.

Note that rounding is achieved by adding half a least significant bit to the initial value, which by associativity is equivalent to adding it to the final value.

5.6.4 Pre-scaling and rounding for biquad

The biquad instruction starts with an empty accumulator. Since the result is always short (equivalent to ShortMode), rounding is achieved by loading the accumulator with the rounding value (which is half a least significant bit of the result).

5.6.5 Post-scaling and saturation for smacloop

At the end of the smacloop instruction the final accumulator value is saturated and scaled to the appropriate format, according to mac_mode.

If either the overflow or the underflow bit of the status register is set, then the final value is the appropriate exceptional value from Table 5.6. Note that this does not involve testing the accumulator value, which is considered invalid if either of these status register bits is set.

Otherwise, when the status register reports no overflow or underflow, the upper bits of the accumulator, from the result sign position upwards, are tested for mutual equality. If they are all the same, the accumulator value is well-formed and post-scaling is applied to produce the final accumulator value:

  In LongMode: arithmetic right shift, then truncate to the long result width.
  In ShortMode: arithmetic right shift, then truncate to the short result width.

However, if the bits are not all equal, then an error has occurred. If the most significant bit is zero then the overall result is positive, and the error is overflow. If the most significant bit is one then the overall result is negative, and the error is underflow. The appropriate status register bit (overflow or underflow) is set accordingly, and the final value is taken from Table 5.6.

5.6.6 Post-scaling and saturation for biquad

The result of a biquad is always short (16 bits). The saturation test is applied from the short result sign position upwards, with the overflow and underflow flags as required. Note that the initial value of the overflow and underflow flags is not taken into account.

If a saturation error occurs, the appropriate ShortMode value from Table 5.6 is used.
If no error occurs, the scaling is:

  arithmetic right shift;
  truncate to 16 bits.

5.6.7 Error: load exceptional value

There are four exceptional values, based on whether the result is to be delivered as a ShortMode (Q15) or LongMode (Q31) value, and whether the error is an overflow or an underflow:

  mac_mode     Overflow    Underflow
  ShortMode    00007FFF    FFFF8000
  LongMode     7FFFFFFF    80000000

  Table 5.6 Exceptional values

Note that in all cases the final value is placed in a 32-bit register (Areg).

5.6.8 Performance and interrupts

biquad

The cycle count of a biquad instruction assumes single cycle memory accesses. The ST20-C1 cannot be interrupted during this period.

smacloop

The cycle count of an smacloop depends on the number of steps and assumes single cycle memory accesses. The ST20-C1 cannot be interrupted during this period, whose length in microseconds depends on the memory speed and the operating frequency.

The user can split a long multiply accumulate into shorter ones, passing the intermediate accumulator value from one to the next. This will reduce the interrupt latency, but loses some numerical accuracy, in that within a single smacloop intermediate values are held to the full precision of the internal accumulator, while values passed from one smacloop to another have at most the precision of the user-visible accumulator (and are saturated).

5.7 Data formats

This section gives details of the data formats used by the smacloop and biquad instructions.

A signed fractional number of N bits is characterized as x.y, where x+y=N. This means the number is made up of x bits before the binary point, an implied binary point, and y fractional bits. Some examples are listed in Table 5.7.

  Total bits   Range      Format   Short name
  32           -1 to +1   1.31     Q31
  16           -1 to +1   1.15     Q15
  16           -2 to +2   2.14     Q14

  Table 5.7 Example data formats

5.7.1 Range

A value in format x.y has the range -2^(x-1) to 2^(x-1). For example, a value in format 1.15 has the range -1 to +1 and a value in format 2.14 has the range -2 to +2.

5.7.2 Multiplication

The characteristic of the product of two fractional values a.b and c.d is given by (a+c).(b+d).

5.7.3 Supported formats

Table 5.8 shows the data formats for multiply-accumulate operations supported by the ST20-C1.
[Table 5.8 Supported multiply accumulate data formats: columns are Name, Format and Description; the surviving descriptions read "Signed 16-bit fractional" with the number of significant fractional bits]

Note that a Q15 value can be (optimally) stored in a 16-bit field, or in a wider (>16-bit) field with redundant sign bits. The storage methods are described below.

5.7.4 Q15 (16 bits) format

Q15 is a 16-bit signed fractional value with the range -1 to +1. The value is stored in two's-complement form with a sign bit (bit 15), an implied binary point between bits 15 and 14, and 15 significant fractional bits (bits 14 to 0). A Q15 value in a 16-bit field is organized as shown in Figure 5.7.

[Figure 5.7: Q15 data format (16 bits), showing bit weights from -(2^0) at the most significant position down to 2^-15 at the least significant, with the sign bit and the binary point marked]

The minimum and maximum representable values are shown in Table 5.9.

  Value     Hex     Decimal value
  Minimum   8000    -1.0
  Maximum   7FFF    +0.9999

  Table 5.9 Q15 limiting values

5.7.5 Q15 in an oversized field

When a Q15 value is stored in an oversized field, e.g. a 32-bit register, the significant bits are placed at the least significant end of the field, and the value is sign-extended to the width of the whole field. The value still has the same range and the same number of significant bits. A Q15 value in a 32-bit field is organized as shown in Figure 5.8.

[Figure 5.8: Q15 data format (32 bits), showing the basic value in the least significant bits, the binary point, the sign bit and the sign-extended upper bits]

The minimum and maximum representable values are shown in Table 5.10.

  Value     Hex         Decimal value
  Minimum   FFFF8000    -1.0
  Maximum   00007FFF    +0.9999

  Table 5.10 Q15 limiting values (32-bit field)

Memory access

A Q15 value stored in memory can be loaded into Areg with the lsxinc instruction, which will automatically sign-extend the value. Similarly, it can be written to memory with the ssinc instruction (which will discard the upper bits).

Saturation

If a well-formed Q15 value is sign extended to the width of the field, bits 15 to 31 inclusive will be identical. Conversely, if bits 15 to 31 inclusive are not identical, then the value is not well-formed. It has either overflowed (bit 31 = 0, positive) or underflowed (bit 31 = 1, negative).
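The well-formedness rule and the two ShortMode exceptional values can be checked with a short Python model. This is a sketch, not the instruction sequence used on the ST20-C1: the function names are hypothetical, and a Python integer masked to 32 bits stands in for Areg. The limiting values 00007FFF and FFFF8000 are the ones given in the tables above.

```python
MASK32 = 0xFFFFFFFF
Q15_MAX = 0x00007FFF   # largest Q15 value in a 32-bit field
Q15_MIN = 0xFFFF8000   # smallest Q15 value in a 32-bit field


def is_well_formed_q15(word):
    """True if bits 15 to 31 of the 32-bit word are all identical."""
    upper = (word & MASK32) >> 15        # the 17 bits from bit 15 up
    return upper == 0 or upper == 0x1FFFF


def saturate_q15(word):
    """Replace a badly formed value by the matching limiting value."""
    word &= MASK32
    if is_well_formed_q15(word):
        return word
    # Bit 31 clear means positive overflow, set means underflow.
    return Q15_MAX if (word >> 31) == 0 else Q15_MIN


def q15_to_float(word):
    """Interpret a sign-extended 32-bit word as a Q15 fraction."""
    word &= MASK32
    if word & 0x80000000:
        word -= 1 << 32                  # two's-complement to signed
    return word / 32768.0


for raw in (0x00004000, 0x00008000, 0xFFFF7FFF):
    fixed = saturate_q15(raw)
    print(hex(raw), hex(fixed), q15_to_float(fixed))
```

The middle case is the classic one: 0x00008000 would be +1.0, which Q15 cannot represent, so it is clamped to the maximum rather than being misread as -1.0 when the upper bits are discarded.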
A Q15 value in Areg can be saturated with the sequence (shown as it survives in this copy):

  00007FFF; order; FFFF8000; order; rev;

5.7.6 Q31 format

Q31 is a 32-bit signed fractional value with the range -1 to +1. The value is stored in two's-complement form with a sign bit (bit 31), an implied binary point between bits 31 and 30, and 31 significant fractional bits (bits 30 to 0). A Q31 value in a 32-bit field is organized as shown in Figure 5.9.

[Figure 5.9: Q31 data format, showing bit weights from -(2^0) at the most significant position down to 2^-31 at the least significant, with the sign bit and the binary point marked]

The minimum and maximum representable values are shown in Table 5.11.

  Value     Hex         Decimal value
  Minimum   80000000    -1.0
  Maximum   7FFFFFFF    +0.99999999

  Table 5.11 Q31 limiting values

6 Exceptions

This chapter describes exceptions and how to use them. The architecture of the ST20-C1, including registers and memory arrangement, is described in the Architecture chapter. A full list of constants and data structures is given in the appendix Constants and data structures.

An exception is an exceptional event detected by the ST20-C1 core, which causes a context switch from the normal flow of the executing program. The event triggering the exception may be generated by software inside the core, in which case it is called a trap. Otherwise, the event is a hardware signal from outside the core, in which case it is called an interrupt, except that an interrupt may attempt to perform a scheduling action, which can cause a trap.

When an exception occurs, the CPU changes context to an exception handler, which is a section of code that is only executed when the exception occurs. The process state registers (Areg, Breg, Creg, Iptr, Wptr, Status and Tdesc) are saved while the exception handler is running, and restored when it returns. Exception handlers for traps are called trap handlers and exception handlers for interrupts are called interrupt handlers. Normal processes which are not exception handlers are known as user processes.

An exception handler is a transient process. Each exception handler starts execution with a standard initial state, runs to completion and terminates with an empty workspace. When the triggering event occurs again, the handler is restarted from the standard initial state and again runs to completion and terminates.

Exception handlers can be nested to arbitrary depth, but they are not re-entrant, so care should be taken to ensure that the exception which caused a handler cannot occur again while the handler is running, trapped or interrupted.
The nesting of exceptions is illustrated in Figure 6.1.

[Figure 6.1: Nested exceptions. A user process is executing; an exception is taken and the state of the user process is saved; while that exception is executing a second exception is taken and the state of the first is saved; the second exception returns and the state of the first is restored; the first exception returns and the state of the user process is restored]

Exception handler code is completed by executing an eret instruction, which restores the state of the interrupted or trapped process. When an interrupt handler executes an eret, it also signals to the interrupt controller that the interrupt has completed. This allows the interrupt controller to start a lower priority waiting interrupt if required.

The exception instructions are listed in Table 6.1.

  Mnemonic     Name
  ecall        exception call
  eret         exception return
  breakpoint   breakpoint

  Table 6.1 Exception instructions

6.1 Exception levels

Exception handlers are identified by an integer called the exception level. Exception levels up to HighestException (255) are available for user-defined exceptions, while system exceptions have negative levels, defined in Table 6.2.

  Name                         Circumstances when taken
  (user-defined levels)        Interrupt, system call, or user process.
  el_breakpoint_trap           breakpoint instruction executed or breakpoint request.
  el_illegal_instr_trap        Illegal op-code encountered.
  el_idle_trap                 CPU becomes idle.
  el_schedule_exception_trap   Schedule a user process exception.
  el_run_trap                  Execute a run instruction.
  el_stop_trap                 Execute a stop instruction.
  el_timeslice_trap            Take a timeslice.

  Table 6.2 Exception levels

An exception can be triggered from software with the ecall instruction. User-defined exceptions can be interrupt handlers, user processes waiting for peripherals, or trap handlers used for system calls by executing ecall.

System exceptions are traps which are triggered automatically when the CPU is in certain states. They are intended mainly for operating system kernels to trap scheduling events and for debuggers to trap breakpoints. The circumstances in which each system trap is taken are as follows:

el_breakpoint_trap
This trap is taken when either a breakpoint instruction is executed or the diagnostic controller (DCU) signals the CPU requesting a breakpoint.
If the trap is null then the process continues. This trap is used by debuggers.

el_illegal_instr_trap
This trap is taken when the CPU encounters an instruction with an illegal opcode. If the trap is null then the instruction is treated as a nop.

el_idle_trap
This trap is taken when the CPU becomes idle, i.e. the current process executes a stop when there are no active processes waiting for CPU time, or is timesliced, trapped or interrupted so that there are no active processes waiting when the CPU attempts to start the next process. If the trap is null then the CPU waits for an interrupt or for a process to be scheduled. This trap is used by software scheduling kernels.

el_schedule_exception_trap
This trap is taken when an interrupt is received from a peripheral at an exception level assigned to a user process, which will generally be descheduled waiting for the peripheral to complete a job. If the trap is null then the user process is queued. This trap is used by software scheduling kernels.

el_run_trap
This trap is taken when a run instruction is executed. If the trap is null then the CPU adds the process back to the scheduling queue. This trap is used by software scheduling kernels.

el_stop_trap
This trap is taken when a stop instruction is executed. If the trap is null then the current process is descheduled and the CPU starts executing the process at the front of the scheduling queue, or goes idle if there is none. This trap is used by software scheduling kernels.

el_timeslice_trap
This trap is taken when timeslicing is enabled and a timeslice instruction is executed. If the trap is null then the current process is timesliced, i.e. placed at the back of the scheduling queue. This trap is used by software scheduling kernels.

Scheduling and timeslices are discussed in the Multi-tasking chapter. Using user processes as exceptions to handle peripherals is described in section 4.11.

6.2 Exception vector table

The exception vector table maps each exception level to a user process or an exception. The base of the exception vector table is at the fixed address ExceptionBase (#80000040) in on-chip memory, and the exception level is the word offset from ExceptionBase of the vector. Thus the exception level is used as an index into the exception vector table.

The address of a user process or an exception is always word aligned, so its least significant bits are zero. An entry in the exception vector table is a descriptor, which consists of the address of the user process or exception ORed with the type.
The type is held in the least significant part of the descriptor, and is either ExceptionProcessType or UserProcessType. The type allows each exception level to be assigned to one of the following:

  An exception handler. The entry is the exception handler address (the address of the exception control block, described in section 6.3) bitwise ORed with ExceptionProcessType to indicate an exception.

  A user process waiting for a peripheral (as described in section 4.11). The entry is the user process task descriptor (i.e. the address of the process control block) bitwise ORed with UserProcessType to indicate a user process.

  A null entry, NotProcess. The CPU treats null entries as disabled exceptions.

When an exception is triggered, the CPU looks up the table entry for the requested exception level. If the value in the table is NotProcess then no exception or trap is taken and the CPU continues with its default behavior. Otherwise, if the type is UserProcessType then the address part of the value is assumed to be a valid task descriptor of a process. If the type is ExceptionProcessType, then the address part is assumed to be a pointer to a valid exception control block.

Using user processes as exceptions to handle peripherals is described in section 4.11. The rest of this chapter refers only to exceptions.

Typically a separate descriptor is used for each interrupt level and system call, so that this scheme provides a system of vectored interrupts and system calls. The exception handler is then executed, and may itself be interrupted or trapped.

6.3 Exception control block and the saved state

When an exception is taken, the state of the CPU is saved in the exception control block. The evaluation stack, status register, Iptr, Wptr and Tdesc are automatically saved on taking an exception and restored on returning. This is done to enable the exception handler to have direct access to the state of the underlying process, as required by some software scheduler implementations.

The constants in Table 6.3 define the locations in the control block, in order of increasing word offset.

  Name                 Purpose
  ex.HandlerIptr       Exception handler instruction pointer.
  ex.InterptdStatus    Interrupted or trapped process status register.
  ex.InterptdTdesc     Interrupted or trapped process task descriptor.
  ex.InterptdIptr      Interrupted or trapped process instruction pointer.
  ex.InterptdWptr      Interrupted or trapped process workspace pointer.
  ex.InterptdCreg      Interrupted or trapped process Creg.
  ex.InterptdBreg      Interrupted or trapped process Breg.
  ex.InterptdAreg      Interrupted or trapped process Areg.

  Table 6.3 Exception control block

The control block is at the initial Wptr of the exception handler, so the locations are word offsets from the initial Wptr of the exception handler. The initial Wptr is the address of the exception control block, which is the address part of the entry in the exception vector table.

6.4 Initial exception handler state

When an exception handler starts, Wptr is the address of the exception control block. The work space of the exception handler is normally below the control block so, like a function or procedure call, one of the first actions of the exception handler code is to adjust Wptr downwards to create space for local variables using ajw. Wptr must be adjusted back again before the handler returns.

The initial Iptr of the exception handler is the value loaded from ex.HandlerIptr in the exception control block. The state of the interrupted or trapped process is saved in the exception control block. If the exception is an idle trap then the saved state is the state of the last descheduled process.

Initially the status register has the values shown in Table 6.4.

  Field                     Value
  mac_count                 As the interrupted or trapped process.
  mac_buffer                As the interrupted or trapped process.
  mac_scale                 As the interrupted or trapped process.
  mac_mode                  As the interrupted or trapped process.
  global_interrupt_enable   False if the exception is a trap, otherwise preserved.
  local_interrupt_enable    As the interrupted or trapped process.
  overflow                  False.
  underflow                 False.
  carry                     False.
  user_mode                 False.
  interrupt_mode            True if any interrupts are running, false otherwise.
  trap_mode                 True if the exception is a trap, false otherwise.
  sleep                     False.
  reserved                  Undefined.
  start_next_task           False.
  timeslice_enable          False.
  timeslice_count           As the interrupted or trapped process.

  Table 6.4 Exception handler initial status register

If the exception handler is interrupting a user process, then the address of the exception control block (i.e. of the user process state) is left in Tdesc. If necessary, the exception handler can save this address. If a sequence of nested interrupts has occurred then this is the only way that the nested interrupts can identify the state of the user process.

If the exception is a schedule_exception trap then the trap handler also needs to know the process to be scheduled.
The descriptor of the process to be scheduled (as held in the exception vector table) is saved in SavedTaskDescriptor, near the bottom of the address space.

Restrictions on exception handlers

Exception handlers cannot be queued, so they must not deschedule. This means that the following are not permitted inside exception handlers:

- the stop instruction;
- the timeslice instruction.

Exception handlers may be nested to an arbitrary depth, but they are not re-entrant, so care should be taken to ensure that the exception which caused a handler cannot occur while that handler is running, trapped or interrupted.

Interrupts

All interrupts, whether from on-chip peripherals or external pins, are routed through an interrupt controller, which is normally an on-chip peripheral. The interrupt controller is responsible for arbitration between multiple interrupt signals. The design of the interrupt controller varies between ST20 variants.

Typically, interrupt priorities are managed by the interrupt controller, which will usually track the priority of the highest level task currently being executed by the core, and will not interrupt the ST20-C1 again unless a higher priority interrupt occurs. When an interrupt is requested by the interrupt controller, the ST20-C1 always changes context to the appropriate interrupt handler.

An interrupt request is accompanied by an identifier for the interrupt handler, which is the exception level. The ST20-C1 scheduler uses the exception level from the interrupt controller to start the appropriate interrupt handler. It also sets the interrupt_mode bit in the status register and clears the trap_mode and user_mode bits. The interrupt_mode bit indicates that an interrupt handler is running, though it may have been trapped.

When the interrupt handler executes an eret instruction, the CPU signals to the interrupt controller that the handler has returned. This allows the interrupt controller to keep track of which interrupt handlers are running, so that it can start the next priority waiting interrupt when a higher priority handler completes.

If the interrupt controller requests an interrupt at a level which has a null interrupt handler then the CPU signals to the controller that the interrupt has completed.

Traps

The ST20-C1 has a system of traps, which are software generated interrupts. A level of exception can be called as a user exception using ecall. This mechanism can be used for system calls to an operating system.
In addition, some special exception levels are reserved by the system to provide trapping of breakpoints, scheduling events, illegal operations and the machine becoming idle. The reserved 'system' exception levels are described in section 6.1.

If a system trap event occurs, or a trap is called, for an exception level which has a null trap handler then the CPU ignores the trap and continues. When a non-null trap handler is started, the trap_mode bit in the status register is set and user_mode is cleared. The interrupt_mode bit is not altered. The trap_mode bit indicates that a trap handler is currently executing, and is cleared if the trap handler is interrupted.

Setting up an exception handler

To create an exception handler, an exception work space area must be created, with enough space for the exception handler's stack plus the words of work space for the exception control block, as defined in Table 6.3. For minimum interrupt latency, interrupt handler control blocks should be in fast memory, preferably on-chip.

The normal work space and control block for an exception handler are shown in Figure 6.2. In the control block, ex.HandlerIptr must be initialized to point to the entry point of the exception handler code.

[Figure 6.2 Exception handler: the exception vector table at ExceptionBase, indexed by exception level (word offset), holds a pointer to the exception control block at the initial exception handler Wptr. The control block holds ex.HandlerIptr, which points to the exception handler code entry point, followed by the interrupted or trapped state. The handler work space lies below the control block.]

The code of the exception handler can access and modify the state of the interrupted or trapped process. The state of the interrupted or trapped process is stored in the exception control block, which is located at the initial Wptr of the exception handler. The exception handler must use eret to return to the interrupted or trapped process.

6.8.1 Enabling and disabling exceptions

When the exception handler has been created and initialized, the exception can be enabled. The address of the exception handler control block, ORed with the exception type bit, must be written into the exception vector table at the level of the exception. To write an entry control_block with the type ExceptionProcessType into the exception vector table, the following code can be used:

    ldc control_block; ldc ExceptionProcessType; or;
    ldc ExceptionBase; stnl exception_level;

For interrupts, both the global_interrupt_enable and local_interrupt_enable bits in the status register must be set.
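The way a vector table entry combines an address with a type, and the way an entry is later interpreted when an exception is triggered, can be sketched in Python. This is a minimal model, not the hardware behavior. The numeric values given here for ExceptionProcessType, UserProcessType and NotProcess are placeholders chosen for the example, not the real constants, and the two-bit type mask, the 4-byte word alignment and the function names are likewise assumptions.

```python
WORD_BYTES = 4                  # ASSUMPTION: word-aligned addresses, 32-bit words

# PLACEHOLDER values, chosen only so the sketch runs; the real constants
# are defined by the manual and may differ.
TYPE_MASK = 0x3
EXCEPTION_PROCESS_TYPE = 0x1
USER_PROCESS_TYPE = 0x2
NOT_PROCESS = 0x0


def make_entry(address, process_type):
    """Bitwise OR an aligned block address with its type."""
    if address % WORD_BYTES != 0:
        raise ValueError("control block address must be word aligned")
    return address | process_type


def install(table, level, address, process_type):
    """Enable an exception level by writing its entry into the table."""
    table[level] = make_entry(address, process_type)


def disable(table, level):
    """Disable an exception level by writing the null entry."""
    table[level] = NOT_PROCESS


def decode(entry):
    """Classify an entry as the text describes: null, user process or handler."""
    if entry == NOT_PROCESS:
        return ("disabled", None)
    address = entry & ~TYPE_MASK
    kind = entry & TYPE_MASK
    if kind == USER_PROCESS_TYPE:
        return ("user_process", address)       # address is a task descriptor
    if kind == EXCEPTION_PROCESS_TYPE:
        return ("exception_handler", address)  # address is a control block
    raise ValueError("unrecognized type bits in entry")
```

For example, after `install(table, 2, 0x1000, EXCEPTION_PROCESS_TYPE)`, `decode(table[2])` yields the handler kind together with the original control block address, and `disable(table, 2)` turns the level back into a null entry.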
In addition, the interrupt controller may need to be initialized, including any interrupt enable bits and masks.

A trap can be disabled by writing NotProcess into the exception vector table. Interrupts can be disabled in four ways:

- Clearing the status register global_interrupt_enable bit disables all interrupts until an explicit write to the status register.
- Clearing the status register local_interrupt_enable bit disables all interrupts until the current process is descheduled.
- A single exception level can be disabled by writing NotProcess into the exception vector table.
- The interrupt controller will generally have a means of disabling interrupts individually or globally by writing to interrupt controller registers.

Multi-tasking

This chapter describes the features of the ST20-C1 core provided to support multi-tasking, and how to use them. The architecture of the ST20-C1, including the registers and memory arrangement, is described in the chapter on architecture. Interrupts and traps are described in Chapter 6. A full list of constants and data structures is given in the appendix on constants and data structures. Support is provided in the instruction set for timeslicing, scheduling processes and manipulating queues of processes.

Processes

A process (also known as a task or thread) is an independent unit of software with a single thread of control, i.e. a sequential algorithm. Any number of processes may be run. A process which has been started and not terminated can be in several different states:

- executing on the CPU;
- interrupted or trapped by an exception;
- inactive, i.e. waiting for a peripheral or semaphore signal;
- waiting for time.

A process that is not inactive is said to be active. A process that is not executing or interrupted is said to be descheduled. The states and main transitions are shown in Figure 7.1. The scheduling transitions can be trapped so that a software scheduling kernel can modify the transitions to change the scheduling behavior, for example by providing a system of process priorities.

[Figure 7.1 Process states and main transitions: the states are Executing, Interrupted or trapped, Inactive, Waiting for time and Terminated. The transitions include start, terminate, timeslice, deschedule, interrupt or trap, return and event. The groups of descheduled processes and active processes are marked.]

The process state is held in memory and registers. Sufficient register state must be saved when a process is interrupted or a context switch occurs, so that the process can be reloaded to continue execution at a later time.
The register state consists of:

- the instruction pointer register;
- the work space pointer register;
- the task descriptor register;
- the evaluation stack registers;
- the status register.

In order to save memory space and context switch time, processes are only descheduled when the evaluation stack and status register are empty. This is achieved by only allowing processes to deschedule on certain instructions, called deschedule instructions, after which the final values of the evaluation stack are undefined and the status register is reset to its default value. The deschedule instructions are stop and timeslice. The following table lists the multi-tasking instructions.

Mnemonic   Name
stop       Stop process
timeslice  Timeslice
ldtdesc    Load task descriptor
enqueue    Enqueue process
dequeue    Dequeue process

Table: Multi-tasking instructions

Descheduled processes

If a process is waiting for a peripheral or semaphore, or is descheduled by a timeslice, then the evaluation stack is not saved. The instruction pointer and Wptr are saved in the process descriptor block. The task descriptor is the address of the process descriptor block. It therefore identifies the waiting process and points to the saved state. The task descriptor is a fixed address for each process, unlike the Wptr, which changes as the code executes. When the process is running, its task descriptor is held in the Tdesc register.

Slot name  Purpose
pw.Iptr    The process saved instruction pointer.
pw.Wptr    The process saved work space pointer.
pw.Link    The link to the next process in the queue.

Table: Process descriptor block

The structure of the process descriptor block is shown in Table 7.1. When the process is not executing, it contains the saved work space pointer and instruction pointer of the process, plus the queue link if the process is in a queue. The figure below illustrates a descheduled process.

[Figure: A descheduled process. The task descriptor points to the process descriptor block, which holds Iptr (pointing into the code), the work space pointer (pointing to the process local work space, or stack) and the link to the next process.]

Queues

There may be a number of processes waiting for execution, so a queue (i.e. a linked list) of waiting processes may be formed, called the scheduling queue. This is an example of the general queue supported by the instruction set for queueing waiting processes. A queue is a linked list of process control blocks, formed by links included in the process control blocks.
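A queue formed from these links can be modelled in a few lines of Python. This is a sketch for illustration only and does not reproduce the semantics of the enqueue or dequeue instructions. The class mirrors the pw.Iptr, pw.Wptr and pw.Link slots. The label field, the function names, and the use of explicit front and back references passed as arguments are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProcessDescriptorBlock:
    label: str                                        # hypothetical, for display only
    iptr: int                                         # pw.Iptr: saved instruction pointer
    wptr: int                                         # pw.Wptr: saved work space pointer
    link: Optional["ProcessDescriptorBlock"] = None   # pw.Link: next process in queue


def chain(
    blocks: List[ProcessDescriptorBlock],
) -> Tuple[ProcessDescriptorBlock, ProcessDescriptorBlock]:
    """Link a non-empty list of blocks in order and return (front, back).

    The link of the last block is left untouched, because the walk below
    never relies on it.
    """
    for current, following in zip(blocks, blocks[1:]):
        current.link = following
    return blocks[0], blocks[-1]


def walk(front: ProcessDescriptorBlock, back: ProcessDescriptorBlock) -> List[str]:
    """Return the labels from front to back without reading the back link."""
    labels = []
    block = front
    while True:
        labels.append(block.label)
        if block is back:
            return labels
        block = block.link
```

Stopping at the back reference, rather than at an end marker stored in the final link, keeps the model independent of whatever value the last link happens to hold.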
Each link points to the control block of the next process in the queue, unless it is the last in the queue, in which case it is undefined. The front and back pointers of the queue are held in a queue control block, shown in Table 7.2. The queue control block is held in memory, and the address of the block is the identifier of the queue.

Slot name  Purpose
q.BPtrLoc  The back of the queue.
q.FPtrLoc  The front of the queue.

Table: Queue control block

A complete queue is illustrated in Figure 7.4. In the case of the scheduling queue, the control block is stored at a reserved address called SchedulerQptr (which has the value MostNeg) at the bottom of memory space.

[Figure 7.4 A process queue: the queue control block holds the Back and Front pointers. Front points to the first process descriptor block and Back to the last. Each process descriptor block holds Iptr, Wptr and a link to the next block.]

Timeslicing