If you know assembly programming but are not familiar with the PIC
family of microcontrollers, you will find in these notes a concise
yet fairly complete introduction to assembly programming on the PIC. If
you are already a PIC programmer, you may find useful the discussion below on orthogonal
assembly notation for addressing, its use in designing a simple macro
library that overcomes the PIC's asymmetric assembler notation, and the program examples
and exercises at the end of this tutorial.
Note, however, that we do not address I/O programming at all, although it
is the main reason to use a PIC microcontroller in the first place: controlling devices,
handling interrupts and building many kinds of real time applications. You will
find plenty of excellent examples on the many PIC pages, such as beginners check list and talking electronics.
So take the word "complete" above in a very restricted sense: data access methods, extended precision arithmetic and programming algorithms.
1. PIC architecture summary
Some highlights of the PIC16F8X microcontroller architecture,
taken from the manufacturer's datasheet:
Harvard architecture with a separate program memory bus (14 bits
wide) for instructions and a data memory bus (8 bits wide).
RISC architecture with 35 instructions, each occupying a single
14 bit program memory word and a two-stage pipeline allowing most instructions
to be executed in a single cycle (the 16F8X models have 1K
program flash memory words on the chip; other models have up to 8K words).
Internal RAM implemented as two switchable file register
banks with 80 bytes each (they are switched by bit 5 of the Status register;
other PIC models may have up to 4 banks); the first 12 file registers are special
purpose (and named Special Function Registers or SFR), including the Status
register, Program Counter (PC), interrupt control and timer.
64 bytes of EEPROM memory for storing constant data.
Hardware controlled stack, 8 levels deep (up to 8 nested subroutine calls).
13 bidirectional I/O pins.
4 internal and external interrupt sources, a programmable timer and a Watchdog
timer.
Orthogonal instruction set "allowing any operation on any register using
any addressing mode".
The PIC's Arithmetic and Logic Unit (ALU) is 8 bits wide and
has a single accumulator called the working register or W register.
The ALU is capable of addition, subtraction (two's complement) and logic
operations such as rotate, or, and, exclusive or, etc. Three bits
in the STATUS register (which is file register 03)
may be affected by these instructions: Z (Zero), C (Carry)
and DC (Digit Carry, which is analogous to the Auxiliary Carry of
the 8085 and 8086 microprocessors). They are, respectively, STATUS register
bits 2, 0 and 1.
Two-operand arithmetic and logic instructions take W as one
operand and a file register or a literal (constant) as the
second operand. In the case of W and a file register as operands, one bit
in the instruction selects the destination of the result, which can
be either the working register W (value 0) or the file
register (value 1). This destination is generically called d
and specifically called w or f by the assembler. For example,
the instruction addwf fr1, w adds file register
fr1 and W leaving the result in W, while addwf fr1, f
does the same addition, but leaves the result in file register fr1.
This allows some unconventional operations such as
subwf fr1, w which
performs the operation: fr1 - w => w.
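The role of the destination bit d and of the three status flags can be illustrated with a small C model. This is only a sketch under stated assumptions: the pic8 structure and the function names are invented for this illustration and are not part of any Microchip tool, and only Z, DC and C are modelled in the status byte. Subtraction is done as f + (not W) + 1, so C and DC come out set when no borrow occurs.

```c
#include <stdint.h>

/* Hypothetical model of the PIC state, invented for this sketch. */
enum { FLAG_C = 1 << 0, FLAG_DC = 1 << 1, FLAG_Z = 1 << 2 }; /* STATUS bits 0, 1, 2 */
enum { DEST_W = 0, DEST_F = 1 };                             /* values of the d bit */

typedef struct {
    uint8_t w;        /* working register W */
    uint8_t status;   /* only Z, DC and C are modelled here */
    uint8_t file[80]; /* one bank of file registers */
} pic8;

/* Set the flags from the 9 bit sum and the 5 bit low nibble sum,
   then store the 8 bit result in W (d = 0) or in the file register (d = 1). */
static void store(pic8 *p, uint8_t fr, uint8_t d, unsigned sum, unsigned low_sum)
{
    uint8_t result = (uint8_t)sum;
    p->status = 0;
    if (sum > 0xFF)     p->status |= FLAG_C;
    if (low_sum > 0x0F) p->status |= FLAG_DC;
    if (result == 0)    p->status |= FLAG_Z;
    if (d == DEST_F) p->file[fr] = result; else p->w = result;
}

/* addwf fr, d : fr + W => destination */
void addwf(pic8 *p, uint8_t fr, uint8_t d)
{
    unsigned f = p->file[fr], w = p->w;
    store(p, fr, d, f + w, (f & 0x0F) + (w & 0x0F));
}

/* subwf fr, d : fr - W => destination, computed as fr + ~W + 1 */
void subwf(pic8 *p, uint8_t fr, uint8_t d)
{
    unsigned f = p->file[fr], nw = (uint8_t)~p->w;
    store(p, fr, d, f + nw + 1, (f & 0x0F) + (nw & 0x0F) + 1);
}
```

With W = 5 and 7 in file register 0x0C, subwf(&p, 0x0C, DEST_W) leaves 2 in W and the file register unchanged, while addwf(&p, 0x0C, DEST_F) would instead overwrite the file register and leave W alone.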
Three mov type instructions allow one to copy the value from
a file register to W (movf fr, w), from W
to a file register (movwf fr), and to load a constant
or literal into W (movlw k). We find these
assembler mnemonics asymmetric and particularly confusing for the
beginner PIC programmer, for the reasons outlined in the next section.
2. Assembly language addressing paradigms
There are two widely used paradigms for addressing operands in assembly
languages:
the Intel paradigm (so called here because it is used
by all Intel processors) codes a generic two operand opr instruction
(such as add) this way: opr dest, source with
the meaning: dest <= dest opr source (read this
as: dest becomes "dest opr source");
the PDP11 paradigm (named after the venerable PDP11 minicomputer
of the early 70's and later adopted by Motorola for its MC68HC11
and MC68000 microprocessors) codes a similar instruction as:
opr source, dest
with the meaning: source opr dest => dest (read
this as: "source opr dest" goes to dest).
These two paradigms are equally convenient and natural if used in an orthogonal
(i.e., symmetric) way: every instruction with two operands
should use one of these two formats. The PIC16F8X adopts the
PDP11 paradigm (the destination designator d is the second operand in a two register operand instruction), but in a non-orthogonal way, as
the three different mov instructions above clearly show. It would
be much clearer to write, for example: mov fr, w,
mov w, fr and mov # literal, w (as the PDP11 did). A simple
macro library that overcomes some of these problems can be found
here.
It extends the mov macro-instruction to include two distinct file
registers and includes a clever xchg fr1, fr2 macro that exchanges
the values of two file registers using only the accumulator W as a temporary
variable (adapted from a macro by Ivan Cenov). These macros may help a beginner to concentrate on the problem
to be solved instead of on the assembler's idiosyncrasies.
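The macro library itself is not reproduced in these notes. One well known way to exchange two registers with W as the only temporary is the exclusive-or trick, and the C sketch below models it. Whether the xchg macro mentioned above uses exactly this instruction sequence is an assumption; the PIC instructions in the comments are the ones this sketch imitates.

```c
#include <stdint.h>

/* Exchange *fr1 and *fr2 using a single 8 bit temporary that plays the role of W. */
void xchg(uint8_t *fr1, uint8_t *fr2)
{
    uint8_t w;
    w = *fr1;     /* movf  fr1, w : fr1 => W */
    w ^= *fr2;    /* xorwf fr2, w : fr1 xor fr2 => W */
    *fr2 ^= w;    /* xorwf fr2, f : fr2 now holds the old fr1 */
    *fr1 ^= w;    /* xorwf fr1, f : fr1 now holds the old fr2 */
}
```

Because W holds fr1 xor fr2, the sequence is also harmless when both operands are the same register: W becomes 0 and nothing changes.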
3. Instruction Set Summary
Most Instruction Set documents, including Microchip manuals, group PIC
instructions according to their physical format and not by their common
addressing modes or functions, although the latter grouping makes them much easier to learn and
use. We have adopted this latter approach and divided the PIC instructions
into the following groups:
Mov instructions - they copy a value from/to a file register or literal
to/from register W
Logic and arithmetic instructions with a file register and register W as
operands
Logic and arithmetic instructions with a literal and register W as
operands
One operand Logic and Arithmetic instructions
Branch, Skip, Call and Return instructions
Useful macros for conditional branches, logic and arithmetic operations
You should look carefully at this instruction summary document.
To test your first PIC programs, an assembler and a simulator are the ideal tools. You can download
from Microchip the excellent MPLAB IDE for Windows, an integrated editor, assembler and simulator.
4. Pointer or indirect addressing with PIC
If you need to use arrays or more complex data structures such as lists,
you will need pointer variables, which in most computer architectures
are implemented through register indirect addressing: in other words,
you use the contents of a register as the address of some aggregate
data structure, and access the data indirectly through this register. The PIC
has just one such register, the FSR register (file register
04), which is itself used in an indirect
way: whenever you want to use the FSR register as a pointer, you name
the fictitious register INDF (which is file register 0)
as one operand of your mov, arithmetic or logical instruction, and the PIC
processor "takes the contents of the FSR register" as the operand address, as if
you had coded that address directly in your instruction instead of INDF. It
seems weird, but it makes sense if you recall that the PIC designers wanted
to code all instructions in a single 14 bit word. (Well, you may
argue, they could have designed the PIC with a 15 or 16 bit instruction word
and reserved one bit for indirect file register addressing, turning any
file register into a potential pointer register; wouldn't that be great?
There are indeed PIC models with a 16 bit program word but, as far as I know, none
incorporates this feature!)
In any case, you can easily loop through a vector of bytes by addressing
each data element through INDF and incrementing (or decrementing) the FSR
register to point to the next element.
As an example (adapted from the PIC datasheet), this program fragment fills
the 68 General Purpose Registers (GPR), addresses 0xC through
0x4F, with the constant 0xFF:
        movlw 0xc          ; 0xc => W (first GPR address)
        movwf FSR          ; 0xc => FSR
loop:
        movlw 0x50         ; 0x50 => W (last GPR address + 1)
        clrf INDF          ; clear memory at address (FSR)
        decf INDF, 1       ; 0 - 1 sets memory at address (FSR) to 0xFF
        incf FSR, 1        ; FSR points to next file register
        subwf FSR, w       ; (FSR) - 0x50 => W
        bnz loop           ; if result # 0 goto loop
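For readers who think more easily in C, the same loop can be modelled as follows. This is a sketch, not PIC code: the register file is an ordinary array, the helper reg() is invented here, and it redirects every access to address 0 (INDF) through the current contents of FSR, which is the whole idea of the mechanism.

```c
#include <stdint.h>

enum { INDF = 0x00, FSR = 0x04, GPR_FIRST = 0x0C, GPR_END = 0x50 };

/* Return the file register at addr; address 0 (INDF) is redirected through FSR. */
static uint8_t *reg(uint8_t file[GPR_END], uint8_t addr)
{
    if (addr == INDF) addr = file[FSR];
    return &file[addr];
}

/* Fill the 68 general purpose registers 0x0C..0x4F with 0xFF. */
void fill_gpr(uint8_t file[GPR_END])
{
    uint8_t w;
    *reg(file, FSR) = GPR_FIRST;               /* movlw 0xc / movwf FSR */
    do {
        w = GPR_END;                           /* movlw 0x50 */
        *reg(file, INDF) = 0;                  /* clrf INDF */
        *reg(file, INDF) -= 1;                 /* decf INDF, 1 : 0 - 1 wraps to 0xFF */
        *reg(file, FSR) += 1;                  /* incf FSR, 1 */
        w = (uint8_t)(*reg(file, FSR) - w);    /* subwf FSR, w */
    } while (w != 0);                          /* bnz loop */
}
```

After the call, file[0x0C] through file[0x4F] hold 0xFF, the special registers below 0x0C are untouched except FSR, and FSR itself is left at 0x50.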
Exercise 1: change the above program fragment to fill the 68 GPR registers with the numbers 1, 2, ..,68.
As a more elaborate example of pointer addressing with INDF and FSR,
this program computes the first few elements of the Fibonacci
sequence (recall from your math classes that each element of the Fibonacci sequence
is the sum of the previous two: you start
with the first two elements 0 and 1 and next you get: 1, 2, 3, 5,
8, 13, 21, 34, and so on). The xchg macro fits nicely into this
example.
You can also look at the program code below: count, f0 and f1 are scratchpad
variables; the computed Fibonacci numbers are stored in a table starting at
file register fib; f0 and f1 hold the last two computed Fibonacci numbers;
up to 12 Fibonacci numbers can be computed with 8 bit precision (the last one is 233).
Computing the first 12 Fibonacci numbers:
        movlw fib          ; table address => W
        movwf FSR          ; table address => FSR
        movl d'12', w      ; compute 12 Fibonacci numbers
        mov w, count       ; count them
        clrf f0            ; 1st Fibonacci number is 0
        clrf f1
        incf f1, f         ; 2nd Fibonacci number is 1
loop:
        mov f0, w          ; f0 => W
        add f1, w          ; f0 + f1 => W
        movwf INDF         ; store f0 + f1 in current table entry
        xchg f1, w         ; f1 => W, f0 + f1 => f1
        mov w, f0          ; move previous f1 value to f0
        incf FSR, f        ; FSR points to next table entry
        decbnz count, loop ; count - 1 => count, if # 0 goto loop
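The following C sketch mirrors the loop above one statement per instruction, which makes it easy to see why 12 is the limit with 8 bit precision. The function name fib8 and the pointer fsr are assumptions made for this illustration; the arithmetic is deliberately done in uint8_t so that it wraps exactly as the PIC's 8 bit ALU would.

```c
#include <stdint.h>

/* Store `count` Fibonacci numbers (starting with 1, 2, 3, ...) in fib[],
   using 8 bit arithmetic only. */
void fib8(uint8_t *fib, uint8_t count)
{
    uint8_t f0 = 0, f1 = 1, w, tmp;
    uint8_t *fsr = fib;               /* plays the role of FSR */

    if (count == 0) return;           /* the PIC loop assumes count >= 1 */
    do {
        w = f0;                       /* mov f0, w */
        w = (uint8_t)(w + f1);        /* add f1, w */
        *fsr = w;                     /* movwf INDF */
        tmp = f1; f1 = w; w = tmp;    /* xchg f1, w */
        f0 = w;                       /* mov w, f0 */
        fsr++;                        /* incf FSR, f */
    } while (--count != 0);           /* decbnz count, loop */
}
```

Asking for 13 numbers shows the overflow: the 12th entry is 233, and the 13th comes out as 121, which is 144 + 233 = 377 truncated to 8 bits.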
Exercise 2: extend this program to compute Fibonacci numbers with
16 bit precision; for this purpose write a 16 bit addition subroutine, and detect the overflow of the 16 bit sum
in order to end your loop (that way you do not need to count the 23 Fibonacci numbers that fit in 16 bits).
5. Using program memory to store data tables
The PIC 16F8X has a relatively large program memory (1K 14 bit words) compared
to only 68 bytes of general purpose RAM. It would be nice if we could use part of the
program memory to store tables of read only data. This can be easily done
if the table is small enough to fit in a single 256 byte "page" (a block of program memory that starts at an address
multiple of 256). If you look at the PIC instruction set you will find
a useful return instruction called retlw k, which loads W with a
literal k before returning to the calling program (popping the Program Counter
from the hardware stack); this gives a convenient and fast way to
return a value from a routine call. This instruction does the trick
if we fill our program memory table with up to 256 such return instructions,
each containing the desired constant, and use this table as a "call and
jump table". We pay 6 extra bits for each constant, but our program
memory may well have enough free space anyway. How can we index into this table
to read an entry value? The solution lies in the fact that the 8 least
significant bits of the Program Counter (which, by the way, is 13 bits
wide, but only 11 can be used in the 16F8X PIC model) are accessible
as file register 2 (called PCL). Now, suppose that
the entries of our jump table start at a 256 byte page boundary, that the address just before that boundary is labeled
mytable in the assembler program, and that we want to read the
value of the entry whose index we have loaded in W. This can be done if
at the address
mytable we code the instruction addwf PCL, 1 (which
adds W to PCL). In our program, we should execute the following instructions:
        movlw HIGH (mytable + 1) ; get the high order bits of the first entry address into W,
        movwf PCLATH             ; and store them in this SFR, to be concatenated later with PCL
        mov index, w             ; put index into W
        call mytable             ; should return the desired table entry in W
When the instruction call mytable is executed the following actions
take place:
The PC (containing the address of the instruction after
the call) is pushed onto the hardware stack.
The PC is loaded with the 11 bit constant embedded in the call instruction,
which is the address of the instruction at mytable; at the same time
PCL is loaded with the 8 least significant bits of this address. The 5
most significant bits of the PC are not copied into the special register
PCLATH (file register 0Ah); that's why we had to load it beforehand.
The instruction addwf PCL, 1 at mytable is executed. At this
point (just before the execution) the PC has already been incremented to
point to the next instruction, which is the first entry of our jump
table, and PCL has been adjusted accordingly. Executing the instruction
addwf PCL, 1 adds W to PCL and loads the PC with the contents
of PCLATH concatenated with the contents of PCL, leading the processor
to fetch the desired table entry, where the retlw k instruction
returns the required entry value in W! Because this is an 8 bit addition
and not an 11 bit addition, our table cannot extend beyond the current
256 byte page without further calculations in the initial setup just
before the call mytable instruction.
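The address arithmetic of the last step can be checked with a few lines of C. The function below is a sketch with an invented name; it only models how the new PC is formed from PCLATH, PCL and W when addwf PCL, 1 executes at a given address, and it ignores everything else the processor does.

```c
#include <stdint.h>

/* PC value after "addwf PCL, 1" executes at program address `at`,
   given the contents of PCLATH and W at that moment. */
uint16_t pc_after_addwf_pcl(uint16_t at, uint8_t pclath, uint8_t w)
{
    uint8_t pcl = (uint8_t)(at + 1);   /* the PC already points past the addwf */
    pcl = (uint8_t)(pcl + w);          /* 8 bit addition: any carry is lost */
    return (uint16_t)(((pclath & 0x1F) << 8) | pcl);   /* PCLATH<4:0> : PCL */
}
```

With mytable at 0x01FF (so the entries start at 0x0200), PCLATH = 0x02 and W = 5, the result is 0x0205, the sixth entry. If PCLATH had been left at 0x01 the processor would jump to 0x0105 instead, and if the addwf sat at 0x02F0 an index of 0x20 would wrap around to 0x0211 rather than reach 0x0311; these are the two pitfalls described above.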
Exercise 3: suppose your table spans multiple 256 byte pages and its
index is computed with 16 bit precision. Modify the above setup calculations
in order to retrieve the required table entry. As a further enhancement allow
your jump table to start at any memory address and not only at a 256 byte page boundary
(you will need this if you decide to go on and work on exercise 6 at the end of these notes!)
Let's apply this technique in a complete example, the solution of the so called "Maximum Sum Subvector Problem": given a vector of randomly distributed 8 bit signed integers, find a subvector of consecutive elements with maximum sum. It is simple to devise an algorithm whose computing effort is proportional to the cube of the number n of elements in the vector (computer scientists call this an "O(n**3) solution"), but it is not trivial to devise a linear time algorithm (i.e., O(n)). The following deceptively simple algorithm (written in C) is such a solution (try it!):
Linear Time Maximum Sum Subvector Algorithm:
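#include <stdio.h>
#include <stdlib.h>

#define TMAX 16   /* vector length; value chosen here, the original leaves TMAX undefined */

/* stand-in for the original rand8: fill tab with n pseudo-random signed 8 bit integers */
static void rand8(signed char *tab, int n)
{
    int k;
    for (k = 0; k < n; k++)
        tab[k] = (signed char)(rand() % 256 - 128);
}

int main(void)
{
    int i, j, start, end, csum, maxsum;
    signed char tab[TMAX];

    rand8(tab, TMAX);  /* initialize vector with random signed 8 bit integers */
    csum = maxsum = start = 0; end = -1;
    for (i = 0, j = 0; j < TMAX; j++) {
        csum = csum + tab[j];
        if (csum > maxsum) {       /* best sum so far: remember its bounds */
            maxsum = csum;
            start = i;
            end = j;
        } else if (csum < 0) {     /* a negative running sum cannot help: restart after j */
            i = j + 1;
            csum = 0;
        }
    }
    printf("start=%d end=%d maxsum=%d\n", start, end, maxsum);
    return 0;
}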
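One way to "try it" is to compare the linear scan against the obvious cubic algorithm on the same data. The sketch below packages both as functions; the names maxsum_linear and maxsum_cubic are invented here, and only the maximum sum is returned (not start and end) to keep the comparison short. Like the program above, both treat the empty subvector as having sum 0.

```c
#include <stddef.h>

/* Linear scan, same logic as the loop above, returning only the maximum sum. */
int maxsum_linear(const signed char *tab, size_t n)
{
    int csum = 0, maxsum = 0;
    size_t j;
    for (j = 0; j < n; j++) {
        csum += tab[j];
        if (csum > maxsum) maxsum = csum;
        else if (csum < 0) csum = 0;
    }
    return maxsum;
}

/* Brute force reference: sum every (first, last) range from scratch; O(n**3). */
int maxsum_cubic(const signed char *tab, size_t n)
{
    int maxsum = 0;   /* the empty subvector has sum 0 */
    size_t first, last, k;
    for (first = 0; first < n; first++)
        for (last = first; last < n; last++) {
            int sum = 0;
            for (k = first; k <= last; k++) sum += tab[k];
            if (sum > maxsum) maxsum = sum;
        }
    return maxsum;
}
```

For the vector 31, -41, 59, 26, -53, 58, 97, -93, -23, 84 both functions return 187, the sum of the elements 59 through 97.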
Our goal is to rewrite this algorithm in PIC assembly language, initializing the vector of random signed integers as a table in the PIC's program memory (on a tiny PIC 16F8X we could have a table with more than 900 entries!). This exercise illustrates several important general assembly language programming techniques:
loop control based on a counter variable (the C for statement);
the C statement csum= csum + tab[j] requires sign extending the value tab[j] to 16 bits and a 16 bit addition to compute csum (with 8 bits we could easily get an overflow); see the subroutine sum in the assembly code;
the comparison in if (csum> maxsum) requires a 16 bit two's complement signed integer comparison routine (subroutine cmpcsummaxsum);
the if statements require several conditional branches in assembly code;
several 8 bit and 16 bit variable assignment statements (trivial but tedious to code; our macro-instruction mov fr1, fr2 comes in handy here).
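The two arithmetic techniques in this list can be sketched in C working one byte at a time, as the PIC has to. This is not the tutorial's sum or cmpcsummaxsum subroutine: the type word16 and both function names are invented for this illustration, and the comparison shown is the general one (it does not assume that the second operand is non-negative).

```c
#include <stdint.h>

typedef struct { uint8_t lo, hi; } word16;   /* a 16 bit value held in two file registers */

/* csum = csum + (sign extended) x, computed byte by byte with a carry. */
void add_sext8(word16 *csum, uint8_t x)
{
    uint8_t x_hi = (x & 0x80) ? 0xFF : 0x00;              /* sign extension of x */
    unsigned lo = (unsigned)csum->lo + x;                 /* low bytes; bit 8 is the carry */
    csum->lo = (uint8_t)lo;
    csum->hi = (uint8_t)(csum->hi + x_hi + (lo >> 8));    /* high bytes plus carry */
}

/* Two's complement signed comparison: returns 1 when a > b, else 0. */
int gt_signed16(word16 a, word16 b)
{
    /* flipping the sign bits turns signed order into unsigned order */
    uint8_t a_hi = a.hi ^ 0x80, b_hi = b.hi ^ 0x80;
    if (a_hi != b_hi) return a_hi > b_hi;
    return a.lo > b.lo;
}
```

Adding the byte 0xFF (that is, -1) to a zeroed word16 gives 0xFFFF, and adding 2 to that gives 0x0001, which shows both the sign extension and the carry into the high byte at work.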
Although it is unlikely that you will ever find a practical use for this algorithm in a real PIC application, the sub-problems listed above will certainly arise in many real applications, and that is the main reason for including the program in this tutorial.
I also think it is more instructive to show a complete, small, structured program than code fragments (it is more fun anyway, and this was indeed the first non-trivial program I wrote for the PIC).
With the above explanations, the code should now be simple to follow:
the instructions from init: to loop: initialize the program variables;
the C for loop runs from the label loop: to the instruction b loop;
the first 3 instructions after loop: check whether j has run from 0 to TMAX;
the sum and cmpcsummaxsum subroutines do the more complex computations in two's complement arithmetic; read them carefully;
the rest of the code implements the conditional branches needed by the two C if statements: look at the assembly line comments to find the corresponding C source code.
Exercise 4: we cheated a bit when we said that the subroutine cmpcsummaxsum
compares two signed 16 bit integers: in reality the second integer (maxsum) is always >= 0, and we took advantage of that to make our code faster. Rewrite this subroutine so that it can compare two arbitrary signed 16 bit integers. You can start by looking at some 16 bit unsigned comparison subroutines.
Exercise 4.1: write a 16 bit signed subtraction subroutine. Look here for a 16 bit subtraction subroutine. Test your subroutine by computing the Fibonacci numbers backwards, i.e., start with two consecutive Fibonacci numbers (for example, 46368 and 28657) and keep subtracting until you reach 0.
Exercise 4.2: write a 16 bit signed multiplication subroutine. Test it by computing the successive powers of a small negative integer (say, -3), which are alternately positive and negative. For this purpose you could extend the 8 bit unsigned multiplication routine found in the MPLAB installation.
Exercise 5:
Compile the above C program with your preferred C compiler (you can download the maxsum.c program, written for the free, old, but still useful Borland C 2.0) and test it with small vectors. Make sure it gives the correct answers.
Test your program with a vector of 256 entries. Show the start, end and maxsum results in hexadecimal.
Change your C program so that it writes the vector of random signed integers in the format required by the PIC assembler for the tab jump table, as in our example (recall that the assembler's default radix for constants is hexadecimal; you can change that with the radix dec assembler directive). Write this output to a file.
Copy and paste the above file into your PIC assembly source code where the tab table is.
Assemble and run your program. Check the results against your C program results: they should be the same.
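As a starting point for the table generation step, the sketch below writes each vector element as one retlw line. The function names and the exact line layout (four spaces, retlw, a 0x prefixed two digit hexadecimal constant) are assumptions made for this illustration; adapt the layout to whatever your tab table uses. Each signed value is emitted as its two's complement byte, so -1 becomes 0xFF.

```c
#include <stdio.h>

/* Format one table entry as a retlw line; returns the number of characters written. */
int format_retlw(char *buf, size_t size, signed char v)
{
    return snprintf(buf, size, "    retlw 0x%02X", (unsigned)(unsigned char)v);
}

/* Write the whole table to an already opened file, one entry per line. */
void write_table(FILE *out, const signed char *tab, size_t n)
{
    char line[32];
    size_t k;
    for (k = 0; k < n; k++) {
        format_retlw(line, sizeof line, tab[k]);
        fprintf(out, "%s\n", line);
    }
}
```

Calling write_table(stdout, tab, TMAX) from the C program prints the table to the screen; pass a file opened with fopen to save it for pasting into the assembly source.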
Exercise 6 (this should be fun!):
If the above exercise was not a real challenge for you, allow your jump table to start at any address and to be as large as possible (900 bytes, say); you should modify the assembler program so that all table indices are now 16 bits wide. Make sure csum does not overflow 16 bits in your C tests before generating the assembler table (let me know if it works!).