The string instructions facilitate operations on sequences of bytes or words. None of them takes an explicit operand; instead, they all work implicitly on the source and/or destination strings. The current element (byte or word) of the source string is at DS:SI, and the current element of the destination string is at ES:DI. Each instruction works on one element and then automatically adjusts SI and/or DI by the size of the element (1 for a byte, 2 for a word): if the Direction flag is clear, the index is incremented; otherwise it is decremented. When working with overlapping strings it is sometimes necessary to work from back to front, but usually you should leave the Direction flag clear and work on strings from front to back.
To work on an entire string at a time, each string instruction can be accompanied by a repeat prefix, either REP or one of REPE and REPNE (or their synonyms REPZ and REPNZ). These cause the instruction to be repeated the number of times given in the count register, CX, which is decremented after each repetition; if CX is zero to begin with, the instruction is not executed at all. For REPE and REPNE, the Zero flag is also tested at the end of each operation, and the loop stops early if the condition (Equal or Not Equal to zero) fails.
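The interaction between the Direction flag, the index registers, and the count in CX can be sketched in C. This is only an illustrative model, not part of any real API: the names StringRegs, step_index, and rep_copy_bytes are hypothetical, a single 64K array stands in for both segments (DS is assumed equal to ES), and the element operation is a plain byte copy of the kind performed by the move instructions described next.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the registers used by the string instructions. */
typedef struct {
    uint16_t si, di, cx;  /* source index, destination index, count */
    bool df;              /* Direction flag: false = clear (increment) */
} StringRegs;

/* Adjust an index by one element. size is 1 for bytes and 2 for words;
   the uint16_t result wraps around at 64K, as a 16-bit register does. */
static uint16_t step_index(uint16_t index, uint16_t size, bool df)
{
    return df ? (uint16_t)(index - size) : (uint16_t)(index + size);
}

/* Model of a repeated byte copy inside one segment.
   mem must point to a full 64K array (assumption: DS == ES). */
static void rep_copy_bytes(uint8_t *mem, StringRegs *r)
{
    while (r->cx != 0) {               /* nothing happens if CX starts at 0 */
        mem[r->di] = mem[r->si];
        r->si = step_index(r->si, 1, r->df);
        r->di = step_index(r->di, 1, r->df);
        r->cx--;
    }
}
```

In this model, shifting the five bytes "ABCDE" at offsets 0 to 4 up by one position with the Direction flag clear (SI = 0, DI = 1, CX = 5) smears the first byte over the whole destination, giving "AAAAAA". Starting from the back with the flag set (SI = 4, DI = 5, CX = 5) gives the intended "AABCDE", which is why overlapping strings sometimes have to be processed from back to front.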
The MOVSB and MOVSW instructions have the following forms:
        MOVSB
        REP MOVSB
        MOVSW
        REP MOVSW
The first form copies a single byte from the source string, at address
DS:SI, to the destination string, at address ES:DI, then increments (or
decrements, if the Direction flag is set) both SI and DI. The second
form repeats this operation while CX is not zero, decrementing CX after
each copy. With the Direction flag clear, the effect is equivalent to
the following pseudo-C code:
    while (CX != 0) {
        *(ES*16 + DI) = *(DS*16 + SI);
        SI++;
        DI++;
        CX--;
    }
(Recall that ES*16 + DI is the physical address corresponding
to the segment and offset ES:DI.) The remaining two forms move a word
at a time, instead of a single byte; correspondingly, SI and DI are
incremented or decremented by 2 each time through the loop.
The STOSB and STOSW instructions are similar to MOVSB and MOVSW, except that the source byte or word comes from AL or AX instead of the memory address DS:SI, and only DI is adjusted. For example, the following is a very fast way to initialize the block of memory from ES:1000h to ES:4FFFh with zeroes:
        MOV  DI, 1000h       ;Starting address
        MOV  CX, 2000h       ;Number of words
        MOV  AX, 0           ;Word to store at each location
        CLD                  ;Make sure direction is increasing
        REP  STOSW           ;Perform the initialization
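The arithmetic behind this example (2000h words of 2 bytes each cover the 4000h bytes from 1000h through 4FFFh) can be checked with a small C model of REP STOSW. This is a sketch under stated assumptions: rep_stosw is a hypothetical helper, es_mem is a 64K byte array standing in for the ES segment, and the Direction flag is assumed clear.

```c
#include <stdint.h>

/* Model of REP STOSW with the Direction flag clear.
   es_mem: hypothetical 64K byte array standing in for the ES segment.
   Stores AX at ES:DI, low byte first, CX times; returns the final DI. */
static uint16_t rep_stosw(uint8_t *es_mem, uint16_t di, uint16_t cx, uint16_t ax)
{
    while (cx != 0) {
        es_mem[di] = (uint8_t)(ax & 0xFF);               /* low byte  */
        es_mem[(uint16_t)(di + 1)] = (uint8_t)(ax >> 8); /* high byte */
        di = (uint16_t)(di + 2);                         /* word step */
        cx--;
    }
    return di;
}
```

Calling rep_stosw(mem, 0x1000, 0x2000, 0) on a buffer of nonzero bytes clears offsets 1000h through 4FFFh, leaves the bytes at 0FFFh and 5000h untouched, and returns 5000h, the value DI would hold after the fragment above.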
Correspondingly, the LODSB and LODSW instructions are
variations on the move instructions where the destination is the
accumulator, AL or AX (instead of the memory address ES:DI). These are
not very useful with the repeat prefix, since each repetition simply
overwrites the accumulator; instead, they are used as part of larger
loops to perform more complex string processing. For example, here is a
program fragment that will convert the NUL-terminated string starting at
the address in DX to lower case (there is a faster way to do the
conversion of each character, using the XLATB instruction, but that is
not the point here):
        MOV  SI, DX          ;Initialize source
        MOV  DI, DX          ; and destination indices
        MOV  AX, DS          ;Copy DS (source segment)
        MOV  ES, AX          ; into ES (destination segment)
        CLD
NextCh  LODSB                ;Load next character into AL
        CMP  AL, 'A'
        JB   NotUC           ;Jump if below 'A'
        CMP  AL, 'Z'
        JA   NotUC           ; or above 'Z'
        ADD  AL, 'a' - 'A'   ;Convert UC to lc
NotUC   STOSB                ;Store modified character back
        CMP  AL, 0
        JNE  NextCh          ;Do next character if not at end of string
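For readers more comfortable with C, the loop above corresponds roughly to the following function. It is a sketch, not a translation produced by any tool: to_lower_in_place is a hypothetical name, and the two pointers si and di play the roles of DS:SI and ES:DI, which address the same string here.

```c
/* Rough C equivalent of the LODSB/STOSB loop: converts a NUL-terminated
   string to lower case in place. The terminating NUL is also loaded and
   stored, exactly as in the assembly version. */
static void to_lower_in_place(char *s)
{
    unsigned char *si = (unsigned char *)s;  /* source pointer (DS:SI)      */
    unsigned char *di = (unsigned char *)s;  /* destination pointer (ES:DI) */
    unsigned char al;                        /* the accumulator AL          */

    do {
        al = *si++;                          /* LODSB */
        if (al >= 'A' && al <= 'Z')          /* CMP/JB and CMP/JA */
            al = (unsigned char)(al + ('a' - 'A'));
        *di++ = al;                          /* STOSB */
    } while (al != 0);                       /* CMP AL, 0 / JNE NextCh */
}
```

The unsigned char type mirrors the unsigned comparisons made by JB and JA, so bytes above 7Fh are left unchanged, just as they are by the assembly fragment.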
None of the preceding string operations have any effect on the status
flags. By contrast, the remaining two string operations are executed
solely for their effect on the status flags, just like the
CMP operation on numbers. The CMPSB and CMPSW
operations compare the current bytes or words of the source and
destination strings by subtracting the destination from the source and
recording the properties of the result in FLAGS. The SCASB and
SCASW operations are the variants of this that use the
accumulator (AL or AX) for the source. Each of these may be preceded by
either of the repeat prefixes REPE or REPNE, which cause
the operation to be repeated up to CX times, as long as the condition
holds true after each iteration. Here is the corresponding pseudo-C for
REPE CMPSB:
    while (CX != 0) {
        SetFlags(*(DS*16 + SI) - *(ES*16 + DI));
        SI++;
        DI++;
        CX--;
        if (!ZeroFlag) break;
    }
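The pseudo-C above can be turned into a small compilable model to see what the registers hold when the loop stops. The names CmpsRegs and repe_cmpsb are hypothetical, only the Zero flag is modelled (the real instruction sets the other status flags as well), and the Direction flag is assumed clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register state for the model; zf is the Zero flag. */
typedef struct {
    uint16_t si, di, cx;
    bool zf;
} CmpsRegs;

/* Model of REPE CMPSB with the Direction flag clear. ds_mem and es_mem
   stand in for the DS and ES segments. If CX is 0 on entry, no comparison
   is made and zf keeps the value the caller put there. */
static void repe_cmpsb(const uint8_t *ds_mem, const uint8_t *es_mem, CmpsRegs *r)
{
    while (r->cx != 0) {
        r->zf = (ds_mem[r->si] == es_mem[r->di]);  /* flags from src - dst */
        r->si++;
        r->di++;
        r->cx--;
        if (!r->zf) break;                         /* REPE: stop when unequal */
    }
}
```

Comparing "string" with "strong" from SI = DI = 0 and CX = 6 stops after the fourth byte with the Zero flag clear, SI = DI = 4, and CX = 2. Note that SI and DI have already moved one element past the mismatch, and that CX reaching zero does not by itself mean the strings are equal: the last pair may have differed, so it is the Zero flag that must be tested afterwards.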
A common use of the REPNE SCASB instruction is to find the length
of a NUL-terminated string. Here is an example:
        MOV  DI, DX          ;Starting address in DX (assume ES = DS)
        MOV  AL, 0           ;Byte to search for (NUL)
        MOV  CX, -1          ;Start count at FFFFh
        CLD                  ;Increment DI after each character
        REPNE SCASB          ;Scan string for NUL, decrementing CX for each char
        MOV  AX, -2          ;CX will be -2 for length 0, -3 for length 1, ...
        SUB  AX, CX          ;Length in AX
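The final subtraction is easy to get wrong, so here is a C model of the same computation using 16-bit unsigned arithmetic. It is a sketch only: strlen_scasb is a hypothetical name, es_mem stands in for the ES segment, and the string is assumed to contain a NUL before the count runs out.

```c
#include <stdint.h>

/* Model of the REPNE SCASB length computation (Direction flag clear).
   es_mem stands in for the ES segment; dx is the starting offset. */
static uint16_t strlen_scasb(const uint8_t *es_mem, uint16_t dx)
{
    uint16_t di = dx;              /* MOV DI, DX */
    uint8_t  al = 0;               /* MOV AL, 0  */
    uint16_t cx = 0xFFFF;          /* MOV CX, -1 */

    while (cx != 0) {              /* REPNE SCASB */
        int zf = (al == es_mem[di]);
        di++;
        cx--;
        if (zf) break;             /* REPNE: stop when equal (NUL found) */
    }
    return (uint16_t)(0xFFFE - cx); /* MOV AX, -2 / SUB AX, CX */
}
```

For a string of length n the scan examines n + 1 bytes, including the NUL, so CX ends at FFFFh - (n + 1), and FFFEh - CX gives back n. For example, "hello" leaves CX = FFF9h, and FFFEh - FFF9h = 5.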