META II A SYNTAX-ORIENTED COMPILER WRITING LANGUAGE D. V. Schorre UCLA Computing Facility META II is a compiler writing language which consists of syntax equations resembling Backus normal form and into which instructions to output assembly language commands are inserted. Compilers have been written in this language for VALGOL I and VALGOL II. The former is a simple algebraic language designed for the purpose of illustrating META II. The latter contains a fairly large subset of ALGOL 60. The method of writing compilers which is given in detail in the paper may be explained briefly as follows. Each syntax equation is translated into a recursive subroutine which tests the input string for a particular phrase structure, and deletes it if found. Backup is avoided by the extensive use of factoring in the syntax equations. For each source language, an interpreter is written and programs are compiled into that interpretive language. META II is not intended as a standard language which everyone will use to write compilers. Rather, it is an example of a simple working language which can give one a good start in designing a compiler-writing compiler suited to his own needs. Indeed, the META II compiler is written in its own language, thus lending itself to modification. History The basic ideas behind META II were described in a series of three papers by Schmidt[1], Metcalf[2], and Schorre[3]. These papers were presented at the 1963 National A.C.M. Convention in Denver, and represented the activity of the Working Group on Syntax-Directed Compilers of the Los Angeles SIGPLAN. The methods used by that group are similar to those of Glennie and Conway, but differ in one important respect. Both of these researchers expressed syntax in the form of diagrams, which they subsequently coded for use on a computer. In the case of META II, the syntax is input to the computer in a notation resembling Backus normal fore. The method of syntax analysis discussed in this paper is entirely different from the one used by Irons[6] and Bastian[7]. All of these methods can be traced back to the mathematical study of natural languages, as described by Chomsky[8]. Syntax Notation The notation used here is similar to the meta language of the ALGOL 60 report. Probably the main difference is that this notation can be keypunched. Symbols in the target language are represented as strings of characters, surrounded by quotes. Metalinguistic variables have the same form as identifiers in ALGOL, viz., a letter followed by a sequence of letters or digits. Items are written consecutively to indicate concatenation and separated by a slash to indicate alternation. Each equation ends with a semicolon which, due to keypunch limitations, is represented by a period followed by a comma. An example of a syntax equation is: LOGICALVALUE = '.TRUE' / '.FALSE' ., In the versions of ALGOL described in this paper the symbols which are usually printed in boldface type will begin with periods, for example: .PROCEDURE .TRUE .IF To indicate that a syntactic element is optional, it may be put in alternation with the word .EMPTY. For example: SUBSECONDARY = '*' PRIMARY / .EMPTY ., SECONDARY = PRIMARY SUBSECONDARY ., By factoring, these two equations can be written as a single equation. SECONDARY = PRIMARY( '*' PRIMARY / .EMPTY) ., Built into the META II language is the ability to recognize three basic symbols which are: 1. Identifiers -- represented by .ID, 2. Strings -- represented by .STRING, 3. Numbers -- represented by .NUMBER. The definition of identifier is the same in META II as in ALGOL, viz., a letter followed by a sequence of letters or digits. The definition of a string is changed because of the limited character set available on the usual keypunch. In ALGOL, strings are surrounded by opening and closing quotation marks, making it possible to have quotes within a string. The single quotation mark on the keypunch is unique, imposing the restriction that a string in quotes can contain no other quotation marks. The definition of number has been radically changed. The reason for this is to cut down on the space required by the machine subroutine which recognizes numbers. A number is considered to be a string of digits which may include imbedded periods, but may not begin or end with a period; moreover, periods may not be adjacent. The use of the subscript 10 has been eliminated. Now we have enough of the syntax defining features of the META II language so that we can consider a simple example in some detail. The example given here is a set of four syntax equations for defining a very limited class of algebraic expressions. The two operators, addition and multiplication, will be represented by + and * respectively. Multiplication takes precedence over addition; otherwise precedence is indicated by parentheses. Some examples are : A A + B A + B * C (A + B) * C The syntax equations which define this class of expressions are as follows: EX3 = .ID / '(' EX1 ')' ., EX2 = EX3 ('*' EX2 / .EMPTY) ., EX1 = EX2 ('+' EX1 / .EMPTY) ., EX is an abbreviation for expression. The last equation, which defines an expression of order 1, is considered the main equation. The equations are read in this manner. An expression of order 3 is defined as an identifier or an open parenthesis followed by an expression of order 1 followed by a closed parenthesis. An expression of order 2 is defined as an expression of order 3, which may be followed by a star which is followed by an expression of order 2. An expression of order 1 is defined as an expression of order 2, which may be followed by a plus which is followed by an expression of order 1. Although sequences can be defined recursively, it is more convenient and efficient to have a special operator for this purpose. For example, we can define a sequence of the letter A as follows: SEQA = $ 'A' ., The equations given previously are rewritten using the sequence operator as follows: EX3 = .ID / '(' EX1 ')' ., EX2 = EX3 $ ('*' EX3) ., EX1 = EX2 $ ('+' EX2) ., Output Up to this point we have considered the notation in META II which describes object language syntax. To produce a compiler, output commands are inserted into the syntax equations. Output from a compiler written in META II is always in an assembly language, but not in the assembly language for the 1401. It is for an interpreter, such as the interpreter I call the META II machine, which is used for all compilers, or the interpreters I call the VALGOL I and VALGOL II machines, which obviously are used with their respective source languages. Each machine requires its own assembler, but the main difference between the assemblers is the operation code table. Constant codes and declarations may also be different. These assemblers all have the same format, which is shown below. LABEL CODE ADDRESS 1- -6 8- -10 12- -70 An assembly language record contains either a label or an op code of up to 3 characters, but never both. A label begins in column 1 and may extend as far as column 70. If a record contains an op code, then column 1 must be blank. Thus labels may be any length and are not attached to instructions, but occur between instructions. To produce output beginning in the op code field, we write .OUT and then surround the information to be reproduced with parentheses. A string is used for literal output and an asterisk to output the special symbol just found in the input. This is illustrated as follows: EX3 = .ID .OUT('LD ' *) / '(' EX1 ')' ., EX2 = EX3 $ ('*' EX3 .OUT('MLT')) ., EX1 = EX2 $ ('+' EX2 .OUT('ADD')) ., To cause output in the label field we write .LABEL followed by the item to be output. For example, if we want to test for an identifier and output it in the label field we write: .ID .LABEL * The META II compiler can generate labels of the form A01, A02, A03, ... A99, B01, .... To cause such a label to be generated, one uses *1 or *2. The first time *1 is referred to in any syntax equation, a label will be generated and assigned to it. This same label is output when ever *1 is referred to within that execution of the equation. The symbol *2 works in the same way. Thus a maximum of two different labels may be generated for each execution of any equation. Repeated executions, whether recursive or externally initiated, result in a continued sequence of generated labels. Thus all syntax equations contribute to the one sequence. A typical example in which labels are generated for branch commands is now given. IFSTATEMENT = '.IF' EXP '.THEN' .OUT('BFP' *1) STATEMENT '.ELSE' .OUT('B ' *2) .LABEL *1 STATEMENT .LABEL *2 ., The op codes BFP and B are orders of the VALGOL I machine, and stand for "branch false and pop" and "branch" respectively. The equation also contains references to two other equations which are not explicitly given, viz., EXP and STATEMENT. VALGOL I - A Simple Compiler Written in META II Now we are ready for an example of a compiler written in META II. VALGOL I is an extremely simple language, based on ALGOL 60, which has been designed to illustrate the META II compiler. The basic information about VALGOL I is given in figure 1 (the VALGOL I compiler written in META II) and figure 2 (order list of the VALGOL I machine). A sample program is given in figure 3. After each line of the program, the VALGOL I commands which the compiler produces from that line are shown, as well as the absolute interpretive language produced by the assembler. Figure 4 is output from the sample program. Let us study the compiler written in META II (figure 1) in more detail. The identifier PROGRAM on the first line indicates that this is the main equation, and that control goes there first. The equation for PRIMARY is similar to that of EX3 in our previous example, but here numbers are recognized and reproduced with a "load literal" command. TERM is what was previously EX2; and EXP1 what was previously EX1 except for recognizing minus for subtraction. The equation EXP defines the relational operator "equal", which produces a value of 0 or 1 by making a comparison. Notice that this is handled Just like the arithmetic operators but with a lower precedence. The conditional branch commands, "branch true and pop" and "branch false and pop", which are produced by the equations defining UNTILST and CONDITIONALST respectively, will test the top item in the stack and branch accordingly. The "assignment statement" defined by the equation for ASSIGNST is reversed from the convention in ALGOL 60, i.e., the location into which the computed value is to be stored is on the right. Notice also that the equal sign is used for the assignment statement and that period equal (.=) is used for the relation discussed above. This is because assignment statements are more numerous in typical programs than equal compares, and so the simpler representation is chosen for the more frequently occurring. The omission of statement labels from the VALGOL I and VALGOL II seems strange to most programmers. This was not done because of any difficulty in their implementation, but because of a dislike for statement labels on the part of the author. I have programmed for several years without using a single label, so I know that they are superfluous from a practical, as well as from a theoretical, standpoint. Nevertheless, it be too much of a digression to try to justify this point here. The "until statement" has been added to facilitate writing loops without labels. The "conditional" statement is similar to the one in ALGOL 60, but here the "else" clause is required. The equation for "input/output", IOST, involves two commands, "edit" and "print". The words EDIT and PRINT do not begin with periods so that they will look like subroutines written in code. "EDIT" copies the given string into the print area, with the first character in the print position which is computed from the given expression. "PRINT" will print the current contents of the print area and then clear it to blanks. Giving a print command without previous edit commands results in writing a blank line. IDSEQ1 and IDSEQ are given to simplify the syntax equation for DEC (declaration). Notice in the definition of DEC that a branch is given around the data. From the definition of BLOCK it can be seen that what is considered a compound statement in ALGOL 60 is, in VALGOL I, a special case of a block which has no declaration. In the definition of statement, the test for an IOST precedes that for an ASSIGNST. This is necessary, because if this were not done the words PRINT and EDIT would be mistaken as identifiers and the compiler would try to translate "input/output" statements as if they were "assignment" statements. Notice that a PROGRAM is a block and that a standard set of commands is output after each program. The "halt" cowhand causes the machine to stop on reaching the end of the outermost block, which is the program. The operation code SP is generated after the "halt" command. This is a completely 1401-oriented code, which serves to set a word mark at the end of the program. It would not be used if VALGOL I were implemented on a fixed word-length machine. How the META II Compiler Was Written Now we come to the most interesting part of this project, and consider how the META II compiler was written in its own language. The interpreter called the META II machine is not a much longer 1401 program than the VALGOL I machine. The syntax equations for META II (figure 5) are fewer in number than those for the VALGOL I machine (figure 1). The META II compiler, which is an interpretive program for the META II machine, takes the syntax equations given in figure 5 and produces an assembly language version of this same interpretive program. Of course, to get this started, I had to write the first compiler-writing compiler by hand. After the program was running, it could produce the same program as written by hand. Someone always asks if the compiler really produced exactly the program I had written by hand and I have to say that it was "almost" the same program. I followed the syntax equations and tried to write just what the compiler was going to produce. Unfortunately I forgot one of the redundant instructions, so the results were not quite the same. Of course, when the first machine-produced compiler compiled itself the second time, it reproduced itself exactly. The compiler originally written by hand was for a language called META I. This was used to implement the improved compiler for META II. Sometimes, when I wanted to change the metalanguage, I could not describe the new metalanguage directly in the current metalanguage. Then an intermediate language was created -- one which could be described in the current language and in which the new language could be described. I thought that it might sometimes be necessary to modify the assembly language output, but it seems that it is always possible to avoid this with the intermediate language. The order list of the META II machine is given in figure 6. All subroutines in META II programs are recursive. When the program enters a subroutine a stack is pushed down by three cells. One cell is for the exit address and the other two are for labels which may be generated during the execution of the subroutine. There is a switch which may be set or reset by the instructions which refer to the input string, and this is the switch referred to by the conditional branch commands. The first thing in any META II machine program is the address of the first instruction. During the initialization for the interpreter, this address is placed into the instruction counter. VALGOL II Written in META II VALGOL II is an expansion of VALGOL I, and serves as an illustration of a fairly elaborate programming language implemented in the META II system. There are several features in the VALGOL II machine which were not present in the VALGOL I machine, and which require some explanation. In the VALGOL II machine, addresses as well as numbers are put in the stack. They are marked appropriately so that they can be distinguished at execution time. The main reason that addresses are allowed in the stack is that, in the case of a subscripted variable, an address is the result of a computation. In an assignment statement each left member is compiled into a sequence of code which leaves an address on top of the stack. This is done for simple variables as well as subscripted variables, because the philosophy of this compiler writing system has been to compile everything in the most general way. A variable, simple or subscripted, is always compiled into a sequence of instructions which leaves an address on top of the stack. The address is not replaced by its contents until the actual value of the variable is needed, as in an arithmetic expression. A formal parameter of a procedure is stored either as an address or as a value which is computed when the procedure is called. It is up to the load command to go through any number of indirect address in order to place the address of a number onto the stack. An argument of a procedure is always an algebraic expression. In case this expression is a variable, the value of the formal parameter will be an address computed upon entering the procedure; otherwise, the value of the formal parameter will be a number computed upon entering the procedure. The operation of the load command is now described. It causes the given address to be put on top of the stack. If the content of this top item happens to be another address, then it is replaced by that other address. This continues until the top item on the stack is the address of something which is not an address. This allows for formal parameters to refer to other formal parameters to any depth. No distinction is made between integer and real numbers. An integer is just a real number whose digits right of the decimal point are zero. Variables initially have a value called "undefined", and any attempt to use this value will be indicated as an error. An assignment statement consists of any number of left parts followed by a right part. For each left part there is compiled a sequence of commands which puts an address on top of the stack. The right part is compiled into a sequence of instructions which leaves on top of the stack either a number or the address of a number. Following the instruction for the right part there is a sequence of store commands, one for each left part. The first command of this sequence is "save and store", and the rest are "plain" store commands. The "save and store" puts the number which is on top of the stack (or which is referred to by the address on top of the stack) into a register called SAVE. It then stores the contents of SAVE in the address which is held in the next to top position of the stack. Finally it pops the top two items, which it has used, out of the stack. The number, however, remains in SAVE for use by the following store commands. Most assignment statements have only one left part, so "plain" store commands are seldom produced, with the result that the number put in SAVE is seldom used again. The method for calling a procedure can be explained by reference to illustrations 1 and 2. Illustration 1: Storage Map for VALGOL II Procedures XXXXXXXX Function XXXXXXXX Arguments XXXXXXXX ........ XXXXXXXX b Word of one blank character to mark the end of the arguments. ........ Body. Branch commands cause control to go around data ........ stored in this area. R Ends with a "return" command. Illustration 2: Map of the Stack Relating to Procedure Calls XXXXXXXX Arguments in reverse order XXXXXXXX ........ XXXXXXXX XXX Flag XXX Address of Exit XXX ........ procedure ........ ........ ........ Stack before executing Stack after executing the call instruction the call instruction The arguments which are in the stack are moved to their place at the top of the procedure. If the number of arguments in the stack does not correspond to the number of arguments in the procedure, an error is indicated. The "flag" in the stack works like this. In the VALGOL II machine there is a FLAG register. To set a flag in the stack, the contents of this register is put on top of the stack, then the address of the word above the top of the stack is put into the FLAG register. Initially, and whenever there are no flags in the stack, the FLAG register contains blanks. At other times it contains the address of the word in the stack which is just above the uppermost flag. Just before a call instruction is executed, the FLAG register contains the address of the word in the stack which is two above the word containing the address of the procedure to be executed. The call instruction picks up the arguments from the stack, beginning with the one stored just above the flag, and continuing to the top of the stack. Arguments are moved into the appropriate places at the top of the procedure being called. An error message is given if the number of arguments in the stack does not correspond to the number of places in the procedure. Finally the old flag address, which is just below the procedure address in the stack, is put in the FLAG register. The exit address replaces the address of the procedure in the stack, and all the arguments, as well as the flag, are popped out. There are just two op codes which affect the FLAG register. The code "load flag" puts a flag into the stack, and the code "call" takes one out. The library function "WHOLE" truncates a real number. It does not convert a real number to an integer, because no distinction is made between them. It is substituted for the recommended function "ENTIER" primarily because truncation takes fewer machine instructions to implement. Also, truncation seems to be used more frequently. The procedure ENTIER can be defined in VALGOL II as follows: .PROCEDURE ENTIER(X) ., .IF 0 .L= X .THEN WHOLE (X) .ELSE .IF WHOLE(X) = X .THEN X .ELSE WHOLE(X) -1 The "for statement" in VALGOL II is not the same as it is in ALGOL. Exactly one list element is required. The "step .. until" portion of the element is mandatory, but the "while" portion may be added to terminate the loop immediately upon some condition. The iteration continues so long as the value of the variable is less than or equal to the maximum, irrespective of the sign of the increment. Illustration 3 is an example of a typical "for statement". A flow chart of this statement is given in illustration 4. Illustration 3: Compilation of a typical "for statement" in VALGOL II FOR I = 0 .STEP 1 .UNTIL N .DO SET Set switch to indicate first time through. A91 LD I FLP Test for first time through. BFP A92 LDL 0 Initialize variable. SST B A93 A92 LDL 1 Increment variable. ADS A93 RSR Compare variable to maximum. LD N LEQ BFP A94 RST Reset switch to indicate not first time through. B A91 A94 Illustration 4: Flow chart of the "for statement" given in figure 12 [... omitted ...] Figure 7 is a listing of the VALGOL II compiler written in META II. Figure 8 gives the order list of the VALGOL II machine. A sample program to take a determinant is given in figure 9. Backup vs. No Backup Suppose that, upon entry to a recursive subroutine, which represents some syntax equation, the position of the input and output are saved. When some non-first term of a component is not found, the compiler does not have to stop with an indication of a syntax error. It can back-up the input and output and return false. The advantages of backup are as follows: 1. It is possible to describe languages, using backup, which cannot be described without backup. 2. Even for a language which can be described without backup, the syntax equations can often be simplified when backup is allowed. The advantages claimed for non-backup are as follows: 1. Syntax analysis is faster. 2. It is possible to tell whether syntax equations will work just by examining them, without following through numerous examples. The fact that rather sophisticated languages such as ALGOL and COBOL can be implemented without backup is pointed out by various people, including Conway[5], and they are aware of the speed advantages of so doing. I have seen no mention of the second advantage of no-backup, so I will explain this in more detail. Basically one writes alternations in which each term begins with a different symbol. Then it is not possible for the compiler to go down the wrong path. This is made more complicated because of the use of ".EMPTY". An optional item can never be followed by something that begins with the same symbol it begins with. The method described above is not the only way in which backup can he handled. Variations are worth considering, as a way may be found to have the advantages of both backup and no-backup. Further Development of META Languages As mentioned earlier, META II is not presented as a standard language, but as a point of departure from which a user may develop his own META language. The term "META Language," with "META" in capital letters, is used to denote any compiler-writing language so developed. The language which Schmidt[1] implemented on the PDP-1 was based an META I. He has now implemented an improved version of this language for a Beckman machine. Rutman[9] has implemented LOGIK, a compiler for bit-time simulation, on the 7090. He uses a META language to compile Boolean expressions into efficient machine code. Schneider and Johnson[10] have implemented META 3 on the IBM 7094, with the goal of producing an ALGOL compiler which generates efficient machine code. They are planning a META language which will be suitable for any block structured language. To this compiler-writing language they give the name META 4 (pronounced metaphor). References 1. Schmidt, L., "Implementation of a Symbol Manipulator for Heuristic Translation," 1963 ACM Natl. Conf., Denver, Colo. 2. Metcalfe, Howard, "A Parameterized Compiler Based on Mechanical Linguistics," 1963 ACM Natl. Conf., Denver, Colo. 3. Schorre, Val, " A Syntax - Directed SMALGOL for the 1401," 1963 ACM Natl. Conf., Denver, Colo. 4. Glennie, A., "On the Syntax Machine and the Construction of a Universal Compiler," Tech. Report No. 2, Contract NR 049-141, Carnegie Inst. of Tech., July, 1960. 5. Conway, Melvin E., "Design of a Separable Transition-Diagram Compiler," Comm. ACM, July 1963. 6. Irons, E . T ., "The Structure and Use of the Syntax-Directed Compiler," Annual Review in Automatic Programming, The Macmillan Co., New York. 7. Bastian, Lewis, "A Phrase-Structure language Translator," AFCRL-Rept-62-549, Aug. 1962. 8. Chomsky, Noam, "Syntax Structures," Mouton and Co., Publishers, The Hague, Netherlands. 9. Rutman, Roger, "LOGIK, A Syntax Directed Compiler for Computer Bit-Time Simulation," Master Thesis, UCLA, August 1964. 10. Schneider, F. W., and G. D. Johnson, "A Syntax-Directed Compiler-Writing Compiler to Generate Efficient Code," 1964 ACM Natl. Conf., Philadelphia. The VALGOL I Compiler Written in META II Language FIGURE 1 .SYNTAX PROGRAM PRIMARY = .ID .OUT('LD ' *) / .NUMBER .OUT('LDL' *) / '(' EXP ')' ., TERM = PRIMARY $('*' PRIMARY .OUT('MLT') ) ., EXP1 = TERM $('+' TERM .OUT('ADD') / '-' TERM .OUT('SUB') ) ., EXP = EXP1 ( '.=' EXP1 .OUT ('EQU') / .EMPTY) ., ASSIGNST = EXP '=' .ID .OUT('ST ' *) ., UNTILST = '.UNTIL' .LABEL *1 EXP '.DO' .OUT('BTP' *2) ST .OUT('B ' *1) .LABEL *2 ., CONDITIONALST = '.IF' EXP '.THEN' .OUT('BFP' *1) ST '.ELSE' .OUT('B ' *2) .LABEL *1 ST .LABEL *2 ., IOST = 'EDIT' '(' EXP ',' .STRING .OUT('EDT' *) ')' / 'PRINT' .OUT('PNT') ., IDSEQ1 = .ID .LABEL * .OUT('BLK 1') ., IDSEQ = IDSEQ1 $(',' IDSEQ1) ., DEC = '.REAL' .OUT('B ' *1) IDSEQ1 .LABEL *1 ., BLOCK = '.BEGIN' (DEC '.,' / .EMPTY) ST $('.,' ST) '.END' ., ST = IOST / ASSIGNST / UNTILST CONDITIONALST / BLOCK ., PROGRAM = BLOCK .OUT('HLT') .OUT('SP 1') .OUT('END') ., .END Order List of the VALGOL I Machine FIGURE 2 Machine Codes LD AAA load Put the contents of the address AAA on top of the stack. LDL NUMBER load literal Put the given NUMBER on top of the stack. ST AAA store Store the number which is on top of the stack into the address AAA and pop up the stack. ADD add Replace the two numbers which are on top of the stack with their sum. SUB subtract Subtract the number which is on top of the stack from the number which is next to the top, then replace them by this difference. MLT multiply Replace the two numbers which are on top of the stack with their product. EQU equal Compare the two numbers on top of the stack. Replace them by the integer 1, if they are equal, or by the integer 0, if they are unequal. B AAA branch Branch to the address AAA. BFP AAA branch false Branch to the address AAA if the top term and pop in the stack is the integer 0. Otherwise, continue in sequence. In either case, pop up the stack. BTP AAA branch true Branch to the address AAA if the top term and pop in the stack is not the integer 0. Otherwise, continue in sequence. In either case, pop up the stack. EDT STRING edit Round the number which is oh top of the stack to the nearest integer N. Move the given STRING into the print area so that its first character falls on print position N. In case this would cause characters to fall outside the print area, no movement takes place. PNT print Print a line, then space and clear the print area. HLT halt Halt. Constant and Control Codes SP N space N = 1--9. Constant code producing N blank spaces. BLK NNN block Produces a block of NNN eight character words. END end Denotes the end of the program. A Program as Compiled for the VALGOL I Machine FIGURE 3 .BEGIN .REAL X ., 0 = X ., B A01 0000 G 0012 X 0004 BLK 001 0004 A01 0012 LDL 0 0012 A ST X 0021 B 0004 .UNTIL X .= 3 .DO .BEGIN A02 0025 LD X 0025 0 0004 LDL 3 0029 A EQU 0038 F BTP A03 0039 K 0097 EDIT( X*X * 10 + 1, '*') ., PRINT ., X + 0.1 = X LD X 0043 O 0004 LD X 0047 O 0004 MLT 0051 E LDL 10 0052 A MLT 0061 E LDL 1 0062 A ADD 0071 C EDT 01'*' 0072 I PNT 0074 O LD X 0075 O 0004 LDL 0.1 0079 A ADD 0088 C ST X 0089 B 0004 .END Output from the VALGOL I Program Given in Figure 3 FIGURE 4 * * * * * * * * * * * * * * * * * * * * * * * * * * * The META II Compiler Written in Its Own Language FIGURE 5 .SYNTAX PROGRAM OUT1 = '*1' .OUT('GN1') / '*2' .OUT('GN2') / '*' .OUT('CI') / .STRING .OUT('CL ' *)., OUTPUT = ('.OUT' '(' $ OUT1 ')' / '.LABEL' .OUT('LB') OUT1) .OUT('OUT') ., EX3 = .ID .OUT ('CLL' *) / .STRING .OUT('TST' *) / '.ID' .OUT('ID') / '.NUMBER' .OUT('NUM') / '.STRING' .OUT('SR') / '(' EX1 ')' / '.EMPTY' .OUT('SET') / '$' .LABEL *1 EX3 .OUT ('BT ' *1) .OUT('SET')., EX2 = (EX3 .OUT('BF ' *1) / OUTPUT) $(EX3 .OUT('BE') / OUTPUT) .LABEL *1 ., EX1 = EX2 $('/' .OUT('BT ' *1) / EX2 ) .LABEL *1 ., ST = .ID .LABEL * '=' EX1 '.,' .OUT('R')., PROGRAM = '.SYNTAX' .ID .OUT('ADR' *) $ ST '.END' .OUTPUT('END')., .END Order List of the META II Machine FIGURE 6 Machine Codes TST STRING test After deleting initial blanks in the input string, compare it to the string given as argument. If the comparison is met, delete the matched portion from the input and set switch. If not met, reset switch. ID identifier After deleting initial blanks in the input string, test if it begins with an identifier, ie., a letter followed by a sequence of letters and/or digits. If so, delete the identifier and set switch. If not, reset switch. NUM number After deleting initial blanks in the input string, test if it begins with a number. A number is a string of digits which may contain imbeded periods, but may not begin or end with a period. Moreover, no two periods may be next to one another. If a number is found, delete it and set switch. If not, reset switch. SR string After deleting initial blanks in the input string, test if it begins with a string, ie., a single quote followed by a sequence of any characters other than single quote followed by another single quote. If a string is found, delete it and set switch. If not, reset switch. CLL AAA call Enter the subroutine beginning in location AAA. If the top two terms of the stack are blank, push the stack down by one cell. Otherwise, push it down by three cells. Set a flag in the stack to indicate whether it has been pushed by one or three cells. This flag and the exit address go into the third cell. Clear the top two cells to blanks to indicate that they can accept addresses which may be generated within the subroutine. R return Return to the exit address, popping up the stack by one or three cells according to the flag. If the stack is popped by only one cell, then clear the top two cells to blanks, because they were blank when the subroutine was entered. SET set Set branch switch on. B AAA branch Branch unconditionally to location AAA. BT AAA branch if true Branch to location AAA if switch is on. Otherwise, continue in sequence. BF AAA branch if false Branch to location AAA if switch is off, otherwise, continue in sequence. BE branch to error Halt if switch is off. if false Otherwise, continue in sequence. CL STRING copy literal Output the variable length string given as the argunent. A blank character will be inserted in the output folloning the string. CI copy input Output the last sequence of characters deleted from the input string. This command may not function properly if the last command which could cause deletion failed to do so. GN1 generate 1 This concerns the current label 1 cell, ie., the next to top cell in the stack, which is either clear or contains a generated label. If clear, generate a label and put it into that cell. Whether the label has just been put into the cell or was already there, output it. Finally, insert a blank character in the output following the label. GN2 generate 2 Same as GN1, except that it concerns the current label 2 cell, ie., the top cell in the stack. LB label Set the output counter to card column 1. OUT output Punch card and reset output counter to card column 8. Constant and Control Codes ADR IDENT address Produces the address which is assigned to the given identifier as a constant. END end Denotes the end of the program. VALGOL II Compiler Written in META II FIGURE 7 .SYNTAX PROGRAM ARRAYPART = '(.' EXP '.)' .OUT('AIA') ., CALLPART = '(' .OUT('LDF') ( EXP $ (',' EXP ) .EMPTY) ')' .OUT('CLL') ., VARIABLE = .ID .OUT('LD ' *) (ARRAYPART / .EMPTY) ., PRIMARY = 'WHOLE' '(' EXP ')' .OUT('WHL') / .ID .OUT('LD ' *) (ARRAYPART / CALLPART / .EMPTY) / '.TRUE' .OUT('SET') / '.FALSE' .OUT('RST') / '0 ' .OUT('RST') / '1 ' .OUT('SET') / .NUMBER .OUT('LDL' *) / '(' EXP ')' ., TERM = PRIMARY $ ('*' PRIMARY .OUT('MLT') / '/' PRIMARY .OUT('DIV') / './.' PRIMARY .OUT('DIV') .OUT('WHL') ) ., EXP2 = '-' TERM .OUT('NEG') / '+' TERM / TERM ., EXP1 = EXP2 $('+' TERM .OUT('ADD') / '-' TERM .OUT('SUB')) ., RELATION = EXP1 ( '.L =' EXP1 .OUT('LEQ') / '.L' EXP1 .OUT('LES') / '.=' EXP1 .OUT('EQU') / '.-=' EXP1 .OUT('EQU') .OUT('NOT') / '.G=' EXP1 .OUT('LES') .OUT('NOT') / '.G' EXP1 .OUT('LEQ') .OUT('NOT') / .EMPTY) ., BPRIMARY = '.-' RELATION .OUT('NOT') / RELATION ., BTERM = BPRIMARY $ ('.A' .OUT('BF ' *1) .OUT('POP') BPRIMARY) .LABEL *1 ., BEXP1 = BTERM $( '.V' .OUT('BT ' *1) .OUT('POP') BTERM) .LABEL *1 ., IMPLICATION1 = '.IMP' .OUT('NOT') .OUT('BT ' *1) .OUT('POP') BEXP1 .LABEL *1 ., IMPLICATION = BEXP1 $ IMPLICATION1 ., EQUIV = IMPLICATION $('.EQ' .OUT('EQU') ) ., EXP = '.IF' EXP '.THEN' .OUT('BFP' *1) EXP .OUT('B ' *2) .LABEL *1 '.ELSE' EXP .LABEL *2 / EQUIV ., ASSIGNPART = '=' EXP ( ASSIGNPART .OUT('ST') / .EMPTY .OUT('SST') ) ., ASSIGNCALLST = .ID .OUT('LD ' *) (ARRAYPART ASSIGNPART / ASSIGNPART / (CALLPART / .EMPTY .OUT('LDF') .OUT('CLL') ) .OUT('POP') ) ., UNTILST = '.UNTIL' .LABEL *1 EXP '.DO' .OUT('BTP' *2) ST .OUT('B ' *1) .LABEL *2 ., WHILECLAUSE = '.WHILE' .OUT('BF ' *1) .OUT('POP') EXP .LABEL *1 / .EMPTY ., FORCLAUSE = VARIABLE '=' .OUT('FLP') .OUT('BFP' *1) EXP '.STEP' .OUT('SST') .OUT('B ' *2) .LABEL *1 EXP '.UNTIL' .OUT('ADS') .LABEL *2 .OUT('RSR') EXP .OUT('LEQ') WHILECLAUSE '.DO' ., FORST = '.FOR' .OUT('SET') .LABEL *1 FORCLAUSE .OUT('BFP' *2) ST .OUT('RST') .OUT('B ' *1) .LABEL *2 ., IOCALL = 'READ' '(' VARIABLE ',' EXP ')' .OUT('RED') / 'WRITE' '(' VARIABLE ',' EXP ')' .OUT('WRT') / 'EDIT' '(' EXP ',' .STRING .OUT('EDT' *) ')' / 'PRINT' .OUT('PNT') / 'EJECT' .OUT('EJT') ., IDSEQ1 = .ID .LABEL * .OUT('BLK 1') ., IDSEQ = IDSEQ1 $(',' IDSEQ1) ., TYPEDEC = '.REAL' IDSEQ ., ARRAY1 = .ID .LABEL * '(.' '0' '..' .NUMBER .OUT('BLK 1') .OUT('BLK' *) '.)' ., ARRAYDEC = '.ARRAY' ARRAY1 $( ',' ARRAY1) ., PROCEDURE = '.PROCEDURE' .ID .LABEL * .LABEL *1 .OUT('BLK 1') '(' (IDSEQ / .EMPTY) ')' .OUT('SP 1') '.,' ST .OUT('R ' *1) ., DEC = TYPEDEC / ARRAYDEC / PROCEDURE ., BLOCK = '.BEGIN' .OUT('B ' *1) $(DEC '.,') .LABEL *1 ST $('.,' ST) '.END' (.ID / .EMPTY) ., UNCONDITIONALST = IOCALL / ASSIGNCALLST / BLOCK ., CONDST = '.IF' EXP '.THEN' .OUT('BFP' *1) (UNCONDITIONALST ('.ELSE' .OUT('B ' *2) .LABEL *1 ST .LABEL *2 / .EMPTY .LABEL *1) / (FORST / UNTILST) .LABEL *1) ., ST = CONDST / UNCONDITIONALST / FORST / UNTILST / .EMPTY ., PROGRAM = BLOCK .OUT('HLT') .OUT('SP 1') .OUT('END') ., .END Order List of the VALGOL II MACHINE FIGURE 8 Machine Codes LD AAA load Put the address AAA on top of the stack. LDL NUMBER load literal Put the given NUMBER on top of the stack. SET set Put the integer 1 on top of the stack. RST reset Put the integer 0 on top of the stack. ST store Store the contents of the register, STACK1, in the address which is on top of the stack, then pop up the stack. ADS add to storage Add the number on top of the stack to the (note 1) number whose address is next to the top, and place the sum in the register, STACK1. Then store the contents of that register in that address, and pop the top two items out of the stack. SST save and store Put the number on top of the stack into (note 1) the register, STACK1. Then store the contents of that register in the address which is the next to top term of the stack. The top two items are popped out of the stack. RSR restore Put the contents of the register, STACK1, on top of the stack. ADD add Replace the two numbers which are on top (note 2) of the stack with their sum. SUB subtract Subtract the number which is on top of the (note 2) stack from the number which is next to the top, then replace them by this difference. MLT multiply Replace the two numbers which are on top (note 2) of the stack with their product. DIV divide Divide the number which is next to the top (note 2) of the stack by the number which is on top of the stack, then replace them by this quotient. NEG negate Change the sign of the number on top of (note 1) the stack. WHL whole Truncate the number which is on top of the stack. NOT not If the top term in the stack is the integer 0, then replace it with the integer 1. Otherwise, replace it with the integer 0. LEQ less than If the number which is next to the top of or equal the stack is less than or equal to the (note 2) number on top of the stack, then replace them with the integer 1. Otherwise, replace them with the integer 0. LES less than If the number which is next to the top of (note 2) the stack is less than the number on top of the stack, then replace them with the integer 1. Otherwise, replace them with the integer 0. EQU equal Compare the two numbers on top of the (note 2) stack. Replace them by the integer 1, if they are equal, or by the integer 0, if they are unequal. B AAA branch Branch to the address AAA. BT AAA branch true Branch to the address AAA if the top term in the stack is not the integer 0. Otherwise, continue in sequence. Do not pop up the stack. BF AAA branch false Branch to the address AAA if the top term in the stack is the integer 0. Otherwise, continue in sequence. Do not pop up the stack. BTP AAA branch true Branch to the address AAA if the top term and pop in the stack is not the integer 0. Otherwise, continue in sequence. In either case, pop up the stack. BFP AAA branch false Branch to the address AAA if the top term and pop in the stack is the integer 0. Otherwise, continue in sequence. In either case, pop up the stack. CLL call Enter a procedure at the address which is below the flag. LDF load flag Put the address which is in the FLAG register on top of the stack, and put the address of the top of the stack into the FLAG register. R AAA return Return from procedure. AIA array increment Increment the address which is next to the address top of the stack by the integer which is on top of the stack, and replace these by the resulting address. FLP flip Interchange the top two terms of the stack. POP pop Pop up the stack. EDT string edit Round the number which is on top of the (note 1) stack to the nearest integer N. Move the given string into the print area so that its first character falls on print position N. In case this would cause characters to fall outside the print area, no movement takes place. PRT print Print a line, then space and clear the print area. EJT eject Position the paper in the printer to the top line of the next page. RED read Read The first N numbers from a card and store them beginning in the address which is next to the top of the stack. The integer N is the top term of the stack. Pop out both the address and the integer. Cards are punched with up to 10 eight digit numbers. Decimal point is assumed to be in the middle. An 11-punch over the rightmost digit indicates a negative number. WRT write Print a line of N numbers beginning in the address which is next to the top of the stack. The integer N is the top term of the stack. Pop out both the address and the integer. Twelve character positions are allowed for each number. There are four digits before and four digits after the decimal. Leading zeroes in front of the decimal are changed to blanks. If the number is negative, a minus sign is printed in the position before the first non-blank character. HLT halt Halt. Constant and Control Codes SP N space N = 1--9. Constant code producing N blank spaces. BLK NNN block Produces a block of NNN eight character words. END end Denotes the end of the program. Note 1: If the top item in the stack is an address, it is replaced by its contents before beginning this operation. Note 2: Same as note 1, but applies to the top two items. Example Program in VALGOL II FIGURE 9 .BEGIN .PROCEDURE DETERMINANT(A, N) ., .BEGIN .PROCEDURE DUMP() ., .BEGIN .REAL D ., .FOR D = 0 .STEP 1 .UNTIL N-1 .DO WRITE(MATRIX(. N*D .), N) ., PRINT .END DUMP ., .PROCEDURE ABS(X) ., ABS = .IF 0 .L= X .THEN X .ELSE -X ., .REAL PRODUCT, FACTOR, TEMP, R, I, J ., PRODUCT = 1 ., .FOR R = 0 .STEP 1 .UNTIL N-2 .WHILE PRODUCT .-= 0 .DO .BEGIN I = R ., .FOR J = R+1 .STEP 1 .UNTIL N-1 .DO .IF ABS( A(. N*I + R .) ) .L ABS( A(. N*J + R .) ) .THEN I = J ., .IF A(. N*I + R .) .= 0 .THEN PRODUCT = 0 .ELSE .IF I .-= R .THEN .BEGIN PRODUCT = -PRODUCT ., .FOR J = R .STEP 1 .UNTIL N-1 .DO .BEGIN TEMP = A(. N*R + J .) ., A(. N*R + J .) = A (. N* I + J .) ., A(. N*I + J .) = TEMP .END .END ., TEMP = A(. N*R + R .) ., .FOR I = R+1 .STEP 1 .UNTIL N-1 .DO .BEGIN FACTOR = A(. N*I + R .) / TEMP ., .FOR J = R .STEP 1 .UNTIL N-1 .DO A(. N*I + J .) = A(. N*I + J .) -FACTOR * A(. N*R + J .) ., DUMP .END .END ., .FOR I = 0 .STEP 1 .UNTIL N-1 .DO PRODUCT = PRODUCT * A(. N*I + I .) ., DETERMINANT = PRODUCT .END DETERMINANT ., .REAL M, W, T ., .ARRAY MATRIX (. 0 .. 24 .) ., .UNTIL .FALSE .DO .BEGIN EDIT(I, 'FIND DETERMINANT OF' ) ., PRINT., PRINT., READ(M, 1) ., .FOR W = 0 .STEP 1 .UTNIL M-1 .DO .BEGIN READ(MATRIX (. M*W .), M) ., WRITE(MATRIX (. M*W .), M) .END ., PRINT ., T = DETERMINANT (MATRIX, M) ., WRITE(T, 1) ., PRINT., PRINT .END .END PROGRAM