2009년 1학기 컴파일러 – Lex(Flex) & Yacc(Bison)

By | 2009/06/10

  컴파일러 과제로 나온 것입니다. minijava라는 java의 문법 중 자주 쓰이는 것들만 모은 것을 scanning하고 parsing하여 AST와 SymbolTable을 그리는 과제입니다. Scanning은 Scanner인 Lex(Flex)로 하고, Parsing은 Parser인 Yacc(Bison)을 이용하였습니다.

  처음에는 이 둘의 문법을 잘 몰라 책과 인터넷을 찾아보면서 공부하였습니다. 그래서 상당히 힘들 것이라 예상했으나 의외로 문법을 알고 나니까 쉽게 만들 수 있었습니다.

  여기에는 AST를 그리는데 필요한 파일(CPP 파일)은 올리지 않고, Flex와 Bison 소스 파일, input data와 그 결과인 AST, Symbol table을 올리겠습니다.

  Flex source file

   1: %{
   2: #include <stdio.h>
   3: #include <stdlib.h>
   4:  
   5: #include "minijava.tab.h"
   6: #include "./ast/SyntaxTree.h"
   7: #include "visitor/SymbolTableHeader.h"
   8:  
   9: // function declaration
  10: Identifier* id(char *s);
  11: int number(char *);
  12: extern void yyerror(const char *s);
  13:  
  14: extern SymbolTable* Program_SymbolTable;
  15:  
  16: %}
  17:  
  18: %option noyywrap
  19:  
  20: LETTER    [A-Za-z]
  21: DIGIT    [0-9]
  22:  
  23: %%
  24:  
  25: "/*"([^\*]|\*[^/])*"*/"    ;                    /* multiline comments */
  26: "//".*[\n]+            ;                        /* oneline comments */
  27: [ \t\n\r]+            ;                        /* whitespace */
  28:  
  29: "class"                            { return CLASS; }        /* reserved word */
  30: "main"                            { return MAIN_T; }
  31: "String"                        { return STRING; }
  32: "extends"                        { return EXTENDS; }
  33: "void"                            { return VOID; }
  34: "return"                        { return RETURN_T; }
  35: "public"                        { return PUBLIC_T; }
  36: "private"                        { return PRIVATE_T; }
  37: "static"                        { return STATIC_T; }
  38: "int"                            { return INT_T; }
  39: "boolean"                        { return BOOLEAN_T; }
  40: "this"                            { return THIS_T; }
  41: "if"                            { return IF_T; }
  42: "else"                            { return ELSE_T; }
  43: "while"                            { return WHILE_T; }
  44: "System.out.println"            { return PRINTLN_T; }
  45: "true"                            { return TRUE_T; }
  46: "false"                            { return FALSE_T; }
  47: "null"                            { return NULL_T; }
  48: "new"                            { return NEW_T; }
  49: "("                               { return SMALLPL_T; }
  50: ")"                               { return SMALLPR_T; }
  51: "{"                               { 
  52:                                     // token '{' means that the new level is appeared. So call the function
  53:                                     Program_SymbolTable->MIDPL_T_Start();
  54:                                     return MIDPL_T;
  55:                                 }
  56: "}"                               {
  57:                                     // token '}' means that the level is finished. So call the function
  58:                                     Program_SymbolTable->MIDPR_T_End();
  59:                                     return MIDPR_T;
  60:                                 }
  61: "["                                { return BIGPL_T; }
  62: "]"                                { return BIGPR_T; }
  63: "="                               { return EQU_T; }
  64: ";"                               { return SEMICOL_T; }
  65: "."                                { return DOT_T; }
  66: ","                                { return COMMA_T; }
  67:  
  68: ">"                                { return OP01; }            /* token binop '>' */
  69: "<"                                { return OP02; }            /* token binop '<' */
  70: "!"                                { return OP03; }            /* token binop '!' */
  71: "=="                            { return OP04; }            /* token binop '==' */
  72: "<="                            { return OP05; }            /* token binop '<=' */
  73: ">="                            { return OP06; }            /* token binop '>=' */
  74: "!="                            { return OP07; }            /* token binop '!=' */
  75: "&&"                            { return OP08; }            /* token binop '&&' */
  76: "||"                            { return OP09; }            /* token binop '||' */
  77: "+"                                { return OP10; }            /* token binop '+' */
  78: "-"                                { return OP11; }            /* token binop '-' */
  79: "*"                                { return OP12; }            /* token binop '*' */
  80: "/"                                { return OP13; }            /* token binop '/' */
  81:  
  82:  
  83: ({LETTER})({LETTER}|{DIGIT}|_)*    { yylval.TT_ID = id(yytext); return IDENT; }    /* token id */
  84: ({DIGIT})+                        { yylval.TT_NUMBER = number(yytext); return NUMBER; }    /* token num */
  85:  
  86:  
  87: .                           yyerror("Unknown character");    // All other character is not understood by scanner, so call error function.
  88: %%
  89:  
  90: Identifier* id(char *s) {
  91:     /* ID is id, so in scanner, it makes the Identifier object to return it. */
  92:     char* temp_str = new char[128];    // I think 128 is enough to saved the id.
  93:     strcpy(temp_str, s);            // copy it
  94:     Identifier* temp = new Identifier(temp_str);    // make the Identifier object
  95:     return temp;                                    // return it.
  96: }
  97:  
  98: int number(char *s) {
  99:     return atoi(s);                                    // if number, chaged the type from char* to int
 100: }
 101:  
 102:  

  Bison source file

   1: %{
   2: // include whole header file to use this code.
   3: #include <stdio.h>
   4: #include <stdlib.h>
   5: #include <string.h>
   6: #include <fstream>
   7: using std::ofstream;
   8:  
   9: #include "visitor/PrettyPrintVisitor.h"
  10: #include "ast/SyntaxTree.h"
  11: #include "visitor/SymbolTableHeader.h"
  12:  
  13: // declare the extern things to use scanner
  14: extern FILE *yyin;
  15: extern void yyerror(const char *s);
  16: extern int yylex();
  17: extern int yyparse();
  18:  
  19: Program* PA;                                            // static Program object
  20: char* empty_string = new char[4];                        // empty string to represent epsilon string
  21: char* java_filename = new char[64];                        // input java file name
  22:  
  23: SymbolTable* Program_SymbolTable = new SymbolTable();    // SymbolTable
  24: int SymbolTable::class_start_point = 0;                    // class base pointer
  25:  
  26: ofstream fout;                                            // write stream
  27:  
  28:  
  29: %}
  30:  
  31: %union    {
  32:     int TT_NUMBER;
  33:     int TT_UNOP;
  34:     Identifier* TT_ID;
  35:     
  36:     /* Non Terminal Type */
  37:     Program* NTT_Program;
  38:     MainClass* NTT_MainClass;
  39:     ClassDecl* NTT_ClassDeclaration;
  40:     ClassDeclList* NTT_EClassDeclaration;
  41:     MemberDecl* NTT_FMDeclaration;
  42:     MemberDeclList* NTT_EFMDeclaration;
  43:     MemberDecl* NTT_FieldDeclaration;
  44:     MemberDecl* NTT_MethodDeclaration;
  45:     ModifierList* NTT_Modifier;
  46:     Type* NTT_Type;
  47:     ParamList* NTT_ParameterList;
  48:     Ref* NTT_Ref;
  49:     Identifier* NTT_RefExtra;
  50:     ArgList* NTT_ArgList;
  51:     ArgList* NTT_ArgExp;
  52:     Statement* NTT_Statement;
  53:     StatementList* NTT_EStatement;
  54:     Expr* NTT_Expression;
  55:     
  56: };
  57:  
  58: %token <TT_ID>    IDENT            /* id */
  59: %token <TT_NUMBER>     NUMBER    /* num */
  60: %token OP01        /* '>' */
  61: %token OP02        /* '<' */
  62: %token OP03        /* '!' */
  63: %token OP04        /* '==' */
  64: %token OP05        /* '<=' */
  65: %token OP06        /* '>=' */
  66: %token OP07        /* '!=' */
  67: %token OP08        /* '&&' */
  68: %token OP09        /* '||' */
  69: %token OP10        /* '+' */
  70: %token OP11        /* '-' */
  71: %token OP12        /* '*' */
  72: %token OP13        /* '/' */
  73:  
  74:  
  75: %token CLASS        /* class */
  76: %token MAIN_T        /* main */
  77: %token STRING        /* String */
  78: %token EXTENDS        /* extends */
  79: %token VOID            /* void */
  80: %token RETURN_T        /* return */
  81: %token INT_T        /* int */
  82: %token BOOLEAN_T    /* boolean */
  83: %token PUBLIC_T        /* public */
  84: %token PRIVATE_T    /* private */
  85: %token STATIC_T        /* static */
  86: %token THIS_T        /* this */
  87: %token IF_T            /* if */
  88: %token ELSE_T        /* else */
  89: %token WHILE_T        /* while */
  90: %token PRINTLN_T    /* System.out.println */
  91: %token TRUE_T        /* true */
  92: %token FALSE_T        /* false */
  93: %token NULL_T        /* null */
  94: %token NEW_T        /* new */
  95: %token SMALLPL_T     /* ( */
  96: %token SMALLPR_T     /* ) */
  97: %token MIDPL_T         /* { */
  98: %token MIDPR_T         /* } */
  99: %token BIGPL_T         /* [ */
 100: %token BIGPR_T         /* ] */
 101: %token EQU_T         /* = */
 102: %token SEMICOL_T     /* ; */
 103: %token DOT_T         /* . */
 104: %token COMMA_T         /* , */
 105:  
 106:  
 107:  
 108: %type <TT_UNOP>                    UNOP            /* unop */
 109:  
 110: %type <NTT_Program>                Program
 111: %type <NTT_MainClass>            MainClass
 112: %type <NTT_ClassDeclaration>    ClassDeclaration
 113: %type <NTT_EClassDeclaration>    EClassDeclaration
 114: %type <NTT_FMDeclaration>        FMDeclaration            /* FieldDeclaration | MethodDeclaration */
 115: %type <NTT_EFMDeclaration>        EFMDeclaration
 116: %type <NTT_FieldDeclaration>    FieldDeclaration
 117: %type <NTT_MethodDeclaration>    MethodDeclaration
 118: %type <NTT_Modifier>            Modifier
 119: %type <NTT_Type>                Type
 120: %type <NTT_ParameterList>        ParameterList
 121: %type <NTT_Ref>                    Ref
 122: %type <NTT_RefExtra>            RefExtra                /* In the template, last (.id)* is just identifier. */
 123: %type <NTT_ArgList>                ArgList
 124: %type <NTT_ArgExp>                ArgExp
 125: %type <NTT_Statement>            Statement
 126: %type <NTT_EStatement>            EStatement
 127: %type <NTT_Expression>            Expression
 128:  
 129:  
 130:  
 131:  
 132: /* http://www.difranco.net/cop2551/java_op-prec.htm */
 133: /* write the operation precedence from Java */
 134: %right    EQU_T                    /* = */
 135: %left    OP09                    /* || */
 136: %left    OP08                    /* && */
 137: %left    OP04 OP07                /* == != */
 138: %left    OP01 OP02 OP05 OP06        /* > < <= >= */
 139: %left    OP10 OP11                  /* + - */
 140: %left    OP12 OP13                  /* * / */
 141: %right    UNARY                     /* unary operation */
 142: %left    SMALLPL_T SMALLPR_T        /* () */
 143: %nonassoc IFX                   /* if statement */
 144: %nonassoc ELSE_T                /* if-else statement */
 145:  
 146:  
 147: %start        Program                // Start symbol is Program
 148:  
 149:  
 150: %%
 151:  
 152:  
 153: Program : MainClass EClassDeclaration                                     {
 154:                                                                             Program_SymbolTable->set_key();
 155:                                                                             PA = new Program($1, $2);
 156:                                                                             /* Make the program object */
 157:                                                                         }
 158:     ;
 159:     
 160: MainClass : CLASS IDENT MIDPL_T PUBLIC_T STATIC_T VOID MAIN_T SMALLPL_T STRING BIGPL_T BIGPR_T IDENT SMALLPR_T MIDPL_T EStatement MIDPR_T MIDPR_T    {
 161:                                                                                                                                                         // Make the SymbolTableEntry from argument
 162:                                                                                                                                                         SymbolTableEntry* STE1 = new SymbolTableEntry($12, 2, NULL, NULL, 1);
 163:                                                                                                                                                         // Make the SymbolTableEntry from mainclass
 164:                                                                                                                                                           Program_SymbolTable->append(STE1);
 165:                                                                                                                                                           SymbolTableEntry* STE2 = new SymbolTableEntry($2, 6, NULL, NULL, 0);
 166:                                                                                                                                                           Program_SymbolTable->append(STE2);
 167:                                                                                                                                                           // Set the classname
 168:                                                                                                                                                           Program_SymbolTable->set_classname($2, SymbolTable::class_start_point);
 169:                                                                                                                                                           // Set the class_start_point to next class
 170:                                                                                                                                                           SymbolTable::class_start_point = Program_SymbolTable->size();
 171:                                                                                                                                                           $$ = new MainClass($2, $12, $15);
 172:                                                                                                                                                           }
 173:     ;
 174:     
 175: ClassDeclaration : CLASS IDENT MIDPL_T EFMDeclaration MIDPR_T            {
 176:                                                                             // Make the SymbolTableEntry from class
 177:                                                                             SymbolTableEntry*  STE = new SymbolTableEntry($2, 5, NULL, NULL, 0);
 178:                                                                             Program_SymbolTable->append(STE);
 179:                                                                             Program_SymbolTable->set_classname($2, SymbolTable::class_start_point);
 180:                                                                             SymbolTable::class_start_point = Program_SymbolTable->size();
 181:                                                                             $$ = new ClassDeclSimple($2, $4);
 182:                                                                         }            /* no extends */
 183:     | CLASS IDENT EXTENDS IDENT MIDPL_T EFMDeclaration MIDPR_T            {
 184:                                                                             // Make the SymbolTableEntry from class
 185:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($2, 5, NULL, NULL, 0);
 186:                                                                             Program_SymbolTable->append(STE);
 187:                                                                             Program_SymbolTable->set_classname($2, SymbolTable::class_start_point);
 188:                                                                             SymbolTable::class_start_point = Program_SymbolTable->size();
 189:                                                                             $$ = new ClassDeclExtends($2, $4, $6);
 190:                                                                         }        /* extends */
 191:     ;
 192:     
 193: EClassDeclaration : ClassDeclaration EClassDeclaration                    {
 194:                                                                             // make new ClassDeclList object
 195:                                                                             ClassDeclList* temp = new ClassDeclList();
 196:                                                                             // append ClassDeclaration
 197:                                                                             temp->append($1);
 198:                                                                             // append other ClassDeclaration
 199:                                                                             int size = $2->size();
 200:                                                                             for(int i = 0 ; i < size ; i++)
 201:                                                                             {
 202:                                                                                 temp->append($2->elementAt(i));
 203:                                                                             }
 204:                                                                             $$ = temp;
 205:                                                                         }
 206:     | /* empty string */                                                { $$ = new ClassDeclList(); // empty string, so return empty object
 207:                                                                         }
 208:     ; 
 209:     
 210: FMDeclaration : FieldDeclaration                                        { $$ = $1; // I can remove this rule, but for convenience remain this rule.
 211:                                                                         }
 212:     | MethodDeclaration                                                    { $$ = $1; }
 213:     ;
 214:     
 215: EFMDeclaration : FMDeclaration EFMDeclaration                            {
 216:                                                                             // make new MemberDeclList object
 217:                                                                             MemberDeclList* temp = new MemberDeclList();
 218:                                                                             // append FMDeclaration
 219:                                                                             temp->append($1);
 220:                                                                             // append other FMDeclaration
 221:                                                                             int size = $2->size();
 222:                                                                             for(int i = 0 ; i < size ; i++)
 223:                                                                             {
 224:                                                                                 temp->append($2->elementAt(i));
 225:                                                                             }
 226:                                                                             $$ = temp;
 227:                                                                         }
 228:     | /* empty string */                                                { $$ = new MemberDeclList(); // empty string, so return empty object
 229:                                                                         }
 230:     ;
 231:     
 232: FieldDeclaration : Modifier Type IDENT SEMICOL_T                        {
 233:                                                                             // There is a id to write on the symboltable, so do it.
 234:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($3, 7, $1, $2);
 235:                                                                             Program_SymbolTable->append(STE);
 236:                                                                             $$ = new FieldDecl($1, $2, $3);
 237:                                                                         }
 238:     ;
 239:     
 240: MethodDeclaration : Modifier Type IDENT SMALLPL_T ParameterList SMALLPR_T MIDPL_T EStatement RETURN_T Expression SEMICOL_T MIDPR_T     {
 241:                                                                                                                                         // There is a id to write on the symboltable, so do it.
 242:                                                                                                                                         SymbolTableEntry* STE = new SymbolTableEntry($3, 1, $1, $2);
 243:                                                                                                                                         Program_SymbolTable->append(STE);
 244:                                                                                                                                         $$ = new MethodDeclReturn($1, $2, $3, $5, $8, $10);
 245:                                                                                                                                     }    /* Type */
 246:     | Modifier VOID IDENT SMALLPL_T ParameterList SMALLPR_T MIDPL_T EStatement MIDPR_T     {
 247:                                                                                             // There is a id to write on the symboltable, so do it.
 248:                                                                                             SymbolTableEntry* STE = new SymbolTableEntry($3, 1, $1, NULL);
 249:                                                                                             Program_SymbolTable->append(STE);
 250:                                                                                             $$ = new MethodDeclVoid($1, $3, $5, $8);
 251:                                                                                         }    /* void */
 252:     ;
 253:     
 254: Modifier : PUBLIC_T                                                        {
 255:                                                                             ModifierList* temp_list = new ModifierList();
 256:                                                                             Public* temp1 = new Public();
 257:                                                                             temp_list->append(temp1);
 258:                                                                             $$ = temp_list;
 259:                                                                         }
 260:     | PUBLIC_T STATIC_T                                                    { 
 261:                                                                             ModifierList* temp_list = new ModifierList();
 262:                                                                             Public* temp1 = new Public();
 263:                                                                             temp_list->append(temp1);
 264:                                                                             Static* temp2 = new Static();
 265:                                                                             temp_list->append(temp2);
 266:                                                                             $$ = temp_list;
 267:                                                                         }
 268:     | PRIVATE_T                                                            {
 269:                                                                             ModifierList* temp_list = new ModifierList();
 270:                                                                             Private* temp1 = new Private();
 271:                                                                             temp_list->append(temp1);
 272:                                                                             $$ = temp_list;
 273:                                                                         }
 274:     | PRIVATE_T STATIC_T                                                { 
 275:                                                                             ModifierList* temp_list = new ModifierList();
 276:                                                                             Private* temp1 = new Private();
 277:                                                                             temp_list->append(temp1);
 278:                                                                             Static* temp2 = new Static();
 279:                                                                             temp_list->append(temp2);
 280:                                                                             $$ = temp_list;
 281:                                                                         }
 282:     | STATIC_T                                                            {
 283:                                                                             ModifierList* temp_list = new ModifierList();
 284:                                                                             Static* temp1 = new Static();
 285:                                                                             temp_list->append(temp1);
 286:                                                                             $$ = temp_list;
 287:                                                                         }
 288:     | /* empty string */                                                { $$ = NULL; }
 289:     ;
 290:     
 291: Type : INT_T                                                            { $$ = new IntType(); }
 292:     | BOOLEAN_T                                                            { $$ = new BooleanType(); }
 293:     | INT_T BIGPL_T BIGPR_T                                                { $$ = new IntArrayType(); }
 294:     | IDENT                                                                { $$ = new IdentifierType($1); }
 295:     ;
 296:     
 297: ParameterList : Type IDENT ParameterList                                {
 298:                                                                             // ParamList is also list, so make new thing and attach remaining thing.
 299:                                                                             ParamList* temp = new ParamList();
 300:                                                                             Param* temp_param = new Param($1, $2);
 301:                                                                             temp->append(temp_param);
 302:                                                                             /* attach the ArgExp to ArgList */
 303:                                                                             if(NULL != $3)
 304:                                                                             {
 305:                                                                                 int size = $3->size();
 306:                                                                                 for(int i = 0 ; i < size ; i++)
 307:                                                                                 {
 308:                                                                                     temp->append($3->elementAt(i));
 309:                                                                                 }
 310:                                                                             }
 311:                                                                             // There is a id to write on the symboltable, so do it.
 312:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($2, 2, NULL, $1);
 313:                                                                             Program_SymbolTable->append(STE);
 314:                                                                             $$ = temp;
 315:                                                                         }
 316:     | COMMA_T Type IDENT ParameterList                                    {
 317:                                                                             // This is the expand things.
 318:                                                                             ParamList* temp = new ParamList();
 319:                                                                             Param* temp_param = new Param($2, $3);
 320:                                                                             temp->append(temp_param);
 321:                                                                             /* attach the ArgExp to ArgList */
 322:                                                                             if(NULL != $4)
 323:                                                                             {
 324:                                                                                 int size = $4->size();
 325:                                                                                 for(int i = 0 ; i < size ; i++)
 326:                                                                                 {
 327:                                                                                     temp->append($4->elementAt(i));
 328:                                                                                 }
 329:                                                                             }
 330:                                                                             // There is a id to write on the symboltable, so do it.
 331:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($3, 2, NULL, $2);
 332:                                                                             Program_SymbolTable->append(STE);
 333:                                                                             $$ = temp;
 334:                                                                         }
 335:     | /* empty string */                                                {
 336:                                                                             $$ = NULL; // if it is empty, then return NULL.
 337:                                                                                         // Then PrettyPrinterVisitor know that there is no parameter.
 338:                                                                         }
 339:     ;
 340:     
 341: Ref : THIS_T RefExtra                                                    {
 342:                                                                             // first thing is 'this'
 343:                                                                             This* temp_this = new This();
 344:                                                                             // I don't know how to represent (.id)*, So just deal with string.
 345:                                                                             // So if RefExtra's string is null string(length is 0), it means just this.
 346:                                                                             if(0 < strlen($2->string))
 347:                                                                             {
 348:                                                                                 /* (.id)* */
 349:                                                                                 $$ = new MemRef(temp_this, $2);
 350:                                                                             }
 351:                                                                             else
 352:                                                                             {
 353:                                                                                 /* alone */
 354:                                                                                 $$ = temp_this;
 355:                                                                             }
 356:                                                                         }
 357:     | IDENT RefExtra                                                    {
 358:                                                                             // first thing is 'id'
 359:                                                                             IdentifierRef* temp_idref = new IdentifierRef($1);
 360:                                                                             // I don't know how to represent (.id)*, So just deal with string.
 361:                                                                             // So if RefExtra's string is null string(length is 0), it means just this.
 362:                                                                             if(0 < strlen($2->string))
 363:                                                                             {
 364:                                                                                 /* (.id)* */
 365:                                                                                 $$ = new MemRef(temp_idref, $2);
 366:                                                                             }
 367:                                                                             else
 368:                                                                             {
 369:                                                                                 /* alone */
 370:                                                                                 $$ = temp_idref;
 371:                                                                             }
 372:                                                                         }
 373:     ;
 374:     
 375: RefExtra : DOT_T IDENT RefExtra                                            {
 376:                                                                             /* I don't know how to deal this exactly.
 377:                                                                                 So just deal with string. */
 378:                                                                             char* temp_str = new char[64];
 379:                                                                             strcpy(temp_str, $2->string);
 380:                                                                             if(0 < strlen($3->string))
 381:                                                                             {
 382:                                                                                 strcat(temp_str, ".");
 383:                                                                                 strcat(temp_str, $3->string);
 384:                                                                             }
 385:                                                                             $$ = new Identifier(temp_str);
 386:                                                                         }
 387:     | /* empty string */                                                {
 388:                                                                             // I deal the Ref to string. so use empty string.
 389:                                                                             $$ = new Identifier(empty_string);
 390:                                                                         }
 391:     ;
 392:     
 393: ArgList : Expression ArgExp                                                {
 394:                                                                             // ArgList is list, so make new object, attach between new thing and ArgExp.
 395:                                                                             ArgList* temp = new ArgList();
 396:                                                                             temp->append($1);
 397:                                                                             /* attach the ArgExp to ArgList */
 398:                                                                             if(NULL != $2)
 399:                                                                             {
 400:                                                                                 int size = $2->size();
 401:                                                                                 for(int i = 0 ; i < size ; i++)
 402:                                                                                 {
 403:                                                                                     temp->append($2->elementAt(i));
 404:                                                                                 }
 405:                                                                             }
 406:                                                                             $$ = temp;
 407:                                                                         }
 408:     | /* empty string */                                                {
 409:                                                                             $$ = NULL;
 410:                                                                         }
 411:     ;
 412:     
 413: ArgExp : COMMA_T Expression ArgExp                                        {
 414:                                                                             // ArgList is list, so make new object, attach between new thing and ArgExp.
 415:                                                                             ArgList* temp = new ArgList();
 416:                                                                             temp->append($2);
 417:                                                                             /* attach the ArgExp to ArgList */
 418:                                                                             if(NULL != $3)
 419:                                                                             {
 420:                                                                                 int size = $3->size();
 421:                                                                                 for(int i = 0 ; i < size ; i++)
 422:                                                                                 {
 423:                                                                                     temp->append($3->elementAt(i));
 424:                                                                                 }
 425:                                                                             }
 426:                                                                             $$ = temp;
 427:                                                                         }
 428:     | /* empty string */                                                {
 429:                                                                             $$ = NULL;
 430:                                                                         }
 431:     ;
 432:     
 433: Statement : MIDPL_T EStatement MIDPR_T                                    { $$ = new StatementBlock($2); }
 434:     | Type IDENT SEMICOL_T                                                {
 435:                                                                             // There is a id to write on the symboltable, so do it.
 436:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($2, 3, NULL, $1);
 437:                                                                             Program_SymbolTable->append(STE);
 438:                                                                             $$ = new LocalVarDecl($1, $2, NULL);
 439:                                                                         }
 440:     | Type IDENT EQU_T Expression SEMICOL_T                                {
 441:                                                                             // There is a id to write on the symboltable, so do it.
 442:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($2, 3, NULL, $1);
 443:                                                                             Program_SymbolTable->append(STE);
 444:                                                                             $$ = new LocalVarDecl($1, $2, $4);
 445:                                                                         }
 446:     | Ref EQU_T Expression SEMICOL_T                                    { $$ = new Assign($1, $3); }
 447:     | Ref BIGPL_T Expression BIGPR_T EQU_T Expression SEMICOL_T            { $$ = new ArrayAssign($1, $3, $6); }
 448:     | Ref SMALLPL_T ArgList SMALLPR_T SEMICOL_T                            { $$ = new InvokeStatement($1, $3); }
 449:     | IF_T SMALLPL_T Expression SMALLPR_T Statement    %prec IFX            { 
 450:                                                                             // I set %prec IFX. and use prec I remove the ambiguity.
 451:                                                                             $$ = new If($3, $5, NULL);
 452:                                                                         }
 453:     | IF_T SMALLPL_T Expression SMALLPR_T Statement ELSE_T Statement    {
 454:                                                                             // I set %prec IFX. and use prec I remove the ambiguity.
 455:                                                                             $$ = new If($3, $5, $7);
 456:                                                                         }
 457:     | WHILE_T SMALLPL_T Expression SMALLPR_T Statement                    { $$ = new While($3, $5); }
 458:     | PRINTLN_T SMALLPL_T Expression SMALLPR_T SEMICOL_T                { $$ = new Print($3); }
 459:     ;
 460:  
 461: EStatement : Statement EStatement                                        {
 462:                                                                             StatementList* temp = new StatementList();
 463:                                                                             temp->append($1);
 464:                                                                             int size = $2->size();
 465:                                                                             for(int i = 0 ; i < size ; i++)
 466:                                                                             {
 467:                                                                                 temp->append($2->elementAt(i));
 468:                                                                             }
 469:                                                                             $$ = temp;
 470:                                                                         }
 471:     | /* empty string */                                                { $$ = new StatementList(); }
 472:     ;
 473: Expression : Ref                                                        { $$ = new RefExpr($1); }
 474:     | UNOP Expression %prec UNARY                                        {
 475:                                                                             // UNARY operation is higher precedence than BINAY operation
 476:                                                                             // So write %prec
 477:                                                                             switch ($1)
 478:                                                                             {
 479:                                                                                 case 1:
 480:                                                                                     /* ! */
 481:                                                                                     $$ = new LogNeg($2);
 482:                                                                                     break;
 483:                                                                                 case 2:
 484:                                                                                     /* - */
 485:                                                                                     $$ = new ArithNeg($2);
 486:                                                                                     break;
 487:                                                                                 default:
 488:                                                                                     yyerror("Expression UNOP switch error!! ");
 489:                                                                                     break;
 490:                                                                             }
 491:                                                                         }
 492:     | Expression OP01 Expression                                        { $$ = new GreaterThan($1, $3); }
 493:     | Expression OP02 Expression                                        { $$ = new LessThan($1, $3); }
 494:     | Expression OP04 Expression                                        { $$ = new Equal($1, $3); }
 495:     | Expression OP05 Expression                                        { $$ = new LessEqual($1, $3); }
 496:     | Expression OP06 Expression                                        { $$ = new GreaterEqual($1, $3); }
 497:     | Expression OP07 Expression                                        { $$ = new NotEqual($1, $3); }
 498:     | Expression OP08 Expression                                        { $$ = new LogAnd($1, $3); }
 499:     | Expression OP09 Expression                                        { $$ = new LogOr($1, $3); }
 500:     | Expression OP10 Expression                                        { $$ = new Add($1, $3); }
 501:     | Expression OP11 Expression                                        { $$ = new Sub($1, $3); }
 502:     | Expression OP12 Expression                                        { $$ = new Mult($1, $3); }
 503:     | Expression OP13 Expression                                        { $$ = new Div($1, $3); }
 504:     | Ref BIGPL_T Expression BIGPR_T                                    { $$ = new ArrayLookup($1, $3); }
 505:     | Ref SMALLPL_T ArgList SMALLPR_T                                    { $$ = new InvokeExpr($1, $3); }
 506:     | SMALLPL_T Expression SMALLPR_T                                    { $$ = $2; }
 507:     | NUMBER                                                            { $$ = new Num($1); }
 508:     | TRUE_T                                                            { $$ = new True(); }
 509:     | FALSE_T                                                            { $$ = new False(); }
 510:     | NULL_T                                                            { $$ = new Null(); }
 511:     | NEW_T IDENT SMALLPL_T SMALLPR_T                                    {
 512:                                                                             // There is a id to write on the symboltable, so do it.
 513:                                                                             SymbolTableEntry* STE = new SymbolTableEntry($2, 4, NULL, NULL);
 514:                                                                             Program_SymbolTable->append(STE);
 515:                                                                             $$ = new NewObject($2);
 516:                                                                         }
 517:     | NEW_T INT_T BIGPL_T Expression BIGPR_T                            { $$ = new NewArray($4); }
 518:     ;
 519:  
 520: UNOP : OP03                                                                { $$ = 1; }
 521:     | OP11                                                                 { $$ = 2; }
 522:     ;
 523:  
 524: %%
 525:  
 526: int main(int argc, char* argv[])
 527: {
 528:     char* fn = new char[64];                                                // Make temporary string variable
 529:     
 530:     strcpy(empty_string, "");                                                // define empty string
 531:     strcpy(java_filename, argv[1]);                                            // save the java filename
 532:     
 533:     cout << "Start Scanning and Parsing the " << java_filename << endl;        // print the program status
 534:       yyin = fopen(argv[1], "r");                                                // open the java source file
 535:     yyparse();                                                                // do parsing
 536:     fclose(yyin);                                                            // close the java source file
 537:     
 538:     
 539:     /* print_ptree(ptree); */
 540:     cout << "Write the AST to the " << java_filename << ".ast" << endl;        // print the program status
 541:     strcpy(fn, java_filename);                                                // make the filename to write the AST
 542:     strcat(fn, ".ast");                                                        // make the filename to write the AST
 543:     fout.open(fn);                                                            // open the AST file
 544:     
 545:     PrettyPrintVisitor PPV;                                                    // define PPV to write the AST
 546:     PPV.visit(PA);                                                            // write the AST
 547:     
 548:     fout.close();                                                            // close the AST file
 549:     
 550:     
 551:     /* print symbol table */
 552:     cout << "Write the SymbolTable to the " << java_filename << ".syt" << endl;    // print the program status
 553:     strcpy(fn, java_filename);                                                // make the filename to write the SymbolTable
 554:     strcat(fn, ".syt");                                                        // make the filename to write the SymbolTable
 555:     fout.open(fn);                                                            // open the SymbolTable file
 556:     
 557:     Program_SymbolTable->print();                                            // write the SymbolTable to the file
 558:     
 559:     fout.close();                                                            // close the SymbolTable file
 560:     
 561:     
 562:     cout << "End the program " << endl;                                        // print the program status
 563:     
 564:     return 0;                                                                // return to the OS
 565: }
 566:  
 567: void yyerror(const char *s)
 568: {
 569:     fprintf(stderr, "%s\n", s);                                                // print error message
 570: }
 571:  
 572:  

  Input minijava source file

   1: class TestAll {
   2:     public static void main(String[] args) {
   3:         /* Type */
   4:         /* test comment * */
   5:         int i;
   6:         boolean b = false;
   7:         int[] ia = new int[3];
   8:         ClassB cb = new ClassB();
   9:         /* Ref(except this), ArgList */
  10:         cb.c = new ClassC();
  11:         i = cb.c.mi();
  12:         i = cb.c.mi1(4);
  13:         i = cb.c.mi2(5, 6);
  14:         cb.c.mv();
  15:         /* Statement(except ArgList) */
  16:         {int j;}
  17:         int k;
  18:         int l = 7;
  19:         i = 8;
  20:         ia[0] = 9;
  21:         if (b) {}
  22:         if (b) {} else {}
  23:         while (b) {}
  24:         System.out.println(i);
  25:         /* Expression(except ArgList, new) */
  26:         i = i;
  27:         i = -i;
  28:         b = !b;
  29:         i = i * i;
  30:         i = i / i;
  31:         i = i + i;
  32:         i = i - i;
  33:         b = i > i;
  34:         b = i < i;
  35:         b = i == i;
  36:         b = i != i;
  37:         b = i >= i;
  38:         b = i <= i;
  39:         b = b && b;
  40:         b = b || b;
  41:         i = ia[1];
  42:         i = (i);
  43:         i = 10;
  44:         b = true;
  45:         b = false;
  46:         cb = null;
  47:     }
  48: }
  49: class ClassB {
  50:     public ClassC c;
  51:     private int i;
  52:     /* Ref(this) */
  53:     public ClassB cb() {
  54:         return this;
  55:     }
  56:     public int i() {
  57:         return this.i;
  58:     }
  59:     public int ii() {
  60:         return this.c.i;
  61:     }
  62: }
  63: class ClassC extends ClassB {
  64:     /* FieldDeclaration, Modifier*/
  65:     int i;
  66:     static int si;
  67:     public int pui;
  68:     public static int pusi;
  69:     private int pri;
  70:     private static int prsi;
  71:     /* MethodDeclaration, ParameterList
  72:         and Multiline comment */
  73:     int mi() {
  74:         int i = 1;
  75:         System.out.println(i);
  76:         return 0;
  77:     }
  78:     int mi1(int i) {
  79:         System.out.println(i);
  80:         return 0;
  81:     }
  82:     int mi2(int i, int j) {
  83:         System.out.println(i);
  84:         System.out.println(j);
  85:         return 0;
  86:     }
  87:     void mv() {
  88:         int i = 2;
  89:         System.out.println(i);
  90:     }
  91: }
  92:  

  AST

   1: class TestAll {
   2:     public static void main (String[] args) {
   3:         int i;
   4:         boolean b = false;
   5:         int [] ia = new int [3];
   6:         ClassB cb = new ClassB();
   7:         cb.c = new ClassC();
   8:         i = cb.c.mi();
   9:         i = cb.c.mi1(4);
  10:         i = cb.c.mi2(5, 6);
  11:         cb.c.mv();
  12:         {
  13:     int j;
  14: }
  15:         int k;
  16:         int l = 7;
  17:         i = 8;
  18:         ia[0] = 9;
  19:         if (b) {
  20: }
  21:         if (b) {
  22: }
  23: else {
  24: }
  25:         while (b) {
  26: }
  27:         System.out.println(i);
  28:         i = i;
  29:         i = -(i);
  30:         b = !b;
  31:         i = (i * i);
  32:         i = (i / i);
  33:         i = (i + i);
  34:         i = (i - i);
  35:         b = (i > i);
  36:         b = (i < i);
  37:         b = (i == i);
  38:         b = (i != i);
  39:         b = (i >= i);
  40:         b = (i <= i);
  41:         b = (b && b);
  42:         b = (b || b);
  43:         i = ia[1];
  44:         i = i;
  45:         i = 10;
  46:         b = true;
  47:         b = false;
  48:         cb = null;
  49:     }
  50: }
  51:  
  52: class ClassB {
  53:     public ClassC c;
  54:     private int i;
  55:     public ClassB cb () {
  56:         return this;
  57:     }
  58:     public int i () {
  59:         return this.i;
  60:     }
  61:     public int ii () {
  62:         return this.c.i;
  63:     }
  64: }
  65:  
  66: class ClassC extends ClassB {
  67:     int i;
  68:     static int si;
  69:     public int pui;
  70:     public static int pusi;
  71:     private int pri;
  72:     private static int prsi;
  73:     int mi () {
  74:         int i = 1;
  75:         System.out.println(i);
  76:         return 0;
  77:     }
  78:     int mi1 (int i) {
  79:         System.out.println(i);
  80:         return 0;
  81:     }
  82:     int mi2 (int i, int j) {
  83:         System.out.println(i);
  84:         System.out.println(j);
  85:         return 0;
  86:     }
  87:     void mv () {
  88:     int i = 2;
  89:     System.out.println(i);
  90: }
  91: }

  Symbol Table

   1: Name                            Kind                Key                             Type                Modifier
   2: i                               Variable            TestAll_2                       int                 
   3: b                               Variable            TestAll_2                       boolean             
   4: ia                              Variable            TestAll_2                       int []              
   5: ClassB                          new id()            TestAll_2                                           
   6: cb                              Variable            TestAll_2                       ClassB
   7: ClassC                          new id()            TestAll_2                                           
   8: j                               Variable            TestAll_3                       int                 
   9: k                               Variable            TestAll_2                       int                 
  10: l                               Variable            TestAll_2                       int                 
  11: args                            Parameter           TestAll_1                                           
  12: TestAll                         MainClass           TestAll_0                                           
  13: c                               Field               ClassB_1                        ClassCpublic               
  14: i                               Field               ClassB_1                        int                 private 
  15: cb                              Method              ClassB_1                        ClassBpublic               
  16: i                               Method              ClassB_1                        int                 public 
  17: ii                              Method              ClassB_1                        int                 public 
  18: ClassB                          Class               ClassB_0                                            
  19: i                               Field               ClassC_1                        int                 
  20: si                              Field               ClassC_1                        int                 static 
  21: pui                             Field               ClassC_1                        int                 public 
  22: pusi                            Field               ClassC_1                        int                 public static 
  23: pri                             Field               ClassC_1                        int                 private 
  24: prsi                            Field               ClassC_1                        int                 private static 
  25: i                               Variable            ClassC_2                        int                 
  26: mi                              Method              ClassC_1                        int                 
  27: i                               Parameter           ClassC_1                        int                 
  28: mi1                             Method              ClassC_1                        int                 
  29: j                               Parameter           ClassC_1                        int                 
  30: i                               Parameter           ClassC_1                        int                 
  31: mi2                             Method              ClassC_1                        int                 
  32: i                               Variable            ClassC_2                        int                 
  33: mv                              Method              ClassC_1                        void                
  34: ClassC                          Class               ClassC_0                                            

  README file

   1: CSE3023 Compiler 
   2: AST Construction (scanner & parer for miniJava)
   3:  
   4: This Document written in Korean and English.
   5: This Document's encode is UTF8.
   6:  
   7: Team member
   8:  
   9:  
  10:  
  11:  
  12: 1. build instructions
  13:     1. make
  14:         - make를 하게 되면 자동으로 parser 파일이 만들어집니다.
  15:         만들어진 parser 파일을 실행시키면 됩니다.
  16:         parser test00.java 와 같은 방식으로 뒤에 java source file 이름을 넣어주면 이를 parsing합니다.
  17:  
  18:  
  19: 2. test instructions and expected result
  20:     java_source 폴더 안에 test00.java ~ test25.java라는 minijava source file이 있습니다. 이 파일을 parsing하는데 편리함을 주고자 test라는 이름의 batch file을 만들었습니다. 따라서 단순히 ./test 라는 명령어로 손쉽게 java_source 폴더 안의 소스 파일을 parsing할 수 있습니다.
  21:     프로그램을 실행시키면 다음과 같은 문구가 나옵니다.
  22: --------------------------------------------------------------------
  23: Start Scanning and Parsing the ./java_source/test00.java
  24: Write the AST to the ./java_source/test00.java.ast
  25: Write the SymbolTable to the ./java_source/test00.java.syt
  26: End the program
  27: --------------------------------------------------------------------
  28:     이는 각각 파일을 실행시킨 후 출력한 다음 AST가 저장되어있는 test25.java.ast 파일을 만듭니다. 그리고 SymboleTable이 저장되어있는 test25.java.syt 파일을 만듭니다. 그리고나서 프로그램을 종료하기 전에 문구를 출력합니다.
  29:     AST를 보면 다음과 같습니다.
  30: --------------------------------------------------------------------
  31: class MainClass {
  32:     public static void main (String[] args) {
  33:         int i;
  34:         int j = (-(37) + 31);
  35:         int k = ((i * y) + 2);
  36:         int z = (i + (y * 2));
  37:     }
  38: }
  39:  
  40: class TestC {
  41:     int TestC_a;
  42:     public static void func (boolean qwer) {
  43:     int i = 332;
  44: }
  45:     public int funcd (int asdf) {
  46:         int j = 31234;
  47:         return k;
  48:     }
  49:     private void funcdh () {
  50:     if ((a == 3)) {
  51:     int i;
  52:     int k = 3;
  53: }
  54: }
  55:     static boolean qwoierjslakf (int aldfkj, boolean qwioersl) {
  56:         int a;
  57:         int b = 12;
  58:         while (12) {
  59:     int k = 24;
  60: }
  61:         return 1234;
  62:     }
  63: }
  64: --------------------------------------------------------------------
  65:     즉, 주석과 같은 것을 제거한 소스 파일을 깔끔하게 출력하여 제공합니다.
  66:     다음으로 SymbolTable을 보면 다음과 같습니다.
  67: -----------------------------------------------------------------------------------------------------------------------
  68: Name                            Kind                Key                             Type                Modifier
  69: i                               Variable            MainClass_2                     int                 
  70: j                               Variable            MainClass_2                     int                 
  71: k                               Variable            MainClass_2                     int                 
  72: z                               Variable            MainClass_2                     int                 
  73: args                            Parameter           MainClass_1                                         
  74: MainClass                       MainClass           MainClass_0                                         
  75: TestC_a                         Field               TestC_1                         int                 
  76: qwer                            Parameter           TestC_1                         boolean             
  77: i                               Variable            TestC_2                         int                 
  78: func                            Method              TestC_1                         void                public static 
  79: asdf                            Parameter           TestC_1                         int                 
  80: j                               Variable            TestC_2                         int                 
  81: funcd                           Method              TestC_1                         int                 public 
  82: i                               Variable            TestC_3                         int                 
  83: k                               Variable            TestC_3                         int                 
  84: funcdh                          Method              TestC_1                         void                private 
  85: qwioersl                        Parameter           TestC_1                         boolean             
  86: aldfkj                          Parameter           TestC_1                         int                 
  87: a                               Variable            TestC_2                         int                 
  88: b                               Variable            TestC_2                         int                 
  89: k                               Variable            TestC_3                         int                 
  90: qwoierjslakf                    Method              TestC_1                         boolean             static 
  91: TestC                           Class               TestC_0                                             
  92: -----------------------------------------------------------------------------------------------------------------------
  93:     Name : id의 이름.
  94:     Kind : Method, Parameter, Variable, new id(), Class, MainClass, Field.
  95:     Key : ClassName_Level 즉, 클래스이름_레벨로 구성되어 있습니다.
  96:     Type : int, boolean, int [], id. Grammar의 Type과 동일합니다.
  97:     Modifier : public, private, public static, private static. Grammar의 Modifier와 동일합니다.
  98:     
  99:  
 100: 3. optional requirements implemented
 101:     SymbolTable을 만들고 출력하기 위해 SymbolTable, SymbolTableEntry 클래스를 만들었습니다. SymbolTable 안에 SymbolTableEntry pointer를 element type으로 가지는 STL vector를 두어 table을 stack으로 구현하였습니다.
 102:  
 103:  
 104: 4. anything that TA should know
 105:     1. 기존에 제공된 template에서는 cout을 사용하였는데, 이를 fout이라는 변수를 만들어 파일에 저장하도록 전부 수정하였습니다. 그래고 이 변수는 extern으로 두어 global하게 처리하였습니다.
 106:     2. SymbolTable의 level을 계산할 때 minijava.l에서 {을 만나면 SymbolTable의 MIDPL_T_Start() 함수를, }을 만나면 SymbolTable의 MIDPR_T_End() 함수를 불러 level을 올리고 내리는 일을 처리하였습니다. {을 만날 때의 stack top을 기억하여 }을 만났을 때 기억된 top부터 현 top까지의 level을 설정하였습니다.
 107:     3. miniJava Grammar에서 Statement 중 System.out.println( Expression_integer ); 에서 Expression_integer의 의미를 이해하지 못해 일단 Expression으로 처리하였습니다. 덕분에 System.out.println("Hello World!"); 와 같은 코드는 처리할 수 없습니다.
 108:     4. 코드를 보면 new를 잔뜩하였지만 이를 delete하는 코드는 만들지 않았습니다. 이 점이 아쉽습니다.
 109:     5. 테스트 환경은 다음과 같습니다.
 110:         Fedora 10 2.6.27.21-170.2.56.fc10.i686, flex 2.5.35, bison (GNU Bison) 2.3, gcc (GCC) 4.3.2 20081105 (Red Hat 4.3.2-7)
 111:  
 112:  
 113: 5. 구현 방법
 114:     minijava.l 파일에서 토큰을 가져오게 하였습니다. flex는 위에서 아래로 하나씩 검토하기에 가장 많은 것을 가질 수 있는 id와 숫자는 제일 밑에 두었습니다. 여기서 {과 }을 만났을 때 SymbolTable의 level을 신경쓰도록 하였습니다. (자세한 내용은 4.2)
 115:     minijava.y 파일에서 parsing을 하였습니다. 제공된 minijava grammar에 맞게 만들었으며 *의 경우 Non-Terminal을 하나 더 두어 처리하였습니다. 각 rule에는 거기에 맞는 class를 new하여 새롭게 만든 후 그 pointer를 반환하였습니다. 최종적으로 Program 클래스는 global하게 두었습니다. 이렇게 하여 AST를 만들었습니다. 또한, id를 만나는 rule이 있다면 SymbolTableEntry를 만들어 SymbolTable에 하나씩 추가하였습니다.
 116:     최종적으로 AST와 SymbolTable의 출력을 위해 main 함수에서 이를 수행하였습니다.
 117:  
 118:  

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.