       ir.md - scc - simple c99 compiler
  HTML git clone git://git.simple-cc.org/scc
            1 # scc intermediate representation #
            2 
            3 The scc IR tries to be a simple and easily parseable intermediate
            4 representation, which makes it a bit terse and cryptic. The main
            5 characteristic of the IR is that all the types and operations are
            6 represented with only one letter, so parsing tables can be used
            7 to parse it.
            8 
            9 The language is composed of lines, representing statements.
           10 Each statement is composed of tab-separated fields.
           11 Declaration statements begin in column 0, expressions and
           12 control flow begin with a tabulator.
           13 When the frontend detects an error, it closes the output stream.
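
The line layout described above can be sketched in C as follows. This is only an illustration, not code from scc: the names `classify` and `split_fields` are hypothetical, and the sketch assumes that fields are separated by single tab characters and that empty fields can be ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_EMPTY, LINE_DECL, LINE_STMT };

/* A leading tab marks an expression or control-flow statement;
 * anything else starting in column 0 is treated as a declaration. */
enum line_kind
classify(const char *line)
{
        assert(line != NULL);
        if (line[0] == '\0' || line[0] == '\n')
                return LINE_EMPTY;
        return (line[0] == '\t') ? LINE_STMT : LINE_DECL;
}

/* Split a line in place at tab characters. Stores up to max field
 * pointers in fields[] and returns the number of fields stored.
 * Empty fields (such as the one before the leading tab of a
 * statement) are skipped, and a trailing newline is removed. */
size_t
split_fields(char *line, char *fields[], size_t max)
{
        size_t n = 0;
        char *p = line;

        line[strcspn(line, "\n")] = '\0';
        while (*p != '\0' && n < max) {
                size_t len = strcspn(p, "\t");

                if (len > 0)
                        fields[n++] = p;
                if (p[len] == '\0')
                        break;
                p[len] = '\0';
                p += len + 1;
        }
        return n;
}
```

Call `classify` before `split_fields`, because the split overwrites the tabs (including a leading one) with terminators.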
           14 
           15 ## Types ##
           16 
           17 Types are represented with uppercase letters (and the digit 0 for void):
           18 
           19 * C -- signed    8-Bit integer
           20 * I -- signed   16-Bit integer
           21 * W -- signed   32-Bit integer
           22 * Q -- signed   64-Bit integer
           23 * K -- unsigned  8-Bit integer
           24 * N -- unsigned 16-Bit integer
           25 * Z -- unsigned 32-Bit integer
           26 * O -- unsigned 64-Bit integer
           27 * 0 -- void
           28 * P -- pointer
           29 * F -- function
           30 * V -- vector
           31 * U -- union
           32 * S -- struct
           33 * B -- bool
           34 * J -- float
           35 * D -- double
           36 * H -- long double
           37 
           38 This list was built for the original Z80 backend, where 'int'
           39 has the same size as 'short'. Several types (S, F, V, U and others) need
           40 an identifier after the type letter to distinguish between the
           41 multiple structs, functions, vectors and unions (S1, V12 ...)
           42 that naturally occur in a C program.
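
Because every type is a single letter, a small lookup table is enough to decode it. The sketch below is an illustration rather than scc code: `struct typeinfo`, `int_type` and `type_id` are hypothetical names, only the eight integer types from the list are covered, and the identifier after S, F, V or U is assumed to be written in decimal.

```c
#include <assert.h>
#include <stddef.h>

struct typeinfo {
        char letter;    /* IR type letter */
        int bits;       /* width in bits */
        int is_signed;  /* 1 for signed, 0 for unsigned */
};

/* Integer types exactly as listed above. */
static const struct typeinfo inttypes[] = {
        {'C',  8, 1}, {'I', 16, 1}, {'W', 32, 1}, {'Q', 64, 1},
        {'K',  8, 0}, {'N', 16, 0}, {'Z', 32, 0}, {'O', 64, 0},
};

/* Return the entry for an integer type letter, or NULL if the
 * letter is not one of the eight integer types. */
const struct typeinfo *
int_type(char letter)
{
        size_t i;

        for (i = 0; i < sizeof(inttypes) / sizeof(inttypes[0]); i++) {
                if (inttypes[i].letter == letter)
                        return &inttypes[i];
        }
        return NULL;
}

/* Return the numeric identifier that follows a type letter
 * (12 for "V12"), or -1 if no decimal identifier follows.
 * Assumption: identifiers are decimal. */
long
type_id(const char *type)
{
        long id = 0;
        const char *p;

        assert(type != NULL);
        if (type[0] == '\0' || type[1] == '\0')
                return -1;
        for (p = type + 1; *p != '\0'; p++) {
                if (*p < '0' || *p > '9')
                        return -1;
                id = id * 10 + (*p - '0');
        }
        return id;
}
```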
           43 
           44 ## Storage classes ##
           45 
           46 The storage classes are represented using uppercase letters:
           47 
           48 * A -- automatic
           49 * R -- register
           50 * G -- public (global variable declared in the module)
           51 * X -- extern (global variable declared in another module)
           52 * Y -- private (variable in file-scope)
           53 * T -- local (static variable in function-scope)
           54 * M -- member (struct/union member)
           55 * L -- label
           56 
           57 ## Declarations/definitions ##
           58 
           59 Variable names are composed of a storage class and an identifier
           60 (e.g. A1, R2, T3).
           61 Declarations and definitions are composed of this variable
           62 name, a type and the name of the variable in the C source:
           63 
           64         A1        I        maxweight
           65         R2        C        flag
           66         A3        S4        statstruct
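
A declaration line of this shape can be taken apart with a few string operations. The following is a minimal sketch under stated assumptions, not the scc implementation: `struct decl` and `parse_decl` are hypothetical names, the buffer sizes are arbitrary, and the three fields are assumed to be separated by exactly one tab each.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct decl {
        char storage;   /* storage class letter, e.g. 'A' */
        char sym[16];   /* IR variable name, e.g. "A1" */
        char type[16];  /* type field, e.g. "I" or "S4" */
        char name[64];  /* name in the C source, e.g. "maxweight" */
};

/* Parse "SYM<tab>TYPE<tab>NAME". Returns 1 on success, and 0 if the
 * line does not have three fields, a field is empty or too long, or
 * the first letter is not a storage class from the list above. */
int
parse_decl(const char *line, struct decl *d)
{
        const char *t1, *t2;
        size_t n1, n2, n3;

        assert(line != NULL && d != NULL);
        t1 = strchr(line, '\t');
        t2 = (t1 != NULL) ? strchr(t1 + 1, '\t') : NULL;
        if (t2 == NULL)
                return 0;
        n1 = (size_t)(t1 - line);
        n2 = (size_t)(t2 - t1) - 1;
        n3 = strcspn(t2 + 1, "\t\n");
        if (n1 == 0 || n1 >= sizeof(d->sym) ||
            n2 == 0 || n2 >= sizeof(d->type) ||
            n3 == 0 || n3 >= sizeof(d->name))
                return 0;
        if (strchr("ARGXYTML", line[0]) == NULL)
                return 0;
        d->storage = line[0];
        memcpy(d->sym, line, n1);
        d->sym[n1] = '\0';
        memcpy(d->type, t1 + 1, n2);
        d->type[n2] = '\0';
        memcpy(d->name, t2 + 1, n3);
        d->name[n3] = '\0';
        return 1;
}
```

Type declarations such as `F5 ...` or `S2 ...` are rejected by this function because their first letter is a type, not a storage class, so a reader would have to handle them separately.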
           67 
           68 ### Type declarations ###
           69 
           70 Some declarations (e.g. structs) involve the declaration of member
           71 variables.
           72 Struct members are declared normally after the type declaration in
           73 parentheses.
           74 
           75 For example the struct declaration
           76 
           77         struct foo {
           78                 int i;
           79                 long c;
           80         } var1;
           81 
           82 generates
           83 
           84         S2      foo     (
           85         M3      I       i
           86         M4      W       c
           87         )
           88         G5      S2      var1
           89 
           90 ## Functions ##
           91 
           92 A function prototype
           93 
           94         int printf(char *cmd, int flag, void *data);
           95 
           96 will generate a type declaration and a variable declaration
           97 
           98         F5        P        I        P
           99         X1        F5        printf
          100 
          101 The first line gives the function-type specification 'F' with
          102 an identifier '5' and subsequently lists the types of the
          103 function parameters.
          104 The second line declares the 'printf' function as an extern
          105 variable ('X'), that is, one defined in another module.
          106 
          107 Analogously, a statically declared function in file scope
          108 
          109         static int printf(char *cmd, int flag, void *data);
          110 
          111 generates
          112 
          113         F5      P       I       P
          114         Y1      F5      printf
          115 
          116 Thus, the 'printf' variable went into private scope ('Y').
          117 
          118 A '{' in the first column starts the body of the previously
          119 declared function:
          120 
          121         int printf(char *cmd, int flag, void *data) {}
          122 
          123 generates
          124 
          125         F5      P       I       P
          126         G1      F5      printf
          127         {
          128         A2      P       cmd
          129         A3      I       flag
          130         A4      P       data
          131         -
          132         }
          133 
          134 The frontend must ensure that '{' appears only after the
          135 declaration of a function. The character '-' marks the separation
          136 between parameters and local variables:
          137 
          138         int printf(register char *cmd, int flag, void *data) { int i; }
          139 
          140 generates
          141 
          142         F5      P       I       P
          143         G1      F5      printf
          144         {
          145         R2      P       cmd
          146         A3      I       flag
          147         A4      P       data
          148         -
          149         A6      I       i
          150         }
          151 
          152 ### Expressions ###
          153 
          154 Expressions are emitted in reverse Polish notation, which simplifies
          155 parsing and conversion into a tree representation.
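
To illustrate why reverse Polish notation converts easily into a tree, here is a small stack-based sketch. It is not the scc backend parser: `struct node`, `rpn_to_tree` and the fixed-size node pool are hypothetical. It assumes that operands begin with a storage class letter, `#` or `"`, that `~`, `_`, `'` and `@` are unary, and that every other token is a binary operator. Casts, function calls and the ternary operator are therefore out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXNODES 64

struct node {
        const char *tok;            /* token text, owned by the caller */
        struct node *left, *right;  /* NULL for leaves */
};

/* Nodes come from a static pool that is reused on every call. */
static struct node pool[MAXNODES];

/* Operands begin with a storage class letter, '#' or '"'. */
static int
is_operand(const char *tok)
{
        return tok[0] != '\0' && strchr("ARGXYTML#\"", tok[0]) != NULL;
}

/* Operators assumed to take a single operand in this sketch. */
static int
is_unary(const char *tok)
{
        return tok[0] != '\0' && strchr("~_'@", tok[0]) != NULL;
}

/* Build a tree from n RPN tokens. Returns the root, or NULL if the
 * expression is malformed or has more than MAXNODES tokens. */
struct node *
rpn_to_tree(const char *toks[], size_t n)
{
        struct node *stack[MAXNODES];
        size_t sp = 0, used = 0, i;

        assert(toks != NULL);
        if (n > MAXNODES)
                return NULL;
        for (i = 0; i < n; i++) {
                struct node *nd = &pool[used++];

                nd->tok = toks[i];
                nd->left = nd->right = NULL;
                if (!is_operand(toks[i])) {
                        if (is_unary(toks[i])) {
                                if (sp < 1)
                                        return NULL;
                                nd->left = stack[--sp];
                        } else {
                                if (sp < 2)
                                        return NULL;
                                nd->right = stack[--sp];
                                nd->left = stack[--sp];
                        }
                }
                stack[sp++] = nd;
        }
        return (sp == 1) ? stack[0] : NULL;
}
```

For the token sequence `A2 A3 #I6 +I :I`, the root is `:I`, its left child is `A2`, and its right child is `+I` with the children `A3` and `#I6`.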
          156 
          157 #### Operators ####
          158 
          159 Operators allowed in expressions are:
          160 
          161 * \+ -- addition
          162 * \- -- subtraction
          163 * \* -- multiplication
          164 * % -- modulo
          165 * / -- division
          166 * l -- left shift
          167 * r -- right shift
          168 * < -- less than
          169 * > -- greater than
          170 * ] -- greater than or equal to
          171 * [ -- less than or equal to
          172 * = -- equal to
          173 * ! -- not equal to
          174 * & -- bitwise and
          175 * | -- bitwise or
          176 * ^ -- bitwise xor
          177 * ~ -- bitwise complement
          178 * : -- assignment
          179 * _ -- unary negation
          180 * c -- function call
          181 * p -- parameter
          182 * . -- field
          183 * , -- comma operator
          184 * ? -- ternary operator
          185 * ' -- take address
          186 * a -- logical shortcut and
          187 * o -- logical shortcut or
          188 * @ -- content of pointer
          189 
          190 Assignment has some suboperators:
          191 
          192 * :/ -- divide and assign
          193 * :% -- modulo and assign
          194 * :+ -- addition and assign
          195 * :- -- subtraction and assign
          196 * :l -- left shift and assign
          197 * :r -- right shift and assign
          198 * :& -- bitwise and and assign
          199 * :^ -- bitwise xor and assign
          200 * :| -- bitwise or and assign
          201 * :i -- post increment
          202 * :d -- post decrement
          203 
          204 Every operator in an expression is followed by a type descriptor
              (e.g. +I or :-I).
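
Since type letters are uppercase (or `0`) while the assignment suboperators are lowercase letters or symbols, an operator token can be split into its operator part and its type descriptor without ambiguity. The function below is a hypothetical sketch of that split (`split_op` is not an scc function); it assumes the operator part is one character, or two when it is `:` followed by one of the listed suboperators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split an operator token such as "+I" or ":-I" into the operator
 * part (copied into op, at most two characters plus terminator) and
 * the type descriptor (*type points into tok).
 * Returns 1 on success, 0 if tok is empty or has no type descriptor. */
int
split_op(const char *tok, char op[3], const char **type)
{
        size_t oplen = 1;

        assert(tok != NULL && op != NULL && type != NULL);
        if (tok[0] == '\0')
                return 0;
        /* ":" followed by a suboperator forms a two-character operator. */
        if (tok[0] == ':' && tok[1] != '\0' &&
            strchr("/%+-lr&^|id", tok[1]) != NULL)
                oplen = 2;
        if (tok[oplen] == '\0')
                return 0;
        memcpy(op, tok, oplen);
        op[oplen] = '\0';
        *type = tok + oplen;
        return 1;
}
```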
          205 
          206 #### Constants ####
          207 
          208 Constants are introduced with the character '#'. For instance, 10 is
          209 translated to #IA (all constants are emitted in hexadecimal),
          210 where I indicates that it is an integer constant.
          211 Strings are a special case because they are introduced with
          212 the " character, followed by their bytes in hexadecimal.
          213 The constant "hello" is emitted as "68656C6C6F. For example
          214 
          215         int
          216         main(void)
          217         {
          218                 int i, j;
          219 
          220                 i = j+2*3;
          221         }
          222 
          223 generates
          224 
          225         F1
          226         G1        F1        main
          227         {
          228         -
          229         A2      I        i
          230         A3      I        j
          231                 A2        A3        #I6        +I        :I
          232         }
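
The constant encoding described in this section can be decoded as sketched below. The helper names `parse_const` and `parse_string` are hypothetical. The sketch assumes an integer constant always has the shape `#`, one type letter, then hexadecimal digits, and that a string constant is `"` followed by two hexadecimal digits per byte. Overflow of very long constants is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int
hexval(char c)
{
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        return -1;
}

/* Decode an integer constant such as "#IA" into its type letter and
 * value. Returns 1 on success, 0 on a malformed token. */
int
parse_const(const char *tok, char *type, unsigned long long *val)
{
        unsigned long long v = 0;
        const char *p;

        assert(tok != NULL && type != NULL && val != NULL);
        if (tok[0] != '#' || tok[1] == '\0' || tok[2] == '\0')
                return 0;
        for (p = tok + 2; *p != '\0'; p++) {
                int d = hexval(*p);

                if (d < 0)
                        return 0;
                v = v * 16 + (unsigned)d;
        }
        *type = tok[1];
        *val = v;
        return 1;
}

/* Decode a string constant such as "\"68656C6C6F" into out, which
 * is always NUL-terminated on success. Returns the number of bytes
 * decoded, or -1 on a malformed token or if out is too small. */
long
parse_string(const char *tok, char *out, size_t outsz)
{
        size_t n = 0;
        const char *p;

        assert(tok != NULL && out != NULL);
        if (tok[0] != '"' || outsz == 0)
                return -1;
        for (p = tok + 1; p[0] != '\0'; p += 2) {
                int hi = hexval(p[0]);
                int lo = (p[1] != '\0') ? hexval(p[1]) : -1;

                if (hi < 0 || lo < 0 || n + 1 >= outsz)
                        return -1;
                out[n++] = (char)(hi * 16 + lo);
        }
        out[n] = '\0';
        return (long)n;
}
```

With these helpers, `#IA` decodes to type `I` with value 10, and `"68656C6C6F` decodes to the five bytes of `hello`.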
          233 
          234 Type casts are expressed with a tuple denoting the
          235 type conversion:
          236 
          237         int
          238         main(void)
          239         {
          240                 int i;
          241                 long j;
          242 
          243                 j = (long)i;
          244         }
          245 
          246 generates
          247 
          248         F1
          249         G1      F1      main
          250         {
          251         -
          252         A2      I       i
          253         A3      W       j
          254                 A2      A3      WI      :I
          255         }
          256 
          257 ### Statements ###
          258 #### Jumps ####
          259 
          260 Jumps have the following form:
          261 
          262         j        L#        [expression]
          263 
          264 The optional expression field indicates a condition which
          265 must be satisfied for the jump to be taken. Example:
          266 
          267         int
          268         main(void)
          269         {
          270                 int i;
          271 
          272                 goto    label;
          273         label:
          274                 i -= i;
          275         }
          276 
          277 generates
          278 
          279         F1
          280         G1      F1      main
          281         {
          282         -
          283         A2        I        i
          284                 j        L3
          285         L3
          286                 A2        A2        :-I
          287         }
          288 
          289 Another form of jump is the return statement, which uses the
          290 letter 'y' followed by a type identifier.
          291 Depending on the type, an optional expression follows.
          292 
          293         int
          294         main(void)
          295         {
          296                 return 16;
          297         }
          298 
          299 generates
          300 
          301         F1
          302         G1        F1        main
          303         {
          304         -
          305                 yI        #I10
          306         }
          307 
          308 
          309 #### Loops ####
          310 
          311 There are two special characters that are used to indicate
          312 to the backend that the following statements are part of
          313 a loop body.
          314 
          315 * b -- beginning of loop
          316 * e -- end of loop
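
The statement letters introduced so far (jump, return and the two loop markers) can be recognized from the first field of a statement line. The following sketch uses hypothetical names (`enum stmt_kind`, `classify_stmt`) and assumes that `j`, `b` and `e` stand alone in their field, while `y` is always followed by a type identifier.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind {
        ST_UNKNOWN,
        ST_JUMP,        /* j */
        ST_RETURN,      /* y followed by a type identifier */
        ST_LOOP_BEGIN,  /* b */
        ST_LOOP_END     /* e */
};

/* Classify the first field of a statement line, that is, the text
 * after the leading tab and before the next tab. Anything that is
 * not one of the four statements is reported as ST_UNKNOWN, which
 * includes ordinary expressions. */
enum stmt_kind
classify_stmt(const char *field)
{
        assert(field != NULL);
        switch (field[0]) {
        case 'j':
                return (field[1] == '\0') ? ST_JUMP : ST_UNKNOWN;
        case 'y':
                return (field[1] != '\0') ? ST_RETURN : ST_UNKNOWN;
        case 'b':
                return (field[1] == '\0') ? ST_LOOP_BEGIN : ST_UNKNOWN;
        case 'e':
                return (field[1] == '\0') ? ST_LOOP_END : ST_UNKNOWN;
        default:
                return ST_UNKNOWN;
        }
}
```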
          317 
          318 #### Switch statement ####
          319 
          320 Switches are represented using a table, in which the labels
          321 where to jump for each case are indicated. Common cases are
          322 represented with 'v' and default with 'f'.
          323 The switch statement itself is represented with 's' followed
          324 by the label where the jump table is located, and the
          325 expression of the switch:
          326 
          327         int
          328         func(int n)
          329         {
          330                 switch (n+1) {
          331                 case 1:
          332                 case 2:
          333                 case 3:
          334                 default:
          335                         ++n;
          336                 }
          337         }
          338 
          339 generates
          340 
          341         F2        I
          342         G1        F2        func
          343         {
          344         A1        I        n
          345         -
          346                 s        L4        A1        #I1        +I
          347         L5
          348         L6
          349         L7
          350         L8
          351                 A1        #I1        :+I
          352                 j        L3
          353         L4
          354                 t        #4
          355                 v        L7        #I3
          356                 v        L6        #I2
          357                 v        L5        #I1
          358                 f        L8
          359         L3
          360         }
          361 
          362 The beginning of the jump table is indicated by the letter 't',
          363 followed by the number of cases (including the default case) of the
          364 switch.
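
What a backend has to do with such a table can be sketched as a simple lookup: compare the switch value against each 'v' entry and fall back to the 'f' entry. This is an illustration only. `struct swcase`, `struct swtable` and `switch_target` are hypothetical names, labels are stored as plain numbers, and the table is assumed to always contain a default entry.

```c
#include <assert.h>
#include <stddef.h>

struct swcase {
        long value;     /* case value from a 'v' entry */
        int label;      /* label number from the same entry */
};

struct swtable {
        const struct swcase *cases;
        size_t ncases;  /* number of 'v' entries (count after 't' minus default) */
        int deflabel;   /* label number from the 'f' entry */
};

/* Return the label to jump to for the given switch value: the label
 * of the matching case, or the default label if no case matches. */
int
switch_target(const struct swtable *t, long value)
{
        size_t i;

        assert(t != NULL);
        for (i = 0; i < t->ncases; i++) {
                if (t->cases[i].value == value)
                        return t->cases[i].label;
        }
        return t->deflabel;
}
```

For the table in the example above, the entries would be `{3, 7}`, `{2, 6}` and `{1, 5}` with default label 8, so a switch value of 2 selects L6 and any unlisted value selects L8.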
          365 
          366 ## Summary ##
          367 
          368 * C -- signed    8-Bit integer
          369 * I -- signed   16-Bit integer
          370 * W -- signed   32-Bit integer
          371 * Q -- signed   64-Bit integer
          372 * K -- unsigned  8-Bit integer
          373 * N -- unsigned 16-Bit integer
          374 * Z -- unsigned 32-Bit integer
          375 * O -- unsigned 64-Bit integer
          376 * 0 -- void
          377 * P -- pointer
          378 * F -- function
          379 * V -- vector
          380 * U -- union
          381 * S -- struct
          382 * B -- bool
          383 * J -- float
          384 * D -- double
          385 * H -- long double
          386 * A -- automatic
          387 * R -- register
          388 * G -- public (global variable declared in the module)
          389 * X -- extern (global variable declared in another module)
          390 * Y -- private (variable in file-scope)
          391 * T -- local (static variable in function-scope)
          392 * M -- member (struct/union member)
          393 * L -- label
          394 * { -- beginning of function body
          395 * } -- end of function body
          396 * \- -- end of function parameters
          397 * \+ -- addition
          398 * \- -- subtraction
          399 * \* -- multiplication
          400 * % -- modulo
          401 * / -- division
          402 * l -- left shift
          403 * r -- right shift
          404 * < -- less than
          405 * > -- greater than
          406 * ] -- greater than or equal to
          407 * [ -- less than or equal to
          408 * = -- equal to
          409 * ! -- not equal to
          410 * & -- bitwise and
          411 * | -- bitwise or
          412 * ^ -- bitwise xor
          413 * ~ -- bitwise complement
          414 * : -- assignment
          415 * _ -- unary negation
          416 * c -- function call
          417 * p -- parameter
          418 * . -- field
          419 * , -- comma operator
          420 * ? -- ternary operator
          421 * ' -- take address
          422 * a -- logical shortcut and
          423 * o -- logical shortcut or
          424 * @ -- content of pointer
          425 * :/ -- divide and assign
          426 * :% -- modulo and assign
          427 * :+ -- addition and assign
          428 * :- -- subtraction and assign
          429 * :l -- left shift and assign
          430 * :r -- right shift and assign
          431 * :& -- bitwise and and assign
          432 * :^ -- bitwise xor and assign
          433 * :| -- bitwise or and assign
          434 * :i -- post increment
          435 * :d -- post decrement
          436 * j -- jump
          437 * y -- return
          438 * b -- beginning of loop
          439 * e -- end of loop
          440 * s -- switch statement
          441 * t -- switch table
          442 * v -- case entry in switch table
          443 * f -- default entry in switch table