PDF file grammar

A PDF file is analyzed at two levels: a syntactic level, which defines the structure of the file, and a lexical level, which defines its tokens.

The syntactic level

The grammar of this level is given in YACC format:

file : protocol_list
protocol_list : pheader ptrailer
              | protocol_list pheader ptrailer
pheader : PROTOCOL name structs ENDPR
        | PROTOCOL name ENDPR
skip : SKIP ':' expression
ptrailer : skip next_protocol
         | next_protocol
next_protocol : ELSE name
              | n_protocol ELSE name
n_protocol : CASE expression ':' name
           | n_protocol CASE expression ':' name
structs : structs def bit_defs
        | def bit_defs
bit_defs : '{' b_defs '}'
         | /* empty */
b_defs : BIT name '(' number ',' number ')'
       | b_defs BIT name '(' number ',' number ')'
def : BYTE name
    | WORD name
    | DWORD name
name : NAME
number : NUMBER
       | NUMBER_H
expression : name
           | number
           | '(' expression ')'
           | expression '+' expression
           | expression NOT_EQ expression
           | expression MIU expression
           | expression MAU expression
           | expression '<' expression
           | expression '>' expression
           | expression '=' expression
           | expression AND expression
           | expression OR expression
           | expression '-' expression
           | expression '*' expression
           | expression '/' expression
           | expression '%' expression
           | '-' expression
           | NOT expression
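
The overall shape described by this grammar (a protocol header, an optional SKIP clause, zero or more CASE branches and a mandatory ELSE) can be illustrated with a small recognizer. The following is only a sketch in Python, not the parser generated from the grammar: it assumes every token is separated by whitespace, treats each expression as an opaque run of tokens, and does not check that names are valid identifiers. The sample description and all names in it (demo, version, kind, next_a and so on) are hypothetical.

```python
# Hypothetical sample description; every token is separated by whitespace
# so that str.split() can stand in for the real scanner.
SAMPLE = """
PROTOCOL demo
    BYTE version { BIT high ( 4 , 4 ) BIT low ( 0 , 4 ) }
    WORD kind
ENDPR
SKIP : ( low * 4 )
CASE kind = 0x0800 : next_a
CASE kind = 0x0806 : next_b
ELSE unknown
"""


def parse_pdf(text):
    """Recognize the top-level structure of the grammar; expressions stay opaque."""
    toks = text.split()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(value=None):
        nonlocal pos
        tok = peek()
        if tok is None or (value is not None and tok != value):
            raise SyntaxError(f"expected {value or 'a token'}, found {tok!r}")
        pos += 1
        return tok

    def expression(stop):
        # Collect tokens up to (not including) the first token listed in `stop`.
        parts = []
        while peek() is not None and peek() not in stop:
            parts.append(expect())
        if not parts:
            raise SyntaxError("empty expression")
        return " ".join(parts)

    protocols = []
    while peek() is not None:
        # pheader : PROTOCOL name [structs] ENDPR
        expect("PROTOCOL")
        proto = {"name": expect(), "fields": [], "skip": None, "cases": []}
        while peek() in ("BYTE", "WORD", "DWORD"):
            field = {"size": expect(), "name": expect(), "bits": []}
            if peek() == "{":
                # bit_defs : '{' b_defs '}' with at least one BIT entry
                expect("{")
                while peek() in ("BIT", "BITS"):  # the scanner maps both to BIT
                    expect()
                    bit_name = expect()
                    expect("(")
                    first = expect()
                    expect(",")
                    second = expect()
                    expect(")")
                    field["bits"].append((bit_name, first, second))
                if not field["bits"]:
                    raise SyntaxError("expected at least one BIT definition")
                expect("}")
            proto["fields"].append(field)
        expect("ENDPR")
        # ptrailer : [skip] next_protocol
        if peek() == "SKIP":
            expect("SKIP")
            expect(":")
            proto["skip"] = expression(stop=("CASE", "ELSE"))
        while peek() == "CASE":
            expect("CASE")
            condition = expression(stop=(":",))
            expect(":")
            proto["cases"].append((condition, expect()))
        expect("ELSE")
        proto["default"] = expect()
        protocols.append(proto)
    if not protocols:
        raise SyntaxError("a file must contain at least one protocol")
    return protocols


if __name__ == "__main__":
    for proto in parse_pdf(SAMPLE):
        print(proto["name"], proto["skip"], proto["cases"], proto["default"])
```

Note that the ELSE branch is mandatory in every trailer, while SKIP and the CASE branches are optional; the sketch rejects a description that ends right after ENDPR.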

 

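Operator precedence in expressions, listed from lowest to highest:

Operators                          Associativity
OR                                 left
AND                                left
'=', NOT_EQ, '<', '>', MIU, MAU    left
'+', '-'                           left
'*', '/', '%'                      left
NOT, MENO_UNARIO                   right

MIU and MAU are the tokens for '<=' and '>='. MENO_UNARIO (Italian for "unary minus") is the precedence level of the '-' expression rule.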
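
To see how these precedence levels group an expression, the following Python sketch parses an expression with precedence climbing and prints it fully parenthesized. It is an illustration only and does not reproduce the YACC-generated parser: it uses the surface spellings ('<=', '>=', '!=') instead of the MIU, MAU and NOT_EQ token names, and the helper names (tokenize, parse, BINARY_PREC) are made up for this example.

```python
import re

# Binary operator levels, lowest (1) to highest (5), following the table above.
# Every binary operator is left-associative.
BINARY_PREC = {
    "OR": 1,
    "AND": 2,
    "=": 3, "!=": 3, "<": 3, ">": 3, "<=": 3, ">=": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5, "%": 5,
}

# Numbers, names (including AND, OR, NOT) and operators, preceded by optional blanks.
TOKEN_RE = re.compile(
    r"\s*(0[xX][0-9a-fA-F]+|[0-9]+|[A-Za-z_][A-Za-z0-9_]*|!=|<=|>=|[-+*/%()=<>])"
)


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Return the expression as a fully parenthesized string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_unary():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        if tok in ("NOT", "-"):
            # Unary operators bind tighter than any binary operator,
            # so their operand is another unary or primary expression.
            advance()
            return f"({tok} {parse_unary()})"
        if tok == "(":
            advance()
            inner = parse_binary(1)
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return inner
        if tok == ")" or tok in BINARY_PREC:
            raise ValueError(f"unexpected token {tok!r}")
        return advance()  # a name or a number

    def parse_binary(min_prec):
        left = parse_unary()
        while peek() in BINARY_PREC and BINARY_PREC[peek()] >= min_prec:
            op = advance()
            # Left associativity: the right operand may only contain
            # operators of strictly higher precedence.
            right = parse_binary(BINARY_PREC[op] + 1)
            left = f"({left} {op} {right})"
        return left

    result = parse_binary(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


if __name__ == "__main__":
    print(parse(tokenize("a + b * c")))         # (a + (b * c))
    print(parse(tokenize("a - b - c")))         # ((a - b) - c)
    print(parse(tokenize("NOT a AND b OR c")))  # (((NOT a) AND b) OR c)
```

Because all comparison operators share one left-associative level, an input such as a < b > c groups as ((a < b) > c).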

 

The lexical level

The scanner recognizes the tokens of the file as follows, in LEX format:

SPACE ([\t\r\n ])
SPACES ({SPACE}*)
LETTER ([a-zA-Z])
DIGIT ([0-9])
F_DIGIT ([1-9])
DECIMAL ({DIGIT}*{F_DIGIT}|0)
HEX ((("0x")|("0X"))([a-fA-F]|{DIGIT})+)
INTEGER ({F_DIGIT}{DIGIT}*)
VAR_NAME ((("_")|{LETTER})({LETTER}|{DIGIT}|("_"))*)
CONST_INT_SHORT ({INTEGER}|[0])
OP ([-+=*/%():{},<>])
VAR ({VAR_NAME})
COMMENT (("/*"([^*]|"*"+[^*/])*"*"+\/)|("//".*))
%%
BYTE {return BYTE;}
BIT {return BIT;}
BITS {return BIT;}
WORD {return WORD;}
DWORD {return DWORD;}
AND {return AND;}
"&&" {return AND;}
"&" {return AND;}
OR {return OR;}
"||" {return OR;}
"|" {return OR;}
NOT {return NOT;}
"!" {return NOT;}
"~" {return NOT;}
PROTOCOL {return PROTOCOL;}
ENDPR {return ENDPR;}
SKIP {return SKIP;}
CASE {return CASE;}
ELSE {return ELSE;}
"!=" {return NOT_EQ;}
"<=" {return MIU;}
">=" {return MAU;}
"==" {return '=';}
{HEX} {return NUMBER_H;}
{CONST_INT_SHORT} {return NUMBER;}
{COMMENT} {/* Discards comments */ }
{OP} {return yytext[0];}
{VAR} {return NAME;}
{SPACE}+ {/*ignores spaces*/}
. {printf("Scanner error at line %d\n",yylineno);return (-1);}
%%

Several operators accept alternative spellings:

'AND' can be written: 'AND', '&&', '&'
'OR' can be written: 'OR', '||', '|'
'NOT' can be written: 'NOT', '!', '~'
'=' can be written: '==', '='
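
The scanner rules and spellings listed above can be approximated with a short Python tokenizer, which may help when experimenting with descriptions outside of LEX. This is a sketch under assumptions rather than a port of the scanner: the names TOKEN_RE, KEYWORDS, SYMBOLS and tokenize are invented for the example, the comment pattern is the usual /* ... */ and // forms, and '==', '<' and '>' are accepted because the grammar and the spelling notes call for them.

```python
import re

# Keywords win over NAME only on an exact match, as in LEX, where the
# keyword rules come before the {VAR} rule and the longest match is preferred.
KEYWORDS = {
    "BYTE": "BYTE", "BIT": "BIT", "BITS": "BIT", "WORD": "WORD",
    "DWORD": "DWORD", "AND": "AND", "OR": "OR", "NOT": "NOT",
    "PROTOCOL": "PROTOCOL", "ENDPR": "ENDPR", "SKIP": "SKIP",
    "CASE": "CASE", "ELSE": "ELSE",
}

# Alternative spellings; any symbol not listed here is returned as itself.
SYMBOLS = {
    "&&": "AND", "&": "AND", "||": "OR", "|": "OR", "!": "NOT", "~": "NOT",
    "!=": "NOT_EQ", "<=": "MIU", ">=": "MAU", "==": "=",
}

TOKEN_RE = re.compile(
    r"""
    (?P<IGNORED>  [ \t\r\n]+ | /\*.*?\*/ | //[^\n]* )
  | (?P<NUMBER_H> 0[xX][0-9a-fA-F]+ )
  | (?P<NUMBER>   [1-9][0-9]* | 0 )
  | (?P<WORD>     [A-Za-z_][A-Za-z0-9_]* )
  | (?P<SYMBOL>   && | \|\| | != | <= | >= | == | [-+=*/%():{},<>&|!~] )
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Return a list of (token, lexeme) pairs; blanks and comments are dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # Mirrors the catch-all rule of the scanner, which reports the line.
            line = text.count("\n", 0, pos) + 1
            raise ValueError(f"Scanner error at line {line}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "IGNORED":
            continue
        if kind == "WORD":
            tokens.append((KEYWORDS.get(lexeme, "NAME"), lexeme))
        elif kind == "SYMBOL":
            tokens.append((SYMBOLS.get(lexeme, lexeme), lexeme))
        else:
            tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    for token in tokenize("CASE kind == 0x0800 : next_a // hypothetical names"):
        print(token)
```

A word such as BYTES is longer than any keyword and is therefore returned as NAME, which matches the longest-match behaviour of LEX.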