Symbol Tables and Hash Functions
Every compiler stores data in different symbol tables. Symbol table is a special data structure that is designed for easy insertion of new symbol with associated data and quick search through all elements in the table.
Turbo Pascal uses many symbol tables. Main symbol table stores all identifiers with their associated data while other tables store data created and needed during compilation process. Unit file (tpu
extension) is a collection of all symol tables that are created after unit source file compilation.
Type TSymbolTable = (
stMain,
stProcedures,
stCodeBlocks,
stTypedConstantsBlocks,
stVariablesBlocks,
st5,
stReferencedModules,
stUsedFiles,
stSourceLines,
stSourceLineCodeOffsets,
stTemporary,
st11,
stCode,
stTypedConstants,
stCodeReferences,
stTypedConstantsReferences,
st16,
st17,
st18,
stIntermediateCode);
Identifiers are stored in one-way linked lists. Each list item (identifier with data) has a link to the next item. This link is relative offset so the link does not depend on the table address. To speed up the search process hash function is used. For each identifier a hash value based on the case insensitive name characters is calculated:
Procedure CopyStringToCurrentIdentifier (Const Str: String);
Var N: Byte;
begin
CurrentIdentifier := Str;
CurrentIdentifierHash := - Length (CurrentIdentifier);
For N := 1 to Length (CurrentIdentifier) do
Inc (CurrentIdentifierHash, Byte (CurrentIdentifier [N]) and $DF);
CurrentIdentifierHash := CurrentIdentifierHash shl 1;
end;
Last few bits of this hash value (symbol table Mask parameter) are used to define one linked list for this identifier. Instead of searching through all identifiers, only one shorter list with identifiers that have particular hash value has to be searched. Therefore, each identifier symbol table consists of many linked lists.
Type
PIdentifierList = ^TIdentifierList;
TIdentifierList = Record
Mask: Word;
Offset: Array [0..255] of Word;
end;
PIdentifier = ^TIdentifier;
TIdentifier = Record
Next: Word;
Token: TToken;
Name: Str64Rec;
end;
Each identifier has some additional data according to its type which immediately follows identifier name in the symbol table:
Type
TUnitIdentifierFlags = (UsedInInterface);
TUnitIdentifierFlagsSet = Set of TUnitIdentifierFlags;
PUnitIdentifierData = ^TUnitIdentifierData;
TUnitIdentifierData = Record
UnitSegment: Word;
PublicIdentifiersChecksum: Word;
NextUnitIdentifier: Word;
CurrentUsedUnitIdentifier: Word;
UnitIdentifierFlags: TUnitIdentifierFlagsSet;
end;
PConstantIdentifierData = ^TConstantIdentifierData;
TConstantIdentifierData = Record
UnitTypeOffsets: TUnitOffsets;
Case Byte of
0: (Value: Byte); { Constant value (1..n bytes }
1: (OrdinalValue, W2, W4, W6: Word); { Enumerated item }
end;
PVariableIdentifierData = ^TVariableIdentifierData;
TVariableIdentifierData = Record
Flags: TVarFlagsSet;
Case Byte of
0: (W1: PtrRec; W5: Word; UnitTypeOffsets: TUnitOffsets;); { Field: Ofs = Offset of field }
1: (AbsoluteVarDataOffsets: TUnitOffsets);
end;
PTypeIdentifierData = ^TTypeIdentifierData;
TTypeIdentifierData = Record
UnitTypeOffsets: TUnitOffsets;
end;
TProcedureFlags = (pfFar, pfInline, pfInterrupt, pfExternal, pfMethod, pfConstructor, pfDestructor,
pfAssembler, pf0100, pf0200, pfDynamic, pfStackFrame);
TProcedureFlagsSet = Set of TProcedureFlags;
PProcedureIdentifierData = ^TProcedureIdentifierData;
TProcedureIdentifierData = Record
Flags: TProcedureFlagsSet;
ProceduresRecordOffset: Word; { INLINE CodeSize }
OuterBlockProcedureIdentifier: Word; { also ancestor method, ancestor object TypeDef }
LocalIdentifiersList: Word;
W8: Word; { Method VMT Table Offset, Dynamic Method Index }
ProcedureTypeDefinition: TProcedureTypeDefinition;
end;
PProcedureParameterData = ^TProcedureParameterData;
TProcedureParameterData = Record
UnitTypeOffsets: TUnitOffsets;
VarFlags: TVarFlagsSet;
end;
PLabelIdentifierData = ^TLabelIdentifierData;
TLabelIdentifierData = Record
LabelRecordOffset: Word;
end;
PSystemProcedureIdentifierData = ^TSystemProcedureIdentifierData;
TSystemProcedureIdentifierData = Record
ProcData: Word;
end;