Universal Semantic Graph

A language-agnostic graph representation that captures the pure semantics of programs beyond syntactic diversity

Applied to and operating in the Lena Code platform.

Formal Definition

Mathematically rigorous structure and components of USG

Graph Structure

USG is a directed graph defined as a 5-tuple. Each component represents the core semantic structure of programs.

G = (N, E_c, E_d, η, ι)

Node Set (N)

Finite set of representative atomic motifs (versioned catalog, not fixed)

Control Flow (E_c)

Directed edges representing program execution order

Data Dependencies (E_d)

Edges representing definition-use relationships of values

Type Function (η)

Maps each node to a semantic atom type

Semantic Atoms

Rep. atomic motifs (versioned) to express fundamental computation units

Major Atom Categories

  • Control Flow: Sequence, Conditional, Loop, Call
  • Data Manipulation: Binding, Assignment, Access, Literal
  • Operations: Arithmetic, Logical, Comparison, Type
  • Structure: Module, Function, Class, Interface
TOPAZ
1// Conceptual sketch of representative atom motifs
2let representativeAtoms = [
3    { kind: "Sequence", category: "control" },
4    { kind: "Conditional", category: "control" },
5    { kind: "Binding", category: "data" },
6    { kind: "Literal", category: "data" },
7    { kind: "BinaryOp", category: "operation" }
8]
9
10// Usage example
11let node = {
12    kind: "Conditional",
13    condition: comparisonNode,
14    thenBranch: executionNode1,
15    elseBranch: Some(executionNode2)
16}

Graph Properties

Mathematical properties and semantic guarantees of USG

Completeness

Designed to capture broad patterns with representative atoms (not a formal completeness claim)

Canonicality

Strives for consistent canonicalization for same-semantics programs (excluding boundary cases)

Minimality

Eliminates unnecessary redundancy to include only essential program semantics

Type Safety

Aims to preserve and use type hints/annotations.

Effect Label Lattice

Example relation: pure ⊑ read ⊑ write ⊑ io ⊑ global

Construction Process

Systematic transformation process from source code to USG

Transformation Steps

  • Phase 1: Language-specific parsing and AST generation
  • Phase 2: Semantic analysis and type inference
  • Phase 3: Translation to semantic atoms
  • Phase 4: Control/data flow graph construction
TOPAZ
1// USG construction process example
2function constructUSG(sourceCode: string, language: string) -> Result<USG, Error> {
3    // 1. Language-specific parsing
4    let ast = parseLanguageSpecific(sourceCode, language)?
5    
6    // 2. Semantic analysis
7    let semanticInfo = analyzeSemantics(ast)?
8    
9    // 3. Atom transformation
10    let atoms = transformToAtoms(semanticInfo)?
11    
12    // 4. Graph construction
13    let controlFlow = buildControlFlow(atoms)
14    let dataFlow = buildDataFlow(atoms)
15    
16    return Ok(buildUSG(atoms, controlFlow, dataFlow))
17}
18
19// Call example: any language normalizes to the same USG
20let source = """
21    def factorial(n):
22        return 1 if n <= 1 else n * factorial(n - 1)
23    """
24
25match constructUSG(source, "python") {
26    case Ok(usg) => print("USG node count: {usg.nodeCount()}")
27    case Err(error) => print("Normalization error: {error}")
28}

Applications

Various application domains leveraging USG's language-agnostic nature

Cross-language Translation

Use the USG graph projection for semantic-preserving cross-language analysis and reconstruction

Code Analysis

Development of language-agnostic static analysis, optimization, and bug detection tools

Program Verification

Mathematical foundation for behavior-gated verification and correctness review

Core Components

Relationship between USG and other core components of Core Semantic Kernel