Architecture¶
IRx is organized as a small compiler pipeline with a deliberate boundary between semantic meaning and backend-specific lowering. The goal is to keep the codebase easy to extend without letting semantic rules slowly drift into code generation.
Design Goals¶
The current architecture is shaped by a few practical goals:
- Keep parsing, semantic analysis, and code generation as distinct phases.
- Make semantic analysis the authority for meaning and program validity.
- Keep backend packages focused on emission, not interpretation.
- Preserve method-based multiple dispatch for visitor-driven lowering.
- Use package structure to communicate architecture instead of large utility modules or generic helpers/ folders.
Pipeline Overview¶
IRx currently follows this high-level flow:
ASTx parser output -> semantic analysis -> resolved semantic sidecars -> backend code generation
The parser produces raw ASTx nodes. Those nodes are still close to surface
syntax and may not yet have enough information for direct lowering. The
semantic-analysis phase walks that tree, resolves symbols and types, validates
program rules, and attaches a structured node.semantic sidecar to the nodes
that backend code needs.
By the time a backend starts lowering, it should not need to infer meaning from raw syntax or re-run language validation from scratch.
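The division of labor described above can be sketched in a few lines. This is a minimal illustration only: `LiteralNode`, `SemanticSidecar`, `analyze_literal`, and `lower_literal` are hypothetical stand-ins rather than IRx or ASTx classes, and the typing rule is deliberately trivial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemanticSidecar:
    """Hypothetical stand-in for the data stored in node.semantic."""

    resolved_type: str


@dataclass
class LiteralNode:
    """Hypothetical stand-in for a raw ASTx literal node."""

    text: str
    semantic: Optional[SemanticSidecar] = None


def analyze_literal(node: LiteralNode) -> LiteralNode:
    # Semantic phase: decide the meaning once and record it on the node.
    is_integer = node.text.lstrip("-").isdigit()
    node.semantic = SemanticSidecar("Int32" if is_integer else "String")
    return node


def lower_literal(node: LiteralNode) -> str:
    # Backend phase: read the sidecar instead of re-inspecting raw syntax.
    if node.semantic is None:
        raise RuntimeError("node reached lowering without semantic analysis")
    return f"const {node.semantic.resolved_type} {node.text}"


literal = analyze_literal(LiteralNode("42"))
lowered = lower_literal(literal)
```

The backend function never looks at the literal text to decide its type; it only formats what the semantic phase already resolved.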
Semantic Analysis¶
The semantic-analysis package lives in packages/irx/src/irx/analysis/ and is
intentionally independent from LLVM or llvmlite.
It is responsible for:
- symbol resolution
- lexical scope tracking
- mutability and assignment validation
- function and return validation
- loop-control legality such as break and continue
- expression typing and promotion policy
- operator normalization
- semantic flag normalization such as unsigned and fast-math intent
- diagnostics collection and semantic error reporting
The public entry points are:
- irx.analysis.analyze(node)
- irx.analysis.analyze_module(module)
- irx.analysis.analyze_modules(root, resolver)
These entry points return the same AST root after attaching semantic sidecars.
If semantic validation fails, analysis raises SemanticError before codegen
begins.
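The contract of those entry points (return the same root, or raise before codegen) might look roughly like the sketch below. It does not import IRx: the `SemanticError`, `Module`, `Assign`, and `analyze_module` names here are local stand-ins, and the single validation rule is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


class SemanticError(Exception):
    """Local stand-in; not the real IRx exception class."""

    def __init__(self, diagnostics):
        super().__init__("; ".join(diagnostics))
        self.diagnostics = list(diagnostics)


@dataclass
class Assign:
    name: str
    semantic: dict = field(default_factory=dict)


@dataclass
class Module:
    constants: set
    body: list


def analyze_module(module: Module) -> Module:
    # Collect every diagnostic first, then fail once before codegen starts.
    diagnostics = []
    for stmt in module.body:
        if stmt.name in module.constants:
            diagnostics.append(f"cannot assign to constant '{stmt.name}'")
        else:
            stmt.semantic["mutable_target"] = True
    if diagnostics:
        raise SemanticError(diagnostics)
    # Success returns the same root object, now carrying sidecars.
    return module


ok = Module(constants={"PI"}, body=[Assign("x")])
result = analyze_module(ok)

bad = Module(constants={"PI"}, body=[Assign("PI"), Assign("x")])
try:
    analyze_module(bad)
    failed = False
except SemanticError as error:
    failed = True
    messages = error.diagnostics
```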
Semantic Contract¶
The host-facing semantic boundary is now explicit in code through
irx.analysis.get_semantic_contract(). That contract names the stable semantic
phases, the SemanticInfo and CompilationSession metadata that must exist
before codegen, and the boundary between semantic, lowering, and linking/runtime
failures.
See Semantic Contract for the concise contract summary.
Why sidecars instead of a separate HIR?¶
For the current size of IRx, attaching explicit semantic sidecars to AST nodes is the lightest approach that still creates a clean boundary. It gives codegen resolved information without introducing a second full tree structure before it is needed.
If the language grows to the point where a true HIR becomes useful, the current phase split still leaves room for that evolution.
Multi-Module Boundary¶
IRx now also supports a parser-agnostic multi-module path for imports.
The boundary is explicit:
- the host compiler parses source text into astx.Module objects
- the host compiler decides how an import specifier maps to a module
- IRx receives ParsedModule objects plus an ImportResolver
- IRx expands the reachable dependency graph, performs cross-module semantic analysis, and lowers the reachable graph into one LLVM module for the MVP
Import-from resolution remains symbol-first, but it also supports child-module
namespace sugar: import stats from sciarx may bind sciarx.stats as a local
module namespace when sciarx does not already expose an importable symbol
named stats.
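A minimal sketch of that symbol-first rule, assuming the exported symbols and known module names are available as plain sets. The function name and data shapes are hypothetical and do not reflect how IRx stores bindings.

```python
def resolve_import_from(package, name, exported_symbols, known_modules):
    """Return ("symbol", qualified_name) or ("module", qualified_name)."""
    # Symbol-first: an importable symbol always wins over a child module.
    if name in exported_symbols.get(package, set()):
        return ("symbol", f"{package}.{name}")
    # Namespace sugar: fall back to binding the child module as a namespace.
    child = f"{package}.{name}"
    if child in known_modules:
        return ("module", child)
    raise LookupError(
        f"'{package}' exposes no symbol or child module named '{name}'"
    )


known_modules = {"sciarx", "sciarx.stats"}

# sciarx does not export a symbol named stats, so the child module is bound.
as_namespace = resolve_import_from(
    "sciarx", "stats", {"sciarx": {"mean"}}, known_modules
)

# If sciarx did export a symbol named stats, the symbol would be chosen.
as_symbol = resolve_import_from(
    "sciarx", "stats", {"sciarx": {"stats"}}, known_modules
)
```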
IRx does not parse source text, search the filesystem, or implement package discovery. Those responsibilities stay outside the library.
Template Specialization Metadata¶
IRx also carries semantic-only template metadata for compile-time specialization. The current scope is bounded template functions and methods.
Semantic analysis preserves:
- template parameters attached to callable definitions
- finite union bounds used as specialization domains
- unresolved template type variables inside generic signatures
- explicit template arguments attached to call sites
- stable specialization identities and generated concrete callables
During analysis, template bodies are validated over every admissible bound substitution. Successful specializations are materialized as generated concrete functions so backend lowering can continue to operate mostly on ordinary non-template callables. That generated specialization set is treated as per-analysis state and is cleared before rerunning semantic analysis on the same AST module.
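Enumerating every admissible substitution over finite union bounds is a Cartesian product. The sketch below shows only that enumeration; the name-mangling scheme and the `enumerate_specializations` helper are assumptions made for illustration, not the identities IRx generates.

```python
from itertools import product


def enumerate_specializations(function_name, bounds):
    """List (specialization_name, substitution) for every bound combination.

    bounds maps each template parameter to its finite tuple of admissible
    types, in declaration order.
    """
    parameters = list(bounds)
    specializations = []
    for combination in product(*(bounds[p] for p in parameters)):
        substitution = dict(zip(parameters, combination))
        # Hypothetical naming scheme: function name plus the chosen types.
        name = function_name + "__" + "_".join(combination)
        specializations.append((name, substitution))
    return specializations


specializations = enumerate_specializations(
    "add", {"T": ("Int32", "Float64"), "U": ("Int32",)}
)
```

Because the domains are finite and ordered, the result is deterministic, which is one way to keep specialization identities stable across runs.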
For v1, template methods lower only as direct concrete specializations. They do not participate in class dispatch slots or virtual-style dispatch tables.
Compilation Session¶
The multi-module path is centered on CompilationSession in
packages/irx/src/irx/analysis/session.py.
That session owns:
- the root parsed module and resolver callback
- the cache of reachable parsed modules
- the import dependency graph and stable load order
- cycle diagnostics
- per-module visible top-level bindings used for direct imports and module namespace aliases
- semantic-only module namespace values and namespace-member lookup metadata
Semantic identity for top-level functions and structs is module-aware. Backend lowering consumes that semantic identity rather than raw source names, which is what keeps same-named declarations in different modules from colliding in LLVM.
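The graph expansion, stable load order, and cycle diagnostics owned by the session can be approximated with a depth-first walk. In this sketch the resolver is simplified to a function from a module name to its list of imported module names, which is an assumption; the real session works with parsed modules and a resolver callback.

```python
def expand_graph(root, resolver):
    """Return (load_order, cycles) for modules reachable from root."""
    load_order = []  # dependencies always appear before their importers
    cycles = []      # each entry is the import chain that closes a cycle
    state = {}       # module name -> "visiting" or "done"

    def visit(name, stack):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # Found a back edge: report the chain from the first occurrence.
            cycles.append(stack[stack.index(name):] + [name])
            return
        state[name] = "visiting"
        for dependency in resolver(name):
            visit(dependency, stack + [name])
        state[name] = "done"
        load_order.append(name)

    visit(root, [])
    return load_order, cycles


imports = {"main": ["util", "stats"], "stats": ["util"], "util": []}
order, no_cycles = expand_graph("main", imports.__getitem__)

cyclic = {"a": ["b"], "b": ["a"]}
_, found_cycles = expand_graph("a", cyclic.__getitem__)
```

Visiting imports in declaration order is what makes the load order repeatable for the same inputs.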
Shared Visitor Foundation¶
IRx also has a shared visitor layer in packages/irx/src/irx/base/visitors/.
It currently provides:
- BaseVisitorProtocol: the minimal typing contract shared by visitor-style classes
- BaseVisitor: a concrete Plum-dispatch scaffold with explicit NotImplementedError defaults for the current ASTx node surface
This keeps typing and runtime behavior separate:
- protocols define what visitor-like objects must expose
- the concrete base class defines what happens for unsupported nodes
In practice:
- SemanticAnalyzer inherits BaseVisitor
- BuilderVisitor inherits BaseVisitor
- builder-specific protocols such as builder.VisitorProtocol extend BaseVisitorProtocol
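The split between a typing protocol and a concrete base with failing defaults can be sketched with the standard library alone. The classes below are local stand-ins that reuse the names from the text for readability, and the `visit_<NodeName>` lookup is a substitute for Plum dispatch so the example runs without third-party packages.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BaseVisitorProtocol(Protocol):
    """Typing contract: what a visitor-like object must expose."""

    def visit(self, node: Any) -> Any: ...


class BaseVisitor:
    """Runtime scaffold: what happens when a node is not supported."""

    def visit(self, node: Any) -> Any:
        # Name-based lookup stands in for Plum multiple dispatch here.
        handler = getattr(self, f"visit_{type(node).__name__}", None)
        if handler is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not support {type(node).__name__}"
            )
        return handler(node)


class LiteralInt32:
    def __init__(self, value: int) -> None:
        self.value = value


class WhileStmt:
    pass


class SketchAnalyzer(BaseVisitor):
    def visit_LiteralInt32(self, node: LiteralInt32) -> str:
        return "Int32"


analyzer = SketchAnalyzer()
literal_type = analyzer.visit(LiteralInt32(7))
```

Any subclass that omits a handler fails the same way, which is the consistency the shared base is meant to provide.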
Builder Architecture¶
IRx now exposes a single builder package at packages/irx/src/irx/builder/. The
package path identifies the concrete LLVM builder, while the public classes
inside it use short generic names.
For example, packages/irx/src/irx/builder exposes:
- BuilderVisitor
- VisitorProtocol
- optional VisitorCore as a module-private implementation class
This keeps the public API concise without reintroducing legacy class prefixes.
Builder Package Layout¶
The LLVM backend is split into first-class modules instead of one monolithic builder:
- ../packages/irx/src/irx/base/visitors/: shared visitor protocol and runtime scaffold
- backend.py: public backend entry points
- core.py: shared mutable lowering state and backend lifecycle
- protocols.py: typing contract used by mixins and runtime features
- types.py, casting.py, vector.py, strings.py, runtime/: shared IR infrastructure
- lowering/: concern-grouped visit(...) overloads
- ../packages/irx/src/irx/buffer.py: the canonical low-level buffer owner/view semantic substrate that Arx can target without exposing an array API
Foundational modules stay at the package root because they are architectural components, not incidental helpers.
Buffer/View Indexing¶
IRx treats first-class indexing as a low-level operation over the canonical
buffer/view descriptor in packages/irx/src/irx/buffer.py. It is the stable
memory/container path that Arx can target for element access such as a[i],
a[i, j], and the corresponding stores. It is not a NumPy-like array API and
does not define slicing, broadcasting, fancy indexing, masks, or shape
inference.
Indexed access has an explicit IRx node surface for reads and stores. Semantic analysis validates the descriptor base, the number of indices, index scalar types, mutability for stores, static bounds when descriptor shape and literal indices make the answer provable, and the scalar element type used by lowering. The MVP requires static descriptor metadata for rank validation. Dynamic-rank runtime checks are intentionally deferred.
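The rank and static-bounds part of that validation can be sketched as follows. The helper name and diagnostic wording are hypothetical, `None` marks an extent or index that is not statically known, and rejecting negative literal indices is an assumption of this sketch rather than a documented IRx rule.

```python
def check_static_index(shape, indices):
    """Return diagnostics for an indexed access; an empty list means valid.

    shape holds one extent per axis, or None when it is not static.
    indices holds one literal value per index, or None when not a literal.
    """
    if len(indices) != len(shape):
        return [f"expected {len(shape)} indices, got {len(indices)}"]
    diagnostics = []
    for axis, (extent, index) in enumerate(zip(shape, indices)):
        if extent is None or index is None:
            # Not provable at analysis time, so nothing is rejected here.
            continue
        if not 0 <= index < extent:
            diagnostics.append(
                f"index {index} is out of bounds for axis {axis} "
                f"with extent {extent}"
            )
    return diagnostics


in_bounds = check_static_index((3, 4), (2, 1))
out_of_bounds = check_static_index((3, 4), (2, 4))
wrong_rank = check_static_index((3, 4), (2,))
not_provable = check_static_index((3, None), (2, 99))
```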
Backend lowering keeps address computation separate from load/store emission.
The address helper extracts descriptor fields through
BUFFER_VIEW_FIELD_INDICES, starts from data, includes offset_bytes, loads
byte strides from strides, and computes:
effective_byte_offset = offset_bytes + sum(index_k * stride_k)
The result is cast to the resolved element pointer type. Indexed reads emit a load from that pointer; indexed stores cast the right-hand side to the resolved element type and emit a store. Under the default bounds policy, semantic analysis rejects accesses that are provably out of bounds at compile time, and lowering does not yet emit a runtime bounds helper. Future checked and unchecked runtime modes can reuse the same element-pointer helper.
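As a worked example of the offset formula, the sketch below computes the byte offset for a[2, 1] in a 3x4 row-major Int32 view and reads the element from a byte buffer. The helper and the sample layout are illustrative assumptions; real lowering emits LLVM IR rather than evaluating this in Python.

```python
import struct


def effective_byte_offset(offset_bytes, indices, strides):
    """offset_bytes + sum(index_k * stride_k), with strides in bytes."""
    if len(indices) != len(strides):
        raise ValueError("one index is required per stride")
    return offset_bytes + sum(i * s for i, s in zip(indices, strides))


# A 3x4 row-major Int32 view that starts 8 bytes into its backing buffer.
# The stored values are 0..11, so element [row, col] holds row * 4 + col.
backing = bytes(8) + struct.pack("<12i", *range(12))
strides = (16, 4)  # 4 columns * 4 bytes per row, 4 bytes per column

offset = effective_byte_offset(8, (2, 1), strides)  # 8 + 2*16 + 1*4
(value,) = struct.unpack_from("<i", backing, offset)
```

Keeping the address computation in its own helper mirrors the separation between address calculation and load/store emission described above.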
Dynamic List Construction¶
IRx also exposes one intentionally small list-building surface for frontend-emitted AST:
- ListCreate(element_type) creates an empty list value with explicit element type
- ListAppend(base, value) grows a mutable list variable or field
- regular SubscriptExpr lowering may read from produced list values
This is deliberately narrower than a full collection API. The goal is to let frontends author pure source routines that accumulate list results inside loops without moving collection policy into the frontend. The current runtime owns append/growth and indexed reads only; list teardown is intentionally deferred to a future ownership API.
Common Collection Methods¶
IRx also exposes backend-neutral query nodes for common collection operations:
- CollectionLength(base) returns the logical length as Int32
- CollectionIsEmpty(base) returns a Boolean emptiness check
- CollectionContains(base, value) checks list, tuple, or set values and dict keys
- CollectionIndex(base, value) returns the first list/tuple index or -1
- CollectionCount(base, value) returns the number of list/tuple matches
Semantic analysis validates the receiver kind, probe type, and result type and
attaches a ResolvedCollectionMethod sidecar. Lowering consumes that sidecar
instead of re-resolving the collection operation from raw AST shape.
Literal lists, tuples, sets, and dictionaries support common length, emptiness, and containment queries. Dynamic IRx lists also support length, emptiness, contains, index, and count by reusing the existing list runtime and emitting small search loops where needed. Dynamic set and dictionary method lowering remains intentionally deferred until those runtime representations exist.
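The small search loops mentioned above have a simple shape. The following sketch mirrors the documented results (first index or -1, match count, containment) using explicit index loops; the function names are hypothetical and the code does not represent the emitted LLVM IR.

```python
def collection_index(values, probe):
    """Return the first position of probe, or -1 when it is absent."""
    for position in range(len(values)):
        if values[position] == probe:
            return position
    return -1


def collection_count(values, probe):
    """Return how many elements compare equal to probe."""
    matches = 0
    for position in range(len(values)):
        if values[position] == probe:
            matches += 1
    return matches


def collection_contains(values, probe):
    """Containment reuses the index search."""
    return collection_index(values, probe) != -1


samples = [4, 7, 4, 9]
first_four = collection_index(samples, 4)
missing = collection_index(samples, 5)
fours = collection_count(samples, 4)
has_nine = collection_contains(samples, 9)
```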
Iterable Semantics¶
IRx now models iteration as a semantic capability instead of as backend-specific
collection probing. Semantic analysis resolves known iterable expressions into a
ResolvedIteration sidecar that records the adapter kind, yielded element type,
ordering contract, and loop/comprehension target symbol. Backend lowering
consumes that sidecar instead of rediscovering whether an expression is a list,
set, or dict.
The executable MVP supports ForInLoopStmt and ListComprehension over list
iterables, including literal lists and dynamic IRx lists. List iteration follows
index order and evaluates the iterable once when that loop or comprehension
clause is entered. Dict and set literals are recognized semantically as
iterables as well: dict iteration yields keys, while set iteration order remains
unspecified. Their dynamic lowering is intentionally guarded until IRx has
runtime-backed dynamic dict and set construction APIs.
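One way to picture the sidecar and the guarded lowering is the sketch below. `IterationInfo` is a hypothetical stand-in for ResolvedIteration, the field names are assumptions, and the dict ordering is left unset because the text above does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IterationInfo:
    """Hypothetical stand-in for the ResolvedIteration sidecar."""

    adapter: str
    element_type: str
    ordering: Optional[str]
    target_symbol: str


def resolve_iteration(kind, type_args, target_symbol):
    # Semantic phase: classify the iterable once and record the contract.
    if kind == "list":
        return IterationInfo("list", type_args[0], "index", target_symbol)
    if kind == "set":
        return IterationInfo("set", type_args[0], "unspecified", target_symbol)
    if kind == "dict":
        # Dict iteration yields keys, so the element type is the key type.
        return IterationInfo("dict-keys", type_args[0], None, target_symbol)
    raise TypeError(f"'{kind}' is not iterable")


def lower_for_in(info):
    # Backend phase: only the list adapter is executable in this sketch.
    if info.adapter != "list":
        raise NotImplementedError(f"{info.adapter} iteration is not lowered yet")
    return f"index loop binding {info.target_symbol}: {info.element_type}"


list_info = resolve_iteration("list", ("Int32",), "item")
dict_info = resolve_iteration("dict", ("String", "Int32"), "key")
plan = lower_for_in(list_info)
```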
Generator Semantics¶
IRx models named generator functions as functions returning GeneratorType(T).
Semantic analysis validates top-level yield sites against the declared yielded
element type and exposes generator values through the same ResolvedIteration
sidecar used by ForInLoopStmt.
The initial executable lowering supports straight-line named generator functions
with top-level YieldStmt nodes. Calls create a small generator object
containing an opaque frame pointer and an internal resume function pointer.
For-in lowering calls the resume function until it reports exhaustion. More
Python-compatible behavior such as nested-control-flow suspension, yield from,
generator expressions, send, throw, and close remains deferred.
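The frame-plus-resume shape can be illustrated without any compiler machinery. In this sketch the frame is a plain dictionary and the resume function returns a (has_value, value) pair; both are assumptions about representation made for the example, not the layout IRx emits.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class GeneratorObject:
    frame: dict                                  # opaque state for the body
    resume: Callable[[dict], Tuple[bool, Any]]   # advances to the next yield


def make_three_values() -> GeneratorObject:
    """Stand-in for calling a generator whose body yields 10, 20, then 30."""
    yields = (10, 20, 30)

    def resume(frame: dict) -> Tuple[bool, Any]:
        state = frame["state"]
        if state < len(yields):
            frame["state"] = state + 1
            return (True, yields[state])
        return (False, None)  # exhausted

    return GeneratorObject(frame={"state": 0}, resume=resume)


def for_in(generator: GeneratorObject) -> list:
    # For-in lowering: keep resuming until the generator reports exhaustion.
    collected = []
    while True:
        has_value, value = generator.resume(generator.frame)
        if not has_value:
            break
        collected.append(value)
    return collected


generator = make_three_values()
first_pass = for_in(generator)
second_pass = for_in(generator)
```

Straight-line bodies need only a single integer of state, which is why nested-control-flow suspension is a separate, harder step.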
Tensor Layering¶
IRx now treats Tensor support as a distinct semantic layer aligned with Apache
Arrow's homogeneous tensor model:
- the builtin Arrow C++ backed tensor runtime provides dtype, shape, stride, and data-buffer ownership for homogeneous N-dimensional values through arrow::Tensor
- the canonical irx_buffer_view substrate remains the lowering descriptor for indexing, byte-offset calculation, ownership, and layout flags
That split keeps the data-container roles explicit:
- tensor is the homogeneous N-dimensional semantic abstraction
- array remains the one-dimensional Arrow array/runtime abstraction for column-like values and future Series work
- future DataFrame support should wrap heterogeneous named columns backed by Arrow C++ arrow::Table storage
- buffer/view remains the low-level layout and ownership substrate
Current tensor lowering stays intentionally conservative:
- literals build Arrow C++ tensor handles through irx_arrow_tensor_*, then wrap borrowed tensor buffers in external-owner buffer views
- indexing and byte-offset queries reuse buffer/view stride arithmetic
- view construction is shallow and metadata-driven
- fixed-width numeric element types are supported in this phase
- Arrow C++ backed Tensor values remain readonly in this phase
Why visit(...) Remains the Public Lowering Boundary¶
The codegen layer continues to use method-based Plum multiple dispatch:
visit(self, node: ...)
This remains the only public dispatch boundary for backend lowering. IRx does
not use a free-function dispatch registry or a second public API like
lower(...) or build_node(...).
That choice keeps backend code readable and local:
- AST-family-specific lowering remains attached to the visitor class.
- Mixins can group overloads by concern without changing the public surface.
- Shared lowering state stays on the visitor instance instead of moving into a registry-driven design.
Core Class and Protocol¶
VisitorProtocol and VisitorCore serve different purposes:
- VisitorProtocol defines the stable interface that mixins and runtime feature declarations depend on for typing, building on BaseVisitorProtocol.
- VisitorCore is the concrete implementation center that owns mutable state, module setup, helper methods, and backend lifecycle.
VisitorCore is still internal to the backend package. IRx uses
from public import private for module-level internal helpers and internal
implementation classes when a clear non-underscored name reads better than an
underscore-prefixed export. That keeps internal names readable without making
them part of the intended public surface.
The protocol is not a replacement for the core class. It exists so backend subsystems can depend on a narrow contract instead of the full concrete type.
Visitor Mixins¶
The final backend visitor is composed from concern-specific mixins plus the shared core. Each mixin should contain:
- @dispatch def visit(self, node: ...) overloads for one concern
- a small number of private helpers local to that concern
Examples of concern boundaries include:
- literals
- variables
- unary and binary operators
- control flow
- functions
- runtime or domain-specific lowering
This keeps dispatch organization aligned with language structure while still sharing one lowering state object.
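The composition of concern-specific mixins over one shared state object can be sketched as follows. The classes are illustrative stand-ins, `visit_<NodeName>` lookup replaces Plum's `@dispatch` overloads so the example needs no third-party packages, and the emitted strings only imitate LLVM IR text.

```python
class IntLiteral:
    def __init__(self, value):
        self.value = value


class Add:
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


class VisitorCore:
    """Owns the mutable lowering state shared by every mixin."""

    def __init__(self):
        self.emitted = []
        self._next_temp = 0

    def fresh_temp(self):
        name = f"%t{self._next_temp}"
        self._next_temp += 1
        return name

    def visit(self, node):
        # Stand-in for Plum dispatch: pick the handler by node class name.
        handler = getattr(self, f"visit_{type(node).__name__}", None)
        if handler is None:
            raise NotImplementedError(type(node).__name__)
        return handler(node)


class LiteralMixin:
    def visit_IntLiteral(self, node):
        return str(node.value)


class BinaryOpMixin:
    def visit_Add(self, node):
        lhs = self.visit(node.lhs)
        rhs = self.visit(node.rhs)
        result = self.fresh_temp()
        self.emitted.append(f"{result} = add i32 {lhs}, {rhs}")
        return result


class SketchBuilderVisitor(LiteralMixin, BinaryOpMixin, VisitorCore):
    """Final visitor: concern mixins plus the shared core."""


builder = SketchBuilderVisitor()
final_value = builder.visit(Add(IntLiteral(1), Add(IntLiteral(2), IntLiteral(3))))
```

Each mixin stays small and local to its concern, yet both write into the same `emitted` list because the state lives on the visitor instance.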
Canonical Loop Lowering¶
IRx now treats loop lowering as one small shared control-flow contract instead of three ad hoc visitors:
- while: cond -> body -> exit, with continue targeting cond
- for-count: cond -> body -> update -> exit, with continue targeting update
- for-range: cond -> body -> step -> exit, with continue targeting step
Loop variables remain semantic symbols rather than backend-only temporaries. For-count initializers are visible only within the loop. For-range induction variables are body-visible, loop-scoped, and restored after lowering so outer shadowed bindings remain stable. Mutable post-loop state reconciles through the existing variable-slot model instead of accidental value-stack state.
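The shared contract amounts to a small table of block orders and continue targets. The sketch below encodes that table and derives block labels from it; the label format and the `plan_loop` helper are hypothetical, and routing break to the exit block is an assumption that the text above implies but does not state.

```python
# Block order and continue target for each loop form.
LOOP_SHAPES = {
    "while": (("cond", "body", "exit"), "cond"),
    "for-count": (("cond", "body", "update", "exit"), "update"),
    "for-range": (("cond", "body", "step", "exit"), "step"),
}


def plan_loop(kind, loop_id):
    """Return block labels plus the continue and break targets for a loop."""
    blocks, continue_block = LOOP_SHAPES[kind]
    labels = {block: f"{kind}.{loop_id}.{block}" for block in blocks}
    return {
        "blocks": [labels[block] for block in blocks],
        "continue_target": labels[continue_block],
        # Assumption: break always jumps to the exit block.
        "break_target": labels["exit"],
    }


while_plan = plan_loop("while", 0)
count_plan = plan_loop("for-count", 1)
```

Driving all three loop forms from one table is what keeps continue handling from diverging between visitors.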
Contributor Guidelines¶
When extending IRx, these rules help preserve the architecture:
- Put semantic meaning and validation in analysis/, not in a backend.
- Let codegen consume normalized semantic information instead of re-deriving it.
- Keep buffer/view support framed as a low-level memory/container substrate, not as NumPy-like user-facing array behavior.
- Keep shared visitor dispatch defaults in packages/irx/src/irx/base/visitors/ so semantic and backend visitors fail consistently for unsupported ASTx nodes.
- Add new backend-wide infrastructure at the package root, not under helpers/.
- Keep mutable lowering state instance-local.
- Prefer explicit code over clever abstractions.
- Use the package name, not class prefixes, to identify the backend.
If Another Backend Ever Returns¶
IRx currently standardizes on a single builder package. If another backend is ever introduced again, keep the public class names generic and make the package split an explicit architecture decision instead of quietly rebuilding a plural builders namespace.