Runtime Features¶
IRx lowers ASTx nodes to LLVM IR with llvmlite, but some capabilities are
better modeled as calls into a native runtime instead of handwritten container
logic in LLVM IR. The runtime-feature system exists for exactly that boundary.
Why This Exists¶
IRx already had a small precedent for external/native symbols such as puts.
That worked for a few direct libc calls, but it did not provide a maintainable
way to:
- declare external symbols once per feature
- activate native capabilities per compilation unit
- compile and link native C/C++ sources only when they are actually needed
- keep native runtime ownership rules outside the LLVM IR middle-end
This runtime-feature layer keeps IRx focused on lowering while allowing Arx to grow additional native integrations later. It is also the public native-dependency side of IRx's stable FFI contract.
Architecture¶
The runtime stack is layered in four parts:
- irx.builder.runtime.features defines feature specs: external symbols, native artifacts, linker flags, and metadata.
- irx.builder.runtime.registry registers features by name and tracks activation/declarations for one LLVM module.
- irx.builder.runtime.linking compiles native C/C++ sources and links optional objects only for active features.
- Feature packages such as libc and array consume the generic system without special cases in the builder.
Activation Model¶
Runtime features are named and activated per compilation unit. The builtin array runtime is packaged with IRx even though its native artifacts are linked only when needed.
- libc declares symbols such as puts, malloc, and snprintf.
- assertions declares __arx_assert_fail(...) and links the native fatal assertion helper that emits machine-readable stderr reports.
- libm declares math symbols such as sqrt and contributes -lm.
- buffer declares the low-level buffer owner/view lifetime helper ABI.
- array declares the builtin one-dimensional Arrow array runtime surface.
- tensor declares the builtin homogeneous N-dimensional Arrow tensor runtime surface.
- list declares the minimal dynamic-list runtime used by ListCreate, ListAppend, and lowered list indexing.
The builder and visitor cooperate as follows:
- explicit extern declarations may declare runtime_feature or runtime_features on FunctionPrototype
- lowering requests feature-owned symbols through require_runtime_symbol(feature, symbol)
- the request activates the feature for that compilation unit
- the linker step collects native artifacts only from active features
- inactive features contribute nothing to the link command
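The activation flow above can be sketched with a toy registry. This is a minimal illustration, not IRx's implementation: FeatureSpec, RuntimeState, and the signature strings are hypothetical names chosen for the example, and only require_runtime_symbol(feature, symbol) is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical feature spec: a name plus the symbols the feature owns."""

    name: str
    symbols: dict[str, str]  # symbol name -> illustrative C signature


@dataclass
class RuntimeState:
    """Hypothetical activation state for one compilation unit."""

    features: dict[str, FeatureSpec]
    active: set[str] = field(default_factory=set)
    declared: dict[str, str] = field(default_factory=dict)

    def require_runtime_symbol(self, feature: str, symbol: str) -> str:
        spec = self.features[feature]
        signature = spec.symbols[symbol]  # KeyError if the feature does not own it
        self.active.add(feature)  # requesting a symbol activates its feature
        # The declaration is created once per module and reused afterwards.
        return self.declared.setdefault(symbol, signature)


FEATURES = {
    "libc": FeatureSpec("libc", {"puts": "int puts(const char *)"}),
    "libm": FeatureSpec("libm", {"sqrt": "double sqrt(double)"}),
}

state = RuntimeState(FEATURES)
state.require_runtime_symbol("libm", "sqrt")
state.require_runtime_symbol("libm", "sqrt")  # second request reuses the declaration
print(sorted(state.active))
print(len(state.declared))
```

In this sketch libc never becomes active because nothing requested one of its symbols, which mirrors the rule that inactive features contribute nothing to the link step.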
This is intentionally separate from any future language-level import or module
system. A future Arx array-facing layer can decide when to activate array, but
the native integration remains owned by IRx.
Dynamic List Runtime Caveat¶
The current list runtime is intentionally narrow:
- it supports append/growth
- it supports indexed access
- it does not yet expose a destroy/release helper
That means dynamically produced list storage is currently process-lifetime. This is acceptable for the current MVP surface, but it is not a complete ownership model yet and should not be read as a final memory-management contract.
Extern Declarations And Feature-Backed Linking¶
Public FFI declarations now use one consistent rule:
- extern declarations with no runtime features emit only an LLVM external declaration and rely on the system linker/toolchain to resolve the symbol
- extern declarations with runtime_feature/runtime_features still lower as ordinary externs, but they also activate the named runtime features for that compilation unit
- if a runtime feature already owns a matching symbol declaration, lowering reuses that feature-owned declaration instead of inventing a parallel native path
- runtime features remain the only place where IRx packages native objects, native C/C++ sources, or extra linker flags
Example split:
- plain puts extern: system linker resolution only
- sqrt extern with runtime_feature = "libm": LLVM declaration plus the libm feature's -lm linker flag
- Array helpers: IRx-owned nodes imply the array feature and its packaged native runtime
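One way to picture the split is as a function from the set of active features to a clang command line. The sketch below is an assumption-laden illustration: LinkInputs, FEATURE_LINK_INPUTS, build_link_command, and the array_runtime.o artifact name are all hypothetical and do not describe IRx's real linking module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkInputs:
    """Hypothetical per-feature link contributions."""

    artifacts: tuple[str, ...] = ()
    linker_flags: tuple[str, ...] = ()


FEATURE_LINK_INPUTS = {
    "libc": LinkInputs(),  # resolved by the system toolchain
    "libm": LinkInputs(linker_flags=("-lm",)),
    "array": LinkInputs(artifacts=("array_runtime.o",)),  # hypothetical artifact name
}


def build_link_command(main_object: str, output: str, active: set[str]) -> list[str]:
    """Assemble a clang command from the main object plus active features only."""
    command = ["clang", main_object]
    flags: list[str] = []
    for name in sorted(active):  # sorted for a deterministic command line
        inputs = FEATURE_LINK_INPUTS[name]
        command.extend(inputs.artifacts)
        flags.extend(inputs.linker_flags)
    return command + flags + ["-o", output]


print(build_link_command("main.o", "app", set()))
print(build_link_command("main.o", "app", {"libm"}))
```

A plain puts extern corresponds to the first call, where no feature is active and the command carries no extra inputs. The sqrt case corresponds to the second call, where activating libm adds -lm.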
External Symbols¶
External declarations are centralized inside each feature definition instead of being scattered through visitor methods.
Benefits:
- declarations are reused per module
- function signatures live in one place
- future features can add their own symbol sets without changing the linker or builder architecture
Native Linking¶
IRx still emits the main object file with llvmlite and links with clang. The
difference now is that runtime features may add native artifacts such as:
- C source files
- C++ source files
- prebuilt objects
- static libraries
The current builtin Arrow runtime uses a small C++ wrapper with a stable C ABI, which keeps Arrow C++ container ownership behind runtime feature declarations without introducing dynamic loading.
Assertion Failure Reporting¶
The assertions runtime feature exists for fatal AssertStmt lowering. Its
native helper writes one machine-readable line to stderr before exiting the
process with a non-zero status:
ARX_ASSERT_FAIL|<source>|<line>|<col>|<message>
IRx also exposes small Python-side parsing helpers under
irx.builder.runtime.assertions so higher-level runners can extract one stable
report from stderr without scraping human-oriented text. Source and message
payloads escape backslashes, newlines, carriage returns, tabs, and protocol
delimiters before printing so the report always remains one physical line. The
source field uses the analyzed module display name when available and otherwise
falls back to the module name stored in the AST.
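A runner could extract the report roughly as follows. This is a sketch under stated assumptions: the text does not spell out the escape scheme, so backslash escapes (\\, \n, \r, \t, \|) are assumed here, and escape_field, split_fields, parse_report, and AssertionReport are hypothetical names rather than the helpers shipped in irx.builder.runtime.assertions.

```python
from __future__ import annotations

from typing import NamedTuple

PREFIX = "ARX_ASSERT_FAIL"
# Assumed escape scheme; the real helper may use a different encoding.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t", "|": "\\|"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t", "|": "|"}


class AssertionReport(NamedTuple):
    source: str
    line: int
    col: int
    message: str


def escape_field(text: str) -> str:
    """Escape a payload so it contains no raw delimiter or line break."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def split_fields(payload: str) -> list[str]:
    """Split on unescaped '|' and undo the escapes in one pass."""
    fields: list[str] = []
    current: list[str] = []
    chars = iter(payload)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # consume the escaped character
            current.append(_UNESCAPES.get(nxt, nxt))
        elif ch == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_report(stderr_text: str) -> AssertionReport | None:
    """Return the first well-formed report line found in stderr, if any."""
    for raw in stderr_text.splitlines():
        if not raw.startswith(PREFIX + "|"):
            continue
        fields = split_fields(raw[len(PREFIX) + 1:])
        if len(fields) != 4:
            continue
        source, line, col, message = fields
        if line.isdecimal() and col.isdecimal():
            return AssertionReport(source, int(line), int(col), message)
    return None


wire = "|".join([PREFIX, escape_field("a|b.x"), "3", "7", escape_field("bad\nvalue")])
print(parse_report("unrelated output\n" + wire + "\n"))
```

Because delimiters and line breaks are escaped before printing, the parser can split on unescaped | characters and still recover payloads that contained them.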
Builtin Array Runtime¶
IRx array support is implemented as a builtin native runtime backed by Arrow C++, not as handwritten LLVM IR container logic.
Current array substrate:
- opaque runtime handles for schemas, array builders, and arrays
- supported primitive storage types: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64, and bool
- explicit builder / import / inspect / export / release lifecycle
- Arrow C Data import/export support with copy and move/adopt imports
- explicit nullability and validity-bitmap inspection on Arrow handles
- readonly bridge from supported fixed-width numeric arrays into irx_buffer_view
- Python pyarrow dependency installed by default in IRx for Arrow C Data interop tests and linkable Arrow C++ libraries
- arx-arrowcpp-sources installed by default for Arrow C++ headers/source metadata used by native runtime builds
The initial Tensor layer alongside that substrate:
- tensor values are created through irx_arrow_tensor_* runtime symbols
- tensor construction stores homogeneous fixed-width values in Arrow C++ arrow::Tensor handles with dtype, shape, and stride metadata
- tensor values lower through the same irx_buffer_view descriptor used by the low-level buffer/view model for indexing and lifetime management
- indexing and byte-offset calculation reuse descriptor shape, strides, and offset_bytes
- shallow tensor views may replace shape/stride/offset metadata without copying storage
- current tensor lowering supports fixed-width numeric element types only
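The byte-offset rule implied by shape, strides, and offset_bytes can be sketched as below. The function name byte_offset is hypothetical, and the sketch assumes strides are expressed in bytes, as they are for arrow::Tensor.

```python
from __future__ import annotations


def byte_offset(
    indices: tuple[int, ...],
    shape: tuple[int, ...],
    strides: tuple[int, ...],
    offset_bytes: int = 0,
) -> int:
    """Compute offset_bytes + sum(index * stride), with bounds checks."""
    if not (len(indices) == len(shape) == len(strides)):
        raise ValueError("indices, shape, and strides must have the same rank")
    offset = offset_bytes
    for index, extent, stride in zip(indices, shape, strides):
        if not 0 <= index < extent:
            raise IndexError(f"index {index} out of range for extent {extent}")
        offset += index * stride
    return offset


# A 2x3 row-major int32 tensor: each row is 12 bytes, each element 4 bytes.
row_major = byte_offset((1, 2), shape=(2, 3), strides=(12, 4))

# A transposed view swaps shape and strides but addresses the same storage.
transposed = byte_offset((2, 1), shape=(3, 2), strides=(4, 12))

print(row_major, transposed)
```

Both lookups land on the same byte, which is why a shallow view only needs new metadata and no copy of the storage.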
What IRx does not do here:
- no direct LLVM struct encoding of Arrow containers
- no full Arrow type system
- no Arx language syntax or module layer
- no RecordBatch, Table, or ArrowArrayStream runtime yet
- no dataframe/query semantics or compute-kernel surface
ABI Boundary¶
The public high-level abstraction is array-oriented, while the low-level ABI
exposed to generated LLVM IR and native harnesses remains the IRx-owned Arrow C
ABI under irx_arrow_*.
Key rules:
- handles are opaque pointers
- runtime-owned memory is released with explicit irx_arrow_*_release()
- Arrow C++ stays internal to the implementation
- Arrow C Data structs are the interchange boundary
- import is explicit:
  - irx_arrow_array_import_copy(...) copies external C Data into a new runtime-owned array handle
  - irx_arrow_array_import_move(...) adopts external C Data into a new runtime-owned array handle and leaves the input structs moved-from on success
- export is explicit:
  - irx_arrow_array_export(...) copies a runtime-owned array handle into an independent Arrow C Data pair that the caller releases separately
- schema handles use the same pattern through irx_arrow_schema_import_copy(...) and irx_arrow_schema_export(...)
Ownership Rules¶
Current ownership model:
- builder handles own their mutable Arrow builder state
- finishing a builder transfers ownership into an immutable array handle
- schema and array handles are refcounted through explicit retain/release calls
- array handles own their schema plus array resources
- exported Arrow C Data structs own their copied resources and must be released independently
- copied imports leave the caller's Arrow C Data ownership unchanged
- move/adopt imports transfer ownership into IRx on success
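The difference between copied and moved imports, together with retain/release counting, can be modeled in a few lines. This is a toy model only: Handle, ExternalData, import_copy, and import_move are hypothetical Python stand-ins, not bindings to the irx_arrow_* C ABI, and a values field set to None stands in for a moved-from Arrow C Data struct.

```python
from __future__ import annotations

from dataclasses import dataclass


class Handle:
    """Toy refcounted handle standing in for an opaque runtime pointer."""

    def __init__(self, values: list[int]) -> None:
        self._values: list[int] | None = values
        self._refcount = 1

    @property
    def alive(self) -> bool:
        return self._refcount > 0

    def retain(self) -> None:
        if not self.alive:
            raise RuntimeError("retain on a released handle")
        self._refcount += 1

    def release(self) -> None:
        if not self.alive:
            raise RuntimeError("release on a released handle")
        self._refcount -= 1
        if self._refcount == 0:
            self._values = None  # resources go away with the last reference


@dataclass
class ExternalData:
    """Stand-in for a caller-owned Arrow C Data pair."""

    values: list[int] | None  # None models the moved-from / released state


def import_copy(data: ExternalData) -> Handle:
    """Copy the external data; the caller keeps ownership of the input."""
    if data.values is None:
        raise ValueError("cannot import released data")
    return Handle(list(data.values))


def import_move(data: ExternalData) -> Handle:
    """Adopt the external data; the input is left moved-from on success."""
    if data.values is None:
        raise ValueError("cannot import released data")
    handle = Handle(data.values)
    data.values = None
    return handle


source = ExternalData([1, 2, 3])
copied = import_copy(source)
still_owned = source.values is not None  # copy leaves the caller's data intact
moved = import_move(source)
moved_from = source.values is None  # move transfers ownership to the handle

moved.retain()
moved.release()
moved.release()  # last reference dropped
print(still_owned, moved_from, moved.alive, copied.alive)
```

The copied handle stays alive after the moved handle is fully released, since each import produces an independently owned handle.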
Nullability And Buffer Bridges¶
Arrow nullability stays Arrow-specific in this layer.
- Arrow arrays may be nullable independently of irx_buffer_view
- irx_arrow_array_is_nullable(...), irx_arrow_array_null_count(...), and irx_arrow_array_has_validity_bitmap(...) expose Arrow-side null metadata
- irx_arrow_array_validity_bitmap(...) exposes the physical validity bitmap pointer plus bit offset and length
- irx_buffer_view remains a plain physical view; generic indexing and writes do not become null-aware
- irx_arrow_array_borrow_buffer_view(...) projects only the physical value buffer and always returns a borrowed readonly irx_buffer_view
- when a bridged Arrow array has a validity bitmap, the returned view sets IRX_BUFFER_FLAG_VALIDITY_BITMAP
- bool arrays are supported as Arrow handles but are not buffer-view compatible because their values are bit-packed
- caller code that needs null semantics must keep using Arrow inspection APIs
The buffer bridge is intentionally conservative:
- only fixed-width byte-addressable primitive arrays are bridged
- the bridge is 1-D and columnar (shape[0] == length, stride == element_size)
- writable views are not exposed in this phase
- borrowed views use a null owner handle, so the caller must keep the Arrow array handle alive explicitly
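The bridge rules above amount to a small eligibility check plus a fixed descriptor layout. The following sketch models them in Python under assumptions: BufferView and borrow_buffer_view are hypothetical names, and the numeric value given to IRX_BUFFER_FLAG_VALIDITY_BITMAP is a placeholder because the text does not state the real bit.

```python
from __future__ import annotations

from dataclasses import dataclass

# Fixed-width, byte-addressable element types; bool is excluded (bit-packed).
ELEMENT_SIZES = {
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8,
    "float32": 4, "float64": 8,
}

IRX_BUFFER_FLAG_VALIDITY_BITMAP = 1 << 0  # placeholder value, not the real ABI bit


@dataclass(frozen=True)
class BufferView:
    """Hypothetical model of the descriptor fields discussed in the text."""

    shape: tuple[int, ...]
    strides: tuple[int, ...]
    flags: int
    readonly: bool
    owner: object | None  # None models the null owner handle of a borrowed view


def borrow_buffer_view(dtype: str, length: int, has_validity_bitmap: bool) -> BufferView:
    """Project a 1-D fixed-width array into a borrowed readonly view."""
    if dtype not in ELEMENT_SIZES:
        raise TypeError(f"{dtype} arrays are not buffer-view compatible")
    element_size = ELEMENT_SIZES[dtype]
    flags = IRX_BUFFER_FLAG_VALIDITY_BITMAP if has_validity_bitmap else 0
    return BufferView(
        shape=(length,),
        strides=(element_size,),
        flags=flags,
        readonly=True,
        owner=None,
    )


view = borrow_buffer_view("int32", 5, has_validity_bitmap=True)
print(view)
```

The flag only signals that a validity bitmap exists. Null-aware reads would still go through the Arrow inspection functions, since the view itself carries no null semantics.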
The Tensor layer uses Arrow C++ arrow::Tensor handles and then projects
tensors through the canonical buffer/view descriptor:
- fresh tensor literals allocate Arrow tensor handles, then wrap borrowed tensor buffers in external-owner irx_buffer_view values
- tensor views stay shallow and metadata-driven
- readonly semantics are preserved for Arrow C++ backed Tensor values in this phase
Arrow C++ Runtime Backend¶
IRx now depends on pyarrow and arx-arrowcpp-sources by default for the
native Arrow runtime.
arx-arrowcpp-sources provides the vendored Apache Arrow C++ 24.0.0 source and
header snapshot used for native runtime includes and build metadata. pyarrow
provides the installed Arrow C++ shared library used by local builds and tests.
Reasons:
- real Arrow C++ containers now own array and tensor runtime data
- Tensor values are backed by arrow::Tensor while IRx keeps the stable irx_arrow_tensor_* C ABI
- array import/export still uses Arrow C Data structs as the interop boundary
- reproducible headers/source metadata in CI and local development without keeping a second Arrow C++ copy inside the IRx repo
- clear ownership of the native runtime surface while keeping Arrow C++ hidden behind the IRx ABI
IRx does not expose C++ types such as std::shared_ptr, arrow::Array, or
arrow::Tensor through generated LLVM IR. They remain implementation details of
the native runtime wrapper.
Buffer As A Runtime Feature¶
The buffer feature owns lifetime-sensitive helper operations for the canonical
buffer/view substrate. Plain irx_buffer_view descriptors lower as structs and
do not activate this feature. Explicit helper calls such as
irx_buffer_view_retain and irx_buffer_view_release activate it.
The feature keeps owner handles opaque at the IR level. Native code may retain or release an owner handle, but generic lowering does not infer ownership transfer or emit hidden retains/releases for descriptor copies. Statically known borrowed views are rejected before retain/release lowering; descriptor-pointer runtime calls are reserved for owned or external-owner views.
What Exists Now¶
Implemented in this phase:
- generic runtime-feature registry/state/linking
- libc routed through the new feature system
- low-level buffer runtime feature for owner/view retain-release helpers
- builtin array runtime feature backed by Arrow C++ arrow::Array
- builtin tensor runtime feature backed by Arrow C++ arrow::Tensor
- Python pyarrow dependency and direct Arrow C Data interop tests
- centralized Arrow runtime symbol declarations
- one internal array lowering path: irx.astx.ArrayInt32ArrayLength
- tests for registry behavior, IR declarations, build integration, primitive type coverage, nullability, move/copy ownership, and Arrow-to-buffer-view projection
Follow-up Roadmap¶
Phase 2:
- string and binary arrays
- richer schema helpers
- better Arrow import/export diagnostics
Phase 3:
- RecordBatch and Table handles
- ArrowArrayStream support
- richer stream-oriented interop helpers
Phase 4:
- limited native compute kernels where justified
- optional Arrow compute backend evaluation if a future Arx layer needs it