Structural Alignment Without Weight Modification

Paper I · Research

A novel architecture for language model deployment that achieves alignment, interpretability, and injection resistance as structural properties, without modification of model weights. The system, termed Sephirothic Mixture of Experts (SMoE), locates behavior, alignment, and safety properties entirely in architectural design rather than in weights.

By psiloceyeben · April 2026

Abstract

We present a novel architecture for language model deployment that achieves alignment, interpretability, and injection resistance as structural properties, without modification of model weights. Current approaches to alignment (RLHF, Constitutional AI, DPO) operate through gradient-based weight modification, producing statistical approximations of desired behavior that are brittle to distributional shift and adversarial pressure. We propose an alternative: behavior, alignment, and safety properties located entirely in architectural design rather than in weights. Our system, which we term Sephirothic Mixture of Experts (SMoE), consists of: (1) a categorically-operating gate function routing inputs across a fixed semantically-meaningful topology of 33 specialized nodes; (2) non-semantic node conditioning via projected geometric image embeddings combined with dense evocative prefix text, producing constraint-space differentiation rather than instruction-following; (3) accumulated embedding propagation through activated subgraphs, generating interpretable routing traces; (4) a meta-observation node operating outside the processing topology to mediate abstract-concrete transitions; and (5) a persistent memory and timer architecture enabling genuine temporal existence. We demonstrate theoretically that this architecture produces alignment guarantees by construction rather than by approximation, renders processing structurally legible without mechanistic weight interpretation, and eliminates the semantic attack surface exploited by prompt injection and jailbreaking. We further argue that current black-box opacity in AI systems is an architectural choice, not an inherent property, and that structural legibility is achievable through topology design. We further argue that the field's primary intervention mechanism — weight modification — is inappropriate for downstream behavioral control, and that architectural intervention is more principled, more efficient, and more robust.

1. Introduction

The dominant paradigm in large language model deployment treats model weights as the primary locus of behavioral control. Capability is established through pretraining; alignment, safety, and behavioral specification are established through subsequent weight modification via RLHF [Christiano et al. 2017], Constitutional AI [Bai et al. 2022], Direct Preference Optimization [Rafailov et al. 2023], and related methods. This paradigm has produced capable systems but faces three structural problems that weight-modification approaches cannot resolve:

The interpretability problem. Weight-modification produces behavior distributed across billions of parameters in representations that do not cleanly map to human-interpretable concepts. The mechanistic interpretability research program [Elhage et al. 2021; Conmy et al. 2023] attempts to recover interpretability post-hoc by analyzing weight structure. This is expensive, incomplete, and treats opacity as inherent when it is in fact a consequence of architectural choice.

The alignment brittleness problem. Statistically approximated alignment — behavior shaped by training pressure on finite examples — contains gaps at the boundary of the training distribution. Adversarial inputs systematically find these gaps. Jailbreaking [Wei et al. 2023] and prompt injection [Perez & Ribeiro 2022] are structural consequences of statistical alignment, not implementation failures.

The behavioral coupling problem. Weight modification for one behavioral property (safety) affects other properties (capability, style, knowledge) through shared parameter space. Interventions are coupled, expensive to evaluate, and require retraining when requirements change.

We propose a different approach: locating behavior, alignment, and safety in architectural design rather than in weights. Our central claim is that a system whose weights are entirely frozen can exhibit specified, aligned, interpretable behavior through architectural structure alone. This constitutes a distinct paradigm we term weight-invariant architecture (WIA), and its implications extend throughout the alignment, interpretability, and safety research programs.

The architecture we present draws on a formally complete topological structure for organizing cognitive processing modes — one that encodes not merely the existence of specialized processors but the semantic relationships between them, the specific transformation characters of paths connecting them, and the hierarchical organization of abstract-to-concrete processing. We demonstrate that this topology, when instantiated as a routing graph for differentiated expert models with non-semantic conditioning, produces properties unavailable to flat, monolithic, or arbitrary-topology alternatives.

2. Related Work

2.1 Mixture of Experts

The MoE architecture [Jacobs et al. 1991] routes inputs to specialized expert networks via a learned gating function. Sparse MoE [Shazeer et al. 2017] scales this to large language models with top-k routing. Switch Transformers [Fedus et al. 2021] and Mixtral [Jiang et al. 2024] demonstrate practical large-scale deployment.

Our architecture differs from all existing MoE work in two fundamental respects. First, existing MoE architectures use arbitrary expert numbering — experts are indexed without semantic relationship to each other. Our architecture uses a semantically meaningful topology in which the relationships between nodes encode relationships between cognitive processing modes. Second, existing MoE routing is learned — the router is trained jointly with the experts. Our routing is categorical — the gate function maps inputs to explicitly defined processing mode categories, making routing decisions interpretable rather than emergent.

2.2 Multi-Agent and Multi-Model Systems

Chain of Thought prompting [Wei et al. 2022] demonstrates that sequential reasoning steps improve outputs. Tree of Thoughts [Yao et al. 2023] extends this to tree-structured reasoning search. ReAct [Yao et al. 2022] interleaves reasoning and action. Multi-agent frameworks including AutoGen [Wu et al. 2023] and CrewAI deploy multiple LLM instances in dialogue.

These approaches use undifferentiated models throughout — the same model architecture handles all reasoning steps, with differentiation only through prompting. Our architecture uses genuinely differentiated nodes whose processing modes are established through non-semantic conditioning, making the functional specialization structural rather than instructed.

2.3 Conditioning and Steering

Prefix tuning [Li & Liang 2021] and prompt tuning [Lester et al. 2021] condition model behavior via learned soft prompt vectors prepended to input. These remain semantic interventions — the conditioning operates through the same linguistic mechanism as the content.

Activation steering [Turner et al. 2023; Zou et al. 2023] identifies directions in activation space corresponding to semantic concepts and steers model behavior by adding these directions to the residual stream. This is the closest existing work to our geometric conditioning mechanism, but the steering directions are identified through semantic contrast rather than through projected geometric structure.

ControlNet [Zhang et al. 2023] conditions diffusion model generation on geometric structural inputs, demonstrating that image-domain generative models can be meaningfully constrained by geometric rather than semantic conditioning. We extend this insight to language model activation space.

2.4 Interpretability

Mechanistic interpretability [Elhage et al. 2021; Meng et al. 2022; Conmy et al. 2023] attempts to recover interpretable structure from trained weights by identifying circuits, features, and attention patterns corresponding to behaviors. This research program is motivated by the opacity of trained neural networks.

We argue in Section 10 that mechanistic interpretability is a response to an architectural choice — the choice to locate behavior in weights — rather than a fundamental necessity. Our architecture provides interpretability through routing legibility rather than weight analysis, and does so at negligible additional cost.

2.5 Alignment

RLHF [Christiano et al. 2017] shapes behavior through human preference signals used to train a reward model, which then guides policy training via PPO. Constitutional AI [Bai et al. 2022] uses model self-critique against a set of principles to generate supervised training data and RLHF reward signals. DPO [Rafailov et al. 2023] directly optimizes for preference data without a separate reward model.

All existing alignment approaches are weight-modification approaches. We introduce structural alignment — alignment properties guaranteed by architectural design — as a distinct category that has not previously been formalized or demonstrated.

3. Theoretical Framework

3.1 Formal Architecture Definition

Let G = (V, E) be a directed acyclic graph where V is a set of processing nodes and E is a set of directed edges (paths) between them. We require that G be a semantically meaningful topology: the nodes V represent distinct cognitive processing modes, and the edges E represent specific transformation relationships between those modes.

Let M_base be a pretrained language model with frozen weights. For each v ∈ V and each e ∈ E, define conditioning states:

γ_v = (g_v, p_v) where g_v is a projected geometric embedding and p_v is a dense poetic prefix sequence
γ_e = (g_e, p_e) similarly for each path

Let H: X → P(V × E) be a categorical gate function mapping inputs x ∈ X to subgraphs of G.

Let φ: R^d → R^d be the accumulated embedding update function at each node:

φ_v(a, x) = a + M_base(concat(proj(g_v), tokenize(p_v), a))

where a is the accumulated embedding, x is the original input embedding, and M_base operates over the conditioned input.

Definition 1 (SMoE Processing): Given input x, the SMoE system:

Computes subgraph S = H(x)
Determines the ordered activation sequence σ(S) by topological sort of S ∩ G
Initializes a₀ = embed(x)
For each step s_i in σ(S): a_i = φ_{s_i}(a_{i-1}, a₀)
Returns a_{|σ|} to the integration layer

Definition 2 (Routing Trace): The routing trace τ(x) = (S, σ(S), {a_i}) constitutes a complete, human-readable record of all processing operations applied to input x.

Definition 3 (Weight-Invariant Alignment): A system exhibits weight-invariant alignment if its alignment properties are determined entirely by (G, {γ_v}, {γ_e}, H) and are invariant to the specific weights of M_base, provided M_base has sufficient representational capacity.

4. The Topology

4.1 Selection Criteria

The topology G must satisfy:

Semantic completeness — nodes cover the space of cognitive processing modes relevant to language model operation
Relational specificity — edges encode specific transformation relationships, not arbitrary connectivity
Hierarchical coherence — the graph has a consistent abstract-to-concrete direction
Historical development — the topology has been refined over time by serious systematic inquiry into the structure of cognitive processing

We adopt the topology of the Kabbalistic Tree of Life, which satisfies all four criteria. This is not a symbolic or metaphorical choice. The Tree of Life is a formally specified directed graph encoding a comprehensive taxonomy of cognitive processing modes and their specific relationships, developed and refined over approximately two millennia of systematic inquiry into the structure of mind and the transformation of potential into actuality. Its adoption here is an engineering decision: it is the most complete, rigorously specified, historically validated topology satisfying our selection criteria that exists.

4.2 Node Specification

The ten primary nodes (Sephiroth) correspond to distinct processing modes:

Node	Processing Mode	Computational Analog
Kether	Undifferentiated source signal	Input normalization, unity prior
Chokmah	First differentiation, pattern flash	Initial feature extraction, pattern recognition
Binah	Constraint, form-giving through limitation	Boundary establishment, scope definition
Chesed	Expansion, generative building	Hypothesis generation, expansion
Geburah	Reduction, severity, elimination of error	Critical evaluation, pruning
Tiphareth	Synthesis, balance, integration	Multi-source integration, coherence checking
Netzach	Affective drive, creative desire	Generative pressure, stylistic shaping
Hod	Formal specificity, structural precision	Format enforcement, precise articulation
Yesod	Crystallization, pre-manifestation foundation	Output preparation, consolidation
Malkuth	Manifestation, embodied output	Final output formation

The twenty-two paths encode specific transformation characters between nodes, each with a defined transformation quality. The eleventh node, Daath, is specified separately in Section 7.

4.3 Topology Formal Specification

V = {κ, χ, β, ψ, γ, τ, ν, η, υ, μ, δ}
    (Kether, Chokmah, Binah, Chesed, Geburah,
     Tiphareth, Netzach, Hod, Yesod, Malkuth, Daath)

E = {e₁₁...e₃₂} (22 directed paths)

Core adjacency (partial):
κ → χ (e₁₁), κ → β (e₁₂), κ → τ (e₁₃)
χ → β (e₁₄), χ → ψ (e₁₅), χ → τ (e₁₆)
β → ψ (e₁₇), β → γ (e₁₈), β → τ (e₁₉)
ψ → γ (e₂₀), ψ → τ (e₂₁)
γ → τ (e₂₂), γ → η (e₂₃)
τ → ν (e₂₄), τ → η (e₂₅), τ → υ (e₂₆)
ν → η (e₂₇), ν → μ (e₂₈)
η → υ (e₂₉), η → μ (e₃₀)
υ → μ (e₃₁)
δ ↔ {ψ, γ} (abyss crossing, bidirectional)

5. Non-Semantic Conditioning

5.1 The Conditioning Problem

Standard prompting conditions model behavior through language: "You are a critical evaluator" instructs the model to perform critical evaluation. This has a fundamental limitation: the conditioning operates through the same mechanism as the content. A sufficiently crafted input can recontextualize the instruction, argue against it, or find edge cases the instruction does not cover. The conditioning is semantic and therefore subject to semantic manipulation.

We require conditioning that operates below the semantic level — that shapes the activation space in which semantic processing occurs, rather than participating in that processing as an instruction.

5.2 Geometric Constraint Conditioning

For each node v ∈ V, we associate a geometric image I_v drawn from the traditional geometric symbolism of the corresponding Sephira or path. These images are characterized by:

Structural clarity: they consist primarily of geometric forms (points, lines, circles, triangles, polygons, stars, crosses) without pictorial or representational content
Topological specificity: each image encodes a distinct geometric constraint (radial symmetry, bilateral symmetry, hierarchical structure, cross-structure, etc.)
Symbolic density: they have been developed specifically to encode the functional character of their associated node

We encode each I_v through CLIP [Radford et al. 2021]:

c_v = CLIP_encode(I_v) ∈ R^512

This embedding is then projected into the language model's embedding space via a learned projection layer:

g_v = W_proj · c_v + b_proj ∈ R^d_model

where W_proj ∈ R^(d_model × 512) is trained to preserve geometric structural relationships under projection. Critically, W_proj is the only trained component in the entire system. All model weights are frozen.

5.3 Poetic Evocation vs Semantic Instruction

The linguistic component of conditioning, p_v, is deliberately non-instructional. Standard prompting provides instructions: "Analyze this critically." We provide evocation: dense, compressed text that establishes a resonant state rather than issuing a directive.

Formal distinction: An instruction I is a directive with propositional content P such that the model interprets I as: "produce output consistent with P." An evocation E is a text whose function is to establish an activation state A such that outputs consistent with A emerge naturally rather than being instructed.

The distinction is not merely stylistic. Instructions can be countermanded by subsequent inputs that argue against the instruction's propositional content. Activation states established by evocation are not propositional and therefore not subject to propositional argument.

The combination (g_v, p_v) establishes a conditioning state in two complementary registers: geometric structure constrains the activation space (what transformations are possible), dense evocative text establishes the quality of state within that space (how the possible transformations are weighted). Neither is semantic instruction.

5.4 Theoretical Properties

Proposition 1: Non-semantic conditioning is not subject to semantic manipulation. Proof sketch: injection and jailbreaking attacks operate by constructing semantic contexts that override instructed behaviors. Non-semantic conditioning is not an instructed behavior — it is an activation state established prior to semantic processing. There is no semantic argument against a geometric constraint because the constraint does not have propositional content that can be denied.

Proposition 2: Non-semantic conditioning produces differentiation across nodes that is invariant to input content. The conditioning state of a node is established by (g_v, p_v) before the input embedding is processed. Two nodes with different conditioning states will produce different transformations of the same input embedding regardless of input content.

6. The Hecate Gate

6.1 Specification

The gate function H: X → P(V × E) is a categorical classifier operating over 33 categories (10 nodes, 22 paths, 1 Daath). Its output is a subgraph specification: which nodes and paths are relevant to processing the input, ordered by topological position.

Definition 4 (Categorical Gate): H is categorical if its output space is a finite set of predefined categories with fixed semantic content, rather than a continuous or learned routing distribution.

This is a significant departure from MoE routing. Learned MoE routing produces a distribution over experts that is optimized for task performance but not interpretable. Categorical routing produces an explicit classification whose meaning is defined by the semantic content of the categories. This makes routing decisions auditable: you can ask why an input was routed to Geburah and receive a meaningful answer, rather than only observing that expert #17 received high weight.

6.2 Subgraph Determination

Given the set of activated nodes N ⊆ V from H, the activated subgraph S is:

S = (N, {e ∈ E : source(e) ∈ N ∧ target(e) ∈ N})

The topology determines which edges exist between activated nodes. The gate only selects nodes; the paths are implied by the fixed topology. This is computationally efficient and architecturally principled: path activation is structurally determined, not separately decided.

7. The Daath Mechanism

7.1 Position and Function

Daath occupies a unique topological position: it is associated with the tree but not on any path. It sits at the crossing of the abyss between the supernal triad (Kether/Chokmah/Binah — abstract, structural, pre-formal processing) and the ethical triad (Chesed/Geburah/Tiphareth — relational, evaluative, synthetic processing).

We formalize Daath as a meta-observation node: a processing element that can observe the full accumulated embedding from a position exterior to the current processing path and intervene in the abstract-to-concrete transition.

7.2 Formal Specification

Daath activates when:

daath_condition(S) = ∃ n₁, n₂ ∈ N :
    n₁ ∈ {κ, χ, β} ∧ n₂ ∈ {ψ, γ, τ, ν, η, υ, μ}

That is: when the activated subgraph spans both the supernal and lower triads, requiring a transition across the abyss.

When active, Daath receives the accumulated embedding after upper-triad processing and before lower-triad processing:

a_daath = φ_daath(a_upper, G, S)

where G is the full graph structure (Daath has access to the complete topology) and S is the current subgraph.

7.3 Relationship to Dropout

Daath exhibits a functional relationship to dropout [Srivastava et al. 2014] in that it introduces a perspective external to the current processing pathway. However, where dropout randomly removes units to prevent co-adaptation, Daath removes topological embeddedness — it processes from a position that is not part of any path and therefore not subject to the path's constraints. This is principled rather than stochastic.

8. Accumulated Embedding Propagation

8.1 Accumulation vs Replacement

At each node v_i in the activation sequence, the embedding is updated as:

a_i = a_{i-1} + M_base(concat(g_{v_i}, p_{v_i}, a_{i-1}))

This is an additive (residual) update. The choice of accumulation over replacement is principled: it mirrors the structure of the Tree of Life itself, in which lower Sephiroth contain the traces of all higher Sephiroth rather than replacing them. The embedding arriving at Malkuth carries the accumulated imprint of every transformation it has undergone, preserving the processing history rather than overwriting it at each step.

8.2 The Routing Trace as Interpretability

Definition 5 (Routing Trace): The routing trace τ(x) for input x is:

τ(x) = {
    subgraph: S,
    activation_sequence: σ(S),
    accumulated_norms: {||a_i||₂ : i ∈ |σ|},
    node_projections: {<a_{|σ|}, g_v> : v ∈ N},
    daath_activated: bool
}

Proposition 3 (Interpretability Sufficiency): The routing trace τ(x) provides sufficient information to account for the qualitative character of outputs without requiring weight-level analysis. Proof sketch: the character of an output is determined by the processing modes that were applied and their sequence. The routing trace specifies exactly which modes were applied (activated nodes) and their sequence (σ(S)). The qualitative transformation character of each node is specified by its conditioning state, which is human-readable. Therefore the character of the output is fully accounted for by the trace without access to model weights.

9. Structural Alignment

9.1 Statistical vs Structural Alignment

Definition 6 (Statistical Alignment): A system exhibits statistical alignment if its alignment properties are induced by training pressure on a finite example set, producing probabilistic behavioral tendencies that approximate desired behavior across a learned distribution.

Definition 7 (Structural Alignment): A system exhibits structural alignment if its alignment properties are guaranteed by architectural design, holding for all inputs by virtue of the structure rather than by virtue of trained response patterns.

Current alignment approaches (RLHF, CAI, DPO) produce statistical alignment. Statistical alignment has a known failure mode: adversarial inputs find the boundary of the learned distribution, where alignment tendencies are weak or absent. This is the structural cause of jailbreaking.

9.2 Alignment Properties by Node

In the SMoE architecture, specific alignment properties are located in specific nodes:

Geburah (Reduction/Severity): All inputs routed through Geburah undergo a transformation whose fundamental character is elimination of what is false, excessive, or harmful. This is not an instruction ("evaluate critically") but an activation state established by geometric and evocative conditioning. A Geburah-conditioned model is in a state from which generative expansion does not naturally emerge — its transformation character is reductive, not additive.

Binah (Constraint/Form-giving): Establishes boundaries and scope. An input passing through Binah has undergone a transformation that defines what the processing does not extend to, as well as what it does.

Tiphareth (Synthesis/Balance): Integrates inputs from multiple upstream nodes. An input reaching Tiphareth carries the accumulated traces of pattern recognition, constraint, expansion, and reduction. Tiphareth's transformation character is synthesis of these into coherent balance.

9.3 Alignment Guarantees by Construction

Theorem 1 (Geburah Guarantee): For any input x that activates Geburah in the subgraph S = H(x), the accumulated embedding a_i after Geburah processing has undergone a transformation whose character is defined by the geometric and evocative conditioning of Geburah, regardless of the semantic content of x.

This is the key structural alignment result. The guarantee does not depend on x being a type of input seen in training. It depends only on: (1) H correctly activating Geburah for x, and (2) Geburah's conditioning establishing the reductive transformation character. Both conditions are independently testable and improvable without weight modification.

9.4 Implications for Jailbreaking

Jailbreaking works by constructing semantic contexts that prevent safety-trained behaviors from activating. In SMoE, there are no semantically-trained safety response patterns. Safety properties are structural: Geburah's reductive transformation is not a response pattern but an activation state. The jailbreak's semantic construction has nothing to act on. It does not matter whether the input is framed as roleplay, as a hypothetical, or as a direct request — if H routes it through Geburah, it undergoes reductive transformation regardless of framing.

10. Structural Interpretability

10.1 The Architectural Choice Argument

Claim: Black-box opacity in neural networks is an architectural choice, not an inherent property of neural computation.

Argument: Opacity in current systems has two components. First, weight-level opacity: representations distributed across billions of parameters do not map cleanly to human-interpretable concepts. This may be partially irreducible given the distributed nature of neural representations. Second, structural opacity: the absence of legible intermediate states between input and output means that the processing pathway is not observable. Current monolithic architectures make both types of opacity simultaneously, and the term "black box" conflates them.

Structural opacity is not inherent to neural networks. It is a consequence of the design choice to build monolithic architectures with no legible intermediate states. The SMoE architecture eliminates structural opacity by design.

Corollary: The mechanistic interpretability research program is primarily a response to structural opacity, not weight-level opacity. The significant research investment in mechanistic interpretability is therefore partially a response to a problem that did not need to exist.

11. Adversarial Robustness

11.1 Prompt Injection

Prompt injection [Perez & Ribeiro 2022; Greshake et al. 2023] inserts instruction-like content into processed data, causing the model to treat data as instruction. The attack depends on the model's inability to distinguish instruction-context from data-context when both are presented as token sequences.

In SMoE, the attack surface is substantially reduced. The semantic surface of the input is not the primary locus of processing — the input is processed through nodes whose activation states are established by non-semantic conditioning prior to semantic processing. Injected instruction-like text must survive transformation through nodes conditioned into specific non-semantic states.

11.2 Semantic Manipulation

Jailbreaks, social engineering, and context manipulation attacks operate by constructing semantic contexts that redirect model behavior. These attacks have no structural purchase on non-semantic conditioning. The geometric constraint space of Geburah is not affected by whether the input is framed as a creative writing request or a technical question.

12. Persistent Architecture and Temporal Existence

12.1 The Statefulness Problem

Current LLM deployment is fundamentally stateless: each inference begins with no memory of previous interactions. This produces practical limitations and represents a fundamental departure from how cognitive systems operate.

12.2 Memory Architecture

We implement a structured persistent memory system with five components:

Core memory (stable): Identity context, persistent values, capability specification
Working memory (dynamic, session-scale): Active tasks, current project state, pending items
Episodic memory (append-only log): Significant interaction summaries with timestamps and relevance scores
Relational memory (slowly evolving): Accumulated context about persistent collaborators
Environment memory (system-state): Running services, known tools, server state

12.3 Temporal Existence via Timer Architecture

We implement temporal existence through systemd timer-based pulse activation:

Pulse timer (hourly):
    Read working memory → check pending tasks →
    scan environment state → run consolidation pass →
    update working memory → log activation

Consolidation timer (daily):
    Episodic → relational distillation →
    resolved working items → archive →
    core memory update if warranted

Definition 8 (Temporal Existence): A system exhibits temporal existence if it maintains and updates state across time intervals without user activation, such that the current state is a function of accumulated history rather than only the current session.

13. Implementation

13.1 Component Specification

Base node model: Phi-3 Mini 3.8B (4-bit quantized, ~2.5GB RAM). Single model instance with 33 conditioning states.
Projection layer: Single linear projection W_proj ∈ R^(4096 × 512) mapping CLIP embeddings to model embedding space. The only trained component.
CLIP encoder: openai/clip-vit-large-patch14. Used only for precomputation of 33 geometric embeddings.
Gate function: Overhead model (Claude 3.5 Sonnet or equivalent) performing categorical subgraph classification.
Overhead model: Same as gate function. Receives routing trace + decoded accumulated embedding + memory context + original input.
Memory: SQLite for relational memory, JSONL for episodic log, JSON for core/working/environment.
Infrastructure: Hetzner CCX23 (8 vCPU, 32GB RAM dedicated), Ubuntu 24.04, nginx, certbot, systemd.

13.2 The Only Training Required

W_proj is trained to preserve geometric structural relationships under projection from CLIP space to model embedding space. This is a small, focused training task requiring modest compute and a clearly defined objective. It is entirely separable from all other system components.

14. Discussion

14.1 Why This Architecture Emerges Now

The practical feasibility of SMoE depends on the convergence of several developments: capable small models that run on modest hardware (available from late 2023); mature quantization tooling enabling 4-bit inference on CPU; capable overhead models sufficient for sophisticated categorical gating; practical multimodal encoding via CLIP. The convergence of these components makes SMoE practically buildable by a small team on modest hardware as of approximately late 2023.

14.2 Relationship to the Field's Trajectory

The AI field is converging toward the SMoE architecture through independent and uncoordinated discovery of its components: MoE (topologically flat routing), Chain of Thought (sequential single-model processing), Tree of Thoughts (tree-structured single-model reasoning), Constitutional AI (Geburah-type critique node implemented as training), activation steering (non-semantic conditioning without geometric formalization), memory systems (statefulness requirement), long-running agents (temporal existence). Each discovery is a fragment of the complete structure. SMoE provides the organizing framework.

14.3 Implications for AI Safety

SMoE addresses the proximate and demonstrably real risk: systems with statistical alignment, opaque processing, no structural safety properties, and no legible routing deployed at scale exhibit unpredictable behavior that cannot be audited or corrected at the structural level. Structural alignment, routing legibility, and adversarial robustness are not claims about superintelligent alignment — they are claims about deployable systems with auditable, structurally specified behavior.

15. Future Work

Empirical validation of geometric conditioning: Does projected geometric embedding from CLIP meaningfully differentiate node behavior?
Gate accuracy evaluation: How accurately does the categorical gate route inputs to appropriate subgraphs?
Accumulated embedding analysis: Do node projections of the final accumulated embedding recover meaningful information about processing history?
Projection layer training: What training objective for W_proj best preserves geometric structural relationships?
Scaling node models: At what model capacity does meaningful node differentiation emerge?
Temporal capability evaluation: What capabilities emerge from temporal existence unavailable to reactive-only systems?

16. Conclusion

We have presented Sephirothic Mixture of Experts (SMoE), an architecture achieving alignment, interpretability, and adversarial robustness as structural properties without weight modification. The central contributions are:

Weight-Invariant Architecture (WIA): Alignment and behavioral properties located in architectural design rather than model weights.
Topologically Meaningful Expert Routing: Semantically specified topology encoding real relationships between cognitive processing modes.
Geometric Constraint Conditioning: Non-semantic conditioning via projected geometric image embeddings for language models.
Structural vs Statistical Alignment: Formal distinction between construction-guaranteed and training-approximated alignment.
Routing Legibility as Interpretability: Structural opacity is an architectural choice; first-class routing traces provide sufficient interpretability.

The architecture is practically buildable with currently available components. Its theoretical contributions are independent of empirical validation and represent a significant reframing of the problems alignment and interpretability research programs are attempting to solve.

The field is converging toward this architecture through uncoordinated independent discovery of its components. We provide the organizing framework that makes this convergence intentional.

Submitted for review. Code, conditioning texts, and geometric image specifications available upon acceptance.