Day 08: AI Anomaly Detection (DWN Research)
// Objective
To explore lightweight, behavior-based anomaly detection without deep neural networks.
Modern AI security often depends on heavy deep-learning models (CNNs, LSTMs, Transformers) that require GPUs and large datasets.
The objective of this day was to study and prototype a Weightless Neural Network (WiSARD-style) approach that can model program behavior using binary representations and run efficiently on CPU-only systems.
// The Architecture: Dynamic Weightless Network (DWN)
A Weightless Neural Network (WNN) does not rely on floating-point multiplications or learned weights.
Instead, it operates using lookup tables (RAM nodes) that map binary input patterns to stored responses.
In this work, I explored a Dynamic Weightless Network (DWN) variant that allows training via gradient-based methods while preserving discrete, logic-based inference.
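To make the lookup-table idea concrete, here is a minimal WiSARD-style sketch in Python. The class name, tuple size, and random bit-to-node mapping are illustrative assumptions for this post, not the exact implementation used in the experiments.

```python
# Minimal WiSARD-style discriminator sketch: each RAM node looks at a small
# tuple of input bits and stores the addresses it has seen during training.
# The discriminator's response is the number of nodes that recognise a pattern.
import random

class Discriminator:
    def __init__(self, input_bits: int, tuple_size: int, seed: int = 0):
        rng = random.Random(seed)
        positions = list(range(input_bits))
        rng.shuffle(positions)                       # random input-to-node mapping
        self.tuples = [positions[i:i + tuple_size]
                       for i in range(0, input_bits, tuple_size)]
        self.rams = [set() for _ in self.tuples]     # one lookup table per node

    def _address(self, bits, positions):
        return tuple(bits[p] for p in positions)

    def train(self, bits):
        for ram, positions in zip(self.rams, self.tuples):
            ram.add(self._address(bits, positions))

    def score(self, bits):
        # How many RAM nodes have already seen this sub-pattern.
        return sum(self._address(bits, positions) in ram
                   for ram, positions in zip(self.rams, self.tuples))
```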
// Key Engineering Decisions
1. Thermometer Encoding
- Problem: Raw numerical values and frequency counts do not translate well into binary representations.
- Solution: Used thermometer encoding, where magnitude is represented by a contiguous run of set bits.
Example:
- Count = 3 → 11100
- Count = 4 → 11110
This preserves similarity in Hamming space, which is critical for weightless models.
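A tiny sketch of the encoding (the 5-bit resolution is just the value from the example above):

```python
def thermometer_encode(value: int, resolution: int = 5) -> list[int]:
    """Encode a count as a run of set bits; nearby counts stay close in Hamming distance."""
    level = max(0, min(value, resolution))
    return [1] * level + [0] * (resolution - level)

assert thermometer_encode(3) == [1, 1, 1, 0, 0]   # Count = 3 -> 11100
assert thermometer_encode(4) == [1, 1, 1, 1, 0]   # Count = 4 -> 11110
```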
2. Behavioral Input Representation
Instead of raw sequences, behavior is summarized using:
- Sliding windows over system-call traces
- Bag-of-events (histogram-style aggregation)
- Binary encoding suitable for lookup-based learning
This makes the model robust to small variations while capturing statistical behavior.
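As a rough sketch of this pipeline (the vocabulary, window size, and step are illustrative placeholders; `thermometer_encode` is the helper from the previous sketch):

```python
from collections import Counter

def sliding_windows(trace: list[str], size: int, step: int):
    """Yield overlapping windows over a syscall trace."""
    for start in range(0, max(len(trace) - size + 1, 1), step):
        yield trace[start:start + size]

def encode_window(window: list[str], vocab: list[str], resolution: int = 5) -> list[int]:
    """Bag-of-events over one window, thermometer-encoded per syscall in a fixed vocabulary."""
    counts = Counter(window)
    bits: list[int] = []
    for syscall in vocab:                 # fixed order keeps bit positions stable
        bits.extend(thermometer_encode(counts[syscall], resolution))
    return bits

# Toy trace, just to show the shape of the pipeline:
vocab = ["read", "write", "open", "close", "mmap"]
trace = ["open", "read", "read", "write", "close", "read"]
for window in sliding_windows(trace, size=4, step=2):
    print(encode_window(window, vocab))
```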
3. Differentiable Training via EFD
Lookup tables are discrete and non-differentiable by nature.
To enable training, I implemented Extended Finite Difference (EFD), which approximates gradients by considering the influence of nearby binary addresses in the lookup table.
This allows:
- Gradient-based optimization during training
- Pure lookup-table inference during deployment
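The sketch below is only a toy illustration of the finite-difference idea behind EFD: flip one address bit at a time and difference the stored values. The full EFD formulation also aggregates the influence of more distant addresses, which is omitted here.

```python
import numpy as np

def lut_output(lut: np.ndarray, bits: np.ndarray) -> float:
    """Lookup-table forward pass: the input bits form the address."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | int(b)
    return float(lut[addr])

def finite_difference_grad(lut: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Approximate d(output)/d(bit_i) by flipping one address bit at a time
    and differencing the two stored values."""
    grads = np.zeros(len(bits))
    for i in range(len(bits)):
        hi, lo = bits.copy(), bits.copy()
        hi[i], lo[i] = 1, 0
        grads[i] = lut_output(lut, hi) - lut_output(lut, lo)
    return grads

# Toy usage: a 2-bit LUT with real-valued (trainable) entries, addresses 00..11.
lut = np.array([0.1, 0.9, 0.2, 0.8])
bits = np.array([1, 0])
print(lut_output(lut, bits), finite_difference_grad(lut, bits))
```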
// Experimental Setup
- Dataset: UNSW-NB15 (used as an initial controlled benchmark)
- Model: Multi-discriminator DWN (Normal vs Attack)
- Training Mode: Normal-only behavioral learning
- Hardware: CPU-only (no GPU dependency)
The focus of this phase was architecture validation, not metric optimization.
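To show how these pieces could fit together, here is an illustrative normal-only loop built on the `Discriminator` and `encode_window` sketches above; the anomaly score is simply the fraction of RAM nodes that fail to recognise a window. This is a sketch of the setup, not the exact training code used on UNSW-NB15.

```python
# Illustrative normal-only training loop, reusing the Discriminator and
# encode_window sketches above. All values here are placeholders.
normal_windows = [encode_window(w, vocab) for w in sliding_windows(trace, size=4, step=2)]
input_bits = len(normal_windows[0])

normal_disc = Discriminator(input_bits=input_bits, tuple_size=5)
for bits in normal_windows:
    normal_disc.train(bits)

def anomaly_score(bits: list[int]) -> float:
    # 0.0 = every RAM node recognises the window (looks normal),
    # 1.0 = no RAM node recognises it (strongly anomalous).
    return 1.0 - normal_disc.score(bits) / len(normal_disc.rams)
```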
// Observations & Learnings
- Weightless models can learn behavioral patterns without deep neural networks.
- Binary encodings strongly influence model stability and learning quality.
- Training is fast and lightweight compared to traditional deep learning.
- Understanding why the system works is more important than raw accuracy at this stage.
// Reflection (Honest)
This work represents an exploratory research phase.
My understanding is currently conceptual and surface-level, but structured.
The main outcome is clarity on how kernel behavior, binary representations, and lightweight ML models connect.
// Next Steps
- Integrate DWN input directly with real Linux syscall traces via ptrace
- Study anomaly scoring rather than classification accuracy
- Perform controlled experiments on normal vs abnormal program behavior