Shields AI: Building Sub-Millisecond DNS Security
The technical journey behind our ML-powered DNS protection. From eBPF packet processing to real-time threat detection.
DNS is the phonebook of the internet—and a prime target for attackers. Shields AI provides intelligent DNS security with machine learning threat detection and sub-millisecond latency. Here's how we built it.
The Challenge
DNS security must be:
- **Fast**: Every website load starts with DNS
- **Accurate**: Block threats, not legitimate sites
- **Intelligent**: Detect new threats automatically
- **Private**: Don't log or sell query data
Traditional approaches fail on at least one dimension.
Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                         Shields AI                          │
├─────────────────────────────────────────────────────────────┤
│       ┌───────────┐    ┌───────────┐    ┌───────────┐       │
│       │   eBPF    │───→│   Rust    │───→│    ML     │       │
│       │  Capture  │    │  Filter   │    │  Engine   │       │
│       └───────────┘    └───────────┘    └───────────┘       │
│             │                │                │             │
│             ▼                ▼                ▼             │
│       ┌───────────┐    ┌───────────┐    ┌───────────┐       │
│       │  Packet   │    │   Block   │    │  Threat   │       │
│       │   Stats   │    │   Lists   │    │   Score   │       │
│       └───────────┘    └───────────┘    └───────────┘       │
└─────────────────────────────────────────────────────────────┘
```
Layer 1: High-Performance Packet Processing
DNS queries arrive as UDP packets. We need to process them with minimal latency.
eBPF for Zero-Copy Processing
eBPF (extended Berkeley Packet Filter) lets us process packets in the kernel:
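```c
// Simplified eBPF/XDP DNS filter (IPv4/UDP only)
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

struct dns_header {  // fixed 12-byte header; the question section follows it
    __u16 id, flags, qdcount, ancount, nscount, arcount;
};

// Walks the length-prefixed QNAME (with bounds checks) and looks it up in a
// BPF map; defined elsewhere in this file, not shown here.
static __always_inline int is_blocked(void *qname, void *data_end);

SEC("xdp")
int dns_filter(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    // Parse Ethernet header; only IPv4 is handled here
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;
    // Parse IP header and require UDP
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
        return XDP_PASS;
    // ihl counts 32-bit words, so this also skips any IP options
    struct udphdr *udp = (void *)ip + ip->ihl * 4;
    if ((void *)(udp + 1) > data_end || udp->dest != bpf_htons(53))
        return XDP_PASS;
    // The verifier requires a bounds check before every header access
    struct dns_header *dns = (void *)(udp + 1);
    if ((void *)(dns + 1) > data_end)
        return XDP_PASS;
    if (is_blocked(dns + 1, data_end))
        return XDP_DROP;  // Drop in kernel (the client receives no reply)
    return XDP_PASS;      // Allow to userspace
}

char LICENSE[] SEC("license") = "GPL";
```
Benefits: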
- Packet processing in kernel space
- No context switches for blocked domains
- Microsecond-level latency
- Can handle millions of packets/second
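The snippet above hides the interesting part inside `is_blocked()`: a DNS query name is not a C string but a sequence of length-prefixed labels terminated by a zero byte (RFC 1035). The following is a minimal Python sketch of that walk, with the same kind of bounds check before every read that the eBPF verifier forces on the C version (in eBPF the loop must also be bounded). `parse_qname` and `build_query` are illustrative names, and compression pointers are rejected purely for simplicity.

```python
from typing import Optional

DNS_HEADER_LEN = 12  # fixed-size header; the question section starts after it
MAX_NAME_LEN = 255   # RFC 1035 limit on an encoded name, in octets


def parse_qname(packet: bytes) -> Optional[str]:
    """Return the lowercase dotted query name, or None if malformed."""
    pos = DNS_HEADER_LEN
    labels = []
    encoded_len = 1  # count the terminating zero byte up front
    while True:
        if pos >= len(packet):
            return None  # name runs past the end of the packet
        length = packet[pos]
        pos += 1
        if length == 0:
            return ".".join(labels)  # root label ends the name
        if length > 63:
            return None  # compression pointer or invalid label length
        if pos + length > len(packet):
            return None  # label would read past the end of the packet
        encoded_len += length + 1
        if encoded_len > MAX_NAME_LEN:
            return None
        # DNS names are case-insensitive, so normalize to lowercase
        labels.append(packet[pos:pos + length].decode("ascii", errors="replace").lower())
        pos += length


def build_query(name: str) -> bytes:
    """Build a minimal query (QTYPE A, QCLASS IN) for exercising the parser."""
    header = bytes([0x12, 0x34, 0x01, 0x00, 0, 1, 0, 0, 0, 0, 0, 0])  # id, RD, QDCOUNT=1
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    return header + qname + b"\x00" + b"\x00\x01\x00\x01"
```

For example, `parse_qname(build_query("Ads.Example.COM"))` returns `"ads.example.com"`, while a truncated packet or a compression pointer yields `None`, which maps naturally onto `XDP_PASS` in the kernel version.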
Rust for Safety and Speed
Beyond eBPF, our userspace processing is in Rust:
```rust
use std::sync::Arc;
use tokio::net::UdpSocket;

// BlockList, MlEngine, DnsQuery, blocked_response, log_threat,
// forward_query and THRESHOLD are defined elsewhere in the project.
async fn dns_server(blocklist: Arc<BlockList>, ml_engine: Arc<MlEngine>) -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:53").await?;
    // 512 bytes is the classic DNS-over-UDP message size (without EDNS0)
    let mut buf = [0u8; 512];
    loop {
        let (len, src) = socket.recv_from(&mut buf).await?;
        // Parse DNS query; skip malformed packets instead of stopping the server
        let Ok(query) = DnsQuery::parse(&buf[..len]) else { continue };
        // Check blocklist (Bloom filter, hash set, then patterns; see Layer 2)
        if blocklist.contains(&query.domain) {
            socket.send_to(&blocked_response(&query), src).await?;
            continue;
        }
        // ML threat scoring (awaited before this query is answered)
        let threat_score = ml_engine.score(&query).await;
        if threat_score > THRESHOLD {
            log_threat(&query, threat_score);
            socket.send_to(&blocked_response(&query), src).await?;
            continue;
        }
        // Forward to upstream DNS; an upstream failure drops only this query
        if let Ok(response) = forward_query(&query).await {
            socket.send_to(&response, src).await?;
        }
    }
}
```
Layer 2: Intelligent Blocklists
Static blocklists are the first line of defense.
Blocklist Architecture
```rust
use std::collections::{HashMap, HashSet};
use aho_corasick::AhoCorasick;

// BloomFilter and Category are project types (not shown)
pub struct BlockList {
    // Hash set for exact domain matching
    exact: HashSet<String>,
    // Aho-Corasick automaton for substring pattern matching
    patterns: AhoCorasick,
    // Bloom filter over `exact` for quick negative lookups
    bloom: BloomFilter,
    // Categories for fine-grained control
    categories: HashMap<Category, HashSet<String>>,
}

impl BlockList {
    pub fn contains(&self, domain: &str) -> bool {
        // The Bloom filter gates only the exact set: a negative means the
        // domain is definitely not in `exact`, so the hash lookup is skipped
        if self.bloom.might_contain(domain) && self.exact.contains(domain) {
            return true;
        }
        // Patterns are checked regardless: a name can match a pattern without
        // being in the Bloom filter, so returning early would miss it
        self.patterns.is_match(domain)
    }
}
```
Blocklist Sources
We aggregate from multiple sources:
- Community blocklists (updated hourly)
- Threat intelligence feeds
- User-reported domains
- ML-identified threats
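Aggregating several feeds mostly comes down to normalizing entries so that the same domain from two sources collapses into a single record. A minimal sketch, assuming each feed is a list of plain-domain or hosts-file-style lines (`0.0.0.0 ads.example.com`); `normalize` and `merge_blocklists` are illustrative names, not the project's API.

```python
from collections import defaultdict
from typing import Optional


def normalize(line: str) -> Optional[str]:
    """Turn one feed line into a canonical domain, or None if it holds none."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    parts = line.split()
    # hosts-file format ("0.0.0.0 ads.example.com"): take the hostname column
    domain = parts[1] if len(parts) > 1 else parts[0]
    domain = domain.lower().rstrip(".")  # case-insensitive, no trailing root dot
    if domain == "localhost" or "." not in domain:
        return None
    return domain


def merge_blocklists(feeds: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized domain to the set of feeds that list it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, lines in feeds.items():
        for line in lines:
            domain = normalize(line)
            if domain is not None:
                merged[domain].add(source)
    return dict(merged)
```

Keeping the set of sources per domain (rather than a bare set of domains) makes it possible to weigh how many independent feeds agree on an entry.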
Total: 5M+ domains across categories:
- Malware & phishing
- Advertising & tracking
- Adult content
- Social media (optional)
- Gambling (optional)
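Because some categories are optional, a lookup has to know which categories a given network has switched on. A minimal sketch, assuming per-category domain sets and parent-domain matching (a listed domain also blocks its subdomains), which is a common alternative to substring patterns; `CategoryBlocklist` and the category names are illustrative.

```python
from typing import Optional


class CategoryBlocklist:
    """Per-category domain sets; a listed domain also blocks its subdomains."""

    def __init__(self) -> None:
        self.categories: dict[str, set[str]] = {}

    def add(self, category: str, domain: str) -> None:
        self.categories.setdefault(category, set()).add(domain.lower().rstrip("."))

    def match(self, domain: str, enabled: set[str]) -> Optional[str]:
        """Return the first enabled category that blocks `domain`, else None."""
        labels = domain.lower().rstrip(".").split(".")
        # The name itself, then each parent, stopping before the bare TLD:
        # cdn.evil.example -> evil.example
        candidates = [".".join(labels[i:]) for i in range(len(labels) - 1)]
        for category in sorted(enabled):  # sorted for a deterministic answer
            domains = self.categories.get(category, set())
            if any(c in domains for c in candidates):
                return category
        return None
```

Each candidate is a hash lookup, and a name has only a handful of labels. Unlike a substring search, `notevil.example` does not match an entry for `evil.example`.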
Layer 3: Machine Learning Threat Detection
Static blocklists can't catch new threats. Our ML engine identifies suspicious domains in real time.
Feature Extraction
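The extractor relies on several helpers that are not shown. The lexical ones are simple string statistics; machine-generated names tend to have higher character entropy than dictionary words. A minimal sketch of four of them, assuming Shannon entropy over characters and ratios computed over the whole name; the project's exact definitions aren't given, so treat these as illustrative.

```python
import math
from collections import Counter


def entropy(domain: str) -> float:
    """Shannon entropy of the characters in the domain, in bits per character."""
    if not domain:
        return 0.0
    n = len(domain)
    return -sum((c / n) * math.log2(c / n) for c in Counter(domain).values())


def consonant_ratio(domain: str) -> float:
    """Fraction of letters that are consonants (non-vowels)."""
    letters = [ch for ch in domain.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch not in "aeiou" for ch in letters) / len(letters)


def digit_ratio(domain: str) -> float:
    """Fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in domain) / len(domain) if domain else 0.0


def extract_ngrams(domain: str, n: int) -> list[str]:
    """Overlapping character n-grams, e.g. 'abc' -> ['ab', 'bc'] for n=2."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]
```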
```python
import numpy as np

# entropy, consonant_ratio, digit_ratio, extract_ngrams, ngram_frequency,
# domain_age_days, tld_risk_score and malware_similarity_score are helper
# functions defined elsewhere in the project.
def extract_features(domain: str) -> np.ndarray:
    features = []
    # Lexical features
    features.append(len(domain))
    features.append(domain.count('.'))
    features.append(entropy(domain))
    features.append(consonant_ratio(domain))
    features.append(digit_ratio(domain))
    # N-gram analysis
    bigrams = extract_ngrams(domain, 2)
    features.extend(ngram_frequency(bigrams))
    # Domain age (from WHOIS)
    features.append(domain_age_days(domain))
    # TLD risk score
    features.append(tld_risk_score(domain))
    # Similarity to known malicious patterns
    features.append(malware_similarity_score(domain))
    return np.array(features)
```
Model Architecture
We use an ensemble of models:
- **Random Forest**: Fast, interpretable
- **XGBoost**: High accuracy
- **Neural Network**: Complex pattern detection
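```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from tensorflow.keras.models import load_model
from xgboost import XGBClassifier

class ThreatEnsemble:
    def __init__(self):
        self.rf = RandomForestClassifier(n_estimators=100)
        self.xgb = XGBClassifier()
        self.nn = load_model('threat_nn.h5')  # pre-trained network
    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # Tree models must be fitted before predict_proba can be called
        self.rf.fit(X, y)
        self.xgb.fit(X, y)
    def predict(self, features: np.ndarray) -> float:
        # Models expect a 2-D batch, so wrap the single feature vector
        x = features.reshape(1, -1)
        # Probability of the malicious class (column 1) from each model
        rf_pred = self.rf.predict_proba(x)[0][1]
        xgb_pred = self.xgb.predict_proba(x)[0][1]
        nn_pred = self.nn.predict(x, verbose=0)[0][0]
        # Weighted ensemble (weights sum to 1.0)
        return float(0.3 * rf_pred + 0.4 * xgb_pred + 0.3 * nn_pred)
```
Training Data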
Our models train on:
- 10M+ known malicious domains
- 50M+ legitimate domains
- Daily updates with new threats
- Feedback from user reports
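With 10M+ malicious and 50M+ legitimate examples, the classes are imbalanced roughly 1:5. How the models account for this isn't stated; one common approach is to weight each class inversely to its frequency, which is what scikit-learn's `class_weight='balanced'` computes (n_samples / (n_classes × class_count)). XGBoost exposes a related knob, `scale_pos_weight`, whose documented starting value is negatives / positives. A sketch of that arithmetic using the rounded counts above:

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


counts = {"malicious": 10_000_000, "legitimate": 50_000_000}
weights = balanced_class_weights(counts)
# Ratio of negatives to positives, a typical starting value for XGBoost's
# scale_pos_weight parameter
scale_pos_weight = counts["legitimate"] / counts["malicious"]
```

This gives a weight of 3.0 for malicious and 0.6 for legitimate examples, and `scale_pos_weight = 5.0`, so a missed malicious domain costs five times as much as a misclassified legitimate one during training.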
Real-Time Inference
ML inference must not slow down DNS:
```rust
// Cached ML scoring: repeat domains skip feature extraction and inference.
// ML_CACHE, ML_MODEL and extract_features are defined elsewhere.
async fn score_domain(domain: &str) -> f32 {
    // Check cache first
    if let Some(score) = ML_CACHE.get(domain) {
        return score;
    }
    // Extract features (fast)
    let features = extract_features(domain);
    // Run inference (uses ONNX Runtime for speed)
    let score = ML_MODEL.run(&features);
    // Cache result; the key must be an owned String
    ML_CACHE.insert(domain.to_string(), score);
    score
}
```
Average inference time: <1ms
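`ML_CACHE` isn't shown, but whatever it is needs a size bound, because the set of distinct domains a resolver sees is effectively unbounded. A minimal least-recently-used sketch in Python using `collections.OrderedDict`, as an illustrative stand-in for the project's Rust cache; `ScoreCache` and the `score_fn` callback (standing in for model inference) are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class ScoreCache:
    """Bounded LRU cache mapping domain -> threat score."""

    def __init__(self, capacity: int, score_fn: Callable[[str], float]) -> None:
        self.capacity = capacity
        self.score_fn = score_fn      # hypothetical stand-in for model inference
        self.entries = OrderedDict()  # domain -> score, oldest first
        self.misses = 0

    def score(self, domain: str) -> float:
        if domain in self.entries:
            self.entries.move_to_end(domain)  # mark as most recently used
            return self.entries[domain]
        self.misses += 1
        value = self.score_fn(domain)         # only cache misses run the model
        self.entries[domain] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value
```

Popular domains stay hot and are answered from memory, so the per-query inference cost is paid mostly on long-tail names.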
Performance Results
Latency
Throughput
Accuracy
Privacy by Design
What We Log
- Aggregate query counts (no domains)
- Blocked category statistics
- Performance metrics
What We Don't Log
- Individual domains queried
- IP addresses
- User identifiers
- Query timestamps
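The logging policy above amounts to keeping counters keyed by category, never by domain, client, or time of query. A minimal sketch of such a stats object, assuming the filter reports each decision only as an optional category label; `AggregateStats` and its methods are illustrative names.

```python
from collections import Counter
from typing import Optional


class AggregateStats:
    """Counters by category and a running total; no domains, IPs or timestamps."""

    def __init__(self) -> None:
        self.total_queries = 0
        self.blocked = Counter()  # category -> number of blocked queries

    def record(self, blocked_category: Optional[str]) -> None:
        # Callers pass only the decision; the domain never reaches this object
        self.total_queries += 1
        if blocked_category is not None:
            self.blocked[blocked_category] += 1

    def summary(self) -> list[str]:
        return [f"Blocked {n} {cat} today" for cat, n in sorted(self.blocked.items())]
```

Because the object never receives a domain or address, there is nothing sensitive for it to leak, which is a stronger guarantee than scrubbing logs after the fact.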
Zero-Knowledge Architecture
```
User Query → Shields AI → Response
                 ↓
          Statistics only:
          - "Blocked 47 ads today"
          - "Protected from 3 threats"
          - No specific domains stored
```
Deployment Options
Cloud (Default)
- Managed infrastructure
- Global edge network
- Automatic updates
- No maintenance required
Self-Hosted
- Full source code available
- Docker/Kubernetes deployment
- Your infrastructure, your rules
- Same features, local control
Hybrid
- Cloud for threat intelligence
- Local for query processing
- Best of both worlds
Future Roadmap
Short-term
- IPv6 optimization
- QUIC/HTTP3 support
- Mobile apps
Medium-term
- Browser extensions
- Router integrations
- Threat sharing network
Long-term
- Federated threat detection
- Decentralized blocklists
- Hardware appliances
Conclusion
Building DNS security that's both fast and intelligent requires careful engineering at every layer. By combining eBPF for kernel-level processing, Rust for safe and fast userspace code, and machine learning for threat detection, we've created a system that protects users without compromising performance or privacy.
The result: sub-millisecond DNS security that catches 99%+ of threats while maintaining complete privacy.
*Ready to protect your network? Check out Shields AI and get started for free.*