Every platform has its own design and strengths. This goes for BroadWorks, NetSapiens, Metaswitch/Alianza, Ribbon, 2600hz, and PortaOne: each one has unique advantages. NetSapiens operators can gain a lot by fully exploiting the SBUS-based clustering architecture in the modern NetSapiens platform. These advantages are available whether you host in your own data center or use cloud hosting with Crexendo (via Oracle Cloud).
If you do it right, you minimize outages, provide a reliable service, and get some really powerful "elastic" scalability. But setting up the architecture - and keeping it running! - requires some care. NetSapiens operators need to understand the foundations of the architecture, know their networking, and perform testing.
The NetSapiens standard is a seven-server geo-redundant cluster. Experts at Crexendo explain that a minimal production setup spans two data centers: Data Center A typically runs four servers (multiple cores plus supporting modules), while Data Center B mirrors the critical functions on the remaining three.
Core module: The Core module handles all voice switching, voicemail, greetings, auto-attendants, and call features. SIP endpoints register directly with the Core, and PSTN peers (Sinch, Bandwidth, Twilio, etc.) also connect directly to it.
NQS: Paired with the Core is the NQS (QoS) module, which analyzes call traffic and is essential for supporting customers.
Call Recording: The MIS recording module performs call recording. Internally it is built on CALEA (lawful-intercept) technology, but it is used for everyday customer call recording.
The standard Linux distribution is Canonical's Ubuntu LTS (currently supporting 20.04, 22.04, and 24.04), and the entire stack can now be virtualized on most hypervisors (VMware, Hyper-V, KVM, Proxmox, etc.).
QoS monitoring requires a listening interface in "promiscuous" mode (a Linux network-interface setting) so the module can capture traffic. Each virtualization platform (hypervisor) supports this packet-capture interface in its own way. Because the QoS module is essential, a key part of your virtualization design should be making sure this capture path actually works.
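On a Linux guest, one quick sanity check is to read the interface flags the kernel exposes under /sys/class/net/<iface>/flags and test the IFF_PROMISC bit (0x100). The Python sketch below does exactly that; the interface name `eth1` is a hypothetical placeholder, and the check only reflects the guest's view of the interface, not whether the hypervisor is really mirroring traffic to it.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in <linux/if.h>


def is_promiscuous(flags_text: str) -> bool:
    """Return True if a hex flag string such as '0x1103' has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def check_interface(iface: str) -> bool:
    """Read /sys/class/net/<iface>/flags and test the promiscuous bit."""
    flags_path = Path("/sys/class/net") / iface / "flags"
    return is_promiscuous(flags_path.read_text())


if __name__ == "__main__":
    iface = "eth1"  # hypothetical capture interface name; adjust for your VM
    try:
        state = "ON" if check_interface(iface) else "OFF"
        print(f"{iface}: promiscuous mode {state}")
    except FileNotFoundError:
        print(f"{iface}: interface not found")
```

A passing check is necessary but not sufficient: you still need to confirm that RTP from other hosts actually arrives on that interface.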
But this is not a rigid appliance: leaders can stack additional cores horizontally for capacity or add separate portal/API servers for high-traffic customers. Large NetSapiens deployments often run dedicated portal hosts exposed to the public while keeping core servers on a protected backbone. The architecture supports facilities-based (customer-owned hardware), hosted (Crexendo-managed Oracle Cloud), and hybrid models, giving leaders flexibility to match their risk tolerance and capex/opex preferences.
Call limits were discussed openly: the theoretical sweet spot is 1,000–1,250 concurrent calls per core, though tuned systems regularly exceed 1,500 for residential traffic. NetSapiens has demonstrated more in load testing, with referenced real-world benchmarks reaching 20,000 calls across seven cores without failure.
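The per-core figures translate directly into a rough core count. The sketch below assumes load spreads evenly across cores and uses the quoted per-core numbers as planning limits; the function names and the 4,200-call peak are hypothetical, and this is not an official NetSapiens sizing method.

```python
import math


def cores_needed(peak_calls: int, calls_per_core: int = 1000) -> int:
    """Smallest number of cores that keeps each at or below calls_per_core."""
    if peak_calls < 0 or calls_per_core <= 0:
        raise ValueError("peak_calls must be >= 0 and calls_per_core > 0")
    return math.ceil(peak_calls / calls_per_core)


def per_core_load(peak_calls: int, cores: int) -> float:
    """Average concurrent calls per core, assuming an even spread."""
    return peak_calls / cores


if __name__ == "__main__":
    peak = 4200  # hypothetical busy-hour concurrent calls
    for limit in (1000, 1250, 1500):
        n = cores_needed(peak, limit)
        print(f"limit {limit}: {n} cores, ~{per_core_load(peak, n):.0f} calls each")
```

For comparison, the 20,000-call benchmark works out to roughly 2,857 calls per core across seven cores, well above the everyday planning figures. Note that this count covers normal operation only; failover capacity for sister cores comes on top of it.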
A custom technology called NetSapiens SBUS keeps every cluster member in near-real-time sync. For example, if one component shuts down, SBUS immediately begins queuing dial-plan changes, subscriber updates, CDRs, greetings, and routing rules for it. When connectivity is restored, the queue flushes automatically.
SBUS replicates everything except in-flight call state, so a data-center outage drops only active calls; voicemails, call history, and configurations survive. SBUS traffic travels over an encrypted pathway (TLS on port 443) and works across any number of sites (deployments often span four to ten data centers). This replication is what enables true active-active behavior and data sovereignty: leaders can keep EMEA customer data in Europe and U.S. data in the U.S. without licensing complications, unless separate databases are required.
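The queue-and-flush behavior described above is a classic store-and-forward pattern. The Python sketch below models it to make the semantics concrete, including the one exception (in-flight call state is never queued). It is a toy model, not SBUS's actual implementation or wire protocol, and the class, method, and message-type names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Update:
    kind: str      # hypothetical labels, e.g. "dialplan", "cdr", "call_state"
    payload: dict


@dataclass
class ReplicationLink:
    """Toy store-and-forward link to one peer (not the real SBUS protocol)."""
    connected: bool = True
    pending: deque = field(default_factory=deque)
    delivered: list = field(default_factory=list)

    # In-flight call state is not replicated, so it is dropped instead of queued.
    NON_REPLICATED = frozenset({"call_state"})

    def publish(self, update: Update) -> None:
        if update.kind in self.NON_REPLICATED:
            return
        self.pending.append(update)
        if self.connected:
            self.flush()

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        self.connected = True
        self.flush()

    def flush(self) -> None:
        # Deliver queued updates in the order they were published.
        while self.pending:
            self.delivered.append(self.pending.popleft())


if __name__ == "__main__":
    link = ReplicationLink()
    link.disconnect()                                      # peer goes offline
    link.publish(Update("dialplan", {"rule": "hypothetical"}))
    link.publish(Update("call_state", {"call_id": "c1"}))  # dropped, never queued
    link.publish(Update("cdr", {"call_id": "c1"}))
    print(len(link.pending), len(link.delivered))          # 2 0
    link.reconnect()                                       # queue flushes in order
    print([u.kind for u in link.delivered])                # ['dialplan', 'cdr']
```

Ordered delivery matters here: replaying a dial-plan change before a later edit to the same rule keeps both sites converging on the same final state.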
Every core is fully geo-redundant by design. Calls are normally pinned to the customer’s “home” core via preferred-server settings and DNS, but if that core is unavailable, traffic automatically fails over to the sister core in the remote data center (provided the DNS configuration for the customer’s devices is set up properly!). Geo-connections between cores automatically reroute misdirected calls (e.g., a DID landing on the wrong cluster because the PSTN carrier routed the call to a sub-optimal core) with minimal added latency. Leaders benefit because they can “sister” cores, pairing Core A1 with B1 and A2 with B2, so that a single-site failure shifts only half of the failed site’s load onto each surviving core rather than overwhelming one remaining host. The architecture scales both horizontally (add more cores) and vertically (more powerful hosts), and virtualization makes right-sizing straightforward.
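DNS SRV records (RFC 2782) are one common way to express a "home core first, sister core second" preference: clients try the target with the lowest priority value first. The Python sketch below models that ordering together with the A1/B1, A2/B2 sistering. The hostnames and priorities are hypothetical, and the sketch ignores SRV weights and the retry behavior of any particular phone firmware.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SrvTarget:
    priority: int  # lower value = preferred (RFC 2782)
    host: str


# Hypothetical SRV sets: each customer group's home core, then its sister core.
SRV_RECORDS = {
    "groupA": [SrvTarget(10, "core-a1.example.net"), SrvTarget(20, "core-b1.example.net")],
    "groupB": [SrvTarget(10, "core-a2.example.net"), SrvTarget(20, "core-b2.example.net")],
}


def pick_core(group: str, down: set[str]) -> str | None:
    """Return the most-preferred reachable core for a customer group."""
    for target in sorted(SRV_RECORDS[group], key=lambda t: t.priority):
        if target.host not in down:
            return target.host
    return None  # every target is unreachable


if __name__ == "__main__":
    site_a = {"core-a1.example.net", "core-a2.example.net"}  # all of site A offline
    print(pick_core("groupA", set()))   # core-a1.example.net
    print(pick_core("groupA", site_a))  # core-b1.example.net
    print(pick_core("groupB", site_a))  # core-b2.example.net
```

When all of site A goes down, group A lands on B1 and group B on B2, so each surviving core picks up one failed core's customers rather than one host taking everything.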
The NQS module (branded VoIP Monitor) provides promiscuous listening on every core. In bare-metal setups it uses a port mirror; in virtual environments it leverages the hypervisor’s traffic-mirror feature. It captures RTP streams, calculates MOS scores, stores packet captures, and enables one-click playback of any leg of a call. Call-history screens in the manager portal pull MOS scores and deep-link directly into VoIP Monitor for detailed troubleshooting. Resellers (i.e., those who do not license the NetSapiens platform but sell services based on it) can be given scoped access to individual calls without seeing the entire QoS dashboard, satisfying security-conscious customers. The module is essential for voice-quality SLAs and is integrated into the platform’s Insight monitoring.
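MOS estimates in monitoring tools are commonly derived from the ITU-T G.107 E-model, which maps a transmission rating factor R (0 to 100) onto the MOS scale. The function below implements the standard G.107 R-to-MOS conversion; whether VoIP Monitor computes its scores exactly this way is an assumption, since its internal method isn't described here.

```python
def r_to_mos(r: float) -> float:
    """Estimate MOS from an E-model R-factor (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The cubic term dips slightly below 1 for very small R; clamp to the MOS scale.
    return min(4.5, max(1.0, mos))


if __name__ == "__main__":
    # Example R values; 93.2 is the E-model default with no impairments.
    for r in (93.2, 80.0, 70.0, 50.0):
        print(f"R = {r:5.1f} -> MOS ~ {r_to_mos(r):.2f}")
```

The mapping is nonlinear: a drop in R from 93 to 80 costs about 0.4 MOS, while the next 10 points cost about the same again, which is why small quality regressions show up in MOS dashboards well before users start complaining.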
The MIS (Media Intercept Service) recording module was built on lawful-intercept foundations. When a call is flagged for recording, the core streams RTP plus metadata directly to the recorder over dedicated ports (5400/1040). After the call ends, the module renders a playable WAV file. Trainers explained that recordings are stored separately from cores, can be offloaded to S3-compatible storage (such as AWS S3) for compliance, and are fully searchable via the manager portal and API. Because the module is a “bolt-on,” it can be upgraded independently of the cores, which is useful during major version jumps.
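To illustrate the "render a playable WAV" step, here is a minimal Python sketch that decodes G.711 µ-law (PCMU) payload bytes into 16-bit linear PCM and wraps them in a WAV container with the standard-library `wave` module. It assumes the RTP payloads for one call leg have already been extracted, reordered, and concatenated; how the MIS module actually does this, and which codecs it handles, are not covered here.

```python
import io
import struct
import wave


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def pcmu_to_wav(payload: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap decoded PCMU audio in a mono, 16-bit WAV container."""
    samples = [ulaw_to_linear(b) for b in payload]
    pcm = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    # 20 ms of PCMU silence (0xFF) at 8 kHz is 160 bytes, one typical RTP packet.
    wav_bytes = pcmu_to_wav(b"\xff" * 160)
    print(len(wav_bytes))  # 364 (44-byte header + 320 bytes of audio)
```

Decoding to linear PCM doubles the size of the audio, which is one reason archived recordings are often kept in compressed form on S3-compatible storage rather than as raw WAV.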
The endpoints module (provisioning, device config, line-appearance handling) is intentionally redundant—customers normally run exactly two instances. The switch itself sees only one logical endpoint reference; DNS and MySQL/rsync synchronization keep the pair in lockstep.
Supporting SIP endpoints from vendors like Yealink, Grandstream, and Poly can be workload-intensive. Fortunately, the NetSapiens licensing model accommodates adding many more NDP (endpoint provisioning) servers, providing high capacity for mass-reboot or mass-upgrade events.
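A back-of-the-envelope estimate shows why extra NDP capacity matters during a mass-reboot event. The sketch assumes every device fetches its configuration once, requests spread evenly across servers, and each server sustains a fixed request rate; the 50 requests/second rate and the 30,000-device fleet are hypothetical placeholders, not NetSapiens benchmarks.

```python
import math


def provisioning_window_secs(devices: int, servers: int, req_per_sec_per_server: float) -> float:
    """Seconds to serve one config fetch per device, with load spread evenly."""
    if servers <= 0 or req_per_sec_per_server <= 0:
        raise ValueError("servers and rate must be positive")
    return devices / (servers * req_per_sec_per_server)


def servers_for_window(devices: int, target_secs: float, req_per_sec_per_server: float) -> int:
    """Smallest server count that finishes all fetches within target_secs."""
    return math.ceil(devices / (target_secs * req_per_sec_per_server))


if __name__ == "__main__":
    devices = 30_000  # hypothetical phone fleet
    rate = 50.0       # hypothetical config fetches per second per NDP server
    for n in (2, 4, 8):
        print(f"{n} servers: ~{provisioning_window_secs(devices, n, rate):.0f} s")
    print("servers for a 2-minute window:", servers_for_window(devices, 120, rate))
```

Real reboot storms are burstier than this even-spread model, so measured per-server throughput and some safety margin should replace the placeholder numbers.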
The entire platform operates active-active—no cold standby. DNS, preferred-server settings, and SBUS replication handle failover automatically.
DNS and correctly configured networking are essential to reliable failover. If a service provider is having reliability problems with its NetSapiens platform, the first question is whether the rest of the network was built with a solid understanding of the traffic flows.
In one test case, we took an entire data center offline; the remaining site continued processing calls and queued configuration changes until the link returned. Call state is not preserved (a call in progress is lost when the server handling it shuts down), but all data (CDRs, voicemails, dial plans) remains intact. ECG recommends “sistering” cores and keeping each core's utilization under 40% so that failover capacity is never exceeded. For example, servers A1 and A2 split the workload for customer group "A", each running below 40%; if A1 goes down, A2 absorbs A1's load and runs at roughly 80%, leaving 20% headroom for unexpected bursts. This design gives business leaders true geo-redundancy with minimal operational overhead and satisfies even the most stringent uptime requirements.
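The 40% guideline follows from simple arithmetic: when one of two sister cores fails, the survivor carries both loads. A minimal sketch, assuming the two sisters have equal capacity and that the failed core's load moves over in full (real failover timing depends on DNS and device behavior):

```python
def survivor_utilization(util_a: float, util_b: float) -> float:
    """Utilization of the surviving sister after its partner fails (equal capacity)."""
    return util_a + util_b


def max_safe_utilization(headroom: float) -> float:
    """Highest per-core utilization that still leaves `headroom` after failover."""
    return (1.0 - headroom) / 2


if __name__ == "__main__":
    after = survivor_utilization(0.40, 0.40)
    print(f"survivor load: {after:.0%}, headroom: {1 - after:.0%}")  # 80%, 20%
    print(f"max per-core load for 20% headroom: {max_safe_utilization(0.20):.0%}")  # 40%
```

Running the cores hotter trades away that burst headroom: at 50% each, a single failure leaves the survivor at 100% with nothing to spare.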
Crexendo engineers Matt MacCullum, Sean Gill, and Rob Harris contributed information for this article.