MoSys Stellar Packet Classification IP for FPGAs and ASICs

Overview

The MoSys Stellar Packet Classification Platform is provided as IP for a variety of accelerators and delivers ultra-high search performance using lookup rules based on highly complex Access Control Lists (ACLs) and Longest Prefix Match (LPM).

Stellar performs header lookups at hundreds of millions of lookups per second against millions of rules and can readily support networks from 25 Gbps to terabits per second.

Stellar leverages the company’s innovative Graph Memory Engine (GME) to perform embedded search and classification of packet headers as an alternative to TCAM functions.

The platform includes software that compiles TCAM and search images into graphs for the GME to process, using a common API for portability.

The MoSys Stellar Packet Classification Platform supports a wide range of hardware and operates with or without a MoSys IC – support options include:

  • Optimized RTL IP for FPGA or ASIC
    • with internal SRAM only
    • with external DDR memory
    • with external HBM memory
    • with MoSys Accelerator Engine QPR memory series
    • with MoSys Accelerator BE Bandwidth Engine series
    • or Hybrid multi-tier combination of above options
  • Software that runs on an x86
  • Firmware for the MoSys Programmable HyperSpeed Engine (PHE)

Two Optimized Platforms

MoSys offers two Stellar Packet Classification Platforms that take advantage of MoSys’s Graph Memory Engine for FPGA and ASIC. One is optimized for very flexible, very complex, many-tuple lookups using a mix of ACL and LPM rules; the other is optimized for high-performance, high-capacity LPM processing.

1) High Flexibility / High Complexity ACL & LPM

  • Ultra-High-Speed Search Engine IP
  • Ideal for a very wide range of use cases
  • Tuned for very complex n-tuple lookups using Access Control List (ACL)
  • Also supports Longest Prefix Match (LPM)
  • Hundreds of Millions of lookups per second
  • Millions of Rules
  • Optimized for complex 1 – 10+ tuple matches
  • Typical key sizes of 40 – 480 bits
  • Very fast rule updates – no need to recompile
  • On-the-fly live updates – no need to stop traffic
  • Also supports Exact Match

2) High Performance / High Capacity LPM

  • Ultra-High-Speed Search Engine IP
  • Ideal for very high-performance routing
  • Tuned for High Performance, High-Capacity Routing Lookups using Longest Prefix Match (LPM)
  • Hundreds of Millions of lookups per second
  • Tens of Millions of Rules
  • Optimized for 1 or 2 tuple matches
  • Typical key sizes of 40 – 160 bits, with capacities and key sizes beyond normal routing requirements
  • Any mix of IPv4 and IPv6 lookups
  • Supports virtual routes
  • Supports a large number of bits of next-hop data

Use Cases and Markets:

Routing (5G Wireless and Wireline)

  • 5G Wireless UPF (User Plane Function) for Edge & Core
  • Carrier-Grade NAT
  • BNG (Broadband Network Gateway)
  • Service Provider / Multi-access Edge
  • Cloud / Data Center routing
  • Network Classification
  • Enterprise Networking
  • Flow Steering
  • L3 Forwarding & Filtering
  • NFVi
  • vRouter
  • Open vSwitch (OVS) Offload
  • Software Defined Networking
  • Cloud Gateways and more

Security

  • Network Next-Gen Firewall (NGFW)
  • Access Control Lists (ACLs)
  • DDoS prevention
  • Allow/Deny Lists
  • Network Detection and Response (NDR)
  • Anomaly Detection
  • Quality of Service (QoS) and Hierarchical QoS (HQoS)
  • Lawful Intercept (LI) and more

Operations

  • Application Delivery Controllers (ADC)
  • L4 Load balancing
  • Network traffic load balancing
  • Application and Network Analysis & Telemetry
  • Network Traffic Analysis
  • Test and Measurement
  • Network Packet Brokers (NPB)
  • Log File Analysis and more

Virtual Accelerator Engine Strategy
(Software / FPGA or ASIC RTL / Firmware)

  • MoSys Virtual Accelerator Engines (VAE) are designed to support a particular functional platform, e.g., “packet classification” or “deep packet inspection”
  • It is “Virtual” because it can be standalone software for x86 or ARM, FPGA or ASIC RTL, or embedded firmware for MoSys’s own family of RISC-based processors
  • All VAEs are based on the innovative MoSys Graph Memory Engine (GME)
  • All VAEs use a common software interface (API) across the various hardware environments, which enables system designers to reuse internally developed software code while tuning for the performance required
  • In addition, all FPGA and ASIC-based VAEs also use a common RTL interface that allows hardware transportability
  • A VAE with a common API can run on:
    • x86 or ARM CPUs
    • FPGA or ASIC RTL for
      • FPGA or ASIC that is not attached to a MoSys IC
      • FPGA or ASIC that is attached to
        • MoSys Quazar QPR Memory IC
        • MoSys Blazar Bandwidth Engine (BE) Accelerator Engine IC
    • RISC firmware on a MoSys Programmable HyperSpeed Engine (PHE) IC
  • Key benefits of using a Virtual Accelerator Engine approach are:
    • Protection of software investment
    • Common API for transportability
    • Performance scaling over many different hardware environments
    • Seamless ports across a range of performance platforms

The Technology behind Stellar – The GME (Graph Memory Engine)


The MoSys Stellar Packet Classification Platform IP is based on the Graph Memory Engine (GME), MoSys’s patent-pending design that supports millions of rules and hundreds of millions of lookups per second in a single FPGA.

The GME can process search keys in a minimal number of cycles because it can inspect as many as 32 bits at a time. That allows the GME to outperform other approaches, which usually look at 4, 8, or at most 16 bits at a time and which also waste memory by replicating data.


In the diagram above, the packet header of a typical packet is being classified. The high-speed GME logic inspects the first 16 bits of the header, starting at bit 0, the first bit.


The first 16 bits of the packet header are 10111111_10011101 (hex 0xBF9D); that could, for example, correspond to the first part of the packet’s source IP address. The GME logic compares that sequence against the rules to see which one is the best match. Each rule can hold one of three states for each bit: 1 (true), 0 (false), or * (don’t care).


Here, the lookup determines that the first 16 bits best match rule 4 (shown in the red box) in the TCAM-like rule table. The GME logic follows the match to node 3 in the graph and selects the next relevant portion of the packet header for the next stage of the inspection.
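Such ternary rules are straightforward to model in software. The sketch below is a minimal, hypothetical illustration only (the GME’s real internal data structures are not described here): each 16-bit rule is stored as a value/mask pair in which the masked-out bits are the “don’t care” (*) bits, and a key slice such as 0xBF9D is matched against the table.

```c
#include <stdint.h>

/* Hypothetical model of one TCAM-like rule over a 16-bit field: 'mask'
 * has a 1 for every bit that must match and a 0 for every "don't care"
 * (*) bit; 'value' holds the required 1/0 pattern for the masked bits. */
typedef struct {
    uint16_t value;
    uint16_t mask;
    int      next_node;   /* graph node to follow when this rule matches */
} gme_rule16;

/* Return the index of the first matching rule, or -1 if none matches.
 * (Tie-breaking by priority is sketched later in this article.) */
static int match_rules(const gme_rule16 *rules, int n, uint16_t key)
{
    for (int i = 0; i < n; i++) {
        if ((key & rules[i].mask) == (rules[i].value & rules[i].mask))
            return i;
    }
    return -1;
}

/* Example: the key slice 0xBF9D (10111111_10011101) matches a rule whose
 * low 8 bits are "don't care":
 *   gme_rule16 r = { .value = 0xBF00, .mask = 0xFF00, .next_node = 3 };
 */
```

In the example above, the matching rule would carry node 3 as its hypothetical next_node field, which is where the traversal continues.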


The GME logic proceeds through the graph, continually selecting and inspecting different bit fields from the packet header. When it arrives at the last node in the graph, the GME, depending on the configured process, either:

  1. Determines where to route the packet
    • Using Longest Prefix Match (LPM) rules
  2. Determines whether a packet is allowed to access the next point in the system
    • Using complex Access Control Lists (ACLs)
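A minimal sketch of what such a terminal result might carry, using hypothetical field names (illustrative only, not MoSys’s actual result layout):

```c
#include <stdint.h>

/* Illustrative only: the two kinds of result a terminal graph node
 * could return (names are hypothetical, not MoSys's data layout). */
typedef enum { GME_RESULT_LPM, GME_RESULT_ACL } gme_result_kind;

typedef struct {
    gme_result_kind kind;
    union {
        uint32_t next_hop;   /* LPM: where to route the packet     */
        int      permit;     /* ACL: nonzero = allow, zero = deny  */
    } u;
} gme_result;
```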

At each cycle, a different part of the input is matched against different fields (tuples).


In the example above, the first 8 bits are examined in cycle 0; then, in cycle 1, the 8 bits at bit offset 32; then, in cycle 2, the 16 bits at bit offset 40; and lastly, in cycle 3, the system moves back up the input and examines the 16 bits at bit offset 8.
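As a rough software analogy of this per-cycle field selection (assuming a byte-array header and MSB-first bit numbering, which may not match the hardware’s actual conventions), extracting each field reduces to a bit-offset/width read:

```c
#include <stdint.h>
#include <stddef.h>

/* Extract 'width' bits (width <= 32) starting at absolute bit offset
 * 'bit_off' from a packet header stored as bytes, MSB-first. */
static uint32_t extract_bits(const uint8_t *hdr, size_t bit_off, unsigned width)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < width; i++) {
        size_t  bit  = bit_off + i;
        uint8_t bval = (hdr[bit / 8] >> (7 - (bit % 8))) & 1u;
        out = (out << 1) | bval;
    }
    return out;
}

/* The four inspection cycles from the example above:
 *   cycle 0: extract_bits(hdr,  0,  8)
 *   cycle 1: extract_bits(hdr, 32,  8)
 *   cycle 2: extract_bits(hdr, 40, 16)
 *   cycle 3: extract_bits(hdr,  8, 16)
 */
```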


If multiple rules match, the GME logic can also decide which rule is the best fit based on a pre-programmed priority.
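A minimal sketch of that tie-break, assuming each rule carries an explicit priority value (again illustrative, not the hardware algorithm):

```c
#include <stdint.h>

/* Illustrative only: among all rules whose value/mask pattern matches the
 * key slice, return the index of the one with the highest pre-programmed
 * priority, or -1 if nothing matches. */
static int best_match(const uint16_t *value, const uint16_t *mask,
                      const int *priority, int n, uint16_t key)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if ((key & mask[i]) != (value[i] & mask[i]))
            continue;                         /* rule i does not match */
        if (best < 0 || priority[i] > priority[best])
            best = i;                         /* better-priority match */
    }
    return best;
}
```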

Hardware Support:


The MoSys GME architecture can support a wide range of FPGAs from Achronix, Intel, and Xilinx. It is designed with the flexibility to support a range of performance goals and resource utilization to accommodate many different applications.

  • Achronix Speedster7t FPGAs
  • Intel Stratix 10 & Agilex FPGAs
  • Xilinx UltraScale+ FPGAs
  • And can be easily adapted to other FPGAs or ASICs

All of these utilize a common RTL interface to facilitate platform portability.


Development Options:

  • Achronix, Intel or Xilinx FPGA Development boards
    • Internal SRAM only
    • Internal SRAM + DDR
    • Internal SRAM + HBM
  • Silicom FPGA SmartNIC N5010 series (with Intel Stratix 10 FPGA)
  • Silicom FPGA IPU C5010X (Xeon-D + Intel Stratix 10 FPGA)
  • MoSys Cheetah Development Card (with Xilinx VU9P FPGA)
  • MoSys FPGA Mezzanine Card (FMC) Card with MoSys Memory
    • Available to support any evaluation platform with FMC support
    • Support for either MoSys BE2, BE3 or PHE
  • MoSys PHE accelerator engine with 32 RISC cores

MoSys Graph Memory Engine (GME) – Software Overview


As markets continue to migrate to software-defined environments, performance scaling has become key to remaining competitive while addressing the growing demands being placed on the network. Software investments now must be transferable across multiple hardware environments to be cost-effective and to provide the flexibility required to meet changing performance demands.


The MoSys Graph Memory Engine on which the Stellar Packet Classification Platform is based is designed to be software-API compatible across a range of software and hardware implementations.

At the core of providing a highly scalable platform is virtualizing the accelerator function by creating a functional abstraction, with a high-level software API for a specific application area, that can be implemented across different hardware and software environments. MoSys calls this a Virtual Accelerator Engine (VAE).


The VAE leverages a common application program interface (API) to enable a platform to achieve performance scaling of up to 100x. The same functions can scale from software running on a CPU, to RTL running on an FPGA, to very high-performance implementations that combine an FPGA with MoSys Accelerator Engine ICs featuring in-memory compute.


Single Common API:


A Virtual Accelerator Engine (VAE) employs a common application program interface (API) to allow portability of a given accelerator function across platform solutions. Implementations can range from software on a CPU, to modules in FPGAs, to very high-end, highly accelerated solutions using FPGAs with MoSys Accelerator Engine ICs such as the MoSys PHE with its 32 RISC cores.
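MoSys’s actual API is not reproduced here, but as a purely hypothetical sketch, a single common classification API spanning CPU, FPGA, and accelerator-attached back ends might look something like the following, where only the back-end selection changes between deployments:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical VAE-style classification API: names and signatures are
 * illustrative only, not the MoSys API. The same calls would be backed by
 * x86 software, FPGA RTL, or an FPGA plus a MoSys Accelerator Engine IC,
 * selected when the engine is created. */
typedef struct vae_engine vae_engine;   /* opaque engine handle */

typedef enum {
    VAE_BACKEND_CPU,        /* pure software on x86 or ARM         */
    VAE_BACKEND_FPGA,       /* RTL with internal SRAM / DDR / HBM  */
    VAE_BACKEND_FPGA_PHE    /* FPGA attached to a MoSys PHE IC     */
} vae_backend;

vae_engine *vae_create(vae_backend backend);
int  vae_rule_add(vae_engine *e, const uint8_t *key, const uint8_t *mask,
                  size_t key_bits, uint32_t priority, uint64_t result);
int  vae_lookup(vae_engine *e, const uint8_t *key, size_t key_bits,
                uint64_t *result_out);
void vae_destroy(vae_engine *e);
```

An application written against such an interface could move from a pure-software back end to an FPGA- or PHE-accelerated one without changing the calling code, which is the portability property the common API is intended to provide.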


Single Common RTL Interface


Most VAE platforms will have FPGA-based products with different performance capabilities. The RTL interface across these FPGA products will follow a common RTL logic specification (VAE IF).


Designing to a common hardware interface allows for easy migration between different performance and capacity requirements. Additionally, the designer can create a wider range of product offerings and can more easily implement add-on acceleration modules. This provides future-proofing: as new silicon becomes available with a VAE port, the designer can drop it into the same logical “socket”.


Adaptation Layer Software to a Higher-Level Existing API


Implementation of a VAE uses an adaptation layer.


The adaptation layer provides a way for upper levels of software with an existing API to bridge to the VAE common API. The designer may develop their own adaptation software libraries or choose to use software provided by MoSys as part of an application-specific platform.

New software can utilize the VAE API directly, achieving the highest level of performance and portability. Often, however, the designer must work with existing software, which requires a different approach.


The adaptation code presents a customer-specific API and allows the customer to preserve the transportability of the code to different hardware environments based on application performance needs.
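As a sketch of that adaptation idea (every name below is hypothetical, including the legacy_classify call standing in for the customer’s existing API and the vae_* calls from the earlier sketch), the shim simply translates the existing call onto the common interface:

```c
#include <stdint.h>
#include <stddef.h>

/* Declarations of the hypothetical common API from the earlier sketch. */
typedef struct vae_engine vae_engine;
int vae_lookup(vae_engine *e, const uint8_t *key, size_t key_bits,
               uint64_t *result_out);

/* Hypothetical existing (customer-specific) key type that the upper
 * layers of software already use. */
typedef struct { uint8_t bytes[64]; size_t bits; } legacy_key;

static vae_engine *g_engine;   /* created once at start-up via vae_create() */

/* Adaptation layer: the existing entry point is kept, but internally it
 * bridges to the common VAE API, so the layers above stay unchanged when
 * the back end (CPU, FPGA, accelerator IC) changes. */
int legacy_classify(const legacy_key *k, uint32_t *action_out)
{
    uint64_t result;
    if (vae_lookup(g_engine, k->bytes, k->bits, &result) != 0)
        return -1;                      /* no matching rule */
    *action_out = (uint32_t)result;     /* map VAE result to the legacy action */
    return 0;
}
```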


For example, a packet classification function could be implemented as a Stellar Virtual Accelerator Engine and deployed at different points in the stack and at different points in the network, all with the same software control API, with some instances running as pure software on a CPU and others as full hardware accelerators. Having the same common API enables portability of higher-level platform functionality. As new hardware accelerator technology emerges, there is an increased probability that it can be adopted without disrupting the layers above, providing a future-proof path to increased performance scaling and capacity.

Utilizing the concept of a common API and a virtual function, the VAE architecture can also be used in hierarchical systems in which parallel paths process packets, from a fast path down to slower paths. The benefit of the common API is that the same software stack can be used to control each path in the hierarchy in a unified fashion.

GME Scalability




Performance Scalability Across Multiple Hardware Environments


Scalability comes from the ability of each implementation to take advantage of available compute and memory resources. Memory can range from DRAM adjacent to the CPU, to an FPGA with only internal SRAM, to an FPGA with attached external DRAM, HBM, or a MoSys Accelerator Engine IC.

A typical range of performance is:

  • 30 million Graph Memory Engine nodes per second on a single x86 core
  • 300 million nodes per second on FPGA with only internal SRAM memories
  • 600 million nodes per second on FPGA with externally connected MoSys Quazar QPR memories or Blazar BE Accelerator Engines
  • 1.2 billion nodes per second on FPGA with HBM memory
  • 3 billion nodes per second on FPGA plus MoSys PHE IC with firmware on RISC cores

So, depending on the needs of the application, a 1x to 100x performance boost can easily be configured using different hardware.
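As an illustrative calculation only (the nodes-per-lookup figure here is an assumption, not a MoSys specification): if each lookup traverses roughly four graph nodes, the 30 million nodes per second of a single x86 core corresponds to about 7.5 million lookups per second, while the 3 billion nodes per second of an FPGA plus PHE corresponds to about 750 million lookups per second, the same 100x span (3,000 million ÷ 30 million = 100) quoted above.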
