Accelerating FPGA Applications, Reducing Costs and Shortening Design Time

  • MoSys has been developing memory-based products for close to 20 years, starting with its 1T-SRAM (one-transistor SRAM) memory IP. The 1T-SRAM has access speeds close to those of SRAM while supporting a density that approaches that of RLDRAM, and it has revolutionized the memory industry. It is still used by many companies in their products today.
  • Most memory manufacturers have focused on speed or density, with the belief that one size fits all. They have not considered that memories could accelerate applications by adding a level of intelligence.
  • The result is a family of devices called Accelerator Engines, a new class of memory called EFAM (Embedded Function Accelerating Memory).
  • Each Accelerator Engine is a combination of four capabilities:
    • High Capacity Memory of 576Mb or 1.152Gb, with a tRC of 2.67ns
    • High Speed Embedded In-Memory Functions (Intelligence), which include BURST and RMW
    • User-Defined In-Memory Functions Using 32 Embedded RISC Cores
    • High Speed Serial Interface for High Bandwidth and Simplified Board Layout
  • MoSys-Supplied FPGA RTL Memory Controller: Handles All Serial Communication and Provides a Parallel, QDR-Like Interface
  • The Embedded In-Memory Functions BURST and RMW are designed to execute much faster in memory than the same operations could be executed using traditional memory. For the highest possible acceleration, common or complex functions can be moved into an EFAM with 32 RISC cores for HyperSpeed performance.

Key Features

The Blazar Family of Accelerator Engines provides system architects and designers with new acceleration options, spanning both software and hardware, that no competing product can offer.

EFAM grew out of the development of the MoSys Blazar Accelerator Engine Family.

Diagram of the key components of the MoSys Blazar Accelerator Engine Family.

 

The Accelerator Engine Memory IC family includes:

  • Bandwidth Engine 2 BURST (BE2-BURST)
  • Bandwidth Engine 3 BURST (BE3-BURST)
  • Bandwidth Engine 2 RMW (BE2-RMW)
  • Bandwidth Engine 3 RMW (BE3-RMW)
  • Programmable HyperSpeed Engine (PHE)

General Application Selector Guide

Block Diagrams of Memory Architecture and Capacity

BE2-BURST with 576Mb
BE2-RMW with 1.152Gb
BE3-BURST with 576Mb
BE3-RMW with 1.152Gb
PHE with 1.152Gb

Benefits of BE3 Serial Memory vs QDR

Serial memories bring many advantages over traditional parallel-signal memory devices such as QDRs.

Allows high bandwidth over a few pins

This is a typical 2 QDR design compared with a BE2 or BE3

Summary of Benefits

  • 1Gb device … Replaces 8 QDR/SyncSRAM devices
  • Costs … One BE3 is approx. the price of 3 QDR memories, with 8x the memory of a single QDR
  • Pins … A typical application uses only 16 signals (32 pins), with signal Auto-Adaptation

More high-speed memory generally opens up acceleration options for software and hardware architects/designers

Overview comparison of BE3 to QDR

  • Memory size
    • BE3 with 1Gb is equivalent to 8 QDRs at 144Mb per device
  • PCB Space Savings
    • 1 BE3 device vs 8 QDR devices
  • Signal Pin Reduction
    • 8 QDRs … 1Gb requires 1072-1440 pins
    • 1 BE3 … 1Gb … a typical BE3 system uses 8 lanes, or 32 pins
    • All BE devices have Auto-Adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure clean, reliable signals
  • Costs
    • One BE3 is approx. the price of 3 QDR memories, with 8x the memory of a single QDR
  • Benefits
    • Larger Buffers, High Bandwidth
    • Allows real-time operations and analysis at line rate
    • Eliminates the need for complex parallel operations using RLDRAM, HBM, or slow DRAM

Benefits of BE2 Serial Memory vs QDR

Serial memories bring many advantages over traditional parallel-signal memory devices such as QDRs.

Allows high bandwidth over a few pins

This is a typical 2 QDR design compared with a BE2 or BE3

Summary of Benefits

  • 576Mb device … Replaces 4 QDR/SyncSRAM devices
  • Costs … One BE2 is approx. the price of 2 QDR memories
  • Pins … A typical application uses only 16 signals (32 pins), with signal Auto-Adaptation

More high-speed memory generally opens up acceleration options for software and hardware architects/designers

Overview comparison of BE2 to QDR

  • Memory size
    • BE2 with 576Mb is equivalent to 4 QDRs at 144Mb per device
  • PCB Space Savings
    • 1 BE2 device vs 4 QDR devices
  • Signal Pin Reduction
    • 4 QDRs … 576Mb requires 500-720 pins
    • 1 BE2 … 576Mb … a typical BE2 system uses 8 lanes, or 32 pins
    • All BE devices have Auto-Adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure clean, reliable signals
  • Costs
    • One BE2 is approx. the price of 2 QDR memories, with 4x the memory of a single QDR
  • Benefits
    • Larger Buffers, High Bandwidth
    • Allows real-time operations and analysis at line rate
    • Eliminates the need for complex parallel operations using RLDRAM, HBM, or slow DRAM
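
The cost comparisons above reduce to simple arithmetic. The Python sketch below compares capacity per unit of price, expressing each price in "QDR device" units as this page does (BE2 approx. 2 QDRs, BE3 approx. 3 QDRs). The 1152Mb BE3 figure is 8 x 144Mb; the prices are this page's approximations, not quotes, and the `Option` type and `mb_per_price_unit` helper are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    capacity_mb: int       # capacity in megabits
    relative_price: float  # approximate price in units of one QDR device


# Figures taken from this page; prices are approximate ratios, not quotes.
OPTIONS = [
    Option("1 QDR", 144, 1.0),
    Option("1 BE2", 576, 2.0),   # "approx. the price of 2 QDR memories"
    Option("1 BE3", 1152, 3.0),  # "approx. the price of 3 QDR memories" (8 x 144Mb)
]


def mb_per_price_unit(option: Option) -> float:
    """Megabits of capacity obtained per QDR-equivalent unit of price."""
    return option.capacity_mb / option.relative_price


for option in OPTIONS:
    print(f"{option.name}: {mb_per_price_unit(option):.0f} Mb per QDR-price unit")
```

On these approximations, a BE2 gives about 2x and a BE3 about 2.7x the capacity per unit of cost of a discrete QDR (288 and 384 Mb vs 144 Mb per QDR-price unit).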

High Speed Memory … More Is Better

  • The Accelerator Engine family has been designed to be used as a typical high-speed, high-density memory. The application interface is parallel and similar to a QDR or Sync SRAM.
  • MoSys memory devices
    • 576Mb
    • 1Gb
    • tRC 2.67ns
    • MoSys provides the FPGA RTL controller and lets users select the version that best fits their requirements
      • x8, x16, x32, etc.
  • In addition, the Accelerator Engine requires as few as 32 pins to interface to an FPGA.
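
When choosing among the x8, x16 and x32 controller versions, a useful first check is how many address bits the user-side interface needs. The sketch below is hypothetical: it assumes the full capacity is exposed as one flat array of `data_width`-bit words and that "Mb" means 2^20 bits. The real controller's address map (internal word size, partitions, any reserved bits) may differ, so treat `address_bits` as an illustrative helper and confirm against MoSys documentation.

```python
MBIT = 1 << 20  # assumption: 1Mb = 2**20 bits, so 576Mb = 576 * 2**20 bits


def address_bits(capacity_mb: int, data_width: int) -> int:
    """Bits needed to address every data_width-bit word of a flat memory (assumed layout)."""
    words = capacity_mb * MBIT // data_width
    # (words - 1).bit_length() equals ceil(log2(words)) for words >= 2
    return (words - 1).bit_length()


for capacity in (576, 1152):
    for width in (8, 16, 32):
        print(f"{capacity}Mb x{width}: {address_bits(capacity, width)} address bits")
```

Under these assumptions, each doubling of the data width removes one address bit, for example 27, 26 and 25 bits for x8, x16 and x32 at 576Mb.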

Performance vs QDR

  • For systems using 2 or more QDRs
    • For truly random memory access, performance is essentially the same
    • Using the BURST or RMW In-Memory Functions, depending on the application, the speed improvement can be significant
  • For systems using 1 QDR
    • Depending on the pattern of memory accesses, performance can be equivalent
    • Using the BURST or RMW In-Memory Functions, depending on the application, the speed improvement can be significant

For systems using 1 QDR, you should see a speed improvement by moving to a higher-capacity BE2 with 576Mb, which is 4x the capacity of a QDR, for about the same cost as 2 QDRs.

QDR Parallel vs MoSys Serial Comparison

More high-speed memory generally opens up acceleration options for software and hardware architects/designers

Overview comparison to QDR

  • Memory Size
    • BE2 with 576Mb replaces 4 QDRs at 144Mb per device
    • BE3 with 1.152Gb replaces 8 QDRs at 144Mb per device
  • Device Space Savings
    • 1 BE2 vs 4 QDRs
    • 1 BE3 vs 8 QDRs
  • Signal Pin Reduction
    • 576Mb … BE2 with 32 pins/signals, 4 QDRs with 500-700 pins/signals
    • 1Gb … BE3 with 32 pins/signals, 8 QDRs with 1000-1500 pins/signals
  • Benefits
    • Larger Buffers
    • High Bandwidth
    • Allows real-time operations and analysis at line and data rates
    • Eliminates the need for complex parallel operations using RLDRAM, HBM, or slow DRAM


Serial FPGA Memory Interface

  • Implementing parallel QDR SRAMs requires routing hundreds of high-speed signals
    • A typical 2-QDR system of 288Mb can require approximately 300 high-speed pins, 4 devices require approximately 600, and so on, up to 1400 for 1Gb.
    • The denser the memory requirement, the more difficult and complex the routing of signal traces becomes.
    • A MoSys Accelerator Engine design of 576Mb or 1Gb is implemented with only one device, and a typical system uses only 16 signals, which is 32 pins.

In addition, the MoSys devices have Auto-Adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure a clean, reliable signal.
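
The pin counts above scale with the number of 144Mb QDR devices, while a single Accelerator Engine port stays at 32 pins. The sketch below estimates both; the per-QDR range of 134-180 pins is an assumption derived by dividing this page's "8 QDRs, 1072-1440 pins" figure by 8, and the helper names are illustrative only.

```python
import math

QDR_CAPACITY_MB = 144  # per-device QDR capacity used throughout this page
# Assumption: per-device pin range derived from "8 QDRs ... 1Gb requires 1072-1440 pins"
QDR_PINS_MIN, QDR_PINS_MAX = 1072 // 8, 1440 // 8  # 134 to 180 pins per QDR
BE_PINS_PER_PORT = 32  # one 8-lane serial port: 16 signals = 32 pins


def qdr_devices_needed(capacity_mb: int) -> int:
    """Number of 144Mb QDR devices needed to reach the requested capacity."""
    return math.ceil(capacity_mb / QDR_CAPACITY_MB)


def qdr_pin_range(devices: int) -> tuple[int, int]:
    """Approximate (min, max) FPGA pin count for a parallel QDR array."""
    return devices * QDR_PINS_MIN, devices * QDR_PINS_MAX


for capacity in (288, 576, 1152):
    n = qdr_devices_needed(capacity)
    lo, hi = qdr_pin_range(n)
    print(f"{capacity}Mb: {n} QDRs, {lo}-{hi} pins vs {BE_PINS_PER_PORT} pins for one BE port")
```

These are rough estimates: the 2-QDR range (268-360 pins) brackets the "approximately 300" quoted above, and the 4-QDR estimate (536-720) is close to the 500-720 figure given earlier.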

Simple SRAM Parallel RTL Interface

MoSys RTL Controller:

  • Serial to Simple FPGA Parallel SRAM Interface

Makes the serial interface transparent: it looks like a typical parallel SRAM interface

  • Not familiar with serial memory? Don’t worry about it!
  • MoSys provides a free FPGA RTL Memory Controller that interfaces with the MoSys Bandwidth Engine.
  • On the user application side, the Bandwidth Engine signal interface is a SIMPLE parallel SRAM read/write interface with optional BURST/RMW commands.
  • Versions are available for x8, x16, x32, etc.
  • This controller handles all the logic for the serial GigaChip Interface (GCI) between the FPGA and the BE.
  • This simple interface shields users from the BE2 commands and from the scheduling logic for Bandwidth Engine memory partition timing.
  • Simple! Like using a QDR, but faster!

Optional Advanced Acceleration Options

In Memory Functions: Intelligent Acceleration Functions

  • By using the In-Memory Functions, applications:
    • Are accelerated beyond the limitations of memory access speeds.
    • Gain this acceleration because the In-Memory Function executes IN the memory chip (Accelerator Engine), which reduces the number of external system commands needed to accomplish the same task.
  • Adding Accelerator Engine ICs as part of your overall memory strategy enables your applications to run faster and more efficiently. This level of performance is achieved by leveraging MoSys’ heritage of superior memory architecture, high-speed SerDes input/output transmission, and advanced IMF (In-Memory Function) technology.
  • By using the In-Memory Functions, you can accelerate applications beyond memory speeds alone. The In-Memory Function executes without external intervention, which reduces the number of system operations needed outside the memory to accomplish the same task, again freeing up system time for other application tasks.
  • Simplify your RTL
  • MoSys has defined three groups of Embedded In-Memory Functions (EIMFs):
    • BURST FUNCTIONS… For high-speed sequential read and write operations for data movement
    • RMW (Read/Modify/Write) FUNCTIONS… For in-device data modification and decisions
    • USER DEFINED FUNCTIONS… Such as common tasks or complex algorithms
  • Each group delivers a different increment of performance acceleration. Each MoSys Accelerator Engine IC is defined by which functions it embeds and whether it has 576Mb or 1.152Gb.


In Memory Function – BURST Functions

  • Focused on DATA MOVEMENT to accelerate getting data in and out of the memory faster and more efficiently by reducing the number of command cycles.
  • A typical BURST In-Memory Function lets the system read and/or write sequential memory locations by giving only the starting address and specifying an access of 2, 4 or 8 locations.
  • The BURST Read/Write In-Memory Functions can combine up to 8 READS and 8 WRITES into a single BURST command.
  • Can triple the amount of data moved by reducing the number of command cycles
  • BURST Functions can execute simultaneously, further increasing system performance
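
The command-cycle saving can be illustrated by splitting a run of sequential addresses into bursts of 8, 4 or 2 locations, the lengths listed above. This is a minimal sketch: the "BURST" and "SINGLE" labels and the greedy splitting rule are assumptions for illustration, not the MoSys command set or the controller's actual scheduling.

```python
BURST_LENGTHS = (8, 4, 2)  # burst sizes listed on this page, largest first


def plan_bursts(start: int, count: int) -> list[tuple[str, int, int]]:
    """Split `count` sequential addresses into (kind, address, length) commands.

    Greedy (illustrative assumption): use the longest burst that fits, and fall
    back to a single access for a leftover word, since bursts cover 2, 4 or 8.
    """
    commands = []
    addr, remaining = start, count
    for length in BURST_LENGTHS:
        while remaining >= length:
            commands.append(("BURST", addr, length))
            addr += length
            remaining -= length
    if remaining:  # at most one word can be left over
        commands.append(("SINGLE", addr, 1))
    return commands


single_access_cmds = 64               # one command per location
burst_cmds = len(plan_bursts(0, 64))  # eight 8-location bursts
print(single_access_cmds, burst_cmds)
print(plan_bursts(100, 11))
```

Because the burst lengths are powers of two, the greedy choice also gives the fewest commands for any run length; reading 64 sequential words drops from 64 commands to 8.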

Example of BURST In-Memory Execution

In Memory Functions – RMW Functions

  • Focused on DATA COMPUTING AND DECISION, where memory locations need read/modify/write updates, in applications such as metering as well as single or dual counter updates for statistics.
  • There are over 27 operations available, such as add, subtract, compare, increment, etc.
  • The RMW function is done in one command, where traditional memory requires 3 commands
    • A location modification requires first, one command to READ a memory location, a second command to MODIFY the value, and a third command to WRITE the new value back to the memory location.
  • The RMW Functions provide at least two levels of speed acceleration.
    • First, the RMW functions can be executed with a single command.
    • Second, since the modification is executed within memory, there is no need to move the data out to be modified, and then back into memory to write. This removes all of the associated I/O latency.
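
The three-step versus one-step difference can be shown with a toy model that logs the system operations needed for a dual counter update (packet count plus byte count). `ToyMemory`, `host_add`, `rmw_add` and the counter addresses are hypothetical names for illustration, not the Bandwidth Engine command set; the model only counts steps and does not capture I/O latency.

```python
class ToyMemory:
    """Toy model that logs the system operations needed per update (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.log: list[str] = []

    def host_add(self, addr: int, delta: int) -> None:
        """Traditional memory: READ out, MODIFY outside the memory, WRITE back."""
        value = self.cells[addr]
        self.log.append("READ")
        value += delta  # computed in the FPGA/CPU after the data crossed the I/O
        self.log.append("MODIFY")
        self.cells[addr] = value
        self.log.append("WRITE")

    def rmw_add(self, addr: int, delta: int) -> None:
        """In-memory RMW: one command; the add executes inside the memory."""
        self.cells[addr] += delta
        self.log.append("RMW_ADD")


PKT_CNT, BYTE_CNT = 0, 1  # hypothetical addresses of a dual (packets, bytes) counter
packet_sizes = [64, 1500, 512]

host, in_mem = ToyMemory(2), ToyMemory(2)
for size in packet_sizes:
    host.host_add(PKT_CNT, 1)
    host.host_add(BYTE_CNT, size)
    in_mem.rmw_add(PKT_CNT, 1)
    in_mem.rmw_add(BYTE_CNT, size)

print(host.cells, len(host.log))      # [3, 2076] 18
print(in_mem.cells, len(in_mem.log))  # [3, 2076] 6
```

Both paths end with identical counter values, but the in-memory path needs one third of the operations.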

Example of RMW In-Memory Execution


In Memory Functions – USER DEFINED Functions

The Programmable HyperSpeed Engine (PHE) has 32 RISC cores and offers many options for acceleration through firmware running in the device:

Moving functions and operations into the PHE:

  • Functions in the FPGA RTL
    • Commonly used functions
    • Standard and application unique algorithms
    • Special functions
    • Frees up FPGA RTL resources
  • Functions currently in application software, to significantly improve performance
  • General capabilities of the 32 cores
    • Powerful RISC instruction set that includes instructions for hashing, etc.
    • Allows parallel processing
    • Same function can be installed several times (up to 32 times) and run simultaneously
    • Up to 256 threads
    • Other optional powerful features
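
The "same function installed up to 32 times and run simultaneously" idea can be pictured with a host-side analogy: one user-defined function (here a key-to-bucket hash) applied in parallel by 32 workers. This is only an analogy under stated assumptions: Python threads stand in for RISC cores, CRC-32 is an arbitrary choice of hash, `bucket_of` is a hypothetical name, and nothing here models PHE firmware, its instruction set, or its performance.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_INSTANCES = 32  # from this page: the same function can run on up to 32 cores


def bucket_of(key: bytes, num_buckets: int = 1024) -> int:
    """Example user-defined function: hash a key into a bucket (CRC-32 for illustration)."""
    return zlib.crc32(key) % num_buckets


keys = [f"flow-{i}".encode() for i in range(1000)]

# Run NUM_INSTANCES copies of the same function over the key stream in parallel.
# pool.map preserves input order, so results line up with `keys`.
with ThreadPoolExecutor(max_workers=NUM_INSTANCES) as pool:
    buckets = list(pool.map(bucket_of, keys))

print(buckets[:4])
```

The function is pure and order-independent, which is what makes it safe to replicate across instances; functions that share state would need coordination.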

Examples of User Defined In Memory Functions

Using the 32 RISC Cores…Think Creatively!

  • User Defined Functions are specialized for a user’s application, but here are some possible functions:
    • Bayesian
    • Random Forest of Trees
    • Repetitive data modification
    • Data Analysis
    • Image translation/editing functions
    • High speed buffer data analysis
    • Etc.
  • What functions or algorithms do you need to:
    • Run faster?
    • Move from FPGA to free up resources?
    • Move from CPU Software to MoSys Firmware for speed
    • Take advantage of speed boost by parallel operation (Multi-thread option)

Programmable HyperSpeed Engine (PHE)

32 RISC CORE Architecture for User Defined Functions


Advanced Memory Application Use

Standard Memory Interface

  • Each Accelerator Engine Memory has two 8 lane serial Ports.
  • Each port has 8 lanes (16 signals), which is 32 pins.
  • A typical system needs only one 8-lane port (shown)
  • A QDR-style application requires only 1 port, which is 32 pins
  • In addition, the MoSys devices have auto-adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure a clean, reliable signal.

Dual Port Memory Interface Application

  • Each Accelerator Engine Memory has two 8 lane serial Ports.
  • Each port has 8 lanes (16 signals), which is 32 pins.
  • The two ports operate as a true dual-port memory, with completely independent, simultaneous access
  • In addition, the MoSys devices have auto-adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure a clean, reliable signal.

Super High Bandwidth Interface

  • Each Accelerator Engine Memory has two 8 lane serial Ports.
  • Each port has 8 lanes (16 signals), which is 32 pins.
  • For extremely high bandwidth requirements, these two ports can be combined as one super high bandwidth port.
  • In addition, the MoSys devices have auto-adaptation, which handles on-board signal tuning and eliminates the need for external components to ensure a clean, reliable signal.
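
The three interface options differ in how many ports are wired and how the FPGA sees them. The sketch below summarizes each mode; `PortMode` and `interface_summary` are hypothetical names, and the 4 pins per lane (one transmit and one receive differential pair) is an assumption chosen to match this page's "8 lanes or 32 pins".

```python
from enum import Enum

LANES_PER_PORT = 8  # from this page: two 8-lane serial ports per device
PINS_PER_LANE = 4   # assumption: one TX + one RX differential pair per lane (8 lanes = 32 pins)


class PortMode(Enum):
    STANDARD = "one port, QDR-style"
    DUAL_PORT = "two independent ports"
    COMBINED = "two ports bonded into one wide port"


def interface_summary(mode: PortMode) -> dict[str, int]:
    """Logical ports seen by the FPGA, serial lanes used, and FPGA pins consumed."""
    ports_wired = 1 if mode is PortMode.STANDARD else 2
    logical_ports = 2 if mode is PortMode.DUAL_PORT else 1
    lanes = ports_wired * LANES_PER_PORT
    return {"logical_ports": logical_ports, "lanes": lanes, "pins": lanes * PINS_PER_LANE}


for mode in PortMode:
    print(mode.name, interface_summary(mode))
```

Dual-port and combined modes use the same 16 lanes; the difference is whether the FPGA addresses them as two independent ports or as one high-bandwidth port.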

Select the best-fit Accelerator Engine for your next project!
