Product Details

Part Number: MSR830
Product Family: BLAZAR
Product Type: Accelerator Engine ICs
Package: BGA
Package Size: 27mm x 27mm

Bandwidth Engine 3 – RMW Memory IC

The BE3-RMW features multi-level high-performance SRAM memory with Embedded In-Memory BURST functions for speed and RMW functions for processing data. We combine this with our high-speed serial protocol I/O interface to enable your applications to achieve hyper speed performance.

RMW Accelerator Engine ICs

Superior, High Speed Random Access Memory Architecture

The heart of the memory IC is our advanced, parallel array 1T-SRAM with a capacity of 1.152Gb.

  • The memory is divided into 4 partitions. Each partition has 64 banks, allowing parallel (simultaneous) access.
  • Since there are two independent I/O ports per device, several memory accesses as well as multiple EIMFs can be executing at the same time.
  • The device can be used as a dual-port memory.

The tRC is 2.67 ns allowing up to 5 billion transactions per second.
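
As a rough cross-check of the cycle-time figure, the access rate a single bank can sustain is the reciprocal of its tRC. The sketch below assumes a simplified model in which each bank starts one access per tRC and higher aggregate rates come from keeping several banks busy at once. The function names are illustrative and not part of any MoSys tool.

```python
import math

T_RC_NS = 2.67  # random cycle time quoted above, in nanoseconds


def per_bank_rate(t_rc_ns: float) -> float:
    """Accesses per second for one bank that starts one access per tRC."""
    return 1.0 / (t_rc_ns * 1e-9)


def banks_needed(target_rate: float, t_rc_ns: float) -> int:
    """Smallest number of concurrently busy banks reaching target_rate.

    Assumed model: every busy bank contributes exactly per_bank_rate().
    """
    return math.ceil(target_rate / per_bank_rate(t_rc_ns))


if __name__ == "__main__":
    rate = per_bank_rate(T_RC_NS)
    print(f"one bank: {rate / 1e6:.1f} M accesses/s")
    print(f"banks busy for 5e9 accesses/s: {banks_needed(5e9, T_RC_NS)}")
```

Under this model a single bank stays below 400 million accesses per second, so rates in the billions depend on the banked, parallel organization described above.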

Fixed In-Memory BURST Functions

The BURST Functions are focused on DATA MOVEMENT where they accelerate getting data in and out of the memory faster and more efficiently by reducing the number of commands.

The BURST Multi-Read/Multi-Write In-Memory Functions can combine up to 8 READS or 8 WRITES into a single BURST function. This reduces the number of memory accesses when moving data, nearly doubling the amount of data that can be moved with that same bandwidth.

The Accelerator Engine can also execute several BURST Functions simultaneously, further increasing system performance.
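
The command savings from bundling can be sketched with simple arithmetic. The example below assumes that one command carries exactly one burst and that the available burst lengths are 2, 4 and 8 (with 1 standing for an unbundled transfer). `commands_needed` is a hypothetical helper, not a device API.

```python
ALLOWED_BURST_LENGTHS = (1, 2, 4, 8)  # 1 = no bundling (assumed baseline)


def commands_needed(transfers: int, burst_length: int) -> int:
    """Number of commands required to move `transfers` data words
    when each command bundles up to `burst_length` of them."""
    if burst_length not in ALLOWED_BURST_LENGTHS:
        raise ValueError(f"unsupported burst length: {burst_length}")
    if transfers < 0:
        raise ValueError("transfers must be non-negative")
    # Integer ceiling division: a partially filled burst still costs a command.
    return (transfers + burst_length - 1) // burst_length


if __name__ == "__main__":
    for bl in ALLOWED_BURST_LENGTHS:
        print(f"BL{bl}: {commands_needed(64, bl)} commands for 64 transfers")
```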

Fixed In-Memory RMW Functions

The RMW Functions are focused on DATA COMPUTING, where a memory location must be modified through a read-modify-write (RMW) sequence, in applications such as counting, statistics, and dual counter updates.

Normal memory location modification requires one command to READ a memory location, a second operation to MODIFY the value, and a third command to WRITE the new value back to the memory location.

The RMW Functions provide two levels of speed acceleration. First, the RMW functions can be executed with one command. Second, since the modification is executed within memory, there is no need to move the data out to be modified, and then back into memory to write. This removes all of the I/O latency.
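
The difference between the two approaches can be illustrated with a toy model that counts what crosses the I/O interface. This is only a sketch: `CountingMemory`, `rmw_add` and `host_side_increment` are hypothetical names, and the model assumes the operand of the in-memory update travels inside the command itself rather than as a separate data transfer.

```python
class CountingMemory:
    """Toy memory that counts commands and data words crossing the interface."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.commands = 0     # commands issued by the host
        self.words_moved = 0  # stored data words transferred over the interface

    def read(self, addr: int) -> int:
        self.commands += 1
        self.words_moved += 1
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        self.commands += 1
        self.words_moved += 1
        self.cells[addr] = value

    def rmw_add(self, addr: int, delta: int) -> None:
        """Hypothetical in-memory add: one command, no stored data leaves memory."""
        self.commands += 1
        self.cells[addr] += delta


def host_side_increment(mem: CountingMemory, addr: int, delta: int) -> None:
    value = mem.read(addr)   # READ command
    value += delta           # MODIFY happens on the host
    mem.write(addr, value)   # WRITE command


if __name__ == "__main__":
    a, b = CountingMemory(4), CountingMemory(4)
    host_side_increment(a, 1, 5)
    b.rmw_add(1, 5)
    print("host-side:", a.commands, "commands,", a.words_moved, "words moved")
    print("in-memory:", b.commands, "commands,", b.words_moved, "words moved")
```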

High-Speed Serial Protocol I/O Interface

Our 16 SerDes lanes can transmit data at up to 28Gbps, with optional rates of 10Gbps and 15Gbps. MoSys' GigaChip Interface (GCI) delivers full duplex, CRC protected data throughput, enabling up to 10 billion memory transactions per second on as few as 16 signals.

Traditional memory design requires a large number of interface pins (in some cases thousands of pins), making signal routing and integrity a design challenge.

Each Accelerator Engine has two completely independent 8-lane I/O ports that allow simultaneous memory access operations.
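
For a feel of the raw numbers, aggregate line rate per direction is lanes multiplied by the per-lane rate. The sketch below uses the lane counts and rates named in this section. The `efficiency` parameter is a placeholder for protocol overhead and defaults to 1.0 (no overhead), which is an assumption rather than a device figure.

```python
LANES_PER_PORT = 8
PORTS = 2
LANE_RATES_GBPS = (10, 15, 28)  # rates named in the text above


def aggregate_gbps(lanes: int, lane_rate_gbps: float, efficiency: float = 1.0) -> float:
    """Raw aggregate line rate in Gbps for one direction.

    `efficiency` scales the raw rate to model protocol overhead (assumed, default none).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return lanes * lane_rate_gbps * efficiency


if __name__ == "__main__":
    for rate in LANE_RATES_GBPS:
        one_port = aggregate_gbps(LANES_PER_PORT, rate)
        both_ports = aggregate_gbps(LANES_PER_PORT * PORTS, rate)
        print(f"{rate} Gbps/lane: {one_port} Gbps per port, {both_ports} Gbps per device")
```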

Easy to Design-In

  • Fewer pins using serial I/O with the GigaChip Interface technology
  • Clean and reliable signal integrity board layout
  • Standard use as a QDR replacement

Simple-to-understand EIMFs (Embedded In-Memory Functions) accelerate performance.

The result is a large, high-speed random access memory with easy-to-understand EIMFs and very few signal pins.

It cannot get simpler than that!

Key Specifications

Part Number: MSR830
Total Memory Density: 1.152Gb
Max tRC: 2.67ns
Transactions: 10 Billion p/s
Package: BGA
Footprint: 27mm x 27mm


Bandwidth Engine 3 RMW Performance and Features Snapshot

  • Density (Mb)
  • tRC (ns)
  • Max SerDes Rate (Gbps)
  • Latency (ns)
  • Buffer BW (Gbps)
  • Billion Accesses (per second)

BURST Embedded In-Memory Functions for superior bandwidth performance.

RMW Embedded In-Memory Functions for offloading common and repetitive functions to memory.

Bandwidth Engine 3 – RMW (BE3-RMW) Architecture

Understanding MoSys’ Advanced 1T-SRAM Technology

Parallel Array Architecture

  • 16 outstanding transactions
  • 3.75 Billion Transactions per Second
  • 160Gbps full duplex throughput
  • 16ns deterministic read latency
  • 2.67ns Random Cycle time (tRC)

GigaChip Interface

  • 90% efficient throughput
  • Up to 16 low-latency SerDes lanes (12.5Gbps or 25Gbps)

Single-Cell SRAM 70x better SER

  • Full ECC support
  • CRC protected and self-recovering
  • SEU resistant

BE3-RMW Embedded In-Memory Function Overview

ALU/Logical on 72b

  • add, sub, adc, sbb, s1add, s2add, s3add, s3sub, and, or, xor, andn, sar, sir, sll, minu, maxu, mult

Bit field of variable len @ variable pos

  • Extract, deposit, chomp
  • Can be across register boundaries
  • Optional auto incr of pos
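
The extract and deposit operations follow the usual bit-field pattern, which can be sketched in a few lines. This is a generic illustration, assuming bit 0 is the least significant bit and a 72-bit word. It does not model the device's actual encoding, the register-boundary case, or the chomp operation, whose semantics are not described here.

```python
WORD_BITS = 72  # ALU width named above


def _check(pos: int, length: int) -> None:
    if length <= 0 or pos < 0 or pos + length > WORD_BITS:
        raise ValueError("bit field does not fit in the word")


def extract(word: int, pos: int, length: int) -> int:
    """Return the `length`-bit field of `word` starting at bit `pos` (LSB = 0)."""
    _check(pos, length)
    return (word >> pos) & ((1 << length) - 1)


def deposit(word: int, pos: int, length: int, value: int) -> int:
    """Return `word` with the field at `pos` replaced by the low bits of `value`."""
    _check(pos, length)
    mask = ((1 << length) - 1) << pos
    return (word & ~mask) | ((value << pos) & mask)


if __name__ == "__main__":
    w = 0xABCD
    print(hex(extract(w, 4, 8)))
    print(hex(deposit(w, 4, 8, 0x12)))
```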

Special Functions

  • Find first zero, find first one
  • Population count
  • Swap bits in bytes and bytes in words
  • 144b HASH to 72b (non-crypto)
  • Compute CRC32
  • Multi-way compare with 4, 6, 9, and 12 inputs
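
For readers unfamiliar with these operations, the following sketch shows what several of them compute in general terms. It is a plain software illustration: the function names are made up for this example, and the search direction (counting from the least significant bit) and word sizes are assumptions, not documented device behavior.

```python
def population_count(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def find_first_one(x: int) -> int:
    """Index of the lowest set bit (LSB = 0), or -1 if no bit is set."""
    return (x & -x).bit_length() - 1


def find_first_zero(x: int, width: int = 72) -> int:
    """Index of the lowest clear bit within `width` bits, or -1 if none."""
    return find_first_one(~x & ((1 << width) - 1))


def swap_bits_in_byte(b: int) -> int:
    """Reverse the bit order of one byte."""
    return int(f"{b & 0xFF:08b}"[::-1], 2)


def swap_bytes_in_word(x: int, nbytes: int = 4) -> int:
    """Reverse the byte order of an `nbytes`-byte word."""
    return int.from_bytes(x.to_bytes(nbytes, "little"), "big")


if __name__ == "__main__":
    print(population_count(0b1011))
    print(find_first_one(0b1000), find_first_zero(0b0111))
    print(hex(swap_bytes_in_word(0x11223344)))
```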


Test and Branch

  • tsteq, tstgt, tstnle, tstlt, tstnge, tstgtu, tstnleu, tstltu, tstngeu, tstbs, tstne, tstle, tstngt, tstge, tstnlt, tstleu, tstngtu, tstgeu, tstnltu, or tstbc
  • Jmp, jeq, jgt, jnle, jlt, jnge, jgtu, jnleu, jltu, jngeu, jbs, jne, jle, jngt, jge, jnlt, jleu, jngtu, jgeu, jnltu, or jbc
  • Multiway branch 2, 3, and 4

Loads and Stores

  • Local Dmem:
    • 8b, 16b, 32b, 64b and 72b
    • Reg + offset, w/auto incr reg
  • Partition
    • Burst reads, load balanced reads and broadcast
    • 64b, 72b, 128b, 135b, 144b
    • Reg + reg or reg + offset, w/auto incr reg

Atomic Operations

  • Local:
    • 8b, 16b, 32b and 64b
    • adda, suba, anda, xora, andna, xchga, cmpxchga
  • Partition
    • 16b, 32b, and 64b
    • Add(s), sub(s), xor, rd/set, tst/set, cmp/set, avg, tm, age

Program Control

  • Hit, Brk and nop
  • Add/mov and halt (thread)
  • Yield

Special Registers

  • GPR indirect specification
  • Auto increment
  • Command, memory, result, result len
  • Time stamp, random, zero, all ones, thread id, wake up, sink


BE3 Embedded In-Memory BURST and RMW Function Opcode Map

The BE3 RMW and BURST functions are a superset of the BE2 BURST and RMW. We have added 31 new functions. Those in Green are not in the BE2.

BE3-RMW Embedded In-Memory BURST Functions

BURST Fixed-Functions

Burst functions are designed to get data in and out of the memory more efficiently by reducing the number of commands. Normal transmission requires one command for each transfer of data. By bundling eight packets of data into a single command, you eliminate seven unnecessary commands. You can bundle 2, 4 or 8 data packets.

    SerDes Speed Grade

    Width      BURST   Throughput (Gbps)   Throughput (Gbps)   Throughput (Gbps)
    16 Lanes   BL8     160                 200                 320
    8 Lanes    BL8     80                  100                 160
    4 Lanes    BL8     40                  50                  80
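
The throughput figures above scale linearly with lane count, which a short calculation makes visible. The sketch below only divides the table values by the number of lanes. The three speed grades are not named in the table, so the columns are identified by position, and `per_lane` is an illustrative helper.

```python
# BL8 throughput in Gbps per lane count, one entry per speed-grade column.
BL8_THROUGHPUT_GBPS = {
    16: (160, 200, 320),
    8: (80, 100, 160),
    4: (40, 50, 80),
}


def per_lane(table: dict) -> dict:
    """Divide each throughput value by its lane count."""
    return {
        lanes: tuple(value / lanes for value in row)
        for lanes, row in table.items()
    }


if __name__ == "__main__":
    for lanes, row in per_lane(BL8_THROUGHPUT_GBPS).items():
        print(f"{lanes:>2} lanes: {row} Gbps per lane")
```

Every row yields the same per-lane values, so halving the lane count halves the throughput in each column.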