Saturday, June 3, 2017

FPGA with AXI plus GPP with DPDK equals Arkville

And just like that, the passion that has taken so much effort and focus for so long is launched. The launch this week of Arkville was, for Atomic Rules, a very big deal. It is by far our most ambitious product to date. We had a lot of help, and really could not have done it alone. The support of The Linux Foundation and the DPDK Community was vital. They ran a terrific, must-read interview that sums it up nicely.

Are we done? No way! Work is already underway for the 17.08 release. Check back here in August.

Tuesday, May 2, 2017

Hoplite Comes of Age

Jan Gray's groundbreaking work on Hoplite has spread far and wide. Besides Jan, nobody has grabbed that baton and run farther with it than Nachiket Kapre. At FCCM 2017 earlier this week, we were treated to not one but two presentations extending Hoplite in exciting new directions.

FPGA overlays are a peculiar thing. The proliferation of spatial heterogeneity in reconfigurable computing devices (Trimberger, FPL 2007, et al.) has effectively killed off the promise of a relocatable circuit module. Overlays fight back, in part, by delivering an abstraction that helps hide the spatial speed bumps.

Kudos to Jan, Nachiket and others who have advanced this work. They are delivering value to the community that may not be fully appreciated until we have our second 25-year retrospective.




Sunday, March 19, 2017

The Road to Arkville

For over a year now, five engineers have applied their passion on the journey of birthing a new product. Arkville is a big deal for Atomic Rules. Coming 18 months after our first product, the UDP Offload Engine, Arkville will be a whole new bag. We're betting that there is significant value at the intersection of two key technologies: DPDK and AXI.

We don't expect this to be an easy journey. Why? Let's take DPDK and AXI one by one.

DPDK is the de facto Linux kernel-bypass mechanism. It has been evolving for over a decade, for use when you need to do useful work on multiple cores without the kernel stack getting in the way. Most DPDK users work with merchant ASIC-based NICs or virtualized NICs. Despite the "Data Plane" in DPDK, not all users see it as just an I/O mechanism. There's more to it.

Also over the past decade, AXI, a proper subset of AMBA, has emerged in the FPGA world as a standardized hardware interface and API. If you are building a Green-box fleet of RTL accelerators to go inside your FPGA, you probably want to use AXI for the plumbing. If not, you are probably writing gaskets that impose a throughput and latency tax.

Atomic Rules understands that the DPDK GPP software world and the AXI RTL gateware world are two different things, and that experts in one are seldom experts in the other. The guiding light for Arkville design has been to take the established DPDK APIs and ABIs and implement a DPDK- and AXI-aware packet conduit between GPP and FPGA. The conduit is conceptually abstract, but physically (at product launch in May) it will be PCIe. We're excited about all of this, and if you are too, please get in touch. Since you made it this far, here's a hidden link on our site about Arkville in alpha-testing.


Monday, January 16, 2017

Paced Packet Player

We employed our Esopus Creek technology to build a DPDK Paced Packet Player (PPP), shown here inside a Dell R730 server. Four independent 100 GbE ports, line rate, ns accurate, no waiting. Red arrows show the added 12V supply used in this instance. Watch for more Esopus Creek in our upcoming Arkville product offering later this year.
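
For a sense of why nanosecond accuracy matters at this line rate, here is a small worked calculation in Python. It assumes standard Ethernet framing overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-packet gap). The function name is ours, for illustration only, and nothing here describes how the PPP itself is built.

```python
# Per-frame overhead on the wire for standard Ethernet:
# 7-byte preamble + 1-byte SFD + 12-byte minimum inter-packet gap.
WIRE_OVERHEAD_BYTES = 8 + 12


def frame_slot_ns(frame_bytes: int, link_gbps: float) -> float:
    """Time one frame occupies on the wire, in nanoseconds.

    frame_bytes is the Ethernet frame length including the FCS.
    1 Gb/s is exactly 1 bit per nanosecond, so bits / Gbps yields ns.
    """
    bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits / link_gbps


if __name__ == "__main__":
    for size in (64, 512, 1518):
        slot = frame_slot_ns(size, 100.0)
        print(f"{size:5d}-byte frame at 100 GbE: {slot:7.2f} ns per frame slot")
```

A minimum-size 64-byte frame occupies 6.72 ns at 100 GbE, so a pacing error of even a few nanoseconds is a large fraction of a frame slot.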

Wednesday, December 2, 2015

Exciting Times

It’s almost 2016. 20 nm FPGAs are here and 16 nm are coming. You can buy an inexpensive 128-port OpenFlow switch/router with 10/25/40/50/100 GbE ports. Intel is roaring back into networking with Red Rock Canyon (RRC). Opportunities abound at the endpoints of these commodity communications networks!

An exciting time for Reconfigurable Computing (RC):
  • 20 nm FPGAs are in production
  • 20 nm FPGAs have 25 Gbps+ capable SERDES
  • 25/50/100 GbE technology-enablers
  • Able to serve 10/40 GbE by running SERDES at lower rates
An exciting time for Software Defined Networking (SDN):
  • The P4 language is rapidly evolving
  • Game-Changing Broadcom Tomahawk
  • Game-Changing Intel Red Rock Canyon (RRC)
An exciting time for Atomic Rules:
  • Timing closure at 400 MHz through architecture
  • A UDP/UOE product, first mover for 25 GbE
  • 10/25/100 GbE L2/L3 stacks using production IP

Saturday, April 19, 2014

Dealt a new Deck

We are so excited about the release this week by Xilinx of Vivado 2014.1. Having been involved since the dark ages with the underpinnings of this world-class CAD tool for FPGA, we're always eager to see what's new and improved. With almost a dozen 28 nm designs under our belt, we made quick work of porting designs working on platforms like the KC705, VC709, and ZC706 from 2013.4 to 2014.1. We love that we can script in Tcl, or click in the GUI. We like to see both work equally well. We need more time to gather hard quantitative evidence, but across the board we are seeing 10-20% placer run time reductions, with no loss of Quality of Results. Not sure where this gain is coming from, but we will measure and find out. This is crazy good. Thanks Xilinx!

Altera is looking to answer with their salvo of Quartus II 14.0 soon, so stay tuned. Altera does have some interesting silicon announced down the pike, and we're happy to bring our RTL to whatever devices our clients are using. Still, from a do-it-all CAD tool perspective, we lament that Quartus lacks a built-in functional simulator like Vivado's. Oh well... separation of concerns!

What a world it would be if Xilinx and Altera got out of the CAD tool business and just focused on world-class FPGA silicon, leaving the EDA tools to the EDA tool vendors. Or to the open source community! It's nice to dream. Still, with tool suites like Vivado that build in aspects like IP Integrator and IP Packager, we're glad (for now) that the FPGA vendors are in the EDA game.

Sunday, March 30, 2014

Functional Correctness Quickly

A saying we use around here that has evolved from axiom to dogma is this:

“We shall achieve Functional Correctness Quickly and Performance Correctness Iteratively”

In a fast-moving world, this has substantial value to our clients who are often looking for us to operationalize a new platform or algorithm. They turn to us to provide a “show-me” demo under tight time and cost constraints. That near-in demonstration is rarely the end goal. Frequently they want to tune for lower latency, more throughput, better energy efficiency, lower gate area, and add features needed to address their continuously evolving market requirements. We live for this. This is our passion.

How do we do this? In our world, where reconfigurable computing intersects with the business of complex concurrency, we have some things dictated to us and we have places where we have some room to choose.

In the dictated-to-us department we have the FPGA back-end tools for synthesis and implementation. Xilinx dictates Vivado, Altera dictates Quartus, Tabula dictates Stylus, and so on. The vendor tools all have their strengths and weaknesses, and they all provide the capability to go from RTL to bitstream. We have our preferences here, but our clients come first. If they choose an Efinix device, we will use the flow thus dictated to us. Fine.

It is in the higher-level choice of Domain Specific Language (DSL) where our choice counts the most, especially as it relates to “Functional Correctness Quickly”. In our agile sprints to quickly achieve first-functional code, we have used Bluespec SystemVerilog (BSV) for nearly a decade.

The fire-drill to first-functional often goes something like this: We rough out a relationship of various modules and their hierarchy. Each module provides an interface. Without getting caught up in the implementation that goes in these modules, we are free to think solely of the interaction patterns between them. We go through this healthy IP birthing-phase where we may anthropomorphize the module on behalf of the designer. “Shep’s TagServer is in the business of producing an ordinal sequence of tags to a plurality of clients”. In this first step we are very much concerned with what each module does, not how it does it, and what sort of interaction patterns are needed with other modules. We may even assign abstract types to the interfaces because we don’t know yet what is needed.
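
The BSV source for this step is not shown here, so what follows is only a rough analogy in Python: an interface declared before any implementation exists, with the tag type left abstract. The names `TagServer`, `next_tag`, `CountingTagServer` and `drain` are hypothetical, chosen for illustration, and Python's type hints do not give the static checking or hardware semantics that BSV does.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

Tag = TypeVar("Tag")  # abstract tag type, deliberately undecided at this stage


class TagServer(ABC, Generic[Tag]):
    """The "what": hand the next tag in sequence to whichever client asks."""

    @abstractmethod
    def next_tag(self, client_id: int) -> Tag:
        ...


def drain(server: TagServer[Tag], client_ids: List[int]) -> List[Tag]:
    """An interaction pattern written against the interface alone."""
    return [server.next_tag(c) for c in client_ids]


class CountingTagServer(TagServer[int]):
    """A placeholder "how", filled in later: tags are 0, 1, 2, ..."""

    def __init__(self) -> None:
        self._next = 0

    def next_tag(self, client_id: int) -> int:
        tag = self._next
        self._next += 1
        return tag


if __name__ == "__main__":
    # Three clients (0, 1, 2) share one ordinal sequence of tags.
    print(drain(CountingTagServer(), [0, 1, 0, 2]))  # prints [0, 1, 2, 3]
```

Note that `drain` was written without knowing what a tag is or how the server produces it, which is the point of settling interfaces first.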

Bluespec SystemVerilog (BSV) is exceptionally well-suited for this kind of scrum. We work in either the command-line or GUI-based Bluespec Development Workstation (BDW) mode, rapidly crafting Types, Interfaces and Modules. Feedback from design entry to viewing compiled results comes in just seconds: an edit-compile-debug cycle that is no laggard to the best software development techniques. When we’re done with this step we have the rough structure of our design in place, which I like to call scaffolding.

Getting to first-functional, the Functional Correctness Quickly part, requires us to provide an implementation for our modules. Because we spent the first step compartmentalizing what does what and how modules communicate, this second step goes more quickly, as we already know precisely “what business” every module is in. This focus not only lets us code more concisely, but is also well-suited for quick, on-the-fly DUT testbenches that add a built-in verification aspect to the design process. We will usually code both standalone Bluesim simulations, which are C-based, bit- and cycle-true simulations whose results can be seen in less than a minute, and Verilog-RTL simulations, which we can run on a Verilog simulator (rarely) or push through to the FPGA and run on hardware (often). Because all BSV code synthesizes, there is no special step or flow for this. The benefit is that we see feedback from the FPGA vendor tools for area and Fmax early in the module component development process.

With our collection of modules implemented, we push the entire design through to FPGA. Since the individual modules have been tested, and the interconnections between modules are tested and standardized, it is infrequent that the system-level composition exhibits unexpected behavior. In the rare case where a behavior is not understood, we bisect until we locate the issue. But most of the time, ta dah, we have our first functional code running. Functional Correctness Quickly!

But we may not be done. We will measure the metrics and check against the current requirements to see if we are. It is common to see requirements change. A design that does not tolerate change well is said to be brittle. BSV designs are the opposite of brittle. Let’s say the TagServer we described earlier needed to be embellished to produce a tag type that was not just an unsigned integer, but a more elaborate data structure. Fine! Have at it! With a few tactical changes the modules that produce and consume tags can be adapted. Changes that would be gut-wrenching in VHDL or Verilog are almost effortless. And conceptually you are empowered to change the “what” (Types) and the “how” (Rules) separately. Bottom line here is that you get your Performance Correctness Iteratively!
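
As a loose illustration of changing the “what” without touching the “how”, here is a Python sketch rather than BSV. It assumes a tag server parameterized by a tag-making function. All names (`TagServer`, `RichTag`, `take`, and the `channel` field) are hypothetical, and Python generics only approximate what BSV's type system provides.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

Tag = TypeVar("Tag")


class TagServer(Generic[Tag]):
    """Produces tags in order; make_tag decides what a tag is (the "what")."""

    def __init__(self, make_tag: Callable[[int], Tag]) -> None:
        self._make_tag = make_tag
        self._count = 0

    def next_tag(self) -> Tag:
        # The "how": sequencing logic, identical for every tag type.
        tag = self._make_tag(self._count)
        self._count += 1
        return tag


@dataclass(frozen=True)
class RichTag:
    """A more elaborate tag: sequence number plus a hypothetical channel."""
    seq: int
    channel: int


def take(server: TagServer[Tag], n: int) -> List[Tag]:
    """A consumer that never looks inside a tag, so it needs no change."""
    return [server.next_tag() for _ in range(n)]


if __name__ == "__main__":
    plain = TagServer(lambda i: i)                 # original: integer tags
    rich = TagServer(lambda i: RichTag(i, i % 2))  # revised: structured tags
    print(take(plain, 3))
    print(take(rich, 3))
```

Only the tag definition and the function that builds it differ between the two servers. The sequencing code and the consumer are shared unchanged.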

We acknowledge that there may be other ways to realize our agile practice of racing to first-functional and then continuously improving. However, we know of no better EDA tool than BSV for dealing with parallel communicating sequential processes, what we call Complex Concurrency, which are so common in FPGA.