Wednesday, December 2, 2015

Exciting Times

It’s almost 2016. 20 nm FPGAs are here and 16 nm parts are coming. You can buy an inexpensive 128-port OpenFlow switch/router with 10/25/40/50/100 GbE ports. Intel is roaring back into networking with Red Rock Canyon (RRC). Opportunities abound at the endpoints of these commodity communications networks!

An exciting time for Reconfigurable Computing (RC):
  • 20 nm FPGAs are in production
  • 20 nm FPGAs have 25 Gbps+ capable SERDES
  • 25/50/100 GbE technology-enablers
  • Able to serve 10/40 GbE by running SERDES at lower rates
An exciting time for Software Defined Networking (SDN):
  • The P4 language is rapidly evolving
  • Game-Changing Broadcom Tomahawk
  • Game-Changing Intel Red Rock Canyon (RRC)
An exciting time for Atomic Rules:
  • Timing closure at 400 MHz through architecture
  • A UDP/UOE product, first mover for 25 GbE
  • 10/25/100 GbE L2/L3 stacks using production IP

Saturday, April 19, 2014

Dealt a new Deck

We are so excited about the release this week by Xilinx of Vivado 2014.1. Having been involved since the dark ages with the underpinnings of this world-class CAD tool for FPGA, we're always eager to see what's new and improved. With almost a dozen 28 nm designs under our belt, we made quick work of porting designs working on platforms like the KC705, VC709, and ZC706 from 2013.4 to 2014.1. We love that we can script in Tcl, or click in the GUI. We like to see both work equally well. We need more time to have hard quantitative evidence, but across the board we are seeing 10-20% placer run time reductions, with no loss of Quality of Results. Not sure where this gain is coming from, but we will measure and find out. This is crazy good. Thanks Xilinx!

Altera is looking to answer with their salvo of Quartus II 14.0 soon, so stay tuned. Altera does have some interesting silicon announced down the pike, and we're happy to bring our RTL to whatever devices our clients are using. Still, from a do-it-all CAD tool perspective, we lament that Quartus lacks a built-in functional simulator like the one in Vivado. Oh well... separation of concerns!

What a world it would be if Xilinx and Altera got out of the CAD tool business and just focused on world-class FPGA silicon, leaving the EDA tools to the EDA tool vendors. Or to the open source community! It's nice to dream. Still, with tool suites like Vivado that build in aspects like IP Integrator, and IP Packager, we're glad (for now) that the FPGA vendors are in the EDA game.

Sunday, March 30, 2014

Functional Correctness Quickly

A saying we use around here that has evolved from axiom to dogma is this:

“We shall achieve Functional Correctness Quickly and Performance Correctness Iteratively”

In a fast-moving world, this has substantial value to our clients who are often looking for us to operationalize a new platform or algorithm. They turn to us to provide a “show-me” demo under tight time and cost constraints. That near-in demonstration is rarely the end goal. Frequently they want to tune for lower latency, more throughput, better energy efficiency, lower gate area, and add features needed to address their continuously evolving market requirements. We live for this. This is our passion.

How do we do this? In our world, where reconfigurable computing intersects with the business of complex concurrency, we have some things dictated to us and we have places where we have some room to choose.

In the dictated-to-us department we have the FPGA back-end tools for synthesis and implementation. Xilinx dictates Vivado, Altera dictates Quartus. Tabula dictates Stylus, and so on. The vendor tools all have their strengths and weaknesses; and they all provide the capability to go from RTL to bitstream. We have our preferences here, but our clients come first. If they choose an Efinix device, we will use the flow thus dictated to us. Fine.

It is in the higher-level choice of Domain Specific Language (DSL) where our choice counts the most, especially as it relates to “Functional Correctness Quickly”. In our agile sprints to quickly achieve first-functional code, we have used Bluespec SystemVerilog (BSV) for nearly a decade.

The fire-drill to first-functional often goes something like this: We rough out a relationship of various modules and their hierarchy. Each module provides an interface. Without getting caught up in the implementation that goes in these modules, we are free to think solely of the interaction patterns between them. We go through this healthy IP birthing-phase where we may anthropomorphize the module on behalf of the designer. “Shep’s TagServer is in the business of producing an ordinal sequence of tags to a plurality of clients”. In this first step we are very much concerned with what each module does, not how it does it, and what sort of interaction patterns are needed with other modules. We may even assign abstract types to the interfaces because we don’t know yet what is needed.
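The interface-first step described above can be sketched outside of BSV. What follows is a minimal Python illustration, not BSV and not our production code: the names `TagServer`, `CountingTagServer`, `next_tag`, and `drain` are hypothetical and chosen only for this example. The point is that the interface says what the module does, while the implementation behind it can be filled in later.

```python
from itertools import count
from typing import Iterator, List, Protocol, Tuple


class TagServer(Protocol):
    """The 'what': produce an ordinal sequence of tags for any client that asks."""

    def next_tag(self, client_id: int) -> int:
        ...


class CountingTagServer:
    """One possible 'how' (an assumption for illustration): a single shared counter."""

    def __init__(self) -> None:
        self._counter: Iterator[int] = count()

    def next_tag(self, client_id: int) -> int:
        # Every client draws from the same sequence, so tags are globally ordered.
        return next(self._counter)


def drain(server: TagServer, clients: List[int], rounds: int) -> List[Tuple[int, int]]:
    """Visit the clients round-robin and record (client, tag) pairs."""
    return [(c, server.next_tag(c)) for _ in range(rounds) for c in clients]
```

Code written against `TagServer` (here, `drain`) never sees the counter, so the implementation can be swapped without touching the clients.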

Bluespec SystemVerilog (BSV) is exceptionally well-suited for this kind of scrum. We work in either the command-line or GUI-based Bluespec Development Workstation (BDW) mode, rapidly crafting Types, Interfaces and Modules. Feedback from design entry to viewing compiled results comes in just seconds: an edit-compile-debug cycle that is no laggard to the best software development techniques. When we’re done with this step we have the rough structure of our design in place, which I like to call scaffolding.

Getting to first-functional, the Functional Correctness Quickly part, requires us to provide an implementation for our modules. Because we spent the first step compartmentalizing what does what and how modules communicate, this second step goes more quickly, as we already know precisely “what business” every module is in. This focus not only lets us code more concisely, but is also well-suited for quick, on-the-fly DUT testbenches that add a built-in verification aspect to the design process. We will usually code both standalone Bluesim simulations, which are C-based bit- and cycle-true simulations whose results can be seen in less than a minute, and Verilog-RTL simulations, which we can run on a Verilog simulator (rarely) or push through to the FPGA and run on hardware (often). Because all BSV code synthesizes, there is no special step or flow for this. The benefit is that we see feedback from the FPGA vendor tools for area and Fmax early in the module component development process.

With our collection of modules implemented, we push the entire design through to FPGA. Since the individual modules have been tested, and the interconnections between modules are tested and standardized, it is infrequent that the system-level composition exhibits unexpected behavior. In the rare case where a behavior is not understood, we bisect until we locate the issue. But most of the time, ta dah, we have our first functional code running. Functional Correctness Quickly!

But we may not be done. We will measure the metrics and check against the current requirements to see if we are. It is common to see requirements change. A design that does not tolerate change well is said to be brittle. BSV designs are the opposite of brittle. Let’s say the TagServer we described earlier needed to be embellished to produce a tag type that was not just an unsigned integer, but a more elaborate data structure. Fine! Have at it! With a few tactical changes the modules that produce and consume tags can be adapted. Changes that would be gut-wrenching in VHDL or Verilog are almost effortless. And conceptually you are empowered to change the “what” (Types) and the “how” (Rules) separately. Bottom line here is that you get your Performance Correctness Iteratively!
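To make the tag-type change concrete, here is a small sketch in Python rather than BSV. Everything in it (`TagServer`, `make_tag`, `RichTag`, and its `seq` and `channel` fields) is a hypothetical stand-in, not the actual design. It only illustrates how parameterizing a module over its tag type lets the type change while the sequencing logic stays put.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TagServer(Generic[T]):
    """Hands out tags of type T; how a tag is built is supplied by the caller."""

    def __init__(self, make_tag: Callable[[int], T]) -> None:
        self._make_tag = make_tag
        self._seq = count()

    def next_tag(self) -> T:
        # The sequencing ('how') is unchanged no matter what T ('what') is.
        return self._make_tag(next(self._seq))


@dataclass(frozen=True)
class RichTag:
    """A more elaborate tag: an ordinal plus an illustrative channel field."""
    seq: int
    channel: int


# Original requirement: plain unsigned-integer tags.
plain = TagServer(lambda n: n)

# Changed requirement: structured tags. Only the tag constructor differs.
rich = TagServer(lambda n: RichTag(seq=n, channel=n % 4))
```

In this sketch the change from `plain` to `rich` touches one line at the point of construction; consumers then adapt to the new type wherever they inspect a tag.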

We acknowledge that there may be other ways to realize our agile practice of racing to first-functional and then continuously improving. However, we know of no better EDA tool than BSV when dealing with parallel communicating sequential processes, what we call Complex Concurrency, so common in FPGA.

Sunday, February 23, 2014

The Ultimate Digital Designer’s Assistant

For most of the past decade we’ve had the choice of implementing the solutions to our clients’ needs in RTLs, such as VHDL, Verilog, or SystemVerilog; or in Bluespec SystemVerilog (BSV). Because much of what we do involves the business of complex concurrency, with lots of moving parts that are difficult to reason about in the whole but easier to grasp a rule-step at a time, BSV has been an obvious choice for us. But there is an important aspect of designing with BSV that we feel is often overlooked by those not familiar with the art, and is perhaps one of the language’s best features:
BSV is the Ultimate Digital Designer’s Assistant!

There’s a saying in our shop with regard to clearly-written BSV code that goes something like this: “If it compiles, it will work”. With those words, hundreds of verification engineers have just aimed tactical nukes towards Manchester, New Hampshire, so please let me explain. There are several reasons why this is the case; I’ll touch on a couple.

First, there are crazy-good type safety and formal guarantees enforced by the compiler. Compared to Verilog, which is awful, and even VHDL, which is less awful, the BSV compiler does an amazing amount of analysis and type-checking statically at compile time, within seconds of your pinky coming down on the enter key. We have spent days debugging issues in conventional RTL that the BSV compiler would have detected in seconds. In 2006, Stuart Sutherland presented a SNUG paper highlighting 57 “Gotchas” with Verilog and SystemVerilog. BSV detects or avoids the vast majority of them. There is simply that much less to get wrong, and when it is wrong, you are getting the most excellent feedback from the compiler while the ideas and concepts have barely left your fingertips and are fresh in your mind.

Next, there is this quality, which you may have seen on the masthead of this blog, of “Scalable Atomicity”. In a nutshell, the BSV designer reasons about rules within a module, and methods a module exposes to others. The BSV compiler ruthlessly checks and ensures that the deterministic behavior of the resultant sequential circuit is equivalent to the result of each rule firing one at a time. We look at any rule in our system, and at the interesting state-changing actions between its curly braces, and reason about the rules one at a time. Our reptilian brains can handle that focused task! The BSV compiler then composes a deterministic schedule, devoid of any new state elements, that produces results identical to the one-at-a-time firing of all the rules in the system we have just reasoned about.
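The one-rule-at-a-time reasoning model can be illustrated with a toy reference model. The Python below is only a sketch of the semantics we reason with. It is not how the BSV compiler builds its schedule, and the names `Rule`, `fire_one_at_a_time`, `produce`, and `consume` are invented for this example. Each rule has a guard and an atomic action, and an enabled rule fires in isolation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass(frozen=True)
class Rule:
    name: str
    guard: Callable[[State], bool]    # may the rule fire in this state?
    action: Callable[[State], State]  # atomic update, returns a new state


def fire_one_at_a_time(state: State, rules: List[Rule]) -> State:
    """Reference semantics: each enabled rule fires alone, in the given order."""
    for rule in rules:
        if rule.guard(state):
            state = rule.action(state)
    return state


# Toy system (an assumption for illustration): a producer and a consumer
# sharing a 2-entry FIFO that is modeled only by its occupancy count.
produce = Rule(
    "produce",
    guard=lambda s: s["fifo"] < 2,
    action=lambda s: {**s, "fifo": s["fifo"] + 1, "produced": s["produced"] + 1},
)
consume = Rule(
    "consume",
    guard=lambda s: s["fifo"] > 0,
    action=lambda s: {**s, "fifo": s["fifo"] - 1, "consumed": s["consumed"] + 1},
)


def run(order: List[Rule], cycles: int) -> State:
    """Apply the rule list repeatedly, starting from an empty system."""
    state: State = {"fifo": 0, "produced": 0, "consumed": 0}
    for _ in range(cycles):
        state = fire_one_at_a_time(state, order)
    return state
```

Because each action is checked in isolation, the invariant `produced - consumed == fifo` holds whichever order the rules are tried in: `run([produce, consume], 3)` and `run([consume, produce], 3)` end in different states, but both satisfy it.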

Our IP shop designs DMA engines, Packet Processors, Beamformers, and the like. Our clients demand that we “go deep quick”, meaning we show them something partially functional almost immediately. Without BSV by our side to check our work at every compile, we would not be able to move nearly as quickly. The edit-compile-debug cycle with BSV is seconds. We go around that loop, including bit- and cycle-true C simulation, dozens of times each day. Then we make a bitstream, and it “just works”. If you find yourself complaining that the FPGA vendor tools take a long time and slow your productivity, perhaps you should explore how a digital designer’s assistant such as BSV would change your world.

Wednesday, November 27, 2013

Debian 7.2 and OpenCPI

A client of ours using OpenCPI had selected the Debian flavor of Linux for their work. That's fine. Certainly in the FPGA space, and for that matter, across all processor technologies, we have tried not to dictate any particular Linux. That said, the developers had added support for RedHat, CentOS, and MacOS back in ancient times. More recently Ubuntu was added into the mix. And just yesterday, with this commit, we dealt with Debian. It's interesting seeing the subtle, and not-so-subtle, differences across a dozen different Linux distributions.

Wednesday, October 16, 2013

OCP-IP move to Accellera

We're cautiously optimistic, maybe even excited, about the news that went public yesterday of OCP-IP being rolled into Accellera's warm arms. We have participated in technical issues for both, but very much like that Accellera is a proven conduit to IEEE standardization. Open and accessible standards are a good thing. Time will tell, but out of the gate we are pleased that our man-years of investment in the OpenCPI Worker Interface Profiles (WIPs), which often incorporate OCP-IP concepts, can live on. Come see us at FPGA-2014 in Monterey to understand why, years later, we still feel the OCP/AXI choice is like Coke and Pepsi.

Tuesday, October 8, 2013

Ubuntu 13.04 and OpenCPI

We've been using RHEL as our standard Linux for over five years. Initially RHEL5 64b WS, then RHEL6. On the plus side, RHEL is almost always embraced by our CAD and CAE tool vendors as a supported OS. On the minus side, the libraries are as old as the hills and we really weren't feeling the love of sending $200/machine-image/year to Red Hat for this configuration grief. We had noodled with earlier versions of Ubuntu and had some mixed feelings. With Ubuntu 13.04 this past summer, I tried it again and it was just great: everything "just worked". We didn't even have to muck around with GPU drivers. What used to be a half-day project of dependency hell installed in one apt-get. Wow! I don't think of us as OS bigots, but the understandable contrast between the library hassles we had with something as old as RHEL5 and didn't have with an OS as modern as Ubuntu 13.04 was just too much to overlook. We've started to let our RHEL licenses lapse, just keeping around a few frozen RHEL5s and one maintained RHEL6 so we have them in shop. But unless a client or application needs some other Linux, it's Ubuntu 13.04 in our shop this fall and going forward.

Jim Kulp at Parera did us all a solid by refreshing the OpenCPI mainline to build cleanly on Ubuntu 13.04 as well. Nice! He pushed those changes to the OpenCPI GitHub repo this evening. Thanks Jim!