Sunday, March 30, 2014

Functional Correctness Quickly

A saying we use around here that has evolved from axiom to dogma is this:

“We shall achieve Functional Correctness Quickly and Performance Correctness Iteratively”

In a fast-moving world, this has substantial value to our clients, who are often looking for us to operationalize a new platform or algorithm. They turn to us to provide a “show-me” demo under tight time and cost constraints. That near-in demonstration is rarely the end goal. Frequently they want to tune for lower latency, more throughput, better energy efficiency, or lower gate area, and to add the features needed to address their continuously evolving market requirements. We live for this. This is our passion.

How do we do this? In our world, where reconfigurable computing intersects with the business of complex concurrency, we have some things dictated to us and we have places where we have some room to choose.

In the dictated-to-us department we have the FPGA back-end tools for synthesis and implementation. Xilinx dictates Vivado, Altera dictates Quartus, Tabula dictates Stylus, and so on. The vendor tools all have their strengths and weaknesses, and they all provide the capability to go from RTL to bitstream. We have our preferences here, but our clients come first. If they choose an Efinix device, we will use the flow thus dictated to us. Fine.

It is in the higher-level choice of Domain Specific Language (DSL) where our choice counts the most, especially as it relates to “Functional Correctness Quickly”. In our agile sprints to quickly achieve first-functional code, we have used Bluespec SystemVerilog (BSV) for nearly a decade.

The fire-drill to first-functional often goes something like this: We rough out a relationship of various modules and their hierarchy. Each module provides an interface. Without getting caught up in the implementation that goes in these modules, we are free to think solely of the interaction patterns between them. We go through this healthy IP birthing-phase where we may anthropomorphize the module on behalf of the designer. “Shep’s TagServer is in the business of producing an ordinal sequence of tags to a plurality of clients”. In this first step we are very much concerned with what each module does, not how it does it, and what sort of interaction patterns are needed with other modules. We may even assign abstract types to the interfaces because we don’t know yet what is needed.
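To make the "what, not how" step concrete, here is a minimal software sketch of the same idea. It is written in Python rather than BSV, purely as an analogy, and every name in it (TagServer, TagClient, CountingTagServer, next_tag) is hypothetical rather than taken from our code. The interface is declared first with the tag type left abstract, a client is written against that interface alone, and only afterwards does a placeholder implementation appear.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

# The tag type is deliberately left abstract: we do not yet know what is needed.
TagT = TypeVar("TagT")


class TagServer(ABC, Generic[TagT]):
    """States what the module does: hand out an ordered sequence of tags."""

    @abstractmethod
    def next_tag(self, client_id: int) -> TagT:
        """Return the next tag in sequence for the requesting client."""


class TagClient(Generic[TagT]):
    """A consumer written against the interface only, never an implementation."""

    def __init__(self, client_id: int, server: TagServer[TagT]) -> None:
        self.client_id = client_id
        self.server = server
        self.received: List[TagT] = []

    def request(self) -> TagT:
        tag = self.server.next_tag(self.client_id)
        self.received.append(tag)
        return tag


class CountingTagServer(TagServer[int]):
    """Placeholder implementation (an assumption): tags are 0, 1, 2, ..."""

    def __init__(self) -> None:
        self._next = 0

    def next_tag(self, client_id: int) -> int:
        tag = self._next
        self._next += 1
        return tag
```

Two clients sharing one CountingTagServer each see a strictly increasing subsequence of the same ordinal stream, which is the interaction pattern we care about at this stage. The implementation can be swapped later without touching TagClient.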

Bluespec SystemVerilog (BSV) is exceptionally well-suited for this kind of scrum. We work either at the command line or in the GUI-based Bluespec Development Workstation (BDW), rapidly crafting Types, Interfaces and Modules. Feedback from design entry to viewing compiled results comes in just seconds: an edit-compile-debug cycle that is no laggard compared to the best software development techniques. When we’re done with this step we have the rough structure of our design in place, which I like to call scaffolding.

Getting to first-functional, the Functional Correctness Quickly part, requires us to provide an implementation for our modules. Because we spent the first step compartmentalizing what does what and how modules communicate, this second step goes more quickly, as we already know precisely “what business” every module is in. This focus not only lets us code more concisely, but is also well-suited to quick, on-the-fly DUT testbenches that add a built-in verification aspect to the design process. We will usually target both standalone Bluesim simulations, which are C-based, bit- and cycle-true simulations whose results can be seen in less than a minute, and Verilog RTL, which we can run on a Verilog simulator (rarely) or push through to the FPGA and run on hardware (often). Because all BSV code synthesizes, there is no special step or flow for this. The benefit is that we see feedback from the FPGA vendor tools for area and Fmax early in the module development process.

With our collection of modules implemented, we push the entire design through to the FPGA. Since the individual modules have been tested, and the interconnections between modules are tested and standardized, it is infrequent that the system-level composition exhibits unexpected behavior. In the rare case where a behavior is not understood, we bisect until we locate the issue. But most of the time, ta dah, we have our first functional code running. Functional Correctness Quickly!

But we may not be done. We will measure the metrics and check against the current requirements to see if we are. It is common to see requirements change. A design that does not tolerate change well is said to be brittle. BSV designs are the opposite of brittle. Let’s say the TagServer we described earlier needed to be embellished to produce a tag type that was not just an unsigned integer, but a more elaborate data structure. Fine! Have at it! With a few tactical changes the modules that produce and consume tags can be adapted. Changes that would be gut-wrenching in VHDL or Verilog are almost effortless. And conceptually you are empowered to change the “what” (Types) and the “how” (Rules) separately. The bottom line here is that you get your Performance Correctness Iteratively!
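As a rough illustration of why a tag-type change can stay tactical, here is a small Python analogy (not BSV, and not our production code). It assumes a hypothetical TagServer parameterized by a make_tag function, a hypothetical consumer called drain, and a hypothetical RichTag record with an invented priority field. The point is only that the consumer is written once and does not care which tag type flows through it.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

TagT = TypeVar("TagT")


class TagServer(Generic[TagT]):
    """Issues tags in order; make_tag decides what a tag looks like."""

    def __init__(self, make_tag: Callable[[int], TagT]) -> None:
        self._make_tag = make_tag
        self._count = 0

    def next_tag(self) -> TagT:
        tag = self._make_tag(self._count)
        self._count += 1
        return tag


def drain(server: TagServer[TagT], n: int) -> List[TagT]:
    """Consumer code: identical for every tag type."""
    return [server.next_tag() for _ in range(n)]


@dataclass(frozen=True)
class RichTag:
    """A more elaborate tag; the priority field is purely illustrative."""
    seq: int
    priority: int


# Original requirement: a tag is just an unsigned integer.
plain_server = TagServer(lambda i: i)

# Changed requirement: a tag is a small record. Only the producer side changes.
rich_server = TagServer(lambda i: RichTag(seq=i, priority=0))
```

Calling drain on either server works without modification. In this analogy the record definition plays the part of the “what” and the sequencing logic inside next_tag plays the part of the “how”, and each can be edited without disturbing the other.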

We acknowledge that there may be other ways to realize our agile practice of racing to first-functional and then continuously improving. However, we know of no better EDA tool than BSV for dealing with parallel communicating sequential processes, which we call Complex Concurrency and which are so common in FPGAs.