Saturday, April 19, 2014

Dealt a new Deck

We are so excited about the release this week by Xilinx of Vivado 2014.1. Having been involved since the dark-ages with the underpinnings of this world-class CAD tool for FPGAs, we're always eager to see what's new and improved. With almost a dozen 28 nm designs under our belt, we made quick work of porting designs working on platforms like the KC705, VC709, and ZC706 from 2013.4 to 2014.1. We love that we can script in Tcl, or click in the GUI. We like to see both work equally well. We need more time to have hard quantitative evidence, but across the board we are seeing 10~20% placer run-time reductions, with no loss of Quality of Results. Not sure where this gain is coming from, but we will measure and find out. This is crazy good. Thanks Xilinx!

Altera is looking to answer with their salvo of Quartus II 14.0 soon, so stay tuned. Altera does have some interesting silicon announced down the pike, and we're happy to bring our RTL to whatever devices our clients are using. Still, from a do-it-all CAD tool perspective, we lament that Quartus lacks a built-in functional simulator like Vivado's. Oh well... separation of concerns!

What a world it would be if Xilinx and Altera got out of the CAD tool business and just focused on world-class FPGA silicon, leaving the EDA tools to the EDA tool vendors. Or to the open source community! It's nice to dream. Still, with tool suites like Vivado that build in aspects like IP Integrator, and IP Packager, we're glad (for now) that the FPGA vendors are in the EDA game.

Sunday, March 30, 2014

Functional Correctness Quickly

A saying we use around here that has evolved from axiom to dogma is this:

“We shall achieve Functional Correctness Quickly and Performance Correctness Iteratively”

In a fast-moving world, this has substantial value to our clients who are often looking for us to operationalize a new platform or algorithm. They turn to us to provide a “show-me” demo under tight time and cost constraints. That near-in demonstration is rarely the end goal. Frequently they want to tune for lower latency, more throughput, better energy efficiency, lower gate area, and add features needed to address their continuously evolving market requirements. We live for this. This is our passion.

How do we do this? In our world, where reconfigurable computing intersects with the business of complex concurrency, we have some things dictated to us and we have places where we have some room to choose.

In the dictated-to-us department we have the FPGA back-end tools for synthesis and implementation. Xilinx dictates Vivado, Altera dictates Quartus, Tabula dictates Stylus, and so on. The vendor tools all have their strengths and weaknesses, and they all provide the capability to go from RTL to bitstream. We have our preferences here, but our clients come first. If they choose an Efinix device, we will use the flow thus dictated to us. Fine.

It is in the higher-level choice of Domain Specific Language (DSL) where our choice counts the most, especially as it relates to “Functional Correctness Quickly”. In our agile sprints to quickly achieve first-functional code, we have used Bluespec SystemVerilog (BSV) for nearly a decade.

The fire-drill to first-functional often goes something like this: We rough out a relationship of various modules and their hierarchy. Each module provides an interface. Without getting caught up in the implementation that goes in these modules, we are free to think solely of the interaction patterns between them. We go through this healthy IP birthing-phase where we may anthropomorphize the module on behalf of the designer. “Shep’s TagServer is in the business of producing an ordinal sequence of tags to a plurality of clients”. In this first step we are very much concerned with what each module does, not how it does it, and what sort of interaction patterns are needed with other modules. We may even assign abstract types to the interfaces because we don’t know yet what is needed.

Bluespec SystemVerilog (BSV) is exceptionally well-suited for this kind of scrum. We work in either the command-line or GUI-based Bluespec Development Workstation (BDW) mode, rapidly crafting Types, Interfaces and Modules. Feedback from design entry to viewing compiled results comes in just seconds; an edit-compile-debug cycle that keeps pace with the best software development practices. When we’re done with this step we have the rough structure of our design in place, which I like to call scaffolding.

Getting to first-functional, the Functional Correctness Quickly part, requires us to provide an implementation for our modules. Because we spent the first step compartmentalizing what does what and how modules communicate, this second step goes more quickly, as we already know precisely “what business” every module is in. This focus not only lets us code more concisely, but is also well-suited for quick, on-the-fly DUT testbenches that add a built-in verification aspect to the design process. We will usually code both standalone Bluesim simulations, which are C-based bit- and cycle-true simulations whose results can be seen in less than a minute, as well as Verilog RTL simulations, which we can run on a Verilog simulator (rarely) or push through to the FPGA and run on hardware (often). Because all BSV code synthesizes, there is no special step or flow for this. The benefit is that we see feedback from the FPGA vendor tools for area and Fmax early in the module component development process.

With our collection of modules implemented, we push the entire design through to the FPGA. Since the individual modules have been tested, and the interconnections between modules are tested and standardized, it is infrequent that the system-level composition exhibits unexpected behavior. In the rare case where a behavior is not understood, we bisect until we locate the issue. But most of the time, ta dah, we have our first functional code running. Functional Correctness Quickly!

But we may not be done. We will measure the metrics and check against the current requirements to see if we are. It is common to see requirements change. A design that does not tolerate change well is said to be brittle. BSV designs are the opposite of brittle. Let’s say the TagServer we described earlier needed to be embellished to produce a tag type that was not just an unsigned integer, but a more elaborate data structure. Fine! Have at it! With a few tactical changes the modules that produce and consume tags can be adapted. Changes that would be gut-wrenching in VHDL or Verilog are almost effortless. And conceptually you are empowered to change the “what” (Types) and the “how” (Rules) separately. Bottom line here is that you get your Performance Correctness Iteratively!
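To make the TagServer change concrete, here is a minimal sketch in Python rather than BSV, so it is an analogy and not our actual IP. The class `TagServer`, the helper `serve_round_robin`, the `RichTag` type, and the client names are all hypothetical, chosen only for illustration. The point it tries to show is that when the tag type is left abstract, swapping an unsigned integer for a richer structure touches only the place where tags are constructed.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Generic, Iterator, List, Tuple, TypeVar

T = TypeVar("T")  # the tag type, deliberately left abstract


class TagServer(Generic[T]):
    """Hands out an ordinal sequence of tags to any number of clients."""

    def __init__(self, make_tag: Callable[[int], T]) -> None:
        self._ordinals: Iterator[int] = count()  # 0, 1, 2, ...
        self._make_tag = make_tag                # how an ordinal becomes a tag

    def next_tag(self) -> T:
        return self._make_tag(next(self._ordinals))


def serve_round_robin(server: TagServer[T], clients: List[str],
                      rounds: int) -> List[Tuple[str, T]]:
    """Client-side code: it never looks inside a tag, so the tag type can change."""
    return [(client, server.next_tag())
            for _ in range(rounds) for client in clients]


@dataclass(frozen=True)
class RichTag:
    """Hypothetical richer tag: an ordinal plus an urgency flag."""
    ordinal: int
    urgent: bool


# First-functional version: tags are plain unsigned integers.
plain = serve_round_robin(TagServer(lambda n: n), ["dma", "pkt"], rounds=2)

# Later requirement: a more elaborate tag. Only the tag constructor changes.
rich = serve_round_robin(
    TagServer(lambda n: RichTag(n, urgent=(n % 2 == 1))), ["dma", "pkt"], rounds=2)

print(plain)
print(rich[1])
```

In this sketch `serve_round_robin` is identical for both tag types, which is the Python stand-in for consumers that survive a change to the “what” without edits to the “how”.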

We acknowledge that there may be other ways to realize our agile practice of racing to first-functional and then continuously improving. However, we know of no better EDA tool than BSV when dealing with parallel communicating sequential processes, what we call Complex Concurrency, so common in FPGAs.

Sunday, February 23, 2014

The Ultimate Digital Designer’s Assistant

For most of the past decade we’ve had the choice of implementing the solutions to our clients’ needs in RTLs, such as VHDL, Verilog, or SystemVerilog; or in Bluespec SystemVerilog (BSV). Because much of what we do involves the business of complex concurrency, lots of moving parts that are difficult to reason about in the whole but easier to grasp a rule-step at a time, BSV has been an obvious choice for us. But there is an important aspect of designing with BSV that we feel is often overlooked by those not familiar with the art, and is perhaps one of the language’s best features:
BSV is the Ultimate Digital Designer’s Assistant!

There’s a saying in our shop with regard to clearly written BSV code that goes something like this: “If it compiles, it will work”. With those words, hundreds of verification engineers have just aimed tactical nukes towards Manchester, New Hampshire, so please let me explain. There are several reasons why this is the case; I’ll touch on a couple.

First, there are crazy-good type safety and formal guarantees enforced by the compiler. Compared to Verilog, which is awful, and even VHDL, which is less-awful, the BSV compiler does an amazing amount of analysis and type-checking statically at compile time within seconds of your pinky coming down on the enter key. We have spent days debugging issues in conventional RTL that the BSV compiler would have detected in seconds. In 2006, Stuart Sutherland presented a SNUG paper highlighting 57 “Gotchas” with Verilog and SystemVerilog. BSV detects or avoids the vast majority of them. There is simply that much less to get-wrong, and when it is wrong, you are getting the most-excellent feedback from the compiler while the ideas and concepts have barely left your fingertips and are fresh in your mind.

Next, there is the quality of “Scalable Atomicity”, which you may have seen on the masthead of this blog. In a nutshell, the BSV designer reasons about rules within a module, and methods a module exposes to others. The BSV compiler ruthlessly checks and ensures that the deterministic behavior of the resultant sequential circuit is equivalent to the result of each rule firing one at a time. We look at any rule in our system, and the interesting state-changing actions between the curly-braces, and reason about them one at a time. Our reptilian brains can handle that focused task! The BSV compiler then composes a deterministic schedule, devoid of any new state elements, that produces results identical to the one-at-a-time firing of all the rules in the system we have just reasoned about.
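For readers who have not met rule-based design, the one-rule-at-a-time idea can be modeled in a few lines of Python. This is a toy sketch and not how the BSV compiler works: the `Rule` tuple, the two stepping functions, and the `drain` and `tick` rules are all made up for illustration, and the rules are assumed to touch disjoint registers, which is the easy case. A real scheduler also has to handle rules that conflict.

```python
from typing import Callable, Dict, List, NamedTuple

State = Dict[str, int]


class Rule(NamedTuple):
    name: str
    guard: Callable[[State], bool]    # may this rule fire in the given state?
    action: Callable[[State], State]  # register writes computed from that state


def fire_one_at_a_time(state: State, rules: List[Rule]) -> State:
    """Reference semantics: each enabled rule fires alone, in order."""
    for rule in rules:
        if rule.guard(state):
            state = {**state, **rule.action(state)}
    return state


def fire_in_one_cycle(state: State, rules: List[Rule]) -> State:
    """Hardware-like step: every enabled rule reads the start-of-cycle state."""
    updates: State = {}
    for rule in rules:
        if rule.guard(state):
            writes = rule.action(state)
            # This toy model only supports rules that write disjoint registers.
            assert not (writes.keys() & updates.keys()), "two rules wrote one register"
            updates.update(writes)
    return {**state, **updates}


RULES = [
    # Move one item out of a buffer while any remain.
    Rule("drain", lambda s: s["level"] > 0,
         lambda s: {"level": s["level"] - 1, "sent": s["sent"] + 1}),
    # Free-running cycle counter.
    Rule("tick", lambda s: True,
         lambda s: {"cycles": s["cycles"] + 1}),
]


def run(step, state: State, cycles: int) -> State:
    for _ in range(cycles):
        state = step(state, RULES)
    return state


start = {"level": 2, "sent": 0, "cycles": 0}
sequential = run(fire_one_at_a_time, start, 3)
concurrent = run(fire_in_one_cycle, start, 3)
print(sequential, concurrent == sequential)
```

Both stepping functions end in the same state, so reasoning about `drain` and `tick` one at a time is enough to predict what the concurrent step does in this small case.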

Our IP shop designs DMA engines, Packet Processors, Beamformers, and the like. Our clients demand that we “go deep quick”, meaning we show them something partially functional almost immediately. Without BSV by our side to check our work at every compile, we would not be able to move nearly as quickly. The edit-compile-debug cycle with BSV takes seconds. We go around that loop, including bit- and cycle-true C simulation, dozens of times each day. Then we make a bitstream, and it “just works”. If you find yourself complaining that the FPGA vendor tools take a long time and slow your productivity, perhaps you should explore how a digital designer’s assistant such as BSV would change your world.