Some suggested reading:
- Jonathan Shapiro, Programming Language Challenges in Systems Codes: Why Systems Programmers Still Use C, and What to Do About It (2006) -- "BitC"
- Eric Brewer et al., Thirty Years is Long Enough: Getting Beyond C (2005) -- "Ivy"
- Trevor Jim et al., Cyclone: A Safe Dialect of C (2002)
- Daniel Frampton et al., Demystifying Magic: High-level Low-level Programming (2009)
- Mark Gritter, Conundrum: Do All Systems Research Papers Include a Colon? (forthcoming)
(I couldn't find the first two listed in CiteSeer, although the ACM digital library turned up a few references to them, including the Frampton paper.)
What's right with C? All these guys want to replace C in some manner. But first (as I did un-seriously yesterday) they have to tackle the issue of why C has been the right solution. What capabilities need to be preserved?
- "an intuitive model of what is happening on the machine at the source code level." [Shapiro]
- Systems programming requires the ability to operate in constrained memory [Shapiro], and multiple fixed-precision integral types. [Shapiro, Frampton]
- Good bulk I/O performance (zero or close-to-zero copy). Good performance, which in turn requires control of data representation. [Shapiro, Brewer, Jim]
- Systems programs retain state, which penalizes automatic storage reclamation. They may also cache data. Hence, user-managed storage. [Shapiro]
- The ability to drop down even further to assembly language, for applications like microkernel IPC [Shapiro]
- C programs performed nearly as well as their assembly counterparts. [Brewer]
- "transparent, efficient access to the underlying hardware and/or operating system, unimpeded by abstractions." [Frampton]
- "unboxed types" [Frampton, Shapiro] and "the ability to bypass built-in abstractions" [Frampton] --- an unboxed type is one whose insides can be aliased and whose location in memory can be controlled.
- "explicit memory management" and "control over low-level data representations" [Frampton, Jim]
A key idea here is that important optimizations like buffer re-use, zero-copy techniques, scatter/gather DMA, are impossible if you can't control data placement. A second point is that systems programmers often want to build or access something that doesn't fit well in a higher-level abstraction: a new CPU feature, a requirement for multiprocessor operation, a balky piece of hardware, a predefined packet format.
What needs to be done differently? There is broad agreement that a successor language needs type and memory safety. These are so obviously pain points, and obstacles to more sophisticated analysis of programs, that any modern language needs to deal with them in some way. The "Magic" (Frampton) approach is to start with a high-level language and find ways to add the low-level programming features needed--- some neat ideas that I won't take the space to get into. The other works take a variety of bottom-up approaches.
- Take advantage of advances in prover technology and automated proof systems [Shapiro] and more aggressive static checking [Jim]
- Facilitate the expression of global properties and application constraint checking, and programmer knowledge about idiomatic manual storage. [Shapiro]
- Extensibility [Brewer, Frampton]
- Refactoring support --- in Brewer's system, "analyzing existing code to find patterns that could be better expressed with a specific language extension."
- Integrate threads and atomic sections, concurrency control [Brewer]
- API adherence checking [Brewer, Shapiro]
- Dependent types as a mechanism for type-safe data layout. [Brewer]
- Allow the use of new language features for older code, and the choice of which language features the programmer wants to use. [Brewer, Frampton]
- Replace ad-hoc preprocessor hacks with a real macro system [Brewer]
- Control the places where run-time checks are inserted when needed for safety. [Jim]
The Ivy (Brewer) system assumes an additional, huge, requirement--- to evolve existing C code rather than rewrite it. So their approach is actually the most aggressive in relying upon heavy-duty automation, but the idea of adding language features in an "agile" fashion and refactoring the working code is appealing. I tend to side more with Shapiro here; we have plenty of new languages and systems around to prove that "starting over" is feasible.
I. Core requirements. Predictable performance, small footprint, and explicit control of data layout--- just like C.
On top of that we need mechanisms to reduce common errors: not just type and memory errors, but resource leaks and inconsistencies of all sorts. One mechanism used in all three of the "bottom up" proposals is aggressive compile-time checking for provable correctness. It's hard to talk about the requirement here without getting into specific features, but as a requirement, compile-time checking is far better than run-time checking.
User-controlled extensibility (and escape mechanisms) for common design patterns, application-specific knowledge, new language features, and "new" (or just difficult) hardware environments.
Modular decomposition of programs and libraries. (Seriously, include files need to die.)
In order to get anywhere, you will need to talk to (at least) C code to start.
Support for multi-core CPUs. (Read Boehm--- you need compiler support to get concurrency both correct and performant.) I hate to pick "threading" as the thing being supported because that's such a specific model.
Related to that, there's a huge potential requirement that depends on how you build systems. Are they monolithic structures or assemblages of small restartable processes? If it's the latter, how do you support writing software that works that way? (You have to read between the lines of the NYT article--- but if you do, it's clear David's new company Optumsoft is working on an answer.)
II. The Penumbra of Experience. In addition to these "big ideas" I think there are a lot of things which we've learned are useful in a programming environment, but might slip in a "first version". On the other hand, some of them may be harder to get right than the more well-understood mechanisms of type inference and memory checking.
API and documentation output. (Maybe even literate programming!)
Interactive development and debugging. Remote debugging. Probe and tracing support. Memory and CPU profiling. Unit testing. Fault injection and other whitebox testing support.
Dynamic loading of modules, and shared object files across processes. A stable ABI for precompiled modules. Cross-compilation support.
III. What else? Features that would be nice to have to make programmers' lives easier.
Type inference to eliminate redundant typing. Other features to make code concise. (Example features might be tuples, named arguments. "Macros" of whatever flavor should be included more as an extensibility mechanism rather than a conciseness one.)
Parallel compilation and linking. *Fast* compilation, ideally.
Profile-driven (or even run-time) optimization.
Default associative array implementation that doesn't suck.
Mocking and delegation support, for unit testing, refactoring, and OOP. Serialization (though that one is tricky to get "right" without making a lot of assumptions.)
This post is already too long. But there is one other requirement that successful programming languages have and unsuccessful ones don't--- a real application that drives their use. Unix and C were successful in part because somebody (if even just their authors!) wanted to use them. A lot of other "modern" languages exist because somebody had an itch they needed to scratch, and massively over-engineered by writing a programming language to fit the bill. Despite the shortcomings of C, I don't think we'll see an Ivy-style evolution; we should instead expect a disruptive innovation from a language driven by some particular need --- just like C itself.
Part 1: 40 years of C
Part 2: Understanding the requirements (is always the hardest part)
Part 3: How do Go and other efforts stack up?