A lot of programming language advocacy (even when restricted to procedural/object-oriented languages) seems to revolve around side issues such as syntax and neatness. I believe that language-level support for the core data types (strings, arrays/lists, associative arrays), library development and resource management are more important issues that are often overlooked.
Last updated 2006/03/22
Summarised from posts in a thread at http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=319&ixReplies=105
There always seem to be a lot of arguments about which programming language is best. While I believe in 'the right tool for the job', I feel that people often fail to notice some core areas where particular languages excel, or fall down.
As a programmer who went through a computer science course at university and has worked on commercial code for about 10 years, I have programmed in and been exposed to a fairly common set of languages that someone in my position might have seen: C, C++, VB, Delphi, Tcl, Perl, Python, ML, Scheme, Prolog, etc. In general, procedural and object-oriented languages seem to dominate the majority of software development these days, so most of the arguments below apply to these rather than to functional (Lisp, Scheme, etc) and logic (Prolog) languages.
Historically, most procedural languages have been very good at the computational, conditional and looping aspects of programming (eg. integer/floating point maths, if/then statements, while/for loops). However, the early legacy of C and Pascal, which abstracted the underlying CPU/memory hardware only minimally, meant that more complex structures were not built in to these early procedural languages. Programmers were expected to manage memory, strings, arrays, etc closely themselves, using whatever code/libraries they felt they needed to trade off memory size against CPU speed for their particular application.
Given that the original memory/CPU constraints are no longer relevant for a lot of programs these days (though embedded programmers, or programmers with huge data sets or strict performance requirements, would still disagree), it's become reasonable over the last 20 years to expect more and more libraries to manage these low-level structures and details for the programmer.
Joel feels that automatic memory management is an important feature required in a language these days. I'd say that memory management is only one part of the 'garbage collection' process, and a language should support some form of general guaranteed finalisation. C++ offers this through stack-based objects and destructors; Perl and Python offer it through their reference-counted semantics. Interestingly, more languages are planning on moving away from this to non-deterministic garbage collection!
People seem to want full garbage collection semantics, something that reference counting doesn't provide (because reference cycles are never reclaimed). While full garbage collection ensures that even cyclic structures are deallocated, it's still not without problems. With even one missed reference you can still get a large 'leak', if that reference keeps many other large structures alive, which is why Java still has memory leak detection tools.
The other main problem with full garbage collection is that you lose guaranteed destruction semantics on objects, something that can be very useful. The C++ idiom of 'resource acquisition is initialisation', where the corresponding destructor releases the resource, is probably one of the most powerful idioms for dealing with all forms of resource acquisition and disposal, not just the memory leaks that, given the push to full garbage collection everywhere, people seem most obsessed about.
Examples where you want guaranteed finalisation include the common cases of closing a file at the end of a function/method, or releasing a mutex/semaphore at the end of a function/method. The second is especially common in C++ thread libraries as a way to ensure that an exception thrown inside a method doesn't leave a locked mutex lying around.
Still, full garbage collection semantics doesn't mean you can't have guaranteed finalisation code. It's possible to have garbage collected memory and still have guaranteed finalisation through language semantics. Perl 6 is looking at having a POST block which is always executed at the end of a block. Unfortunately, almost no other language seems to be adding something like this.
It could be a very useful idiom to allow 'enter scope' and 'exit scope' actions to be placed together. Enter-scope actions are just ordinary code at that point, but being able to specify, at the same point, something that must occur when that stack frame is unwound could be a really useful feature. In pseudocode:
    function do_something() {
        mutex.acquire();
        onexit { mutex.release(); }
    }
Along with resource acquisition and disposal, however, I'd add at least 3 main structures that a language should support intrinsically in an efficient and well planned way:

1. Strings
2. Dynamic arrays/lists
3. Associative arrays (maps)
Historically, C was probably the worst at each of these, especially 3, which is non-existent. Most scripting languages do well at all of these, to greater or lesser degrees. C++ worked on the principle that rather than putting anything in the language per se, the language should be powerful enough to create a library to do whatever you want. This seems to have worked reasonably well, since the STL and Boost libraries created some really powerful features that cover all 3 of the above, with your choice of space/complexity tradeoff. Lisp derivatives go so far as to make both code and data just a list. Surprisingly, even modern languages like Java seemed relatively poor at these, with non-resizable arrays and collections that required boxing/unboxing. Only the recent generics support has made this a lot better.
Whenever I use VB, it's the poor support for 2 & 3 above that really annoys me. I'm not sure if it's still true, but I remember reading that string concatenation was an O(N^2) operation under VB, meaning item 1 was of suspect usefulness as well. VB array support is horrible, and maps were an additional component rather than an inbuilt language feature.
Map and list interoperability is a particularly interesting issue. All maps allow you to iterate over the keys or key-value pairs of the map. Whether this uses an iterator or returns some list of items seems to depend on whether the designer is more comfortable with a procedural or a functional programming style.
While libraries have been around almost since programming started, their importance seems to continue to increase. Many early C programmers seemed to have a 'roll your own' attitude to most things, which harks back to the 'it's not quite optimised for what I need' syndrome. These days a language can live or die by its library support.
One of the core features of a language is the ability to create and use libraries. This has been one of C++'s greatest strengths, and weaknesses. Because for quite some time there was no standard string class/array class/map class, dozens of libraries and users created their own. This created the unfortunate problem that building additional higher-level libraries on top of these core structures was hard, because which implementation should you use? The result was lots of libraries that still just used char * pointers for strings, and (..., int length, t_type *items) parameter pairs for arrays. While mostly fine for input data, this still creates problems for functions that want to return data, and results in the annoying "call once to get the size, call again to get the data" style of interface that abounds in C libraries.
Even with the STL providing a powerful standard library, the lack of a common implementation, bugs in different implementations, a change in the standard (#include <iostream.h> vs #include <iostream>) and the way template compilation occurs meant it was basically impossible to create a binary linkable C++ library. In this respect, the idea of a .dll/.so with an interface definition that you can compile against and then link to has never really been possible with C++ and the STL.
Libraries for other modern languages seem to fall into two main categories: loose community-supplied libraries, and highly structured, centralised, hierarchical class libraries.
When Perl 5 was introduced, it included a fairly simple packaging and library system, and the CPAN community site where libraries could be uploaded, indexed, searched and installed from. This has been immensely popular, and other scripting languages have tried to emulate it (eg PEAR for PHP). As Perl is interpreted, all the libraries/modules are provided as source and loaded at runtime as used. Perl also provided a simple documentation format (perldoc) that encourages a certain comment style in code, and a certain layout. The most visible aspect of this is the SYNOPSIS section at the top of most Perl modules, which gives a one-page introduction to using the module in code. In over 50% of cases I've found this lets you pretty much use the module straight away without having to read the full documentation.
.Net and Java have included large libraries from their respective creators which are regarded as part of the language. These are usually pretty comprehensive with regard to files, IO, threads, containers, etc.
On VB: although I complained about the lack of lists/maps above, I have to say that there are some things I really like about VB.
I think most syntax arguments are generally pretty much hot air. {} vs keywords vs indenting are all relatively minor points in comparison to core data structure support, in my mind. Of course, wholly different paradigms like functional and logic languages are another kettle of fish that you can argue about on completely different grounds. I wish I had some time to try out OCaml and Haskell, which look really interesting.
However, I must admit to having a bias here. Every time I used Perl, I used to be really flustered by the 'noise' level of the language. Over time, though, I've grown to really like it, as well as Larry Wall's design idea of 'Huffman coding' a language. Basically, the most common things you want to do should be easy, and hard things shouldn't be impossible. This extends down to how much typing is needed for basic language features, and it manifests itself all through Perl, including really deep support for strings, lists and maps.
Many people describe Perl as a 'write-only' language. I think any language can be, but sometimes people confuse 'noise' with 'conciseness'. Do you prefer reading legalese like "the aim, notwithstanding other obstacles to the successful completion of the exercise, is to increase the general vibrational energy of the H2O molecules present in the ceramic container, or in other words, to create a strong level of Brownian motion", or do you prefer "heat the cup of tea"? I see Perl as the second, and some other languages as the first. Of course it's a bit different to that, because from someone else's perspective the second statement "heat the cup of tea" might be written in Russian, so it's actually harder to understand than the long-winded version...