Mathematicians are not Programmers!
By David Wiley
July 29, 2000
I quickly forget exactly why I hesitate to use pre-fabricated “libraries.” I also quickly remember as soon as I start rummaging through some newly acquired code from the Internet. I present here a few of my findings from my latest acquisition: SparseLib++ and MV++ Numerical Matrix/Vector C++ Library.
Dave’s Proverb 1: The only thing worse than C code written by a Mathematician is C++ code written by a Mathematician
Disregarding the documentation quality (which seems fair in relation to its peers) I immediately analyze the basic “design” of the system. Obviously from its name you could make the assumption that this is a sparse matrix library of some sort. So, my first goal was to look for the “sparse matrix class.”
What I found was “sparse matrix classes” (note the plural). Upon closer examination of the classes I discovered there are different implementations for double, float, integer, and complex. Seems like a wonderful idea, one may need the precision of doubles or the complexity of complex. The classes are neatly divided into four separate header files and four separate implementation files. Upon closer inspection of the classes I made the anticipated discovery that three of the four classes could be replicated exactly from the remaining class simply by performing a text replacement of the type (“float” for “double” etc). For Pete’s sake, use templates.
It does not make sense to me to put private class members at the top of class declarations. The top is the first place a user of the class looks to examine a class. Under the paradigm of “private” it is implied that these members are inaccessible. So why am I forced to read something I can’t even use? It wastes time and brainpower.
I encountered a comment as I read through the private section of one of the classes:
int
dim0_; // perferred to using dim_[2].
some compilers
int
dim1_; // refuse to initalize these in
the constructor.
Ignoring all the misspellings, I immediately knew why “some compilers refuse to init[i]alize these in the constructor.” So I took a look at the constructor to make sure:
MV_ColMat_float::MV_ColMat_float() : v_(), dim0_(0), dim1_(0) , lda_(0),
ref_(0){}
Yep. Is it a sin to actually use the constructor for what it is supposed to be used for? Just because the compiler lets you slip some extraneous initializations into the constructor member initialization list does not mean that it is still good programming style. The only initializations that must be in the initialization list are those of constant data members and references. I prefer to only have constant data member and reference initializations in this list and move all other initializations into the body of the constructor.
One may argue that using the initialization list improves class performance. Well, according to Amdahl’s speedup law the constructor(s) would have to be called quite often to make this “speedup” useful. So, this would mean that a programmer would have to be constructing a bunch of these sparse matrix objects either through the new operator, parameter passing, or temporary objects; this is just bad programming. Constructing many objects places a strain on the memory management and should be avoided if possible. So, just put declarations in the body of the constructor and don’t construct a bunch of objects. Please see Dave’s Constructor Tip 1.
Dave’s Constructor Tip 1: If you have several constructors (or just one with the anticipation of adding more), create a method called “Construct()” that initializes the data members to a known value
The class I am currently examining has nine constructors. Disregarding that I can only find six of the nine constructor implementations in the implementation file, each constructor proceeds to initialize each of its data members, as it should. Some of the initializations are specific to the constructor but many simply initialize the data members to 0.
I find that having a default data member initialization method, say Construct(), reduces errors in object construction because it assures that all data members are initialized to something meaningful (i.e. 0 or NULL).
The idea is to call Construct() at the beginning of each constructor, then after that you can modify only the data members that need to be modified. More importantly, if you add another data member, you can effectively update all constructors by modifying the Construct() method. This simple method helps prevent many bugs.
One, again, may argue that this will degrade class performance. Sure, but Amdahl’s speedup law applies again, thus the degradation will not be noticeable.
I noticed another interesting thing in the private section in the class declaration:
int
ref_; // true if this is declared as
a reference vector,
// i.e. it does not own the memory
space, but
// rather it is a view to another
vector or array.
Why does this class object need to know whether it is a reference or not? I had to think about this one for a couple minutes, mostly while I was writing the paragraphs above. This would only be true if one object actually shared the memory of another object. Why would an object need to do this? This is a very bad idea, especially when it does not need to be done. Doesn’t the C++ specification define something called “references?” Even the use of a simple pointer could replace this additional complexity. I checked all the uses of ref_ and that is exactly what it is used for.
The C++ specification states that class methods that are entirely defined in the class declaration are by default inline. I find it quite redundant to add inline to the method declaration:
inline
unsigned int size() const {
return dim_;}
inline
unsigned int dim() const {
return dim_;}
inline
int ref() const { return ref_;}
inline
int null() const
{return dim_== 0;}
Without rigorous examination of whether or not they actually used “const” in all the places they could have or whether it is done correctly, it appears that they use it often enough and in the right places. They have constant reference parameters, constant methods, as well as constant reference return values from methods. I commend the use. Here are a few examples:
inline
float& operator()(unsigned
int, unsigned int);
inline
const float& operator()(unsigned
int, unsigned int) const;
MV_ColMat_float operator()(const MV_VecIndex &I,
const MV_VecIndex &J) ;
const
MV_ColMat_float operator()(const
MV_VecIndex &I, const MV_VecIndex &J) const;
Note the function overloading where the first method returns a simple reference and the second is a constant method and returns a constant reference – excellent programming style. My spirits are brightened. See Dave’s Library Tip 1.
Dave’s Library Tip 1: Do not use inline methods in a library (when the implementation is not in the class declaration)
In one sense, a “library” is defined as a binary file and an accompanying header file – assuming it is compiled this way. When using the compiled library, if a method is declared as inline and the code is not in the header file, then where does the compiler find the inline code? Well, it can’t because the code is secretly buried with the library binary.
I kept on seeing an elusive variable called nz_ (or nz). It would show up in constructors and as a parameter when setting the size of the sparse matrix. The comment next to it in the class declaration is: “number of nonzeros.” Simple enough explanation I suppose, but after some time of tracing through the code it appears that nz represents the number of non-zero elements in the matrix. Well, that’s what the comment said, right?
Anyway, what is the point of a sparse matrix class if you have to know the exact number of non-zero elements? So, the problem of finding a sparse matrix class has now been reduced to figuring out the number of non-zero elements in my sparse matrix. Good grief, anyone can write a sparse matrix class if they know how many non-zero elements there are.
In my adventure through SparseLib++ I found the portion of code that finds the matrix element when given arbitrary coordinates. They essentially store the non-zero elements in a set of arrays and maintain some other arrays that contain the indexing information. So, when a lookup is performed they iterate through these arrays until they find the correct element, if not they return 0. Highly efficient they creators probably think.
A better method would be to use a hash table for storing the non-zero elements. We could then abandon the need for knowing the number of non-zero elements and could perform lookups much faster. But hash tables require more storage space! Maybe, but this is a sparse matrix class, right? And who said I wasn’t good at algorithm analysis? Oh, right, it was Chip and the pre-lims.
dfwiley@ucdavis.edu http://wwwcsif.cs.ucdavis.edu/~wiley/homepage.html
Copyright © 2000 David Wiley. All rights reserved.