A Quick Review of C++

Basic Syntax

Human languages are incredibly nuanced and flexible. Their vocabulary and grammatical rules are in constant flux, and even well-established norms can be occasionally disregarded (out of laziness or for specific effect). In part, this is because human brains are extremely sophisticated when it comes to language comprehension. We are free to take liberties in speech without risk of being misunderstood, because we can be reasonably sure that the people we are communicating with will understand based on context: “Hey, are you guys going to that thing later?” (But which people, to what, and when?) And just as often, we are imprecise by design: various creative ambiguities lie at the heart of word play in poetry and prose; and humans can be evasive (“It depends on what the meaning of is is”).

A compiler is not nearly so clever, and vagueness of any kind is its enemy. Its sole purpose is to produce a faithful and unique translation of your program code into machine code that the computer can execute. Hence, computer languages are designed to be unambiguous (and C++ is, for the most part). In mathematical terms, we want there to be a 1-1 mapping between the language constructs and the underlying machine instructions. This is guaranteed by enforcing rigid adherence to very strict grammatical forms (syntax). To be a successful C++ programmer, you will have to memorize many seemingly arbitrary rules.

Try to find all of the errors in the program listed below.

#include "iostream"
using std:cout;
using std:endl;

Double unity(void) { return 1.0 }
Double 2ndPower(Double x) { return x*x }

Int main()
{
   Double x = unity;           /* assign x the value 1
   Double y = 2ndPower(3.0);   /* assign y the value 9
   cout << y - x << endl;      /* output their difference
   return 0;
}
  • The #include directive uses quotes only for files in your own search path. Headers from the C++ standard library are enclosed in angled brackets.

    #include <iostream>
    
  • The scope operator (::) is represented by a double-colon. Hence,

    using std::cout;
    using std::endl;
  • Keep in mind that C++ is case-sensitive, and all the C++ keywords are lowercase. Double and Int should read double and int.

  • An identifier, the name you give to an object or function, must begin with a letter or underscore. 2ndPower is not a valid name.

  • Every statement must be followed by a semicolon, even the last statement in a code block.

    double unity(void) { return 1.0; }
    double secondPower(double x) { return x*x; }
  • Functions that take no arguments (i.e., those declared with a void argument type) still require trailing parentheses when they are called.

    double x = unity();
  • Comments come in two forms. The stand-alone double-slash (//) denotes a comment stretching to the end of the current line. A comment can also be enclosed in matching slash-star pairs (/* ... */). The comments in the program above should look like one of the following:

    /* assign x the value 1 */
    // assign x the value 1

Control structures

The following code snippet is supposed to compute the sum of the first ten squares, \(1^2 + 2^2 + \cdots + 9^2 + 10^2 = 385\). It doesn’t work as advertised. Try to correct all of the problems.

int sum_squares = 0.0;
for (i = 1, i < 10, ++i);
   sum_squares =+ i*i;
  • The literal 0.0, by virtue of its having a decimal point, is of type double. When this number is assigned to sum_squares, the zero is quietly converted to an int. This is not a syntax error (and probably won’t even generate a compiler warning), but the assignment should more properly be written as

    int sum_squares = 0;
  • The three arguments to for should be separated by semicolons rather than commas.

  • The loop range, as written, does not include the value 10.

  • The for command should have no trailing semicolon. Otherwise, it will do nothing during each iteration of the loop.

  • The type of the variable i is never specified.

  • There is no =+ operator. The operator += is legitimate, however, with x += y standing in for x = x+y. The final statement should read

    sum_squares += i*i;
    

The final corrected version is

int sum_squares = 0;
for (int i = 1; i < 11; ++i)
   sum_squares += i*i;
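
As a quick sanity check, the corrected loop can be wrapped in a function and compared with the closed form \(n(n+1)(2n+1)/6\). This is only a sketch: the function names sum_squares_loop and sum_squares_formula are invented here and are not part of the original example.

```cpp
#include <cassert>

// Sum 1^2 + 2^2 + ... + n^2 using the corrected loop.
int sum_squares_loop(int n)
{
   assert(n >= 0);
   int sum_squares = 0;
   for (int i = 1; i <= n; ++i)
      sum_squares += i*i;
   return sum_squares;
}

// Closed form n(n+1)(2n+1)/6, used only as an independent check.
int sum_squares_formula(int n)
{
   assert(n >= 0);
   return n*(n+1)*(2*n+1)/6;
}
```

For n = 10 both functions should return 385. The loop condition i <= n is equivalent to the i < 11 used above when n is 10.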

Now consider a function sum_cubes that takes two integer arguments \(i\) and \(j\). It’s supposed to return \(i^3 + (i+1)^3 + \cdots + (j-1)^3 + j^3\). Again, try to find the errors:

int sum_cubes(const int i, const int j)
{
   for (int sum = 0; i < j+1; sum += ++i*i*i);
   return sum;
}
  • The variable i cannot be incremented if it is marked const. Since both i and j are passed by value (i.e., the arguments from the function call are copied into internal temporaries), there is no real need to declare either of them const.
  • There is a scope error when we try to return sum. Since this variable is defined in the argument list to for, it exists only for the duration of the loop. The simplest solution is to move the definition outside (so that sum has function scope).
  • Note that this function is not well-behaved if its two arguments aren’t properly ordered. In designing the algorithm, we’ve made the assumption that i is never greater than j. It’s best to make this assumption explicit. The command assert(logical_expression) enforces its argument when called: assert(true) does nothing and assert(false) triggers a run-time error.
  • There’s one other bug in the code: not a syntax error, but a conceptual error. It hinges on when the increment is performed. The author presumably expected sum += ++i*i*i to behave like i = i+1, sum += i*i*i, which would sum \((i+1)^3 + (i+2)^3 + \cdots + j^3 + (j+1)^3\). Switching to the postfix form, sum += i*i*i++, does not rescue it. In either form, i is modified and read within the same expression with no ordering imposed between the two, so the behaviour is undefined and the compiler is free to produce any result. The reliable fix is to keep the increment in its own expression: add i*i*i to sum in the loop body, and increment i in the loop header.

The corrected code is

#include <cassert>

int sum_cubes(int i, int j)
{
   assert(i <= j);
   int sum = 0;
   for (; i < j+1; ++i)
      sum += i*i*i;
   return sum;
}
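
The difference between the two increment forms is easiest to see when the incremented variable appears only once in the statement. The following is a minimal sketch; the function names prefix_result and postfix_result are made up for illustration.

```cpp
#include <cassert>

// Prefix form: the variable is incremented first and the new value is used.
int prefix_result(int start)
{
   int i = start;
   int seen = ++i;          // i becomes start+1, seen is start+1
   assert(i == start + 1);
   return seen;
}

// Postfix form: the old value is used and the variable is incremented afterwards.
int postfix_result(int start)
{
   int i = start;
   int seen = i++;          // seen is start, i becomes start+1
   assert(i == start + 1);
   return seen;
}
```

Each statement touches i exactly once, so the result does not depend on the order in which the compiler evaluates subexpressions.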

Let’s generalize this function somewhat. Suppose we want to compute the sum \(i^n + \cdots + j^n\) for each of n = 1,2,3,4. Since a function can return only one value directly, it’s best to rework this as a procedure that accepts four pass-by-reference arguments.

#include <cassert>

void sum_powers(int i, int j, int &sum1, int &sum2, int &sum3, int &sum4)
{
   sum1 = sum2 = sum3 = sum4 = 0;
   assert(i <= j);
   for (; i < j+1; ++i)
   {
     sum1 += i;
     sum2 += i*i;
     sum3 += i*i*i;
     sum4 += i*i*i*i;
   }
}

The code is valid and correct as written. It is not, however, as efficient as it could be. The statements in braces make use of 6 multiplication operations. The same work can be done in 3.

{
   const int i2 = i*i;
   sum1 += i;
   sum2 += i2;
   sum3 += i*i2;
   sum4 += i2*i2;
}
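
For reference, here is how the reduced loop body might sit inside the complete procedure. This is a sketch rather than the only possible arrangement.

```cpp
#include <cassert>

// Accumulate the sums of k, k^2, k^3 and k^4 for k = i..j,
// using three multiplications per iteration.
void sum_powers(int i, int j, int &sum1, int &sum2, int &sum3, int &sum4)
{
   assert(i <= j);
   sum1 = sum2 = sum3 = sum4 = 0;
   for (; i < j+1; ++i)
   {
      const int i2 = i*i;   // computed once, reused below
      sum1 += i;
      sum2 += i2;
      sum3 += i*i2;
      sum4 += i2*i2;
   }
}
```

Called as sum_powers(1, 3, s1, s2, s3, s4) with four int variables, this should leave 6, 14, 36, and 98 in the four sums.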

Finally, let’s look at a function designed to compute the Taylor series expansion \(\sin(x) = x - x^3/3! + x^5/5! - x^7/7! + \cdots\)

The terms appear at all odd orders with alternating sign. The programmer here noted that \(x^n/n!\) for \(n = 1,3,5,7,\ldots\) appears with positive sign when (n mod 4) is 1 and with negative sign when (n mod 4) is 3 and decided to implement the function this way:

double sin_taylor(double x, int maxorder)
{
   double sum = 0.0, term;
   for (int n = 1, sign; n <= maxorder; sum += sign*term, n += 2, term *= x*x/(n*(n-1)))
   {
     sign = 1;
     if (n > 1)
       if (n%4 == 3)
         sign = -1;
     else
       term = x;
   }
   return sum;
}

The layout of the code is very misleading. The programmer has chosen to align else with the first if.

if (n > 1)
   if (n%4 == 3)
      sign = -1;
else
   term = x;

But remember that white space (spaces, tabs, returns) is meaningless to the compiler. The C++ syntax rules specify that else is always associated with the most recent if statement, unless the placement of braces explicitly says otherwise.

if (n > 1)
   if (n%4 == 3)
      sign = -1;
   else
      term = x;

To match the logical flow intended by the programmer, one would have to include braces around the inner if.

double sin_taylor(double x, int maxorder)
{
   double sum = 0.0, term;
   for (int n = 1, sign; n <= maxorder; sum += sign*term, n += 2, term *= x*x/(n*(n-1)))
   {
     sign = 1;
     if (n > 1)
     {
       if (n%4 == 3)
         sign = -1;
     }
     else
       term = x;
   }
   return sum;
}

On the other hand, written this way, no braces are needed at all:

double sin_taylor(double x, int maxorder)
{
   double sum = 0.0, term;
   for (int n = 1, sign = 1; n <= maxorder; sum += sign*term, n += 2, term *= x*x/(n*(n-1)))
     if (n > 1)
       if (n%4 == 3)
         sign = -1;
       else
         sign = +1;
     else
       term = x;
   return sum;
}

Better still, the programmer might do away with the sign variable entirely, realizing that the alternating sign can be accounted for when term is updated.

double sin_taylor(double x, int maxorder)
{
   double sum = 0.0, term = x;
   for (int n = 1; n <= maxorder; sum += term, n += 2, term *= -x*x/(n*(n-1)));
   return sum;
}
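
One way to gain confidence in the final version is to compare it with the library sine. The sketch below uses the same recurrence, with the update moved into the loop body for readability. The helper taylor_error is a hypothetical name introduced only for this check.

```cpp
#include <cassert>
#include <cmath>

// Same recurrence as the compact version, written with an explicit loop body.
double sin_taylor(double x, int maxorder)
{
   assert(maxorder >= 1);
   double sum = 0.0, term = x;
   for (int n = 1; n <= maxorder; n += 2)
   {
      sum += term;
      term *= -x*x/((n+1)*(n+2));   // x^n/n! -> -x^(n+2)/(n+2)!
   }
   return sum;
}

// Absolute difference from std::sin (hypothetical helper for testing).
double taylor_error(double x, int maxorder)
{
   return std::fabs(sin_taylor(x, maxorder) - std::sin(x));
}
```

For small arguments such as x = 0.5 the error should shrink rapidly as maxorder grows. For large |x| many more terms are needed, and the series is a poor choice in practice.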

Scope rules

Global variables exist everywhere in the program. Local variables have a scope restricted to the current code block (i.e., they die at the corresponding closing brace). New declarations with a previously used name will temporarily obscure the pre-existing object.

int i = 0; // Global variable declared in the preamble

int my_func(int j) { return j*i; } // Here, i refers to the global i

int main()
{
   int i = 1; // orphan
   int j;
   {
      int i = 2;
      int k;
      // Can access only ::i (the global i) and i (the most recently declared i).
      // No way to refer to the orphaned i from here.
   } // k dies here
} // j dies here
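
The hiding rules can be checked directly by giving each i a different value. In this sketch the values 10, 20, and 30 and the function names are arbitrary choices for illustration.

```cpp
#include <cassert>

int i = 10;                      // file scope

int read_global() { return i; }  // no local i here, so this is the file-scope i

int scope_demo()
{
   int i = 20;                   // hides the file-scope i
   int seen = 0;
   {
      int i = 30;                // hides the function-scope i
      seen = ::i + i;            // 10 + 30; the value 20 is unreachable here
   }
   assert(seen == 40);
   return seen + i;              // the inner i is gone, so i is 20 again
}
```

The call scope_demo() should return 60, confirming that the function-scope i re-emerges once the inner block closes.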

There is a special scope for objects declared in the argument list of a function or for loop:

double add(double a, double b)
{
   return a+b;
} // a and b die here

for (int k = 1; true; k += k) if (k > 2000) break; // k valid here
k = 1; // k invalid here

for (double w = -5.0; w <= 5.0; w += 0.12)
{

} // w dies here

Quick review of arrays and pointers

int x; defines a single integer with the identifier x referring directly to the memory location allocated to store the integer. Hence, we can write x = 5; to set the value of the memory location to 5. On the other hand, int x[5]; creates an array of five integers, arranged contiguously in memory.

An optional initialization list can be included at the moment of declaration:

int x[5] = { 1, 3, 7, 13, 21};

Otherwise, we could do the assignment elementwise:

int x[5];
x[0] = 1;
x[1] = 3;
x[2] = 7;
x[3] = 13;
x[4] = 21;

If the array element type is flagged as const, however, then the initialization list is required, since subsequent assignment to the elements is forbidden:

const int p[7] = { 1, 3, 7, 13, 21, 31, 43 };

Individual elements are addressed using the square brackets notation and following the zero-indexing convention: x[0], x[1], ..., x[4]. In most expressions, the identifier x is converted to a pointer to the first element of the array, so that x yields the address of that element: x == &(x[0]). Hence,

*x == *(&(x[0])) == x[0]
*(x+1) == x[1]
*(x+4) == x[4]

Note that *(x+5) and x[5] are undefined since they exist past the bounds of the array.

Since x converts to a pointer to an integer, it can be used in int* pointer assignments and arithmetic:

int* y; // pointer to int (y currently points to nothing in particular)
y = x; // points to first element of array
y = x + 3; // pointer arithmetic: y now points to fourth element of the array
*y = -17; // changes x[3] from 13 to -17

Here is a function that takes the average value of the elements in an integer array:

double average(const int a[], int N) // must pass the array pointer and the array size
{
   assert(N > 0);
   double sum = a[0];
   for (int i = 1; i < N; ++i)
      sum += a[i];
   return sum/N;
}

double ave = average(x,5);
ave = average(p,7); // wouldn't work if we hadn't specified const int in
                    // the argument list of average
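
The same average can be written with pointer arithmetic instead of indexing. This is a sketch of one alternative: the name average_ptr is hypothetical, and the pointer walks from the first element to one past the last.

```cpp
#include <cassert>

// Average computed by advancing a pointer through the array
// rather than indexing with a[i].
double average_ptr(const int* first, int N)
{
   assert(N > 0);
   double sum = 0.0;
   for (const int* p = first; p != first + N; ++p)
      sum += *p;               // dereference the current element
   return sum/N;
}
```

With the arrays defined earlier, average_ptr(x, 5) should give 9 and average_ptr(p, 7) should give 17. Forming the address first + N is allowed, but dereferencing it is not.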

Passing arguments by value and by reference

A quick brain teaser: what three numbers does the following program output to the terminal?

#include <iostream>
using std::cout;
using std::endl;

int i = 1; // file scope

void transform(int &x, int s) { x = s*x + i; }

int main()
{
   int i = 2; // function scope
   transform(i,2);
   transform(::i,3);
   transform(i,::i);
   transform(::i,i);
   {
      int i = 3; // block scope
      transform(i,i);
      transform(::i,2);
      cout << i << endl;
   }
   cout << i << endl;
   cout << ::i << endl;
   return 0;
}

There are three different integer variables of the same name. When we refer to i in the program, the compiler understands that to refer to the most recent declaration. Hidden variables re-emerge once the variables that eclipsed them go out of scope. It’s convenient to make a table showing how the values stored in memory evolve as the program executes.

File scope           Function scope     Block scope
1                    2                  3
1                    5 = 2*2+1          3
4 = 1*3+1            5                  3
4                    24 = 5*4+4         3
100 = 4*24+4         24                 3
100                  24                 109 = 3*3 + 100
300 = 100*2 + 100    24                 109
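
The mechanism behind the teaser is that a reference parameter is an alias for the caller's variable, while a value parameter is a copy. A stripped-down sketch, with function names invented for illustration:

```cpp
#include <cassert>

// Receives a copy: changes stay inside the function.
void add_one_by_value(int x) { x += 1; }

// Receives an alias for the caller's variable: changes are visible outside.
void add_one_by_reference(int &x) { x += 1; }

// Applies both to the same variable and returns its final value.
int argument_passing_demo()
{
   int n = 5;
   add_one_by_value(n);
   assert(n == 5);          // only the copy was modified
   add_one_by_reference(n);
   assert(n == 6);          // n itself was modified
   return n;
}
```

In transform above, x plays the role of the alias and s the role of the copy.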

Global variables

Why do we bother with scope issues at all? We could simply introduce all the objects we need for the program in the preamble before main. Then all of those objects would have file scope and be available everywhere in the program: they would be so-called global variables.

Global variables were common in early (and more rudimentary) programming languages, but they are now considered bad programming style. The strategy in modern languages is to control as much as possible when and how data can be modified. This helps to reduce the possibility that data is accidentally changed or corrupted somewhere in the program. In other words, it makes your program much easier to debug.

One legitimate use for global variables is to specify quantities that many functions need access to. These might include physical and mathematical constants. In the following example, we define a global constant double-precision floating point number that holds the value of \(\pi\) and another that holds a length.

#include <cmath>
using std::sin;
using std::cos;

const double pi = 3.14159265358979323846;
const double L = 100.0;

double mode(int k, double x) { return sin(k*pi*x/L); }
double anti_mode(int k, double x) { return cos(k*pi*x/L); }

Properties of objects and functions

Let’s briefly summarize the various properties we’ve encountered so far.

Objects:
type, scope/duration, mutability must be specified; recognized by its identifier
Functions:
return type, various argument types, and argument-passing style must be specified; recognized by identifier + argument types (allows for function overloading)

The scope and duration are controlled by the location of the declaration. Mutability is determined by the presence of the const keyword. In its absence, objects are assumed to be mutable.

The types come in four categories: logical, lexical, integer, and floating point.

logical    lexical          integer               floating point
bool       char             short int             float
           signed char      unsigned short int    double
           unsigned char    int                   long double
                            unsigned int
                            long int
                            unsigned long int

Note that char, signed char, and unsigned char are treated as three distinct types. On the other hand, int, signed int, and unsigned int constitute two types since int and signed int are identical.
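
These type identities can be confirmed at compile time. The sketch below uses std::is_same from the standard header <type_traits>. If any claim were false, the program would fail to compile.

```cpp
#include <type_traits>

// int and signed int name the same type.
static_assert(std::is_same<int, signed int>::value,
              "int and signed int are one type");

// char is distinct from both signed char and unsigned char,
// even though it has the same range as one of them on any given platform.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Relative sizes: each integer type is at least as large as the one before.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(short) <= sizeof(int), "short is no larger than int");
static_assert(sizeof(int) <= sizeof(long), "int is no larger than long");
```

Because the checks run during compilation, there is nothing to execute. A clean build is the result.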

Declaration versus definition

Objects and functions in C++ can be used before they are defined. Technically, the compiler just needs to know their type structure. This information is provided by a declaration.

For objects, a statement that names a type and an identifier is already a definition, with or without an initialization value, because it reserves storage. A pure declaration, which only announces the name and type, is written with the extern keyword and no initializer:

extern [const] [signed/unsigned] [short/long] int identifier; // declaration
[const] [signed/unsigned] [short/long] int identifier = initialization_value; // definition

Square brackets here indicate that the keywords are optional.

For functions, the declaration comes in the form of a skeleton called a prototype. Note the trailing semicolon!

return_type identifier( [const] arg1_type[&], [const] arg2_type[&], ... );

The function definition, on the other hand, has a code block that follows:

return_type identifier( [const] arg1_type[&] arg1, [const] arg2_type[&] arg2, ... ) { ... }
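
A short sketch of how declarations let a name be used before its definition appears. The names cube, call_count, sum_of_cubes, and check_declarations are invented for this example.

```cpp
#include <cassert>

double cube(double x);          // function declaration (prototype): note the semicolon
extern int call_count;          // object declaration: no storage is reserved here

// cube and call_count may be used from this point on, since both are declared.
double sum_of_cubes(double a, double b)
{
   return cube(a) + cube(b);
}

int call_count = 0;             // object definition

double cube(double x)           // function definition
{
   ++call_count;
   return x*x*x;
}

// Usage: 1^3 + 2^3 = 9, computed with two calls to cube.
void check_declarations()
{
   call_count = 0;
   assert(sum_of_cubes(1.0, 2.0) == 9.0);
   assert(call_count == 2);
}
```

The compiler accepts sum_of_cubes using only the declarations. The definitions are matched up later, when the program is linked.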

Reminders

It will be worth your while to have a good understanding of the following:

  1. The integer and floating-point types and their relative sizes
  2. Rules for the const modifier
  3. Binary representations and two’s complement
  4. Casting between integer, floating-point, and boolean types
  5. Proper use and positioning of semicolons
  6. Names and behaviour of common operators
  7. Order of arithmetic operations (precedence rules)
  8. Branching and looping constructs. Difference between while and do while
  9. Scope rules
  10. Function syntax (argument list and return type)
  11. Pass-by-reference and pass-by-value semantics
  12. Pointer and array declaration syntax using * and []
  13. Pointer comparisons
  14. Pointers and memory
  15. C arrays as mathematical vectors and matrices
  16. Passing command line arguments using argc and argv

You might also want to keep in mind certain bugs that appear again and again:

  1. Changes to an object don’t propagate outside of a function if the object was passed by value
  2. Operations on integers produce integers (e.g., 11/3 == 3)
  3. The compiler applies implicit conversions: ((0.0+1)+2)*5/2 becomes (1.0+2)*5/2 becomes 3.0*5/2 becomes 15.0/2 becomes 7.5
  4. The increment and decrement operators (++ and --) have high precedence and behave differently depending on their position (e.g., these have four distinct meanings: ++*p, *++p, *p++, (*p)++)
  5. for loops have their own scope
  6. Valid identifiers must begin with a letter or underscore character; identifiers are case-sensitive
  7. Functions must have a return value or be declared void; (overloaded) functions are distinguished by their identifier plus the types in their argument list
  8. Arrays are zero-indexed; C strings have a hidden sentinel character (\0)
  9. Changes can’t be made to objects flagged as const
  10. Unsigned integer types wrap around to their highest value when they drop below zero
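
A few of these pitfalls can be reproduced in a handful of lines. The following is a sketch, with function names chosen only for illustration.

```cpp
#include <cassert>
#include <limits>

// Integer operands give an integer result: the fraction is discarded.
int integer_quotient() { return 11/3; }

// One double operand at the start converts the rest of the chain to double.
double mixed_expression() { return ((0.0+1)+2)*5/2; }

// With integers only, 15/2 is evaluated as 7 before the conversion to double.
double all_integer_expression() { return ((0+1)+2)*5/2; }

// Unsigned arithmetic wraps: stepping below zero lands on the largest value.
unsigned int step_below_zero()
{
   unsigned int u = 0;
   --u;
   assert(u == std::numeric_limits<unsigned int>::max());
   return u;
}
```

The first three functions should return 3, 7.5, and 7.0 respectively. The last returns the largest unsigned int, whose exact value depends on the platform.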