The C++ Programming Language

Overview

In contrast with some older languages, such as FORTRAN 77 or COBOL, C++ is freeform in the sense that whitespace (spaces, tabs, newlines) has no meaning beyond separating adjacent names and keywords. The code can be laid out as you wish, provided that each individual statement is terminated by a semicolon (;) and blocks of statements are delimited by braces ({ ... }). (Preprocessor directives such as #include are an exception: each must occupy its own line.) Explanatory remarks, called comments, can be inserted by signalling to the compiler that certain parts of the program file are to be ignored. Single-line comments begin with a double slash (//) and extend to the end of the current line. Blocks of comments are enclosed by matching slash-star pairs (/* ... */). The following two code listings are interpreted identically by the compiler.

int main()
{
   return 0; // zero is the standard return value
             // when there are no errors to report
}
// C++ code is freeform and all whitespace is treated equally

 int   main(  ){
   ; ;
  ;      ;
      return   0  ;}

/* Related blocks of code are enclosed in matching pairs
   of braces. Each individual command is followed by a
   semicolon. In the function above, there are four "do
   nothing" commands and one return statement. */

These two examples are variations on the null program, int main(){return 0;}, which is the simplest possible. (It begins, does nothing, and ends.) Every valid C++ program must contain exactly one function called main that returns an integer value to the operating system. Program flow begins with the call to main and ends when main returns, either by executing a return statement or by reaching its closing brace.

Roughly speaking, everything in the C++ language is either an object or a function. An object occupies memory (to store its data) and has a definite type associated with it. A function acts on zero or more objects and returns at most one object. (C++ functions encompass both the mathematical notion of a function and what is more commonly called a procedure.) The function syntax is illustrated below.

int mult(int x, int y) { return x*y; }

#include <cmath> // read in the C standard math library
using std::pow;  // make the function pow available
double geo_ave(double a, double b, double c)
{
   double prod = a*b*c;
   return pow(prod,1.0/3.0);
}

void rescale(float &x, int n) { x *= n; }

The function mult takes two ints (i.e., integers), named x and y, and returns their product. The function geo_ave takes three doubles (i.e., double precision floating point numbers) and returns their geometric mean. Note that geo_ave itself calls another function, pow, which is used to raise a*b*c to the one-third power. (In some languages, exponentiation is provided via the ** or ^ operators; C and C++ have no exponentiation operator, and in these languages ^ means bitwise exclusive OR.) The function rescale takes a float (i.e., a single precision floating point number) and an int and multiplies the former by the latter. The return type void indicates that the function returns no value.

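The ampersand (&) before x tells the compiler that this argument is passed by reference rather than by value: inside rescale, x is an alias for the caller's variable, so any change made to x is visible outside the function. Without the &, rescale would modify a private copy, and the caller's variable would be left unchanged.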

int, double, and float are examples of fundamental (built-in) types, often called plain old datatypes (PODs). As we will see later, composite objects can be created using C arrays (to group many objects of the same type) or using the struct and class keywords (to group objects of arbitrary type). All objects are either constant or mutable and must be declared before they are used. Constants (constant objects) must be initialized at the moment of declaration, and their value can never change. Variables (mutable objects) can be given a value at the moment of declaration or later, and their value can be changed at any time.

Example

$ cat > add.cpp
#include <iostream>
using std::cout;  // stream object directed to stdout
using std::endl;  // end-of-line marker

int add(int a, int b) { return a+b; }
int add(int a, int b, int c) { return a+b+c; }
void add2(int a, int b, int &c) { c = a+b; }

int main()
{
   cout << "1+2+3 = " << add(1,2,3) << endl;
   const int x = 4; // x declared as an integer constant
                    // and assigned the value 4
   int y,z;  // y and z declared as integer variables
   y = 5;    // y assigned the value 5; z is still uninitialized.
   cout << "4+5 = " << add(x,y) << endl;
   add2(x,y,z);  // z is assigned the sum x+y
   cout << "4+5 = " << z << endl;
   return 0;
}
[ctrl-d]
$ g++ -o add add.cpp
$ ./add
1+2+3 = 6
4+5 = 9
4+5 = 9

In the example above, the iostream library is used to handle output to the terminal. Note that the function add is overloaded, i.e., it has multiple definitions with different sets of arguments. This is legal in C++ provided that the compiler can unambiguously determine which version of the function is being called. The function add2 is given a different name for two reasons: overloaded functions cannot be distinguished by return type alone, and a third overload add(int, int, int&) would make a call such as add(x,y,z) ambiguous, since an int variable matches an int parameter and an int& parameter equally well.

Objects exist within a certain scope: the code block in which they are declared and all blocks nested inside it, unless hidden (shadowed) by another object declared with the same name in an inner block. In some cases, hidden names can still be reached using ::, the scope resolution operator. In the example below, ::i refers to the global i and foo::i to the static member i of the class foo.

Example

$ cat > scope.cpp
#include <iostream>
using std::cout;
using std::endl;

int i = 1; // global variable
class foo
{
  public:
   static int i; // static class variable
};
int foo::i = 2;
int main()
{
   int i = 3;
   int j,k,l,m;
   {
      int i = 4;
      j = i;
      l = foo::i;
      m = ::i;
   }
   k = i;
   cout << j << k << l << m << endl;
}
[ctrl-d]
$ g++ -o scope scope.cpp
$ ./scope
4321

Identifiers and keywords

The names that are given to objects and functions are known as identifiers. Identifiers can be made up of any combination of letters, digits, and underscore characters (_), except that they must begin with a letter or underscore and they cannot be identical to any of the keywords that make up the C++ language. Note that C++ is case-sensitive; hence, foo, Foo, and FOO are three distinct names.

Keywords reserved by the C++ language
Keyword           Description
asm               insert an assembly instruction
auto              deduce a variable's type from its initializer (before C++11: declare a local variable)
bool              declare a boolean variable
break             break out of a loop or switch
case              a block of code in a switch statement
catch             handle exceptions raised by throw
char              declare a character variable
class             declare a class
const             declare immutable data or functions that do not change data
const_cast        add or remove const (or volatile) qualification
continue          skip to the next iteration of a loop
default           default handler in a switch statement
delete            release memory obtained with new
do                looping construct
double            declare a double precision floating-point variable
dynamic_cast      perform runtime-checked casts within a class hierarchy
else              alternate case for an if statement
enum              create enumeration types
explicit          prevent a constructor from being used for implicit conversions
export            separate template definitions from their declarations (removed in C++11)
extern            tell the compiler about variables defined elsewhere
false             the boolean value false
float             declare a floating-point variable
for               looping construct
friend            grant a non-member function or class access to private data
goto              jump to a different part of the program
if                execute code based on the result of a test
inline            suggest inline expansion of calls to short functions
int               declare an integer variable
long              declare a long integer variable
mutable           allow a data member to be modified even in a const object
namespace         partition the global namespace by defining a scope
new               allocate dynamic memory for a new object
operator          create overloaded operator functions
private           declare private members of a class
protected         declare protected members of a class
public            declare public members of a class
register          request that a variable be kept in a register (deprecated; removed in C++17)
reinterpret_cast  low-level cast between unrelated pointer types
return            return from a function
short             declare a short integer variable
signed            declare a signed integer type
sizeof            return the size of a variable or type
static            create permanent storage for a variable
static_cast       perform a nonpolymorphic cast
struct            define a new structure
switch            execute code based on different possible values for a variable
template          create generic functions and classes
this              a pointer to the current object
throw             throw an exception
true              the boolean value true
try               execute code that can throw an exception
typedef           create a new type name from an existing type
typeid            obtain runtime type information about an object or type
typename          declare a template type parameter or a dependent type name
union             a structure whose members share the same memory location
unsigned          declare an unsigned integer variable
using             import complete or partial namespaces into the current scope
virtual           create a function that can be overridden by a derived class
void              declare functions or data with no associated data type
volatile          warn the compiler about variables that can be modified unexpectedly
wchar_t           declare a wide-character variable
while             looping construct

Types and literals

C++ offers a variety of integer types based on int and modified by the keywords short, long, signed, and unsigned. It also offers three floating point (FP) types: float, double, and long double. On almost all modern computer architectures, the first two types correspond to the single precision (binary32) and double precision (binary64) formats specified in the IEEE Standard for Floating-Point Arithmetic (IEEE 754). long double refers to a floating point type that may, and often does, have greater than double precision. On the x86 architecture, GCC and Clang implement long double as the 80-bit extended precision type supported by that hardware, whereas Microsoft's compiler makes it identical to double. On some other architectures, long double is a 128-bit quadruple precision type.

Plain old datatypes provided by C++
Type                 Description              Example
bool                 boolean value            bool t = true; bool f = false;
char                 single character         char c = 'a';
signed char          single character         signed char c = 'a';
unsigned char        single character         unsigned char c = 'a';
wchar_t              single wide character    wchar_t wc = L'a';
short int            short integer            short i = 5;
int                  integer                  int i = 5;
long int             long integer             long i = 5L;
unsigned short int   unsigned short integer   unsigned short i = 5U;
unsigned int         unsigned integer         unsigned i = 5U;
unsigned long int    unsigned long integer    unsigned long i = 5UL;
float                single precision FP      float x = 10.2F;
double               double precision FP      double x = 10.2;
long double          high precision FP        long double x = 10.2L;

Numerical constants such as 5 and 10.2 are called literals. All literals have a type. An unsuffixed integer literal such as 5 has type int (provided its value fits in an int), rather than short or long. An unsigned or long version of an integer constant can be created by appending the suffix U or L or both. These suffixes, like the F suffix, the exponent marker E, and the x of hexadecimal literals, can equally be written in lower case, although a lowercase l is easily mistaken for the digit 1. Thus, if the function foo is overloaded as follows:

void foo(int);
void foo(unsigned);
void foo(unsigned long);

then foo(23), foo(23u), and foo(23ul) call three different functions.

Example

$ cat > overload.cpp
#include <iostream>
using std::cout;
using std::endl;

void report(int i)
{
   cout << "The integer " << i << " is signed" << endl;
}

void report(unsigned int i)
{
   cout << "The integer " << i << " is unsigned" << endl;
}

int main()
{
   report(5);
   report(-5);
   report(5u);
   return 0;
}
[ctrl-d]
$ g++ -o overload overload.cpp
$ ./overload
The integer 5 is signed
The integer -5 is signed
The integer 5 is unsigned

Similarly, an unsuffixed floating point literal such as 10.2 has type double. A single precision version can be specified by appending F, and a long double version by appending L. Floating point literals can also be specified in a variant of scientific notation, where \(m\)E\(e\) stands in for \(m \times 10^e\). The mantissa (\(m\)) may be any decimal number, but the exponent (\(e\)) must be an integer, optionally preceded by a sign.

C++ literals
Type                Encoding/Base    Example
const char[]        ASCII            "hello"
char                ASCII            'a'
wchar_t             wide character   L'a'
int                 octal            01
                    decimal          1
                    hexadecimal      0x1
                    multicharacter   'ABC' (value is implementation-defined)
unsigned int        octal            01U
                    decimal          1U
                    hexadecimal      0x1U
long int            octal            01L
                    decimal          1L
                    hexadecimal      0x1L
unsigned long int   octal            01UL
                    decimal          1UL
                    hexadecimal      0x1UL
float               decimal          12.3F
                    scientific       1.23E1F, 123E-1F
double              decimal          12.3
                    scientific       1.23E1, 123E-1
long double         decimal          12.3L
                    scientific       1.23E1L, 123E-1L

Character and string literals are enclosed in quotation marks: single quotes for characters (char) and double quotes for C strings (arrays of char). Since ' and " are used as delimiters, the escape sequences \' and \" are used to produce quotes within a literal. For example, char a = '\'' assigns a single quote to the char variable a. There are many other two-character escape sequences beginning with a backslash.

Escape sequences
Code   Character   Description
\\     \           backslash
\'     '           single quote
\"     "           double quote
\?     ?           question mark
\0     <NUL>       null character (value 0)
\a     <BEL>       bell (audible alert)
\b     <BS>        backspace
\f     <FF>        form feed
\n     <NL>        new line
\r     <CR>        carriage return
\t     <HT>        horizontal tab
\v     <VT>        vertical tab

We have not yet said anything about the internal representation of the various types. In C++ this is left up to the compiler (with some restrictions on the relative sizes) and varies from platform to platform. The sizeof operator can be used to query the compiler as to the number of bytes that are needed to store each of the PODs.

Example

$ cat > sizes.cpp
#include <iostream>
using std::cout;
using std::endl;

int main()
{
   const char message[] = "Hello";
   cout << "char = " << sizeof(char) << endl;
   cout << "\"Hello\" = " << sizeof(message) << endl;
   cout << "unsigned short = " << sizeof(unsigned short) << endl;
   cout << "short = " << sizeof(short) << endl;
   cout << "unsigned int = " << sizeof(unsigned int) << endl;
   cout << "int = " << sizeof(int) << endl;
   cout << "unsigned long = " << sizeof(unsigned long) << endl;
   cout << "long = " << sizeof(long) << endl;
   cout << "float = " << sizeof(float) << endl;
   cout << "double = " << sizeof(double) << endl;

   return 0;
}
[ctrl-d]
$ g++ -o sizes sizes.cpp
$ ./sizes
char = 1
"Hello" = 6
unsigned short = 2
short = 2
unsigned int = 4
int = 4
unsigned long = 4
long = 4
float = 4
double = 8

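There are a few things to take note of in the example above. The square brackets [] indicate that message is an array of chars. The size of "Hello" is 6 rather than 5 since the string is stored in memory as a sequence of characters terminated by the null character, a byte with value 0. Note also that the sizes are platform dependent: the output above comes from a system where long is 4 bytes, whereas on 64-bit Linux and macOS long is typically 8 bytes.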

[Figure: the string "Hello" occupies six contiguous bytes in memory. The decimal and binary digits shown correspond to the most widely used character encoding (ASCII).]

Operations

Various operations can be performed on PODs. A unary operation acts on a single object. Binary and ternary operations act on pairs and triplets of objects. The available operators are listed below in decreasing order of precedence; operators in the same group share a precedence level.

Operators
Precedence  Operator  Description                           Example
1           ::        scope resolution operator             Class::age = 2;
2           ()        grouping operator                     (a+b)/4;
            []        array access                          array[4] = 2;
            ->        member access from a pointer          ptr->age = 34;
            .         member access from an object          obj.age = 34;
            ++        post-increment                        for (i = 0; i < 10; i++) ...
            --        post-decrement                        for (i = 10; i > 0; i--) ...
3           !         (unary) logical negation              if (!done) ...
            ~         (unary) bitwise complement            flags = ~flags;
            ++        pre-increment                         for (i = 0; i < 10; ++i) ...
            --        pre-decrement                         for (i = 10; i > 0; --i) ...
            +         (unary) plus                          int i = +1;
            -         (unary) minus                         int i = -1;
            *         pointer dereference                   data = *ptr;
            &         address of                            address = &obj;
            (type)    cast to a given type                  int i = (int) floatNum;
            sizeof    return size in bytes                  int size = sizeof(float);
4           ->*       member pointer selector               ptr->*var = 24;
            .*        member object selector                obj.*var = 24;
5           *         multiplication                        int i = 2*4;
            /         division                              float f = 10.0/3;
            %         modulus                               int rem = 4%3;
6           +         addition                              int i = 2+3;
            -         subtraction                           int i = 5-1;
7           <<        bitwise shift left                    int flags = 33 << 1;
            >>        bitwise shift right                   int flags = 33 >> 1;
8           <         comparison less-than                  if (i < 42) ...
            <=        comparison less-than-or-equal-to      if (i <= 42) ...
            >         comparison greater-than               if (i > 42) ...
            >=        comparison greater-than-or-equal-to   if (i >= 42) ...
9           ==        comparison equal-to                   if (i == 42) ...
            !=        comparison not-equal-to               if (i != 42) ...
10          &         bitwise AND                           flags = flags & 42;
11          ^         bitwise exclusive OR                  flags = flags ^ 42;
12          |         bitwise inclusive OR                  flags = flags | 42;
13          &&, and   logical AND                           if (a && b) ...
14          ||, or    logical OR                            if (a or b) ...
15          ? :       (ternary) conditional if-then-else    int i = (a > b) ? a : b;
16          =         assignment operator                   a = b;
            +=        add and assign                        a += 3;
            -=        subtract and assign                   a -= 4;
            *=        multiply and assign                   a *= 5;
            /=        divide and assign                     a /= 2;
            %=        modulo and assign                     a %= 3;
            &=        bitwise AND and assign                flags1 &= flags2;
            ^=        bitwise exclusive OR and assign       flags1 ^= flags2;
            |=        bitwise inclusive OR and assign       flags1 |= flags2;
            <<=       bitwise shift left and assign         flags <<= 2;
            >>=       bitwise shift right and assign        flags >>= 2;
17          ,         sequential evaluation operator        for (i=0, j=0; i < 10; ++i, ++j) ...

The precedence of an operation determines the order in which it is performed. For example, a+b*-c is interpreted as a+(b*(-c)), since unary minus has higher precedence than multiplication, which in turn has higher precedence than addition. Similarly, a/b-+c is read as (a/b)-(+c), and a%b*c as (a%b)*c, since % and * have the same precedence and group from left to right. Parentheses override the order of operations.

Example

$ cat > ops.cpp
#include <iostream>
using std::cout;
using std::endl;

const double hbar = 6.57E-16; // eV.s
const double omega0 = 5.1E14; // 1/s

double energy(int n)
{
   return hbar*omega0*(n+0.5);
}

void write(int n)
{
   cout << "level " << n << ": " << energy(n) << " eV" << endl;
}

int main()
{
   cout << "Harmonic oscillator energies: \n"
           "----------------------------- \n"
           "level 0: " << energy(0) << " eV" << endl
        << "level 1: " << energy(1) << " eV" << endl
        << "level 2: " << energy(2) << " eV" << endl;

   int n = 2;
   write(++n);
   write(++n);
   write(n+1);
   write(n+2);
   write(2*n-1);

   write(1 << 3);
   write(75 % 11);
   n *= 3;
   n -= 2;
   write(n);


   double hbar = 0;  // this local variable hides the global hbar
   cout << endl << "hbar = " << ::hbar << " eV.s" << endl;  // :: selects the global

   return 0;
}
[ctrl-d]
$ g++ -o ops ops.cpp
$ ./ops
Harmonic oscillator energies:
-----------------------------
level 0: 0.167535 eV
level 1: 0.502605 eV
level 2: 0.837675 eV
level 3: 1.17275 eV
level 4: 1.50782 eV
level 5: 1.84289 eV
level 6: 2.17796 eV
level 7: 2.51303 eV
level 8: 2.8481 eV
level 9: 3.18317 eV
level 10: 3.51824 eV

hbar = 6.57e-16 eV.s

For some operations, the order in which actions are performed is not always well defined. In particular, one should beware of the increment (++) and decrement (--) operators, which rely on side effects (the term of art for operations that modify the state of an object, such as changing the value of an operand). For example, the unary operations +x and -x have no side effects; they return the value of x and its negative, but leave the variable itself unchanged. On the other hand, ++x increments x by one and then returns the new value, while x++ returns the current value of x and then increments the variable by one. Exactly when the increment happens relative to the rest of the same expression is sometimes left unspecified.

int x = 1;
int y = ++x; // y == 2, x == 2
int z = x++; // z == 2, x == 3
x = x / ++x; // undefined behaviour

The last line above has undefined behaviour: the read of x on the left of / and the increment in ++x are unsequenced, so the language does not say which happens first, and different compilers (or even different optimization levels) may give different results. When the order of evaluation is in doubt, it is best to break up operations into several steps.

There are a few subtleties to the logical operations. First, it is very important to distinguish the equality comparison operator (==) from the assignment operator (=). The statement x = 0 assigns x the value zero, whereas x == 0 checks whether x has the value zero and returns true or false. Second, comparison operations should not be chained together, since x < y < z (incorrect) and x < y and y < z (correct) are interpreted quite differently by the compiler. Finally, C++ employs short-circuit (sometimes called lazy) evaluation for the logical operators. That is, it takes advantage of the fact that the outcome of some binary logical comparisons can be predicted from one operand alone. Consider the truth tables for and and or.

Logical OR
or      true    false
true    true    true
false   true    false

Logical AND
and     true    false
true    true    false
false   false   false

It is clear that true or x is always true and that false and y is always false, regardless of the boolean values x and y. Since x and y do not need to be evaluated in these cases, they are not. This matters primarily when x or y is the result of a function call. If we attempt to evaluate the expression (one() or two()) and one() evaluates to true, then the function two() is not called. Similarly, if we attempt to evaluate the expression (one() and two()) and one() evaluates to false, then again two() is not called. This may be important if two() has side effects. The left operand is always evaluated first, so short-circuiting proceeds from left to right.

The time required for the computer to execute each operation is not constant and depends on the particular machine architecture. Of the arithmetic operations, addition and subtraction are generally a little faster than multiplication, and division is usually the slowest. Nonetheless, the costs are close enough that it is meaningful to think of performance in terms of how many operations are necessary to complete a particular calculation.

For example, what is an efficient algorithm for evaluating polynomials of the form \(P(x) = a_nx^n + \cdots + a_1x + a_0\) ? Naively, we might write something like

double polynom(double x, double a1, double a0)
{
   return a1*x + a0;
}

double polynom(double x, double a2, double a1, double a0)
{
   return a2*x*x + a1*x + a0;
}

double polynom(double x, double a3, double a2, double a1, double a0)
{
   return a3*x*x*x + a2*x*x + a1*x + a0;
}

in which case the total number of operations scales as

\[\bigl(n + 1\bigr) + \bigl((n-1) + 1\bigr) + \cdots + \bigl(2 + 1\bigr) + \bigl(1 + 1\bigr) = \frac{n(n+1)}{2} + n = \frac{n(n+3)}{2},\]

since the term \(a_kx^k\) costs \(k\) multiplications plus one addition. That is, \(n(n+1)/2\) multiplications and \(n\) additions give \(n(n+3)/2\) operations in total. On the other hand, if we regroup the terms in the polynomial as follows

\[P(x) = \Bigl(\cdots\bigl((a_nx + a_{n-1})x + a_{n-2}\bigr)x + \cdots + a_1\Bigr)x + a_0,\]

then evaluation requires only \(2n\) operations: \(n\) multiplications and \(n\) additions.

double polynom(double x, double a1, double a0)
{
   return a1*x + a0;
}

double polynom(double x, double a2, double a1, double a0)
{
   return (a2*x + a1)*x + a0;
}

double polynom(double x, double a3, double a2, double a1, double a0)
{
   return ((a3*x + a2)*x + a1)*x + a0;
}

In so-called “big-O” notation, we say that the first scheme is \(O(n^2)\) whereas the second, called Horner’s scheme, is \(O(n)\). For polynomials of very high order, this difference becomes significant.

Exercise

Write a function that evaluates \(5x^7 -8 x^6 + x^4 - x\) using the fewest operations.

Type conversion

Not all the C++ operators can act on all the PODs. For instance, the modulus operation only makes sense for integer values. Hence, 4%3 is valid code, whereas 5.75%2.25 triggers a compiler error. (The standard library function std::fmod, declared in <cmath>, computes the floating-point remainder instead.) Another important restriction is that the binary arithmetic operators act on two operands of identical type. If they are made to act between different types, and where it is sensible to do so, the compiler will quietly convert the operand of the less expressive type to the more expressive type of the two; operands of type char and short are first promoted to int. This behind-the-scenes type conversion is called implicit conversion (or, loosely, implicit casting).

double x = 2.0;
float y = 3.0;
x+y;

The addition operation above is actually carried out as x+(double)y or x+double(y). That is, the float is first cast to a double before the addition is carried out. The result of the addition is thus a double.

In some cases, you will want to explicitly cast one type to another. A good example is the division operation, which has subtly different behaviour depending on the types involved.

5/2;     // == 2
5.0/2.0; // == 2.5
double(5)/2;              // == double(5)/double(2) == 2.5
5/( (double)2 );          // == double(5)/double(2) == 2.5
static_cast<double>(5)/2; // == double(5)/double(2) == 2.5

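C++ inherits the cast notation (type)object from C and adds the equivalent functional notation type(object). It also has its own specialized casting operators static_cast, const_cast, dynamic_cast, and reinterpret_cast, the last two of which are rarely needed in numerical code. When converting between arithmetic types, static_cast has the same effect as a C-style cast, but it is easier to spot in the source, and the compiler refuses to use it for riskier conversions such as casting away const.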

It is important to note that floating point types are cast to integer type by truncating the fractional part: e.g., (int)3.14 == 3, int(3.999) == 3, and int(-3.999) == -3. In other words, the conversion rounds toward zero: positive numbers are rounded down and negative numbers are rounded up. To achieve conventional rounding, you might define a function like this.

inline int round_to_int(double x)
{
   // not called round, to avoid a clash with round() from <cmath>;
   // C++11 also offers std::lround, which likewise rounds halfway cases away from zero
   return x >= 0 ? (int)(x+0.5) : (int)(x-0.5);
}

Exercise

Write a function that rounds to the nearest even integer.

In order to maintain backward compatibility with C (which originally had no built-in boolean type), the values true and false are converted to the integers 1 and 0, respectively, whenever they are used in an arithmetic context. This is the reason why the chained logical comparison we encountered in the last section does not actually produce an error. (It is valid code. It just doesn’t behave as expected!) The programmer’s intent is clearly to check that the three numbers have increasing value. In practice that is not how things work out. The less-than (<) operator is binary and groups from left to right, so the statement x < y < z is read as (x < y) < z, i.e., as a nested pair of comparisons. The bracketed term evaluates to either true or false, and an implicit conversion then turns the second comparison into either 0 < z or 1 < z.

Control structures

Branching

You will often want the computer to conditionally execute code based on the current value of some variable. The way to do this is with the if keyword or with the ?: operator. For example, to take the absolute value of a number, you want to check if it is negative and if so negate it.

double x = -5.0;
double abs_x;
if (x > 0.0)
   abs_x = x;
else
   abs_x = -x;

Only one of these code branches is executed, after which abs_x holds the absolute value of the variable x. The syntax for if is as follows: if (logical expression) action1 else action2, where the actions are either a single statement or a code block enclosed in braces. The else branch is optional. We could have written this instead:

double x = -5.0;
double abs_x = x;
if (abs_x < 0.0) abs_x = -abs_x;

An alternative formulation, based on the ?: operator, has the advantage that abs_x can be made const, since it is defined at the same time it is declared.

double x = -5.0;
const double abs_x = ( x > 0.0 ? x : -x );

Testing for several logical conditions can be carried out using if and else chained in series.

if (x == 1) { /* code for x == 1 */ }
else if (x == 2) { /* code for x == 2 */ }
else if (x == 3) { /* code for x == 3 */ }
else { /* code for x != 1 and x != 2 and x != 3 */ }

The same thing can also be done with a switch.

switch (x)
{
   case 1:
   // code for x == 1
   break;
   case 2:
   // code for x == 2
   break;
   case 3:
   // code for x == 3
   break;
   default:
   // code for x != 1 and x != 2 and x != 3
   break;
}

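This construction is most useful when there are a large number of discrete values to check for. Note that the controlling expression of a switch must have an integer (or enumeration) type, and each case label must be a constant. It is important not to forget the break statements. Otherwise, control falls through to the next case.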

Conditional tests can be nested.

bool even;
if (x%2 != 0)  // odd; note that x%2 is -1 when x is negative and odd
{
   even = false;
   if (x == 1) { /* code for x == 1 */ }
   else { /* code for x odd and x != 1*/ }
}
else
{
   even = true;
   if (x == 2) { /* code for x == 2 */ }
   else { /* code for x even and x != 2 */ }
}

Logical expressions can be combined using the logical operations and (&&), or (||), and not (!). (Note that !(x==1) is equivalent to x!=1.) Subexpressions can be grouped using parentheses to override the natural precedence.

bool updated = true;
int x = 5;
if ( !( x > 1 and updated ) or x == 0 ) { /* does not execute */ }
if ( x < 2 or x > 4 and updated ) { /* does execute: and binds more tightly than or */ }
if ( x > 1 and x < 5 and !updated ) { /* does not execute */ }

Remember that x and y is true only if both x and y are true. Because C++ evaluates and from left to right and skips the right operand as soon as the left one is false,

if ( a == 1 and expensive_function() )

may be much more efficient than

if ( expensive_function() and a == 1 )

especially if this test is performed multiple times and a often has a value other than 1.

Keep in mind that indentation is just a formatting convention for the benefit of the programmer. Whitespace, although it has no meaning for the compiler, can occasionally mislead a human reader. At the end of the following code snippet, j == 2, not -1 as the indentation suggests.

double t = 99.0;
int j = -1;
if (t < 100.0)
   if (j == 0)
      j = 1;
else
   j = 2;

Absent braces, else is always attached to the most recent if. To produce the logical flow suggested by the indentation, you would have to enclose the second if in braces:

double t = 99.0;
int j = -1;
if (t < 100.0)
{
   if (j == 0)
      j = 1;
}
else
   j = 2;

To eliminate possible confusion, it is often a good idea to include braces even when they’re not strictly necessary.

Looping

The for construction is typically used to execute a block of code multiple times. A variable is introduced to serve as a counter. In the following, i ranges over the values 0, 1, ..., 9 in sequence. (Recall that i++ is shorthand for i += 1 or i = i + 1.)

for (int i = 0; i < 10; i++)
{
   // code
}

The general syntax is for (initialization;logical expression;action2) action1. The initialization step is performed once at the start. At the beginning of each loop the logical condition is checked; if true, action1 and action2 are performed (in that order). The loop ends the first time the logical condition evaluates to false.

The for loop is very flexible. The range of the counter is arbitrary, and it does not need to step in unit increments. Here, i takes the even values -6, -4, -2, 0, 2, 4, 6.

for (int i = -6; i < 7; i += 2)
{
   // code
}

The scope of variables defined in a for loop is restricted. This is incorrect:

for (int i = -6; i < 7; i += 2)
{
   // code
}
int j = i; // error: the variable i doesn't exist outside the for loop

This is correct:

int i;
for (i = -6; i < 7; i += 2)
{
   // code
}
int j = i; // valid: j is assigned 8, the first value of i that fails i < 7

The comma operator (,) used in this context is not the same as the comma that separates function arguments. The two-part expression a,b tells the compiler to compute a and throw away the result, then compute b and return the result. Similarly, a,b,c,d computes each of a, b, c, and d in turn, but evaluates to d. What this means is that

// sum numbers from 1 to 100
int sum = 0;
for (int i = 1; i < 101; ++i)
   sum += i;

can be compressed to

int sum, i;
for (sum = 0, i = 1; i < 101; sum += i, ++i);  // empty loop body

Such a rewriting is not always advisable, especially if it makes the code harder to decipher.

An alternative to for is while, which comes in two flavours. The first

int n = 0;
while (n < 10)
{
   ++n;
   cout << n << (n != 10 ? ", " : " ");
}

and the second

int n = 0;
do
{
   ++n;
   cout << n << (n != 10 ? ", " : " ");
} while (n < 10);

differ only in when the exit condition is checked. The behaviour of the two code snippets shown above is identical: both output 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to the terminal. Sometimes, however, testing to exit in one position or another does matter. For example, in the following recompute may never be executed at all.

bool is_converged(void);
void recompute(void);

while (!is_converged())
   recompute();

If the loop is written this way, however, recompute will be executed at least once:

do
{
   recompute();
} while (!is_converged());

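The loop can also be exited at any point by issuing a break statement. The first loop below behaves like the do-while version above, and the second like the while version.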

while (true)   // behaves like the do while loop above
{
   recompute();
   if (is_converged()) break;
}

while (true)   // behaves like the while loop above
{
   if (is_converged()) break;
   recompute();
}

Consider the problem of evaluating the truncated series,

\[S_N = \sum_{n=1}^N \frac{1}{n^2}.\]

Convince yourself that the following program computes \(S_{10}\).

#include <iostream>
using std::cout;
using std::endl;

int main()
{
   double sum = 1.0;
   const int N = 10;
   for (int n = 2; n <= N; ++n)
      sum += 1.0/(n*n);
   cout << "The series truncated at N = " << N
        << " evaluates to " << sum << endl;

   return 0;
}

Exercise

What if sum += 1.0/(n*n) is replaced by sum += (1.0/n)/n or sum += (1/double(n))/n? Would this make any difference? Is any one of these statements more likely to overflow than the others?

Exercise

Consider the finite sequence of numbers

\[S = (n^2 + 3n^5)_{n = 1}^{65} = (4, 100, 738, \ldots, 3480876100).\]

How many of the numbers in \(S\) are divisible by 12? (Hint: beware of overflow.)

Exercise

How many of the numbers 1, 2, ..., 1000 are perfect squares? How many are perfect cubes? (Hint: you can solve this exercise without using either sqrt or pow.)

Exercise

Consider an unbounded square grid of points spaced by \(\Delta x = 0.1\) and \(\Delta y = 0.1\). How many points lie inside the diamond \(|x|+|y| = 2\)? How many points lie inside the circle \(x^2+y^2 = 4\)? (Hints: (i) be sure not to count the points on the boundary; (ii) you may be better off reformulating the problem so that you can use integer rather than floating-point types in your code.)

Loops can be nested, and an inner loop may depend on an outer loop’s counter variable. In this way, we can generate all pairs (j,i) of integers with 0 <= j < i < N

for (int i = 1; i < N; ++i)
   for (int j = 0; j < i; ++j)
      cout << "(" << j << "," << i << ")" << endl;

and all triples (k,j,i) with 0 <= k < j < i < N

for (int i = 2; i < N; ++i)
   for (int j = 1; j < i; ++j)
      for (int k = 0; k < j; ++k)
         cout << "(" << k << "," << j << "," << i << ")" << endl;
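Each pair or triple is produced exactly once, so for N values there should be N(N-1)/2 pairs and N(N-1)(N-2)/6 triples. As a quick check, here is a sketch that keeps the same loop structure but counts instead of printing (count_pairs and count_triples are hypothetical names):

```cpp
#include <cassert>

// Count the pairs (j,i) with 0 <= j < i < N
long count_pairs(int N)
{
   long count = 0;
   for (int i = 1; i < N; ++i)
      for (int j = 0; j < i; ++j)
         ++count;
   return count;
}

// Count the triples (k,j,i) with 0 <= k < j < i < N
long count_triples(int N)
{
   long count = 0;
   for (int i = 2; i < N; ++i)
      for (int j = 1; j < i; ++j)
         for (int k = 0; k < j; ++k)
            ++count;
   return count;
}
```

For N = 10 we expect 10·9/2 = 45 pairs and 10·9·8/6 = 120 triples.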

Logical skeletons

Loops come in three flavours: for, while, and do while. Each can be understood as a nested sequence of actions and logical tests.

The for loop

for (initialization; logical_expression; statement2) statement1;

is equivalent to

{
   initialization;
   if (logical_expression)
   {
      statement1;
      statement2;
      if (logical_expression)
      {
         statement1;
         statement2;
         if (logical_expression)
         {
            statement1;
            statement2;
            .
            .
            .
         }
      }
   }
}

The while loop

while (logical_expression) statement;

is equivalent to

if (logical_expression)
{
   statement;
   if (logical_expression)
   {
      statement;
      if (logical_expression)
      {
         statement;
         .
         .
         .
      }
   }
}

The do while loop

do statement; while (logical_expression);

is equivalent to

{
   statement;
   if (logical_expression)
   {
      statement;
      if (logical_expression)
      {
         statement;
         if (logical_expression)
         {
            .
            .
            .
         }
      }
   }
}

The for structure is particularly well-suited for enumerated loops. The initialization and statement2 slots are typically used to define and increment a counter variable (whose scope is restricted to the loop).

while and do while check the exit condition before and after the statement, respectively. Hence, do while always performs the statement at least once.
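As a sketch of these skeletons in practice, the same loop can be written with for and rebuilt by hand from its pieces: an extra pair of braces restricts the scope of the counter, the test is made before each pass, and statement2 runs after statement1. The function names are hypothetical.

```cpp
#include <cassert>

// Sum of i*i over the even values -6, -4, ..., 6, written with for
int sum_squares_for()
{
   int sum = 0;
   for (int i = -6; i < 7; i += 2)   // initialization; test; statement2
      sum += i*i;                     // statement1
   return sum;
}

// The same computation rebuilt from the for loop's skeleton
int sum_squares_while()
{
   int sum = 0;
   {                       // braces keep i local, as in the for loop
      int i = -6;          // initialization
      while (i < 7)        // logical_expression
      {
         sum += i*i;       // statement1
         i += 2;           // statement2
      }
   }
   return sum;
}
```

Both return 2·(36 + 16 + 4) = 112. One detail the skeleton glosses over: a continue statement inside a for loop still executes statement2, whereas in the while version it would skip i += 2.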

Functions

Arguments to functions

By default, arguments to functions are passed by value. This means that the variables declared in a function’s argument list are copies. They are temporary variables occupying their own memory locations but assigned the external values gleaned from the function call. For example, consider the following function definition.

int odd(int i) { return 2*i+1; }

The call odd(3) transfers program control to odd, where a temporary int named i is allocated and assigned the value 3. The function returns 2*i+1, which is 7. The statements int i = 3; odd(i); also lead to the creation of a temporary variable: an internal i that is assigned the value of the external i.

Changes to the internal temporaries do not propagate outward.

#include <cassert>

int odd(int i)
{
   i = i+1;      // changes only the local copy
   return 2*i-1;
}

int main()
{
   int j = 5;
   int k = odd(j); // the value of j is copied into i
   assert(j == 5);
   assert(k == 11);
   return 0;
}

Alternatively, an argument may be flagged with an ampersand (&) to indicate that it should be passed by reference. In that case, the parameter becomes an alias for the caller’s variable (in practice, the compiler usually passes the variable’s address, i.e., its actual memory location), and no temporary copy is created.

#include <cassert>

int odd(int &i)
{
   i = i+1;      // changes the caller's variable
   return 2*i-1;
}

int main()
{
   int j = 5;
   int k = odd(j); // j is altered during the function call
   assert(j == 6);
   assert(k == 11);
   return 0;
}
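A classic use of reference parameters is exchanging two values. The sketch below (swap_ints and swap_copies are hypothetical names) contrasts the reference version with a by-value version that only shuffles its local copies. The standard library’s std::swap, which appears later in the bubble sort example, does the same job for any copyable type.

```cpp
#include <cassert>

// Both parameters are references, so the assignments act on the
// caller's variables
void swap_ints(int &a, int &b)
{
   const int tmp = a;
   a = b;
   b = tmp;
}

// For contrast: with value parameters only the local copies change,
// so the caller sees no effect
void swap_copies(int a, int b)
{
   const int tmp = a;
   a = b;
   b = tmp;
}
```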

A common idiom is to pass a variable by reference but to declare it const. This ensures that the function cannot modify the caller’s variable through that argument.

int odd(const int &i) { return 2*i+1; }  // compiler approves
int odd(const int &i)
{
   ++i;  // compiler reports an error here
   return 2*i-1;
}

For PODs such as int or double, there is little reason to pass arguments this way: the effect is essentially the same as passing by value. On the other hand, for large class objects, which are expensive to copy, this is the preferred method. (Although the object may be large, its address typically occupies just one machine word.)
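Here is a minimal sketch of the const-reference idiom applied to a large object, a std::vector (sum_elements is a hypothetical name). No copy of the vector is made on the call, and any attempt to assign to an element of v inside the function would be rejected by the compiler.

```cpp
#include <cassert>
#include <vector>

// Pass a potentially large vector by const reference: the function
// reads the caller's data directly but cannot modify it
double sum_elements(const std::vector<double> &v)
{
   double total = 0.0;
   for (std::vector<double>::size_type n = 0; n < v.size(); ++n)
      total += v[n];
   return total;
}
```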

Non-const passing by reference is most commonly used to implement procedure-like functions, especially in situations where it’s necessary to bypass the restriction that functions return at most one object as return value. For example, imagine a procedure that solves for the eigenvectors and eigenvalues of a matrix. It is implemented as a function that takes a matrix, a list_of_vectors, a list_of_doubles (all hypothetical types), and a bool.

void EigenSolver(const matrix &M, list_of_vectors &V,
                 list_of_doubles &E, bool &is_singular);

We might call it as follows.

matrix M;
list_of_vectors V;
list_of_doubles E;
bool failed;
EigenSolver(M,V,E,failed);
if (!failed)
{
   // perform operations on the eigenvectors and
   // eigenvalues that have been assigned to V and E
}
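Since EigenSolver and its argument types are hypothetical, here is a smaller runnable sketch of the same pattern: a hypothetical min_max procedure that returns two results through non-const reference parameters, plus a failure flag in the last slot, mirroring is_singular above.

```cpp
#include <cassert>
#include <vector>

// Report both the smallest and largest element of v through lo and hi.
// failed is set to true when there is no answer (empty input), in which
// case lo and hi are left untouched.
void min_max(const std::vector<double> &v, double &lo, double &hi,
             bool &failed)
{
   failed = v.empty();
   if (failed) return;
   lo = hi = v[0];
   for (std::vector<double>::size_type n = 1; n < v.size(); ++n)
   {
      if (v[n] < lo) lo = v[n];
      if (v[n] > hi) hi = v[n];
   }
}
```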

Prototypes

Both objects and functions must be declared before they are used. This is a requirement of C++’s strong type system. They may, however, be defined later. A function’s declaration is called a prototype. It consists of the function’s return type, name, and argument list, followed by a semicolon.

void inc_by(int&, int);  // function declaration

int main()
{
   int i, j, k; // declaration of integers i, j, and k
   k = j;    // compiles, but j has not been initialized: undefined behaviour
   i = 5;
   j = 7;
   inc_by(i,j);
   return 0;
}

void inc_by(int &x, int dx) { x += dx; }  // function definition

Prototypes are required for functions that depend on each other. For example, foo calls bar and bar calls foo, so whichever of the two is defined first needs the other to have been declared already:

double foo(double);
double bar(double);

double foo(double x)
{
   if (x <= 0.0) return x;  // <= (not <) stops foo(0) and bar(0) from calling each other forever
   return x + bar(x);
}

double bar(double x)
{
   if (x > 0.0) return x;
   return x - foo(x);
}
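Here is another sketch of mutual recursion, chosen so that the recursion visibly terminates: a (deliberately inefficient) parity test for non-negative integers. The names is_even and is_odd are hypothetical. Each call reduces n by one, so the chain of calls always reaches n == 0, though the recursion depth grows with n.

```cpp
#include <cassert>

// Prototype: is_even calls is_odd before is_odd has been defined
bool is_odd(unsigned int n);

bool is_even(unsigned int n)
{
   if (n == 0) return true;
   return is_odd(n - 1);
}

bool is_odd(unsigned int n)
{
   if (n == 0) return false;
   return is_even(n - 1);
}
```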

It is common practice to organize libraries of functions in a file filename.cpp and to store the corresponding prototypes in a separate header file named filename.h.

$ cat > B4.cpp
#include "my_math_functions.h"
// my_math_functions.h contains the prototype double Bessel0(double);
// Bessel0 is defined in my_math_functions.cpp

int main()
{
   double x = Bessel0(4.0);
   return 0;
}
[ctrl-d]
$ g++ -c my_math_functions.cpp
$ g++ -c B4.cpp
$ g++ -o B4 B4.o my_math_functions.o

Functions as arguments

The address of a function can be passed as an argument to another function (in the example below, the function is passed by reference). This is useful when you want to perform some computation on a generic function that is to be specified later. For example, a simple numerical integration of a function \(f(x)\) using the trapezoid rule with \(N\) slices is given by

\[\int_{a}^b \!dx\,f(x) \approx \frac{b-a}{N}\biggl[\frac{1}{2}f(a) + \sum_{n = 1}^{N-1} f\biggl(a+\frac{(b-a)n}{N}\biggr) + \frac{1}{2}f(b)\biggr]\]

The corresponding C++ function might look as follows:

#include <cassert>

double trapezoidIntegrator( double (&f) (double),
                            double a, double b, unsigned int N)
{
   assert(N != 0);
   const double width = b-a;
   const double h = width/N;
   double sum = 0.5*( f(a) + f(b) );
   for (unsigned int i = 1; i < N; ++i)
   {
      const double x_i = a + i*h;
      sum += f(x_i);
   }
   return sum*h;
}

The 100-slice trapezoid approximation to

\[\int_0^{2\pi}\!d\theta\,\cos \theta\]

would be called as follows:

#include <cmath>
using std::cos;

const double I = trapezoidIntegrator(cos,0.0,2*M_PI,100);

Later, we’ll encounter a more sophisticated way to pass functions (and function objects) using templates.
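Here is a self-contained sketch that exercises the integrator on two user-defined integrands, square and cosine (both hypothetical names). Wrapping std::cos in a function of our own avoids any question about which overload of cos is selected, and std::acos(-1.0) gives \(\pi\) without relying on M_PI. For \(x^2\) on \([0,1]\) the trapezoid rule overestimates \(1/3\) slightly, and for \(\cos\theta\) over a full period it is essentially exact.

```cpp
#include <cassert>
#include <cmath>

// Trapezoid rule with N slices of width h = (b-a)/N
double trapezoidIntegrator(double (&f)(double),
                           double a, double b, unsigned int N)
{
   assert(N != 0);
   const double h = (b-a)/N;
   double sum = 0.5*( f(a) + f(b) );
   for (unsigned int i = 1; i < N; ++i)
      sum += f(a + i*h);
   return sum*h;
}

// Illustrative integrands
double square(double x) { return x*x; }
double cosine(double x) { return std::cos(x); }
```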

Recursion

Recursive functions are ones that call themselves. The classic example is the factorial function, which can be defined either explicitly, in terms of the product \(n! = n \cdot (n-1) \cdots 3 \cdot 2 \cdot 1\), or implicitly, by specifying a recursion condition \(n! = n \cdot (n-1)!\) and a terminating condition \(0! = 1! = 1\).

A conventional (non-recursive) definition might look like one of the following.

unsigned int factorial(unsigned int n)
{
   unsigned int prod = 1;
   for (unsigned int m = 2; m <= n; ++m) prod *= m;
   return prod;
}
unsigned int factorial(unsigned int n)
{
   unsigned int prod = 1;
   while (n > 1) prod *= n--;
   return prod;
}

Keep in mind that C++ function calls are stored on a finite application stack. How deep the recursion can go depends on the size of the stack and on how much memory each call uses; if the recursion goes too deep, the program exhausts the stack and crashes. Recursive functions also tend to be slower than their non-recursive alternatives (because of the overhead from repeated function calls). Here is a recursive version of factorial:

$ cat > factorial.cpp
#include <iostream>
using std::cout;
using std::endl;

unsigned int factorial(unsigned int n)
{
   if (n == 0 or n == 1) return 1;
   // else
      return n*factorial(n-1);
}

int main()
{
   for (unsigned int n = 0; n < 8; ++n)
      cout << char('0' + n) << "! = " << factorial(n) << endl;
   return 0;
}
$ g++ -o factorial factorial.cpp
$ ./factorial
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5040
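As a sanity check, the iterative and recursive definitions can be compared directly. This sketch assumes a 32-bit unsigned int, for which 12! = 479001600 is the largest factorial that fits; from 13! onward both versions silently wrap around. The function names are hypothetical.

```cpp
#include <cassert>

// Iterative version: the product 2*3*...*n
unsigned int factorial_loop(unsigned int n)
{
   unsigned int prod = 1;
   for (unsigned int m = 2; m <= n; ++m) prod *= m;
   return prod;
}

// Recursive version: n! = n*(n-1)!, terminating at 0! = 1! = 1
unsigned int factorial_rec(unsigned int n)
{
   if (n == 0 || n == 1) return 1;
   return n*factorial_rec(n-1);
}
```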

Exercise

Explain why the function factorial works the same regardless of whether the else is commented out.

Exercise

Write a fibonacci function that uses recursion to calculate the sequence 1, 2, 3, 5, 8, 13, 21, ...

Templates

C++ provides a template mechanism that lets functions (and classes) be written in terms of type names, and certain constant values, that are supplied later as parameters. This allows the programmer to implement generic functions that act sensibly on objects of various types using a single definition.

$ cat > middle.cpp
#include <cassert>

template <typename T>
T middle(T x, T y, T z)
{
   if ((y < x and x < z) or (z < x and x < y))
      return x;
   else if ((x < y and y < z) or (z < y and y < x))
      return y;
   else
      return z;
}

int main()
{
   assert( middle(1,2,3) == middle(2,1,3) );
   const double x = middle(-5.0,99.3,26.0);
   assert( x > 25.0 and x < 27.0);
   const float y = middle(1.5F,1.0F,4.0F);
   const unsigned long int i = middle( (unsigned long int) y, 0UL, 2UL);
   assert( i == 1UL);
   assert( middle('d','o','g') < 'f' );
}
[ctrl-d]
$ g++ -o middle middle.cpp
$ ./middle
Assertion failed: (middle('d','o','g') < 'f'), function main
Abort trap

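The last assertion fails because the middle value of 'd', 'o', and 'g' is 'g', which does not come before 'f'. Here is an example of a bubble sort function that works with both conventional C arrays and C++ vectors: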

$ cat > bubble_sort.cpp
#include <vector>
using std::vector;

#include <iostream>
using std::cout;
using std::endl;

#include <iterator>
using std::ostream_iterator;

#include <algorithm>
using std::swap;
using std::copy;

template <typename Iter>
void bubble_sort(Iter p1, Iter p2)
{
   bool mismatch;
   do
   {
      mismatch = false;
      for (Iter q = p1; q < p2-1; ++q)
         if ( *(q+1) < *q )
         {
            swap(*q,*(q+1));
            mismatch = true;
         }
   } while (mismatch);
}

int main()
{
   int a[5] = { 3, 9, 0, 1, 7 };
   vector<int> v(a,a+5);
   v.push_back(2);
   v.push_back(13);

   bubble_sort(a,a+5);
   bubble_sort(v.begin(),v.end());

   cout << "a = ";
   copy(a, a+5, ostream_iterator<int>(cout, " "));
   cout << endl;

   cout << "v = ";
   copy(v.begin(), v.end(), ostream_iterator<int>(cout, " "));
   cout << endl;

   return 0;
}
[ctrl-d]
$ g++ -o bubble_sort bubble_sort.cpp
$ ./bubble_sort
a = 0 1 3 7 9
v = 0 1 2 3 7 9 13
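One point to watch with function templates: the compiler deduces T from the arguments, and every argument must agree. A call such as larger(1, 2.5) in the sketch below does not compile, because T would have to be both int and double; naming the type explicitly, larger<double>(1, 2.5), resolves the conflict by converting the int argument. The template larger is a hypothetical example (the standard std::max has the same deduction rule).

```cpp
#include <cassert>

// Return the larger of two values of the same type T
template <typename T>
T larger(T a, T b)
{
   return (a < b ? b : a);
}
```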

Standard Library

Unlike some languages, C++ has very few built-in functions. Instead, they are provided by the standard library, which is broken into broad subcategories. Each small grouping of functions is loaded by including the appropriate header file. For example, most of the important math functions are accessed by #include <cmath>.

Functions provided by cmath
Function   Description
cos        cosine
sin        sine
acos       arc cosine
asin       arc sine
atan       arc tangent
atan2      arc tangent (2 parameters)
cosh       hyperbolic cosine
sinh       hyperbolic sine
tanh       hyperbolic tangent
exp        exponential function
frexp      get significand and exponent
ldexp      generate number from significand and exponent
log        natural logarithm
log10      logarithm base-10
modf       break into fractional and integral parts
pow        raise to power
sqrt       square root
ceil       round up value
fabs       compute absolute value
floor      round down value
fmod       compute remainder of division

A macro for the decimal representation of \(\pi\), called M_PI, is also commonly available; strictly speaking, it is an extension provided by UNIX math headers rather than part of standard C++. Under some versions of UNIX, linking code that uses math functions may require the GCC option -lm. Header files that are part of the C language are available in C++ with a change in the naming convention: e.g., #include <math.h> becomes #include <cmath>, #include <stdlib.h> becomes #include <cstdlib>, etc. There is not always a complete correspondence between the two languages, since updates to the language specifications are not in sync. For instance, the rounding functions round(), trunc(), and rint() were added to math.h in C99, but they became part of C++’s <cmath> only with C++11.

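Here’s a hand-written way to implement rounding (halfway cases are rounded away from zero):

#include <cmath>
using std::fabs;
using std::floor;

// named round_to_int to avoid a clash with the standard round()
inline int round_to_int(double x)
{
   const double abs_x = fabs(x);
   const int i = int(floor(abs_x+0.5));
   return ( x > 0 ? i : -i );
}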
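Assuming a C++11 or later compiler, <cmath> provides the standard rounding functions directly. This sketch (rounding_examples is a hypothetical name) compares them on values halfway between two integers: std::round rounds halfway cases away from zero, std::trunc rounds toward zero, and std::floor and std::ceil round down and up. By contrast, std::rint follows the current floating-point rounding mode, which by default rounds halfway cases to the nearest even integer.

```cpp
#include <cassert>
#include <cmath>

// Apply four rounding functions from <cmath> to x (requires C++11)
void rounding_examples(double x, double &r, double &t, double &f, double &c)
{
   r = std::round(x);  // nearest integer, halfway cases away from zero
   t = std::trunc(x);  // discard the fractional part (toward zero)
   f = std::floor(x);  // toward minus infinity
   c = std::ceil(x);   // toward plus infinity
}
```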

Let’s consider a common mathematical procedure that requires both the square root (sqrt) and absolute value (fabs) functions. The quadratic polynomial \(ax^2 + bx + c = 0\) has two roots,

\[x_{1,2} = \frac{ -b \pm \sqrt{b^2 - 4ac}}{2a}.\]

This expression is mathematically exact, but when \(b^2 \gg 4ac\), we may run into the problem that one of the roots

\[x_{1,2} = \frac{-b \pm |b|\bigl(1-\frac{2ac}{b^2}+\cdots\bigr)}{2a}\]

will very nearly vanish, since its leading terms \(-b\) and \(\pm|b|\) cancel. For floating-point numbers (with finite precision), this near cancellation can result in a catastrophic loss of significance.

A convenient workaround is to rewrite the troublesome root as follows.

\[\begin{split}x_{1,2} &= \frac{ -b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{ -b \pm \sqrt{b^2 - 4ac}}{2a} \times \frac{-b \mp \sqrt{b^2 - 4ac}}{-b \mp \sqrt{b^2 - 4ac}}\\ &= \frac{b^2 - (b^2-4ac)}{2a}\frac{1}{-b \mp \sqrt{b^2 - 4ac}} = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}\end{split}\]

The two roots can be expressed as

\[x_1 = \frac{S}{2a},\ x_2 = \frac{2c}{S}\ \ \text{where} \ \ S = -\mathrm{sgn}(b)\biggl(|b| + \sqrt{b^2-4ac}\biggr).\]

#include <cassert>
#include <cmath>
using std::sqrt; // square root
using std::fabs; // absolute value of a floating point number

void quadratic_roots(double a, double b, double c,
                     double &x1, double &x2)
{
   const double X2 = b*b-4*a*c;
   assert(X2 >= 0.0);
   const double X = sqrt(X2);
   const double Ym = -b-X;
   const double Yp = -b+X;
   const double Y = (fabs(Ym) > fabs(Yp) ? Ym : Yp);
   x1 = Y/(2*a);  // Y is the term without cancellation (S in the formula)
   x2 = 2*c/Y;
}
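To see the loss of significance concretely, here is a sketch comparing the textbook formula with the cancellation-free form for \(a = 1\), \(b = 10^8\), \(c = 1\), whose small root is very close to \(-10^{-8}\). The helper names are hypothetical, and both functions assume \(b > 0\) (for \(b < 0\) the signs in front of the square root flip). With IEEE double arithmetic, the textbook expression gets the small root badly wrong, while 2c/S is accurate to roughly machine precision.

```cpp
#include <cassert>
#include <cmath>

// Textbook formula for the smaller-magnitude root (assumes b > 0):
// -b and +sqrt(...) nearly cancel when b*b >> 4ac
double naive_small_root(double a, double b, double c)
{
   return (-b + std::sqrt(b*b - 4*a*c))/(2*a);
}

// Cancellation-free version (assumes b > 0): both terms of S have the
// same sign, and the small root is 2c/S
double stable_small_root(double a, double b, double c)
{
   const double S = -b - std::sqrt(b*b - 4*a*c);
   return 2*c/S;
}
```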

Input/output

Command line arguments

Unix provides several ways for you to communicate with your program. One is to pass information to it from the command line when the program is first run. At that time, the command line invocation is parsed and deposited into an array of C strings called argv; the number of elements is stored in an integer argc. These two variables can be included in the argument list to main.

To be precise, argc is an int whose value is set equal to the number of individual terms entered on the command line, including the program name. Each term is assigned consecutively to argv[0], argv[1], ..., argv[argc-1]. Consider the following example.

$ ./myprog reinit -n 100 -J5.0 --input=datafile.txt

In this case, argc is equal to 6, and the array values "./myprog", "reinit", "-n", "100", "-J5.0", and "--input=datafile.txt" are C strings. (The whitespace determines the partition of the terms.) Your program can be made to interpret this text data in any way you please. Note that to extract numerical values, the text must first be converted to a numerical type; the functions atoi and atof, provided by the cstdlib library, perform this task.

#include <iostream>
using std::cerr;
using std::cout;
using std::endl;

#include <cstdlib>
using std::atoi; // function that converts text to an integer value
using std::atof; // function that converts text to a floating point value

int main(int argc, char* argv[])
{
   int N;
   double T;
   if (argc != 3) // program requires exactly two arguments
   {
      cerr << "Error: two arguments are required." << endl
           << "Usage: myprog number_particles temperature" << endl;
      return 1; // exit program
   }
   else
   {
      N = atoi(argv[1]);
      T = atof(argv[2]);
      cout << "Beginning simulation with " << N << " particles at temperature " << T << endl;
   }
   // code that makes use of the user-provided values in N and T
   return 0; // exit program
}

A user interacting with this program in the BASH terminal might have the following exchange:

$ ./myprog 7.5
Error: two arguments are required.
Usage: myprog number_particles temperature
$ ./myprog 100 7.5
Beginning simulation with 100 particles at temperature 7.5
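The program above reads only positional arguments. Options such as -J5.0 or --input=datafile.txt from the earlier invocation can be picked apart with a little C string handling. Here is a minimal sketch of one way to do it; option_value is a hypothetical helper, not a library function. It scans argv[1] through argv[argc-1] for the first term that begins with a given prefix and returns a pointer to the text that follows.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Return the text after prefix in the first argument that starts with
// prefix, or a null pointer if no argument matches. argv[0] (the
// program name) is skipped.
const char* option_value(int argc, char* argv[], const char* prefix)
{
   const std::size_t len = std::strlen(prefix);
   for (int n = 1; n < argc; ++n)
      if (std::strncmp(argv[n], prefix, len) == 0)
         return argv[n] + len;
   return 0;
}
```

For the command line shown earlier, option_value(argc, argv, "--input=") yields "datafile.txt", and std::atof(option_value(argc, argv, "-J")) yields 5.0. An option written as two separate terms, like -n 100, would instead need its value read from the following element of argv.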

Let’s return to the series \(S_N\) that we looked at earlier. How would we go about computing the infinite series? One approach would be to extrapolate from the sequence \(S_{10}, S_{20}, S_{40}, \ldots\) to \(S_{\infty}\).

$ cat > series.cpp
#include <cstdlib>
using std::atoi;

#include <cassert>

#include <iostream>
using std::cout;
using std::endl;

#include <iomanip>
using std::setw;

int main(int argc, char *argv[])
{
   assert(argc == 2);
   double sum = 0.0;
   int N = atoi(argv[1]);
   assert(N > 1);
   cout.precision(12);
   for (int n = 1; n <= N; ++n)
      sum += 1.0/(n*n);
   cout << setw(10) << N << setw(20) << sum << endl;

   return 0;
}
[ctrl-d]
$ g++ -o series series.cpp
$ cat > batch.bash
#!/bin/bash

N=10
./series $N > converge.dat
while (( $N < 2000 ))
do
   let N=N*2
   ./series $N >> converge.dat
done
exit
[ctrl-d]
$ chmod +x batch.bash
$ ./batch.bash
$ more converge.dat
        10       1.53976773117
        20       1.59366324391
        40       1.61961896301
        80       1.63235561634
       160       1.63866449491
       320       1.64180417895
       640       1.64337034551
      1280       1.64415251159
      2560       1.64454336554
$ gnuplot
gnuplot> plot "converge.dat" using 1:2 with points
gnuplot> f(x) = f0 + f1*x + f2*x**2
gnuplot> set fit errorvariables
gnuplot> fit f(x) "converge.dat" using (1.0/$1):2 via f0,f1,f2
gnuplot> plot "converge.dat" using (1.0/$1):2 with points, f(x)
gnuplot> print f0, f0_err
1.64493188042833 1.42971432364597e-06

We find that the numerical estimate for \(\lim_{N\to\infty}S_N\) is 1.644932(1). For comparison, the exact value of the infinite series is \(\pi^2/6 = 1.6449340668\ldots\).

Exercise

Modify the program so that ./series m computes the series to m decimal digits of accuracy.

$ emacs rev3.cpp
#include <iostream>
using std::cout;
using std::cerr;
using std::endl;

int main(int argc, char* argv[])
{
   if (argc == 1)
      cerr << "Too few arguments!" << endl;
   else if (argc == 2)
      cout << argv[1] << endl;
   else if (argc == 3)
      cout << argv[2] << " " << argv[1] << endl;
   else if (argc == 4)
      cout << argv[3] << " " << argv[2] << " " << argv[1] << endl;
   else
      cout << "Too many arguments!" << endl;
   return 0;
}
[ctrl-x][ctrl-s][ctrl-x][ctrl-c]
$ ls
rev3.cpp
$ g++ -o rev3 rev3.cpp
$ ls -F
rev3*  rev3.cpp
$ ./rev3 a
a
$ ./rev3 a b
b a
$ ./rev3 a b c
c b a
$ echo I do not like them, $(./rev3 am I Sam).
I do not like them, Sam I am.

Exercise

Modify the program so that it can reverse as many as five arguments:

$ cp rev3.cpp rev5.cpp
$ emacs rev5.cpp
[Your changes]
$ g++ -o rev5 rev5.cpp
$ ls -F
rev3*  rev3.cpp  rev5*  rev5.cpp
$ echo $(./rev5 believe do I) $(./rev5 correctly this did I that)
I do believe that I did this correctly

Streams

A UNIX stream is an ordered sequence of bytes. Its end is signalled by an end-of-file (EOF) condition rather than by a character stored in the data. At the terminal, EOF can be produced using the [ctrl-d] key combination ([ctrl-z] for MS-DOS and Windows). For example, the following BASH session redirects a stream of user-supplied character input to a file.

$ cat > my_file.txt
This is a sequence of characters redirected from stdin to this file.
[ctrl-d]

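The characters are stored as ASCII codes. The corresponding hex values are

54 68 69 73 20 69 73 20 61 20 73 65 71 75 65 6E 63 65 . . . 69 6C 65 2E 0A

The final byte, 0A, is the newline typed before [ctrl-d]; the [ctrl-d] keystroke itself is not stored in the file.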

An important property of streams is that they are unidirectional. A stream establishes a connection to a device for the purpose of sending or receiving data (not both). Three predefined streams are provided in the UNIX environment (connecting your program to the terminal), and these have special handlers in C++.

UNIX stream               C++ stream object   operator
standard input (stdin)    cin                 >>
standard output (stdout)  cout                <<
standard error (stderr)   cerr                <<

A subtle but important point: these handles aren’t keywords (i.e., they aren’t part of the C++ language); rather, they are identifiers (the names of objects). cin is of type istream, whereas cout and cerr are of type ostream. All three are class objects, as opposed to PODs. Class objects have special functions associated with them, sometimes called methods. These are accessed using a dot (.) notation.

For example, cin has a method good that checks on the status of the stream. If a >> operation fails for some reason or if the end of the stream has been reached, then cin.good() evaluates to false. This code will count the number of integers it can read in from the standard input stream:

int i;
unsigned long int count = 0;
cin >> i;
while (cin.good())
{
   ++count;
   cin >> i;
}
cout << count << " integers read from stdin." << endl;
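A subtlety of the good() test: if the last integer is followed immediately by the end of the stream (no trailing newline), the extraction succeeds but also sets the EOF flag, so good() turns false and that final integer is not counted. A common alternative is to test the extraction itself, since (in >> i) converts to false only when the read actually fails. Here is a sketch written for any istream (count_integers is a hypothetical name), so it can be called as count_integers(cin) or checked against an in-memory std::istringstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Count the integers that can be read from an input stream,
// stopping at the first failed extraction
unsigned long count_integers(std::istream &in)
{
   int i;
   unsigned long count = 0;
   while (in >> i)
      ++count;
   return count;
}
```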

The primary methods for controlling input/output (I/O) formatting are width, precision, setf, and unsetf. The first two are self-explanatory. The final two are used to turn on and off various flags (all defined as members of the class std::ios), the most common being std::ios::scientific and std::ios::fixed. Used in combination, these methods allow the user to adjust the output format.

cout.setf(std::ios::fixed);
cout.precision(2);
const double money1 = 9.99;
const double money2 = 9.9987;
cout << "$" << money1 << endl; // $9.99
cout << "$" << money2 << endl; // $10.00

cout.precision(8);
const double pi = 3.14159265358979323846;
cout << pi << endl; // 3.14159265 (8 digits after the decimal point)

cout.unsetf(std::ios::fixed);
cout.setf(std::ios::scientific);

cout.width(16);
cout << pi;
cout.width(16);
cout << 2*pi;
cout.width(16);
cout << 3*pi;
cout.width(16);
cout << 4*pi << endl;
//  3.14159265e+00  6.28318531e+00  9.42477796e+00  1.25663706e+01
//0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

An alternative to invoking the stream object methods is to embed so-called manipulators in the stream itself. The manipulators setw, setprecision, setiosflags, and resetiosflags (declared in the header <iomanip>) perform nearly equivalent tasks to the methods discussed above. For example, to arrange the multiples of pi in four columns, we could also write the following:

cout << setiosflags(std::ios::scientific) << setprecision(8)
     << setw(16) << pi << setw(16) << 2*pi
     << setw(16) << 3*pi << setw(16) << 4*pi << endl;
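The same manipulators work on any output stream, including a std::ostringstream (from <sstream>), which collects the output in a string. This makes it easy to check formatting without looking at the terminal. Here is a sketch (format_fixed is a hypothetical name); note that setw applies only to the next item written, whereas the precision and the fixed flag persist.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Format x in fixed-point notation with the given number of digits
// after the decimal point, right-justified in a field of the given width
std::string format_fixed(double x, int digits, int width)
{
   std::ostringstream out;
   out << std::setiosflags(std::ios::fixed) << std::setprecision(digits)
       << std::setw(width) << x;
   return out.str();
}
```

For example, format_fixed(9.9987, 2, 0) produces "10.00", matching the money2 example above.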

In the following example, a sequence of numbers is reformatted in three columns.

$ cat > columns3.cpp
#include <iostream>
using std::cin;
using std::cout;
using std::endl;

int main()
{
   int i;
   cin >> i;
   unsigned int count = 0;
   while (cin.good())
   {
     ++count;
     cout << i;
     if (count%3 == 0)
       cout << endl;
     else
       cout << "\t";
     cin >> i;
   }
   if (count%3 != 0)
     cout << endl;
   return 0;
}
[ctrl-d]
$ g++ -o columns3 columns3.cpp
$ cat > numbers.dat
1 2 3 4 5
6 7
8 9 10 11 12 13
14
15
16 17
[ctrl-d]
$ ./columns3 < numbers.dat
1      2      3
4      5      6
7      8      9
10    11     12
13    14     15
16    17

Some formatting options can be adjusted by setting flags via the setf and unsetf member functions of the stream object. Other member functions control the character width of the output (width), the number of digits of precision (precision), and the padding character used when the output is narrower than the specified width (fill).

I/O Flags
Flag                  Description
std::ios::skipws      Skip leading whitespace on input
std::ios::left        Left justify output
std::ios::right       Right justify output
std::ios::internal    Pad numeric output by inserting fill characters between the sign or base indicator and the digits
std::ios::boolalpha   Read and write bool values as the words true and false
std::ios::dec         Output numbers in base 10, decimal format
std::ios::oct         Output numbers in base 8, octal format
std::ios::hex         Output numbers in base 16, hexadecimal format
std::ios::showbase    Print out a base indicator at the beginning of each number
std::ios::showpoint   Show a decimal point for all floating-point numbers
std::ios::uppercase   When converting hexadecimal numbers, show the digits A–F as uppercase
std::ios::showpos     Put a plus sign before all positive numbers
std::ios::scientific  Convert all floating-point numbers to scientific notation on output
std::ios::fixed       Convert all floating-point numbers to fixed point on output
std::ios::unitbuf     Flush the output after every output operation

Much of the same functionality can be obtained with manipulators, which are chained into the stream itself.

I/O manipulators
Manipulator                   Description
std::dec                      Output numbers in decimal format
std::hex                      Output numbers in hexadecimal format
std::oct                      Output numbers in octal format
std::ws                       Skip whitespace on input
std::endl                     Output end-of-line
std::ends                     Output end-of-string (\0)
std::flush                    Force any buffered output out
std::setiosflags(fmtflags)    Set selected conversion flags
std::resetiosflags(fmtflags)  Reset selected flags
std::setbase(int)             Set conversion base to 8, 10, or 16
std::setw(int)                Set the width of the output
std::setprecision(int)        Set the precision of floating-point output
std::setfill(char)            Set the fill character
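Several of the flags above also exist as stand-alone manipulators with the same names, such as std::showbase and std::boolalpha. This sketch (the three function names are hypothetical) writes into a std::ostringstream so the results can be checked as strings.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hexadecimal with a base indicator, e.g. 255 -> "0xff"
std::string as_hex(int n)
{
   std::ostringstream out;
   out << std::showbase << std::hex << n;
   return out.str();
}

// Pad with zeros to the given width, e.g. (42, 5) -> "00042"
std::string zero_padded(int n, int width)
{
   std::ostringstream out;
   out << std::setfill('0') << std::setw(width) << n;
   return out.str();
}

// Write a bool as the word true or false instead of 1 or 0
std::string as_word(bool b)
{
   std::ostringstream out;
   out << std::boolalpha << b;
   return out.str();
}
```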

Let’s attempt to approximate \(\pi\) from the identity

\[\pi = 16\arctan(1/5) - 4\arctan(1/239)\]

by means of series expansion. The conventional Taylor series for \(\arctan\) can be computed efficiently with nested operations as follows.

\[\begin{split}\arctan x &= x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots\\ &= x\biggl[\biggl(\biggl(\biggl(\biggl(\cdots + \frac{1}{9}\biggr)x^2 - \frac{1}{7}\biggr)x^2 + \frac{1}{5}\biggr)x^2 - \frac{1}{3}\biggr)x^2 + 1\biggr]\end{split}\]

Here’s a program that computes the approximation term-by-term and outputs the results with 16 digits of precision to stdout.

Example

$ cat > pi.cpp
#include <cmath>
// <cmath> supplies M_PI; atan_series below takes the place of std::atan

#include <iostream>
using std::cout;
using std::endl;

double atan_series(double x, unsigned int N)
{
   const double x2 = x*x;
   double val = 0.0;
   for (int n = N, m = 2*N+1; n >= 0; --n, m -= 2)
     val = val*x2 + (n%2 == 0 ? 1.0 : -1.0)/m;
   return val*x;
}

int main()
{
   cout.precision(16);
   cout << "via 16*arctan(1/5) - 4*arctan(1/239):" << endl;
   for (unsigned int terms = 1; terms < 10; ++terms)
     cout << "pi (" << terms << "-term approx) = "
         << 16*atan_series(0.2,terms) - 4*atan_series(1.0/239,terms)
         << endl;
   cout << "pi         (exact) = " << M_PI << endl;

   cout << endl << "via 4*arctan(1):" << endl;
   for (unsigned int terms = 1; terms < 10; ++terms)
     cout << "pi (" << terms << "-term approx) = "
         << 4*atan_series(1.0,terms) << endl;
   cout << "pi         (exact) = " << M_PI << endl;

   return 0;
}
[ctrl-d]
$ g++ -o pi pi.cpp -lm
$ ./pi
via 16*arctan(1/5) - 4*arctan(1/239):
pi (1-term approx) = 3.140597029326061
pi (2-term approx) = 3.141621029325035
pi (3-term approx) = 3.141591772182178
pi (4-term approx) = 3.1415926824044
pi (5-term approx) = 3.141592652615309
pi (6-term approx) = 3.141592653623555
pi (7-term approx) = 3.141592653588603
pi (8-term approx) = 3.141592653589836
pi (9-term approx) = 3.141592653589792
pi         (exact) = 3.141592653589793

via 4*arctan(1):
pi (1-term approx) = 2.666666666666667
pi (2-term approx) = 3.466666666666667
pi (3-term approx) = 2.895238095238096
pi (4-term approx) = 3.33968253968254
pi (5-term approx) = 2.976046176046176
pi (6-term approx) = 3.283738483738484
pi (7-term approx) = 3.017071817071817
pi (8-term approx) = 3.252365934718876
pi (9-term approx) = 3.041839618929402
pi         (exact) = 3.141592653589793

The following program produces columnar data suitable for gnuplot.

Example

$ cat > circle.cpp
#include <iostream>
using std::cout;
using std::endl;

#include <iomanip>
using std::setw;

#include <cmath>
using std::cos;
using std::sin;

int main()
{
   const int steps = 100;
   for (int n = 0; n <= steps; ++n)
   {
     const double theta = 2.0*M_PI*n/steps;
     cout << setw(15) << cos(theta) << setw(15) << sin(theta) << endl;
   }
   return 0;
}
[ctrl-d]
$ g++ -o circle circle.cpp -lm
$ ./circle > circle.dat
$ gnuplot
gnuplot> plot "circle.dat" using 1:2 with lines
gnuplot> unset key
gnuplot> set size square
gnuplot> replot
gnuplot> quit

Exercise

Using circle.cpp as a template, write a program that outputs the coordinate pair

\[\begin{split}x(t) &= a\cos(2\pi p t)\\ y(t) &= b\cos(2\pi q t + \phi)\end{split}\]

over one cycle.

  1. Experiment with different values of the ratio \(p/q\). What is the condition that produces a closed curve?
  2. For \(p=q\), investigate how the shape of the curve is related to the phase shift \(\phi\).
  3. Plot \(x(t)+y(t)\) as a function of \(t\). Investigate the appearance of beats when \(p\) and \(q\) are close but not equal. Superimpose a wave at the carrier \((\omega_1 + \omega_2)/2\) and modulation \((\omega_1-\omega_2)\) frequencies, where \(\omega_1 = 2\pi p\) and \(\omega_2 = 2\pi q\).

Text files

Text files are human-readable: they are stored as a sequence of characters.

Example

$ cat > io.cpp
#include <cassert>

#include <iostream>
using std::cerr;
using std::endl;

#include <fstream>
using std::ofstream;
using std::ifstream;

ifstream fin;

int main()
{
   ofstream fout("test.txt");   // open an empty file test.txt
                                // (overwrite if file already exists)
   fout << "0 1 2 3 4 ... are the natural numbers" << endl;
   fout.close();

   int a,b,c,d;
   fin.open("test.txt");
   fin >> a >> b >> c >> d; // read in four integers
   assert(a == 0 and b == 1 and c == 2 and d == 3);
   fin.close();

   fout.open("test.txt",std::ios::app); // open an existing file and
                              // append all output
   if (fout.is_open())
     fout << "Another additional line" << endl;
   else
   {
     cerr << "Could not open file `test.txt`" << endl;
     return 1;
   }
   fout.close();

   return 0;
}
[ctrl-d]
$ g++ -o io io.cpp
$ ./io
$ cat test.txt
0 1 2 3 4 ... are the natural numbers
Another additional line

Remember that :: is the scope operator. In std::ios::app, app is a constant belonging to the class ios, which in turn is inside the std namespace.
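Text files are often processed one line at a time. Here is a minimal sketch using std::getline from <string>, which reads up to the next newline and discards it (count_lines is a hypothetical name). A final line with no trailing newline is still counted.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Count the lines in a text file; returns -1 if it cannot be opened
long count_lines(const char* filename)
{
   std::ifstream fin(filename);
   if (!fin.is_open()) return -1;
   long lines = 0;
   std::string line;
   while (std::getline(fin, line))
      ++lines;
   return lines;
}
```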

Binary files

Binary files are treated as a stream of raw bytes rather than characters. To manipulate them, we do not use the familiar text chaining operators << and >>. Instead, we use methods provided by the ofstream and ifstream classes to access the underlying bytes directly. The workhorse methods are read and write.

Example

#include <fstream>
using std::ofstream;
using std::ifstream;

#include <iostream>
using std::cerr;
using std::endl;

int main()
{
   ifstream fin;
   fin.open("infile.dat", std::ios::in | std::ios::binary);

   ofstream fout;
   fout.open("outfile.dat", std::ios::out | std::ios::binary);

   char buffer[100];
   fin.read(buffer,100);
   const std::streamsize n = fin.gcount(); // bytes actually read
   if (!fin)
   {
     cerr << "Looking for 100 bytes. Only "
          << n << " bytes read." << endl;
     fin.clear();
   }
   fout.write(buffer,n);  // write only the bytes that were read

   fin.close();
   fout.close();

   return 0;
}

Note that a binary file is opened with the std::ios::binary flag.

The binary data for an arbitrary type can be written to a file, but doing so usually requires a cast to char*.

struct S
{
   char name[20];
   double x;
   int i;
};

S datum;
S data[20];
fout.write((char*)(&datum),sizeof(S));
fout.write((char*)(data),20*sizeof(S));
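The bytes written this way can be read back into objects of the same type with read. Here is a round-trip sketch using the struct S from above; save_records and load_records are hypothetical names. The raw layout (padding, sizes, byte order) depends on the compiler and platform, so such a file should be expected to read back correctly only with a program built in the same environment.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>

struct S
{
   char name[20];
   double x;
   int i;
};

// Write n records as raw bytes; returns true on success
bool save_records(const char* filename, const S* data, int n)
{
   std::ofstream fout(filename, std::ios::out | std::ios::binary);
   fout.write(reinterpret_cast<const char*>(data), n*sizeof(S));
   return bool(fout);
}

// Read n records back; returns true only if every byte arrived
bool load_records(const char* filename, S* data, int n)
{
   std::ifstream fin(filename, std::ios::in | std::ios::binary);
   fin.read(reinterpret_cast<char*>(data), n*sizeof(S));
   return fin.gcount() == std::streamsize(n*sizeof(S));
}
```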

The methods tellg() and tellp() query the current get and put positions of a stream. The seekg(offset,direction) and seekp(offset,direction) methods move the respective position relative to std::ios::beg, std::ios::cur, or std::ios::end.

Example

ifstream fin("data.binary", std::ios::in | std::ios::binary);
ifstream::pos_type begin = fin.tellg();
fin.seekg(0,std::ios::end);
ifstream::pos_type end = fin.tellg();
fin.close();
cout << "File size is " << end-begin << " bytes." << endl;

Example

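ifstream fin;
fin.open("data.binary", std::ios::in | std::ios::binary | std::ios::ate);
if (fin.is_open())
{
   // ios::ate places the get position at the end of the file,
   // so tellg() reports the file size
   const std::streamsize bytes = fin.tellg();
   vector<char> buffer(bytes);
   fin.seekg(0,std::ios::beg);
   if (bytes > 0)
      fin.read(&buffer[0],bytes);
   fin.close();
}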
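The size query can be packaged as a small function. Here is a sketch (file_size is a hypothetical name) that relies on the same std::ios::ate technique and returns -1 when the file cannot be opened.

```cpp
#include <cassert>
#include <fstream>

// Size of a file in bytes, or -1 if it cannot be opened. Opening with
// std::ios::ate puts the get position at the end, so tellg() reports
// the offset of the end of the file, i.e. its size.
std::streamoff file_size(const char* filename)
{
   std::ifstream fin(filename, std::ios::in | std::ios::binary | std::ios::ate);
   if (!fin.is_open()) return -1;
   return fin.tellg();
}
```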