Data input and datatypes

Including libraries#

We can stand on the shoulders of code that others have written by including libraries. In other words, reusable code is packaged into libraries. In C, a library typically comes with one or more header files; the C standard library has a few dozen. We use an include directive to include a header file.

For example, the C standard library includes the standard input/output header file stdio.h, which declares functions like scanf and printf. To make these functions available to us, we write:

#include <stdio.h>

include directive

a line starting with #include, which the preprocessor replaces with the content of the specified file

Libraries are typically included at the beginning of a file. This way the code in the included library is available for the whole file.

standard library

The library made available across implementations of a programming language, e.g., C standard library and Python standard library.

The standard library is like the Swiss Army knife of a programming language. If you need some functionality and it is already available in the standard library, you should prefer the functions in the standard library over implementing it yourself.

Tip

You can find a link to the standard library for browsing in the navigation bar.

Activity 6

Browse the C standard library.

How many header files are there?
Which header file/s could be interesting for:
- generating random numbers
- calculating the length of an English word that we store in memory?
(optional) Compare the C and Python standard libraries regarding the functionality they provide.

Process:

3 min research
We review

For example using <time.h>:

#include <stdio.h>
#include <time.h>

int main() {
  time_t now = time(NULL);
  printf("Now: %s", ctime(&now));
}

Data input & output on the console#

scanf: inputs data by scanning from the console
printf: outputs data by printing to the console

f stands for formatted. These functions use a format string. A format string can contain one or more format specifiers.

https://upload.wikimedia.org/wikipedia/commons/thumb/2/2c/Printf.svg/525px-Printf.svg.png — Fig. 10 How `printf` statement generates its output. The red string is the *format string*. A word beginning with `%` is a *format specifier*. Each argument is paired with a format specifier. Text below shows the final string generated by the format string.
CC BY-SA 3.0. By Surachit. Source: Wikimedia Commons

format string: a string containing placeholders, called format specifiers, which describe how variable text should be displayed.

format specifier

a placeholder that follows the format %[parameter][flags][width][.precision][length]type, which describes how data read from a variable should be displayed.

The format string is a template language, e.g.: “%s is %d years old.” includes a string (hence the %s format specifier) and a decimal integer (%d), which could represent a person’s name and age. This template can then be filled with data (printf) or used to parse data from a sentence in this format (scanf).

parser

a software component that takes input data (typically text) and builds a data structure

#include <stdio.h>
#define MAX_WORD_LENGTH 120

char word[MAX_WORD_LENGTH];
int number;

int main() {
  printf("Enter a word and press Enter: ");
  scanf("%s", word);
  // %s stands for string
  // scanf reads the input into the character array `word`

  printf("Your message was: %s\n", word);
  // This time %s is used to output the string

  printf("Enter an integer number: ");
  scanf("%d", &number);
  // %d stands for decimal integer
  // Use `&` if you read the input into a non-array.
  // `&` means *get-the-address* of the variable (instead of only the value)

  printf("The square of %d is: %d\n", number, number * number);
  
  puts("Bye!");
  // If you don't need to output a variable and need a newline (`\n`), use
  // `puts`. `puts` automatically includes a newline.
}

decimal

a number that uses the base ten.

The format string is a markup language itself. To see what is possible, refer here.

Tip

%d and %i have the same meaning in printf, but I recommend using %d.

Activity 7 (Using a format specifier)

You want to print a column of prices in DKK as in the aligned version below, not as in the not-aligned one:

aligned

 13 DKK
140 DKK
900 DKK
  2 DKK
  0 DKK

not-aligned

13 DKK
140 DKK
900 DKK
2 DKK
0 DKK

You expect the prices up to ~900 DKK. Which format specifiers and format strings would you use in the following code?

You can use the syntax here. [] denotes optional elements in the syntax string.

#include <stdio.h>
#define PRICE_COUNT_MAX 10

int product_prices[PRICE_COUNT_MAX] = {13, 140, 900, 2}; // remaining elements are 0
int main(void) {
  for (int i = 0; i < PRICE_COUNT_MAX; ++i)
    // YOUR CODE HERE
  // Use as starting point:
  // printf("%d: %d", i, product_prices[i]);
}

Data types#

A computer works with zeroes and ones. These can be interpreted in different ways:

For example, 1110:

four Boolean values (a Boolean can be true or false): true, true, true, false.
an integer: -2 (uses two’s complement)
an unsigned integer: 14
a floating point number: \(-1 \cdot 2^{-2} = -0.25\) (its point can float). In this example the point is shifted by two places. The first 1 is the sign.
as a letter: 1110 binary corresponds to 14 in decimal, so the fourteenth letter: N
as two letters: third (11) and second (10) letters in sequence: CB

Note that C does not have a built-in type for four bits, so these are just example interpretations and not how C interprets 1110.

Some C types that are similar to the interpretations above:

array of bools
int
unsigned int
double (more precise version of float)
char
array of char

The main datatype that needs the fewest bits is bool, yet even a bool occupies a whole byte and pads seven bits with zeroes, i.e., in 0000_0001, the seven bits to the left of the 1 are not used. The reason lies in the architecture of most computers: the smallest unit of memory a computer can address is one byte.

The next larger main datatype after bool is char, which must have at least 8 bits. So let us see some example interpretations of a byte in C:

Code

#include <stdio.h>
char data = 0b1000'0110; // Binary literal with a digit separator (`'`) for readability; both need C23

int main() {
  printf("byte count: %zu\n", sizeof data); // sizeof yields a size_t, hence %zu
  printf("as int: %hhd\n", data);
  printf("as unsigned: %hhu\n", data);
  printf("as char: %c\n", data);
}

Output

byte count: 1
as int: -122
as unsigned: 134
as char: �

string: a sequence of chars that ends with a null character

null character

a non-visible character with the value zero

We define a string using double quotes (") and a char using single quotes ('). If we use ", then a null character is automatically added to the end of the letters we write. When printf reads a string, it reads character by character, until a null character is reached.

Both variables generate the same output:

#include <stdio.h>

char msg_as_str[] = "bye!";
char msg_as_array_of_chars[] = {'b', 'y', 'e', '!', '\0'};

int main() {
  puts(msg_as_str);
  puts(msg_as_array_of_chars);
}

Refer to this table about main datatypes for:

all main types
their format specifiers

Question to ponder

Select datatypes for the following variables:

human height in meters
number of people in a city
door is open or not
price of a product in EUR
capacity of a modern hard-disk in bytes
gender: diverse, female, male

Arguments and return value in functions#

These act like input and output to a function.

        flowchart LR
  x[argument/s]:::dashed <--> B[function]
  B -- return value --> y:::hidden

  classDef hidden display: none;
  classDef dashed stroke-dasharray:5;

Functions can have zero, one or many arguments and at most one return value. Usually arguments are used as input and the return value as output, but it is also possible to pass addresses instead of values. If we pass an address, the function can use that argument to output values.

Examples:

function	arguments	return value
`printf("Hej Geko 👋");`	`"Hej Geko 👋"`: string	13: int (9+4 bytes written)
`scanf("%d %u", &n1, &n2);`	`"%d %u"`: format string: string, `&n1`: address of an int, `&n2`: address of an unsigned	number of successfully matched items, e.g., 2: int
`rand();`	none	random number: int

&n1 means the address of the variable n1.

If we use a function in a wrong way, we get an error:

int matched_item_count = scanf();

Output of clangd in the editor:

Too few arguments to function call, at least argument '__format' must be specified

Fortunately, our IDE already alerts us while we write the code.

Conversion between datatypes#

printf converts many datatypes to strings, and scanf converts in the opposite direction. We can also convert a non-string datatype to another non-string datatype:

#include <stdio.h>
#include <stdlib.h>

int main() {
  // Implicit conversion (safe, no data loss)
  // Small numeric types → larger numeric types
  int wholeNumber = 42;
  double bigNumber = wholeNumber; // implicit
  printf("%f\n", bigNumber);      // 42.000000

  // Implicit conversion – we may lose data
  double piDouble = 3.14159;
  int piInt = piDouble; // truncates to 3 (we lost data)
  printf("%d\n", piInt);

  // String to numbers
  char temperature_str[] = "23.45";
  double temperature = strtod(temperature_str, NULL);
  printf("%f\n", temperature); // 23.450000

  char age_str[] = "23";
  unsigned age = strtoul(age_str, NULL, 10);
  printf("%u\n", age); // 23
}

Question to ponder

Why do we use both strings and numeric datatypes? Can’t we just use only strings or only numeric datatypes?

Math expressions#

int a = 5;
int b = 2;
int sum       = a + b;        // 7
int diff      = a - b;        // 3
int product   = a * b;        // 10
int quotient  = a / b;        // 2
int remainder = a % b;        // 1

double x = 5.0;
double y = 2.0;
double quotient2 = x / y;         // 2.5

Question to ponder

Come up with an example where any of the math expressions would be useful in the Conveyor Belt Capacity Check problem.

Using variables and constants#

// First define a variable
int age = 30;

// Then use it (`age`)
if (age > 30)
  printf("Crisis?");

// Constants 
// Convention: CAPITAL letters
const double PI = 3.1415926535;

We don’t have to use const for functionality, but we humans make mistakes and this extra information lowers the chance of a mistake later. The compiler will report an error if we try to change the constant.

Program structure#

main() is where your program starts.

#include <stdio.h>

// Variables
int time;

int main() {
  puts("Welcome to the time machine 👋");
  time += 10;
  printf("Now the time is: %d\n", time);
}

Back to the problem#

Activity 8

Now return to the problem in Activity 5 and try again.

Steps:

10 min pair-programming
we review

Appendix#

fgets can also read input from the console. I chose to introduce scanf first because it can convert strings to other datatypes.
scanf may overflow the buffer while scanning input if the user types more characters than the buffer can hold. A safer practice is to use fgets, which reads a limited number of characters, and then use sscanf on the buffer. We will introduce security aspects later.
