Filter CSV by age

Demonstration of the program. Warning: `cat` is not available in PowerShell. I used it to show the contents of files; you can view them in your editor instead. I mistyped some of the `cat` commands and did not run them. Moreover, the usage string shown at the beginning of the demonstration is incorrect; it is fixed in the template below.

Implement a program that filters people by age from a dataset.

Input data is a comma-separated values (CSV) file. If there is a row which does not contain two columns, then output an error message.

Comma-separated values

A plain-text data format for storing tabular data, in which each record is one line and the values of a record are separated by commas.
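
The two-column rule can be checked before any parsing takes place. The following is only a minimal sketch: `count_columns` is a hypothetical helper that is not part of the template, and it assumes that values never contain a quoted comma.

```c
#include <stddef.h>

// Hypothetical helper: counts the comma-separated columns in one record.
// An empty record (or a lone newline) counts as zero columns.
size_t count_columns(const char *record) {
  if (record[0] == '\0' || record[0] == '\n') {
    return 0;
  }
  size_t columns = 1;
  for (const char *p = record; *p != '\0'; ++p) {
    if (*p == ',') {
      ++columns;
    }
  }
  return columns;
}
```

A row such as `Aeral Körn` yields one column under this sketch, so it would trigger the error message.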

Milestones:

  1. Begin by processing people-with-age.csv line by line. If you encounter a blank line, warn the user, but continue processing and print the non-blank lines.

    ./main.exe 15 people-with-age.csv
    
  2. Parse one line into tokens and filter by age

  3. Recognize the three different malformed lines

    1. Whole line missing

    2. Age missing (no token)

    3. Age not recognized (token cannot be converted to a number)

  4. Support output to a file

    ./main.exe 15 people-with-age.csv out.csv
    
  5. Also support reading the input from stdin. The first form below is PowerShell syntax; the `echo` form also works in POSIX shells:

    "Anna, 12" | ./main.exe 15
    
    echo "Anna, 12" | ./main.exe 15
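
Milestones 2 and 3 can be combined into one classification step. The sketch below is one possible approach, not the reference solution: the `line_status` enum and the `classify_line` function are hypothetical names, and the buffer size of 100 merely mirrors `LINE_MAX` from the template.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical result codes, one per case from milestone 3 plus success.
enum line_status { LINE_OK, LINE_BLANK, LINE_AGE_MISSING, LINE_AGE_INVALID };

// Classifies one input line and stores the parsed age in `*age` on success.
// Works on a private copy because `strtok` overwrites delimiters with '\0'.
enum line_status classify_line(const char *line, unsigned *age) {
  char copy[100]; // assumption: same size as the template's LINE_MAX
  snprintf(copy, sizeof copy, "%s", line);

  if (copy[0] == '\0' || copy[0] == '\n') {
    return LINE_BLANK; // whole line missing
  }

  char *name = strtok(copy, ",");    // first token: the name
  char *age_str = strtok(NULL, ","); // second token: the age, or NULL
  (void)name;                        // the name is not needed for filtering
  if (!age_str) {
    return LINE_AGE_MISSING; // no second token
  }
  if (sscanf(age_str, "%u", age) != 1) {
    return LINE_AGE_INVALID; // token is not a number
  }
  return LINE_OK;
}
```

Copying the line first keeps the original text intact, so the caller can still forward it unchanged to the output.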
    

people-with-age.csv:

Nuvi Våle, 18

Aeral Körn
Lumio Satō, 29
Veski Ruañ, 12

The output file must not contain any error messages.
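
One way to keep error messages out of the output is to write them to a separate stream such as `stderr`. The sketch below illustrates this for the blank-line warning of milestone 1. The name `copy_nonblank_lines`, its three-stream signature, and the wording of the message are assumptions made for illustration, not part of the template.

```c
#include <stdio.h>

// Hypothetical helper: copies non-blank lines from `in` to `out` and reports
// blank lines on `err`. Returns the number of blank lines seen.
size_t copy_nonblank_lines(FILE *in, FILE *out, FILE *err) {
  char line[100]; // assumption: same size as the template's LINE_MAX
  size_t line_no = 0, blank_count = 0;

  while (fgets(line, sizeof line, in)) {
    ++line_no;
    if (line[0] == '\n') {
      // The warning goes to `err`, so `out` stays free of messages.
      fprintf(err, "Line %zu is blank.\n", line_no);
      ++blank_count;
      continue;
    }
    fputs(line, out);
  }
  return blank_count;
}
```

Calling it as `copy_nonblank_lines(istream, ostream, stderr)` would keep warnings visible on the terminal even when the output is redirected to a file.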

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_MAX 100
#define DELIM "," // CSV delimiter

char *ifile, *ofile;
unsigned filter_age_max;
FILE *istream, *ostream;

const char USAGE[] =
    R"(Filters CSV rows, keeping only those with an age of at most max-age
%1$s max-age [input-file] [output-file]

Examples:
%1$s 17 input.csv output.csv
%1$s 10 input.csv            (outputs to stdout)
%1$s 54                      (inputs from stdin, outputs to stdout)
)";

void filter_stream(FILE *istream, FILE *ostream) {
  char line[LINE_MAX];
  char *fgets_return;
  char *name, *age_str;
  size_t line_no = 0;

  while (
      // Read a line from `istream` and assign the return value to
      // `fgets_return`
      // YOUR CODE HERE
  ) {
    ++line_no;

    if (fgets_return && *fgets_return != '\n') {
      // Assign `name` and `age_str` using `strtok`
      // YOUR CODE HERE

      // Alternative to strtok:
      // sscanf(line, "%*[^,],%u", &age);

      if (!age_str) {
        // Error message
        // YOUR CODE HERE
        continue;
      }
    } else {
      // Error message
      // YOUR CODE HERE
      continue;
    }

    // Age processing
    unsigned age;
    // `%u` matches the `unsigned` destination; `%d` would expect an `int *`
    auto recognized_count = sscanf(age_str, "%u", &age);
    if (recognized_count == 1) {
      if (age <= filter_age_max) {
        // Forward input line to `ostream`
        // YOUR CODE HERE
      }
    } else {
      // Error message
      // YOUR CODE HERE
    }
  }
}

int main(int argc, char *argv[]) {
  switch (argc) {
  case 4:
    // max-age ifile ofile
    ofile = argv[3];
    // fall through
  case 3:
    // max-age ifile
    ifile = argv[2];
    // fall through
  case 2:
    // max-age
    // `sscanf` returns EOF on empty input, so compare against 1
    if (sscanf(argv[1], "%u", &filter_age_max) != 1) {
      puts("First argument is not an age.");
      exit(EXIT_FAILURE);
    }
    break;
  default:
    printf(USAGE, argv[0]);
    return EXIT_SUCCESS;
  }

  if (ifile) {
    // Open `ifile` and assign it to `istream`
    // YOUR CODE HERE

    // Exit program with an error message if file cannot be opened
    // YOUR CODE HERE
  } else {
    // Set `istream` if no file provided
    // YOUR CODE HERE
  }

  if (ofile) {
    // Open `ofile` and assign it to `ostream`
    // YOUR CODE HERE

    // Exit program with an error message if file cannot be opened
    // YOUR CODE HERE
  } else {
    // Set `ostream` if no file provided
    // YOUR CODE HERE
  }

  filter_stream(istream, ostream);
}
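
The two stream-selection blocks in `main` follow the same pattern: open the file if a name was given, otherwise fall back to a standard stream. A minimal sketch of that pattern follows; `open_or_default` is a hypothetical helper and not part of the template.

```c
#include <stdio.h>

// Hypothetical helper: opens `path` with `mode`, or returns `fallback`
// (for example `stdin` or `stdout`) when no path was given.
// Returns NULL if the file cannot be opened; the caller reports the error.
FILE *open_or_default(const char *path, const char *mode, FILE *fallback) {
  if (!path) {
    return fallback;
  }
  return fopen(path, mode);
}
```

Under this sketch, `main` could call `open_or_default(ifile, "r", stdin)` and `open_or_default(ofile, "w", stdout)`, then print an error message and exit with `EXIT_FAILURE` if either call returns `NULL`.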

Requirements:

  • Deliver a high-level flowchart that shows at least the line-by-line processing. It does not need to show how each error is recognized.

Some concepts used from previous chapters: