Building Your Own wc Tool in C#: A Code Challenge

The Unix wc (word count) tool is a classic command-line utility that counts the number of lines, words, and characters in a text file. In this post, we'll work on a minimalistic version of this tool using C#.

For this project, let's favor clarity over optimization so that even beginners can follow along.

Introduction

The wc tool is quite useful for analyzing text files in the terminal. It provides simple statistics:

  • Number of lines
  • Number of words
  • Number of characters
  • Number of bytes

This blog post is based on a code challenge where the goal is to recreate the functionality of the wc tool in your language of choice.

Below, we’ll build a minimal version of wc using C#.

Step-by-Step Implementation

First, let's break down what we'll need.

  1. Reading the file: We’ll use a stream to read the contents of the file.
  2. Counting lines, words, and characters: We'll write simple methods to count these.
  3. Handling command-line arguments: We'll add options to count specific things (lines, words, bytes, etc.).

Here’s the code for our minimal wc tool:

using System;
using System.IO;
using System.Linq;
using static System.Console;

if (args.Length == 0)
{
    PrintUsage();
    return;
}

var parsedArgs = ArgumentsParser.Parse(args);

try
{
    using var reader = new StreamReader(parsedArgs.FilePath);
    long lineCount = 0, wordCount = 0, charCount = 0, byteCount = 0;

    string? line;
    while ((line = reader.ReadLine()) != null)
    {
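        // ReadLine strips the line terminator, so we add 1 back for it below.
        // This assumes every line, including the last, ends with a single '\n'.
        if (parsedArgs.CountLines) lineCount++;
        if (parsedArgs.CountCharacters) charCount += line.Length + 1; // UTF-16 code units + newline
        if (parsedArgs.CountWords) wordCount += CountWords(line);
        if (parsedArgs.CountBytes) byteCount += System.Text.Encoding.UTF8.GetByteCount(line) + 1; // UTF-8 bytes + newline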
    }

    if (parsedArgs.CountLines) Write($"{lineCount} ");
    if (parsedArgs.CountWords) Write($"{wordCount} ");
    // Print characters before bytes to match the order documented in the usage text
    if (parsedArgs.CountCharacters) Write($"{charCount} ");
    if (parsedArgs.CountBytes) Write($"{byteCount} ");
    WriteLine(parsedArgs.FilePath);
}
catch (FileNotFoundException)
{
    WriteLine($"Error: The file '{parsedArgs.FilePath}' does not exist.");
}

static long CountWords(string line)
{
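    // Split on any whitespace (spaces, tabs, ...) and count non-empty entries.
    // Passing a null separator array tells Split to use whitespace as the delimiter.
    var words = line.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);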
    return words.Length;
}

static void PrintUsage()
{
    const string usage_helper = @"Usage: ccwc [OPTION]... FILE
Print newline, word, and byte counts for FILE.  A word is a non-zero-length
sequence of characters delimited by white space.

The options below may be used to select which counts are printed, always in
the following order: newline, word, character, byte.
  -c, --bytes            print the byte counts
  -m, --chars            print the character counts
  -l, --lines            print the newline counts
  -w, --words            print the word counts

DISCLAIMER: This is a minimal clone of the famous wc (word count) command-line tool.";
    WriteLine(usage_helper);
}

And here is a very simple ArgumentsParser, together with the record that holds the parsed options:

public record class CWArgument(
    string FilePath, 
    bool CountBytes, 
    bool CountWords, 
    bool CountLines, 
    bool CountCharacters);

public static class ArgumentsParser
{
    /*
     * -c : number of bytes
     * -l : number of lines
     * -w : number of words
     * -m : number of characters
     * Default options - equivalent to -c -l -w
     */
    public static CWArgument Parse(string[] arguments)
    {
        var countB = arguments.Contains("-c") || arguments.Contains("--bytes");
        var countW = arguments.Contains("-w") || arguments.Contains("--words");
        var countL = arguments.Contains("-l") || arguments.Contains("--lines");
        var countM = arguments.Contains("-m") || arguments.Contains("--chars");
        var path = arguments[^1];

        var defaults = !countB && !countW && !countL && !countM;
        if (defaults) return new CWArgument(path, true, true, true, false);

        return new CWArgument(path, countB, countW, countL, countM);
    }
}
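
One thing to keep in mind: the parser takes the last argument as the file path, so running the tool with only options (for example, ccwc -l) would treat -l as the path. Below is a minimal sketch of one possible guard. The helper name TryGetFilePath is hypothetical and not part of the code above, and it assumes a path never starts with a dash.

```csharp
using System;

// Hypothetical helper (not in the original parser): returns the file path,
// or null when the last argument looks like an option rather than a path.
// Assumption: a real file path never starts with '-'.
static string? TryGetFilePath(string[] arguments)
{
    if (arguments.Length == 0) return null;
    var last = arguments[^1];
    return last.StartsWith('-') ? null : last;
}

Console.WriteLine(TryGetFilePath(new[] { "-l", "notes.txt" }) ?? "(no file)"); // prints "notes.txt"
Console.WriteLine(TryGetFilePath(new[] { "-l", "-w" }) ?? "(no file)");        // prints "(no file)"
```

With a check like this, the program could print the usage text instead of trying to open a file named -w.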

How It Works

  1. File Handling: We use a StreamReader to read the file line by line, so only one line needs to be held in memory at a time, even for large files.
  2. Counting Options: We handle the different counting options using the CWArgument record and the ArgumentsParser class. The parser processes the command-line arguments to determine what should be counted (bytes, words, lines, or characters). By default, it counts bytes, words, and lines, unless specified otherwise.
  3. Counting Lines, Words, and Characters: Depending on the options provided, we count the number of lines, words, characters, and bytes.
  4. Command-Line Arguments: The file path and options (e.g., -c for bytes) are passed as command-line arguments. The program parses them to determine which statistics to display.
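
The line-by-line loop is easy to follow, but it counts the final line even when it has no trailing newline, and it adds one character for a newline that may not be there. The original wc counts newline characters instead. As a rough sketch of that alternative (the helper name CountAll is an assumption, not part of the program above), you could read one character at a time. The sample string also shows why -m and -c can differ: é is one character but two bytes in UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the program above): counts newline
// characters, whitespace-delimited words, and UTF-16 characters in one pass.
static (long Lines, long Words, long Chars) CountAll(TextReader reader)
{
    long lines = 0, words = 0, chars = 0;
    bool inWord = false;
    int next;
    while ((next = reader.Read()) != -1)
    {
        char current = (char)next;
        chars++;
        if (current == '\n') lines++;

        // A word starts at the first non-whitespace character after whitespace.
        if (char.IsWhiteSpace(current)) inWord = false;
        else if (!inWord) { inWord = true; words++; }
    }
    return (lines, words, chars);
}

// Two lines of text, but only one newline: the last line is not terminated.
const string sample = "one two\nthr\u00e9e";
var (lineTotal, wordTotal, charTotal) = CountAll(new StringReader(sample));
int byteTotal = Encoding.UTF8.GetByteCount(sample);
Console.WriteLine($"{lineTotal} {wordTotal} {charTotal} {byteTotal}"); // prints "1 3 13 14"
```

For the same input, the ReadLine-based loop would report 2 lines and 14 characters, so pick whichever behavior matches what you want the tool to mean by "line".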

Usage

To build and run the program from the command line, use dotnet run. Everything after the -- separator is passed to the program itself:

dotnet run -- -l -w -c <path_to_your_file>

For example:

dotnet run -- -l -w -c example.txt

The output will show the selected statistics based on the arguments passed. For example:

5 15 78 example.txt

This means the file example.txt has 5 lines, 15 words, and 78 bytes.

Create a runnable executable

To do so, all you need is:

dotnet publish -c Release -o ../output

Then check the newly created output folder.

Special Mention

This implementation is a minimalist solution to the Build Your Own wc Tool challenge.

If you'd like to see a more refined version, including optimizations and additional features, check out my GitHub repository: Rmauro.CommandLines.WordCount.

Conclusion

We’ve built a minimalist version of the Unix wc tool in C# using just a few lines of code.

By adding argument parsing, we made the tool customizable based on the user’s needs.

This implementation is a great starting point for building command-line utilities and getting comfortable with file handling and argument parsing in C#.

You can experiment further by adding features or improving performance, but this version should give you a solid foundation.

Happy coding! 😎
