linguini1/text-machines

Text Machines

A silly interface for creating state machines that act on ASCII text inputs.

To read documentation about the provided text machines, see this document.

Building

This project is primarily a library of text machine operations and the necessary tools to write your own. However, it does come with a few examples you can run in main.c.

To build and run the executable, simply use

$ make
$ ./tmac

Testing

To run the test cases and view their results in the console, use

$ make check

Individual test suites can be run with

$ make tests/test_name.test

The target name is the path to the test's source file with the .c suffix replaced by .test.

Each test case has its own in-code documentation which describes what the test case is testing. The test suite collections have a top-level comment at the beginning of the file to describe the overall test suite.

To add your own tests, just create a new file in the tests/ directory and fill it with test cases. You can use the other test cases as examples. It will automatically be executable using the make commands above.

Structure

The core of the library is the following structure:

struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

which represents a text machine.

The next function takes in the text machine itself as its sole parameter and must output the next character that is generated by the text machine.

The destroy function takes in the text machine itself as its sole parameter and performs the tear-down necessary to destroy a machine and release its memory. This field can be left NULL to perform the default tear-down, which amounts to a free() call on the machine reference.

The priv member allows the structure to hold private malloc'd internals. This makes it possible for text machines to hold state in between each next() call, which makes for complex behaviour. These text machines are meant to be state machines under the hood, inspired by the jumbler project.
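As a rough illustration of how the three members fit together, here is a minimal sketch of a custom generative machine that cycles through the digits 0 to 9. It redeclares struct txtmac exactly as shown above so that it compiles on its own; the real header may differ. The names digit_state, example_digits_init and example_destroy are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the structure shown above; the real header may differ. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* Hypothetical private state: the digit to emit on the next call. */
struct digit_state {
    int digit;
};

/* Emits '0', '1', ..., '9', then wraps back around to '0'. */
static char digits_next(struct txtmac *self) {
    struct digit_state *st = self->priv;
    char out = (char)('0' + st->digit);
    st->digit = (st->digit + 1) % 10;
    return out;
}

/* Custom tear-down: release the private state, then the machine itself. */
static void digits_destroy(struct txtmac *self) {
    free(self->priv);
    free(self);
}

/* Hypothetical constructor. Returns NULL if allocation fails. */
static struct txtmac *example_digits_init(void) {
    struct txtmac *tm = malloc(sizeof(*tm));
    struct digit_state *st = malloc(sizeof(*st));
    if (tm == NULL || st == NULL) {
        free(tm);
        free(st);
        return NULL;
    }
    st->digit = 0;
    tm->next = digits_next;
    tm->destroy = digits_destroy;
    tm->priv = st;
    return tm;
}

/* Hypothetical helper applying the documented rule: a NULL destroy
 * member means the default tear-down, a plain free() of the machine. */
static void example_destroy(struct txtmac *tm) {
    assert(tm != NULL);
    if (tm->destroy != NULL) {
        tm->destroy(tm);
    } else {
        free(tm);
    }
}
```

Because this machine owns a malloc'd priv block, it supplies its own destroy function. A machine with no private allocation could leave destroy as NULL and rely on the default.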

Chaining

The idea behind text machines is that they can be "chained" together. That is, one text machine may act on the output of another text machine to create a pipeline of text transformations. For this to work, there must be an input to act on. That is why there are two kinds of text machines in this implementation:

  • Generative machines: do not need an input to work, they generate an output
  • Filtering machines: apply some transformation on an input
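The difference between the two kinds can be sketched as follows. This is only an illustration: struct txtmac is redeclared from the Structure section, and example_str_init, example_title_init and title_state are hypothetical names, not library functions. The sketch also assumes that a machine signals the end of its output by returning EOF converted to char, and that a filter owns the machine it wraps.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Mirrors the structure from the Structure section; the real header may differ. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* Generative machine (hypothetical): emits a string, then EOF forever.
 * priv is used directly as the read cursor, so nothing extra is allocated. */
static char str_next(struct txtmac *self) {
    const char *pos = self->priv;
    if (*pos == '\0') return (char)EOF;
    self->priv = (void *)(pos + 1);
    return *pos;
}

static struct txtmac *example_str_init(const char *text) {
    struct txtmac *tm = malloc(sizeof(*tm));
    if (tm == NULL) return NULL;
    tm->next = str_next;
    tm->destroy = NULL; /* default tear-down: a plain free() of the machine */
    tm->priv = (void *)text;
    return tm;
}

/* Filtering machine (hypothetical): upper-cases the first letter of each word. */
struct title_state {
    struct txtmac *upstream; /* machine whose output is transformed */
    int in_word;             /* nonzero while inside a run of letters */
};

static char title_next(struct txtmac *self) {
    struct title_state *st = self->priv;
    char c = st->upstream->next(st->upstream);
    if (c == (char)EOF) return c;
    if (!isalpha((unsigned char)c)) {
        st->in_word = 0;
        return c;
    }
    if (st->in_word) return c;
    st->in_word = 1;
    return (char)toupper((unsigned char)c);
}

/* Assumption: the filter owns its upstream and tears it down as well. */
static void title_destroy(struct txtmac *self) {
    struct title_state *st = self->priv;
    if (st->upstream->destroy != NULL) {
        st->upstream->destroy(st->upstream);
    } else {
        free(st->upstream);
    }
    free(st);
    free(self);
}

/* upstream must not be NULL. Returns NULL if allocation fails. */
static struct txtmac *example_title_init(struct txtmac *upstream) {
    assert(upstream != NULL);
    struct txtmac *tm = malloc(sizeof(*tm));
    struct title_state *st = malloc(sizeof(*st));
    if (tm == NULL || st == NULL) {
        free(tm);
        free(st);
        return NULL;
    }
    st->upstream = upstream;
    st->in_word = 0;
    tm->next = title_next;
    tm->destroy = title_destroy;
    tm->priv = st;
    return tm;
}
```

Wrapping example_str_init("hi there") in example_title_init yields a machine whose successive next() calls spell out "Hi There". The in_word flag is the state carried between calls.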

To make things easy to use, a couple of "adapter" text machines exist. One such machine is the file text machine, which can be used as follows:

FILE *file = fopen("someinputfile.txt", "r");
if (file == NULL) {
    fprintf(stderr, "Couldn't open file: %s\n", strerror(errno));
    return -1;
}
struct txtmac *input = minit_file(file);

Calling the next method on this text machine reads characters from the input file one by one (with built-in buffering).
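To give a feel for what "built-in buffering" can mean here, the following sketch shows one way a file-backed machine could be written. It is not the implementation behind minit_file. The names filebuf_state, example_file_init and EXAMPLE_BUFSIZE are hypothetical, the 4096-byte buffer size is arbitrary, and the sketch assumes that the caller keeps ownership of the FILE and that end of input is reported as EOF converted to char.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Mirrors the structure from the Structure section; the real header may differ. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

#define EXAMPLE_BUFSIZE 4096 /* arbitrary size chosen for this sketch */

struct filebuf_state {
    FILE *file;  /* borrowed: the caller is responsible for fclose() */
    size_t len;  /* number of valid bytes currently in buf */
    size_t pos;  /* index of the next byte to hand out */
    char buf[EXAMPLE_BUFSIZE];
};

/* Hands out one buffered byte per call, refilling with a single fread()
 * whenever the buffer runs dry. */
static char filebuf_next(struct txtmac *self) {
    struct filebuf_state *st = self->priv;
    if (st->pos == st->len) {
        st->len = fread(st->buf, 1, sizeof(st->buf), st->file);
        st->pos = 0;
        if (st->len == 0) return (char)EOF; /* end of file or read error */
    }
    return st->buf[st->pos++];
}

/* Releases the buffer and the machine, but leaves the FILE open. */
static void filebuf_destroy(struct txtmac *self) {
    free(self->priv);
    free(self);
}

/* file must not be NULL. Returns NULL if allocation fails. */
static struct txtmac *example_file_init(FILE *file) {
    assert(file != NULL);
    struct txtmac *tm = malloc(sizeof(*tm));
    struct filebuf_state *st = malloc(sizeof(*st));
    if (tm == NULL || st == NULL) {
        free(tm);
        free(st);
        return NULL;
    }
    st->file = file;
    st->len = 0;
    st->pos = 0;
    tm->next = filebuf_next;
    tm->destroy = filebuf_destroy;
    tm->priv = st;
    return tm;
}
```

With this layout, most next() calls only touch memory, and the file is read once per EXAMPLE_BUFSIZE characters.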

To create a text-machine chain that capitalizes every letter in a file, you can create this function to apply to each character:

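/* Cast through unsigned char: passing a negative char to toupper() is undefined */
static char toupper_c(char c) { return (char)toupper((unsigned char)c); }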

Then write some program like this:

FILE *file = fopen("someinputfile.txt", "r");
if (file == NULL) {
    fprintf(stderr, "Couldn't open file: %s\n", strerror(errno));
    return -1;
}

struct txtmac *stream = minit_file(file);
struct txtmac *tm = minit_applicator(stream, toupper_c);

/* Prints the first 10 characters of the file in all caps */

for (int i = 0; i < 10; i++) {
    printf("%c", tm->next(tm));
}

Sinking

To get the output of your text machine(s) somewhere useful, you can use the sink API. It takes a text machine as input and lets you choose whether its output goes to a FILE stream, a stream identified by a file descriptor, or a buffer.

For example, to print the capitalized letters from the previous example to the console, you could use:

    /* ...All the setup from the previous example... */

    struct txtmac *stream = minit_file(file);
    struct txtmac *tm = minit_applicator(stream, toupper_c);

    /* Print everything to the console, stopping when EOF is hit */

    int err = sink_file(tm, stdout);
    if (err) {
        fprintf(stderr, "Encountered error: %s\n", strerror(err));
    }

All sinks stop when EOF is returned by the text machine.
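The stop-at-EOF behaviour can be pictured with a small drain loop. The sketch below is not the library's sink code: example_sink_buffer and demo_next are hypothetical, struct txtmac is redeclared from the Structure section, and the loop assumes that the end of output is signalled by EOF converted to char. Under that assumption an input byte of 0xFF cannot be told apart from the end of the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the structure from the Structure section; the real header may differ. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* Hypothetical buffer sink: copies the machine's output into buf until the
 * machine returns EOF or only the terminator slot is left, then
 * NUL-terminates. Returns the number of characters stored, not counting
 * the terminator. */
static size_t example_sink_buffer(struct txtmac *tm, char *buf, size_t cap) {
    assert(tm != NULL && buf != NULL && cap > 0);
    size_t n = 0;
    while (n + 1 < cap) {
        char c = tm->next(tm);
        if (c == (char)EOF) break;
        buf[n++] = c;
    }
    buf[n] = '\0';
    return n;
}

/* Demo machine for trying the sink: priv points at a string cursor that is
 * advanced on every call, and EOF is returned once the string is used up. */
static char demo_next(struct txtmac *self) {
    const char **pos = self->priv;
    if (**pos == '\0') return (char)EOF;
    return *(*pos)++;
}
```

For instance, given const char *text = "abc"; and a stack-allocated struct txtmac tm = { demo_next, NULL, &text };, calling example_sink_buffer(&tm, buf, sizeof buf) with an 8-byte buf stores "abc" and returns 3.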

Limitations

Internals are allocated with malloc, which doesn't give the user control over memory allocation.

Characters are processed one at a time, which can be slow. The file text machine buffers its reads so that most next() calls work directly on memory instead of making a system call, but chained text machines still pass characters along one by one, and any machine that needs to remember more than the current character must do its own buffering.
