A silly interface for creating state machines that act on ASCII text inputs.
To read documentation about the provided text machines, see this document.
This project is primarily a library of text machine operations and the necessary tools to write your own. However, it
does come with a few examples you can run in main.c.
To build and run the executable, simply use
$ make
$ ./tmac
To run the test cases and view their results in the console, use
$ make check
Individual test suites can be run with
$ make tests/test_name.test
The target name is the path of the test source file with the .c suffix replaced by .test.
Each test case has its own in-code documentation which describes what the test case is testing. The test suite collections have a top-level comment at the beginning of the file to describe the overall test suite.
To add your own tests, just create a new file in the tests/ directory and fill it with test cases. You can use the
other test cases as examples. It will automatically be executable using the make commands above.
The core of the library is the following structure:
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};
which represents a text machine.
The next function takes in the text machine itself as its sole parameter and must output the next character that is
generated by the text machine.
The destroy function takes in the text machine itself as its sole parameter and performs the tear-down necessary to
destroy a machine and release its memory. This field can be left NULL to perform the default tear-down, which amounts
to a free() call on the machine reference.
The priv member allows the structure to hold private malloc'd internals. This makes it possible for text machines to
hold state in between each next() call, which makes for complex behaviour. These text machines are meant to be state
machines under the hood, inspired by the jumbler project.
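As a rough illustration of how the three members fit together, here is a minimal sketch of a custom machine that keeps state in priv. The names minit_countdown, countdown_state and txtmac_release are hypothetical and not part of the library, and returning (char)EOF to signal the end of output is an assumption based on how the sinks are described. The struct is repeated only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <stdlib.h>

/* Same layout as the library's struct, repeated so this compiles alone. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* Hypothetical private state: the next digit to emit. */
struct countdown_state {
    int remaining;
};

/* Emits the digits start, start-1, ..., 0, then EOF on every later call. */
static char countdown_next(struct txtmac *self)
{
    struct countdown_state *st = self->priv;
    if (st->remaining < 0)
        return (char)EOF;   /* assumption: EOF marks the end of output */
    return (char)('0' + st->remaining--);
}

/* priv was malloc'd separately, so the default free() is not enough. */
static void countdown_destroy(struct txtmac *self)
{
    free(self->priv);
    free(self);
}

/* Hypothetical constructor; start must be a single digit (0..9). */
struct txtmac *minit_countdown(int start)
{
    assert(start >= 0 && start <= 9);
    struct txtmac *tm = malloc(sizeof *tm);
    struct countdown_state *st = malloc(sizeof *st);
    if (tm == NULL || st == NULL) {
        free(tm);
        free(st);
        return NULL;
    }
    st->remaining = start;
    tm->next = countdown_next;
    tm->destroy = countdown_destroy;
    tm->priv = st;
    return tm;
}

/* Hypothetical helper mirroring the documented default tear-down:
 * a NULL destroy means a plain free() on the machine. */
void txtmac_release(struct txtmac *tm)
{
    if (tm == NULL)
        return;
    if (tm->destroy != NULL)
        tm->destroy(tm);
    else
        free(tm);
}
```

A machine whose state lives entirely in the struct itself could leave destroy as NULL; this one cannot, because the default tear-down would leak the separately allocated state.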
The idea of the text machines is that they can be "chained" together. That is, one text machine may act on the output of another text machine to create a pipeline of text transformations. For this to work, there must be an input to act on. That is why there are two kinds of text machines in this implementation:
- Generative machines: do not need an input to work; they generate output on their own
- Filtering machines: apply some transformation to the output of another machine
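The two kinds can be sketched side by side. Below, a generative machine emits the characters of a string and a filtering machine collapses runs of spaces in whatever its source produces. The names minit_string and minit_squeeze are hypothetical, not library functions. Two further assumptions: a machine signals the end of its output with (char)EOF, and a filter owns its source and destroys it during its own tear-down.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <stdlib.h>

/* Same layout as the library's struct, repeated so this compiles alone. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* --- Generative machine: emits a string, then EOF forever. --- */
struct string_state { const char *cursor; };

static char string_next(struct txtmac *self)
{
    struct string_state *st = self->priv;
    if (*st->cursor == '\0')
        return (char)EOF;
    return *st->cursor++;
}

static void string_destroy(struct txtmac *self)
{
    free(self->priv);
    free(self);
}

/* Hypothetical constructor; text must outlive the machine. */
struct txtmac *minit_string(const char *text)
{
    assert(text != NULL);
    struct txtmac *tm = malloc(sizeof *tm);
    struct string_state *st = malloc(sizeof *st);
    if (tm == NULL || st == NULL) { free(tm); free(st); return NULL; }
    st->cursor = text;
    tm->next = string_next;
    tm->destroy = string_destroy;
    tm->priv = st;
    return tm;
}

/* --- Filtering machine: collapses runs of spaces into one space. --- */
struct squeeze_state { struct txtmac *src; int last_was_space; };

static char squeeze_next(struct txtmac *self)
{
    struct squeeze_state *st = self->priv;
    char c = st->src->next(st->src);
    /* Skip spaces that directly follow a space already emitted. */
    while (c == ' ' && st->last_was_space)
        c = st->src->next(st->src);
    st->last_was_space = (c == ' ');
    return c;   /* EOF from the source passes through unchanged */
}

static void squeeze_destroy(struct txtmac *self)
{
    struct squeeze_state *st = self->priv;
    /* Assumption: the filter owns its source and tears it down too. */
    if (st->src->destroy != NULL)
        st->src->destroy(st->src);
    else
        free(st->src);
    free(st);
    free(self);
}

/* Hypothetical constructor wrapping an existing machine. */
struct txtmac *minit_squeeze(struct txtmac *src)
{
    assert(src != NULL);
    struct txtmac *tm = malloc(sizeof *tm);
    struct squeeze_state *st = malloc(sizeof *st);
    if (tm == NULL || st == NULL) { free(tm); free(st); return NULL; }
    st->src = src;
    st->last_was_space = 0;
    tm->next = squeeze_next;
    tm->destroy = squeeze_destroy;
    tm->priv = st;
    return tm;
}
```

Chaining is then just nesting constructors, for example minit_squeeze(minit_string("a   b  c")). The filter never needs to know what kind of machine feeds it.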
To make things easy to use, a couple of "adapter" text machines exist. One such machine is the file text machine, which can be used as follows:
FILE *file = fopen("someinputfile.txt", "r");
if (file == NULL)
{
    fprintf(stderr, "Couldn't open file: %s\n", strerror(errno));
    return -1;
}
struct txtmac *input = minit_file(file);
Calling the next method on this text machine reads characters from the input file one by one (with built-in
buffering).
To create a text-machine chain that capitalizes every letter in a file, you can create this function to apply to each character:
/* toupper() from <ctype.h> expects a value representable as unsigned char */
static char toupper_c(char c) { return (char)toupper((unsigned char)c); }
Then write a program like this:
FILE *file = fopen("someinputfile.txt", "r");
if (file == NULL)
{
    fprintf(stderr, "Couldn't open file: %s\n", strerror(errno));
    return -1;
}
struct txtmac *stream = minit_file(file);
struct txtmac *tm = minit_applicator(stream, toupper_c);
/* Prints the first 10 characters of the file in all caps */
for (int i = 0; i < 10; i++) {
    printf("%c", tm->next(tm));
}
In order to get the output from your text machine(s) somewhere useful, you can use the sink API. This allows you to
provide a text machine as input and choose whether its output goes to a FILE stream, a file descriptor, or a buffer.
For example, to print the capitalized letters from the previous example to the console, you could use:
/* ...All the setup from the previous example... */
struct txtmac *stream = minit_file(file);
struct txtmac *tm = minit_applicator(stream, toupper_c);
/* Print everything to the console, stopping when EOF is hit */
int err = sink_file(tm, stdout);
if (err) {
    fprintf(stderr, "Encountered error: %s\n", strerror(err));
}
All sinks stop when EOF is returned by the text machine.
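To make the stopping rule concrete, here is a minimal sketch of what a buffer sink could look like. It is not the library's implementation: drain_to_buffer and literal_next are hypothetical names, and the signature is an assumption chosen for illustration. The struct is repeated only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Same layout as the library's struct, repeated so this compiles alone. */
struct txtmac {
    char (*next)(struct txtmac *self);
    void (*destroy)(struct txtmac *self);
    void *priv;
};

/* Hypothetical generative machine: priv points at a cursor into a string. */
static char literal_next(struct txtmac *self)
{
    const char **cursor = self->priv;
    if (**cursor == '\0')
        return (char)EOF;
    return *(*cursor)++;
}

/* Hypothetical sink: copies the machine's output into buf until the machine
 * returns EOF or only the terminator slot is left. The result is always
 * NUL-terminated; the return value is the number of characters stored. */
size_t drain_to_buffer(struct txtmac *tm, char *buf, size_t cap)
{
    assert(tm != NULL && buf != NULL && cap > 0);
    size_t n = 0;
    while (n + 1 < cap) {
        char c = tm->next(tm);
        if (c == (char)EOF)
            break;
        buf[n++] = c;
    }
    buf[n] = '\0';
    return n;
}
```

Because next returns a char, a data byte with the value 0xFF cannot be told apart from EOF in a loop like this. That is harmless for the ASCII inputs these machines are meant for, but it would truncate arbitrary binary data.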
Internals are allocated with malloc, which doesn't give the user control over memory allocation.
Characters are processed one at a time, which can be slow. Characters read from a file are buffered, so most reads are served from memory rather than through extra system calls, but chained text machines must still operate character by character and do their own buffering if they need to remember earlier input.