C 协程 已翻译 100%

chishaxie 投递于 2013/03/19 09:28 (共 21 段, 翻译完成于 03-26)
阅读 14834
收藏 35
7
加载中

Introduction

Structuring a large program is always a difficult job. One of the particular problems that often comes up is this: if you have a piece of code producing data, and another piece of code consuming it, which should be the caller and which should be the callee?

Here is a very simple piece of run-length decompression code, and an equally simple piece of parser code:

/* Decompression code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (c == 0xFF) {
            len = getchar();
            c = getchar();
            while (len--)
                emit(c);
        } else
            emit(c);
    }
    emit(EOF);
/* Parser code */
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        if (isalpha(c)) {
            do {
                add_to_token(c);
                c = getchar();
            } while (isalpha(c));
            got_token(WORD);
        }
        add_to_token(c);
        got_token(PUNCT);
    }

已有 1 人翻译此段
我来翻译
Each of these code fragments is very simple, and easy to read and understand. One produces a character at a time by callingemit(); the other consumes a character at a time by callinggetchar(). If only the calls toemit()and the calls togetchar()could be made to feed data to each other, it would be simple to connect the two fragments together so that the output from the decompressor went straight to the parser.

In many modern operating systems, you could do this using pipes between two processes or two threads.emit()in the decompressor writes to a pipe, andgetchar()in the parser reads from the other end of the same pipe. Simple and robust, but also heavyweight and not portable. Typically you don't want to have to divide your program into threads for a task this simple.

In this article I offer a creative solution to this sort of structure problem.

已有 1 人翻译此段
我来翻译

Rewriting

The conventional answer is to rewrite one of the ends of the communication channel so that it's a function that can be called. Here's an example of what that might mean for each of the example fragments.

int decompressor(void) {
    static int repchar;
    static int replen;
    if (replen > 0) {
        replen--;
        return repchar;
    }
    c = getchar();
    if (c == EOF)
        return EOF;
    if (c == 0xFF) {
        replen = getchar();
        repchar = getchar();
        replen--;
        return repchar;
    } else
        return c;
}
void parser(int c) {
    static enum {
        START, IN_WORD
    } state;
    switch (state) {
        case IN_WORD:
        if (isalpha(c)) {
            add_to_token(c);
            return;
        }
        got_token(WORD);
        state = START;
        /* fall through */

        case START:
        add_to_token(c);
        if (isalpha(c))
            state = IN_WORD;
        else
            got_token(PUNCT);
        break;
    }
}

Of course you don't have to rewrite both of them; just one will do. If you rewrite the decompressor in the form shown, so that it returns one character every time it's called, then the original parser code can replace calls togetchar()with calls todecompressor(), and the program will be happy. Conversely, if you rewrite the parser in the form shown, so that it is called once for every input character, then the original decompression code can callparser()instead ofemit()with no problems. You would only want to rewrite both functions as callees if you were a glutton for punishment.

已有 1 人翻译此段
我来翻译

And that's the point, really. Both these rewritten functions are thoroughly ugly compared to their originals. Both of the processes taking place here are easier to read when written as a caller, not as a callee. Try to deduce the grammar recognised by the parser, or the compressed data format understood by the decompressor, just by reading the code, and you will find that both the originals are clear and both the rewritten forms are less clear. It would be much nicer if we didn't have to turn either piece of code inside out. 

已有 1 人翻译此段
我来翻译

Knuth's coroutines

InThe Art of Computer Programming, Donald Knuth presents a solution to this sort of problem. His answer is to throw away the stack concept completely. Stop thinking of one process as the caller and the other as the callee, and start thinking of them as cooperating equals.

In practical terms: replace the traditional "call" primitive with a slightly different one. The new "call" will save the return value somewhere other than on the stack, and will then jump to a location specified in another saved return value. So each time the decompressor emits another character, it saves its program counter and jumps to the last known location within the parser - and each time the parser needs another character, it saves its own program counter and jumps to the location saved by the decompressor. Control shuttles back and forth between the two routines exactly as often as necessary.

已有 1 人翻译此段
我来翻译

This is very nice in theory, but in practice you can only do it in assembly language, because no commonly used high level language supports the coroutine call primitive. Languages like C depend utterly on their stack-based structure, so whenever control passes from any function to any other, one must be the caller and the other must be the callee. So if you want to write portable code, this technique is at least as impractical as the Unix pipe solution. 

已有 1 人翻译此段
我来翻译

Stack-based coroutines

So what we would really like is the ability to mimic Knuth's coroutine call primitive in C. We must accept that in reality, at the C level, one function will be caller and the other will be callee. In the caller, we have no problem; we code the original algorithm, pretty much exactly as written, and whenever it has (or needs) a character it calls the other function.

The callee has all the problems. For our callee, we want a function which has a "return and continue" operation: return from the function, and next time it is called, resume control from just after the return statement. For example, we would like to be able to write a function that says

int function(void) {
    int i;
    for (i = 0; i < 10; i++)
        return i;   /* won't work, but wouldn't it be nice */
}

and have ten successive calls to the function return the numbers 0 through 9.

已有 1 人翻译此段
我来翻译

How can we implement this? Well, we can transfer control to an arbitrary point in the function using agotostatement. So if we use a state variable, we could do this:

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
    }
    LABEL0: /* start of function */
    for (i = 0; i < 10; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
        LABEL1:; /* resume control straight after the return */
    }
}

This method works. We have a set of labels at the points where we might need to resume control: one at the start, and one just after eachreturnstatement. We have a state variable, preserved between calls to the function, which tells us which label we need to resume control at next. Before any return, we update the state variable to point at the right label; after any call, we do aswitchon the value of the variable to find out where to jump to.

已有 1 人翻译此段
我来翻译

It's still ugly, though. The worst part of it is that the set of labels must be maintained manually, and must be consistent between the function body and the initialswitchstatement. Every time we add a new return statement, we must invent a new label name and add it to the list in theswitch; every time we remove a return statement, we must remove its corresponding label. We've just increased our maintenance workload by a factor of two. 

已有 1 人翻译此段
我来翻译

Duff's device

The famous "Duff's device" in C makes use of the fact that acasestatement is still legal within a sub-block of its matchingswitchstatement. Tom Duff used this for an optimised output loop:

switch (count % 8) {
        case 0:        do {  *to = *from++;
        case 7:              *to = *from++;
        case 6:              *to = *from++;
        case 5:              *to = *from++;
        case 4:              *to = *from++;
        case 3:              *to = *from++;
        case 2:              *to = *from++;
        case 1:              *to = *from++;
                       } while ((count -= 8) > 0);
    }

We can put it to a slightly different use in the coroutine trick. Instead of using aswitchstatement to decide whichgotostatement to execute, we can use theswitchstatement to perform the jump itself:

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
            case 1:; /* resume control straight after the return */
        }
    }
}

Now this is looking promising. All we have to do now is construct a few well chosen macros, and we can hide the gory details in something plausible-looking:

#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }
int function(void) {
    static int i;
    crBegin;
    for (i = 0; i < 10; i++)
        crReturn(1, i);
    crFinish;
}

(note the use ofdo ... while(0)to ensure thatcrReturndoes not need braces around it when it comes directly betweenifandelse)

已有 1 人翻译此段
我来翻译
本文中的所有译文仅用于学习和交流目的,转载请务必注明文章译者、出处、和本文链接。
我们的翻译工作遵照 CC 协议,如果我们的工作有侵犯到您的权益,请及时联系我们。
加载中

评论(8)

jander
jander
太不好用了。
Cobbage
Cobbage
顶一个
张三alpha
张三alpha
赞一个!
霡霂
霡霂

引用来自“缪斯的情人”的评论

好累人

樯橹间灰飞烟灭
chishaxie
chishaxie

引用来自“jimmyjmh”的评论

引用来自“缪斯的情人”的评论

好累人

这么勤奋肯定累啊。

你们好猛啊……
chishaxie
chishaxie

引用来自“缪斯的情人”的评论

好累人

你们好猛啊……
jimmyjmh
jimmyjmh

引用来自“缪斯的情人”的评论

好累人

这么勤奋肯定累啊。
缪斯的情人
缪斯的情人
好累人
返回顶部
顶部