Notes on structured concurrency, or: Go statement considered harmful

Every language that supports concurrent (parallel, asynchronous) computing needs a way to run code concurrently. Here are examples from different APIs:

    go myfunc();                                       // Golang
    pthread_create(&thread_id, NULL, &myfunc, NULL);   /* C with POSIX threads */
    spawn(modulename, myfuncname, [])                  % Erlang
    threading.Thread(target=myfunc).start()            # Python with threads
    asyncio.create_task(myfunc())                      # Python with asyncio

The notation and terminology vary, but the semantics are the same: run myfunc concurrently with the rest of the program and return immediately, so that the parent's flow of execution (control flow) continues.

Another option is callbacks:

    QObject::connect(&emitter, SIGNAL(event()),        // C++ with Qt
                     &receiver, SLOT(myfunc()))
    g_signal_connect(emitter, "event", myfunc, NULL)   /* C with GObject */
    document.getElementById("myid").onclick = myfunc;  // Javascript
    promise.then(myfunc, errorhandler)                 // Javascript with Promises
    deferred.addCallback(myfunc)                       # Python with Twisted
    future.add_done_callback(myfunc)                   # Python with asyncio

Again the notation varies, but every example does the same thing: from now on, if and when a certain event occurs, run myfunc. Once the callback is registered, control returns and the calling function continues. (Sometimes callbacks are wrapped in convenient combinator functions or Twisted-style protocols, but the basic idea is unchanged.)

And that's it. Take any popular general-purpose concurrency framework and you will probably find that it falls into one of these two paradigms (sometimes both, as with asyncio).

But not my new, weird Trio library. It uses neither approach. Instead, if we want to run myfunc and anotherfunc concurrently, we write something like this:

    async with trio.open_nursery() as nursery:
        nursery.start_soon(myfunc)
        nursery.start_soon(anotherfunc)


People who meet the nursery construct for the first time tend to be confused. Why is there a context manager (a with block)? What is this nursery, and why do I need one to start a task? Then they realize that nurseries prevent them from using the patterns they are used to in other frameworks, and they get annoyed. Everything seems quirky, idiosyncratic, and too high-level to be a basic primitive. These are all understandable reactions! But bear with me.

In this article, I want to convince you that nurseries are not a quirk or a fad, but a new control flow primitive, as fundamental as loops and function calls. Moreover, the approaches discussed above (spawning threads and registering callbacks) should be discarded and replaced by nurseries.

Sounds too bold? But this has happened before: goto was once widely used to control the behavior of programs. Now it is a punchline:

A few languages still have something called goto, but its capabilities are far more limited than those of the original goto. And most languages do not have it at all. What happened to it? This story is surprisingly relevant, although unfamiliar to most people because it happened so long ago. Let's remind ourselves what goto was, and then see how this can help with asynchronous programming.

Table of contents

  • What is goto?
  • What is go?
  • What happened to goto?
    • goto destroys abstraction
    • Brave new world without goto
    • No more goto
  • On the dangers of go-like statements
    • go statements break abstraction.
    • go statements break automatic resource cleanup.
    • go statements break error handling.
    • No more go
  • Nursery as a structured replacement for go
    • The nursery preserves the function abstraction.
    • Nurseries support adding tasks dynamically.
    • You can still leave the nursery.
    • You can define new types that quack like a nursery.
    • No, however: a nursery always waits for all tasks inside it to finish.
    • Automatic resource cleanup works.
    • Error propagation works.
    • Brave new world without go
  • Nurseries in practice
  • Conclusions
  • Comments
  • Acknowledgments
  • Footnotes
  • About the author

What is goto?

The first computers were programmed in assembly language, or with even more primitive mechanisms. This was not very convenient, so in the 1950s people such as John Backus at IBM and Grace Hopper at Remington Rand began developing languages such as FORTRAN and FLOW-MATIC (better known through its direct descendant, COBOL).

FLOW-MATIC was very ambitious for its time. You can think of it as Python's great-great-great-great-grandparent: it was the first language designed for people first and computers second. It looked like this:

Note that, unlike modern languages, it has no if blocks, loops, or function calls. In fact, it has no blocks or indentation at all. It is just a flat list of sequential statements. This is not because the program is too short to need control statements (other than JUMP TO): that kind of syntax simply had not been invented yet!

Instead, FLOW-MATIC had two options for controlling the flow of execution. Normally the flow was sequential: start at the top and move down, one statement at a time. But executing the special JUMP TO statement could transfer control somewhere else. For example, statement (13) jumps to statement (2):

Just as with the concurrency primitives at the beginning of the article, there was no agreement at the time on what to call this "one-way jump" operation. In the listing it is JUMP TO, but the name that stuck historically is goto (as in "go to"), so that is what I use here.

Here is the complete set of goto jumps in this small program:

If this looks confusing, you are not alone! FLOW-MATIC inherited this jump-based style of programming directly from assembly language. It is powerful and close to how the computer hardware actually works, but it is very hard to work with directly. This tangle of arrows is why the term "spaghetti code" was invented.

But why did goto cause such problems? Why are some control statements fine and others not? How do we choose the good ones? At the time this was quite unclear, and it is hard to solve a problem you do not understand.

What is go?

Let's digress from our story for a moment. Everyone knows that goto was bad, but what does that have to do with asynchrony? Look at the famous go statement from Golang, which spawns a new "goroutine" (a lightweight thread):

    // Golang
    go myfunc();

Can we draw a diagram of its control flow? It differs slightly from the diagrams above, because here the flow splits. Let's draw it like this:

The colors are meant to show that both paths are taken. From the point of view of the parent goroutine (green line), control flows sequentially: it comes in at the top and immediately goes out at the bottom. Meanwhile, from the point of view of the child (lilac line), control comes in at the top and then jumps into the body of myfunc. Unlike a regular function call, this is a one-way jump: when myfunc starts, we switch to a completely new call stack, and the runtime immediately forgets where we came from.

But this is not specific to Golang. The same diagram applies to all the primitives listed at the beginning of the article:

  • Threading libraries usually return some kind of handle object that lets you join the thread later, but that is an independent operation the language itself knows nothing about. The thread-spawning primitive itself has the control flow shown above.
  • Registering a callback is semantically equivalent to spawning a background thread (though obviously the implementation is different) that:
    a) blocks until some event occurs, and then
    b) runs the callback function.
    So, in terms of high-level control flow, registering a callback is a statement identical to go.
  • Futures and promises are the same: when you call a function and it returns a promise, it has scheduled work to happen in the background and given you a handle object to retrieve the result later (if you want to). In terms of control flow semantics, this is just like spawning a thread. After that, you register callbacks on the promise, and then the previous point applies.
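The equivalence between registering a callback and spawning a waiting thread can be made concrete with a small sketch. The helper register_callback below is hypothetical (it is not part of any library mentioned here) and uses only Python's standard threading module; real callback systems are implemented very differently, but the control flow has the same shape.

```python
import threading


def register_callback(event, callback):
    """Hypothetical helper: model callback registration as a background thread."""
    def wait_then_call():
        event.wait()   # (a) block until the event occurs
        callback()     # (b) then run the callback

    worker = threading.Thread(target=wait_then_call)
    worker.start()
    return worker      # a handle object; the caller is free to ignore it


calls = []
clicked = threading.Event()

worker = register_callback(clicked, lambda: calls.append("myfunc ran"))
# Registration has already returned, but the callback has not run yet.
calls_before_event = list(calls)

clicked.set()    # the event happens
worker.join()    # joined here only so the demo can inspect the result
```

Note that register_callback returns to its caller immediately while the new thread carries on independently, which is exactly the split in control flow described above.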

The same pattern shows up in many forms. The key similarity is that in all these cases the control flow splits: one side makes a one-way jump into the new task, while the other side returns to the caller. Once you know what to look for, you will see it everywhere! It is an interesting game (at least for a certain kind of person)!

Still, it annoys me that there is no standard name for this category of control statements. So, just as "goto" became the generic term for all goto-like statements, I will use "go" as the generic term for these. Why go? One reason is that Golang gives us a very clean example of the syntax. And the other is this:

Notice the similarity? That's right: go is a form of goto.

Asynchronous programs are notoriously hard to write and to reason about. So are programs based on goto. The problems caused by goto have mostly been solved in modern languages. If we study how goto was fixed, will that help us build more usable asynchronous APIs? Let's find out!

What happened to goto?

So what is it about goto that causes so many problems? In the late 1960s, Edsger W. Dijkstra wrote a pair of now-famous papers that made this much clearer: Go To Statement Considered Harmful and Notes on Structured Programming.

goto destroys abstraction

In these papers, Dijkstra was concerned with how we write non-trivial programs and make sure they are correct. There are many interesting points in them. For example, you have probably heard this quote:

Program testing can be used to show the presence of bugs, but never to show their absence!

Yes, that is from Notes on Structured Programming. But his main concern was abstraction. He wanted to write programs that are too big to hold in your head all at once. To do this, you have to treat parts of the program as black boxes. For example, when you see this Python program:

 print("Hello World!") 

you do not need to know the details of how print is implemented (string formatting, buffering, cross-platform differences, and so on). All you need to know is that print will somehow print the text you pass in, and then you can concentrate on what you want to do in this piece of code. Dijkstra wanted languages to support this kind of abstraction.

By this point, block syntax had been invented, and languages like ALGOL had accumulated about five different kinds of control structure. They still had sequential flow and goto:

And they had also acquired conditionals, loops, and function calls:

You can implement these higher-level constructs using goto, and that is how people used to think of them: as a convenient shorthand. But Dijkstra pointed out a big difference between goto and the other control structures. For everything except goto, the flow of control

  • comes in at the top => [something happens] => goes out at the bottom

We can call this the "black box rule": if a control structure has this shape, then whenever you do not care about the details inside, you can ignore the "something happens" part and treat the whole block as a regular sequential statement. Even better, this also holds for any code composed of such blocks. When I look at:

 print("Hello World!") 

I do not need to read the source of print and all its dependencies to understand where the flow of control will go. Maybe inside print there is a loop, and inside the loop a conditional, and inside that a call to another function... none of it matters. I know that control will flow into print, the function will do its job, and eventually control will return to the code I am reading.

But in a language with goto, a language where functions and everything else are built on top of goto and goto can jump anywhere at any time, these structures are not black boxes at all! If you have a function containing a loop, containing a conditional, containing a goto... then that goto can send control anywhere it likes. Perhaps control will suddenly return from some completely different function that you never even called! You cannot know!

And that breaks abstraction: any function might have a goto hidden inside it, and the only way to find out is to keep the entire source code of your system in your head. Once a language has goto, you can no longer reason locally about the flow of control. That is why goto leads to spaghetti code.

And once Dijkstra understood the problem, he was able to solve it. Here is his revolutionary proposal: we should stop thinking of conditionals, loops, and function calls as shorthand for goto and treat them as fundamental primitives in their own right, and we should remove goto from our languages entirely.

From the vantage point of 2018, this seems fairly obvious. But how do programmers react when you try to take away their unsafe toys? In 1969, Dijkstra's proposal seemed extremely dubious. Donald Knuth defended goto. People who had become experts at writing code with goto quite reasonably resented having to relearn how to express their ideas in new, more restrictive terms. And, of course, it required building entirely new languages.

In the end, modern languages are a little less strict about this than Dijkstra's original formulation.

Left: traditional goto. Right: domesticated goto, as in C, C#, Golang, etc. Because it cannot cross function boundaries, it can still pee on your shoes, but it is unlikely to rip you apart.

Modern languages let you jump out of several nested levels of structured control at once using break, continue, or return. But fundamentally they are all designed around Dijkstra's idea, and they can only disrupt the sequential flow of control in strictly limited ways. In particular, functions, the fundamental tool for wrapping up control flow inside a black box, are inviolable. You cannot break out of one function into another, and return can take you out of the current function but no further. Whatever control-flow trickery a function performs internally, other functions do not have to care.

Even the languages that kept a goto statement (C, C#, Golang, ...) have severely restricted it. At a minimum, they will not let you jump from the body of one function into another. Unless you are working in assembly [2], the classic, unrestricted goto is a thing of the past. Dijkstra won.

Brave new world without goto

Something interesting happened once goto disappeared: language designers were able to start adding new features that depend on control flow being structured.

For example, Python has a nice syntax for automatic resource cleanup: the context manager. You can write:

    # Python
    with open("my-file") as file_handle:
        ...  # some code

and this guarantees that the file is open while some code runs and is closed immediately afterwards: when the block finishes, with calls the context manager's __exit__() method, which releases resources such as files and connections. Most modern languages have an equivalent (RAII, using, try-with-resources, defer, ...). And they all assume that control flows in an orderly, structured way. What happens if we jump into the with block using goto? Is the file open or not? And what if we jump out instead of leaving normally? Will the file get closed? In a language with goto, context managers simply cannot work in any coherent way.
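The guarantee that with provides under structured control flow can be checked directly. The following is a minimal sketch using only the standard library; the temporary file and the variable names are illustrative assumptions, not part of the original example.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, "w") as file_handle:
        file_handle.write("hello")
        open_inside = not file_handle.closed   # open while the body runs
    closed_after = file_handle.closed          # closed after a normal exit

    try:
        with open(path) as file_handle:
            raise RuntimeError("leaving the block early")
    except RuntimeError:
        pass
    closed_after_error = file_handle.closed    # closed after an exception too
finally:
    os.remove(path)
```

Both exits from the block, the normal one and the exception, pass through the same cleanup step, which is only possible because there is exactly one structured way out of the block.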

The same goes for error handling: when something goes wrong, what should the code do? Often the answer is to pass a description of the error up the call stack to the caller and let it decide what to do. Modern languages have constructs specifically for this, such as exceptions and other forms of automatic error propagation. But this help is only available if the language has a call stack and a reliable notion of a "caller". Recall the spaghetti of control flow in the FLOW-MATIC example and imagine an exception raised in the middle of it. Where could it even go?

No more goto

So the traditional goto, the kind that ignores function boundaries, is bad for more reasons than being hard to use correctly. If that were the only problem, goto might have stayed: plenty of bad language constructs have.

But the mere presence of goto in a language makes everything harder. Third-party libraries can no longer be treated as black boxes: without reading the source, you cannot tell which functions are ordinary and which manipulate the flow of control in unpredictable ways. This is a major obstacle to reasoning locally about code. And you lose powerful features such as context managers and automatic error propagation. It is better to remove goto altogether, in favor of control structures that follow the black box rule.

On the dangers of go-like statements

So that is the story of goto. But how much of it applies to go statements? Well... basically all of it! The analogy turns out to be shockingly exact.

go statements break abstraction.

Remember how we said that if a language allows goto, then any function might be hiding a goto inside it? In most asynchronous frameworks, go statements cause the same problem: any function may, or may not, start a task in the background. The function appears to have returned, but is it still running in the background? There is no way to know without reading the source of the function and of everything it calls. And when will the background work finish? Hard to say. If you have go and its analogues, functions are no longer black boxes with respect to control flow. In my first article on asynchronous APIs, I called this "violating causality", and found that it is the root cause of many common, real-world problems in programs that use asyncio and Twisted, such as flow control problems, problems with shutting down properly, and so on.

(Flow control here means managing the rate at which data enters and leaves the program. For example, if a program receives data at 3 MB/s but sends it on at only 1 MB/s, it consumes more and more memory; see the author's other article.)
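Here is a minimal sketch of a function that looks like an ordinary call but keeps working after it returns. It uses standard Python threads as the go-like primitive; the function name and the two events are hypothetical and exist only to make the demonstration deterministic.

```python
import threading

work_finished = threading.Event()
release = threading.Event()


def innocent_looking_function():
    """Looks like an ordinary call, but quietly spawns a background thread."""
    def background():
        release.wait()          # keeps running after the caller has moved on
        work_finished.set()

    threading.Thread(target=background).start()
    return "done"               # returns immediately


result = innocent_looking_function()
# The call has returned, yet its work is not finished.
still_running_after_return = not work_finished.is_set()

release.set()                   # let the background work finish so the demo exits
work_finished.wait()
```

Nothing at the call site reveals that the function left a task behind; the only way to find out is to read its source.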

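go statements break automatic resource cleanup.

Let's look at the with statement example again:

    # Python
    with open("my-file") as file_handle:
        ...  # some code

Earlier we said that we were "guaranteed" that the file would be open while some code is running, and closed afterwards. But what if some code starts a background task? Then the guarantee is lost: operations that appear to be inside the with block may in fact keep running after the with block has ended, and then fail because the file was closed while they were still using it. And again, you cannot tell this by local inspection; to find out, you have to read the source of every function called inside some code.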
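The failure is easy to reproduce. The sketch below uses an in-memory io.StringIO as a stand-in for open("my-file") and a standard thread as the background task; the event is an assumption added purely to force the unlucky ordering every time.

```python
import io
import threading

block_exited = threading.Event()
errors = []


def background_writer(file_handle):
    block_exited.wait()              # simulate work that outlives the with block
    try:
        file_handle.write("late write")
    except ValueError as exc:        # raised because the file is already closed
        errors.append(str(exc))


with io.StringIO() as file_handle:   # in-memory stand-in for open("my-file")
    worker = threading.Thread(target=background_writer, args=(file_handle,))
    worker.start()

# The with block has exited and closed the file, but the task is still alive.
block_exited.set()
worker.join()
```

The with block did its job correctly; it simply had no way of knowing that something started inside it was still running.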

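If we want this code to work properly, we have to keep track of any background tasks ourselves and manually arrange for the file to be closed only once they have finished.

That is doable, unless we are using a library that gives us no way to find out when a task has finished (for example, because it does not expose a handle to join on). But even in the best case, unstructured control flow means the language cannot help us. We are back to implementing resource cleanup by hand.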

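go statements break error handling.

As we discussed above, modern languages provide powerful tools, such as exceptions, to make sure that errors are detected and propagated to the right place. But these tools depend on a reliable notion of "the caller of the current code". As soon as you spawn a task or register a callback, that notion is lost. As a result, the mainstream concurrency frameworks simply give up: if an error occurs in a background task and you do not handle it by hand, the runtime just drops it, perhaps after printing something to the console. Even Rust is guilty of this: if a background thread panics, Rust discards the error and hopes for the best.

Of course, you can handle errors properly in these systems, by carefully joining every thread, or by building your own error propagation mechanism such as errbacks in Twisted or Promise.catch in Javascript. But then you are writing an ad hoc, fragile reimplementation of features your language already has. You lose useful things such as tracebacks and debuggers. And it only takes one forgotten Promise.catch to start silently dropping serious errors.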
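To illustrate how an error in a background task gets lost, here is a small sketch with standard Python threads. By default Python prints the traceback to stderr and carries on; the sketch installs a recording hook through threading.excepthook (available since Python 3.8) only to keep the output quiet. The names dropped and record_instead_of_printing are hypothetical.

```python
import threading

dropped = []


def record_instead_of_printing(args):
    # Stand-in for the default behaviour (print a traceback and continue).
    dropped.append(args.exc_type.__name__)


original_hook = threading.excepthook
threading.excepthook = record_instead_of_printing


def myfunc():
    raise RuntimeError("something important went wrong")


try:
    worker = threading.Thread(target=myfunc)
    worker.start()
    worker.join()
    parent_saw_error = False    # reached: join() does not re-raise the error
except RuntimeError:
    parent_saw_error = True
finally:
    threading.excepthook = original_hook
```

Even though the parent joins the thread, the exception never reaches it; the try/except around the spawn is useless, because the error happened on a different call stack.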

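And even if you do solve all these problems, you end up with two redundant systems for doing the same thing.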


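No more go

Just as goto was the obvious primitive for the first practical high-level languages, go was the obvious primitive for the first practical concurrency frameworks: it matches how the underlying schedulers actually work, and it is powerful enough to implement any other concurrent control flow pattern. But, like goto, it breaks control flow abstractions, so merely having go available makes everything harder to use.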

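The good news is that all these problems can be solved: Dijkstra showed us how! We need to:

  • find a replacement for go statements that has similar power but follows the "black box rule",
  • build that new construct into our concurrency framework as a primitive, and leave out every form of go statement.

And that is what Trio did.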


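Nursery as a structured replacement for go

Here is the core idea: every time control splits into several concurrent paths, we want to make sure that they join up again. For example, if we want to do three things at the same time, the control flow should look something like this: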

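Note that this has just one arrow going in at the top and one coming out at the bottom, so it follows Dijkstra's "black box rule".

This is not an entirely new idea, and I am not claiming to have invented it: it draws inspiration from many sources. [3]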

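Here is how we are going to do it. First, we declare that a parent task cannot start any child tasks unless it first creates a place for the children to live: a nursery. It does this by opening a nursery block; in Trio, we do this using Python's async with syntax, as in the example at the beginning of the article.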

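Opening a nursery block automatically creates an object representing the nursery, and the as nursery syntax assigns that object to the variable named nursery. We can then use the nursery object's start_soon() method to start concurrent tasks: in this case, one task calling myfunc and another calling anotherfunc. Conceptually, these tasks execute inside the nursery block. In fact, it is often convenient to think of the code written inside the nursery block as an initial task that is started automatically when the block is created.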

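Here is the key point: the nursery block does not exit until all the tasks inside it have finished. If the parent task reaches the end of the block before all the children are done, it pauses there and waits for them.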
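The join-on-exit behaviour can be sketched without Trio. The ThreadNursery class and open_nursery function below are hypothetical stand-ins built on standard threads; they are not Trio's API or implementation, and they model only one property: the block cannot be left while children are running.

```python
import threading
import time
from contextlib import contextmanager


class ThreadNursery:
    """Hypothetical thread-based stand-in for a nursery (not Trio's API)."""

    def __init__(self):
        self._threads = []

    def start_soon(self, fn, *args):
        thread = threading.Thread(target=fn, args=args)
        self._threads.append(thread)
        thread.start()

    def _join_all(self):
        # Children may start more children, so walk the list as it grows.
        index = 0
        while index < len(self._threads):
            self._threads[index].join()
            index += 1


@contextmanager
def open_nursery():
    nursery = ThreadNursery()
    try:
        yield nursery
    finally:
        nursery._join_all()   # the block cannot exit while children are running


finished = []


def myfunc():
    time.sleep(0.05)
    finished.append("myfunc")


def anotherfunc():
    time.sleep(0.01)
    finished.append("anotherfunc")


with open_nursery() as nursery:
    nursery.start_soon(myfunc)
    nursery.start_soon(anotherfunc)

# By the time we get here, both children are guaranteed to be done.
done_after_block = sorted(finished)
```

Control enters the with block at the top and leaves at the bottom only after every child has finished, so the block as a whole obeys the black box rule.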


This design has a number of consequences, not all of them obvious. Here are some of them:


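The nursery preserves the function abstraction.

The fundamental problem with go statements is that when you call a function, you do not know whether it will spawn a background task that keeps running after the function returns. With nurseries you do not have to worry about this: any function can open a nursery and run several concurrent tasks, but it cannot return until they have all finished. So when a function returns, you know it is really done.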


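Nurseries support adding tasks dynamically.

Here is a simpler primitive that would also satisfy the control flow diagram above. It takes a list of functions and runs them all concurrently: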

 run_concurrently([myfunc, anotherfunc]) 

(Compare asyncio.gather in Python, Promise.all in Javascript, and so on.)
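As a rough sketch of such a fixed-list primitive, here is one possible run_concurrently built on asyncio.gather from the standard library. The name run_concurrently comes from the text, but this particular definition and the two sample coroutines are assumptions made for illustration.

```python
import asyncio


async def run_concurrently(async_fns):
    """Hypothetical helper: run a fixed list of async functions, wait for all."""
    return await asyncio.gather(*(fn() for fn in async_fns))


async def myfunc():
    await asyncio.sleep(0.01)
    return "myfunc done"


async def anotherfunc():
    await asyncio.sleep(0.01)
    return "anotherfunc done"


# gather returns the results in the order the functions were passed in.
results = asyncio.run(run_concurrently([myfunc, anotherfunc]))
```

The whole list of tasks has to be supplied in a single call, so there is no natural way to add another task once the tasks are running.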

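But the problem is that you have to know the complete list of tasks up front, which is not always possible. For example, server programs usually have an accept loop that takes incoming connections and starts a new task to handle each one. Here is a minimal accept loop in Trio: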

    async with trio.open_nursery() as nursery:
        while True:
            incoming_connection = await server_socket.accept()
            nursery.start_soon(connection_handler, incoming_connection)

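With nurseries this is trivial, but implementing it with run_concurrently would be much more awkward. And if you wanted to, run_concurrently would be easy to implement on top of nurseries, but it is not really necessary, because in the simple cases that run_concurrently can handle, the nursery notation is just as readable.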


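You can still leave the nursery.

The nursery object also gives us an escape hatch. What if you really do need a function that spawns a background task, where the background task outlives the function itself? Easy: pass the function a nursery object. There is no rule that only the code directly inside the async with open_nursery() block may call nursery.start_soon(): as long as the nursery block remains open [4], anyone who gets a reference to the nursery object can spawn tasks into it. You can pass it as a function argument, send it through a queue, whatever you like.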

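In practice, this means you can write functions that "break the rules", but only within limits:

  • Since nursery objects have to be passed around explicitly, you can tell which functions break the normal flow of control just by looking at their call sites, so local reasoning is still possible.
  • Any tasks the function spawns are still bound by the lifetime of the nursery that was passed in.
  • The calling code can only pass in nursery objects that it has access to itself.

So this is still very different from the traditional model, where any code can at any moment spawn a background task with an unbounded lifetime.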

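This is also useful for showing that nurseries have the same expressive power as go statements, but this article is long enough already, so I will leave that for another time.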

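You can define new types that quack like a nursery.

The standard nursery semantics are a solid foundation, but sometimes you want something different. For example, you might want a nursery-like class that handles an exception by restarting the child task. That is entirely possible, and to your users it will look just like a regular nursery: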

    async with my_supervisor_library.open_supervisor() as nursery_alike:
        nursery_alike.start_soon(...)

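If you have a function that takes a nursery as an argument, you can pass it one of these instead, to control the error-handling policy for the tasks it spawns. Pretty neat.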

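This does push Trio towards different conventions than asyncio and some other libraries: start_soon() has to take a function, not a coroutine object or a Future (you can call a function many times, but there is no way to restart a coroutine object or a Future). I think this is the better convention anyway, for a number of reasons (especially since Trio does not have Future objects at all!), but it is worth mentioning.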

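No, however: a nursery always waits for all tasks inside it to finish.

It is also worth talking about how task cancellation and task joining interact, since there are subtleties here that could, if handled incorrectly, break the nursery invariants.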

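In Trio, code can receive a cancellation request at any time. After a cancellation is requested, the next time the code executes a "checkpoint" operation, a Cancelled exception is raised. This means there is a gap between when a cancellation is requested and when it actually happens: it may be a while before the task reaches a checkpoint, and after that the exception has to unwind the stack, run cleanup handlers, and so on. When this happens, the nursery always waits for the full cleanup to finish. We never terminate a task without giving it a chance to run its cleanup handlers, and we never leave a task running unsupervised outside the nursery, even while it is being cancelled.

Automatic resource cleanup works.

Because nurseries follow the "black box rule", they make with blocks work again. There is no chance that, say, closing a file at the end of a with block will accidentally break a background task that is still using that file.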


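Error propagation works.

As noted above, in most concurrency systems, unhandled errors in background tasks are simply discarded. There is literally nothing else to do with them.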

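In Trio, since every task lives inside a nursery, every nursery belongs to a parent task, and parent tasks are required to wait for the tasks inside the nursery... we do have something we can do with unhandled errors. If a background task terminates with an exception, we can re-raise it in the parent task. The intuition is that a nursery is something like a "concurrent call" primitive: we can think of the example above as calling myfunc and anotherfunc at the same time, so the call stack has become a tree. And exceptions propagate up this call tree towards the root, just as they propagate up a regular call stack.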

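There is one subtlety here, though: when we re-raise an exception in the parent task, it starts propagating in the parent task. Generally, that means the parent task will exit the nursery block. But we have already said that the parent task cannot leave the nursery block while child tasks are still running. So what do we do?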

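The answer: when an unhandled exception occurs in a child, all the other tasks in the same nursery are cancelled immediately, and the nursery waits for them to finish before re-raising the exception. The intuition is that exceptions unwind the stack, and to unwind past a branch point in the stack tree we have to unwind the other branches too, by cancelling them.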
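The following sketch shows the re-raising half of this behaviour with standard threads. ErrorNursery and open_nursery are hypothetical names, not Trio's API. One loud caveat: ordinary threads cannot be cancelled, so this version only waits for the siblings and then re-raises the first error in the parent; it does not cancel them, which a real nursery would do.

```python
import threading
from contextlib import contextmanager


class ErrorNursery:
    """Hypothetical thread-based nursery that remembers child errors."""

    def __init__(self):
        self.threads = []
        self.errors = []

    def start_soon(self, fn, *args):
        def runner():
            try:
                fn(*args)
            except Exception as exc:
                self.errors.append(exc)   # keep the error instead of dropping it

        thread = threading.Thread(target=runner)
        self.threads.append(thread)
        thread.start()


@contextmanager
def open_nursery():
    nursery = ErrorNursery()
    try:
        yield nursery
    finally:
        for thread in nursery.threads:    # always wait for every child
            thread.join()
    if nursery.errors:
        raise nursery.errors[0]           # re-raise in the parent task


def myfunc():
    raise ValueError("myfunc failed")


def anotherfunc():
    pass


try:
    with open_nursery() as nursery:
        nursery.start_soon(myfunc)
        nursery.start_soon(anotherfunc)
    outcome = "no error seen"
except ValueError as exc:
    outcome = f"parent caught: {exc}"
```

Because the parent is guaranteed to be waiting at the end of the block, there is always a well-defined place to deliver the child's exception.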

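This does mean that if you want to implement nurseries in your own language, you may need some integration between the nursery code and your task cancellation system. That could be tricky in a language like C# or Golang, where cancellation is usually managed through manual object passing and convention, or, even worse, in one with no generic cancellation mechanism at all.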


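Brave new world without go

Eliminating goto allowed earlier language designers to make stronger assumptions about the structure of programs, which enabled new features such as with blocks and exceptions; eliminating go statements has a similar effect. For instance:

  • Trio's cancellation system is easier to use and more reliable than its competitors', because it can assume that tasks are nested in a regular tree structure.
  • Trio is the only Python concurrency library where ctrl-C works the way Python developers expect. This would be impossible without nurseries providing a reliable mechanism for propagating exceptions.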

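Nurseries in practice

So that is the theory. How does it work in practice?

Well... that is an empirical question: you should try it and find out! Seriously, though, we will not know for sure until many people have tried it. At this point I am fairly confident that the foundation is sound, but we may find that some adjustments are needed, just as the early advocates of structured programming eventually backed off from eliminating break and continue.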

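And if you are an experienced concurrent programmer who is new to nurseries, expect it to be rough going at times. You will have to learn new ways of doing things, just as programmers in the 1970s had to learn how to write code without goto.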

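But of course, that is the point. As Knuth wrote (Knuth, 1974, p. 275):

Probably the worst mistake any one can make with respect to the subject of go to statements is to assume that "structured programming" is achieved by writing programs as we always have and then eliminating the go to's. Most go to's shouldn't be there in the first place! What we really want is to conceive of our program in such a way that we rarely even think about go to statements, because the real need for them hardly ever arises. The language in which we express our ideas has a strong influence on our thought processes. Therefore, Dijkstra asks for more new language features, structures which encourage clear thinking, in order to avoid the go to's temptations towards complications.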

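And so far that has been my experience with nurseries: they encourage clear thinking. They lead to designs that are more robust, easier to use, and simply better.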

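For example, consider the Happy Eyeballs algorithm (RFC 8305), a simple concurrent algorithm for speeding up the establishment of TCP connections. Conceptually it is not complicated: you race several connection attempts against each other, with a staggered start to avoid overloading the network. Yet the best implementation for Twisted is almost 600 lines of Python. The equivalent with nurseries is more than 15 times shorter. More importantly, I was able to write it far more quickly, and I got the logic right on the first try. Is this typical? Time will tell. But it is certainly promising.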


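Conclusions

The popular concurrency primitives (go statements, thread-spawning functions, callbacks, Futures, Promises, ...) are all variants of goto, in theory and in practice. And not even the modern, domesticated goto, but the old, unrestricted goto that could jump across function boundaries. These primitives are dangerous even if we do not use them directly, because they undermine our ability to reason about control flow and to compose complex systems out of abstract, modular parts; and they interfere with useful language features such as automatic resource cleanup and error propagation. Therefore, like goto, they have no place in a modern high-level language.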

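Nurseries provide a safe and convenient alternative that preserves the full power of the language, enables powerful new features (such as cancellation scopes and CTRL+C handling), and can bring dramatic improvements in readability, productivity, and correctness.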

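Unfortunately, to get these benefits in full, we need to remove the old primitives entirely, and that probably means building new concurrency frameworks from scratch, just as eliminating goto required designing new languages. But impressive as FLOW-MATIC was for its time, most of us are glad to have moved on to something better. I do not think we will regret switching to nurseries either, and Trio shows that this is a viable design for practical, general-purpose concurrency frameworks.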




Comments

You can discuss this article on the Trio forum: https://trio.discourse.group/


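Acknowledgments

Many thanks to Graydon Hoare, Quentin Pradet, and Hynek Schlawack for their comments on drafts of this article. Any remaining errors are, of course, mine.

Thanks also to berez.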


Image credits: Wolves in Action by Martin Pannier, licensed under CC-BY-SA 2.0.
Photo by Daniel Borker, released under the CC0 public domain dedication.


Footnotes

[2] WebAssembly even demonstrates that a low-level assembly language without goto is both possible and desirable.

[3] Related prior work that I am aware of includes: the "parallel composition" operator in Cooperating/Communicating Sequential Processes and Occam, the fork/join model, Erlang supervisors, Martin Sústrik's libdill, and crossbeam::scope / rayon::scope in Rust, as well as golang.org/x/sync/errgroup and github.com/oklog/run in Golang.

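[4] If you call start_soon() after the nursery block has exited, start_soon raises an error; conversely, if it does not raise an error, the nursery block is guaranteed to stay open until the task finishes. If you are implementing your own nursery system, you will need to handle this synchronization carefully.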

About the author

Nathaniel J. Smith, Ph.D. (UC Berkeley, numpy, Python).





Source: https://habr.com/ru/post/479186/
