The Joy of Concurrent Logic Programming

Introduction

This article is an attempt to describe my experiences with concurrent logic programming languages, a programming paradigm that has been largely forgotten but has a number of interesting properties that make it worthwhile to explore. Especially today, when CPU designers have run out of ideas and have settled on throwing more and more processor cores at us, parallel and concurrent programming remain the areas where performance gains still seem to be achievable. Yet our programming models are not at all suited to this - coarse-grained processes with shared state are still the most widely used strategy, and parallelizing applications remains hard and error-prone.

One concept that promises an easier model for thinking about concurrency is actors, lightweight processes communicating via messages. Depending on the implementation this can solve many of the synchronization problems in a clean manner and allows composing concurrent functionality conveniently. Erlang [10] would be the prime example, being probably the first well engineered attempt to embrace concurrent software while at the same time admitting that software can break. This humility of accepting that processes fail, and putting thought into how to handle such situations and how to recover, is probably one of the great achievements of the folks who invented and implemented Erlang.

Yet there is a programming paradigm that precedes Erlang and which is, in a sense, even more parallel, while providing similar mechanisms for synchronization and composition in a more elegant and consistent way. The family of concurrent logic programming languages was once a fertile field of research and was the basis of Japan's Fifth Generation Computing [11] project, which unfortunately died, together with a lot of other "AI" research, during the AI winter. This is a shame, since logic programming languages (just like LISP) can be used for a lot more than what is or was usually called "Artificial Intelligence".

All of these languages originate in PROLOG, the most well known logic programming notation, but differ in the way they deviate from the basic execution model. Starting with Concurrent Prolog (CP), further language developments removed more and more of the distinctive features of Prolog while becoming more streamlined and easier and more efficient to implement. Dialects that were spawned by CP are Parlog, FCP, GHC, FGHC, KL1, Flat Parlog, Strand, FLENG, Janus, PCN and many others (as I said, the research was fertile). Most of these are gone now, were only theoretical models, or were never fully implemented. KL1 [8] was a dialect of (Flat) GHC used on custom hardware designed in Japan. Strand [1] is probably the most streamlined and elegant of the family. One could say that Strand is to Erlang what Scheme is to Lisp: the cleaned-up and generalized version of a more pragmatic engineering effort to implement reliable concurrent systems. Note that Strand precedes Erlang; in fact, Erlang was initially implemented on top of Strand, but that approach was deemed impractical due to the very fine-grained parallelism that Strand provides. What must be noted is that, as the dialects developed further, they removed more and more classic Prolog features. This is explained in detail in [5].

To my knowledge there exist only very few implementations of concurrent logic languages. There is a compiler from KL1 to C available [2], which seems to work, but suffers from bit rot.
See also [7] for an FCP implementation, about whose status I can say nothing. There exists a simple Strand implementation (by the author) [3]. Strand was once a commercial product, but it doesn't seem to be available anymore. The Strand book [1] demonstrates very sophisticated industrial applications in its section on case studies, which shows that it was used in a commercial context. The book is highly recommended, by the way, as it gives a very gentle and clear introduction to the concepts and ideas that are fundamental to programming in these languages.

In what follows I will use the FGHC dialect, for which I also provide an implementation [3], in case you are interested. Syntactically the various dialects have a lot in common, and the techniques that are used apply to most of them. It is in particular the techniques that I find fascinating, as the programming model is fundamentally different from the mainstream paradigms and lets you see software from a new viewpoint, something that always enlivens and refreshes one's conception of the practice of software construction.

A Crash Course in FGHC

So, let's begin. Some basic knowledge of Prolog may be helpful to avoid difficulties understanding the jargon. Programs in FGHC consist of sets of clauses of the following form:

predicate(Argument, ...) :- Guard, ... | Term, ... .

Clauses of the same name and arity (number of arguments) together form a process definition and should be grouped accordingly. Invocation of a process with the given name and arity will select one of the clauses with matching arguments and execute its body terms once all guards succeed. Guards are conditional expressions and must be selected from a predefined set of primitives (that is what makes FGHC "flat"). Once a clause is selected, the terms in the body are executed in parallel - there is no predefined order. That's not a problem, in case this worries you, as there is a simple method of synchronizing execution. As a matter of fact, you merely restrict the parallelism by forcing synchronization implicitly or explicitly; the execution model is geared towards as much parallelism as possible. This can be surprising, as we will see later.

I wrote that a clause is selected by matching. Matching is not the classic unification of Prolog: if a logic variable is unbound, then matching a pattern containing such a variable suspends and does not continue until the variable has been bound. This is different from Prolog, where access to an unbound variable does not influence the control flow. Unification does take place in the body part of a clause and works similarly to Prolog. FGHC has logic variables like Prolog; they can be bound to other variables, and dereferencing chains of variables is automatic, so the unifications in

A = B, B = C, D = 123, C = foo(D)

result in A holding the value "foo(123)" - the order in which these unifications take place is immaterial, any order is fine, as unbound variables can be thought of as "holes" that can be filled at any time. Variables are distinguished by starting with an uppercase letter or with "_". The variable "_" is the anonymous variable; multiple occurrences in a clause stand for multiple distinct variables. Unification is restricted to body terms: all accesses to unbound variables in head matching, guards and calls to builtin predicates will suspend the current process until some other process binds the variable.
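As a small illustration of this suspension behaviour, consider the following minimal sketch (the predicate names are made up for this example): "wait/1" cannot match its clause head until its argument is bound, so it suspends until "signal/1" performs the binding in its body.

/* FGHC */
-initialization(main).

wait(go) :- writeln(resumed).   % head match suspends until the argument is bound
signal(X) :- X = go.            % binding the variable wakes the waiting process

main :- wait(X), signal(X).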
When the variable is eventually bound, the suspended process resumes and performs a new attempt at selecting a matching clause or executing the builtin predicate. Another subtle point is that the clauses of a process definition are not matched in order - any clause can be matched first; the order is arbitrary and doesn't even have to be the same from invocation to invocation, it is non-deterministic. To control the order one can use suitable guards. A clause without guards omits the "... |" part. A clause without guards and body looks like this: "done.". If no clause matches, then the process fails, which terminates the program. There is no backtracking as in Prolog.

And, curiously enough, this is nearly all. There are no loops; you use recursion instead: a clause that (directly or indirectly) spawns a single process invoking the same predicate again is tail-recursive and so has constant memory requirements. There are a number of built-in guards and primitive predicates, such as "is/2" for expression evaluation and "=/2" for unification. That is about all you need to know about the semantics of FGHC: there exists a pool of processes executing concurrently; processes are taken from that pool, executed, and spawn new processes to be added to the pool, until all processes are finished or a process fails. How the concurrency is implemented is secondary for now, but usually you have multiple processors, each holding a pool of quasi-parallel processes.

Here is an example FGHC program, a predicate to reverse a list:

/* FGHC */
-initialization(main).

rev([], Y) :- Y = [].
rev([X|Xs], Y) :- rev(Xs, Ys), append(Ys, [X], Y).

append([], Ys, Z) :- Z = Ys.
append([X|Xs], Ys, Z) :- append(Xs, Ys, Z2), Z = [X|Z2].

main :- rev([1, 2, 3, 4, 5], R), writeln(R).

This is basically equivalent to the Prolog version, with the exception that one has to unify result (or "output") variables manually. The first clause of "rev/2" shows this clearly: the body ("Y = []") can only be executed when the head is fully matched. Different dialects handle this in different ways: FCP and Parlog have "mode" declarations, FGHC and Strand do not allow output-unification in clause heads, and Strand furthermore does not allow unification at all, only direct assignment to unbound variables.

One distinctive property of this program is that the reversal takes place concurrently, for different parts of the input list. As long as processes do not need to suspend, execution continues by spawning sub-processes without restraint, until an unbound variable argument is matched against a non-variable element in a clause head or guard. So, in the example above, since the argument list is fully ground (contains no unbound variables), processes of "rev/2" for all sub-lists are spawned, as are "append/3" processes, but the latter will suspend instantly until the various instances of "Ys" are bound to some result. "writeln/1" is a built-in and will suspend until its argument is fully ground; it then prints the reversed list. Note that synchronization between all the processes takes place automatically, by waiting for variables to be bound. There is no need to artificially orchestrate how the processes execute in parallel; this happens implicitly, like in the cells of a spreadsheet - once a required result is available, it may trigger further variables to be bound and processes to be resumed, and so on, until the network of processes settles.
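Since clause order is non-deterministic, guards are the tool for making selection deterministic where it matters. Here is a minimal sketch of a "max/3" process (assuming the usual arithmetic comparison guards are among the predefined primitives); the guards make the two clauses mutually exclusive, so the arbitrary clause order cannot affect the result:

/* FGHC */
-initialization(main).

max(X, Y, M) :- X >= Y | M = X.   % comparison guards suspend until X and Y are bound
max(X, Y, M) :- X < Y | M = Y.

main :- max(3, 5, M), writeln(M).   % prints 5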
Declarative Programming

A word about syntax: I find Prolog syntax quite pleasing; it reduces special characters mostly to punctuation, similar to Lisp, Pascal or Ada, which makes programs more readable. Operators are restricted to expressions, where one is used to them, and do not clutter up the rest of the code with subtle differences in meaning that confuse more than they help to abbreviate. Also, the meta-syntax of Prolog lends itself to automated manipulation by code, not necessarily using macros, but with the option to generate or process code in a useful manner - an advantage of languages that are "homoiconic", like Lisp or Prolog or derived languages. If you prefer a more traditional syntax, see below for PCN, which provides a more familiar look but offers the same facilities for concurrent programming.

The declarative nature of logic programs may make them suitable for automated and manual proof, but this is something that someone more familiar with the subject than I am must decide. I personally find Prolog code to be shorter than equivalent programs in other languages. Also, the declarative notation of specifying clauses to match arguments makes it straightforward to cleanly encode special cases, provided one properly factors subsequent code. It is a bit of an art to describe certain complex algorithms elegantly, and dumb imperative code may look awkward, but that is perhaps because it is dumb and imperative and doesn't look much nicer in any language. In classic Prolog, modelling a problem to take maximum advantage of backtracking, failure-driven loops or unification needs a lot of experience, and the same will apply to other languages which have a powerful internal resolution scheme. In concurrent logic programming you will use different techniques, but I claim that they actually provide more flexibility and power in a way that classic Prolog can not. Moreover, we have to start looking for suitable programming paradigms if we want to truly master multicore processors and distributed computing with more than just C++ and locks, message passing libraries like MPI, or HTTP-based lowest common denominators like REST. Textbooks on multithreaded programming mostly present a messy amalgam of locks, shared state and an absurd obsession with low-level details, which becomes so complex and obscure that one just wants to give up, because the attempt to produce anything highly parallel seems to be inhumanly hard (because, using these techniques, it is). The languages I am talking about here will not make parallel or concurrent programs trivial, but they will let you concentrate on the important parts and will reduce boilerplate code and bookkeeping tasks.

Concurrent logic languages usually have no side effects, or, when they do, very few, applied in a controlled manner. I have seen descriptions of stream-based I/O for some implementations, but prefer the pragmatic view of the previous generation of languages that started functional programming: accepting that side effects are sometimes unavoidable, letting the programmer decide when to use them, and then letting her use them without cumbersome abstractions or "monad" pseudoscience.

Streams as Communication Channels

So, how do we actually design code that takes advantage of parallel processes and concurrent execution? By, at the same time, generalizing and simplifying the concept of what shared state is.
The logic variable is the key concept - a first-class object that can be bound to other unbound variables, and where referencing it dereferences any existing chain of bindings automatically. Accessing (matching) a variable will suspend until it is bound. Unification will bind variables. Taken together, this means we can view a list whose tail is an unbound variable as a stream of values, being produced by some process and consumed by another. The consumer suspends (blocks) when a tail is not yet bound; the producer extends the stream by binding the tail. This makes producer and consumer run in parallel, without any further explicit synchronization:

/* FGHC */
-initialization(main).

producer(1000, Out) :- Out = [].    % terminate stream
producer(N, Out) :- otherwise |
    Out = [N|Out2],                 % create a list cell
    N2 is N + 1,
    producer(N2, Out2).

consumer([N|In]) :-                 % match and deconstruct the list cell
    writeln(N),
    consumer(In).

main :- producer(1, S), consumer(S).

The stream "S" (which is "In" for the consumer and "Out" for the producer) is the communication channel between the two processes, and the program will print (usually) increasing numbers until the producer terminates the stream. It may also run out of memory, because the producer may run faster than the consumer and so may spawn many child processes before the consumer is able to process them. This should always be kept in mind. If the producer runs slower than the consumer, the latter will patiently wait. One way to control the flow of messages (numbers, in this case) is to make the consumer provide the places where it expects the input, using the "bounded buffer" technique:

/* FGHC */
-initialization(main).

producer(N, [P|Places], Out) :-
    P = N,
    Out = [P|Out2],
    N2 is N + 1,
    producer(N2, Places, Out2).

consumer([N|In], P) :-
    P = [_|Places],
    writeln(N),
    consumer(In, Places).

main :- P = [_|P2], producer(1, P, S), consumer(S, P2).

By initializing P to hold one element we set the capacity of the buffer. We could have increased the initial size and the number of unbound variables added in "consumer/2" to give the producer more space to fill up at a time, thereby increasing throughput. We have effectively turned things around: the consumer produces a stream of unbound variables and passes it to the producer to fill them in. Running this program will print numbers, not necessarily strictly increasing, as the invocations of "writeln/1" may not take place in strict order - everything is parallel. Depending on the actual language there may be facilities to enforce completion of a side-effecting operation, either by having a sequencing operator or by having a "writeln/2" variant return a variable that is bound to some value once the operation is complete (a sketch of the latter follows below). The classic problems of concurrent programming have not magically vanished - race conditions and deadlocks still exist - but at least you can concentrate on the logic instead of dealing with low-level details.

You may have noticed that both producer and consumer are tail-recursive: they invoke themselves before terminating and thus run in a loop with bounded storage consumption. This is similar but not equivalent to tail recursion in procedural languages, where the position of the recursive call determines whether a procedure has constant storage requirements. In a concurrent logic language, a recursive call adds the current process to the process pool, possibly with different arguments, and the old one will vanish once all sub-goals have been processed.
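Here is the promised sketch of the latter approach, assuming a "writeln/2" of this kind that binds its second argument (here to "[]", a common idiom) once the output has been written. Strictly ordered output is then obtained by matching on that variable:

/* FGHC */
-initialization(main).

main :- writeln(first, Ok), rest(Ok).

rest([]) :- writeln(second).   % matching [] suspends until the first write is complete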
The producer/consumer example already demonstrates how having the building blocks for communication channels can be more powerful than having built-in channels themselves, as we can construct more elaborate devices, like the "short circuit", which solves the problem of how to tell when N processes have all finished:

/* FGHC */
-initialization(main).

main :- one(ok, R), done(R).

done(ok).

one(L, R) :- two(L, R2), wait(R2, R).
two(L, R) :- three(L, R2), wait(R2, R).
three(L, R) :- wait(L, R).

wait(R1, R) :- R = R1.

Here we start 3 processes, each having two variables ("L"eft and "R"ight). We pass a value into the left side of the first process, and once a process is done, it assigns the left side to the right. At the end of the chain we know that all processes have finished once the right side of the last process in the chain has been bound. In this example the processes do no real work, but in real code the different processes could start arbitrarily complex jobs. The Strand book [1] explains these and many other techniques, often using fanciful analogies.

The ease with which communication channels can thus be created allows for a straightforward decoupling of components. As list (stream) elements can be arbitrarily complex objects that are easily destructured by pattern matching, and can themselves contain unbound variables as a back-channel for communicating results, different parts of an application can exchange information, commands and results through a single stream:

/* FGHC */
-initialization(main).

logger([X|S]) :- log(X, Ok), logger(Ok, S).
logger([], S) :- logger(S).

filter([bogus(_)|S1], S) :- filter(S1, S).
filter([X|S1], S) :- otherwise | S = [X|S2], filter(S1, S2).

main :- start_task(S), logger(S), filter(S, S2), consume_work(S2).

The logger has been wedged between "start_task/1" and "consume_work/1" to log all communication, without the other processes having to know any of this. We use a second predicate to continue logging once the actual log operation has completed (signified by "Ok" being bound to "[]", a common idiom). "filter/2", on the other hand, removes certain messages and passes all others on to an intermediate stream. Here we use the "otherwise/0" guard, which marks the second clause of "filter/2" as applicable for matching only if all preceding clauses fail to match. "start_task/1" and "consume_work/1" are unaware of the operations that have been put in between, or of any communication that may be taking place between auxiliary processes. That way components are kept strictly separate and have a high degree of compositionality, yet few restrictions on how to communicate, should they need to do so. Streams are also a convenient API to functionality that requires sequential access: as the messages are ordered, their processing is, too. Streams can be merged as well, preserving the order of elements in each sub-stream:

/* FGHC */
merge([], B, C) :- C = B.
merge(A, [], C) :- C = A.
merge([X|A], B, C) :- C = [X|C2], merge(A, B, C2).
merge(A, [X|B], C) :- C = [X|C2], merge(A, B, C2).

Here we take advantage of the non-deterministic nature of clause selection: several clauses may match at the same time; we intentionally don't care, as long as one is finally selected and continues the merge operation.
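To see "merge/3" in action, here is a small self-contained sketch (the helper predicates "gen/2" and "print_all/1" are my own) in which two generator processes feed a single printing process through a merged stream. The interleaving of the two sub-streams is non-deterministic, and, as noted before, the output order of "writeln/1" is not guaranteed to be strict, but each sub-stream keeps its internal order within the merged stream:

/* FGHC */
-initialization(main).

gen(0, Out) :- Out = [].
gen(N, Out) :- N > 0 | Out = [N|Out2], N2 is N - 1, gen(N2, Out2).

print_all([]).
print_all([X|S]) :- writeln(X), print_all(S).

% uses merge/3 as defined above
main :- gen(3, A), gen(3, B), merge(A, B, C), print_all(C).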
As in Prolog, strings are lists of character codes, which makes them automatically suitable for stream processing. Apart from avoiding the tedious duplication of list and string manipulation operations, this allows for efficient, asynchronous handling of streams of characters, so the extra space needed for lists is partially balanced by manipulating them in a lazy manner, one piece at a time, by communicating processes.

Streams even allow a certain style of object orientation. If we consider a stream as a message-handling entity, class hierarchies can be mirrored as hierarchies of message handlers:

/* FGHC */
transport([move(Distance)|S], Position) :-
    update_position(Position, Distance, NewPosition),
    transport(S, NewPosition).
transport([get_info(T)|S], Position) :-
    T = transport(Position),
    transport(S, Position).
transport([_|S], Position) :- otherwise | transport(S, Position).

car([refuel(Amount)|S], Transport, Fuel) :-
    fill_tank(Fuel, Amount, NewFuel),
    car(S, Transport, NewFuel).
car([get_info(T)|S], Transport, Fuel) :-
    Transport = [get_info(T2)|Tr2],
    T = car(Fuel, T2),                  % delegate and intercept
    car(S, Tr2, Fuel).
car([M|S], Transport, Fuel) :- otherwise |
    Transport = [M|T2],                 % delegate to parent
    car(S, T2, Fuel).

make_car(Car, Position) :- transport(T, Position), car(Car, T, 0).

Note that the last clause of "car/3" needs an "otherwise" guard: without it, the non-deterministic clause selection could delegate a message that should be handled locally. Note also how we keep local state by passing it along in the tail-recursive message loop, updating it as needed, depending on the messages received. There is nothing special about this: we just use streams of messages between entities that perform message processing in a loop; we create processes and then plumb them together with streams. Finally, note how we can pass a variable in a message ("get_info/1") to receive a reply, exposing internal state just as much as is desired and necessary. "car" and "transport" could run concurrently on different processors; all synchronization is implicit and automatic.

You may have noticed in the examples above that streams represent general, first-class communication channels. This is quite similar to, or even a generalized variant of, the "Actor" model [4]. Actors are independent processes that communicate solely by messages and by creating new actors. One difficulty may be the sharing of an actor's identity among multiple "client" actors that want to send messages to it, but there are solutions, like "ports" [12] or the use of an intermediary "merger" stream.

Resource Allocation

How all these processes are mapped to hardware depends on the implementation of the language, and nothing has been said about this so far. Strand has the concept of "virtual machines", that is, a network of processors connected in some topology. In Strand, you explicitly spawn a process on a different processing node using the "@" notation, where "goal(...)@Node" designates that the invocation of "goal" shall take place on the processor given by "Node". What "Node" stands for depends on the topology: for a ring, it may be "fwd" or "bwd", meaning the next or previous node; for a torus topology you would have "north", "west", etc. All nodes share the available code and, in a circular topology, spawning a process on the "next" node may wrap around. Here is some code that models a "mapreduce"-like computation over a number of processors in a ring topology:

/* FGHC */
mapreduce([], _, L, R) :- R = L.
mapreduce([X|List], Call, L, R) :-
    % Call = "foo(1, 2)" will result in a call to "foo(1, 2, X, L, L2)"
    apply(Call, [X, L, L2]),
    mapreduce(List, Call, L2, R)@fwd.

"apply/2" creates a goal to call dynamically, where the list of arguments in the second position is appended to any already existing arguments of the goal designated by "Call". Each invocation of the call takes a list element and some state, and passes the new state on, via the final argument, to the next iteration of "mapreduce", which runs on the processor in the forward direction from the current one.
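As a usage sketch, a hypothetical "sum/3" reducer (not part of the original example) could be folded over a list like this, with "apply/2" extending the goal "sum" by the three extra arguments; with a single processing node, "@fwd" should simply wrap around to the same processor:

/* FGHC */
-initialization(main).

% apply(sum, [X, L, L2]) results in the call sum(X, L, L2)
sum(X, Acc, Acc2) :- Acc2 is Acc + X.

% uses mapreduce/4 as defined above
main :- mapreduce([1, 2, 3], sum, 0, R), writeln(R).   % prints 6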
The number of processors is fully transparent: it may be just one, or it may be many; the workload will be spread among the available processors. Communication by sending data from one processor to another is seamless and requires no extra effort. Provided the implementation allows interfacing to native code or provides means to easily run native sub-processes, one might want to consider using these languages simply as a means to coordinate parallel computations spread over multiple processors (or machines, if supported).

PCN

A perhaps more traditional way of writing concurrent logic programs is "PCN" (Program Composition Notation) [13, 14]. It shares a lot of syntax with C and makes interfacing to native functions written in C quite easy. Here is the producer/consumer program in PCN:

/* PCN */
main()
{|| producer(1, s),
    consumer(s)
}

producer(n, out)
int n;
{; {? n < 1000 ->
      {; out = [n|out2],   /* create a list cell */
         n := n + 1,
         producer(n, out2)
      },
      default ->
        out = []           /* terminate stream */
   }
}

consumer(inp)
{; {? inp ?= [n|inp2] ->   /* match and deconstruct the list cell */
      {; writeln(n),
         consumer(inp2)
      }
   }
}

PCN programs consist of procedures ("program definitions") whose bodies are one of three types of composition: sequential ("{; ...}"), parallel ("{|| ...}") and choice ("{? ...}"). Compositions can be arbitrarily nested and either execute the statements of their body one after the other, execute all of them in parallel in no particular order, or select one statement depending on a "guard", which is a comparison or a pattern match ("?="). PCN supports the usual first-class, single-assignment variables of logic languages, with the addition of mutable numeric variables that can be assigned multiple times using the assignment (":=") statement. Arrays of numerical type are provided as well, as is a very straightforward integration of C code. PCN makes a good choice for integrating C libraries into applications requiring complex concurrent behaviour. It also serves as an introductory step into the world of concurrent logic programming, due to its familiar syntax.

Conclusion

So, where to start? How can I try out this magical stuff? As you might have guessed, this whole text is a bit of a pitch for my own FGHC implementation [3], a compiler of FGHC, Strand, PCN (and a low-level dialect called FLENG, on which it is based) to native code for several platforms. The project is relatively new, but not so experimental that one wouldn't want to give it a serious look. FLENG provides a few additional features for convenience, for example the "&/2" operator to ensure sequential execution of goals, tasks (controlled process trees), ports (identity-preserving streams) and the "idle/0" guard to execute code when the current processor is idle. FLENG takes advantage of native threads, has a basic facility for interprocess communication and a number of useful libraries, and compiles to native code. The garbage collection scheme is based on reference counting and thus does not cause GC pauses. Convenient use of single-shot and periodic timers, handling of UNIX signals and execution of child processes are provided, which makes concurrent logic programming an elegant tool for implementing games and servers, coordinating parallel workloads, or building infrastructure applications. An interface for writing primitives in C/C++ is available as well.
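As a small taste of these conveniences, sequential execution with the "&/2" operator might look like this (a minimal sketch, assuming "&/2" behaves as described above; without it, the two goals would run in parallel and the output order would be unspecified):

/* FGHC */
-initialization(main).

% "&/2" starts the right-hand goal only after the left-hand goal has
% finished, so the output appears in source order.
main :- writeln(first) & writeln(second).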
Give it a try, or at least learn more about this paradigm. It is definitely worth the effort to have a look at a completely different way of thinking about software architecture for concurrent tasks. Information about these languages is not easy to find; the research seems to have stopped, and what is available is buried in scientific papers and a small number of books. The Strand book [1] is an excellent introduction; Shapiro [9] and Tick [5] provide an overview of the different dialects and their development. Does concurrent and parallel programming have to suck? I think not, but we may have to try to look at it from a different angle. This is an attempt to revive thinking about a paradigm that, due to some unfortunate events surrounding the failure of the early "AI" hype, has been lost in time.

--

[1] "Strand: New Concepts in Parallel Programming", Ian Foster, Stephen Taylor
    https://dl.acm.org/doi/book/10.5555/100638
    Also available online: http://www.call-with-current-continuation.org/files/strand-book.pdf

[2] KLIC
    https://web.archive.org/web/20100724120623/http://www.klic.org/software/klic/index.en.html

[3] FLENG - a FLENG and FGHC implementation compiling to native code
    http://www.call-with-current-continuation.org/fleng/fleng.html

[4] Actor model
    https://en.wikipedia.org/wiki/Actor_model

[5] "The Deevolution of Concurrent Logic Programming Languages", Evan Tick
    https://www.sciencedirect.com/science/article/pii/0743106694000388/pdf?md5=c34f030442a47297654dd824682780c0&pid=1-s2.0-0743106694000388-main.pdf&_valck=1

[6] "Guarded Horn Clauses", K. Ueda
    https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.4015&rep=rep1&type=pdf

[7] Emulated Flat Concurrent Prolog
    https://www.nongnu.org/efcp/

[8] "Design of the Kernel Language for the Parallel Inference Machine", K. Ueda et al., 1990

[9] "The Family of Concurrent Logic Programming Languages", E. Shapiro
    https://apps.dtic.mil/sti/pdfs/ADA213958.pdf

[10] Erlang
     http://www.erlang.org

[11] Fifth generation computing
     https://en.wikipedia.org/wiki/Fifth_generation_computing

[12] "Ports for Objects in Concurrent Logic Programs"
     https://people.kth.se/~johanmon/papers/port.pdf

[13] "Parallel Programming with PCN", Ian Foster, Steven Tuecke
     https://digital.library.unt.edu/ark:/67531/metadc1310344/

[14] "A Primer for Program Composition Notation", K. Mani Chandy, Stephen Taylor, 1990
     http://resolver.caltech.edu/CaltechCSTR:1990.cs-tr-90-10