Overview
Inter-process communication has become a widespread pattern in modern applications, largely because of the sandboxing that a multi-process architecture provides. Google Chrome, for instance, spawns a process for every tab the user opens, along with several system processes. For such an architecture, a fast IPC system is a must.
After some research on different transports for an IPC system, the best candidates were:
- shared memory
- pipes
- TCP
After some experimentation on Windows, it was clear that a good shared memory implementation outperformed the other options (not difficult to guess), but we also found that TCP has tremendous throughput when it works on loopback and the data packets are reasonably big. Message pipes are very convenient, but the need for a cross-platform solution meant we'd have to accommodate both Linux and Windows pipes in our system.
In this post I'd like to share an implementation based on shared memory that we use in a commercial project where the messages are very numerous but small, and the per-message overhead must be minimal. I'll explain the interfaces, the algorithm, and the choices we made using a mix of C++ and pseudo-code. A shared memory transport is more difficult to use, but it gives us more control and speed, and although its behavior differs between OSs, those differences are easy to deal with.
I gathered the following requirements
for the queue that would be the backbone of the IPC system:
- single producer - single consumer
- lock-free
- variable-sized messages
- minimal number of allocations
- stream-like message queuing
The system is based on a message queue that is SPSC, so to get a two-way communication channel we simply use two queues, with the producer and consumer roles exchanged between the two processes (see the sketch below).
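Conceptually, each process then holds something like this (the Channel wrapper is illustrative; the queue classes are the ones introduced later in the post):

```cpp
// Each process produces on one queue and consumes on the other.
// SMDataQueueProducer/SMDataQueueConsumer are the classes described
// later; Channel itself is just an illustrative wrapper.
struct Channel
{
    SMDataQueueProducer outgoing; // messages to the other process
    SMDataQueueConsumer incoming; // messages from the other process
};
```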
Of all those requirements, I have to say that "variable-sized messages" and "stream-like message queuing" are the ones that make the implementation more complicated. By "stream-like message queuing" I mean this: the user that queues a message does not know its size when she begins queuing it. An obvious solution would be for the user to compose the message in a buffer and then send the whole buffer - the size would be known at the en-queue moment - but that conflicts with the "minimal number of allocations" requirement. We need a way to let the user write directly IN the queue and expand the nodes when necessary.
The relevant part of the public
interface for the producer looks like this:
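In sketch form (EnqueueMore and CommitNode are the method names described below; RequestNode and the exact signatures are assumptions):

```cpp
#include <cstddef>

class SMDataQueueProducer : public SMDataQueueBase
{
public:
    // Begin a new message: returns a writable block of at least
    // 'size' bytes that lives directly in the queue's shared memory.
    // (RequestNode is an assumed name.)
    void* RequestNode(size_t size);

    // Grow the message being composed to 'newSize' bytes; the data
    // written so far is transferred to the returned block.
    void* EnqueueMore(size_t newSize);

    // Publish the composed message to the consumer.
    void CommitNode();
};
```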
The user requests a Node with some size and writes directly into the void* returned by the queue. If she realizes more memory is needed, she calls EnqueueMore with the new size and again gets memory directly in the queue. The previously written data is transferred, so the user can safely continue adding to it. When done, CommitNode() makes the message available to the consumer.
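Put together, composing a message looks roughly like this under the assumed interface above:

```cpp
void SendExample(SMDataQueueProducer& q)
{
    void* mem = q.RequestNode(64);  // start writing directly in the queue
    // ... fill up to 64 bytes ...
    mem = q.EnqueueMore(128);       // need more room; written data is preserved
    // ... fill the rest ...
    q.CommitNode();                 // the message becomes visible to the consumer
}
```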
The Consumer interface has just one
method:
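Its exact shape isn't reproduced here; given the producer side, a plausible sketch is:

```cpp
#include <cstddef>

class SMDataQueueConsumer : public SMDataQueueBase
{
public:
    // Assumed shape of the single method: returns the next committed
    // message (and its size), or nullptr if the queue is empty.
    const void* Dequeue(size_t& outSize);
};
```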
The lock-free implementation of the queue is based on a well-known algorithm (it was devised for multi-threading, but it applies to inter-process communication too).
We never allow the producer and the consumer to reach each other, by keeping a dummy divider node between them. The producer is in charge of freeing all consumed nodes, so only minimal sharing of concurrent variables is needed. For a detailed description of the algorithm, take a look at Marginean's "Lock-Free Queues". His algorithm is correct, but the implementation is flawed and not thread-safe, as noted and explained by Sutter in a follow-up, "Writing Lock-Free Code: A Corrected Queue". Our queue is a correct one, and although I was unaware of Sutter's article at the time I wrote it, the resulting implementation is very close to his.
The implementation is divided into three classes: SMDataQueueBase, SMDataQueueProducer and SMDataQueueConsumer. The base class provides all the shared bookkeeping structures and methods; the producer and consumer classes expose just the methods highlighted before, as the skeleton below shows.
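A minimal skeleton of that split, with assumed member names (in a real cross-process queue the links would be offsets into the shared region rather than raw pointers):

```cpp
#include <atomic>

class SMDataQueueBase
{
protected:
    struct Node;             // header + variable-sized payload
    struct SharedState
    {
        std::atomic<Node*> head; // last consumed node - the guard
        std::atomic<Node*> tail; // last produced node
    };
    SharedState* m_shared;   // lives inside the mapped shared memory
};

class SMDataQueueProducer : public SMDataQueueBase
{ /* RequestNode, EnqueueMore, CommitNode */ };

class SMDataQueueConsumer : public SMDataQueueBase
{ /* the single dequeue method */ };
```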
The process of en-queuing works like
this:
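In outline, and under the same assumptions as the sketches above (the m_first pointer tracking the oldest unfreed node is an assumed detail; Allocate and Free stand in for the shared-memory management covered in the next post):

```cpp
void* SMDataQueueProducer::RequestNode(size_t size)
{
    // Lazily free everything the consumer is done with: every node
    // strictly before head has already been consumed.
    Node* head = m_shared->head.load(std::memory_order_acquire);
    while (m_first != head)
    {
        Node* n = m_first;
        m_first = m_first->next;
        Free(n); // return the node's memory to the shared allocator
    }
    // Allocate the new node in shared memory and hand its payload
    // to the user to write into.
    m_pending = Allocate(size);
    return m_pending->data;
}

void SMDataQueueProducer::CommitNode()
{
    // Link after the current tail, then publish by advancing tail.
    // The release store guarantees the payload is visible before the
    // consumer can observe the new tail.
    Node* tail = m_shared->tail.load(std::memory_order_relaxed);
    tail->next = m_pending;
    m_shared->tail.store(m_pending, std::memory_order_release);
}
```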
As you might have noticed, head always points to a Node that has already been consumed (or is invalid on start). When the queue is empty, head == tail, but there is always a 'guard node' in the structure. In this way the producer and the consumer are each responsible only for their own pointer; in fact, the only point where they touch is the check head != tail. It is very important that operator= and operator!= are mutually atomic: operator!= must never return true while the object is in mid-copy and thus invalid (sketched below).
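One way to render that guarantee in C++ (the wrapper name is illustrative):

```cpp
#include <atomic>

// A shared pointer slot whose assignment and comparison are both
// atomic: operator!= can never observe a half-written pointer.
template <typename T>
struct SharedSlot
{
    std::atomic<T*> ptr;

    void operator=(T* p)
    {
        ptr.store(p, std::memory_order_release); // single atomic publish
    }

    bool operator!=(const SharedSlot& other) const
    {
        // acquire pairs with the producer's release store, so a node
        // observed through this check is fully written
        return ptr.load(std::memory_order_acquire)
            != other.ptr.load(std::memory_order_acquire);
    }
};
```

The consumer's emptiness check then reads simply as head != tail, with no window in which a half-updated pointer could be observed.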
In the next posts I'll cover the memory
management part of the implementation.
Update: Building a shared memory IPC implementation - Part II