Processes¶
You should interpret words and phrases that appear fully capitalized in this document as described in RFC 2119. Here is a brief summary of the RFC:
“MUST” indicates absolute requirements. Vivarium may not work correctly if you don’t follow these.
“SHOULD” indicates strong suggestions. You might have a valid reason for deviating from them, but be careful that you understand the ramifications.
“MAY” indicates truly optional features that you can include or exclude as you wish.
Models in Vivarium are built by combining processes, each of
which models a mechanism in the system being studied. These processes
can be combined in a composite to build more complicated
models. Process models are defined in classes that inherit from
vivarium.core.process.Process
, and these process
classes can be instantiated to create individual processes. During
instantiation, the process class may accept configuration options.
Note
Processes are the foundational building blocks of models in Vivarium, and they should be as simple to define and compose as possible.
Process Interface Protocol¶
Each process class MUST implement the application programming interface (API) that we describe below.
Class Variables¶
Each process class SHOULD define default configurations in a
defaults
class variable. The constructor SHOULD read these defaults.
For example:
class MyProcess:
defaults = {
'growth_rate': 0.0006,
}
Constructor¶
The constructor of a process class MUST accept as its first positional argument an optional dictionary of configurations. If the process class is configurable, it SHOULD accept configuration options through this dictionary.
In the constructor, the process class MUST call its superclass constructor with a dictionary of parameters.
Passing Parameters to Superclass Constructor¶
The dictionary of parameters SHOULD include any configuration options
not used by the process class. Any information needed by the process
class MAY also be included in these parameters. Once the object has
been instantiated, these parameters are available as
self.parameters
, where they have been stored by the
vivarium.core.process.Process
constructor.
Example Constructor¶
Let’s examine an example constructor from a growth process class.
def __init__(self, initial_parameters=None):
if initial_parameters == None:
initial_parameters = {}
parameters = {'growth_rate': self.defaults['growth_rate']}
parameters.update(initial_parameters)
super().__init__(parameters)
Note that Vivarium Core actually handles combining the provided parameters with the default parameters, so a constructor as simple as the one above can actually be dropped. The superclass constructor makes it redundant, but we show it here for clarity.
Warning
Python creates only one instance of both class variables
and function argument defaults. This means that you MUST not change
the default parameters object. Make a copy instead. This also means
that you SHOULD avoid using a mutable object as a default argument.
This is why we use None
as the default for initial_parameters
instead of {}
.
While the default growth rate is 0.0006
, this can be overridden by
including a growth_rate
key in the configuration dictionary passed
to initial_parameters
.
These special parameters get handled by the superclass constructor:
name
: The value of thename
parameter gets assigned to the process’sname
attribute (e.g.my_process.name
). If no name is specified in the parameters or as a class variable, we useself.__class__.__name__
as the name.time_step
: If not specified, thetime_step
parameter is set to 1. This parameter determines how frequently the simulation engine runs this process’snext_update
function._condition
: The value of this parameter should be a path in thestates
dictionary passed tonext_update()
to a variable. The variable should hold a boolean specifying whether the process’snext_update
function should run.
Ports Schema¶
Each process declares what stores it expects by specifying a port for each store it accepts. Note that if two processes are to be combined in a model and share variables through a shared store, the processes MUST use the same variable names for the shared variables.
The process class MUST implement a ports_schema
method with no
required arguments. This method MUST return nested dictionaries of the
following form:
{
'port_name': {
'variable_name': {
'schema_key': 'schema_value',
...
},
...
},
...
}
Schema keys¶
schema_key
MUST be a schema key and have an appropriate
value. Any applicable and omitted schema keys will take on their default
values. Note that every variable SHOULD specify _default
. If the
cell will be dividing, every variable also MUST specify _divider
.
Variables in the ports schema SHOULD NOT specify _value
.
Available schema keys include:
_default
: The default value of the state variable if no initial value is provided. This also sets the data type of the variable, including units._updater
: How to apply state variable updates. Available updaters are listed in below_divider
: How to divide the state variable’s values between daughter cells. Available dividers are listed below._emit
: A Boolean value that sets whether to log this variable to the simulation database for later analysis._properties
: User-defined properties such as molecular weight. These can be used for calculating variables such as total system mass.
Updaters¶
Updaters are methods by which an update from a process is applied to a variable’s value.
Updaters provided by vivarium-core include:
accumulate
: The default updater. Add the update value to the current value.set
: The update value becomes the new current value.merge
: Update an existing dictionary with new values, and add any newly declared keys.null
: Do not apply the update.nonnegative_accumulate
: Add the update value to the current value, and set to 0 if the result is negative.dict_value
: translates_add
and_delete
-style updates to operations on a dictionary.
New updaters can be easily defined and passed into a port schema:
# updater that returns a random value
def random_updater(current_value, update_value):
return random.random()
def port_schema(self):
ports = {
'port1': {
'variable1': {
'_default': 1.0
'_updater': {
'updater': random_updater
}
}
}
}
return ports
Dividers¶
Dividers are methods by which a variable’s value is divided when division is triggered.
Dividers available in vivarium-core include:
set
: The default divider. Daughters get the same value as the mother.binomial
: Sample the first daughter’s value from a binomial distribution of the mother’s value, and the second daughter gets the remainder.split
: Divide the mother’s value in two. Odd integers will make one daughter receive 1 more than the other daughter.split_dict
: Splits a dictionary of{key: value}
pairs, with each daughter receiving a dictionary with the same keys, but with each value split.zero
: Daughter values are both set to 0.no_divide
: Asserts that this value should not be divided.
New dividers can be easily defined and passed into a port schema:
# divider that returns a random value for each daughter
def random_divider(mother_value, state):
return [
random.random(),
random.random()]
def port_schema(self):
ports = {
'port1': {
'variable1': {
'_default': 1.0
'_divider': {
'divider': random_divider
}
}
}
}
return ports
Example Ports Schema¶
def ports_schema(self):
return {
'global': {
'mass': {
'_emit': True,
'_default': 1339 * units.fg,
'_updater': 'set',
'_divider': 'split'},
'volume': {
'_updater': 'set',
'_divider': 'split'},
'divide': {
'_default': False,
'_updater': 'set'
}
}
}
Here we specify that only mass
should be emitted. We assign a
default value of 1339 fg to mass
, and we declare that the mass
and volume
variables should be split in half on division. Further,
we specify that all the three variables should have their updates set,
not accumulated.
Views¶
When the process is asked to provide an update to the model state, it is only provided the variables it specifies. For example, it might get a model state like this:
{
'global': {
'mass': 1339 <Unit('femtogram')>,
'volume': 1.2,
'divide': False,
},
}
This would happen even if the store linked to the global
port
contained more variables. We call this stripping-out of variables the
process doesn’t need masking.
Advanced Ports Schema¶
Use the glob *
schema to declare expected sub-store structure,
and view all child values of the store:
schema = {
'port1': {
'*': {
'_default': 1.0
}
}
}
Use the glob **
schema to connect to an entire sub-branch, including
child nodes, grandchild nodes, etc:
schema = {
'port1': '**'
}
Ports flagged as output-only won’t be viewed through the next_update’s states, which can save some overhead time:
schema = {
'port1': {
'_output': True,
'A': {'_default': 1.0},
}
}
Next Updates¶
Each process class MUST implement a next_update
method that accepts
two positional arguments: the timestep and the current state of
the model. The timestep describes, in units of seconds, the length of
time for which the update should be computed.
State Format¶
The next_update
method MUST accept the simulation state as a
dictionary of the same form as the ports schema dictionary, but with the dictionary of schema keys
replaced with the current (i.e. pre-update) value of the variable.
Note
In the code, you may see the simulation state referred to as
states
. This is left over from when stores were called states,
and so the simulation state was a collection of these states. As you
may already notice, this naming was confusing, which is why we now
use the name “stores.”
Because of masking, each port will contain only the variables specified in the ports schema, even if the linked store contains more variables.
Warning
The next_update
method MUST NOT modify the states it is
passed in any way. The state’s variables are not copied before they
are passed to next_update
, so changes to any objects in the state
will affect the simulation state before the update is applied.
Update Format¶
next_update
MUST return a single dictionary, the update that
describes how the modeled mechanism would change the simulation state
over the specified time. The update dictionary MUST be of the same form
as the ports schema dictionary, though
with the dictionaries of schema keys replaced with update values. Also,
variables that do not need to be updated can be excluded.
Example Next Update Method¶
Here is an example next_update
method for our growth process:
def next_update(self, timestep, states):
mass = states['global']['mass']
new_mass = mass * np.exp(self.parameters['growth_rate'] * timestep)
return {'global': {'mass': new_mass}}
Recall from our example schema that we use
the set
updater for the mass
variable. Thus, we compute the new
mass of the cell and include it in our update. Notice that we access the
growth rate specified in the constructor by using the
self.parameters
attribute.
Note
Notice that this function works regardless of what timestep we use. This is important because different simulations may need different timesteps based on what they are modeling.
Process Class Examples¶
Many of our process classes have examples in the form of test functions at the bottom. These are great resources if you are trying to figure out how to use a process.
If you are writing your own process, please include these examples!
Also, executing the process class Python file should execute one of
these examples and save the output as demonstrated in
vivarium.processes.glucose_phosphorylation
. Lastly, any
top-level functions you include that are prefixed with test_
will be
executed by pytest
. Please add these tests to help future developers
make sure they haven’t broken your process!
Steps¶
Processes have one major drawback: you cannot specify when or in what order they run. Processes can request timesteps, but the Vivarium engine may not honor that request. This behavior can be problematic when you have operations that need to run in a particular order. For example, imagine that you want to model transcription and chromosome replication in a bacterium. It seems natural to have a transcription process and another replication process, but then how do you handle collisions between the replisome and the RNA Polymerase (RNAP)? You might want to say something like “If a replisome and RNAP collide, remove the RNAP from the chromosome.” To support this kind of statement, you can create a step.
vivarium.core.process.Step
is a subclass of
vivarium.core.process.Process
that is not time-dependent.
Steps run before the first timestep and after the dynamic processes
during simulation. They run according to a dependency graph called a
flow (like a workflow) – see our guide to flows. These can serve many different roles, including
translating states between different modeling formats, implementing lift
or restriction operators to translate states between scales, and as
auxiliary processes that offload complexity. As an example of offloading
complexity, a step might recalculate concentrations after counts have
been updated.
To create a step, you follow the same steps as you would to create a
process except that your class should inherit from
vivarium.core.process.Step
. For example, we could create a
replisome-RNAP collision reconciler like this:
class CollisionReconciler(Step):
def ports_schema(self):
return {
'replisomes': {
'*': {
'position': {'_default': 0},
},
},
'RNAPs': {
'*': {
'position': {'_default': 0},
},
},
}
def next_update(self, timestep, states):
# We can ignore the timestep since it will always be 0.
replisome_positions
replisome['position']
for replisome in states['replisomes'].values()
])
rnap_positions = np.array([
rnap['position']
for rnap in states['RNAPs'].values()
])
# Assume that our timestep is small enough that we can
# ignore RNAPs and replisomes that move past each other
# (instead of to the same position) in one timestep.
collision_mask = replisome_positions == rnap_positions
rnap_keys = np.array(list(states['RNAPs'].keys()))
to_remove = rnap_keys[collision_mask]
return {
'RNAPs': {
'_delete': to_remove.tolist(),
},
}
Note
Steps are always given a timestep of 0 by the simulation engine.
Step Implementation Details¶
Steps are technically identified by whether their
vivarium.core.process.Process.is_step()
methods return
True
. This means that you can make a process that determines whether
it should be a Step based on its configuration. Note however that we do
not support changing whether a process is a step mid-simulation.
Advanced Features¶
Adaptive Timesteps¶
You can set process timesteps for the duration of a simulation using the
time_step
parameter, but you can also override the
vivarium.core.process.Process.calculate_timestep()
method to
compute timesteps dynamically based on the same view into the simulation
state that next_update()
sees.
Conditional Updates¶
Sometimes you might want the simulation engine to skip a process when
generating updates. You can implement this by overriding
vivarium.core.process.Process.update_condition()
to return
False
whenever you don’t want the process to run. This method takes
as a parameter the same view into the simulation state that
next_update()
sees.
Using Process Objects¶
Your use of process objects will likely be limited to instantiating them and passing them to other functions in Vivarium that handle running the simulation. Still, you may find that in some instances, using process objects directly is helpful. For example, for simple processes, the clearest way to write a test may be to run your own simulation loop.
Simulating a process can be sketched by the following pseudocode:
# Create the process
configuration = {...}
process = ProcessClass(configuration)
# Get the initial state from the process's schema
# This means the stores and ports are the same
state = {}
schema = process.ports_schema()
for port, port_dict in schema.items():
for variable, variable_schema in port_dict.items():
state[port][variable] = variable_schema["_default"]
# Run the simulation in a loop for 10 seconds
time = 0
while time < 10:
# We are using a timestep of 1 second
update = process.next_update(1, state)
# This is a simplified way to apply the update that assumes all
# all variables are numbers and all updaters are "accumulate"
for port in update:
for variable_name, value in port.items():
state[port][variable_name] += value
# Now that the loop is finished, the predicted state after 10
# seconds is in "state"
The above pseudocode is simplified, and for all but the most simple processes you will be better off using Vivarium’s built-in simulation capabilities. We hope though that this helps you understand how processes are simulated and the purpose of the API we defined.
Parallel Processing¶
Process Commands¶
When a process is run in parallel, we can’t interact with it in the normal Python way. Instead, we can only exchange messages with it through a pipe. Vivarium structures these exchanges using process commands.
Vivarium provides some built-in commands, which are documented in
vivarium.core.process.Process.send_command()
. Also see that
method’s documentation for instructions on how to add support for your
own commands.
Process commands are designed to be used asynchronously, so to retrieve
the result of running a command, you need to call
vivarium.core.process.Process.get_command_result()
. As a
convenience, you can also call
vivarium.core.process.Process.run_command()
to send a command
and get its result as a return value in one function call.
Running Processes in Parallel¶
In normal situations though, you shouldn’t have to worry about process
commands. Instead, just pass '_parallel': True
in a process’s
configuration dictionary, and the Vivarium Engine will handle the
parallelization for you. Just remember that parallelization requires
that processes be serialized and deserialized at the start of the
simulation, and this serialization only preserves the process
parameters. This means that if you instantiate a process and then change
its instance variables, those changes won’t be preserved when the
process gets parallelized.