[music-rfc] Comments

Tue Sep 9 00:18:33 CEST 2008

2008/8/19 Marc de Kamps <dekamps at comp.leeds.ac.uk>:
> Some comments on MUSIC

Dear Mark and James,

Many thanks for your comments on the RFC.  Below follow some
reflections on the different issues.

> First of all, it is highly encouraging to see the INCF take on a leading
> role in creating software standards for the community.

Yes, it's good that they are doing that.  I hope they will continue to
pursue that.

> 1. Standardization of technology, as opposed to data formats, has the
> potential to insulate the field from technical advances achieved outside of
> neuroscience, and limit the unforeseen application of existing models and
> their output.    Dynamic visualizations, integration of aggregate MUSIC
> output into higher-scale models, etc., highlight the need for data format
> standards, as opposed to modeling method standards.  In addition, non-MPI
> distributed computing solutions exist, such as Condor and the GRID, which
> have solved many non-trivial technical issues in large-scale computing.
> Such alternative techniques are likely to be more appropriate in certain
> neuroscience modeling scenarios, and the field would benefit from the
> integration of these and MPI-based simulations.

MUSIC is not intended as *the* simulator interface.  We have tried to
design it in a way that it has a minimal impact on the application
which supports it.  It should be possible to support other interface
standards.  It should even be possible to have such support
simultaneously within the same running application.  For example, a
MUSIC-based simulation could be dispatched on a GRID.

Also, there is always a need for balance between stability (which a
standard gives) and the need to explore different solutions.  I think
we should see the MUSIC 1.0 API just as the MUSIC 1.0 API---a
framework intended for applications using MPI.  Let's get this rolling
and let's then evaluate if it's useful and in what directions we want
to take development.  Let's also keep an open and accepting attitude
towards other technologies.

> 2. Sending massive amounts of data between processes, as is the stated goal
> of MUSIC (music-rfc pp. 3) suggests either that an application is limited to
> a few specific modeling scenarios, or that scaling problems will occur as
> the number of communicating components increase. Existing distributed
> computing systems generally try to minimize communication due to the
> existing disparity between computation and communication limitations.

I think there might be some misunderstanding here.  Page 3 of the
report does not mention the communication of massive amounts of data
between processes, but between *applications*.  Of course we imagine a
situation with parallelized tools which scale well!  But they might
still want to communicate large amounts of data at high bandwidth.
This is consistent with individual processes communicating at low
bandwidth.
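
To put hypothetical numbers on this (the process count and the
per-process rate below are made-up assumptions, not measurements from
any MUSIC run), the arithmetic can be sketched in a few lines of
Python:

```python
# Illustrative arithmetic only: the figures are assumptions, not measurements.

def aggregate_bandwidth(n_processes: int, per_process_mb_s: float) -> float:
    """Total application-to-application bandwidth in MB/s, assuming every
    process carries an equal share of the traffic."""
    return n_processes * per_process_mb_s

# 1024 processes, each sending a modest 2 MB/s to its peers in the
# other application.
total = aggregate_bandwidth(1024, 2.0)
print(f"{total:.0f} MB/s between the two applications")
```

Under these assumptions no single process needs a fast link, yet the
two applications together exchange about 2 GB/s.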

> Operating in the regime of massive data transfer, one must be able to
> recover from bandwidth and latency problems or the solution will not scale
> well. MUSIC's proposed model of synchronisation may run into trouble if some
> processes miss ticks.

We should talk about this at the INCF conference, but MUSIC is
intended for a scenario where one can expect processes not to miss
ticks.  If they miss ticks, they might also miss computations, and
then we won't have correct results.  Maybe I'm getting you all wrong
here.  If the concern is with large variability of computation time
between different tick intervals, note that MUSIC allows processes to
run asynchronously in the sense that the simulation clocks of
different MPI processes need not be synchronized at each tick.  In
fact, it may happen that only a subset or even none of the processes
communicate at a given tick.
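
A toy sketch of that last point, in Python.  This is not MUSIC's
scheduling algorithm and uses none of its API; the process names and
the interval values are hypothetical.  Each process is assumed to
exchange data only every so many ticks, so the set of communicating
processes at a given tick can be anything from all of them to none.

```python
# Toy model, not MUSIC code: process names and intervals are hypothetical.

def communicating(intervals: dict[str, int], tick: int) -> list[str]:
    """Return the processes that exchange data at `tick`, assuming a
    process communicates whenever the tick is a multiple of its interval."""
    return [name for name, every in intervals.items() if tick % every == 0]

# Assumed communication intervals, in ticks, for three processes.
intervals = {"proc_a": 2, "proc_b": 3, "proc_c": 4}

for tick in range(1, 7):
    print(tick, communicating(intervals, tick))
```

With these intervals nobody communicates at ticks 1 and 5, while
tick 4 involves only proc_a and proc_c.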

> 3. Data collection and analysis. Suppose that MUSIC is used for running
> several simulators in parallel. How are the data from these simulators then
> collated? The waveconsumer/waveproducer example suggests that there must be
> a single process which is responsible for collating the heterogeneous
> simulation data and storing them in a single location. If not, the user is
> faced with the formidable task of collecting the data produced by different
> simulators. But to have an application that collates data from other
> applications requires knowledge of the simulators involved. And for
> analysis, effective use of MUSIC requires the user to know about the data
> format for every combination of simulators that are used. This is only
> workable if the data are relatively simple, such as, for example, spike
> times. It is very likely that for modelling the brain we need a complex
> hierarchy of data formats. To predict EEG or fMRI signals, we are not
> concerned with spikes of individual neurons, but with averaged responses
> over brain areas. We may be interested in local field potentials and
> haemodynamic models, rather than with spiking neurons. A simulation may
> consist of a hierarchy of simulation tools, each with their own data format.
> In this case the collection and interpretation of the data produced by the
> individual simulators becomes a big problem that as far as we are able to
> judge, must currently be solved by the user.

I think it is outside MUSIC's scope to manage the collection of
simulation data.  Communicating data to a single node in a cluster
easily leads to scaling problems.  A better idea would be for some
other API standard and library to provide a way for different
simulators to store simulation data in a distributed manner.  HDF5
comes to mind.  MUSIC is only intended to provide communication
between applications in a cluster.

It is true, though, that data formats are a concern.  In order to be
pragmatic and get MUSIC running reasonably soon, we have made the
double compromise of 1. providing only a couple of standardized data
formats (spike events and continuous data) and 2. providing a generic,
non-standardized message protocol to allow for flexibility and the
development of usage conventions in the community.  Very probably, a
future MUSIC API will have to be more precise about data formats.
Note, though, the possibility of developing MUSIC usage conventions
which attach specific interpretations to the continuous data arrays
communicated by MUSIC.
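
Such a usage convention could be as simple as an agreed table naming
what each slot of a continuous data array means.  A minimal Python
sketch follows; the ChannelSpec type, the quantity names and the units
are all hypothetical and are not part of the MUSIC API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelSpec:
    """Meaning of one slot in a continuous data array (hypothetical)."""
    name: str
    unit: str

# Hypothetical convention shared by sender and receiver: slot 0 holds a
# local field potential, slot 1 a population-averaged firing rate.
CONVENTION = (
    ChannelSpec("local_field_potential", "mV"),
    ChannelSpec("mean_firing_rate", "Hz"),
)

def interpret(values):
    """Attach the agreed names and units to a raw array of values."""
    if len(values) != len(CONVENTION):
        raise ValueError("array length does not match the convention")
    return {spec.name: (value, spec.unit)
            for spec, value in zip(CONVENTION, values)}

print(interpret([0.35, 12.0]))
```

The array itself stays a plain sequence of numbers; only the shared
table gives it meaning, so a convention like this can evolve in the
community without changes to the transport layer.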

> a)  the development of existing data-format efforts such as NeuroML and
> CellML which are essential for flexible, evolvable multi-simulation
> communication, and where necessary to extend them to support higher levels
> of simulations (populations, network hierarchies, etc.)

Agreed, and, just to be clear, this is outside the scope of MUSIC.

> b) the integration of scalable simulations running on more general platforms
> such as the GRID, and web services, which would be enabled by the data
> standardization efforts of (a).

Ditto.

> The real effort in boot-strapping standards in a community comes from
> attaining a critical mass of projects that use the standard.  Development of
> documentation, tutorials, examples, and tools which leverage the standard
> all contribute to this. The good thing about MUSIC is that it provides the
> community with a working prototype that can serve as a basis  for further
> work on interoperability. We hope that the example of the MUSIC standard for
> MPI-based models provides the impetus to the community to commit resources
> towards developing and promoting more general standards.

We hope so too.  We aim to improve the MUSIC documentation and have
an ongoing collaboration with the NEST and Moose teams with the aim of
adding MUSIC support to these simulators.  At some point, I think it
would be very important to add a MUSIC interface to Neuron.
Discussions with Dr. Hines indicate that this shouldn't be a big
problem.  Dr. Hines is willing to give technical support to anyone
undertaking such an effort.

Best regards,
Mikael D.