From dekamps at comp.leeds.ac.uk Tue Aug 19 18:37:17 2008
From: dekamps at comp.leeds.ac.uk (Marc de Kamps)
Date: Tue, 19 Aug 2008 17:37:17 +0100
Subject: [music-rfc] Comments
Message-ID:

Dear Musicians,

Find our comments below. If there are questions or comments, I'm happy to discuss them in Stockholm in September. James Watson has delivered substantial input and changed my outlook on interoperability radically.

Best wishes,
Marc de Kamps

----------------------------------------------------------------------------

Some comments on MUSIC

Marc de Kamps and James Watson
{dekamps, jwatson}@comp.leeds.ac.uk
School of Computing
University of Leeds

First of all, it is highly encouraging to see the INCF take on a leading role in creating software standards for the community. The amount of code replication in the neurosciences has been staggering, and relieving the research community of part of the burden that comes with developing and maintaining software standards and infrastructure is an important step forward. In particular, increasing the interoperability of existing simulators is a critical step for improving communication and collaboration, and for reducing duplication of effort. We applaud the MUSIC project for instigating efforts towards this goal.

While acknowledging MUSIC as a well-thought-out framework for interfacing between a specific set of simulations, we have three concerns with the proposal as a standard. They arise from its limited ability to scale, both to new modeling techniques and technologies and to the significantly larger models that are likely to be required in the future.

1. Standardization of technology, as opposed to data formats, has the potential to insulate the field from technical advances achieved outside of neuroscience, and to limit the unforeseen application of existing models and their output.
Dynamic visualizations, integration of aggregate MUSIC output into higher-scale models, etc., highlight the need for data format standards, as opposed to modeling method standards. In addition, non-MPI distributed computing solutions exist, such as Condor and the GRID, which have solved many non-trivial technical issues in large-scale computing. Such alternative techniques are likely to be more appropriate in certain neuroscience modeling scenarios, and the field would benefit from the integration of these and MPI-based simulations.

2. Sending massive amounts of data between processes, as is the stated goal of MUSIC (music-rfc, p. 3), suggests either that an application is limited to a few specific modeling scenarios, or that scaling problems will occur as the number of communicating components increases. Existing distributed computing systems generally try to minimize communication because of the existing disparity between computation and communication limitations. Operating in the regime of massive data transfer, one must be able to recover from bandwidth and latency problems or the solution will not scale well. MUSIC's proposed model of synchronisation may run into trouble if some processes miss ticks.

3. Data collection and analysis. Suppose that MUSIC is used for running several simulators in parallel. How are the data from these simulators then collated? The waveconsumer/waveproducer example suggests that there must be a single process which is responsible for collating the heterogeneous simulation data and storing them in a single location. If not, the user is faced with the formidable task of collecting the data produced by different simulators. But an application that collates data from other applications requires knowledge of the simulators involved. And for analysis, effective use of MUSIC requires the user to know the data format for every combination of simulators that is used.
This is only workable if the data are relatively simple, such as spike times. It is very likely that for modelling the brain we need a complex hierarchy of data formats. To predict EEG or fMRI signals, we are not concerned with spikes of individual neurons, but with averaged responses over brain areas. We may be interested in local field potentials and haemodynamic models, rather than in spiking neurons. A simulation may consist of a hierarchy of simulation tools, each with its own data format. In this case the collection and interpretation of the data produced by the individual simulators becomes a big problem that, as far as we are able to judge, must currently be solved by the user.

Despite these concerns, we think that MUSIC is a practical, timely solution that leverages existing MPI-based neural simulators. We are highly encouraged by the fact that the INCF is willing to take on responsibility for tools that are important to the field, and is willing to take some of the burden of tool development away from individual research groups. Given this commitment, it is good to think beyond such immediate technical problems. Experience in other disciplines, such as bioinformatics, has shown that in order to produce scalable solutions, it is important not to overly commit to a single technology. Specifically, we suggest:

a) the further development of existing data-format efforts such as NeuroML and CellML, which are essential for flexible, evolvable multi-simulation communication, and, where necessary, their extension to support higher levels of simulation (populations, network hierarchies, etc.);

b) the integration of scalable simulations running on more general platforms such as the GRID and web services, which would be enabled by the data standardization efforts of (a).

People prefer to use technologies that they are familiar with. So research communities build up expertise in application domains. Well-defined interfaces, data standards, etc.
allow people who are specialised in different application domains to work together, rather than force communities to adopt new technologies. The real effort in bootstrapping standards in a community comes from attaining a critical mass of projects that use the standard. Development of documentation, tutorials, examples, and tools which leverage the standard all contribute to this.

The good thing about MUSIC is that it provides the community with a working prototype that can serve as a basis for further work on interoperability. We hope that the example of the MUSIC standard for MPI-based models provides the impetus to the community to commit resources towards developing and promoting more general standards.