JRA2 – Demonstrator of a Photon Science Analysis Service (DaaS)
Light sources generate large volumes of complex scientific data, and their users need assistance in analysing these data.
The aim of this Joint Research Activity work package is to build up demonstrators for remote data analysis for a small number of archetypal experiments. The demonstrators
- will build on the HPC platforms of each participating institute,
- will be cloud-based in those institutes where cloud technology is deployed, and will run on standard HPC hardware in the other institutes, and
- will offer a web portal that gives users a common user experience.
Deployment of complex data analysis frameworks and tool-chains is a common task at research facilities and frequently a major hurdle for scientists, hampering rapid data analysis and publication. Simple assembly of integrated and deployable applications would both reduce the Research Infrastructure (RI) effort and accelerate the scientific process.
Applications will be implemented as deployable packages, as pre-configured virtual machines or as containers. Virtual machines or containers provide encapsulated user environments, which can be archived together with the experimental data, thereby capturing valuable provenance data and strongly supporting reproducibility of the original experiment and data analysis workflows.
For remote data analysis, which is particularly important for experiments with very high data volumes, the system will provide secure access through a user portal implementing a standard authentication system (building on the Umbrella system realised through the PaNdata and CRISP FP7 projects). The Umbrella attribute authority will be extended to allow users to add required attributes in a self-service way. Users will log into the sites and seamlessly access the compute and storage resources.
The sites will address use cases for both industrial and non-industrial experiments. Testbeds will be developed to solve the issues users face in exporting large data sets and in obtaining access to the necessary CPU resources and appropriate software. The work package will study how cloud infrastructures and/or supercomputing centres can be exploited for these use cases. DESY and ESRF are official “end users” in the HNScienceCloud project and contribute photon-science-specific use cases, thereby validating the compatibility of generic cloud services with the deployment of specific application frameworks. Profiling of applications as part of the provenance data furthermore makes it possible to provide users with cost estimates and with the means to choose cloud resources cost-efficiently, thereby lowering the barriers to the use of Research Infrastructures for researchers across Europe. JRA2 will work together with NA3 on industrial innovation.
The Joint Research Activity JRA2 will be strongly linked to a number of other work packages within CALIPSOplus:
- WP2 – NA1 – User tools for access and data management
- WP4 – NA3 – European Light Sources for Industrial Innovation plus (ELSIIplus), working together on industrial innovation
- WP5 – NA4 – Striving for Sustainability of Photon Science in Europe, especially ESUO, to get feedback from different user communities
- Atherton, C.J. et al, Federated Identity Management for Research Collaborations (doi: 10.5281/zenodo.1296031)
|D#||Deliverable name||Task||Planned delivery date|
|D24.1||Report on kick-off meeting workshop for the CALIPSOplus partners to present their needs for remote data analysis|| ||M3|
|D24.2||Blueprint on implementing a platform and manuals for the implementation at the different sites|| ||M18|
|D24.3||Cross-site use case requirements report including comparison of existing solutions|| ||M12|
|D24.4||Software packages for the selected experiment use cases ready to install and run|| ||M18|
|D24.5||Report on test and deployment of mini demonstrator on at least six sites|| ||M24|
|D24.6||White paper on sustainability of HNScienceCloud and European Open Science Cloud for synchrotron and FEL applications|| ||M36|
|D24.7||Organisation of a workshop to present the results of the DaaS demonstrator and obtain feedback from users on how this approach fits their current needs. The output of the workshop will be a white paper on how the remote data analysis needs of the photon community can best be served using modern computing paradigms|| ||M36|