Packaging and containerization of computational methods
Mohammed Alser, Brendan Lawlor, Richard J. Abdill, Sharon Waymost, Ram Ayyala, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, André M. Ribeiro-dos-Santos, Nour Almadhoun, Varuni Sarwal, Can Firtina, Tomasz Osinski, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do (B.D) Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul





















Extended data
Extended Data Fig. 1 General overview of the installation process for omics software tools.
When biomedical researchers need to use omics software tools and reproduce reported results, they must first locate the version number and web address of each tool in the published paper and its supplementary information. They can then download the tools, determine each tool’s dependencies from the information provided by its developers, and try to install the tools on a personal computer or HPC cluster. If a tool installs successfully, the researchers download the relevant omics data and apply the tool as needed. However, even when the hardware requirements of each tool (CPU type, memory capacity, and storage) are met, installation challenges are likely to cause some omics tools to fail when exact reproduction is attempted.
Extended Data Fig. 2 Development timeline and brief description of popular (a) package managers and (b) containers.
Each tool is described with key information about its functionality, purpose, and supported operating systems. In addition to the surveyed package managers and containers, the first package manager, PMS, and the first container, FreeBSD Jail, are shown.
Extended Data Fig. 3 Standard workflow for installing software with a package manager.
The user, usually an administrator, asks the package manager to install a specific piece of software. If the software is not already installed, the package manager fetches the appropriate package from a repository. For each dependency that is not already installed, the package manager retrieves that dependency’s package from the repository and starts the installation procedure for it. Once all the dependencies are installed, the initially requested software is installed. Because every dependency can have its own list of dependencies, the package manager often repeats this process several times, checking each dependency of a dependency in the same way.
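The recursive character of this workflow can be illustrated with a minimal Python sketch. It is not the implementation of any surveyed package manager: the repository contents, the package names, and the function `install` are all hypothetical, and the sketch assumes that the dependency graph contains no cycles.

```python
from typing import Dict, List, Set

# Hypothetical repository: package name -> direct dependencies.
# All names here are illustrative and do not come from the paper.
REPOSITORY: Dict[str, List[str]] = {
    "aligner": ["zlib", "htslib"],
    "htslib": ["zlib", "libcurl"],
    "libcurl": ["openssl"],
    "zlib": [],
    "openssl": [],
}


def install(package: str, installed: Set[str], order: List[str]) -> None:
    """Install `package` after recursively installing its missing dependencies.

    Assumes the dependency graph is acyclic; a cycle would recurse forever.
    """
    if package in installed:
        return  # already present, nothing to do
    if package not in REPOSITORY:
        raise KeyError(f"package not found in repository: {package}")
    # Each dependency goes through the same check-fetch-install procedure.
    for dependency in REPOSITORY[package]:
        install(dependency, installed, order)
    installed.add(package)
    order.append(package)  # record the order in which packages were installed


# Usage: zlib is already on the system, so it is skipped.
installed_packages: Set[str] = {"zlib"}
install_order: List[str] = []
install("aligner", installed_packages, install_order)
print(install_order)
```

In this sketch the requested software is always installed last, after every direct and indirect dependency, which mirrors the order described in the caption.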
Extended Data Fig. 4 Standard workflow for running software with containerization.
The user requests a specific container image. If the container image is available locally, the user can run it directly through the container engine. If it is not available locally, the appropriate image must first be fetched from a repository. In either case, the dependencies are already bundled in the image and are handled without any intervention from the user.
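The decision between running a local image and fetching one first can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the behavior of any particular container engine: the registry contents, the image name, and the function `run_container` are hypothetical, and "running" an image is reduced to recording a log entry.

```python
from typing import Dict, List

# Hypothetical remote repository: image name -> image contents.
# The image name and description are illustrative only.
REGISTRY: Dict[str, str] = {
    "aligner:1.0": "aligner 1.0 bundled with all of its dependencies",
}


def run_container(image: str, local_images: Dict[str, str], log: List[str]) -> List[str]:
    """Run `image`, fetching it from the repository only if it is not local."""
    if image not in local_images:
        if image not in REGISTRY:
            raise KeyError(f"image not found in repository: {image}")
        local_images[image] = REGISTRY[image]  # fetch once and keep a local copy
        log.append(f"pulled {image}")
    # Dependencies ship inside the image, so no separate resolution step is needed.
    log.append(f"ran {image}")
    return log


# Usage: the first run fetches the image, the second run reuses the local copy.
local_store: Dict[str, str] = {}
first_run = run_container("aligner:1.0", local_store, [])
second_run = run_container("aligner:1.0", local_store, [])
print(first_run)
print(second_run)
```

Unlike the package manager workflow, there is no recursive step here, because the image is treated as a single self-contained unit.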