A Pipeline for Processing fMRI Data with Python Based on the NeuroDebian Ecosystem

In neuroscience research, and in medical imaging analysis in particular, mining latent medical information from large medical datasets is important for finding solutions to diseases. In this review, we focus on functional magnetic resonance imaging (fMRI), a non-invasive technique that has become a popular tool in clinical neuroscience and cognitive science research. Once fMRI data have been acquired, a wide range of open-source and commercial software and programming tools is available, and it is very hard to choose the best one for the analysis. Worse, combining more than one software package can make the final results inconsistent and unstable, which is why we want to build a single pipeline for analyzing the data. Meanwhile, with the growth of machine learning, Python has become one of the most popular programming languages. It is also an open-source, dynamic language whose community, libraries and contributors have grown rapidly in recent years. Through this review, we hope to make neuroimaging data analysis easier, more stable and more uniform, based on a single platform.


Introduction:
High-field structural and functional MRI (magnetic resonance imaging) can non-invasively detect brain signals with substantially higher spatial resolution than EEG (electroencephalography), MEG (magnetoencephalography), etc. However, neuroimaging is a complex, interdisciplinary field spanning physics, engineering, biological science, clinical medicine, physiology, statistics and much more. Raw (f)MRI data alone provide only limited information about the brain, so data mining is a necessary and significant step for obtaining more quantitative and broader functional information. When I started to analyze my first batch of (f)MRI data, I found a great deal of software and many scripts on the Internet.
Unfortunately, several problems arose. First, the source code was extremely messy. Second, the scripts were written in a variety of programming languages, which made it hard to connect them. Third, according to prior research, setting different parameters in the same software can change the final results and make them unstable; worse, it is hard to combine different software packages or scripts and then batch-process (f)MRI data (Eklund et al. 2015; Pauli et al. 2016; Della-Maggiore et al. 2002). Fourth, Matlab has been the main programming language for (f)MRI data analysis, but with the boom in machine learning and deep learning, Python is gradually overtaking Matlab in popularity. In addition, Python is a fully open-source programming language, so, unlike Matlab, it can draw on contributions from a huge community. Moreover, Python already has a complete software ecosystem and a powerful community for (f)MRI data analysis. Finally, compared with other programming languages, Python is an interpreted, dynamic language, and its source code tends to be simpler and easier to understand (Fig. 1).
Fig. 1 Python compared with other popular programming languages used in data science. The main languages for data analysis in neuroscience and neuroimaging follow the same trend. The image is adapted from Business Broadway.
With the advent of online neuroimaging databases, people can mine more features from these big data. This is especially important for students at smaller colleges and universities without an MRI scanner. My goal is to make neuroimaging analysis understandable and accessible to anyone interested in neuroscience and neuroimaging. Above all, I aim to make fMRI analysis more precise and integrative.

Data Management Tools: Data Portal
DataLad (https://www.datalad.org/) provides a data portal and a version-control system for downloading and transferring data, letting you manage and share your data more flexibly.
As mentioned above, many neuroimaging databases are now available online; some well-known datasets are listed as follows:
• Core Nets: a substantial database of macaque tract-tracing data.
• NITRC: another portal that provides the ability to search across many different datasets and databases.
With such big data, a single subject's scans can exceed 1 GB. Transferring scans from many subjects to remote servers, or sharing them with other universities or researchers, is therefore a big challenge. DataLad addresses these problems: you can download, upload and reuse reproducible datasets together with other people.
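DataLad builds on Git and git-annex; git-annex keeps large files out of Git itself and refers to each file's content by a checksum-based key, which is what makes it possible to confirm that a large download or upload arrived intact. The sketch below is not DataLad code, only an illustration of that checksum idea using Python's standard `hashlib`; the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks.

    Reading in chunks keeps memory use constant even for multi-GB scans.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source, copy):
    """True if the copied file has exactly the same content as the source."""
    return sha256_of_file(source) == sha256_of_file(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "sub-01_T1w.nii.gz")  # hypothetical file name
        dst = Path(tmp, "copy.nii.gz")
        src.write_bytes(b"fake scan bytes" * 1000)
        dst.write_bytes(src.read_bytes())
        print(verify_transfer(src, dst))  # True
```

Comparing digests instead of bytes means only two short strings need to be exchanged between machines to confirm a transfer.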

Install DataLad
DataLad can easily be installed via pip (note that DataLad also relies on git-annex, which pip does not install).

pip install --user datalad
If your development system is Linux (e.g., Ubuntu or Debian), you can also install DataLad from the NeuroDebian repository with sudo apt-get install datalad.

Basic Unix Command Line
Learning the Unix command line is a necessary step for studying image analysis. You need to know some frequently used commands in the Unix system, e.g., ls, cd, rm and grep. A tutorial on Unix commands is available here (http://fsl.fmrib.ox.ac.uk/fslcourse/unix_intro/).

Version Control System
Git (https://git-scm.com/) is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
First, choose the Git version for your system; installation instructions are available here (https://git-scm.com/downloads). Here I will show a simple example of how to use Git to share your scripts and keep them under version control. The first step is to create an account on GitHub.com and configure your environment; more details are available here (https://git-scm.com/doc). Then create a new repository and initialize it. You can clone or download this repository into a local directory.

Docker
Docker (https://www.docker.com/) is lightweight container software. It is well suited to users who want container-based applications that use little memory and keep data handling isolated. Its principle is quite different from that of a virtual machine: Docker containers share the host machine's operating system kernel and hardware resources, which makes dynamic allocation of resources easy.
Docker runs each piece of software or application from its own image in a separate container, and you can also build your own image according to the needs of your work. More basic Docker tutorials are available here (https://neurohackweek.github.io/docker-for-scientists/). Docker has also become very popular in neuroimaging analysis in recent years.
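BIDS Apps such as fmriprep and mriqc are distributed as Docker images and accept the same positional arguments: `<bids_dir> <output_dir> <analysis_level>`. The sketch below only assembles such a `docker run` command in Python; the image name and directories are hypothetical, and real apps take additional, app-specific options described in their own documentation.

```python
import shlex
import subprocess
from pathlib import Path


def docker_bids_app_command(image, bids_dir, output_dir, analysis_level="participant"):
    """Build a `docker run` argument list for a container following the BIDS-App convention.

    The input dataset is bind-mounted read-only at /data and the output folder
    read-write at /out; the container then receives the positional arguments
    <bids_dir> <output_dir> <analysis_level>.
    """
    bids_dir = Path(bids_dir).resolve()
    output_dir = Path(output_dir).resolve()
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "-v", f"{bids_dir}:/data:ro",     # raw BIDS data, mounted read-only
        "-v", f"{output_dir}:/out",       # derivatives are written here
        image,
        "/data", "/out", analysis_level,
    ]


if __name__ == "__main__":
    # "example/bids-app" is a hypothetical image name used only for illustration.
    cmd = docker_bids_app_command("example/bids-app", "ds001", "ds001/derivatives")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually start the container
```

Mounting the raw data read-only is a simple safeguard: the containerized tool cannot modify the original scans.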
One of the big problems in (f)MRI data analysis is how the data are organized in directories. An fMRI experiment produces BOLD data, anatomical data and behavioral data (Fig. 2). Researchers can organize their data according to the Brain Imaging Data Structure (BIDS) standard, which gives the data a standard format and order and is the input format expected by BIDS Apps. PyBIDS is a BIDS library that centralizes interactions with such datasets. For more information about BIDS, visit http://bids.neuroimaging.io. More Python usage is documented here (https://bids-standard.github.io/pybids/).

import bids.layout
import bids.tests
import os
The resulting organization of the data after applying BIDS is shown in Fig. 2.
Fig. 2 The BIDS organization of (f)MRI data. The image is adapted from http://bids.neuroimaging.io/.
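PyBIDS indexes a dataset by the key-value entities encoded in BIDS file names (sub, ses, task, run, ...), followed by a suffix such as bold or T1w and an extension. The standard-library sketch below shows only that naming convention; it is a simplification for data files (it does not validate names or read sidecar metadata) and is not PyBIDS's implementation.

```python
from pathlib import PurePath


def parse_bids_filename(filename):
    """Split a BIDS data file name into its entities, suffix and extension.

    Example: 'sub-01_ses-pre_task-rest_run-1_bold.nii.gz' ->
    ({'sub': '01', 'ses': 'pre', 'task': 'rest', 'run': '1'}, 'bold', '.nii.gz')
    """
    name = PurePath(filename).name
    # The extension starts at the first dot, so '.nii.gz' stays in one piece
    stem, dot, ext = name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return entities, suffix, dot + ext
```

Because every entity is explicit in the name, queries such as "all resting-state BOLD runs of subject 01" reduce to filtering on these dictionaries, which is the kind of query PyBIDS's `BIDSLayout` answers for a whole dataset.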

The Quality Control of (f)MRI data
This step is also very important before you start processing your raw data. In Python, mriqc is a very powerful tool for checking data quality, and it also produces user-friendly reports. More mriqc examples are available here (https://mriqc.readthedocs.io/en/stable/).
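One of the image quality metrics mriqc reports for BOLD runs is the temporal signal-to-noise ratio (tSNR): each voxel's mean over time divided by its standard deviation over time. The numpy sketch below computes a tSNR map and a median summary inside a mask for an in-memory array; it illustrates the metric only, is not mriqc's implementation, and omits steps such as brain-mask estimation.

```python
import numpy as np


def temporal_snr(bold):
    """Voxel-wise temporal SNR of a 4D BOLD array (x, y, z, time).

    tSNR = mean over time / standard deviation over time; voxels with zero
    temporal variance are set to 0 to avoid division by zero.
    """
    bold = np.asarray(bold, dtype=float)
    mean = bold.mean(axis=-1)
    std = bold.std(axis=-1)
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)


def median_tsnr(bold, mask):
    """Median tSNR inside a boolean mask of shape (x, y, z)."""
    return float(np.median(temporal_snr(bold)[mask]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic run: baseline 100 plus unit-variance noise, 4x4x4 voxels, 50 volumes
    bold = 100 + rng.standard_normal((4, 4, 4, 50))
    mask = np.ones((4, 4, 4), dtype=bool)
    print(median_tsnr(bold, mask))  # close to 100 for this noise level
```

A run whose tSNR is far below that of comparable runs is a candidate for closer visual inspection before preprocessing.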

Tools for Preprocessing (f)MRI Data Based on NeuroDebian
In this section, we highly recommend two tools for preprocessing raw (f)MRI data with Python.

fmriprep: A Robust Preprocessing Pipeline for fMRI Data
fmriprep (https://fmriprep.readthedocs.io/en/stable/) is a Python-based preprocessing pipeline for functional magnetic resonance imaging (fMRI) data. It provides an easy, user-friendly interface: users supply a minimal set of parameters to drive the whole pipeline, yet it produces detailed output reports (Esteban et al. 2018).
Compared with SPM (Statistical Parametric Mapping, https://www.fil.ion.ucl.ac.uk/spm/), its outputs are easier to pass on to subsequent analyses with machine learning and other approaches.
The fmriprep documentation is available here (https://fmriprep.readthedocs.io/en/stable/). The overall fmriprep workflow is shown in Fig. 3.
Fig. 3 The fmriprep workflow chart. The left side preprocesses anatomical MRI data (e.g., brain extraction, tissue segmentation, spatial normalization, surface reconstruction); the right side preprocesses functional MRI data (e.g., head-motion estimation, slice-timing correction, susceptibility distortion correction, co-registration). The image is adapted from https://fmriprep.readthedocs.io/en/stable/.
For tutorials and worked examples of how it works with Python, see https://github.com/poldracklab/fmriprep-notebooks.
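fmriprep writes its head-motion estimates to a confounds TSV file with the columns trans_x, trans_y, trans_z (mm) and rot_x, rot_y, rot_z (radians), alongside a framewise_displacement column. To illustrate what that column summarizes, the sketch below recomputes a framewise displacement from the six parameters: the sum of absolute volume-to-volume changes, with rotations converted to millimetres of arc on a 50 mm sphere. The loader assumes a tab-separated file with exactly those column names; check them against the output of your fmriprep version.

```python
import csv

import numpy as np

MOTION_COLUMNS = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]


def load_motion(tsv_path):
    """Read the six rigid-body parameters from a confounds TSV into an (n, 6) array."""
    with open(tsv_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    return np.array([[float(row[c]) for c in MOTION_COLUMNS] for row in rows])


def framewise_displacement(motion, radius=50.0):
    """Framewise displacement from an (n_volumes, 6) array ordered as MOTION_COLUMNS.

    Translations are in mm and rotations in radians; rotations are converted to
    mm of arc on a sphere of `radius` mm. The first volume has no predecessor,
    so its FD is NaN.
    """
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))
    deltas[:, 3:] *= radius  # radians -> mm of arc
    return np.concatenate([[np.nan], deltas.sum(axis=1)])
```

Volumes whose FD exceeds a chosen threshold can then be flagged for censoring or used as a covariate in later analyses.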

Nipype: Neuroimaging in Python Pipelines and Interfaces
Nipype (https://nipype.readthedocs.io/en/latest/), an open-source, community-developed project from NiPy, is a Python project that provides a uniform interface to existing neuroimaging software and facilitates interaction between these packages within a single workflow (Gorgolewski et al. 2011). Its mechanism is very similar to that of fmriprep, which is itself built on Nipype: it combines different packages (e.g., ANTs, FreeSurfer) in one workflow (Fig. 4). More Nipype documentation can be found here (https://nipype.readthedocs.io/en/latest/documentation.html).
Fig. 4 The framework of Nipype. It brings SPM, FSL, FreeSurfer and other software together and supplies a user-friendly Python interface. The image is adapted from https://nipype.readthedocs.io/en/latest/index.html.
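Nipype expresses a pipeline as nodes whose outputs are connected to other nodes' inputs, and then runs the nodes in dependency order. The standard-library sketch below mimics that idea with hypothetical `Node` and `Workflow` classes; the names loosely echo Nipype's, but this is not Nipype's API or implementation, and the `realign` and `smooth` steps are placeholders that only rename files instead of calling FSL or SPM. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


class Node:
    """A processing step: `func` takes keyword inputs and returns a dict of outputs."""

    def __init__(self, name, func, **inputs):
        self.name = name
        self.func = func
        self.inputs = dict(inputs)  # static inputs set up front


class Workflow:
    """Toy pipeline that wires node outputs to node inputs and runs them in order."""

    def __init__(self):
        self.nodes = {}
        self.links = []  # (src_node, src_output, dst_node, dst_input)

    def connect(self, src, src_output, dst, dst_input):
        for node in (src, dst):
            self.nodes[node.name] = node  # nodes are registered when connected
        self.links.append((src, src_output, dst, dst_input))

    def run(self):
        graph = {name: set() for name in self.nodes}
        for src, _, dst, _ in self.links:
            graph[dst.name].add(src.name)  # dst depends on src
        results = {}
        for name in TopologicalSorter(graph).static_order():
            node = self.nodes[name]
            kwargs = dict(node.inputs)
            for src, src_output, dst, dst_input in self.links:
                if dst is node:
                    kwargs[dst_input] = results[src.name][src_output]
            results[name] = node.func(**kwargs)
        return results


def realign(in_file):  # hypothetical placeholder step
    return {"out_file": in_file.replace(".nii", "_mc.nii")}


def smooth(in_file, fwhm):  # hypothetical placeholder step
    return {"out_file": in_file.replace(".nii", f"_s{fwhm}.nii")}


if __name__ == "__main__":
    wf = Workflow()
    mc = Node("realign", realign, in_file="bold.nii")
    sm = Node("smooth", smooth, fwhm=6)
    wf.connect(mc, "out_file", sm, "in_file")
    print(wf.run()["smooth"]["out_file"])  # bold_mc_s6.nii
```

Declaring connections instead of calling steps directly lets the workflow engine work out the execution order, which is what makes swapping one package's step for another's comparatively easy.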

Machine learning
After preprocessing (f)MRI data, we can explore the data further and mine more medical information. Machine learning is a perfect example of this. Here I will introduce some Python libraries for machine learning on neuroimaging data.

Nilearn
Nilearn (https://nilearn.github.io/) is a Python module for fast and easy statistical learning on neuroimaging data. It builds on the scikit-learn toolbox (https://scikit-learn.org/stable/index.html), a popular Python machine learning package, for multivariate statistics with applications such as classification, decoding and connectivity analysis. It is still growing, with a large community of contributors adding more functionality.
Among the Python machine learning libraries for neuroimaging, I think Nilearn is the best choice. In the next part, I will show some simple examples to help you quickly get started with Nilearn. Nilearn also provides interfaces that automatically download data from OpenfMRI and NeuroVault (a new home for all brain statistical maps).
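Before scikit-learn can be applied, Nilearn's `NiftiMasker` converts a 4D fMRI image into a 2D matrix with one row per volume (sample) and one column per in-mask voxel (feature), optionally standardizing each voxel's time series. The numpy sketch below reproduces only that reshaping and z-scoring on an in-memory array; it is an illustration, not Nilearn's code, and leaves out NIfTI loading, resampling and filtering.

```python
import numpy as np


def mask_to_matrix(bold, mask):
    """Turn a 4D array (x, y, z, time) into a (time, n_voxels) matrix.

    Only voxels where the 3D boolean `mask` is True are kept, so each row is
    one volume (a sample) and each column one voxel (a feature).
    """
    bold = np.asarray(bold)
    if bold.shape[:3] != mask.shape:
        raise ValueError("mask must match the spatial shape of the 4D image")
    return bold[mask].T  # bold[mask] has shape (n_voxels, time)


def standardize(X):
    """Z-score each column (voxel time series); constant columns become 0."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    return np.divide(X - X.mean(axis=0), std, out=np.zeros_like(X), where=std > 0)
```

The resulting matrix has the (n_samples, n_features) layout that scikit-learn estimators expect.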

PyMVPA: MultiVariate Pattern Analysis (MVPA) in Python
PyMVPA is a Python package intended to ease statistical learning analyses of large datasets. It provides a rich high-level interface to a broad range of algorithms for classification, regression, feature selection and result plotting. It is designed to integrate well with related software packages such as scikit-learn and shogun (http://www.shoguntoolbox.org/). PyMVPA is free software and requires nothing but free software to run (Hanke et al. 2009). Example code is available on the PyMVPA official website.
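A typical MVPA analysis is cross-validated classification: a classifier is trained on samples from some runs and tested on a held-out run. The numpy sketch below shows that scheme with a deliberately simple nearest-centroid classifier; it is not PyMVPA's API, and a real analysis would usually use a stronger classifier such as a linear SVM.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample the label of the closest class mean (Euclidean)."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


def leave_one_run_out_accuracy(X, y, runs):
    """Mean accuracy when each run is held out once for testing.

    Splitting by run (PyMVPA calls these "chunks") keeps temporally
    correlated samples from the same run out of both train and test sets.
    """
    X, y, runs = map(np.asarray, (X, y, runs))
    scores = []
    for run in np.unique(runs):
        test = runs == run
        pred = nearest_centroid_predict(X[~test], y[~test], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))
```

Holding out whole runs, rather than random samples, avoids inflated accuracies caused by the temporal autocorrelation of fMRI signals within a run.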

Deep learning
As mentioned before, with the development of supercomputers and machine learning, more and more data are being produced. In neuroscience, traditional data analysis approaches face a bottleneck when confronted with such big data, which is why deep learning has become very popular in neuroscience research, particularly for fMRI data analysis: with enough training data, deep learning models can help uncover more medical information or cognitive mechanisms. Looking back at recent publications, deep learning has clearly become a hot research topic. So which tools can we use for neuroscience data analysis? In the next section, I will introduce some well-known, classic and user-friendly models and libraries that we can use in our research.

TensorFlow: A Popular Open-Source Deep Learning Framework
TensorFlow (https://www.tensorflow.org/) is an open-source software library for numerical computation using data flow graphs. It provides stable Python and C APIs, as well as APIs without a backward-compatibility guarantee for other languages such as C++, Java, JavaScript and Swift.
TensorFlow fully supports both CPUs and GPUs and offers a friendly Python interface, so you can install it easily and run it on either a CPU or a GPU (Pattanayak 2017; Gad 2018).
The core idea of TensorFlow is a graph-driven model; its abstract structure is shown in Fig. 7.
Fig. 7 TensorFlow data flow chart. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. The image is adapted from https://www.tensorflow.org/guide/graphs.
brain.js (https://brain.js.org/) is a library of neural networks written in JavaScript. Front-end development is the main way content is displayed on the web, and many programmers who love JavaScript also use it to train neural networks; JavaScript is also one of the most popular programming languages. brain.js is thus a neural network library built on JavaScript (Wang, Cai, and Wei 2016; van der Spuy 2012). To get a direct impression of brain.js, see the examples at https://github.com/BrainJS/brain.js/tree/master/examples.
In this section, I have only listed some famous and frequently used deep learning frameworks; many more have been developed for the needs of various fields. For neuroscience data analysis, however, the frameworks above can already solve most problems. All of them are open source, so you are also welcome to contribute your ideas through GitHub and help improve them.
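The data-flow idea behind TensorFlow's graph in Fig. 7 (operations as nodes, tensors as edges, and a node evaluated only after its inputs) can be sketched in plain Python with numpy arrays standing in for tensors. This is a toy evaluator for illustration, not TensorFlow's API; current TensorFlow 2 releases execute operations eagerly by default and build graphs through `tf.function`.

```python
import numpy as np


class Op:
    """A graph node: an operation applied to the tensors produced by its input nodes."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs  # edges: the tensors flowing into this node


class Placeholder(Op):
    """A graph input whose value is supplied when the graph is run."""

    def __init__(self, name):
        super().__init__(None)
        self.name = name


def run(node, feed, cache=None):
    """Evaluate `node` by recursively evaluating its inputs first.

    `cache` ensures a node shared by several consumers is computed only once.
    """
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    if isinstance(node, Placeholder):
        value = np.asarray(feed[node.name], dtype=float)
    else:
        value = node.fn(*(run(i, feed, cache) for i in node.inputs))
    cache[node] = value
    return value


if __name__ == "__main__":
    # Graph for relu(W @ x + b)
    x, W, b = Placeholder("x"), Placeholder("W"), Placeholder("b")
    relu = Op(lambda t: np.maximum(t, 0.0), Op(np.add, Op(np.matmul, W, x), b))
    feed = {"x": [1.0, 2.0], "W": [[1.0, -1.0], [0.5, 0.5]], "b": [0.0, 1.0]}
    print(run(relu, feed).tolist())  # [0.0, 2.5]
```

Separating the graph description from its evaluation is what lets a framework decide where (CPU or GPU) and in what order each operation runs.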

Summary
In this review, we have covered methods and software ranging from the underlying system platform to high-level deep learning frameworks, and we hope it helps new students and researchers get started with neuroscience data analysis. In general, this paper covers the basic approaches and methods of neuroscience data analysis, particularly for neuroimaging data such as fMRI.
All software used in this review, along with more software for neuroscience data analysis, is listed on a website we built: https://sinodanish.github.io/brainsoftware/.

• OpenfMRI: free and open sharing of raw magnetic resonance imaging (MRI) datasets (as of 5 Nov 2018: 95 datasets, 3,372 subjects across all datasets).
• Neurodata.io: a database of large-scale connectomics data.
git clone <link to your repository>
Now you can modify, add, change and remove files or folders in your local directory. To check what has changed in your repository:
# Check the current state
git status
Before you push to your remote repository, stage your changes and add a commit message describing what was updated. Then push your local repository to the remote server:
git add .
git commit -m "add info for what has been changed"
git push -u origin master
# For security, Git asks for your credentials; GitHub now requires a personal access token (or an SSH key) instead of your account password.
# Use main instead of master if that is your repository's default branch.
# Then go back to GitHub to check your updated repository.

More TensorFlow examples, tutorials and API references are available at https://www.tensorflow.org/.

Other Deep Learning Libraries
…//www.deeplearning.net/software/theano/).

Keras
Keras is also a very popular deep learning framework with a very easy-to-use Python interface. More examples and tutorials are available at https://keras.io/.
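Keras hides the training loop behind calls such as `model.compile()` and `model.fit()`. To illustrate what that loop does (forward pass, loss gradient, parameter update), the numpy sketch below trains a single sigmoid neuron with full-batch gradient descent on cross-entropy loss; it is a minimal teaching example, not Keras code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_neuron(X, y, lr=0.1, epochs=200):
    """Fit one sigmoid neuron with full-batch gradient descent on cross-entropy loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # forward pass
        grad = p - y                    # dLoss/dz for sigmoid + cross-entropy
        w -= lr * X.T @ grad / len(y)   # parameter updates
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b):
    """Class 1 where the predicted probability is at least 0.5, else class 0."""
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= 0.5).astype(int)


if __name__ == "__main__":
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0, 0, 1, 1])
    w, b = train_neuron(X, y)
    print(predict(X, w, b).tolist())  # [0, 0, 1, 1]
```

A deep network repeats the same pattern with many layers; frameworks such as Keras and TensorFlow compute the gradients automatically instead of by hand.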