MS-Ana

Maintained by ppernot

analysis.R script

For each DMS-MS/MS experiment listed in the taskTable file, the series of metabolites given in tgTable is analyzed. The aim of the analysis is to integrate the peak (i.e., to estimate the area) corresponding to each metabolite.

Peak model

In the present version, a Gaussian peak shape is used:

G(x;a,x_0,\sigma)=\frac{a}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-x_0}{\sigma}\right)^2\right)

where a is the area, x_0 is the position of the peak, and \sigma is related to the full width at half maximum (fwhm) by fwhm = 2\sqrt{2\log(2)}\sigma. During the fit, the area (a) is optimized together with the peak's position and width (x_0 and \sigma).
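As an illustration, the peak model can be written in a few lines of R. This is a minimal sketch, not the code of analysis.R: the function names gauss and sigma2fwhm and the numerical values are assumptions chosen for the example. It checks numerically that the parameter a is indeed the area and that the width at half height matches the fwhm relation.

```r
# Gaussian peak parameterized by its area a (hypothetical helper name)
gauss <- function(x, a, x0, sigma) {
  a / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - x0) / sigma)^2)
}

# Conversion from sigma to the full width at half maximum
sigma2fwhm <- function(sigma) 2 * sqrt(2 * log(2)) * sigma

# Evaluate the peak on a fine grid (arbitrary example values)
dx <- 0.01
x  <- seq(-10, 10, by = dx)
y  <- gauss(x, a = 5, x0 = 1, sigma = 0.7)

# Rectangle-rule integral: should recover the area a
area_num <- sum(y) * dx

# Width of the region where the peak exceeds half its maximum
fwhm_num <- diff(range(x[y >= max(y) / 2]))
```

The numerical width agrees with sigma2fwhm(0.7) up to the grid resolution.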

From the two-dimensional data (m/z, CV), the area can be extracted using a 2D fit where the fit function is the product of two Gaussian functions, one in the m/z dimension, the other in the CV dimension.
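The product form can be sketched as follows. The function name gauss2d, its argument names and the numerical values are assumptions made for this example; the normalized 1D Gaussians are taken from dnorm in the stats package. The sketch also shows that summing the 2D peak over m/z leaves a 1D Gaussian in CV with the same area, which is why a 1D fit of the CV profile can estimate the same quantity.

```r
# 2D peak: area a times two normalized Gaussians (hypothetical helper)
gauss2d <- function(mz, cv, a, mz0, s_mz, cv0, s_cv) {
  a * dnorm(mz, mean = mz0, sd = s_mz) * dnorm(cv, mean = cv0, sd = s_cv)
}

# Regular grid in both dimensions (arbitrary example values)
dmz <- 0.01
dcv <- 0.05
mz  <- seq(99, 101, by = dmz)
cv  <- seq(-5, 5, by = dcv)

# Matrix of intensities, rows = m/z, columns = CV
z <- outer(mz, cv, gauss2d,
           a = 20, mz0 = 100, s_mz = 0.1, cv0 = 0.5, s_cv = 0.6)

# Double rectangle-rule integral: should recover the area a
area2d <- sum(z) * dmz * dcv

# Integrating over m/z only gives the CV profile, a 1D Gaussian of area a
profile_cv <- colSums(z) * dmz
profile_th <- 20 * dnorm(cv, mean = 0.5, sd = 0.6)
```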

It turns out that we need three types of fit:

Fit algorithm

We use a non-linear (weighted) least-squares algorithm to estimate the parameters of the model: the nls function of the stats package [1].
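A minimal sketch of such a fit on synthetic data is given below. The data, the starting values and the weighting scheme are assumptions made for illustration; the weights actually used by analysis.R when weighted_fit is TRUE are not described here.

```r
set.seed(42)

# Synthetic CV profile: Gaussian peak of area 30 plus noise
cv  <- seq(-5, 5, by = 0.1)
dat <- data.frame(
  cv = cv,
  y  = 30 * dnorm(cv, mean = 0.5, sd = 0.6) + rnorm(length(cv), sd = 0.3)
)

# Hypothetical weighting: uniform weights reproduce an unweighted fit
weighted_fit <- FALSE
w <- if (weighted_fit) 1 / pmax(dat$y, 1) else rep(1, nrow(dat))

# Non-linear least squares on the area-parameterized Gaussian
fit <- nls(
  y ~ a / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((cv - x0) / sigma)^2),
  data    = dat,
  start   = list(a = 25, x0 = 0.4, sigma = 0.7),
  weights = w
)

est <- coef(fit)   # named vector with a, x0 and sigma
```

The fitted value est["a"] is the estimated peak area; summary(fit) gives the associated standard errors.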

The parameters are constrained to intervals set by the control variables described below. For the 2D fits, we implemented a ‘fallback’ strategy: when the 2D optimization does not converge, a 1D fit in the CV space is performed instead. The effective dimension of the fit is reported in the results tables.
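The constrained fit and the fallback logic could look like the sketch below. It is an assumption about the general pattern, not an extract of analysis.R: bounds are passed to nls through its "port" algorithm, a failed 2D fit is caught with tryCatch, and the 1D fit is then run on the CV profile obtained by summing over m/z. The helper names fit_2d and fit_1d, the bounds and the synthetic data are all hypothetical.

```r
set.seed(7)

# Synthetic 2D data in long format: one row per (m/z, CV) point
dmz  <- 0.05
grid <- expand.grid(mz = seq(99.5, 100.5, by = dmz),
                    cv = seq(-3, 4, by = 0.2))
grid$y <- 20 * dnorm(grid$mz, 100, 0.1) * dnorm(grid$cv, 0.5, 0.6) +
  rnorm(nrow(grid), sd = 0.5)

# 2D fit; bounds are given in the same order as the start list
fit_2d <- function(d) nls(
  y ~ a * dnorm(mz, mz0, s_mz) * dnorm(cv, cv0, s_cv), data = d,
  start = list(a = 15, mz0 = 100.02, s_mz = 0.12, cv0 = 0.4, s_cv = 0.7),
  lower = c(0,   99.8,  0.02, -1, 0.1),
  upper = c(1e3, 100.2, 0.5,   2, 2),
  algorithm = "port"
)

# 1D fit of the CV profile integrated over m/z
fit_1d <- function(d) {
  prof <- aggregate(y ~ cv, data = d, FUN = function(v) sum(v) * dmz)
  nls(
    y ~ a * dnorm(cv, cv0, s_cv), data = prof,
    start = list(a = 15, cv0 = 0.4, s_cv = 0.7),
    lower = c(0,   -1, 0.1),
    upper = c(1e3,  2, 2),
    algorithm = "port"
  )
}

# Try the 2D fit first; on error, fall back to 1D and record the dimension
res <- tryCatch(
  list(fit = fit_2d(grid), dim = 2),
  error = function(e) list(fit = fit_1d(grid), dim = 1)
)

area_est <- coef(res$fit)[["a"]]
```

Storing the dimension next to the fit object makes it straightforward to report the effective dimension alongside the estimated area.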

Control variables

The fit type is selected with the fit_dim variable. The main user configuration parameters are listed in the first lines of the analysis.R script, as follows:

#----------------------------------------------------------
# User configuration params -------------------------------
#----------------------------------------------------------

ms_type   = c('esquire','fticr')[2]

taskTable = 'Test2/files_quantification_2018AA.csv'
tgTable   = 'Test2/targets_paper_renew.csv'

fit_dim  = 1    

filter_results = TRUE
area_min       = 10

userTag = paste0('fit_dim_',fit_dim)

save_figures = TRUE
plot_maps    = FALSE

where:

A set of technical parameters affecting various aspects of the peak fit is also available. Their default values should only be changed with caution.

#----------------------------------------------------------
# Technical params (change only if you know why...) -------
#----------------------------------------------------------

fallback        = TRUE   
correct_overlap = FALSE  
weighted_fit    = FALSE
refine_CV0      = TRUE
debug           = FALSE  

Outputs

The output files can be found in the following directories:

figRepo  = '../results/figs/'
tabRepo  = '../results/tables/'

All output files are prefixed with a string built by concatenation of the DMS file date, MS file root and fit_dim value. For instance, if your data are (MS_file = ‘C0_AS_DV-1800_1.d.ascii’, DMS_file = ‘Fichier_Dims 20190517-000000.txt’), and if fit_dim=2, one has prefix = 20190517_C0_AS_DV-1800_1_fit_dim_2_.
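The following sketch reproduces the example prefix from the two file names. The extraction rules used here (the date is the first run of eight digits in the DMS file name, the root is everything before the first dot of the MS file name) are assumptions inferred from this single example and may differ from what analysis.R does.

```r
MS_file  <- 'C0_AS_DV-1800_1.d.ascii'
DMS_file <- 'Fichier_Dims 20190517-000000.txt'
fit_dim  <- 2
userTag  <- paste0('fit_dim_', fit_dim)

# Assumed rule: the date is the first run of 8 digits in the DMS file name
dms_date <- regmatches(DMS_file, regexpr('[0-9]{8}', DMS_file))

# Assumed rule: the root is the MS file name up to its first dot
ms_root <- sub('\\..*$', '', MS_file)

# Concatenate date, root and tag, with a trailing underscore
prefix <- paste0(dms_date, '_', ms_root, '_', userTag, '_')

# Names of the three tables generated for this task
tables <- paste0(prefix, c('results', 'fit', 'XIC'), '.csv')
```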

Figures

For each task and target, a figure is generated (on screen, and as a file if save_figures=TRUE), showing the 2D location of the peak and its profile, either in the CV dimension (fit_dim=1,2) or in the m/z dimension (fit_dim=0). The name of the file is built from the task prefix and the target name.

Example of a 2D fit

Example of a 1D fit along m/z (fit_dim=0)

Tables

For each experiment/task associated with (MS_file, DMS_file), three comma-delimited ‘.csv’ files are generated: prefix_results.csv, prefix_fit.csv and prefix_XIC.csv.

For each task, a file named prefix_ctrlParams.yaml is also generated for reproducibility purposes. It contains the values of all the control variables for this task.

Notes

Fit results: XXX_results.csv

Peak profiles: XXX_fit.csv and XXX_XIC.csv

The XXX_XIC.csv file contains, for the compounds in tgTable, either the time/CV data profiles integrated over m/z (fit_dim=1,2) or the m/z data profiles (fit_dim=0). The XXX_fit.csv file contains the corresponding Gaussian peak profiles.
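To make the notion of a CV profile integrated over m/z concrete, here is a minimal sketch on a synthetic intensity matrix. The matrix layout (rows = CV, columns = m/z), the integration window dmz_win and all numerical values are assumptions made for the example; they do not describe the actual data structures of analysis.R.

```r
# Synthetic intensity map: rows = CV values, columns = m/z values
dmz <- 0.01
dcv <- 0.1
mz  <- seq(99, 101, by = dmz)
cv  <- seq(-5, 5, by = dcv)
MS  <- outer(cv, mz, function(cv_i, mz_j) {
  20 * dnorm(cv_i, mean = 0.5, sd = 0.6) * dnorm(mz_j, mean = 100, sd = 0.05)
})

# Hypothetical target m/z and integration window around it
mz0     <- 100
dmz_win <- 0.5
sel     <- abs(mz - mz0) <= dmz_win / 2

# Profile along CV: integrate the selected m/z columns for each CV value
xic <- rowSums(MS[, sel, drop = FALSE]) * dmz

# Integrating the profile along CV gives back the peak area
area_xic <- sum(xic) * dcv
```

With a window that covers the whole peak in m/z, the profile is a 1D Gaussian in CV whose area equals that of the 2D peak.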