LipidAnalyst: A Comprehensive Tool for Lipidomic Data Visualization and Analysis

Xinyi Liu; Alla Karnovsky; Subramaniam Pennathur; Farsad Afshinnia

doi:10.20944/preprints202605.1883.v1

Submitted: 26 May 2026

Posted: 27 May 2026


Abstract

Introduction: Proper analysis of lipidomic data requires specialized tools for data processing, normalization, and visualization. Limitations in lipid parsing, data processing, and visualization in most commercially available packages leave gaps in the analysis of lipidomic data. LipidAnalyst was developed to fill these gaps, enabling efficient lipid parsing, data processing, visualization, and analysis of lipidomic data. Methods: We used the R Shiny platform to develop LipidAnalyst, which is hosted on MiServer at https://lipidanalyst.miserver.it.umich.edu/lipidanalyst/. Results: LipidAnalyst capabilities fall into three major categories: data processing, visualization, and analysis. Data processing includes quality control filtering, normalization and quantification by internal standards, and unique features for imputation, lipid parsing, lipid combination, further normalization, transformation, and scaling. Visualization capabilities include display of data distribution by boxplots or violin plots, principal component analysis (PCA) plots, hierarchical clustering heatmaps, differential mean lipid heatmaps, volcano plots, correlation plots, and debiased sparse partial correlation (DSPC) clustering plots. Analysis includes t-test, analysis of variance (ANOVA), DSPC, partial least squares discriminant analysis (PLS-DA), orthogonal partial least squares discriminant analysis (OPLS-DA), and Random Forest. Conclusion: LipidAnalyst is a powerful tool for processing, visualization, and analysis of lipidomic data. It allows comprehensive visualization of lipidomic data, empowering researchers to develop appropriate analytical plans accordingly.

Keywords: lipidomics; data processing; visualization; analysis; software

Subject: Computer Science and Mathematics - Software

Introduction

Historically, clinical lipid research has been limited to measurement of total cholesterol, serum lipoproteins, and total triglycerides in circulation. However, lipids are the most abundant metabolites in circulation [1,2], with many species and diverse roles in metabolism and cell physiology. There are thousands of different lipids, which can be classified into major lipid classes beyond the lipid panels measured in clinical practice [3,4]. Technological advances in mass spectrometry provide opportunities to explore the lipidome in biological samples with high granularity and to investigate its relevance to disease [5,6,7,8,9,10,11]. Both targeted and untargeted lipidomic platforms can identify thousands of lipid species in a short period of time and generate a large body of data for clinical studies [12,13]. Proper analysis of such data poses a challenge and necessitates an appropriate analytical plan, especially in untargeted platforms using data-driven approaches when the sample size is smaller than the number of identified lipids.

Historically, analysis of the differential expression of metabolites by study phenotype in large datasets has focused primarily on identifying the top differentially expressed metabolites, irrespective of the relevance of the remaining metabolites to differential systems biology [14,15,16]. Such an approach depends heavily on univariate compound-by-compound analysis and on volcano plots for visualization. A fundamental assumption of compound-by-compound analysis is that the variability of the candidate metabolite is almost entirely explained by the phenotype of interest and its severity. Univariate approaches work well for biomarkers whose variability is dominated by the phenotype (e.g., troponin in myocardial injury [17,18]); they are less appropriate for heterogeneous chronic diseases such as chronic kidney disease. Furthermore, systems biology approaches reveal meaningful differential biology by disease state that is not limited to the top differentially identified candidates proposed by a compound-by-compound approach [17,18]. There is a need for data analysis platforms that provide both system-level and compound-level views of lipidomics data, incorporating lipid structural features and classification by class, carbon chain length, and saturation, to facilitate the discovery of disease-associated metabolic pathways.

Appropriate data analysis also starts with thorough visualization of the data, which provides the insight needed to design an analytical plan, especially for data-driven approaches. Alterations in lipidomic data are a function of the net effect of dietary factors, de novo lipogenesis, lipolysis, elongation, and desaturation [19], processes that may be differentially modified by disease state and manifest as altered abundance of lipids within each lipid class. Hence, the choice of a proper analytical plan, especially when there is no a priori hypothesis, requires optimized data visualization as a prerequisite. Although excellent online software and packages (MetaboAnalyst [20], LipidSig [21], LipidOne [22], LipidSuite [23], LipidSigR [24], and lipidr [25]) facilitate the choice of an analytical plan, gaps remain in lipid parsing, data processing, data normalization, and visualization (Table 1). First, many existing software tools have limitations in lipid parsing because they rely heavily on database-based name matching or require lipids to follow strict nomenclature conventions. As a result, their flexibility in handling diverse or inconsistently annotated lipid names is reduced. Second, while most software provides general data processing and normalization methods, it often lacks methods tailored to lipidomics datasets, such as merging of lipid duplicates, consolidation of adducts, and internal standard normalization. Because lipid abundance may vary substantially across lipid classes, class-level normalization would help remove systematic differences between classes. However, these lipid-specific normalization utilities remain limited in currently available software tools. Third, systematic interpretation of lipidomics data often requires analysis of lipid structural features rather than a focus on individual lipid species. Visualization tools that incorporate structural information, such as lipid saturation status and acyl chain length distributions, can facilitate the identification of biologically meaningful patterns and support hypothesis generation from lipidomics datasets. In response to these gaps, we designed LipidAnalyst, a user-friendly, R-based interactive data management program for optimized normalization and visualization of lipidomic data. It is suitable for users who are not comfortable with command-line tools and supports the choice of a proper analytical plan.

Methods

LipidAnalyst Overview: LipidAnalyst was developed using R Shiny and is hosted on MiServer. As shown in Figure 1, LipidAnalyst accepts three types of input files: a lipidomics data table, a metadata file specifying the sample grouping information, and an optional file containing internal standard information that can be used for data normalization (see Example Data File for the sample file format). The first step in the LipidAnalyst pipeline is data preprocessing, which includes feature filtering, removal and imputation of missing values, and optional grouping of different isomers into single features. The second step is data normalization, which includes internal standard normalization, normalization using user-supplied metadata, sample normalization, data transformation, and scaling. The normalized data can then be used in visualization and statistical analysis.

Input Data & Quality Control

Preprocessing: Data preprocessing in LipidAnalyst involves feature filtering, data imputation, and combining lipids.

Data filtering helps to eliminate noise and irrelevant features, ensuring that subsequent analyses focus on meaningful biological variation. Users can apply filters to remove low-quality, low-abundance, and low-variance features. Low-quality features are defined as lipid features with excessive missing values across samples that are not specific to one study group; users can specify a missingness threshold to filter these features. The low-abundance filter removes features with relatively low abundance across samples. The low-variance filter removes features with low variability across samples, which would not be useful for distinguishing between conditions or groups.
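The three filters described above can be sketched as follows. This is a minimal illustration in Python rather than the LipidAnalyst implementation: the function name `filter_features`, the threshold defaults, and the use of relative standard deviation as the variance measure are all assumptions, and the sketch ignores group membership when counting missing values.

```python
from statistics import mean, pstdev

def filter_features(table, max_missing=0.5, min_mean=0.0, min_rsd=0.0):
    """Keep lipids that pass missingness, abundance, and variance filters.

    table maps lipid name -> per-sample abundances (None marks a missing value).
    All thresholds are hypothetical defaults, not LipidAnalyst settings.
    """
    kept = {}
    for lipid, values in table.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if not observed or missing_fraction > max_missing:
            continue  # low-quality feature: too many missing values
        average = mean(observed)
        if average < min_mean:
            continue  # low-abundance feature
        rsd = pstdev(observed) / average if average else 0.0
        if rsd < min_rsd:
            continue  # low-variance feature (relative standard deviation)
        kept[lipid] = values
    return kept

table = {
    "PC 34:1": [10.0, 12.0, 9.0, 30.0],
    "TG 48:0": [None, None, None, 5.0],  # 75% missing
    "CE 18:2": [0.1, 0.2, 0.1, 0.2],     # low abundance
    "SM 34:1": [8.0, 8.0, 8.0, 8.0],     # no variability
}
filtered = filter_features(table, max_missing=0.5, min_mean=1.0, min_rsd=0.05)
```

With these illustrative thresholds only "PC 34:1" survives; each of the other three entries fails exactly one filter.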

We categorize missingness into two major types: 1) Group-level missingness: a lipid is almost completely missing within one experimental group (e.g., all disease group samples have NA). This often suggests true biological absence or very low abundance. In these cases, methods such as Limit of Detection (LoD) imputation are generally more appropriate. 2) General missingness: values are sporadically missing across samples but not confined to a single group. This pattern typically reflects technical noise or stochastic signal dropout. For this scenario, sample-wise K-Nearest Neighbors (KNN) imputation is recommended, as it leverages similarity among samples to estimate reasonable values.
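The distinction between the two missingness types, and the imputation suggested for each, can be illustrated with a small Python sketch. It is not the LipidAnalyst code: the 80% group cutoff, the use of half the minimum observed value as a stand-in for the LoD, and the unscaled distance in the KNN step are assumptions made for illustration.

```python
import math

def classify_missingness(values, groups, group_cutoff=0.8):
    """Return 'none', 'group-level', or 'general' for one lipid.

    group_cutoff (fraction missing within a group) is a hypothetical threshold.
    """
    if all(v is not None for v in values):
        return "none"
    for g in sorted(set(groups)):
        in_group = [v for v, grp in zip(values, groups) if grp == g]
        if sum(v is None for v in in_group) / len(in_group) >= group_cutoff:
            return "group-level"
    return "general"

def impute_lod(values):
    """Fill gaps with half the smallest observed value (assumed LoD proxy)."""
    lod = min(v for v in values if v is not None) / 2
    return [lod if v is None else v for v in values]

def impute_knn(table, lipid, k=2):
    """Sample-wise KNN: fill a gap with the mean of the k most similar samples."""
    target = table[lipid]
    others = [name for name in table if name != lipid]

    def distance(a, b):
        # Root mean squared difference over lipids observed in both samples.
        sq = [(table[o][a] - table[o][b]) ** 2 for o in others
              if table[o][a] is not None and table[o][b] is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

    donors = [j for j, v in enumerate(target) if v is not None]
    filled = list(target)
    for i, v in enumerate(target):
        if v is None and donors:
            nearest = sorted(donors, key=lambda j: distance(i, j))[:k]
            filled[i] = sum(target[j] for j in nearest) / len(nearest)
    return filled

groups = ["ctrl", "ctrl", "ctrl", "dis", "dis", "dis"]
table = {
    "FFA 16:0": [5.0, 6.0, 5.5, None, None, None],    # absent in one group
    "PC 34:1": [10.0, None, 11.0, 20.0, 21.0, 19.0],  # sporadic dropout
    "PE 36:2": [1.0, 1.1, 1.2, 3.0, 3.1, 2.9],
}
kinds = {name: classify_missingness(v, groups) for name, v in table.items()}
ffa = impute_lod(table["FFA 16:0"])
pc = impute_knn(table, "PC 34:1", k=2)
```

Here "FFA 16:0" is labelled group-level and filled with 2.5, while the single gap in "PC 34:1" is filled from the two nearest control samples. In practice the distance would normally be computed on scaled data so that high-abundance lipids do not dominate.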

In lipidomics data, it is common to see different chemical adducts of the same lipid species (e.g., [M+H]+, [M+Na]+, [M+K]+). Combining these adducts into a single representative entry can help streamline data analysis and interpretation. LipidAnalyst provides a lipid-combining feature to handle duplicated lipids and lipids reported with different adducts. Users can select among methods for combining lipids, such as taking the mean or the sum of the duplicated entries.
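As a rough illustration of adduct consolidation, the Python sketch below strips a trailing adduct label and combines the rows that share the remaining name. The naming convention (adduct appended in square brackets), the regular expression, and the function name `combine_adducts` are assumptions; missing values are not handled.

```python
import re
from collections import defaultdict

# Trailing adduct label such as " [M+H]+" or " [M+2H]2+" (assumed naming convention).
ADDUCT = re.compile(r"\s*\[M[^\]]*\]\d*[+-]?\s*$")

def combine_adducts(table, method="sum"):
    """Merge rows that differ only by adduct, using the per-sample sum or mean."""
    grouped = defaultdict(list)
    for name, values in table.items():
        grouped[ADDUCT.sub("", name)].append(values)
    if method == "sum":
        combine = sum
    else:
        combine = lambda column: sum(column) / len(column)
    # zip(*rows) walks the samples column by column.
    return {base: [combine(column) for column in zip(*rows)]
            for base, rows in grouped.items()}

table = {
    "PC 34:1 [M+H]+": [10.0, 20.0],
    "PC 34:1 [M+Na]+": [2.0, 4.0],
    "SM 34:1 [M+H]+": [5.0, 6.0],
}
summed = combine_adducts(table, method="sum")
averaged = combine_adducts(table, method="mean")
```

Summing suits the case where adducts split the signal of one species; averaging is closer to what one would want for true duplicates.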

Lipid parsing: The lipid parsing function ensures that lipid names are correctly interpreted for class-based analysis. It identifies the lipid class and the lipid chains from the lipid shorthand name. When a common lipid name does not explicitly encode the structure (such as palmitic acid or oleic acid), LipidAnalyst can automatically search for the structure in the LIPID MAPS® Structure Database (LMSD) [26,27,28]. The lipid parsing table is also editable in the software; if any lipid information is incorrect, users may revise it directly in the table.
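A rule-based parser of the kind described above might look like the following Python sketch. The regular expressions and the returned field names are illustrative assumptions, not the rules used by LipidAnalyst, and they cover only simple shorthand such as "PC 16:0/18:1" or "TG(48:0)". Names without a carbons:double-bonds pattern return None, which is where a database lookup would take over.

```python
import re

def parse_lipid(name):
    """Split a shorthand lipid name into class, chains, and totals.

    Returns None when no 'carbons:double_bonds' pattern is found
    (e.g. common names, which would need a database lookup instead).
    """
    match = re.match(r"\s*([A-Za-z]+)[\s(]*(.*)", name)
    if not match:
        return None
    lipid_class, rest = match.groups()
    chains = [(int(c), int(d)) for c, d in re.findall(r"(\d+):(\d+)", rest)]
    if not chains:
        return None
    return {
        "class": lipid_class,
        "chains": chains,  # one (carbons, double bonds) pair per listed chain
        "total_carbons": sum(c for c, _ in chains),
        "total_double_bonds": sum(d for _, d in chains),
    }

pc = parse_lipid("PC 16:0/18:1")
tg = parse_lipid("TG(48:0)")
cer = parse_lipid("Cer d18:1/24:0")
unparsed = parse_lipid("palmitic acid")
```

For "PC 16:0/18:1" this yields two chains with totals of 34 carbons and 1 double bond; "TG(48:0)" yields a single summed entry because the individual chains are not given.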

Normalization: Normalization is a crucial step for removing technical differences and data skewness in lipidomics data analysis. LipidAnalyst provides internal standard normalization, normalization by supplementary data such as cell counts or dilution factors, sample-wise normalization, lipid class-aware normalization, data transformation methods, and data scaling methods.
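Internal standard normalization followed by normalization to a per-sample factor can be sketched as below. The sketch assumes the common single-point calculation (lipid peak area divided by the standard's peak area, multiplied by the standard's known concentration); the function names, the example standard names, and the class-from-name-prefix convention are hypothetical.

```python
def normalize_by_internal_standard(areas, standards, class_to_standard, standard_conc):
    """Convert peak areas to concentrations using one internal standard per class.

    areas:      lipid name -> per-sample peak areas
    standards:  internal standard name -> per-sample peak areas
    Assumes the lipid class is the first token of the lipid name.
    """
    quantified = {}
    for lipid, values in areas.items():
        standard = class_to_standard[lipid.split()[0]]
        quantified[lipid] = [
            area / is_area * standard_conc[standard]
            for area, is_area in zip(values, standards[standard])
        ]
    return quantified

def normalize_by_factor(table, factors):
    """Divide each sample by a per-sample factor (e.g. protein or cell count)."""
    return {lipid: [v / f for v, f in zip(values, factors)]
            for lipid, values in table.items()}

areas = {"PC 34:1": [2000.0, 3000.0], "TG 48:0": [500.0, 250.0]}
standards = {"PC-IS": [1000.0, 1500.0], "TG-IS": [250.0, 250.0]}  # hypothetical names
class_to_standard = {"PC": "PC-IS", "TG": "TG-IS"}
standard_conc = {"PC-IS": 10.0, "TG-IS": 4.0}  # known spiked amounts, arbitrary units

quantified = normalize_by_internal_standard(areas, standards, class_to_standard, standard_conc)
per_protein = normalize_by_factor(quantified, factors=[2.0, 4.0])
```

The per-sample factors in the second step would come from the metadata file, which is why such variables must be listed there.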

LipidAnalyst Analysis Methods

Univariate Statistical Analysis: LipidAnalyst provides univariate statistical testing to identify individual lipid species or lipid classes that differ significantly between experimental groups. Supported methods include Student’s t-test, Welch’s t-test, and analysis of variance (ANOVA), depending on study design and variance assumptions. Fold change analysis and volcano plots are implemented to jointly visualize effect size and statistical significance, enabling rapid identification of biologically meaningful lipid alterations. Differential Mean Lipid Class Heatmaps enable hypothesis-driven exploration of disease-associated lipid alterations by mapping mean abundance differences across lipid carbon chain length and saturation states.

Multivariate Projection & Dimension Reduction: To capture global lipidomic patterns and assess sample-level structure, LipidAnalyst implements multivariate projection and dimension reduction techniques including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and orthogonal PLS-DA (OPLS-DA). These methods facilitate visualization of group separation, detection of outliers, and identification of lipids contributing most strongly to phenotypic differences. Hierarchical clustering and heatmaps are provided for both lipid- and sample-centric exploratory analysis.

Network & Dependency Analysis: Beyond abundance-based analyses, LipidAnalyst supports network-based approaches to investigate lipid–lipid dependency structures. Correlation networks and Debiased Sparse Partial Correlation (DSPC) networks [29] are implemented to identify coordinated lipid regulation and condition-specific network rewiring. These analyses enable the detection of altered lipid interaction patterns that may not be apparent from univariate statistics alone.

Predictive Modeling & Machine Learning: LipidAnalyst incorporates supervised machine learning methods, including Random Forest, to support classification and feature importance analysis. These models enable evaluation of lipidomic signatures that best discriminate experimental conditions while providing interpretable importance rankings for candidate biomarkers.

Software availability: All software development was conducted in R (version 4.4.3). The computational environment, including R and all package dependencies, is encapsulated within a Docker image to ensure reproducibility across systems. LipidAnalyst provides a unified interface that can be deployed locally through Docker and further extended through its open-source code available on GitHub. It is currently available at (https://lipidanalyst.miserver.it.umich.edu/lipidanalyst/).

Results

Working with LipidAnalyst starts with the Welcome page (Figure S1). In the next step, users can upload their lipidomic data or load the example lipidomic data, which we have recently published [30] (Figure S2), and view the raw data (Figure S3). Next, metadata should be uploaded (Figure S4). Uploading a file of internal standards is optional (Figure S5). Subsequently, users can filter the data by missingness, abundance, or variance thresholds, or simply skip Data Filtering (Figure S6). Next, the extent of missingness can be assessed (Figure S8) and missing values imputed using various imputation techniques (Figures S7, S9). Some users may prefer to combine the isoforms of a given lipid into one entry. For example, TG48:0 is represented by three isoforms with different acyl chains at the Sn1 carbon in the example lipidomic data. To do so, users choose the lipid class whose members are to be combined, along with the method for combining the corresponding isoforms into a single value, or skip the step (Figure S10). In the next step, users can verify the class of each lipid, its carbon numbers, and its number of double bonds (unsaturation) at the Sn1 to Sn3 carbons using Lipid Parsing Controls (Figures S11, S12). Data Preview gives a global preview of the data distribution (Figure S13) before downstream processing. If internal standards are provided, users can then choose the appropriate internal standards for normalization of the various lipid classes (Figures S14, S15). Further normalization by a user-defined constant value to account for dilution, weight, or volume factors can be performed or skipped in the next step (Figure S16A, S16B). Values can also be normalized by metadata factors such as cell count or protein concentration, if the user provided them in the uploaded metadata (Figure S16A, S16B). Next, users may further normalize the data by various statistics (sum, mean, median), transform the data distribution, and apply various scaling methods, or skip any of these steps (Figure S17).

Plots: Visualization begins with “Global Distribution Boxplots” (Figure S18) for samples (Figure 2A), lipid classes (Figure 2B), and single lipids.

Figure 2. A. Overview of “Global Distribution Boxplot”. Boxplots for Sample show the sample based lipidomic distribution of data across study samples.

Figure 2. B. Boxplots for Lipid Class show the distribution of lipids within each lipid class across various classes of lipids.

PCA Plot shows the distribution of samples by study group, based on the lipidomic data, in a 2-dimensional (Figure S19A) or 3-dimensional (Figure S19B) view.

Heatmap and Hierarchical Clustering generates hierarchical clustering heatmap of lipidomic data across study samples (Figure S20).

Differential Mean Lipid Heatmap (DMLH) is introduced to visualize the mean values of the lipids within each lipid class, arranged by the number of acyl chain carbons on the X-axis and the number of acyl chain double bonds on the Y-axis, across study groups (Figure S21). For lipids with a single acyl chain, or those with more than one acyl chain whose isoforms have been combined into a single value, the X-axis represents the total carbons in all chains, as for triacylglycerols (TGs, Figure 3).

In lipids with two chains such as diacylglycerols, phosphatidylethanolamines, and phosphatidylcholines, X-axis can be sorted by the number of carbons at Sn1 acyl chain, Sn2, or total carbon numbers (sum of Sn1 and Sn2 acyl chain carbon numbers) for optimal visualization (Figure 4A-C).
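The layout of a DMLH can be sketched as a grid keyed by study group, carbon number, and double-bond number. The Python example below is an assumption-laden illustration, not the LipidAnalyst code: it standardizes each lipid across all samples and then averages the z-scores within each group, which is one plausible reading of "z-score standardized mean values", and it assumes isomers sharing a cell have already been combined.

```python
from statistics import mean, pstdev

def dmlh_cells(lipids, groups):
    """Build heatmap cells keyed by (group, carbons, double_bonds).

    lipids: list of dicts with 'carbons', 'double_bonds', and per-sample 'values',
    all from one lipid class. Each lipid is z-scored across samples, then the
    z-scores are averaged within each study group (assumed standardization).
    """
    grid = {}
    for lipid in lipids:
        mu, sd = mean(lipid["values"]), pstdev(lipid["values"])
        z = [(v - mu) / sd if sd else 0.0 for v in lipid["values"]]
        for g in sorted(set(groups)):
            in_group = [x for x, grp in zip(z, groups) if grp == g]
            grid[(g, lipid["carbons"], lipid["double_bonds"])] = mean(in_group)
    return grid

groups = ["sham", "sham", "ckd", "ckd"]
triacylglycerols = [
    {"carbons": 48, "double_bonds": 0, "values": [1.0, 1.0, 3.0, 3.0]},
    {"carbons": 54, "double_bonds": 6, "values": [4.0, 4.0, 2.0, 2.0]},
]
grid = dmlh_cells(triacylglycerols, groups)
```

In this toy example the saturated, shorter species is higher in the "ckd" group and the polyunsaturated, longer species is lower, the kind of chain-length and saturation gradient the heatmap is meant to expose. For two-chain lipids, the same function could be fed the carbon number of chain 1, chain 2, or their sum as 'carbons'.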

Class level lipid comparison: Using box plots, users can compare the distribution of lipids by study group (Figure S22). With filters on the number of carbons or the number of double bonds, the box plots can represent the distribution of all lipids within each lipid class (Figure 5A), a combination of selected lipids within each lipid class (Figure 5B), or a single lipid by study group (Figure 5C). Violin plots can be used as an alternative to box plots. Box or violin plots of individual lipids by study group can also be viewed in the Individual Lipid Comparison section (Figure S23).

Volcano plots: Volcano plots can be generated in the next step to identify the top differentially measured lipids by study group (Figures S24, 6). The p value threshold, fold change, and adjusted p value are available as options.

Figure 6. Volcano plot showing the distribution of significance of the top differentially measured lipids by study group. The p value threshold, fold change, and adjusted p value are available as options.
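The quantities behind a volcano plot can be sketched as follows. The source does not name the p value adjustment method, so the Benjamini-Hochberg procedure used here is an assumption, as are the function names and the default cutoffs (two-fold change, adjusted p below 0.05). Nominal p values are taken as given, for example from a t-test.

```python
import math

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed adjustment method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def volcano_points(mean_a, mean_b, pvalues, fc_cutoff=2.0, alpha=0.05):
    """Return (log2 fold change, -log10 adjusted p, label) for each lipid."""
    points = []
    for a, b, q in zip(mean_a, mean_b, benjamini_hochberg(pvalues)):
        log2fc = math.log2(b / a)
        if q < alpha and log2fc >= math.log2(fc_cutoff):
            label = "up"
        elif q < alpha and log2fc <= -math.log2(fc_cutoff):
            label = "down"
        else:
            label = "ns"
        points.append((log2fc, -math.log10(q), label))
    return points

control_means = [1.0, 2.0, 1.0, 1.0]
disease_means = [4.0, 1.0, 4.0, 1.1]
nominal_p = [0.001, 0.02, 0.04, 0.5]
adjusted_p = benjamini_hochberg(nominal_p)
labels = [label for _, _, label in volcano_points(control_means, disease_means, nominal_p)]
```

The third lipid shows why the adjusted p value matters: its nominal p of 0.04 passes a 0.05 threshold, but its adjusted value does not, so it is labelled not significant despite a four-fold change.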

Statistical Tests: “Lipidomics Mean Calculator” presents the mean of each lipid by study group (Figure S25). “T-Test” shows the nominal and adjusted p values of a compound-by-compound t-test comparing the study groups (Figure S26). Similarly, “One Way ANOVA” tests for differences in mean lipid levels among more than two groups. “Correlation” provides a bivariate correlation heatmap across lipids by study group (Figure S27A-C). Debiased Sparse Partial Correlation (DSPC) Network generates clusters of differentially correlated lipids by study group using partial correlations (Figures S28 and 7). Partial Least Squares Discriminant Analysis (PLS-DA) allows identification of the top lipid candidates discriminating the study groups (Figure S29A-D). Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) is used to identify the predictive components associated with the study groups as well as the orthogonal components unrelated to group differences (Figures S30A-E). Finally, Random Forest can be used as a machine learning technique to rank the top lipid candidates by random forest importance (Figures S31A-C). At the end, users can download all the data and plots generated in the working session (Figure S32).

Figure 7. Debiased Sparse Partial Correlation (DSPC) allows comparison of subnetworks of differentially correlated lipids using partial correlations. In this example, the correlates of free fatty acids (FFA) in the sham model of chronic kidney disease (CKD, left panel) show a preserved interrelationship, whereas the loss of significant correlations with FFA in the CKD model (right panel) suggests disruption of physiologic pathways.

Discussion

In this report, we describe the development of LipidAnalyst and its applicability in lipidomic research. LipidAnalyst is a specialized, powerful tool for data processing, normalization, visualization, and analysis of lipidomic data. Its self-explanatory, menu-driven (point-and-click), user-friendly interface enables users to apply complex functionality without prior coding training. These functionalities fall into three categories: data processing, visualization, and analysis. Data processing follows a step-by-step approach after upload of the raw mass spectrometry data, enabling the choice of appropriate filtering thresholds (% missing, abundance, variance); imputation; lipid parsing; combination of isomers and different adducts of the same lipid; normalization by internal standards, a user-defined constant value, metadata, or measures of central tendency and distribution; transformation; and scaling. Visualization includes Data Preview; boxplots or violin plots; PCA plots; hierarchical clustering heatmaps; the Differential Mean Lipid Heatmap (DMLH); volcano plots; correlation matrix heatmaps; and DSPC network clusters. The analysis section includes descriptive statistics, t-test and Analysis of Variance (ANOVA) adjusted for multiplicity, a correlation calculator, DSPC networking, PLS-DA, OPLS-DA, and Random Forest.

Missingness is categorized as random or non-random [31] and originates from a number of biological and technical causes that include, but are not limited to, group-specific expression of a feature, analyte abundance below detection limits, stochastic data-dependent acquisition sampling, and ion suppression [32,33,34]. Hence, imputation requires approaches specific to the type of missingness [2]. One of the unique features of LipidAnalyst is that it allows users to explore missingness within each group of samples rather than only across the entire dataset.

Historically, shotgun lipidomics, pioneered in the early 2000s by Han and Gross [35], was established as a direct infusion mass spectrometry approach designed to rapidly identify and quantify individual lipid molecular species based on their total carbon number and total number of double bonds in the fatty acyl chains, rather than through prior chromatographic separation. Technological advancements in targeted mass spectrometry, particularly when combined with separation techniques such as liquid chromatography or ion mobility spectrometry, enable the separation and identification of lipid isomers that differ in their acyl chain composition, including variations in chain length, unsaturation, and positioning (sn-position) [36,37,38]. This approach overcomes the limitations of conventional mass spectrometry in distinguishing isobaric species (compounds with the same mass but different structures). LipidAnalyst provides flexible lipid name parsing capabilities, including rule-based shorthand parsing, database matching against LIPID MAPS, and manual editing of the parsed results by users. It thus enables users to verify the allocation of carbon number (acyl chain length) and number of double bonds at the various sn-positions (in glycerophospholipids) and at the sphingoid base and the corresponding N-acyl fatty acid (in sphingolipids), and it shows the total number of carbons and double bonds in the lipid acyl chains. This step serves as a quality control measure and is crucial for generating the downstream DMLHs. Among comparable programs, MetaboAnalyst focuses on general metabolomics workflows and lacks lipidomics specificity and lipid structural parsing abilities [39]. LipidSig and its companion R package LipidSigR leverage Goslin for automated parsing, which enforces nomenclature consistency but limits support for vendor-specific or non-standard lipid names. LipidAnalyst offers flexible rule-based parsing, database search, and an editable parsing table to accommodate such names. LipidOne [22], LipidSuite [23], and lipidr [25] also rely on standardized lipid nomenclature prior to parsing, which may introduce additional inconvenience for users [21,25].

Identification of the various isomers of the same mass by targeted lipidomic platforms leads to an exponentially higher number of quantified lipids as the number of acyl chains increases. For example, triacylglycerols can have numerous isoforms per mass, owing to the three fatty acyl chains on their sn1 to sn3 glycerol carbons. Some users may prefer to combine the abundances of these isomers into a single mass. Similarly, some platforms report different adducts of the same lipid, which need to be combined into a single value. A distinctive feature of LipidAnalyst is its ability to combine the isomers of a lipid of a given mass, or the different adducts of a given lipid, into a single variable, which eliminates the need to manually clean the datasheet and merge the corresponding species.

Normalization by internal standards is a crucial step to reduce technical variability, to improve the accuracy and consistency of peak area measurements, and to quantify lipids in targeted platforms. Most mass spectrometry software, especially in targeted platforms, allows normalization and quantification of lipids from spectral peaks by incorporating authenticated internal standards of known concentration. Data downloaded after such preprocessing can therefore be used for further downstream normalization and analysis without additional internal standard normalization. However, in many instances users download the raw data, representing the spectral peak abundance of lipids and internal standards for each sample, prior to internal standard normalization. In that case, after uploading the corresponding internal standard data file, users may choose the internal standard by which each lipid class is normalized. For final quantification, users may further normalize the data by a user-defined constant value to account for fixed factors (dilution factor, weight, volume, etc.) or by variable factors (factors that can vary per sample, such as protein concentration). Normalization by variable factors requires that the corresponding variables be listed in the metadata. Normalization by metadata accounts for variation in factors that are sample specific. While MetaboAnalyst also provides metadata-based normalization options, it requires manual specification of normalization values for individual samples, which can be a prohibitive task in large datasets.

Intraclass lipids (lipid species belonging to the same chemical class, such as phospholipids or triglycerides) exhibit markedly different concentrations and molecular structures, often spanning several orders of magnitude in abundance within a single sample. While they share a common backbone or functional group, variations in fatty acid chain length, saturation, and branching lead to high diversity in their concentration, ranging from mmol/mg to nmol/mg of protein [40,41]. Differences of this magnitude are even more notable between lipid classes. Hence, in addition to sample-wise normalization, intra-lipid class normalization should be considered prior to downstream analysis. Lipid class sum normalization expresses each lipid species as a proportion of its corresponding lipid class total, facilitating identification of the proportional contribution of each feature even within low-abundance lipid classes. In addition to sample-wise normalization by sum and central measures of distribution, LipidAnalyst provides lipid class normalization by the same measures. The availability of these functions streamlines the analysis workflow and saves time for researchers. The logit transformation maps proportions constrained between 0 and 1 to a continuous, unbounded scale (−∞ to +∞), enabling linear modeling on bounded data, making distributions more symmetric, and stabilizing variance [42]. This transformation is well suited to data that have undergone lipid class sum normalization, which bounds values between 0 and 1. For data on the original scale, other transformations such as log, square root, and cube root are available and can be used as appropriate. Finally, various data scaling methods can be applied for comparability. Users can select the settings they deem appropriate for their studies, but we suggest exploring lipid class sum normalization, followed by logit transformation, followed by autoscaling, for downstream visualization and analyses.
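The suggested sequence of lipid class sum normalization, logit transformation, and autoscaling can be written out as a short Python sketch. It is illustrative only: the function names and the class-from-name-prefix convention are assumptions, autoscaling is taken to mean centering on the mean and dividing by the sample standard deviation, and proportions of exactly 0 or 1 (for example, a class with a single lipid) would need special handling because their logit is undefined.

```python
import math
from statistics import mean, stdev

def class_sum_normalize(table):
    """Express each lipid as a proportion of its class total in each sample."""
    lipid_class = {name: name.split()[0] for name in table}  # assumed naming
    n_samples = len(next(iter(table.values())))
    totals = {}
    for name, values in table.items():
        class_total = totals.setdefault(lipid_class[name], [0.0] * n_samples)
        for i, v in enumerate(values):
            class_total[i] += v
    return {name: [v / totals[lipid_class[name]][i] for i, v in enumerate(values)]
            for name, values in table.items()}

def logit(p):
    """Map a proportion in (0, 1) to the real line; undefined at 0 and 1."""
    return math.log(p / (1 - p))

def autoscale(values):
    """Center on the mean and divide by the sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

table = {
    "PC 32:0": [1.0, 3.0, 2.0],
    "PC 34:1": [3.0, 1.0, 2.0],
}
proportions = class_sum_normalize(table)
scaled = {name: autoscale([logit(p) for p in props])
          for name, props in proportions.items()}
```

In this example "PC 32:0" makes up 25%, 75%, and 50% of the PC class in the three samples; after the logit and autoscaling steps these become approximately -1, 1, and 0.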

Visualization tools in LipidAnalyst contain a number of different options. Boxplots and violin plots illustrate the distribution by samples, lipid class, subgroup of lipids within each class, or individual lipids. PCA plots reveal the separation of study groups using lipidomic data reduced to the top two principal components. Heatmap for hierarchical clustering reorganizes rows and columns to bring similar lipids together.

The Differential Mean Lipid Heatmap (DMLH) is a powerful hypothesis-generating tool and provides unique visualization features in LipidAnalyst. Our group first applied this style of class-wise chain-length × saturation visualization at scale in CKD lipidomics [10]. The abundance of intraclass lipids varies with the number of carbons and double bonds in response to environmental factors, disease state, or metabolic alterations [8,9,10,43,44,45,46]. The rationale for the DMLH is to illustrate and compare the relative abundance of the lipids within each class across carbon number and number of double bonds, by projecting carbon number on the x-axis and number of double bonds on the y-axis, while the heatmap is color coded to represent the z-score standardized mean values for comparability. For lipids with more than one acyl chain, users have the option to sort the carbon numbers by chain-1, chain-2, or total carbon number to achieve the most appropriate visualization. Depending on the lipid class and the pattern of lipid alteration, the DMLH may reveal underlying mechanisms such as alteration in elongation and desaturation [10], up- or down-regulation of de novo lipogenesis [8,44], impairment of β-oxidation [8,10], alterations in phospholipase A1,2 activities [9], worsening insulin resistance [9], and upregulation of acyl-CoA synthetase 1 [30], among many others. We strongly recommend reviewing the DMLHs for all lipid classes prior to developing the analytical plan, especially for a data-driven approach when there is no a priori hypothesis.

The integrated statistical framework enables comprehensive analysis of lipidomics datasets, combining differential testing, supervised modeling, and network-based analysis within a unified workflow. While univariate and multivariate approaches facilitate robust identification of discriminatory lipid features, the DSPC network methodology further extends the analysis to a systems-level perspective. By employing graphical lasso–based sparse partial correlation modeling, DSPC prioritizes direct lipid–lipid associations while minimizing indirect correlations, which is particularly advantageous in high-dimensional lipidomics datasets. The resulting networks can be exported for further refinement and functional interpretation in Metscape [47], thereby enhancing biological interpretability and hypothesis generation.
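The difference between a marginal correlation and a partial correlation, which is the idea DSPC builds on, can be shown with three variables. The Python sketch below uses the textbook first-order partial correlation formula; it is not DSPC itself, which estimates a sparse, de-biased partial correlation network across many lipids at once. The variable names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: lipid_c drives both lipid_a and lipid_b.
lipid_c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lipid_a = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
lipid_b = [1.5, 1.5, 2.5, 4.5, 5.5, 5.5]

marginal = pearson(lipid_a, lipid_b)
direct = partial_correlation(lipid_a, lipid_b, lipid_c)
```

Here the marginal correlation between the two lipids is about 0.94, while the partial correlation given the third lipid falls to about 0.32, showing how much of the apparent association is indirect. An edge in a partial correlation network is meant to reflect only the direct part.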

A major advantage of LipidAnalyst is integration of multiple data processing, normalization, and visualization modules with analytical frameworks within a unified workflow compared with existing lipidomics tools. Table 1 shows its relative strength compared to other available packages. Despite the advantages of LipidAnalyst, a few limitations should be noted. First, the package does not include all statistical techniques available in more advanced packages such as SAS. However, the users can download the data at different stages of data evolution and export it to other packages of their choice if analysis beyond the scope of LipidAnalyst is needed. Second, pathway analysis is not integrated in LipidAnalyst. Nevertheless, users can similarly export the analysis results from LipidAnalyst to dedicated apps such as BioPan [48], Lion/Web [49], or lipid set enrichment analysis (LSEA) in LipidSig [21] and lipidr [25] [48]. Lastly, although the framework facilitates identification of biologically meaningful lipid modules, experimental validation of the identified lipid interactions was beyond the scope of this study.

In conclusion, LipidAnalyst is a specialized tool for processing, normalization, visualization, and analysis of lipidomic data, with functionality beyond that of comparable tools. Its comprehensive visualization of lipidomic data helps researchers develop appropriate analytical plans.

Funding

S. Pennathur: National Institute of Diabetes and Digestive and Kidney Diseases (U54DK137314 and P30DK89503).

References

Quehenberger, O.; Dennis, E.A. The human plasma lipidome. N Engl. J. Med. 2011, 365, 1812–1823.
Lazar, C.; Gatto, L.; Ferro, M.; Bruley, C.; Burger, T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res. 2016, 15, 1116–1125.
Fahy, E.; Subramaniam, S.; Brown, H.A.; Glass, C.K.; Merrill, A.H., Jr.; Murphy, R.C.; Raetz, C.R.; Russell, D.W.; Seyama, Y.; Shaw, W.; et al. A comprehensive classification system for lipids. J. Lipid Res. 2005, 46, 839–861.
Fahy, E.; Subramaniam, S.; Murphy, R.C.; Nishijima, M.; Raetz, C.R.; Shimizu, T.; Spener, F.; van Meer, G.; Wakelam, M.J.; Dennis, E.A. Update of the LIPID MAPS comprehensive classification system for lipids. J. Lipid Res. 2009, 50, S9–14.
Anh, N.K.; Thu, N.Q.; Tien, N.T.N.; Long, N.P.; Nguyen, H.T. Advancements in Mass Spectrometry-Based Targeted Metabolomics and Lipidomics: Implications for Clinical Research. Molecules 2024, 29.
Luque de Castro, M.D.; Quiles-Zafra, R. Lipidomics: An omics discipline with a key role in nutrition. Talanta 2020, 219, 121197.
Vvedenskaya, O.; Holcapek, M.; Vogeser, M.; Ekroos, K.; Meikle, P.J.; Bendt, A.K. Clinical lipidomics - A community-driven roadmap to translate research into clinical applications. J. Mass. Spectrom. Adv. Clin. Lab 2022, 24, 1–4.
Afshinnia, F.; Nair, V.; Lin, J.; Rajendiran, T.M.; Soni, T.; Byun, J.; Sharma, K.; Fort, P.E.; Gardner, T.W.; Looker, H.C.; et al. Increased lipogenesis and impaired beta-oxidation predict type 2 diabetic kidney disease progression in American Indians. JCI Insight 2019, 4(21), e130317.
Afshinnia, F.; Rajendiran, T.M.; He, C.; Byun, J.; Montemayor, D.; Darshi, M.; Tumova, J.; Kim, J.; Limonte, C.P.; Miller, R.G.; et al. Circulating Free Fatty Acid and Phospholipid Signature Predicts Early Rapid Kidney Function Decline in Patients With Type 1 Diabetes. Diabetes Care 2021, 44, 2098–2106.
Afshinnia, F.; Rajendiran, T.M.; Soni, T.; Byun, J.; Wernisch, S.; Sas, K.M.; Hawkins, J.; Bellovich, K.; Gipson, D.; Michailidis, G.; et al. Impaired β-Oxidation and Altered Complex Lipid Fatty Acid Partitioning with Advancing CKD. J. Am. Soc. Nephrol. 2018, 29, 295–306.
Rhee, E.P.; Clish, C.B.; Ghorbani, A.; Larson, M.G.; Elmariah, S.; McCabe, E.; Yang, Q.; Cheng, S.; Pierce, K.; Deik, A.; et al. A combined epidemiologic and metabolomic approach improves CKD prediction. J. Am. Soc. Nephrol. 2013, 24, 1330–1338.
Contrepois, K.; Mahmoudi, S.; Ubhi, B.K.; Papsdorf, K.; Hornburg, D.; Brunet, A.; Snyder, M. Cross-Platform Comparison of Untargeted and Targeted Lipidomics Approaches on Aging Mouse Plasma. Sci. Rep. 2018, 8, 17747.
Meikle, T.G.; Huynh, K.; Giles, C.; Meikle, P.J. Clinical lipidomics: realizing the potential of lipid profiling. J. Lipid Res. 2021, 62, 100127.
Kumar, N.; Hoque, M.A.; Sugimoto, M. Robust volcano plot: identification of differential metabolites in the presence of outliers. BMC Bioinform. 2018, 19, 128.
Cui, X.; Churchill, G.A. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 2003, 4, 210.
Jin, W.; Riley, R.M.; Wolfinger, R.D.; White, K.P.; Passador-Gurgel, G.; Gibson, G. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet 2001, 29, 389–395.
Diaz-Beltran, L.; Cano, C.; Wall, D.P.; Esteban, F.J. Systems biology as a comparative approach to understand complex gene expression in neurological diseases. Behav. Sci. 2013, 3, 253–272.
Yue, R.; Dutta, A. Computational systems biology in disease modeling and control, review and perspectives. npj Syst. Biol. Appl. 2022, 8, 37.
Zeng, W.; Beyene, H.B.; Kuokkanen, M.; Miao, G.; Magliano, D.J.; Umans, J.G.; Franceschini, N.; Cole, S.A.; Michailidis, G.; Lee, E.T.; et al. Lipidomic profiling in the Strong Heart Study identified American Indians at risk of chronic kidney disease. Kidney Int. 2022, 102, 1154–1166.
Pang, Z.; Lu, Y.; Zhou, G.; Hui, F.; Xu, L.; Viau, C.; Spigelman, A.F.; MacDonald, P.E.; Wishart, D.S.; Li, S.; et al. MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation. Nucleic Acids Res. 2024, 52, W398–W406.
Liu, C.H.; Shen, P.C.; Lin, W.J.; Liu, H.C.; Tsai, M.H.; Huang, T.Y.; Chen, I.C.; Lai, Y.L.; Wang, Y.D.; Hung, M.C.; et al. LipidSig 2.0: integrating lipid characteristic insights into advanced lipidomics data analysis. Nucleic Acids Res. 2024, 52, W390–W397.
Pellegrino, R.M.; Giulietti, M.; Alabed, H.B.R.; Buratta, S.; Urbanelli, L.; Piva, F.; Emiliani, C. LipidOne: user-friendly lipidomic data analysis tool for a deeper interpretation in a systems biology scenario. Bioinformatics 2022, 38, 1767–1769.
Mohamed, A.; Hill, M.M. LipidSuite: interactive web server for lipidomics differential and enrichment analysis. Nucleic Acids Res. 2021, 49, W346–W351.
Liu, C.H.; Shen, P.C.; Tsai, M.H.; Liu, H.C.; Lin, W.J.; Lai, Y.L.; Wang, Y.D.; Hung, M.C.; Cheng, W.C. LipidSigR: a R-based solution for integrated lipidomics data analysis and visualization. Bioinform. Adv. 2025, 5, vbaf047.
Mohamed, A.; Molendijk, J.; Hill, M.M. lipidr: A Software Tool for Data Mining and Analysis of Lipidomics Datasets. J. Proteome Res. 2020, 19, 2890–2897.
Conroy, M.J.; Andrews, R.M.; Andrews, S.; Cockayne, L.; Dennis, E.A.; Fahy, E.; Gaud, C.; Griffiths, W.J.; Jukes, G.; Kolchin, M.; et al. LIPID MAPS: update to databases and tools for the lipidomics community. Nucleic Acids Res. 2024, 52, D1677–D1682.
Fahy, E.; Sud, M.; Cotter, D.; Subramaniam, S. LIPID MAPS online tools for lipid research. Nucleic Acids Res. 2007, 35, W606–612.
Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E.A.; Glass, C.K.; Merrill, A.H., Jr.; Murphy, R.C.; Raetz, C.R.; Russell, D.W.; et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res. 2007, 35, D527–532.
Basu, S.; Duren, W.; Evans, C.R.; Burant, C.F.; Michailidis, G.; Karnovsky, A. Sparse network modeling and metscape-based visualization methods for the analysis of large-scale metabolomics data. Bioinformatics 2017, 33, 1545–1553.
Saum, K.; Liu, X.; Rajendiran, T.; Zeng, L.; Kayampilly, P.; Byun, J.; Afshinnia, F.; Pennathur, S. Chronic Kidney Disease Induces Distinct Alterations of Macrophage Lipid Metabolism in a Mouse Model of Atherosclerosis. J. Lipid Res. 2026, 100975.
Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592.
Hrydziuszko, O.; Viant, M.R. Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline. Metabolomics 2012, 8, 161–174.
McGurk, K.A.; Dagliati, A.; Chiasserini, D.; Lee, D.; Plant, D.; Baricevic-Jones, I.; Kelsall, J.; Eineman, R.; Reed, R.; Geary, B.; et al. The use of missing values in proteomic data-independent acquisition mass spectrometry to enable disease activity discrimination. Bioinformatics 2020, 36, 2217–2223.
Xu, J.; Wang, Y.; Xu, X.; Cheng, K.K.; Raftery, D.; Dong, J. NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data. Molecules 2021, 26.
Han, X.; Gross, R.W. Shotgun lipidomics: electrospray ionization mass spectrometric analysis and quantitation of cellular lipidomes directly from crude extracts of biological samples. Mass. Spectrom. Rev. 2005, 24, 367–412.
Maccarone, A.T.; Duldig, J.; Mitchell, T.W.; Blanksby, S.J.; Duchoslav, E.; Campbell, J.L. Characterization of acyl chain position in unsaturated phosphatidylcholines using differential mobility-mass spectrometry. J. Lipid Res. 2014, 55, 1668–1677.
Takeda, H.; Izumi, Y.; Takahashi, M.; Paxton, T.; Tamura, S.; Koike, T.; Yu, Y.; Kato, N.; Nagase, K.; Shiomi, M.; et al. Widely-targeted quantitative lipidomics method by supercritical fluid chromatography triple quadrupole mass spectrometry. J. Lipid Res. 2018, 59, 1283–1293.
de Bruin, C.R.; de Bruijn, W.J.C.; Hemelaar, M.A.; Vincken, J.P.; Hennebelle, M. Separation of triacylglycerol (TAG) isomers by cyclic ion mobility mass spectrometry. Talanta 2025, 281, 126804.
Xia, J.; Mandal, R.; Sinelnikov, I.V.; Broadhurst, D.; Wishart, D.S. MetaboAnalyst 2.0--a comprehensive server for metabolomic data analysis. Nucleic Acids Res. 2012, 40, W127–133.
Muro, E.; Atilla-Gokcumen, G.E.; Eggert, U.S. Lipids in cell biology: how can we understand them better? Mol. Biol. Cell 2014, 25, 1819–1823.
Hu, C.; Duan, Q.; Han, X. Strategies to Improve/Eliminate the Limitations in Shotgun Lipidomics. Proteomics 2020, 20, e1900070.
Seiffert, S.; Weber, S.; Sack, U.; Keller, T. Use of logit transformation within statistical analyses of experimental results obtained as proportions: example of method validation experiments and EQA in flow cytometry. Front Mol. Biosci. 2024, 11, 1335174.
Afshinnia, F.; Jadoon, A.; Rajendiran, T.M.; Soni, T.; Byun, J.; Michailidis, G.; Pennathur, S.; Michigan Kidney Translational Core, C.I.G. Plasma lipidomic profiling identifies a novel complex lipid signature associated with ischemic stroke in chronic kidney disease. J. Transl. Sci. 2020, 6, 1–8.
Fort, P.E.; Rajendiran, T.M.; Soni, T.; Byun, J.; Shan, Y.; Looker, H.C.; Nelson, R.G.; Kretzler, M.; Michailidis, G.; Roger, J.E.; et al. Diminished retinal complex lipid synthesis and impaired fatty acid beta-oxidation associated with human diabetic retinopathy. JCI Insight 2021, 6.
Romanauska, A.; Kohler, A. Lipid saturation controls nuclear envelope function. Nat. Cell Biol. 2023, 25, 1290–1302.
Ali, O.; Szabo, A. Review of Eukaryote Cellular Membrane Lipid Composition, with Special Attention to the Fatty Acids. Int. J. Mol. Sci. 2023, 24.
Karnovsky, A.; Weymouth, T.; Hull, T.; Tarcea, V.G.; Scardoni, G.; Laudanna, C.; Sartor, M.A.; Stringer, K.A.; Jagadish, H.V.; Burant, C.; et al. Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics 2012, 28, 373–380.
Gaud, C.; B, C.S.; Nguyen, A.; Fedorova, M.; Ni, Z.; O'Donnell, V.B.; Wakelam, M.J.O.; Andrews, S.; Lopez-Clavijo, A.F. BioPAN: a web-based tool to explore mammalian lipidome metabolic pathways on LIPID MAPS. F1000Res 2021, 10, 4.
Molenaar, M.R.; Jeucken, A.; Wassenaar, T.A.; van de Lest, C.H.A.; Brouwers, J.F.; Helms, J.B. LION/web: a web-based ontology enrichment tool for lipidomic data analysis. Gigascience 2019, 8.

Figure 1. Overview of LipidAnalyst (Icon Created with BioRender.com).

Figure 3. Differential Mean Lipid Heatmap of triacylglycerols (TG), comparing the mean level of each lipid by study group. The x-axis represents the total number of carbons in the three acyl chains, and the y-axis represents the total number of double bonds in the three acyl chains.

Figure 4. Differential Mean Lipid Heatmap of phosphatidylethanolamine-O (PE-O), comparing the mean level of each lipid by study group. The x-axis represents the number of carbons in the chain at the sn-1 position, the number of carbons in the chain at the sn-2 position, or the total number of carbons in both chains. The y-axis represents the total number of double bonds in the two chains. Users can sort the lipids by carbon number at sn-1 (Panel A), carbon number at sn-2 (Panel B), or total carbon number (Panel C) for the optimal view.

Figure 5. Class Level Lipid Comparison: Using filters to change the number of carbons or the number of double bonds, the box plots can represent the distribution of all lipids within each lipid class (Figure 5A), combination of a select number of lipids within each lipid class (Figure 5B), or a single lipid by study groups (Figure 5C).

Table 1. Comparison of LipidAnalyst with existing lipidomics analysis tools.

Feature	LipidAnalyst	MetaboAnalyst	LipidSig	LipidSuite	LipidOne	LipidSigR	lipidr
User interface	Web-based GUI	Web-based GUI	Web-based GUI	Web-based GUI	Web-based GUI	An R package	An R package
Supported input format	.csv/.tsv/.xlsx	.csv/.tsv	.csv/.tsv/.xlsx	numerical matrix (.csv), skyline export, and mwTab	.csv/.txt	Lipid abundance matrix (.csv/.txt)	numerical matrix (.csv), skyline export, and mwTab
Feature filtering	Yes	Yes	Limited (restricted to filtering features with high missingness)	Lipids can be filtered by their %CV	No	Limited (restricted to filtering features with high missingness)	Lipids can be filtered by their %CV
Customizable data imputation	Yes	Limited (no group-specific missing value imputation)	Limited (no group-specific missing value imputation)	Limited (no group-specific missing value imputation)	No	Limited (no group-specific missing value imputation)	Limited (no group-specific missing value imputation)
Merging duplicate lipids and adduct variants	Yes	No	No	No	No	No	No
Lipid Parsing	Yes	Limited (name matching against LIPID MAPS, no chain information provided)	Limited (constrained by supported nomenclature rules)	Limited (constrained by supported nomenclature rules)	Yes (no parsing validation reference provided)	Limited (constrained by supported nomenclature rules)	Limited (constrained by supported nomenclature rules)
Method coverage of normalization	Yes	Yes	Yes	Yes	No	Yes	Yes
Internal standard normalization	Yes	No	No	Yes	No	No	Yes
Normalization by user-defined factors	Yes	Yes (requires manual input for each sample)	No	No	No	No	No
Lipid class sum normalization	Yes	No	No	No	No	No	No
Differential Mean Lipid Heatmap	Yes	No	No	No	No	Yes, but only total carbon length is supported on the x-axis, limiting flexibility for multi-chain lipids.	No
DSPC network	Yes	Yes	No	No	No	No	No

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.