The accuracy in predicting the dominant oligomerization type was 97% (Prediction_tf_llps_factor,
Figures S2 and S3). In addition, oligomerization type prediction distributions, that is, the probabilities of each oligomerization type that a prot-assembly could form, were calculated using AI algorithms in the Weka library; an example is given in
Table S2. Care should be taken in selecting which prot-assemblies to include in a dataset if the oligomerization type prediction distributions are to be analyzed. Several entries in the dataset had the same domain membership but different oligomerization types. For each of these entries, we selected one oligomerization type at random and included it in the dataset. In the prediction distributions, the actual oligomerization types appeared in sequential order starting from the highest probability. The dataset was built from LLPS factors that physically interact with TFs according to the STRING database. However, only a few of them had several entries with multiple oligomerization types. Therefore, the reliability of the reported accuracy still needs to be estimated. Correlation analysis showed that the domain dimer feature from ProtCAD, the features of the DDI binding interface from 3did, flags, LLPS types, LLPS functional types, and the number of domains of LLPS factors had relatively high correlations with oligomerization types (corr > 0.6). We applied factor analysis of mixed data (FAMD) and plotted the coordinates of the variables in the first and second principal dimensions; the variables for TF target motifs were located close to the variables from the oligomerization and LLPS modules (Figure 2).
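The random selection of one oligomerization type per domain membership, and the position of the actual type within a prediction distribution, can be illustrated with a minimal Python sketch. This is not the code used in this study (the distributions here were produced with Weka); the function names, the tuple layout of `entries`, the `Dom1;Dom2` membership strings, and the probability values are all hypothetical and serve only as an illustration.

```python
import random
from collections import defaultdict


def pick_one_type_per_membership(entries, seed=0):
    """Keep one randomly chosen oligomerization type per domain membership.

    `entries` is a list of (domain_membership, oligomerization_type) pairs.
    This layout is an assumption made for the sketch.
    """
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    by_membership = defaultdict(list)
    for membership, oligo_type in entries:
        by_membership[membership].append(oligo_type)
    # Iterate over sorted keys and sorted distinct types so that the result
    # depends only on the seed, not on the order of the input entries.
    return {
        membership: rng.choice(sorted(set(by_membership[membership])))
        for membership in sorted(by_membership)
    }


def rank_of_actual(distribution, actual_type):
    """Return the 1-based rank of `actual_type` by descending probability."""
    ranked = sorted(distribution, key=distribution.get, reverse=True)
    return ranked.index(actual_type) + 1


# Hypothetical entries: "Dom1;Dom2" occurs with two oligomerization types.
entries = [
    ("Dom1;Dom2", "homo_obligate_oligomer_obligate"),
    ("Dom1;Dom2", "hetero_obligate_oligomer_obligate"),
    ("Dom3", "C1_obligate_monomer_obligate"),
]
dataset = pick_one_type_per_membership(entries, seed=42)

# Hypothetical prediction distribution for one prot-assembly.
distribution = {
    "homo_obligate_oligomer_obligate": 0.7,
    "hetero_obligate_oligomer_obligate": 0.2,
    "C1_obligate_monomer_obligate": 0.1,
}
rank = rank_of_actual(distribution, "hetero_obligate_oligomer_obligate")
```

Because only one type is retained per domain membership, an entry whose discarded type is ranked second or third in the distribution would still be scored against the retained type alone, which is why the selection of prot-assemblies affects how the distributions should be read.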
Note: In each scatter plot, the attributes in the vertical column on the left are on the Y axis and those in the horizontal row on top are on the X axis. X-axis labels: A:protcad_llps_one_pfam_groupid_sing_intra_inter_freq_sum; B:protcad_llps_one_pfam_groupid_sing_oli_stat_sum; C:protcad_llps_one_pfam_groupid_sing_even_odd_sum; D:protcad_llps_one_pfam_groupid_sing_all_max_sum; E:protcad_llps_dimer_sing_oli_sum; F:protcad_llps_dimer_sing_intra_inter_freq_sum; G:protcad_llps_dimer_multi_oli_sum; H:protcad_llps_dimer_multi_intra_inter_freq_sum; I:protcad_llps_one_pfam_sing_intra_inter_freq_sum; J:protcad_llps_one_pfam_multi_intra_inter_freq_sum; K:protcad_llps_one_pfam_sing_oli_stat_sum; L:protcad_llps_one_pfam_multi_oli_stat_sum; M:protcad_llps_cell_loc; N:threedid_llps; O:flag_llps; P:ANA_dimer; Q:ANA_onepfam; R:TEMP_dimer; S:TEMP_onepfam; T:CONDEN_FUNC_LLPS_onepfam; U:llpsOrNot; V:llpsChkUniprot; W:llpsType; X:llpsUniprot; Y:co_tf_fam_val; Z:co_tf_motif_fam_vall; A1:plant_prAS_data_0; B1:plant_prAS_data_1; C1:plant_prAS_list_data_0; D1:plant_prAS_list_data_1; E1:plant_prAS_dom_linker_data_0; F1:plant_prAS_dom_linker_data_1; G1:plant_prAS_dom_linker_list_data_0; H1:plant_prAS_dom_linker_list_data_1; I1:ptmfeature_data_0; J1:ptmfeature_data_1; K1:ptmfeature_list_data_0; L1:ptmfeature_list_data_1; M1:ptmfeature_dom_linker_data_0; N1:ptmfeature_dom_linker_list_data_0; O1:query_plant_prAS_data; P1:query_plant_prAS_list_data; Q1:query_plant_prAS_dom_linker_data; R1:query_plant_prAS_dom_linker_list_data; S1:query_ptmfeature_data; T1:query_ptmfeature_list_data; U1:co_tf_per_tf_val_0; V1:co_tf_per_tf_val_1; W1:co_tf_per_tf_val_2; X1:co_tf_per_tf_val_3; Y1:co_tf_per_tf_val_4; Z1:co_tf_per_tf_val_5; A2:co_tf_per_tf_val_6; B2:co_tf_per_tf_val_7; C2:co_tf_per_tf_val_8; D2:co_tf_per_tf_val_9; E2:co_tf_per_tf_val_10; F2:co_tf_per_tf_val_11; G2:co_tf_per_tf_val_12; H2:co_tf_per_tf_val_13; I2:co_tf_per_tf_val_14; J2:co_tf_per_tf_val_15; K2:co_tf_per_tf_val_16; L2:co_tf_per_tf_val_17; M2:co_tf_per_tf_val_18; N2:co_tf_per_tf_val_19; O2:co_tf_per_tf_val_20; P2:co_tf_per_tf_val_21; Q2:co_tf_per_tf_val_22; R2:co_tf_per_tf_val_23; S2:co_tf_per_tf_val_24; T2:co_tf_per_tf_val_25; U2:co_tf_per_tf_val_26; V2:co_tf_per_tf_val_27; W2:co_tf_per_tf_val_28; X2:co_tf_per_tf_val_29; Y2:co_tf_per_tf_val_30; Z2:co_tf_per_tf_val_31; A3:co_tf_per_tf_val_32; B3:co_tf_per_tf_val_33; C3:co_tf_per_tf_val_34; D3:co_tf_per_tf_val_35; E3:co_tf_per_tf_val_36; F3:co_tf_motif_val. Y-axis label: co_tf_motif_fam_val.
one_pfam/onepfam: per single domain; dimer: per domain dimer (e.g., Dom1 and Dom2); multi: per Pfam architecture of GroupID (e.g., Dom1, Dom2, Dom3); intra_inter_freq_sum: sum of frequencies of domain(s) in three groups of prot-assemblies, in which the number of UniProt genes is less than (first group), equal to (second group), or greater than (third group) the stoichiometry provided in each GroupID; oli_stat_sum: sum of frequencies of occurrences in each oligomerization type (e.g., C1_obligate_monomer_obligate, C1_obligate_hetero_single_oligomer_obligate, homo_obligate_monomer_oligomer_moderate, homo_obligate_oligomer_obligate, hetero_obligate_monomer_oligomer_moderate, hetero_obligate_oligomer_obligate, homo_hetero_moderate_monomer_obligate, homo_hetero_moderate_monomer_oligomer_moderate, homo_hetero_moderate_oligomer_obligate); even_odd_sum: sum of frequencies of occurrences in two groups of symmetries, odd (first group; e.g., C3) and even (second group; e.g., C4).
cell_loc: presence/absence in 16 different subcellular locations; threedid_llps: differences between non-redundant and redundant sets created based on 13 variables; flag_llps: frequencies of special flags; ANA: frequencies of 295 different PO IDs; TEMP: frequencies of 64 different PO IDs; CONDEN_FUNC_LLPS_onepfam: frequencies of LLPS functional types (client, regulator, scaffold); llpsType: frequencies in 265 different LLPS types; llpsChkUniprot: whether UniProt genes are associated with LLPS in each llpsType; llpsUniprot: the maximum number of Pfam assignments of associated UniProt genes in each llpsType; co_tf_fam_val: frequencies of PPI with TF families (BHLH, BZIP, C2H2_ZF, CSD, E2F, HOMEODOMAIN, HSF, MADF, MYB/SANT, NAC/NAM, SOX, TBP, TCP, TCR/CXC, UNKNOWN); co_tf_motif_fam_vall: frequencies of TF motif families (MS02_2.00, MS10_2.00, MS11_2.00, MS13_2.00, MS18_2.00, MS21_2.00, MS27_2.00, MS28_2.00, MS31_2.00, MS33_2.00, MS42_2.00, MS46_2.00, MS56_2.00, MS57_2.00, MS59_2.00, MS62_2.00, MS63_2.00, MS64_2.00); plant_prAS and ptmfeature data were created for 37 TFs (AT1G09770, AT1G55520, AT1G75080, AT2G17870, AT2G23380, AT2G23740, AT2G41130, AT3G02150, AT3G12810, AT3G13445, AT3G17609, AT3G19510, AT3G24140, AT3G24520, AT3G28730, AT3G44460, AT3G47620, AT3G48160, AT3G48430, AT3G52300, AT3G56770, AT3G56850, AT4G02020, AT4G02640, AT4G16780, AT4G29000, AT4G34530, AT4G35580, AT4G37790, AT5G04240, AT5G11260, AT5G22220, AT5G22290, AT5G28770, AT5G46690, AT5G51910, AT5G63420); prAS_data: values created based on gravy, pi, local, local2, solblity, ubiqui, glycosyl, lowcomp, beta_sheet, disorder, signal, trans_mem, S–S bond, dom_link, pass, o_glycosyl, where loc1 includes chloroplast transit peptide, mitochondrial targeting peptide, and secretory pathway signal peptide, and loc2 includes E.R., chlo, mito, cysk, cyto, nucl, plas, extr, golg, pero, vacu; ptmfeature_data: values created based on Glycation, Lysine, Methylation, N-glycosylation, N-terminal, O-GlcNAcylation, Oxidation, Persulfidation, Phosphorylation, S-cyanylation, S-nitrosylation, S-sulfenylation; query: values of the LLPS factor; co_tf_per_tf_val: domain-associated values calculated for each of the 37 TFs. For details, please refer to the code provided (Table S1).
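As a reading aid for the grouping rules behind intra_inter_freq_sum and even_odd_sum defined in this note, the following is a minimal Python sketch and not the provided code (Table S1). It assumes that the stoichiometry can be compared with the number of UniProt genes as an integer and that a symmetry label ends in its integer order (e.g., C3); the dictionary keys `domains`, `n_genes`, and `stoichiometry` and all function names are hypothetical. How the per-group counts are subsequently summed into a single feature is not specified here, so the sketch stops at the counts.

```python
import re
from collections import Counter


def gene_vs_stoichiometry_group(n_uniprot_genes, stoichiometry):
    """Return 0, 1, or 2 for fewer, equal, or more genes than the stoichiometry."""
    if n_uniprot_genes < stoichiometry:
        return 0
    if n_uniprot_genes == stoichiometry:
        return 1
    return 2


def symmetry_parity_group(symmetry):
    """Return 0 for an odd symmetry order (e.g., C3) and 1 for an even one (e.g., C4).

    Assumes the label is letters followed by an integer order.
    """
    match = re.fullmatch(r"[A-Za-z]+(\d+)", symmetry)
    if match is None:
        raise ValueError(f"unrecognized symmetry label: {symmetry!r}")
    return 0 if int(match.group(1)) % 2 else 1


def intra_inter_freq(assemblies, domain):
    """Count prot-assemblies containing `domain` in each of the three groups."""
    counts = Counter()
    for assembly in assemblies:
        if domain in assembly["domains"]:
            group = gene_vs_stoichiometry_group(
                assembly["n_genes"], assembly["stoichiometry"]
            )
            counts[group] += 1
    return [counts[0], counts[1], counts[2]]


# Hypothetical prot-assemblies used only to exercise the functions.
assemblies = [
    {"domains": {"Dom1", "Dom2"}, "n_genes": 1, "stoichiometry": 2},
    {"domains": {"Dom1"}, "n_genes": 2, "stoichiometry": 2},
    {"domains": {"Dom3"}, "n_genes": 3, "stoichiometry": 2},
]
dom1_counts = intra_inter_freq(assemblies, "Dom1")  # [1, 1, 0]
```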