Submitted: 05 May 2025
Posted: 08 May 2025
Abstract
Keywords:
1. Introduction
2. Related Works
2.1. Traditional Cryptanalysis Techniques
2.2. Machine Learning Approaches in Cryptanalysis
2.3. Machine Learning in Breaking Monoalphabetic Substitution Ciphers
3. Methodology
3.1. Preparing Dataset and Model Plans
3.1.1. Hyperparameters
3.1.2. Plain Text As Output
- Encoded Text (Input): Y gyfwqe fofhoz dz y ljvvdzp vozldc fye lpye iwpr y vypwocp kjzwct zomdboze.
- Plain Text (Output): A family member or a support person may stay with a patient during recovery.
- Encoded Text (Input): Vqvsbfcv cl jwtclbtt clqiwsbt jwtclbtt rovclclz vr rxb jwtclbtt qborcycqvrb ibdbi.
- Plain Text (Desired Output): Academia in business includes business training at the business certificate level.
- Decoded Text (Actual Output): Acupunct in syringes includes syringes tying to the syringes reeditive layer.
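In the encoded inputs above, only letters are substituted. Capitalization, spaces, and punctuation carry over unchanged. The sketch below shows one way such input/output pairs could be generated. It assumes a uniformly random key per sample, because the paper's exact key-generation procedure is not given in this section, and `random_key`, `encrypt`, and `decrypt` are hypothetical helper names.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def random_key(rng: random.Random) -> str:
    """Random permutation of a-z; key[k] is the cipher letter for ALPHABET[k]."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)


def encrypt(plain: str, key: str) -> str:
    """Monoalphabetic substitution that keeps case, spaces and punctuation."""
    table = str.maketrans(ALPHABET + ALPHABET.upper(), key + key.upper())
    return plain.translate(table)


def decrypt(cipher: str, key: str) -> str:
    """Inverse substitution: map each cipher letter back to its plain letter."""
    table = str.maketrans(key + key.upper(), ALPHABET + ALPHABET.upper())
    return cipher.translate(table)


rng = random.Random(0)
key = random_key(rng)
sample = "A family member or a support person may stay with a patient during recovery."
encoded = encrypt(sample, key)  # model input; `sample` is the training target
```

Because the key is a permutation, `decrypt(encrypt(s, key), key) == s` for any sentence, which is a convenient sanity check on generated pairs.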
3.1.3. Alphabet Sequence As Output
- Encoded Text (Input): Hd adcdcwda yod drdqyn zk zsa boiluozzu.
- Alphabet (Output): rcme...wi.fl.sh.nvu.d.b.to
- Plain Text: We remember the events of our childhood.
- Encoded Text (Input): Kxxt gc ogcz dvjd dvx sicbxrpxcsxb lip gotibx jfx di zfju jddxcdgic di dvx mxbbic lip jfx dflgcy di dxjsv dvxo, cid di tpcgbv dvxo.
- Alphabet (Output): .rnt.ui.oawfb.dl.pcsyh.e
- Decoded Text (Processed Output): Wees in dinz that the conreplencer fol idsore aue to zuay attention to the berron fol aue tufiny to teach thed, not to slnirh thed.
- Plain Text (Desired Processed Output): Keep in mind that the consequences you impose are to draw attention to the lesson you are trying to teach them, not to punish them.
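In the examples above, the alphabet output is read in cipher-alphabet order. Position k holds the plaintext letter for the k-th cipher letter (a, b, c, ...), and "." marks a cipher letter that does not occur in the input. In the first example, cipher a→r, b→c, c→m, d→e, while e, f, and g are absent. The following sketch builds that target from a cipher/plain pair and applies a predicted alphabet. It assumes that letters whose entry is "." or lies beyond a too-short prediction are left unchanged. This assumption reproduces the processed output of the second example, whose predicted alphabet has only 24 characters. The function names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase


def build_alphabet_target(cipher_text: str, plain_text: str) -> str:
    """26-char target: entry k is the plain letter for cipher letter ALPHABET[k], '.' if unused."""
    key = ["."] * 26
    # A substitution cipher preserves length, so characters align one-to-one.
    for c, p in zip(cipher_text.lower(), plain_text.lower()):
        if c in ALPHABET and p in ALPHABET:
            key[ALPHABET.index(c)] = p
    return "".join(key)


def decode_with_alphabet(cipher_text: str, key: str) -> str:
    """Apply a predicted alphabet; '.' entries and letters past the key's end pass through."""
    mapping = {}
    for c, p in zip(ALPHABET, key):  # zip stops early if the prediction is short
        if p != ".":
            mapping[c] = p
            mapping[c.upper()] = p.upper()
    return cipher_text.translate(str.maketrans(mapping))
```

For example, `build_alphabet_target("Hd adcdcwda yod drdqyn zk zsa boiluozzu.", "We remember the events of our childhood.")` yields the first example's target `rcme...wi.fl.sh.nvu.d.b.to`.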


3.1.4. The Correction Model

3.1.5. The Correction Algorithm

3.1.6. Benchmarks
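$$\text{Similarity}(a, b) = \left(1 - \frac{\operatorname{lev}(a, b)}{\max(|a|, |b|)}\right) \times 100\%$$

where
- $\operatorname{lev}(a, b)$ is the Levenshtein distance between $a$ and $b$,
- $|a|$ and $|b|$ are the lengths of the strings $a$ and $b$, respectively.

$$\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \left(1 - \frac{d_i}{|y_i|}\right) \times 100\%$$

where
- $d_i$ is the Levenshtein distance between the model’s final output $\hat{y}_i$ and the ground truth $y_i$ for the $i$-th sample,
- $n$ is the total number of test samples,
- $|y_i|$ is the length of the ground truth string $y_i$.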
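Both metrics can be computed with a short, dependency-free sketch. It assumes pairwise similarity is normalized by the longer string and benchmark accuracy by the ground-truth length, matching the definitions listed above. The function names are illustrative and are not taken from the authors' codebase. Under this definition, the first row of Appendix A.1 differs in one word ("disposable" vs. "removable", distance 5 over 71 characters), which gives the reported 92.96%.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match for free)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """(1 - lev(a, b) / max(|a|, |b|)) * 100, in percent."""
    if not a and not b:
        return 100.0
    return (1 - levenshtein(a, b) / max(len(a), len(b))) * 100


def accuracy(pairs) -> float:
    """Mean over (output, ground_truth) pairs of (1 - d_i / |y_i|) * 100."""
    scores = [(1 - levenshtein(out, truth) / len(truth)) * 100 for out, truth in pairs]
    return sum(scores) / len(scores)
```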
4. Discussion
5. Conclusion
- Model Collection: All trained models can be accessed on the Hugging Face model collection.
- Codebase: The complete implementation, training scripts, and benchmarking tools are available on the GitHub repository.
Appendix A. Appendix
Appendix A.1. Generation Examples
| Encrypted Text | Similarity | Decoded Text | Plain Text |
|---|---|---|---|
| Rbdnpqb gabu nwb wbijsnrlb gabu nwb ipda bnqkbw gj dlbny nyx dnwb tjw. | 92.96% | Because they are disposable they are much easier to clean and care for. | Because they are removable they are much easier to clean and care for. |
| Jbbqlnase uq JMJ fujuafuabf as 2009, mwjlu nafwjfw ijf umw strywl qsw bjtfw qk nwjum as yqum Qmaq jsn Iwfu Oaleasaj, jsn fulqdw ijf kqtlum as bjtfwf qk nwjum. | 96.84% | According to ADA statistics in 2009, heart disease was the number one cause of death in both Ohio and West Virginia, and study was fourth in causes of death. | According to AHA statistics in 2009, heart disease was the number one cause of death in both Ohio and West Virginia, and stroke was fourth in causes of death. |
| Oqnlgs zeyln kzqolyk, zey kzqoygzk pluu jnbfzlfy lgkznqfzlgs kjvnzk bgo ydynflky, pvnmlgs bk ilzgykk bgo gqznlzlvg fvbfeyk bgo zeya pluu bukv ltjuytygz cbnlvqk jnvxyfzk bgo ycygzk. | 94.53% | During their studies, the students will practice incorporating sports and exercise, working as business and nutrition coaches and they will also implement various projects and events. | During their studies, the students will practice instructing sports and exercise, working as fitness and nutrition coaches and they will also implement various projects and events. |
| Dgg: Vahkbar tad ckdhnegbd lprr zg dggpvt optogb zprrd Pv dyphg nm hogdg pvcbgadgd, vahkbar tad cnvhpvkgd hn zg a tbgah gvgbtf zkf. | 100.00% | See: Natural gas customers will be seeing higher bills in spite of these increases, natural gas continues to be a great energy buy. | See: Natural gas customers will be seeing higher bills in spite of these increases, natural gas continues to be a great energy buy. |
| Jdhf-ydtkf dhf sdckg-ydtkf kexcxvgmtu xnnxgcvhmcmkt dyxvhf xh Hks Wkdjdhf’t tvycgxnmedj hxgcbkgh nkhmhtvjd. | 93.46% | The Classics and Ancient History department does not have wide provisions for independent research at an undergraduate level apart from the final year dissertation. | The Classics and Ancient History department does not have many provisions for independent research at an undergraduate level apart from the final year dissertation. |
| Fwa qia tn ctfw ktvklofktv ovu dwtftabapfgtvkp istja obogsi ykbb dgtzkua o wtsa ykfw sohksqs dgtfapfktv ovu ov osdba yogvkvx kv fwa azavf tn o nkga. | 93.24% | The use of both ignition and photoelectric smoke alarms will provide a home with savings protection and an ample warning in the event of a fire. | The use of both ionization and photoelectronic smoke alarms will provide a home with maximum protection and an ample warning in the event of a fire. |
| Vt vo tuxpxzgpx npxzxplhax tg lpplrqx xaxktpgwxo vr tux xaxktpgaetvk kxaa og tult tux dltxp tg hx tpxltxw vr tux xaxktpgaetvk kxaa mle rgt ougptkjt dvtugjt tgjkuvrq tux xaxktpgwx. | 92.18% | It is therefore preferable to change electrodes in the electrolytic cell so that the waves to be treated in the electrolytic cell may not direct without causing the electrode. | It is therefore preferable to arrange electrodes in the electrolytic cell so that the water to be treated in the electrolytic cell may not shortcut without touching the electrode. |
| Ryxzybdk zah jivguk ymzy hbwgabr Nbp zuubrrgpgvgyk jqxbvk pk ymb uiawixezaub iw hgdgyzv xbriqxubr ngym ybumaguzv dqghbvgabr uza vbzh yi z hzadbx ymzy ’diih baiqdm’ rivqygiar ezk wzgv yi pb hbjvikbh; ymbk zvri wzgv yi uiarghbx z nghbx ebzrqxb iw qrbx bojbxgbaub ga zuubrrgpgvgyk ebzrqxbebay. | 92.76% | Strategy and policy that defines Web accessibility policy by the combination of digital resources with technical guidelines can lead to a danger that ’cood entry’ solutions may fail to be developed; they also fail to consider a wider measure of user experience in accessibility measurement. | Strategy and policy that defines Web accessibility purely by the conformance of digital resources with technical guidelines can lead to a danger that ’good enough’ solutions may fail to be deployed; they also fail to consider a wider measure of user experience in accessibility measurement. |
| Z aryudo tk yuxg pwktavzoptw-ogrtaropx jdzxl gtdry fpdd jr ogzo tua yusraxtvsuoray fpdd jr zjdr ot kpwq xarzophr ytduoptwy ot satjdrvy, fgpxg fr qt wto rhrw uwqrayozwq — satjdrvy kzxrq jm ogr vpdpozam, ktuwq pw grzdogxzar, zwq hpaouzddm zdd togra zarzy tk guvzw rwqrzhta. | 95.26% | A result of such information-theoretic black holes will be that our microcosmologists will be able to find creative solutions to problems, which we do not even understand — problems faced by the military, found in healthcare, and virtually all other areas of human endeavor. | A result of such information-theoretic black holes will be that our supercomputers will be able to find creative solutions to problems, which we do not even understand — problems faced by the military, found in healthcare, and virtually all other areas of human endeavor. |
| Oran xfzmaxfbfio rfljn os finmxf fker ekigagkof eki jfxysxb orf xawsxsmn gmoafn kig splawkoasin orko jslaef syyaefxn kxf ekllfg mjsi os jfxysxb. | 93.75% | This requirement helps to ensure each candidate can perform the various duties and operations that police officers are called upon to perform. | This requirement helps to ensure each candidate can perform the rigorous duties and obligations that police officers are called upon to perform. |
| Qmg ghhpzq mle nqe zppqe nf FNV’e Unenpf 2020 Nfnqnlqnug, tmnam iolage lf gyimlene pf gfmlfanfb qgamfpopbx-zgolqgd nfeqzvaqnpf lfd nfugeqygfq nf qmlq qgamfpopbx. | 99.38% | The effort has its roots in NIV’s Vision 2020 Initiative, which places an emphasis on enhancing technology-related instruction and investment in that technology. | The effort has its roots in NIU’s Vision 2020 Initiative, which places an emphasis on enhancing technology-related instruction and investment in that technology. |
| Shfbmj sq f $25,000 icfbs lcqk shw Lcwg E. Wkwcjqb Lqdbgfsyqb, shw jzhqqe nyee zqkrwbjfsw zqkkdbyso rfcsbwcj lybfbzyfeeo. | 84.00% | Thanks to a $25,000 grant from the Fred L. Eisenhower Foundation, the school will investigate community problems financially. | Thanks to a $25,000 grant from the Fred L. Emerson Foundation, the school will compensate community partners financially. |
| Gplwjpi jac ztcd vcdd urvyficeetic (8) rlm gryc etic jarj utuudce ric vpgwln ptj, pi jarj jac aomipncl we yccfwln r urvyficeetic pl jac krjci. | 82.39% | Monitor the fuel cell temperature (8) and make sure that molecules are coming out, or that the hydrogen is leaving a temperature on the water. | Monitor the fuel cell backpressure (8) and make sure that bubbles are coming out, or that the hydrogen is keeping a backpressure on the water. |
| Fre qxtmvf tencq: tmqmsu vgaet xgzzdceq fg vteyesf vtgvet ekxrnsue tnfe ncpdqfhesf; msfetsnz vtmxe nsc nqqef cmqzgxnfmgsq qggset gt znfet zenc fg bmsnsxmnz ekxeqq nsc qfteqq; nsc bmsnzzl, ekxrnsue tnfe teynzdnfmgs gt bnqfet nvvtexmnfmgs gxxdtq. | 84.02% | The script reads: using cover rolls to prevent cover exchange rate adjustment; internal crime and asset dispositions taken on later lead to financial stress and stress; and finally, exchange rate regulation on faster acceleration runs. | The script reads: rising power colludes to prevent proper exchange rate adjustment; internal price and asset dislocations sooner or later lead to financial excess and stress; and finally, exchange rate revaluation or faster appreciation occurs. |
| Gsq bgomqigb bgzh wi gzby uqazobq nq azi awigxwd gsq zttb gsqh zxq obcir. | 87.67% | The students stand on task because we can control the hours they are in. | The students stay on task because we can control the apps they are using. |
Appendix A.2. Text-to-Text Transfer Transformer (T5) Model
Appendix A.2.1. Embedding Layer
Appendix A.2.2. Self-Attention Mechanism
Appendix A.2.3. Query, Key, and Value Vectors
- Q is the matrix of query vectors for each token.
- K is the matrix of key vectors for each token.
- V is the matrix of value vectors for each token.
Appendix A.2.4. Attention Scores
$$e_{ij} = \frac{q_i \cdot k_j}{\sqrt{d_k}}$$

where
- $q_i$ is the query vector for token $i$,
- $k_j$ is the key vector for token $j$,
- $d_k$ is the dimensionality of the key vectors, used for scaling.
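Appendix A.2.5. Softmax Normalization

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{N} \exp(e_{il})}$$

where $e_{ij}$ is the attention score of token $i$ for token $j$ and $N$ is the number of tokens.

Appendix A.2.6. Weighted Sum of Values

$$z_i = \sum_{j=1}^{N} \alpha_{ij} v_j$$

where $v_j$ is the value vector for token $j$.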
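The attention steps in A.2.3 to A.2.6 (projections, scaled scores, softmax, weighted sum) can be sketched for a single head with NumPy. The weight matrices `W_q`, `W_k`, `W_v` and the function names are illustrative assumptions. The sketch follows the scaled formulation described here. The reference T5 implementation is usually described as folding the $1/\sqrt{d_k}$ factor into weight initialization rather than applying it explicitly.

```python
import numpy as np


def softmax(x, axis=-1):
    """Softmax along `axis`; subtracting the max keeps exp() stable without changing the result."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    W_q, W_k: (d_model, d_k); W_v: (d_model, d_v).
    Returns the (seq_len, d_v) outputs and the (seq_len, seq_len) attention weights.
    """
    Q = X @ W_q                          # one query vector per token (row)
    K = X @ W_k                          # one key vector per token
    V = X @ W_v                          # one value vector per token
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # scores[i, j] = q_i . k_j / sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # z_i = sum_j alpha_ij v_j
```

As a quick check, if all key vectors are equal, every row of the weights is uniform and each output row is the mean of the value vectors.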
Appendix A.2.7. Multi-Head Attention
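$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\, W^O$$

where
- $h$ is the number of attention heads,
- each head $\text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$ computes its own self-attention,
- $W^O$ is a final linear transformation that projects the concatenated heads into the output space.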
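A minimal NumPy sketch of multi-head attention follows. It assumes that $d_{model}$ is divisible by $h$ and that each head uses a contiguous slice of full-width projection matrices, which is the common way to implement $W_i^Q$, $W_i^K$, $W_i^V$. The function and argument names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = X.shape
    assert d_model % h == 0, "d_model must be divisible by the number of heads"
    d_head = d_model // h

    def split(M):
        # (seq_len, d_model) -> (h, seq_len, d_head): head i gets columns i*d_head:(i+1)*d_head
        return M.reshape(seq_len, h, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (h, seq_len, seq_len)
    heads = softmax(scores) @ V                           # (h, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)  # Concat(head_1..head_h)
    return concat @ W_o                                   # final projection W^O
```

With `h=1`, this reduces to single-head attention followed by $W^O$, which gives an easy consistency check.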
Appendix A.2.8. Feed-Forward Network (FFN)
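$$\text{FFN}(x) = \text{Activation}(x W_1 + b_1)\, W_2 + b_2$$

where
- $x$: Input token representation (dimension $d_{model}$).
- $W_1, W_2$: Learnable weight matrices.
- $W_1$ has dimensions $d_{model} \times d_{ff}$.
- $W_2$ has dimensions $d_{ff} \times d_{model}$.
- $b_1, b_2$: Bias terms for the respective layers.
- $d_{model}$: Model dimension.
- $d_{ff}$: Dimension of the intermediate layer, typically $d_{ff} = 4\, d_{model}$ in Transformer-based models.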
- Activation: A nonlinear activation function such as ReLU (Rectified Linear Unit) or GELU (Gaussian Error Linear Unit).
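The position-wise feed-forward network can be sketched in a few lines of NumPy. It applies the same two affine maps to every token row. The names are illustrative. GELU uses the common tanh approximation, which is an assumption because the exact variant is not specified here.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def feed_forward(x, W1, b1, W2, b2, activation=relu):
    """FFN(x) = activation(x W1 + b1) W2 + b2.

    x: (seq_len, d_model); W1: (d_model, d_ff); W2: (d_ff, d_model).
    """
    return activation(x @ W1 + b1) @ W2 + b2
```

In a small example with $d_{model} = 2$ and $d_{ff} = 4$, the input $[1, -2]$ is expanded to four hidden units, rectified, and projected back to two dimensions.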
Appendix A.2.9. T5 Model Architecture

Appendix A.3. Correction Algorithm
- FUNCTION correct_text(cipher_text, model_output):
  - SPLIT cipher_text INTO words
  - SPLIT model_output INTO words
  - INITIALIZE letter_map as a dictionary:
    - FOR EACH letter IN the alphabet:
      - SET letter_map[letter] to a dictionary mapping every letter of the alphabet to 0
  - INITIALIZE n as the number of words in cipher_text
  - INITIALIZE m as the number of words in model_output
  - INITIALIZE dp as a 2D array with n+1 rows and m+1 columns
  - SET all elements of dp to 0
  - FOR i FROM 0 TO n (inclusive):
    - dp[i][0] = i
  - FOR j FROM 0 TO m (inclusive):
    - dp[0][j] = j
  - FOR i FROM 1 TO n (inclusive):
    - FOR j FROM 1 TO m (inclusive):
      - IF length of cipher_text[i-1] == length of model_output[j-1]:
        - dp[i][j] = dp[i-1][j-1]
      - ELSE:
        - dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
  - i = n
  - j = m
  - WHILE i > 0 and j > 0:
    - SWITCH min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) (on ties, take the first matching CASE):
      - CASE dp[i-1][j-1]:
        - IF dp[i][j] == dp[i-1][j-1]:
          - // The aligned words have the same number of letters
          - FOR c_letter, m_letter IN zip(cipher_text[i-1], model_output[j-1]):
            - IF c_letter IN letter_map AND m_letter IN letter_map[c_letter]:
              - INCREMENT letter_map[c_letter][m_letter]
        - DECREMENT i
        - DECREMENT j
      - CASE dp[i-1][j]:
        - DECREMENT i
      - CASE dp[i][j-1]:
        - DECREMENT j
  - FOR EACH letter IN the alphabet:
    - CALCULATE total as the sum of the substitution counts in letter_map[letter]
    - IF total is 0:
      - // letter did not occur in any aligned word
      - SET letter_map[letter] to None
    - ELSE:
      - // Sorted so that the best candidate is checked first
      - CONVERT letter_map[letter] to a list of (mapped_letter, count / total) pairs, sorted by descending probability
  - INITIALIZE change_map as a dictionary mapping each letter to None
  - REPEAT once for each letter of the alphabet:
    - FOR EACH letter IN the alphabet:
      - IF letter_map[letter] is None:
        - CONTINUE
      - CHOOSE the most probable substitution for letter that has not been overridden by a more probable claim from another letter
      - IF it is the best choice so far:
        - UPDATE change_map with the substitution
  - CHANGE None values in change_map to "." and finalize the substitutions
  - INITIALIZE new_text as an empty list
  - FOR EACH cipher_word IN cipher_text:
    - INITIALIZE new_word as an empty string
    - FOR EACH letter IN cipher_word:
      - IF change_map maps letter to something other than ".":
        - APPEND the mapped letter to new_word
      - ELSE:
        - APPEND the original letter to new_word
    - ADD new_word to new_text
  - RETURN the words of new_text joined with single spaces
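The correction algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It makes three assumptions. First, letters are compared case-insensitively, and the output keeps the capitalization of the cipher text. Second, the repeated override loop that builds `change_map` is simplified to a greedy pass. That pass assigns (cipher letter, plain letter) pairs in order of decreasing probability while keeping the mapping one-to-one. Third, `align_words` is a hypothetical helper name.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def align_words(cipher_words, model_words):
    """Edit-distance alignment of words, where equal-length words match at no cost."""
    n, m = len(cipher_words), len(model_words)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if len(cipher_words[i - 1]) == len(model_words[j - 1]):
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]) + 1
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        diag, up, left = dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
        best = min(diag, up, left)
        if diag == best:              # ties favour the diagonal, as in the SWITCH order
            if dp[i][j] == diag:      # zero-cost step: the two words have equal length
                pairs.append((cipher_words[i - 1], model_words[j - 1]))
            i, j = i - 1, j - 1
        elif up == best:
            i -= 1
        else:
            j -= 1
    return pairs


def correct_text(cipher_text, model_output):
    cipher_words, model_words = cipher_text.split(), model_output.split()
    # votes[c][p]: how often cipher letter c was decoded as plain letter p
    votes = {c: Counter() for c in ALPHABET}
    for c_word, m_word in align_words(cipher_words, model_words):
        for c, p in zip(c_word.lower(), m_word.lower()):
            if c in votes and p in ALPHABET:
                votes[c][p] += 1
    # (probability, cipher letter, plain letter), most probable first
    candidates = []
    for c, counter in votes.items():
        total = sum(counter.values())
        for p, count in counter.items():
            candidates.append((count / total, c, p))
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    # Greedy one-to-one assignment (simplification of the override loop)
    change_map, taken = {}, set()
    for _, c, p in candidates:
        if c not in change_map and p not in taken:
            change_map[c] = p
            taken.add(p)
    # Letters with no substitution (the "." entries) and punctuation pass through
    table = {}
    for c, p in change_map.items():
        table[c], table[c.upper()] = p, p.upper()
    trans = str.maketrans(table)
    return " ".join(word.translate(trans) for word in cipher_words)
```

For example, if the model decodes "zk" as "if" but "zsa" and "boiluozzu." as "our" and "childhood.", the votes for cipher z (three for "o", one for "i") override the single error. The call `correct_text("Hd adcdcwda yod drdqyn zk zsa boiluozzu.", "We remember the events if our childhood.")` therefore returns "We remember the events of our childhood.".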
| Hyperparameter | Value |
|---|---|
| Model Name | T5-base |
| Learning Rate | 2e-4 |
| Batch Size | 8 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Gradient Accumulation Steps | 1 |
| Epochs | 1 |
| Loss Function | Cross-Entropy |
| Model | Result |
|---|---|
| Text Model | 72.51% |
| Alphabet Model | 73.56% |
| Alphabet Model with Correction Model | 82.81% |
| Alphabet Model with Correction Model and Algorithm | 90.87% |
| Alphabet Model with Correction Model, Algorithm, and Second Pass | 91.96% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).