Submitted: 09 September 2024
Posted: 11 September 2024
Abstract
Keywords:
1. Introduction
2. Related Works
2.1. Rule-Based Approaches
2.2. Machine Learning-Based Approaches
2.3. Deep Learning-Based Approaches
3. Method
3.1. Data Pre-Processing
- X1: The token indices for each window. Each entry holds the indices of the tokens inside one windowed segment of the input data.
- X2: The character indices. Each entry holds the indices of the individual characters within the same windowed segment.
- Y: The target data, obtained by shifting the X1 indices one step forward, so that it holds the next token index following each windowed segment.
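The windowing described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names `make_token_windows` and `make_char_windows` are hypothetical, Y is assumed to be the X1 window shifted forward by one token, and X2 is assumed to hold one list of character indices per token. The defaults mirror the window size (5) and step size (1) reported in the settings.

```python
from typing import Dict, List, Tuple


def make_token_windows(
    token_ids: List[int], window: int = 5, step: int = 1
) -> Tuple[List[List[int]], List[List[int]]]:
    """Build X1 (input windows) and Y (the same windows shifted by one token)."""
    x1: List[List[int]] = []
    y: List[List[int]] = []
    # Stop early enough that the shifted target window still fits.
    for start in range(0, len(token_ids) - window, step):
        x1.append(token_ids[start:start + window])
        y.append(token_ids[start + 1:start + window + 1])
    return x1, y


def make_char_windows(
    tokens: List[str], char_to_id: Dict[str, int], window: int = 5, step: int = 1
) -> List[List[List[int]]]:
    """Build X2: for each window, the character indices of every token in it."""
    x2: List[List[List[int]]] = []
    for start in range(0, len(tokens) - window, step):
        segment = tokens[start:start + window]
        x2.append([[char_to_id[c] for c in tok] for tok in segment])
    return x2


# Toy data (assumed): seven token ids and their string forms.
ids = [10, 11, 12, 13, 14, 15, 16]
toks = ["a", "b", "c", "ab", "ba", "cc", "a"]
chars = {"a": 0, "b": 1, "c": 2}

X1, Y = make_token_windows(ids)
X2 = make_char_windows(toks, chars)
```

With seven tokens and a window of five, two windows are produced, and each row of Y is the matching row of X1 advanced by one position.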
3.2. AE-Based Password Generator
3.2.1. Training AE
3.2.2. Password Generation Using AE
- Specify the desired number of passwords to be generated.
- Provide the trained model.
- Include the Vocabulary file.
- Include the Characters file.
- Specify the window length.
4. Results and Discussion
4.1. Dataset
4.2. Settings
| Parameter | Value |
|---|---|
| Batch Size | 128 |
| Epochs | 12 |
| Min Frequency | 10 |
| Vocabulary Size | 30000 |
| Window size | 5 |
| Step size | 1 |
| Embedding Size | 300 |
| Number of hidden units | 32 |
| Learning rate | 8e-4 |
| Optimizer | Adam |
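For reproducibility, the settings above could be collected in a single configuration object. The sketch below is only one way to do this; the class name `TrainingConfig` and its field names are assumptions, while the values are taken from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from the settings table (field names are hypothetical)."""
    batch_size: int = 128
    epochs: int = 12
    min_frequency: int = 10
    vocabulary_size: int = 30000
    window_size: int = 5
    step_size: int = 1
    embedding_size: int = 300
    hidden_units: int = 32
    learning_rate: float = 8e-4
    optimizer: str = "Adam"


config = TrainingConfig()
# A variant run with a different number of epochs, leaving the rest unchanged.
config_10 = replace(config, epochs=10)
```

Freezing the dataclass keeps a run's settings immutable, and `replace` produces variants without mutating the baseline.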
4.3. Evaluation Metrics
4.3.1. Duplicate Rate
4.3.2. BLEU Score
4.3.3. HIBP
4.4. Experimental Results
- The proposed model gives consistent results across different numbers of training iterations. Results improve slightly at 11 iterations, particularly on the BLEU and HIBP metrics, while 12 iterations yields fewer duplicate passwords. As the dataset grows, moving from Hotmail to the larger Myspace dataset, the duplicate rate also rises, from an average of 0.0009 to 0.004.
- The proposed model generates exactly the requested number of passwords in every run. GNPassGAN, in contrast, generates fewer passwords than requested in every run.
- On duplicate rate, the proposed model outperforms GNPassGAN on the Hotmail dataset. On the Myspace dataset, however, it shows a slightly higher duplicate rate, averaging 0.004 against 0.0003 for GNPassGAN.
- Increasing the dataset size has minimal effect on quality as measured by BLEU. It does, however, improve the HIBP results, meaning that more of the generated passwords match leaked passwords. This can be attributed to the proposed model exploiting additional patterns in the larger dataset, specifically through its use of BBPE, to generate better passwords.
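The duplicate rates discussed above can be illustrated with a short sketch. The exact formula is not reproduced in this section, so the code assumes the duplicate rate is the fraction of generated passwords that repeat an earlier one; the function name `duplicate_rate` is hypothetical.

```python
from typing import Sequence


def duplicate_rate(passwords: Sequence[str]) -> float:
    """Fraction of generated passwords that repeat an earlier one (assumed definition)."""
    if not passwords:
        return 0.0
    unique = len(set(passwords))
    return (len(passwords) - unique) / len(passwords)


# Toy example: one repeated password among four generated ones.
sample = ["ilovered", "moocow1", "ilovered", "mickey13"]
rate = duplicate_rate(sample)
```

Under this reading, two repeated passwords in a batch of 3000 would give a rate of roughly 0.0007, the same order of magnitude as the values reported in the result tables.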
5. Conclusions
Funding
Data Availability Statement
Acknowledgements
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AE | Autoencoder |
| BBPE | Byte-level Byte Pair Encoding |
| CNNs | Convolutional Neural Networks |
| RNNs | Recurrent Neural Networks |
References
- M.-D. Cano, A. Villafranca, and I. Tasic, Performance Evaluation of Cuckoo Filters as an Enhancement Tool for Password Cracking, Cybersecurity 6, 57 (2023).
- T. Zhang, Z. Cheng, Y. Qin, Q. Li, and L. Shi, Deep Learning for Password Guessing and Password Strength Evaluation, A Survey, in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (IEEE, Guangzhou, China, 2020), pp. 1162–1166.
- B. Hitaj, P. Gasti, G. Ateniese, and F. Perez-Cruz, PassGAN: A Deep Learning Approach for Password Guessing, in Applied Cryptography and Network Security: 17th International Conference, ACNS 2019, Bogota, Colombia, June 5–7, 2019, Proceedings (Springer-Verlag, Berlin, Heidelberg, 2019), pp. 217–237.
- E. Ribeiro de Mello, M. Silva Wangham, S. Bristot Loli, C. E. da Silva, G. Cavalcanti da Silva, S. A. de Chaves, and B. Bristot Loli, Multi-Factor Authentication for Shibboleth Identity Providers, Journal of Internet Services and Applications 11, 8 (2020).
- C. Gehrmann, M. Rodan, and N. Jönsson, Metadata Filtering for User-Friendly Centralized Biometric Authentication, EURASIP Journal on Information Security 2019, 7 (2019).
- K. Lim, J. H. Kang, M. Dixson, H. Koo, and D. Kim, Evaluating Password Composition Policy and Password Meters of Popular Websites, in 2023 IEEE Security and Privacy Workshops (SPW) (2023), pp. 12–20.
- Password Guessing: Learn the Nature of Passwords by Studying the Human Behavior. https://dspace.unive.it/handle/10579/19986.
- V. Belikov and I. Prokuronov, Password Strength Verification Based on Machine Learning Algorithms and LSTM Recurrent Neural Networks, Russian Technological Journal 11, 7 (2023).
- J. Rando, F. Perez-Cruz, and B. Hitaj, PassGPT: Password Modeling and (Guided) Generation with Large Language Models (2024).
- E. Darbutaitė, P. Stefanovič, and S. Ramanauskaitė, Machine-Learning-Based Password-Strength-Estimation Approach for Passwords of Lithuanian Context, Applied Sciences 13, 13 (2023).
- Y. Mo, S. Li, Y. Dong, Z. Zhu, and Z. Li, Password Complexity Prediction Based on RoBERTa Algorithm, Applied Science and Engineering Journal for Advanced Research 3, 3 (2024).
- M. Vainer, Multi-Purpose Password Dataset Generation and Its Application in Decision Making for Password Cracking through Machine Learning, New Trends in Computer Sciences 1, 1 (2023).
- B. N. Vi, N. Ngoc Tran, and T. G. Vu The, A GAN-Based Approach for Password Guessing, in 2021 RIVF International Conference on Computing and Communication Technologies (RIVF) (IEEE, Hanoi, Vietnam, 2021), pp. 1–5.
- Hashcat - Advanced Password Recovery. https://hashcat.net/hashcat/.
- John the Ripper Password Cracker. https://www.openwall.com/john/.
- Z. Xie, M. Zhang, A. Yin, and Z. Li, A New Targeted Password Guessing Model, in Information Security and Privacy: 25th Australasian Conference, ACISP 2020, Perth, WA, Australia, November 30 – December 2, 2020, Proceedings (Springer-Verlag, Berlin, Heidelberg, 2020), pp. 350–368.
- M. Xu, C. Wang, J. Yu, J. Zhang, K. Zhang, and W. Han, Chunk-Level Password Guessing: Towards Modeling Refined Password Composition Representations, in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, NY, USA, 2021), pp. 5–20.
- X. Guo, Y. Liu, K. Tan, W. Mao, M. Jin, and H. Lu, Dynamic Markov Model: Password Guessing Using Probability Adjustment Method, Applied Sciences 11, 4607 (2021).
- H. Cheng, W. Li, P. Wang, and K. Liang, Improved Probabilistic Context-Free Grammars for Passwords Using Word Extraction, in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021), pp. 2690–2694.
- A Large-Scale Analysis of the Semantic Password Model and Linguistic Patterns in Passwords, ACM Transactions on Privacy and Security. https://dl.acm.org/doi/abs/10.1145/3448608.
- J. Xie, H. Cheng, R. Zhu, P. Wang, and K. Liang, WordMarkov: A New Password Probability Model of Semantics, in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022), pp. 3034–3038.
- D. Wang, Y. Zou, Z. Zhang, and K. Xiu, Password Guessing Using Random Forest, USENIX.
- Z. Xie, F. Shi, M. Zhang, H. Ma, H. Wang, Z. Li, and Y. Zhang, GuessFuse: Hybrid Password Guessing With Multi-View, IEEE Transactions on Information Forensics and Security 19, 4215 (2024).
- I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved Training of Wasserstein GANs, in Proceedings of the 31st International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2017), pp. 5769–5779.
- B. Hitaj, P. Gasti, G. Ateniese, and F. Perez-Cruz, PassGAN: A Deep Learning Approach for Password Guessing, arXiv:1709.00440.
- C. Fu, M. Duan, X. Dai, Q. Wei, Q. Wu, and R. Zhou, DenseGAN: A Password Guessing Model Based on DenseNet and PassGAN (2021), pp. 296–305.
- V. Garg and L. Ahuja, Password Guessing Using Deep Learning, in 2019 2nd International Conference on Power Energy, Environment and Intelligent Control (PEEIC) (IEEE, Greater Noida, India, 2019), pp. 38–40.
- F. Yu and M. V. Martin, GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline Password Guessing, in 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (IEEE, Genoa, Italy, 2022), pp. 10–18.
- X. Guo, Y. Liu, K. Tan, M. Jin, and H. Lu, PGGAN: Improve Password Cover Rate Using the Controller, J. Phys.: Conf. Ser. 1856, 012012 (2021).
- Z. Xia, P. Yi, Y. Liu, B. Jiang, W. Wang, and T. Zhu, GENPass: A Multi-Source Deep Learning Model for Password Guessing, IEEE Trans. Multimedia 22, 1323 (2020).
- S. Nam, S. Jeon, and J. Moon, Generating Optimized Guessing Candidates toward Better Password Cracking from Multi-Dictionaries Using Relativistic GAN, Applied Sciences 10, 7306 (2020).
- M. Kaleel and N.-A. Le-Khac, Towards a New Deep Learning Based Approach for the Password Prediction, in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (IEEE, Guangzhou, China, 2020), pp. 1146–1150.
- H. Li, M. Chen, S. Yan, C. Jia, and Z. Li, Password Guessing via Neural Language Modeling, in Machine Learning for Cyber Security: Second International Conference, ML4CS 2019, Xi’an, China, September 19-21, 2019, Proceedings (Springer-Verlag, Berlin, Heidelberg, 2019), pp. 78–93.
- Y. Zhang, H. Xian, and A. Yu, CSNN: Password Guessing Method Based on Chinese Syllables and Neural Network, Peer-to-Peer Networking and Applications 13, 6 (2020).
- D. Huang, Y. Wang, and W. Chen, RLPassGAN: Password Guessing Model Based on GAN with Policy Gradient (2022).
- D. Pasquini, G. Ateniese, and C. Troncoso, Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data, arXiv:2301.07628.
- M. Xu, J. Yu, X. Zhang, C. Wang, S. Zhang, H. Wu, and W. Han, Improving Real-World Password Guessing Attacks via Bi-Directional Transformers, USENIX.
- M. Islam, M. S. Bohuk, P. Chung, T. Ristenpart, and R. Chatterjee, Araña: Discovering and Characterizing Password Guessing Attacks in Practice, USENIX.
- X. Su, X. Zhu, Y. Li, Y. Li, C. Chen, and P. Esteves-Veríssimo, PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer, arXiv:2404.04886.
- D. Wang, Y. Zou, Y.-A. Xiao, S. Ma, and X. Chen, Pass2Edit: A Multi-Step Generative Model for Guessing Edited Passwords, USENIX.
- C. Wang, K. Cho, and J. Gu, Neural Machine Translation with Byte-Level Subwords, Proceedings of the AAAI Conference on Artificial Intelligence 34, 9154 (2020).
- SecLists/Passwords/Leaked-Databases/Myspace.Txt at Master · Danielmiessler/SecLists. https://github.com/danielmiessler/SecLists/blob/master/Passwords/Leaked-Databases/myspace.txt.
- SecLists/Passwords/Leaked-Databases/Hotmail.Txt at Master · Danielmiessler/SecLists. https://github.com/danielmiessler/SecLists/blob/master/Passwords/Leaked-Databases/hotmail.txt.






| Model | # Iterations | Time on Hotmail (minutes) | Time on Myspace (minutes) |
|---|---|---|---|
| BBPE_AE | 10 | 27 | 103 |
| BBPE_AE | 11 | 28 | 121 |
| BBPE_AE | 12 | 30 | 142 |
| GNPassGAN | 3000 | 61 | 140 |
| GNPassGAN | 4000 | 150 | 162 |
| GNPassGAN | 5000 | 186 | 140 |


| Model | Dataset | # Iterations | # Passwords to Generate | # Generated Passwords | Duplicate Rate | HIBP | BLEU (Unigram) | BLEU (Bigram) |
|---|---|---|---|---|---|---|---|---|
| GNPassGAN | Hotmail | 5000 | 3000 | 2944 | 0.0023 | 574 | 0.88 | 0.79 |
| GNPassGAN | Hotmail | 5000 | 2000 | 1984 | 0.0020 | 383 | 0.88 | 0.78 |
| GNPassGAN | Hotmail | 5000 | 1000 | 960 | 0 | 186 | 0.87 | 0.76 |
| GNPassGAN | Hotmail | 4000 | 3000 | 2944 | 0.0073 | 496 | 0.87 | 0.77 |
| GNPassGAN | Hotmail | 4000 | 2000 | 1984 | 0.0045 | 332 | 0.87 | 0.77 |
| GNPassGAN | Hotmail | 4000 | 1000 | 960 | 0.0040 | 164 | 0.85 | 0.75 |
| GNPassGAN | Hotmail | 3000 | 3000 | 2944 | 0.0076 | 379 | 0.86 | 0.76 |
| GNPassGAN | Hotmail | 3000 | 2000 | 1984 | 0.0065 | 257 | 0.86 | 0.75 |
| GNPassGAN | Hotmail | 3000 | 1000 | 960 | 0.0050 | 134 | 0.85 | 0.73 |
| GNPassGAN | Hotmail | Average | | | 0.0043 | 332.7 | 0.86 | 0.76 |
| GNPassGAN | Myspace | 5000 | 3000 | 2944 | 0 | 423 | 0.88 | 0.78 |
| GNPassGAN | Myspace | 5000 | 2000 | 1984 | 0 | 283 | 0.87 | 0.77 |
| GNPassGAN | Myspace | 5000 | 1000 | 960 | 0 | 153 | 0.86 | 0.76 |
| GNPassGAN | Myspace | 4000 | 3000 | 2944 | 0.0016 | 341 | 0.88 | 0.76 |
| GNPassGAN | Myspace | 4000 | 2000 | 1984 | 0.0015 | 235 | 0.87 | 0.77 |
| GNPassGAN | Myspace | 4000 | 1000 | 960 | 0 | 116 | 0.86 | 0.76 |
| GNPassGAN | Myspace | 3000 | 3000 | 2944 | 0.0003 | 386 | 0.88 | 0.78 |
| GNPassGAN | Myspace | 3000 | 2000 | 1984 | 0 | 263 | 0.87 | 0.77 |
| GNPassGAN | Myspace | 3000 | 1000 | 960 | 0 | 127 | 0.86 | 0.75 |
| GNPassGAN | Myspace | Average | | | 0.0003 | 258.5 | 0.87 | 0.76 |


| Model | Dataset | # Iterations | # Passwords to Generate | # Generated Passwords | Duplicate Rate | HIBP | BLEU (Unigram) | BLEU (Bigram) |
|---|---|---|---|---|---|---|---|---|
| BBPE_AE | Hotmail | 12 | 3000 | 3000 | 0.0006 | 719 | 0.90 | 0.82 |
| BBPE_AE | Hotmail | 12 | 2000 | 2000 | 0.0005 | 472 | 0.90 | 0.81 |
| BBPE_AE | Hotmail | 12 | 1000 | 1000 | 0.0010 | 249 | 0.85 | 0.79 |
| BBPE_AE | Hotmail | 11 | 3000 | 3000 | 0.0010 | 790 | 0.90 | 0.82 |
| BBPE_AE | Hotmail | 11 | 2000 | 2000 | 0.0015 | 522 | 0.90 | 0.81 |
| BBPE_AE | Hotmail | 11 | 1000 | 1000 | 0.0010 | 252 | 0.89 | 0.79 |
| BBPE_AE | Hotmail | 10 | 3000 | 3000 | 0.0006 | 701 | 0.90 | 0.82 |
| BBPE_AE | Hotmail | 10 | 2000 | 2000 | 0.0015 | 443 | 0.90 | 0.80 |
| BBPE_AE | Hotmail | 10 | 1000 | 1000 | 0.0010 | 217 | 0.89 | 0.80 |
| BBPE_AE | Hotmail | Average | | | 0.0009 | 485 | 0.89 | 0.80 |
| BBPE_AE | Myspace | 12 | 3000 | 3000 | 0.0033 | 955 | 0.90 | 0.81 |
| BBPE_AE | Myspace | 12 | 2000 | 2000 | 0.0030 | 671 | 0.89 | 0.80 |
| BBPE_AE | Myspace | 12 | 1000 | 1000 | 0.0020 | 346 | 0.88 | 0.78 |
| BBPE_AE | Myspace | 11 | 3000 | 3000 | 0.0043 | 1006 | 0.90 | 0.81 |
| BBPE_AE | Myspace | 11 | 2000 | 2000 | 0.0045 | 674 | 0.89 | 0.80 |
| BBPE_AE | Myspace | 11 | 1000 | 1000 | 0.0050 | 316 | 0.88 | 0.78 |
| BBPE_AE | Myspace | 10 | 3000 | 3000 | 0.0060 | 940 | 0.90 | 0.81 |
| BBPE_AE | Myspace | 10 | 2000 | 2000 | 0.0050 | 632 | 0.89 | 0.80 |
| BBPE_AE | Myspace | 10 | 1000 | 1000 | 0.0040 | 311 | 0.88 | 0.78 |
| BBPE_AE | Myspace | Average | | | 0.0040 | 650.1 | 0.89 | 0.79 |

| Generated Password (Hotmail) | HIBP Result | Generated Password (Myspace) | HIBP Result |
|---|---|---|---|
| ilovered | 2633 | moocow1 | 15320 |
| 1234567 | 4059875 | mickey13 | 10968 |
| superbeta | 60 | P32069 | 3 |
| alexarun | 25 | ilovebill1 | 1882 |
| moricita | 5 | Davy1972 | 1 |
| marita15 | 187 | coolmam1 | 13 |
| babygirl18 | 13611 | something1 | 39316 |
| bettyteam | 1 | madman13 | 1401 |
| carlos1991 | 1028 | Neston12 | 3 |
| CHARINI18 | 1 | chrisy11 | 118 |
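
The counts in the table above are the kind of value returned by the Pwned Passwords range endpoint of Have I Been Pwned (`https://api.pwnedpasswords.com/range/{prefix}`). The sketch below shows how such a lookup could work, assuming that endpoint was used; this section does not state how the counts were obtained. Only the first five hexadecimal characters of the SHA-1 hash are sent, and the response lists `SUFFIX:COUNT` pairs to match locally. The function names are hypothetical, and the network request is deliberately left out so that the example runs offline.

```python
import hashlib
from typing import Tuple


def hibp_prefix_suffix(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into the 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range_response(suffix: str, body: str) -> int:
    """Return the breach count for `suffix` from a range response body, or 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = hibp_prefix_suffix("password")
# A made-up response body in the SUFFIX:COUNT format, used instead of a live request.
fake_body = "0" * 35 + ":3\r\n" + suffix + ":42\r\n"
seen = count_from_range_response(suffix, fake_body)
```

In a real run, `prefix` would be appended to the range URL and the returned body passed to `count_from_range_response`; a result of 0 means the password does not appear in the returned list.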
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).