Submitted: 25 March 2025
Posted: 26 March 2025
Abstract

Keywords:
1. Introduction


2. ByteCraft
2.1. Architecture
2.2. Tokenization
2.3. Data Augmentation on Prompts
2.4. Training
2.5. Usage
3. Results
4. Potential Future Improvements
4.1. Scaling
4.2. Reinforcement Learning
4.3. Test-time Compute
4.4. Better Generalization on Small Data Using Data Augmentations on Bytes
5. Conclusions

| Classification | | | | |
| --- | --- | --- | --- | --- |
| Fully broken (wrong format) | 20.4% | 26.4% | 20.4% | 26.0% |
| Blank canvas (flat color) | 69.6% | 62.8% | 67.2% | 68.0% |
| Stuck on loading screen | 14.4% | 14.4% | 14.0% | 10.8% |
| Showing/hearing something | 14.8% | 15.6% | 18.0% | 14.0% |
| Fully working | 10.4% | 10.8% | 10.4% | 11.2% |
| Max similarity | 0.790 | 0.790 | 0.789 | 0.791 |
| Truth similarity | 0.627 | 0.628 | 0.628 | 0.626 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).