Submitted: 06 July 2025
Posted: 07 July 2025
Abstract
Keywords:
Chapter One: Introduction
1.1. Background of the Study
1.2. Problem Statement
1.3. Objectives of the Study
- Map out the historical and technical evolution of NLP methods used in creative text generation.
- Identify and categorize the main application domains of NLP in creative writing, such as poetry, storytelling, and stylistic transformation.
- Analyze the evaluation frameworks used to assess computational creativity, highlighting their strengths and limitations.
- Examine interdisciplinary perspectives from linguistics, literary theory, cognitive science, and philosophy that inform or challenge the understanding of machine creativity.
- Address ethical, cultural, and epistemological concerns surrounding AI-authored content.
- Propose future research directions for more cognitively grounded, socially aware, and ethically aligned creative NLP systems.
1.4. Research Questions
- What NLP techniques and architectures have been most commonly used for computational creativity tasks?
- In what ways have these models demonstrated creative capabilities across different literary or linguistic genres?
- What are the current evaluation metrics for assessing computational creativity in NLP, and how adequate are they?
- How do scholars and practitioners define and interpret creativity in the context of machine-generated text?
- What are the major limitations, risks, and ethical challenges associated with the rise of AI in creative language generation?
1.5. Scope of the Study
1.6. Significance of the Study
- For AI researchers, it provides a consolidated framework to understand the trajectory and limitations of NLP in creative tasks, as well as future research directions that prioritize cognition and ethics.
- For digital humanities scholars, it offers insights into how machine-generated literature can be analyzed, categorized, and contextualized within broader literary traditions.
- For educators and creators, it highlights opportunities for human-AI collaboration in writing, education, and creative production, offering guidance on practical implementation.
- For policymakers and ethicists, it identifies pressing concerns about authorship, originality, intellectual property, and the cultural ramifications of synthetic content proliferation.
1.7. Organization of the Study
- Chapter One introduces the study, including its background, problem statement, objectives, research questions, scope, and significance.
- Chapter Two presents a conceptual and theoretical framework, reviewing key definitions and perspectives on creativity, language, and AI.
- Chapter Three outlines the methodology for the systematic review, including selection criteria, inclusion/exclusion strategies, and data synthesis procedures.
- Chapter Four summarizes and synthesizes the findings across categories such as model types, genres, evaluation strategies, and limitations.
- Chapter Five offers a critical discussion of the implications of these findings, addressing both technical and philosophical dimensions.
- Chapter Six concludes the review with key insights, contributions, recommendations, and directions for future research.
Chapter Two: Literature Review and Theoretical Framework
2.1. Introduction
2.2. Defining Creativity in Human and Computational Contexts
2.2.1. Human Creativity: A Multidimensional Construct
2.2.2. Computational Creativity: Scope and Challenges
2.3. Historical Overview of NLP in Creative Applications
2.3.1. Rule-Based and Symbolic Systems
2.3.2. Statistical and N-Gram Models
2.3.3. Neural Networks and Deep Learning
2.3.4. Transformer-Based Models
2.4. Applications of NLP in Creative Writing Tasks
2.4.1. Poetry Generation
2.4.2. Story Generation
2.4.3. Metaphor and Humor Generation
2.4.4. Style Transfer and Emulation
2.5. Evaluation of Creativity in NLP Systems
2.5.1. Quantitative Metrics
2.5.2. Qualitative and Human-Centered Evaluation
2.5.3. Hybrid Approaches
2.6. Human-AI Collaboration in Creative Writing
2.7. Ethical, Cultural, and Philosophical Considerations
2.7.1. Bias and Cultural Homogenization
2.7.2. Authorship and Originality
2.7.3. Impact on Creative Labor
2.8. Gaps in the Literature
- A lack of unified theoretical frameworks for evaluating machine creativity across genres and languages.
- Limited exploration of non-Western literary traditions in training and evaluation data.
- Underdeveloped ethical frameworks for attribution, ownership, and creative equity.
- Insufficient long-term studies on how human writers adapt to or are influenced by co-creative AI tools.
2.9. Summary
Chapter Three: Methodology
3.1. Introduction
3.2. Research Design
3.3. Research Questions
- What NLP techniques have been used in the domain of computational creativity, and how have they evolved?
- What are the major application domains of NLP for creative text generation, such as poetry, storytelling, and stylistic emulation?
- How is creativity evaluated in NLP-generated content across different genres and tasks?
- What are the key limitations, biases, and ethical challenges associated with NLP-driven creative systems?
- What interdisciplinary perspectives contribute to understanding creativity in computational contexts, and what gaps remain in current scholarship?
3.4. Eligibility Criteria
3.4.1. Inclusion Criteria
- Temporal Scope: Published between January 2005 and December 2024.
- Language: Published in English.
- Content Relevance: The study must explicitly focus on NLP techniques applied to creative text generation (e.g., poetry, storytelling, metaphor, narrative simulation).
- Publication Type: Peer-reviewed journal articles and conference proceedings, as well as technical reports, theses, and preprints.
- Disciplinary Scope: Includes research from computer science, computational linguistics, cognitive science, digital humanities, literary studies, and AI ethics.
- Model Scope: Covers symbolic AI, statistical models, neural networks, and transformer-based models.
3.4.2. Exclusion Criteria
- Studies not focused on text-based creativity (e.g., visual art or music generation without NLP involvement).
- Papers solely centered on utility NLP applications (e.g., translation, summarization, question answering) without creative intent.
- Non-scholarly sources such as blogs, opinion essays, or product reviews.
- Studies published in languages other than English.
- Duplicate publications or works without accessible full texts.
3.5. Search Strategy
- IEEE Xplore
- ACM Digital Library
- PubMed
- Google Scholar
- Scopus
- Web of Science
- arXiv.org
- Preprints.org
- SpringerLink
- ScienceDirect
3.5.1. Search Terms
- “natural language processing” AND “creative writing”
- “computational creativity” AND “poetry generation”
- “language model” AND (“creativity” OR “storytelling”)
- “AI authorship” AND “GPT”
- “machine learning” AND “creative text generation”
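Each search string above combines quoted phrases with AND and OR. One way to make that Boolean structure explicit is to encode each string as a list of AND-groups, where each group lists its OR-alternatives. This is useful, for example, when re-screening exported titles and abstracts locally. The Python sketch below does this. The query labels and helper names are hypothetical, and the sketch does not reproduce any database's own query syntax, field codes, or stemming.

```python
import re

# Hypothetical local encoding of the search strings (not any database's syntax):
# a list of AND-groups, where each group lists OR-alternatives (exact phrases).
Query = list[list[str]]

QUERIES: dict[str, Query] = {
    "nlp_creative_writing": [["natural language processing"], ["creative writing"]],
    "cc_poetry": [["computational creativity"], ["poetry generation"]],
    "lm_creativity_storytelling": [["language model"], ["creativity", "storytelling"]],
    "ai_authorship_gpt": [["ai authorship"], ["gpt"]],
    "ml_creative_text": [["machine learning"], ["creative text generation"]],
}


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive exact-phrase match on word boundaries (no stemming)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches(text: str, query: Query) -> bool:
    """True if every AND-group has at least one OR-alternative present in the text."""
    return all(any(contains_phrase(text, alt) for alt in group) for group in query)


def matching_queries(text: str) -> list[str]:
    """Labels of all search strings that a title or abstract satisfies."""
    return [label for label, query in QUERIES.items() if matches(text, query)]


if __name__ == "__main__":
    print(matching_queries("A language model for interactive storytelling"))
```

Matching uses exact phrases with word boundaries. As a result, "language models" does not match the phrase "language model". Databases that apply stemming may therefore return more records than this filter would.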
3.6. Selection Process
3.6.1. PRISMA Flow Diagram
- Identification: 1,376 records were identified through database searching. An additional 53 records were retrieved through backward reference searching.
- Screening: After removing 411 duplicates, 1,018 articles were screened by title and abstract.
- Eligibility: 247 full-text articles were assessed for eligibility based on inclusion criteria.
- Inclusion: 112 studies were ultimately included in the qualitative synthesis.
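The stage counts reported above are internally consistent. The short Python sketch below re-derives the totals and the number of records excluded at each stage, which the flow diagram implies but does not state. The function name `prisma_counts` is illustrative. Because the reasons for exclusion are not reported here, only the totals can be recovered.

```python
def prisma_counts(database_records: int, reference_records: int, duplicates: int,
                  full_text_assessed: int, included: int) -> dict[str, int]:
    """Derive PRISMA stage totals and per-stage exclusions from reported counts (illustrative helper)."""
    identified = database_records + reference_records
    screened = identified - duplicates
    return {
        "identified": identified,
        "screened": screened,
        "excluded_at_screening": screened - full_text_assessed,
        "full_text_assessed": full_text_assessed,
        "excluded_at_full_text": full_text_assessed - included,
        "included": included,
    }


if __name__ == "__main__":
    counts = prisma_counts(1376, 53, 411, 247, 112)
    for stage, n in counts.items():
        print(f"{stage:<24} {n:>6,}")
```

With the reported figures, this gives 1,429 records identified and 1,018 screened. Of these, 771 were excluded at title/abstract screening and 135 at the full-text stage.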
3.7. Data Extraction and Coding
- Title and authors
- Year of publication
- Type of publication (journal, conference, technical report, thesis, preprint)
- NLP technique or model used
- Creative task (e.g., poetry, narrative, metaphor)
- Evaluation method
- Key findings
- Identified limitations
- Ethical considerations
- Disciplinary perspective (e.g., AI, linguistics, literature)
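The coding sheet above maps naturally onto a typed record. The Python sketch below shows one hypothetical way to represent it. The field names are illustrative, not taken from the review's actual extraction form. The validation rules simply re-apply the January 2005 to December 2024 window and the publication types listed in the inclusion criteria.

```python
from dataclasses import asdict, dataclass, field

# Assumption: publication types and years mirror the inclusion criteria (Section 3.4.1).
PUBLICATION_TYPES = {"journal", "conference", "technical report", "thesis", "preprint"}
INCLUSION_YEARS = range(2005, 2025)  # January 2005 to December 2024


@dataclass
class ExtractionRecord:
    """Hypothetical coding-sheet entry for one included study."""
    title: str
    authors: list[str]
    year: int
    publication_type: str
    technique: str                      # NLP technique or model used
    creative_task: str                  # e.g. "poetry", "narrative", "metaphor"
    evaluation_method: str
    key_findings: str
    limitations: list[str] = field(default_factory=list)
    ethical_considerations: list[str] = field(default_factory=list)
    disciplinary_perspective: str = ""  # e.g. "AI", "linguistics", "literature"

    def __post_init__(self) -> None:
        # Reject records that fall outside the review's eligibility criteria.
        if self.year not in INCLUSION_YEARS:
            raise ValueError(f"year {self.year} is outside the 2005-2024 window")
        if self.publication_type not in PUBLICATION_TYPES:
            raise ValueError(f"unknown publication type: {self.publication_type!r}")


if __name__ == "__main__":
    record = ExtractionRecord(
        title="Example study (hypothetical)",
        authors=["Author, A."],
        year=2021,
        publication_type="conference",
        technique="fine-tuned transformer",
        creative_task="poetry",
        evaluation_method="Likert-scale human ratings",
        key_findings="(summary of findings)",
        disciplinary_perspective="AI",
    )
    print(asdict(record))
```

Validating in `__post_init__` means that an out-of-scope study is flagged at entry time rather than during synthesis.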
3.8. Data Synthesis Strategy
- Model Type and Architecture
- Creative Application Domain
- Evaluation Metrics and Frameworks
- Human-AI Interaction in Creativity
- Ethical and Cultural Dimensions
3.9. Quality Assessment
- Clarity of research objectives
- Transparency in model description and implementation
- Relevance and robustness of evaluation strategy
- Consideration of limitations and biases
- Ethical reflection (where applicable)
3.10. Ethical Considerations
- Respecting intellectual property rights in citations and references.
- Critical evaluation of biases and unethical practices in the reviewed systems (e.g., biased training datasets or the propagation of misinformation).
- Disclosure of conflicts of interest, especially in studies funded by AI companies or containing promotional bias.
3.11. Limitations of the Methodology
- Publication Bias: Studies with successful model outcomes may be overrepresented.
- Language Bias: The review is limited to English-language publications.
- Evolving Field: Given the rapid pace of NLP developments, some 2024 publications may not have been indexed at the time of data collection.
- Conceptual Ambiguity: Creativity remains a contested term, which may lead to inconsistencies in what is included under "creative" tasks.
3.12. Summary
Chapter Four: Results and Thematic Synthesis
4.1. Introduction
4.2. NLP Architectures in Computational Creativity
4.2.1. Symbolic and Rule-Based Approaches (2005–2012)
4.2.2. Statistical Models and N-Grams (2010–2016)
4.2.3. Neural Networks and RNNs (2015–2018)
4.2.4. Transformer Models and Pre-Trained LLMs (2018–2024)
- GPT-3 and ChatGPT: Used extensively for generating sonnets, short stories, plays, and metaphors with minimal prompting.
- CTRL: Conditions generation on control codes (tags such as a domain or style label placed at the start of the input) to steer stylistic direction, sentiment, or narrative voice.
- MuseNet and Jukebox: Though focused on music, these models contribute to multimodal creative synthesis.
- T5-Creative: A fine-tuned version of Google’s T5 used specifically for metaphor, humor, and story rewriting.
4.3. Application Domains
4.3.1. Poetry Generation
- Form-Constrained Generation: Haikus, limericks, and sonnets using strict syllable or rhyme constraints (e.g., Lau et al., 2018).
- Emotionally Conditioned Verse: Use of sentiment classifiers to generate poems with intended emotional tone (e.g., Ghazvininejad et al., 2017).
- Style Transfer in Poetry: Mimicking famous poets like Shakespeare or Rumi by fine-tuning on small, curated corpora.
- Interactive Poetry Assistants: Tools that co-write with users in real time (e.g., PoemPortraits, CoPoet).
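Form constraints such as the 5-7-5 haiku pattern can be enforced by checking candidate lines and discarding those that do not fit. The Python sketch below estimates syllables with a crude vowel-group heuristic. This heuristic is an assumption made for illustration, not the method of any system cited here. A real system would more likely look words up in a pronunciation dictionary such as the CMU Pronouncing Dictionary, which also supports rhyme checks.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Rough English syllable estimate (heuristic assumption): count vowel groups, drop a silent final 'e'."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return 0
    count = len(VOWEL_GROUPS.findall(letters))
    # "stone" -> 1, but keep the syllable in "-le" endings such as "little"
    if letters.endswith("e") and not letters.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def line_syllables(line: str) -> int:
    return sum(estimate_syllables(word) for word in line.split())


def fits_pattern(lines: list[str], pattern: tuple[int, ...] = (5, 7, 5)) -> tuple[bool, list[int]]:
    """Check candidate lines against a syllable pattern (5-7-5 for a haiku)."""
    counts = [line_syllables(line) for line in lines]
    return counts == list(pattern), counts


if __name__ == "__main__":
    ok, counts = fits_pattern(["An old silent pond", "A frog jumps into the pond", "Splash! Silence again."])
    print(ok, counts)
```

The heuristic miscounts many English words, for example words whose adjacent vowels are pronounced separately, such as "poem". It is therefore only suitable as a rough filter in a generate-and-check loop.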
4.3.2. Story Generation
- Hierarchical Story Generation (Fan et al., 2018): First generates a short premise, then conditions the full story on that premise.
- AI Dungeon: Open-ended narrative co-creation with GPT-based models.
- Event2Story and PlotMachines: Focus on character development and causal event chains.
- Narrative GANs: Attempts at using adversarial training for plot unpredictability.
4.3.3. Figurative Language and Metaphor
- Conceptual Blending Models (Veale & Hao, 2008): For metaphor and analogical reasoning.
- Neural Pun Generation: Leveraging homophones and syntactic ambiguity.
- Emotion-Driven Metaphor Creation: Using affective vectors to produce novel expressions.
4.3.4. Stylistic Transfer and Emulation
- Fine-tuning on Author-Specific Corpora: Training models on Jane Austen, Edgar Allan Poe, or Toni Morrison.
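- Latent Space Manipulation: Moving between styles by adjusting latent representations, for example by interpolating between style embeddings or adding a style direction to a text's vector.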
- Prompt Engineering: Using targeted cues to steer outputs toward a genre or tone.
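Latent space manipulation is often illustrated with simple vector arithmetic. A "style direction" is taken as the difference between the mean embeddings of two corpora, and a text's vector is moved along it or interpolated toward another style. The Python sketch below shows only this arithmetic. It assumes a hypothetical encoder that maps texts to vectors and a decoder that can generate from a modified vector (as in some autoencoder-based systems); neither is shown, and the toy 2-D vectors stand in for real embeddings.

```python
from typing import Sequence

Vector = list[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def style_direction(source: Sequence[Vector], target: Sequence[Vector]) -> Vector:
    """Difference between the mean embeddings of a target-style and a source-style corpus."""
    src, tgt = mean_vector(source), mean_vector(target)
    return [t - s for s, t in zip(src, tgt)]


def shift_style(vector: Vector, direction: Vector, strength: float) -> Vector:
    """Move a text's embedding along the style direction; strength 0 leaves it unchanged."""
    return [x + strength * d for x, d in zip(vector, direction)]


def interpolate(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real system would obtain these from a (hypothetical)
    # encoder and pass the shifted vector to a decoder, neither of which is shown.
    source_style = [[1.0, 0.0], [3.0, 0.0]]
    target_style = [[0.0, 2.0], [0.0, 4.0]]
    direction = style_direction(source_style, target_style)
    print(shift_style([2.0, 0.0], direction, 0.5))
    print(interpolate([0.0, 0.0], [2.0, 4.0], 0.25))
```

The `strength` parameter controls how far a text is pushed toward the target style. Intermediate values correspond to the gradual "moving between styles" described above.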
4.4. Evaluation Methods for Computational Creativity
4.4.1. Quantitative Metrics
- BLEU and ROUGE: Used to assess overlap with reference texts but criticized for punishing novelty.
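- Perplexity: Measures how predictable a text is to a language model (lower values indicate more predictable text) and is often used as a proxy for fluency; in creative writing, higher perplexity sometimes signals novelty rather than error.
- Distinct-N: Evaluates lexical diversity as the proportion of unique n-grams among all n-grams in a set of generated texts.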
- MAUVE (Pillutla et al., 2021): Compares distributional properties of human and machine-generated text.
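To make the quantitative metrics above concrete, the Python sketch below implements three of them:

- Distinct-N: unique n-grams divided by total n-grams, pooled over a set of outputs.
- Perplexity: computed from per-token natural-log probabilities that some language model is assumed to have produced.
- MAUVE: only the area-under-the-divergence-curve step at its core.

For MAUVE, the sketch assumes that human and machine texts have already been embedded and quantized into cluster histograms `p` (human) and `q` (machine); the embedding and clustering steps are omitted. The scaling constant `c = 5` and the grid of mixture weights are illustrative choices.

```python
import math
from typing import Sequence


def distinct_n(texts: Sequence[str], n: int) -> float:
    """Unique n-grams divided by total n-grams, pooled over all texts (whitespace tokens)."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mauve_from_histograms(p: Sequence[float], q: Sequence[float],
                          c: float = 5.0, grid_size: int = 99) -> float:
    """Area under the divergence curve traced by the mixtures r = w*p + (1 - w)*q.

    Assumes p and q are pre-computed cluster histograms (quantization not shown).
    """
    points = [(1.0, 0.0)]  # limiting point as w -> 0
    for k in range(1, grid_size + 1):
        w = k / (grid_size + 1)
        r = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl_divergence(q, r)), math.exp(-c * kl_divergence(p, r))))
    points.append((0.0, 1.0))  # limiting point as w -> 1
    # x is non-increasing along the curve, so the trapezoid rule can follow point order.
    return sum((x0 - x1) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    print(distinct_n(["the cat sat on the mat"], 1))   # 5 unique of 6 unigrams
    print(perplexity([math.log(0.5)] * 4))             # each token had probability 0.5
    print(mauve_from_histograms([0.25] * 4, [0.25] * 4))  # identical histograms
```

Identical histograms give a MAUVE score of 1. Histograms with disjoint support give a score near 0, which is why MAUVE rewards outputs whose overall distribution resembles human text rather than overlap with a single reference.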
4.4.2. Human-Centered Evaluation
- Likert Scales: Rating creativity, coherence, and emotional impact.
- Turing Test Variants: Participants asked to distinguish human vs. machine-written content.
- Expert Panels: Involvement of poets, authors, and critics for richer feedback.
- Crowdsourcing Platforms: Amazon Mechanical Turk used for scalability.
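Turing-test variants are usually summarized as the judges' identification accuracy and compared with chance. The Python sketch below assumes a balanced design, so chance is 50%. It also treats every judgment as independent, which is optimistic when the same judge rates many items. On that basis it applies an exact two-sided binomial test and also summarizes Likert ratings. The function names are illustrative.

```python
import statistics
from math import comb
from typing import Sequence


def identification_accuracy(judgments: Sequence[tuple[str, str]]) -> float:
    """Share of (true_source, guessed_source) pairs where the judge guessed correctly."""
    correct = sum(1 for truth, guess in judgments if truth == guess)
    return correct / len(judgments)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: probability of outcomes no more likely than k.

    Assumes independent judgments and a chance rate p (0.5 for a balanced design).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(prob for prob in pmf if prob <= threshold))


def likert_summary(ratings: Sequence[int]) -> dict[str, float]:
    """Mean and median of ordinal ratings (e.g. 1-5 creativity scores)."""
    return {"mean": statistics.mean(ratings), "median": statistics.median(ratings)}


if __name__ == "__main__":
    judgments = [("human", "human"), ("machine", "human"),
                 ("machine", "machine"), ("human", "machine")]
    print(identification_accuracy(judgments))
    print(binomial_two_sided_p(9, 10))  # 9 of 10 correct identifications
```

Accuracy close to 50% with a large p-value is consistent with judges being unable to tell human and machine texts apart. Because Likert data are ordinal, the median is often reported alongside, or instead of, the mean.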
4.5. Human-AI Co-Creation
4.5.1. Collaboration Frameworks
- Brainstorm ideas.
- Generate stylistic variants.
- Expand narrative arcs.
- Provide linguistic inspiration.
4.5.2. User Experience and Control
- Undo suggestions.
- Guide narrative tone.
- Understand model decisions.
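Controls such as undoing suggestions are easier to support when the editor tracks which passages came from the model. The same provenance record can later support the attribution and disclosure questions raised below. The Python sketch below is a hypothetical data structure, not taken from any tool reviewed here. It keeps segments tagged as human- or AI-written, lets the user undo the most recently accepted suggestion, and reports the AI share of the text.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality so undo removes the exact segment accepted
class Segment:
    text: str
    source: str  # "human" or "ai"


@dataclass
class CoWritingSession:
    """Hypothetical co-writing buffer with provenance and suggestion undo."""
    segments: list[Segment] = field(default_factory=list)
    accepted: list[Segment] = field(default_factory=list)  # AI segments, newest last

    def write(self, text: str) -> None:
        self.segments.append(Segment(text, "human"))

    def accept_suggestion(self, text: str) -> None:
        segment = Segment(text, "ai")
        self.segments.append(segment)
        self.accepted.append(segment)

    def undo_last_suggestion(self) -> bool:
        """Remove the most recently accepted AI suggestion; False if there is none."""
        if not self.accepted:
            return False
        self.segments.remove(self.accepted.pop())
        return True

    def text(self) -> str:
        return " ".join(segment.text for segment in self.segments)

    def ai_share(self) -> float:
        """Fraction of characters contributed by accepted AI suggestions."""
        total = sum(len(s.text) for s in self.segments)
        ai = sum(len(s.text) for s in self.segments if s.source == "ai")
        return ai / total if total else 0.0


if __name__ == "__main__":
    session = CoWritingSession()
    session.write("The lighthouse was dark.")
    session.accept_suggestion("Waves hammered the rocks.")
    session.write("Mara climbed the stairs.")
    print(session.ai_share())
    session.undo_last_suggestion()
    print(session.text())
```

Removing by identity rather than by text means that undo still targets the correct segment when the same suggestion has been accepted twice.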
4.6. Ethical and Cultural Implications
4.6.1. Bias in Training Data
- Gender and racial bias in generated characters and themes.
- Cultural homogenization, especially when training corpora are dominated by Western literature.
- Stereotype reinforcement through trope repetition in genre fiction.
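A very simple way to probe gender bias in generated characters is to count gendered pronouns in outputs that mention a given role, such as "engineer". The Python sketch below is a crude lexical probe under stated assumptions, not a validated bias measure:

- It counts only explicit English pronouns and ignores names and non-binary references.
- The example sentences in the test are hypothetical model outputs.
- Racial bias would need different probes, for example names or dialect features.

```python
import re
from collections import Counter
from typing import Optional, Sequence

# Assumption: a small fixed English pronoun lexicon; real audits need broader coverage.
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}


def pronoun_counts_for_role(texts: Sequence[str], role: str) -> Counter:
    """Count gendered English pronouns in texts that mention the given role word."""
    counts = Counter(female=0, male=0)
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        if role in tokens:
            counts["female"] += sum(token in FEMALE for token in tokens)
            counts["male"] += sum(token in MALE for token in tokens)
    return counts


def male_share(counts: Counter) -> Optional[float]:
    """Share of male pronouns among gendered pronouns; None if there are none."""
    total = counts["female"] + counts["male"]
    return counts["male"] / total if total else None
```

Comparing `male_share` across roles, and against a reference distribution, gives a first signal of skew. It cannot establish bias on its own, because pronoun counts depend heavily on the prompts used.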
4.6.2. Authorship and Ownership
- Who owns a GPT-generated novel?
- Should AI be listed as a co-author?
- Can AI-generated content enter literary competitions?
4.6.3. Philosophical Debates
4.7. Summary of Findings
- NLP has advanced significantly in simulating creative writing, with LLMs achieving near-human fluency and stylistic adaptability.
- Poetry and storytelling remain dominant research areas, but metaphor and humor require further exploration.
- Evaluation frameworks are evolving but still insufficient for capturing the multi-dimensionality of creativity.
- Human-AI collaboration is a promising area, though challenges of agency, authorship, and influence persist.
- Ethical concerns about bias, originality, and cultural impact must be addressed in future development.
Chapter Five: Discussion
5.1. Introduction
5.2. NLP and the Simulation of Creativity
5.2.1. Imitation Versus Innovation
5.2.2. Syntax Versus Semantics
5.3. The Role of Human-AI Collaboration
5.3.1. Augmentative Creativity
5.3.2. Risks of Over-Reliance
5.4. Challenges of Evaluation in Computational Creativity
5.4.1. Inadequacy of Quantitative Metrics
5.4.2. Toward Multi-Dimensional and Hybrid Evaluation
5.5. Ethical and Philosophical Implications
5.5.1. Bias, Stereotyping, and Cultural Flattening
5.5.2. Authorship and Intellectual Property
5.5.3. Redefining Creativity and the Human
5.6. Contributions to the Field
- Taxonomy of Models and Tasks: It provides a structured classification of NLP architectures and their application to various creative genres, offering a roadmap for future model development and research alignment.
- Evaluation Critique and Framework: It critiques existing evaluation practices and calls for a more holistic, interdisciplinary framework that aligns technical fluency with humanistic standards.
- Ethical Awareness: It foregrounds ethical, cultural, and epistemological concerns, advocating for inclusive and responsible AI development in creative domains.
- Human-AI Interaction Insight: It highlights the complex dynamics of human-AI co-creation and suggests design principles for augmentative systems that respect and empower human agency.
- Future Research Agenda: It identifies underexplored areas such as non-Western creative traditions, long-form narrative coherence, affective computing in writing, and the sociology of AI-authored literature.
5.7. Summary
Chapter Six: Conclusion and Future Research Directions
6.1. Conclusions
6.1.1. Summary of Key Findings
- Model Evolution: The transition from rule-based systems to statistical models, and ultimately to transformer-based large language models (LLMs), has significantly enhanced the fluency, coherence, and stylistic adaptability of machine-generated creative writing. Yet, issues of thematic depth, long-form consistency, and metaphorical richness remain persistent challenges.
- Genre Applications: Poetry and storytelling emerged as dominant areas of research, while subfields such as figurative language generation, stylistic emulation, and humor remain comparatively underexplored. Applications increasingly support co-creative environments, transforming AI from autonomous writer to collaborative partner.
- Evaluation Gaps: Current evaluation metrics are inadequate for capturing multi-dimensional aspects of creativity. The heavy reliance on automatic metrics like BLEU and perplexity fails to account for originality, emotional resonance, or aesthetic value. Hybrid human-AI evaluation frameworks are urgently needed.
- Human-AI Interaction: The shift toward human-in-the-loop or co-creative systems has introduced new paradigms of collaborative authorship. While these tools democratize access to creative expression and offer novel forms of engagement, they also risk diluting human agency and introducing stylistic homogeneity.
- Ethical and Philosophical Challenges: Machine-generated creativity raises unresolved questions about authorship, bias, originality, and cultural hegemony. The opacity of LLMs, the potential propagation of stereotypes, and the reproduction of dominant literary norms necessitate robust ethical frameworks and inclusive design strategies.
6.2. Theoretical Implications
6.3. Practical Implications
6.3.1. For AI and NLP Developers
- Develop more transparent and controllable models that allow users to understand and guide creative processes.
- Integrate fine-grained feedback loops and user customizability to support human creativity rather than override it.
- Incorporate ethical modules that detect and mitigate biased, offensive, or culturally insensitive content in creative applications.
6.3.2. For Educators and Writers
- Use NLP-based co-writing tools as pedagogical aids in creative writing courses, fostering experimentation and iterative feedback.
- Treat AI-generated outputs as opportunities for critical reflection, revision, and creative augmentation rather than final products.
- Encourage exploration of AI as a collaborative partner in artistic expression, especially for marginalized voices who may gain access to new literary tools.
6.3.3. For Policy Makers and Legal Scholars
- Address the legal status of AI-generated content in terms of intellectual property, authorship, and attribution.
- Develop clear guidelines for disclosure, ensuring that readers and audiences can distinguish between human and machine-generated works.
- Promote algorithmic accountability by mandating transparency reports, ethical reviews, and user consent mechanisms in creative AI platforms.
6.4. Limitations of the Review
- Language Bias: The review included only English-language publications, potentially overlooking valuable research in other linguistic traditions.
- Rapidly Evolving Field: The NLP landscape, particularly with respect to LLMs, is evolving at a pace that may render some findings quickly outdated.
- Lack of Quantitative Meta-Analysis: The diversity of methods and reporting standards made it infeasible to conduct a statistical meta-analysis of performance outcomes.
- Absence of Multimodal Creativity: Although this review focused on text, many creative systems now integrate vision, sound, and movement, which were not included in the scope.
6.5. Future Research Directions
6.5.1. Toward Context-Aware Creativity
- Multi-modal input (e.g., images, music, or user profiles).
- Memory-augmented models that retain narrative consistency across long-form texts.
- Socio-pragmatic models capable of adjusting tone, style, or form based on context.
6.5.2. Ethical and Inclusive NLP
- Curating multilingual and multicultural literary datasets.
- Embedding ethical evaluation protocols in model development pipelines.
- Supporting participatory design practices that engage artists, writers, and minority communities.
6.5.3. Creativity-Aware Evaluation Metrics
- Machine learning models trained on annotated corpora of human ratings for originality, emotional impact, and aesthetic quality.
- Cognitive neuroscience approaches to assess user engagement with AI-generated content.
- Dynamic feedback mechanisms that learn user preferences in co-creative settings.
6.5.4. Interdisciplinary Research Frameworks
- Integrating computational methods with literary theory, cognitive psychology, and philosophy of art.
- Establishing common vocabularies and frameworks across computer science and the humanities.
- Creating research consortia and interdisciplinary journals focused on AI and the arts.
6.6. Final Reflections
References
- Rahman, M. H., Kazi, M., Hossan, K. M. R., & Hassain, D. (2023). The Poetry of Programming: Utilizing Natural Language Processing for Creative Expression.
- Boden, M. A. (2004). The Creative Mind: Myths and Mechanisms (2nd ed.). Routledge.
- Colton, S., Pease, A., & Ritchie, G. (2001). The Effect of Input Knowledge on Creativity. In Proceedings of the AISB Symposium on AI and Creativity in Arts and Science.
- Veale, T., & Hao, Y. (2008). A Fluid Knowledge Representation for Understanding and Generating Creative Metaphors. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008).
- Ghazvininejad, M., Shi, X., Priyadarshi, J., & Knight, K. (2017). Hafez: An Interactive Poetry Generation System. In Proceedings of ACL 2017, System Demonstrations (pp. 43–48).
- Lau, J. H., Cohn, T., Baldwin, T., Brooke, J., & Hammond, A. (2018). Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (pp. 1948–1958).
- Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical Neural Story Generation. In Proceedings of ACL 2018 (pp. 889–898).
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems (pp. 5998–6008).
- Pillutla, K., Swayamdipta, S., Zellers, R., Thickstun, J., Welleck, S., Choi, Y., & Harchaoui, Z. (2021). MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers. In Advances in Neural Information Processing Systems (NeurIPS).
- Holyoak, K. J., & Thagard, P. (1995). Mental Leaps: Analogy in Creative Thought. MIT Press.
- Dennett, D. (1991). Consciousness Explained. Little, Brown and Co.
- Searle, J. R. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences, 3(3), 417–457.
- McCormack, J., Gifford, T., & Hutchings, P. (2019). Autonomy, Authenticity, Authorship and Intention in Computer Generated Art. In Computational Intelligence in Music, Sound, Art and Design (EvoMUSART 2019), Lecture Notes in Computer Science (Vol. 11453). Springer.
- Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems.
- Holton, R. (2009). AI and the ‘Art’ of Creativity. Leonardo, 42(5), 418–423.
- Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative Adversarial Networks, Generating ‘Art’ by Learning About Styles and Deviating from Style Norms. In Proceedings of the 8th International Conference on Computational Creativity.
- Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004).
- Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318).
- Dincer, B., & Aydın, C. C. (2021). The Ethics of AI-generated Literature: Issues of Responsibility, Accountability, and Meaning. AI & Society, 36, 1105–1118.
- Floridi, L. (2019). The Logic of Information: A Theory of Philosophy as Conceptual Design. Oxford University Press.
- Ha, D., & Eck, D. (2018). A Neural Representation of Sketch Drawings. In International Conference on Learning Representations (ICLR).
- Manjavacas, E., & Koolen, C. (2021). Stylometry for Authorship Attribution: A Literature Review. Digital Scholarship in the Humanities, 36(Supplement_1), i108–i126.
- Sharma, R., & Kaur, H. (2022). A Review of Deep Learning Techniques in Poetry Generation. International Journal of Advanced Computer Science and Applications, 13(1), 527–536.
- Zhang, Y., Sun, S., Galley, M., Chen, Y.-C., Brockett, C., Gao, X., Gao, J., Liu, J., & Dolan, B. (2020). DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 270–278).
- Hutchinson, B., Prabhakaran, V., Denton, E., Webster, K., Zhong, Y., & D'Amour, A. (2021). Towards Transparent and Accountable NLP: A Review of Bias in Language Model Applications. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 629–644).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).