Video Visual Relation Detection plays a central role in understanding complex video content by identifying evolving spatio-temporal interactions between object tracklets. However, current approaches are hindered by long-tailed predicate distributions, the gap between image-based semantics and video dynamics, and the challenge of generalizing to unseen relation categories. We introduce the Dynamic Contextual Relational Alignment Network (DCRAN), an end-to-end framework designed to address these issues. DCRAN integrates a spatio-temporal gating mechanism that enriches tracklet representations with surrounding context, a dynamic relational prompting module that produces adaptive predicate prompts for each subject--object pair, and a multi-granular semantic alignment module that jointly aligns object features and relational representations with their corresponding textual cues through hierarchical contrastive learning. Experiments on standard benchmarks show that DCRAN substantially improves the detection of both frequent and previously unseen relations, demonstrating the value of dynamic prompting and multi-granular alignment for robust video relational understanding.
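To make the three components concrete, the following is a minimal PyTorch sketch of how they could be realized. The abstract does not specify any implementation details, so every module design, name, and dimension below (e.g., `SpatioTemporalGate`, `DynamicRelationalPrompt`, the mean-pooled context, the InfoNCE-style alignment loss) is an illustrative assumption, not the paper's actual method.

```python
# Hypothetical sketch of DCRAN's three components; all internals are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalGate(nn.Module):
    """Enriches tracklet features with gated surrounding context (assumed design)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, tracklets):              # (N, T, D) tracklet features
        context = tracklets.mean(dim=(0, 1))   # pooled scene context, (D,)
        context = context.expand_as(tracklets)
        g = torch.sigmoid(self.gate(torch.cat([tracklets, context], dim=-1)))
        return tracklets + g * context         # context-enriched tracklets

class DynamicRelationalPrompt(nn.Module):
    """Produces an adaptive predicate prompt for one subject--object pair."""
    def __init__(self, dim, prompt_len=4):
        super().__init__()
        self.to_prompt = nn.Linear(2 * dim, prompt_len * dim)
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, subj, obj):              # (D,) subject and object features
        pair = torch.cat([subj, obj], dim=-1)
        return self.to_prompt(pair).view(self.prompt_len, self.dim)

def contrastive_align(visual, text, tau=0.07):
    """One level of the hierarchical contrastive alignment (InfoNCE-style)."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text, dim=-1)
    logits = v @ t.T / tau                     # (B, B) similarity matrix
    labels = torch.arange(v.size(0))           # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: 2 tracklets, 5 frames, 256-d features.
feats = torch.randn(2, 5, 256)
enriched = SpatioTemporalGate(256)(feats)
prompt = DynamicRelationalPrompt(256)(enriched[0].mean(0), enriched[1].mean(0))
# "Multi-granular" here would mean summing object-level and relation-level terms.
obj_loss = contrastive_align(torch.randn(8, 256), torch.randn(8, 256))
```

Under this reading, the gating module decides per-feature how much scene context to inject, the prompting module conditions the predicate vocabulary on each specific pair rather than using fixed templates, and the alignment loss is applied at both the object and relation levels to form the hierarchical contrastive objective.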