The terminology of survival modeling: An insight and alternative modeling of student retention

Student retention is one indicator of accountability in the implementation of educational programs. The retention rate an institution achieves reflects how well it meets its quality objectives. To get an accurate picture of the factors related to retention, these factors must be modeled. The retention variable is a time response variable measured in semesters. One statistical analysis suited to time-to-event response data is survival analysis. Choosing an accurate analytical method produces valid conclusions, which in turn supports policies that are appropriate and on target. This paper presents an alternative approach to modeling student retention in distance education using survival analysis. The method used is a literature review. The paper also briefly describes distance education, open and distance education, the characteristics of distance education students, distance education student retention, and survival models for modeling student retention in distance education.


INTRODUCTION
Data modeling is the process of building a model of a collection of data stored in a database. The purpose of modeling is to represent data visually so that useful information is obtained for policymaking. In general, no model is perfect; a model is only expected to be close to real conditions and useful for stakeholders in making decisions. Data modeling is inseparable from the method or data analysis tool used: modeling produces a valid, accurate model that can be used for prediction only when the right method of analysis is used. Data modeling is therefore closely tied to choosing an appropriate method of analysis.
Student retention is one indicator of institutional accountability in the implementation of educational programs. A high student retention rate indicates that the institution is achieving its quality goals and that its graduates are of a quality that can be relied upon and accepted by the community. Retention is therefore essential for a tertiary institution in determining the quality of its education services. Likewise, at Universitas Terbuka (UT), Indonesia, which implements the Open and Distance Education (ODE) system, student retention is an indicator of education quality assurance.
Student retention needs to be modeled appropriately to obtain precise and accurate information that matches the characteristics of the existing data, because appropriate analytical methods produce valid conclusions. This requires a good understanding of which statistical methods suit the characteristics of student retention data.
[…] how and from whom they study, making education accessible to more and more people. Although, in practice, distance education (DE) has been criticized as having lower quality and effectiveness than campus-based schooling, some studies suggest that the DE model is more effective than the traditional campus-based education model.
The definitions stated previously imply that DE requires adequate technology to bridge the separation between students, instructors, and learning resources. Meyer (2002) notes that, to help reduce the separation between teachers and students, institutions have begun to use available technologies such as audio connections (e.g., telephones), video cassettes, and television. Starting in the 1980s, a popular medium in DE was communication satellites, used to broadcast lectures and instruction to students far from campus. From the late 1980s through the 1990s, interactive video-based media came into use.
In detail, Taylor (2001) summarizes five generations of DE technology in terms of their flexibility, their ability to facilitate interaction in learning, and their cost, as presented in Table 1.

Open and Distance Education
The Open and Distance Education (ODE) system combines two concepts: the open education system and the distance education (DE) system. Open education (or open learning) is a policy goal or ideal for the education system. It emphasizes system flexibility to minimize constraints of place, time, and other aspects arising from student characteristics such as economic conditions (Bates, 1995).
The term "open" in the open education system means being free from limitations and, in itself, has nothing to do with how learning is delivered; the distance element of ODE refers instead to the system or mode of delivery of the learning process. The open education system relates to changes in the structure of educational organizations, which become open in place, time, learning material, learning system, and so on. Open education is a mindset and approach that offers students a variety of learning choices and gives them as much control as possible over what they will learn and which learning strategies they use. Belawati (2019) states that not all distance learning is open. Some literature indicates that open education must at least contain elements of flexibility in age (no age restrictions), location (students can study from anywhere), cost (cheap or even free), length of study (no limit on study time), prerequisites (no previous education diploma required), and entry and exit (multi-entry and multi-exit: students can enter and stop at various times). Belawati (2019) further described the intersection of DE and ODE, as depicted in Figure 2.

Figure 2. Distance Education, Online Education, Open and Distance Education
Bozkurt (2019) emphasizes that the term ODE is more widely used to reflect a shift toward a more social and learner-centered view of learning that embraces openness for broader social justice. ODE is an essential element of future education and training systems, and it currently plays a crucial role in creating a global knowledge-based society. UNESCO (2012) reaffirmed ODE's role in diversifying the education delivery system, particularly for technical and vocational education, and in encouraging cooperation and partnerships between companies, professional bodies, and distance teaching institutions. ODE is also supported as a way to meet the unique needs of disabled people, migrants, cultural and linguistic minorities, refugees, and populations in crisis, which traditional delivery systems cannot meet efficiently. In addition, ODE is deeply embedded in teacher education, particularly in-service teacher training and the professional practice of teacher educators.
The "open" movement in education is reflected in OER (Open Educational Resources) and MOOCs (Massive Open Online Courses), both examples of new developments that use a range of connected technologies. The OER movement emerged from the idea that knowledge is a public good and that technology in general, and the World Wide Web (www) in particular, gives everyone the opportunity to share, use, and reuse it. The OER movement is also driven by social responsibility: it aims to provide fair and universal access to knowledge and to use web platforms, in particular, as distribution channels. This driving force has encouraged various initiatives around the world to provide access to educational resources, including lecture materials and other educational materials.
Two official definitions are available for OER: one from UNESCO and one from OECD. UNESCO (2002) defines OER as teaching materials for learning and research, packaged in digital or other media, that are in the public domain or have been released under an open license, and that can be accessed without charge and distributed to others without restriction. Meanwhile, UNESCO (2012) explained that OER are digital materials offered freely and openly to educators, students, and independent learners for reuse in teaching, learning, and research. Based on these definitions, OER emphasize "openness," "digital" format, and "reusability/adaptation" of resources.
Besides OER, another open movement is MOOCs, which have been called "the great revolution in education" (Bates 2014). Simoson et al. (2015) stated that digital technology, OER, and MOOCs provide wider access to learning for people who do not have the opportunity to attend face-to-face lectures. Through OER and MOOCs, the original goals of distance education (social justice and openness for all) and community-based, socially driven approaches to learning will soon be realized. Bates (2014) further identified MOOC design practices, namely 1. the use of social media (online courses supported by various 'connected' tools and media), 2. content determined by the participants, who decide on and contribute material, 3. distributed communication (communication forms an automatic network with many subcomponents), and 4. students deciding for themselves whether what they have learned is appropriate for them. Anderson and Simpson (2012) suggest that mobile technology helps in the development of ODE because it supports learning anytime and anywhere, without constraints of space, time, and distance.

Characteristics of Distance Education Students
The characteristics of DE students are unique and diverse (Cercone 2008). Students at DE institutions tend to be adults who vary in age, sex, knowledge, skills, and learning context (Kara et al. 2019). Knowles (1990) suggests that one of the hallmarks of adult learners is that they are motivated to learn when they have needs and interests that education can fulfill; the starting point for learning is therefore their needs and interests. Merriam and Caffarella (1999) state that most adult students are highly motivated and task-oriented, but that they also have responsibilities such as family, work, and other circumstances (e.g., childcare, need for income) that can disrupt the learning process. Tladi (2013) reports that 62.8% of DE students are not ready for the exam because they do not have enough time to study, and that DE students are generally married and have families and responsibilities, so they lack the commitment to study independently. Schuemer (1993) suggested that learning is more complicated for students in the DE system because DE students are generally older, working, and have families. Other researchers report similar findings. Age is very influential in the DE system (Valasek, 2001), as are gender (Aragon and Johnson 2008), race/ethnicity (Moore et al. 2002; Sullivan 2001), financial needs (Parker 2003), and GPA (Holder 2007; Harrel and Bower 2011). The varied conditions and characteristics of UT students require them to coordinate family, work, and free time with study time. Given these conditions, their motivation to attend lectures in the DE system is also very diverse.
Regarding motivation to join DE institutions, some students stated that they joined DE on their own initiative, while others joined at the instigation or assignment of the institution where they worked. Many DE students end up dropping out. Motivation and psychological factors play an essential role in DE students' perseverance in completing their education: personal goals, a sense of community, and family support are psychological motivators that increase student persistence (Yang et al. 2017).

Student Retention of Distance Education
The term retention has various meanings, and several researchers have studied the retention of DE students, including Belawati (2006) and Arifin (2018). Belawati (1998) defines retention as completing courses and continuing to re-register. Student retention is also expressed as a commitment to completing studies within the given timeframe and re-registering to continue education and maintain educational status without significant disruption. Sembiring (2014) defines student retention conceptually as a commitment to consistently reuse favored services in the future, namely to re-register. Meanwhile, Arifin (2018) relates student retention to student behavior in continuing to learn in a particular program by registering for four consecutive semesters.
Low student retention is the biggest problem in DE programs and contributes significantly to students' failure to complete their studies (Rovai and Downey 2010). From an institutional perspective, retention serves as a measure of institutional accountability for the effectiveness and quality of educational programs (Tinto 2006). Kim et al. (2010) state that retention shows whether an institution achieves its mission of educating students and preparing them to complete their studies. Dropping out indicates a failure to develop and support students and is a significant loss for an institution. Belawati (1998) states that, organizationally, low retention can endanger UT's existence: a decrease in the number of students means a reduction in UT's financial income, which in turn affects the institution's ability to expand and to improve the quality of teaching and education services. A low retention rate can also damage an institution's image because it suggests a flawed system.
Several studies by DE experts show that student retention in DE is low, lower than in conventional face-to-face education. Xu and Jaggars (2011) stated that student retention in DE was around 8-14% lower than in face-to-face learning. Moore and Fetzner (2009) note that retention rates in distance classes are claimed to be 10-20% lower than in conventional education. Simpson's (2003) study in England reached the same conclusion: retention at distance institutions is lower than at traditional institutions (Figure 2.4). In a study of UT, Belawati (1998) found that the average retention rate of UT students who first registered between 1984 and 1990 was only 4.8%.
Retention is a complex and complicated problem influenced by various factors. Choi (2017) offers a conceptual model of the dropout rate in DE, presented in Figure 4, which shows that retention is influenced by learner factors, internal factors, and external factors. Pierrakeas et al. (2004) stated that younger students (<30 years) had low retention because they do not yet have independent learning experience and tend to underestimate the effort and workload needed for university-level study. Xenos et al. (2002) and Pierrakeas et al. (2004), in Greece, reported a correlation between age and dropping out among DE students. Rovai (2008) states that, in general, the factors behind dropout among DE students include old age, lack of study time, difficulties in accessing the internet, lack of feedback from tutors, work, family, external stimuli, and personal financial problems.
In addition, several studies have shown that GPA strongly influences student retention and is a determining factor in universities' sustainability (McCormic and Lucas 2014). Students' academic characteristics are determinants of dropout (Ratnaningsih et al. 2008; Boton and Gregory 2015). Another factor is the credit load taken by students: Cambruzzi et al. (2015), in Brazil, stated that many students dropped out of college because their credit load did not match their abilities, and Allen et al. (2016) said that many students dropped out because they took too much course material and because of the tuition fees they had to pay.

[Figure: institution labels OU of Netherlands, Tele-university of …, BRAOU India, Univ of South Africa]

[…] aspects: institutions, instructors, and students. Institutional factors that influence student retention include institutional encouragement in student services and orientation, tutoring services, technology support, institutional understanding of student conditions and needs, the difficulty level of the curriculum or program, and a logical course structure. Boston et al. (2011), cited in Muljana and Luo (2019), suggested that a curriculum that is too easy or too difficult can affect student retention.

Figure 4. Conceptual model of failure rate in Distance Education
Instructor factors that influence student retention include active interaction between instructors and students; social interaction among students, so that they do not feel isolated; student activeness in discussions; tutors' activeness in learning activities; guidance and feedback from tutors on discussions or student assignments; effective communication between tutors and students; attractive learning designs; interactive learning materials; clear instructions from the instructor; and ease of obtaining teaching materials.
From the student's perspective, factors that influence student retention include behavioral characteristics, demographic characteristics, and other personal characteristics. Student behavior attributes include self-regulation, metacognition, self-efficacy, discipline, motivation, learning strategies, learning satisfaction, learning readiness, technological skills, setting clear learning goals, and time management. Demographic characteristics include age and gender (although gender is not always associated with student retention in DE). Wladis et al. (2015), cited in Muljana and Luo (2019), suggested that older students tend to perform better and to have higher retention. Other personal characteristics that influence student retention include family support, home environment, family responsibilities, work, responsibilities and workload, financial problems, life problems (e.g., health), class conditions, and GPA. Coffield et al. (2004) suggested that learning in the DE system can improve if students are motivated to learn by understanding their own learning styles. Merriam and Caffarella (1999) report that education is a process, so no single adult learning theory applies successfully to all adult learning environments. This is supported by Cercone (2008), who states that adult learners in DE are diverse and have their own characteristics.
Every individual is different; learners in DE are unique and have different needs. Because individuals differ, their learning motivation will undoubtedly differ as well, and so will each individual's likelihood of persisting in their studies. Student retention is also bound up with socio-cultural factors, the social culture of a country (Belawati 1998; Rovai 2008; Holder 2007). Diversity in educational and organizational culture, geography, technology, study programs, student characteristics, and the development of institutional support systems all play a significant role in increasing student retention in DE.

Survival Model for Modeling Students Retention of Distance Education
Student retention is one indicator of institutional accountability in organizing educational programs. Low student retention indicates that an institution is unable to improve the quality of its education services. Student retention therefore needs to be a main focus of education services, including ODE. Information on the factors influencing student retention needs to be explored carefully, precisely, and accurately, so that institutions can use it in policymaking. With adequate and accurate student retention modeling, ODE providers are expected to increase the share of students who complete their studies and to maintain the Gross Enrollment Rate (GER). Student retention and GER are indicators of the quality and success of higher education in Indonesia.
In statistics, the method used to model survival time data is survival analysis. Survival analysis is a set of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs (Kleinbaum and Klein 2012). One aim of survival analysis is to discover the relationship between event time and predictor variables (covariates) that may be significant for the process. The time until a particular event occurs is called the survival time or failure time. Survival time can be stated in years, months, weeks, or days from the beginning of the study until the event occurs; for student retention, it is measured in semesters.
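For reference, the standard quantities behind this definition can be written down explicitly. Let T denote a student's survival time (for example, the semester at which the student becomes non-active) and f(t) its density; the survival function S(t) gives the probability of still being enrolled after t, and the hazard h(t) gives the instantaneous rate of the event among those still at risk. Because retention is measured in semesters, the discrete-time versions are also shown. The notation is introduced here for illustration and is not taken from the original study.

```latex
% Continuous time: survival function and hazard function
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}

% Discrete time (semesters j = 1, 2, \dots): hazard and survival
h_j = P(T = j \mid T \ge j), \qquad
S(j) = P(T > j) = \prod_{k=1}^{j} \left(1 - h_k\right)
```

In the discrete form, the probability of still being enrolled after semester j is the product of the probabilities of not experiencing the event in each semester up to j.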
In survival analysis, time-related response data can be grouped as complete or incomplete. Data are complete when the time of the event is observed during the study period, and incomplete when the event is not observed by the end of observation. Incomplete data in survival analysis are called censored data. In 1972, D. R. Cox developed a regression method for such data, called Cox regression or, more commonly, the Cox proportional hazards model.
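A small numerical sketch may clarify what "proportional hazards" means. Under the Cox model h(t | x) = h0(t) exp(βx), the ratio of two students' hazards does not depend on time. The Weibull-shaped baseline hazard, the coefficient values, and the binary covariate below are hypothetical, chosen only for illustration. For contrast, the second function lets the log hazard ratio change with log t, one common way of representing non-proportional hazards.

```python
import math

def baseline_hazard(t):
    """Hypothetical Weibull baseline hazard h0(t) = lam * k * t**(k - 1), lam = 0.05, k = 1.5."""
    lam, k = 0.05, 1.5
    return lam * k * t ** (k - 1)

def cox_hazard(t, x, beta):
    """Proportional hazards: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline_hazard(t) * math.exp(beta * x)

def nonprop_hazard(t, x, beta0, beta1):
    """Non-proportional hazards: the effect of x varies with time,
    beta(t) = beta0 + beta1 * log(t), so h(t | x) = h0(t) * exp(beta(t) * x)."""
    return baseline_hazard(t) * math.exp((beta0 + beta1 * math.log(t)) * x)

# x = 1 for a hypothetical group (e.g. working students), x = 0 otherwise
for t in (1, 2, 4, 8):
    hr_ph = cox_hazard(t, 1, 0.4) / cox_hazard(t, 0, 0.4)
    hr_nph = nonprop_hazard(t, 1, 0.4, -0.2) / nonprop_hazard(t, 0, 0.4, -0.2)
    print(f"semester {t}: PH ratio = {hr_ph:.3f}, non-PH ratio = {hr_nph:.3f}")
```

The PH ratio is exp(0.4), about 1.492, at every semester because the baseline hazard cancels. The non-PH ratio falls as t grows because beta1 is negative, so a single constant hazard ratio would misdescribe that covariate.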
Several researchers have studied student retention in DE. However, that research has not treated student retention as a time-to-event problem. Low student retention can lead to dropping out of college (failure to complete studies). Failure to complete studies can be treated as a failure event, and the time until it occurs as a failure time. Modeling student retention can thus be seen as modeling survival time, and student retention data can be analyzed with survival analysis. Seven cases that may occur while students attend lectures illustrate how student retention in ODE can be seen as a survival time; Figure 5 shows these possible academic tracks during studies at an ODE institution. In the first case, students attend classes smoothly until they graduate: they proceed through their education without pausing and complete it within the allotted time. In the second case, students take academic leave for several semesters, so their studies are on/off, until they are eventually declared non-active because they did not register for four consecutive semesters.
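The non-active rule described for the second case can be turned directly into survival data. The sketch below assumes each student's history is a list of per-semester registration flags; it treats becoming non-active (four consecutive unregistered semesters) as the event, and treats graduating or still being active when observation ends as censoring, consistent with how this paper defines censored data. The function name and input format are hypothetical.

```python
def to_survival_record(registered, gap=4):
    """Convert per-semester registration flags into (time, event).

    registered: list of booleans, one per observed semester (True = registered).
    gap: number of consecutive unregistered semesters after which a student
         is declared non-active (four in the cases described in the text).
    Returns (time, event): event = 1 if the student became non-active at
    semester `time`; event = 0 (censored) if the student graduated or was
    still active when observation ended at semester `time`.
    """
    missed = 0
    for semester, is_registered in enumerate(registered, start=1):
        if is_registered:
            missed = 0  # any registration resets the run of missed semesters
        else:
            missed += 1
            if missed == gap:
                return semester, 1
    return len(registered), 0

# Hypothetical histories loosely following the cases in Figure 5
print(to_survival_record([True] * 8))                                 # (8, 0): smooth, graduates
print(to_survival_record([True, True, False, False, False, False]))   # (6, 1): becomes non-active
print(to_survival_record([True, False, True, False, True, True]))     # (6, 0): on/off, never non-active
```

Here the event is dated to the semester in which non-active status is reached; dating it instead to the last registered semester is an equally defensible choice, and it should be applied consistently across students.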

Figure 5. Cases of Student Academic Track during Lecture at Open Distance Education
In the third and fourth cases, students complete their studies despite on/off enrollment, without ever becoming non-active. This on/off pattern is possible because ODE students generally already work and have families. In the fifth case, students register for courses only in the early semesters and do not register in the remaining semesters, until their status becomes non-active. In the sixth case, students register only at admission and do not enroll in courses in the first or following semesters; after several semesters without registering, they become non-active. In the seventh case, students' enrollment fluctuates significantly: during their studies they become non-active several times, and in the end they remain non-active.
The academic tracks illustrated in Figure 5 occur often and are real events at Universitas Terbuka (the Open University), Indonesia, the only higher education institution in Indonesia that implements the ODE system. Given the data conditions depicted in Figure 5, UT student retention can be viewed as a time-to-event problem, and survival analysis offers an alternative approach to modeling it.
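To make the survival-analysis view concrete, the sketch below computes a Kaplan-Meier estimate of the retention (survival) curve from (time, event) pairs such as those that could be derived from registration histories. The cohort is hypothetical: event = 1 means the student became non-active, and event = 0 means the record is censored (graduated or still active). This is a standard nonparametric estimator, shown as an illustration rather than as the method of any study cited here.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) at each distinct event time.

    times: survival times in semesters; events: 1 = non-active (event), 0 = censored.
    Returns a list of (t, S(t)) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    survival = 1.0
    for t in sorted(deaths):
        # risk set: students still under observation just before semester t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical cohort of eight students
times = [8, 6, 6, 5, 10, 7, 4, 9]
events = [0, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"semester {t}: S(t) = {s:.3f}")
```

Censored students contribute to the risk set up to and including their last observed semester, which is exactly the information an ordinary regression on completion time would discard.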
The factors that influence student retention are complex and varied. In general, they can be grouped into individual/learner factors, internal factors, and external factors. Student retention is closely tied to the individual student, since each individual has different characteristics and learning needs. Retention is also related to socio-cultural factors, the social culture of a country. Diversity in educational and organizational culture, geography, technology, study programs, student characteristics, and the development of institutional support systems all play a significant role in increasing the retention of DE students.
In modeling, the influential factors are called covariates, and many covariates are involved in modeling student retention. These covariates can be grouped into time-independent covariates, time-dependent covariates, and random-effect covariates. Time-independent covariates include gender, marital/family status, employment status, education level/educational background, place of residence, and curriculum structure. Time-dependent covariates include age, credit load taken, number of courses registered, GPA, and acquisition of teaching materials. Covariates with observed random effects include the study program chosen, tutoring services, tutorials, student activeness in discussions, tutorial assignments, tutor attendance, tutor feedback in meetings or on tasks, and effective communication between tutors and students. Covariates with unobserved random effects include motivation, learning time management, institutional technology support, student social interaction, student learning styles, institutional understanding of student needs, and organizational culture within the institution. Given the diversity of these covariates, survival modeling with the Cox proportional hazards model alone is inadequate. A more suitable approach is non-proportional hazards modeling that takes the various covariates into account; for retention data, a possible analysis is therefore a survival model with non-proportional hazards.
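Time-dependent covariates such as GPA or credit load are usually handled by splitting each student's follow-up into one row per interval (start, stop], carrying the covariate value that held during that interval; this counting-process layout is the one commonly used to fit Cox-type models with time-dependent covariates. The sketch below assumes semester-level intervals and hypothetical field names, and it only prepares the data; it does not fit a model.

```python
def to_counting_process(student_id, covariates_by_semester, time, event):
    """Expand one student's record into (start, stop] rows, one per semester.

    covariates_by_semester: list of dicts of time-dependent covariates, one dict per
        semester, e.g. {"gpa": 2.8, "credits": 12} (field names are hypothetical).
    time, event: the student's survival time in semesters and event flag
        (1 = became non-active, 0 = censored).
    Only the final interval can carry event = 1, because the event ends follow-up.
    """
    if len(covariates_by_semester) < time:
        raise ValueError("need covariate values for every semester up to `time`")
    rows = []
    for k in range(time):
        row = {"id": student_id, "start": k, "stop": k + 1}
        row.update(covariates_by_semester[k])  # covariate values during (k, k+1]
        row["event"] = 1 if (event == 1 and k + 1 == time) else 0
        rows.append(row)
    return rows

# Hypothetical student whose GPA and credit load change each semester,
# with the event recorded at semester 3
history = [{"gpa": 3.1, "credits": 18}, {"gpa": 2.6, "credits": 12}, {"gpa": 2.0, "credits": 6}]
for row in to_counting_process("S001", history, time=3, event=1):
    print(row)
```

This sketch uses each semester's own covariate value; in practice one might lag a covariate such as GPA so that it is measured before the interval it is used to explain.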
Figure 6 presents a roadmap of survival analysis, its differences from ordinary regression analysis, and its basic terms. As Figure 6 indicates, if the response variable contains censored data, ordinary regression analysis produces biased estimates of parameters and standard errors (Henderson and Oman 1999), so survival analysis is more appropriate. Some general terms in survival analysis are defined below […].
1. Censored data are a form of incomplete observation: the subject does not experience the event during the study period. In this study, censored data are records of students who were active or had graduated during the study period.
2. An event is the occurrence observed in individuals during the study period.
3. Survival time is the time a subject/individual "survives", from a specified starting point until an event occurs.
4. Failure time is the time of "failure" experienced by the subject/individual during the study period.
5. Time to event is the time experienced by the subject/individual until an event occurs.
6. Lifetime is the lifetime of the subject observed in survival analysis.
7. The Cox proportional hazards model assumes that the hazard of one individual is proportional to the hazard of another, with a ratio that is constant over time.
8. Non-proportional hazards means that the hazards of the subjects/individuals observed differ and that their ratio can change over time.
9. The hazard ratio is the ratio of the hazards of individuals who differ in an observed covariate; it expresses an individual's relative risk associated with that covariate.
10. Covariates are factors in modeling that act as explanatory variables.
11. Time-independent covariates are covariates whose values do not change over time, for example gender, domicile, and educational background.
12. Time-dependent covariates are covariates whose values can change over time, for example the courses taken or the number of credits, which can change each semester.
13. A fixed effect is a non-random effect in a model, associated with a covariate (explanatory variable).
14. Random effects are effects treated as random that can be included in modeling.
15. Frailty is an unobserved random effect that acts as a proportionality factor modifying the hazard function of an individual or of related individuals. In health research, examples include hospital culture in treating patients, patient genetics, or patient dietary patterns; in education, examples include student learning culture, environmental factors, or students' learning time management.
Using survival models for student retention can produce more valid and accurate results because they account for censored data, such as records of students whose course registration is on/off from semester to semester, a condition thought to influence the resulting model. Choosing a statistical analysis that yields valid and accurate conclusions requires understanding the data structure and the assumptions underlying the method. Studies of UT student retention include Ratnaningsih et al. (2008), Ratnaningsih et al. (2018), and […].
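Item 15 can be illustrated with the commonly used gamma frailty formulation, in which an unobserved multiplicative factor Z_i scales a student's hazard, h_i(t) = Z_i h0(t) exp(βx_i), with Z_i drawn from a gamma distribution with mean 1 and variance θ. This is one standard frailty specification, not necessarily the one the author would adopt. The sketch below uses a hypothetical constant baseline hazard and hypothetical parameter values; it simulates frailties to show that two students with identical observed covariates can still have different hazards, and it does not estimate a frailty model.

```python
import math
import random

def draw_frailties(n, theta, seed=0):
    """Draw n gamma frailties with mean 1 and variance theta (shape 1/theta, scale theta)."""
    rng = random.Random(seed)
    return [rng.gammavariate(1 / theta, theta) for _ in range(n)]

def constant_baseline(t):
    """Hypothetical constant baseline hazard h0(t) = 0.1 per semester."""
    return 0.1

def frailty_hazard(t, x, beta, z, h0=constant_baseline):
    """Conditional hazard h_i(t) = z_i * h0(t) * exp(beta * x_i)."""
    return z * h0(t) * math.exp(beta * x)

z = draw_frailties(100_000, theta=0.5)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(f"mean of Z = {mean_z:.3f}, variance of Z = {var_z:.3f}")  # close to 1 and 0.5

# Two students with the same observed covariate x = 1 but different unobserved frailty
print(frailty_hazard(2, 1, 0.4, z[0]), frailty_hazard(2, 1, 0.4, z[1]))
```

Fixing the mean of Z at 1 keeps h0(t) interpretable as the hazard of a "typical" student, while θ measures how much unobserved heterogeneity (for example, in motivation or time management) remains after the observed covariates are accounted for.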

CONCLUSION
Modeling is needed to simplify existing problems, and good modeling can produce valid and accurate results. Using a modeling method accurately requires a good understanding of the data and of the assumptions underlying the method or analysis tool used.
Student retention is one indicator of an education provider's accountability, including in Distance Education and Open and Distance Education. The factors that influence retention are very complex. Retention problems can be viewed as time-to-event problems, and the statistical method for analyzing time-to-event data is survival analysis. The survival analysis method used to analyze the relationship between factors, or covariates, and survival time is Cox analysis, or the Cox model.
Retention data are influenced by several factors, namely individual/learner factors, internal factors, and external factors. These factors can be categorized as time-independent covariates, time-dependent covariates, and random effects. Cox regression (the Cox model) is inadequate for modeling retention data that involve these various factors. Survival models that can handle multiple types of covariates are survival models with non-proportional hazards that consider the different types of covariates involved in modeling.