4. Methodology
The methodology developed in this study to estimate flexibility in electricity demand is based on the premise that users are potentially more flexible if they show high variability in their electricity consumption. This approach focuses on analyzing consumption variability, particularly through the coefficient of variation, which is calculated as the standard deviation divided by the mean of hourly consumption.
In line with this perspective, the study introduces an eight-phase methodology designed to identify the appropriate time intervals for implementing a differentiated ToU tariff structure, as illustrated in Figure 1. The methodology examines the variability in electricity demand and, from it, indirectly infers the flexibility of that demand. It also considers how exogenous variables, such as the socioeconomic level of consumers, climatic conditions, altitude, type of user, and geographical location, influence demand variability. The phases are as follows:
- 1) Data Combination and Standardization
This first stage involves collecting hourly electricity consumption data from smart meters. The data are then cleaned and structured, which includes removing interference, correcting empty or inconsistent records, and identifying and handling anomalies. The consumption data are also cross-referenced with other databases covering, among others, the users’ socioeconomic level, climate, altitude, user type, and city, to provide a more complete description of the consumers, as shown in Figure 2. Finally, non-ordinal categorical variables are transformed using one-hot encoding, and ordinal variables are mapped to a numerical range, so that machine learning models can process the dataset.
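The encoding step can be illustrated with a minimal sketch. The category names, the ordinal ranking, and the field names (`user_type`, `stratum`, `altitude_m`) are hypothetical, since the study does not list them; only the distinction between one-hot and ordinal encoding comes from the text.

```python
# Hypothetical categories and ranking; the study does not enumerate them.
USER_TYPES = ["residential", "commercial", "industrial"]  # non-ordinal -> one-hot
STRATUM_RANK = {"low": 1, "medium": 2, "high": 3}          # ordinal -> integer rank


def encode_user(record):
    """Return a flat numeric feature dict for one user record."""
    features = {}
    # One-hot encoding: one 0/1 column per user type, so no order is implied.
    for user_type in USER_TYPES:
        features[f"user_type_{user_type}"] = 1 if record["user_type"] == user_type else 0
    # Ordinal encoding: the integer rank preserves the order of the levels.
    features["stratum"] = STRATUM_RANK[record["stratum"]]
    # Numeric variables such as altitude pass through unchanged.
    features["altitude_m"] = float(record["altitude_m"])
    return features


example = encode_user(
    {"user_type": "commercial", "stratum": "medium", "altitude_m": 1495}
)
```

One-hot columns avoid imposing an artificial order on categories such as user type, while the integer rank keeps the order that an ordinal variable such as socioeconomic level does have.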
- 2) Consumption Segmentation by Customer
In this phase, each user’s consumption data is consolidated into a single file that compiles all of that user’s available records. This makes data management more efficient and reduces processing time in subsequent stages, because the entire database no longer has to be traversed to access the information of a specific user. In addition, users with less than a full year of records, i.e., fewer than 8,760 hourly consumption readings, are discarded. This refinement ensures that the variability of consumption can be evaluated with sufficient accuracy for each consumer included in the analysis. Figure 3 shows this procedure.
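A minimal sketch of this segmentation and filtering follows, keeping the per-user records in memory rather than in files. The row layout `(user_id, timestamp, kwh)` and the function names are assumptions made for illustration; only the 8,760-reading minimum is taken from the text.

```python
from collections import defaultdict

MIN_READINGS = 8760  # one full year of hourly readings, as stated in the text


def group_by_user(readings):
    """Group (user_id, timestamp, kwh) rows into one list per user."""
    per_user = defaultdict(list)
    for user_id, timestamp, kwh in readings:
        per_user[user_id].append((timestamp, kwh))
    return per_user


def keep_complete_users(per_user, min_readings=MIN_READINGS):
    """Drop users with fewer than min_readings hourly records."""
    return {u: rows for u, rows in per_user.items() if len(rows) >= min_readings}


# Tiny usage example with a lowered threshold so the data fits on screen.
rows = [("a", 0, 1.0), ("a", 1, 2.0), ("b", 0, 1.0)]
kept = keep_complete_users(group_by_user(rows), min_readings=2)
```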
- 3) Calculation of the Coefficient of Variation
In this phase, the coefficient of variation is calculated for each user and for each hour of the day (hours 0 to 23, giving CV0 to CV23). Three types of days are distinguished: weekdays, Saturdays, and Sundays or holidays, as shown in Figure 4. At this stage, the coefficient is calculated exclusively from the electricity consumption data, without considering external variables such as the socioeconomic level of consumers, climatic conditions, altitude, type of user, or geographical location.
The coefficient of variation, defined as the ratio of the standard deviation to the mean of hourly electricity consumption, is a statistical tool that normalizes consumption variability among different users. By standardizing the variability of consumption, it allows a fair comparison between users with different consumption levels.
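The calculation for one user can be sketched as follows. The function names, the `holidays` argument, and the use of the population standard deviation (`pstdev`) are assumptions; the text only specifies the ratio of standard deviation to mean, per hour and per day type.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def day_type(ts, holidays=frozenset()):
    """Classify a timestamp as 'weekday', 'saturday', or 'sunday_holiday'."""
    if ts.date() in holidays or ts.weekday() == 6:
        return "sunday_holiday"
    if ts.weekday() == 5:
        return "saturday"
    return "weekday"


def hourly_cv(readings, holidays=frozenset()):
    """Map (day_type, hour) -> coefficient of variation for one user.

    readings: iterable of (datetime, kwh) pairs.
    """
    buckets = defaultdict(list)
    for ts, kwh in readings:
        buckets[(day_type(ts, holidays), ts.hour)].append(kwh)
    cv = {}
    for key, values in buckets.items():
        mu = mean(values)
        # The CV is undefined for a zero mean; None marks that case (an assumption).
        cv[key] = pstdev(values) / mu if mu > 0 else None
    return cv


# Two weekday readings at 08:00 (1 and 3 kWh): mean 2, standard deviation 1.
sample = [(datetime(2024, 1, 1, 8), 1.0), (datetime(2024, 1, 2, 8), 3.0)]
cv_sample = hourly_cv(sample)
```

Because the CV divides by the mean, a user consuming 1 and 3 kWh obtains the same value as one consuming 10 and 30 kWh, which is what makes profiles of different magnitude comparable.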
- 4) Clustering by the Coefficient of Variation
The purpose of this stage is to group consumers whose electricity consumption patterns have similar characteristics. To this end, several clustering methods are explored, including k-means, k-medoids, hierarchical clustering, and DBSCAN. The suitability of each depends on characteristics of the data such as its type, distribution, magnitude, and presence of anomalies, as well as on the kind of grouping desired. An extensive analysis of the electricity consumption dataset determined that k-means was the most effective clustering strategy for this context. As illustrated in Figure 5, k-means is applied to the hourly coefficients of variation, CV0 to CV23 (see the examples of user data at the top of the diagram), and yields a separate set of user clusters for each category of day: weekday, Saturday, and Sunday-holiday.
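As an illustration of this phase, the sketch below implements plain Lloyd's k-means on 24-dimensional CV profiles. It is not the study's implementation: the random initialization, the iteration cap, and the toy profiles are assumptions, and in practice a library implementation would normally be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialization (assumed): k distinct profiles drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every profile.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy CV0..CV23 profiles: two low-variability and two high-variability users.
profiles = [[0.10] * 24, [0.12] * 24, [0.90] * 24, [0.95] * 24]
labels, centroids = kmeans(profiles, k=2)
```

Running the function once per day category (weekday, Saturday, Sunday-holiday) would give the three sets of clusters described above.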
- 5) Generation of Association Rules
The main purpose of this phase is to explore how exogenous variables such as socioeconomic level, geographical location, and climatic conditions influence the assignment of a user to a particular consumption cluster. Using both the external data of each user and the previously defined cluster labels, this stage creates association rules in which the external conditions serve as antecedents and membership in a specific cluster serves as the consequent. The Apriori algorithm is used to identify robust association rules that link external characteristics with energy consumption patterns, as shown in Figure 6.
As a result of this phase, detailed rules are obtained that provide a comprehensive view of the influence of exogenous variables on energy consumption. This knowledge could be used to develop public policies aimed at implementing differential tariff systems, ensuring that they effectively reflect the variability and needs of users.
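The rule generation can be sketched with a compact Apriori-style search. The item strings (for example `stratum=low`), the `cluster=` prefix, and the support and confidence thresholds are hypothetical; the text only states that exogenous conditions are antecedents and cluster membership is the consequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support}."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    while current:
        # Count the support of this level's candidates.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        size = len(next(iter(level))) + 1 if level else 0
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def cluster_rules(transactions, min_support, min_confidence, prefix="cluster="):
    """Rules (antecedent, cluster, support, confidence) with a cluster consequent."""
    transactions = [frozenset(t) for t in transactions]
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, support in frequent.items():
        labels = [i for i in itemset if i.startswith(prefix)]
        if len(labels) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {labels[0]}
        # The antecedent is frequent too, because subsets of frequent sets are.
        confidence = support / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, labels[0], support, confidence))
    return rules


# One transaction per user: exogenous attributes plus the assigned cluster.
users = [
    {"stratum=low", "climate=warm", "cluster=0"},
    {"stratum=low", "climate=warm", "cluster=0"},
    {"stratum=high", "climate=cold", "cluster=1"},
    {"stratum=low", "climate=cold", "cluster=0"},
]
rules = cluster_rules(users, min_support=0.5, min_confidence=0.9)
```

In this toy data, every low-stratum user falls in cluster 0, so the rule with antecedent `stratum=low` and consequent `cluster=0` has confidence 1.0.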
- 6) Selection of Clusters and Analysis of Time Intervals
Based on the results derived from the clustering process, the clusters that stand out for their magnitude and the density of users contained in them are identified and prioritized, taking the variability in electricity consumption as the main reference. Subsequently, time intervals with significant variability in user consumption are analyzed. These intervals are essential, as they suggest opportunities for the application of differential tariffs, guiding electric supply companies towards billing strategies more appropriate to the consumption habits of their customers.
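One possible way to locate such intervals is sketched below: scan a cluster's hourly CV profile and report contiguous runs of hours at or above a threshold. The threshold value, the function name, and the example profile are assumptions; the text does not specify how "significant variability" is determined.

```python
def high_variability_intervals(cv_by_hour, threshold):
    """Return inclusive (start_hour, end_hour) runs where CV >= threshold."""
    intervals, start = [], None
    for hour, cv in enumerate(cv_by_hour):
        if cv >= threshold and start is None:
            start = hour  # a new run of high-variability hours begins
        elif cv < threshold and start is not None:
            intervals.append((start, hour - 1))  # the run ended at the previous hour
            start = None
    if start is not None:
        # The run extends to the last hour of the profile.
        intervals.append((start, len(cv_by_hour) - 1))
    return intervals


# Hypothetical cluster centroid (CV0..CV23) with a morning and an evening peak.
centroid = [0.2] * 6 + [0.9] * 3 + [0.3] * 9 + [1.1] * 4 + [0.4] * 2
candidate_intervals = high_variability_intervals(centroid, threshold=0.7)
```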
- 7) Removal of Clusters with Low Variability
This stage of the methodology identifies and discards the clusters that show low variability in their electricity consumption, so that the study focuses on users with highly variable consumption patterns. The relevance of each cluster is determined from its coefficient of variation: only clusters with an average coefficient of variation above 70% are retained, and the rest are excluded. This decision rests on the premise that users with low variability in their electricity consumption are unlikely to benefit from a differential tariff scheme because of their limited flexibility. The study therefore concentrates on users who are likely to adapt to such a scheme. With the adjusted dataset, the phases of clustering by coefficient of variation, association rule generation and classification, and cluster selection and time interval analysis are repeated, now restricted to users with high variability in their electricity consumption.
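The filtering rule can be sketched as follows. The data layout and the choice to average over all hours and all users of a cluster are assumptions; the text only states that clusters with an average variability above 70% are retained.

```python
from statistics import mean

CV_THRESHOLD = 0.70  # "above 70%" in the text, expressed as a ratio


def high_variability_users(user_cv, labels, threshold=CV_THRESHOLD):
    """Keep users whose cluster has a mean CV above the threshold.

    user_cv: {user_id: [CV0..CV23]}, labels: {user_id: cluster_id}.
    """
    by_cluster = {}
    for user, profile in user_cv.items():
        # Pool every hourly CV of every user in the cluster (assumed averaging).
        by_cluster.setdefault(labels[user], []).extend(profile)
    retained = {c for c, values in by_cluster.items() if mean(values) > threshold}
    return {u: p for u, p in user_cv.items() if labels[u] in retained}


# Toy example: cluster 0 averages 0.85 and is kept, cluster 1 averages 0.2.
user_cv = {"a": [0.9] * 24, "b": [0.8] * 24, "c": [0.2] * 24}
labels = {"a": 0, "b": 0, "c": 1}
kept_users = high_variability_users(user_cv, labels)
```

The retained users would then be passed again through the clustering, rule generation, and interval analysis phases.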
- 8) Determination of Tariff Intervals
The final phase involves the concrete definition of the time intervals in which the differentiated tariffs will be applied. It is imperative that this phase be adaptable according to the specificities of each electric company and its respective market. The selection of time slots should seek a balance that benefits both electric companies and users based on the available energy sources and other factors relevant to the implementation.