Submitted:
30 June 2023
Posted:
03 July 2023
Abstract
Keywords:
1. Introduction
1.1. Introduction
1.2. Terms and Definitions
| Section | Term | Definition |
|---|---|---|
| 1.2.1 | Exposure to data breaches | The potential risk that an organization's data or information may be accessed, stolen, or compromised through unauthorized access or cyberattacks. In this study, exposure to data breaches is determined by analyzing the valid email addresses in the organizational database and identifying any connections to known compromised data breaches [19]. |
| 1.2.2 | Known compromised data breaches | Data breaches that have been identified and documented, in which unauthorized individuals accessed, stole, or compromised sensitive data. In this study, the focus is on breaches involving organizational email domains (.com, .org, .net, and .gov) [19]. |
| 1.2.3 | Compromised email addresses across all known data breaches | The email addresses of unique individuals found in the datasets of known compromised data breaches. In this study, 1,530 unique individuals have compromised email addresses across all the known data breaches [19]. |
1.3. Scope
1.4. Research Question and Hypothesis
- (RQ1). How effective are DFA techniques in identifying patterns and trends in malicious failed login attempts in M365 environments?
1.5. Significance of the Research
2. Materials and Methods
2.1. Methodology
2.2. Data Collection
2.3. Data Preprocessing
2.3.1. Elastic Extraction of M365 Login Information
- Timestamp
- User ID
- Source IP address (attacker)
- Action (login attempt)
- Result (outcome of the login attempt).
2.3.2. Windows PowerShell Extraction of M365 Account Information
- User ID
- Account Enabled (Y/N)
- Blocked Login (Y/N)
- User Type (List)
- Licensed (Y/N)
- Mailbox Type (List)
- MFA Enabled (Y/N)
2.3.3. Exploration and Identification of Public Data Breaches
2.3.4. Consolidation of Public Data Breach Information
2.3.5. Anonymization and Transforming of Data
- Anonymizing all personal data that could identify a user or an organization by assigning a randomly generated four-digit number.
- Removing irrelevant entries.
- Normalizing timestamp formats.
- Breaking out timestamp information and converting it to a numerical format.
- Converting source IP address to a numerical format.
- Transforming any remaining categorical data (Action and Result) into numerical representations.
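The transformation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the function names, the ISO 8601 input format, the UTC normalization, and the integer encoding scheme are all hypothetical choices made for the example.

```python
import ipaddress
import random
from datetime import datetime, timezone


def build_pseudonyms(user_ids, seed=0):
    """Map each distinct user ID to a unique random four-digit number.

    Assumption: at most 9,000 distinct IDs, since only the codes
    1000..9999 are available; random.sample raises ValueError otherwise.
    """
    rng = random.Random(seed)
    unique_ids = sorted(set(user_ids))
    codes = rng.sample(range(1000, 10000), len(unique_ids))
    return dict(zip(unique_ids, codes))


def timestamp_features(ts):
    """Normalize an ISO 8601 timestamp to UTC and split it into numeric parts.

    Assumption: the input carries a UTC offset or a trailing "Z".
    """
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return {
        "year": dt.year, "month": dt.month, "day": dt.day,
        "hour": dt.hour, "minute": dt.minute, "weekday": dt.weekday(),
    }


def ip_to_int(ip):
    """Convert an IPv4 or IPv6 address string to its integer value."""
    return int(ipaddress.ip_address(ip))


def encode_categories(values):
    """Replace categorical labels with integer codes (alphabetical order)."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes
```

Keeping the pseudonym mapping and the category codebook as returned dictionaries means the numeric data can still be traced back to the original labels by whoever holds those dictionaries, which is one possible design choice rather than a requirement of the method.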
2.3.6. Final Preprocessing
2.4. Pattern Recognition Techniques
2.5. Validation
3. Results
3.1. Introduction to Results
3.2. Data Collection and Preprocessing Results
3.2.1. Demographic Distribution Analysis of Known Data Breaches:
- The organizations' email domains were involved in sixty-nine known compromised data breaches.
- The organizations involved in the study have 2,968 valid email addresses, which were used to determine the exposure to data breaches.
- Out of valid email addresses, 1,530 unique accounts were found to have compromised email addresses across known data breaches.
  - 485 (16.341%) matching email addresses were found in both the list of valid organizational email addresses and the list of known compromised data breaches, suggesting a significant security concern for the organizations. These identified accounts were utilized for the study.
  - 956 (32.210%) fake or spoofed email addresses were identified in the breaches. Although these email addresses were not valid organizational email addresses, they represent potential threats to the organization's email security. These identified accounts were excluded from the study.
  - 89 (2.999%) user IDs were excluded for not having complete or enough valid organizational information or email addresses. These identified accounts were also excluded from the study.
- A total of 3,925 compromised email addresses were used in the data breaches, indicating that some individuals experienced multiple breaches.
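The exposure figures above can be checked with a short calculation. The sketch below assumes that the reported percentages are taken relative to the 2,968 valid email addresses, which is the reading that reproduces them; the `match_exposed` helper and its case-insensitive matching are hypothetical and only illustrate how valid addresses might be compared against breach data.

```python
def match_exposed(valid_addresses, breached_addresses):
    """Return valid organizational addresses that also appear in breach data.

    Hypothetical helper: addresses are compared after trimming whitespace
    and lowercasing, which is an assumption about the matching rule.
    """
    valid = {a.strip().lower() for a in valid_addresses}
    breached = {a.strip().lower() for a in breached_addresses}
    return valid & breached


# Counts reported in Section 3.2.1.
VALID_ADDRESSES = 2968
BREAKDOWN = {"matched": 485, "fake_or_spoofed": 956, "excluded_incomplete": 89}

# The three groups should add up to the 1,530 unique accounts.
unique_accounts = sum(BREAKDOWN.values())

# Each group as a percentage of the valid organizational addresses.
shares = {
    name: round(100 * count / VALID_ADDRESSES, 3)
    for name, count in BREAKDOWN.items()
}

print(unique_accounts)
print(shares)
```

Under this reading the three groups sum to 1,530 and the shares come out as 16.341%, 32.21%, and 2.999%, matching the values in the list.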
3.2.2. Retrospective Analysis
- Historical Learning: By analyzing previous data breaches, invaluable insights into the methodologies employed by cybercriminals can be gleaned. This knowledge aided in identifying patterns and trends, which can then be utilized to bolster current and future cybersecurity strategies.
- Vulnerability Identification: The examination of specifics from past breaches, such as the types of data compromised, enables common vulnerabilities exploited by attackers to be pinpointed. This information can guide organizations in directing their resources and efforts toward protecting against similar vulnerabilities.
- Relationship Establishment: Computing correlation coefficients between pairs of datasets where a user's ID was compromised facilitated the understanding of the relationships between these breaches. This process is key in determining whether these breaches are isolated incidents or parts of broader, interconnected cyberattack patterns.
3.2.4. Data Preprocessing
- Malicious failed login attempts (Dataset 1)
  - Dataset 1 (D1) comprised 2,025,493 failed login attempts from 60,209 unique source IPs across 176 countries.
  - In this dataset, the most frequent outcome of the login action was "UserLoginFailed," which aligns with expectations.
  - 449 unique user IDs were identified for the study.
- Successful, legitimate logins (Dataset 2)
  - Dataset 2 (D2) contained 253,148 successful login attempts that originated from 8,990 unique source IPs across 99 countries.
  - In this dataset, the most frequent outcome of the login action was "UserLoggedIn," as expected.
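Summary figures of this kind (attempt counts, unique source IPs, countries, and the most frequent outcome) can be derived from the preprocessed records with a few set and counter operations. The sketch below is illustrative only; the record schema with `source_ip`, `country`, `user_id`, and `result` keys is an assumption, as are the sample values.

```python
from collections import Counter


def summarize_logins(records):
    """Compute dataset-level summary statistics for login records.

    Assumed schema: each record is a dict with 'source_ip', 'country',
    'user_id', and 'result' keys.
    """
    records = list(records)
    results = Counter(r["result"] for r in records)
    return {
        "attempts": len(records),
        "unique_source_ips": len({r["source_ip"] for r in records}),
        "countries": len({r["country"] for r in records}),
        "unique_user_ids": len({r["user_id"] for r in records}),
        "most_common_result": results.most_common(1)[0][0] if results else None,
    }


# Hypothetical sample using documentation-reserved IP addresses.
sample = [
    {"source_ip": "192.0.2.1", "country": "AA", "user_id": 1001, "result": "UserLoginFailed"},
    {"source_ip": "192.0.2.1", "country": "AA", "user_id": 1002, "result": "UserLoginFailed"},
    {"source_ip": "198.51.100.7", "country": "BB", "user_id": 1001, "result": "UserLoggedIn"},
]
summary = summarize_logins(sample)
```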
3.3. Pattern Recognition Results
3.3.1. Correlation Analysis Results
- Σd^2 is the sum of the squared differences between the ranks of corresponding values in two columns.
- n is the number of data points (rows) in each column.
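These two definitions match the terms of Spearman's rank correlation coefficient, ρ = 1 − 6Σd² / (n(n² − 1)). Assuming that is the statistic used here, the following sketch shows how it can be computed for two columns. The function names are hypothetical, and the shortcut formula is exact only when neither column contains tied values; with ties it is an approximation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("columns must have equal length of at least 2")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
```

For example, the columns `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]` give Σd² = 4 and n = 5, so ρ = 1 − 24/120 = 0.8.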
3.3.2. Clustering Analysis Results
3.3.2.1. Descriptive statistics of the clusters showed:
- Cluster 1 had the largest number of user IDs, with 215.
- Cluster 2 had a moderate number of user IDs, with 117.
- Clusters 3, 4, and 5 had the fewest user IDs, with 52, 60, and 56, respectively.
3.3.2.2. Analysis of the combined cluster matrix revealed several key findings:
- A significant proportion of user IDs were associated with multiple data breaches, indicating that users are often exposed to multiple threats.
- Some data breaches were more prevalent across user IDs, suggesting that certain breaches have a wider-reaching impact on user exposure.
- The distribution of user IDs among clusters varied, with some clusters having a higher concentration of users exposed to specific data breaches.
- Relationships between clusters and data breaches were observed, with certain clusters being more strongly associated with specific data breaches.
3.3.2.3. Appendix A and B analysis of the cluster matrix
- Cluster 1 contained most of the dataset and likely represented those users who have experienced the most severe security incidents or breaches.
- Cluster 2 represents users who have experienced more significant security incidents or breaches.
- Cluster 3 represents users who have experienced security incidents or breaches related to specific industries or regions.
- Cluster 4 represents users who have experienced some security incidents, though less significant ones than those in other clusters.
- Cluster 5 represents users who have not experienced any significant data breaches or security incidents.
3.3.3. Association Rule Mining Results
- Rules 2, 3, and 5 have similar confidence, lift, and Zhang's Metric values, suggesting that these rules also have a strong positive association between the antecedents and consequents.
- Rule 4 has slightly lower confidence but still presents a high lift, and Zhang's Metric also indicates a strong positive association.
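For reference, the three measures named above can be computed for a single rule as shown below. This is a sketch under assumptions: the transactions and TTP labels are hypothetical, and Zhang's metric follows the common definition (support(A∪C) − support(A)·support(C)) / max(support(A∪C)·(1 − support(A)), support(A)·(support(C) − support(A∪C))), which may differ in detail from the implementation used in the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return support, confidence, lift, and Zhang's metric for A -> C."""
    antecedent, consequent = set(antecedent), set(consequent)
    item_sets = [set(t) for t in transactions]
    n = len(item_sets)
    if n == 0:
        raise ValueError("no transactions")

    # Fraction of transactions containing A, C, and both together.
    s_a = sum(antecedent <= t for t in item_sets) / n
    s_c = sum(consequent <= t for t in item_sets) / n
    s_ac = sum((antecedent | consequent) <= t for t in item_sets) / n
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent or consequent never occurs")

    confidence = s_ac / s_a
    lift = confidence / s_c
    denom = max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    zhang = (s_ac - s_a * s_c) / denom if denom else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift, "zhang": zhang}


# Hypothetical TTP labels observed per user ID.
logs = [
    {"password_spraying", "credential_stuffing"},
    {"password_spraying", "credential_stuffing"},
    {"password_spraying"},
    {"brute_force"},
]
metrics = rule_metrics(logs, {"password_spraying"}, {"credential_stuffing"})
```

A lift above 1 and a Zhang's metric close to 1 both indicate a positive association; Zhang's metric ranges from −1 to 1, which makes it easier to compare across rules than lift.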
3.3.4. APT Groups and Data Breaches Results
- APT28 (Fancy Bear) was linked to the LinkedIn breach, using spear-phishing and exploiting software vulnerabilities to compromise millions of user accounts.
- The Syrian Electronic Army (SEA) was suspected of being behind the Twitter breach, leveraging social engineering tactics and stolen credentials to gain unauthorized access.
- APT29 (Cozy Bear) was connected to the Dropbox breach, using advanced malware and lateral movement techniques to maintain persistence and exfiltrate data.
3.3.5. Proposed Future Exploratory Analysis
3.4. Validation Results
3.5. Summary of Results
- Correlation Analysis Results: 98 meaningful correlations were identified, with the top ten pairs having the highest correlations, suggesting shared characteristics, patterns, or vulnerabilities between the breaches.
- Clustering Analysis Results: The analysis grouped user IDs based on their similarity in breach characteristics, revealing differing risks of compromise. It also showed relationships between clusters and data breaches, providing insights into specific threats and vulnerabilities.
- Association Rule Mining Results: The analysis identified relationships between breach pairs and TTPs, uncovering patterns within security logs and helping to better understand the tactics employed by malicious actors.
4. Discussion Section
4.1. Introduction
4.2. Interpretation of Results
4.2.1. Pattern Recognition Results Interpretation
4.2.1.1. Brute Force Attacks and Credential Stuffing
4.2.1.2. Targeted Accounts and High-Value Users
4.2.1.3. Inactive and Disabled Accounts
4.2.2. Interpretation of Results for RQ1 and H1
4.3. Comparison to Previous Research
4.4. Practical Implications and Recommendations
4.4.1. Implications and Impact
4.4.1.1. Develop more proactive and targeted cybersecurity strategies.
4.4.1.2. Enhance threat detection and response capabilities.
4.4.1.3. Strengthen overall cybersecurity posture.
4.4.1.4. Foster collaboration and information sharing.
4.4.2. Limitations and Future Research
- Scope of data: The research focused on malicious failed login attempts and their connections to public data breaches and TTPs in M365 tenants. Consequently, the findings may not be generalizable to other cloud-based platforms or cybersecurity contexts.
- Data collection period: The data analyzed in this study were collected over a specific time frame. As cyber threats continuously evolve, further research should be conducted periodically to ensure the relevance and effectiveness of the proposed strategies.
- Human factors: While the importance of addressing human factors in cybersecurity was discussed, the research did not thoroughly explore the psychological, social, and organizational aspects that may contribute to the observed patterns of malicious failed login attempts. Future research could investigate these dimensions more comprehensively.
- Insider threats: The study primarily focused on external threats associated with malicious failed login attempts. Future research could expand the scope to include insider threats and investigate potential links between internal and external threat actors.
- Causality: The relationships observed in the study are correlational and do not necessarily imply causality. Future research could employ experimental or longitudinal designs to better understand the causal relationships between malicious failed login attempts, public data breaches, and TTPs.
- Mitigation strategies: The research focused on analyzing and understanding the relationships between malicious failed login attempts, public data breaches, and TTPs, rather than proposing specific mitigation strategies. Future research could build on these findings to develop more targeted and effective cybersecurity defenses.
5. Conclusions
5.1. Summary of Main Findings
- A significant relationship exists between malicious failed login attempts in M365 tenants and known public data breaches or compromised email addresses.
- Digital forensics techniques effectively analyze M365 security logs, identifying patterns and trends in malicious failed login attempts linked to public data breaches or compromised email addresses.
- APT data integration enhances the detection of potential sources of malicious failed login attempts in M365 tenants and informs the development of proactive cybersecurity strategies.
- The study used association rule mining to reveal patterns within the security logs, highlighting the frequent co-occurrence of specific TTPs employed by malicious actors.
- Top association rules revealed in the study show strong relationships between multiple combinations of the identified TTPs. Security teams can use this information to identify patterns and trends in malicious login attempts and develop targeted mitigation strategies.
- Correlation analysis demonstrated the potential of using breach and APT data to detect potential sources of malicious failed login attempts and inform proactive cybersecurity strategy development. Significant correlations were found between different breaches.
- Cluster analysis identified distinct user ID clusters with varying risk levels, helping organizations prioritize defenses and allocate resources against relevant threats.
5.2. Contributions to the Field
- Determining a significant relationship exists between malicious failed login attempts in M365 tenants and known public data breaches or compromised email addresses.
- Demonstrating the effectiveness of digital forensics techniques in analyzing M365 security logs and identifying malicious login attempt patterns.
- Providing insights into the TTPs employed by threat actors in M365 cyber-attacks.
5.3. Practical Implications
- Enhanced detection and mitigation of malicious failed login attempts by leveraging digital forensics techniques.
- Improved understanding of the threat landscape, enabling organizations to adopt a proactive stance toward cybersecurity.
- Targeted allocation of resources and prioritization of defenses against the most relevant threats based on the identified patterns and trends.
5.4. Potential areas for future research include:
- Examining the role of artificial intelligence and automation in enhancing the analysis of M365 security logs.
- Exploring the impact of new cybersecurity policies, regulations, or industry standards on mitigating M365 cyber-attacks and developing proactive cybersecurity strategies.
5.5. Regarding Future Research Directions
- Cybercriminal Psychology: A more profound investigation into the psyche of cybercriminals could reveal their motivations, decision-making patterns, and behavioral tendencies. Understanding these psychological aspects could potentially improve predictive capabilities and inform more effective preventative measures against future attacks.
- Broadening the Analysis Scope: This includes extending the exploration to other cloud services and platforms, aiming to gather a more comprehensive understanding of adversarial behavior patterns and TTPs in various digital environments.
- Longitudinal Data Analysis: This approach involves analyzing data across extended periods to uncover evolving trends and shifts in threat actor tactics. The insights gathered would enrich the understanding of the constantly transforming cyber threat landscape.
- Delving into Mitigation Strategies: A deeper investigation into the efficacy of mitigation strategies against the identified TTPs can provide actionable recommendations for organizations, helping fortify their cybersecurity defenses.
- Experimental Research: Research involving controlled experiments or simulations could be beneficial to evaluate the effectiveness of diverse countermeasures and their impact on diminishing the risk of successful cyber-attacks.
- Process Modeling: Efforts could be directed towards creating a more systematic and replicable description of the process that leads to data breaches. The challenge in this endeavor arises from the varying methodologies employed by cybercriminals. However, developing such a process model could offer valuable insights into the dynamics of these cyber-attacks, subject to the data's limitations.
5.6. Final Thoughts
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
- Demographic Distribution Summary of Known Data Breaches
- Finance and Insurance: 35%
- Healthcare: 22%
- Technology: 16%
- Retail: 12%
- Manufacturing: 10%
- Other industries: 5%
- Business and Data Services: Adapt, Apollo, B2B USA Businesses, Data-Leads, Elasticsearch Instance of Sales Leads on AWS, Exactis, Factual, Lead Hunter, NetProspex, Verifications.io, Whitepages
- Technology Platforms: Adobe, Animoto, Bitly, Canva, Chegg, Disqus, Dropbox, Edmodo, Emotet, Epik, LinkedIn, LiveAuctioneers, LiveJournal, Modern Business Solutions, mSpy, MyFitnessPal, Nitro, QuestionPro, ShareThis, SlideTeam, Stratfor, Ticketfly, Twitter, Zomato, Zynga
- Retail and E-commerce: Bonobos, CafePress, Covve, Drizly, EatStreet, Evite, Fling, Gaadi, Gravatar, HauteLook, Houzz, Justdate.com, Minted, MMG Fusion, River City Media Spam List
- Automotive: Audi
- Gaming and Entertainment: ArmorGames, Digimon, Forbes, Straffic, Zynga
- Adult Content: Fling
- Health and Fitness: MyFitnessPal
- Education: Chegg, Edmodo
- Social Media and Networking: Disqus, Gravatar, LinkedIn, LiveJournal, Twitter
- Food and Beverage: Drizly, EatStreet, Zomato
- 2011: Fling, Stratfor, LinkedIn
- 2012: Adobe, Dropbox, Disqus, LinkedIn, Twitter
- 2013: Adobe, Gravatar
- 2014: Bitly, Digimon, Forbes, LiveJournal
- 2015: Gaadi, mSpy
- 2016: Anti Public Combo List, Exploit.In, NetProspex, Lead Hunter, Modern Business Solutions
- 2017: Edmodo, Factual, Onliner Spambot, River City Media Spam List, Trik Spam Botnet, Zomato
- 2018: Adapt, Animoto, Apollo, Bitly, Chegg, Covve, Dropbox, HauteLook, Houzz, MyFitnessPal, Straffic
- 2019: CafePress, Canva, EatStreet, Elasticsearch Instance of Sales Leads on AWS, Evite, Exactis, LiveAuctioneers, River City Media Spam List, ShareThis, Verifications.io, Whitepages
- 2020: Audi, Bonobos, Covve, Data Enrichment Exposure From PDL Customer, Drizly, HauteLook, LiveAuctioneers, Nitro, ParkMobile
- 2021: B2B USA Businesses, Data-Leads, Epik, MeetMindful, Minted, MMG Fusion, QuestionPro, SlideTeam
- Size of Data Breaches:
- Largest breaches (over 100 million records): Adobe, Canva, Collection #1, Evite, Exactis, LinkedIn, River City Media Spam List, Verifications.io
- Medium-sized breaches (10 million - 100 million records): Adapt, Apollo, Bitly, CafePress, Chegg, Disqus, Dropbox, Edmodo, Houzz, MyFitnessPal, NetProspex, Nitro, ParkMobile, ShareThis, Zomato, Zynga
- Smaller breaches (1 million - 10 million records): Animoto, Audi, Bonobos, Covve, Data-Leads, Drizly, EatStreet, Emotet, Epik, Factual, Fling, Forbes, Gaadi, Gravatar, HauteLook, Lead Hunter, LiveAuctioneers, LiveJournal, mSpy, Minted, MMG Fusion, Modern Business Solutions, QuestionPro, SlideTeam, Stratfor, Ticketfly, Twitter
- Email addresses
- Hashed and plaintext passwords
- Usernames
- Names
- Physical addresses
- Phone numbers
- Date of birth
- Social media profiles
- Personal preferences
- Payment information
- Health and fitness data
- Phishing campaigns
- Social engineering
- Exploiting unpatched vulnerabilities
- Credential stuffing
- Password spraying
- Brute force attacks
- SQL injection
- Malware infections
- Third-party service compromises
- Insider threats
- Advanced Persistent Threats (APTs)
- Supply chain attacks
Appendix B
- Adapt: A business data provider suffered a data breach in 2018, exposing approximately 9.3 million records, including email addresses, personal information, and business data.
- Adobe: In 2013, Adobe experienced a significant data breach, compromising about 153 million user records, including email addresses, passwords, and password hints.
- Animoto: A video creation platform breached in 2018, compromising 25 million user records, including email addresses and hashed passwords.
- Anti Public Combo List: A compilation of data breaches discovered in 2016, compromising over 458 million email addresses, usernames, and plaintext passwords.
- Apollo: A sales engagement platform suffered a data breach in 2018, compromising 125 million records, including email addresses, names, and job titles.
- Audi: A 2020 data breach at Audi and Volkswagen impacted 3.3 million customers, exposing email addresses, phone numbers, and vehicle identification numbers (VINs).
- B2B USA Businesses: In 2021, a database containing 63 million records from various B2B USA companies was leaked, including email addresses and other personal information.
- Bitly: The URL shortening service experienced a data breach in 2014, leading to the compromise of email addresses, encrypted passwords, and API keys.
- Bonobos: The men's clothing retailer suffered a data breach in 2020, exposing approximately 70GB of data, including 7 million email addresses and other personal information.
- CafePress: In 2019, CafePress experienced a data breach, compromising 23 million user records, including email addresses and password hashes.
- Canva: A graphic design platform breached in 2019, exposing 137 million user records, including email addresses and bcrypt-hashed passwords.
- Chegg: An education technology company suffered a data breach in 2018, compromising 40 million records, including email addresses, usernames, and hashed passwords.
- Cit0day: In 2020, a collection of 23,000 breached databases was leaked, containing billions of records, including email addresses, usernames, and plaintext passwords.
- Collection #1: A massive data breach compilation discovered in 2019, consisting of over 770 million unique email addresses and over 21 million unique passwords.
- CouponMom-ArmorGames: A data breach in 2020 affected both CouponMom and ArmorGames, compromising 11 million records, including email addresses and plaintext passwords.
- Covve: A data breach in 2020 exposed the records of 22 million users, including email addresses, names, phone numbers, and LinkedIn profiles.
- Data Enrichment Exposure From PDL Customer: In 2019, a security lapse at People Data Labs (PDL) exposed 622 million records, including email addresses and other personal information.
- Data-Leads: In 2021, a data breach compromised 63 million records from various B2B companies, including email addresses, names, and phone numbers.
- Digimon: An unofficial forum for Digimon fans was hacked in 2014, compromising 4.9 million records, including email addresses, usernames, and IP addresses.
- Disqus: A blog comment hosting service breached in 2012, resulting in the exposure of 17.5 million user records, including email addresses, usernames, and hashed passwords.
- Drizly: An alcohol delivery platform experienced a data breach in 2020, compromising 2.5 million user records, including email addresses, hashed passwords, and personal information.
- Dropbox: In 2012, Dropbox suffered a data breach, resulting in the exposure of 68 million user records, including email addresses and hashed passwords.
- EatStreet: A food delivery platform breached in 2019, compromising 6 million user records, including email addresses, hashed passwords, and personal information.
- Edmodo: An educational platform experienced a data breach in 2017, exposing 77 million user records, including email addresses, usernames, and bcrypt-hashed passwords.
- Elasticsearch Instance of Sales Leads on AWS: In 2019, an unprotected Elasticsearch instance exposed 60 million sales leads, including email addresses and other personal information.
- Emotet: A notorious botnet and malware family involved in multiple phishing campaigns targeting email addresses, banking credentials, and other personal information.
- Epik: A domain registrar and web hosting company suffered a data breach in 2021, compromising email addresses, account credentials, and customer records.
- Evite: A social planning and invitation platform breached in 2019, leading to the exposure of 101 million user records, including email addresses, plaintext passwords, and personal information.
- Exactis: A data aggregator experienced a data breach in 2018, compromising 340 million records, including email addresses, phone numbers, and other personal information.
- Exploit.In: A forum for hackers, which in 2016 released a database containing 593 million email addresses and plaintext passwords from multiple data breaches.
- Factual: A location data company suffered a data breach in 2017, compromising 2.5 million user records, including email addresses, hashed passwords, and personal information.
- Fling: An adult dating website experienced a data breach in 2011, exposing 40 million user records, including email addresses, usernames, and plaintext passwords.
- Forbes: The media company suffered a data breach in 2014, compromising 1 million user records, including email addresses, usernames, and hashed passwords.
- Gaadi: An Indian car research platform experienced a data breach in 2015, exposing 2.2 million user records, including email addresses, usernames, and hashed passwords.
- Gravatar: In 2013, a security researcher discovered a vulnerability in Gravatar that could potentially expose user email addresses, but no data breach was reported.
- HauteLook: A fashion retailer suffered a data breach in 2018, compromising 28 million user records, including email addresses, bcrypt-hashed passwords, and personal information.
- Houzz: A home design platform experienced a data breach in 2018, exposing 48 million user records, including email addresses, usernames, and hashed passwords.
- Justdate.com: A dating platform suffered a data breach in 2017, compromising 1.7 million user records, including email addresses, bcrypt-hashed passwords, and personal information.
- Kayo.moe Credential Stuffing List: In 2018, a collection of 42.5 million email addresses and plaintext passwords from various sources was discovered, potentially used for credential stuffing attacks.
- Lead Hunter: A data breach in 2016 affected the sales lead generation platform, compromising 68 million user records, including email addresses, hashed passwords, and personal information.
- LinkedIn: In 2012, LinkedIn experienced a data breach, compromising 165 million user records, including email addresses and hashed passwords. A separate incident in 2021 involved scraped data from around 500 million LinkedIn users, including email addresses, though this was not a direct breach of their systems.
- LiveAuctioneers: An online auction platform breached in 2020, leading to the exposure of 3.4 million user records, including email addresses, hashed passwords, and personal information.
- LiveJournal: A blogging platform experienced a data breach in 2014, compromising 26 million user records, including email addresses, plaintext passwords, and usernames.
- MeetMindful: A dating platform suffered a data breach in 2021, exposing 2.3 million user records, including email addresses, names, and location data.
- Minted: An online marketplace for independent artists experienced a data breach in 2020, compromising 5 million user records, including email addresses, hashed passwords, and personal information.
- MMG Fusion: A dental marketing software provider suffered a data breach in 2021, exposing 2.6 million user records, including email addresses and other personal information.
- Modern Business Solutions: A data management and monetization company experienced a data breach in 2016, compromising 58 million user records, including email addresses, IP addresses, and personal information.
- mSpy: A mobile monitoring and parental control software provider suffered a data breach in 2015, exposing 4 million user records, including email addresses, encrypted passwords, and payment details.
- MyFitnessPal: A fitness and nutrition app experienced a data breach in 2018, compromising 150 million user records, including email addresses, hashed passwords, and usernames.
- NetGalley: An online book review platform suffered a data breach in 2020, exposing email addresses, names, usernames, and hashed passwords.
- NetProspex: A sales lead generation company experienced a data breach in 2016, compromising 33 million user records, including email addresses, names, job titles, and company information.
- Nitro: A document management and productivity software provider suffered a data breach in 2020, exposing 70 million user records, including email addresses, names, and hashed passwords.
- Onliner Spambot: In 2017, a spambot campaign known as Onliner Spambot was discovered, compromising 711 million email addresses, along with usernames and passwords, used for sending spam and infecting systems with malware.
- ParkMobile: A parking app experienced a data breach in 2021, compromising 21 million user records, including email addresses, names, and hashed passwords.
- QuestionPro: An online survey platform suffered a data breach in 2021, exposing 198 million user records, including email addresses, names, and hashed passwords.
- River City Media Spam List: In 2017, a data breach involving River City Media, a spamming organization, exposed 1.34 billion email addresses, names, and other personal information.
- ShareThis: A social sharing platform experienced a data breach in 2018, compromising 41 million user records, including email addresses, hashed passwords, and usernames.
- SlideTeam: A presentation template provider suffered a data breach in 2021, exposing 1.4 million user records, including email addresses, names, and bcrypt-hashed passwords.
- Straffic: A botnet involved in various phishing campaigns was discovered in 2021, potentially compromising millions of email addresses, banking credentials, and other information.
- Stratfor: A global intelligence company experienced a data breach in 2011, compromising 860,000 user records, including email addresses, usernames, and hashed passwords.
- Ticketfly: An event ticketing platform suffered a data breach in 2018, exposing 27 million user records, including email addresses, names, and phone numbers.
- Trik Spam Botnet: A malware botnet discovered in 2017, compromising 43 million email addresses and plaintext passwords, used for sending spam and infecting systems with additional malware.
- Twitter: In 2018, Twitter advised its 330 million users to change their passwords due to a bug that stored plaintext passwords in an internal log. However, there was no confirmed data breach or unauthorized access.
- Unverified Data Source: A collection of compromised records discovered in 2019 containing over 62 million email addresses and plaintext passwords from various sources, with no specific attribution to a single breach.
- Verifications.io: A data validation service experienced a data breach in 2019, exposing 763 million records, including email addresses, phone numbers, and other personal information.
- Whitepages: In 2019, an unprotected Elasticsearch database exposed 22 million Whitepages records, including email addresses, names, and phone numbers. However, this was not a direct breach of Whitepages systems.
- You've Been Scraped: These incidents refer to data scraping, where publicly available information is collected from websites without authorization. Email addresses are often a target in these situations, but specific breaches are difficult to pinpoint.
- Zomato: An Indian food delivery platform suffered a data breach in 2017, compromising 17 million user records, including email addresses and hashed passwords.
- Zynga: A mobile gaming company experienced a data breach in 2019, exposing 218 million user records, including email addresses, usernames, and hashed passwords.
Appendix C
- Advanced Persistent Threats (APTs) Groups Associated with the Known Public Data Breaches
- LinkedIn (2012): The LinkedIn data breach, where approximately 165 million user accounts were compromised, has been attributed to a Russia-based hacker group known as APT28 or Fancy Bear. They are believed to have ties to the Russian government.
- Twitter (2013): The Twitter breach, in which around 45,000 accounts were compromised, has been suspected to be the work of the Syrian Electronic Army (SEA), an APT group with connections to the Syrian government.
- Dropbox (2012): The Dropbox breach, which affected nearly 68 million users, has been attributed to a group known as APT29 or Cozy Bear. This group is also believed to have ties to the Russian government.
- Emotet (2014-present): Emotet is a sophisticated malware strain and botnet known for distributing banking Trojans and ransomware. Although not directly attributed to a specific nation-state APT, it has been linked to various cybercrime groups and is considered an advanced threat due to its persistence and evolving nature.
- Stratfor (2011): The breach of the global intelligence company Stratfor, where around 860,000 users' data was compromised, was claimed by the hacktivist group Anonymous. However, some cybersecurity researchers have suggested that the attack might have been supported by a nation-state APT group due to the level of sophistication.
- Collection #1 (2019): While direct attribution is not available, the sheer scale of this massive data breach compilation suggests the involvement of advanced threat actors. It is possible that multiple APT groups and cybercriminal organizations contributed to or took advantage of the compromised data.
- Adobe (2013): The breach is suspected to be the work of an APT group called "PawnStorm" (also known as APT28 or Fancy Bear), which has been linked to Russian intelligence agencies. This group is notorious for targeting high-profile organizations and using spear-phishing campaigns to infiltrate networks.
- mSpy (2015): The breach was initially attributed to an unknown hacking group. However, further analysis linked the breach to the Chinese APT group called "APT3" or "Buckeye." This group is known for targeting high-profile organizations in various industries, primarily to gain intellectual property and sensitive information.
- Onliner Spambot (2017): While not directly linked to a specific APT group, it can be associated with advanced persistent cybercriminal campaigns. These campaigns often involve the use of large-scale spamming operations and the distribution of sophisticated malware such as banking Trojans and ransomware.
- Cit0day (2020): A collection of 23,000 breached databases containing billions of records. Although it is difficult to attribute to a specific APT group, the scale implies the involvement of multiple hacking groups. Various TTPs, such as phishing, credential stuffing, and exploiting web vulnerabilities, were likely employed in the breaches.
- SolarWinds (2020): This high-profile supply chain attack compromised numerous government and private organizations. The breach has been attributed to a Russian APT group known as APT29, also referred to as Cozy Bear or The Dukes. They are believed to have ties to Russia's foreign intelligence service, the SVR.
- Equifax (2017): The massive breach of the credit reporting agency, which affected around 147 million users, has been attributed to the Chinese APT group called APT10 or Menupass. The group is known for targeting large organizations and is believed to have ties to China's Ministry of State Security.
- WannaCry (2017): This widespread ransomware attack affected organizations and users globally. The attack has been attributed to the North Korean APT group known as Lazarus Group or Hidden Cobra. They are believed to be linked to the North Korean government and have been involved in several high-profile cyberattacks.
- NotPetya (2017): This destructive malware attack targeted organizations primarily in Ukraine but also affected global businesses. The NotPetya attack has been attributed to the Russian APT group Sandworm Team, also known as Voodoo Bear or TeleBots. They are believed to be connected to Russia's military intelligence agency, the GRU.
- Zomato (2017): Although direct attribution is not available, the scale and nature of the attack suggest that an advanced cybercriminal organization or APT group may have been involved. The breach resulted in the compromise of 17 million user records, including email addresses and hashed passwords.
- Zynga (2019): The breach affecting 218 million user records has been attributed to a well-known cybercriminal known as Gnosticplayers. While not an APT group, Gnosticplayers is responsible for a series of large-scale data breaches, indicating a high level of sophistication and persistence in their operations.
- MyFitnessPal (2018): The breach of MyFitnessPal, which compromised 150 million user records, was attributed to a group of prolific hackers known as "Magecart." Although typically known for its attacks on e-commerce sites, the group's scale and sophistication suggest it might operate at a level comparable to a nation-state APT.
- Houzz (2018): Houzz's data breach exposed 48 million user records. While no specific APT group has been linked to this incident, the scale and nature of the compromised data suggest the involvement of a highly organized and possibly state-sponsored group.
- Verifications.io (2019): This incident exposed 763 million records, making it one of the largest publicly known data exposures. While the breach has not been linked to a specific APT group, the scale and type of data suggest the involvement of advanced and persistent threat actors.
- Ticketfly (2018): While no specific APT group was attributed to the breach, the nature of the attack (a defacement of the website coupled with data exfiltration) suggests the involvement of a sophisticated threat actor, possibly with the characteristics of an APT.
Appendix D
- Different types of Microsoft 365 accounts observed in the study.
- UserMailbox: Yes, users can log in directly to their UserMailbox. This is the primary account type used by individuals to access their emails, calendar, contacts, and other Microsoft 365 services.
- SharedMailbox: No, users cannot log in directly to a shared mailbox. They need to have their own individual UserMailbox and be granted access to the shared mailbox. They can then access it via their own account.
- GAL Contact: No, users cannot log in directly to a GAL (Global Address List) Contact. These are just contact entries in the address book and do not have any login credentials associated with them.
- Room Mailbox: No, users cannot log in directly to a Room Mailbox. A room mailbox is a resource mailbox that represents a meeting space, like a conference room. Users can book the room through their own UserMailbox but cannot access the room mailbox itself.
- Health Mailbox: No, users cannot log in directly to a Health Mailbox. These mailboxes are used by Microsoft Exchange Server to monitor and test the health of the server. They are not meant for direct user access.
- Team Mailbox: No, users cannot log in directly to a Team Mailbox. A team mailbox is associated with a Microsoft Teams team and its channels. Users need to have their own individual UserMailbox and be a member of the relevant team to access the team mailbox.
- Equipment Mailbox: No, users cannot log in directly to an Equipment Mailbox. An equipment mailbox is a resource mailbox that represents a piece of equipment, like a projector or a company car. Users can book the equipment through their own UserMailbox but cannot access the equipment mailbox itself.
- System.Object: This is not a type of Microsoft 365 mailbox account. It appears to be a generic object reference in a programming language or script, and therefore cannot be logged into directly.
- No Mailbox: No, users cannot log in directly to a "No Mailbox" account, as it indicates that there is no mailbox associated with the user or object in question. Without a mailbox, there is no account for a user to log into.
- NoUser: No, users cannot log in directly to a "NoUser" account. This term typically refers to an account or object that has not been assigned a user or that does not have a mailbox associated with it. There is no account to log into in this case.
- Sync: This term is not a specific type of Microsoft 365 mailbox account. It might refer to the synchronization process between on-premises Active Directory and Azure Active Directory, or other data synchronization scenarios. As such, users cannot log into a "Sync" account, as it does not represent a mailbox or user account.
- Alias: No, users cannot log in directly to an Alias. An alias is an additional email address associated with a UserMailbox that can be used to send and receive email. It is not a separate account and cannot be accessed independently. Users need to log in to their primary UserMailbox to access emails sent to their alias.
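The classification above can be encoded as a simple lookup when failed login records are joined with account information. The following is a minimal sketch, assuming the account type labels appear in the exported data exactly as listed here; the function names and the sample records are hypothetical and are not taken from the study.

```python
# Direct-login capability per account type, transcribed from the list above.
DIRECT_LOGIN = {
    "UserMailbox": True,
    "SharedMailbox": False,
    "GAL Contact": False,
    "Room Mailbox": False,
    "Health Mailbox": False,
    "Team Mailbox": False,
    "Alias": False,
    "Equipment Mailbox": False,
    "System.Object": False,
    "No Mailbox": False,
    "NoUser": False,
    "Sync": False,
}


def can_log_in_directly(mailbox_type):
    """Return True only for types the list marks as directly accessible.

    Unknown labels are treated conservatively as not directly accessible
    (an assumption of this sketch).
    """
    return DIRECT_LOGIN.get(mailbox_type.strip(), False)


def split_by_login_capability(records):
    """Partition (user_id, mailbox_type) pairs into two lists of user IDs."""
    direct, other = [], []
    for user_id, mailbox_type in records:
        target = direct if can_log_in_directly(mailbox_type) else other
        target.append(user_id)
    return direct, other


# Hypothetical records for illustration only.
sample = [
    ("alice@example.org", "UserMailbox"),
    ("helpdesk@example.org", "SharedMailbox"),
    ("room-101@example.org", "Room Mailbox"),
]
direct_ids, other_ids = split_by_login_capability(sample)
```

Failed login attempts against IDs in the second list target accounts that, per the list above, cannot be signed into directly, which may be worth separating from attempts against regular user mailboxes.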
References
- Carlson, A. (2019). Microsoft 365 and Exchange Server Hybrid Forensics (Doctoral dissertation, Utica College). ProQuest Dissertations Publishing. (27670117).
- El Jabri, C., Frappier, M., Tardif, P.-M., Lepine, G., & Boisvert, G. (2021). Statistical approach for cloud security: Microsoft Office 365 audit logs case study. In The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings (pp. 1-6). Piscataway.
- Back, S., & LaPrade, J. The future of cybercrime prevention strategies: Human factors and a holistic approach to cyber intelligence. International Journal of Cybersecurity Intelligence and Cybercrime 2019, 2, 1–4.
- Cornejo, G. A. (2021). Human Errors in Data Breaches: An Exploratory Configurational Analysis. ProQuest Dissertations Publishing, Nova Southeastern University. (28775912).
- Huang, T.-K. (2013). Understanding online malicious behavior: Social malware and email spam (Doctoral dissertation, University of California, Riverside). ProQuest Dissertations Publishing. (3600570).
- Bhardwaj, A., Kaushik, K., Alomari, A., Alsirhani, A., Alshahrani, M. M., et al. BTH: Behavior-Based Structured Threat Hunting Framework to Analyze and Detect Advanced Adversaries. Electronics 2022, 11, 2992.
- Derbyshire, R. J. Anticipating Adversary Cost: Bridging the Threat-Vulnerability Gap in Cyber Risk Assessment; ProQuest Dissertations Publishing, Lancaster University (United Kingdom), 2022.
- Mavroeidis, V., & Jøsang, A. (2021, March 28). Data-Driven Threat Hunting Using Sysmon. arXiv.org. https://arxiv.org/abs/2103.14903.
- Montasari, R. The Comprehensive Digital Forensic Investigation Process Model (CDFIPM) for Digital Forensic Practice; ProQuest Dissertations Publishing, University of Derby (United Kingdom). (28460690), 2021.
- Amin, R. M. Detecting targeted malicious email through supervised classification of persistent threat and recipient-oriented features. (Doctoral dissertation, The George Washington University). ProQuest Dissertations Publishing. (3428188), 2010.
- Agrawal, G., Deng, Y., Park, J., Liu, H., & Chen, Y.-C. Building Knowledge Graphs from Unstructured Texts: Applications and Impact Analyses in Cybersecurity Education. Information 2022, 13, 526.
- Mouzakitis, S., & Askounis, D. Assessing MITRE ATT&CK risk using a cyber-security culture framework. Sensors 2021, 21, 3267.
- Serketzis, N., Katos, V., Ilioudis, C., Baltatzis, D., & Pangalos, G. J. Actionable threat intelligence for digital forensics readiness. Information and Computer Security 2019, 27(2), 273–291.
- Ferguson-Walter, K. J., Gutzwiller, R. S., Scott, D. D., & Johnson, C. J. (2021). Oppositional human factors in cybersecurity: A preliminary analysis of affective states. In Proceedings of the Institute of Electrical and Electronics Engineers (IEEE) Conference (pp. 153-158).
- Greitzer, F. L., & Hohimer, R. E. Modeling human behavior to anticipate insider attacks. Journal of Strategic Security 2011, 4(2), 25–48.
- Ramlo, S., & Nicholas, J. B. The human factor: assessing individuals' perceptions related to cybersecurity. Information and Computer Security 2021, 25, 350–364.
- Rohan, R., Funilkul, S., Pal, D., & Chutimaskul, W. (2021). Understanding of Human Factors in Cybersecurity: A Systematic Literature Review. International Conference on Computational Performance Evaluation (ComPE) (pp. 133-140). IEEE.
- Jeong, J., Mihelcic, J., Oliver, G., & Rudolph, C. Towards an Improved Understanding of Human Factors in Cybersecurity. IEEE 5th International Conference on Collaboration and Internet Computing (CIC) (pp. n.a.). IEEE.
- Hultquist, K. E. (2011). An Analysis of the Impact of Cyber Threats Upon 21st Century Business. (Doctoral Dissertation), The College of St. Scholastica, ProQuest Dissertations Publishing. (1503100).
- Liu, K., Wang, F., Ding, Z., Liang, S., Yu, Z., et al. Recent Progress of Using Knowledge Graph for Cybersecurity. Electronics 2022, 11, 2287.
- Nisioti, A., Loukas, G., Rass, S., & Panaousis, E. Game-Theoretic Decision Support for Cyber Forensic Investigations. Sensors 2021, 21, 5300.
- Triplett, W. J. Addressing Human Factors in Cybersecurity Leadership. Journal of Cybersecurity and Privacy 2022, 2, 573.
- Salik, H. (2022). Offensive Cyber Operations: Failure to Dissuade Nation-State Adversaries in Cyberspace. ProQuest Dissertations Publishing, University of the Cumberlands. (29397595).
- Rahman, T., Rohan, R., Pal, D., & Kanthamanon, P. (2021). Human Factors in Cybersecurity: A Scoping Review. The 12th International Conference on Advances in Information Technology (IAIT2021), Bangkok, Thailand, June 29–July 01, 2021; pp. 1–11.
- Sutter, O. W. (2020). The cyber profile: Determining human behavior through cyber-actions. (Doctoral dissertation). Capitol Technology University, ProQuest Dissertations Publishing. (29257172).
- Sutter, O. W. (2020). The cyber profile: Determining human behavior through cyber-actions. (Doctoral dissertation). Capitol Technology University, ProQuest Dissertations Publishing. (29257172). EAI Endorsed Transactions on Security and Safety 2013, 1.
- Elastic. Filebeat module: o365. Elastic.co. Available online: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-o365.html (accessed on 31 May 2023).
- Wells, J. A., LaFon, D. S., & Gratian, M. Assessing the Credibility of Cyber Adversaries. International Journal of Cybersecurity Intelligence & Cybercrime 2021, 4, 3–24.
- Dalal, R. S., Howard, D. J., Bennett, R. J., Posey, C., Zaccaro, S. J., et al. Organizational science and cybersecurity: abundant opportunities for research at the interface. Journal of Business and Psychology 2022, 37, 1–29.
- Kioskli, K., & Polemi, N. Psychosocial approach to cyber threat intelligence. International Journal of Chaotic Computing 2020, 7(1), 159–165.
- Singh, T. (2021). The Role of Stress among Cybersecurity Professionals (Doctoral dissertation, The University of Alabama). ProQuest Dissertations Publishing. (28546079).
- Clapper, J., Lettre, M., & Rogers, M. S. (2017, January 31). Foreign Cyber Threats to the United States. Hampton Roads International Security Quarterly, 1.
- McCall, G. C. Jr. (2022). Exploring a Cyber Threat Intelligence (CTI) Approach in the Thwarting of Adversary Attacks: An Exploratory Case Study (Doctoral dissertation, Northcentral University). ProQuest Dissertations Publishing. (28968146).
- Pangsuban, P., Nilsook, P., & Wannapiroon, P. Real-time Risk Assessment for Information System with CICIDS2017 Dataset Using Machine Learning. International Journal of Machine Learning and Computing 2020, 10, 538–543.
- Scott, J., & Kyobe, M. (2021). Trends in Cybersecurity Management Issues Related to Human Behaviour and Machine Learning. In International Conference on Electrical, Computer and Energy Technologies (ICECET) (pp. n.a.). IEEE.
- Parsons, K., McCormac, A., Butavicius, M., & Ferguson, L. (2010). Human factors and information security: Individual, culture and security environment. Defense Science and Technology Organization, Commonwealth of Australia.
| Section | Technique | Description |
|---|---|---|
| 3.2.1.1 | Have I Been Pwned | Allows users to check whether their email address has been involved in known data breaches. Enter the email address, and the site will advise if it has been compromised. |
| 3.2.1.2 | BreachAlarm | Monitors the internet for stolen data that includes email addresses and sends an email alert if the email address is found in any compromised data. |
| 3.2.1.3 | Firefox Monitor | Allows users to check whether their email address has been involved in known data breaches. Users can sign up for alerts if their email address is found in a new data breach. Mozilla provides this service. |
| 3.2.1.4 | Identity Leak Checker | Allows users to check whether their email address has been involved in known data breaches. Users can also check for compromised usernames and passwords. The Hasso Plattner Institute provides this free service. |
| 3.2.1.5 | DeHashed | Allows individuals to search for compromised email addresses, usernames, and passwords. Users can sign up for alerts if their email address is found in a new data breach. |
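For a list of organizational addresses, the manual lookup described for Have I Been Pwned can in principle be scripted. The sketch below only builds the request and parses a response body; it does not send anything. It assumes the public Have I Been Pwned v3 `breachedaccount` endpoint, which requires an API key in the `hibp-api-key` header and a `user-agent` header. The function names, the user agent string, and the sample payload are hypothetical, and the study does not state how its lookups were automated.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint of the public HIBP v3 API (an API key is required).
HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_hibp_request(email, api_key, user_agent="breach-exposure-sketch"):
    """Build (but do not send) a breached-account lookup for one address."""
    url = HIBP_BASE + quote(email.strip(), safe="") + "?truncateResponse=false"
    headers = {"hibp-api-key": api_key, "user-agent": user_agent}
    return Request(url, headers=headers)


def breach_names(response_body):
    """Extract the sorted breach names from a JSON array of breach objects."""
    return sorted(item["Name"] for item in json.loads(response_body))


# Hypothetical usage; the key and the payload are placeholders.
request = build_hibp_request("user@example.org", api_key="YOUR-API-KEY")
names = breach_names('[{"Name": "Zynga"}, {"Name": "Adobe"}]')
```

A real client would also need to respect the service's rate limits and treat a "not found" response as "no known breach" rather than as an error.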
| Section | Analysis Technique | Description |
|---|---|---|
| 2.4.1. | Correlation Analysis | Spearman's rank correlation was used to measure the strength and direction of the relationship between breaches. This analysis helped identify significant correlations between breach pairs, highlighting shared TTPs or overlapping threat actor groups [28,20]. |
| 2.4.2. | Clustering Analysis | K-means clustering was employed to group user IDs based on their similarity in terms of failed login attempts, geographical distribution, and account statuses. This approach helped identify variations in user ID distribution among clusters, indicating differing risks of compromise [18,30]. |
| 2.4.3. | Association Rule Mining | The Apriori algorithm was used to discover interesting relationships between breach pairs and TTPs. Metrics like support, confidence, lift, leverage, and Zhang's metric were employed to evaluate the strength of these relationships. This analysis uncovered patterns within security logs, such as the frequent co-occurrence of specific TTPs, which can be used to better understand the tactics employed by malicious actors and to develop counter-strategies [8,31]. |
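The clustering step in 2.4.2 can be illustrated with a minimal sketch of Lloyd's algorithm for K-means in plain Python. The feature columns used here (failed login count, number of distinct source countries, and an account-enabled flag) and the sample values are hypothetical. The study does not list its exact feature encoding, its value of k, or the library it used, and a real analysis would normally scale the features before clustering.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, labels


# Hypothetical rows: (failed_logins, distinct_countries, account_enabled)
users = [(2, 1, 1), (3, 1, 1), (1, 1, 1), (40, 6, 0), (45, 7, 0), (50, 5, 0)]
centroids, labels = kmeans(users, k=2)
```

In this toy data the first three rows and the last three rows end up in different clusters, which mirrors the idea of separating low-exposure accounts from heavily targeted ones.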
| Industry | Percentage |
|---|---|
| Finance & Insurance | 35% |
| Healthcare | 22% |
| Technology | 16% |
| Retail | 12% |
| Manufacturing | 10% |
| Other Industries | 5% |
| Pair | Correlation | P-value |
|---|---|---|
| LiveAuctioneers & Eye4Fraud | 1 | 0 |
| LiveAuctioneers & Drizly | 1 | 0 |
| Eye4Fraud & Drizly | 1 | 0 |
| MeetMindful & Houzz | 0.989842782 | 0 |
| LiveAuctioneers & EatStreet | 0.978510047 | 0 |
| Eye4Fraud & EatStreet | 0.978510047 | 0 |
| EatStreet & Drizly | 0.978510047 | 0 |
| NetGalley & LeadHunter | 0.893865598 | 0 |
| DataEnrichmentExposureFromPDLCustomer & Exactis | 0.805917369 | 0 |
| Verificationsio & Exactis | 0.804184683 | 0 |
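A Spearman coefficient of 1, as reported for the first three pairs, means that the two variables rank every observation identically. As an illustration only, the sketch below computes Spearman's rho in plain Python by ranking each vector (tied values receive the average rank) and taking the Pearson correlation of the ranks. It assumes that each breach is encoded as a 0/1 membership vector over the same ordered list of users, which the source does not state. The vectors are hypothetical, and the p-value computation is omitted.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical membership vectors: 1 = user appears in the breach.
breach_a = [1, 1, 0, 0, 1, 0]
breach_b = [1, 1, 0, 0, 1, 0]  # same users as breach_a
breach_c = [1, 0, 0, 0, 1, 1]  # partially overlapping users
rho_ab = spearman(breach_a, breach_b)
rho_ac = spearman(breach_a, breach_c)
```

Under this encoding, two breaches that contain exactly the same set of users yield a coefficient of 1, while partial overlap yields a smaller value.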
| Rank | Antecedents | Consequents | Confidence | Lift | Leverage | Zhang's Metric |
|---|---|---|---|---|---|---|
| 1 | {Exploit.In, Verifications.io} | {Data_Enrichment_Exposure_From_PDL_Customer, Anti_Public_Combo_List} | 0.857143 | 34.675325 | 0.013094 | 0.986682 |
| 2 | {Exploit.In, Data_Enrichment_Exposure_From_PDL_Customer, Verifications.io} | {Anti_Public_Combo_List} | 0.857143 | 31.785714 | 0.013059 | 0.984018 |
| 3 | {Exploit.In, Data_Enrichment_Exposure_From_PDL_Customer} | {Anti_Public_Combo_List, Verifications.io} | 0.857143 | 31.785714 | 0.013059 | 0.984018 |
| 4 | {Data_Enrichment_Exposure_From_PDL_Customer, Anti_Public_Combo_List} | {Exploit.In, Verifications.io} | 0.545455 | 34.675325 | 0.013094 | 0.995776 |
| 5 | {Anti_Public_Combo_List} | {Exploit.In, Data_Enrichment_Exposure_From_PDL_Customer, Verifications.io} | 0.5 | 31.785714 | 0.013059 | 0.995381 |
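Rules such as those in the table above are derived from frequent itemsets. The following is a minimal sketch of the level-wise Apriori search, assuming each transaction is the set of breaches in which one user appears. The function name, the transactions, and the support threshold are hypothetical, and the source does not name the implementation it used.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search.

    Returns {frozenset: support} for every itemset whose support (the
    fraction of transactions containing it) is at least min_support.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for t in transactions for i in t}):
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical per-user breach sets.
user_breaches = [
    {"Exploit.In", "Verifications.io", "Anti_Public_Combo_List"},
    {"Exploit.In", "Verifications.io"},
    {"Exploit.In", "Anti_Public_Combo_List"},
    {"Verifications.io", "Anti_Public_Combo_List"},
    {"Exploit.In", "Verifications.io", "Anti_Public_Combo_List"},
]
itemsets = frequent_itemsets(user_breaches, min_support=0.6)
```

The prune step is what distinguishes Apriori from brute-force enumeration: an itemset is counted only if all of its smaller subsets were already found to be frequent.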
| Parameter | Value |
|---|---|
| Antecedents | 'Exploit.In', 'Verifications.io' |
| Consequents | 'Data_Enrichment_Exposure_From_PDL_Customer', 'Anti_Public_Combo_List' |
| Confidence | 0.857143 |
| Lift | 34.675325 |
| Leverage | 0.013094 |
| Zhang's Metric | 0.986682 |
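The four metrics reported for this rule follow from three supports: that of the antecedent, that of the consequent, and that of both together. The sketch below uses the standard definitions, with Zhang's metric computed as leverage divided by max(s_AC(1 - s_A), s_A(s_C - s_AC)). The raw counts in the example (445 users, 7 matching the antecedent, 11 matching the consequent, 6 matching both) are not given in the source; they are an assumed set of counts chosen because they are consistent with the reported values.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Association-rule metrics for A -> C from raw transaction counts."""
    s_a = n_antecedent / n_total   # support of the antecedent
    s_c = n_consequent / n_total   # support of the consequent
    s_ac = n_both / n_total        # support of antecedent and consequent together
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "zhangs_metric": zhang,
    }


# Assumed counts (not from the source), chosen to match the reported metrics.
metrics = rule_metrics(n_total=445, n_antecedent=7, n_consequent=11, n_both=6)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these assumed counts, the confidence, lift, leverage, and Zhang's metric agree with the tabulated values to six decimal places. A lift far above 1 together with a Zhang's metric close to 1 indicates a strong positive association between the antecedent and consequent breaches.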
| Section | Key Findings | Description |
|---|---|---|
| 3.5.1 | Pattern Recognition Results | Application of pattern recognition techniques (correlation analysis, clustering, association rule mining) revealed significant patterns and vulnerabilities targeted by threat actors, leading to better identification and categorization of threats. |
| 3.5.2 | Demographic Distribution Summary Results | Data breaches affected many industries and sectors, compromising billions of user records. Understanding the common targets and vulnerabilities exploited by threat actors aids in developing proactive measures for high-risk sectors or regions. |
| 3.5.3 | APT Groups and Data Breaches Results | Overview of known APT groups in the context of data breaches, including preferred targets, TTPs, and associations with specific breaches, helping organizations identify potential threats and understand various APT tactics. |
| Section | Recommendations | Description |
|---|---|---|
| 4.2.2.1 | Enhance threat intelligence | Better understand the threat landscape and prepare for potential attacks, focusing on the most active regions. |
| 4.2.2.2 | Prioritize vulnerability management | Address security weaknesses exploited in similar breach pairs. |
| 4.2.2.3 | Develop incident response playbooks | Develop playbooks and procedures based on the correlations, findings, and demographic data for faster detection and containment of breaches. |
| 4.2.2.4 | Increase user awareness | Raise awareness of the risks associated with data breaches and provide targeted training to reduce the likelihood of successful social engineering attacks, especially for regular users (members or active M365 accounts). |
| 4.2.2.5 | Share findings and collaborate | Collaborate with industry peers and information-sharing organizations to collectively improve defensive postures and contribute to a better understanding of the threat landscape. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
