Characterizing Female Workplace Bullying via Social Media

Motivated by the #MeToo movement, this paper explores people’s perception of bullying of females in the workplace. We looked at the #workplacebullying hashtag and found that 1) people were split between acknowledging the prevalence of workplace bullying against females and the view that such bullying simply does not exist and is a nuisance, 2) the tweets showed the existence of psychological effects of cyberbullying, and 3) the tweets showed many intervention techniques that can minimize the effects of such bullying. We further explored the hashtags that recur most often under #workplacebullying and found that the top three were #sexism, #feminism and #equality. Our results showed that these hashtags represent the positive and negative approaches to workplace bullying, i.e. #feminism was mostly used by people who denied that workplace bullying against females exists, while #sexism was mentioned as the prime cause by people who agree that such bullying exists. #equality overwhelmingly comprises techniques to minimize workplace bullying against females.


Motivation and Background
The #MeToo movement drew worldwide attention to the problems females face in terms of sexual harassment and abuse. The social media storm saw people on both sides of the aisle: the debate raged between people supporting the movement's stand against such practices and people who either deny that such a problem exists or consider it greatly exaggerated. Warner et al. [1] reported that despite women representing 50.8% of the US population, earning almost 50% of medical, law and specialized master's degrees and holding 52 percent of management and professional positions, they represent only 19 percent of equity partners, 16 percent of medical school deans, 30 percent of college presidents and 12.5 percent of chief financial officers in the Fortune 500. So the question we ask is: What kind of behavior do females face in their professional lives?
Bullying in the workplace was a subject of traditional research before the advent of social media, and the results show that both males and females have been subject to workplace bullying. Interestingly, though, the literature indicated that females are subject to bullying by both males and females. Since the advent of social computing (using computers for social use), social media has increasingly played a big role in changing perspectives and bringing issues to the front (#MeToo is an example of this phenomenon). The authors in [2] make an interesting point about the change in people's interaction. The huge amount of data available from such platforms and the rise of Big Data have researchers from various walks of life increasingly looking at such data to glean useful information. Work done by [3]-[5] shows the use of data from Twitter in the realm of mental health and medicine. Given the importance of collecting data, most social media providers offer standard APIs that allow researchers and corporations to pull data from social media. Social media platforms provide certain heuristics such as likes, dislikes etc. that can gauge individuals' reactions to a particular post. However, the majority of valuable data is present in the form of unstructured text. Natural Language Processing (NLP) techniques are used to pull and put together relevant text, giving insight into both the context and the relevant meaning. The NLP techniques depend on an existing corpus that can represent a language [8]. Such techniques have been used in various disciplines, such as [6] in political science and [7] in the mental health realm.
The question that comes to mind is: How reliable are insights gleaned from such data? Youyou et al. [8] showed that the profile gleaned from Facebook was more accurate than the one given by a person's actual friends and acquaintances. Paniagua and Korzynski [9] contend that crowdsourcing is an integral part of the social media platform. Researchers have used the Twitter platform for both active crowdsourcing (users are cognizant that their feedback is being used for a certain goal) and passive crowdsourcing (users are unaware that their input is being pulled together for a certain task). Work done by [10] provides an example of active crowdsourcing, where users in areas affected by emergencies provided feedback, while hashtags used on Twitter are examples of passive crowdsourcing. Our earlier work [11], [12] made use of various hashtags such as #depression and #unemployment to gather data and glean valuable insights. While the insights gained from such hashtags confirmed research that was done traditionally, the question we wanted to answer was: Do the insights only reflect the sentiments of the people who are writing, or do they also portray interventions and possible reasons for the sentiments?

Problem Description
Based on the above, we ask the following questions: 1) How do people view female bullying at the workplace? 2) Does it conform to what traditional research has found? 3) Are the results consistent across various discussion groups in social media? Our current work used NLP techniques to pull together data from social media (specifically Twitter) to characterize the public's opinion regarding female bullying at the workplace. Based on the data we gathered, we formulated certain hypotheses. Furthermore, we looked at the hashtags that occur the most under such discussions and repeated the process to see whether the hypotheses we gleaned from characterizing various tweets can be confirmed. We employed the unigram, bigram and trigram models to eliminate any ambiguity and ensure that the results we deduce are consistent across the board.
The research conducted by [13] recognized that cyberbullying has emerged as both a national social issue and a health issue. The authors studied Twitter data and presented algorithms to detect cyberbullying in social media. [14] explored cyberbullying among the youth and concluded that cyberbullying affects more than half of the youth on social media. The research further showed that bullies to delete their accounts. Research done by [15] focused on school bullying based on sociodemographic characteristics, parental support and friends. The authors used a quantitative approach for children in the 6th to 10th grades. The results showed high prevalence rates of bullying among school children, measured as being bullied at least once in the previous 2 months: 20% of the students were physically bullied, 53% were verbally bullied and 13% were electronically bullied.
The study also showed that parental support can minimize the effects of bullying on adolescents.
The research conducted by [16] looked at students in the 9th to 12th grades and concluded that a total of 15.8% of students were victims of cyberbullying and 25.9% said that they were victims of school bullying in the past 12 months. [17] explored the role of cognitive and affective empathy using focus groups and found that adolescent bystanders preferred providing indirect assistance to cyberbullied people by turning to adults over direct intervention. The research reported that empathy training and intervention by teachers and parents can play a key role in preventing and curtailing cyberbullying both within and outside of the school environment. [18], in a study focused on high schoolers, contended that participants involved in cyberbullying as bullies had greater levels of depression and problem alcohol use. [19], targeting the influence of cyberbullying on mental health for both children and adults, found a weak correlation between cyberbullying and anxiety but a very strong correlation between cyberbullying and depression.
[20] based cyberbullying detection on specific words in the tweets, a technique similar to the Bag of Words (BoW) approach that focuses specifically on the corpus (as an example, consider the work done by [7]). This approach is also reflected in the work done by [21], where the authors looked at tweets in Indonesia and discovered the terms and patterns used by bullies. [22] used text mining methods for the English language and showed that "people", "kids", "students", "schools", and "stop" were the most commonly used words, results that showed that many people were concerned with stopping bullying behavior, especially among kids. [23] contended that adolescents tend to fall back on the trust between them and other individuals online, and that bullies tended to target highly trusted relationships. [20] used an enhanced version of the Bag of Words model (EBoW) on Twitter to detect bullying messages. Researchers in [24] studied negative behaviors of people on ask.fm, a well-known website that led to many suicidal incidents as a result of cyberbullying. They used sentiment analysis techniques along with certain heuristics to categorize people into Highly Negative, Highly Positive, Positive Negative and Others buckets. [25] studied Twitter and ask.fm data sets and proposed a model that discovers both bullies and victims of cyberbullying, in addition to new bullying vocabulary, by employing a seed dictionary of cyberbullying and social interactions. The results indicate that this method is able to detect new bullying vocabulary. [24] aimed to discover new approaches to detect cyberbullying occurring over images on Instagram by trying to find correlations between several features and cyberbullying. The results indicate that about half of Instagram sessions included cyberbullying.
[26] showed that young female lawyers were subjected to cyberbullying (mostly through WhatsApp) and, surprisingly, many of these female lawyers were unaware of any existing legal framework on cyberbullying. The research done by [27] showed that harassment and bullying have a direct and significant negative impact on job productivity, along with a higher rate of burnout. [28], in a study based on 158 trainee doctors, found that 46.2% experienced one or more acts of bullying that negatively affected job satisfaction and mental strain. The research done by [29] explored the effects of cyberbullying and in-place workplace bullying and found that females were subject to more workplace bullying than cyberbullying, while managerial professionals faced more cyberbullying. The study also concluded that both types of bullying are linked to a bad workplace environment. [30] also showed that men were more subject to cyberbullying than females in Sweden.

Workplace Bullying
[31] found that women initiated 58% of bullying incidents and 90% of the time targeted other women. While men were said to be more aggressive in terms of sexual behavior, women bullies focused on relational aggression, attacking the victims' social status and relationships. Similarly, the work done by [32] showed a gender-based difference in the view of workplace bullying: female employees were bothered more by emotional bullying and professional discredit, while males focused more on abusive work conditions. The study of Danish eldercare workers [33] showed a high correlation between bullying of female workers and the onset of major depression. The work done by [34] showed that 1) both males and females faced bullying at work, 2) females were subjected to sexual harassment much more and 3) females were subjected to bullying more by their male colleagues, while males were bullied by both coworkers and supervisors. [35] studied the Austrian Armed Forces and found that 6.5% suffer from long-term aggression and bullying. They also showed that the support units were more cognizant of females' abilities to perform their duties competently than the combat units. The authors in [36] discussed the violence suffered by graduate and undergraduate female nursing students in developed countries such as the US, UK, Canada and Australia, and discussed practical approaches to prevent such phenomena. [37] studied the environment in India and confirmed the hypothesis that women are subject to severe bullying. The study done by [38] showed that in the UK, in addition to females facing more bullying, 70% of the female respondents reported being bullied by female managers.
The study in [39] established the role of workplace bullying and showed that 1) more than half the respondents, including males and females, faced bullying at work, 2) males were bullied by males while females were bullied by both males and females and 3) the lower incidence of bullying by female managers could be attributed to the lower number of female managers.

Workplace Bullying and Physical/Mental Health
The authors in [40] conducted a systematic review and found a bi-directional correlation between bullying and mental health. [41] showed lingering long-term effects on mental health, with the authors studying mental health status after five years. [42] also showed a positive correlation between workplace bullying and both mental health and somatic symptoms. The author in [43] discussed the role of various professions, the inherent power structure and workplace bullying, and offered different intervention techniques to be used at the workplace to minimize bullying. Furthermore, [44] shows that bullying not only affects the mental health of victims but also puts them at risk of cardiovascular disease in the long term. Lastly, the authors in [45] discussed how the choice of IoT is also helping gather medical data for users.

Experimental Setup
As is common in such research, we collected data only from public sources to address the privacy concerns of using such data [46], [47] and [48]. Furthermore, we do not publish any Twitter user handle, and we eliminate duplicate tweets to ensure that we are working from a clean set of data.
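The cleaning step described above can be illustrated with a short sketch. It is not the authors' code: the helper names (`anonymize`, `normalize`, `clean_tweets`), the `@user` placeholder and the rule that tweets differing only in case, spacing or mentioned handle count as duplicates are all assumptions made for illustration.

```python
import re

HANDLE_RE = re.compile(r"@\w+")   # Twitter-style user mentions
SPACE_RE = re.compile(r"\s+")     # runs of whitespace


def anonymize(text):
    """Replace every user handle with a neutral placeholder (assumed: '@user')."""
    return HANDLE_RE.sub("@user", text)


def normalize(text):
    """Build the key used to decide whether two tweets are duplicates."""
    return SPACE_RE.sub(" ", anonymize(text)).strip().lower()


def clean_tweets(tweets):
    """Drop duplicates (keeping the first occurrence) and mask handles."""
    seen = set()
    cleaned = []
    for text in tweets:
        key = normalize(text)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(SPACE_RE.sub(" ", anonymize(text)).strip())
    return cleaned


sample = [
    "Bullying at work is real @alice #workplacebullying",
    "bullying at work is real  @bob #workplacebullying",
    "Speak up and report it #workplacebullying",
]
print(clean_tweets(sample))
```

Comparing on a normalized key rather than the raw text is one possible design choice; a stricter pipeline might treat only exact matches as duplicates.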

Preprocessing and Processing Data
We followed this process for preprocessing and processing the data:
1. Collected a month's tweets under the hashtag #workplacebullying.
2. Used the NLTK toolkit to parse the texts and remove stop-words (recurring words, such as articles, that need to be filtered out).
3. Used the tf-idf algorithm [49] to generate the keywords.
4. Generated the unigram, bigram and trigram keywords.
5. Once preprocessing was finalized, used the sklearn library to tokenize and vectorize the tweets.
6. For the purposes of our work, treated the entire set of tweets as one corpus.
7. In addition to collecting the n-gram keywords, collected all the hashtags mentioned in the tweets and the number of times they were used.
8. Implemented the above in Python 3 on a standard Dell machine running Ubuntu Linux with 16 GB of RAM.
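As an illustration of steps 2 to 4, the following dependency-free sketch ranks n-grams by tf-idf. It is a stand-in rather than the pipeline used in this work, which relied on NLTK and sklearn. The tiny stop-word list, the function names, the smoothed idf formula and the choice to treat each tweet as one document within the corpus are all assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); the paper used NLTK's list.
STOP_WORDS = {"a", "an", "the", "is", "are", "at", "in", "of", "to", "and", "it"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop-words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_keywords(tweets, n, k=5):
    """Rank n-grams by summed tf-idf, treating each tweet as one document."""
    docs = [Counter(ngrams(tokenize(t), n)) for t in tweets]
    df = Counter(term for doc in docs for term in doc)  # document frequency
    scores = Counter()
    for doc in docs:
        total = sum(doc.values())
        for term, count in doc.items():
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + len(docs)) / (1 + df[term])) + 1
            scores[term] += (count / total) * idf
    return [term for term, _ in scores.most_common(k)]


tweets = [
    "Bullying at work is real",
    "Stop bullying at work",
    "Stop the false accusations",
]
for n in (1, 2, 3):
    print(n, top_keywords(tweets, n, k=3))
```

Running the same ranking for n = 1, 2 and 3 gives the three keyword lists that the unigram, bigram and trigram comparison relies on.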

The APIs used
For this work, we used the Python programming language to gather and analyze the data, along with the following open-source APIs available for Python:
1. Twitter API: This requires registering with Twitter and creating a Twitter developer account. The twitter library, which provides all the requisite APIs, can be installed for Python.
2. Pandas: This is an open-source Python library that allows data cleaning, preparation and fast analysis. The data can be easily imported into Excel.
3. NLTK: This is one of the most powerful NLP libraries and provides basic tools such as tokenization, stemming, lemmatization etc. Interested readers can refer to [49] for pertinent details.
4. Sklearn: This library helps in big data analysis tasks such as classification, regression, clustering etc.

Results and Discussion
As a reminder, we restate the research questions we posed as goals of this study: RQ1) How do people view female bullying at the workplace? RQ2) Does it conform to what traditional research has found? RQ3) Are the results consistent across various discussion groups in social media? To answer these questions, we analyzed the hashtag #workplacebullying and looked at the n-gram model for n = 1, 2 and 3. The n-gram model gives us the basis to answer RQ1 and RQ2. Furthermore, we looked at the top three hashtags that were mentioned in the results. These hashtags give us the basis to answer RQ3.

#workplacebullying
We analyzed the #workplacebullying hashtag for the month of April. The top ten words that were mentioned are listed in Table 1 below. The unigram model indicated only the psychological effects (without specifying them) and the need to put an end to such behavior. Note that the intervention regarding a lack of understanding of bullying addresses the prevalent view that workplace bullying does not exist, which, given the traditional research, is not accurate. In summary, the bigram and trigram models reflected the following as answers to RQ1 and RQ2: From a characterization perspective, we had an almost equal number of keywords indicating the obvious existence of workplace bullying and keywords indicating the impression that workplace bullying is not an actual phenomenon but rather consists of false accusations.
The tweets showed the existence of psychological effects of cyberbullying. A large number of tweets reflected possible intervention techniques such as a) the need to understand what bullying is and b) possible solutions (not specified/detected).
We also collected the top three hashtags mentioned under the hashtag #workplacebullying, which were #feminism, #equality and #sexism. Note the following:
1. #sexism is potentially mentioned as a reason for workplace bullying.
2. #equality potentially reflects the desired outcome of getting rid of #bullying.
3. #feminism is possibly listed as a reason by people who believe that workplace bullying does not exist.
Given the above possible reasons, we repeated the experiments done on #workplacebullying.
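Finding the hashtags that co-occur with a seed hashtag can be sketched as follows. This is an illustration, not the authors' implementation: the function name `co_occurring_hashtags`, the sample tweets and the decision to count each hashtag at most once per tweet are assumptions.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def co_occurring_hashtags(tweets, seed, k=3):
    """Count hashtags that appear alongside `seed`, excluding the seed itself."""
    seed = seed.lstrip("#").lower()
    counts = Counter()
    for text in tweets:
        # A set, so a hashtag repeated inside one tweet is counted once.
        tags = {t.lower() for t in HASHTAG_RE.findall(text)}
        if seed in tags:
            counts.update(tags - {seed})
    return counts.most_common(k)


sample = [
    "Call it out #workplacebullying #sexism",
    "We need fair pay #workplacebullying #equality #sexism",
    "Not this again #workplacebullying #feminism",
    "Unrelated post #sexism",
    "#workplacebullying #sexism #equality",
]
print(co_occurring_hashtags(sample, "#workplacebullying"))
```

Each returned hashtag can then serve as the seed for a fresh collection of tweets, on which the keyword analysis is repeated.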

#equality
Looking at the #equality hashtag, we found the keywords for the unigram, bigram and trigram models listed in Table 2. From these results, we conclude the following:
1. The word Japan was mentioned significantly. Interestingly, Japan has recently put in place a roadmap to end workplace bullying. Research indicated that both males and females were subject to workplace bullying.
2. The majority of the tweets discussed potential solutions to workplace bullying, such as searching for safe harbor and seeking justice.
3. The tweets suggested that the status quo of being physically and verbally abused is potentially prevalent in Japanese culture and that the truth might get away.
4. Even though the potential workplace bullying applied equally to males and females in Japanese culture, concerns regarding females were discussed in more detail, as females face unique challenges due to being working mothers in such a culture (indicated by the term "Time Left Home") and the fact that they are working with males on the frontlines in top professions (indicated by the term "Doctors Engineers Frontline"). Support for working moms was also evident in the term "Shine working moms".
5. Lastly, we also saw the term "Sexism" used quite often but, interestingly, in a positive way, indicating that the dominant male culture potentially looks at teaching sexism in a positive way.
The hashtag #equality also included other hashtags, such as #girlsmatter #humanrights #environmnet #sexism #womenleaders, which further supports the view that workplace bullying of females is being discussed under this hashtag.

#feminism
The hashtag #feminism generated the following results (Table 3). The #feminism keywords have mostly negative connotations, confirming our assumption from the keywords gleaned from #workplacebullying, as many people considered it equivalent to coronavirus and cancer, as can be seen above. Interestingly, the keywords "sexism boys nice" were also present under #equality. Many people also considered feminism to be fake.

#sexism
Looking at the #sexism hashtag, we arrived at the following results (Table 4). Looking strictly at the unigram, bigram and trigram models, we see that the tweets indicated the prevalent issues females face in the workplace. Interestingly, the hashtag pointed out the prevalence of misogyny in the field of science. Overall, the hashtag characterizes the status quo when it comes to workplace bullying against females.
The hashtags we found mentioned overwhelmingly under the #sexism tag were #racism, #feminist and #women.

Conclusions
In this paper, we explored how cyberbullying against females at work is viewed in social media. We started by looking at #workplacebullying and found that 1) people were split between acknowledging the prevalence of workplace bullying against females and the view that such bullying simply does not exist and is a nuisance, 2) the tweets showed the existence of psychological effects of bullying in the workplace, and 3) the tweets showed many intervention techniques that can minimize the effects of such bullying. We explored the hashtags that came up most often under #workplacebullying and found that the top three were #sexism, #feminism and #equality. This led us to the conjecture that the three top hashtags map back to the three results we obtained under #workplacebullying, namely 1) the cause of bullying against females (#sexism), 2) denial of the existence of bullying (#feminism) and 3) possible intervention techniques (#equality). Our results confirmed this conjecture and found an overwhelmingly negative connotation for #feminism, while the opposite was true for #equality.
We believe that the results from this study are significant enough to warrant further exploration. Specifically, we want to explore the concept of the wisdom of the crowd and see if it applies to social media, to give further credence to this work. Furthermore, it will be interesting to compare the results of this study to tweets written in another language.
Author Contributions: Samara Ahmed: conceptualization, 1, 2, 5, writing-original draft preparation and software; Adil Rajput: methodology, 3, 4, writing-original draft preparation and software; Akila Sarirete: 4, validation, software and writing-review and editing; Asma Aljaberi, Ohoud Alghanem and Abrar Alsheraigi: investigation, data curation, 2.2 and 3.
Funding: "This research received no external funding."
Conflicts of Interest: "The authors declare no conflict of interest."