One of the most widely used social networks today is Twitter, where millions of users express their opinions every day in 280 characters. This diverse social showcase has room for all kinds of ideas, although many users take advantage of the anonymity the platform offers to shed their inhibitions and spread insults and hate messages.
As in real life, this hostility is usually directed at the most vulnerable groups. One example is the situation of women in positions of responsibility who have public profiles on Twitter and are harassed almost daily. As a result, many of them decide to delete their profiles or make them private, so that their voices end up silenced and lost.
Artificial intelligence can help us analyze the real dimensions of the problem of misogynistic insults on Twitter. Through natural language processing, for example, we can analyze from a sociolinguistic perspective which words are the most repeated regarding a topic of discussion, the users who utter the most insults, or even extract the sentiment conveyed by a set of tweets.
The first step in the analysis would be to create a Twitter developer account and, with the help of a Python library such as Tweepy, retrieve only those tweets that include words relevant to the investigation. In our case, we could search for tweets that contain “feminazi” and were sent on specific dates (such as Women’s Day or the days after a high-ranking official resigns), or for the usernames of women in the public eye, to analyze what most users think of them.
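The search step can be sketched as follows. This is a minimal illustration, not the code used in the study: `build_query` and `fetch_texts` are hypothetical helper names, the `lang:` and `-is:retweet` operators are assumed to be available for the account's access level, and `client` is assumed to be a `tweepy.Client` created with the developer account's credentials. Note that the recent-search endpoint only reaches back about a week, so older dates would need a different endpoint.

```python
def build_query(terms, lang="es", include_retweets=False):
    """Combine search terms into a Twitter API v2 query string (hypothetical helper)."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Quote multi-word terms so they are matched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    query = "(" + " OR ".join(quoted) + ")" if len(quoted) > 1 else quoted[0]
    query += f" lang:{lang}"
    if not include_retweets:
        query += " -is:retweet"
    return query


def fetch_texts(client, query, start, end, max_results=100):
    """Return the text of matching tweets.

    `client` is assumed to behave like tweepy.Client; `start` and `end`
    delimit the dates of interest (datetime objects or ISO 8601 strings).
    """
    response = client.search_recent_tweets(
        query=query, start_time=start, end_time=end, max_results=max_results
    )
    # response.data is None when nothing matches.
    return [tweet.text for tweet in (response.data or [])]
```

Passing the client in as a parameter keeps the sketch independent of how the credentials are stored, and makes it easy to try the logic with a stand-in object before spending any API quota.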
Once the samples are extracted, we can use data analysis tools to clean the texts and remove punctuation marks and symbols such as “@” or “#”. The aim of this process is to keep the words that carry the most meaning, such as adjectives and nouns in Spanish. We could also go further and lemmatize the text, reducing each word to its base form on the basis of morphological features such as suffixes and roots, for later analysis. It can also be useful to apply a spell checker that “translates” internet slang into more standard language.
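A minimal sketch of the cleaning step, using only the standard library. The function name `clean_tweet`, the removal of links, and the tiny stopword list are assumptions made for illustration; a real analysis would use a fuller Spanish stopword list and a proper lemmatizer.

```python
import re

# Links carry no lexical meaning, so they are dropped first (an assumption of this sketch).
URL_RE = re.compile(r"https?://\S+")
# Anything that is neither a word character nor whitespace: punctuation, "@", "#", "¡"...
SYMBOL_RE = re.compile(r"[^\w\s]")

# Tiny illustrative stopword list, not a complete one.
STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "a", "en", "un", "una", "es"}


def clean_tweet(text):
    """Lowercase a tweet, strip links and symbols, and drop common function words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return [token for token in text.split() if token not in STOPWORDS]


print(clean_tweet("¡Hola @Usuario! Mira esto: https://t.co/abc #8M es HOY"))
# ['hola', 'usuario', 'mira', 'esto', '8m', 'hoy']
```

Accented letters such as “á” or “ñ” count as word characters in Python 3, so Spanish words survive the symbol filter intact.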
Next, we apply machine learning algorithms in Python, whose readily available libraries greatly simplify the analysis. For instance, a simple function can distinguish male from female users by comparing their usernames with a list from the INE (Spain's National Statistics Institute) of Spanish first names associated with each gender.
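The name comparison could look roughly like this. The six-entry dictionary is a made-up stand-in for the INE list, and `guess_gender` is a hypothetical function name; the approach is a plain lookup, so it returns "unknown" for any username that contains no listed first name.

```python
import re
import unicodedata

# Stand-in for the INE list of first names: purely illustrative entries.
NAME_GENDER = {
    "maria": "female", "carmen": "female", "lucia": "female",
    "antonio": "male", "jose": "male", "manuel": "male",
}


def normalize(text):
    """Lowercase and strip accents so that 'María' and 'maria' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def guess_gender(username, name_gender=NAME_GENDER):
    """Return the gender of the first listed name found in the username, else 'unknown'."""
    for token in re.findall(r"[a-z]+", normalize(username)):
        if token in name_gender:
            return name_gender[token]
    return "unknown"


print(guess_gender("María_López92"))  # female
print(guess_gender("xX_dark_Xx"))     # unknown
```

Taking the first matching name means compound names follow their first element, which is one possible convention among several.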
The next step is to use artificial intelligence models to draw conclusions about the sentiment the tweets express. Sentiment analysis models are available on the Hugging Face website, based on a text classification technology developed by Google. This technology uses a neural network architecture that labels words according to their context and their position in the sentence, drawing on millions of texts, to determine whether a tweet mostly expresses, for example, happiness, sadness or anger.
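Once a classifier is available, summarizing its output over a set of tweets is straightforward. In the sketch below, `label_shares` is a hypothetical helper, and `classify` is assumed to be any callable that takes a list of texts and returns one dictionary with a "label" key per text, which is the shape returned by a Hugging Face `transformers` text-classification pipeline. The toy classifier is only there to make the example self-contained.

```python
from collections import Counter


def label_shares(texts, classify):
    """Return the fraction of texts assigned to each label by `classify`."""
    predictions = classify(list(texts))
    counts = Counter(prediction["label"] for prediction in predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# With the transformers library installed, a real classifier could be built as:
#   from transformers import pipeline
#   classify = pipeline("text-classification", model="<model id chosen on the Hub>")
# The model id is deliberately left open here.

def toy_classify(texts):
    """Stand-in classifier: labels a text 'anger' if it contains an exclamation mark."""
    return [{"label": "anger" if "!" in t else "other", "score": 1.0} for t in texts]


print(label_shares(["¡fuera!", "buenos días", "¡basta!", "hola"], toy_classify))
# {'anger': 0.5, 'other': 0.5}
```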
All of these tools are useful for a variety of sociolinguistic analyses. In fact, the author used them for her final project in the Master in Digital Letters, entitled Silenced in cyberspace: an approach to online misogyny, which analyzed the opinion of Twitter users about the politicians Adriana Lastra and Macarena Olona on the respective days of their resignations in 2022. The study uncovered serious tweets threatening the health of both women, most of which the artificial intelligence models classified as “anger”. In addition, according to a model that classified sentiment as positive or negative, Lastra and Olona received 93% and 63% negative comments, respectively, over 50,000 tweets, compared with 7% and 37% positive comments. Finally, the study concluded that the platform’s measures to prevent misogynistic harassment are ineffective, since we verified that openly misogynistic communities such as the so-called incels are not sufficiently penalized and that their posts remained (and remain) visible on Twitter.
Much of the responsibility for ending discrimination falls on the institutions themselves, which must ensure an egalitarian internet. To this end, both individuals and companies can use the artificial intelligence tools described above to analyze the phenomenon as a whole and help eradicate it, adapting the measures to the nature of the insults and the characteristics of the users who utter them. In the future, the machines themselves may act as moderators against online violence, using an advanced version of these algorithms.
Blanca Garrido Salmeron is a computational linguist and a graduate of the Master in Digital Letters at the Complutense University of Madrid.
Chronicles of the Intangible is a space for the popularization of computer science, coordinated by the academic society SISTEDES (Sociedad de Ingeniería de Software y de Tecnologías de Desarrollo de Software). The intangible is the non-material part of computer systems (that is, the software), and its history and evolution are recounted here. The authors are professors from Spanish universities, coordinated by Ricardo Peña Marí (professor at the Complutense University of Madrid) and Macario Polo Usaola (tenured professor at the University of Castilla-La Mancha).