Digital sources: a case study of the analysis of the Recovery of Historical Memory in Spain on the social network Twitter

The incorporation of digital sources from online social media into historical research brings great opportunities, although it is not without technological challenges. The huge amount of information that can be obtained from these platforms obliges us to resort to the use of quantitative methodologies in which algorithms have special relevance, especially regarding network analysis and data mining. The Recovery of Historical Memory in Spain on the social network Twitter will be analysed in this article. An open-code tool called T-Hoarder was used; it is based on objectivity, transparency and knowledge-sharing. It has been in use since 2012.


INTRODUCTION
The digital world is becoming so omnipresent that society is growing increasingly unaware of how immersed in it it actually is. Most real-world activities have their equivalent in the digital universe: shopping, entertainment, administrative formalities, conversations with friends and family, etc. There is little that does not have a digital counterpart. This immersion, which has intensified this decade, is bringing about social changes whose impact has not been felt yet.
Researchers need to extend their activity into the digital dimension, but the foundations are yet to be laid. Newspaper libraries are already just a small portion of the secondary sources. The role of media in shaping public opinion is being overtaken by the new digital environment. According to the Estudio General de Medios [General Media Study] conducted from February to November 2017, 1 the share of newspaper readers stands at 24.3% whilst the Internet is accessed by 75.7% of the population, greater than the share of radio listeners (59.3%) and approaching that of television viewers (85.2%). The growth of the Internet as the preferred place for getting information and debating topics is unstoppable. Much of this growth has come from virtual social media, which have revolutionised the way in which content is delivered. We may not be witnessing a phenomenon of mass self-communication (Castells, 2009) but we are experiencing society's power to make the media agenda more or less relevant.
Many of the conversations and discussions that used to take place on the analogue plane are now being incessantly recorded in the digital world in the form of text, images or video thanks to social media. New, direct channels of communication are opening up between politics and the public that fall outside traditional media. Everything happens faster and more directly and leaves an indelible trace.
Online social media are in the hands of a few companies, such as Facebook (Facebook, Instagram and WhatsApp), Microsoft (LinkedIn), Alphabet (G+ and YouTube) and Twitter. These organisations use the information they obtain from people's profiles and interactions within their medium for commercial purposes. Access by researchers to this information, on the other hand, is very limited and is strictly controlled by these companies. In the case of Twitter, the information generated by most of its users is in the public domain and can be accessed through its API; however, only a portion of the full volume of generated messages can be accessed for free. Even so, Twitter is today the most widely used source of social data.
This new digital environment, into which researchers will have to dip their toes, offers great opportunities, but is not without technological challenges. The huge amount of information produced by social media requires the application of quantitative methods and other data analysis techniques in order to study it. The other side of the coin is the low reliability of the identity of the users who publish or spread content, as is the case of Twitter, where fictitious, false or automatic profiles (bots) proliferate (Ferrara et al., 2016).
This paper lays down a cyclical, three-phase methodology (data capture, data processing and data display) to help qualify the profiles that publish information on Twitter. It explains how to access the information in this social network, the types of data that can be obtained and their limitations. It then defines procedures for analysing Twitter user profiles and lists different types of display for detecting behavioural patterns.
A tool called t-hoarder_kit, 2 in use since 2012, was used as technological support; since it is an open-source tool, it meets the requirements of transparency and knowledge sharing. This methodology was applied to a case study on the profiles of users who write about the Recovery of Historical Memory in Spain on the Twitter social network in order to determine their degree of reliability.

THE ENVIRONMENT AND CHARACTERISTICS OF SOCIAL MEDIA
Virtual social media do not involve the whole of society, only a percentage of the people that have an Internet connection. According to the data obtained by the Spanish Centre for Sociological Research (CIS) in a survey it conducted right after the 2016 general elections in Spain, 3 67.8% of Spaniards accessed the Internet during the last three months leading up to the elections, 74.6% of whom belonged to the Facebook social network, the most widespread of all social media in Spain.
But not only is there a part of society that is not connected via these platforms, there are also access gaps by gender (Fig 1) and by age (Fig 2). This lopsided presence of different social groups greatly hinders opinion prediction and analysis methodologies (Gayo-Avello, 2011). Any statistical study based on social media data would have a large social bias. However, the amount of information that can be obtained from virtual social interactions opens the way to new types of studies based on network analysis and data mining.
Communication on social media is fragmented and consists of short messages that are shared or commented on. In the case of the Twitter platform, the size of the messages, known as tweets, was limited to 140 characters from its inception until the limit was doubled in November 2017. In other social media where this restriction does not exist, posts nevertheless tend to be short. These messages are often accompanied by multimedia content, which in some platforms is more important than text.
Sticking to two very popular platforms in Spain (Facebook and Twitter), the differences in the number of users who participate in them, their degree of privacy and their restrictions to retrieving information can be seen in the following table.
The Facebook platform seems at first more suitable for retrieving information because of both the number of profiles it has and the segmentation of its users. However, the privacy restriction on personal profiles leaves only the profiles of entities visible, thereby drastically reducing its scope. This is why the possibility of obtaining data from personal profiles on Twitter has made this platform the main source of data for researching online social media.
On the other hand, Twitter beats Facebook in immediacy; it is a platform where messages are shared faster and conversations are more agile. It is a place where current events are discussed and communication campaigns organised.
Twitter's strength as a public source of opinion becomes a weakness as far as certain hot topics are concerned owing to noise and overreaction. This causes the most extremist positions to prevail, creating what is known as the "spiral of silence" (Noelle-Neumann, 1995). Thus, analyses should take this into account so that their results are not distorted. Although Twitter is not a perfect source of digital information for researchers, it has served as the basis for multiple studies that have allowed the social pulse in limited environments to be analysed. The birth and development of social movements that have led to social transformations, such as 15M (Toret et al., 2013) (González-Bailón et al., 2013) (Peña-López et al., 2014), or the networking of new citizen platforms (Aragón et al., 2017), have been researched. Likewise, it has been used to research crisis situations that lead to political polarisation (Morales et al., 2015). It has also been used globally to research electoral campaigns in Spain (Barberá & Rivero, 2012) (Congosto, 2015), the United States (Livne et al., 2010) (Hanna et al., 2011) (Bessi & Ferrara, 2016) (Wang et al., 2016) and Europe (Jungherr et al., 2011) (Ferrara, 2017). Lately, the social alarm generated by fake news is being analysed from Twitter, as it is one of the channels over which such news is being spread (Fletcher et al., 2018) (Stella et al., 2018).

METHODOLOGY
There is no consolidated global methodology for analysing online social media. Researchers apply specific methods to obtain results from their experiments. Independently of the method that is ultimately applied, it must take into account the types of entities that communicate with each other on social media, the manner in which this communication takes place and the restrictions to collecting this information. Only the definition of a methodological framework that uses open, transparent tools can ensure that experiments can be repeated and checked by third parties.
Platforms provide Application Programming Interfaces (APIs) for obtaining their data. These mechanisms allow the data to be downloaded via a very efficient protocol, but under the conditions set by the platform, which may vary over time. The restrictions affect aspects such as data privacy, the age of the messages and the amount of information that is provided per unit of time (rate limit). An alternative consists in using web scraping techniques (Iacus, 2015) to obtain data directly from the platforms' websites. This option allows sidestepping the message age limitation, but the retrieval of information is less efficient; in some cases, some kinds of data cannot even be downloaded. Its use is only advisable when the APIs cannot retrieve information of a certain age.

Table 1. Comparison of the Facebook and Twitter platforms (columns: Characteristic, Facebook, Twitter).
Twitter is the social network that is the most amenable to data downloading because its messages are mostly public. Even if it is not the best data source, as it is not the most widely used and has gender and age gaps, it is the most readily available to researchers. This is why the methodology set out herein focuses on Twitter.

Access to Twitter's API
Twitter has several platform access APIs. 4 The ones which are related to data downloading are listed below:
• The REST API: it gives access to multiple types of Twitter data. It allows downloading all the data that can be displayed by the graphic interface, such as the users' profiles, their followers and the people they follow (following), their tweets, trending topics, and so on. It is the most suitable API for analysing user profiles and the relations among users.
• The Search API: it allows tweets to be retrieved from the tweet history. Queries may be done by entering sets of keywords separated by logical connectors or through advanced searches. In addition, results can be filtered by language and by location.
• The Streaming API: it provides a real-time flow of tweets by establishing a permanent connection with Twitter's servers. Data can be filtered by ten different types of parameters. The most common ones are keywords, users and locations. This is the API that is most suited to gathering information about a subject matter in a continuous manner over a long period of time (months or years).
APIs limit the amount of information that can be downloaded over a certain period of time.
• The REST API has a variable limitation according to the requested method. The restriction is measured in the number of requests that can be made in a 15-minute period. The values range from 15 to 900. In turn, an operation can include a multiple answer (pagination), which speeds up downloading. The most restrictive methods are those that provide lists of users' followers or of the people users follow, which are kept down to 15 requests. The least restricted methods are the requests for user tweets, which allow up to 900 queries.
• In the Search API, the limitation stands at 180 requests every 15 minutes.
• Since the Streaming API provides a constant flow of tweets, the restriction applies to the flow that is received, which can never exceed 50 tweets per second.
There is also a time limitation for retrieving information, which differs between APIs:
• The Search API can only retrieve information that is seven days old at the most.
• The REST API, on the other hand, can download a user's last 3,200 tweets. In this case, the age of the data will depend on the tweeting frequency, and can range from months to years.
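The effect of these limits on a download can be estimated with simple arithmetic. The sketch below is only an illustration: it takes the 3,200-tweet ceiling and the 900-requests-per-window figure from the text, and a group of 70 profiles (the size of the group studied later in this paper), while the page size of 200 tweets per request is an assumption introduced here and is not stated in the text.

```python
import math

def requests_needed(items, items_per_request):
    """Number of paginated requests required to fetch `items` records."""
    return math.ceil(items / items_per_request)

def windows_needed(total_requests, requests_per_window):
    """Number of 15-minute rate-limit windows the requests will occupy."""
    return math.ceil(total_requests / requests_per_window)

PROFILES = 70              # size of the group in the case study
TWEETS_PER_PROFILE = 3200  # REST API ceiling quoted in the text
PAGE_SIZE = 200            # ASSUMPTION: tweets returned per request
TIMELINE_LIMIT = 900       # requests per 15-minute window (user tweets)

per_profile = requests_needed(TWEETS_PER_PROFILE, PAGE_SIZE)
total = PROFILES * per_profile
windows = windows_needed(total, TIMELINE_LIMIT)
print(per_profile, total, windows)
```

Under these assumptions the whole group fits in two windows, whereas the same calculation with the 15-request limit of the follower methods would take far longer, which is why the follower-following download tends to be the slow step.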
Additionally, neither the Search API nor the Streaming API (in free mode) provides all tweets after just one query. The percentage of tweets that are returned ranges from 85% to 95%; the criterion according to which the tweets are filtered is not known. Nevertheless, this is an acceptable percentage for analysis purposes.
There are several open-source tools that provide access to Twitter's APIs, such as TwapperKeeper, 5 Twitter-Tap, 6 twitterstream-to-mongodb, 7 dmi-tcat 8 and t-hoarder_kit. The last tool was the one that was used to conduct this study.

The t-hoarder kit tool
T-hoarder-kit is an evolution of the T-Hoarder platform (Congosto et al., 2017). It consists of a collection of open-source software that allows Twitter information to be both downloaded and processed so as to make it easier to use in network analysis and information display tools. Since the analysis of online social media involves working with massive amounts of information, it is essential to display it to let patterns or singularities emerge to guide the analysis in its next phase.
t-hoarder_kit uses Twitter's REST, Search and Streaming APIs and allows the following kinds of information to be retrieved:
• All information that is associated with a user's profile, followers, following, tweets posted and the lists to which they belong (REST API).
• The existing follower-following relations for a set of users (REST API).
• The most recently posted 3,200 tweets from a set of users (REST API).
• Consulting the Twitter history for a set of tweets that match a search pattern (Search API).
• Consulting in real time a set of tweets that match a search pattern (Streaming API).
Tweet processing is aimed at information aggregation and inference. Aggregation will allow the degree of repetition of some tweet components to be quantified, and inference will let the underlying characteristics of the information emerge. The different types of processing are listed below:
• Entities. A tweet is just a short text message (280 characters) but is made up of multiple entities that are included in it or in the metadata provided by Twitter APIs. The entities t-hoarder-kit takes into consideration in each tweet are references to other users, the most frequently used words, hashtags for tweet classification, URLs, images and the application from which the tweet was posted (metadatum). By quantifying the frequency of appearance of these entities, the tool produces a summary of the collection of information it captured.
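The entity count described above can be pictured with a minimal sketch. It is not the tool's implementation: the regular expressions are a stand-in for the entity metadata that the Twitter APIs provide, and the sample tweets and the function name `entity_summary` are invented for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: simple patterns standing in for API-provided entity metadata.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")

def entity_summary(tweets):
    """Count mentions, hashtags and URLs over a collection of tweet texts."""
    mentions, hashtags, urls = Counter(), Counter(), Counter()
    for text in tweets:
        urls.update(URL.findall(text))
        # Remove URLs first so that '#' or '@' inside a link is not counted.
        stripped = URL.sub(" ", text)
        mentions.update(m.lower() for m in MENTION.findall(stripped))
        hashtags.update(h.lower() for h in HASHTAG.findall(stripped))
    return {"mentions": mentions, "hashtags": hashtags, "urls": urls}

# Invented sample data, for illustration only.
sample_tweets = [
    "RT @user_a: New archive online #memoria https://example.org/a",
    "Thanks @user_a and @user_b #Memoria #history",
    "No entities here",
]
summary = entity_summary(sample_tweets)
print(summary["mentions"].most_common(2))
```

Sorting each counter by frequency gives the kind of per-collection summary the paragraph refers to.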

• Diffusion. One of the actions that characterises Twitter is the re-posting of information, or retweeting (RT). Retweeting is a convention that Twitter users settled on at the very beginning for sharing tweets with their followers. Initially it was done by posting other users' tweets preceded by the letters RT and the name of the author. In 2009 Twitter included an RT button that did the same thing automatically, which greatly facilitated the propagation of tweets. Users normally spread tweets with which they agree, so each retweet can be considered to be a positive vote for the original tweet (Conover et al., 2010). This feature causes the information that circulates through Twitter to be very redundant. Therefore, quantifying the diffusion of original tweets results in a ranking of the discourse that has been spread.
• Location. The location of tweets can be known in two ways. The first one consists in obtaining the location that appears on a user's profile. This piece of data might not have been entered or might consist of the name of a fictional location, so not all tweets can be pinpointed geographically. However, a large percentage of tweets (60-70%) can still be geolocated. The second option consists in collecting the geolocated tweets of those users who have activated the geolocation feature in Twitter. In this case, the percentage is much smaller (1.5% in the Spanish case). Standardising the location of tweets allows them to be presented using a mapping tool, which gives an overall view of the geographical location of tweets.
• User characterisation. When users use Twitter they leave a trace from which their characteristics, such as their personality (role), impact (h-index), network ratio, propagation ratios, and link and image usage frequency, can be inferred.
-Network ratio: it identifies the asymmetry of a user's declared network.

$$\text{Network ratio} = \frac{\text{Number of followers}}{\text{Number following}}$$

-RT_in ratio: it measures a user's capacity to be retweeted.

$$\text{RT\_in ratio} = \frac{\text{Number of RTs received}}{\text{Number of original tweets}}$$

-RT_out ratio: it measures a user's tendency to retweet others.

$$\text{RT\_out ratio} = \frac{\text{Number of RTs sent}}{\text{Number of tweets}}$$

-Hashtag ratio: it shows the frequency of tweet hashtagging.

$$\text{Hashtag ratio} = \frac{\text{Number of tweets with hashtag}}{\text{Number of tweets}}$$

-Link ratio: it calculates the frequency of tweets with URLs.

$$\text{Link ratio} = \frac{\text{Number of tweets with links}}{\text{Number of tweets}}$$

-Media ratio: it indicates the use of multimedia.

$$\text{Media ratio} = \frac{\text{Number of tweets with media}}{\text{Number of tweets}}$$
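As an illustration, the six ratios can be computed from a handful of per-user counts. This is a minimal sketch rather than the tool's code: the dictionary keys are hypothetical names chosen here, and returning 0.0 when a denominator is zero is an assumption, since the text does not say how that case is handled.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (ASSUMPTION)."""
    return numerator / denominator if denominator else 0.0

def user_ratios(counts):
    """Compute the user characterisation ratios from raw counts.

    `counts` uses hypothetical keys: followers, following, tweets,
    original_tweets, rts_received, rts_sent, with_hashtag, with_link,
    with_media.
    """
    return {
        "network": safe_ratio(counts["followers"], counts["following"]),
        "rt_in": safe_ratio(counts["rts_received"], counts["original_tweets"]),
        "rt_out": safe_ratio(counts["rts_sent"], counts["tweets"]),
        "hashtag": safe_ratio(counts["with_hashtag"], counts["tweets"]),
        "link": safe_ratio(counts["with_link"], counts["tweets"]),
        "media": safe_ratio(counts["with_media"], counts["tweets"]),
    }

# Invented counts, for illustration only.
example = {
    "followers": 1200, "following": 300,
    "tweets": 100, "original_tweets": 80,
    "rts_received": 400, "rts_sent": 20,
    "with_hashtag": 50, "with_link": 25, "with_media": 10,
}
ratios = user_ratios(example)
print(ratios)
```

In this invented example the network ratio is 4, meaning four followers for each account followed, and the RT_in ratio is 5 retweets received per original tweet.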
• Relations. A Twitter user is connected either because they express an interest in what another user says (by following them) or because other users become interested in them (their followers). This declaration of interest of some users for others creates the network of declared relations through which their tweets will flow, the dynamic network being the network that will emerge from the interactions among them. The declared network and the dynamic network might not be the same. One of the first analyses of Twitter (Huberman et al., 2009) shows how both networks differ. A user does not interact with all the nodes in their declared network, and sometimes they interact with nodes from other networks. This is down to the fact that tweets are public; thus, they can be accessed in other ways. A user can access tweets from another user with whom they do not have any declared relation either because they get a retweet from a user in their network or because they check the tweets associated with some word or hashtag. Both types of relation, declared and dynamic, can be extracted using the t-hoarder-kit tool, which will generate a graph that can then be imported into network analysis and graph display tools.
Irrespective of the type of relation, declared or dynamic, according to which the graph is generated, the mathematical model for determining network parameters is the same, as the graph is a mathematical abstraction: a graph G is an ordered pair G = (V, E), where V is a set of vertices or nodes and E is a set of links or arcs that connect these nodes.
Of the multiple associated network parameters, the following have been selected:
-Degree centrality: the number of links of a node, i.e. the number of nodes to which it is connected.
[...] the different paths connecting other nodes. This metric identifies the nodes that hold the network together and tie different subgroups together.
-PageRank: it is based on Google's PageRank algorithm (Page et al., 1998), adapted to network analysis by the Gephi graph platform. 9 It is an iterative algorithm that measures the importance of each node within the network. The metric assigns each node a probability, namely the probability of being at that page after many clicks. The PageRank values are the entries of the eigenvector with the highest corresponding eigenvalue of a normalized adjacency matrix A'. The standard adjacency matrix is normalized so that the columns of the matrix sum to 1.
-Modularity: it is a network metric that lets the different communities in a graph emerge. It groups together those nodes that have the most solid ties among them and hardly any relations with members of other groups. There are many modularity algorithms (Newman & Girvan, 2004) (Leskovec et al., 2010) (Grabowicz et al., 2012). The algorithm of Blondel et al. (2008) is the one that has been used in this study.
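The PageRank description in the list above can be illustrated with a small power iteration. This is a sketch under stated assumptions, not Gephi's implementation: the damping factor of 0.85, the even redistribution of rank from nodes without outgoing links, the convergence tolerance and the toy graph are all choices made here for illustration.

```python
def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration over a column-normalised adjacency structure.

    `edges` is a list of (source, target) pairs of a directed graph.
    ASSUMPTIONS: damping=0.85; rank of nodes with no outgoing links is
    spread evenly over all nodes.
    """
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for src in nodes:
            if out_links[src]:
                # Each node splits its rank equally among its out-links,
                # which is what normalising the columns to sum to 1 does.
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Toy directed graph: a, c and d all point to b; b points back to a.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a")]
scores = pagerank(toy_nodes, toy_edges)
print(sorted(scores, key=scores.get, reverse=True))
```

In the toy graph, b ranks first because it receives the most links, and a ranks second even though it has a single incoming link, because that link comes from the most important node. This is the sense in which the metric weighs the quality of connections and not only their number.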
t-hoarder-kit's role in the display of data is basically to transform the data so that display tools can import it. There are four different ways in which to display data:
• Timeline: it is a display that shows the evolution of one or several variables in chronological order. The information is represented along the X and Y axes: the X axis shows the units of time and the Y axis the value of the variables. The data is prepared by linking the variables to their time information.
• Variable comparison: tabulated information of a set of variables in order for them to be visually represented as histograms, bar charts, correlations, and so forth.
• Map: it is used to represent geolocated information. The data is structured by associating a geolocation with a set of variables.
• Graph: it is used for network analysis. The data is modelled as a set of nodes having attributes and a number of links among them.
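The timeline preparation in the list above amounts to grouping variables by a unit of time. The following sketch assumes a monthly unit and a hypothetical record layout of (timestamp, is-retweet flag, retweets received); neither the layout nor the function name comes from the tool.

```python
from collections import defaultdict
from datetime import datetime

def monthly_timeline(tweets):
    """Aggregate tweets by month.

    `tweets` is an iterable of (created_at, is_retweet, retweets_received)
    records (hypothetical layout). Returns a chronologically sorted list of
    (month, original_tweets, retweets_received) rows.
    """
    originals = defaultdict(int)
    rts_received = defaultdict(int)
    for created_at, is_retweet, retweets_received in tweets:
        if is_retweet:
            continue  # only tweets written by the user are counted
        month = datetime.fromisoformat(created_at).strftime("%Y-%m")
        originals[month] += 1
        rts_received[month] += retweets_received
    return [(m, originals[m], rts_received[m]) for m in sorted(originals)]

# Invented records, for illustration only.
sample_records = [
    ("2015-07-18T10:00:00", False, 12),
    ("2015-07-18T18:30:00", False, 3),
    ("2015-07-19T09:00:00", True, 0),
    ("2015-08-02T11:15:00", False, 1),
]
timeline = monthly_timeline(sample_records)
print(timeline)
```

Each row pairs a time unit with the values of the variables, which is the tabular shape a charting tool needs for the X and Y axes.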

CASE STUDY
The case study focused on those Twitter profiles that tweet about the 2nd Spanish Republic, the Spanish Civil War and the Franco Regime. Behind those profiles are both associations and private individuals that spread all kinds of content, some of it new.
The group that was the subject of this study comprised 70 profiles whose common denominator is the historical memory topic above any other.The group was identified in two phases.During the first phase, 61 profiles were manually catalogued based on the available information on historical memory associations.During the second, 9 other profiles were discovered by analysing the interactions on Twitter among the members of the initial group.
The Twitter profile of each of the users under study, their declared relations (follower-following) and the last 3,200 tweets they had posted were downloaded using t-hoarder-kit. This data provided an overview of the contents generated by this group, whilst it also enabled these users to be characterised based on their behaviour, their acceptance by other group members and their overall impact on Twitter. This characterisation allowed these users to be ranked according to several indicators and a reliability index to be calculated.

Group posting timeline
A timeline of the tweets posted by the members of this group every month starting from 2011 was generated (Fig 5). This timeline shows two variables: the number of tweets made (excluding RTs) and the number of RTs received. The variables had different scales (1-10), so they were proportionally represented in order to be able to see their correlation. The months of July were marked to see whether the Anniversary of the 18th of July led to an increase in tweets during that month; an increase, however, was only seen in 2015.
It can be seen that there is a growing trend to tweet and an increase in virality over time. Retweeting was very low in 2013 but it increased considerably in 2014 and 2015, when it was proportionally higher than in 2017.
This timeline allowed the peak moments of both tweeting and retweeting to be identified and the information limited in order to be able to analyse it in greater detail. This study does not dwell on this analysis because it revolves around analysing the sources. Nevertheless, the possibilities of this kind of display are duly noted.

User behaviour
The first approach to the analysis of users focused on analysing their activity and its impact. The following indicators were calculated for each user, as set out in the Methodology section: role, h-index, RT_in ratio, RT_out ratio, Media ratio, URL ratio and HT ratio. The results are shown in Table 2 below, the users having been sorted from higher to lower h-index.
Users were sorted according to the way they interact, and a role was assigned to them. This sorting into roles has been applied in 18 case studies in different fields, such as the press, elections, trending topics and international events (Congosto, 2016). The distributions of these roles vary depending on the field but fall within a delimited range of values: speakers are fewer than 3%; networkers amount to less than 2%; retweeters range from 6% to 15%; replicators from 3% to 17%; monologists from 1% to 4%; isolated users from 7% to 20%; and, finally, common users add up to more than 50%. Taking these reference percentages into account, and as can be seen in Figure 6, the study group is well above the percentages of speakers and networkers (low speaker: 42.03%; medium speaker: 21.74%; and networker: 28.9%) and well below the percentage of retweeters (1.45%). No replicator, common or isolated profiles were detected. The predominant profile is that of low speaker followed by those of networker and medium speaker. Therefore, it could be said that the members of this group generate contents that are spread by others.
The h-index measures the impact of the tweets, that is, the echo they have in Twitter thanks to their being retweeted. The higher this indicator for a user, the greater this user's capacity to grab the attention of other users in a continued manner and cause them to retweet their tweets. This is the result of their having a stable audience that is motivated by the contents they post. The h-index distribution on Twitter follows a power law distribution (Newman, 2005), where a few users have high values for this indicator and most users have a low value, usually below 4.
As can be seen in the histogram of the h-indexes of the users in the group (Fig 6), only eight have a value below 5. This means that most of these profiles have a relatively high h-index value compared to most Twitter users. Within the group, the most frequent value falls between 10 and 20, the highest being 146.
The network ratio is very irregular in this group; it ranges from 0 to 8,242.80. This indicator gives an idea of the asymmetry of a user's declared network. A value greater than 1 for a profile means that there are more people interested in the user than people in whom the user is interested. The higher the value, the more popular the user is. This metric has to be taken into account, although it must always be qualified with the degree of acceptance of the tweets of a user since many of their followers may be passive and not interact with them (Huberman et al., 2009; Romero & Huberman, 2011).
The RT_in ratio for this set of profiles ranges from 0.04 to 88.92. This indicator is the result of calculating the average number of received RTs per tweet. It is a measure of the propagation capacity and is sensitive to diffusion peaks. It differs from the h-index in that it measures an overall retweeting capacity rather than a continued one. For example, a tweet that has been retweeted more than 1,000 times would cause the value of this indicator to go up dramatically, but not the h-index.
As can be seen in Figure 7, there is no correlation between the network ratio and the RT_in ratio; users with similar values for the RT_in ratio have different network ratios. The same is the case with the correlation between the network ratio and the h-index (Fig 8). This goes to confirm that the declared network does not always match the dynamic network of interactions and that the indicators of the latter provide a metric that is more in tune with reality. Therefore, the h-index and the RT_in ratio appear to be more realistic metrics of reference for ranking users.
As indicated above, even though the h-index and the RT_in ratio measure user interactions, they do not do it in the same manner. The h-index measures the continued retweeting capacity (x messages retweeted over x times), whereas the RT_in ratio provides the average number of retweets per message. Figure 9 shows that both variables have a rather high coefficient of determination. Of the two metrics the h-index was chosen as the indicator because it is considered to measure not only the number of retweets but also the retweeting success.
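The difference between the two indicators can be made concrete with a short sketch. It reads "x messages retweeted over x times" as "at least x times", which is the usual h-index convention and an assumption here; the retweet counts are invented.

```python
def h_index(retweet_counts):
    """Largest h such that h tweets were each retweeted at least h times."""
    ranked = sorted(retweet_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

def rt_in_ratio(retweet_counts):
    """Average number of retweets received per original tweet."""
    if not retweet_counts:
        return 0.0
    return sum(retweet_counts) / len(retweet_counts)

# Invented data: one user with steady echo, one with a single viral tweet.
steady = [5, 5, 5, 5, 5]
spiky = [1000, 1, 1, 1, 1]
print(h_index(steady), rt_in_ratio(steady))
print(h_index(spiky), rt_in_ratio(spiky))
```

The single viral tweet pushes the RT_in ratio of the second user far above that of the first, while its h-index stays at 1, which matches the behaviour described for diffusion peaks.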

Acceptance among group members
Another way to rank the members of this group is to analyse how they value each other. This appreciation can be measured in two ways: the way in which they follow each other and the way in which they interact with each other. In both cases network analysis was used to determine their degree of connection (degree centrality, indegree centrality, outdegree centrality) and their position in the network as regards their proximity to other nodes (closeness centrality) and their intermediation (betweenness centrality). The PageRank algorithm was also used for evaluation purposes. Additionally, users were grouped into communities according to their connections (modularity). Network analysis and graph display were carried out using the Gephi tool. 10
The way in which the different group profiles follow each other determines the declared follower-following network. However, as previously mentioned, this network may be a declaration of intentions rather than an active relation. In the case of the analysis of a group that is associated with a specific goal, the types of relations among users are important since the act of following a certain user invests the latter with credibility.
In order to establish how group members follow each other, the declared connections were obtained with t-hoarder-kit and graphed. In this graph the nodes correspond to the members of the group and the links to follower-following relations. Relations are asymmetric, that is, it is not necessary for two users to follow each other; it is enough for a member to follow another for there to be a connection. This gives rise to a directed graph where relations have a direction from one node to another. If two users were to follow each other, there would be two relations, each one starting at a respective node and ending in the other.
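A directed follower-following graph of this kind can be sketched as a set of ordered pairs, from which indegree and outdegree follow directly. The function name and the three-user example are hypothetical and only illustrate the construction described above.

```python
from collections import defaultdict

def degree_centrality(follows):
    """Return (indegree, outdegree) dictionaries for a directed graph.

    `follows` is an iterable of (follower, followed) pairs. Each ordered
    pair counts once; a mutual follow contributes two separate links.
    """
    edges = set(follows)
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    for follower, followed in edges:
        outdegree[follower] += 1
        indegree[followed] += 1
    return dict(indegree), dict(outdegree)

# Invented example: a and b follow each other, c follows a.
follow_pairs = [("a", "b"), ("b", "a"), ("c", "a")]
indeg, outdeg = degree_centrality(follow_pairs)
print(indeg, outdeg)
```

Here a has an indegree of 2 (followed by b and c) and c has no indegree at all, so a would be drawn as the largest node if node size were made proportional to indegree.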
The graph has been visually represented in Figure 10. This figure shows how the nodes are connected by follower-following relations. The size of the nodes is directly proportional to indegree centrality, those profiles that are more followed within the group standing out. The colour of the nodes corresponds to the three communities formed by those users. The red community represents 42.86% of the nodes; the most followed users are foromemoria and ARMH_Memoria. The blue community, which is the same size as the red one, has Buscameblog and AmigosBrigadas as its most followed profiles. The green community encompasses 14.28% of group members and SOSCarabanchel and amauthausen stand out for the number of followers they have.
Table 3 lists the network parameters for each of the members of the analysed group sorted by PageRank. It can be seen at first glance that there is not a very strong correlation among indegree centrality, closeness centrality, betweenness centrality and PageRank. This is because they measure different node characteristics: some take the quality of the connections into account and others do not. (Connection quality is understood to mean the weight that is assigned to links coming from important nodes in the network. Being linked to peripheral, hardly connected nodes is not the same as being linked to central, highly connected nodes.) The metric that takes the quality of connections most into account is PageRank, so it was the one that was used to rank the sources.
In order to find out how real the declared relations are, the interactions within the group, in the form of mentions or citations, were analysed. To this end, the last 3,200 tweets posted by each of the members of the analysed group were downloaded with t-hoarder-kit, and a graph was generated which only included the mentions among them; mentions of users outside the group were discarded.
In this case, the graph is also directed, that is, mentions go from one user to another and may or may not be returned. Unlike the follower-following graph above, where there was only one relation, in this graph a user might have mentioned another user several times. For the purposes of this graph, a user's multiple mentions of an-

The graph that represents the mentions among group members is shown in Figure 11. Node size is directly proportional to indegree centrality, whereby the most mentioned nodes stand out. The colour of the nodes corresponds to the communities into which they have been grouped. All group members have been mentioned by others save for GuerraCivil3639, which has been classed as a monologist.
The manner in which members mention each other gave rise to four communities. The red one has the most members (47.06%), and the ARMH_Memoria, Memoria_Publica, Buscameblog, foromemoria and SOSCarabanchel profiles stand out in it. AmigosBrigadas, LincolnBrigade and jmgarretas stand out in the purple community (33.82%). DiarideGuerra and bibrepublica are prominent in the yellow community (8.82%). deportado4443 and demiguelch are noticeable in the green community (8.82%). These communities do not perfectly overlap the communities detected in the follower-following graph, but they nevertheless have certain elements in common.
Table 4 lists the parameters of the dynamic network formed by users' mentions of each other. In this case indegree, outdegree and degree were included with the weight of each node. Users were sorted by PageRank.
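A weighted mention network of this kind can be sketched in Python as follows. The sketch assumes that repeated mentions between the same pair of users are accumulated as the weight of a single directed edge, which is an interpretation and not a confirmed detail of the study; the group, the account names and the `mentions` list are hypothetical.

```python
from collections import Counter

# Hypothetical group and (author, mentioned account) pairs taken from tweets.
group = {"user_a", "user_b", "user_c"}
mentions = [
    ("user_a", "user_b"),
    ("user_a", "user_b"),  # repeated mention: raises the edge weight
    ("user_b", "user_a"),
    ("user_c", "user_a"),
    ("user_c", "outsider"),  # target outside the group: discarded
]


def weighted_mention_edges(pairs, members):
    """Collapse repeated in-group mentions into weighted directed edges."""
    return Counter(
        (source, target)
        for source, target in pairs
        if source in members and target in members
    )


def weighted_degrees(edges, members):
    """Return weighted indegree and outdegree for every group member."""
    indeg = {member: 0 for member in members}
    outdeg = {member: 0 for member in members}
    for (source, target), weight in edges.items():
        outdeg[source] += weight
        indeg[target] += weight
    return indeg, outdeg


edges = weighted_mention_edges(mentions, group)
indeg, outdeg = weighted_degrees(edges, group)
```

A member whose weighted indegree is zero, such as `user_c` here, is never mentioned by the rest of the group, however often it mentions others.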

Reliability ratio
Group members were ranked in two environments: the exogenous and the endogenous. The exogenous environment corresponds to the position of each member of the analysed group in relation to all Twitter users, and the endogenous to how each member is perceived within the group. Endogenous acceptance had more weight in the determination of the reliability ratio than exogenous acceptance because the perception of the members of this group by Twitter profiles that specialise in the same topic was considered more important than that of more generalist users. The declared and dynamic networks were taken into account in the endogenous assessment (the latter having more weight), whereas only the dynamic part of the network was considered in the exogenous assessment (Fig 12). The following formula was used:

Table 5 ranks every group member according to their final reliability ratio. It can be seen here that the exogenous appreciation is qualified by the endogenous, whereby group members with a high h-index were relegated to less prominent positions because they had a low endogenous score. For instance, DefensaDeMadrid, which has an h-index of 116, went from third according to the h-index to tenth according to the overall score. After analysing the tweets made by this profile, it was found that it posts current political news rather than tweets about historical memory, so it is to be expected that the rest of the group would echo its tweets less. This correction is fitting because this profile is not exclusively a source of specialised content.

CONCLUSIONS AND FUTURE WORK
There are no consolidated methodologies to help humanities and social sciences researchers handle large amounts of information. Given the difficulty of addressing this issue in a generic manner, there is always the possibility of providing partial solutions to identified needs. This paper aims to contribute to this by enriching the information provided by the Twitter platform about its users, a gap that is always present when analysing data from this social network. This paper has put forward a methodology for ranking groups of Twitter profiles in a specific field. This methodology makes it easier to aggregate and infer information on the activity of this platform's users by using quantitative methods and other techniques that are typical of data analysis.
A user profile reliability ratio was devised based on the acceptance of user profiles within the social network. This acceptance is won unknowingly by users when they interact, which leaves useful clues about how other users perceive them. This perception is given more weight when it comes from users who specialise in one topic than when it comes from more generalist users.
To illustrate this methodology, a case study of Twitter profiles that tweet about the 2nd Spanish Republic, the Spanish Civil War and the Franco Regime was used. After successive steps, the result was a ranking of users according to their perceived reliability as providers of content about historical memory.
In addition to the reliability ranking, this methodology provides other information of interest, such as the role these profiles play on Twitter, the type of content they post (use of URLs or multimedia), the subgroups they form within a group and their position in the network of declared and dynamic relations. These data can also shed light on the research into Twitter communities.
Part of this analysis required the use of external network analysis tools. In the future it would be interesting to add network analysis algorithms to t-hoarder-kit so that the reliability ratio can be calculated automatically.

Figure 1. Percentage of profiles on online social media by gender. Source: CIS: Survey conducted after the 2016 Spanish general elections - Socio-demographic variables.

Figure 2. Percentage of profiles on online social media by gender. Source: CIS: Survey conducted after the 2016 general elections - Socio-demographic variables.
Role: users have been categorised into the following role types based on González-Bailón et al.'s definition of influence (González-Bailón et al. 2013) (Fig 3):
- Speaker: their tweets are retweeted. According to the mean value of diffusion of their tweets, they can be classified as either low (at least three times), medium (at least ten times) or high (at least one hundred times).
- Networker: their tweeting frequency is high and they make and receive roughly the same number of retweets.
- Retweeter: they mostly spread information (60% of RTs).
- Replicator: their most usual activity is to respond to tweets (60% of replies).
- Monologist: the tweets they post are hardly retweeted (less than 70% are not retweeted).
- Isolated: they have an insular attitude; they neither retweet nor are retweeted.
- Common: their level of activity is very low and they hardly interact with other users.

h-index: this indicator is employed to calculate a user's impact (Hirsch, 2005). In its original form it measures simultaneously the quality and the quantity of a researcher's scientific output. The calculation of the indicator is adapted to the Twitter environment by replacing publications with tweets and citations with RTs (Fig 4). The algorithm sorts tweets by number of RTs received and looks for the point where the number of tweets and the number of retweets match. For example, a user with an h-index of 40 has made at least 40 tweets that have each been retweeted at least 40 times. This metric rewards continued success instead of one-offs.
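The h-index calculation adapted to Twitter can be sketched in a few lines of Python. This is an illustrative implementation of the standard Hirsch definition, not code taken from t-hoarder-kit; the function name and the sample retweet counts are hypothetical.

```python
def h_index(retweet_counts):
    """Largest h such that at least h tweets have at least h retweets each."""
    ranked = sorted(retweet_counts, reverse=True)
    h = 0
    for position, retweets in enumerate(ranked, start=1):
        if retweets >= position:
            h = position
        else:
            break  # counts only decrease from here on, so h cannot grow
    return h


# Hypothetical retweet counts for five tweets by one user.
example = h_index([10, 8, 5, 4, 3])  # four tweets have at least 4 RTs each
```

A single viral tweet barely moves the result: a user whose only tweet received 100 RTs has an h-index of 1, which is why the metric rewards sustained diffusion and not one-off successes.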

Figure 3. Classification of user roles according to their level of activity and impact.

Figure 4. Way of calculating the h-index.

Figure 5. Timeline of tweets vs retweets of the set of Twitter profiles that talk about the 2nd Spanish Republic, the Spanish Civil War and the Franco Regime.

Figure 6. Distribution of roles and h-indexes.

Figure 7. Distribution of roles according to the network ratio vs the RT_in ratio.

Figure 8. Distribution of roles according to the network ratio vs the h-index.

Figure 9. Correlation between the RT_in ratio and the h-index.

Figure 10. Network of follower-following connections of those Twitter profiles that talk about the 2nd Spanish Republic, the Spanish Civil War and the Franco Regime.

Figure 12. Indicators for profile evaluation according to the endogenous and the exogenous environment.

Table 3. Parameters of the declared network among group members sorted by PageRank.
Figure 11. Dynamic network of mentions among the Twitter profiles that talk about the 2nd Spanish Republic, the Spanish Civil War and the Franco Regime.

Table 4. Parameters of the dynamic network among group members sorted by PageRank.

Table 5. Ranking of the members of the group according to their overall reliability ratio.