Consent and information – ethical considerations when conducting research on social media

Dag Elgesem

Department of Information Science and Media Studies, University of Bergen

Dag.Elgesem@infomedia.uib.no

Introduction

My topic is research on social media and the requirements regarding information and consent arising from such research. This article will primarily discuss the responsibility of researchers for giving due consideration to their research participants. It is also important to remember, however, that the value of the research is an ethical consideration that must be given weight, as the Norwegian National Committees for Research Ethics (NESH) points out in its guidelines on Internet research (NESH, 2003, point 1):

Research on the Internet is valuable both because it can generate insight into a new and important communication channel and because the Internet provides the opportunity to study known phenomena (e.g. formation of norms, dissemination of information, communication, formation of groups) in new ways.

The requirements regarding information and consent when conducting research on social media are not essentially different from those in other research involving people’s participation. However, this research is conducted in contexts that are structured by technologies and in which the conditions for communication are not always as clear or known for everyone involved. This applies in particular to the boundaries between the public and private spheres, which are often drawn in new ways and which therefore cause us in some cases to be uncertain about which requirements regarding information and consent should apply. But not everything is equally unclear. In cases where a service is both password protected and contains sensitive information, such as a personal Facebook profile, it seems obvious that the usual requirements regarding consent must apply. In contrast, I argue in this chapter that there are weaker grounds for obtaining consent to use non-private information that individuals themselves have made available in a public forum, such as postings about political issues in debate forums in online newspapers or on Twitter. I argue that in some cases research on social media is ethically responsible without consent and that the interests of those involved may be safeguarded in other ways.

Consent

A useful starting point for this discussion is the model developed by McKee and Porter (2009, p. 88), shown in Figure 1 below, which identifies four factors that affect the need to obtain consent when research is conducted on and outside of the Internet: degree of accessibility in the public sphere, sensitivity of the information, degree of interaction with the research participants and the vulnerability of the research participants.

Figure 1 Factors affecting the requirement regarding consent (McKee and Porter, 2009, p. 88)

McKee and Porter’s model identifies some of the sources of the uncertainty surrounding the requirements regarding consent when conducting research on social media: the ethically relevant factors (public versus private, sensitivity, interaction, vulnerability) are present in varying degrees and may occur in various combinations. It is therefore difficult to formulate simple, general rules, and on this basis McKee and Porter recommend a case-based approach with concrete assessments of the ethical issues raised by various research projects.

It is clear that the four factors affecting requirements regarding consent in McKee and Porter’s model are not unique to research on the Internet, but are relevant in all research on communication. However, what complicates matters is that the boundaries between the private and public spheres appear in new ways, and the technological context creates new forms of interaction. This means that our ethical intuition about how we should regard these aspects is less clear.

In a number of often cited works, danah boyd has identified some properties of what she calls the «networked public sphere», which give communication on the Internet a character different from communication in other channels (boyd, 2008, p. 26 ff):

These are interesting and important observations of some of the special features of Internet communication, which also shed light on why issues related to consent in research on the Internet may be more difficult to assess than in other types of research. For example, since it may be unclear who the audience is for postings in the public sphere of the Internet, it is also less clear who the postings are intended for, and thus more difficult to assess whether using the communication in research conflicts with this intention. The question is whether the information is being used for a purpose different from the original one. A clear «yes» to this question will normally result in a requirement to obtain consent. The problem is that much of the communication on the Internet has no clearly delimited target audience, because the intended audience is not restricted by the context of the communication. Examples of postings in which the audience is «invisible» and not clearly defined are replies in a comment field in an online newspaper, a Twitter post or an article in a blog. Below I return to the question of what role consent should play in research on media with an invisible audience.

By the same token, not all communication on the Internet has all of these properties to the same degree. Not all Facebook content is searchable by everyone, and we know who the audience is for the comments we post there (if we have set our privacy settings correctly). Often the ethical requirements regarding research will be stricter when the communication does not have the four properties identified by boyd because this communication is more private.

I share McKee and Porter’s view that it is difficult to give simple, general rules for assessing when the requirement regarding consent should apply, and that it is necessary to make concrete assessments on a case-by-case basis. However, I will argue that there is an ethically relevant distinction between situations in which participating in the research entails a risk of harm or discomfort and those in which there is no such risk but the research nonetheless challenges the individual’s interest in retaining control over information about himself/herself. Although the boundary here is fluid, and breaches of personal privacy are of course burdensome, I believe the two situations are different in ethically relevant ways. In the first case, there must be a requirement to obtain consent, whereas information and consent in the other type of situation is an important consideration, which in some cases may be weighed against other considerations. I will argue that research on certain types of communication on social media, such as political postings on Twitter, may be conducted without obtaining consent.

Situations in which there is a risk of discomfort or harm trigger an unconditional requirement to obtain consent: It must be up to the potential research participant to decide whether to subject himself/herself to the relevant risk or discomfort. As mentioned in the introduction, I believe that assessments related to the value of the research and its quality are relevant considerations in an ethical assessment, but in situations in which there is a risk of discomfort or harm, the consideration given to the value of the research will not diminish the requirement to obtain consent. My view – and I think I am in line with the NESH guidelines – is that if it is not possible to obtain participants’ consent in projects that entail such risk, the research cannot be carried out. Allow me to illustrate this point with an example:

Example 1: Research on Internet dating services

A group of economists in one of Norway’s neighbouring countries wanted to study preference patterns of partner selection on Internet dating sites. Simply explained, the researchers created fictional profiles on the dating site, some of women and some of men. The profiles had some similar features, but were different with regard to income, education and ethnicity. The researchers wanted to find out what difference these features made in the market for partners. For each variable the researchers planned to contact a random sample of (real) persons on the dating site and register the features of the profiles of those who responded and those who did not. After the data was collected, the researchers would tell those who had answered the inquiries that they were no longer interested.

The project, which as far as I know was never carried out,1 aimed to shed light on an increasingly popular phenomenon on social networks, one that provides new ways of finding a partner and whose consequences we know very little about. The methodological design of the project also seemed to be well planned. But this could hardly make up for the project’s ethical problems. Firstly, the project had a hidden agenda in which it was essential that those involved did not know the real purpose of the inquiries. They did not even know that they were objects of research. Moreover, people who post a profile on a dating site are in a vulnerable situation, and the research activity may trigger burdensome emotional processes resulting from dashed hopes and disappointment. So although it could be argued that a project like this is interesting and increases insight into an important phenomenon, consideration towards the people who are the object of the research indicates that the project should not be carried out in this form.

Retaining control over one’s own information

Ethical challenges related to personal privacy arise when the research infringes on the individual’s interest in retaining control of information about himself/herself. The problem here is not necessarily that the research may be burdensome, as in the example above, but whether the research shows reasonable respect for the individual’s integrity and interest in retaining control of his/her own information. Respect for personal privacy indicates that consent to use information about an individual in a research project should normally be obtained, although I will argue that this consideration is weaker than the requirement to avoid the risk of harm and discomfort.

In situations where the research will challenge the individual’s interest in retaining control of information about himself/herself, this interest should normally be protected through consent obtained by the researcher. By the same token, I believe there are situations, especially when consent is very difficult to obtain, in which consideration for the value of the research may make it defensible to implement the project without consent. I return to this matter below. But let us first look at an example of research on social media that is clearly problematic from the perspective of personal privacy.

Example 2: Research on Facebook2

In 2008, US researchers made the Facebook profiles of an entire class of students from an unidentified US college available on the Internet. The dataset contained 1,700 profiles from the students’ first academic year in 2006. Comparable data were also collected from the two subsequent years, which were planned to be published at a later time. Making the data publicly available was done in accordance with requirements imposed by the project’s public funding source to allow other researchers to reuse the data.

The data was collected by research assistants who were also members of the Facebook network, but the other students had not given their consent to the use of the information in the research project. However, the information was made less identifiable and less sensitive before it was published by deleting the students’ names and identification numbers and removing the most sensitive information about their interests. Thus the information published was not directly identifiable, and it could only be used for statistical purposes.

The researcher responsible for the project defended the project on the grounds that the research would not entail a risk or burden for the people involved. «We have not accessed any information not otherwise available on Facebook. We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope it will be hard to do).»3

As it turned out, however, it was possible to identify the school in question. But the most important objection raised in the discussion about the project was the method of data collection. Zimmer criticized the absence of consent to collect information as undermining the condition for communication between the members within the network.

While the information was indeed available to the RA, it might have been accessible only due to the fact that the RA was within the same «network» as the subject, and that a privacy setting was explicitly set with the intent to keep that data within the boundaries of that network. Instead, it was included in a dataset released to the general public.4

In my view, Zimmer’s objection is reasonable. Facebook is a system in which participants create a framework of protected communication with selected friends by logging in and actively choosing who they want to share information with. Participants in the network express clear preferences about the limitation of access to information about themselves through their privacy settings on their profiles. Using the information for research therefore violates the conditions on which the participants’ communication is based, although it is correct as the researchers pointed out that they did not do anything to expose the students to risk or discomfort.5 The case exemplifies how data collection on the Internet can undermine the individual’s interest in retaining control of the information about himself/herself, and thus trigger the requirement to obtain consent. The case also illustrates that this requirement may arise even though the research subjects are not exposed to any risk or burden. I will nonetheless assert that there is an ethically relevant distinction between research that results in a risk or burden for the participants and research that does not.

Reasonable expectation – of what?

In a system with a log-in function and privacy settings that limit access to personal information, it is clear in my view that consideration for the individual’s interest in retaining control of information about himself/herself triggers a requirement to obtain consent. However, in contexts where the communication channel is more open, it is not as clear. In that case, some of the other factors identified by McKee and Porter may play a role: degree of vulnerability, sensitivity and degree of interaction with the research participants. I will return to this point, but first I want to discuss a particular way of formulating the requirement regarding control over information about oneself. Many have proposed that information should not be used without consent if the people being studied do not have an expectation that the information will be used in research. It is natural to formulate it in this way, e.g. in the assessment of research on Facebook profiles (discussed above).

Hoser and Nitschke (2009) are among those who have spoken in favour of such a formulation of the consent requirement in research on social network services.

Thus, we could establish a simple rule: The data someone posted, e.g. in a social network site or newsgroup may be used in the context and by the audience he or she intended it for. The intended audience is, even if it is large and not personally known to the user, the «community» he or she joined. So nobody else should be allowed to use, without consent, the data generated in such a site. Researchers are probably not the audience an average user intends to reach by his or her postings and serving as a research object is normally not the purpose an average user has in mind when posting on a social network site or in a newsgroup.6

We see that the authors do not qualify which types of network services they believe should require consent, e.g. whether or not there is a log-in function. It appears they believe that if the postings were not intended for researchers, they should not be used in research. But if we formulate the criteria in this way, it will imply a consent requirement for all research, including for comments posted in the public sphere, e.g. postings in a debate forum in an online newspaper. There are two problems connected with this. One is that in some cases it is so difficult and resource intensive to obtain consent, such as from everyone who has participated in a debate on Twitter, that it is not possible in practical terms. The other problem is that it seems unreasonable to require consent in cases where people themselves seek public attention for their views, such as about political issues on Twitter. Let us look at an example.

Example 3: Research on political debate on Twitter

A Norwegian and a Swedish researcher7 wanted to compare the political discussion on Twitter in connection with the elections held in 2010 (Sweden) and in 2011 (Norway). They used a program (TwapperKeeper) that downloads messages from Twitter related to certain hashtags, e.g. #elecl2010. They collected 100,000 messages from 9,000 individuals, which they made the object of qualitative analyses and network analyses. The question of which requirements regarding information and consent should apply in a study like this was raised in the dialogue with the Regional Ethical Review Board in Uppsala, Sweden, and the Data Protection Official for Research (NSD) in Norway. After some time, the study was approved in both countries without a consent requirement. In the assessment it was also pointed out that it would be difficult to obtain consent. The researchers’ argument in this context was that the postings constituted political discussion in the public sphere, and should therefore be available for research without restriction.

Everything that gets tweeted is public, but all of it is not necessarily for the public. Still, we would argue that the setting of our project – thematically tagged communication about an upcoming election – is public, and that the users could be expected to share that view.8

Note that they do not assert that all communication on Twitter should necessarily be available for research without consent: There may be communication on Twitter that should be protected. They argue for their conclusion on the basis of a concrete assessment that the channel is open, the topic is of a general political nature and the condition for discussion is that people are seeking attention for their views in a public debate.

Such a concrete assessment of how researchers should regard communication in open forums is in keeping with NESH guidelines. On the one hand, NESH says that research on open forums may be conducted without obtaining consent.

As a general rule, researchers may freely use material from open forums without obtaining consent from those who have produced the information or those about which the information applies. For example, a researcher may freely use information obtained from the coverage an online newspaper has gathered about an issue.9

At the same time, NESH emphasizes in its guidelines that information that appears in open forums may also require researchers to exercise caution when disseminating research results, e.g. due to topic sensitivity or the subjects’ vulnerability.

I argued above that it is unreasonably limiting to formulate a general requirement regarding consent if the subjects do not expect that researchers will obtain access to the information. In my view, the Twitter project discussed above is an example of a project in which the subjects do not necessarily expect that researchers will study their postings, but in which the research must nonetheless be said to be acceptable. My view is that research may be compatible with the premises for the communication situation even though the participants do not actively expect that researchers will gain access to it.

There is a logical difference between an expectation that something will not occur and the absence of an expectation that it will occur. The first implies the second, i.e. if the expression to the left of the arrow is true, the expression to the right of the arrow must also be true:

expect not-A → do not expect A,

– but the converse does not follow.

If there is an expectation that people on the outside will not gain access, as was the case in the Facebook example, then it is a breach of this expectation to use the information in research without consent. While in the Twitter example most of the debaters do not expect that the information will be used in research, neither is it a reasonable expectation, given the context, that the information will not be used in research. Thus, in the latter instance the researchers’ access to the information does not undermine the premises for communication.
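
The distinction can also be written out a little more formally. What follows is only a minimal sketch: the operator E («the participant expects that»), the symbol A («the information will be used in research») and the consistency assumption are introduced here purely for illustration and are not part of the argument’s original notation.

```latex
% Requires amsmath. E(X): "the participant expects that X" (hypothetical notation).
% A: "the information will be used in research".
% Assumption: expectations are consistent, i.e. nobody expects both A and not-A.
\begin{align*}
  &\text{Consistency (assumed):}  && \neg\bigl(E(A) \land E(\neg A)\bigr) \\
  &\text{Hence:}                  && E(\neg A) \rightarrow \neg E(A) \\
  &\text{The converse fails when:} && \neg E(A) \land \neg E(\neg A) \\[4pt]
  &\text{Facebook example:}       && E(\neg A) \\
  &\text{Twitter example:}        && \neg E(A) \land \neg E(\neg A)
\end{align*}
```

On this reading, using the information without consent breaches an expectation only in the first case; in the second case there is no expectation either way that could be breached.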

The need for protection against identification

But even though researchers’ access to the information does not necessarily undermine the premises for communication, researchers will often need to give special consideration to this when disseminating their results. For instance, there are challenges related to the fact that quoting from the Internet makes it easier to search for the person being quoted. The question here is whether the further use of the research presents challenges, especially if identification is burdensome. The ethical assessments that this type of situation raises are different from those we have seen above, where the data collection in itself is burdensome or clearly infringes on the individual’s interest in retaining control of information about himself/herself.

Also in cases where the researchers’ access to information does not necessarily undermine the premises for communication, there may often be grounds to require consent to use the information in research, because the information is sensitive or the persons concerned are vulnerable. NESH mentions this consideration in its guidelines:

Persons whose personal or sensitive information appears in an open forum are entitled in a research context to have such information used and disseminated in an appropriate manner. Living persons are also entitled to control whether sensitive information about themselves may be used for research purposes. The potential to trace an informant’s identity is greater when using digital forums compared with other information channels […]. Researchers must anonymize sensitive information that they make use of.10

In the third type of situation, where researchers’ access to the information does not in itself undermine the premises for communication, the assessment is more complex and the value of the research is a clearer consideration. Here, obtaining consent is not the only means of taking research participants into account. One alternative is to refrain from identifying the participants, but a concrete assessment must be made of the specific case; it is not possible to formulate rules that can be used more or less mechanically. This also means that cases will appear in this landscape where it is not so easy to draw clear conclusions. Let me give an example.

Example 4: Research on Internet communication about mental health problems

A Swedish project, described in Halvarson and Lilliengren (2003), wanted to investigate ordinary explanations for interpersonal problems. They wanted to learn which strategies average people without formal training in psychology use when they discuss strategies for tackling life crises and personal problems. The researchers wanted to study this by monitoring open Internet forums. The participants in these forums shared their personal histories, gave advice and support to others, and related their own problems. There were many young users on the websites, and the researchers were especially concerned with how they communicated about their problems. The researchers did not obtain consent to gather this information or to quote from it.

In my view, the most difficult question in this connection is whether the researchers should quote the participants’ postings, especially because it involves comments with sensitive information involving a vulnerable group. Halvarson and Lilliengren argue that it is not necessary to obtain consent to gather the information. They believe that the researchers’ observation of the discussion in this open forum does not entail any risk or burden for the participants. Moreover, they point out that this is an openly available forum and that the researchers’ observation and registration of the communication does not limit the participants’ control over information about themselves. The question could be raised as to whether all the participants are aware of this openness to the same degree, but let us assume that the researchers are correct. They also argue that the project is beneficial by pointing out that it is important to understand ordinary psychological explanations. Such explanations are the most important resource used by most people to tackle personal and interpersonal problems, and it is important to understand the basis for the strategies people use, e.g. for providing a basis for improving professional treatment. In addition, the researchers believe that there is no real alternative to observing natural communication, such as setting up a discussion group and inviting people to participate in it. In that case, they believe that the recruitment would be biased and that they would not have got very many participants.

The question that remains, if they are correct that gathering information without consent is acceptable, is how the researchers should handle the information they collect when they disseminate their results. The two researchers chose to quote from the postings on the forum without giving the pseudonyms that the young people use when they participate in the discussions. The argument for this is that people often use the same pseudonym for several different Internet services, so that the names can be used in a search to find them in other places and thus help to identify them. But should the researchers have asked for consent to use the quotes they gathered? Halvarson and Lilliengren discuss this question and conclude that asking for consent could negatively affect communication in the forum:

When studying private explanatory systems at this specific venue, obtaining informed consent is not a practical problem. All informants can be contacted via their public e-mail address and thus asked for consent to quote their postings. However, it is difficult to know how this would affect their experience and future use of the venue. If it were to be perceived as an intrusion it could have negative effects and violate later participation in discussions.11

The problem is that those who receive such an inquiry might regard it as an intrusion, which would decrease their interest in taking part in the forum in the future. This is obviously an important consideration. But if the researchers believe people would dislike it if they knew they were being quoted, is this not a reason to refrain from quoting their postings or to ask for their consent – especially because many of the comments are posted by young people and by people who might be in a vulnerable situation? In this case it is not easy to give a straightforward answer. It has to do in part with how great the potential for identification through the quotes is, but it also has to do with how much the documentation is weakened by not using quotes when the results are presented, what alternatives are available for providing evidence for interpretations of the communication, and through which channels the results are disseminated. We do not have enough information to assess all of these aspects, but I would stress that there is no way to avoid a concrete assessment of all relevant values and alternatives in the situation, including the research consideration, in order to take a decision. One thing that is clear, however, is that if it is decided that consent to quote should be obtained, people should also be allowed to decide whether they want to take part in the study at all.

Example 5: Research on communication processes

The problem encountered here by Halvarson and Lilliengren is typical for many studies of communication processes: Information and questions about consent will disturb the natural interaction researchers want to study. Hudson and Bruckman (2004) have argued that in some cases like this it is acceptable to conduct research without consent, even though the researchers know that some participants in the service will dislike it. Hudson and Bruckman studied the reactions of participants on a chat service (IRC, a moderated, synchronous service) when they were informed that they were being observed and their communication was being registered. The researchers posted information stating that a study was being conducted in four different ways in a sample of discussion threads: In some they were present with the pseudonym «chat_study», in some they only posted that registration was being carried out, in a third group they mentioned the registration and gave the email address where people could opt out, and in the fourth people received an offer to opt in. In a majority of groups the researchers were thrown out by the moderators, and in all groups they received many negative reactions. Hudson and Bruckman summarize the results of their experiment as follows:

Based on this study, we can safely conclude that individuals in online environments such as chatrooms generally do not approve of being studied without their consent. The vehement reaction of many in our study indicates that they object to being studied. Further, when given the option to opt in or opt out of research, potential subjects still object.12

However, Hudson and Bruckman point out that in many groups they were not thrown out and that they had no way of finding out who did not want to participate in research and who was only reacting to the way the question about consent was asked. Thus they argue that it is acceptable – and the only possibility – to conduct research without consent if the IRB (Institutional Review Board)13 rules for such research are fulfilled:

  1. The research involves no more than minimal risk to the subjects.
  2. The waiver or alteration will not adversely affect the rights and welfare of the subjects.
  3. The research could not practicably be carried out without the waiver or alteration.
  4. Whenever appropriate, the subjects will be provided with additional pertinent information after participation.14

The key question is whether it is impossible to base the research on consent (point 3). Hudson and Bruckman’s response is that in practice it is impossible to do so because their experiment shows that in synchronous forums it is difficult to implement a recruitment process in which the researchers reach those who want to participate without disturbing the communication.

This is problematic as a general conclusion, and Hudson and Bruckman also believe that a concrete assessment must be conducted of the potential negative effects of the research. But an objection to their approach is that they do not assess alternative strategies for obtaining the consent of participants from communities on the Internet. McKee and Porter comment on Hudson and Bruckman’s argument for research without consent in the following way:

We arrive at a different conclusion: Users are not always hostile to researchers. However, they do not want to be studied by researchers who have not shown proper respect for the community and who have not built up some measure of respect within the community. Trust is a key element of online communication.15

Conclusion

I have proposed a model for ethical assessments that distinguishes between three types of situations in which the question of consent is raised when research is conducted on users of social media. Research that exposes the participants to the risk of pain or discomfort triggers a requirement to obtain consent. If the research undermines the premises for communication that the participants have given their explicit approval to, consent is also necessary for maintaining the participants’ autonomy. In situations where the researchers’ observation and registration of the communication do not undermine the conditions for participation, typically public debate arenas, consent is not the only way to take the research participants into account. One problem will often be how the information will be used when the research results are presented, e.g. whether quotes that may identify the participants will be used. In this assessment, consideration for the quality and value of the research should also play a role.

The properties of social media vary along many dimensions, and this is the source of uncertainty related to their ethical assessment. An important dimension is how accessible the communication is in the public sphere, which varies in ways that differ from other media. Social media such as Facebook, Twitter, Instagram and Snapchat have different forms of user control, which offer different ways of limiting the audience. This makes it difficult to draw a clear distinction between situations where the researchers’ participation undermines the premises for communication and situations where it does not. Other considerations may also affect the weight of the ethical considerations. Among these are the vulnerability of the people being studied, the sensitivity of the topic of communication, the searchability of the information being presented, the degree of interactivity with those being studied, and the participants’ actual competence in and understanding of how social media function.

References

Boyd, Danah (2008) Taken out of context: American teen sociality in networked publics. PhD thesis. University of California-Berkeley.
Halvarson, Camilla and Lilliengren, Peter (2003) Ethical and Methodological Dilemmas in Research with/on Children and Youths on the Net. In Applied Ethics in Internet Research, edited by May Thorseth. NTNU. 141–154.
Hoser, Bettina and Nitschke, Tanja (2010) Questions on ethics for research in the virtually connected world. Social Networks 32: 180–186.
Hudson, Jim and Bruckman, Amy (2004) “Go Away”: Participant Objections to Being Studied and the Ethics of Chatroom Research. The Information Society. 20: 127–139.
McKee, Heidi A. and Porter, James E. (2009) The Ethics of Internet Research. Peter Lang.
Moe, Hallvard and Larsson, Anders Olof (2012) Methodological and Ethical Challenges Associated with Large-scale Analyses of Online Political Communication. Nordicom Review. 3 (1): 117–124.
NESH (2009) Forskningsetiske retningslinjer for forskning på Internett. http://www.etikkom.no/Forskningsetikk/Etiske-retningslinjer/Samfunnsvitenskap-jus-og-humaniora/Internett-forsking/. Downloaded 5 July 2013.
Zimmer, Michael (2010) “But the data is already public”: on the ethics of research in Facebook. Ethics and Information Technology. 12: 313–325.

Footnotes

1 Personal communication.

2 The description is based on Zimmer (2010).

3 Quoted in Zimmer (2010), p. 316.

4 Zimmer, 2010, p. 318

5 There are open profiles on Facebook, e.g. open groups or open political profiles which, in my view, should not require consent in order to be used in research.

6 Hoser and Nitschke, 2010, pp. 185–186, my emphasis.

7 Moe and Larsson (2012).

8 Moe and Larsson, 2012, p. 122.

9 NESH, 2003, point 4.

10 NESH, 2003, point 6.

11 Halvarson and Lilliengren, 2003, p. 130.

12 Hudson and Bruckman, 2004, p. 135.

13 Independent ethical committees that oversee human subject research at each institution.

14 Quoted in Hudson and Bruckman, 2004, p. 137.

15 McKee and Porter, 2009, p. 109.