Cell Line Misidentification

“Evidence suggests that up to one-third of tumor cell lines being used in scientific research are affected by inter- or intraspecies cross-contamination or have been wrongly identified, thereby rendering many of the conclusions doubtful if not completely invalid.”

— Lancet Oncology, vol. 2, July 2001, p. 393

Cell line misidentification is a major problem in cell culture research. It has been known for over half a century, yet instead of improving, it has only grown worse. Many false and erroneous papers have been published as a result, and other researchers have built on their findings, creating a spiraling problem that has yet to be resolved. Excerpts from the two articles presented below highlight this troublesome issue and provide ample evidence for why all cell culture studies should be questioned.

Cell line misidentification: the beginning of the end

“Cell lines are used extensively in research and drug development as models of normal and cancer tissues. However, a substantial proportion of cell lines is mislabelled or replaced by cells derived from a different individual, tissue or species. The scientific community has failed to tackle this problem and consequently thousands of misleading and potentially erroneous papers have been published using cell lines that are incorrectly identified. Recent efforts to develop a standard for the authentication of human cell lines using short tandem repeat profiling is an important step to eradicate this problem.”


The ghosts of HeLa: How cell line misidentification contaminates the scientific literature

“While problems with cell line misidentification have been known for decades, an unknown number of published papers remains in circulation reporting on the wrong cells without warning or correction. Here we attempt to make a conservative estimate of this ‘contaminated’ literature. We found 32,755 articles reporting on research with misidentified cells, in turn cited by an estimated half a million other papers. The contamination of the literature is not decreasing over time and is anything but restricted to countries in the periphery of global science. The decades-old and often contentious attempts to stop misidentification of cell lines have proven to be insufficient. The contamination of the literature calls for a fair and reasonable notification system, warning users and readers to interpret these papers with appropriate care.

The misidentification of cell lines is a stubborn problem in the biomedical sciences, contributing to the growing concerns about errors, false conclusions and irreproducible experiments [1, 2]. As a result of mislabelled samples, cross-contaminations, or inadequate protocols, some research papers report results for lung cancer cells that turn out to be liver carcinoma, or human cell lines that turn out to be rat [3, 4]. In some cases, these errors may only marginally affect results; in others they render results meaningless [4].

The problems with cell line misidentification [5] have been known for decades, commencing with the controversies around HeLa cells in the 1960s [6–10]. In spite of several alarm calls and initiatives to remedy the problem, misidentification continues to haunt biomedical research, with new announcements of large-scale cross-contaminations and widespread use of misidentified cell lines appearing even recently [11–13]. Although no exact numbers are known, the extent of cell line misidentification is estimated between one fifth and one third of all cell lines [4, 14]. (Although currently only 488 or 0.6% of over 80,000 known cell lines have been reported as misidentified, most cell lines are used infrequently [15].) In addition, misidentified cell lines keep being used under their false identities long after they have been unmasked [16], while other researchers continue to build on their results. Considering the biomedical nature of research conducted on these cell lines, consequences of false findings are potentially severe and costly [17], with grants, patents and even drug trials based on misidentified cells [18]. Several case studies performed by the International Cell Line Authentication Committee (ICLAC) highlight some of the potential consequences of using misidentified cell lines [19, 20]. Especially in the last decade, the gravity of the problem has been widely acknowledged, with several calls for immediate action in journal articles [3, 12, 21–23], requirements for grant applications (e.g. [24, 25]) and even an open letter to the US secretary of health [26].

The current calls for action and remediation activities are almost exclusively concerned with avoiding future contaminations, such as through systems for easier verification of cell line identities. Various solutions have been proposed [27–29], among others employing genotypic identification through short tandem repeats (STR) [30]. In addition, authors are expected to check overviews of misidentified cells (such as [12, 15, 27, 31]) before conducting their experiments. However, little attention is currently paid to the damage that has already been done through the past distribution of research articles based on misidentified cells. Although systems such as retractions and corrections are available to alert other researchers of potential problems in publications, these systems are rarely used to flag problems with cell lines [20, 32]. Even if future misidentifications could be avoided completely–which is not likely given the track record of earlier attempts–these ‘contaminated’ articles will therefore continue to affect research.

Before any action can be taken, it is essential that we get a sense of the size and nature of the problem of contaminated literature. This raises several questions. First, how many research articles have been based on misidentified or contaminated cell lines? How wide is their influence on the scientific literature? Second, what can we say about origins and trends in the contaminated literature? Is the problem getting better, or restricted to peripheral regions of the world’s research, where perhaps protocols are less strict? Third, what could be appropriate ways to deal with the contaminated literature? To answer these questions, we searched the literature for research papers using cell lines that are known to have been misidentified. In order to put the results of this search in perspective, we analysed the precise complications of misidentification for three particular cell lines.”

“Using complementary search strategies (see methods), we were able to identify 32,755 articles (on August 4th, 2017) based on cell lines that are currently known to be different from the cell lines reported in these publications. As we only searched for cell lines known to be misidentified, this constitutes a conservative estimate of the scale of contamination in the primary literature.”

“In addition, research based on misidentified cell lines has a wide impact on the scientific literature, as it appears that these research papers are comparatively highly cited. WoS does not allow for precise total numbers, but we can give indications of this ‘secondary contamination’ of the literature. Analysing citations to primary contaminated articles, we found 46 papers with more than a thousand citations and over 2600 contaminated articles with over a hundred citations. Furthermore, over 92% of the contaminated papers are cited at least once, which is more than average for biomedical literature [34]. In total, we can conservatively estimate the citations to the primary contaminated primary literature at over 500,000, excluding self-citations, thereby leaving traces in a substantial share of the biomedical literature. Even though it is clear that articles may receive citations for many reasons, including negative or even ritual citations, and hence not all citing articles contain (critical) errors, the amount of research potentially building on false grounds remains worrisome.”

A transitory problem?

“One might wonder whether the contamination of the research literature is mainly a problem of the past, given that the first concerns about misidentified cell lines were expressed half a century ago [9, 10] and that numerous initiatives have tried to alleviate the problem since.

Based on the set of 32,755 records of primary contaminated literature, we analysed the publication dates of the articles. The majority of the articles, 57%, were written since 2000 and the number of articles using misidentified cell lines is still growing (see Fig 2). Clearly, the problem is definitely not one of the past, but is very relevant to contemporary science, with 58 new articles based on contaminated literature appearing even as recently as February 2017.

Fig 2 indicates three moments in history when cell line contamination became evident. First, through the work of Stanley Gartler it became possible to detect intraspecies cell contamination, after which several of such contaminations involving HeLa cells were reported in Nature in 1968 [9, 10]. Second, cell culture contamination was put on the global research agenda by the work of Walter Nelson-Rees et al. in the 1970s [7, 8], culminating in a list of contaminated cell cultures in Science in 1981 that demonstrated large-scale contamination of cell cultures by HeLa cells [44]. From this point on, it could be expected that most scientists working in those areas of research frequently employing cell cultures, were aware of the potential issues with their research material. However, the vast majority of research papers based on misidentified cell lines was published after this point in time. Even after the introduction of STR in 2001 [45], the annual number does not decrease.

Similar to the primary literature, the number of articles in the secondary literature is also still growing. In 2016, over 40,000 papers were published that referred to primary contaminated literature. In addition, from the information in the Supplementary Material (S2 File), we conclude that the majority of misidentified cell lines continue to contaminate the secondary literature in 2017 (251 cell lines for search method 1 and 232 cell lines for search method 2), while dozens of cell lines created most of their secondary literature in the past two years (38 for search method 1 and 87 for search method 2). Moreover, we conclude that many cell lines (108 for search method 1, 87 for search method 2) have generated contamination in secondary literature for a period of more than 25 years, with articles appearing long after it became known that the cell line was misidentified. Hence the contamination of the literature through reference to articles using misidentified cell lines remains a very topical problem.

A peripheral problem?

Another objection to our findings could be that cross-contamination occurs particularly in regions with new or emerging research communities, in which levels of training or access to testing facilities may be limited. For example, several recent publications indicate levels of cell line contamination for China between 25% [13] and 46% [46] and demonstrate that of all ‘new’ cell lines developed in China 85% actually turned out to be HeLa cells [13].

However, the majority of the articles using misidentified cell lines originate from countries holding well-established research traditions (e.g. US, Japan, Germany). Relative to their share of total research output, authors from these countries often perform research on misidentified cell lines. In fact, mainly due to their enormous share of total literature on cell lines, over 36% of all contaminated primary literature stems from the US. Fig 3 shows the percentage of contaminated primary articles as a fraction of the total number of articles on cells per country (see Supplementary Materials S2 File for data). It includes the 25 countries with the largest share of the contaminated primary literature. In this list, we see countries holding excellent research reputations ranking high. Hence, the problem does not only occur in regions with low standards of quality and diligence in research, but is also a problem in countries that hold excellent research reputations. Nevertheless, an analysis of the literature for the past five years showed a dramatic rise of China’s share in the contaminated literature, confirming recent worries expressed in the literature [13].”

“Our results seem to present worrying problems for the biomedical sciences. Although the issue of misidentified cell lines has long been known, its effect on the scientific literature has not been properly recognised, let alone properly treated [47, 48].”

“Despite measures to authenticate new and existing cell lines [27], research based on the wrong cells is still present in the literature and in fact continues to be published.”


In Summary:

  • A substantial proportion of cell lines are mislabeled or replaced by cells from a different individual, tissue, or species
  • Thousands of misleading and potentially erroneous papers have been published using incorrectly identified cell lines
  • The problem has been known for decades, yet the faulty papers remain in circulation without warning or correction
  • The misidentification of cell lines contributes to growing concerns over errors, false conclusions, and irreproducible experiments
  • A literature search found 32,755 articles using misidentified cell lines, which were cited by an estimated half a million other papers (a conservative estimate)
  • The accumulation of false literature is not slowing down
  • Estimates place the share of misidentified cell lines at one fifth to one third of all cell lines
  • Researchers are building upon false results
  • The majority of the articles using misidentified cell lines (57%) have been written since 2000, well after the problem was discovered in the 1960s
  • The amount of false research is unknown, and the problem persists without being properly treated
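
As a rough sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are ours, and the inputs are the rounded numbers from the excerpts ("over 500,000" citations, "over 80,000" known cell lines), so the outputs are approximations and not results reported by the authors.

```python
# Figures quoted in the excerpts above; variable names are illustrative only.
contaminated_articles = 32_755   # primary articles identified (August 4th, 2017)
citations_lower_bound = 500_000  # "over 500,000" citations, excluding self-citations
reported_misidentified = 488     # cell lines reported as misidentified
known_cell_lines = 80_000        # "over 80,000" known cell lines
share_since_2000 = 0.57          # fraction of contaminated articles written since 2000

# Average number of citing papers per contaminated article (a lower bound,
# because the citation total is itself quoted as a lower bound).
citations_per_article = citations_lower_bound / contaminated_articles

# Share of known cell lines that have been formally reported as misidentified.
percent_reported = 100 * reported_misidentified / known_cell_lines

# Approximate count of contaminated articles written since 2000.
articles_since_2000 = round(share_since_2000 * contaminated_articles)

print(f"Citations per contaminated article: about {citations_per_article:.1f}")
print(f"Cell lines reported as misidentified: about {percent_reported:.2f}%")
print(f"Contaminated articles since 2000: about {articles_since_2000}")
```

The roughly 0.6% of cell lines formally reported as misidentified sits far below the one-fifth to one-third estimate; the authors note alongside that figure that most cell lines are used infrequently.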

It is clear that cell line misidentification is not going away and has grown out of control over time. Taken together with the evidence of widespread cell line contamination from biological, chemical, and environmental factors, the toxic effects of the antibiotics, fetal bovine serum, and media used, the failure to properly replicate the in vivo environment, and the inability to reproduce results, it is a wonder that any cell culture study is considered valid.

It is fraud built upon fraud.



    1. I would imagine quite a few. According to this November 2021 article, it is still a problem:

      Using unauthenticated cell lines can create major problems

      “The latest version of the ICLAC Register of Misidentified Cell Lines released earlier this year, lists a staggering 576 cell lines as being misidentified. Of these, 531 have no known authentic stock, 73 do not correspond to the original donor, and 67 come from a different species, while HeLa is the most common contaminant among 144 different contaminants that have been identified. “At a minimum, scientists using misidentified or contaminated cell lines will waste valuable time and resources on experiments until the error is uncovered,” comments Leta Steffen, Senior Applications Scientist at Promega. “If the error goes undetected, the resulting data can lead to incorrect theories, derailing scientific progress and risking both reputations and future funding.” ICLAC has documented several case studies in cell-line misidentification, including an analysis of publications that refer to Chang liver cells in the title or abstract; these cells are now known to be HeLa derivatives that are unsuitable as a model for normal liver, causing major repercussions within the research community.”


