University of California San Francisco
Helen Diller Family Comprehensive Cancer Center

Big Data Shows How Cancer Interacts with Its Surroundings

Analysis Reveals All Major Solid Tumors Follow Same Process

By Laura Kurtzman | UCSF.edu | October 20, 2017

By combining data from sources that at first seemed to be incompatible, UC San Francisco researchers have identified a molecular signature in tissue adjacent to tumors in eight of the most common cancers that suggests they are all using the same mechanism to remodel normal tissue and spread.

The new study is the first systematic analysis of the normal-looking tissue near tumors that gets removed in cancer operations. Precision medicine researchers use this so-called normal adjacent tissue, which looks normal under a microscope and is usually at least two centimeters from tumors, as a basis of comparison to highlight the changes that occur in cancer. But the new study suggests the tissue is far from normal at the molecular level and instead falls somewhere between cancerous and healthy. The analysis demonstrates how tumors of many different types may be instigating inflammatory and other cancer-related processes in other tissues to facilitate their spread.

“Tumors secrete factors all around, changing nearby tissue and possibly even tissues that are far away,” said Dvir Aran, PhD, a postdoctoral fellow at the UCSF Institute for Computational Health Sciences (ICHS) and the first author of the paper, published October 20, 2017, in Nature Communications. “We saw more or less the same effects across all the major cancer types, which suggests this is an important mechanism for the tumor.”

Much of what the study found actually replicates the findings of laboratory studies of how various types of cancer interact with their surroundings. What it adds, however, is a comprehensive view of how all the different cancers are using similar strategies to alter tissue outside their boundaries.

“The whole cancer world is focused on trying to figure out what the environment of these cancer cells is really like,” said ICHS Director Atul Butte, MD, PhD, who is the Priscilla Chan and Mark Zuckerberg Distinguished Professor in the UCSF Department of Pediatrics and also the senior author of the paper. “We’ve got to do more work like this to understand how the cancers are growing and thriving.”

The researchers used data from The Cancer Genome Atlas (TCGA) to analyze which genes were being turned on or off in the tissue adjacent to tumors in eight major cancers: lung, colon, breast, uterine, liver, bladder, prostate, and thyroid. They found that genes involved in the acute phase of systemic inflammation had been turned on, creating a pattern that was distinct from both the tumors and healthy tissue from similar spots in the body. The healthy tissue data came from a second database, the Genotype-Tissue Expression (GTEx) program, which the researchers used as a comparison. The GTEx program collected data from many different places in the body in hundreds of patients who died in the hospital and donated their bodies for research. It excluded tissue from donors who died of cancer or who had received chemotherapy or radiation within two years, so while the samples may not be completely healthy, they provided a good contrast to the cancerous and tumor-adjacent tissue in TCGA.

Comparing the two databases presented several technical challenges. First, the researchers had to separate out the statistical variation that comes from slight differences in computational methods. Then they had to find a way to control for the “batch effects” that arise when data are collected at different times and places by different people. This was particularly hard to do, since there was no overlap between the two databases. But they found a way to use a statistical technique developed for RNA sequencing data called “remove unwanted variation,” and ended up with a combined database with information on 1,558 normal control samples, 428 adjacent tissue samples, and 4,500 cancer samples.
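
For readers curious about what removing unwanted variation can look like in practice, here is a minimal sketch of one simple variant from that family of methods. It assumes a matrix of log-scale expression values (samples in rows, genes in columns) and a set of "negative control" genes that are not expected to change with the biology of interest. The function name, its parameters, and the choice of control genes are illustrative assumptions; the article does not say which variant or settings the researchers used, so this should not be read as their pipeline.

```python
import numpy as np


def remove_unwanted_variation(log_expr, control_idx, k=1):
    """Sketch of a control-gene approach to removing unwanted variation.

    log_expr    : array of shape (n_samples, n_genes), log-scale expression.
    control_idx : indices of genes assumed (hypothetically) to be unaffected
                  by the biology of interest, so that any structure they show
                  reflects technical factors such as batch.
    k           : number of unwanted factors to estimate and remove.
    """
    Y = np.asarray(log_expr, dtype=float)

    # Center each gene so the decomposition captures differences between
    # samples rather than each gene's average expression level.
    Y_centered = Y - Y.mean(axis=0, keepdims=True)

    # The leading left singular vectors of the control-gene submatrix serve
    # as estimates of the unwanted factors (one value per sample per factor).
    U, _, _ = np.linalg.svd(Y_centered[:, control_idx], full_matrices=False)
    W = U[:, :k]

    # Estimate how strongly each gene responds to those factors by least
    # squares, then subtract that contribution from every gene.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ alpha
```

Applied to simulated data in which half the samples carry an artificial batch shift, a function like this should shrink the average gap between the two batches while leaving each gene's overall level in place. Real analyses require careful selection of control genes and of k, and they need to check that genuine biological differences are not removed along with the technical ones.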

To validate their initial finding that the tumor-adjacent tissue was different from both cancerous and healthy tissue, the researchers analyzed data from smaller public repositories with adjacent and tumor samples from colon, liver, breast, and prostate cancers, along with healthy tissue samples from the corresponding locations in the body. Each of these datasets came from a single project rather than a combination of projects, and is therefore less affected by the statistical noise the researchers were trying to remove from their larger assembled dataset. The data in the smaller datasets separated cleanly into the three tissue types in colon, liver, and breast cancer, and trended that way in prostate cancer.

This finding echoed a pattern the researchers saw in their big dataset: the tissue adjacent to cancers with more distinct boundaries, like breast, colon, liver, lung, and uterine cancer, was more clearly defined by the data than the tissue adjacent to cancers of the prostate and some types of thyroid cancer, in which tumors are more diffuse.
