Predictive biomarkers of platinum and taxane resistance using the transcriptomic data of 1816 ovarian cancer patients

• Biomarkers capable of predicting response to platinum and taxane combination treatment in ovarian tumors were identified.
• NCOR2, TFE3, AKIP1 and AKIRIN2 were among the most significant genes validated in an independent cohort.
• The integrated database with available treatment and response data can be mined to validate new biomarker candidates.


Introduction
Ovarian cancer is the second most common cause of death related to gynecologic malignancies in women. In 2018, the number of new cases worldwide was estimated to be 295,414 and the number of deaths was estimated to be 184,799 [1]. The incidence and mortality rates of the disease vary by region and have shown an increasing trend in developed countries in the last decade [2]. Almost 90% of ovarian malignancies are of epithelial origin. From a histopathological point of view, epithelial ovarian cancer (EOC) can be divided into serous, endometrioid, mucinous and clear-cell histology subtypes. Of these, high-grade serous carcinoma is the most frequently diagnosed type [3].
Early diagnosis of ovarian cancer is difficult because most patients are asymptomatic in the early stages, and therefore, many tumors are detected only in the advanced stages. The recommendations of the National Comprehensive Cancer Network (NCCN) for advanced ovarian cancer treatment comprise surgery followed by systemic chemotherapy using cisplatin/paclitaxel (www.nccn.org). Although more than 80% of the patients initially respond to first-line treatment, most will have a recurrence within two years that progresses to advanced disease [4].
The most established biomarker for ovarian cancer is serum CA125, which is the gold standard for disease monitoring. A correlation between CA125 and chemotherapy response was proposed earlier [5], but it has not yet reached clinical application. At present, there is no validated predictive biomarker for ovarian cancer, although there is an urgent need to identify patients who are unlikely to benefit from platinum and taxane combination therapy.
To deliver personalized treatment decisions, a clear understanding of the exact mechanisms of drug resistance is required; resistance can be caused by multiple independent mechanisms, some of which have been extensively studied in patient samples and cell culture model systems. We and others have shown previously that resistance mechanisms against platinum-based agents include decreased drug accumulation [6], drug inactivation by glutathione and glutathione S-transferases [7], increased autophagy [8], and increased levels of DNA repair [9]. Taxane resistance can result from overexpressed efflux pump genes [10], modulated microtubule dynamics [11], altered expression of β-tubulin isotypes [11] and enhanced epithelial-to-mesenchymal transition [12]. Multiple studies have described gene expression alterations that are associated with drug resistance in ovarian cancer. Higher expression of IGF2BP [13], LIN28B [13] and MSLN [14] was reported in paclitaxel- and platinum-treated nonresponder tissues. Upregulated CHI3L1 inhibited paclitaxel-induced apoptosis in nonresponder cell lines [15]. The higher expression of FOXM1 increased cell cycle progression in platinum-treated drug-resistant tissue samples [16,17]. In cisplatin-resistant cell lines, upregulated CSF-1R [18] and downregulated OXCT1 facilitated the inhibition of apoptosis [19,20].
In the present study, our aim was to establish a framework to uncover and validate gene expression-based predictive biomarkers of therapy resistance by mining large, publicly available transcriptomic datasets of ovarian cancer patients with known treatment protocols and available clinical follow-up data. Furthermore, we collected an independent set of ovarian cancer specimens and performed RT-PCR using RNA from these tumor samples to validate the best performing biomarker candidates for predicting platinum and taxane resistance.

Preprocessing
First, the raw .CEL files were MAS5 normalized in the R statistical environment (www.r-project.org) using the Affy Bioconductor library [21]. This was followed by a second scaling normalization to set the average expression of the 22,277 identical probe sets in each chip to 1000 [22]. Normalized gene expression and clinical data were integrated into a PostgreSQL relational database.
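The second scaling step can be illustrated with a minimal sketch. It is written in Python rather than R for illustration only, and it is not the authors' code. The probe identifiers and signal values are made up, and applying the same factor to probes outside the shared set is an assumption that the text does not state.

```python
from statistics import mean

TARGET_MEAN = 1000.0  # target average expression stated in the text


def scale_chip(expression, shared_probes, target=TARGET_MEAN):
    """Rescale one chip so that its shared probe sets average `target`.

    expression: dict mapping probe set ID -> MAS5-normalized signal.
    shared_probes: IDs of the probe sets present on every platform
                   (22,277 in the study; three in this toy example).
    """
    current_mean = mean(expression[p] for p in shared_probes)
    factor = target / current_mean
    # Assumption: every probe on the chip is multiplied by the same factor.
    return {probe: value * factor for probe, value in expression.items()}


# Toy chip with hypothetical probe IDs and signals.
chip = {"probe_a": 200.0, "probe_b": 600.0, "probe_c": 400.0, "extra": 50.0}
shared = ["probe_a", "probe_b", "probe_c"]

scaled = scale_chip(chip, shared)
scaled_mean = mean(scaled[p] for p in shared)
```

In this toy chip the shared probes average 400, so every value is multiplied by 2.5 and the shared probes average 1000 afterwards.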

Statistical computations
The tumor samples were divided into responder and nonresponder cohorts based on their clinical characteristics. For cases with available pathological response (PR), we classified the patients as published by the authors (PR dataset). If the pathological response was not available, the classification was based on the duration of the progression-free survival. Those with a relapse-free survival shorter than six months were compared to those without a relapse before six months. Patients censored before six months were excluded from the analysis.
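The survival-based rule above can be sketched as a small function. This is an illustrative reading of the text, not the authors' implementation; the function name, the patient identifiers and the follow-up values are hypothetical.

```python
from typing import Optional

CUTOFF_MONTHS = 6.0  # six-month threshold from the text


def classify_by_rfs(followup_months: float, relapsed: bool) -> Optional[str]:
    """Assign a response label from relapse-free survival.

    followup_months: time to relapse, or to last follow-up if no relapse.
    relapsed: True if a relapse was observed at `followup_months`.
    Returns "nonresponder", "responder", or None for excluded patients.
    """
    if relapsed and followup_months < CUTOFF_MONTHS:
        return "nonresponder"  # relapse before six months
    if followup_months >= CUTOFF_MONTHS:
        return "responder"  # no relapse before six months
    return None  # censored before six months: excluded


# Made-up patients: (ID, follow-up in months, relapse observed).
patients = [
    ("P1", 3.5, True),
    ("P2", 14.0, True),
    ("P3", 4.0, False),
    ("P4", 24.0, False),
]
labels = {pid: classify_by_rfs(months, relapsed) for pid, months, relapsed in patients}
```

Here P1 is labeled a nonresponder, P2 and P4 are responders, and P3 is excluded because follow-up ended before six months without a relapse.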
The two cohorts were compared using the Mann-Whitney test and the receiver operating characteristic (ROC) test in the R statistical environment (www.r-project.org) using Bioconductor libraries (www.bioconductor.org). The cutoff for p values was set at p < 0.05. False Discovery Rate (FDR) was calculated using the qvalue package (http://github.com/jdstorey/qvalue), and only results with an FDR < 5% were accepted as significant.
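The link between the Mann-Whitney statistic and the area under the ROC curve (AUC) can be shown in a short sketch. It is a plain Python illustration, not the R code used in the study: the expression values are invented, the p value uses a normal approximation without tie or continuity correction, and the FDR step is not included.

```python
import math


def mann_whitney_auc(responders, nonresponders):
    """Return (U, AUC, two-sided p) for nonresponders versus responders.

    U counts the pairs in which the nonresponder value is higher
    (ties count 0.5), and AUC = U / (n1 * n2).
    The p value is a normal approximation, which is a simplification.
    """
    n1, n2 = len(nonresponders), len(responders)
    u = 0.0
    for x in nonresponders:
        for y in responders:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    auc = u / (n1 * n2)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, auc, p


# Hypothetical normalized expression values of one gene.
responder_expr = [820.0, 900.0, 950.0, 1010.0, 1100.0]
nonresponder_expr = [980.0, 1200.0, 1300.0, 1450.0]

u, auc, p = mann_whitney_auc(responder_expr, nonresponder_expr)
```

With these values, 18 of the 20 responder/nonresponder pairs have the higher expression in the nonresponder, giving an AUC of 0.9. For cohorts this small an exact test would be preferable to the normal approximation.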

Clinical sample collection
In total, 81 fresh frozen ovarian tissue samples were collected during surgery from patients with ovarian cancer at the National Institute of Oncology (OOI) Budapest, Hungary (OOI set). Samples were stored in RNAlater (Thermo Fisher Scientific, USA) at −80 °C until RNA isolation. An institutional ethics committee (Országos Onkológiai Intézet, Intézeti Kutatásetikai Bizottság; OOI IKEB) approved the study with the reference number OOI-Ált-9444-1/2013/59. Anonymized clinical data were obtained from medical and pathological records.

RNA isolation and cDNA synthesis
Total RNA was isolated using the AllPrep DNA/RNA Kit (Qiagen, Germany) following the manufacturer's protocol. The quality of RNA was assessed by UV spectrophotometry (NanoDrop, Thermo Fisher Scientific, USA). For quantitative PCR analysis, 1 μg of total RNA was reverse transcribed in a final volume of 20 μl using the Maxima First Strand cDNA Synthesis Kit for RT-qPCR, and dsDNase was used to remove any potential DNA contamination (Thermo Fisher Scientific, USA).

Quantitative PCR analysis
Quantitative PCR was performed in a CFX384 real-time PCR instrument (Bio-Rad Laboratories, USA) using the SensiFAST SYBR No-ROX Kit (Bioline Reagents, UK). Primers were designed for the same exons targeted by the microarray probes of each selected gene. GAPDH and ACTB were employed as endogenous controls for normalization. The reactions were performed in 10 μl containing 1 μl of cDNA, diluted 25-fold, and 250 nM of each primer. After an initial denaturation step for 2 min at 95 °C, 36 cycles with three steps were performed: 95 °C for 10 s, 62 °C for 10 s and 72 °C for 20 s. Each sample was measured in triplicate, and the threshold cycle (Ct) was determined for all genes using Bio-Rad CFX Maestro software (Bio-Rad Laboratories, USA). Relative gene expression values were analyzed using the ΔCt method.
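As a worked illustration of the ΔCt calculation, the sketch below averages triplicate Ct values and normalizes a target gene to the two reference genes. The Ct values are invented. Combining GAPDH and ACTB by the arithmetic mean of their Ct values and assuming a doubling of product per cycle are assumptions; the text does not specify either.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts_by_gene):
    """ΔCt = mean target Ct minus the mean of the reference genes' mean Cts."""
    target_mean = mean(target_cts)
    # Assumption: reference genes are combined by the arithmetic mean of Ct.
    reference_mean = mean(mean(cts) for cts in reference_cts_by_gene.values())
    return target_mean - reference_mean


def relative_expression(dct):
    """Relative expression 2^(-ΔCt), assuming product doubles every cycle."""
    return 2.0 ** (-dct)


# Hypothetical triplicate Ct values for one sample.
target_triplicate = [26.0, 26.2, 25.8]
references = {
    "GAPDH": [18.0, 18.1, 17.9],
    "ACTB": [20.0, 19.9, 20.1],
}

dct = delta_ct(target_triplicate, references)
rel = relative_expression(dct)
```

In this example the target averages a Ct of 26 and the references average 19, so ΔCt is 7 and the relative expression is 2^(-7), about 0.0078. A lower ΔCt corresponds to higher expression of the target gene.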

Validation
For the validation, we selected a set of the best performing genes from the taxane- and platinum-treated samples. As in the discovery set, the Mann-Whitney test and ROC analyses were performed to compare the expression of each investigated gene in the responder and nonresponder cohorts. Statistical significance was set at p < 0.05.

Database
Overall, 10,283 patients in 134 GEO and TCGA datasets met our search criteria (Fig. 1A). After eliminating samples with insufficient clinical data, we maintained 1816 ovarian cancer samples. Of these, pathologic response was available for 1022 patients (PR dataset) and relapse-free survival at 6 months was obtainable for 1347 patients (RFS dataset). We selected the RFS dataset for further analysis because this included the larger patient cohort. Aggregate clinical characteristics of the cohorts are presented in Table 1 and Fig. 1B.

Validation
The top potential biomarker candidates from the in-silico analysis were selected for further validation by RT-PCR in the OOI cohort of ovarian cancer patients. Table 4 lists the primer sequences used in the PCR validation. From the 81 fresh-frozen ovarian tissue samples, we excluded 15 samples due to insufficient follow-up data (n = 7), lack of chemotherapy (n = 1) and conflicting histological diagnosis (n = 8). From the remaining 66 samples, 47 samples were categorized as responders, and 19 samples were categorized as non-responders based on the RFS duration as described in the validation cohorts. All patients in the validation set had cancer of the serous subtype. The clinical characteristics of the specimens are presented in Fig. 1C.
The relative expression values in comparison to GAPDH and ACTB, including clinical information for each sample, are presented in Supplemental Table 1.

Web application
Finally, to enable the independent validation of our results and the analysis of novel gene candidates, the previously established ROC Plotter web application [23] was extended to include the ovarian cancer datasets described above. The registration-free web interface can be accessed at www.rocplot.org/ovar.

Discussion
Chemoresistance is a key problem in cancer treatment and is responsible for the poor prognosis of ovarian cancer patients. The identification of drug resistance-related genes enabling personalization of treatment selection is of utmost importance. The primary aim of this study was to identify potential predictive biomarkers that could predict the response to the most commonly used combination treatment, platinum and taxane, in serous ovarian tumors. Second, we aimed to validate these findings in an independent set of clinical specimens. Our final task was to extend our freely accessible online tool to enable the investigation of gene expression-based predictive biomarkers in ovarian cancer.
Overall, we identified eight genes capable of classifying platinum and taxane drug responses. The independent validation results confirmed six of these genes. Some of these genes, namely, TFE3, NCOR2, PDXK and MARVELD1, were previously related to platinum- or taxane-based therapy resistance. Of these, TFE3, NCOR2, and PDXK were not significant in the platinum monotherapy cohort, suggesting that these are markers linked to response to the combination therapy.
The translocation of the transcription factor E3 (TFE3) with different fusion partners was described in renal carcinomas and alveolar soft part sarcomas [24]. It is overexpressed in head and neck squamous carcinoma treated with cisplatin-based chemotherapy. Higher expression of TFE3 indicated a poorer response to treatment [25]. Consistent with these findings, we observed elevated expression of TFE3 in the nonresponder cohort.
The nuclear receptor corepressor 2 (NCOR2) is the repressor of the pregnane X receptor (PXR). PXR is a nuclear receptor that plays a role in the metabolism of different xenobiotics and endobiotics and was previously linked to cancer pathogenesis [26]. NCOR2-overexpressing head and neck cancer cell lines showed increased resistance to paclitaxel, cisplatin and 5-FU [27]. Our results also suggest that elevated expression of NCOR2 is one of the top biomarkers of resistance.
Pyridoxal kinase (PDXK) is the key gene in the synthesis of pyridoxal-5-phosphate during vitamin B6 metabolism. Previous studies reported the key role of vitamin B6 in the uptake of cisplatin in A549 lung cancer cells, and high PDXK expression was associated with better disease outcome in lung cancer patients; however, the latter finding was unrelated to the patient's chemotherapy treatment [28]. In our clinical cohort, the elevated expression of PDXK was associated with a nonresponder phenotype, which seems to be a new feature of PDXK.
The Marvel domain containing 1 (MARVELD1) protein is a member of the MARVEL domain containing proteins. These proteins are involved in cell cycle progression, chemotactic activity and endocytosis. Higher expression of MARVELD1 was associated with increased chemosensitivity to epirubicin and 10-hydroxycamptothecin in hepatocellular carcinoma cells [29]. The inhibition of MARVELD1 repressed paclitaxel and cisplatin resistance in lung cancer cells [30]. In contrast, elevated MARVELD1 expression inhibited arsenic trioxide-induced apoptosis in liver cancer cells and was significantly related to worse overall survival of liver cancer patients [31]. Here, higher expression of this protein in ovarian cancer samples increased chemoresistance to platinum and taxane combination therapy.
Two additional biomarker candidate genes (AKIP1 and AKIRIN2) have not been previously reported as potential resistance genes in platinum- and taxane-based chemotherapy. The A-kinase interacting protein 1 (AKIP1) is a nuclear protein that plays a role in NF-κB signaling [32]. Previously, higher expression of AKIP1 was described in breast cancer samples, and the higher expression correlated with worse survival [33]. In another study, AKIP1 was identified as a regulator of WNT/β-catenin signaling activation, which promotes the metastatic relapse of hepatocellular carcinoma [34]. Our results confirm the positive relationship between high AKIP1 expression and poor prognosis. Akirin 2 (AKIRIN2) is another nuclear protein that functions in B cell activation and humoral immune responses [35]. In a previous study, the gene was overexpressed in human cholangiocarcinoma cell lines and tumor tissues, and its elevated expression was associated with cell proliferation, migration, invasion, and angiogenesis [36]. AKIRIN2 knockdown led to decreased chemoresistance in temozolomide-treated glioblastoma cell lines [37]. Based on our study, the elevated expression of AKIRIN2 may also have an impact on the response to platinum-taxane combination therapy.
Fig. 2. ROC curves and boxplots of the top four biomarker candidates involved in platinum + taxane resistance identified using the RFS at 6 months cohort. Only samples with serous histology and those treated with platinum and taxane combined therapy were included in the analysis.
The genes MARVELD1, AKIP1 and AKIRIN2 were also significant when performing the analysis in the platinum monotherapy-treated cohort; these could be markers of resistance for both treatment settings. However, a limitation of this analysis is that only 109 patients were available for the platinum monotherapy cohort and an independent analysis in patients who received taxane only was not possible due to a very low sample number (n = 17).
There are two notable limitations of the database utilized as a discovery set in our study. First, the number of patients is limited for some treatment cohorts, including those with targeted or second-line therapy. For this reason, we could only select the platinum-taxane combination for further validation. Second, the database contains incomplete clinical annotations for some of the samples. These limitations can be mitigated by a future extension of the database.
In summary, we collected 1816 ovarian cancer gene expression microarray samples with clinical information, including treatment and response data. Mining of this database has the potential to identify new predictive biomarkers, as we demonstrated here for platinum and taxane combination treatment. We validated a limited set of biomarker candidates by RT-PCR and identified six genes (TFE3, NCOR2, PDXK, AKIP1, MARVELD1 and AKIRIN2) with significant correlations with chemoresistance. The extended online analysis platform available at www.rocplot.org/ovar enables the discovery, validation and ranking of further predictive biomarker candidates in ovarian cancer.

Fig. 1. Overview of the ovarian cancer databases included in the study. (A) Flowchart for the setup of the discovery datasets. (B) Clinical characteristics of each included dataset. OOI = National Institute of Oncology.

Fig. 3. ROC curves and boxplots of RT-PCR-validated genes in the serous histology subtype clinical specimens. Only samples with serous histology and those treated with platinum and taxane combined therapy were retained for validation.

Table 1
Summary of the clinical characteristics of the datasets included in the analysis. PCR: pathological response; RFS: relapse-free survival at 6 months.

Table 2
Overview of treatments administered to patients included in the discovery datasets.

Table 3
Top 8 gene expression-based biomarker candidates of platinum + taxane combined chemotherapy response in the RFS discovery dataset.

Table 4
Quantitative PCR primers for the selected and reference genes.