Cali cancer registry methods

Abstract Background: The Population Cancer Registry of Cali (RPCC) has operated since 1962, disseminating high-quality information that provides a framework to assess and control the burden of cancer in Cali. Methods: New cancer cases in permanent residents of Cali are collected through active search in, and notification from, hospitals and public and private laboratories. The Secretary of Municipal Public Health provides individual information on general mortality and death from cancer. Tumors are coded with ICD-O-3 and mortality with ICD-10. Rates are age-standardized, and trends are assessed by estimating the annual percentage change using regression analysis in JoinPoint. Five-year net survival was analyzed with the Pohar-Perme estimator. Results: Of the registered cancers, 88.5% had morphological verification (MV). Cancers of unknown primary site represented 5% of cases, and death certificate only (DCO) cases varied between 0% and 3% depending on the cancer site. All deaths were certified by a physician, and 94.2% of cancer deaths were correctly certified. The proportion of ill-defined sites was 5.3% and that of uterine cancer not specified (C55) was 0.5%. For survival analysis, the existing data collection procedures and infrastructure ensure assessment of patients' vital status and follow-up, with an average loss to follow-up of 13.2%. Comment: The information has been published in all eleven volumes of "Cancer Incidence in Five Continents", confirming the high quality of the collected data. The RPCC has also participated in the CONCORD study and is participating in SURVCAN-3.


Introduction
The Population Cancer Registry of Cali (RPCC) was started in 1962 as a research program of the Department of Pathology of the Universidad del Valle. It was initially funded by a donation from the Ana Fuller Fund. Later, the Universidad del Valle became the main source of both financial and scientific resources for the registry. The RPCC began at the same time as the Pan American Health Organization (PAHO) conducted the Urban Mortality Study, which examined in detail all the death certificates of the city 1 . The systematic study of these certificates was part of the data collection for the RPCC 2 .
Cancer registries are systems that collect information continuously and systematically about each new cancer case identified within a specific population in a given area and period 3 . There are two types of cancer registries that complement each other, although they have distinct procedures and objectives: the population-based cancer registry (PBCR) and the hospital-based cancer registry (HBCR). The HBCR records all cases that attend a health center or specialized service, regardless of their place of residence, for administrative and patient care purposes. The purpose of the PBCR is to identify all new cases of cancer that appear among the inhabitants of a well-defined natural or administrative demographic area. Its main objective is to produce information that provides a framework to assess and control the impact of cancer on the health of the community. Registries that specialize in one or several tumor locations are called monographic, and they can be either hospital-based or population-based. Central cancer registries gather and consolidate information from several registries that cover different areas, which can also be population-based or hospital-based 3 .
The value of the modern cancer registry and its ability to carry out cancer control activities depend to a large extent on the underlying quality of its data and the established quality control procedures 4 . In this article, the Population-based Cancer Registry of Cali presents its standardized methodology and the quality criteria it maintains as a reliable information system for estimating the burden of cancer in Cali.
Obtaining new cases of cancer
Population and registration area
Cali is the third largest city in Colombia and the capital of the Province of Valle del Cauca, located in the Cauca river valley at coordinates 3°27'00" N 76°32'00" W. The western limit is the Farallones of Cali, which are part of the Western Cordillera of the Colombian Andes. According to both the 2005 census and the projections of the National Administrative Department of Statistics of Colombia (DANE), the estimated population for 2010 was 2.3 million inhabitants, of whom 52% were women and 26.2% self-identified as belonging to the black ethnic group 5,6 . Life expectancy at birth was 73.1 years for men and 78.5 years for women 7 . The facilities for oncological care include 165 oncology services 8 , located in the urban area, where 95% of the population resides in an area of 110 km² . This area corresponds to 20% of the extension of the municipality of Cali (561.7 km²) 9 . Administratively, Cali is divided into 22 communes, with a gross density of 4,094.7 inhabitants/km² . The rural land is approximately 424.4 km² (divided into 15 corregimientos or designated areas) 9 , with a gross density of 0.83 inhabitants/km² . In 2012, the municipality of Cali was defined as the cancer registry area. The geopolitical map is shown in Figure 1.
Case definition
A case is a person of any age, resident in the urban area of Cali, with a first (incident) diagnosis of an invasive malignant tumor of any anatomical location that has been confirmed or treated, in part or in full. The basis of diagnosis can be microscopic (fluid cytology, peripheral blood and bone marrow, histology of primary tumors, and autopsy) or non-microscopic (clinical, surgical, and imaging diagnosis). The following cancers are included: single or multiple primary malignant tumors, all tumors of the central nervous system, and in situ breast and cervical cancer. Excluded are benign tumors with uncertain behavior, malignant tumors of metastatic sites, and basal cell and epidermoid carcinoma of the skin (these were included until 1986). Patients who came to the city for treatment or diagnosis are not considered residents of Cali.
10 . In summary, the system, in addition to registering incident cases, actively follows children under 19 years old treated in pediatric oncology units in Cali. The system includes both residents of the city and patients referred from other municipalities and departments. As part of the RPCC, it also receives information from secondary sources, achieving an exhaustiveness of around 94% and a follow-up of 95% of registered cases. The outcomes under surveillance are vital status, relapses, abandonment of treatment, and second primary cancers. The system continues to monitor patients who leave treatment and, if their vital status is unknown, they are included as events in survival analyses. Observed survival is reported using the Kaplan-Meier method.
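The Kaplan-Meier calculation mentioned above can be illustrated with a minimal sketch. It is not the registry's own code: the function name `kaplan_meier`, the input layout (one follow-up time and one event flag per patient) and the toy data are assumptions made for illustration only.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each patient (e.g. months since diagnosis)
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data (hypothetical): four patients, the second one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Under the rule described in the text, a patient who abandoned treatment and whose vital status is unknown would be passed with an event flag of 1 rather than 0.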
Comparability of the basic data collected
The basic information for the RPCC is collected in a pre-coded form that includes data on the person: name, sex, date of birth, age, and address. Neoplasms are described by anatomical location, morphology, behavior, degree of differentiation, multiple primary tumors, the extent of disease (breast and cervix), and the most valid basis of cancer diagnosis.
For the last 20 years, information on outcomes has been collected: date of last contact, vital status, date of death, and cause of death. Neoplasms in adults are coded with ICD-O-3 11 , and those in children with ICCC-3 12 .
To calculate the date of incidence we used the guidelines of the European Network of Cancer Registries (ENCR) 13 ; it corresponds to the date of the first histological or cytological confirmation of cancer. For the classification of multiple primary tumors, the IARC / IACR guidelines 14 , which are used around the world to report incidence rates, were followed.

Confidentiality of information
The guidelines of the European Network of Cancer Registries (ENCR) 13 are followed. The director of the RPCC is responsible for the security of the information. All staff members of the RPCC sign an agreement to protect the confidentiality of the data on the persons whose cancer is reported to the RPCC. Access to the physical space of the Registry is restricted to authorized persons. Confidential information is accessed using personal passwords for the computers holding the classified information; in addition, closed files are used. Any data that is not used is automatically destroyed.
A single person (the administrator) performs the initial matching between databases to detect new cases and update vital status information. A registration number is assigned to each case, and the information that identifies a patient (name and other documents that can lead to identification) is deleted before the data are analyzed.

Facilities
Universidad del Valle has been the main source of financial and technical resources. The RPCC research group has a headquarters (287 m² area) with 15 employees working in the registry. The head of staff and his advisors are senior researchers and pathology professors at the School of Medicine. The coordinator is a business administrator with a master's degree in epidemiology, and the information system is managed by an engineer with a master's degree in engineering with emphasis in systems engineering and computer science. There are three data collectors. These staff members have job stability through permanent contracts provided by their university affiliation. The RPCC assures the stability of the rest of its human resources with funds from specific projects. The information technology network includes an intranet with Internet access supported by the Office of Information Technology and Telecommunications of Universidad del Valle. The local network includes a server, 11 computers and 5 laptops. Backup copies are made twice a day by an automatic script, and an external copy is made monthly. The technical team of the RPCC meets weekly to resolve problem cases. The software of the RPCC (Siscan) performs consistency checks when data are entered, and internal consistency is checked every six months with IarcTools 15 . Before the information is sent to international collaborators or external projects such as the IARC and the CONCORD program, the whole data set is rechecked with IarcTools 15 .
Periodic survey of medical specialists
The three-yearly survey of medical specialists in the city is a key activity in which several groups of students from the Faculty of Health of the Universidad del Valle have participated. The survey lasts eight weeks and complements the continuous cancer data collection by the RPCC. As an initial step, the inventory of sites with oncological services for the diagnosis and treatment of cancer that are not covered during routine collection activities is updated. The Faculty of Health of the Universidad del Valle is contacted, and the participating students are trained in cancer biology, cancer nomenclature, and the standardized methodology for obtaining cancer cases. Each participant is assigned a supervisor (a member of the RPCC) and receives support materials that include: 1) general recommendations; 2) minimum variables for collection; 3) list of malignant tumors; 4) manual for completing the form of the cancer morbidity survey; 5) list of assigned specialist physicians; 6) cover letters; and 7) collection forms. The supervisor remains in permanent contact to clarify doubts and concerns and receives a weekly update of the information collected.
Procedure for obtaining new cases of cancer
Figure 2 summarizes the procedures for collecting information to obtain new cancer cases among permanent residents of Cali. The information arrives in physical format and in structured and unstructured digital formats, and the variables of interest are extracted in several phases, manually or automatically. Figure 3 shows the procedures for detecting duplicate cases and multiple tumors and for updating the vital status, date of last contact, residence and identity of each new case of cancer. The procedures involve three phases, as follows.
Phase 1. Extraction of information
Extraction is done manually, through active search, when the information is in physical format or in structured and unstructured digital formats, or automatically to obtain structured and unstructured listings. Hospital discharge records are obtained periodically in a structured digital format. With an automatic data extraction process, each case is matched against the database of the Population Cancer Registry using two methods: exact search (Fig. 3) and search by approximation (Fig. 4).
Phase 2. Update of the information
When the cases already exist in the RPCC database (prevalent cancers), additional information is sought in the health insurance databases (public and private), the general mortality records of the city, and hospital discharges from clinics and hospitals in Cali. Information of identification, residence, date of last contact and vital status is

Phase 3. Inclusion of new cases
In phase 3, cases that are not found in the main database of the cancer registry are processed. First, the three additional data sources are searched (Fig. 3) to find additional information that allows identification, residence and vital status to be completed. The cases are then entered into the main database as new cases of cancer (incidence). If no additional information is retrieved from the auxiliary databases, the case is entered with only the data obtained in the extraction phase.
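The routing between Phase 2 (update) and Phase 3 (inclusion) after an exact search can be sketched as follows. This is only an illustration under stated assumptions: the key `id_number`, the dictionary-based "databases" and the function name `process_extracted_case` are hypothetical stand-ins, not part of Siscan or of the registry's procedures.

```python
def process_extracted_case(case, registry, auxiliary_sources):
    """Route one extracted record through the update or inclusion phase.

    case              -- dict from the extraction phase; 'id_number' is a hypothetical key
    registry          -- dict mapping id_number -> record (stand-in for the main database)
    auxiliary_sources -- list of dicts mapping id_number -> extra fields
                         (stand-ins for insurance, mortality and discharge data)
    """
    key = case["id_number"]

    # Collect whatever the auxiliary sources know about this person.
    extra = {}
    for source in auxiliary_sources:
        extra.update(source.get(key, {}))

    if key in registry:
        # Phase 2: exact match found, so only refresh the existing record.
        registry[key].update(extra)
        return "updated"

    # Phase 3: no match, so enter a new (incident) case. If nothing was
    # retrieved, the record holds only the data from the extraction phase.
    registry[key] = {**case, **extra}
    return "new"

# Hypothetical example data.
registry = {"A1": {"id_number": "A1", "vital_status": "alive"}}
mortality = {"A1": {"vital_status": "dead"}}
first = process_extracted_case({"id_number": "A1"}, registry, [mortality])
second = process_extracted_case({"id_number": "B2", "site": "C50"}, registry, [mortality])
```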
Search by approximation
This method is used when there is no information on the personal identification document (Fig. 4). Two data sets are prepared for comparison: data set A, the extraction lists containing the possible new cases of cancer, and data set B, the database of the RPCC information system. First, the data are divided into smaller groups to optimize matching; they are then standardized and indexed by blocks of similarity between two fields (names and date of birth). Finally, a weighted vector classification is made against a similarity threshold. The result is two groups of records: those estimated to be potentially equal, and those considered possible matches, which continue to a manual review in which the records are evaluated for pairing between the two data sets 16 .
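A minimal sketch of this kind of approximate matching is shown below. It is not the algorithm of reference 16: blocking on birth year, the 0.7/0.3 weights, the two thresholds and the field names `name` and `dob` are all illustrative assumptions, and string similarity is taken from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

def normalize(name):
    # Standardize names: upper-case and collapse repeated whitespace.
    return " ".join(name.upper().split())

def similarity(a, b):
    # Weighted comparison vector: name similarity plus date-of-birth agreement.
    name_score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    dob_score = 1.0 if a["dob"] == b["dob"] else 0.0
    return 0.7 * name_score + 0.3 * dob_score  # illustrative weights

def link(extracted, registry, match_at=0.9, review_at=0.7):
    """Return (probable matches, pairs for manual review)."""
    # Blocking: index data set B by birth year so that only records
    # in the same block are compared with each candidate from data set A.
    blocks = {}
    for rec in registry:
        blocks.setdefault(rec["dob"][:4], []).append(rec)

    matches, review = [], []
    for cand in extracted:
        for rec in blocks.get(cand["dob"][:4], []):
            score = similarity(cand, rec)
            if score >= match_at:
                matches.append((cand, rec, score))
            elif score >= review_at:
                review.append((cand, rec, score))
    return matches, review

# Hypothetical records; dates are ISO strings (YYYY-MM-DD).
data_b = [{"name": "Maria Perez", "dob": "1950-03-01"}]
data_a = [{"name": "MARIA  PEREZ", "dob": "1950-03-01"}]
matches, review = link(data_a, data_b)
```

Pairs that fall between the two thresholds are the ones that would go to manual review in the workflow described above.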
Procedures for the analysis of incidence and mortality
The International Classification of Diseases (ICD-10) 17 is used for the coding of cancer. The main locations were defined according to the guidelines suggested by the IARC for the analysis of incidence information, and by the WHO to group the primary site of the tumor and the causes of (cancer) death 18,19 . The structure of the population by sex and five-year age groups for each calendar year was obtained from DANE 5 . The incidence and mortality rates for the entire population were age-standardized (ASR) by the direct method, using the world standard population as reference 20,21 . The global rates and the age- and sex-specific rates are expressed per 100,000 person-years. Trends in incidence rates were analyzed over ten 5-year periods from 1962 to 2012, and those of mortality over six five-year periods from 1984 to 2015. The summary measure used to assess the trend of the rates over time was the annual percentage change (APC), calculated by the weighted least squares method 22 . For some locations and age groups it was impossible to estimate the APC because in some years there were no new cases or cancer deaths in these categories.
Analysis plan
The response variable was the time between the diagnosis of cancer and the death of each individual. The maximum observation time for the failure to occur was five years for each subject. Patients who did not experience the failure within the study period were censored; the censoring mechanisms were loss to follow-up and the end of the study.
Exhaustiveness assessment by death certificate method
To verify the exhaustiveness, the death certificate method was used 27 . The principle is illustrated in Figure 6.
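As an illustration of the rate calculations described earlier in this section (direct age standardization and the APC), the following sketch may help. It is not the JoinPoint implementation. The weights listed are the commonly used Segi world standard for 18 five-year age groups (0-4 to 85+), assumed here to be the standard meant by references 20,21; equal regression weights are used by default, and the function names are hypothetical.

```python
import math

# World standard population (Segi), 18 five-year age groups: 0-4, 5-9, ..., 85+.
# Assumption: this is the standard referred to in the text.
WORLD_STANDARD = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]

def age_standardized_rate(cases, person_years, standard=WORLD_STANDARD):
    """Direct method: weight each age-specific rate by the standard population."""
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, standard))
    return 100_000 * weighted / sum(standard)  # per 100,000 person-years

def annual_percent_change(years, rates, weights=None):
    """Fit ln(rate) = a + b * year by weighted least squares; APC = 100 * (exp(b) - 1)."""
    weights = weights or [1.0] * len(years)
    logs = [math.log(r) for r in rates]  # fails if any rate is zero
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, years)) / total
    y_mean = sum(w * y for w, y in zip(weights, logs)) / total
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, years, logs))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, years))
    return 100.0 * (math.exp(sxy / sxx) - 1.0)

# Hypothetical data: the same rate in every age group, and rates rising 2% a year.
asr = age_standardized_rate([10] * 18, [100_000] * 18)
apc = annual_percent_change([2000, 2001, 2002, 2003, 2004],
                            [100 * 1.02 ** i for i in range(5)])
```

Because the model is fitted on the logarithm of the rate, a year with zero cases makes `math.log` raise an error, which mirrors the situation noted in the text where the APC could not be estimated for some locations and age groups.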
Individual certificates of general mortality from all causes are received annually in a structured digital file containing the causes of death as text and the basic cause coded with ICD-10 17 . The causes of death are reviewed to detect cancer cases that were not coded as cancer in the basic cause, and a variable is created to identify cancer cases (ICD-10: C00-C97; D05-D06, D32-D33, D45-D46, D47.1, D47.3). The initial pairing with the RPCC database identifies the prevalent cases that have died, and their vital status and date of death are updated. New cases notified annually through the death certificate are included in the RPCC database and flagged in a variable as DCN. These cases are later updated when the RPCC data collectors obtain newer information from the biopsy, the bone marrow aspirate, or the flow cytometry, and the diagnostic method is changed from death certificate to diagnosis by morphology. The active and continuous search of cases also excludes some deaths that are not related to cancer and, for other cases, updates the diagnostic method once more, from death certificate to clinical or imaging diagnosis. Finally, there is a remnant of cases whose only information came from the death certificate (DCO). The proportion of unregistered cases that remained alive was estimated from the proportion of cases initiated by the death certificate (DCI) and the mortality:incidence ratio (M:I).
Exhaustivity = 1−DCI *(1M:I)/(1−DCI)
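The variable that identifies cancer cases among the death certificates can be sketched as a simple check against the ICD-10 ranges listed above (C00-C97; D05-D06, D32-D33, D45-D46, D47.1, D47.3). The function name `is_cancer_code` and the tolerance for codes written with or without a dot are assumptions for illustration; this is not the registry's own code.

```python
import re

def is_cancer_code(code):
    """True if an ICD-10 code falls in C00-C97, D05-D06, D32-D33, D45-D46, D47.1 or D47.3."""
    code = code.strip().upper().replace(".", "")
    # Letter, two-digit category, optional one-digit subcategory, optional extra characters.
    m = re.fullmatch(r"([A-Z])(\d{2})(\d?)[0-9A-Z]*", code)
    if not m:
        return False
    letter, category, sub = m.group(1), int(m.group(2)), m.group(3)
    if letter == "C":
        return category <= 97
    if letter == "D":
        if category in (5, 6, 32, 33, 45, 46):
            return True
        # Only subcategories .1 and .3 of D47 are included.
        return category == 47 and sub in ("1", "3")
    return False

def mentions_cancer(cause_codes):
    # Flag a certificate if any of its coded causes is a cancer code,
    # not only the basic (underlying) cause.
    return any(is_cancer_code(c) for c in cause_codes)
```

Checking every coded cause, rather than only the basic cause, corresponds to the review step that detects cancer cases not coded as cancer in the basic cause.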

Indicators of quality of the incidence information
The main quality indicators for some selected cancer sites are presented in Table 2. Age was known in 99.4% of patients. The mortality:incidence ratio showed consistent values except for liver (1.43) and lung (1.02). In these locations, the number of deaths was greater than the number of cases recorded in the registry.
The percentage of cases with morphological verification (MV: histology, cytology, bone marrow aspiration and flow cytometry) was 88.5% for all cancer sites, ranging between 85% and 100% except in the liver (68.3%) and lung (66.4%). In patients with leukemia, Hodgkin's lymphoma and melanoma the MV was 100%.
The percentage of cases with a death certificate only (DCO) varied between 0% and 3%, except in the liver (4.5%) and the lung (6.0%). In general, the major cancer sites had a low percentage of cases obtained through death certificate only. Another commonly considered indicator of quality is the proportion of cancer cases coded as poorly defined site. Between 2008 and 2012 these tumors represented 4.6% of new cases of cancer in men and 5.4% in women.

Quality indicators of survival information
During the 1995-2009 period, 40,354 cases of the selected cancers were registered; 1.73% occurred in patients under 15 years. In 2.4% there was no age information, and these cases were excluded from the analysis. All patients had follow-up and 13.2% of the observations were censored; this proportion was higher in brain, melanoma, colorectal and ovarian cancers. In cancers with poor survival (stomach, lung, liver and pancreas), the censored proportion was less than 10%. In the most frequently diagnosed cancers, the censored percentage was 10.1%, 11.5% and 16.4% for breast, prostate and cervix, respectively. In 15.3% of the cases the date of death and the date of incidence were the same.
Quality indicators of cancer mortality certification
Mortality due to cancer represented 18.0% (23,793 / 132,397) of the total deaths that occurred in the city during the period 2006-2015. Of the cases, 0.8% were not coded as cancer in the basic cause. All deaths were certified by a physician; the proportion of poorly defined site (C76-C80, C97) was 5.3% and that of uterine cancer not specified (C55) was 0.5%. Only 4 (0.02%) of the death certificate cases did not have age information. Overall, 94.2% of cancer deaths were well certified.
All patients who died from cancer during the 2008-2012 period were found in the cancer registry database. For recognized sites of metastasis (liver, lung, bone and brain), the ICD-10 17 code of the death certificate was compared with the ICD-O-3 11 topographic code assigned by the cancer registry. Table 3 shows the concordance of the two systems in assigning the code for each of the described locations. 45% of the deaths coded as liver cancer in the

Discussion
The Cali cancer registry is the only registry in low and middle income countries that has accurately reported the cancer situation continuously over the last half century. The information is of high quality and has been included in all eleven volumes of Cancer Incidence in Five Continents (CI5) 21,28-37 .
For forty years, the RPC-Cali was the only valid source of information on the incidence of cancer in Colombia 2 . In the first decade of the 21st century, the National Cancer Institute of Colombia (INC-Col), with the support of Universidad del Valle, promoted the establishment of RPCs in strategic regions of the country to increase coverage. Due to this effort, the incidence
Currently, the RPC-Cali participates in SURVCAN-3, an initiative of the IARC to produce reliable and comparable survival statistics for countries in transition. Due to the great strength of the Cancer Registry, Cali is the first city in the world to implement "C/Can 2025: Challenge of Cities Against Cancer", an initiative of the International Union for Cancer Control (UICC) that seeks to increase the coverage and quality of oncological care in cities of more than one million inhabitants in low and middle income countries. The RPCC has social recognition in the city, which facilitates the passive and active collection of data from the various sources of information. The oncological care facilities in Cali include 165 oncology services authorized 8 to offer accurate diagnosis and adequate treatment to 9,000 patients per year 39 . Since its foundation in 1962, the RPCC has limited the registration area to the urban area of Cali and has maintained a clear definition of "case", including only the new cases of cancer diagnosed in permanent residents of the city and excluding patients referred to the city for diagnostic and/or treatment procedures.
To estimate the rates and to construct the life tables for the survival study, reliable denominators based on population censuses and projections are required. DANE provided the demographic structure of the population for the period 1962-2015.
Regulations for the notification of cancer in Colombia.
The Colombian government positioned cancer as a primary public health problem and established actions for comprehensive care to reduce morbidity and mortality due to this disease and
The value of a cancer registry depends greatly on the quality of the data and on the quality control procedures in force 4 . The RPC-Cali takes four dimensions into account to determine the quality indicators of the data collected: comparability, validity, timeliness and exhaustiveness.

Comparability
The RPCC uses standard methods to make the information comparable with other regions of the country and the world. Neoplasms are coded with ICD-O-3 for adults 11 and ICCC-3 for children 12 . For the date of incidence, the guidelines of the ENCR 13 are followed, and the IARC guidelines are used for the classification of multiple primary tumors 14 .

Validity
The main and most reliable sources of data for the cancer registry are histopathology reports, but they are not always sufficient: for poorly accessible tumors, such as those of the CNS, pancreas, lung, retroperitoneum and others, the basis of diagnosis can be imaging studies, clinical examination or DCO.
The percentage of RPC-Cali cases with a morphologically verified diagnosis (MV%) was 88.5%, similar to that of other Colombian and Latin American RPCs and lower than that of most European and North American RPCs (90%-95%) 37 . Africa shows both extremes (53.9% in Uganda, Kyadondo County; 97.8% in Algeria, Sétif) 37 . In low and middle income countries, a large proportion of cases diagnosed through the pathology service may suggest deficiencies in the search for cases and, therefore, incomplete registration.
In the RPC-Cali, the percentage of cases known only by death certificate (DCO%) was 1.7%, the lowest of all the Latin American RPCs and similar to that of most North American and European RPCs 37 . Some RPCs in Africa and Latin America have a DCO% greater than 10%, which indicates poor case detection and poor quality, because death certificates do not provide information on the morphology of the tumor. A high proportion of new cases of cancer based on a clinical diagnosis has the same interpretation.

Exhaustiveness
The incidence rates have been stable over time, and the values are comparable with those reported by cancer registries that serve similar populations, such as Quito (192.8 and 198.9 per 100,000 person-years in men and women, respectively) and Costa Rica (173.9 and 167.0 per 100,000 person-years in men and women, respectively) 37 .
The collaborative work with the SSPM of Cali facilitates access to information on general and cancer mortality, and provides an independent source for verifying new cases of cancer. About 94.2% of cancer deaths were well certified. The M:I ratio for all cancer sites during the period 2008-2012 was 51%, similar to that of other Latin American RPCs (range, 38.3% to 68%) 36 and higher than that reported by the United States (34.8% in men and 36% in women) 36 through SEER (Surveillance, Epidemiology, and End Results Program). In many Latin American countries, the M:I ratio is greater than one in tumors with high fatality, such as those of the pancreas, liver and esophagus. Such values reflect incomplete information and/or the lack of a diagnosis while the patient was alive.
The exhaustivity index was 87% (death certificate method, IE-CD), and in the cancers prioritized by the PNDC it was greater than 90%, except for prostate cancer (72%). This RPC-Cali index is higher than that reported by other international cancer registries (82.8% in Miyagi, Japan; 80.4% in Münster, Germany; and 65.6% in the United Kingdom) 41 .
The method depends on the availability of relatively good quality certificates that state the cause of death completely and accurately in the area covered by the cancer registry. This method has not been applied in other Latin American RPCs.

Timeliness
The statistics on the cancer situation in Cali become publicly accessible 36 months after the year of diagnosis. Data are also available on the RPCC portal http://rpcc.univalle.edu.co. This information describes 50 years of incidence (1962-2012), 30 years of mortality (1984-2014) and 15 years of survival (1995-2009).

Limitations
The data of each service in each institution are handled autonomously and independently. Because the information is managed on different platforms, this generates duplication of data, difficulties in data transfer, and a decrease in the quality and integrity of the information.
Colombian oncology services report periodically to different units of the Ministry of Health (SIVIGILA, RIPS, CAC). These legacy systems are mostly local applications that lack the interoperability needed for proper data management. Institutions begin to perceive notification as a burden and give low priority to data transfer to the cancer registry. This complexity threatens the completeness of the information collected. Consequently, there is a considerable possibility of underestimating the cancer risk in the population. It is urgent to modify the current Ministry of Health regulations so that the Colombian RPCs are incorporated into the cancer information system with an adequate budget allocation.

Future challenges
The implementation of standards and transfer mechanisms, shared information flows, and the adoption of tools are priorities for communicating effectively with the different information systems in the city of Cali. Public policies that facilitate the implementation of these solutions are also needed. In addition, the creation of an interinstitutional data warehouse is essential to support decision-making, both public at the population level and administrative at the clinical level. The main objective of this implementation is to guarantee quality information for knowledge management purposes.