This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.i-jmr.org/, as well as this copyright and license information must be included.
Patient registries are often a helpful first step in estimating the impact and understanding the etiology of rare diseases - both requisites for the development of new diagnostics and therapeutics. The value and utility of patient registries rely on the use of both well-constructed structured research questions and relevant answer sets accompanying them. There are currently no clear standards or specifications for developing registry questions, and there are no banks of existing questions to support registry developers.
This paper introduces the [Rare Disease] PRISM (Patient Registry Item Specifications and Metadata for Rare Disease) project, a library of standardized questions covering a broad spectrum of rare diseases that can be used to support the development of new registries, including Internet-based registries.
A convenience sample of questions was identified from well-established (>5 years) natural history studies in various diseases and from several existing registries. Face validity of the questions was assessed for commonality, clarity, and organization through review by terminology experts at the College of American Pathologists (CAP) and by research and informatics experts at the University of South Florida (USF). Questions were re-worded slightly, as needed, to make the full semantics of the question clear and to make the questions generalizable to multiple diseases where possible. Questions were indexed with metadata (structured and descriptive information) using a standard metadata framework to record such information as context, format, question asker and responder, and data standards information.
At present, PRISM contains over 2,200 questions, with content relevant to virtually all rare diseases. While the inclusion of disease-specific questions for thousands of rare disease organizations seeking to develop registries would present a challenge for traditional standards development organizations, the PRISM library could serve as a liaison between rare disease communities and existing standardized controlled terminologies, item banks, and coding systems.
If widely used, PRISM will enable the re-use of questions across registries, reduce variation in registry data collection, and facilitate a bottom-up standardization of patient registries. Although it was initially developed to fulfill an urgent need in the rare disease community for shared resources, the PRISM library of patient-directed registry questions can be a valuable resource for registries in any disease – whether common or rare.
The value and utility of patient registries are largely contingent upon the use of both high-quality research questions and structured and relevant answer sets accompanying them. The interoperability of registries or registry data (including the feeding of registry data into more rigorous clinical studies and regulatory submissions for new agents) depends upon the use of data standards; yet, there is currently no clear specification for developing registry questions, nor are there banks of existing questions to support registry developers. The diverse nature of registries, sponsors, and disease-specific data requirements complicates efforts at standardization in registry applications. This paper introduces the
Often patient registries are a helpful first step in estimating the impact and understanding the etiology of rare diseases—both requisites for the development of new diagnostics and therapeutics. Because of the small numbers of patients affected by rare diseases, these registries present unique challenges related to registry design, enrollment of patients, and data collection [
There is tremendous variability in the type of data and specific questions that patient registries collect, due in part to the lack of registry-specific data standards and also to the heterogeneity of registries’ purposes and sponsors. Patient registries can be designed for many purposes, including public health surveillance, epidemiologic and longitudinal research, patient education, research recruitment, and population monitoring for the safety of post-marketed drugs and devices. Patient registries can include data reported by patients, researchers, or clinicians. (A characterization of registry types is summarized in Richesson and Vehik [
There is also a clear role for data standards to promote shared efficiencies in registry development and enable opportunities for data sharing. These needs are particularly pronounced for rare diseases, which have sparse resources and significantly fewer—and highly distributed—domain experts and affected patients. An important standards challenge is the fact that there is no central control of patient registries—there is no single funding or regulatory agency that can oversee all the different registry types and implementations. Because registries are developed by many sponsors to address distinct functions, a top-down standards effort would require countless stakeholders and is not feasible. Additionally, there is no central authority to monitor or enforce standards compliance once developed. Given the tremendous need for standards and the scope of data collected across disease-specific registries, and given that there is no incentive or regulatory means to develop standards or enforce compliance, alternatives to complement traditional Standards Development Organizations (SDOs) are needed. Non-traditional strategies for developing and promoting standards can be effective and embraced across various rare diseases if these various research communities perceive them as accessible, useful, helpful, and easy to adopt.
The PRISM project (funded by an American Recovery and Reinvestment Act (ARRA) grant administered through the National Library of Medicine (NLM), a component of the NIH, under NLM Grant Number 1RC1LM010455-01, and supported by the Office of Rare Diseases Research) was developed to provide a useful resource to promote the efficient development of patient registries and standardized quality data collection by supporting the sharing and re-use of existing registry questions and data standards. The fundamental idea behind PRISM is that if registry developers could access questions used by other rare disease registries, they could consider and likely re-use these questions, thereby reducing the variation in questions/data collection across various patient registries, and leading to a
As a demonstration project, PRISM explored foundational issues related to the types of questions included in the bank (relative to other standards and question repositories) and the inclusion of metadata that will facilitate their search and retrieval. We describe our methodological approach to developing the PRISM library in the Methods section and present the resulting library structure, features, and composition afterward in the Results section.
The first questions identified for PRISM included a convenience sample of questions from well-established (>5 years) natural history studies in various diseases (metabolic [
We describe our selected strategy and design features for PRISM in the next section in terms of content, search and retrieval requirements, indexing model and metadata, and strategy for growing the content of the PRISM library. The authors thoroughly explored other standards throughout the design of PRISM and present the relationships and definitions between PRISM and other efforts related to standardized questions and patient reported data. Finally, in the discussion section, we describe immediate future directions for PRISM, including requirements for sharing the library, interface design, and future plans for maintenance and governance of PRISM.
At present, PRISM contains over 2,200 questions. A sample of 224 questions and selected metadata is presented in
To prevent overlapping with other standards efforts related to the collection of clinical data, a deliberate search for relevant standardization efforts was undertaken by the authors. This search of existing standards and informatics and library resources revealed several related and potentially relevant efforts, and informed the design of PRISM to leverage related efforts. As described in the introduction, the focus of PRISM is on
Relationship of PRISM to Related Standards Efforts and Resources (LOINC, caDSR, and PhenX).
Initiative^a | Primary Sponsor | Objective | Scope of standard | Proposed relationship with PRISM |
Clinical LOINC | NLM | Messaging and interoperability of clinical information. Specifically, the LOINC database provides a set of universal names and ID codes for identifying laboratory and clinical test results. | Health care (primarily) and research | Patient assessment scales are not generally included in PRISM. PRISM documentation directs users to LOINC for this content. [Note: Clinical LOINC does not contain every assessment scale ever published. Registry developers need to search Clinical LOINC or RELMA.] |
caDSR (CSHARE) | NCI | Research data elements. “caDSR is a database and a set of APIs and tools to create, edit, control, deploy, and find common data elements (CDEs) …for use in software development.” | Data elements for collection in clinical research studies. | caDSR content and tools are targeted to clinical researchers. Much of caDSR content could be relevant to PRISM and rare disease registry developers, but it is neither complete nor easily searchable by rare disease users. PRISM includes a focus on registries and rare diseases and a community forum for rare disease registry standards. |
PhenX | National Human Genome Research Institute (NHGRI) | To provide investigators with high-quality, relatively low-burden measures for inclusion in genome-wide association studies (GWAS) and other large-scale research efforts. | Data elements used in new research data collection, or used/queried from various electronic health records. | Much of PRISM content is very disease specific, often idiosyncratic, and not included in PhenX. |
^a These initiatives are not specific to patient registries or rare diseases.
Relationship of PRISM to Related Standards Efforts and Resources (PROMIS, SNOMED CT, RxNorm).
Initiative^a | Primary Sponsor | Objective | Scope of standard | Proposed relationship with PRISM |
PROMIS (Patient Reported Outcomes Measurement Information System) | NIH | A system of highly reliable, precise measures of patient-reported health status for physical, mental, and social well-being. | Functional and quality-of-life assessment questions. Validated measures only; focus on psychosocial constructs across domains, not only specific diseases. | PRISM explicitly avoids content that is validated and intended to be used for measurement. |
SNOMED CT (SCT) | International standards development organisation (IHTSDO) [Supported by dues from member nations] | Provides a consistent way to index, store, retrieve, and aggregate clinical data across specialties and sites of care. | Comprehensive clinical terminology covering nursing and medical diagnoses, signs and symptoms, functional status, interventions, procedures, and outcomes. | SCT is used for indexing in PRISM. Each PRISM question is associated with one or more codes that best represent the important content of the PRISM QAS. The most specific SCT code is used, with the understanding that only some PRISM questions get very precise representation in SCT. Similarly, multiple SCT codes can be used to index the clinical content of multi-concept questions. |
RxNorm | NLM | RxNorm contains the names of prescription and many non-prescription formulations in the US; aims to support electronic exchange of medication information and clinical decision support related to CPOE in health care contexts. | Standardized nomenclature for clinical drugs and drug delivery devices (mostly in US); gives normalized names for clinical drugs and links its names to many drug vocabularies used in pharmacy management and drug interaction software. | RxNorm does not represent data elements but is a nomenclature for clinical drugs. Many registries ask questions about specific medications. |
^a These initiatives are not specific to patient registries or rare diseases.
Some data elements, such as height and weight, already exist in clinical data element repositories such as Clinical LOINC and in research data element registries such as caDSR. To enable “one-stop shopping” for registry developers, these items are also in PRISM, with the idea that future PRISM interfaces can identify linkages to these other standards where and when appropriate. These linkages can inform PRISM users that they are in fact using items from another designated standard and can also help PRISM curators ensure that they do not create or support future variations in that item.
In the interest of rapidly assembling content relevant to patient registries for any and all rare diseases, the initial PRISM strategy has been to accept virtually all questions, with the idea that either patient communities or curators might later filter, rate, or rank them for PRISM users. Despite limiting the inclusion of questions to those from well-established registries, some questions used in rare disease registries were poorly constructed or not as clear as they could be. Because PRISM is motivated to address the registry question needs for a spectrum of registry designs and diseases, it does contain registry data elements (in actual use) that might not be ideally constructed or might actually conflict with another value set. Regardless, the liberal acceptance policy of PRISM increases the breadth and volume of questions and ultimately increases the value of PRISM as a central resource for questions (which should be supplemented by advice on selection and use). We hope that others use PRISM as a resource for standards development or build applications that can facilitate the ranking or endorsement of certain PRISM questions over others in specific diseases or contexts.
As mentioned in the previous section, the growth of the library brought with it challenges for curation and use. Internally, a process was developed to add new content without duplicating questions and ensure that related questions or variants could be indexed for effective retrieval and comparison. We operated under the assumption that PRISM needed to be useful to provide value. We explored user roles and searching techniques to determine the best method for indexing question and answer sets (QAS). Our indexing scheme (including the use of controlled terminology) is described later. A key strategy for identifying the search and retrieval requirements was employing use cases.
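The source does not describe how the internal duplicate check was carried out. As a minimal sketch only, assuming a simple text-similarity screen (the function names, the sample questions, and the 0.85 threshold are all hypothetical and not part of PRISM), a curator-facing check for near-duplicate questions might look like this:

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def find_similar(new_question, existing, threshold=0.85):
    """Return (question, score) pairs for existing questions close to the new one.

    The threshold is an illustrative assumption, not a PRISM setting.
    """
    target = normalize(new_question)
    hits = []
    for question in existing:
        score = SequenceMatcher(None, target, normalize(question)).ratio()
        if score >= threshold:
            hits.append((question, round(score, 2)))
    # Most similar candidates first, so a curator reviews them in order.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical library content for illustration.
existing = [
    "What is your date of birth?",
    "Have you ever been diagnosed with diabetes?",
]
matches = find_similar("What is your date of birth", existing)
print(matches)
```

A screen of this kind only surfaces candidates; deciding whether two questions are true duplicates or meaningful variants would still fall to a human curator.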
The PRISM team developed narrative
A critical and largely unaddressed problem for registries (and clinical research data collection in general) is the need for patient registry questions and answers to be indexed in such a way that they can be retrieved for re-use, for example, to support rapid development of another related rare disease registry. In essence, indexing is the practice of applying metadata (structured and descriptive information) to items in a database for efficient and accurate retrieval [
1. Context (type of study, disease or treatment of interest, etc.)
2. Format of questions and location of semantics
3. Question asker (patient, relative, doctor)
4. Audience or person being asked the question (patient, family member, or caregiver)
5. Relevant data standards for specific answer sets
Metadata are used to describe information resource-type features of questions, such as terms and attributes, and controlled vocabularies represent the actual content. Both are important for retrieval and, ultimately, interoperability. PRISM uses Dublin Core (DC) as a metadata framework for indexing PRISM QAS, and within DC metadata uses controlled terminology to reflect the semantics for each question. Our approach is described in detail in [
The use of Dublin Core metadata to annotate various QAS in PRISM offers a way to employ the most appropriate controlled vocabulary(s) for the content, while preserving retrievability. In addition to selected Dublin Core metadata elements and controlled terminologies, other decisions were made to ensure that these elements and vocabularies were used appropriately and consistently. Specifically, each QAS must be usable, reproducible, and understandable on its own merit. For example, a form that addresses gynecological issues may include a QAS addressing menstrual symptoms, such as “Do you experience cramps?” From the context of the form, this can refer only to menstrual cramps, and the question may be coded with or without the “menstrual semantics”. However, when a later user searches the library for questions about abdominal, leg, or other body site cramps, this question may be inappropriately selected if the menstrual context was not captured. The QAS metadata (including the embedded SNOMED CT codes and the narrative definition of the question that PRISM includes) can be used to disambiguate the term.
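The cramps example can be illustrated with a small sketch. The record layout below is an assumption for illustration: it borrows Dublin Core element names (identifier, title, description, subject, creator, audience), but the mapping of PRISM metadata onto those elements, the `QAS` class, the `search` helper, and the concept codes are all hypothetical. The codes are placeholders and are not real SNOMED CT identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class QAS:
    """A question and answer set with Dublin Core-style metadata (illustrative)."""
    identifier: str              # dc:identifier (hypothetical ID scheme)
    title: str                   # dc:title, the question text as displayed
    description: str             # dc:description, narrative definition
    answers: List[str]
    subject_codes: Set[str] = field(default_factory=set)  # dc:subject, placeholder codes
    creator: str = "researcher"  # who asks the question
    audience: str = "patient"    # who answers the question


def search(library: List[QAS], code: str) -> List[QAS]:
    """Return every QAS indexed with the given concept code."""
    return [q for q in library if code in q.subject_codes]


# Placeholder concept codes, NOT real SNOMED CT identifiers.
MENSTRUAL_CRAMP = "X-MENSTRUAL-CRAMP"
LEG_CRAMP = "X-LEG-CRAMP"

library = [
    QAS("q1", "Do you experience cramps?",
        "Asks whether the respondent has cramping pain during menstruation.",
        ["Yes", "No"], {MENSTRUAL_CRAMP}),
    QAS("q2", "Do you experience cramps?",
        "Asks whether the respondent has muscle cramps in the legs.",
        ["Yes", "No"], {LEG_CRAMP}),
]

# Identical question text, but the coded subject separates the two meanings.
leg_hits = search(library, LEG_CRAMP)
print([q.identifier for q in leg_hits])
```

Because both records share the same display text, a plain text search would return both; searching on the coded subject returns only the question whose semantics match.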
Under the leadership of terminology experts from the College of American Pathologists, guidelines for using SNOMED CT were developed related to post-coordination, selection of hierarchies, and level of specificity. Guidelines for the use of SCT within the PRISM Library data model were developed collaboratively by the PRISM team and included the following decisions:
1. Assigning codes for entire question and answer set groups vs. discrete codes for questions and answers
2. Taking the (semantically) closest available SNOMED CT concept rather than creating a new one
3. Rejecting, as a consequence, the idea of creating a SNOMED extension (“Ref Set”) mechanism
4. Developing versioning and change protocols between USF and CAP partners
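The second decision above (use the closest available concept instead of creating a new one) can be sketched under a simplifying assumption: that "closest" means the nearest ancestor in an is-a hierarchy. The source does not say how closeness was judged, and the hierarchy, concept names, and `closest_available` function below are hypothetical placeholders and do not reflect real SNOMED CT content.

```python
from typing import Optional

# Hypothetical is-a hierarchy (child -> parent). Names are placeholders,
# not real SNOMED CT concepts or identifiers.
PARENT = {
    "menstrual cramp": "pelvic pain",
    "pelvic pain": "pain",
    "pain": "clinical finding",
}

# Concepts assumed to be available for indexing in this toy example.
AVAILABLE = {"pelvic pain", "pain", "clinical finding"}


def closest_available(concept: str) -> Optional[str]:
    """Walk up the hierarchy and return the first available concept, if any."""
    current: Optional[str] = concept
    seen = set()  # guards against accidental cycles in the toy hierarchy
    while current is not None and current not in seen:
        if current in AVAILABLE:
            return current
        seen.add(current)
        current = PARENT.get(current)
    return None


print(closest_available("menstrual cramp"))
```

Falling back to a broader existing concept loses some specificity, which is consistent with the observation that only some PRISM questions get very precise representation, but it avoids maintaining a separate extension.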
As described in the previous section, PRISM has developed a useful, consistent, and standards-compliant solution for the encoding of questions in PRISM. Ultimately, the metadata model and indexing strategy will be tested as the content of PRISM grows. Implementation plans should ensure that as the size of the PRISM library grows, duplicate questions are not inadvertently added. (Anecdotally, this is an issue with other question and metadata repositories, owing to the fact that a complete search of existing content must be undertaken before new content is added, and this search is both time-consuming and generally not incentivized.) For PRISM to remain a useful resource to rare disease registry developers, and for PRISM to support the goal to reduce question variation across rare disease registries, the library should not contain obvious duplicates and the library indexing model should be sufficient for users to search
One of the biggest challenges of the PRISM project has been to keep the scope reasonable and practical. Since the targeted audience for PRISM is researchers and registry developers (including PAGs with non-research backgrounds), we took a minimalist approach to coding with the goal of easy retrieval. It is our expectation that easy retrieval will drive increased usage, which will cultivate de facto standards, and those standards will ultimately support interoperability (see
We recognize that interoperability between registry questions and other data (eg, EHR data) would require more sophisticated coding with SNOMED CT and other data standards. Given the short duration of this project and the desire for maximum retrievability, we determined that this level of coding would be out of the scope for PRISM at this time.
It is not yet clear how large a corpus of sharable items exists across disparate rare diseases. Likewise, it is still unclear whether items for rare diseases are likely to be similar to items for more common diseases, and if so, whether there would be value in finding a way to include and reuse those items via PRISM as well. The future use and evaluation of PRISM content by multiple disease representatives will yield information on the reusability of the questions in PRISM within and across rare and common diseases, as well as provide practical examples of cross-disease standards and help identify standards gaps. To understand the reusability and generalizability of PRISM questions, additional work needs to explore the validity and reproducibility of the categorization and indexing of questions.
Theory of PRISM design to interoperability.
PRISM fills a void for the rare diseases research and registry development communities. The PRISM library resource can support standardized data collection in patient registries by reducing unnecessary variation. PRISM is free and available to search through the project website [
The authors are hopeful for future funding that will allow PRISM content to grow to meet the needs of the thousands of rare disease registry applications and to allow computer-mediated methods for adding and presenting content. The notion of using a distributed community of registry developers to curate this resource by commenting on and ranking items, as with the demonstration of caDSR content that is described in [
The PRISM developers, with the cooperation and support of the National Organization for Rare Disorders (NORD), are working to make the PRISM resource available and useful to registry developers representing all rare diseases and all countries. Currently, several rare disease patient advocacy organizations are participating in focus groups and expert interviews to inform the development of best interfaces and retrieval strategies to ensure that PRISM is a useful and accessible community-driven resource. In addition, we are developing international collaborations to explore the translation of items to support global rare disease research. Our overarching goal is that—given the sheer number of rare diseases, the variety of registry designs, and the number of languages that might need to be addressed—the PRISM leadership seizes and implements standards opportunities without burdening resource-strapped rare disease communities.
The lack of a clear set of standards and specifications for data collection using patient registries represents a significant data standards gap in an explosively growing application area, one that is important to both drug development and patient-directed health communities. Standardization of patient registries can enable the interoperability of health and research data, as registries should be able to receive data from health care systems or transmit data into various clinical research or pharmacovigilance applications. PRISM can be used to facilitate interoperability of existing and newly developed registries and to ensure that, moving forward, registries use standard sets of questions. Without the use of such a resource, the proliferation of patient registries and variation of data collection questions will be inevitable. This central resource, the PRISM library, will support a bottom-up and incremental standards promulgation. By using a standard set of metadata elements and SNOMED CT to facilitate the retrieval and re-use of existing questions and standards, PRISM will reduce variation in the rare diseases registry community and help registry implementers produce high-quality registries more efficiently. Once variation in patient registries is reduced (ie, “standards” emerge), issues related to harmonizing, mapping, and relating to the different standards communities for health care (eg, HL7) and research (eg, CDISC) can be addressed in an efficient manner. In this approach, the standardization of patient registry questions can serve to improve efficiencies, collaboration, and resource sharing across the entire drug development process.
Information regarding the development of PRISM and access can be found on the PRISM website [
Sample questions and selected metadata.
caDSR: Cancer Data Standards Registry and Repository
CAP STS: College of American Pathologists SNOMED Terminology Solutions
CDISC: Clinical Data Interchange Standards Consortium
CSHARE: CDISC Shared Health And Clinical Research Electronic Library (http://www.cdisc.org/cdisc-share)
DCMI: Dublin Core Metadata Initiative
HL7: Health Level Seven
LOINC: Logical Observation Identifiers Names and Codes
NINDS: National Institute of Neurological Disorders and Stroke
NLM: National Library of Medicine
NORD: National Organization for Rare Disorders
OMB: The White House Office of Management and Budget
ONC: Office of National Coordinator for Health Information Technology
ORDR: Office of Rare Diseases Research
PAG: Patient Advocacy Group
PhenX: Consensus measures for Phenotypes and eXposures
PRISM: Patient Registry Item Specifications and Metadata
PROMIS: Patient Reported Outcomes Measurement Information System
QAS: Question and Answer Set
RDCRN: Rare Disease Clinical Research Network (http://www.rarediseasesnetwork.org)
SCT: SNOMED CT
SDO: Standards Development Organization
SNOMED CT: Systematized Nomenclature of Medicine–Clinical Terms
USF: University of South Florida
The [RD] PRISM Library project was funded by the American Recovery and Reinvestment Act. Support for the [RD] PRISM Library project is administered through the National Library of Medicine (NLM Grant Number 1RC1LM010455-01), a component of the NIH, and supported by the Office of Rare Diseases Research. The contents are solely the responsibility of the authors and do not necessarily represent the official views of NLM or ORDR or NIH.
The authors wish to thank Alice Graves, Heather Guillette, Jamie Malloy, Shu Liu, and Sarah Austin from the University of South Florida for their contributions to the PRISM project, and science and program officers from the NIH Office of Rare Diseases Research for their support.
The Cancer Checklists and the tools described in this article were developed by our colleagues at the College of American Pathologists, who are our partners in this work. Specifically we want to thank Narciso Albarracin, Christine Spisla, Rich Moldwin, Debra Konicek, Jaleh Mirza, and Debbie Klieman for their substantial efforts in the [RD] PRISM project.
The authors also thank the National Organization for Rare Disorders (NORD) and their member organizations for their support and advice, and for participation in ongoing evaluation and quality improvement activities.
None declared.