Data Entry for ScienceHub
On January 26, 2009, Barack Obama signed a Memorandum¹ on Transparency and Open Government “to ensure the public trust and establish a system of transparency, public participation and collaboration”. On February 22, 2013, John P. Holdren, Director of the Office of Science and Technology Policy, issued a memorandum², “Increasing Access to the Results of Federally Funded Scientific Research”, which directed Federal agencies with more than $100M in R&D expenditures “to develop plans to make the results of federally funded research freely available to the public”, generally within one year of publication.
ScienceHub is EPA’s vehicle for meeting the above open data obligation and our opportunity to provide information generated from our research to colleagues and the public. There is an expectation that every EPA product cleared through the Science and Technology Information Clearance System (STICS) will have a corresponding data entry in ScienceHub. There are, however, a few exceptions, and the type of information entered in ScienceHub can vary. Following is general guidance for making data available, or at least discoverable, in ScienceHub. Deviations for particular products can be considered but should ultimately maintain the spirit and intent of a transparent and open government. See Table 1 for additional examples.
1. Products requiring complete data entry (metadata plus primary/secondary data) are those ‘owned’ by EPA either through in-house or EPA-funded efforts:
- Primary data from field or laboratory experiments used to inform/develop the product (e.g., surveys, citizen science/crowdsourcing, computational work)
- Primary data supporting model development
- Products containing EPA-generated secondary data (adaptations or additions to primary data) to inform/develop the product (at a minimum, the location of primary data should be cited in a metadata entry)
2. Products requiring metadata entry only are those where the data used in the study were not generated or funded by EPA, are already available in the public domain, or are not available to EPA for public dissemination. Metadata are intended to identify, at a minimum, the format, content and point-of-contact or location where the data can be found. The metadata should have a meaningful title, so the information can be located by a search engine.
- Data from external co-authors used to inform/develop the product
- Publicly available data used to inform/develop the product
- Cases where EPA-generated data contain sensitive information (human subjects research, PII, CBI, CUI, DURC or other homeland security risks)
3. Products requiring no metadata or data entry are those products excepted from data entry requirements. Note that this category, which requires a ‘No’ response in STICS line 14a, must be clearly justified in line 14b. Justifications must identify the reason for exclusion (e.g., product is an editorial with no data or model generated or presented). This category requires Branch Chief approval and the explanation in 14b will likely be reviewed.
- Review papers where no new data or models are generated or developed
- Editorials or opinions
- Instances where the EPA author conducted the work prior to joining EPA (e.g., a recent recruit), or where an EPA employee mentors a student, and the data are stored elsewhere. In these cases the data belong to the student or university, or the author lists a prior employer or organization, not EPA, as their affiliation in the article. Such products may not require a STICS or ScienceHub entry.
- EPA Reports and Assessments do not require a ScienceHub entry
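The three categories above can be read as a short decision rule. The following Python sketch is illustrative only: the class, field and function names are hypothetical and are not part of STICS or ScienceHub, and the rule is deliberately simplified. It does not handle mixed cases, such as a product that combines EPA data with a collaborator's data, and it does not replace Branch Chief review.

```python
from dataclasses import dataclass
from enum import Enum


class EntryLevel(Enum):
    """Level of ScienceHub entry suggested by the guidance (labels are illustrative)."""
    COMPLETE = "metadata plus primary/secondary data"
    METADATA_ONLY = "metadata only"
    NONE = "no entry; justify in STICS line 14b"


@dataclass
class Product:
    # Hypothetical flags that mirror the questions asked in the guidance.
    excepted_type: bool = False            # review, editorial/opinion, or other excepted product
    epa_generated_or_funded: bool = False  # data produced in-house or through EPA-funded work
    already_public: bool = False           # data already available in the public domain
    sensitive: bool = False                # human subjects research, PII, CBI, CUI, DURC, etc.


def required_entry(product: Product) -> EntryLevel:
    """Return the entry level for a product under the simplified rule."""
    # Category 3: excepted products need no metadata or data entry.
    if product.excepted_type:
        return EntryLevel.NONE
    # Category 1: EPA-owned data that can be released and is not already public.
    if (product.epa_generated_or_funded
            and not product.already_public
            and not product.sensitive):
        return EntryLevel.COMPLETE
    # Category 2: external, already public, or sensitive data.
    return EntryLevel.METADATA_ONLY
```

For example, `required_entry(Product(epa_generated_or_funded=True, sensitive=True))` returns `EntryLevel.METADATA_ONLY`, matching the sensitive-data case listed under category 2.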
Table 1. ScienceHub Journal Article Dataset Entry Requirements
# | Scenario | Description | Example | STICS – EPA Associated Data? Answer: | ScienceHub required elements |
---|---|---|---|---|---|
1 | Article used EPA primary data. | Primary data is data you generated directly from your research. | Results for samples analyzed by EPA labs or for which EPA paid for analysis. | Yes | |
2 | Article uses EPA secondary data | Secondary is data generated by EPA from data previously published. | Statistical analysis of 15 previously published studies (EPA or Non-EPA) to determine an association between the presence of two pollutants. | Yes | |
3a | Model Use | Journal article describes the use of an existing model. No data was collected or analyzed. | Journal article that describes how the use of the SWIM model can be applied to differing situations. Uses sample data to demonstrate the model but not to draw conclusions. | No | None |
3b | Model development/refinement | Journal article describes the development or refinement of a model. Provide data that was used to update the model. May be primary or secondary data. | An improvement to the SWIM model was made based upon new data used to improve the model. | Yes | |
4 | Sensitive Data – PII, CUI, DURC* or other homeland security risks | Data was used as part of the research, but the data contains confidential, proprietary business or personal identification information. Data if published would need to be redacted to remove what can’t be shared. Explanation and instructions for how to contact EPA expert for more information should be provided. | Through a CRADA a private company agrees to share confidential business information and data. This data cannot be released because it would reveal a proprietary secret. | Yes | |
5 | Literature review | No scientific data. Just a review of existing literature. | Article reviewing the state of knowledge on a drinking water treatment process | No | None |
6 | Data collected for the research but doesn’t belong to EPA | Data was not collected in EPA labs or paid for by EPA. Note: If EPA collected any data or generated secondary data in addition to the collaborator’s data, then must provide EPA’s data (Scenario 1 or 2) and provide other information on how to access collaborator’s data. | Collaborator at another agency or a university asks you to be an author, but they collect the data. You are listed as an author but did not contribute to the data. | Yes | See guidance below. |
7 | Article is based upon data you or someone else has already made publicly available | Data is available already to the public | A second article is published based on the same data released for a prior article. | Yes | |
Note: ScienceHub metadata includes Collaboration, Description of Data, Keywords, Date of Last Update, and Data Dictionary entries. Additional metadata/descriptors are necessary in the Data tab for articles with data not owned/collected by EPA.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 Federal Register Vol. 74 No. 15, Jan 26, 2009