Vol. 142 No. 9, September 2006
  Study

Uniform Resource Locator Decay in Dermatology Journals

Author Attitudes and Preservation Practices

Jonathan D. Wren, PhD; Kathryn R. Johnson, MD; David M. Crockett; Lauren F. Heilig, BA; Lisa M. Schilling, MD; Robert P. Dellavalle, MD, PhD, MSPH

Arch Dermatol. 2006;142:1147-1152.

ABSTRACT

Objectives  To describe dermatology journal uniform resource locator (URL) use and persistence and to better understand the level of control and awareness of authors regarding the availability of the URLs they cite.

Design  Software was written to automatically access URLs in articles published between January 1, 1999, and September 30, 2004, in the 3 dermatology journals with the highest scientific impact. Authors of publications with unavailable URLs were surveyed regarding URL content, availability, and preservation.

Main Outcome Measures  Uniform resource locator use and persistence and author opinions and practices.

Results  The percentage of articles containing at least 1 URL increased from 2.3% in 1999 to 13.5% in 2004. Of the 1113 URLs, 81.7% were available (decreasing with time since publication from 89.1% of 2004 URLs to 65.4% of 1999 URLs) (P<.001). Uniform resource locator unavailability was highest in The Journal of Investigative Dermatology (22.1%) and lowest in the Archives of Dermatology (14.8%) (P=.03). Some content was partially recoverable via the Internet Archive for 120 of the 204 unavailable URLs. Most authors (55.2%) agreed that the unavailable URL content was important to the publication, but few controlled URL availability personally (5%) or with the help of others (employees, colleagues, and friends) (6.7%).

Conclusions  Uniform resource locators are increasingly used and lost in dermatology journals. Loss will continue until better preservation policies are adopted.



INTRODUCTION

Approximately 80% of dermatologists with Internet access use the Internet for medical updating and professional purposes.1 Locating online health information, however, can be problematic because of the inconstant nature of Internet addresses, also known as uniform resource locators (URLs).2-7 The continual flux of information on the Internet is reflected in the changing content and disappearance of URLs, which may become unavailable because of changes in Web site organization, hardware reconfiguration, and file renaming.8

Previous studies2,4,5,7,9 examined the loss of cited URLs in journals encompassing multiple academic disciplines. Unlike previous estimates of URL use and availability, this study used an automated program to examine many full-text publications. To our knowledge, this is also the first study to survey authors with unavailable URLs regarding URL content and preservation.


METHODS

URL ASSESSMENTS

All online publications from January 1, 1999, to September 30, 2004, in the 3 dermatology journals with the highest scientific impact, according to the 2003 Institute for Scientific Information Journal Citation Reports, were examined: The Journal of Investigative Dermatology, Archives of Dermatology, and the Journal of the American Academy of Dermatology. Advertisements were excluded. Full-text publications were downloaded to a local hard drive and saved in HTML format using an automated script (Visual Basic 6), and all URLs located within text sections were then extracted automatically. Hence, URLs embedded in tables or figures were not detected for this study. The availability of each URL was determined in September 2004 using a previously described program (Visual Basic 6).4

Article characteristics captured included PubMed identification, journal name, and date of publication. Data recorded for each URL included text location, URL address, top-level domain (eg, ".com" or ".gov"), directory depth, presence of tildes, availability of the URL, and recoverability of unavailable URLs using the Internet Archive (IA) (http://www.archive.org). The presence or absence of an accession date (date the author last accessed the URL) was noted for a random sample, chosen using a random-number generator (http://www.random.org), of 100 URLs found in journal articles with PubMed identifications.
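The URL-level descriptors listed above can be derived from the URL string alone. The following is a minimal sketch in Python, not the Visual Basic 6 program used in the study. The function name `url_features` and the rule that a final path segment containing a dot is a file rather than a directory are assumptions made for illustration; the article defines directory depth only through its examples (depth 0 for http://www.uchsc.edu, depth 1 for http://www.uchsc.edu/derm/).

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Derive top-level domain, directory depth, and tilde presence from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Top-level domain: the last dot-separated label of the host, e.g. ".edu".
    tld = "." + host.rsplit(".", 1)[-1] if "." in host else ""
    # Directory depth: number of directory levels below the site root.
    # Assumption: a final segment containing a dot is a file name, not a directory.
    segments = [s for s in parsed.path.split("/") if s]
    if segments and "." in segments[-1] and not parsed.path.endswith("/"):
        segments = segments[:-1]
    return {
        "top_level_domain": tld,
        "directory_depth": len(segments),
        "has_tilde": "~" in parsed.path,
    }


print(url_features("http://www.uchsc.edu"))
print(url_features("http://www.uchsc.edu/derm/"))
```

The two example URLs are the ones the article itself uses to illustrate directory depths of 0 and 1.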

The URLs were classified as either available (yielding no error message when accessed using an Internet browser) or unavailable (yielding an error message when accessed using an Internet browser). The URLs that were redirected were noted and classified as available.
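The two-category classification can be approximated programmatically. The sketch below is an illustration only: the study defined availability by the absence of an error message in an Internet browser, whereas this code assumes that an HTTP status code below 400 corresponds to "no error message". The function names `classify` and `check_url` are hypothetical.

```python
import urllib.error
import urllib.request


def classify(status, redirected=False):
    """Map an HTTP outcome to available/unavailable.

    status is an HTTP status code, or None when no response was received
    (for example, a DNS failure, a timeout, or a malformed URL).
    Redirected URLs are noted but still count as available.
    """
    available = status is not None and 200 <= status < 400
    return {"available": available, "redirected": redirected and available}


def check_url(url, timeout=10.0):
    """Fetch a URL once and classify it; urlopen follows redirects itself."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, redirected=(resp.geturl() != url))
    except urllib.error.HTTPError as err:  # server answered with 4xx or 5xx
        return classify(err.code)
    except (urllib.error.URLError, OSError, ValueError):  # no usable response
        return classify(None)
```

A single request is a snapshot; a URL that fails once may be only temporarily unreachable, which is one reason the study reconfirmed unavailable URLs months later before surveying authors.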

For all unavailable URLs, content recovery was attempted using the IA, an Internet archiving resource. The URLs were pasted into the IA's Wayback Machine to minimize data entry error, and subcategorized as follows: (a) a recoverable URL (ie, at least some retrievable content via the IA) or (b) an unrecoverable URL (irretrievable content). Two investigators (D.M.C. and L.F.H.) independently attempted this recovery and created 2 separate databases, which were then compared and reconciled by consensus when differences occurred.
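In the study, recovery was attempted manually by pasting each URL into the Wayback Machine. A scripted alternative, which the study did not use, is the Wayback Machine availability endpoint, which returns a JSON description of the closest archived snapshot. The sketch below builds such a query and reads a response; the helper names are hypothetical, and the sample response is an illustrative payload, not data from the study.

```python
import json
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query URL; timestamp (e.g. 'YYYYMMDD') asks for the nearest snapshot."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL, or None when no snapshot is reported."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response shape only; the snapshot address is made up.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20040101000000/http://www.uchsc.edu/",
            "timestamp": "20040101000000",
            "status": "200",
        }
    }
})
print(closest_snapshot(sample))
```

As in the manual procedure, a reported snapshot shows only that some content was archived, not that it matches what the author originally cited.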

Journal policies regarding URLs were sought in the online versions of the "Instructions for Authors" for all 3 journals. Statistical analyses, including descriptive statistics and χ2 tests, were performed using SAS statistical software, version 8 (SAS Institute Inc, Cary, NC). Data were stored in a database (Access 2000; Microsoft Corporation, Redmond, Wash).

AUTHOR SURVEY

Between June 30 and September 30, 2005, a questionnaire (available from the authors) was sent to the corresponding author of articles containing URLs initially identified as unavailable in September 2004 and reconfirmed as unavailable in May 2005. Random selection was used in cases in which multiple articles had the same corresponding author or the same unavailable URL, so that each author and URL were unique. E-mail and postal addresses were obtained from the articles. When an e-mail address was not given, standard post was used. Up to 3 contacts were attempted by e-mail and 1 via standard post for each author. Replies were compiled and descriptive statistics provided by a commercially available Internet electronic survey tool (http://www.surveymonkey.com). This study received Colorado Multiple Institutional Review Board approval.


RESULTS

URLs IN DERMATOLOGY FULL-TEXT ARTICLES

In the 271 online journal issues sampled (Archives of Dermatology, 81; Journal of the American Academy of Dermatology, 108; and The Journal of Investigative Dermatology, 82), 7337 articles included 1113 URLs, of which 801 were unique (Table 1). Overall, 7.6% of articles (554 of 7337) contained at least 1 URL. The percentage of published articles containing at least 1 URL increased from 2.3% in 1999 to 13.5% in 2004 (January to September). The total number of URLs published increased annually from 78 in 1999 to 309 in 2003. Of the URLs, 27.3% appeared in the article text ("Introduction," "Materials and Methods," "Results," or "Discussion"), while the remaining 72.7% were located in the references or other areas of the article.


Table 1. URL Characteristics


URL AVAILABILITY

Overall, 18.3% of URLs were unavailable for all 3 dermatology journals in all years. The availability of URLs decreased significantly with time since article publication, with 89.1% of URLs published in 2004 (January through September) and 65.4% of those published in 1999 available (P<.001) (Figure 1). The availability was the highest in the Archives of Dermatology (85.2%) and the lowest in The Journal of Investigative Dermatology (77.9%) (P=.03). For all years, the likelihood of unavailability was significantly associated with top-level domain (P=.003); the percentages of unavailable URLs were as follows: ".edu" (34.2%), ".org" (18.7%), ".net" (18.8%), ".com" (15.5%), ".gov" (14.7%), and other (ungrouped top-level domains) (21.5%). The URLs with a directory depth of 0 (also known as root directories) (eg, http://www.uchsc.edu) were significantly more likely to be available compared with those with a directory depth of 1 (eg, http://www.uchsc.edu/derm/) or more (8.4% vs 26.1% unavailable) (P<.001). Neither the presence of an accession date, indicating when the URL was last viewed by the author (P<1.0), nor a tilde in the URL (P≤.22) was significantly associated with availability. (A tilde [~] character is an "alias" indicating that the Web page is located on a user or group directory and, thus, does not specify the directory path. If a user account becomes inactive, then redirection will fail even if the Web page files are available on the Web server.) Of 100 randomly chosen URLs, 39 had accession dates.


Figure 1. Uniform resource locator (URL) use in the Archives of Dermatology (A), the Journal of the American Academy of Dermatology (B), and The Journal of Investigative Dermatology (C) from 1999 to 2004. Percentages indicate unavailable URLs for each year. The asterisk indicates that data in 2004 were only from January through September.


Of 204 unavailable URLs, the content of 120 (58.8%) was recoverable in some form using the IA. This increased overall recoverability of at least partial content to 92.5% of URLs in all journals for all years.
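The 92.5% figure follows from the counts already reported: the URLs that were still available plus those partially recovered through the IA, divided by all URLs. A short check of that arithmetic, using only numbers stated in the text (the variable names are illustrative):

```python
# Counts reported in the Results section.
total_urls = 1113
unavailable = 204
recovered_via_ia = 120

available = total_urls - unavailable  # 909 URLs still reachable

pct_unavailable = 100 * unavailable / total_urls
pct_recovered = 100 * recovered_via_ia / unavailable
pct_any_content = 100 * (available + recovered_via_ia) / total_urls

print(f"unavailable:            {pct_unavailable:.1f}%")  # 18.3%
print(f"recovered via the IA:   {pct_recovered:.1f}%")    # 58.8%
print(f"at least partial content: {pct_any_content:.1f}%")  # 92.5%
```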

SURVEY OF AUTHORS WITH UNAVAILABLE URLs

A total of 102 unique corresponding authors of articles with unavailable URLs were e-mailed a survey (Figure 2) regarding the unavailable URLs, and 67 (65.7%) responded. Fewer than half (43.9%) had attempted to access the URL after publication, suggesting that most URLs become unavailable without the knowledge of the citing authors. Most (55.0%) of the cited URLs referenced content outside the direct control of the authors and their coworkers. Of 60 respondents, 7 (11.7%) had direct control over URL availability.


Figure 2. Dermatology article author responses regarding unavailable uniform resource locators (URLs). CD indicates compact disc. Percentages are based on the denominator of total respondents for each question. Boldface indicates the most frequent response.


Most authors (32 [51.6%] of 62) did not know why the URL they cited was unavailable. However, consistent with previous findings,4 about 11% of URLs were misspelled in the final publication. Three (4.5%) indicated that the URLs became unavailable because of a lack of funding or support.

Most responding authors (63.9%) had preserved cited URL content, most commonly (29.5%) by printing it. Few (4.9%) had used an Internet-based archive for content preservation. Most (55.2%) agreed that the content of the cited URL was important to their publication, most often (60.7%) as a means of contributing to background information for the study. The most common reason for citing a URL was to provide additional information about a topic (54.1%) or to link to additional data or analyses (37.7%). Only 14.3% indicated that an alternative source of data (other than the cited URL) was available at publication.

Most often, the nature of the URL was a text-based document (46.8%), which can be backed up by several means, but 45.2% of the URL links pointed to either a database (33.9%) or a software program (11.3%), which is not as straightforward to back up.

URL POLICIES BY JOURNAL

Since January 2002, the "Instructions for Authors" of the Archives of Dermatology (http://archderm.ama-assn.org) has provided an example Internet reference with an accession date and has recommended that authors retain a printed copy of any referenced Internet-only information to ensure access to cited information if the URL is altered or disappears. The "Instructions for Authors" of the Journal of the American Academy of Dermatology (http://www.eblue.org) and The Journal of Investigative Dermatology (http://www.jidonline.org) do not mention an Internet referencing policy. None of the 3 journals restricted URLs to specific locations in articles.


COMMENT

This study confirms that URLs are increasingly cited as sources of scholarly information in dermatology journals, and that a significant portion of cited information is no longer available. Of 1113 URLs examined, 18.3% were unavailable. The probability that a URL would become unavailable was significantly associated with increasing time since publication, journal, top-level domain, and greater directory depth, but not with the presence of a tilde or an accession date. These associations support the findings of Casserly and Byrd2 in information science journals. Of unavailable URLs, 58.8% were recoverable in some form in the IA, and an assessment of the content relevance of randomly selected URLs found no irrelevant content. This study also corroborates findings that 12% of URLs in MEDLINE abstracts contain spelling or formatting errors that render the published URL unavailable.4

The Internet serves as an invaluable network that provides global access to information. However, the average lifespan of a Web site is far from sufficient to ensure reliable long-term availability.10-11 Because of the inconstant nature of URLs, neither publishers nor authors are able to guarantee the long-term accuracy or availability of digital information referenced in dermatology journals. Effective solutions will likely require a collaborative effort on the part of researchers, authors, and journal editors.

Digital archiving resources offer one approach to preserving digital information. The IA, a public nonprofit organization, was constructed with the purpose of archiving Internet content and can often locate content of otherwise unrecoverable URLs, with snapshots taken on multiple dates. Unfortunately, archived versions of dynamic Web pages may not fully retain functionality, and other URLs, including those that are password protected or that block Web crawlers, are not available for archiving. Moreover, IA archiving typically takes place every couple of months, so changes made during this time will not be preserved. Thus, while 58.8% of unavailable URLs were classified as "recoverable" on the IA, the information recovered could not be verified as identical to that viewed and cited by the author.

An additional problem is the possibility of copyright infringement associated with preserving Internet content that is not the intellectual property of the citing author. In terms of scientific publications, for example, a recent study12 demonstrated that many authors make journal article reprints available online, which may in turn be archived by the IA regardless of whether the journals want this content freely available. It is difficult, if not impossible, in many cases for the IA to ascertain what content has been legally posted and what content may be illegal. Web authors may ask to have their electronic content removed from the IA (more information is available at: http://www.archive.org/about/faqs.php), which may further limit the ability of the IA to preserve URLs.

Other efforts to remedy the problem of URL loss exist (Table 2). Software programs, such as Peridot (IBM Corporation, White Plains, NY)13 and Xenu's Link Sleuth (http://home.snafu.de/tilman/xenulink.html), automate the updating of linked Web sites. Another program (FURL; LookSmart, Ltd, San Francisco, Calif) (http://www.furl.net) also serves as a digital information archive, but preserves only URL content submitted by individuals for personal archiving. Alternatively, WebCite specifically targets preservation of URLs in academic journals.


Table 2. Tools for URL Preservation and Recovery


Readers commonly use additional recovery methods, such as typing the higher-level stem (beginning) of an unavailable URL or the entire URL into a search engine such as Google. About 30% of the unavailable URLs in our study yielded prima facie relevant information using these methods. In the end, however, the reader does not know with certainty that this retrieved information is, in fact, the originally cited information.
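The "higher-level stem" approach amounts to removing trailing path segments one at a time until the site root is reached. A minimal sketch follows; the function name `url_stems` and the file name `page.html` are hypothetical and serve only to illustrate the idea.

```python
from urllib.parse import urlsplit, urlunsplit


def url_stems(url):
    """Return progressively shorter parent URLs, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    stems = []
    # Drop one trailing path segment at a time.
    for keep in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:keep])
        if keep:
            path += "/"
        stems.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return stems


# 'page.html' is a made-up file name used only for this example.
for stem in url_stems("http://www.uchsc.edu/derm/page.html"):
    print(stem)
```

Each stem is a candidate to try by hand or to enter into a search engine; as noted above, content found this way cannot be assumed to be the originally cited material.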

Uniform resource locator content might also be better preserved by using more permanent alternatives to URLs for locating information on the Internet. Uniform resource locators serve as the name (identifying content) and address (identifying location) for Internet resources, rendering cited content unavailable if either one changes. Alternatively, permanent URLs are associated with specific URLs, but are unchanging, effectively redirecting the Web client to the correct URL via an intermediary resolution service.8 This process is not fully location independent, and its success depends on the reliability of permanent URL maintainers to update the associated URL if it changes.8 Other alternatives are uniform resource names, permanent location-independent identifiers of cited resources that rely on a resolving service; and digital object identifiers, which identify a digital object by name only, using a persistent novel identifier embedded within a URL.14
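The indirection behind permanent URLs and digital object identifiers can be illustrated with a toy resolver: the citation stores a stable identifier, and only the resolver's record is updated when the content moves. This is a conceptual sketch, not the implementation of any actual resolution service; the class `Resolver`, the identifier `purl:derm-dataset`, and the example.org addresses are hypothetical. The helper `doi_to_url` assumes the public DOI proxy at doi.org.

```python
from urllib.parse import quote


class Resolver:
    """Toy resolution service: persistent identifier -> current location."""

    def __init__(self):
        self._targets = {}

    def register(self, pid, url):
        """Create or update the location recorded for an identifier."""
        self._targets[pid] = url

    def resolve(self, pid):
        """Return the current location, or None if the identifier is unknown."""
        return self._targets.get(pid)


def doi_to_url(doi):
    """Build a resolvable address for a DOI via the doi.org proxy."""
    return "https://doi.org/" + quote(doi, safe="/")


resolver = Resolver()
resolver.register("purl:derm-dataset", "http://example.org/old/location.html")
# The content moves; the maintainer updates the record, the citation is unchanged.
resolver.register("purl:derm-dataset", "http://example.org/new/location.html")
print(resolver.resolve("purl:derm-dataset"))
print(doi_to_url("10.1371/journal.pbio.0020099"))
```

The sketch also makes the stated weakness visible: if the maintainer never issues the second `register` call, the identifier keeps pointing at the old, possibly dead, location.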

In light of the limitations of URL preservation options, the importance of improving journal policies regarding URLs cannot be overstated. In a recent study15 of the top 100 medical and scientific journals, as rated by the Institute for Scientific Information for scientific impact, only one, the Archives of General Psychiatry, had a URL preservation policy stated in the "Instructions for Authors." Of the 3 dermatology journals, only the Archives of Dermatology specifically mentions Internet referencing in the "Instructions for Authors," using the same policy used by the Archives of General Psychiatry. The Archives of Dermatology also demonstrated a significantly lower rate of unavailable URLs in this study. Publishers, editors, and authors should work together to discover and implement feasible solutions to URL content loss15-18 by (1) requiring authors to retain digital backup or printed copies of cited Internet-only information to facilitate content recovery should a URL become unavailable and (2) advocating the inclusion of referenced Internet content in an online archive (Table 2). In addition, URLs need systematic double checking before publication to minimize unavailability due to spelling errors or misprints.
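Part of the prepublication double checking could be automated with simple syntactic tests that catch common misprints before any network request is made. The sketch below is one possible set of checks, chosen for illustration; the function name `url_problems` and the specific rules are assumptions, and passing these checks does not show that a URL is actually available.

```python
import re
from urllib.parse import urlsplit


def url_problems(url):
    """Return a list of likely typographical problems in a cited URL."""
    problems = []
    if re.search(r"\s", url):
        problems.append("contains whitespace")
    # Sentence punctuation is often swallowed into the URL during typesetting.
    if url.rstrip()[-1:] in (".", ",", ";", ":", ")"):
        problems.append("ends with punctuation")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https", "ftp"):
        problems.append("missing or unusual scheme")
    if "." not in (parts.hostname or ""):
        problems.append("host has no dot")
    return problems


for candidate in ["http://www.archive.org", "www.archive.org", "http://www.random.org."]:
    print(candidate, url_problems(candidate))
```

A URL that passes would still need to be opened once before publication, since a correctly formed address can point to content that has already moved.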

The adoption of standard electronic referencing policies, the use of Internet-based archives, and collaboration between authors and publishers will hopefully lead to more permanent URL availability in dermatology journals. Ultimately, widespread acceptance and support for these easily implemented policies could serve as a model for all medical literature.


AUTHOR INFORMATION

Correspondence: Robert P. Dellavalle, MD, PhD, MSPH, Dermatology Service, Department of Veterans Affairs Medical Center, 1055 Clermont St, Mail Code 165, Denver, CO 80220 (robert.dellavalle@uchsc.edu).

Accepted for Publication: January 9, 2006.

Author Contributions: Study concept and design: Wren, Schilling, and Dellavalle. Acquisition of data: Wren, Johnson, Crockett, Heilig, and Dellavalle. Analysis and interpretation of data: Wren, Johnson, Heilig, and Schilling. Drafting of the manuscript: Johnson, Crockett, Heilig, Schilling, and Dellavalle. Critical revision of the manuscript for important intellectual content: Wren, Heilig, Schilling, and Dellavalle. Statistical analysis: Heilig. Obtained funding: Dellavalle. Administrative, technical, and material support: Wren, Heilig, Schilling, and Dellavalle. Study supervision: Heilig and Dellavalle.

Financial Disclosure: None reported.

Funding/Support: This study was supported by grant EPS-0447262 from the National Science Foundation Experimental Program to Stimulate Competitive Research (Dr Wren); grant T32 AR07411 from the National Institutes of Health (Dr Johnson); in part by research grant R25 CA49981 from the National Cancer Institute Education (Mr Crockett); grant 5 D14HP00153, a Faculty Development in Primary Care Health Services Research Award (Dr Schilling); and grant K-07 CA92550 from the National Cancer Institute (Dr Dellavalle).

Previous Presentation: This study was presented at the Fifth International Congress on Peer Review and Biomedical Publication; September 16, 2005; Chicago, Ill.

Acknowledgment: We thank John Kittelson, PhD, Department of Preventive Medicine and Biometrics, University of Colorado at Denver and Health Sciences Center, for statistical advice; and Eric Hester, MD, Jennifer Myers, MD, Renee D’Ambrosia, MD, Kristy Lundahl, MBA, and Shayla Francis, MD, for their work on this project.

Author Affiliations: Department of Botany and Microbiology, Advanced Center for Genome Technology, University of Oklahoma, Norman (Dr Wren); Departments of Dermatology (Drs Johnson and Dellavalle and Ms Heilig), Preventive Medicine and Biometrics (Ms Heilig and Dr Schilling), and Medicine (Dr Schilling), University of Colorado at Denver and Health Sciences Center, Aurora; Colorado School of Mines, Golden (Mr Crockett); and Dermatology Service, Department of Veterans Affairs Medical Center, Denver (Dr Dellavalle).


REFERENCES

1. Gjersvik PJ, Nylenna M, Aasland OG. Use of the Internet among dermatologists in the United Kingdom, Sweden and Norway. Dermatol Online J. 2002;8:1.
2. Casserly M, Byrd J. Web citation availability: analysis and implications for scholarship. Coll Res Libr. 2003;64:300-317.
3. Currò V, Buonuomo PS, De Rose P, Onesimo R, Vituzzi A, D’Atri A. The evolution of Web-based medical information on sore throat: a longitudinal study. J Med Internet Res. 2003;5:e10. http://www.jmir.org/2003/2/e10. Accessed May 9, 2006.
4. Wren JD. 404 not found: the stability and persistence of URLs published in MEDLINE. Bioinformatics. 2004;20:668-672.
5. Lawrence S, Coetzee F, Glover E, et al. Persistence of Web references in scientific research. IEEE Comput. 2001;34:26-31.
6. Koehler W. Web page change and persistence: a four-year longitudinal study. J Am Soc Inf Sci. 2002;53:162-171.
7. Dellavalle RP, Hester EJ, Heilig LF, et al. Information science: going, going, gone: lost Internet references. Science. 2003;302:787-788.
8. Schafer K, Weibel S, Jul E. The PURL project. J Libr Adm. 2001;34:123. http://digitalarchive.oclc.org/da/ViewObject.jsp?objid=0000003338. Accessed November 14, 2005.
9. Hester EJ, Heilig LF, Drake AL, et al. Internet citations in oncology journals: a vanishing resource? J Natl Cancer Inst. 2004;96:969-971.
10. Kahle B. Preserving the Internet. Sci Am. 1997;276:82-83.
11. Spinellis D. The decay and failures of Web references. Commun ACM. 2003;46:71-77.
12. Wren JD. Open access and openly accessible: a study of scientific publications shared via the Internet. BMJ. doi:10.1136/bmj.38422.611736.E0. 2005;330:1128. Accessed May 9, 2006.
13. Twist J. Web tool may banish broken links. http://news.bbc.co.uk/1/hi/technology/3666660.stm. Accessed November 14, 2005.
14. Caplan P. DOI or don't we? http://info.lib.uh.edu/pr/v9/n1/capl9n1.html. Accessed November 14, 2005.
15. Schilling LM, Kelly DP, Drake AL, Heilig LF, Hester EJ, Dellavalle RP. Digital information archiving policies in high-impact medical and scientific periodicals. JAMA. 2004;292:2724-2726.
16. Johnson KR, Hester EJ, Schilling LM, Dellavalle RP. Addressing Internet reference loss. Lancet. 2004;363:660-661.
17. Kelly DP, Hester EJ, Johnson KR, et al. Avoiding URL reference degradation in scientific publications. PLoS Biol. doi:10.1371/journal.pbio.0020099. 2004;2:e99.
18. Schilling LM, Wren JD, Dellavalle RP. Bioinformatics leads charge by publishing more Internet addresses in abstracts than any other journal [letter]. Bioinformatics. doi:10.1093/bioinformatics/bth385. 2004;20:2903.


© 2006 American Medical Association. All Rights Reserved.