Troubleshooting Archive Manager capture process

This topic describes limitations of the Archive Manager capture process and, where available, their solutions.

URLs used in JavaScript files or script segments of an HTML page
URLs used in JavaScript files or script segments of an HTML page are not automatically included in the capture process.
Solution—to archive URLs used in script:
  • Publish directly from Content Manager
  • Configure as inclusion URLs in the Archive Manager configuration file
Redirection performed client-side using JavaScript code is not captured.
If a page contains a client-side redirect, for example if redirect.jsp contains the following script, the page originally obtained from the Web server is the one archived (redirect.jsp, not the Google home page):
<SCRIPT LANGUAGE="JavaScript">
window.location="http://www.google.com/";
</SCRIPT>
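To find URLs that need to be configured as inclusion URLs, it can help to list the absolute URLs that appear inside inline script segments. The following is a minimal sketch, not part of Archive Manager: the `ScriptUrlCollector` class, the `urls_in_scripts` function, and the sample page are hypothetical names introduced for illustration, and the regular expression only catches absolute http/https URLs written out literally in the script.

```python
import re
from html.parser import HTMLParser

# Assumption: only literal absolute http/https URLs are of interest.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>)]+""")


class ScriptUrlCollector(HTMLParser):
    """Hypothetical helper that collects URLs found inside <script> segments."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SCRIPT> arrives as "script".
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside a script segment is scanned for URLs.
        if self._in_script:
            self.urls.extend(URL_PATTERN.findall(data))


def urls_in_scripts(html):
    """Return the absolute URLs that occur in the script segments of a page."""
    collector = ScriptUrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


page = """<html><body>
<SCRIPT LANGUAGE="JavaScript">
window.location="http://www.google.com/";
</SCRIPT>
<a href="http://www.mysite.com/about.html">About</a>
</body></html>"""

print(urls_in_scripts(page))  # ['http://www.google.com/']
```

The hyperlink in the sample page is deliberately ignored, because only the script segment is scanned. URLs that a script assembles at run time cannot be found this way.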
URL redirects
URLs that are redirected using an HTTP status code are archived using the redirected URL rather than the original one.
For example, if the capture process is configured to archive:
http://www.mysite.com

but an HTTP status code redirects to:

http://www.mysite.com/index.jsp

the archived URL is:

http://www.mysite.com/index.jsp
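The behavior described above can be modelled with a small sketch that follows a chain of redirects to the URL under which the content would be archived. This only illustrates the rule and is not Archive Manager code: the `archived_url` function, the `redirects` table, and the `max_hops` limit are assumptions introduced for the example, and no HTTP requests are made.

```python
def archived_url(requested, redirects, max_hops=10):
    """Follow redirects from `requested` and return the final URL.

    `redirects` is a hypothetical table that maps a URL to the location
    an HTTP status code redirects it to.
    """
    current = requested
    seen = {current}
    # One extra pass lets a chain of exactly max_hops redirects finish.
    for _ in range(max_hops + 1):
        target = redirects.get(current)
        if target is None:
            return current  # no further redirect: this URL is archived
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        seen.add(target)
        current = target
    raise ValueError("too many redirects")


redirects = {"http://www.mysite.com": "http://www.mysite.com/index.jsp"}

print(archived_url("http://www.mysite.com", redirects))
# http://www.mysite.com/index.jsp
```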
Nested resources such as Flash, Word, and PDF
When a document with a "text/html" or "text/css" content type is captured, the process automatically recaptures the embedded or nested resources used by the document, such as CSS imports, JavaScript files, and images, when these pages are republished. Other document formats require additional work to archive these resources.
Solution—to force the recapturing of nested resources:
  • Publish directly from Content Manager
  • Configure as inclusion URLs in the Archive Manager configuration file
Nested resources for documents of content types "text/html" or "text/css"
When capturing nested resources for documents of content types "text/html" or "text/css", the process includes only embedded resources, not linked resources (hyperlinks).
Solution—to archive linked resources:
  • Publish directly from Content Manager
  • Configure as inclusion URLs in the Archive Manager configuration file
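The difference between embedded and linked resources can be illustrated with a short sketch that sorts the URLs referenced by an HTML page into the two groups. It is an approximation under stated assumptions rather than the capture process itself: the `ResourceClassifier` class and `classify` function are hypothetical, and treating images, script files, and stylesheet links as the embedded set is an assumption based on the examples given in this topic.

```python
from html.parser import HTMLParser


class ResourceClassifier(HTMLParser):
    """Hypothetical helper that splits page references into two groups."""

    def __init__(self):
        super().__init__()
        self.embedded = []  # assumed to be captured with the page
        self.linked = []    # hyperlinks, candidates for inclusion URLs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.embedded.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            # Only stylesheet links count as embedded in this sketch.
            rel_values = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.embedded.append(attributes["href"])
        elif tag == "a" and attributes.get("href"):
            self.linked.append(attributes["href"])


def classify(html):
    """Return (embedded, linked) URL lists for an HTML page."""
    parser = ResourceClassifier()
    parser.feed(html)
    parser.close()
    return parser.embedded, parser.linked


page = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="/js/menu.js"></script>
</head><body>
<img src="/images/logo.png">
<a href="/docs/manual.pdf">Manual</a>
</body></html>"""

embedded, linked = classify(page)
print(embedded)  # ['/css/site.css', '/js/menu.js', '/images/logo.png']
print(linked)    # ['/docs/manual.pdf']
```

In this example, /docs/manual.pdf is the kind of linked resource that would have to be published directly or configured as an inclusion URL.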
Web applications
Parts of a Web site that run as a Web application, such as site search and form data, are not supported.
Initial client page load data
Only data that is available during the initial client load of the page is captured.
Application Server caching
Application Server caching functionality can prevent undeployment actions from being archived correctly. Specifically, it affects the availability of a given URL.
Solution—switch off Application Server caching functionality.
Only the rendered output for dynamic pages and not the original file is archived.
Although the capture process supports the archiving of dynamic pages that result from code execution (such as .jsp, .aspx, and .asp pages), only the rendered output of these pages is archived, not the original file.
Therefore, if such a page uses dynamic elements such as the server time or a database query, the archived record contains the result that these elements rendered when the page was last captured.
Web sites that use more than one protocol require additional configuration.
Solution: Web sites are archived using a BaseUrl. If your Web site uses more than one protocol, for example if parts of your site are secured using HTTPS, you need to archive the secured section separately by defining a separate BaseUrl for it. For example:
<Publications BaseUrl="http://www.mycompany.com:8080">
  <Publication Id="1"/>
  <!-- Secured section archived separately with its own BaseUrl -->
  <Publication Id="1"
               BaseUrl="https://www.mycompany.com:8081/CustomerSupport"/>
</Publications>
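To check whether a site needs more than one BaseUrl, the URLs of the site can be grouped by protocol, host, and port. The following is a minimal sketch under stated assumptions: the `base_urls` function and the sample URLs are hypothetical, and the sketch reports only the scheme://host:port prefix, so any path segment that belongs in a BaseUrl (such as /CustomerSupport) still has to be added by hand.

```python
from urllib.parse import urlsplit


def base_urls(urls):
    """Return the distinct scheme://host[:port] prefixes in first-seen order."""
    seen = []
    for url in urls:
        parts = urlsplit(url)
        base = f"{parts.scheme}://{parts.netloc}"
        if base not in seen:
            seen.append(base)
    return seen


# Hypothetical URLs modelled on the configuration example.
site_urls = [
    "http://www.mycompany.com:8080/index.jsp",
    "http://www.mycompany.com:8080/products/list.jsp",
    "https://www.mycompany.com:8081/CustomerSupport/login.jsp",
]

print(base_urls(site_urls))
# ['http://www.mycompany.com:8080', 'https://www.mycompany.com:8081']
```

More than one entry in the result suggests that the site needs a separate BaseUrl for each prefix.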