ARCHIVA WEB PAGE TEXT CAPTURE
INCLUDED ONLY IN ARCHIVA PLATINUM (NOT INCLUDED IN PREMIUM)
Archiva Web Page Text Capture lets you capture the content of virtually any kind of web page and save it to any of several destinations: a database, the extended clipboard, or a regular Nota Bene file. It extends the Archiva capture/parsing technology beyond the bibliographic capture that is characteristic of the other modules to entirely new realms: on-line newspapers, movie reviews, blogs, recipes, commentary, manuals, and the like. The only real limit is your imagination.

Archiva Web Page Text Capture:
1. Comes predefined with rules for some sample sites (mostly on-line newspapers and magazines), enabling very specific parsing/structuring of the data from those sites
2. Provides a more general (but still very useful) level of parsing/structuring for all other sites, that is, sites for which site-specific rules have not yet been written, in two categories:
   • User-designated sites: you can instruct Archiva to capture text only from sites that you explicitly specify (by listing their URLs), thus excluding all others
   • All other sites: alternatively (or in addition), you can have Archiva automatically capture data from any web site on which you select and copy text
3. Is designed as an open system, so that users can write their own capture/conversion rules, thus effectively moving any site from the second category to the first
The process is simple: select the text you want to capture on the web page, then copy it (Ctrl+C).
The Archiva modules work together to capture the full range of regular and bibliographic text, in the following sequence, and in the manner indicated:
While bibliographic citations captured by Archiva Articles always get written to an Archiva bibliographic database, you have a choice—in any combination—as to where you want text captured from a web page to be saved:
Configuring Web Page Text Capture

Before you can capture text from a web page, you need to tell Archiva where you want to save the captured text:
However, you can configure Archiva in any way you like:
• Each distinct supported web page (NY Times Articles, NY Times Blogs, the various Wall Street Journal options, etc.) can have its own distinct output options (or you can suppress output for a particular web page entirely)
• If for any reason you want to separate out news articles from comments, blogs, and the like (or separate blogs from user comments), you can do so
• Archiva comes with a few predefined non-bibliographic databases specific to articles, blogs, and comments, from which you can choose
Archiva Web Page Text Capture opens up an entirely new way of managing data you discover on the web. In many ways it’s one of the most exciting of all of the Archiva modules. In the (unsolicited) words of those who have been testing it:
As an open, extendable system, the possibilities are virtually limitless.