|
Archiva is a sophisticated program, but in the simplest terms it is a data-gathering and processing system. It reads data from web pages, formatted electronic bibliographies, the clipboard, and other sources (such as the results returned by library searches, or lists of ISBNs or scanned barcodes), and then structures and reorganizes, or “parses,” this data into its component parts.
It does so with a set of rule-based routines that search for the text strings (regular text, punctuation, keywords, and tags, primarily HTML) that define the data of interest. Once a piece of data is identified, Archiva assigns it to the appropriate output destination: in most modules, the appropriate Ibidem fields; otherwise some other, perhaps largely free-form, format, such as the clipboard or a regular Nota Bene file (an option with Archiva Web Page Text Capture).
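To make the idea of rule-based parsing concrete, here is a minimal sketch in Python. It is not Archiva's actual rule syntax (the .TBL files have their own format); the field names, the patterns, and the sample HTML are all assumptions made up for illustration. It only shows the general technique: each rule pairs a regular expression with an output destination, and the matched text is assigned to that destination.

```python
import re

# Hypothetical rules for illustration only: each pairs an output field name
# with a regular expression whose first capture group holds the data of interest.
RULES = [
    ("title", re.compile(r"<h1[^>]*>(.*?)</h1>", re.S)),
    ("author", re.compile(r'<span class="author">(.*?)</span>', re.S)),
    ("year", re.compile(r"\((\d{4})\)")),
]


def parse(text):
    """Apply each rule in order and keep the first match found for each field."""
    record = {}
    for field, pattern in RULES:
        match = pattern.search(text)
        if match:
            record[field] = match.group(1).strip()
    return record


# Invented sample input standing in for the HTML of a captured web page.
SAMPLE = '<h1>On Parsing</h1><span class="author">Jane Doe</span> (2019)'
print(parse(SAMPLE))
```

A rule that finds nothing simply leaves its field out of the record, so a page that matches none of the patterns yields an empty result.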
But while Archiva comes preconfigured with rules that support capture and parsing of data from tens of thousands of journal databases, libraries, and web pages, there are literally millions of on-line and other sources which the Nota Bene/Archiva user community might want to access over the course of their scholarly careers. In addition, some people might want to capture and parse the data from already supported sites in different forms, for different purposes. It hardly needs to be said that no set of predefined rules and web sites will satisfy every need.
To make Archiva as useful, flexible, and wide-ranging as possible, we’ve therefore designed it as an open system: any NB user can modify the rules already included, or write new ones, either for new data sets (for example, web pages) or for different kinds of structured results from existing data sets.
We also hope that in opening up Archiva’s data-capture rules we will encourage the kind of on-going collaboration that has been a hallmark of the Nota Bene user community for the last 25 years. Indeed, in the course of sharing customizations made to Nota Bene, many Nota Bene users have developed what have turned out to be very important friendships—extending far beyond the specific NB issues that first brought them together—with other Nota Bene users. It’s our dream that Archiva can serve in that capacity as well.
And not only do we hope that you will develop friendships, we also expect that you’ll have lots of fun writing rules. You will need to learn a little bit about things like “parsers” and “regular expressions,” but we’ve provided tools that should make this process exciting, including a wiki documentation system (which contributors can modify), a forum for discussing the issues you encounter, and even a very helpful “debugger” that lets you trace through the code of the routines you write while watching the contents of the clipboard (the HTML text) as it is processed. Our guess is that more than a few of you will even become addicted!
|
Archiva uses the following rules files (all based on regular expressions):
WEBRULES.TBL — Rules for capturing text from web pages (Archiva Web Page Text Capture)
LIBRULES.TBL — Rules for:
• Capturing and converting citations found on web pages (Archiva Articles)
• Searching Z39.50 libraries and converting retrieved data (Archiva Books)
• Converting ISBN numbers to bibliographic records (Archiva ISBN Converter)
BIBRULES.TBL — Rules for converting formatted bibliographies into field-oriented data (Archiva Bibliography Converter)
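As an illustration of what converting a formatted bibliography entry into field-oriented data can look like, here is a small Python sketch. The citation layout, the field names, and the sample entry are assumptions invented for this example; it is not the contents of BIBRULES.TBL, and real bibliographies need far more patterns than this one.

```python
import re

# Assumed citation layout (illustrative only):
#   Author. Title. Place: Publisher, Year.
CITATION = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<place>[^:]+):\s+"
    r"(?P<publisher>[^,]+),\s+"
    r"(?P<year>\d{4})\.$"
)


def to_fields(entry):
    """Return a dict of named fields, or None if the entry does not fit the layout."""
    match = CITATION.match(entry.strip())
    return match.groupdict() if match else None


# Invented sample entry.
ENTRY = "Doe, Jane. On Parsing. Boston: Example Press, 2019."
print(to_fields(ENTRY))
```

Returning None for entries that do not fit makes it easy to see which citation styles still need a rule of their own.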
|
|
A forum lets Archiva users carry on conversations with each other, including, among other things:
• Posting requests for sites they would like supported (this is like an “insider’s” Request/Report forum)
• Describing the sites and/or rules they themselves are working on
• Posing questions about how to make their custom rules work
• Offering hints, suggestions, and comments on the programming procedures adopted by others
|
|
Wiki-style documentation of how the rules work will be open to any Archiva user. Each Archiva wiki page has a link to a supplemental user-content page, where users can add text (corrections, expansions, examples, etc.). We intend to have particularly useful information from these user-content pages periodically incorporated into the main Archiva pages, either by us or by Archiva wiki moderators, whose task it will be to manage the overall documentation.
|
In addition to the on-line, editable documentation and the Archiva forum, we’ve incorporated one other aid to developing new rules: a debugger that lets you watch, step by step, how your rules (or, for that matter, the built-in rules) search the text on the page, identify the relevant pieces of data, and then (in the case of those modules which create Ibidem records) assign them to the two-character IDs that identify the distinct pieces of data Ibidem needs to build a complete, formattable bibliographic citation.
The various panels of the debugger show:
• The search and assignment rules
• The actual text on the clipboard
• The step-by-step searches and assignments, along with their status
• The resulting output assignments
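The kind of information these panels present can be imitated with a short Python sketch. It is not the Archiva debugger, only an illustration of tracing rules one at a time. The two-character IDs (TI, AU, DT), the patterns, and the sample page are hypothetical and are not Ibidem's actual field IDs.

```python
import re

# Hypothetical two-character IDs and patterns, invented for illustration.
RULES = [
    ("TI", r"<title>(.*?)</title>"),
    ("AU", r'<meta name="author" content="([^"]*)"'),
    ("DT", r'<meta name="date" content="(\d{4})'),
]


def trace(text):
    """Run every rule, recording each search, its status, and any assignment."""
    steps = []
    output = {}
    for field_id, pattern in RULES:
        match = re.search(pattern, text, re.S)
        if match:
            output[field_id] = match.group(1)
            steps.append((field_id, pattern, "matched", match.group(1)))
        else:
            steps.append((field_id, pattern, "no match", None))
    return steps, output


# Invented sample text standing in for the clipboard contents.
PAGE = '<title>On Parsing</title><meta name="author" content="Jane Doe">'
steps, output = trace(PAGE)
for field_id, pattern, status, value in steps:
    print(f"{field_id}  {status:<8}  {value!r}")
print(output)
```

Seeing a "no match" status next to the pattern that produced it is usually the quickest way to tell which rule needs adjusting.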
|
|
Once you spend a little time figuring out how rules work and learning about regular expressions (there are lots of web sites with step-by-step tutorials, in addition to our wiki), you’ll soon become adept at modifying or adding rules. We wouldn’t be surprised if writing your own rules becomes completely addictive!
|
|