EXERCISE 1 Characterize the distinction between those publications that typically localize citations within a document’s text and those that collect all citations at the document’s end. How does this vary across types of publication and across topical areas?

EXERCISE 2 Imagine you are an assistant professor. You realize that you will be evaluated for tenure soon and that your citation index (the number of citations to your papers) will be a major factor in the decision. One strategy might be to find a famous person in your field and criticize a central tenet of one of his or her major papers. Even if you’re wrong, you are almost guaranteed to have many replies to your (specious) argument. Although it may be embarrassing, your citation count should go through the roof!

Estimate the research front width for your field, and compute how far ahead of time you would need to begin this strategy for it to count in your tenure decision.

EXERCISE 3 Construct a Boolean function SameName(Name1, Name2) that returns 1 if the two people’s proper names should be treated as equivalent references and 0 otherwise.

First, try constructing your best manual solutions.

Next, try to learn this function. Ideally it should be trained with pairs of names for which you have made the correspondences yourself (as well as “negative” examples of nonmatching names). See what happens if you use your manually constructed solutions as a training signal.

Finally, extend the function to a BestName(Name1, Name2) that still returns 0 if the names are not the same, but otherwise returns a merged Name1+2 that combines the best features of both (e.g., if one has a full rather than abbreviated first name and the other has a middle initial).
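As a starting point for the manual solution, here is a minimal Python sketch of one possible baseline. It is not a definitive answer to the exercise. The helper names (`parse_name`, `tokens_agree`) and the matching rules are illustrative assumptions: family names must match exactly, an initial matches any given name with the same first letter, and a missing given or middle name is treated as compatible. `same_name` and `best_name` stand in for SameName and BestName.

```python
import re
from itertools import zip_longest


def parse_name(name):
    """Split a name into (given_tokens, family_name), lowercased.

    Assumes either "First Middle Last" or "Last, First Middle" order.
    """
    def clean(token):
        # Drop punctuation, digits, and whitespace; keep letters only.
        return re.sub(r"[\W\d_]", "", token.lower())

    if "," in name:
        last, rest = name.split(",", 1)
        given = rest.split()
    else:
        parts = name.split()
        last, given = (parts[-1], parts[:-1]) if parts else ("", [])
    given = [g for g in (clean(t) for t in given) if g]
    return given, clean(last)


def tokens_agree(a, b):
    """An initial matches any name with the same first letter."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def same_name(name1, name2):
    """Return 1 if the two names plausibly refer to one person, else 0."""
    given1, last1 = parse_name(name1)
    given2, last2 = parse_name(name2)
    if not last1 or last1 != last2:
        return 0
    # zip stops at the shorter list, so a missing middle name is allowed.
    return int(all(tokens_agree(a, b) for a, b in zip(given1, given2)))


def best_name(name1, name2):
    """Return 0 for nonmatching names, else a merged, fuller name."""
    if not same_name(name1, name2):
        return 0
    given1, last = parse_name(name1)
    given2, _ = parse_name(name2)
    # At each position keep the longer (more informative) token.
    merged = [max(a, b, key=len)
              for a, b in zip_longest(given1, given2, fillvalue="")]

    def show(token):
        return token.upper() + "." if len(token) == 1 else token.capitalize()

    return " ".join(show(t) for t in merged + [last])
```

Under these assumptions, `same_name("Salton, Gerard", "G. Salton")` returns 1 and `best_name("Gerard Salton", "G. A. Salton")` returns "Gerard A. Salton". The rules are deliberately lenient, so "G. Salton" will also match "George Salton"; cases like this are where labeled training pairs become useful.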

EXERCISE 4 Repeat Exercise 3 but with bibliographic citations rather than people’s names. (Note, however, the dependence of the bibliographic solution on the SameName(Name1, Name2) function.)

EXERCISE 5 One estimate (offered by someone familiar with the industry) of the value of West’s page numbers over legal citations without them was $400M. Use this estimate, together with an estimate of the average case length in pages, to give a per-page estimate of the additional value of this higher precision.

EXERCISE 6 A good test of coverage is provided by a sample of papers in the proceedings of the major AI conference, AAAI. Using the data set in the file AAAI94-submissions.t, determine what fraction of current AI work can be traced back to the 1954 founding fathers.

EXERCISE 7 Names like JOHNSON purportedly began as names to describe sons of John. Suggest rules like those used in Porter’s stemmer to exploit systematic variations in family names such as this.
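To make the idea concrete, here is a hedged Python sketch of what such rules might look like. The affix lists and the function name `stem_family_name` are illustrative assumptions, not a complete or authoritative rule set. In the spirit of Porter’s condition on the remaining stem, an affix is stripped only if at least `min_stem` letters are left.

```python
# Illustrative patronymic affixes (assumed for this sketch, not exhaustive).
PREFIXES = ("fitz", "mac", "mc", "o'")
SUFFIXES = ("son", "sen")


def stem_family_name(name, min_stem=3):
    """Strip one patronymic affix, leaving at least min_stem letters."""
    # Lowercase and normalize a typographic apostrophe to a plain one.
    word = name.lower().replace("\u2019", "'")
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            return word[len(prefix):]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word
```

With these rules `stem_family_name("JOHNSON")` gives "john" and `stem_family_name("MacDonald")` gives "donald", while "Mason" is left alone because only two letters would remain. A fuller answer would need to consider spelling variants and names where the affix is not patronymic at all.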