# Cost function

### Similar concepts

- Document clustering
- Generality
- Term clustering
- Term
- E measure
- Relevance
- Data retrieval systems
- Cluster based retrieval
- Clustering

### Pages with this concept

Page 29: ...subsets differing in the extent to which they are about a word w, then the distribution of w can be described by a mixture of two Poisson distributions ...here p1 is the probability of a random document belonging to one of the subsets, and x1 and x2 are the mean occurrences in the two classes ...Although Harter [31] uses "function" in his wording of this assumption, I think "measure" would have been more appropriate ...assumption 1 we can calculate the probability of relevance for any document from one of these classes ...that is used to make the decision whether to assign an index term w that occurs k times in a document ...Finally, although tests have shown that this model assigns sensible index terms, it has not been tested from the point of view of its effectiveness in retrieval ...Discrimination and/or representation: there are two conflicting ways of looking at the problem of characterising documents for retrieval ...
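The 2-Poisson mixture described in the snippet above can be sketched directly; p1, x1, and x2 are the symbols from the text, while the function name is illustrative:

```python
from math import exp, factorial

def two_poisson(k, p1, x1, x2):
    # Probability that a word occurs k times in a random document under the
    # 2-Poisson model: a mixture of two Poisson distributions, where p1 is
    # the probability of a random document belonging to the first subset and
    # x1, x2 are the mean occurrences in the two classes.
    poisson = lambda mean: exp(-mean) * mean ** k / factorial(k)
    return p1 * poisson(x1) + (1 - p1) * poisson(x2)
```

Since each component is a proper Poisson distribution, the mixture probabilities over all k sum to 1.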
Page 117: ...D1 and D2 can be shown to be equivalent under certain conditions ...[P(x|w1)P(w1) > P(x|w2)P(w2) -> x is relevant, x is non-relevant] (D3) Notice that P(x) has disappeared from the equation, since it does not affect the outcome of the decision ...[R(w1|x) < R(w2|x)] <=> [(l21 - l11) P(x|w1)P(w1) > (l12 - l22) P(x|w2)P(w2)] When a special loss function is chosen, namely one which assigns no loss to a correct decision (quite reasonable) and unit loss to any error (not so reasonable), then we have [R(w1|x) < R(w2|x)] <=> [P(x|w1)P(w1) > P(x|w2)P(w2)], which shows the equivalence of D2 and D3, and hence of D1 and D2, under a binary loss function ...This completes the derivation of the decision rule to be used to decide relevance or non-relevance, or, to put it differently, to retrieve or not to retrieve ...Form of retrieval function: the previous section was rather abstract and left the connection of the various probabilities with IR rather open ...
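The loss-weighted form of the rule in the snippet above can be sketched as follows (the function name and argument names are assumptions; lij is the loss of deciding wi when wj is the true class):

```python
def retrieve(px_w1, px_w2, pw1, pw2, l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    # Decide "relevant" when (l21 - l11) P(x|w1) P(w1) > (l12 - l22) P(x|w2) P(w2).
    # With the binary loss used as defaults (no loss for a correct decision,
    # unit loss for any error), this reduces to rule D3:
    #     P(x|w1) P(w1) > P(x|w2) P(w2).
    return (l21 - l11) * px_w1 * pw1 > (l12 - l22) * px_w2 * pw2
```

Note that P(x) appears nowhere: as the snippet says, it cancels out and does not affect the decision.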
Page 116: ...The decision rule we use is in fact well known as Bayes' Decision Rule ...[P(w1|x) > P(w2|x) -> x is relevant, x is non-relevant] (D1) The expression D1 is a shorthand notation for the following: compare P(w1|x) with P(w2|x); if the first is greater than the second, then decide that x is relevant, otherwise decide that x is non-relevant ...The meaning of [E -> p, q] is that if E is true then decide p, otherwise decide q ...In other words, once we have decided one way ...This sum will be minimised by making P(error|x) as small as possible for each x, since P(error|x) and P(x) are always positive ...Of course, average error is not the only sensible quantity worth minimising ...R(wi|x) = li1 P(w1|x) + li2 P(w2|x), i = 1, 2. The overall risk is a sum in the same way that the average probability of error was, R(wi|x) now playing the role of P(wi|x) ...[R(w1|x) < R(w2|x) -> x is relevant, x is non-relevant] (D2)
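A minimal sketch of rules D1 and D2 from the snippet above (function names assumed; the loss matrix entry loss[i-1][j-1] corresponds to lij):

```python
def conditional_risk(i, pw1_x, pw2_x, loss):
    # R(wi|x) = li1 P(w1|x) + li2 P(w2|x), for i = 1, 2.
    return loss[i - 1][0] * pw1_x + loss[i - 1][1] * pw2_x

def d1(pw1_x, pw2_x):
    # Rule D1: x is relevant iff P(w1|x) > P(w2|x).
    return pw1_x > pw2_x

def d2(pw1_x, pw2_x, loss):
    # Rule D2: x is relevant iff R(w1|x) < R(w2|x).
    return (conditional_risk(1, pw1_x, pw2_x, loss)
            < conditional_risk(2, pw1_x, pw2_x, loss))
```

With the binary loss [[0, 1], [1, 0]], R(w1|x) = P(w2|x) and R(w2|x) = P(w1|x), so D2 gives exactly the same decisions as D1.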
Page 126: ...In general we would have two tables of this kind when setting up our function g(x): one for estimating the parameters associated with P(x|w1) and one for P(x|w2) ...The estimates shown above are examples of point estimates ...Two basic assumptions made in deriving any estimation rule through Bayesian decision theory are: (1) the form of the prior distribution on the parameter space, i.e. the probability distribution on the possible values of the binomial parameter; and (2) the form of the loss function used to measure the error made in estimating the parameter ...Once these two assumptions are made explicit by defining the form of the distribution and the loss function, then, together with Bayes' Principle, which seeks to minimise the posterior conditional expected loss given the observations, we can derive a number of different estimation rules ...where x is the number of successes in n trials, and a and b are parameters dictated by the particular combination of prior and loss
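The estimation rules the snippet refers to have the form (x + a)/(n + a + b). As a sketch (assuming, for illustration, a Beta(a, b) prior with squared-error loss, one combination that yields an estimate of this form as the posterior mean):

```python
def point_estimate(x, n, a, b):
    # Point estimate of the binomial parameter: (x + a) / (n + a + b),
    # where x is the number of successes in n trials and a, b are dictated
    # by the particular combination of prior and loss function.
    return (x + a) / (n + a + b)
```

For example, a = b = 1 (a uniform prior) gives (x + 1)/(n + 2), Laplace's rule of succession.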