IDF of a query?
 How do I calculate tf-idf for a query?  I understand how to calculate tf-idf for a set of documents with the following definitions:  
tf = occurrences in document / total words in document
idf = log(#documents / #documents where term occurs)
 But I don't understand how that correlates to queries.  
 For example, I read a resource that gave the following values for the query "life learning":  
 life |  tf = .5 |  idf = 1.405507153 |  tf_idf = 0.702753576  
 learning |  tf = .5 |  idf = 1.405507153 |  tf_idf = 0.702753576  
 The tf values I understand: each term appears once out of the two terms in the query, thus 1/2. But I have no idea where the idf comes from.  
 I would think that #documents = 1 and occurrence = 1, log(1) = 0, so idf would be 0, but this doesn't seem to be the case.  Is it based on whatever documents you're using?  How do you calculate tf-idf for a query?  
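Only the tf part depends on the query itself: tf(life) = 1/2. The idf part depends on the background documents, not on the query, so it is taken from the collection you are searching. The value in your table is consistent with a smoothed idf of the form 1 + ln(N/df), presumably over a collection of 3 documents of which 2 contain the term: idf(life) = 1 + ln(3/2) ≈ 1.4055. That is why tf-idf is defined as the product of a local component (term frequency) and a global component (inverse document frequency).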
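As a rough illustration of the point above, here is a minimal Python sketch that weights the query "life learning" using document frequencies taken from the corpus. The function name `query_tfidf`, the corpus size of 3, and the document frequency of 2 for each term are assumptions inferred from the 3/2 in the formula, not values confirmed by the resource you read.

```python
import math

def query_tfidf(query_terms, doc_freq, n_docs):
    """Weight each query term: tf from the query, idf from the corpus."""
    total = len(query_terms)
    weights = {}
    for term in set(query_terms):
        tf = query_terms.count(term) / total           # local: depends only on the query
        idf = 1 + math.log(n_docs / doc_freq[term])    # global: 1 + ln(N / df), from the corpus
        weights[term] = tf * idf
    return weights

# Assumed corpus statistics: 3 documents, each term occurring in 2 of them.
weights = query_tfidf(["life", "learning"], {"life": 2, "learning": 2}, n_docs=3)
print(weights)
```

Both terms come out at roughly 0.70, in line with the table in the question. A query term that never occurs in the corpus has no entry in `doc_freq`, so this sketch would raise a `KeyError` for it; a real system has to decide how to handle that case.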
 Assume your query is "best car insurance", your total vocabulary contains car, best, auto, and insurance, and you have N = 1,000,000 documents.  Your query would then look something like this:  

And one of your documents could be:

 Now calculate the cosine similarity between the TF-IDF vectors of your query and your document.  
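The steps above could be sketched in Python roughly as follows. Only the query, the vocabulary, and N = 1,000,000 come from this answer; the document frequencies in `DOC_FREQ` and the example document are made-up placeholders, and the idf used is the plain log(N/df) from the question.

```python
import math

VOCAB = ["car", "best", "auto", "insurance"]
N_DOCS = 1_000_000
# Hypothetical document frequencies (number of documents containing each term).
DOC_FREQ = {"car": 5_000, "best": 50_000, "auto": 10_000, "insurance": 1_000}

def tfidf_vector(tokens):
    """Build a TF-IDF vector over VOCAB for a query or a document."""
    total = len(tokens)
    return [
        (tokens.count(term) / total) * math.log(N_DOCS / DOC_FREQ[term])
        for term in VOCAB
    ]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = "best car insurance".split()
document = "car insurance auto insurance".split()  # hypothetical document

score = cosine(tfidf_vector(query), tfidf_vector(document))
print(score)
```

The query and the document go through the same `tfidf_vector` function, so the idf weights come from the corpus in both cases; the query contributes only its own term counts.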
