Tuesday, December 25, 2012

How to search in a Lucene Index

In my previous post I showed you how to index text files using Apache Lucene. In this post let me show you how to search the index that we built.
Many students and newcomers who try Lucene go straight to indexing a document and then want to search something out of that index. For example, say you create a text file named "Gangnam Style" and fill it with the lyrics; surely some of you will want to search for the term "Ooppaaaaaaaa". So, to make sure you don't get lost, here is a nice little piece of code for searching a Lucene index.

The code sample shown here uses Apache Lucene 3.4.0, which you can download from here. First, make sure you have read my previous post on How to build a Lucene Index.

So now, without further ado, let me show you the code:

//Searcher.java

package com.blogspot.computergodzilla;

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 *
 * @author Mubin Shrestha
 */
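public class Searcher
{
 // ASSUMPTION: placeholder field names. Replace both values with the
 // field names you actually used when you built the index.
 private static final String FIELD_CONTENTS = "contents";
 private static final String FILE_PATH = "filename";
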
 /**
  * Searches the index twice: once by content and once by file name.
  *
  * @param instring the query string to search for.
  */
 public void searchIndex(String instring) throws IOException, ParseException
 {
  System.out.println("Searching for '" + instring + "'");
  // "INDEX_DIRECTORY" is a placeholder: use the path of the index directory you created.
  IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File("INDEX_DIRECTORY")));
  Analyzer analyzer1 = new StandardAnalyzer(Version.LUCENE_34);
  // One parser per field: the same query string is parsed against
  // the content field and against the file name field.
  QueryParser queryParser = new QueryParser(Version.LUCENE_34, FIELD_CONTENTS, analyzer1);
  QueryParser queryParserfilename = new QueryParser(Version.LUCENE_34, FILE_PATH, analyzer1);
  Query query = queryParser.parse(instring);
  Query queryfilename = queryParserfilename.parse(instring);
  TopDocs hits = searcher.search(query, 100);
  ScoreDoc[] document = hits.scoreDocs;
  
  System.out.println("Total no of hits for content: " + hits.totalHits);
  for(int i = 0;i <document.length;i++)
  {                 
     Document doc = searcher.doc(document[i].doc);      
     String filePath = doc.get("fullpath");                                      
     System.out.println(filePath);
  }
  
  TopDocs hitfilename = searcher.search(queryfilename,100);
  ScoreDoc[] documentfilename = hitfilename.scoreDocs;
  System.out.println("Total no of hits according to file name: " + hitfilename.totalHits);
  for(int i = 0;i < documentfilename.length ; i++)
  {
   Document doc = searcher.doc(documentfilename[i].doc);
   String filename= doc.get("filename");
   System.out.println(filename);                   
  }
  // Release the index files held by the searcher.
  searcher.close();
 }
 
 public static void main(String args[]) throws IOException, ParseException
 {
  new Searcher().searchIndex("hello");  
 } 
}
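This code will only run if you have created the index with the same fields that I specified in my earlier post. FIELD_CONTENTS and FILE_PATH stand for the names of the content field and the file name field used at indexing time.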
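
To see why the field names matter, here is a tiny sketch in plain Java. It is not Lucene and does not use any Lucene class; `FieldIndexSketch`, its methods and the field names "contents" and "filename" are all made up for illustration. It only mimics the idea that terms are stored per field, so a query against a field name that was never indexed finds nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index. This is NOT Lucene, just an illustration.
class FieldIndexSketch {
    // field name -> (lower-cased term -> ids of the documents containing it)
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    // Index the text of one field of one document, splitting on non-alphanumerics.
    void add(String docId, String field, String text) {
        Map<String, Set<String>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                terms.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(docId);
            }
        }
    }

    // Look up a single term in a single field; an unknown field matches nothing.
    List<String> search(String field, String term) {
        Map<String, Set<String>> terms = postings.get(field);
        if (terms == null) {
            return new ArrayList<>();
        }
        Set<String> ids = terms.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        FieldIndexSketch index = new FieldIndexSketch();
        index.add("gangnam.txt", "contents", "Oppa Gangnam Style");
        index.add("gangnam.txt", "filename", "gangnam.txt");

        System.out.println(index.search("contents", "oppa"));    // prints [gangnam.txt]
        System.out.println(index.search("content", "oppa"));     // prints [] (field name mismatch)
        System.out.println(index.search("filename", "gangnam")); // prints [gangnam.txt]
    }
}
```

The second lookup comes back empty only because "content" is not the name that was used when the document was added. The same kind of mismatch between the searcher and the indexer is the first thing to check if the Lucene code above returns zero hits.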

In my next post I will show you how to index .pdf, .doc, .html, .xls, .ppt and other files.

4 comments:

  1. Good tutorial. I am a newbie in Java.
    I followed your previous tutorial (Indexer.java) and it was successful, but now, in Searcher.java, I have an error here:

    QueryParser queryParser = new QueryParser(Version.LUCENE_34,FIELD_CONTENTS, analyzer1);
    QueryParser queryParserfilename = new QueryParser(Version.LUCENE_34,FILE_PATH,analyzer1);

    It says that "FIELD_CONTENTS cannot be resolved to a variable". Do you know how I can solve this problem?
    Thank you

    1. Put in the name of the field that you used to index the content of the file. For example, if you used "content" as the name of the index field that stores the content, then FIELD_CONTENTS should be replaced by "content".

  2. Is queryParserfilename used to search by the name of the file?
