Re: Full text search - status?

PostPosted: Thu Jun 16, 2011 12:33 pm
by Nazar

Re: Full text search - status?

PostPosted: Thu Jun 16, 2011 6:27 pm
by ikm
No one's working on it right now. If someone wants to, that someone's welcome.

Re: Full text search - status?

PostPosted: Thu Jun 16, 2011 8:01 pm
by Tvangeste
ikm wrote: No one's working on it right now. If someone wants to, that someone's welcome.

I've been looking into full text search libraries for C++ that we could use, just in case there is something that already works.

So far, it seems that Xapian is the most interesting library (GPL licensed C++): http://xapian.org/features

Looks promising. Are you aware of any other projects that could be of use?

Re: Full text search - status?

PostPosted: Fri Jun 17, 2011 3:01 am
by ikm
Xapian indeed looks the most promising. As for the FTS implementation in GoldenDict, well, here's my proposal: I am willing to implement the dictionary crawling required to perform the FTS indexing, if someone (wink, wink!) is willing to implement the GUI for indexing and searching (using Xapian as a backend).

Re: Full text search - status?

PostPosted: Fri Jun 17, 2011 8:16 am
by Tvangeste
ikm wrote: Well, here's my proposal: I am willing to implement the dictionary crawling required to perform the FTS indexing, if someone (wink, wink!) is willing to implement the GUI for indexing and searching (using Xapian as a backend).


Whoa! Abgemacht! I mean, yeah, I'll do my best. Hopefully, my abilities with Qt will be advanced enough by then! 8-)

Re: Full text search - status?

PostPosted: Fri Jun 17, 2011 8:27 am
by ikm
Ok then. I'll post back once I'm done.

Re: Full text search - status?

PostPosted: Fri Jun 17, 2011 1:17 pm
by betwee
yay, :D.

Re: Full text search - status?

PostPosted: Sat Jun 18, 2011 7:43 am
by ikm
Ok, the crawling interface is ready. Dictionary::Class objects now have the following two functions: isCrawlingSupported() and crawl(). If the former returns true, then the latter can be used to create a Dictionary::Crawler object. From that moment on, the crawler is completely independent from the originating dictionary object, and can be used to traverse all the articles. For each article, a list of headwords (the first one is the main one, the others are alternates, if any) and a body in HTML are returned. A simple crawling example:
Code:
    Dictionary::Class & d = .....  // Obtain a dictionary instance from somewhere

    if ( d.isCrawlingSupported() )
    {
      printf( "Gonna crawl it!\n" );

      File::Class f( "/tmp/crawled.txt", "wb" );

      // Once created, the crawler is independent from the dictionary object
      sptr< Dictionary::Crawler > crw = d.crawl();

      vector< string > headwords;
      string body;
      while( crw->fetchNextArticle( headwords, body ) )
      {
        // Headwords first (the main one, then any alternates), one per line
        for ( size_t x = 0; x < headwords.size(); ++x )
        {
          string const & str = headwords[ x ];
          f.write( str.c_str(), str.size() );
          f.write( "\n", 1 );
        }
        // Then the article body in HTML
        f.write( body.c_str(), body.size() );
        f.write( "\n", 1 );
      }

      printf( "Done!\n" );
    }

This interface right now is implemented for Dsl dictionaries. Support for others will follow.

All the code lives in the 'fts' branch on GitHub. This should be sufficient to implement dictionary indexing and searching with Xapian.
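
To give a rough idea of how the indexing side could consume this interface, here is a minimal self-contained sketch. Everything in it is an assumption made for illustration: Crawler only mirrors the fetchNextArticle() contract described above (it is not the real Dictionary::Crawler), VectorCrawler is a made-up in-memory test crawler, and the std::map-based inverted index is a naive stand-in for Xapian, not Xapian's API. The tokenizer handles ASCII only and simply skips anything inside HTML tags.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Dictionary::Crawler: it only mirrors the
// fetchNextArticle() contract described in the post.
struct Crawler
{
  virtual ~Crawler() {}
  // Fills headwords (main one first) and an HTML body; returns false when done.
  virtual bool fetchNextArticle( std::vector< std::string > & headwords,
                                 std::string & body ) = 0;
};

// Made-up in-memory crawler, used only to exercise the sketch.
struct VectorCrawler: Crawler
{
  typedef std::pair< std::vector< std::string >, std::string > Article;

  std::vector< Article > articles;
  std::size_t next;

  explicit VectorCrawler( std::vector< Article > const & a ):
    articles( a ), next( 0 ) {}

  bool fetchNextArticle( std::vector< std::string > & headwords,
                         std::string & body )
  {
    if ( next >= articles.size() )
      return false;
    headwords = articles[ next ].first;
    body = articles[ next ].second;
    ++next;
    return true;
  }
};

// Returns the lowercased ASCII words of an HTML fragment, skipping tag contents.
std::vector< std::string > tokenize( std::string const & html )
{
  std::vector< std::string > words;
  std::string cur;
  bool inTag = false;
  for ( std::size_t i = 0; i < html.size(); ++i )
  {
    unsigned char c = html[ i ];
    if ( c == '<' )
      inTag = true;
    if ( !inTag && std::isalnum( c ) )
      cur += (char) std::tolower( c );
    else if ( !cur.empty() )
    {
      words.push_back( cur );
      cur.clear();
    }
    if ( c == '>' )
      inTag = false;
  }
  if ( !cur.empty() )
    words.push_back( cur );
  return words;
}

// Naive inverted index: body word -> main headwords of articles containing it.
typedef std::map< std::string, std::set< std::string > > Index;

Index buildIndex( Crawler & crw )
{
  Index index;
  std::vector< std::string > headwords;
  std::string body;
  while ( crw.fetchNextArticle( headwords, body ) )
  {
    if ( headwords.empty() )
      continue; // Nothing to attribute the article to
    std::vector< std::string > words = tokenize( body );
    for ( std::size_t i = 0; i < words.size(); ++i )
      index[ words[ i ] ].insert( headwords[ 0 ] );
  }
  return index;
}

// Looks up a single lowercased word; returns an empty set if it is unknown.
std::set< std::string > search( Index const & index, std::string const & word )
{
  Index::const_iterator it = index.find( word );
  if ( it == index.end() )
    return std::set< std::string >();
  return it->second;
}
```

With a real backend, buildIndex() would hand each body to the indexing library instead of filling a map, and search() would become a query against its database; the crawling loop itself would stay the same.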

Re: Full text search - status?

PostPosted: Mon Jun 27, 2011 6:46 pm
by Nazar
Well, it seems Konstantin has done his part as promptly as one could wish. But what comes next? We want our full-text search!!! lol.

Re: Full text search - status?

PostPosted: Mon Jun 27, 2011 7:04 pm
by Tvangeste
Nazar wrote: Well, it seems Konstantin has done his part as promptly as one could wish. But what comes next? We want our full-text search!!! lol.

Heh, Konstantin was WAAAY too fast for me. :) But no worries, once I'm done with the UI tweaks, this is the biggest priority.