Thursday, January 27, 2011

System Requirements of Lucene.NET

Can anyone with experience of implementing / running a Lucene.NET solution recommend rough system specifications for a live environment? Is it processor, memory or disk intensive? Does it only use a single processor, so that a box with multiple processors / cores wouldn't return much benefit, etc.?

This is for a greenfields development, so there is no relevant existing system to base usage estimates on. We expect the data we would be indexing to contain 200K documents (customers) holding the standard stuff like name, contact details and a couple of addresses, so each record wouldn't be too big.

Helpfully, we need to start our hardware recommendations before we really have a chance to create any test solution, and the hardware that currently exists would make any tests difficult to compare or draw conclusions from.

  • You won't get a decent answer, as it totally depends on what you do (number and complexity of queries) as well as the size of the storage.

    It is expected that the data we would be indexing would contain 200K documents (customers) and would contain the standard stuff like name, contact details and a couple of addresses - so each record wouldn't be too big.

    I would question the selection of Lucene as the proper technology here to start with. It seems to be a case of "the only tool I know of is a hammer, so I make my problem look like a nail".

    Lucene is not a generic database; it is a document full-text indexing and search system. It has serious limitations as well as serious strengths. Any non-document data (address book entries etc.) I would NOT store in something like Lucene.
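    To make the "document full-text index" distinction concrete, here is a toy inverted index in Python: each term points to the set of documents containing it, and a query intersects those sets. This is only a sketch of the underlying idea, not Lucene.NET's API; Lucene layers tokenizing analyzers, relevance scoring and on-disk index segments on top of it. The sample documents and the helper names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "Complaint about late delivery to the Leeds office",
    2: "Invoice query, delivery address changed",
    3: "Customer asked to update contact details",
}
index = build_index(docs)
print(sorted(search(index, "delivery")))       # [1, 2]
print(sorted(search(index, "late delivery")))  # [1]
```

    The index answers "which documents mention these words?" quickly, but it has no notion of a schema, joins or constraints, which is why structured records are usually better kept in a database.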

    That said, 200,000 documents sounds like an awful... mediocre size. You are probably OK with a "decent system". Disk-wise, my bet is always to use VelociRaptor drives in RAID 10 for data storage if performance becomes an issue, but even then the hard disk controller can make a huge difference.
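    One rough way to sanity-check the "mediocre size" remark is to multiply out the raw text volume. The 200,000 figure comes from the question; the per-record sizes below are assumptions, since the question only says records are small. Even at 2 KB per record the raw text comes to about 400 MB, which a modest server can hold in RAM. The on-disk index size will differ from the raw text depending on which fields are stored and indexed, so treat this as an order-of-magnitude check only.

```python
def estimate_raw_size_mb(num_docs, avg_bytes_per_doc):
    """Raw text volume in megabytes (1 MB = 1,000,000 bytes)."""
    return num_docs * avg_bytes_per_doc / 1_000_000


NUM_DOCS = 200_000  # from the question

# Assumed average record sizes in bytes; the question gives no figure.
for avg_bytes in (500, 1_000, 2_000):
    size = estimate_raw_size_mb(NUM_DOCS, avg_bytes)
    print(f"{avg_bytes:>5} bytes/record -> {size:.0f} MB raw text")
```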

    Paul Hadfield : @TomTom: Some extra info: whilst it is a greenfields project, for the first 1-2 years at least it will have to integrate with a legacy file-based storage system. This (with limitations) will be OK for indexed read access, but it will not provide the desired search functionality. That will need full-text searching, hence the decision to use something like Lucene.NET, which I believe is a good fit. Would you recommend something else, and if so, why? I'm certainly not trying to look at everything as a nail to hit with a big hammer!
    TomTom : Well, if you have textual full-text search, use Lucene. If you have DATA, not DOCUMENTS, use a database. The main problem with Lucene is that you have no control over the structure you store, which can lead to maintenance nightmares further down the line. SQL Server, for example, is better for structured data; even the XML data type there can be told to ONLY accept fields matching a specific schema (or set of schemata).
    Paul Hadfield : @TomTom: Thanks for taking the time to respond
    From TomTom
