Thursday, February 17, 2011

Best distributed filesystem for commodity linux storage farm

I have a lot of spare intel linux servers laying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for a HPC application, so high performance isn't critical. The main requirement is high availability, if one server goes offline, the data stored on it's hard drives is still available from other nodes. It must run over TCP/IP and provide standard POSIX file permissions.

I've looked at the following:

  • Lustre (http://wiki.lustre.org/index.php?title=Main_Page): Comes really close, but it doesn't provide redundancy for data on a node. You must make the data HA using RAID or DRBD. Supported by Sun and Open Source, so it should be around for a while

  • gfarm (http://datafarm.apgrid.org/): Looks like it provides the redundancy but at the cost of complexity and maintainability. Not as well supported as Lustre.

Does anyone have any experience with these or any other systems that might work?

From stackoverflow
  • check also GlusterFS

  • Have you looked at Red Hat's Global Filesystem? http://www.redhat.com/gfs

    Left Hand Networks (http://lefthandnetworks.com) has a commercial option for this too (never used, but I've heard of them).

    Javier : doesn't manage redundancy, AFAIK
  • Ceph looks to be a promising new-ish entry into the arena. The site claims it's not ready for production use yet though.

  • Have you tries the Mirror File System developed by Twin Peaks Software. http://www.TwinPeakSoft.com/

0 comments:

Post a Comment