Sunday, January 23, 2011

Very large database, only a very small portion retrieved in real time

Hi folks, I have an interesting database problem. I have a DB that is 150GB in size. My memory buffer is 8GB.

Most of my data is rarely retrieved, or is retrieved mainly by backend processes. I would very much prefer to keep it around because some features require it.

Some of it (namely some tables, and some identifiable parts of certain tables) is used very often in a user-facing manner.

How can I make sure that the latter is always being kept in memory? (there is more than enough space for these)

More info: We are on Ruby on Rails. The database is MySQL, and our tables are stored using InnoDB. We are sharding the data across 2 partitions. Because we are sharding it, we store most of our data as JSON blobs, indexing only the primary keys.

  • The best you can probably do is examine execution plans for your long-running queries and tune 1) the query and 2) the database appropriately. You could build indexes for the "identifiable parts of certain tables" to speed up queries. You could also move your more frequently used data into its own table, and the less frequently used data into another.

    Doing this with JSON blobs will be difficult because if you need access to one attribute of the JSON blob, you will have to fetch and parse the whole blob. If your JSON blobs are in a consistent format, build a real table structure to reflect that, and you'll probably 1) already have improved the performance and 2) have a much more flexible structure when you need to performance tune later.

    From Shin
  • There are a lot of options here. First, NDB is MySQL's clustering engine, which stores data in memory. NDB does have some limitations, however.

    memcached is a popular solution that is often used but it requires the application architecture to support it.

    You could have MyISAM tables that you specifically store within a RAM disk, as they are able to be relocated individually unlike with InnoDB. InnoDB's entire table space would have to be stored on the RAM disk.

    You may find the MEMORY engine better suited than my RAM disk hack, however. MEMORY tables are also more limited than those of other engines, as they can't support BLOBs, among other things. For the data to be maintained, you would have to have a wrapper script to dump and restore the data. This also introduces risk to the data, as a power loss, even with scripts, would result in data loss.

    Ultimately, you will likely benefit the most from properly tuning and optimizing your MySQL database and queries. A properly tuned MySQL database utilizes memory caching.

    There are a lot of resources available on this already, both on Serverfault and the Internet as a whole. MySQL has a document and here's a MySQL performance blog post, which are both very useful resources. Here's another post where they have a formula for calculating InnoDB memory usage.

    TomTom : useless given that the post already talks of ONLY 8gb memory in the machine, you know.
    Warner : Perhaps you should re-read my post. Is English not your native language?
    From Warner
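
Shin's point about JSON blobs can be sketched roughly as follows. This is only an illustration, not the poster's schema: Python's built-in sqlite3 stands in for MySQL/InnoDB, and the table and column names (users_blob, users_hot, email, last_login) are hypothetical. It contrasts a lookup that must parse every blob with one that reads a narrow, indexed table holding just the frequently used attributes.

```python
import json
import sqlite3

# In-memory SQLite stands in for MySQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_blob (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute(
    "CREATE TABLE users_hot (id INTEGER PRIMARY KEY, email TEXT, last_login TEXT)"
)
conn.execute("CREATE INDEX idx_users_hot_email ON users_hot (email)")

rows = [
    (1, {"email": "a@example.com", "last_login": "2011-01-20", "bio": "x" * 1000}),
    (2, {"email": "b@example.com", "last_login": "2011-01-22", "bio": "y" * 1000}),
]
for user_id, doc in rows:
    # The full document stays in the blob table for backend processes.
    conn.execute("INSERT INTO users_blob VALUES (?, ?)", (user_id, json.dumps(doc)))
    # Frequently read attributes are copied into a narrow table.
    conn.execute(
        "INSERT INTO users_hot VALUES (?, ?, ?)",
        (user_id, doc["email"], doc["last_login"]),
    )


def last_login_from_blob(email):
    """Blob-only lookup: rows are fetched and parsed until a match is found."""
    for (data,) in conn.execute("SELECT data FROM users_blob"):
        doc = json.loads(data)
        if doc["email"] == email:
            return doc["last_login"]
    return None


def last_login_from_hot(email):
    """Hot-table lookup: an indexed query that never touches the blobs."""
    row = conn.execute(
        "SELECT last_login FROM users_hot WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None
```

Both functions return the same answer, but the second reads only small rows, so the data it touches is much cheaper to keep cached than the full blobs.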

BIND split-view DNS config problem

We have two DNS servers: one external server controlled by our ISP and one internal server controlled by us. I'd like internal requests for foo.example.com to map to 192.168.100.5 and external requests to continue to map to 1.2.3.4, so I'm trying to configure a view in BIND. Unfortunately, BIND fails when I attempt to reload the configuration. I'm sure I'm missing something simple, but I can't figure out what it is.

options {
        directory "/var/cache/bind";
        forwarders {
         8.8.8.8;
         8.8.4.4;
        };
        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { any; };
};
zone "." {
        type hint;
        file "/etc/bind/db.root";
};
zone "localhost" {
        type master;
        file "/etc/bind/db.local";
};
zone "127.in-addr.arpa" {
        type master;
        file "/etc/bind/db.127";
};
zone "0.in-addr.arpa" {
        type master;
        file "/etc/bind/db.0";
};
zone "255.in-addr.arpa" {
        type master;
        file "/etc/bind/db.255";
};
view "internal" {
      zone "example.com" {
              type master;
              notify no;
              file "/etc/bind/db.example.com";
      };
};
zone "example.corp" {
        type master;
        file "/etc/bind/db.example.corp";
};
zone "100.168.192.in-addr.arpa" {
        type master;
        notify no;
        file "/etc/bind/db.192";
};

I have excluded the entries in the view for allow-recursion and recursion in an attempt to simplify the configuration. If I remove the view and just load the example.com zone directly, it works fine.

Any advice on what I might be missing?

  • First, check your logs, but I think you forgot:

    acl "lan_hosts" {
        192.168.0.0/24;             # network address of your local LAN
        127.0.0.1;              # allow loop back
    };
    view "internal" {
            match-clients { lan_hosts; };   
    [...]
    };
    
    organicveggie : Actually, match-clients is not required. From http://www.zytrax.com/books/dns/ch7/view.html, "If either or both of match-clients and match-destinations are missing they default to any (all hosts match)."
    From Dom
  • Post what named said.

    organicveggie : Huh. Didn't know about "named-checkconf" until now:
        # named-checkconf
        /etc/bind/named.conf:12: when using 'view' statements, all zones must be in views
    From urmum
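
The rule that named-checkconf reports above (once any view exists, every zone must live inside a view) can be illustrated with a small checker. This is a rough sketch only, not a replacement for named-checkconf: the function names are made up for this example, the parsing just tracks brace depth, and it assumes no comment markers or braces appear inside quoted strings.

```python
import re


def top_level_statements(conf_text):
    """Return the leading keyword of each braced statement at brace depth 0."""
    # Strip /* */, // and # comments (sketch: ignores markers inside quotes).
    text = re.sub(r"/\*.*?\*/", "", conf_text, flags=re.S)
    text = re.sub(r"(//|#).*", "", text)
    keywords, depth, current = [], 0, ""
    for ch in text:
        if ch == "{":
            if depth == 0 and current.strip():
                keywords.append(current.split()[0])
            depth += 1
            current = ""
        elif ch == "}":
            depth -= 1
            current = ""
        elif ch == ";":
            current = ""
        elif depth == 0:
            current += ch
    return keywords


def zones_outside_views(conf_text):
    """Count top-level zones in a config that also uses views (0 means OK)."""
    keywords = top_level_statements(conf_text)
    if "view" not in keywords:
        return 0  # no views at all, so top-level zones are allowed
    return keywords.count("zone")


# A cut-down version of the configuration from the question.
SAMPLE = """
options { directory "/var/cache/bind"; };
zone "." { type hint; file "/etc/bind/db.root"; };
view "internal" {
    zone "example.com" { type master; file "/etc/bind/db.example.com"; };
};
zone "example.corp" { type master; file "/etc/bind/db.example.corp"; };
"""

problem_count = zones_outside_views(SAMPLE)
```

For the cut-down sample, the root hint zone and example.corp both sit outside the view, which is the situation the error message describes. Moving every zone into a view brings the count to zero.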

Webserver concurrent connections

Where can I get statistics of concurrent connections that can be handled by Apache and IIS? Which one will serve more requests in peak times?

Thank you, Sri

  • That's hard to answer. As always, the answer to questions like this is "it depends." What type of requests? Static? Dynamic? Large? Small? Internal? External? A bit more information about your environment is needed to answer with any degree of accuracy.

    From McJeff

Where should I store the VM files (config, snapshots, vhd) on a Hyper-V server?

By default the VHDs go into “C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks” and the config files go into “C:\ProgramData\Microsoft\Windows\Hyper-V”.

Should I leave them there?

Is it ok for the VHDs to be in a “Public” folder?

  • Here is what I do:

    • I pretty much always have a RAID 10 for the Hyper-V host, 4 discs. Either Black Scorpio (lower performance) or Velociraptors.

    • 64GB base partition

    • The rest is a second partition "V:"

    • VMs live on V:.

    • Public is not OK. I mean, seriously, what for?

    aduljr : You don't want your server images on a public drive. I would keep the server images on a separate disk system of either RAID 1 or RAID 10, depending on your performance needs. Snapshots I would store on a different drive or storage server for backup and retrieval purposes. I guess the question to really ask is, what are you doing to begin with? Are you learning within a home environment or lab setup? Or is this going to be something that will go into production?
    TomTom : Well, normally the OS discs basically do nothing. After startup you can ignore them, as long as nothing other than Hyper-V runs on them. The RAID 10 saves me from having to put more discs in. Personally, I do that with a lot of setups, sometimes quite high-performance ones (32GB server, 4 cores, running SQL Server with directly mapped discs for the real data).
    From TomTom
  • In general, you'll want your VMs on a disk subsystem that is redundant and shared with every member of the Hyper-V cluster. This will almost never be C:.

How to setup Joomla CMS as a backend for iPhone app

I would like my iPhone app to get dynamic content off the net. This content should be managed using a CMS.

I have gone ahead and installed Joomla on my server and will be using the Joomla web interface to create and manage content.

I would now like the iPhone app to login to my server and fetch the content. I do not want the complete web pages for my iPhone app. Instead, I want the content in the form of XML or JSON or some serialized format so that I can use the data in a custom layout native to the app.

So I am looking for 2 things in particular:
1. How to set up HTTP-based authentication for my iPhone app to access data from my server.
2. How to access the content in a serialized format (XML, JSON etc.).

Are there plugins/extensions/components I can use to achieve this?

Any advice on how this can be achieved would be helpful.

I am completely new to setting up/using CMS.

  • I would use Osmek instead of Joomla. Osmek is a CMS that is specifically tailored to transfer JSON and XML through its API, so no hacking is involved. The API is the base of their service, and works through an HTTP request via POST data.

    srik : Although I was looking at setting up something on my own server using an existing CMS like Joomla or Drupal, the free version of Osmek serves the purpose for simple applications. Thanks.

Network latency -- how long does it take for a packet to travel halfway around the world?

Possible Duplicates:
How does geography affect network latency?
How much network latency is “typical” for east - west coast USA?

If I'm hosting an app in NY, what kind of delay can I expect for a user to get a single packet if they're in Australia, i.e. roughly the maximum distance from NY?

I'm looking for the maximum latency I'm likely to encounter on a regular basis -- if Australia's not the right destination point to consider, feel free to substitute another point.

Thanks!

Michael

  • This information is hard to come by, as you might know, because it depends on the individual endpoints and how they are routed to the destination, and partly even on how each provider's traffic is routed and prioritized over the undersea cables between the continents. It also differs wildly from one destination continent to another, as traffic for one continent is sometimes routed through another (read: undersea cables). It also seems that the segment between the fiber lines and each endpoint is what adds the latency, so it seems to be more about the customer's Internet connection than the backbone you're sitting on. Be sure to have a look at the links provided by Zypher and Ward.

    If latency is a problem, think about a content delivery network, which serves each continent from within that continent. That might help if you don't need data written to your NY server in real time.

    Cisco has a few small notes on this for VoIP that are worthwhile to read, and there is a forum thread with some user measurements. The numbers differ wildly, but never forget that users often mix up ping and latency (as in the forum posts).

    Personally I would expect about 200 milliseconds end to end just to be sure.

    What I would do is take a peer-to-peer client with latency readings (I believe Azureus has this) and have a look at connections from my destination to interesting spots on the map. That way you have real-life end-to-end latency data.

  • If you had a fiber optic cable straight from NY to Sydney, the distance latency by itself would be ~90ms. Realistically you'd be lucky to stay under 200ms.

    From Chris S
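
The distance-only figure in the last answer can be reproduced with a back-of-the-envelope calculation. The sketch below assumes a cable that follows the great circle exactly, approximate city coordinates, and a fiber refractive index of about 1.47; all three are assumptions made for this example, not measurements, and real cable routes are longer.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius
C_VACUUM_KM_S = 299_792.458     # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47   # assumed typical value for silica fiber


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def one_way_fiber_ms(distance_km):
    """Propagation delay in milliseconds for light travelling through fiber."""
    speed_km_s = C_VACUUM_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_km_s * 1000.0


# Approximate coordinates (assumed for this example).
NEW_YORK = (40.7128, -74.0060)
SYDNEY = (-33.8688, 151.2093)

distance = great_circle_km(*NEW_YORK, *SYDNEY)
one_way = one_way_fiber_ms(distance)
round_trip = 2 * one_way

print(f"distance   ~{distance:,.0f} km")
print(f"one way    ~{one_way:.0f} ms")
print(f"round trip ~{round_trip:.0f} ms")
```

Under these assumptions the one-way propagation delay comes out on the order of 80 ms, in the same ballpark as the ~90ms quoted above, and a ping would see about twice that before any routing, queuing, or last-mile delay is added.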

PHP FastCGI SAPI: Reloading PHP Configuration

I am using the PHP FastCGI SAPI in my web hosting environment to run PHP applications. To spawn FCGI processes I use the spawn-fcgi helper program. My problem is that whenever I make a change to the php.ini file, I have to kill and respawn each FastCGI server for the new configuration to take effect.

Is there a way to reload the PHP configuration (i.e. php.ini directives) without respawning each FastCGI server? I tried sending a hangup signal (i.e. kill -HUP PHPCGIPID) to the servers, but this terminates them.

  • As far as I know, PHP's FastCGI interpreter doesn't react to signals like HUP, USR1 or USR2 to reload its configuration.

    Maybe PHP-FPM could help you to achieve what you want. On the downside, it requires patching PHP on versions before 5.3.3, which was the first release to bundle it.

    From joschi
  • If the servers are spawned automatically, kill them. If they’re manually started, restart them. PHP doesn’t have the ability to reload its own configuration — and generally, killing/restarting is not a problem. Is there a reason why you can’t kill them in this instance?

    From Mo