Looking For Best Practices On Building Feed Reader / Aggregator on a Cron

I have a social networking site that is beginning to gain momentum and has a growing user base. We currently let users import their blog, Flickr, and Twitter feeds. We use the PHP library SimplePie to read the feeds, and for each feed item found we check the DB to make sure we do not already have an entry for it. If the item is new, we store it in the DB. Each feed updater runs on its own cron job: one for Twitter feeds, one for Flickr, and one for blogs.

I have noticed the site gets sluggish, most likely while the cron tasks are running. There must be a better way to do this. Any thoughts?

13.10.2009 15:24:53
2 ANSWERS
SOLUTION

The general idea is fine, I would not change that.

If you are sure that the cron tasks are causing the performance problems, then I would run them on a separate server. Having a 'batch server' run these sorts of jobs, separate from the front-end web server, is quite a common solution.

But I would not embark on any changes to improve performance without being absolutely sure what the problem is. For all I know, your database schema could just be horribly inefficient.
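One schema-level thing that is cheap to check, offered only as a sketch: if the duplicate check is a separate SELECT for every feed item, a unique index lets the database reject duplicates during the insert itself. The example below is Python with SQLite standing in for the real PHP/DB stack; the table `feed_items`, its columns, and the function names are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) feed_items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feed_items (
               feed_id INTEGER NOT NULL,
               guid    TEXT    NOT NULL,
               title   TEXT,
               UNIQUE (feed_id, guid)  -- the index doubles as the duplicate check
           )"""
    )
    return conn


def store_new_items(conn, feed_id, items):
    """Insert (guid, title) pairs and return how many rows were actually new."""
    before = conn.total_changes
    # INSERT OR IGNORE skips rows that would violate the UNIQUE constraint,
    # so no per-item SELECT is needed.
    conn.executemany(
        "INSERT OR IGNORE INTO feed_items (feed_id, guid, title) VALUES (?, ?, ?)",
        [(feed_id, guid, title) for guid, title in items],
    )
    conn.commit()
    return conn.total_changes - before
```

Whether this helps depends on what the existing schema and queries look like, which the question does not show.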

2
13.10.2009 15:30:16
Thanks. I think the cause may have been a long timeout (20 seconds) on the call (via curl) that reads each feed remotely. I think that when it got hung up on multiple feeds, it caused a network delay. Not sure, but reducing the timeout seems to have helped significantly. Thanks for the tip.
phirschybar 15.10.2009 18:05:55
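
The timeout fix described in the comment above can be sketched as follows. This is Python rather than the PHP/curl setup in the question (there the equivalent knobs would be curl's `CURLOPT_TIMEOUT` and `CURLOPT_CONNECTTIMEOUT`). The 5-second default, the function name `fetch_feed`, and the `opener` parameter are assumptions for illustration; the comment does not say what value the timeout was reduced to.

```python
import urllib.request


def fetch_feed(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the raw feed bytes, or None if the fetch fails or times out.

    The 5-second default is an assumed value, not one taken from the thread.
    The timeout applies to each blocking socket operation, not to the whole
    download.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses;
        # skip this feed for now so one slow host cannot stall the whole run.
        return None
```

A cron run would call this once per feed and move on whenever it returns `None`, retrying that feed on the next run.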

Ben James makes a good point there: you need to be 100% sure that the cron jobs are the cause. I wouldn't jump to getting a new server yet, though, not until you can no longer optimize what you already have.

What type of sluggishness do you experience?

  1. Network delay?
  2. Database delay?
  3. General page load is less responsive (front-end code?)
  4. Everything?

Do an analysis first; once you have all the variables, you will know where to optimize.
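
As a rough illustration of that kind of analysis, here is a minimal Python sketch that accumulates wall-clock time per phase of a cron run, so network time and database time can be compared directly. The class `PhaseTimer` and the phase names are hypothetical; the same idea carries over to PHP with `microtime(true)`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate elapsed wall-clock seconds under named phases."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the timed block raises.
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return the phase with the largest total, or None if nothing was timed."""
        if not self.totals:
            return None
        return max(self.totals, key=self.totals.get)
```

Wrapping the feed fetch in `with timer.phase("network"):` and the duplicate check plus insert in `with timer.phase("database"):`, then logging `timer.totals` at the end of each run, should show which of the items in the list above deserves attention first.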

1
13.10.2009 19:40:24