
Here's my understanding of the potential for a scalability problem.

EXAMPLE: twitter.com has 24,650 twitter followers. If Dave gets 24,650 followers on an RSSCloud architecture then this is what happens when he posts.

1. Dave creates a 140 char post. His blogging software sends a notice to the cloud server that he has an updated RSS feed.

2. The cloud server sends update notices to the 24,650 subscribed "listeners" to Dave's "RSSCloud-twit-stream".

NOTE: It does not send Dave's new post text, just the alert event.

3. The 24,650 listeners then do an RSS GET from Dave's blogging server. This could create a "cattle stampede" (i.e. slashdot effect), and many users may not get service when Dave's server is overrun. The server would likely be swamped within a few seconds by this massive interest in Dave's blog from these real-time subscribers.
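To make the three steps concrete, here is a toy model of the flow as I understand it. The class and method names (Publisher, CloudServer, Listener, simulate) are made up for illustration and are not part of any RSSCloud implementation; the only point is that one ping turns into one full-feed GET per listener.

```python
# Toy model of the notify-then-GET flow; every name here is hypothetical.
class Publisher:
    def __init__(self, feed_items):
        self.feed_items = list(feed_items)
        self.get_count = 0  # number of full-feed GETs served

    def get_feed(self):
        self.get_count += 1
        return list(self.feed_items)


class Listener:
    def __init__(self, publisher):
        self.publisher = publisher
        self.items = []

    def on_notify(self):
        # Step 3: the notice carries no post text, so the listener fetches the feed.
        self.items = self.publisher.get_feed()


class CloudServer:
    def __init__(self):
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def ping(self):
        # Step 2: fan the alert out to every subscribed listener.
        for listener in self.listeners:
            listener.on_notify()


def simulate(num_listeners):
    publisher = Publisher(["older post"])
    cloud = CloudServer()
    for _ in range(num_listeners):
        cloud.subscribe(Listener(publisher))
    publisher.feed_items.insert(0, "new 140 char post")  # Step 1: Dave posts
    cloud.ping()
    return publisher.get_count


requests_on_daves_server = simulate(24650)
```

In this model the number of GETs on Dave's server equals the number of listeners, and in the real world they would arrive within seconds of each other.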

At small numbers of users the architecture is effective and elegant. At very large numbers it's missing an essential optimization. Only the "new blog" text needs to be sent... maybe along with the RSSCloud event, for example.

An RSS GET will pull the whole string of recent blog posts for all 24,650 users. That is a lot of excess text that most users already have from being real-time listeners anyway.

The RSSCloud blogger's software needs to see a difference between an RSS GET for the recent blog text and an RSSCloud GET for the latest update text ONLY. That would reduce the amount of text being sent out, but it requires a change to the protocols as described, I think.
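A rough sketch of the difference, with back-of-envelope numbers. The 20-item feed, the item layout, and the function names are my assumptions, and XML overhead is ignored; nothing like latest_only exists in the protocol as described.

```python
def full_feed(items):
    """What a plain RSS GET returns: every recent item."""
    return list(items)


def latest_only(items, last_seen_id):
    """Hypothetical 'update only' GET: items newer than last_seen_id.

    Items are ordered newest first. If last_seen_id is not in the feed,
    this falls back to returning everything.
    """
    fresh = []
    for item in items:
        if item["id"] == last_seen_id:
            break
        fresh.append(item)
    return fresh


def payload_bytes(items):
    return sum(len(item["text"].encode("utf-8")) for item in items)


# Assumed feed: 20 posts of 140 characters each, newest first (ids 20..1).
feed = [{"id": n, "text": "x" * 140} for n in range(20, 0, -1)]
listeners = 24650

full_total = listeners * payload_bytes(full_feed(feed))
delta_total = listeners * payload_bytes(latest_only(feed, last_seen_id=19))
```

Under these assumptions the full-feed approach ships 20 times the post text of the delta approach for a single new post.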

Of course, I could be way off base but I'm really trying to understand the overall architecture and the "realtime" problem this is intended to resolve for us all.

NOTE: If you federate the RSSCloud servers you just make the "GET" problem even worse: more demand on the blogger's RSS feed in a few seconds. It's like a user-driven "slashdot effect". Post a 140 char message, notify the cloud, and boom... your server falls over.

I'll await corrections to my understanding.

NOTE: PubSubHubBub has an entirely different approach to the real-time optimization for bloggers. The Hub Server gets the blogger's new post text and the Hub Server forwards this delta to subscribed listeners. The Blogger's server never sees any excess traffic in or out. Of course, the PubSubHubBub service could require the resources of a Google, Amazon or Yahoo. A centralized service that could potentially have a "fail whale". Dave's RSS Cloud has a million "fail fishies".
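For contrast, a minimal sketch of the hub-push idea as I described it above. The Hub class and its methods are hypothetical and skip everything a real hub does (subscription verification, HTTP callbacks, retries); it only shows that subscribers receive the new text without calling back to the blog.

```python
class Hub:
    """Hypothetical hub that forwards new post text to subscribers."""

    def __init__(self):
        self.subscribers = []  # callables that accept the new post text
        self.deliveries = 0

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, new_post_text):
        # The hub pushes the delta itself; no subscriber fetches from the blog.
        for callback in self.subscribers:
            callback(new_post_text)
            self.deliveries += 1


inboxes = [[] for _ in range(3)]
hub = Hub()
for inbox in inboxes:
    hub.subscribe(inbox.append)

hub.publish("new 140 char post")
```

All the fan-out work lands on the hub, which is exactly the tradeoff: the blogger's server is spared, and the hub becomes the thing that has to scale.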

Life as always is rife with tradeoffs. Go figure. YMMV.



A good deal of your critique is a rehash of concerns people expressed about RSS in the first place. They generally did not come to pass because either they weren't real problems in the first place, they were easily addressed, or general advances in technology moved faster than their onset.

You are concerned about the inefficiency of fetching all the items in a feed when just one item changes. Is that a real issue? Consider your use case. How much data is really being requested? If it is a real issue, the server might want to limit the number of entries returned based on the If-Modified-Since header. As for the load of all that traffic hitting the server in the space of a few seconds, nginx can push a lot of requests on modest hardware, and the load on whatever application logic is involved in generating the feed can be knocked way down by having it cache all feed requests for a second or two.
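Here is a minimal sketch of both suggestions, assuming a feed held as (published, text) pairs. The names entries_since and TTLCache are made up for illustration. Note that trimming entries by If-Modified-Since goes beyond the header's usual role of triggering a 304 response, so treat it as an application-level choice rather than standard behavior.

```python
import time
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def entries_since(entries, if_modified_since):
    """Return only entries published after the If-Modified-Since date.

    entries: list of (timezone-aware datetime, text) pairs.
    if_modified_since: HTTP date string, or None to get everything.
    """
    if not if_modified_since:
        return list(entries)
    cutoff = parsedate_to_datetime(if_modified_since)
    return [entry for entry in entries if entry[0] > cutoff]


class TTLCache:
    """Cache one rendered feed for a short time (hypothetical helper)."""

    def __init__(self, render, ttl_seconds=2.0, clock=time.monotonic):
        self.render = render      # expensive function that builds the feed
        self.ttl = ttl_seconds
        self.clock = clock
        self._value = None
        self._expires = None

    def get(self):
        now = self.clock()
        if self._expires is None or now >= self._expires:
            self._value = self.render()
            self._expires = now + self.ttl
        return self._value


# Usage: three posts an hour apart; the client last fetched at 11:00 UTC.
base = datetime(2009, 9, 9, 12, 0, 0, tzinfo=timezone.utc)
entries = [(base.replace(hour=h), f"post {h}") for h in (10, 11, 12)]
header = format_datetime(base.replace(hour=11), usegmt=True)
new_entries = entries_since(entries, header)
```

With a cache like this in front of the feed generator, a burst of thousands of requests inside the TTL window costs one render; the rest are served from memory.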



