Performance Optimisation Ideas

One of the more common concerns I have read about in the forums relates to calls to XML.php, simple_messenger, or the shoutbox links causing high load. I've done all of the recommended tweaks for site performance. To be honest, the speed is OK, but when I look at the log files I see some opportunities for improvement.

Firstly, one thing I noticed was callbacks to the above URLs coming from IP addresses that are not assigned to any logged-in user. For example: /flash/XML.php?module=im&action=updateInvite&recipient=959&_t=1472290917535

I looked at the list of online users. At the time, the only user listed as online was me. Yet I was receiving constant polls to the various common URLs to get messages for a user who was not logged in. How can that be? If a session has ended, how can Dolphin be modified to send a "go away" signal in response, so that these requests stop?

In addition, I noticed that more requests were coming in from some IP addresses than from others. There appears to be a correlation with users opening multiple tabs in their browser: each tab or window generates one request per refresh interval. Same question: could Dolphin determine a user's active tab by looking at which tab has the most recent human-generated requests, and then send a "go away" message to the other sessions?

Lastly, could Dolphin be modified to poll only once every 60 seconds by default, unless there has been chat interaction in the last 10 minutes, in which case the refresh rate would increase to allow responsive interaction during a chat session? This one improvement would significantly reduce the load from all of these requests for update messages, while still providing dynamic responsiveness during a chat session.
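
To make the idea concrete, here is a minimal Python sketch of the interval logic. The function name, the constants, and in particular the 5-second "active" interval are assumptions for illustration only; nothing here exists in Dolphin, and the real change would have to live in the client-side polling code.

```python
# Hypothetical sketch: none of these names exist in Dolphin.
IDLE_INTERVAL_S = 60        # default: poll once every 60 seconds
ACTIVE_INTERVAL_S = 5       # assumed faster rate during a live chat
ACTIVE_WINDOW_S = 10 * 60   # "chat interaction in the last 10 minutes"


def poll_interval(now_s, last_chat_activity_s):
    """Return the number of seconds to wait before the next poll.

    last_chat_activity_s is the time of the last message sent or
    received, or None if there has been no chat activity at all.
    """
    if last_chat_activity_s is None:
        return IDLE_INTERVAL_S
    if now_s - last_chat_activity_s <= ACTIVE_WINDOW_S:
        return ACTIVE_INTERVAL_S
    return IDLE_INTERVAL_S
```

With values like these, a tab that is left open and idle would fall back to one request per minute for each polled feature, while an active chat would still feel responsive.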

 

Thanks.

Greg.

Quote · 27 Aug 2016

Requests from users who are not logged in could be coming from the Desktop app. Also, a user can set their status to offline so that they are not shown as online.

You can change how often to check for new messages:

- Flash IM: Admin Panel > Modules > Flash Apps > Messages > Settings > Update Interval

- Simple messenger: Admin Panel > Modules > Simple Messenger > Simple Messenger Update Time

Rules → http://www.boonex.com/terms
Quote · 28 Aug 2016

Thanks for the reply, Alex.

1. We're not using the desktop app, just the website, but I'm still seeing a fair bit of traffic from these users. I'll do some more research on the topic of "hiding online status".

2. I'm fully aware of the ability to change those settings; indeed, I have already tweaked them. But there is more to be done. What I was asking for as a feature request is a back-off algorithm: if no message has been sent or received in the last X minutes, change the update time to Y.

If you guys implement this, it will dramatically reduce traffic and server load, and improve performance.

Regards,

Greg.

Quote · 28 Aug 2016

I had a look today at the log data. One IP has 15608 unique requests today. Every single one of those requests was to either /flash/XML.php, simple_messenger, or the shoutbox.

The user was NOT showing as online. The user's profile has "Public" selected in the "who can see you" setting.

I took the next step. I had a look at the top 2 users that were generating the most traffic. I then looked at all of their requests, and not one of them was a human-initiated request in the last 24 hours. Each one of these users was a "zombie" that was not logged in. The only requests were to /flash/XML.php, simple_messenger, the shoutbox, etc.

Overall, these two users were generating 340 requests per minute, or 5.6 requests per second each on average. Why so much? No, I don't have crazy refresh values. In my testing, I can replicate this by opening multiple tabs or browser windows to my site.

But. Back to the point. People have complained about Dolphin performance for years. The cause of the problem is right here. Zombies.

I wrote a quick script a few moments ago that looked at the Apache logs and found all the zombies that are not interacting, and I added iptables entries to block those IPs. My CPU load dropped by 90% immediately. I unblocked the IPs again. Load went back up.

This is not really complicated. We're talking about a design flaw.

  • When a user stops interacting (and we only have this zombie traffic), we need the client to have a "back-off" strategy to reduce the number of requests per second.
  • When we get a request to one of these URLs to update data for a user who is not logged in, we need to send a "kill switch" response that instructs the Flash app etc. to stop requesting updates.
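
As a rough illustration of those two points together, here is a Python sketch of how a polling client could behave. Everything in it is hypothetical: the "stop" status value, the PollState class and the interval values are assumptions for the sake of the example, not part of Dolphin or its Flash apps.

```python
from dataclasses import dataclass

MIN_INTERVAL_S = 5     # assumed interval while the user is active
MAX_INTERVAL_S = 300   # assumed ceiling for an idle client


@dataclass
class PollState:
    interval_s: int = MIN_INTERVAL_S
    stopped: bool = False


def next_state(state, response_status, had_new_data):
    """Update the polling state after one response from the server.

    response_status: "ok", or the hypothetical kill-switch value "stop",
    which the server would send for a user who is not logged in.
    had_new_data: True if the poll returned a message or other activity.
    """
    if state.stopped or response_status == "stop":
        # Kill switch: once stopped, the client stays stopped.
        return PollState(interval_s=state.interval_s, stopped=True)
    if had_new_data:
        # Activity: go back to the fast interval.
        return PollState(interval_s=MIN_INTERVAL_S)
    # Nothing happened: back off by doubling, up to the ceiling.
    return PollState(interval_s=min(state.interval_s * 2, MAX_INTERVAL_S))
```

With these assumed values, an idle client reaches the 300-second ceiling after six consecutive empty polls, and a single "stop" response ends polling altogether.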

By fixing these two bugs, we can once and for all improve the performance of a Boonex site without disabling all of the cool features.

Greg.

Quote · 29 Aug 2016

Team.

Some more concrete stats from today. I was noticing some heavy load. I wrote a quick shell script that I call "zombiekiller.sh". What it does is look through an Apache log file for any IPs that are hammering only the callback URLs, with no human traffic. It then adds an iptables rule for those IPs to block new traffic.
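
The detection step described above could look roughly like the following Python sketch. This is not the zombiekiller.sh script itself. The path fragments in POLL_MARKERS, the min_requests threshold, and the assumption that the log uses the Apache common or combined format are all illustrative and would need adjusting to your own site. It only reports the IPs and leaves the iptables step out.

```python
import re
from collections import defaultdict

# Assumed substrings that identify polling endpoints; adjust to your site.
POLL_MARKERS = ("/flash/XML.php", "simple_messenger", "shoutbox")

# Matches the start of an Apache common/combined log line:
# client IP, two identity fields, [timestamp], then "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def find_zombies(lines, min_requests=100):
    """Return IPs that made at least min_requests requests, all of them polls."""
    polls = defaultdict(int)
    others = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines we cannot parse
        ip, path = match.groups()
        if any(marker in path for marker in POLL_MARKERS):
            polls[ip] += 1
        else:
            others[ip] += 1
    return sorted(
        ip for ip, count in polls.items()
        if count >= min_requests and others[ip] == 0
    )
```

To try it, pass an open log file to find_zombies(); any request outside the polling endpoints (a page view, an image, a stylesheet) counts as human traffic and takes that IP off the list.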

By blocking the zombies, my server load went from 50 down to 0.5 ... Instantly.

To make sure my data was accurate, I unblocked and then re-blocked those IPs. The load tracks the traffic from the calls to XML.php etc.

But this is not a solution! Those IPs belong to valid users; it's just that they've left their web browsers open (with multiple tabs), all polling for IM updates. If I leave them blocked, they won't ever be able to use my site again. This was just a proof of concept.

And before anyone asks: I have already followed all of the optimisation guides out there. The server is tuned to the hilt. Even on a well-oiled machine, zombie traffic like this will cause pain. My site has only been running Boonex for a month, and I've gained 1105 users since activating Boonex. If this type of traffic load does not stop, my only options are to turn off all of the messaging components or pick another platform.

Help me out here guys. I've posted some suggestions. Give me some debate. Tell me I'm wrong, or help implement a solution that will make your platform viable in the long term.

Quote · 4 Sep 2016

Thank you for the investigation and optimization ideas:

https://github.com/boonex/dolphin.pro/issues/478

Quote · 4 Sep 2016

Thanks Alex.

FYI, I have more data today that may help quantify the size of this problem. I tweaked my zombiekiller script to instead just report statistics on the zombies on my server today, including looking up the username with a query against sys_ip_members_visits and Profiles. What was interesting, and what I did not anticipate, was that some of the zombie IPs had NEVER been registered in sys_ip_members_visits. That struck me as unusual. I did some more digging and got the user id from the URL. That user had connected with a different IP and left their browser open for days. It had been open for so long that their ISP session had disconnected and come back with another IP address.
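
Pulling the user id out of a logged request is straightforward. The sketch below uses Python's standard urllib.parse on the example URL from earlier in this thread. Treating the recipient parameter as the id of the polling user is an assumption based on that one URL; other endpoints may name it differently.

```python
from urllib.parse import urlsplit, parse_qs


def recipient_id(request_path):
    """Return the `recipient` query parameter as an int, or None if absent."""
    query = parse_qs(urlsplit(request_path).query)
    values = query.get("recipient")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


url = "/flash/XML.php?module=im&action=updateInvite&recipient=959&_t=1472290917535"
print(recipient_id(url))  # prints 959
```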

Anyway, I then went to look at what percentage of overall web traffic was coming from zombie users who had left their browsers open. I examined log files over an 8 hour period. During that time, 83% of all requests to my web server were from IP addresses that ONLY connected to AJAX URLs during that 8 hour period. No human interaction detected. And that's after already reducing the refresh intervals on those features.
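
For anyone who wants to measure the same ratio on their own site, the calculation can be sketched in a few lines of Python. The input format here, one (ip, is_poll) pair per logged request, is an assumption made to keep the example short; classifying a request as a poll is left to whatever URL matching suits your installation.

```python
from collections import Counter


def zombie_share(requests):
    """Return the fraction of requests made by poll-only IPs.

    requests: iterable of (ip, is_poll) pairs, one per logged request.
    An IP is treated as a zombie if none of its requests were non-poll.
    """
    total = Counter()
    human = set()
    for ip, is_poll in requests:
        total[ip] += 1
        if not is_poll:
            human.add(ip)
    all_requests = sum(total.values())
    if all_requests == 0:
        return 0.0
    zombie_requests = sum(n for ip, n in total.items() if ip not in human)
    return zombie_requests / all_requests
```

Note that an IP with even a single non-poll request has all of its traffic counted as human, so this measure tends to understate the polling load rather than overstate it.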

Not to put too fine a point on this, but I strongly believe this is the single biggest issue impacting your product today. Since joining the Dolphin world, I have seen many people complaining in the forum about speed and performance when they have enabled chat and/or messaging on their sites. I think we now know why. It is a basic design flaw that requires immediate and urgent attention. The previous answers were to reduce refresh times, tune the db, add cache solutions, or turn off those features. Those are possible short-term workarounds, but they are not scalable for a large Dolphin site unless you spend a truckload on your hardware stack.

I urge you to prioritise this for immediate development effort.

Regards,

Greg.

Quote · 5 Sep 2016

If it's a browser being left open, then this will deal with it. I put this in the market 3 years ago to deal with idle members who left their browsers open.

https://www.boonex.com/m/auto-logout-idle-members

Of course, this will not deal with all of the issues, such as the desktop app, which I do not use or provide a download for because that app chews up too many resources.

https://www.deanbassett.com
Quote · 5 Sep 2016

I'm not sure that it will deal with it, Deano. I checked, and the users that were polling were already not listed as online, yet they continued to poll.

Also, the shoutbox and chat do not require online presence. Only simple messenger should need a user to be online. But even then, after the user is already logged off according to the system, his browser keeps polling. And for the record, we've not done anything with any desktop client installs, or even let people know this is available. The site has only been open a bit over a month, so they would not have worked this stuff out yet. This is just browsers left open.

Greg.

Quote · 5 Sep 2016
 
 