Website Bandwidth Usage

Web hosting discussion, programming, and shared and dedicated servers.
9 posts Page 1 of 1
by prouty » Wed May 02, 2012 9:08 am
We were told that our Website Bandwidth Usage exceeded our quota and that the site was being shut down through the end of the month, thanks to Bandwidth Quota Protection. That's fine, but lately my wife's genealogy site appears to be attracting bots or greedy users who download the whole site and suck our bandwidth quota dry, as witnessed by the latest bandwidth usage graph:

[Image: bandwidth usage graph]

Can Sonic help us block such bots or limit access on a daily basis from a given IP?

Thank you,

-Jeff
by dja » Wed May 02, 2012 9:18 am
You can use .htaccess to block IP addresses; if it's a well-behaved bot, you may be able to just use a robots.txt file.

Blocking by IP: http://www.clockwatchers.com/htaccess_block.html
robots.txt: http://www.thesitewizard.com/archive/robotstxt.shtml
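
For instance, a minimal sketch of each (192.0.2.10 and "BadBot" are placeholders; substitute whatever actually shows up in your stats):

    # .htaccess: refuse all requests from one misbehaving address (Apache 2.2 syntax)
    order allow,deny
    deny from 192.0.2.10
    allow from all

    # robots.txt: ask one well-behaved crawler to stay out entirely
    User-agent: BadBot
    Disallow: /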

Hope this helps! cheers!
by prouty » Wed May 02, 2012 10:12 am
Thanks much for the info! I'll look into both methods.

Is there a way I can get stats on site visitors (their IPs and usage)?

TIA,

-Jeff
by kgc » Wed May 02, 2012 11:39 am
Go to the member tools at https://members.sonic.net/account/resou ... dth_usage/ and then click on the appropriate site and you'll be shown some more detailed stats. There is always the possibility of looking at the raw logs yourself as well.
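
If you do go the raw-log route, a couple of one-liners will rank visitors (a sketch, assuming the usual combined log format with the client address in the first field and bytes sent in the tenth; the log's actual name and location on Sonic may differ):

    # top requesters by hit count
    awk '{print $1}' access_log | sort | uniq -c | sort -rn | head
    # top requesters by bytes sent
    awk '{bytes[$1] += $10} END {for (ip in bytes) print bytes[ip], ip}' access_log | sort -rn | head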
Kelsey Cummings
System Architect, Sonic.net, Inc.
by prouty » Wed May 02, 2012 11:55 am
Excellent!

One IP was responsible for 77% of our bandwidth usage last month!

Many thanks, guys! ツ

-Jeff
by gp1628 » Thu May 03, 2012 4:39 am
On the robots.txt:
Generally a site DOES want to be found by search-engine bots such as Google and Bing.
However, many sites have large files of downloadable programs, or maybe a large picture gallery, or are just rather heavy on graphics or embedded music. Those can all eat up bandwidth.

If you design the site so that all of those things are in a subdirectory, then you can use robots.txt (or even .htaccess, I believe) to allow access to the main page but block access to the subdirectories with the big-bandwidth files, as sketched below. Usually you don't want anyone searching for and downloading those things directly anyway, and you definitely don't want them sucked down in one mega-collection of the whole site. This way your main page is still googlable.
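
A sketch of that robots.txt ("downloads", "gallery", and "music" are made-up directory names here):

    User-agent: *
    Disallow: /downloads/
    Disallow: /gallery/
    Disallow: /music/

Everything outside those directories, including the main page, stays crawlable by default.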

ALSO, for such large-bandwidth items: if they're not something every visitor gets sent automatically but something a person needs to click a link for, you might consider putting them somewhere else. After all, the link on your page can point anywhere, and it often doesn't have to be someplace as high quality as Sonic. Leave your main pages on Sonic for stability, and put the download stuff somewhere cheap (or even free) to link to. If that file site goes bad, you can always quickly set up another and change the link. And if you have some old computer in the closet because it won't run the latest Windows games, you can even set up a home server, if it meets your needs, and still leave the main page on Sonic.
by prouty » Thu May 03, 2012 7:13 am
Sage advice for everyone! Thank you, Gandalf!

This was the culprit: http://www.discoveryengine.com/discobot.html

I'll work on cleaning up the site this weekend. ✔
by ronks » Thu Jan 02, 2014 12:42 pm
To follow up on this thread (or I can create a new topic if more appropriate), is it possible to compose an effective robots.txt file when my "site" is of the form [http://www.sonic.net/myusername/]?

It looks to me from the specs that the robots file has to be up at the top level (that is, [http://www.sonic.net/robots.txt]). Will [http://www.sonic.net/myusername/robots.txt] work?

Or if not, how can I stop bots and search engines from using up my monthly bandwidth allocation?

(As a practical matter, in my case I don't want my JPGs and other files to be found by search engines, just people I have mentioned them to and their friends. Not that I mind it - the files are publicly readable - I just don't want to pay for large-scale robotic access.)
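
One workaround sketch, since the spec only recognizes robots.txt at the root of the host: a per-directory .htaccess, assuming Sonic's Apache allows mod_headers and mod_setenvif directives there:

    # keep files in this directory out of search results (major engines honor this header)
    Header set X-Robots-Tag "noindex, nofollow"

    # refuse known crawlers outright, which is what actually saves the bandwidth
    SetEnvIfNoCase User-Agent "Googlebot|bingbot|Slurp" is_bot
    order allow,deny
    deny from env=is_bot
    allow from all

The header only stops indexing; the deny rules are what stop the downloads themselves.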
by milbo » Thu Dec 04, 2014 3:17 am
I have also had sites mindlessly downloading large files from my Sonic-hosted web page.

Typically it's just one or two large files that are repeatedly downloaded (I can tell by examining the bandwidth resource usage logs provided by Sonic under member tools).

Sometimes you can tell from the logs who the culprit is by examining the "Top 10 sites by KBytes" table. In my case one example has been http://rerc.tongji.edu.cn. But often you can't tell who the culprit is, because the log just shows something like 90.171.97.99.

A trick that has helped me in the past is to move the files that are being downloaded to a different directory, for example, from http://www.milbo.users.sonic.net/bigfile.tar.gz to http://www.milbo.users.sonic.net/1/bigfile.tar.gz. (You also need to update any of your HTML files that link to the file.)

This fixes the problem when the site doing the downloads has a hardcoded link directly to your file: changing the file's directory breaks that link, and someone on the far end has to go and manually change it, which, since the whole process seems pretty mindless, doesn't seem to happen.
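
If you want the old URL to fail hard rather than linger as a 404, mod_alias can mark it permanently gone (a sketch, assuming Redirect is allowed in .htaccess here and that /bigfile.tar.gz is the file's URL path):

    # old location now answers 410 Gone
    Redirect gone /bigfile.tar.gz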