Tracking down web traffic

5 posts Page 1 of 1
by gedurkee » Sun Jan 04, 2015 12:14 pm
Hi: I work in search and rescue and am getting peripherally involved in advising on attempts to reopen a missing person case from two years ago. The individual disappeared in a wilderness setting. There's a very slight hope he might have visited the local library shortly before he set off on a day hike & technical climb. The assumption is that he might have searched for information on routes. As far as I know, the only IT investigation done was the local police investigator being told that the library's computer histories were wiped each evening and no further attempt was made to track down web sites visited.

After two years, it's clearly a long shot and, in fact, I'm recommending against this as a priority line of inquiry simply because of limited people available to follow up (and none with technical expertise). Still, I'd like to understand the potential here better and the slight hope some useful information/clues might be recovered.

So, for information now and in the future, what can be recovered from:
1) the individual computer used. I was thinking that if the person's web mail, say, were known we could search for that IP address and then bracket a time and computer the person was using. Does that make sense? While web history might be wiped, would caches likely be wiped as well? For future reference, how is cache information organized such that it might be traced to a user at the computer at a certain time?

2) the router of the library. Are logs kept indefinitely or only until a text cache reaches a certain size?

3) the ISP for the library. Do ISPs tend to hang on to use logs or are they dumped after x time?

What information might be recoverable from the individual web sites visited?

Many thanks!

George
by gedurkee » Sat Jan 10, 2015 11:04 am
OK. Well, a small miracle, we did recover the library server's traffic from two years ago. I was sent a 25,000 page (!) log (PDF) of all the computers in use at the library for one day of traffic, when our guy was likely there. Another miracle was finding the computer he used by searching for one of the sites we thought he might have visited. We could then identify the computer he was on and the time there (and get rid of all but about 500 of those pages...).
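
The narrowing step described above (search the log for a known site, then keep only that computer's traffic) could be sketched roughly as follows. The log format here is purely hypothetical, one request per line as "time client-address URL"; the real library log almost certainly looks different, and the names `SAMPLE_LOG`, `find_clients`, and `lines_for_client` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not the real library log):
# "<HH:MM:SS> <client address> <requested URL>", one request per line.
SAMPLE_LOG = """\
10:02:11 192.168.1.14 http://example.org/news
10:05:40 192.168.1.22 http://climbing.example.com/routes
10:07:03 192.168.1.14 http://example.org/weather
10:31:15 192.168.1.22 http://climbing.example.com/routes/north-ridge
"""


def find_clients(log_text, needle):
    """Map each client that requested a URL containing `needle`
    to the (first, last) time it did so."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip lines that do not fit the assumed format
        time, client, url = parts
        if needle in url:
            hits[client].append(time)
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    return {client: (min(times), max(times)) for client, times in hits.items()}


def lines_for_client(log_text, client):
    """Keep only the log lines that belong to one client address."""
    kept = []
    for line in log_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[1] == client:
            kept.append(line)
    return kept


if __name__ == "__main__":
    print(find_clients(SAMPLE_LOG, "climbing.example.com"))
    print(lines_for_client(SAMPLE_LOG, "192.168.1.22"))
```

With a real log, the PDF would first have to be converted to text and the line parsing adapted to whatever fields the server actually recorded.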

So, a networking question: when a web request routes to the target site, how does the return information/web page get to the individual computer on a network that requested it? Does the target computer know that information or is that only put together at the originating router and then to the requesting computer? He did visit a mountaineering site and there's a hope he was researching specific climbing routes. The problem now is how to find that on the target site.

thanks,

George
by wa2ibm » Sat Jan 10, 2015 12:16 pm
Every data packet that traverses the Internet carries both the source and target IP address, placed there by the originator of the packet. This applies no matter what the application (web browsing, email, file transfers, etc.).

As each packet passes through the network, each router along the way uses the target address to forward the packet towards that destination. When the packet arrives at its destination, the target computer recognizes its own address and processes the request. The responding packets then follow the same logic back to the requester, using the original target/source information in reverse.

Typical requests or responses are made up of multiple packets. Every packet of information is an isolated piece of data crossing the internet. It's even possible, though rare, that the packets making up a single request or response might actually take different paths from source to target. The two endpoints are responsible for reassembling those packets back into their original form for processing.
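
As a rough illustration of the points above, here is a toy model in Python. It is a sketch only: real IP and TCP headers carry far more than this, and the `Packet` class, the function names, and the addresses (taken from ranges reserved for documentation) are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # source IP address, set by the sender
    dst: str        # target IP address, used by routers for forwarding
    seq: int        # position of this piece within the whole message
    payload: bytes


def split_into_packets(src, dst, data, size):
    """Cut one message into packets that each carry both addresses."""
    return [
        Packet(src, dst, index, data[offset:offset + size])
        for index, offset in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Rebuild the message at the endpoint, whatever order packets arrived in."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


def reply_addresses(request):
    """A response reuses the request's addresses with the roles swapped."""
    return request.dst, request.src


if __name__ == "__main__":
    message = b"GET /routes HTTP/1.1"
    packets = split_into_packets("203.0.113.5", "198.51.100.7", message, 6)
    packets.reverse()  # pretend the packets arrived out of order
    print(reassemble(packets) == message)
    print(reply_addresses(packets[0]))
```

The sequence number is what lets the receiving end put the pieces back in order even if they travelled by different paths.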
by gedurkee » Sat Jan 10, 2015 12:38 pm
Ah, thanks!
by virtualmike » Sat Jan 10, 2015 11:11 pm
To add to wa2ibm's response, most likely, the library has a single external IP address and a router that manages NAT (Network Address Translation).

The computer making the original request will open a port for the response, and that port number is included in the original request. The library's router will note the IP address of the computer and the port, and then it will substitute its own IP address (i.e., the library's external IP address) and one of its own open port numbers in the request that gets sent to the remote site.

When the library's router receives the response that is directed to a specific port number, it will recognize the response as belonging to the original requesting computer. In the response packet, the router will restore the IP address and port number of the originating computer and send the packet along.
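
The bookkeeping described above can be sketched as a small lookup table. This is a simplified model under stated assumptions, not how any particular router is implemented: the `NatRouter` class, its method names, the starting port of 40000, and the addresses are all hypothetical, and a real router would also reuse a mapping for an existing connection and expire idle entries.

```python
class NatRouter:
    """Toy model of port-based network address translation."""

    def __init__(self, external_ip, first_port=40000):
        self.external_ip = external_ip
        self.next_port = first_port
        # external port -> (internal IP, internal port)
        self.table = {}

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite a request leaving the library and remember who sent it."""
        ext_port = self.next_port
        self.next_port += 1
        self.table[ext_port] = (src_ip, src_port)
        return (self.external_ip, ext_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Restore the original computer's address on a returning response."""
        internal_ip, internal_port = self.table[dst_port]
        return (src_ip, src_port, internal_ip, internal_port)


if __name__ == "__main__":
    router = NatRouter("203.0.113.5")
    # A library PC (192.168.1.22, port 51000) asks a web server for a page.
    sent = router.outbound("192.168.1.22", 51000, "198.51.100.7", 80)
    print(sent)  # what the remote site sees as the sender
    # The server replies to the router's external address and port.
    print(router.inbound("198.51.100.7", 80, sent[0], sent[1]))
```

Note that in this model the remote site only ever sees the router's external address and port; the mapping back to an individual computer exists only in the router's table.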