Floodgap Gopher Statistics Project methodology

UPDATED 1 December 2007

The aim of the GStats project is not to catalogue a complete usage and activity count for the whole of modern Gopherspace, which would be neither accurate nor reasonably possible. Instead, the GStats project uses access data from the Floodgap Public Gopher Proxy to generate usage totals. This is useful because the Proxy can access any host, just as a Gopher client can (since it *is* one), so it provides a very useful analysis of the activity of a large, anonymous user base in Gopherspace.

In the statistics, a successful connection to a valid gopher host counts as one hit. This means that accesses to hosts that were once valid but no longer are will not be counted. That may underreport interest in former sites, but it also allows us to screen out people probing the Proxy for security issues (such as typing in www.myspace.com and expecting it to act as an HTTP proxy instead).

Each month, these hits are aggregated into a total count and plotted on a rolling seven-month history for trend analysis, along with monthly pie charts (using GNUplot and ascii_chart). A count of the number of IP/port pairs accessed is also generated. This count is only indirectly comparable to the Veronica-2 count, since Veronica-2 counts host names instead and usually lags behind in indexing new hosts because of its data-massaging cycles.

From 5/07 to 9/07 inclusive, the Proxy was not configured to do traffic analysis, and these figures were generated retrospectively from the webserver log. These figures are likely to be slightly higher than usual because it was not possible in all cases to screen out proxy abuse, although an effort was made to eliminate common probing attempts from the data set. For the same reason, the host statistics for those months are also likely to be slightly inflated. However, I believe the difference is probably not large, so I have included these data sets in the rolling history and made them available.
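The counting rules above (one hit per successful connection to a valid host, monthly totals, a rolling seven-month window, and a count of distinct IP/port pairs) can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical AccessRecord format with a month, an IP address, a port, and a flag for whether the connection reached a valid gopher host; the Proxy's real log format and tooling are not described here, and the charting step (GNUplot, ascii_chart) is omitted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One Proxy access. Hypothetical record format, not the Proxy's real log."""
    month: str   # "YYYY-MM", so string order matches chronological order
    ip: str      # IP address of the target host
    port: int    # target port, usually 70
    valid: bool  # True only if the connection reached a valid gopher host


def monthly_hits(records):
    """Count one hit per successful connection to a valid host, per month."""
    return Counter(r.month for r in records if r.valid)


def distinct_hosts(records):
    """Count distinct IP/port pairs successfully accessed in each month."""
    pairs = defaultdict(set)
    for r in records:
        if r.valid:
            pairs[r.month].add((r.ip, r.port))
    return {month: len(seen) for month, seen in pairs.items()}


def rolling_history(hits, window=7):
    """Return the last `window` months as (month, hits) pairs, oldest first.

    Months with no valid hits never appear in `hits`, so they are skipped.
    """
    months = sorted(hits)[-window:]
    return [(m, hits[m]) for m in months]


# Illustrative data using documentation-only IP addresses.
records = [
    AccessRecord("2007-10", "192.0.2.1", 70, True),
    AccessRecord("2007-10", "192.0.2.1", 70, True),
    AccessRecord("2007-11", "192.0.2.2", 70, True),
    # An HTTP-style probe that never reached a gopher host is not counted.
    AccessRecord("2007-11", "198.51.100.7", 80, False),
]

if __name__ == "__main__":
    hits = monthly_hits(records)
    print(rolling_history(hits))    # [('2007-10', 2), ('2007-11', 1)]
    print(distinct_hosts(records))  # {'2007-10': 1, '2007-11': 1}
```

Keeping the validity check inside the counting functions mirrors the rule that failed or non-gopher connections never become hits, which is also what filters out most probing attempts.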
To repeat, the numbers should not be interpreted as:

- A total number of hosts in Gopherspace: they are merely a total number of IP/port pairs that were accessed through the Proxy. Veronica-2 is likely to have a more accurate count.
- A total assessment of all traffic in Gopherspace: most Gopherspace traffic actually occurs directly (at least at Floodgap), from Gopher clients or web browsers with Gopher support. Proxy traffic accounts for only a minority of access here, and the same is probably true for most other Gopher sites.

I would appreciate your comments: gopher@floodgap.com