Apache logfile parser with Spark
Extract the 11 elements from each log line:
import re

def map_log(line):
    # Primary pattern: 11 capture groups
    # (host, ident, user, extra field, timestamp, method, path, protocol,
    #  optional trailer, status code, bytes sent).
    match = re.search(r'^(\S+) (\S+) (\S+) (\S+) \[(\S+ -\d{4})\] '
                      r'"(\S+)\s*(\S+)\s*(\S+)\s*(\S+)?\s*" (\d{3}) (\S+)', line)
    if match is None:
        # Fallback for request strings that do not split into three clean tokens
        # (e.g. paths containing spaces).
        match = re.search(r'^(\S+) (\S+) (\S+) (\S+) \[(\S+ -\d{4})\] '
                          r'"(\S+)\s*(.+?)\s*([\w/\s.]+)\s(\S+)\s*(\d{3})\s*(\S+)', line)
    return match.groups()

# parse_log2 (defined elsewhere) tags each line with 1 if it parses and 0 if not;
# keep only the parseable lines, then extract their fields.
parsed_rdd = rdd.map(lambda line: parse_log2(line)) \
                .filter(lambda line: line[1] == 1) \
                .map(lambda line: line[0])
parsed_rdd2 = parsed_rdd.map(lambda line: map_log(line))
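As a quick standalone check, the 11-group regex can be exercised on a single sample line. The log line below is made-up illustrative data (the real input would come from the Apache access log), and the four leading fields are assumed to be host, ident, user, and one extra token:

```python
import re

# 11 capture groups, matching the pattern used in map_log above.
LOG_RE = re.compile(
    r'^(\S+) (\S+) (\S+) (\S+) \[(\S+ -\d{4})\] '
    r'"(\S+)\s*(\S+)\s*(\S+)\s*(\S+)?\s*" (\d{3}) (\S+)')

# Hypothetical sample line for illustration only.
sample = ('127.0.0.1 - - frank [23/Jan/2013:06:00:12 -0800] '
          '"GET /index.html HTTP/1.1" 200 2326')

groups = LOG_RE.search(sample).groups()
# groups[0] is the client host, groups[4] the timestamp,
# groups[5:8] the request method/path/protocol, groups[9] the status code.
```

Note that group 9 (the optional trailer inside the quoted request) comes back as None for a well-formed three-token request string.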
View Apache requests per minute

Run the following command to list the minutes (within a given hour) that received more than 10 requests:

grep "23/Jan/2013:06" example.com | cut -d'[' -f2 | cut -d']' -f1 | awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'

Code breakdown:
- grep "23/Jan/2013:06" example.com: keep only log lines from the 6 AM hour of 23 Jan 2013 (example.com is the log file)
- cut -d'[' -f2: drop everything before the opening bracket of the timestamp
- cut -d']' -f1: drop everything after the closing bracket, leaving just the timestamp
- awk -F: '{print $2":"$3}': split on ":" and print the hour and minute
- sort -nk1 -nk2: sort numerically by hour, then by minute
- uniq -c: count how many requests fall in each minute
- awk '{ if ($1 > 10) print $0}': print only the minutes whose count exceeds 10
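The same counting can be done in Python, which fits the Spark-based workflow above. This is a rough sketch of the shell pipeline, assuming the standard dd/Mon/yyyy:HH:MM:SS bracketed timestamp; the function name busy_minutes is made up for illustration:

```python
import re
from collections import Counter

# Extract hour and minute from the bracketed Apache timestamp.
TS = re.compile(r'\[\d{2}/\w{3}/\d{4}:(\d{2}):(\d{2}):\d{2} ')

def busy_minutes(lines, threshold=10):
    """Count requests per minute and keep minutes with more than `threshold` hits."""
    counts = Counter()
    for line in lines:
        m = TS.search(line)
        if m:
            counts[':'.join(m.groups())] += 1  # key is "HH:MM"
    return {minute: n for minute, n in sorted(counts.items()) if n > threshold}
```

Unlike the grep/cut pipeline, this version scans every hour in the file; pre-filtering the input lines (as grep does) is still worthwhile for large logs.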