By scraping various web pages I managed to gather data on players participating in the last 4 Rugby World Cups. Is there a trend in the body mass of rugby players participating in RWC tournaments?
Using the Plotly API via a Perl script (described in "Box-plot chart with plot.ly and Perl API") I can quickly plot a series of boxplots:
    # in case WebService::Plotly is not installed: cpan install WebService::Plotly
    plotly_boxplot.pl -col=5 -by=0 -title='RWC players by weight' -sep=';' rwc1999-2015.csv
The resulting boxplots and data can be viewed here.
The following Perl script reads data from a CSV file and draws a series of Box-Plots. Usage:
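    perl plotly_boxplot.pl -col=number -by=number -title=TITLE -header=Y -sep=SEP FILE

where:
    -col=number   -- number (counted from 0) of the column containing the variable to plot,
    -by=number    -- number (counted from 0) of the column containing the grouping variable,
    -header=Y     -- skip the first row of the data (default Y),
    -sep=SEP      -- column separator used in the CSV file (default ';').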
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WebService::Plotly;
    use Getopt::Long;

    # login to plotly script; it is expected to set $plotly_user and $plotly_key
    our ($plotly_user, $plotly_key);
    require "$ENV{'HOME'}/bin/login2plotly.pl";

    my $plotly = WebService::Plotly->new( un => $plotly_user, key => $plotly_key );

    my $sep_sequence  = ';';
    my $col_number    = -1;
    my $by_col_number = -1;
    my $chart_title   = '??Chart title??';
    my $header        = 'Y';
    #my $boxpoints='outliers'; ## or all' | 'outliers' | False

    my $USAGE = "*** USAGE: -col=i -by=i -title=s -header=s -sep=s FILE *** \n";

    # plot values from column 'col' grouped by column 'by'. If header is Y skip first row in data.
    # Add title 'title'. Columns in csv data are separated by 'sep' (default ';')
    GetOptions(
        "col=i"    => \$col_number,
        "by=i"     => \$by_col_number,
        "title=s"  => \$chart_title,
        'header=s' => \$header,
        'sep=s'    => \$sep_sequence,
    );
    ##'boxpoints=s' => \$boxpoints ) ; ## this option not work!

    # both column numbers are mandatory: print usage and stop
    if ( ($col_number == -1) || ($by_col_number == -1) ) {
        print STDERR $USAGE;
        exit 1;
    }

    my $nr = 0;   # number of rows read
    my %data;     # group name => reference to array of values

    while (<>) {
        chomp;
        $nr++;
        # skip the first row if it is a header
        if ( ($nr < 2) && ($header eq 'Y') ) { next }
        s/"//g;
        my @fields = split(/$sep_sequence/, $_);
        # hash of arrays, see:
        # http://stackoverflow.com/questions/3779213/how-do-i-push-a-value-onto-a-perl-hash-of-arrays
        push @{ $data{ $fields[$by_col_number] } }, $fields[$col_number];
    }

    my @variants = sort keys %data;

    print STDERR "*** No of rows scanned: $nr ***\n";
    print STDERR "*** Groups found: @variants \n";

    # one box per group, in sorted order
    my @boxes;
    for my $k (@variants) {
        push @boxes, {
            y    => $data{$k},
            type => 'box',
            #'boxpoints' => 'none',
            name => "$k",
        };
    }

    my $layout = { 'title' => $chart_title };

    my $response = $plotly->plot( \@boxes, layout => $layout );

    my $url      = $response->{url};
    my $filename = $response->{filename};

    print STDERR "*** done: filename: '$filename' url: '$url' ***\n";
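The grouping step in the script can be tried without a Plotly account. Below is a minimal sketch of it: rows are collected into a hash of arrays keyed by the grouping column, and the median of each group (the centre line of a box) is printed. The sample rows and the helper names `group_by_column` and `median` are invented for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows (year;weight in kg), invented for illustration only.
my @rows = (
    'year;weight',
    '1999;98',
    '1999;104',
    '2015;101',
    '2015;110',
    '2015;95',
);

# Collect the values of column $col into a hash of arrays keyed by column $by.
# The first row is treated as a header and skipped.
sub group_by_column {
    my ($rows, $by, $col, $sep) = @_;
    my %groups;
    for my $i (1 .. $#{$rows}) {
        my @fields = split /\Q$sep\E/, $rows->[$i];
        push @{ $groups{ $fields[$by] } }, $fields[$col];
    }
    return \%groups;
}

# Median of a list of numbers.
sub median {
    my @s = sort { $a <=> $b } @_;
    my $n = @s;
    return $n % 2 ? $s[ ($n - 1) / 2 ] : ( $s[ $n / 2 - 1 ] + $s[ $n / 2 ] ) / 2;
}

my $groups = group_by_column(\@rows, 0, 1, ';');
for my $key (sort keys %$groups) {
    my @values = @{ $groups->{$key} };
    printf "%s: n=%d median=%s\n", $key, scalar @values, median(@values);
}
```

Unlike the script above, this sketch quotes the separator with `\Q...\E`, so a separator such as `|` is taken literally rather than as a regular expression.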
Example: Age of Nobel Prize winners by discipline (the grouping variable): plot.ly/~tomasz.przechlewski/28/
Average monthly outdoor temperature for Sopot, Abrahama in 2010--2013 (recorded with a DS18B20 sensor). The average monthly temperature is computed as $t_a = \frac{1}{n}\sum_{i=1}^{n} t_i$, where $t_i$ denotes the $i$-th measurement recorded in a given month and $n$ is the number of measurements in that month (the temperature is recorded every hour BTW).
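The averaging can be sketched in a few lines of Perl. This is only an illustration under assumed conditions: the input format (`YYYY-MM-DD HH:MM;temperature`), the sample readings and the function name `monthly_means` are all hypothetical, not the format actually produced by the sensor logger.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical hourly readings, invented for illustration only.
my @readings = (
    '2013-11-30 23:00;1.0',
    '2013-12-01 00:00;4.5',
    '2013-12-01 01:00;3.5',
);

# Return a hash reference: "YYYY-MM" => mean of all readings in that month.
sub monthly_means {
    my @lines = @_;
    my (%sum, %count);
    for my $line (@lines) {
        my ($stamp, $temp) = split /;/, $line;
        my $month = substr($stamp, 0, 7);   # "YYYY-MM" part of the timestamp
        $sum{$month}   += $temp;
        $count{$month} += 1;
    }
    my %mean;
    $mean{$_} = $sum{$_} / $count{$_} for keys %sum;
    return \%mean;
}

my $means = monthly_means(@readings);
printf "%s %.2f\n", $_, $means->{$_} for sort keys %$means;
```

Keeping a running sum and count per month avoids storing all hourly values, which matters little here but keeps the sketch close to the formula.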
Click to see interactive version and data at plot.ly
The data shows (among other things:-) that December 2013 was really hot.