
Novinky Polls

Download opinion polls from Novinky.cz, a Czech news site, and render them as HTML, TeX, or plain text.

Installation

This software requires Python 3. See Python's website for installation instructions.

Once you have Python 3 installed, install the required packages with pip (Python's package manager):

pip install requests
pip install beautifulsoup4
pip install pystache
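To confirm that the three packages installed correctly, a small check like the following can be run. It is a sketch and not part of this project. Note that beautifulsoup4 is imported under the module name bs4.

```python
# Check that the dependencies listed in this README can be imported.
# Maps pip package name -> module name used in `import`.
import importlib.util

REQUIRED_MODULES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pystache": "pystache",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All required packages are installed.")
```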

Then you can call the executables:

./novinky-polls-add-archive-org -h
./novinky-polls-add-current -h
./novinky-polls-render-html -h
./novinky-polls-render-print -h
./novinky-polls-render-text -h
./novinky-polls-analyze -h

Or you can install this software as a Python package, which will also install all the dependencies and make the executables available globally:

python setup.py install

novinky-polls-add-archive-org -h
novinky-polls-add-current -h
novinky-polls-render-html -h
novinky-polls-render-print -h
novinky-polls-render-text -h
novinky-polls-analyze -h

Usage

Each Novinky.cz poll is identified by a unique ID. The poll data itself (title, answers, and percentages) is accessible through a public JSON API (for an example HTTP request that retrieves this data, see novinky_polls/test/curl_one_poll_json.sh). However, the poll data does not say when (date and time) the poll was shown on the Novinky.cz homepage. This information therefore has to come from elsewhere: either from Archive.org or by checking the current Novinky.cz homepage and saving the current timestamp.
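Conceptually, the JSON API tells you what a poll contained and the timestamp map tells you when it was on the homepage. The sketch below joins the two to get each poll's first and last sighting. It is an illustration, not this project's code: the (timestamp, poll_id) pairs mimic the map file format, while the poll dictionaries and the first_seen/last_seen keys are hypothetical.

```python
# Join poll data (keyed by poll ID) with (timestamp, poll_id) map entries.
# Shapes and key names here are assumptions for illustration only.


def presence_intervals(map_entries):
    """Return {poll_id: (first_timestamp, last_timestamp)}.

    map_entries is an iterable of (timestamp, poll_id) pairs; poll_id is
    None when no poll was on the homepage at that timestamp. Timestamps in
    YYYYMMDDhhmmss form sort chronologically as plain strings.
    """
    intervals = {}
    for timestamp, poll_id in sorted(map_entries, key=lambda entry: entry[0]):
        if poll_id is None:
            continue  # no poll on the homepage at this moment
        first, _ = intervals.get(poll_id, (timestamp, timestamp))
        intervals[poll_id] = (first, timestamp)
    return intervals


def attach_dates(polls_by_id, map_entries):
    """Add hypothetical first_seen/last_seen keys to each poll's data dict.

    Polls that never appear in the map are left out.
    """
    intervals = presence_intervals(map_entries)
    return {
        poll_id: {**poll, "first_seen": intervals[poll_id][0],
                  "last_seen": intervals[poll_id][1]}
        for poll_id, poll in polls_by_id.items()
        if poll_id in intervals
    }
```

Sorting by timestamp only (not by the whole pair) avoids comparing None with a poll ID when two entries share a timestamp.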

This software works in two phases:

1. Create a map file that maps timestamps to poll IDs

Add poll IDs and timestamps archived by Archive.org to a map file:

novinky-polls-add-archive-org -i my_polls_map.txt
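For the curious, one public way to list Archive.org snapshot timestamps of a page is the Wayback Machine CDX API. The sketch below is only an illustration; it is not necessarily how novinky-polls-add-archive-org works internally, and each archived page would still have to be parsed for the poll ID. The default url and year range are arbitrary example values.

```python
# List Wayback Machine capture timestamps via the CDX API.
# Not this project's implementation; endpoint use is illustrative.

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_cdx_rows(url="novinky.cz", year_from="2015", year_to="2015"):
    """Query the CDX API (needs network access and `requests`)."""
    import requests  # imported here so the parser below works without it

    response = requests.get(
        CDX_ENDPOINT,
        params={"url": url, "output": "json", "from": year_from, "to": year_to},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


def snapshot_timestamps(cdx_rows):
    """Extract 14-digit YYYYMMDDhhmmss timestamps from CDX JSON rows.

    With output=json, the first row holds the field names and every
    following row describes one capture.
    """
    if not cdx_rows:
        return []
    header, *captures = cdx_rows
    column = header.index("timestamp")
    return [row[column] for row in captures]
```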

Add the current poll ID with the current timestamp to a map file:

novinky-polls-add-current -i my_polls_map.txt
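The append step amounts to writing one "timestamp poll_id" line. The following is a minimal sketch in the map file format shown in this README, not the tool's actual code. It assumes UTC timestamps in YYYYMMDDhhmmss form and a poll ID (or None) that has already been read from the homepage.

```python
# Append one "timestamp poll_id" line to a map file.
# Assumption: timestamps are UTC in YYYYMMDDhhmmss form.
from datetime import datetime, timezone


def format_timestamp(moment):
    """Format a datetime as a 14-digit YYYYMMDDhhmmss string."""
    return moment.strftime("%Y%m%d%H%M%S")


def append_map_entry(path, poll_id, moment=None):
    """Append a line such as '20150102202406 13677' (or '... None')."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    line = f"{format_timestamp(moment)} {poll_id}\n"  # None becomes "None"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line)
    return line
```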

The map file my_polls_map.txt now contains one timestamp (YYYYMMDDhhmmss) and poll ID per line:

20150101190335 13678
20150101190338 13678
20150102153411 None
20150102202406 13677
20150102202408 13677
...

The value None means that there was no poll on the homepage at that timestamp.
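If you want to process a map file yourself, a minimal reader sketch follows. It is not this project's parser (see analyzer.py for that). It assumes numeric poll IDs and treats timestamps as naive datetimes, since this README does not state a time zone.

```python
# Parse map-file lines ("timestamp poll_id") into (datetime, poll_id) pairs.
# Assumptions: poll IDs are integers; timestamps carry no time zone.
from datetime import datetime


def parse_map_lines(lines):
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        timestamp, poll_id = line.split()
        moment = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        # The literal "None" means no poll was on the homepage.
        entries.append((moment, None if poll_id == "None" else int(poll_id)))
    return entries


def read_map_file(path):
    with open(path, encoding="utf-8") as handle:
        return parse_map_lines(handle)
```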

2. Download and render the polls

Once you have the map file, you can download the polls and render them in various formats.

HTML

novinky-polls-render-html -c my_cache_dir -i my_polls_map.txt -o my_polls_export.html -l cs_CZ.utf8

TeX

novinky-polls-render-print -c my_cache_dir -i my_polls_map.txt -o my_polls_export.tex -l cs_CZ.utf8

The TeX file is meant to be compiled to PDF with lualatex.

Plain text

novinky-polls-render-text -c my_cache_dir -i my_polls_map.txt -o my_polls_export.txt -l cs_CZ.utf8
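To give an idea of what a plain-text rendering of a single poll can look like, here is a small sketch. It is not the output format of novinky-polls-render-text. The input shape (a title plus (answer, percent) pairs with integer percentages) is hypothetical and does not reflect the JSON API's structure.

```python
# Render one poll as text: answer, percentage, and a proportional bar.
# The input shape is a hypothetical simplification of the poll data.


def render_poll_text(title, answers, width=20):
    lines = [title, "=" * len(title)]
    label_width = max((len(answer) for answer, _ in answers), default=0)
    for answer, percent in answers:
        bar = "#" * round(width * percent / 100)
        lines.append(f"{answer.ljust(label_width)} {percent:3d} % {bar}")
    return "\n".join(lines)
```

For example, render_poll_text("Test", [("Ano", 60), ("Ne", 40)], width=10) produces the lines "Test", "====", "Ano  60 % ######", and "Ne   40 % ####".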

Categorize polls and print the statistics

novinky-polls-analyze -c my_cache_dir -i my_polls_map.txt

Help

Run any of the executables mentioned in Usage with -h or --help to see its full documentation, for example:

novinky-polls-add-current -h

Contributing

Feel free to remix this piece of software. See NOTICE and LICENSE for license information.

You might find these reusable modules useful:

  • scraper.py contains functions to download Novinky.cz's homepage and parse the ID of the current poll. It also contains functions to download a poll from Novinky.cz's public JSON API.
  • analyzer.py contains functions to parse Novinky.cz's poll JSON data and to read and write the map of poll IDs and dates.
  • reader.py puts scraper.py and analyzer.py together. Use its functions to retrieve the data of all polls mentioned in the poll map.
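The idea of retrieving every poll mentioned in the map, downloading each one only once, can be sketched as follows. This is not reader.py's API: the fetch_poll callable and the <poll_id>.json cache layout are hypothetical, loosely modeled on the -c my_cache_dir option.

```python
# Load data for every distinct poll ID in a map, using an on-disk JSON cache.
# fetch_poll and the cache layout are hypothetical stand-ins.
import json
from pathlib import Path


def poll_ids_in_map(map_lines):
    """Distinct poll IDs in order of first appearance, skipping None."""
    seen = set()
    ids = []
    for line in map_lines:
        parts = line.split()
        if len(parts) != 2 or parts[1] == "None" or parts[1] in seen:
            continue
        seen.add(parts[1])
        ids.append(parts[1])
    return ids


def load_polls(map_lines, cache_dir, fetch_poll):
    """Return {poll_id: poll_data}, calling fetch_poll only on cache misses."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    polls = {}
    for poll_id in poll_ids_in_map(map_lines):
        cached = cache / f"{poll_id}.json"
        if cached.exists():
            polls[poll_id] = json.loads(cached.read_text(encoding="utf-8"))
        else:
            polls[poll_id] = fetch_poll(poll_id)
            cached.write_text(json.dumps(polls[poll_id]), encoding="utf-8")
    return polls
```

Passing the download function in as a parameter keeps the caching logic testable without network access.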