Download and render (as HTML, TeX, or plain text) opinion polls from Novinky.cz, a Czech news site.
This software requires Python 3. See Python's website for installation instructions.
When you have Python 3 installed, install required packages with pip (Python's package management system):
pip install requests
pip install beautifulsoup4
pip install pystache
Then you can call the executables:
./novinky-polls-add-archive-org -h
./novinky-polls-add-current -h
./novinky-polls-render-html -h
./novinky-polls-render-print -h
./novinky-polls-render-text -h
./novinky-polls-analyze -h
Or you can install this software as a Python package, which will also install all the dependencies and make the executables available globally:
python setup.py install
novinky-polls-add-archive-org -h
novinky-polls-add-current -h
novinky-polls-render-html -h
novinky-polls-render-print -h
novinky-polls-render-text -h
novinky-polls-analyze -h
Each Novinky.cz poll is identified by a unique ID. The poll data itself (title, answers, percentages) is accessible through a public JSON API (for an example of an HTTP request that retrieves this data, see novinky_polls/test/curl_one_poll_json.sh). The poll data, however, does not say when (date and time) the poll was shown on the Novinky.cz homepage. That information has to come from elsewhere: either from Archive.org, or by checking the current Novinky.cz homepage and saving the current timestamp.
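This software works in two phases: first it builds a map file of poll IDs and timestamps, then it downloads the polls and renders them.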
Add poll IDs and timestamps archived by Archive.org to a map file:
novinky-polls-add-archive-org -i my_polls_map.txt
Add current poll ID with current timestamp to a map file:
novinky-polls-add-current -i my_polls_map.txt
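To illustrate what adding the current poll amounts to, here is a minimal sketch of writing one map entry. It is not the project's own code: the names `format_map_entry` and `append_entry` are hypothetical, the 14-digit `YYYYMMDDhhmmss` timestamp layout is inferred from the sample map file in this README, and the use of UTC is an assumption.

```python
from datetime import datetime, timezone


def format_map_entry(poll_id, when):
    """Return one map-file line: a 14-digit timestamp, a space, the poll ID.

    poll_id may be None when no poll was found; it is then written as "None".
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{stamp} {poll_id}"


def append_entry(path, poll_id, when=None):
    """Append one entry to the map file (hypothetical helper)."""
    # Assumption: timestamps are recorded in UTC.
    when = when or datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as map_file:
        map_file.write(format_map_entry(poll_id, when) + "\n")


line = format_map_entry(13678, datetime(2015, 1, 1, 19, 3, 35))
```

Appending rather than rewriting keeps earlier observations intact, so the command can be run repeatedly, for example from a scheduled job.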
The map file my_polls_map.txt now contains a map of timestamps to poll IDs, one entry per line:
20150101190335 13678
20150101190338 13678
20150102153411 None
20150102202406 13677
20150102202408 13677
...
None means that no poll was present at the time of the timestamp.
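If you want to read the map file from your own scripts, a sketch like the following may help. It assumes each line holds a `YYYYMMDDhhmmss` timestamp and a poll ID (or the literal `None`) separated by whitespace, as in the sample above. The function names are hypothetical and this is not the parser shipped with the package.

```python
from datetime import datetime


def parse_map_line(line):
    """Split one 'TIMESTAMP POLL_ID' line into (datetime, int or None)."""
    stamp, raw_id = line.split()
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    # The literal string "None" marks a moment when no poll was present.
    poll_id = None if raw_id == "None" else int(raw_id)
    return when, poll_id


def parse_map(text):
    """Parse the whole map file content, skipping blank lines."""
    return [parse_map_line(line) for line in text.splitlines() if line.strip()]


sample = """\
20150101190335 13678
20150102153411 None
20150102202406 13677
"""
entries = parse_map(sample)
```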
Once you have the map file, you can download the polls and render them in various formats.
novinky-polls-render-html -c my_cache_dir -i my_polls_map.txt -o my_polls_export.html -l cs_CZ.utf8
novinky-polls-render-print -c my_cache_dir -i my_polls_map.txt -o my_polls_export.tex -l cs_CZ.utf8
The TeX file is meant to be exported to PDF using lualatex.
novinky-polls-render-text -c my_cache_dir -i my_polls_map.txt -o my_polls_export.txt -l cs_CZ.utf8
novinky-polls-analyze -c my_cache_dir -i my_polls_map.txt
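The map usually records the same poll several times, so any export presumably needs a single date per poll. One plausible way to get it is to keep the earliest timestamp seen for each ID. The sketch below shows that idea only. `first_seen` is a hypothetical helper, not a function of this package, and the renderers may well choose dates differently.

```python
def first_seen(entries):
    """Map each poll ID to the earliest timestamp at which it was observed.

    entries is an iterable of (timestamp, poll_id) pairs; timestamps are
    fixed-width 14-digit strings, so string order equals time order.
    """
    seen = {}
    for stamp, poll_id in entries:
        if poll_id is None:
            continue  # no poll was present at this moment
        if poll_id not in seen or stamp < seen[poll_id]:
            seen[poll_id] = stamp
    return seen


entries = [
    ("20150101190335", 13678),
    ("20150101190338", 13678),
    ("20150102153411", None),
    ("20150102202406", 13677),
    ("20150102202408", 13677),
]
result = first_seen(entries)
```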
Call any of the executables mentioned in Usage with the parameter --help to see its full documentation, for example novinky-polls-render-html --help.
You might find these reusable modules useful:
scraper.py contains functions to download the Novinky.cz homepage and parse the ID of the current poll. It also contains functions to download a poll from the Novinky.cz public JSON API.
analyzer.py contains functions to parse Novinky.cz poll JSON data and to read and write the map of poll IDs and dates.
analyzer.py ties these pieces together. Use its functions to retrieve the data of all polls mentioned in the poll map.