Novinky Polls
Download and render opinion polls from the Czech news site Novinky.cz.
Supported output formats:
- HTML
- TeX
- plain text
Installation
Arch Linux
# pacman -S pipenv
$ make setup
Other systems
Install these dependencies manually:
- Python > 3.6
- pipenv
Then run:
$ make setup
Usage
Each Novinky.cz poll is identified by a unique id. The poll data itself (title, answers, percentages) is accessible through a public JSON API (for an example HTTP request that retrieves this data, see novinky_polls/test/curl_one_poll_json.sh). However, the poll data does not include the date and time when the poll appeared on the Novinky.cz homepage. We therefore need to get this information elsewhere: either from Archive.org, or by checking the current Novinky.cz homepage and saving the current timestamp.
This software works in two phases:
1. Create a CSV file mapping poll timestamps to poll ids
Add poll ids and timestamps archived by Archive.org to a CSV file:
./novinky-polls-add-archive-org -i my_polls.csv
Add current poll id with current timestamp to a CSV file:
./novinky-polls-add-current -i my_polls.csv
The CSV file my_polls.csv now contains a map of poll timestamps to poll ids:
2015-01-01T19:03:35+02:00,13678
2015-01-01T19:03:38+02:00,13678
2015-01-02T15:34:11+02:00,None
2015-01-02T20:24:06+02:00,13677
2015-01-02T20:24:08+02:00,13677
...
The value None means that there was no poll at the time of the timestamp.
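If you want to consume the poll map from your own scripts, the format shown above can be read with the standard library alone. The following is a minimal sketch, not the project's own reader: the function name read_poll_map is hypothetical, and it assumes every data row has exactly two columns (an ISO 8601 timestamp and either a numeric poll id or the literal string None).

```python
import csv
import io
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple


def read_poll_map(lines: Iterable[str]) -> Iterator[Tuple[datetime, Optional[int]]]:
    """Yield (timestamp, poll_id) pairs; poll_id is None when no poll was shown.

    Hypothetical helper, assuming the two-column layout shown in the README.
    """
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        raw_timestamp, raw_id = row
        # fromisoformat (Python 3.7+) keeps the UTC offset from the file.
        timestamp = datetime.fromisoformat(raw_timestamp)
        poll_id = None if raw_id == "None" else int(raw_id)
        yield timestamp, poll_id


sample = io.StringIO(
    "2015-01-01T19:03:35+02:00,13678\n"
    "2015-01-02T15:34:11+02:00,None\n"
)
for timestamp, poll_id in read_poll_map(sample):
    print(timestamp.isoformat(), poll_id)
```

To read a real file, pass an open file object instead of the in-memory sample, for example open("my_polls.csv", newline="").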
2. Download and render the polls
Once you have the CSV file, you can download the polls and render them in various formats.
HTML
./novinky-polls-render-html -c my_cache_dir -i my_polls.csv -o my_polls_export.html -l cs_CZ.utf8
TeX
./novinky-polls-render-print -c my_cache_dir -i my_polls.csv -o my_polls_export.tex -l cs_CZ.utf8
The TeX file is meant to be exported to PDF using lualatex.
Plain text
./novinky-polls-render-text -c my_cache_dir -i my_polls.csv -o my_polls_export.txt -l cs_CZ.utf8
Categorize polls and print the statistics
./novinky-polls-analyze -c my_cache_dir -i my_polls.csv
Help
Call the executables with the argument -h or --help:
./novinky-polls-add-archive-org --help
./novinky-polls-add-current --help
./novinky-polls-render-html --help
./novinky-polls-render-print --help
./novinky-polls-render-text --help
./novinky-polls-analyze --help
Contributing
Feel free to remix this project under the terms of the Apache License, Version 2.0.
You might find these reusable modules useful:
- scraper.py contains functions to download Novinky.cz's homepage and parse the id of the current poll. It also contains functions to download a poll from Novinky.cz's public JSON API.
- analyzer.py contains functions to parse Novinky.cz's poll JSON data and to read and write the map of poll timestamps to poll ids.
- reader.py puts scraper.py and analyzer.py together. Use its functions to retrieve the data of all polls mentioned in the poll map.
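Because the poll map usually lists the same poll id under many timestamps, code that retrieves "all polls mentioned in the poll map" will probably want each id only once. The sketch below illustrates that idea in isolation. It does not use or reproduce the functions in reader.py; the name first_seen_by_poll is hypothetical, and the input is assumed to be (timestamp, poll_id) pairs in file order, with None standing for "no poll".

```python
from typing import Dict, Iterable, Optional, Tuple


def first_seen_by_poll(
    entries: Iterable[Tuple[str, Optional[int]]]
) -> Dict[int, str]:
    """Map each poll id to the first timestamp it appears under (file order).

    Hypothetical helper for illustration; entries without a poll are skipped.
    """
    first_seen: Dict[int, str] = {}
    for timestamp, poll_id in entries:
        if poll_id is None:
            continue  # no poll was present at this timestamp
        # setdefault keeps the earliest entry and ignores later repeats.
        first_seen.setdefault(poll_id, timestamp)
    return first_seen


entries = [
    ("2015-01-01T19:03:35+02:00", 13678),
    ("2015-01-01T19:03:38+02:00", 13678),
    ("2015-01-02T15:34:11+02:00", None),
    ("2015-01-02T20:24:06+02:00", 13677),
]
for poll_id, timestamp in first_seen_by_poll(entries).items():
    print(poll_id, timestamp)
```

The resulting dictionary preserves the order in which poll ids first appear, so each poll can be downloaded once and rendered in that order.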