Jakub Valenta 56ecccf521 setup.py: Restore completely 9 months ago

Novinky Polls

Download and render opinion polls from the Czech news site Novinky.cz.

Supported output formats:

  • HTML
  • TeX
  • plain text

Installation

Arch Linux

# pacman -S pipenv
$ make setup

Other systems

Install these dependencies manually:

  • Python 3.6 or later
  • pipenv

Then run:

$ make setup

Usage

Each Novinky.cz poll is identified by a unique id. The poll data itself (title, answers, percentages) is accessible through a public JSON API (for an example HTTP request that retrieves this data, see novinky_polls/test/curl_one_poll_json.sh). The poll data, however, does not say when (date and time) the poll was shown on the Novinky.cz homepage. This information has to come from elsewhere: either from Archive.org, or by checking the current Novinky.cz homepage and recording the current timestamp.

This software works in two phases:

1. Create a CSV file mapping poll timestamps to poll ids

Add poll ids and timestamps archived by Archive.org to a CSV file:

./novinky-polls-add-archive-org -i my_polls.csv

Add current poll id with current timestamp to a CSV file:

./novinky-polls-add-current -i my_polls.csv

The CSV file my_polls.csv now contains a map of poll timestamps to poll ids:

2015-01-01T19:03:35+02:00,13678
2015-01-01T19:03:38+02:00,13678
2015-01-02T15:34:11+02:00,None
2015-01-02T20:24:06+02:00,13677
2015-01-02T20:24:08+02:00,13677
...

The value None means that no poll was present at that timestamp.
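
If you want to process the map in your own scripts, the format shown above can be read with Python's standard library. The following is a minimal sketch, not the project's own code: the function name read_poll_map is hypothetical, and it assumes every row has exactly two columns, an ISO 8601 timestamp and either a numeric poll id or the literal string None. Note that datetime.fromisoformat requires Python 3.7 or later.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def read_poll_map(lines: Iterable[str]) -> List[Tuple[datetime, Optional[int]]]:
    """Parse 'timestamp,poll_id' rows into (datetime, id or None) pairs.

    Hypothetical helper for illustration; analyzer.py may work differently.
    """
    rows = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        timestamp_text, poll_id_text = record
        # The timestamps carry a UTC offset, so the result is timezone-aware.
        timestamp = datetime.fromisoformat(timestamp_text)
        # The literal string "None" marks a moment with no poll.
        poll_id = None if poll_id_text == "None" else int(poll_id_text)
        rows.append((timestamp, poll_id))
    return rows


# A few rows in the same format as my_polls.csv.
sample = io.StringIO(
    "2015-01-01T19:03:35+02:00,13678\n"
    "2015-01-02T15:34:11+02:00,None\n"
    "2015-01-02T20:24:06+02:00,13677\n"
)
poll_map = read_poll_map(sample)
for timestamp, poll_id in poll_map:
    print(timestamp.isoformat(), poll_id)
```

To read a real file, pass an open file object instead of the in-memory sample, for example open("my_polls.csv", newline="").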

2. Download and render the polls

Once you have the CSV file, you can download the polls and render them in various formats.
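
In the sample map from phase 1, the same poll id appears under several timestamps, so a script of your own would only need to fetch each id once. Below is a minimal sketch of collecting the distinct ids in the order they first appear. It assumes the two-column CSV format described in phase 1; unique_poll_ids is a hypothetical name and is not part of this project's modules.

```python
import csv
import io
from typing import Iterable, List


def unique_poll_ids(lines: Iterable[str]) -> List[int]:
    """Return distinct poll ids in first-seen order, skipping 'None' rows.

    Hypothetical helper for illustration only.
    """
    seen = set()
    ordered = []
    for record in csv.reader(lines):
        # Ignore blank or malformed rows and rows without a poll.
        if len(record) != 2 or record[1] == "None":
            continue
        poll_id = int(record[1])
        if poll_id not in seen:
            seen.add(poll_id)
            ordered.append(poll_id)
    return ordered


# The same rows as in the phase 1 sample.
sample = io.StringIO(
    "2015-01-01T19:03:35+02:00,13678\n"
    "2015-01-01T19:03:38+02:00,13678\n"
    "2015-01-02T15:34:11+02:00,None\n"
    "2015-01-02T20:24:06+02:00,13677\n"
    "2015-01-02T20:24:08+02:00,13677\n"
)
ids = unique_poll_ids(sample)
print(ids)  # [13678, 13677]
```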

HTML

./novinky-polls-render-html -c my_cache_dir -i my_polls.csv -o my_polls_export.html -l cs_CZ.utf8

TeX

./novinky-polls-render-print -c my_cache_dir -i my_polls.csv -o my_polls_export.tex -l cs_CZ.utf8

The TeX file is meant to be compiled to PDF with lualatex.

Plain text

./novinky-polls-render-text -c my_cache_dir -i my_polls.csv -o my_polls_export.txt -l cs_CZ.utf8

Categorize polls and print the statistics

./novinky-polls-analyze -c my_cache_dir -i my_polls.csv

Help

Call the executables with the argument -h or --help:

./novinky-polls-add-archive-org --help
./novinky-polls-add-current --help
./novinky-polls-render-html --help
./novinky-polls-render-print --help
./novinky-polls-render-text --help
./novinky-polls-analyze --help

Contributing

Feel free to remix this project under the terms of the Apache License, Version 2.0.

You might find these reusable modules useful:

  • scraper.py contains functions to download Novinky.cz's homepage and parse the id of the current poll. It also contains functions to download a poll from Novinky.cz's public JSON API.
  • analyzer.py contains functions to parse Novinky.cz's poll JSON data and to read and write the map of poll timestamps to poll ids.
  • reader.py puts scraper.py and analyzer.py together. Use its functions to retrieve the data of all polls mentioned in the poll map.