Jakub Valenta db81072c57 setup: Bump patch version 1 month ago

README.md

Novinky Polls

Download and render opinion polls from the Czech news site Novinky.cz.

Supported output formats:

  • HTML
  • TeX
  • plain text

Installation

Arch Linux

# pacman -S pipenv
$ make setup

Other systems

Install these dependencies manually:

  • Python >= 3.6
  • pipenv

Then run:

$ make setup

Usage

Each Novinky.cz poll is identified by a unique id. The poll data itself (title, answers, percentages) is accessible through a public JSON API (for an example of an HTTP request that retrieves this data, see novinky_polls/test/curl_one_poll_json.sh). The poll data, however, does not say when (date and time) the poll was shown on the Novinky.cz homepage. We therefore need to get this information elsewhere: either from Archive.org, or by checking the current Novinky.cz homepage and saving the current timestamp.

This software works in two phases:

1. Create a CSV file mapping poll timestamps to poll ids

Add poll ids and timestamps archived by Archive.org to a CSV file:

./novinky-polls-add-archive-org -i my_polls.csv

Add the current poll id with the current timestamp to a CSV file:

./novinky-polls-add-current -i my_polls.csv
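As a rough illustration of what recording an observation involves, the sketch below appends one `timestamp,poll_id` row in the ISO format used by the CSV file. It is not the project's implementation: the function name `append_observation` and its parameters are hypothetical, and obtaining the poll id from the homepage is left out.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Optional, TextIO


def append_observation(
    stream: TextIO, poll_id: Optional[int], now: Optional[datetime] = None
) -> str:
    """Append one 'timestamp,poll_id' row and return the timestamp written.

    Hypothetical helper: poll_id=None records that no poll was found.
    """
    if now is None:
        # Local time with an explicit UTC offset, e.g. +02:00.
        now = datetime.now(timezone.utc).astimezone()
    timestamp = now.replace(microsecond=0).isoformat()
    # str() is needed because csv.writer would write None as an empty field.
    csv.writer(stream, lineterminator="\n").writerow([timestamp, str(poll_id)])
    return timestamp


# Demonstration with an in-memory file and a fixed timestamp.
buffer = io.StringIO()
fixed = datetime(2015, 1, 2, 20, 24, 6, tzinfo=timezone(timedelta(hours=2)))
append_observation(buffer, 13677, now=fixed)
append_observation(buffer, None, now=fixed)
print(buffer.getvalue(), end="")
```

To append to a real file instead, open it in append mode (`open("my_polls.csv", "a", newline="")`) and pass the file object as `stream`.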

The CSV file my_polls.csv now contains a map of poll timestamps to poll ids:

2015-01-01T19:03:35+02:00,13678
2015-01-01T19:03:38+02:00,13678
2015-01-02T15:34:11+02:00,None
2015-01-02T20:24:06+02:00,13677
2015-01-02T20:24:08+02:00,13677
...

The value None means that no poll was found at that timestamp.
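If you want to process the map in your own scripts, the format can be read with the standard library alone. The following is a minimal sketch, not the code in analyzer.py: the names `read_poll_map` and `first_seen` are hypothetical, and `datetime.fromisoformat` assumes Python 3.7 or newer.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

PollMap = List[Tuple[datetime, Optional[int]]]


def read_poll_map(lines: Iterable[str]) -> PollMap:
    """Parse 'timestamp,poll_id' rows; the literal 'None' becomes None."""
    rows: PollMap = []
    for record in csv.reader(lines):
        if len(record) != 2:
            continue  # skip blank or malformed rows
        timestamp, raw_id = record
        poll_id = None if raw_id == "None" else int(raw_id)
        rows.append((datetime.fromisoformat(timestamp), poll_id))
    return rows


def first_seen(poll_map: PollMap) -> Dict[int, datetime]:
    """Map each poll id to the earliest timestamp at which it was observed."""
    seen: Dict[int, datetime] = {}
    for timestamp, poll_id in poll_map:
        if poll_id is None:
            continue  # no poll was present at this timestamp
        if poll_id not in seen or timestamp < seen[poll_id]:
            seen[poll_id] = timestamp
    return seen


# The sample rows shown above, read from an in-memory file.
sample = io.StringIO(
    "2015-01-01T19:03:35+02:00,13678\n"
    "2015-01-01T19:03:38+02:00,13678\n"
    "2015-01-02T15:34:11+02:00,None\n"
    "2015-01-02T20:24:06+02:00,13677\n"
    "2015-01-02T20:24:08+02:00,13677\n"
)
poll_map = read_poll_map(sample)
earliest = first_seen(poll_map)
for poll_id, timestamp in sorted(earliest.items()):
    print(poll_id, timestamp.isoformat())
```

Because the same poll id usually appears at several timestamps, collapsing the map to one entry per poll is a natural first step before downloading or rendering anything.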

2. Download and render the polls

Once you have the CSV file, you can download the polls and render them in various formats.

HTML

./novinky-polls-render-html -c my_cache_dir -i my_polls.csv -o my_polls_export.html -l cs_CZ.utf8

TeX

./novinky-polls-render-print -c my_cache_dir -i my_polls.csv -o my_polls_export.tex -l cs_CZ.utf8

The TeX file is meant to be exported to PDF using lualatex.
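The PDF step can be scripted as well. The sketch below only wraps a plain `lualatex` call; the function names and the `build` output directory are hypothetical, and it assumes that `lualatex` is on your PATH and that a single pass is enough for the generated document.

```python
import subprocess
from pathlib import Path
from typing import List


def lualatex_command(tex_file: Path, output_dir: Path) -> List[str]:
    """Build the lualatex invocation without running it."""
    return [
        "lualatex",
        "--interaction=nonstopmode",  # do not stop to prompt on errors
        "--halt-on-error",            # exit at the first error instead
        f"--output-directory={output_dir}",
        str(tex_file),
    ]


def compile_pdf(tex_file: Path, output_dir: Path) -> Path:
    """Run lualatex once and return the expected path of the PDF."""
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(lualatex_command(tex_file, output_dir), check=True)
    return output_dir / tex_file.with_suffix(".pdf").name


# Example (requires lualatex to be installed):
# compile_pdf(Path("my_polls_export.tex"), Path("build"))
```

With `check=True`, a failed compilation raises `subprocess.CalledProcessError` rather than silently leaving a stale or missing PDF.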

Plain text

./novinky-polls-render-text -c my_cache_dir -i my_polls.csv -o my_polls_export.txt -l cs_CZ.utf8

Categorize polls and print the statistics

./novinky-polls-analyze -c my_cache_dir -i my_polls.csv

Help

Call the executables with the argument -h or --help:

./novinky-polls-add-archive-org --help
./novinky-polls-add-current --help
./novinky-polls-render-html --help
./novinky-polls-render-print --help
./novinky-polls-render-text --help
./novinky-polls-analyze --help

Contributing

Feel free to remix this project under the terms of the Apache License, Version 2.0.

You might find these reusable modules useful:

  • scraper.py contains functions to download the Novinky.cz homepage and parse the id of the current poll. It also contains functions to download a poll from the public Novinky.cz JSON API.
  • analyzer.py contains functions to parse the Novinky.cz poll JSON data and to read and write the map of poll timestamps to poll ids.
  • reader.py puts scraper.py and analyzer.py together. Use its functions to retrieve the data of all polls mentioned in the poll map.