Compare commits


35 Commits
master ... main

Author SHA1 Message Date
rra
028bc1df84 fix issue where posts with enclosures would not download files because of missing dir 2024-11-20 11:25:26 +01:00
82a017f624 Merge branch 'new-taxonomy' 2024-11-07 16:35:18 +01:00
rra
9d9f8f6d72 do proper deletion 2024-11-06 17:24:55 +01:00
rra
e01aa9a607 Test whether a url still returns a feed, pass right filename as featured_image when handling enclosure, pass post_dir to existing_posts 2024-11-06 16:48:41 +01:00
3055ee37df Merge pull request 'adjust templates to new taxonomy' (#43) from new-taxonomy into main
Reviewed-on: ruangrupa/konfluks#43
2022-12-02 16:59:49 +01:00
rra
a4f749ebd7 adjust templates to new taxonomy 2022-09-15 18:37:25 +02:00
rra
0ecc0ecd3a handle paths and extensions properly, fix #41 2022-09-09 14:19:19 +02:00
rra
657ced1ceb undo dev setup changes 2022-09-09 13:27:29 +02:00
rra
d21158eb91 add support for videos in posts 2022-09-09 13:22:32 +02:00
decentral1se
98299daa1b fix links 2022-07-18 12:21:01 +02:00
decentral1se
6020db4d15 additional gardening for konfluks rename 2022-07-18 12:16:52 +02:00
decentral1se
2b06a5f866 Merge remote-tracking branch 'konfluks/konfluks-renaming' 2022-07-18 12:05:00 +02:00
decentral1se
e66e3202da add new hashtag 2022-06-21 00:00:32 +02:00
ff76378cdd merge christopher's changes pulling the timeline from pen.lumbung.space 2022-06-14 19:27:31 +05:00
41bc532ebc separate hashtags by comma 2022-06-10 15:55:17 +05:00
rra
845a54787b Update 'README.md' 2022-06-02 09:29:20 +02:00
rra
f162bb946a Update 'README.md'
correcting markup / styling
2022-06-02 09:28:37 +02:00
rra
00f795f16d rename project to konfluks for legibility, add docs 2022-06-02 09:23:58 +02:00
rra
b0f77831bd add 'contributors' as metadata category 2022-06-02 06:45:54 +02:00
rra
5ba944b6d1 Merge pull request 'handle feeds with enclosures (featured media / podcasts)' (#35) from r/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#35
2022-06-01 08:05:36 +02:00
rra
ad591ea9cf add more checks for failures 2022-06-01 05:51:25 +02:00
rra
9c824fcd3f Merge remote-tracking branch 'upstream/master' 2022-05-29 14:45:30 +02:00
rra
cab36c8ac6 add less generic headers 2022-05-29 14:45:11 +02:00
rra
c84a975887 add reason for failure 2022-05-29 12:30:55 +02:00
2ca61c6197 Merge pull request 'accomodate authors as taxonomy' (#34) from r/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#34
2022-05-27 13:24:32 +02:00
rra
fecf5cd64e add rudimentary support for enclosures & featured images 2022-05-24 15:39:11 +02:00
rra
6e64d64772 only return an author if there is one 2022-05-24 12:19:50 +02:00
rra
3b390d1ecb change template to authors to accomodate author taxonomy 2022-05-24 12:19:50 +02:00
rra
ce3bfc58b0 remove orphaned " 2022-05-24 12:19:50 +02:00
c5af3610a0 Merge pull request 'feed: assign pen category' (#33) from pen-category into master
Reviewed-on: ruangrupa/lumbunglib#33
2022-04-26 08:30:34 +02:00
3ea798b301 feed: assign pen category 2022-04-21 14:17:12 +02:00
decentral1se
7d3863641d Revert "feat: sanitize all yaml"
This reverts commit 2fbc952a72.
2022-04-13 12:48:42 +02:00
decentral1se
f6a1a684c0 Revert "fix: don't escape some characters"
This reverts commit cf8b1ff7e9.
2022-04-13 12:48:20 +02:00
decentral1se
58afd189a7 Revert "feed: move to saneyaml"
This reverts commit a809433410.
2022-04-13 12:48:13 +02:00
19ab610dfc Merge pull request 'feat: sanitize all yaml' (#28) from knoflook/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#28
2022-04-12 13:44:34 +02:00
16 changed files with 633 additions and 149 deletions

View File

@ -1,8 +1,60 @@
# lumbunglib
![Konfluks logo is a stylized and schematic representation of a drainage basin](./konfluks.svg)
> Python lib which powers `lumbung.space` automation
# Konfluks
## hacking
A drainage basin is a geographical feature that collects all precipitation in an area, first into smaller streams and finally together into one large river. Similarly, Konfluks brings small and dispersed streams of web content from different applications and websites together into a single large stream.
Specifically, Konfluks turns Peertube videos, iCal calendar events, Mastodon posts under a hashtag, and other websites (via their RSS and OPDS feeds) into Hugo page bundles. This allows one to publish from diverse sources to a single stream.
Konfluks was first made by [Roel Roscam Abbing](https://test.roelof.info/) as part of [lumbung.space](https://lumbung.space), together with [ruangrupa](https://ruangrupa.id) and [Autonomic](https://autonomic.zone).
## Philosophy
Konfluks tries to act as a mirror of its input sources. That means that whenever something remote is deleted, changed, or becomes unavailable, Konfluks changes or deletes its local copy as well.
Konfluks tries to preserve intention. That means the above, but also requiring explicit ways of publishing.
Konfluks works by periodically polling the remote sources, taking care not to duplicate work. It caches files, asks for last-modified headers, and skips things it already has. This makes every poll as fast and as light as possible.
Konfluks is written for clarity, not brevity nor cleverness.
Konfluks is extendable, a work in progress and a messy undertaking.
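The polling approach described above depends on remembering HTTP cache validators between runs. A minimal, self-contained sketch of persisting them (the function names here are illustrative, not the actual Konfluks API, which uses `write_etag`/`get_etag`):

```python
import os
from ast import literal_eval


def save_validators(path, etag, modified):
    # persist the feed's cache validators so the next poll can be conditional
    if etag or modified:
        with open(path, "w") as f:
            f.write(str((etag, modified)))


def load_validators(path):
    # return empty validators when we have never polled this feed before
    if os.path.exists(path):
        with open(path) as f:
            return literal_eval(f.read())
    return "", ""
```

On the next poll the stored values can be passed to `feedparser.parse(url, etag=..., modified=...)`, which then makes a conditional request and reports status 304 when nothing changed.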
## High-level overview
Konfluks consists of different Python scripts which each poll a particular service, say, a [Peertube](https://joinpeertube.org) server, to download information and convert it into [Hugo Page Bundles](https://gohugo.io/content-management/page-bundles/).
Each script that is part of Konfluks essentially does the following:
* Parse a source and request posts/updates/videos/a feed
* Take care of publish queues
* Create a Hugo post for each item returned, by:
* Making a folder per post in the `output` directory
* Formatting post metadata as [Hugo Post Frontmatter](https://gohugo.io/content-management/front-matter/) in a file called `index.md`
* Grabbing local copies of media and saving them in the post folder
* Adding the post content to `index.md`
* According to jinja2 templates (see `konfluks/templates/`)
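The "format frontmatter, then append content" step can be sketched with jinja2 directly (the template string and field names here are illustrative stand-ins, not the actual templates in `konfluks/templates/`):

```python
import jinja2

# illustrative stand-in for a template from konfluks/templates/
template = jinja2.Template(
    "---\n"
    'title: "{{ frontmatter.title }}"\n'
    'date: "{{ frontmatter.date }}"\n'
    "---\n"
    "{{ content }}"
)

post = template.render(
    frontmatter={"title": "Example post", "date": "2022-06-02"},
    content="Hello from a feed item.",
)
print(post)
```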
Where possible, the created page bundles are given human-friendly names.
Here is a typical output structure:
```
user@server: ~/konfluks/output: tree tv/
tv/
├── forum-27an-mother-earth-353f93f3-5fee-49d6-b71d-8aef753f7041
│   ├── 86ccae63-3df9-443c-91f3-edce146055db.jpg
│   └── index.md
├── keroncong-tugu-cafrinho-live-at-ruru-gallery-ruangrupa-jakarta-19-august-2014-e6d5bb2a-d77f-4a00-a449-992a579c8c0d
│   ├── 32291aa2-a391-4219-a413-87521ff373ba.jpg
│   └── index.md
├── lecture-series-1-camp-notes-on-education-8d54d3c9-0322-42af-ab6e-e954d251e076
│   ├── 0f3c835b-42c2-48a3-a2a3-a75ddac8688a.jpg
│   └── index.md
```
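In outline, producing one such bundle is just a directory with an `index.md` inside it. A simplified sketch (the helper name is hypothetical; the real scripts also download media into the folder):

```python
import os


def make_page_bundle(output_dir, slug, frontmatter_md, body):
    # each post becomes its own folder containing an index.md,
    # which is what Hugo calls a (leaf) page bundle
    post_dir = os.path.join(output_dir, slug)
    os.makedirs(post_dir, exist_ok=True)
    with open(os.path.join(post_dir, "index.md"), "w") as f:
        f.write(frontmatter_md + "\n" + body + "\n")
    return post_dir
```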
## Hacking
Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-install-instructions):
@ -10,31 +62,20 @@ Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-insta
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
```
We use Poetry because it locks the dependencies all the way down and makes it
easier to manage installation & maintenance in the long-term. Then install the
dependencies & have them managed by Poetry:
We use Poetry because it locks the dependencies all the way down and makes it easier to manage installation & maintenance in the long-term. Then install the dependencies & have them managed by Poetry:
```
poetry install
```
Each script requires some environment variables to run, you can see the latest
deployment configuration over
[here](https://git.autonomic.zone/ruangrupa/lumbung.space/src/branch/main/compose.yml),
look for the values under the `environment: ...` stanza.
Each script requires some environment variables to run, you can see the latest deployment configuration over [here](https://git.autonomic.zone/ruangrupa/lumbung.space/src/branch/main/compose.yml), look for the values under the `environment: ...` stanza.
All scripts have an entrypoint described in the
[`pypoetry.toml`](https://git.autonomic.zone/ruangrupa/lumbunglib/src/commit/40bf9416b8792c08683ad8ac878093c7ef1b2f5d/pyproject.toml#L27-L31)
which you can run via `poetry run ...`. For example, if you want to run the
[`lumbunglib/video.py`](./lumbunglib/video.py) script, you'd do:
All scripts have an entrypoint described in the [`pyproject.toml`](./pyproject.toml) which you can run via `poetry run ...`. For example, if you want to run the [`konfluks/video.py`](./konfluks/video.py) script, you'd do:
```
mkdir -p testdir
export OUTPUT_DIR=testdir
poetry run lumbunglib-vid
poetry run konfluks-vid
```
Run `poetry run poetry2setup > setup.py` if updating the poetry dependencies.
This allows us to run `pip install .` in the deployment and Pip will understand
that it is just a regular Python package. If adding a new cli command, extend
`pyproject.toml` with a new `[tool.poetry.scripts]` entry.
Run `poetry run poetry2setup > setup.py` if updating the poetry dependencies. This allows us to run `pip install .` in the deployment and Pip will understand that it is just a regular Python package. If adding a new cli command, extend `pyproject.toml` with a new `[tool.poetry.scripts]` entry.

31
konfluks.svg Normal file

File diff suppressed because one or more lines are too long

Size: 29 KiB

View File

@ -138,9 +138,9 @@ def create_event_post(post_dir, event):
    for img in event_metadata["images"]:
        # parse img url to safe local image name
        img_name = img.split("/")[-1]
        fn, ext = img_name.split(".")
        img_name = slugify(fn) + "." + ext
        img_name = os.path.basename(img)
        fn, ext = os.path.splitext(img_name)
        img_name = slugify(fn) + '.' + ext
        local_image = os.path.join(post_dir, img_name)

442
konfluks/feed.py Normal file
View File

@ -0,0 +1,442 @@
import os
import shutil
import time
from hashlib import md5
from ast import literal_eval as make_tuple
from pathlib import Path
from urllib.parse import urlparse
from re import sub
import arrow
import feedparser
import jinja2
import requests
from bs4 import BeautifulSoup
from slugify import slugify
from re import compile as re_compile
yamlre = re_compile('"')
def write_etag(feed_name, feed_data):
    """
    save timestamp of when feed was last modified
    """
    etag = ""
    modified = ""
    if "etag" in feed_data:
        etag = feed_data.etag
    if "modified" in feed_data:
        modified = feed_data.modified
    if etag or modified:
        with open(os.path.join("etags", feed_name + ".txt"), "w") as f:
            f.write(str((etag, modified)))


def get_etag(feed_name):
    """
    return timestamp of when feed was last modified
    """
    fn = os.path.join("etags", feed_name + ".txt")
    etag = ""
    modified = ""
    if os.path.exists(fn):
        etag, modified = make_tuple(open(fn, "r").read())
    return etag, modified

def create_frontmatter(entry):
    """
    parse RSS metadata and return as frontmatter
    """
    if 'published' in entry:
        published = entry.published_parsed
    if 'updated' in entry:
        published = entry.updated_parsed
    published = arrow.get(published)
    if 'author' in entry:
        author = entry.author
    else:
        author = ''
    if 'authors' in entry:
        authors = []
        for a in entry.authors:
            authors.append(a['name'])
    if 'summary' in entry:
        summary = entry.summary
    else:
        summary = ''
    if 'publisher' in entry:
        publisher = entry.publisher
    else:
        publisher = ''
    tags = []
    if 'tags' in entry:
        #TODO finish categories
        for t in entry.tags:
            tags.append(t['term'])
    if "featured_image" in entry:
        featured_image = entry.featured_image
    else:
        featured_image = ''
    card_type = "network"
    if entry.feed_name == "pen.lumbung.space":
        card_type = "pen"
    if "opds" in entry:
        frontmatter = {
            'title': entry.title,
            'date': published.format(),
            'summary': summary,
            'author': ",".join(authors),
            'publisher': publisher,
            'original_link': entry.links[0]['href'].replace('opds/cover/', 'books/'),
            'feed_name': entry['feed_name'],
            'tags': str(tags),
            'category': "books"
        }
    else:
        frontmatter = {
            'title': entry.title,
            'date': published.format(),
            'summary': '',
            'author': author,
            'original_link': entry.link,
            'feed_name': entry['feed_name'],
            'tags': str(tags),
            'card_type': card_type,
            'featured_image': featured_image
        }
    return frontmatter

def sanitize_yaml(frontmatter):
    """
    Escapes any occurrences of double quotes
    in any of the frontmatter fields
    See: https://docs.octoprint.org/en/master/configuration/yaml.html#interesting-data-types
    """
    for k, v in frontmatter.items():
        if type(v) == type([]):
            # some fields are lists
            l = []
            for i in v:
                i = yamlre.sub('\\"', i)
                l.append(i)
            frontmatter[k] = l
        else:
            v = yamlre.sub('\\"', v)
            frontmatter[k] = v
    return frontmatter

def parse_enclosures(post_dir, entry):
    """
    Parses feed enclosures which are featured media
    Can be featured image but also podcast entries
    https://pythonhosted.org/feedparser/reference-entry-enclosures.html
    """
    #TODO parse more than images
    #TODO handle the fact it could be multiple items
    for e in entry.enclosures:
        if "type" in e:
            print("found enclosed media", e.type)
            if "image/" in e.type:
                if not os.path.exists(post_dir):  # this might be redundant with create_post
                    os.makedirs(post_dir)
                featured_image = grab_media(post_dir, e.href)
                media_item = urlparse(e.href).path.split('/')[-1]
                entry["featured_image"] = media_item
            else:
                print("FIXME:ignoring enclosed", e.type)
    return entry

def create_post(post_dir, entry):
    """
    write hugo post based on RSS entry
    """
    if "enclosures" in entry:
        entry = parse_enclosures(post_dir, entry)
    frontmatter = create_frontmatter(entry)
    if not os.path.exists(post_dir):
        os.makedirs(post_dir)
    if "content" in entry:
        post_content = entry.content[0].value
    else:
        post_content = entry.summary
    parsed_content = parse_posts(post_dir, post_content)
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    with open(os.path.join(post_dir, "index.html"), "w") as f:  # n.b. .html
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=parsed_content)
        f.write(post)
    print("created post for", entry.title, "({})".format(entry.link))

def grab_media(post_directory, url, prefered_name=None):
    """
    download media linked in post to have local copy
    if download succeeds return new local path otherwise return url
    """
    media_item = urlparse(url).path.split('/')[-1]
    headers = {
        'User-Agent': 'https://git.autonomic.zone/ruangrupa/lumbunglib',
        'From': 'info@lumbung.space'  # This is another valid field
    }
    if prefered_name:
        media_item = prefered_name
    try:
        if not os.path.exists(os.path.join(post_directory, media_item)):
            #TODO: stream is true is a conditional so we could check the headers for things, mimetype etc
            response = requests.get(url, headers=headers, stream=True)
            if response.ok:
                with open(os.path.join(post_directory, media_item), 'wb') as media_file:
                    shutil.copyfileobj(response.raw, media_file)
                print('Downloaded media item', media_item)
                return media_item
            else:
                print("Download failed", response.status_code)
                return url
        elif os.path.exists(os.path.join(post_directory, media_item)):
            return media_item
    except Exception as e:
        print('Failed to download image', url)
        print(e)
        return url

def parse_posts(post_dir, post_content):
    """
    parse the post content for media items
    replace foreign image with local copy
    filter out iframe sources not in allowlist
    """
    soup = BeautifulSoup(post_content, "html.parser")
    allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
    for img in soup(["img", "object"]):
        if img.get("src") != None:
            local_image = grab_media(post_dir, img["src"])
            if img["src"] != local_image:
                img["src"] = local_image
    for iframe in soup(["iframe"]):
        if not any(source in iframe["src"] for source in allowed_iframe_sources):
            print("filtered iframe: {}...".format(iframe["src"][:25]))
            iframe.decompose()
    return soup.decode()

def grab_feed(feed_url):
    """
    check whether feed has been updated
    download & return it if it has
    """
    feed_name = urlparse(feed_url).netloc
    etag, modified = get_etag(feed_name)
    try:
        if modified:
            data = feedparser.parse(feed_url, modified=modified)
        elif etag:
            data = feedparser.parse(feed_url, etag=etag)
        else:
            data = feedparser.parse(feed_url)
    except Exception as e:
        print("Error grabbing feed")
        print(feed_name)
        print(e)
        return False
    if "status" in data:
        print(data.status, feed_url)
        if data.status == 200:
            # 304 means the feed has not been modified since we last checked
            write_etag(feed_name, data)
            return data
    return False

def create_opds_post(post_dir, entry):
    """
    create a HUGO post based on OPDS entry
    or update it if the timestamp is newer
    Downloads the cover & file
    """
    frontmatter = create_frontmatter(entry)
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    if not os.path.exists(post_dir):
        os.makedirs(post_dir)
    if os.path.exists(os.path.join(post_dir, '.timestamp')):
        old_timestamp = open(os.path.join(post_dir, '.timestamp')).read()
        old_timestamp = arrow.get(float(old_timestamp))
        current_timestamp = arrow.get(entry['updated_parsed'])
        if current_timestamp > old_timestamp:
            pass
        else:
            print('Book "{}..." already up to date'.format(entry['title'][:32]))
            return
    for item in entry.links:
        ft = item['type'].split('/')[-1]
        fn = item['rel'].split('/')[-1]
        if fn == "acquisition":
            fn = "publication"  # calling the publications acquisition is weird
        prefered_name = "{}-{}.{}".format(fn, slugify(entry['title']), ft)
        grab_media(post_dir, item['href'], prefered_name)
    if "summary" in entry:
        summary = entry.summary
    else:
        summary = ""
    with open(os.path.join(post_dir, 'index.md'), 'w') as f:
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=summary)
        f.write(post)
    print('created post for Book', entry.title)
    with open(os.path.join(post_dir, '.timestamp'), 'w') as f:
        timestamp = arrow.get(entry['updated_parsed'])
        f.write(timestamp.format('X'))

def main():
    feed_urls = open("feeds_list.txt", "r").read().splitlines()
    start = time.time()
    if not os.path.exists("etags"):
        os.mkdir("etags")
    output_dir = os.environ.get("OUTPUT_DIR")
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    feed_dict = dict()
    for url in feed_urls:
        feed_name = urlparse(url).netloc
        feed_dict[url] = feed_name
    feed_names = feed_dict.values()
    content_dirs = os.listdir(output_dir)
    for i in content_dirs:
        if i not in feed_names:
            shutil.rmtree(os.path.join(output_dir, i))
            print("%s not in feeds_list.txt, removing local data" % (i))
    # add iframe to the allowlist of feedparser's sanitizer,
    # this is now handled in parse_post()
    feedparser.sanitizer._HTMLSanitizer.acceptable_elements |= {"iframe"}
    for feed_url in feed_urls:
        feed_name = feed_dict[feed_url]
        feed_dir = os.path.join(output_dir, feed_name)
        if not os.path.exists(feed_dir):
            os.makedirs(feed_dir)
        existing_posts = os.listdir(feed_dir)
        data = grab_feed(feed_url)
        if data:  # whenever we get a 200
            if data.feed:  # only if it is an actual feed
                opds_feed = False
                if 'links' in data.feed:
                    for i in data.feed['links']:
                        if i['rel'] == 'self':
                            if 'opds' in i['type']:
                                opds_feed = True
                                print("OPDS type feed!")
                for entry in data.entries:
                    # if 'tags' in entry:
                    #     for tag in entry.tags:
                    #         for x in ['lumbung.space', 'D15', 'lumbung']:
                    #             if x in tag['term']:
                    #                 print(entry.title)
                    entry["feed_name"] = feed_name
                    post_name = slugify(entry.title)
                    # pixelfed returns the whole post text as the post name. max
                    # filename length is 255 on many systems. here we're shortening
                    # the name and adding a hash to it to avoid a conflict in a
                    # situation where 2 posts start with exactly the same text.
                    if len(post_name) > 150:
                        post_hash = md5(bytes(post_name, "utf-8"))
                        post_name = post_name[:150] + "-" + post_hash.hexdigest()
                    if opds_feed:
                        entry['opds'] = True
                        # format: Beyond-Debiasing-Report_Online-75535a4886e3
                        post_name = slugify(entry['title']) + '-' + entry['id'].split('-')[-1]
                    post_dir = os.path.join(output_dir, feed_name, post_name)
                    if post_name not in existing_posts:
                        # if there is a blog entry we dont already have, make it
                        if opds_feed:
                            create_opds_post(post_dir, entry)
                        else:
                            create_post(post_dir, entry)
                    elif post_name in existing_posts:
                        # if we already have it, update it
                        if opds_feed:
                            create_opds_post(post_dir, entry)
                        else:
                            create_post(post_dir, entry)
                        existing_posts.remove(
                            post_name
                        )  # create list of posts which have not been returned by the feed
                for post in existing_posts:
                    # remove blog posts no longer returned by the RSS feed
                    post_dir = os.path.join(output_dir, feed_name, post)
                    shutil.rmtree(post_dir)
                    print("deleted", post_dir)
        else:
            print(feed_url, "is not or no longer a feed!")
    end = time.time()
    print(end - start)
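The filename-shortening logic in `main()` above can be shown in isolation: slugs over 150 characters are truncated and suffixed with an md5 digest, so two long posts that begin with identical text still get distinct directory names (the standalone function here is an illustration, not part of the module):

```python
from hashlib import md5


def shorten_post_name(post_name, limit=150):
    # many filesystems cap filenames at 255 bytes; pixelfed uses the whole
    # post text as the slug, so truncate and disambiguate with a hash
    if len(post_name) > limit:
        digest = md5(post_name.encode("utf-8")).hexdigest()
        return post_name[:limit] + "-" + digest
    return post_name
```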

View File

@ -23,6 +23,7 @@ hashtags = [
    "ruruhaus",
    "offbeatentrack_kassel",
    "lumbungofpublishers",
    "lumbungkiosproducts",
]
@ -59,6 +60,21 @@ def download_media(post_directory, media_attachments):
            with open(os.path.join(post_directory, image), "wb") as img_file:
                shutil.copyfileobj(response.raw, img_file)
            print("Downloaded cover image", image)
        elif item["type"] == "video":
            video = localize_media_url(item["url"])
            if not os.path.exists(os.path.join(post_directory, video)):
                # download video file
                response = requests.get(item["url"], stream=True)
                with open(os.path.join(post_directory, video), "wb") as video_file:
                    shutil.copyfileobj(response.raw, video_file)
                print("Downloaded video in post", video)
            if not os.path.exists(os.path.join(post_directory, "thumbnail.png")):
                # download video preview
                response = requests.get(item["preview_url"], stream=True)
                with open(os.path.join(post_directory, "thumbnail.png"), "wb") as thumbnail:
                    shutil.copyfileobj(response.raw, thumbnail)
                print("Downloaded thumbnail for", video)

def create_post(post_directory, post_metadata):
@ -77,7 +93,6 @@ def create_post(post_directory, post_metadata):
        post_metadata["account"]["display_name"] = name
    env.filters["localize_media_url"] = localize_media_url
    env.filters["filter_mastodon_urls"] = filter_mastodon_urls
    template = env.get_template("hashtag.md")
    with open(os.path.join(post_directory, "index.html"), "w") as f:

View File

@ -2,7 +2,7 @@
title: "{{ event.name }}"
date: "{{ event.begin }}" #2021-06-10T10:46:33+02:00
draft: false
categories: "calendar"
source: "lumbung calendar"
event_begin: "{{ event.begin }}"
event_end: "{{ event.end }}"
duration: "{{ event.duration }}"

View File

@ -0,0 +1,15 @@
---
title: "{{ frontmatter.title }}"
date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
draft: false
summary: "{{ frontmatter.summary }}"
contributors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
original_link: "{{ frontmatter.original_link }}"
feed_name: "{{ frontmatter.feed_name}}"
card_type: "{{ frontmatter.card_type }}"
sources: ["{{ frontmatter.feed_name}}"]
tags: {{ frontmatter.tags }}
{% if frontmatter.featured_image %}featured_image: "{{frontmatter.featured_image}}"{% endif %}
---
{{ content }}

View File

@ -0,0 +1,27 @@
---
date: {{ post_metadata.created_at }} #2021-06-10T10:46:33+02:00
draft: false
contributors: ["{{ post_metadata.account.display_name }}"]
avatar: {{ post_metadata.account.avatar }}
title: {{ post_metadata.account.display_name }}
tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
images: [{% for i in post_metadata.media_attachments %}{% if i.type == "image" %}"{{ i.url | localize_media_url }}", {%endif%}{% endfor %}]
videos: [{% for i in post_metadata.media_attachments %}{% if i.type == "video" %}"{{ i.url | localize_media_url }}", {%endif%}{% endfor %}]
---
{% for item in post_metadata.media_attachments %}
{% if item.type == "image" %}
<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
{% endif %}
{% endfor %}
{% for item in post_metadata.media_attachments %}
{% if item.type == "video" %}
<video controls width="540px" preload="none" poster="thumbnail.png">
<source src="{{item.url | localize_media_url }}" type="video/mp4">
{% if item.description %}{{item.description}}{% endif %}
</video>
{% endif %}
{% endfor %}
{{ post_metadata.content | filter_mastodon_urls }}

View File

@ -3,11 +3,12 @@ title: "{{ frontmatter.title }}"
date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
draft: false
summary: "{{ frontmatter.summary }}"
author: "{{ frontmatter.author }}"
contributors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
original_link: "{{ frontmatter.original_link }}"
feed_name: "{{ frontmatter.feed_name}}"
categories: ["network", "{{ frontmatter.feed_name}}"]
tags: {{ frontmatter.tags }}
sources: ["timeline", "{{ frontmatter.feed_name}}"]
timelines: {{ frontmatter.timelines }}
hidden: true
---
{{ content }}

View File

@ -6,9 +6,10 @@ uuid: "{{v.uuid}}"
video_duration: "{{ v.duration | duration }} "
video_channel: "{{ v.channel.display_name }}"
channel_url: "{{ v.channel.url }}"
contributors: ["{{ v.account.display_name }}"]
preview_image: "{{ preview_image }}"
images: ["./{{ preview_image }}"]
categories: ["tv","{{ v.channel.display_name }}"]
sources: ["{{ v.channel.display_name }}"]
is_live: {{ v.is_live }}
---

View File

@ -5,6 +5,7 @@ from hashlib import md5
from ast import literal_eval as make_tuple
from pathlib import Path
from urllib.parse import urlparse
from re import sub
import arrow
import feedparser
@ -13,7 +14,7 @@ import requests
from bs4 import BeautifulSoup
from slugify import slugify
from re import compile as re_compile
import saneyaml
yamlre = re_compile('"')
def write_etag(feed_name, feed_data):
@ -84,28 +85,15 @@ def create_frontmatter(entry):
for t in entry.tags:
tags.append(t['term'])
if "opds" in entry:
frontmatter = {
'title':entry.title,
'date': published.format(),
'summary': summary,
'author': ",".join(authors),
'publisher': publisher,
'original_link': entry.links[0]['href'].replace('opds/cover/','books/'),
'feed_name': entry['feed_name'],
'tags': str(tags),
'category': "books"
}
else:
frontmatter = {
frontmatter = {
'title':entry.title,
'date': published.format(),
'summary': '',
'author': author,
'original_link': entry.link,
'feed_name': entry['feed_name'],
'tags': str(tags)
}
'timelines': str(tags),
}
return frontmatter
@ -120,12 +108,12 @@ def sanitize_yaml (frontmatter):
            # some fields are lists
            l = []
            for i in v:
                i = saneyaml.load(i)
                i = yamlre.sub('\\"', i)
                l.append(i)
            frontmatter[k] = l
        else:
            v = saneyaml.load(v)
            v = yamlre.sub('\\"', v)
            frontmatter[k] = v
    return frontmatter
@ -149,7 +137,7 @@ def create_post(post_dir, entry):
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    template = env.get_template("timeline.md")
    with open(os.path.join(post_dir, "index.html"), "w") as f:  # n.b. .html
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=parsed_content)
        f.write(post)
@ -195,9 +183,10 @@ def parse_posts(post_dir, post_content):
    allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
    for img in soup(["img", "object"]):
        local_image = grab_media(post_dir, img["src"])
        if img["src"] != local_image:
            img["src"] = local_image
        if img.get("src") != None:
            local_image = grab_media(post_dir, img["src"])
            if img["src"] != local_image:
                img["src"] = local_image
    for iframe in soup(["iframe"]):
        if not any(source in iframe["src"] for source in allowed_iframe_sources):
@ -289,7 +278,7 @@ def create_opds_post(post_dir, entry):
def main():
    feed_urls = open("feeds_list.txt", "r").read().splitlines()
    feed_urls = open("feeds_list_timeline.txt", "r").read().splitlines()
    start = time.time()

View File

@ -1,16 +0,0 @@
---
date: "{{ post_metadata.created_at }}" #2021-06-10T10:46:33+02:00
draft: false
author: "{{ post_metadata.account.display_name }}"
avatar: "{{ post_metadata.account.avatar }}"
categories: ["shouts"]
images: [{% for i in post_metadata.media_attachments %} "{{ i.url }}", {% endfor %}]
title: "{{ post_metadata.account.display_name }}"
tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
---
{% for item in post_metadata.media_attachments %}
<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
{% endfor %}
{{ post_metadata.content | filter_mastodon_urls }}

64
poetry.lock generated
View File

@ -242,14 +242,6 @@ category = "main"
optional = false
python-versions = "*"
[[package]]
name = "pyyaml"
version = "6.0"
description = "YAML parser and emitter for Python"
category = "main"
optional = false
python-versions = ">=3.6"
[[package]]
name = "requests"
version = "2.27.1"
@ -268,21 +260,6 @@ urllib3 = ">=1.21.1,<1.27"
socks = ["PySocks (>=1.5.6,!=1.5.7)", "win-inet-pton"]
use_chardet_on_py3 = ["chardet (>=3.0.2,<5)"]
[[package]]
name = "saneyaml"
version = "0.5.2"
description = "Read and write readable YAML safely preserving order and avoiding bad surprises with unwanted infered type conversions. This library is a PyYaml wrapper with sane behaviour to read and write readable YAML safely, typically when used for configuration."
category = "main"
optional = false
python-versions = "<4,>=3.6.*"
[package.dependencies]
PyYAML = "*"
[package.extras]
docs = ["Sphinx (>=3.3.1)", "sphinx-rtd-theme (>=0.5.0)", "doc8 (>=0.8.1)"]
testing = ["pytest (>=6)", "pytest-xdist (>=2)"]
[[package]]
name = "sgmllib3k"
version = "1.0.0"
@ -342,7 +319,7 @@ socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"]
[metadata]
lock-version = "1.1"
python-versions = "^3.9"
content-hash = "86ebded9dbd151b57502b40d3e58d6d92f837bc776184afa84d297c40d6daa7a"
content-hash = "c5c987253f949737210f4a3d3c3c24b0affd4a9c7d06de386c9bd514c592db8b"
[metadata.files]
arrow = [
@ -492,49 +469,10 @@ pytz = [
{file = "pytz-2021.3-py2.py3-none-any.whl", hash = "sha256:3672058bc3453457b622aab7a1c3bfd5ab0bdae451512f6cf25f64ed37f5b87c"},
{file = "pytz-2021.3.tar.gz", hash = "sha256:acad2d8b20a1af07d4e4c9d2e9285c5ed9104354062f275f3fcd88dcef4f1326"},
]
pyyaml = [
{file = "PyYAML-6.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:d4db7c7aef085872ef65a8fd7d6d09a14ae91f691dec3e87ee5ee0539d516f53"},
{file = "PyYAML-6.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9df7ed3b3d2e0ecfe09e14741b857df43adb5a3ddadc919a2d94fbdf78fea53c"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:77f396e6ef4c73fdc33a9157446466f1cff553d979bd00ecb64385760c6babdc"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:a80a78046a72361de73f8f395f1f1e49f956c6be882eed58505a15f3e430962b"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:f84fbc98b019fef2ee9a1cb3ce93e3187a6df0b2538a651bfb890254ba9f90b5"},
{file = "PyYAML-6.0-cp310-cp310-win32.whl", hash = "sha256:2cd5df3de48857ed0544b34e2d40e9fac445930039f3cfe4bcc592a1f836d513"},
{file = "PyYAML-6.0-cp310-cp310-win_amd64.whl", hash = "sha256:daf496c58a8c52083df09b80c860005194014c3698698d1a57cbcfa182142a3a"},
{file = "PyYAML-6.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:897b80890765f037df3403d22bab41627ca8811ae55e9a722fd0392850ec4d86"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:50602afada6d6cbfad699b0c7bb50d5ccffa7e46a3d738092afddc1f9758427f"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:48c346915c114f5fdb3ead70312bd042a953a8ce5c7106d5bfb1a5254e47da92"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:98c4d36e99714e55cfbaaee6dd5badbc9a1ec339ebfc3b1f52e293aee6bb71a4"},
{file = "PyYAML-6.0-cp36-cp36m-win32.whl", hash = "sha256:0283c35a6a9fbf047493e3a0ce8d79ef5030852c51e9d911a27badfde0605293"},
{file = "PyYAML-6.0-cp36-cp36m-win_amd64.whl", hash = "sha256:07751360502caac1c067a8132d150cf3d61339af5691fe9e87803040dbc5db57"},
{file = "PyYAML-6.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:819b3830a1543db06c4d4b865e70ded25be52a2e0631ccd2f6a47a2822f2fd7c"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:473f9edb243cb1935ab5a084eb238d842fb8f404ed2193a915d1784b5a6b5fc0"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0ce82d761c532fe4ec3f87fc45688bdd3a4c1dc5e0b4a19814b9009a29baefd4"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:231710d57adfd809ef5d34183b8ed1eeae3f76459c18fb4a0b373ad56bedcdd9"},
{file = "PyYAML-6.0-cp37-cp37m-win32.whl", hash = "sha256:c5687b8d43cf58545ade1fe3e055f70eac7a5a1a0bf42824308d868289a95737"},
{file = "PyYAML-6.0-cp37-cp37m-win_amd64.whl", hash = "sha256:d15a181d1ecd0d4270dc32edb46f7cb7733c7c508857278d3d378d14d606db2d"},
{file = "PyYAML-6.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:0b4624f379dab24d3725ffde76559cff63d9ec94e1736b556dacdfebe5ab6d4b"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:213c60cd50106436cc818accf5baa1aba61c0189ff610f64f4a3e8c6726218ba"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:9fa600030013c4de8165339db93d182b9431076eb98eb40ee068700c9c813e34"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:277a0ef2981ca40581a47093e9e2d13b3f1fbbeffae064c1d21bfceba2030287"},
{file = "PyYAML-6.0-cp38-cp38-win32.whl", hash = "sha256:d4eccecf9adf6fbcc6861a38015c2a64f38b9d94838ac1810a9023a0609e1b78"},
{file = "PyYAML-6.0-cp38-cp38-win_amd64.whl", hash = "sha256:1e4747bc279b4f613a09eb64bba2ba602d8a6664c6ce6396a4d0cd413a50ce07"},
{file = "PyYAML-6.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:055d937d65826939cb044fc8c9b08889e8c743fdc6a32b33e2390f66013e449b"},
{file = "PyYAML-6.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:e61ceaab6f49fb8bdfaa0f92c4b57bcfbea54c09277b1b4f7ac376bfb7a7c174"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d67d839ede4ed1b28a4e8909735fc992a923cdb84e618544973d7dfc71540803"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cba8c411ef271aa037d7357a2bc8f9ee8b58b9965831d9e51baf703280dc73d3"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:40527857252b61eacd1d9af500c3337ba8deb8fc298940291486c465c8b46ec0"},
{file = "PyYAML-6.0-cp39-cp39-win32.whl", hash = "sha256:b5b9eccad747aabaaffbc6064800670f0c297e52c12754eb1d976c57e4f74dcb"},
{file = "PyYAML-6.0-cp39-cp39-win_amd64.whl", hash = "sha256:b3d267842bf12586ba6c734f89d1f5b871df0273157918b0ccefa29deb05c21c"},
{file = "PyYAML-6.0.tar.gz", hash = "sha256:68fb519c14306fec9720a2a5b45bc9f0c8d1b9c72adf45c37baedfcd949c35a2"},
]
requests = [
{file = "requests-2.27.1-py2.py3-none-any.whl", hash = "sha256:f22fa1e554c9ddfd16e6e41ac79759e17be9e492b3587efa038054674760e72d"},
{file = "requests-2.27.1.tar.gz", hash = "sha256:68d7c56fd5a8999887728ef304a6d12edc7be74f1cfa47714fc8b414525c9a61"},
]
saneyaml = [
{file = "saneyaml-0.5.2-py3-none-any.whl", hash = "sha256:e54ed827973647ee9be8e8c091536b55ad22b3f9b1296e36701a3544822e7eac"},
{file = "saneyaml-0.5.2.tar.gz", hash = "sha256:d6074f1959041342ab41d74a6f904720ffbcf63c94467858e0e22e17e3c43d41"},
]
sgmllib3k = [
{file = "sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9"},
]

pyproject.toml

@@ -1,9 +1,9 @@
 [tool.poetry]
-name = "lumbunglib"
+name = "konfluks"
 version = "0.1.0"
-description = "Python lib which powers lumbung[dot]space automation"
-authors = ["rra", "decentral1se"]
-license = "GPLv3+"
+description = "Brings together small and dispersed streams of web content from different applications and websites together in a single large stream."
+authors = ["rra", "decentral1se", "knoflook"]
+license = "AGPLv3+"

 [tool.poetry.dependencies]
 python = "^3.9"
@@ -16,7 +16,6 @@ peertube = {git = "https://framagit.org/framasoft/peertube/clients/python.git"}
 feedparser = "^6.0.8"
 bs4 = "^0.0.1"
 "Mastodon.py" = "^1.5.1"
-saneyaml = "^0.5.2"

 [tool.poetry.dev-dependencies]
 poetry2setup = "^1.0.0"
@@ -26,7 +25,8 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry.scripts]
-lumbunglib-cal = "lumbunglib.cloudcal:main"
-lumbunglib-vid = "lumbunglib.video:main"
-lumbunglib-feed = "lumbunglib.feed:main"
-lumbunglib-hash = "lumbunglib.hashtag:main"
+konfluks-cal = "konfluks.calendars:main"
+konfluks-vid = "konfluks.video:main"
+konfluks-feed = "konfluks.feed:main"
+konfluks-timeline = "konfluks.timeline:main"
+konfluks-hash = "konfluks.hashtag:main"

setup.py

@@ -2,10 +2,10 @@
 from setuptools import setup

 packages = \
-['lumbunglib']
+['konfluks']

 package_data = \
-{'': ['*'], 'lumbunglib': ['templates/*']}
+{'': ['*'], 'konfluks': ['templates/*']}

 install_requires = \
 ['Jinja2>=3.0.3,<4.0.0',
@@ -20,13 +20,14 @@ install_requires = \
 'requests>=2.26.0,<3.0.0']

 entry_points = \
-{'console_scripts': ['lumbunglib-cal = lumbunglib.cloudcal:main',
-'lumbunglib-feed = lumbunglib.feed:main',
-'lumbunglib-hash = lumbunglib.hashtag:main',
-'lumbunglib-vid = lumbunglib.video:main']}
+{'console_scripts': ['konfluks-cal = konfluks.calendars:main',
+'konfluks-feed = konfluks.feed:main',
+'konfluks-timeline = lumbunglib.timeline:main',
+'konfluks-hash = konfluks.hashtag:main',
+'konfluks-vid = konfluks.video:main']}

 setup_kwargs = {
-'name': 'lumbunglib',
+'name': 'konfluks',
 'version': '0.1.0',
 'description': 'Python lib which powers lumbung[dot]space automation',
 'long_description': None,
@@ -44,4 +45,3 @@ setup_kwargs = {
 setup(**setup_kwargs)