29 Commits

Author SHA1 Message Date

rra 845a54787b Update 'README.md' 2022-06-02 09:29:20 +02:00
rra f162bb946a Update 'README.md' (correcting markup / styling) 2022-06-02 09:28:37 +02:00
rra 00f795f16d rename project to konfluks for legibility, add docs 2022-06-02 09:23:58 +02:00
rra b0f77831bd add 'contributors' as metadata category 2022-06-02 06:45:54 +02:00
rra 5ba944b6d1 Merge pull request 'handle feeds with enclosures (featured media / podcasts)' (#35) from r/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#35) 2022-06-01 08:05:36 +02:00
rra ad591ea9cf add more checks for failures 2022-06-01 05:51:25 +02:00
rra 9c824fcd3f Merge remote-tracking branch 'upstream/master' 2022-05-29 14:45:30 +02:00
rra cab36c8ac6 add less generic headers 2022-05-29 14:45:11 +02:00
rra c84a975887 add reason for failure 2022-05-29 12:30:55 +02:00
2ca61c6197 Merge pull request 'accomodate authors as taxonomy' (#34) from r/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#34) 2022-05-27 13:24:32 +02:00
rra fecf5cd64e add rudimentary support for enclosures & featured images 2022-05-24 15:39:11 +02:00
rra 6e64d64772 only return an author if there is one 2022-05-24 12:19:50 +02:00
rra 3b390d1ecb change template to authors to accomodate author taxonomy 2022-05-24 12:19:50 +02:00
rra ce3bfc58b0 remove orphaned " 2022-05-24 12:19:50 +02:00
c5af3610a0 Merge pull request 'feed: assign pen category' (#33) from pen-category into master (Reviewed-on: ruangrupa/lumbunglib#33) 2022-04-26 08:30:34 +02:00
3ea798b301 feed: assign pen category 2022-04-21 14:17:12 +02:00
7d3863641d Revert "feat: sanitize all yaml" (reverts commit 2fbc952a72) 2022-04-13 12:48:42 +02:00
f6a1a684c0 Revert "fix: don't escape some characters" (reverts commit cf8b1ff7e9) 2022-04-13 12:48:20 +02:00
58afd189a7 Revert "feed: move to saneyaml" (reverts commit a809433410) 2022-04-13 12:48:13 +02:00
19ab610dfc Merge pull request 'feat: sanitize all yaml' (#28) from knoflook/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#28) 2022-04-12 13:44:34 +02:00
a809433410 feed: move to saneyaml 2022-04-12 13:41:34 +02:00
cf8b1ff7e9 fix: don't escape some characters 2022-04-11 13:49:53 +02:00
2fbc952a72 feat: sanitize all yaml 2022-04-11 13:49:53 +02:00
bac9bbd7b3 vid: remove all vids if API down 2022-04-11 13:46:52 +02:00
8c4a36791f autoformatter says change this so i change 2022-04-06 09:44:19 +02:00
dfa4b40d52 more hashtags 2022-04-06 09:44:14 +02:00
rra 0aaa711538 Update 'lumbunglib/hashtag.py' (added extra tags on request) 2022-04-03 16:40:22 +02:00
c40f740f50 Merge remote-tracking branch 'rra/master' 2022-03-04 15:39:07 +01:00
f69c092548 feed: escape quotation marks 2022-02-17 16:15:43 +01:00
14 changed files with 246 additions and 99 deletions

@@ -1,8 +1,60 @@
-# lumbunglib
-
-> Python lib which powers `lumbung.space` automation
-
-## hacking
+![Konfluks logo is a stylized and schematic representation of a drainage basin](https://git.autonomic.zone/r/konfluks/raw/branch/konfluks-renaming/docs/konfluks.svg)
+
+# Konfluks
+
+A drainage basin is a geographical feature that collects all precipitation in an area, first into smaller streams and finally together into a large river. Similarly, Konfluks brings small and dispersed streams of web content from different applications and websites together in a single large stream.
+
+Specifically, Konfluks turns Peertube videos, iCal calendar events, other websites (through their RSS and OPDS feeds) and Mastodon posts under a hashtag into Hugo page bundles. This allows one to publish from diverse sources to a single stream.
+
+Konfluks was first made by Roel Roscam Abbing as part of [lumbung.space](https://lumbung.space), together with ruangrupa and Autonomic.
+
+## Philosophy
+
+Konfluks tries to act as a mirror representation of the input sources. That means that whenever something remote is deleted, changed or becomes unavailable, it is also changed or deleted by Konfluks.
+
+Konfluks tries to preserve intention. That means the above, but also requiring explicit ways of publishing.
+
+Konfluks works by periodically polling the remote sources, taking care not to duplicate work. It caches files, asks for last-modified headers, and skips things it already has. This makes every poll as fast and as light as possible.
+
+Konfluks is written for clarity, not brevity or cleverness.
+
+Konfluks is extendable, a work in progress and a messy undertaking.
+
+## High-level overview
+
+Konfluks consists of different Python scripts which each poll a particular service, say, a Peertube server, to download information and convert it into [Hugo Page Bundles](https://gohugo.io/content-management/page-bundles/).
+
+Each script that is part of Konfluks essentially does the following:
+
+* Parse a source and request posts/updates/videos/a feed
+* Take care of publish queues
+* Create a Hugo post for each item returned, by:
+  * Making a folder per post in the `output` directory
+  * Formatting post metadata as [Hugo Post Frontmatter](https://gohugo.io/content-management/front-matter/) in a file called `index.md`
+  * Grabbing local copies of media and saving them in the post folder
+  * Adding the post content to `index.md`
+  * According to jinja2 templates (see `Konfluks/templates/`)
+
+Where possible, the page bundles created are given human-friendly names.
+
+Here is a typical output structure:
+
+```
+user@server: ~/Konfluks/output: tree tv/
+tv/
+├── forum-27an-mother-earth-353f93f3-5fee-49d6-b71d-8aef753f7041
+│   ├── 86ccae63-3df9-443c-91f3-edce146055db.jpg
+│   └── index.md
+├── keroncong-tugu-cafrinho-live-at-ruru-gallery-ruangrupa-jakarta-19-august-2014-e6d5bb2a-d77f-4a00-a449-992a579c8c0d
+│   ├── 32291aa2-a391-4219-a413-87521ff373ba.jpg
+│   └── index.md
+├── lecture-series-1-camp-notes-on-education-8d54d3c9-0322-42af-ab6e-e954d251e076
+│   ├── 0f3c835b-42c2-48a3-a2a3-a75ddac8688a.jpg
+│   └── index.md
+```
+
+## Hacking
 
 Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-install-instructions):
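
The overview in the new README boils down to one repeated move: turn a feed item into a folder holding an `index.md` with frontmatter. A minimal sketch of that pattern, assuming hypothetical names (`write_page_bundle` and the example slug and metadata are invented for illustration, not Konfluks API):

```python
import json
import os

def write_page_bundle(output_dir, slug, frontmatter, content):
    """Make output/<slug>/ and write frontmatter + content to index.md."""
    post_dir = os.path.join(output_dir, slug)
    os.makedirs(post_dir, exist_ok=True)
    lines = ["---"]
    for key, value in frontmatter.items():
        # json.dumps yields valid YAML scalars for strings/bools/lists
        lines.append("{}: {}".format(key, json.dumps(value)))
    lines += ["---", "", content, ""]
    with open(os.path.join(post_dir, "index.md"), "w") as post_file:
        post_file.write("\n".join(lines))

write_page_bundle(
    "output",
    "an-example-post-0f3c835b",
    {"title": "An example post", "draft": False, "tags": ["art"]},
    "Post body goes here.",
)
```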

docs/konfluks.svg: new file, 31 additions, 29 KiB. File diff suppressed because one or more lines are too long.

@@ -5,6 +5,7 @@ from hashlib import md5
 from ast import literal_eval as make_tuple
 from pathlib import Path
 from urllib.parse import urlparse
+from re import sub
 
 import arrow
 import feedparser
@@ -84,6 +85,15 @@ def create_frontmatter(entry):
         for t in entry.tags:
             tags.append(t['term'])
 
+    if "featured_image" in entry:
+        featured_image = entry.featured_image
+    else:
+        featured_image = ''
+
+    card_type = "network"
+    if entry.feed_name == "pen.lumbung.space":
+        card_type = "pen"
+
     if "opds" in entry:
         frontmatter = {
             'title':entry.title,
@@ -104,7 +114,9 @@ def create_frontmatter(entry):
             'author': author,
             'original_link': entry.link,
             'feed_name': entry['feed_name'],
-            'tags': str(tags)
+            'tags': str(tags),
+            'card_type': card_type,
+            'featured_image': featured_image
         }
 
     return frontmatter
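
With the two added keys, the frontmatter dict returned for a plain RSS entry now looks roughly like this. All values are invented for illustration; the key set is inferred from this hunk and from the template added further down:

```python
# Illustrative shape of what create_frontmatter returns after this change;
# every value here is made up.
frontmatter = {
    'title': 'An example post',
    'date': '2022-06-01T08:00:00+02:00',
    'summary': 'First paragraph of the entry',
    'author': 'rra',
    'original_link': 'https://example.org/an-example-post',
    'feed_name': 'example.org',
    'tags': "['art', 'publishing']",
    'card_type': 'network',        # becomes "pen" for pen.lumbung.space
    'featured_image': 'cover.jpg'  # '' when no image enclosure was found
}
```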
@@ -130,11 +142,33 @@ def sanitize_yaml (frontmatter):
 
     return frontmatter
 
+def parse_enclosures(post_dir, entry):
+    """
+    Parses feed enclosures which are featured media
+    Can be featured image but also podcast entries
+    https://pythonhosted.org/feedparser/reference-entry-enclosures.html
+    """
+    #TODO parse more than images
+    #TODO handle the fact it could be multiple items
+    for e in entry.enclosures:
+        if "type" in e:
+            print("found enclosed media", e.type)
+            if "image/" in e.type:
+                featured_image = grab_media(post_dir, e.href)
+                entry["featured_image"] = featured_image
+            else:
+                print("FIXME:ignoring enclosed", e.type)
+    return entry
+
 def create_post(post_dir, entry):
     """
     write hugo post based on RSS entry
     """
 
+    if "enclosures" in entry:
+        entry = parse_enclosures(post_dir, entry)
+
     frontmatter = create_frontmatter(entry)
 
     if not os.path.exists(post_dir):
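
For reference, feedparser surfaces RSS enclosures on each entry as a list of dicts carrying `href`, `type` and `length` keys. A small sketch of the shape `parse_enclosures` iterates over (the feed URL is a placeholder, not a Konfluks source):

```python
import feedparser

feed = feedparser.parse("https://example.org/feed.rss")  # placeholder URL
for entry in feed.entries:
    for enclosure in getattr(entry, "enclosures", []):
        # parse_enclosures above branches on the MIME type the same way
        if enclosure.get("type", "").startswith("image/"):
            print("featured image candidate:", enclosure.get("href"))
        else:
            print("ignoring enclosed media of type:", enclosure.get("type"))
```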
@@ -163,18 +197,25 @@ def grab_media(post_directory, url, prefered_name=None):
     """
     media_item = urlparse(url).path.split('/')[-1]
 
+    headers = {
+        'User-Agent': 'https://git.autonomic.zone/ruangrupa/lumbunglib',
+        'From': 'info@lumbung.space'  # This is another valid field
+    }
     if prefered_name:
         media_item = prefered_name
 
     try:
         if not os.path.exists(os.path.join(post_directory, media_item)):
             #TODO: stream is true is a conditional so we could check the headers for things, mimetype etc
-            response = requests.get(url, stream=True)
+            response = requests.get(url, headers=headers, stream=True)
             if response.ok:
                 with open(os.path.join(post_directory, media_item), 'wb') as media_file:
                     shutil.copyfileobj(response.raw, media_file)
                 print('Downloaded media item', media_item)
                 return media_item
+            else:
+                print("Download failed", response.status_code)
+                return url
             return media_item
         elif os.path.exists(os.path.join(post_directory, media_item)):
             return media_item
@@ -195,9 +236,10 @@ def parse_posts(post_dir, post_content):
     allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
 
     for img in soup(["img", "object"]):
-        local_image = grab_media(post_dir, img["src"])
-        if img["src"] != local_image:
-            img["src"] = local_image
+        if img.get("src") != None:
+            local_image = grab_media(post_dir, img["src"])
+            if img["src"] != local_image:
+                img["src"] = local_image
 
     for iframe in soup(["iframe"]):
         if not any(source in iframe["src"] for source in allowed_iframe_sources):
@@ -228,11 +270,12 @@ def grab_feed(feed_url):
         print(e)
         return False
 
-    print(data.status, feed_url)
-    if data.status == 200:
-        # 304 means the feed has not been modified since we last checked
-        write_etag(feed_name, data)
-        return data
+    if "status" in data:
+        print(data.status, feed_url)
+        if data.status == 200:
+            # 304 means the feed has not been modified since we last checked
+            write_etag(feed_name, data)
+            return data
 
     return False
 
 def create_opds_post(post_dir, entry):
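
The new guard matters because feedparser only sets `status` when the feed was actually fetched over HTTP, so indexing it unconditionally can raise. For context, a sketch of the conditional-request pattern `grab_feed` builds on; the etag bookkeeping here is hypothetical shorthand for Konfluks' own `write_etag` helper:

```python
import feedparser

def poll(feed_url, etag=None):
    # Passing a stored etag makes feedparser send If-None-Match, so an
    # unchanged feed answers 304 with no body.
    data = feedparser.parse(feed_url, etag=etag)
    # `status` only exists when the feed was fetched over HTTP
    if "status" in data:
        if data.status == 304:
            return None  # unchanged since the previous poll
        if data.status == 200:
            # data.etag (when the server sends one) is what a helper
            # like write_etag would persist for the next poll
            return data
    return None
```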

@@ -19,13 +19,16 @@ hashtags = [
     "majelisakbar",
     "warungkopi",
     "lumbungkios",
+    "kassel_ecosystem",
+    "ruruhaus",
+    "offbeatentrack_kassel",
+    "lumbungofpublishers",
 ]
 
 def login_mastodon_bot():
     mastodon = Mastodon(
-        access_token=os.environ.get("MASTODON_AUTH_TOKEN"),
-        api_base_url = instance
+        access_token=os.environ.get("MASTODON_AUTH_TOKEN"), api_base_url=instance
     )
 
     return mastodon
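
`login_mastodon_bot` returns an authenticated Mastodon.py client; fetching one of the hashtags above then looks roughly like this. The instance URL is a placeholder, and `timeline_hashtag` is standard Mastodon.py API:

```python
import os
from mastodon import Mastodon

mastodon = Mastodon(
    access_token=os.environ.get("MASTODON_AUTH_TOKEN"),
    api_base_url="https://social.example.org",  # placeholder instance
)
# timeline_hashtag returns the most recent statuses tagged #lumbungkios
for status in mastodon.timeline_hashtag("lumbungkios", limit=20):
    print(status["id"], status["account"]["acct"])
```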
@@ -69,9 +72,9 @@ def create_post(post_directory, post_metadata):
     template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
     env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
 
-    name = post_metadata['account']['display_name']
+    name = post_metadata["account"]["display_name"]
     name = sub('"', '\\"', name)
-    post_metadata['account']['display_name'] = name
+    post_metadata["account"]["display_name"] = name
 
     env.filters["localize_media_url"] = localize_media_url
     env.filters["filter_mastodon_urls"] = filter_mastodon_urls
@@ -136,7 +139,10 @@ def main():
                     create_post(post_dir, post_metadata)
                     all_existing_posts.append(str(post_metadata["id"]))
                 else:
-                    print("not pulling post %s (post is local only)" % (post_metadata["id"]))
+                    print(
+                        "not pulling post %s (post is local only)"
+                        % (post_metadata["id"])
+                    )
 
             # if we already have the post do nothing, possibly update
             elif str(post_metadata["id"]) in existing_posts:
@@ -145,7 +151,10 @@ def main():
                     str(post_metadata["id"])
                 )  # create list of posts which have not been returned in the feed
             elif str(post_metadata["id"]) in all_existing_posts:
-                print("skipping post %s as it was already pulled with a different hashtag." % (str(post_metadata["id"])))
+                print(
+                    "skipping post %s as it was already pulled with a different hashtag."
+                    % (str(post_metadata["id"]))
+                )
 
     for post in existing_posts:
         print(

@@ -0,0 +1,15 @@
+---
+title: "{{ frontmatter.title }}"
+date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
+draft: false
+summary: "{{ frontmatter.summary }}"
+authors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
+original_link: "{{ frontmatter.original_link }}"
+feed_name: "{{ frontmatter.feed_name}}"
+categories: ["{{ frontmatter.card_type }}", "{{ frontmatter.feed_name}}"]
+contributors: ["{{ frontmatter.feed_name}}"]
+tags: {{ frontmatter.tags }}
+{% if frontmatter.featured_image %}featured_image: "{{frontmatter.featured_image}}"{% endif %}
+---
+
+{{ content }}
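
To preview what this template emits, a quick jinja2 render sketch; the template file name and all frontmatter values are assumptions for the example, not the project's configuration:

```python
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("konfluks/templates"))
template = env.get_template("post_template.md")  # hypothetical file name
print(template.render(
    frontmatter={
        "title": "An example post",
        "date": "2022-06-01T08:00:00+02:00",
        "summary": "One feed item",
        "author": "rra",
        "original_link": "https://example.org/an-example-post",
        "feed_name": "example.org",
        "card_type": "network",
        "tags": '["art", "publishing"]',
        "featured_image": "cover.jpg",
    },
    content="Body of the post.",
))
```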

@@ -0,0 +1,17 @@
+---
+date: {{ post_metadata.created_at }} #2021-06-10T10:46:33+02:00
+draft: false
+authors: ["{{ post_metadata.account.display_name }}"]
+contributors: ["{{ post_metadata.account.acct}}"]
+avatar: {{ post_metadata.account.avatar }}
+categories: ["shouts"]
+images: [{% for i in post_metadata.media_attachments %} {{ i.url }}, {% endfor %}]
+title: {{ post_metadata.account.display_name }}
+tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
+---
+
+{% for item in post_metadata.media_attachments %}
+<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
+{% endfor %}
+
+{{ post_metadata.content | filter_mastodon_urls }}

@@ -6,6 +6,7 @@ uuid: "{{v.uuid}}"
 video_duration: "{{ v.duration | duration }} "
 video_channel: "{{ v.channel.display_name }}"
 channel_url: "{{ v.channel.url }}"
+contributors: ["{{ v.account.display_name }}"]
 preview_image: "{{ preview_image }}"
 images: ["./{{ preview_image }}"]
 categories: ["tv","{{ v.channel.display_name }}"]

@@ -102,52 +102,60 @@ def main():
     v = peertube.VideoApi(client)
 
     count = 100
     page = 0
-    response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-
-    videos = response.to_dict()
-    total = videos['total']
-    videos = videos['data']
-
-    total -= count
-    if total > 0:
-        to_download = total // count
-        last_page = total % count
-        for i in range(to_download):
-            page += 1
-            response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-            videos += response.to_dict()['data']
-        if last_page > 0:
-            page += 1
-            response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-            videos += response.to_dict()['data'][-1*last_page:]
-
-    output_dir = os.environ.get("OUTPUT_DIR")
-    if not os.path.exists(output_dir):
-        os.mkdir(output_dir)
-
-    existing_posts = os.listdir(output_dir)
-
-    for video_metadata in videos:
-        post_name = slugify(video_metadata["name"]) + "-" + video_metadata["uuid"]
-        post_dir = os.path.join(output_dir, post_name)
-
-        if (
-            post_name not in existing_posts
-        ):  # if there is a video we dont already have, make it
-            print(
-                "New: ", video_metadata["name"], "({})".format(video_metadata["uuid"])
-            )
-            create_post(post_dir, video_metadata, host)
-
-        elif (
-            post_name in existing_posts
-        ):  # if we already have the video do nothing, possibly update
-            update_post(post_dir, video_metadata, host)
-            existing_posts.remove(
-                post_name
-            )  # create list of posts which have not been returned by peertube
+    try:
+        response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+
+        videos = response.to_dict()
+        total = videos['total']
+        videos = videos['data']
+
+        total -= count
+        if total > 0:
+            to_download = total // count
+            last_page = total % count
+            for i in range(to_download):
+                page += 1
+                response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+                videos += response.to_dict()['data']
+            if last_page > 0:
+                page += 1
+                response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+                videos += response.to_dict()['data'][-1*last_page:]
+
+        output_dir = os.environ.get("OUTPUT_DIR")
+        if not os.path.exists(output_dir):
+            os.mkdir(output_dir)
+
+        existing_posts = os.listdir(output_dir)
+
+        for video_metadata in videos:
+            post_name = slugify(video_metadata["name"]) + "-" + video_metadata["uuid"]
+            post_dir = os.path.join(output_dir, post_name)
+
+            if (
+                post_name not in existing_posts
+            ):  # if there is a video we dont already have, make it
+                print(
+                    "New: ", video_metadata["name"], "({})".format(video_metadata["uuid"])
+                )
+                create_post(post_dir, video_metadata, host)
+
+            elif (
+                post_name in existing_posts
+            ):  # if we already have the video do nothing, possibly update
+                update_post(post_dir, video_metadata, host)
+                existing_posts.remove(
+                    post_name
+                )  # create list of posts which have not been returned by peertube
+    except:
+        print("didn't get a response from peertube, instance might have been taken down or made private. removing all posts.")
+        output_dir = os.environ.get("OUTPUT_DIR")
+        if not os.path.exists(output_dir):
+            os.mkdir(output_dir)
+        existing_posts = os.listdir(output_dir)
 
     for post in existing_posts:
         print("deleted", post)  # rm posts not returned
         shutil.rmtree(os.path.join(output_dir, post))
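
The paging arithmetic inside the `try` block is easiest to see with concrete numbers; a worked example with invented values:

```python
# Worked example of the paging arithmetic above, numbers invented:
total = 250  # videos the server reports
count = 100  # page size used by main()

total -= count                 # first request already fetched 100 -> 150 left
to_download = total // count   # 1 more full page of 100
last_page = total % count      # 50 items on a final partial page
assert (to_download, last_page) == (1, 50)
```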

@@ -1,13 +0,0 @@
----
-title: "{{ frontmatter.title }}"
-date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
-draft: false
-summary: "{{ frontmatter.summary }}"
-author: "{{ frontmatter.author }}"
-original_link: "{{ frontmatter.original_link }}"
-feed_name: "{{ frontmatter.feed_name}}"
-categories: ["network", "{{ frontmatter.feed_name}}"]
-tags: {{ frontmatter.tags }}
----
-
-{{ content }}

@@ -1,16 +0,0 @@
----
-date: "{{ post_metadata.created_at }}" #2021-06-10T10:46:33+02:00
-draft: false
-author: "{{ post_metadata.account.display_name }}"
-avatar: "{{ post_metadata.account.avatar }}"
-categories: ["shouts"]
-images: [{% for i in post_metadata.media_attachments %} "{{ i.url }}", {% endfor %}]
-title: "{{ post_metadata.account.display_name }}"
-tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
----
-
-{% for item in post_metadata.media_attachments %}
-<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
-{% endfor %}
-
-{{ post_metadata.content | filter_mastodon_urls }}

@@ -1,9 +1,9 @@
 [tool.poetry]
-name = "lumbunglib"
+name = "konfluks"
 version = "0.1.0"
 description = "Python lib which powers lumbung[dot]space automation"
 authors = ["rra", "decentral1se"]
-license = "GPLv3+"
+license = "AGPLv3+"
 
 [tool.poetry.dependencies]
 python = "^3.9"

@@ -25,7 +25,7 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
 
 [tool.poetry.scripts]
-lumbunglib-cal = "lumbunglib.cloudcal:main"
-lumbunglib-vid = "lumbunglib.video:main"
-lumbunglib-feed = "lumbunglib.feed:main"
-lumbunglib-hash = "lumbunglib.hashtag:main"
+konfluks-cal = "konfluks.cloudcal:main"
+konfluks-vid = "konfluks.video:main"
+konfluks-feed = "konfluks.feed:main"
+konfluks-hash = "konfluks.hashtag:main"

@@ -2,10 +2,10 @@
 from setuptools import setup
 
 packages = \
-['lumbunglib']
+['konfluks']
 
 package_data = \
-{'': ['*'], 'lumbunglib': ['templates/*']}
+{'': ['*'], 'konfluks': ['templates/*']}
 
 install_requires = \
 ['Jinja2>=3.0.3,<4.0.0',

@@ -20,13 +20,13 @@ install_requires = \
 'requests>=2.26.0,<3.0.0']
 
 entry_points = \
-{'console_scripts': ['lumbunglib-cal = lumbunglib.cloudcal:main',
-                     'lumbunglib-feed = lumbunglib.feed:main',
-                     'lumbunglib-hash = lumbunglib.hashtag:main',
-                     'lumbunglib-vid = lumbunglib.video:main']}
+{'console_scripts': ['konfluks-cal = konfluks.cloudcal:main',
+                     'konfluks-feed = konfluks.feed:main',
+                     'konfluks-hash = konfluks.hashtag:main',
+                     'konfluks-vid = konfluks.video:main']}
 
 setup_kwargs = {
-    'name': 'lumbunglib',
+    'name': 'konfluks',
     'version': '0.1.0',
     'description': 'Python lib which powers lumbung[dot]space automation',
     'long_description': None,