29 Commits

Author SHA1 Message Date

rra 845a54787b Update 'README.md' 2022-06-02 09:29:20 +02:00
rra f162bb946a Update 'README.md' (correcting markup / styling) 2022-06-02 09:28:37 +02:00
rra 00f795f16d rename project to konfluks for legibility, add docs 2022-06-02 09:23:58 +02:00
rra b0f77831bd add 'contributors' as metadata category 2022-06-02 06:45:54 +02:00
rra 5ba944b6d1 Merge pull request 'handle feeds with enclosures (featured media / podcasts)' (#35) from r/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#35) 2022-06-01 08:05:36 +02:00
rra ad591ea9cf add more checks for failures 2022-06-01 05:51:25 +02:00
rra 9c824fcd3f Merge remote-tracking branch 'upstream/master' 2022-05-29 14:45:30 +02:00
rra cab36c8ac6 add less generic headers 2022-05-29 14:45:11 +02:00
rra c84a975887 add reason for failure 2022-05-29 12:30:55 +02:00
2ca61c6197 Merge pull request 'accomodate authors as taxonomy' (#34) from r/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#34) 2022-05-27 13:24:32 +02:00
rra fecf5cd64e add rudimentary support for enclosures & featured images 2022-05-24 15:39:11 +02:00
rra 6e64d64772 only return an author if there is one 2022-05-24 12:19:50 +02:00
rra 3b390d1ecb change template to authors to accomodate author taxonomy 2022-05-24 12:19:50 +02:00
rra ce3bfc58b0 remove orphaned " 2022-05-24 12:19:50 +02:00
c5af3610a0 Merge pull request 'feed: assign pen category' (#33) from pen-category into master (Reviewed-on: ruangrupa/lumbunglib#33) 2022-04-26 08:30:34 +02:00
3ea798b301 feed: assign pen category 2022-04-21 14:17:12 +02:00
7d3863641d Revert "feat: sanitize all yaml" (reverts commit 2fbc952a72) 2022-04-13 12:48:42 +02:00
f6a1a684c0 Revert "fix: don't escape some characters" (reverts commit cf8b1ff7e9) 2022-04-13 12:48:20 +02:00
58afd189a7 Revert "feed: move to saneyaml" (reverts commit a809433410) 2022-04-13 12:48:13 +02:00
19ab610dfc Merge pull request 'feat: sanitize all yaml' (#28) from knoflook/lumbunglib:master into master (Reviewed-on: ruangrupa/lumbunglib#28) 2022-04-12 13:44:34 +02:00
a809433410 feed: move to saneyaml 2022-04-12 13:41:34 +02:00
cf8b1ff7e9 fix: don't escape some characters 2022-04-11 13:49:53 +02:00
2fbc952a72 feat: sanitize all yaml 2022-04-11 13:49:53 +02:00
bac9bbd7b3 vid: remove all vids if API down 2022-04-11 13:46:52 +02:00
8c4a36791f autoformatter says change this so i change 2022-04-06 09:44:19 +02:00
dfa4b40d52 more hashtags 2022-04-06 09:44:14 +02:00
rra 0aaa711538 Update 'lumbunglib/hashtag.py' (added extra tags on request) 2022-04-03 16:40:22 +02:00
c40f740f50 Merge remote-tracking branch 'rra/master' 2022-03-04 15:39:07 +01:00
f69c092548 feed: escape quotation marks 2022-02-17 16:15:43 +01:00
14 changed files with 246 additions and 99 deletions

@@ -1,8 +1,60 @@
-# lumbunglib
-
-> Python lib which powers `lumbung.space` automation
-
-## hacking
+![Konfluks logo is a stylized and schematic representation of a drainage basin](https://git.autonomic.zone/r/konfluks/raw/branch/konfluks-renaming/docs/konfluks.svg)
+
+# Konfluks
+
+A drainage basin is a geographical feature that collects all precipitation in an area, first into smaller streams and finally together into a large river. Similarly, Konfluks brings small and dispersed streams of web content from different applications and websites together in a single large stream.
+
+Specifically, Konfluks turns Peertube videos, iCal calendar events, other websites (through their RSS and OPDS feeds) and Mastodon posts under a hashtag into Hugo page bundles. This allows one to publish from diverse sources to a single stream.
+
+Konfluks was first made by Roel Roscam Abbing as part of [lumbung.space](https://lumbung.space), together with ruangrupa and Autonomic.
+
+## Philosophy
+
+Konfluks tries to act as a mirror representation of the input sources. That means that whenever something remote is deleted, changed or becomes unavailable, it is also changed or deleted by Konfluks.
+
+Konfluks tries to preserve intention. That means the above, but also requiring explicit ways of publishing.
+
+Konfluks works by periodically polling the remote sources, taking care not to duplicate work. It caches files, asks for last-modified headers, and skips things it already has. This makes every poll as fast and as light as possible.
+
+Konfluks is written for clarity, not brevity or cleverness.
+
+Konfluks is extendable, a work in progress and a messy undertaking.
+
+## High-level overview
+
+Konfluks consists of different Python scripts which each poll a particular service, say, a Peertube server, to download information and convert it into [Hugo Page Bundles](https://gohugo.io/content-management/page-bundles/).
+
+Each script that is part of Konfluks essentially does the following:
+
+* Parse a source and request posts/updates/videos/a feed
+* Take care of publish queues
+* Create a Hugo post for each item returned, by:
+  * Making a folder per post in the `output` directory
+  * Formatting post metadata as [Hugo Post Frontmatter](https://gohugo.io/content-management/front-matter/) in a file called `index.md`
+  * Grabbing local copies of media and saving them in the post folder
+  * Adding the post content to `index.md`
+  * According to jinja2 templates (see `Konfluks/templates/`)
+
+Where possible, the page bundles created are given human-friendly names.
+
+Here is a typical output structure:
+
+```
+user@server: ~/Konfluks/output: tree tv/
+tv/
+├── forum-27an-mother-earth-353f93f3-5fee-49d6-b71d-8aef753f7041
+│   ├── 86ccae63-3df9-443c-91f3-edce146055db.jpg
+│   └── index.md
+├── keroncong-tugu-cafrinho-live-at-ruru-gallery-ruangrupa-jakarta-19-august-2014-e6d5bb2a-d77f-4a00-a449-992a579c8c0d
+│   ├── 32291aa2-a391-4219-a413-87521ff373ba.jpg
+│   └── index.md
+├── lecture-series-1-camp-notes-on-education-8d54d3c9-0322-42af-ab6e-e954d251e076
+│   ├── 0f3c835b-42c2-48a3-a2a3-a75ddac8688a.jpg
+│   └── index.md
+```
+
+## Hacking
 
 Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-install-instructions):
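
The overview in the new README boils down to one repeated move: turn a feed item into a folder holding an `index.md` with frontmatter. A minimal sketch of that pattern, assuming hypothetical names (`write_page_bundle` and the example slug and metadata are invented for illustration, not Konfluks API):

```python
import json
import os

def write_page_bundle(output_dir, slug, frontmatter, content):
    """Make output/<slug>/ and write frontmatter + content to index.md."""
    post_dir = os.path.join(output_dir, slug)
    os.makedirs(post_dir, exist_ok=True)
    lines = ["---"]
    for key, value in frontmatter.items():
        # json.dumps yields valid YAML scalars for strings/bools/lists
        lines.append("{}: {}".format(key, json.dumps(value)))
    lines += ["---", "", content, ""]
    with open(os.path.join(post_dir, "index.md"), "w") as post_file:
        post_file.write("\n".join(lines))

write_page_bundle(
    "output",
    "an-example-post-0f3c835b",
    {"title": "An example post", "draft": False, "tags": ["art"]},
    "Post body goes here.",
)
```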

docs/konfluks.svg: new file, 31 additions, 29 KiB. File diff suppressed because one or more lines are too long.

@@ -5,6 +5,7 @@ from hashlib import md5
 from ast import literal_eval as make_tuple
 from pathlib import Path
 from urllib.parse import urlparse
+from re import sub
 
 import arrow
 import feedparser
@@ -84,6 +85,15 @@ def create_frontmatter(entry):
         for t in entry.tags:
             tags.append(t['term'])
 
+    if "featured_image" in entry:
+        featured_image = entry.featured_image
+    else:
+        featured_image = ''
+
+    card_type = "network"
+    if entry.feed_name == "pen.lumbung.space":
+        card_type = "pen"
+
     if "opds" in entry:
         frontmatter = {
             'title':entry.title,
@@ -104,7 +114,9 @@ def create_frontmatter(entry):
             'author': author,
             'original_link': entry.link,
             'feed_name': entry['feed_name'],
-            'tags': str(tags)
+            'tags': str(tags),
+            'card_type': card_type,
+            'featured_image': featured_image
         }
 
     return frontmatter
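
With the two added keys, the frontmatter dict returned for a plain RSS entry now looks roughly like this. All values are invented for illustration; the key set is inferred from this hunk and from the template added further down:

```python
# Illustrative shape of what create_frontmatter returns after this change;
# every value here is made up.
frontmatter = {
    'title': 'An example post',
    'date': '2022-06-01T08:00:00+02:00',
    'summary': 'First paragraph of the entry',
    'author': 'rra',
    'original_link': 'https://example.org/an-example-post',
    'feed_name': 'example.org',
    'tags': "['art', 'publishing']",
    'card_type': 'network',        # becomes "pen" for pen.lumbung.space
    'featured_image': 'cover.jpg'  # '' when no image enclosure was found
}
```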
@@ -130,11 +142,33 @@ def sanitize_yaml (frontmatter):
 
     return frontmatter
 
+def parse_enclosures(post_dir, entry):
+    """
+    Parses feed enclosures which are featured media
+    Can be featured image but also podcast entries
+    https://pythonhosted.org/feedparser/reference-entry-enclosures.html
+    """
+    #TODO parse more than images
+    #TODO handle the fact it could be multiple items
+    for e in entry.enclosures:
+        if "type" in e:
+            print("found enclosed media", e.type)
+            if "image/" in e.type:
+                featured_image = grab_media(post_dir, e.href)
+                entry["featured_image"] = featured_image
+            else:
+                print("FIXME:ignoring enclosed", e.type)
+    return entry
+
 def create_post(post_dir, entry):
     """
     write hugo post based on RSS entry
     """
 
+    if "enclosures" in entry:
+        entry = parse_enclosures(post_dir, entry)
+
     frontmatter = create_frontmatter(entry)
 
     if not os.path.exists(post_dir):
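
For reference, feedparser surfaces RSS enclosures on each entry as a list of dicts carrying `href`, `type` and `length` keys. A small sketch of the shape `parse_enclosures` iterates over (the feed URL is a placeholder, not a Konfluks source):

```python
import feedparser

feed = feedparser.parse("https://example.org/feed.rss")  # placeholder URL
for entry in feed.entries:
    for enclosure in getattr(entry, "enclosures", []):
        # parse_enclosures above branches on the MIME type the same way
        if enclosure.get("type", "").startswith("image/"):
            print("featured image candidate:", enclosure.get("href"))
        else:
            print("ignoring enclosed media of type:", enclosure.get("type"))
```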
@@ -163,18 +197,25 @@ def grab_media(post_directory, url, prefered_name=None):
     """
     media_item = urlparse(url).path.split('/')[-1]
 
+    headers = {
+        'User-Agent': 'https://git.autonomic.zone/ruangrupa/lumbunglib',
+        'From': 'info@lumbung.space'  # This is another valid field
+    }
     if prefered_name:
         media_item = prefered_name
 
     try:
         if not os.path.exists(os.path.join(post_directory, media_item)):
             #TODO: stream is true is a conditional so we could check the headers for things, mimetype etc
-            response = requests.get(url, stream=True)
+            response = requests.get(url, headers=headers, stream=True)
             if response.ok:
                 with open(os.path.join(post_directory, media_item), 'wb') as media_file:
                     shutil.copyfileobj(response.raw, media_file)
                 print('Downloaded media item', media_item)
                 return media_item
+            else:
+                print("Download failed", response.status_code)
+                return url
             return media_item
         elif os.path.exists(os.path.join(post_directory, media_item)):
             return media_item
@@ -195,9 +236,10 @@ def parse_posts(post_dir, post_content):
     allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
 
     for img in soup(["img", "object"]):
-        local_image = grab_media(post_dir, img["src"])
-        if img["src"] != local_image:
-            img["src"] = local_image
+        if img.get("src") != None:
+            local_image = grab_media(post_dir, img["src"])
+            if img["src"] != local_image:
+                img["src"] = local_image
 
     for iframe in soup(["iframe"]):
         if not any(source in iframe["src"] for source in allowed_iframe_sources):
@@ -228,11 +270,12 @@ def grab_feed(feed_url):
         print(e)
         return False
 
-    print(data.status, feed_url)
-    if data.status == 200:
-        # 304 means the feed has not been modified since we last checked
-        write_etag(feed_name, data)
-        return data
+    if "status" in data:
+        print(data.status, feed_url)
+        if data.status == 200:
+            # 304 means the feed has not been modified since we last checked
+            write_etag(feed_name, data)
+            return data
 
     return False
 
 def create_opds_post(post_dir, entry):
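
The new guard matters because feedparser only sets `status` when the feed was actually fetched over HTTP, so indexing it unconditionally can raise. For context, a sketch of the conditional-request pattern `grab_feed` builds on; the etag bookkeeping here is hypothetical shorthand for Konfluks' own `write_etag` helper:

```python
import feedparser

def poll(feed_url, etag=None):
    # Passing a stored etag makes feedparser send If-None-Match, so an
    # unchanged feed answers 304 with no body.
    data = feedparser.parse(feed_url, etag=etag)
    # `status` only exists when the feed was fetched over HTTP
    if "status" in data:
        if data.status == 304:
            return None  # unchanged since the previous poll
        if data.status == 200:
            # data.etag (when the server sends one) is what a helper
            # like write_etag would persist for the next poll
            return data
    return None
```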

@@ -19,13 +19,16 @@ hashtags = [
     "majelisakbar",
     "warungkopi",
     "lumbungkios",
+    "kassel_ecosystem",
+    "ruruhaus",
+    "offbeatentrack_kassel",
+    "lumbungofpublishers",
 ]
 
 def login_mastodon_bot():
     mastodon = Mastodon(
-        access_token=os.environ.get("MASTODON_AUTH_TOKEN"),
-        api_base_url = instance
+        access_token=os.environ.get("MASTODON_AUTH_TOKEN"), api_base_url=instance
     )
 
     return mastodon
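
`login_mastodon_bot` returns an authenticated Mastodon.py client; fetching one of the hashtags above then looks roughly like this. The instance URL is a placeholder, and `timeline_hashtag` is standard Mastodon.py API:

```python
import os
from mastodon import Mastodon

mastodon = Mastodon(
    access_token=os.environ.get("MASTODON_AUTH_TOKEN"),
    api_base_url="https://social.example.org",  # placeholder instance
)
# timeline_hashtag returns the most recent statuses tagged #lumbungkios
for status in mastodon.timeline_hashtag("lumbungkios", limit=20):
    print(status["id"], status["account"]["acct"])
```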
@@ -69,9 +72,9 @@ def create_post(post_directory, post_metadata):
     template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
     env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
 
-    name = post_metadata['account']['display_name']
+    name = post_metadata["account"]["display_name"]
     name = sub('"', '\\"', name)
-    post_metadata['account']['display_name'] = name
+    post_metadata["account"]["display_name"] = name
 
     env.filters["localize_media_url"] = localize_media_url
     env.filters["filter_mastodon_urls"] = filter_mastodon_urls
@@ -136,7 +139,10 @@ def main():
                     create_post(post_dir, post_metadata)
                     all_existing_posts.append(str(post_metadata["id"]))
                 else:
-                    print("not pulling post %s (post is local only)" % (post_metadata["id"]))
+                    print(
+                        "not pulling post %s (post is local only)"
+                        % (post_metadata["id"])
+                    )
 
             # if we already have the post do nothing, possibly update
             elif str(post_metadata["id"]) in existing_posts:
@@ -145,7 +151,10 @@ def main():
                     str(post_metadata["id"])
                 )  # create list of posts which have not been returned in the feed
             elif str(post_metadata["id"]) in all_existing_posts:
-                print("skipping post %s as it was already pulled with a different hashtag." % (str(post_metadata["id"])))
+                print(
+                    "skipping post %s as it was already pulled with a different hashtag."
+                    % (str(post_metadata["id"]))
+                )
 
     for post in existing_posts:
         print(

@@ -0,0 +1,15 @@
+---
+title: "{{ frontmatter.title }}"
+date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
+draft: false
+summary: "{{ frontmatter.summary }}"
+authors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
+original_link: "{{ frontmatter.original_link }}"
+feed_name: "{{ frontmatter.feed_name}}"
+categories: ["{{ frontmatter.card_type }}", "{{ frontmatter.feed_name}}"]
+contributors: ["{{ frontmatter.feed_name}}"]
+tags: {{ frontmatter.tags }}
+{% if frontmatter.featured_image %}featured_image: "{{frontmatter.featured_image}}"{% endif %}
+---
+
+{{ content }}
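
To preview what this template emits, a quick jinja2 render sketch; the template file name and all frontmatter values are assumptions for the example, not the project's configuration:

```python
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("konfluks/templates"))
template = env.get_template("post_template.md")  # hypothetical file name
print(template.render(
    frontmatter={
        "title": "An example post",
        "date": "2022-06-01T08:00:00+02:00",
        "summary": "One feed item",
        "author": "rra",
        "original_link": "https://example.org/an-example-post",
        "feed_name": "example.org",
        "card_type": "network",
        "tags": '["art", "publishing"]',
        "featured_image": "cover.jpg",
    },
    content="Body of the post.",
))
```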

@@ -0,0 +1,17 @@
+---
+date: {{ post_metadata.created_at }} #2021-06-10T10:46:33+02:00
+draft: false
+authors: ["{{ post_metadata.account.display_name }}"]
+contributors: ["{{ post_metadata.account.acct}}"]
+avatar: {{ post_metadata.account.avatar }}
+categories: ["shouts"]
+images: [{% for i in post_metadata.media_attachments %} {{ i.url }}, {% endfor %}]
+title: {{ post_metadata.account.display_name }}
+tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
+---
+
+{% for item in post_metadata.media_attachments %}
+<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
+{% endfor %}
+
+{{ post_metadata.content | filter_mastodon_urls }}

@@ -6,6 +6,7 @@ uuid: "{{v.uuid}}"
 video_duration: "{{ v.duration | duration }} "
 video_channel: "{{ v.channel.display_name }}"
 channel_url: "{{ v.channel.url }}"
+contributors: ["{{ v.account.display_name }}"]
 preview_image: "{{ preview_image }}"
 images: ["./{{ preview_image }}"]
 categories: ["tv","{{ v.channel.display_name }}"]

@@ -102,52 +102,60 @@ def main():
     v = peertube.VideoApi(client)
 
     count = 100
     page = 0
-    response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-
-    videos = response.to_dict()
-    total = videos['total']
-    videos = videos['data']
-
-    total -= count
-    if total > 0:
-        to_download = total // count
-        last_page = total % count
-        for i in range(to_download):
-            page += 1
-            response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-            videos += response.to_dict()['data']
-        if last_page > 0:
-            page += 1
-            response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
-            videos += response.to_dict()['data'][-1*last_page:]
-
-    output_dir = os.environ.get("OUTPUT_DIR")
-    if not os.path.exists(output_dir):
-        os.mkdir(output_dir)
-
-    existing_posts = os.listdir(output_dir)
-
-    for video_metadata in videos:
-        post_name = slugify(video_metadata["name"]) + "-" + video_metadata["uuid"]
-        post_dir = os.path.join(output_dir, post_name)
-
-        if (
-            post_name not in existing_posts
-        ):  # if there is a video we dont already have, make it
-            print(
-                "New: ", video_metadata["name"], "({})".format(video_metadata["uuid"])
-            )
-            create_post(post_dir, video_metadata, host)
-
-        elif (
-            post_name in existing_posts
-        ):  # if we already have the video do nothing, possibly update
-            update_post(post_dir, video_metadata, host)
-            existing_posts.remove(
-                post_name
-            )  # create list of posts which have not been returned by peertube
+    try:
+        response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+
+        videos = response.to_dict()
+        total = videos['total']
+        videos = videos['data']
+
+        total -= count
+        if total > 0:
+            to_download = total // count
+            last_page = total % count
+            for i in range(to_download):
+                page += 1
+                response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+                videos += response.to_dict()['data']
+            if last_page > 0:
+                page += 1
+                response = v.videos_get(count=count, filter="local", tags_one_of="publish", start=page)
+                videos += response.to_dict()['data'][-1*last_page:]
+
+        output_dir = os.environ.get("OUTPUT_DIR")
+        if not os.path.exists(output_dir):
+            os.mkdir(output_dir)
+
+        existing_posts = os.listdir(output_dir)
+
+        for video_metadata in videos:
+            post_name = slugify(video_metadata["name"]) + "-" + video_metadata["uuid"]
+            post_dir = os.path.join(output_dir, post_name)
+
+            if (
+                post_name not in existing_posts
+            ):  # if there is a video we dont already have, make it
+                print(
+                    "New: ", video_metadata["name"], "({})".format(video_metadata["uuid"])
+                )
+                create_post(post_dir, video_metadata, host)
+
+            elif (
+                post_name in existing_posts
+            ):  # if we already have the video do nothing, possibly update
+                update_post(post_dir, video_metadata, host)
+                existing_posts.remove(
+                    post_name
+                )  # create list of posts which have not been returned by peertube
+    except:
+        print("didn't get a response from peertube, instance might have been taken down or made private. removing all posts.")
+        output_dir = os.environ.get("OUTPUT_DIR")
+        if not os.path.exists(output_dir):
+            os.mkdir(output_dir)
+        existing_posts = os.listdir(output_dir)
 
     for post in existing_posts:
         print("deleted", post)  # rm posts not returned
         shutil.rmtree(os.path.join(output_dir, post))
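
The paging arithmetic inside the `try` block is easiest to see with concrete numbers; a worked example with invented values:

```python
# Worked example of the paging arithmetic above, numbers invented:
total = 250  # videos the server reports
count = 100  # page size used by main()

total -= count                 # first request already fetched 100 -> 150 left
to_download = total // count   # 1 more full page of 100
last_page = total % count      # 50 items on a final partial page
assert (to_download, last_page) == (1, 50)
```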

@@ -1,13 +0,0 @@
----
-title: "{{ frontmatter.title }}"
-date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
-draft: false
-summary: "{{ frontmatter.summary }}"
-author: "{{ frontmatter.author }}"
-original_link: "{{ frontmatter.original_link }}"
-feed_name: "{{ frontmatter.feed_name}}"
-categories: ["network", "{{ frontmatter.feed_name}}"]
-tags: {{ frontmatter.tags }}
----
-
-{{ content }}

@@ -1,16 +0,0 @@
----
-date: "{{ post_metadata.created_at }}" #2021-06-10T10:46:33+02:00
-draft: false
-author: "{{ post_metadata.account.display_name }}"
-avatar: "{{ post_metadata.account.avatar }}"
-categories: ["shouts"]
-images: [{% for i in post_metadata.media_attachments %} "{{ i.url }}", {% endfor %}]
-title: "{{ post_metadata.account.display_name }}"
-tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
----
-
-{% for item in post_metadata.media_attachments %}
-<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
-{% endfor %}
-
-{{ post_metadata.content | filter_mastodon_urls }}

@@ -1,9 +1,9 @@
 [tool.poetry]
-name = "lumbunglib"
+name = "konfluks"
 version = "0.1.0"
 description = "Python lib which powers lumbung[dot]space automation"
 authors = ["rra", "decentral1se"]
-license = "GPLv3+"
+license = "AGPLv3+"
 
 [tool.poetry.dependencies]
 python = "^3.9"

@@ -25,7 +25,7 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
 
 [tool.poetry.scripts]
-lumbunglib-cal = "lumbunglib.cloudcal:main"
-lumbunglib-vid = "lumbunglib.video:main"
-lumbunglib-feed = "lumbunglib.feed:main"
-lumbunglib-hash = "lumbunglib.hashtag:main"
+konfluks-cal = "konfluks.cloudcal:main"
+konfluks-vid = "konfluks.video:main"
+konfluks-feed = "konfluks.feed:main"
+konfluks-hash = "konfluks.hashtag:main"

@@ -2,10 +2,10 @@
 from setuptools import setup
 
 packages = \
-['lumbunglib']
+['konfluks']
 
 package_data = \
-{'': ['*'], 'lumbunglib': ['templates/*']}
+{'': ['*'], 'konfluks': ['templates/*']}
 
 install_requires = \
 ['Jinja2>=3.0.3,<4.0.0',

@@ -20,13 +20,13 @@ install_requires = \
 'requests>=2.26.0,<3.0.0']
 
 entry_points = \
-{'console_scripts': ['lumbunglib-cal = lumbunglib.cloudcal:main',
-                     'lumbunglib-feed = lumbunglib.feed:main',
-                     'lumbunglib-hash = lumbunglib.hashtag:main',
-                     'lumbunglib-vid = lumbunglib.video:main']}
+{'console_scripts': ['konfluks-cal = konfluks.cloudcal:main',
+                     'konfluks-feed = konfluks.feed:main',
+                     'konfluks-hash = konfluks.hashtag:main',
+                     'konfluks-vid = konfluks.video:main']}
 
 setup_kwargs = {
-    'name': 'lumbunglib',
+    'name': 'konfluks',
     'version': '0.1.0',
     'description': 'Python lib which powers lumbung[dot]space automation',
     'long_description': None,