Compare commits


35 Commits
master ... main

Author SHA1 Message Date
rra
028bc1df84 fix issue where posts with enclosures would not download files because of missing dir 2024-11-20 11:25:26 +01:00
82a017f624 Merge branch 'new-taxonomy' 2024-11-07 16:35:18 +01:00
rra
9d9f8f6d72 do proper deletion 2024-11-06 17:24:55 +01:00
rra
e01aa9a607 Test whether a url still returns a feed, pass right filename as featured_image when handling enclosure, pass post_dir to existing_posts 2024-11-06 16:48:41 +01:00
3055ee37df Merge pull request 'adjust templates to new taxonomy' (#43) from new-taxonomy into main
Reviewed-on: ruangrupa/konfluks#43
2022-12-02 16:59:49 +01:00
rra
a4f749ebd7 adjust templates to new taxonomy 2022-09-15 18:37:25 +02:00
rra
0ecc0ecd3a handle paths and extensions properly, fix #41 2022-09-09 14:19:19 +02:00
rra
657ced1ceb undo dev setup changes 2022-09-09 13:27:29 +02:00
rra
d21158eb91 add support for videos in posts 2022-09-09 13:22:32 +02:00
decentral1se
98299daa1b fix links 2022-07-18 12:21:01 +02:00
decentral1se
6020db4d15 additional gardening for konfluks rename 2022-07-18 12:16:52 +02:00
decentral1se
2b06a5f866 Merge remote-tracking branch 'konfluks/konfluks-renaming' 2022-07-18 12:05:00 +02:00
decentral1se
e66e3202da add new hashtag 2022-06-21 00:00:32 +02:00
ff76378cdd merge christopher's changes pulling the timeline from pen.lumbung.space 2022-06-14 19:27:31 +05:00
41bc532ebc separate hashtags by comma 2022-06-10 15:55:17 +05:00
rra
845a54787b Update 'README.md' 2022-06-02 09:29:20 +02:00
rra
f162bb946a Update 'README.md'
correcting markup / styling
2022-06-02 09:28:37 +02:00
rra
00f795f16d rename project to konfluks for legibility, add docs 2022-06-02 09:23:58 +02:00
rra
b0f77831bd add 'contributors' as metadata category 2022-06-02 06:45:54 +02:00
rra
5ba944b6d1 Merge pull request 'handle feeds with enclosures (featured media / podcasts)' (#35) from r/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#35
2022-06-01 08:05:36 +02:00
rra
ad591ea9cf add more checks for failures 2022-06-01 05:51:25 +02:00
rra
9c824fcd3f Merge remote-tracking branch 'upstream/master' 2022-05-29 14:45:30 +02:00
rra
cab36c8ac6 add less generic headers 2022-05-29 14:45:11 +02:00
rra
c84a975887 add reason for failure 2022-05-29 12:30:55 +02:00
2ca61c6197 Merge pull request 'accomodate authors as taxonomy' (#34) from r/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#34
2022-05-27 13:24:32 +02:00
rra
fecf5cd64e add rudimentary support for enclosures & featured images 2022-05-24 15:39:11 +02:00
rra
6e64d64772 only return an author if there is one 2022-05-24 12:19:50 +02:00
rra
3b390d1ecb change template to authors to accomodate author taxonomy 2022-05-24 12:19:50 +02:00
rra
ce3bfc58b0 remove orphaned " 2022-05-24 12:19:50 +02:00
c5af3610a0 Merge pull request 'feed: assign pen category' (#33) from pen-category into master
Reviewed-on: ruangrupa/lumbunglib#33
2022-04-26 08:30:34 +02:00
3ea798b301 feed: assign pen category 2022-04-21 14:17:12 +02:00
decentral1se
7d3863641d Revert "feat: sanitize all yaml"
This reverts commit 2fbc952a72.
2022-04-13 12:48:42 +02:00
decentral1se
f6a1a684c0 Revert "fix: don't escape some characters"
This reverts commit cf8b1ff7e9.
2022-04-13 12:48:20 +02:00
decentral1se
58afd189a7 Revert "feed: move to saneyaml"
This reverts commit a809433410.
2022-04-13 12:48:13 +02:00
19ab610dfc Merge pull request 'feat: sanitize all yaml' (#28) from knoflook/lumbunglib:master into master
Reviewed-on: ruangrupa/lumbunglib#28
2022-04-12 13:44:34 +02:00
16 changed files with 633 additions and 149 deletions

View File

@ -1,8 +1,60 @@
# lumbunglib
![Konfluks logo is a stylized and schematic representation of a drainage basin](./konfluks.svg)
> Python lib which powers `lumbung.space` automation
# Konfluks
## hacking
A drainage basin is a geographical feature that collects all precipitation in an area, first into smaller streams and finally together into one large river. Similarly, Konfluks brings small and dispersed streams of web content from different applications and websites together into a single large stream.
Specifically, Konfluks turns Peertube videos, iCal calendar events, Mastodon posts under a hashtag, and other websites (via their RSS and OPDS feeds) into Hugo page bundles. This allows one to publish from diverse sources to a single stream.
Konfluks was first made by [Roel Roscam Abbing](https://test.roelof.info/) as part of [lumbung.space](https://lumbung.space), together with [ruangrupa](https://ruangrupa.id) and [Autonomic](https://autonomic.zone).
## Philosophy
Konfluks tries to act as a mirror of its input sources. That means that whenever something remote is deleted, changed, or becomes unavailable, Konfluks changes or deletes its local copy as well.
Konfluks tries to preserve intention. That means the above, but also requiring explicit ways of publishing.
Konfluks works by periodically polling the remote sources, taking care not to duplicate work. It caches files, asks for last-modified headers, and skips things it already has. This makes every poll as fast and as light as possible.
Konfluks is written for clarity, not brevity nor cleverness.
Konfluks is extendable, a work in progress and a messy undertaking.
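The polling approach described above depends on remembering HTTP cache validators between runs. A minimal, self-contained sketch of persisting them (the function names here are illustrative, not the actual Konfluks API, which uses `write_etag`/`get_etag`):

```python
import os
from ast import literal_eval


def save_validators(path, etag, modified):
    # persist the feed's cache validators so the next poll can be conditional
    if etag or modified:
        with open(path, "w") as f:
            f.write(str((etag, modified)))


def load_validators(path):
    # return empty validators when we have never polled this feed before
    if os.path.exists(path):
        with open(path) as f:
            return literal_eval(f.read())
    return "", ""
```

On the next poll the stored values can be passed to `feedparser.parse(url, etag=..., modified=...)`, which then makes a conditional request and reports status 304 when nothing changed.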
## High-level overview
Konfluks consists of different Python scripts which each poll a particular service, say, a [Peertube](https://joinpeertube.org) server, to download information and convert it into [Hugo Page Bundles](https://gohugo.io/content-management/page-bundles/).
Each script that is part of Konfluks essentially does the following:
* Parse a source and request posts/updates/videos/a feed
* Take care of publish queues
* Create a Hugo post for each item returned, by:
* Making a folder per post in the `output` directory
* Formatting post metadata as [Hugo Post Frontmatter](https://gohugo.io/content-management/front-matter/) in a file called `index.md`
* Grabbing local copies of media and saving them in the post folder
* Adding the post content to `index.md`
* According to jinja2 templates (see `konfluks/templates/`)
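The "format frontmatter, then append content" step can be sketched with jinja2 directly (the template string and field names here are illustrative stand-ins, not the actual templates in `konfluks/templates/`):

```python
import jinja2

# illustrative stand-in for a template from konfluks/templates/
template = jinja2.Template(
    "---\n"
    'title: "{{ frontmatter.title }}"\n'
    'date: "{{ frontmatter.date }}"\n'
    "---\n"
    "{{ content }}"
)

post = template.render(
    frontmatter={"title": "Example post", "date": "2022-06-02"},
    content="Hello from a feed item.",
)
print(post)
```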
Where possible, the created page bundles are given human-friendly names.
Here is a typical output structure:
```
user@server: ~/konfluks/output: tree tv/
tv/
├── forum-27an-mother-earth-353f93f3-5fee-49d6-b71d-8aef753f7041
│   ├── 86ccae63-3df9-443c-91f3-edce146055db.jpg
│   └── index.md
├── keroncong-tugu-cafrinho-live-at-ruru-gallery-ruangrupa-jakarta-19-august-2014-e6d5bb2a-d77f-4a00-a449-992a579c8c0d
│   ├── 32291aa2-a391-4219-a413-87521ff373ba.jpg
│   └── index.md
├── lecture-series-1-camp-notes-on-education-8d54d3c9-0322-42af-ab6e-e954d251e076
│   ├── 0f3c835b-42c2-48a3-a2a3-a75ddac8688a.jpg
│   └── index.md
```
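In outline, producing one such bundle is just a directory with an `index.md` inside it. A simplified sketch (the helper name is hypothetical; the real scripts also download media into the folder):

```python
import os


def make_page_bundle(output_dir, slug, frontmatter_md, body):
    # each post becomes its own folder containing an index.md,
    # which is what Hugo calls a (leaf) page bundle
    post_dir = os.path.join(output_dir, slug)
    os.makedirs(post_dir, exist_ok=True)
    with open(os.path.join(post_dir, "index.md"), "w") as f:
        f.write(frontmatter_md + "\n" + body + "\n")
    return post_dir
```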
## Hacking
Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-install-instructions):
@ -10,31 +62,20 @@ Install [poetry](https://python-poetry.org/docs/#osx--linux--bashonwindows-insta
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
```
We use Poetry because it locks the dependencies all the way down and makes it
easier to manage installation & maintenance in the long-term. Then install the
dependencies & have them managed by Poetry:
We use Poetry because it locks the dependencies all the way down and makes it easier to manage installation & maintenance in the long-term. Then install the dependencies & have them managed by Poetry:
```
poetry install
```
Each script requires some environment variables to run, you can see the latest
deployment configuration over
[here](https://git.autonomic.zone/ruangrupa/lumbung.space/src/branch/main/compose.yml),
look for the values under the `environment: ...` stanza.
Each script requires some environment variables to run, you can see the latest deployment configuration over [here](https://git.autonomic.zone/ruangrupa/lumbung.space/src/branch/main/compose.yml), look for the values under the `environment: ...` stanza.
All scripts have an entrypoint described in the
[`pypoetry.toml`](https://git.autonomic.zone/ruangrupa/lumbunglib/src/commit/40bf9416b8792c08683ad8ac878093c7ef1b2f5d/pyproject.toml#L27-L31)
which you can run via `poetry run ...`. For example, if you want to run the
[`lumbunglib/video.py`](./lumbunglib/video.py) script, you'd do:
All scripts have an entrypoint described in the [`pyproject.toml`](./pyproject.toml) which you can run via `poetry run ...`. For example, if you want to run the [`konfluks/video.py`](./konfluks/video.py) script, you'd do:
```
mkdir -p testdir
export OUTPUT_DIR=testdir
poetry run lumbunglib-vid
poetry run konfluks-vid
```
Run `poetry run poetry2setup > setup.py` if updating the poetry dependencies.
This allows us to run `pip install .` in the deployment and Pip will understand
that it is just a regular Python package. If adding a new cli command, extend
`pyproject.toml` with a new `[tool.poetry.scripts]` entry.
Run `poetry run poetry2setup > setup.py` if updating the poetry dependencies. This allows us to run `pip install .` in the deployment and Pip will understand that it is just a regular Python package. If adding a new cli command, extend `pyproject.toml` with a new `[tool.poetry.scripts]` entry.

31
konfluks.svg Normal file

File diff suppressed because one or more lines are too long

Size: 29 KiB

View File

@ -138,9 +138,9 @@ def create_event_post(post_dir, event):
    for img in event_metadata["images"]:
        # parse img url to safe local image name
        img_name = img.split("/")[-1]
        fn, ext = img_name.split(".")
        img_name = slugify(fn) + "." + ext
        img_name = os.path.basename(img)
        fn, ext = os.path.splitext(img_name)
        img_name = slugify(fn) + '.' + ext
        local_image = os.path.join(post_dir, img_name)

442
konfluks/feed.py Normal file
View File

@ -0,0 +1,442 @@
import os
import shutil
import time
from hashlib import md5
from ast import literal_eval as make_tuple
from pathlib import Path
from urllib.parse import urlparse
from re import sub
import arrow
import feedparser
import jinja2
import requests
from bs4 import BeautifulSoup
from slugify import slugify
from re import compile as re_compile
yamlre = re_compile('"')
def write_etag(feed_name, feed_data):
    """
    save timestamp of when feed was last modified
    """
    etag = ""
    modified = ""
    if "etag" in feed_data:
        etag = feed_data.etag
    if "modified" in feed_data:
        modified = feed_data.modified
    if etag or modified:
        with open(os.path.join("etags", feed_name + ".txt"), "w") as f:
            f.write(str((etag, modified)))


def get_etag(feed_name):
    """
    return timestamp of when feed was last modified
    """
    fn = os.path.join("etags", feed_name + ".txt")
    etag = ""
    modified = ""
    if os.path.exists(fn):
        etag, modified = make_tuple(open(fn, "r").read())
    return etag, modified

def create_frontmatter(entry):
    """
    parse RSS metadata and return as frontmatter
    """
    if 'published' in entry:
        published = entry.published_parsed
    if 'updated' in entry:
        published = entry.updated_parsed
    published = arrow.get(published)
    if 'author' in entry:
        author = entry.author
    else:
        author = ''
    if 'authors' in entry:
        authors = []
        for a in entry.authors:
            authors.append(a['name'])
    if 'summary' in entry:
        summary = entry.summary
    else:
        summary = ''
    if 'publisher' in entry:
        publisher = entry.publisher
    else:
        publisher = ''
    tags = []
    if 'tags' in entry:
        #TODO finish categories
        for t in entry.tags:
            tags.append(t['term'])
    if "featured_image" in entry:
        featured_image = entry.featured_image
    else:
        featured_image = ''
    card_type = "network"
    if entry.feed_name == "pen.lumbung.space":
        card_type = "pen"
    if "opds" in entry:
        frontmatter = {
            'title': entry.title,
            'date': published.format(),
            'summary': summary,
            'author': ",".join(authors),
            'publisher': publisher,
            'original_link': entry.links[0]['href'].replace('opds/cover/', 'books/'),
            'feed_name': entry['feed_name'],
            'tags': str(tags),
            'category': "books"
        }
    else:
        frontmatter = {
            'title': entry.title,
            'date': published.format(),
            'summary': '',
            'author': author,
            'original_link': entry.link,
            'feed_name': entry['feed_name'],
            'tags': str(tags),
            'card_type': card_type,
            'featured_image': featured_image
        }
    return frontmatter

def sanitize_yaml(frontmatter):
    """
    Escapes any occurrences of double quotes
    in any of the frontmatter fields
    See: https://docs.octoprint.org/en/master/configuration/yaml.html#interesting-data-types
    """
    for k, v in frontmatter.items():
        if type(v) == type([]):
            # some fields are lists
            l = []
            for i in v:
                i = yamlre.sub('\\"', i)
                l.append(i)
            frontmatter[k] = l
        else:
            v = yamlre.sub('\\"', v)
            frontmatter[k] = v
    return frontmatter

def parse_enclosures(post_dir, entry):
    """
    Parses feed enclosures which are featured media
    Can be featured image but also podcast entries
    https://pythonhosted.org/feedparser/reference-entry-enclosures.html
    """
    #TODO parse more than images
    #TODO handle the fact it could be multiple items
    for e in entry.enclosures:
        if "type" in e:
            print("found enclosed media", e.type)
            if "image/" in e.type:
                if not os.path.exists(post_dir):  # this might be redundant with create_post
                    os.makedirs(post_dir)
                featured_image = grab_media(post_dir, e.href)
                media_item = urlparse(e.href).path.split('/')[-1]
                entry["featured_image"] = media_item
            else:
                print("FIXME:ignoring enclosed", e.type)
    return entry

def create_post(post_dir, entry):
    """
    write hugo post based on RSS entry
    """
    if "enclosures" in entry:
        entry = parse_enclosures(post_dir, entry)
    frontmatter = create_frontmatter(entry)
    if not os.path.exists(post_dir):
        os.makedirs(post_dir)
    if "content" in entry:
        post_content = entry.content[0].value
    else:
        post_content = entry.summary
    parsed_content = parse_posts(post_dir, post_content)
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    with open(os.path.join(post_dir, "index.html"), "w") as f:  # n.b. .html
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=parsed_content)
        f.write(post)
    print("created post for", entry.title, "({})".format(entry.link))

def grab_media(post_directory, url, prefered_name=None):
    """
    download media linked in post to have local copy
    if download succeeds return new local path otherwise return url
    """
    media_item = urlparse(url).path.split('/')[-1]
    headers = {
        'User-Agent': 'https://git.autonomic.zone/ruangrupa/lumbunglib',
        'From': 'info@lumbung.space'  # This is another valid field
    }
    if prefered_name:
        media_item = prefered_name
    try:
        if not os.path.exists(os.path.join(post_directory, media_item)):
            #TODO: stream is true is a conditional so we could check the headers for things, mimetype etc
            response = requests.get(url, headers=headers, stream=True)
            if response.ok:
                with open(os.path.join(post_directory, media_item), 'wb') as media_file:
                    shutil.copyfileobj(response.raw, media_file)
                print('Downloaded media item', media_item)
                return media_item
            else:
                print("Download failed", response.status_code)
                return url
        elif os.path.exists(os.path.join(post_directory, media_item)):
            return media_item
    except Exception as e:
        print('Failed to download image', url)
        print(e)
        return url

def parse_posts(post_dir, post_content):
    """
    parse the post content for media items
    replace foreign image with local copy
    filter out iframe sources not in allowlist
    """
    soup = BeautifulSoup(post_content, "html.parser")
    allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
    for img in soup(["img", "object"]):
        if img.get("src") != None:
            local_image = grab_media(post_dir, img["src"])
            if img["src"] != local_image:
                img["src"] = local_image
    for iframe in soup(["iframe"]):
        if not any(source in iframe["src"] for source in allowed_iframe_sources):
            print("filtered iframe: {}...".format(iframe["src"][:25]))
            iframe.decompose()
    return soup.decode()

def grab_feed(feed_url):
    """
    check whether feed has been updated
    download & return it if it has
    """
    feed_name = urlparse(feed_url).netloc
    etag, modified = get_etag(feed_name)
    try:
        if modified:
            data = feedparser.parse(feed_url, modified=modified)
        elif etag:
            data = feedparser.parse(feed_url, etag=etag)
        else:
            data = feedparser.parse(feed_url)
    except Exception as e:
        print("Error grabbing feed")
        print(feed_name)
        print(e)
        return False
    if "status" in data:
        print(data.status, feed_url)
        if data.status == 200:
            # 304 means the feed has not been modified since we last checked
            write_etag(feed_name, data)
            return data
    return False

def create_opds_post(post_dir, entry):
    """
    create a HUGO post based on OPDS entry
    or update it if the timestamp is newer
    Downloads the cover & file
    """
    frontmatter = create_frontmatter(entry)
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    if not os.path.exists(post_dir):
        os.makedirs(post_dir)
    if os.path.exists(os.path.join(post_dir, '.timestamp')):
        old_timestamp = open(os.path.join(post_dir, '.timestamp')).read()
        old_timestamp = arrow.get(float(old_timestamp))
        current_timestamp = arrow.get(entry['updated_parsed'])
        if current_timestamp > old_timestamp:
            pass
        else:
            print('Book "{}..." already up to date'.format(entry['title'][:32]))
            return
    for item in entry.links:
        ft = item['type'].split('/')[-1]
        fn = item['rel'].split('/')[-1]
        if fn == "acquisition":
            fn = "publication"  # calling the publications acquisition is weird
        prefered_name = "{}-{}.{}".format(fn, slugify(entry['title']), ft)
        grab_media(post_dir, item['href'], prefered_name)
    if "summary" in entry:
        summary = entry.summary
    else:
        summary = ""
    with open(os.path.join(post_dir, 'index.md'), 'w') as f:
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=summary)
        f.write(post)
    print('created post for Book', entry.title)
    with open(os.path.join(post_dir, '.timestamp'), 'w') as f:
        timestamp = arrow.get(entry['updated_parsed'])
        f.write(timestamp.format('X'))

def main():
    feed_urls = open("feeds_list.txt", "r").read().splitlines()
    start = time.time()
    if not os.path.exists("etags"):
        os.mkdir("etags")
    output_dir = os.environ.get("OUTPUT_DIR")
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    feed_dict = dict()
    for url in feed_urls:
        feed_name = urlparse(url).netloc
        feed_dict[url] = feed_name
    feed_names = feed_dict.values()
    content_dirs = os.listdir(output_dir)
    for i in content_dirs:
        if i not in feed_names:
            shutil.rmtree(os.path.join(output_dir, i))
            print("%s not in feeds_list.txt, removing local data" % (i))
    # add iframe to the allowlist of feedparser's sanitizer,
    # this is now handled in parse_post()
    feedparser.sanitizer._HTMLSanitizer.acceptable_elements |= {"iframe"}
    for feed_url in feed_urls:
        feed_name = feed_dict[feed_url]
        feed_dir = os.path.join(output_dir, feed_name)
        if not os.path.exists(feed_dir):
            os.makedirs(feed_dir)
        existing_posts = os.listdir(feed_dir)
        data = grab_feed(feed_url)
        if data:  # whenever we get a 200
            if data.feed:  # only if it is an actual feed
                opds_feed = False
                if 'links' in data.feed:
                    for i in data.feed['links']:
                        if i['rel'] == 'self':
                            if 'opds' in i['type']:
                                opds_feed = True
                                print("OPDS type feed!")
                for entry in data.entries:
                    # if 'tags' in entry:
                    #     for tag in entry.tags:
                    #         for x in ['lumbung.space', 'D15', 'lumbung']:
                    #             if x in tag['term']:
                    #                 print(entry.title)
                    entry["feed_name"] = feed_name
                    post_name = slugify(entry.title)
                    # pixelfed returns the whole post text as the post name. max
                    # filename length is 255 on many systems. here we're shortening
                    # the name and adding a hash to it to avoid a conflict in a
                    # situation where 2 posts start with exactly the same text.
                    if len(post_name) > 150:
                        post_hash = md5(bytes(post_name, "utf-8"))
                        post_name = post_name[:150] + "-" + post_hash.hexdigest()
                    if opds_feed:
                        entry['opds'] = True
                        # format: Beyond-Debiasing-Report_Online-75535a4886e3
                        post_name = slugify(entry['title']) + '-' + entry['id'].split('-')[-1]
                    post_dir = os.path.join(output_dir, feed_name, post_name)
                    if post_name not in existing_posts:
                        # if there is a blog entry we dont already have, make it
                        if opds_feed:
                            create_opds_post(post_dir, entry)
                        else:
                            create_post(post_dir, entry)
                    elif post_name in existing_posts:
                        # if we already have it, update it
                        if opds_feed:
                            create_opds_post(post_dir, entry)
                        else:
                            create_post(post_dir, entry)
                        existing_posts.remove(
                            post_name
                        )  # create list of posts which have not been returned by the feed
                for post in existing_posts:
                    # remove blog posts no longer returned by the RSS feed
                    post_dir = os.path.join(output_dir, feed_name, post)
                    shutil.rmtree(post_dir)
                    print("deleted", post_dir)
        else:
            print(feed_url, "is not or no longer a feed!")
    end = time.time()
    print(end - start)
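The filename-shortening logic in `main()` above can be shown in isolation: slugs over 150 characters are truncated and suffixed with an md5 digest, so two long posts that begin with identical text still get distinct directory names (the standalone function here is an illustration, not part of the module):

```python
from hashlib import md5


def shorten_post_name(post_name, limit=150):
    # many filesystems cap filenames at 255 bytes; pixelfed uses the whole
    # post text as the slug, so truncate and disambiguate with a hash
    if len(post_name) > limit:
        digest = md5(post_name.encode("utf-8")).hexdigest()
        return post_name[:limit] + "-" + digest
    return post_name
```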

View File

@ -23,6 +23,7 @@ hashtags = [
    "ruruhaus",
    "offbeatentrack_kassel",
    "lumbungofpublishers",
    "lumbungkiosproducts",
]
@ -59,6 +60,21 @@ def download_media(post_directory, media_attachments):
            with open(os.path.join(post_directory, image), "wb") as img_file:
                shutil.copyfileobj(response.raw, img_file)
            print("Downloaded cover image", image)
        elif item["type"] == "video":
            video = localize_media_url(item["url"])
            if not os.path.exists(os.path.join(post_directory, video)):
                # download video file
                response = requests.get(item["url"], stream=True)
                with open(os.path.join(post_directory, video), "wb") as video_file:
                    shutil.copyfileobj(response.raw, video_file)
                print("Downloaded video in post", video)
            if not os.path.exists(os.path.join(post_directory, "thumbnail.png")):
                # download video preview
                response = requests.get(item["preview_url"], stream=True)
                with open(os.path.join(post_directory, "thumbnail.png"), "wb") as thumbnail:
                    shutil.copyfileobj(response.raw, thumbnail)
                print("Downloaded thumbnail for", video)

def create_post(post_directory, post_metadata):
@ -77,7 +93,6 @@ def create_post(post_directory, post_metadata):
        post_metadata["account"]["display_name"] = name
    env.filters["localize_media_url"] = localize_media_url
    env.filters["filter_mastodon_urls"] = filter_mastodon_urls
    template = env.get_template("hashtag.md")
    with open(os.path.join(post_directory, "index.html"), "w") as f:

View File

@ -2,7 +2,7 @@
title: "{{ event.name }}"
date: "{{ event.begin }}" #2021-06-10T10:46:33+02:00
draft: false
categories: "calendar"
source: "lumbung calendar"
event_begin: "{{ event.begin }}"
event_end: "{{ event.end }}"
duration: "{{ event.duration }}"

View File

@ -0,0 +1,15 @@
---
title: "{{ frontmatter.title }}"
date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
draft: false
summary: "{{ frontmatter.summary }}"
contributors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
original_link: "{{ frontmatter.original_link }}"
feed_name: "{{ frontmatter.feed_name}}"
card_type: "{{ frontmatter.card_type }}"
sources: ["{{ frontmatter.feed_name}}"]
tags: {{ frontmatter.tags }}
{% if frontmatter.featured_image %}featured_image: "{{frontmatter.featured_image}}"{% endif %}
---
{{ content }}

View File

@ -0,0 +1,27 @@
---
date: {{ post_metadata.created_at }} #2021-06-10T10:46:33+02:00
draft: false
contributors: ["{{ post_metadata.account.display_name }}"]
avatar: {{ post_metadata.account.avatar }}
title: {{ post_metadata.account.display_name }}
tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
images: [{% for i in post_metadata.media_attachments %}{% if i.type == "image" %}"{{ i.url | localize_media_url }}", {%endif%}{% endfor %}]
videos: [{% for i in post_metadata.media_attachments %}{% if i.type == "video" %}"{{ i.url | localize_media_url }}", {%endif%}{% endfor %}]
---
{% for item in post_metadata.media_attachments %}
{% if item.type == "image" %}
<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
{% endif %}
{% endfor %}
{% for item in post_metadata.media_attachments %}
{% if item.type == "video" %}
<video controls width="540px" preload="none" poster="thumbnail.png">
<source src="{{item.url | localize_media_url }}" type="video/mp4">
{% if item.description %}{{item.description}}{% endif %}
</video>
{% endif %}
{% endfor %}
{{ post_metadata.content | filter_mastodon_urls }}

View File

@ -3,11 +3,12 @@ title: "{{ frontmatter.title }}"
date: "{{ frontmatter.date }}" #2021-06-10T10:46:33+02:00
draft: false
summary: "{{ frontmatter.summary }}"
author: "{{ frontmatter.author }}"
contributors: {% if frontmatter.author %} ["{{ frontmatter.author }}"] {% endif %}
original_link: "{{ frontmatter.original_link }}"
feed_name: "{{ frontmatter.feed_name}}"
categories: ["network", "{{ frontmatter.feed_name}}"]
tags: {{ frontmatter.tags }}
sources: ["timeline", "{{ frontmatter.feed_name}}"]
timelines: {{ frontmatter.timelines }}
hidden: true
---
{{ content }}

View File

@ -6,9 +6,10 @@ uuid: "{{v.uuid}}"
video_duration: "{{ v.duration | duration }} "
video_channel: "{{ v.channel.display_name }}"
channel_url: "{{ v.channel.url }}"
contributors: ["{{ v.account.display_name }}"]
preview_image: "{{ preview_image }}"
images: ["./{{ preview_image }}"]
categories: ["tv","{{ v.channel.display_name }}"]
sources: ["{{ v.channel.display_name }}"]
is_live: {{ v.is_live }}
---

View File

@ -5,6 +5,7 @@ from hashlib import md5
from ast import literal_eval as make_tuple
from pathlib import Path
from urllib.parse import urlparse
from re import sub
import arrow
import feedparser
@ -13,7 +14,7 @@ import requests
from bs4 import BeautifulSoup
from slugify import slugify
from re import compile as re_compile
import saneyaml
yamlre = re_compile('"')
def write_etag(feed_name, feed_data):
@ -84,28 +85,15 @@ def create_frontmatter(entry):
for t in entry.tags:
tags.append(t['term'])
if "opds" in entry:
frontmatter = {
'title':entry.title,
'date': published.format(),
'summary': summary,
'author': ",".join(authors),
'publisher': publisher,
'original_link': entry.links[0]['href'].replace('opds/cover/','books/'),
'feed_name': entry['feed_name'],
'tags': str(tags),
'category': "books"
}
else:
frontmatter = {
frontmatter = {
'title':entry.title,
'date': published.format(),
'summary': '',
'author': author,
'original_link': entry.link,
'feed_name': entry['feed_name'],
'tags': str(tags)
}
'timelines': str(tags),
}
return frontmatter
@ -120,12 +108,12 @@ def sanitize_yaml (frontmatter):
            # some fields are lists
            l = []
            for i in v:
                i = saneyaml.load(i)
                i = yamlre.sub('\\"', i)
                l.append(i)
            frontmatter[k] = l
        else:
            v = saneyaml.load(v)
            v = yamlre.sub('\\"', v)
            frontmatter[k] = v
    return frontmatter
@ -149,7 +137,7 @@ def create_post(post_dir, entry):
    template_dir = os.path.join(Path(__file__).parent.resolve(), "templates")
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
    template = env.get_template("feed.md")
    template = env.get_template("timeline.md")
    with open(os.path.join(post_dir, "index.html"), "w") as f:  # n.b. .html
        post = template.render(frontmatter=sanitize_yaml(frontmatter), content=parsed_content)
        f.write(post)
@ -195,9 +183,10 @@ def parse_posts(post_dir, post_content):
    allowed_iframe_sources = ["youtube.com", "vimeo.com", "tv.lumbung.space"]
    for img in soup(["img", "object"]):
        local_image = grab_media(post_dir, img["src"])
        if img["src"] != local_image:
            img["src"] = local_image
        if img.get("src") != None:
            local_image = grab_media(post_dir, img["src"])
            if img["src"] != local_image:
                img["src"] = local_image
    for iframe in soup(["iframe"]):
        if not any(source in iframe["src"] for source in allowed_iframe_sources):
@ -289,7 +278,7 @@ def create_opds_post(post_dir, entry):
def main():
    feed_urls = open("feeds_list.txt", "r").read().splitlines()
    feed_urls = open("feeds_list_timeline.txt", "r").read().splitlines()
    start = time.time()

View File

@ -1,16 +0,0 @@
---
date: "{{ post_metadata.created_at }}" #2021-06-10T10:46:33+02:00
draft: false
author: "{{ post_metadata.account.display_name }}"
avatar: "{{ post_metadata.account.avatar }}"
categories: ["shouts"]
images: [{% for i in post_metadata.media_attachments %} "{{ i.url }}", {% endfor %}]
title: "{{ post_metadata.account.display_name }}"
tags: [{% for i in post_metadata.tags %} "{{ i.name }}", {% endfor %}]
---
{% for item in post_metadata.media_attachments %}
<img src="{{item.url | localize_media_url }}" alt="{{item.description}}">
{% endfor %}
{{ post_metadata.content | filter_mastodon_urls }}

64
poetry.lock generated
View File

@ -242,14 +242,6 @@ category = "main"
optional = false
python-versions = "*"
[[package]]
name = "pyyaml"
version = "6.0"
description = "YAML parser and emitter for Python"
category = "main"
optional = false
python-versions = ">=3.6"
[[package]]
name = "requests"
version = "2.27.1"
@ -268,21 +260,6 @@ urllib3 = ">=1.21.1,<1.27"
socks = ["PySocks (>=1.5.6,!=1.5.7)", "win-inet-pton"]
use_chardet_on_py3 = ["chardet (>=3.0.2,<5)"]
[[package]]
name = "saneyaml"
version = "0.5.2"
description = "Read and write readable YAML safely preserving order and avoiding bad surprises with unwanted infered type conversions. This library is a PyYaml wrapper with sane behaviour to read and write readable YAML safely, typically when used for configuration."
category = "main"
optional = false
python-versions = "<4,>=3.6.*"
[package.dependencies]
PyYAML = "*"
[package.extras]
docs = ["Sphinx (>=3.3.1)", "sphinx-rtd-theme (>=0.5.0)", "doc8 (>=0.8.1)"]
testing = ["pytest (>=6)", "pytest-xdist (>=2)"]
[[package]]
name = "sgmllib3k"
version = "1.0.0"
@ -342,7 +319,7 @@ socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"]
[metadata]
lock-version = "1.1"
python-versions = "^3.9"
content-hash = "86ebded9dbd151b57502b40d3e58d6d92f837bc776184afa84d297c40d6daa7a"
content-hash = "c5c987253f949737210f4a3d3c3c24b0affd4a9c7d06de386c9bd514c592db8b"
[metadata.files]
arrow = [
@ -492,49 +469,10 @@ pytz = [
{file = "pytz-2021.3-py2.py3-none-any.whl", hash = "sha256:3672058bc3453457b622aab7a1c3bfd5ab0bdae451512f6cf25f64ed37f5b87c"},
{file = "pytz-2021.3.tar.gz", hash = "sha256:acad2d8b20a1af07d4e4c9d2e9285c5ed9104354062f275f3fcd88dcef4f1326"},
]
pyyaml = [
{file = "PyYAML-6.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:d4db7c7aef085872ef65a8fd7d6d09a14ae91f691dec3e87ee5ee0539d516f53"},
{file = "PyYAML-6.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9df7ed3b3d2e0ecfe09e14741b857df43adb5a3ddadc919a2d94fbdf78fea53c"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:77f396e6ef4c73fdc33a9157446466f1cff553d979bd00ecb64385760c6babdc"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:a80a78046a72361de73f8f395f1f1e49f956c6be882eed58505a15f3e430962b"},
{file = "PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:f84fbc98b019fef2ee9a1cb3ce93e3187a6df0b2538a651bfb890254ba9f90b5"},
{file = "PyYAML-6.0-cp310-cp310-win32.whl", hash = "sha256:2cd5df3de48857ed0544b34e2d40e9fac445930039f3cfe4bcc592a1f836d513"},
{file = "PyYAML-6.0-cp310-cp310-win_amd64.whl", hash = "sha256:daf496c58a8c52083df09b80c860005194014c3698698d1a57cbcfa182142a3a"},
{file = "PyYAML-6.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:897b80890765f037df3403d22bab41627ca8811ae55e9a722fd0392850ec4d86"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:50602afada6d6cbfad699b0c7bb50d5ccffa7e46a3d738092afddc1f9758427f"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:48c346915c114f5fdb3ead70312bd042a953a8ce5c7106d5bfb1a5254e47da92"},
{file = "PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:98c4d36e99714e55cfbaaee6dd5badbc9a1ec339ebfc3b1f52e293aee6bb71a4"},
{file = "PyYAML-6.0-cp36-cp36m-win32.whl", hash = "sha256:0283c35a6a9fbf047493e3a0ce8d79ef5030852c51e9d911a27badfde0605293"},
{file = "PyYAML-6.0-cp36-cp36m-win_amd64.whl", hash = "sha256:07751360502caac1c067a8132d150cf3d61339af5691fe9e87803040dbc5db57"},
{file = "PyYAML-6.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:819b3830a1543db06c4d4b865e70ded25be52a2e0631ccd2f6a47a2822f2fd7c"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:473f9edb243cb1935ab5a084eb238d842fb8f404ed2193a915d1784b5a6b5fc0"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0ce82d761c532fe4ec3f87fc45688bdd3a4c1dc5e0b4a19814b9009a29baefd4"},
{file = "PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:231710d57adfd809ef5d34183b8ed1eeae3f76459c18fb4a0b373ad56bedcdd9"},
{file = "PyYAML-6.0-cp37-cp37m-win32.whl", hash = "sha256:c5687b8d43cf58545ade1fe3e055f70eac7a5a1a0bf42824308d868289a95737"},
{file = "PyYAML-6.0-cp37-cp37m-win_amd64.whl", hash = "sha256:d15a181d1ecd0d4270dc32edb46f7cb7733c7c508857278d3d378d14d606db2d"},
{file = "PyYAML-6.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:0b4624f379dab24d3725ffde76559cff63d9ec94e1736b556dacdfebe5ab6d4b"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:213c60cd50106436cc818accf5baa1aba61c0189ff610f64f4a3e8c6726218ba"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:9fa600030013c4de8165339db93d182b9431076eb98eb40ee068700c9c813e34"},
{file = "PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:277a0ef2981ca40581a47093e9e2d13b3f1fbbeffae064c1d21bfceba2030287"},
{file = "PyYAML-6.0-cp38-cp38-win32.whl", hash = "sha256:d4eccecf9adf6fbcc6861a38015c2a64f38b9d94838ac1810a9023a0609e1b78"},
{file = "PyYAML-6.0-cp38-cp38-win_amd64.whl", hash = "sha256:1e4747bc279b4f613a09eb64bba2ba602d8a6664c6ce6396a4d0cd413a50ce07"},
{file = "PyYAML-6.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:055d937d65826939cb044fc8c9b08889e8c743fdc6a32b33e2390f66013e449b"},
{file = "PyYAML-6.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:e61ceaab6f49fb8bdfaa0f92c4b57bcfbea54c09277b1b4f7ac376bfb7a7c174"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d67d839ede4ed1b28a4e8909735fc992a923cdb84e618544973d7dfc71540803"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cba8c411ef271aa037d7357a2bc8f9ee8b58b9965831d9e51baf703280dc73d3"},
{file = "PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:40527857252b61eacd1d9af500c3337ba8deb8fc298940291486c465c8b46ec0"},
{file = "PyYAML-6.0-cp39-cp39-win32.whl", hash = "sha256:b5b9eccad747aabaaffbc6064800670f0c297e52c12754eb1d976c57e4f74dcb"},
{file = "PyYAML-6.0-cp39-cp39-win_amd64.whl", hash = "sha256:b3d267842bf12586ba6c734f89d1f5b871df0273157918b0ccefa29deb05c21c"},
{file = "PyYAML-6.0.tar.gz", hash = "sha256:68fb519c14306fec9720a2a5b45bc9f0c8d1b9c72adf45c37baedfcd949c35a2"},
]
requests = [
{file = "requests-2.27.1-py2.py3-none-any.whl", hash = "sha256:f22fa1e554c9ddfd16e6e41ac79759e17be9e492b3587efa038054674760e72d"},
{file = "requests-2.27.1.tar.gz", hash = "sha256:68d7c56fd5a8999887728ef304a6d12edc7be74f1cfa47714fc8b414525c9a61"},
]
saneyaml = [
{file = "saneyaml-0.5.2-py3-none-any.whl", hash = "sha256:e54ed827973647ee9be8e8c091536b55ad22b3f9b1296e36701a3544822e7eac"},
{file = "saneyaml-0.5.2.tar.gz", hash = "sha256:d6074f1959041342ab41d74a6f904720ffbcf63c94467858e0e22e17e3c43d41"},
]
sgmllib3k = [
{file = "sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9"},
]

pyproject.toml

@@ -1,9 +1,9 @@
 [tool.poetry]
-name = "lumbunglib"
+name = "konfluks"
 version = "0.1.0"
-description = "Python lib which powers lumbung[dot]space automation"
-authors = ["rra", "decentral1se"]
-license = "GPLv3+"
+description = "Brings together small and dispersed streams of web content from different applications and websites together in a single large stream."
+authors = ["rra", "decentral1se", "knoflook"]
+license = "AGPLv3+"

 [tool.poetry.dependencies]
 python = "^3.9"
@@ -16,7 +16,6 @@ peertube = {git = "https://framagit.org/framasoft/peertube/clients/python.git"}
 feedparser = "^6.0.8"
 bs4 = "^0.0.1"
 "Mastodon.py" = "^1.5.1"
-saneyaml = "^0.5.2"

 [tool.poetry.dev-dependencies]
 poetry2setup = "^1.0.0"
@@ -26,7 +25,8 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry.scripts]
-lumbunglib-cal = "lumbunglib.cloudcal:main"
-lumbunglib-vid = "lumbunglib.video:main"
-lumbunglib-feed = "lumbunglib.feed:main"
-lumbunglib-hash = "lumbunglib.hashtag:main"
+konfluks-cal = "konfluks.calendars:main"
+konfluks-vid = "konfluks.video:main"
+konfluks-feed = "konfluks.feed:main"
+konfluks-timeline = "konfluks.timeline:main"
+konfluks-hash = "konfluks.hashtag:main"

setup.py

@@ -2,10 +2,10 @@
 from setuptools import setup

 packages = \
-['lumbunglib']
+['konfluks']

 package_data = \
-{'': ['*'], 'lumbunglib': ['templates/*']}
+{'': ['*'], 'konfluks': ['templates/*']}

 install_requires = \
 ['Jinja2>=3.0.3,<4.0.0',
@@ -20,13 +20,14 @@ install_requires = \
 'requests>=2.26.0,<3.0.0']

 entry_points = \
-{'console_scripts': ['lumbunglib-cal = lumbunglib.cloudcal:main',
-'lumbunglib-feed = lumbunglib.feed:main',
-'lumbunglib-hash = lumbunglib.hashtag:main',
-'lumbunglib-vid = lumbunglib.video:main']}
+{'console_scripts': ['konfluks-cal = konfluks.calendars:main',
+'konfluks-feed = konfluks.feed:main',
+'konfluks-timeline = lumbunglib.timeline:main',
+'konfluks-hash = konfluks.hashtag:main',
+'konfluks-vid = konfluks.video:main']}

 setup_kwargs = {
-'name': 'lumbunglib',
+'name': 'konfluks',
 'version': '0.1.0',
 'description': 'Python lib which powers lumbung[dot]space automation',
 'long_description': None,
@@ -44,4 +45,3 @@ setup_kwargs = {
 setup(**setup_kwargs)