[feed.py] do your own sanitization for media embeds #1

Open
opened 2022-01-01 07:32:25 +00:00 by aadil · 1 comment
Owner

copied over from [lumbung-feed-aggregator repo](https://git.vvvvvvaria.org/rra/lumbung-feed-aggregator/issues/4)

> iframes, media embeds etc. are stripped out now in

> note to self: normalize or reset the size of media embeds
r changed title from do your own sanitization for media embeds to [feed.py] do your own sanitization for media embeds 2022-01-07 09:44:55 +00:00
Member

Context: the feed parsing library [sanitizes a lot of html elements](https://feedparser.readthedocs.io/en/latest/html-sanitization.html) that could otherwise be used to inject crap or execute arbitrary code, which means that certain things such as videos and PDFs embedded in posts (both based on iframe) are not returned.

The feed parsing library maintains an allowlist of tags. I've implemented the exception by first adding `iframe` to the feedparser allowlist: https://git.autonomic.zone/ruangrupa/lumbunglib/src/branch/master/lumbunglib/feed.py#L199

Secondly, I implemented another allowlist on top with 'trusted domains':
https://git.autonomic.zone/ruangrupa/lumbunglib/src/branch/master/lumbunglib/feed.py#L140

It is probably wise to keep the same level of carefulness as the library maintainers and only add very specific exceptions to the allowlists.
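For illustration, the second step (a 'trusted domains' allowlist applied to iframes after the feed parser has let them through) could look roughly like the sketch below. This is not the code in `feed.py`: it uses only the Python standard library, the names `TRUSTED_DOMAINS`, `is_trusted_src`, `IframeFilter` and `filter_iframes` are hypothetical, and the listed domains are placeholders. The first step, adding `iframe` to feedparser's own allowlist, is not shown because it depends on feedparser internals.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Placeholder allowlist, for illustration only (an assumption, not the real list).
TRUSTED_DOMAINS = {"youtube.com", "vimeo.com"}


def is_trusted_src(src, trusted_domains=TRUSTED_DOMAINS):
    """Return True if src is an http(s) URL on a trusted domain or a subdomain of one."""
    parsed = urlparse(src)
    if parsed.scheme not in ("http", "https"):
        # Rejects javascript:, data:, and protocol-relative URLs.
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


class IframeFilter(HTMLParser):
    """Re-emit HTML as parsed, dropping iframes whose src is not trusted."""

    def __init__(self, trusted_domains=TRUSTED_DOMAINS):
        super().__init__(convert_charrefs=False)
        self.trusted_domains = trusted_domains
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped iframe

    def _untrusted_iframe(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        return tag == "iframe" and not is_trusted_src(src, self.trusted_domains)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth and tag == "iframe":
            self.skip_depth += 1  # nested iframe inside a dropped one
        elif self._untrusted_iframe(tag, attrs):
            self.skip_depth = 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._untrusted_iframe(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "iframe":
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")


def filter_iframes(html, trusted_domains=TRUSTED_DOMAINS):
    """Return html with untrusted iframes (and their content) removed."""
    parser = IframeFilter(trusted_domains)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The sketch compares the parsed hostname against the allowlist instead of searching the URL string, so an address like `https://youtube.com.evil.example/` or `https://evil.example/?u=youtube.com` is not accepted. That keeps the exception as narrow as possible, in the spirit of only adding very specific entries to the allowlists.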
Reference: ruangrupa/konfluks#1