I need to retrieve all posts for a user or community on Lemmy and store them in a database. After a recent mishap I had to revert to an older version of the database, which lost some posts.

To overcome this issue, I am seeking guidance on how to efficiently retrieve all posts for a user or community on Lemmy and extract important information such as the URL, Title, Body, and Posted Timestamp.

To accomplish this task, I am utilizing the Pythörhead library, which provides a Python interface for interacting with the Lemmy API.

One approach I have considered is searching for each URL before posting to avoid duplicating posts. However, this method would be very slow and would create a significant load on the instance, potentially impacting performance.
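
One way to avoid querying the instance for every URL is to deduplicate locally and let the database reject repeats. The following is only a minimal sketch under assumptions: the `posts` table layout, the `save_posts` helper, and the sample rows are hypothetical and not part of Lemmy or Pythörhead.

```python
import sqlite3

# In-memory database for illustration; a real script would open a file instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posts (
        url       TEXT NOT NULL UNIQUE,  -- the UNIQUE constraint does the duplicate check
        title     TEXT NOT NULL,
        body      TEXT,
        published TEXT
    )
    """
)


def count_posts(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def save_posts(conn, posts):
    """Insert posts, skipping any whose URL is already stored.

    Returns the number of rows that were actually added.
    """
    before = count_posts(conn)
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO posts (url, title, body, published) "
            "VALUES (:url, :title, :body, :published)",
            posts,
        )
    return count_posts(conn) - before


# Hypothetical sample data standing in for posts fetched from the API.
batch = [
    {"url": "https://example.com/issues/1", "title": "First", "body": None,
     "published": "2023-07-01T12:00:00Z"},
    {"url": "https://example.com/issues/2", "title": "Second", "body": "text",
     "published": "2023-07-02T12:00:00Z"},
]

first_run = save_posts(conn, batch)   # both rows are new
second_run = save_posts(conn, batch)  # both rows already exist and are ignored
```

With this layout the duplicate check costs one indexed lookup in the local database per row and sends no extra requests to the instance.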

Any suggestions on how best to solve this issue would be invaluable.

For reference, this is the Lemmy API endpoint that can be used for retrieving posts.

Thank you in advance for your assistance!

  • CoderSupreme@programming.dev (OP)
    1 year ago
    import json
    import sqlite3
    
    from config import *
    from pythorhead import Lemmy
    
    # Connect to the database
    conn = sqlite3.connect('lemmy_github.db')
    cursor = conn.cursor()
    
    def import_missing_posts(posts_list):
        """Insert every fetched post that is not yet in the local database."""
        for post_view in posts_list:
            post = post_view['post']
            try:
                # Parse the issue number once and reuse it for the lookup and the insert
                number = issue_number(post['url'])
                cursor.execute('SELECT 1 FROM posts WHERE issue_number = ?', (number,))
                if cursor.fetchone() is None:
                    # 'body' is optional in a Lemmy post, so .get() stores NULL instead of raising KeyError
                    cursor.execute('INSERT INTO posts (issue_number, lemmy_post_id, issue_title, issue_body) VALUES (?, ?, ?, ?)',
                                (number, post['id'], post['name'], post.get('body')))
                    conn.commit()
            except sqlite3.Error as e:
                print(f"SQLite error occurred: {e}")
            except KeyError as e:
                print(f"KeyError occurred: {e}. Check if the input dictionary has all the required keys.")
            except Exception as e:
                print(f"An error occurred: {e}")
    
    def issue_number(url) -> int:
        return int(url.split("/")[-1])
    
    def load_json(filename):
        with open(filename) as f:
            return json.load(f)
    
    
    def process_posts(lemmy, username):
        page = 1
        while True:
            posts = lemmy.user.get(username=username, page=page)['posts']
            if not posts:
                break
            import_missing_posts(posts)
            page += 1
    
    lemmy = Lemmy(LEMMY_INSTANCE_URL)
    lemmy.log_in(LEMMY_USERNAME, LEMMY_PASSWORD)
    process_posts(lemmy, LEMMY_USERNAME)
    
    # Close the connection
    conn.close()
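
The page loop above can be factored into a generator so the same code works for a user or a community listing, and the URL, title, body, and posted timestamp can be pulled out in one place. This is a rough sketch, not the library's own API: `iter_posts`, `extract_fields`, and `fake_fetch` are hypothetical names, and `published` is assumed to be the timestamp field on the post object.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_posts(fetch_page: Callable[[int], List[dict]],
               max_pages: int = 1000) -> Iterator[dict]:
    """Yield post views page by page until an empty page comes back.

    max_pages is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        posts = fetch_page(page)
        if not posts:
            return
        yield from posts


def extract_fields(post_view: dict) -> Dict[str, Optional[str]]:
    """Pick out the fields of interest; optional ones become None when absent."""
    post = post_view["post"]
    return {
        "url": post.get("url"),              # text-only posts have no URL
        "title": post["name"],
        "body": post.get("body"),            # link-only posts have no body
        "published": post.get("published"),  # assumed timestamp field name
    }


# Stand-in for the real call. With the script above it could be, for example:
#   lambda page: lemmy.user.get(username=LEMMY_USERNAME, page=page)["posts"]
FAKE_PAGES = {
    1: [{"post": {"id": 1, "name": "First",
                  "url": "https://example.com/issues/1",
                  "published": "2023-07-01T12:00:00Z"}}],
    2: [{"post": {"id": 2, "name": "Second", "body": "text"}}],
}


def fake_fetch(page: int) -> List[dict]:
    """Return canned data for pages 1 and 2, then an empty list."""
    return FAKE_PAGES.get(page, [])


rows = [extract_fields(view) for view in iter_posts(fake_fetch)]
```

For a community, `fetch_page` would wrap whatever listing call the library offers for community posts; its exact name and parameters should be checked against the Pythörhead documentation.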