Support more data formats/integration with other PKMs #276

Open
zimengzhou1 opened this issue Jun 12, 2024 · 3 comments

Comments

@zimengzhou1
Contributor

What are our thoughts on supporting document types other than Markdown, for example PDF or plain text? It would also be nice if users could use other note-taking apps, like Notion, directly as a source of their data; that would lower the barrier to entry for using Reor.

@samlhuillier
Collaborator

Absolutely! This would be great to add, particularly support for plain text and PDFs. Would you be keen to add this?

@zimengzhou1
Contributor Author

Sure, I'll have a crack at it!

@zimengzhou1
Contributor Author

I did a bit of poking around, and it seems supporting PDFs is significantly harder than I thought, even more so after reading this article.

I tried using the "pdf-parse" library to extract text from the PDFs. After testing the parser on several research papers, it was clear that the chunks stored in the vector database and shown in "Related Notes" were poorly formatted (especially equations and tables) and not very relevant.
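
To illustrate the failure mode, here is a minimal sketch, not Reor's actual pipeline. It assumes naive fixed-size chunking; `chunkBySize` is a hypothetical helper, and `extractedTable` is a made-up stand-in for the flat text an extractor can return for a small table. With pdf-parse, the real string would come from the `text` field of the object its parse call resolves to.

```typescript
// Hypothetical helper: naive fixed-size chunking of extracted text.
// Reor's real chunking strategy may differ; this only illustrates the problem.
function chunkBySize(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Made-up example of extractor output for a two-row table: the cells arrive
// as a flat run of lines with no row or column structure left.
const extractedTable = ["Model", "Accuracy", "A", "0.91", "B", "0.87"].join("\n");

// Chunking by size separates the row label "A" from its value "0.91",
// so neither chunk carries the full meaning of the row.
const chunks = chunkBySize(extractedTable, 16);
console.log(chunks);
```

Once the table structure is gone, an embedding of either chunk has little to match against, which would be consistent with the weak "Related Notes" results.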

Supporting plain text was pretty trivial: I just added ".txt" as an allowed extension.
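
As a rough sketch of what such an allow-list check can look like (the names `ALLOWED_EXTENSIONS`, `getExtension`, and `isAllowedFile` are hypothetical, and the actual constant and its location in Reor may differ):

```typescript
// Hypothetical allow-list; adding ".txt" next to ".md" is the whole change.
const ALLOWED_EXTENSIONS: ReadonlySet<string> = new Set([".md", ".txt"]);

// Returns the lowercased extension including the dot, or "" if there is none.
function getExtension(fileName: string): string {
  const lastDot = fileName.lastIndexOf(".");
  const lastSep = Math.max(fileName.lastIndexOf("/"), fileName.lastIndexOf("\\"));
  // No dot, a dot that belongs to a directory name, or a dotfile like ".gitignore".
  if (lastDot <= lastSep + 1) {
    return "";
  }
  return fileName.slice(lastDot).toLowerCase();
}

// True when the file's extension is on the allow-list.
function isAllowedFile(fileName: string): boolean {
  return ALLOWED_EXTENSIONS.has(getExtension(fileName));
}
```

Plain text can be indexed the same way as Markdown because the file contents are already readable text, which is not true of PDFs.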
