
[LangChain] FAILED test.py::test_file[vector_search.py] - ValueError: Collection not found #580

Open
amotl opened this issue Aug 28, 2024 · 1 comment

Comments

@amotl
Member

amotl commented Aug 28, 2024

Problem

Testing the integration with LangChain shows intermittent errors on scheduled runs: they started three weeks ago, went green for three runs in between, and turned red again afterwards.

See: https://github.com/crate/cratedb-examples/actions/workflows/ml-langchain.yml

References

Details

This is probably the root cause?

ERROR    langchain_community.document_loaders.url:url.py:145 Error fetching or processing https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt, exception: 

Traceback

------------------------------ Captured log call -------------------------------
ERROR    langchain_community.document_loaders.url:url.py:145 Error fetching or processing https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt, exception: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/home/runner/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/share/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
=============================== warnings summary ===============================
test.py::test_notebook[conversational_memory.ipynb]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/nbformat/__init__.py:96: MissingIDFieldWarning: Cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
    validate(nb)

test.py::test_file[conversational_memory.py]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/langchain_community/chat_message_histories/sql.py:143: LangChainDeprecationWarning: `connection_string` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. Use Use connection instead instead.
    warn_deprecated(

test.py::test_file[vector_search.py]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/langchain_community/vectorstores/pgvector.py:322: LangChainPendingDeprecationWarning: Please use JSONB instead of JSON for metadata. This change will allow for more efficient querying that involves filtering based on metadata.Please note that filtering operators have been changed when using JSOB metadata to be prefixed with a $ sign to avoid name collisions with columns. If you're using an existing database, you will need to create adb migration for your metadata column to be JSONB and update your queries to use the new operators. 
    warn_deprecated(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test.py::test_file[vector_search.py] - ValueError: Collection not found
@amotl
Member Author

amotl commented Aug 28, 2024

Thoughts I

This is probably the root cause?

ERROR    langchain_community.document_loaders.url:url.py:145 Error fetching or processing https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt, exception: 

Maybe the code simply can't fetch https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt? The URL works when I probe it in my browser, but it might behave differently on CI/GHA.
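One way to test this hypothesis is to fetch the URL on the runner itself, outside of LangChain, using only the Python standard library. This is a minimal sketch, assuming it is run as an extra step in the workflow; `probe_url` is a hypothetical helper, not part of LangChain or the test suite.

```python
import urllib.error
import urllib.request

URL = "https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt"


def probe_url(url: str, timeout: float = 10.0) -> str:
    """Return a one-line diagnosis of whether `url` can be fetched from this machine.

    Hypothetical helper for CI debugging; it does not use LangChain at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            return f"OK: HTTP {response.status}, {len(body)} bytes"
    except urllib.error.HTTPError as exc:
        # HTTPError subclasses URLError, so it must be caught first.
        return f"HTTP error: {exc.code} {exc.reason}"
    except urllib.error.URLError as exc:
        # DNS failures, refused connections, connect timeouts, ...
        return f"Network error: {exc.reason}"
    except OSError as exc:
        # Read timeouts surface as socket.timeout, an OSError subclass.
        return f"I/O error: {exc}"


if __name__ == "__main__":
    print(probe_url(URL))
```

If this prints an `OK` line on the runner, the download itself is unlikely to be the culprit.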

Thoughts II

On the other hand, there is also this message from NLTK, which asks for the resource to be installed via nltk.download('punkt_tab'):

Resource punkt_tab not found.
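Because the captured log prints this NLTK message as the exception text of the "Error fetching or processing" line, the failure may lie in processing rather than fetching, although the log alone does not prove it. To check whether the resource is present on a given machine, the sketch below mirrors the lookup shown in the log: it looks for `tokenizers/punkt_tab/english/` in each search directory. `find_resource` and `CI_SEARCH_DIRS` are hypothetical names, and the lookup is a simplification of NLTK's own (it ignores zipped resources).

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Resource path NLTK attempted to load, as printed in the CI log.
PUNKT_TAB = Path("tokenizers") / "punkt_tab" / "english"

# Search directories copied from the CI log (GitHub-hosted runner, Python 3.10.14).
CI_SEARCH_DIRS = [
    "/home/runner/nltk_data",
    "/opt/hostedtoolcache/Python/3.10.14/x64/nltk_data",
    "/opt/hostedtoolcache/Python/3.10.14/x64/share/nltk_data",
    "/opt/hostedtoolcache/Python/3.10.14/x64/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_resource(search_dirs: Iterable[str], resource: Path = PUNKT_TAB) -> Optional[Path]:
    """Return the first `<dir>/<resource>` that exists as a directory, else None.

    Simplified stand-in for NLTK's lookup: only unpacked directories are checked.
    """
    for base in search_dirs:
        candidate = Path(base) / resource
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Check directories from the NLTK_DATA environment variable first, then the CI list.
    extra = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    found = find_resource(extra + CI_SEARCH_DIRS)
    if found is None:
        print("punkt_tab missing; NLTK suggests: nltk.download('punkt_tab')")
    else:
        print(f"punkt_tab found at {found}")
```

If the resource turns out to be missing, running `nltk.download('punkt_tab')` in a setup step before the tests, as the NLTK message suggests, would be one way to provide it.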
