Invalid entries in multi-member ID lists cause entry repetition #82

Open
lukasschwab opened this issue Aug 18, 2021 · 0 comments
Labels
api Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper. bug Deviations from documented behavior.

lukasschwab commented Aug 18, 2021

Description

If id_list consists of a single nonexistent (but validly formatted) ID, arXiv returns an empty feed, which the client interprets as "no results."

If id_list contains both existent and nonexistent valid IDs (["0000.0000", "1707.08567"]), the feed is non-empty: it contains a single entry, but it reports feed.feed.opensearch_totalresults == 2. The client takes this to be a partial page and requests another page with offset 1, which lists paper 1707.08567 again. This is an API bug.
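
The duplication can be reproduced offline with a minimal sketch. This is not arxiv.py's actual paging code: `fake_fetch_page` and `naive_results` are hypothetical names, and the fake API simply assumes the behavior described above (one real entry at every offset, a reported total of 2).

```python
from typing import List, Tuple


def fake_fetch_page(start: int) -> Tuple[List[str], int]:
    """Hypothetical stand-in for the arXiv API response described above.

    Whatever the offset, the feed lists the one existing paper, while the
    reported total still counts the nonexistent ID.
    """
    entries = ["1707.08567"]
    reported_total = 2
    return entries, reported_total


def naive_results() -> List[str]:
    """Keep paging until the number of collected entries matches the total."""
    collected, total = fake_fetch_page(start=0)
    while len(collected) < total:
        # The first page looked partial, so request the next offset.
        entries, total = fake_fetch_page(start=len(collected))
        if not entries:
            break
        collected.extend(entries)
    return collected


print(naive_results())  # ['1707.08567', '1707.08567']
```

Any client that trusts the reported total over the number of entries actually returned will show the same repetition.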

Notably, this behavior differs depending on the nonexistent ID. Nonexistent ID 1507.58567 yields an entry with missing fields (covered in #80, fixed by #82), whereas 1407.58567 yields no entries at all (covered here).

Example: https://export.arxiv.org/api/query?id_list=1407.58567,1707.08567
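
For trying other ID combinations, a query URL of the same shape can be built with the standard library. A small sketch; `build_id_list_url` is a hypothetical helper, not part of arxiv.py.

```python
from urllib.parse import urlencode

API_BASE = "https://export.arxiv.org/api/query"


def build_id_list_url(ids):
    """Join the IDs with commas and keep the commas unescaped in the query."""
    query = urlencode({"id_list": ",".join(ids)}, safe=",")
    return f"{API_BASE}?{query}"


print(build_id_list_url(["1407.58567", "1707.08567"]))
# https://export.arxiv.org/api/query?id_list=1407.58567,1707.08567
```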

Steps to reproduce

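import unittest
import arxiv

class TestInvalidID(unittest.TestCase):
    def test_invalid_id(self):
        # A single nonexistent ID yields an empty feed, so no results.
        results = list(arxiv.Search(id_list=["0000.0000"]).results())
        self.assertEqual(len(results), 0)
        # A nonexistent ID alongside an existing one should yield one result.
        results = list(arxiv.Search(id_list=["0000.0000", "1707.08567"]).results())
        self.assertEqual(len(results), 1)  # Fails: 1707.08567 appears twice.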

Expected behavior

Results should not be duplicated.

Searching for ["0000.0000", "1707.08567"] should yield a single result.
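
Until the API behavior changes, a caller could filter repeats on the client side. The sketch below is one possible workaround, not a fix adopted by arxiv.py; `dedupe` is a hypothetical helper, and the choice of key is an assumption left to the caller.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def dedupe(results: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield each result once, in order, skipping keys already seen."""
    seen = set()
    for result in results:
        k = key(result)
        if k in seen:
            continue
        seen.add(k)
        yield result


# Plain ID strings stand in for result objects here.
print(list(dedupe(["1707.08567", "1707.08567"], key=lambda r: r)))
# ['1707.08567']
```

With real results, the key might be something like `lambda r: r.entry_id`, assuming each result exposes a stable identifier. This only hides the repetition; the client still makes the extra request.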

Versions

  • python version: 3.7.9
  • arxiv.py version: 1.4.1
@lukasschwab lukasschwab added bug Deviations from documented behavior. api Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper. labels Aug 18, 2021
@lukasschwab lukasschwab self-assigned this Aug 18, 2021