papget

Collected methods and classes for providers of papers, e.g. Springer.

class papget.papget.Ams

Provider implementation for American Mathematical Society

Examples

>>> url = 'https://bit.ly/2HEalfN'
>>> bool(Ams.need_to_pay(url))
False
>>> Ams.get_pdf_url(url)
u'http://www.ams.org/journals/jams/2016-29-01/...
>>> Ams.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
class papget.papget.Cammbridge

Provider implementation for Cambridge University Press

Examples

>>> url = 'https://bit.ly/2KoN7vU'
>>> bool(Cammbridge.need_to_pay(url))
False
>>> Cammbridge.get_pdf_url(url)
u'https://www.cambridge.org/core/services/aop-...
>>> Cammbridge.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
>>> url2 = 'https://bit.ly/2HBtD9F'
>>> bool(Cammbridge.need_to_pay(url2))
True
class papget.papget.Provider

Base class representing a provider of papers

Note

Do not instantiate this class directly; inherit from it and override need_to_pay() and get_pdf_url().
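A minimal sketch of the subclassing pattern the note describes. The Provider class below is a stand-in that mirrors only the documented interface, so the snippet runs without papget installed; ExampleJournal, its URL scheme, and the journal.example.org host are hypothetical and not part of papget.

```python
import re


class Provider(object):
    """Stand-in for papget.papget.Provider (assumption: documented interface only)."""

    NAME = u''
    RE_URL = None

    @classmethod
    def need_to_pay(cls, url, browser=None):
        raise NotImplementedError

    @classmethod
    def get_pdf_url(cls, url, browser=None):
        raise NotImplementedError


class ExampleJournal(Provider):
    """Hypothetical provider for an imaginary open-access journal."""

    NAME = u'Example Journal'
    # Hypothetical URL scheme: /article/<numeric id>
    RE_URL = re.compile(r'^https?://journal\.example\.org/article/(?P<id>\d+)$')

    @classmethod
    def need_to_pay(cls, url, browser=None):
        # Assumed to be open access, so nothing is ever paywalled.
        return False

    @classmethod
    def get_pdf_url(cls, url, browser=None):
        match = cls.RE_URL.match(url)
        if match is None:
            raise ValueError('URL does not belong to {0}'.format(cls.NAME))
        # Hypothetical mapping from article page to PDF location.
        return u'https://journal.example.org/pdf/{0}.pdf'.format(match.group('id'))
```

A real provider would typically inspect the page (for instance via get_soup()) rather than derive the PDF URL from the address alone.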

NAME = u''

(str) – Name of provider

RE_URL = None

(re.RegexObject) – Compiled regex used for matching URLs to this provider
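One way RE_URL could be used to pick a provider for a URL is sketched below. The helper find_provider(), the two stub classes, and their patterns are assumptions for illustration; papget's actual patterns and lookup logic are not shown in this reference.

```python
import re


class SpringerLike(object):
    """Stub with a hypothetical pattern; not papget's real regex."""
    NAME = u'Springer-like'
    RE_URL = re.compile(r'^https?://link\.springer\.com/')


class AmsLike(object):
    """Stub with a hypothetical pattern; not papget's real regex."""
    NAME = u'AMS-like'
    RE_URL = re.compile(r'^https?://www\.ams\.org/')


def find_provider(url, providers):
    """Return the first provider whose RE_URL matches url, or None."""
    for provider in providers:
        # RE_URL defaults to None on the base class, so guard against it.
        if provider.RE_URL is not None and provider.RE_URL.search(url):
            return provider
    return None


springer_url = 'https://link.springer.com/article/10.1007%2Fs40065-017-0185-1'
chosen = find_provider(springer_url, [AmsLike, SpringerLike])
print(chosen.NAME)  # prints "Springer-like"
```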

static get_browser(browser=None)

Create new browser if none is present.

Returns: (mechanize.Browser)
classmethod get_pdf_url(url, browser=None)

Get URL of PDF resource

Parameters:
  • url (str) – URL pointing to desired webpage
  • browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
classmethod get_soup(url, browser=None)

Get a parsed version of the HTML source of a URL

Parameters:
  • url (str) – URL pointing to desired webpage
  • browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
Returns:

(bs4.BeautifulSoup)

Parsed version of HTML source

classmethod need_to_pay(url, browser=None)

Check whether one needs to pay for PDF download

Parameters:
  • url (str) – URL pointing to desired webpage
  • browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
classmethod papget(url, filename, browser=None)

Conveniently download the PDF from a given URL

Parameters:
  • url (str) – URL pointing to desired webpage
  • filename (str) – Path of the file the PDF is written to
  • browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
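The documented methods can be combined so that a download is attempted only when no payment is required. The following is a rough offline sketch: FakeProvider is a stand-in that mirrors the documented signatures and writes placeholder bytes instead of fetching anything, and download_if_free() is a hypothetical helper, not part of papget.

```python
import os
import tempfile


class FakeProvider(object):
    """Offline stand-in mirroring the documented classmethod signatures."""

    PAYWALLED = set([u'https://example.org/paid'])

    @classmethod
    def need_to_pay(cls, url, browser=None):
        return url in cls.PAYWALLED

    @classmethod
    def papget(cls, url, filename, browser=None):
        # A real provider would fetch the PDF; this stub writes placeholder bytes.
        with open(filename, 'wb') as handle:
            handle.write(b'%PDF-1.4 placeholder')
        return filename


def download_if_free(provider, url, filename, browser=None):
    """Download only when the provider reports no paywall.

    Returns the saved filename, or None when payment would be required.
    The result of need_to_pay() is only tested for truthiness, matching
    the bool(...) wrapping used in the examples of this reference.
    """
    if provider.need_to_pay(url, browser=browser):
        return None
    return provider.papget(url, filename, browser=browser)


target = os.path.join(tempfile.mkdtemp(), 'temp.pdf')
saved = download_if_free(FakeProvider, u'https://example.org/free', target)
skipped = download_if_free(FakeProvider, u'https://example.org/paid', target)
```

With a real provider class such as Springer in place of FakeProvider, the same helper would perform network requests.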
class papget.papget.SciHub

Provider implementation for Sci-Hub

Raises: RuntimeError – if a CAPTCHA is encountered.
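Because a CAPTCHA surfaces as a RuntimeError, callers may want to fall back to another provider. The sketch below assumes the error is raised from papget(); the reference does not say which call raises. Both stub classes and papget_with_fallback() are hypothetical.

```python
class CaptchaStub(object):
    """Stub that always fails the way a CAPTCHA page is documented to."""
    NAME = u'captcha-stub'

    @classmethod
    def papget(cls, url, filename, browser=None):
        raise RuntimeError('CAPTCHA encountered')


class WorkingStub(object):
    """Stub that pretends the download succeeded."""
    NAME = u'working-stub'

    @classmethod
    def papget(cls, url, filename, browser=None):
        return filename


def papget_with_fallback(providers, url, filename):
    """Try each provider in order; return (provider name, filename)."""
    errors = []
    for provider in providers:
        try:
            return provider.NAME, provider.papget(url, filename)
        except RuntimeError as error:
            # Remember the failure and move on to the next provider.
            errors.append((provider.NAME, str(error)))
    raise RuntimeError('all providers failed: {0!r}'.format(errors))


used, path = papget_with_fallback(
    [CaptchaStub, WorkingStub], u'https://example.org/paper', 'temp.pdf')
```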
class papget.papget.Springer

Provider implementation for Springer

Examples

>>> url = 'https://link.springer.com/article/10.1007%2Fs40065-017-0185-1'
>>> bool(Springer.need_to_pay(url))
False
>>> Springer.get_pdf_url(url)
u'https://link.springer.com/content/pdf/...
>>> Springer.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
NAME = u'Springer'

(str) – Name of provider

RE_URL = <_sre.SRE_Pattern object>

(re.RegexObject) – Compiled regex used for matching URLs to this provider