papget
Collected methods and classes for providers of papers, e.g. Springer.
class papget.papget.Ams
Provider implementation for the American Mathematical Society
Examples
>>> url = 'https://bit.ly/2HEalfN'
>>> bool(Ams.need_to_pay(url))
False
>>> Ams.get_pdf_url(url)
u'http://www.ams.org/journals/jams/2016-29-01/...
>>> Ams.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
class papget.papget.Cammbridge
Provider implementation for Cambridge University Press
Examples
>>> url = 'https://bit.ly/2KoN7vU'
>>> bool(Cammbridge.need_to_pay(url))
False
>>> Cammbridge.get_pdf_url(url)
u'https://www.cambridge.org/core/services/aop-...
>>> Cammbridge.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
>>> url2 = 'https://bit.ly/2HBtD9F'
>>> bool(Cammbridge.need_to_pay(url2))
True
class papget.papget.Provider
Class representing the providers of papers
Note
Do not instantiate this class directly; inherit from it and override need_to_pay() and get_pdf_url().
NAME = u''
(str) – Name of provider
RE_URL = None
(re.RegexObject) – Compiled regex used for matching URLs to this provider
static get_browser(browser=None)
Create a new browser if none is present.
Returns: (mechanize.Browser)
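The "create a new browser if none is present" behaviour is the usual optional-argument default pattern. The following is a minimal sketch of that pattern only, not the library's code: `FakeBrowser` is a hypothetical stand-in for `mechanize.Browser`, and handing back the caller's instance unchanged is an assumption about how `get_browser()` treats a browser that is passed in.

```python
class FakeBrowser(object):
    """Hypothetical stand-in for mechanize.Browser (no network access)."""


def get_browser(browser=None):
    """Return the given browser, or create a new one if none was passed."""
    if browser is None:
        # No browser supplied: build a fresh instance.
        browser = FakeBrowser()
    return browser


shared = FakeBrowser()
reused = get_browser(shared)  # the caller's instance is handed back
fresh = get_browser()         # no argument: a new instance is created
```

Passing one browser through several calls would let them share state such as cookies, which is presumably why every classmethod on Provider accepts an optional `browser` argument.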
classmethod get_pdf_url(url, browser=None)
Get URL of PDF resource
Parameters:
- url (str) – URL pointing to desired webpage
- browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
classmethod get_soup(url, browser=None)
Get a parsed version of the HTML source of a URL
Parameters:
- url (str) – URL pointing to desired webpage
- browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
Returns:
- (bs4.BeautifulSoup) Parsed version of HTML source
classmethod need_to_pay(url, browser=None)
Check whether one needs to pay for PDF download
Parameters:
- url (str) – URL pointing to desired webpage
- browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
classmethod papget(url, filename, browser=None)
Conveniently download the PDF from a given URL
Parameters:
- url (str) – URL pointing to desired webpage
- filename (str) – Name of the file the PDF is saved to (for example 'temp.pdf')
- browser (Optional[mechanize.Browser]) – If no browser is provided, a new instance will be created.
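The Note on Provider asks for subclasses that override need_to_pay() and get_pdf_url(). A minimal sketch of such a subclass might look as follows. Everything in it is illustrative: the `Provider` class here is a hypothetical stand-in that mirrors only the documented attribute and method names (so the snippet runs without papget installed), and `ExamplePress`, the example.org URLs, and the 'paid-' prefix rule are made up.

```python
import re


class Provider(object):
    """Hypothetical stand-in mirroring the documented Provider interface."""

    NAME = u''
    RE_URL = None

    @classmethod
    def need_to_pay(cls, url, browser=None):
        raise NotImplementedError

    @classmethod
    def get_pdf_url(cls, url, browser=None):
        raise NotImplementedError


class ExamplePress(Provider):
    """Made-up provider; the site and its URL layout are assumptions."""

    NAME = u'Example Press'
    RE_URL = re.compile(
        r'^https?://papers\.example\.org/article/(?P<id>[\w.-]+)$')

    @classmethod
    def need_to_pay(cls, url, browser=None):
        # Assumption: this fictional site prefixes paywalled ids with 'paid-'.
        return cls.RE_URL.match(url).group('id').startswith('paid-')

    @classmethod
    def get_pdf_url(cls, url, browser=None):
        # Assumption: the PDF path is derived directly from the article id.
        article_id = cls.RE_URL.match(url).group('id')
        return u'https://papers.example.org/pdf/{0}.pdf'.format(article_id)


url = 'https://papers.example.org/article/open-123'
print(ExamplePress.need_to_pay(url))
print(ExamplePress.get_pdf_url(url))
```

A real provider would more likely inspect the fetched page (for instance via get_soup()) than the URL alone; the URL-only logic here just keeps the sketch free of network access.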
class papget.papget.SciHub
Provider implementation for Sci-Hub
Raises: RuntimeError – if a CAPTCHA is encountered.
class papget.papget.Springer
Provider implementation for Springer
Examples
>>> url = 'https://link.springer.com/article/10.1007%2Fs40065-017-0185-1'
>>> bool(Springer.need_to_pay(url))
False
>>> Springer.get_pdf_url(url)
u'https://link.springer.com/content/pdf/...
>>> Springer.papget(url, 'temp.pdf')
u'temp.pdf'
>>> import os; os.remove('temp.pdf')
NAME = u'Springer'
(str) – Name of provider
RE_URL = <_sre.SRE_Pattern object>
(re.RegexObject) – Compiled regex used for matching URLs to this provider
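Since each provider carries a compiled RE_URL, one plausible use is picking the provider responsible for a given URL. The sketch below illustrates that idea under stated assumptions: the patterns are guesses (the real RE_URL values are not shown here), `SpringerLike` and `AmsLike` are hypothetical stand-ins for the real classes, the name u'AMS' is assumed, and `find_provider` is an invented helper, not part of papget.

```python
import re


class SpringerLike(object):
    NAME = u'Springer'
    # Assumption: the actual pattern used by papget is not documented.
    RE_URL = re.compile(r'^https?://link\.springer\.com/')


class AmsLike(object):
    NAME = u'AMS'  # assumed name
    RE_URL = re.compile(r'^https?://(www\.)?ams\.org/')


def find_provider(url, providers):
    """Return the first provider whose RE_URL matches url, else None."""
    for provider in providers:
        if provider.RE_URL is not None and provider.RE_URL.search(url):
            return provider
    return None


url = 'https://link.springer.com/article/10.1007%2Fs40065-017-0185-1'
match = find_provider(url, [AmsLike, SpringerLike])
print(match.NAME)
```

Shortened links such as the bit.ly URLs in the examples would not match patterns like these until they have been resolved to their target address.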