Scholia packages

Submodules

scholia.api module

api.

Usage:

scholia.api get <qs>…
scholia.api q-to-classes <q>
scholia.api q-to-name <q>
scholia.api search [options] <query>

Options:

--limit=<limit>: Number of search results to return [default: 10]

Description:

Interface to the Wikidata API and its bibliographic data.

Examples

$ python -m scholia.api get Q26857876 Q21172284 | wc
2 1289 16174

$ python -m scholia.api q-to-classes Q28133147
Q13442814

scholia.api.entity_to_authors(entity, return_humanness=False)

Extract authors from entity.

Parameters:

entity (dict) – Dictionary with Wikidata item
return_humanness (bool) – Toggle whether the return argument should contain a list of strings or a list of tuples with both name and an indication of whether the author is a human. Some authors are organizations and formatting of authors may need to distinguish between humans and organizations.

Returns:

authors – List with each element representing an author. Each element may either be a string with the author name or a tuple with the author name and a boolean indicating humanness of the author.

Return type:

list of str or list of two-tuple

scholia.api.entity_to_classes(entity)

Extract ‘instance_of’ classes.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: classes – List of strings.
Return type:: list of str

Examples

>>> entities = wb_get_entities(['Q28133147'])
>>> classes = entity_to_classes(list(entities.values())[0])
>>> 'Q13442814' in classes
True
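
As a rough illustration of what extracting the 'instance of' classes involves, the sketch below walks the P31 claims of an entity in the standard Wikidata JSON layout. The helper name sketch_entity_to_classes and the hand-made entity are hypothetical, and this is not necessarily how scholia.api implements it.

```python
def sketch_entity_to_classes(entity):
    """Collect Q identifiers of 'instance of' (P31) claims (illustrative)."""
    classes = []
    for claim in entity.get('claims', {}).get('P31', []):
        # Claims with 'novalue' or 'somevalue' snaks carry no datavalue.
        datavalue = claim.get('mainsnak', {}).get('datavalue')
        if datavalue is not None:
            classes.append(datavalue['value']['id'])
    return classes


# Hand-made, heavily truncated entity for demonstration only.
example_entity = {
    'claims': {
        'P31': [
            {'mainsnak': {'snaktype': 'value',
                          'datavalue': {'type': 'wikibase-entityid',
                                        'value': {'id': 'Q13442814'}}}},
            {'mainsnak': {'snaktype': 'novalue'}},
        ]
    }
}
print(sketch_entity_to_classes(example_entity))
```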

scholia.api.entity_to_doi(entity)

Extract DOI of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: doi – DOI as string. An empty string is returned if the field is not set.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q24239902'])
>>> doi = entity_to_doi(entities['Q24239902'])
>>> doi == '10.1038/438900A'
True
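
Several of the extractors in this module share one pattern: read a string-valued claim and fall back on an empty string when the field is not set. A minimal sketch of that pattern follows, assuming the standard Wikidata JSON layout and the property identifiers P356 (DOI), P304 (pages) and P478 (volume). The helper sketch_string_claim is hypothetical and not part of scholia.

```python
def sketch_string_claim(entity, property_id):
    """Return the first string value for a property, or '' (illustrative)."""
    for claim in entity.get('claims', {}).get(property_id, []):
        datavalue = claim.get('mainsnak', {}).get('datavalue')
        if datavalue is not None:
            return datavalue['value']
    # Field not set: mirror the documented empty-string convention.
    return ''


# Hand-made, truncated entity; the values are taken from the examples
# in this module for Q24239902.
example_entity = {
    'claims': {
        'P356': [{'mainsnak': {'datavalue': {'type': 'string',
                                             'value': '10.1038/438900A'}}}],
        'P304': [{'mainsnak': {'datavalue': {'type': 'string',
                                             'value': '900-901'}}}],
    }
}
doi = sketch_string_claim(example_entity, 'P356')
pages = sketch_string_claim(example_entity, 'P304')
volume = sketch_string_claim(example_entity, 'P478')  # not set above
print(repr(doi), repr(pages), repr(volume))
```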

scholia.api.entity_to_full_text_url(entity)

Extract full text URL of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: url – URL as string. An empty string is returned if the field is not set.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q28374293'])
>>> url = entity_to_full_text_url(entities['Q28374293'])
>>> url == ('http://papers.nips.cc/paper/'
...         '5872-efficient-and-robust-automated-machine-learning.pdf')
True

scholia.api.entity_to_journal_title(entity)

Extract journal of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: journal – Journal as string. An empty string is returned if the field is not set.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q24239902'])
>>> journal = entity_to_journal_title(entities['Q24239902'])
>>> journal == 'Nature'
True

scholia.api.entity_to_label(entity)

Extract label from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: label – String with label.
Return type:: str

scholia.api.entity_to_month(entity, language='en')

Extract month of publication from entity.

Parameters:

entity (dict) – Dictionary with Wikidata item.
language (str) – Language. If None, the month is returned as a string with the month number.

Returns:

month – Month as string. If the month is not specified, i.e., the precision is year, then None is returned.

Return type:

str or None
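
The precision rule can be sketched as follows, assuming the Wikibase time representation where a value carries a 'time' string such as '+2005-12-15T00:00:00Z' and a 'precision' integer (9 for year, 10 for month, 11 for day). The function name, the English-only month table and the ValueError for other languages are assumptions made for this illustration.

```python
ENGLISH_MONTHS = (
    'January', 'February', 'March', 'April', 'May', 'June', 'July',
    'August', 'September', 'October', 'November', 'December')


def sketch_time_to_month(time_value, language='en'):
    """Return month from a Wikibase time value, or None (illustrative)."""
    # Precision 9 means only the year is known, so there is no month.
    if time_value['precision'] < 10:
        return None
    # The month occupies characters 6-7 of e.g. '+2005-12-15T00:00:00Z'.
    month_number = int(time_value['time'][6:8])
    if language is None:
        return str(month_number)
    if language == 'en':
        return ENGLISH_MONTHS[month_number - 1]
    raise ValueError('Only English is handled in this sketch')


day_precision = {'time': '+2005-12-15T00:00:00Z', 'precision': 11}
year_precision = {'time': '+2005-00-00T00:00:00Z', 'precision': 9}
print(sketch_time_to_month(day_precision))
print(sketch_time_to_month(year_precision))
```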

scholia.api.entity_to_name(entity)

Extract the name of the item.

Parameters:: entity (dict) – Dictionary with Wikidata item representing a person.
Returns:: name – Name of person.
Return type:: str or None

Examples

>>> entities = wb_get_entities(['Q8219'])
>>> name = entity_to_name(list(entities.values())[0])
>>> name == 'Uta Frith'
True

scholia.api.entity_to_pages(entity)

Extract pages of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: pages – Pages as string. An empty string is returned if the field is not set.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q24239902'])
>>> pages = entity_to_pages(entities['Q24239902'])
>>> pages == '900-901'
True

scholia.api.entity_to_smiles(entity)

Extract SMILES of a chemical.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: smiles – SMILES as string.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q48791494'])
>>> smiles = entity_to_smiles(entities['Q48791494'])
>>> smiles == 'CC(C)[C@H]1CC[C@@]2(CO2)[C@@H]3[C@@H]1C=C(COC3=O)C(=O)O'
True

scholia.api.entity_to_title(entity)

Extract title from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item.
Returns:: title – Title as string. If the title is not set then None is returned.
Return type:: str or None

scholia.api.entity_to_volume(entity)

Extract volume of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: volume – Volume as string. An empty string is returned if the field is not set.
Return type:: str

Examples

>>> entities = wb_get_entities(['Q21172284'])
>>> volume = entity_to_volume(entities['Q21172284'])
>>> volume == '12'
True

scholia.api.entity_to_year(entity)

Extract year of publication from entity.

Parameters:: entity (dict) – Dictionary with Wikidata item
Returns:: year – Year as string.
Return type:: str or None

scholia.api.is_human(entity)

Return true if entity is a human.

Parameters:: entity (dict) – Structure with Wikidata entity.
Returns:: result – Result of comparison.
Return type:: bool

scholia.api.main(): Handle command-line arguments.

scholia.api.search(query, page, limit=10)

Search Wikidata.

Parameters:

query (str) – Query string.
page (int) – Number of current page.
limit (int, optional) – Number of maximum search results to return.

Returns:

result

Return type:

dict

scholia.api.select_value_by_language_preferences(choices, preferences=('en', 'de', 'fr'))

Select value based on language preference.

Parameters:

choices (dict) – Dictionary with language as keys and strings as values.
preferences (list or tuple) – Language codes in order of preference.

Returns:

value – Selected string. An empty string is returned if there are no choices.

Return type:

str

Examples

>>> choices = {'da': 'Bog', 'en': 'Book', 'de': 'Buch'}
>>> select_value_by_language_preferences(choices)
'Book'
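
One way such a selection could work is sketched below. The fallback to an arbitrary available value when none of the preferred languages is present is an assumption of this sketch, since the documentation above only specifies the empty-string case. The name sketch_select_value is hypothetical.

```python
def sketch_select_value(choices, preferences=('en', 'de', 'fr')):
    """Pick a value by language preference (illustrative)."""
    if not choices:
        return ''
    for language in preferences:
        if language in choices:
            return choices[language]
    # Assumption: fall back on the first available value.
    return next(iter(choices.values()))


choices = {'da': 'Bog', 'en': 'Book', 'de': 'Buch'}
print(sketch_select_value(choices))
print(sketch_select_value(choices, preferences=('de', 'en')))
```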

scholia.api.wb_get_entities(qs)

Get entities from Wikidata.

Query the Wikidata webservice via its API.

Parameters:: qs (list of str) – List of strings, each with a Wikidata item identifier.
Returns:: data – Dictionary of dictionaries.
Return type:: dict of dict
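
For orientation, a request of this kind can be expressed with the MediaWiki wbgetentities action. The sketch below only builds the request URL and makes no network call. The helper name is hypothetical and the exact parameters scholia sends are not confirmed by this documentation.

```python
from urllib.parse import urlencode

WIKIDATA_API = 'https://www.wikidata.org/w/api.php'


def sketch_wbgetentities_url(qs):
    """Build a wbgetentities URL for a list of Q identifiers (illustrative)."""
    params = {
        'action': 'wbgetentities',
        'ids': '|'.join(qs),  # several identifiers are pipe-separated
        'format': 'json',
    }
    return WIKIDATA_API + '?' + urlencode(params)


url = sketch_wbgetentities_url(['Q26857876', 'Q21172284'])
print(url)
```

The JSON response of this action holds an 'entities' object keyed by identifier, which matches the dictionary of dictionaries described above.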

scholia.arxiv module

arxiv.

Usage:

scholia.arxiv get-metadata <arxiv>
scholia.arxiv get-quickstatements [options] <arxiv>

Options:: -o --output=file Output filename, default output to stdout

References

https://arxiv.org

scholia.arxiv.get_metadata(arxiv)

Get metadata about an arxiv publication from website.

Scrapes the arXiv webpage corresponding to the paper with the arxiv identifier and returns the metadata for the paper in a dictionary.

Parameters:: arxiv (str) – ArXiv identifier.
Returns:: metadata – Dictionary with metadata. None is returned if the identifier is not found.
Return type:: dict or None

Notes

This function queries arXiv. It must not be used to crawl arXiv. It does not look at robots.txt.

The language is set to English.

Examples

>>> metadata = get_metadata('1503.00759')
>>> metadata['doi'] == '10.1109/JPROC.2015.2483592'
True
>>> 'error' in get_metadata('5432.01234')
True

scholia.arxiv.main(): Handle command-line interface.

scholia.arxiv.string_to_arxiv(string)

Extract arxiv id from string.

The arXiv identifier part of string will be extracted, where the identifier pattern should be in the format of a series of digits followed by a period followed by a series of digits. Other formats will not be matched. If multiple identifier patterns are in the input string then only the first is returned.

Parameters:: string (str) – String with arxiv ID.
Returns:: arxiv – String with arxiv ID.
Return type:: str or None

Examples

>>> string = "http://arxiv.org/abs/1103.2903"
>>> arxiv = string_to_arxiv(string)
>>> arxiv == '1103.2903'
True

scholia.arxiv.string_to_arxivs(string)

Extract arxiv IDs from string.

All arXiv identifiers in the string will be extracted, where the identifier pattern should be in the format of a series of digits followed by a period followed by a series of digits. Other formats will not be matched. If multiple identifier patterns are in the input string then all of them are returned.

Parameters:: string (str) – String with arxiv IDs.
Returns:: arxivs – List of strings with arxiv IDs.
Return type:: list of str

Examples

>>> string = "2210.03493 http://arxiv.org/abs/1103.2903"
>>> arxivs = string_to_arxivs(string)
>>> '1103.2903' in arxivs
True
>>> "2210.03493" in arxivs
True
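
The identifier pattern described for string_to_arxiv and string_to_arxivs (digits, a period, digits) can be approximated with a regular expression. This is a sketch under that description only. The actual pattern used in scholia.arxiv may differ, and the sketch_ names are hypothetical.

```python
import re

# A series of digits, a period, a series of digits.
ARXIV_PATTERN = re.compile(r'\d+\.\d+')


def sketch_string_to_arxiv(string):
    """Return the first identifier-like match, or None (illustrative)."""
    match = ARXIV_PATTERN.search(string)
    return match.group() if match else None


def sketch_string_to_arxivs(string):
    """Return all identifier-like matches in order (illustrative)."""
    return ARXIV_PATTERN.findall(string)


text = "2210.03493 http://arxiv.org/abs/1103.2903"
print(sketch_string_to_arxiv(text))
print(sketch_string_to_arxivs(text))
```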

scholia.github module

github.

Usage:

scholia.github get-user <username>
scholia.github get-user-followers <username>
scholia.github get-user-number-of-followers <username>
scholia.github get-user-repos <username>

scholia.github.get(resource)

Query GitHub API for resource.

Parameters:: resource (str) – Resource, e.g., “/users/fnielsen” for the user ‘fnielsen’.
Returns:: data – Data from the GitHub API converted to a Python object from the JSON.
Return type:: dictionary or list

References

https://developer.github.com/v3/

scholia.github.get_user(username)

Get user information from GitHub.

Parameters:: username (str) – GitHub username as a string.
Returns:: data – User information as a dictionary.
Return type:: dict

Examples

>>> data = get_user('fnielsen')
>>> data.get('name', '').startswith('Finn') or 'name' not in data
True

scholia.github.get_user_followers(username)

Get user followers from GitHub.

Parameters:: username (str) – GitHub username as a string.
Returns:: data – List of users.
Return type:: list of dict

scholia.github.get_user_repos(username)

Get repos for a user from GitHub.

Parameters:: username (str) – GitHub username as a string.
Returns:: data – List of repos.
Return type:: list of dict

scholia.github.main(): Handle command-line interface.

scholia.googlescholar module

scholia.googlescholar.

Usage:: scholia.googlescholar get-user-data <user>
Options:: -h --help Documentation

Example

python -m scholia.googlescholar get-user-data gQVuJh8AAAAJ

scholia.googlescholar.get_user_data(user)

Return user data scraped from Google Scholar page.

Query Google Scholar with a specific Google Scholar user identifier and get back citation statistics and metadata about the first works.

Parameters:: user (str) – Google Scholar user identifier.
Returns:: data – User data.
Return type:: dict

Notes

Journal and proceedings titles may not be written out completely in Google Scholar, so they may not be returned completely.

The author list may also be abbreviated, with missing authors indicated by ‘…’. Year and citation information might also be missing for some of the works.

Only the first 20 works in the list are returned, corresponding to the first page. This function will not page through the results.

Examples

>>> data = get_user_data('9cagBQYAAAAJ')
>>> data['citations'] > 6000  # F.A. Nielsen's citations are above 6.000
True

scholia.googlescholar.main(): Handle command-line interface.

scholia.model module

model.

class scholia.model.Work(work=None)

Bases: dict

Encapsulation of a work.

to_quickstatements()

Convert work to quickstatements.

Returns:: qs – Quickstatement-formatted work as a string.
Return type:: str

Examples

>>> work = Work(
...     {'authors': ['Niels Bohr'],
...      'title': 'On the Constitution of Atoms and Molecules'})
>>> qs = work.to_quickstatements()
>>> qs.find('CREATE') != -1
True

scholia.model.escape_string(string)

Escape string.

Parameters:: string (str) – String to be escaped
Returns:: escaped_string – Escaped string
Return type:: str

Examples

>>> string = 'String with " in it'
>>> escape_string(string)
'String with \\" in it'

scholia.model.main(): Handle command-line interface.

scholia.network module

network.

Usage:: scholia.network write-example-pajek-file

scholia.network.main(): Handle command-line interface.

scholia.network.write_pajek_from_sparql(filename, sparql): Write Pajek network file from SPARQL query.

scholia.qs module

Quickstatements.

scholia.qs.escape_quote(string)

Escape quotation mark.

Escape the quotation mark in a string.

Parameters:: string (str) – String to be escaped.
Returns:: escaped_string – Escaped string.
Return type:: str

scholia.qs.format_date_for_description(date_str)

Format date string for description.

Parameters:: date_str (str) – Date as DD-MM-YYYY.
Returns:: formatted_string – String formatted for description
Return type:: str

scholia.qs.normalize_string(string)

Normalize string for Quickstatements.

Strip initial and trailing spaces and convert multiple whitespaces to a single whitespace.

Parameters:: string (str) – String to be normalized.
Returns:: normalized_string – Normalized string.
Return type:: str

Examples

>>> normalize_string(' Finn  Nielsen ')
'Finn Nielsen'
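
The described behavior (strip the ends, collapse runs of whitespace) can be reproduced with a split-and-join idiom. This is one possible sketch, not necessarily the approach taken in scholia.qs, and the name sketch_normalize_string is hypothetical.

```python
def sketch_normalize_string(string):
    """Strip and collapse whitespace (illustrative)."""
    # str.split() without arguments splits on any whitespace run and
    # drops leading and trailing whitespace.
    return ' '.join(string.split())


print(repr(sketch_normalize_string(' Finn  Nielsen ')))
```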

scholia.qs.paper_to_quickstatements(paper)

Convert paper to Quickstatements.

Convert a paper represented as a dict into Magnus Manske’s Quickstatement format for entry into Wikidata.

Parameters:: paper (dict) – Scraped paper represented as a dict.
Returns:: qs – Quickstatements as a string
Return type:: str

References

https://quickstatements.toolforge.org

Notes

title, authors (list), date, doi, year, language_q, volume, issue, pages, number_of_pages, url, full_text_url, published_in_q, openreview_id are recognized.

date takes precedence over year.

The label is shortened to 250 characters if the title is longer than that, due to a limitation in Wikidata.

Letters in DOI are uppercased in accordance with Wikidata convention.
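
The three rules in these notes (date before year, label capped at 250 characters, DOI uppercased) can be sketched as a small preprocessing step. The function and the output keys 'label' and 'publication_date' are hypothetical and chosen for illustration. The real function emits Quickstatements rather than a dict.

```python
def sketch_prepare_paper(paper, max_label_length=250):
    """Apply the documented field rules to a paper dict (illustrative)."""
    prepared = {}
    title = paper.get('title', '')
    prepared['title'] = title
    # Labels are capped; the full title is kept separately.
    prepared['label'] = title[:max_label_length]
    if 'doi' in paper:
        prepared['doi'] = paper['doi'].upper()
    # 'date' takes precedence over 'year'.
    if 'date' in paper:
        prepared['publication_date'] = paper['date']
    elif 'year' in paper:
        prepared['publication_date'] = str(paper['year'])
    return prepared


paper = {
    'title': 'T' * 300,
    'doi': '10.1016/j.stem.2016.02.016',
    'date': '2016-03-03',  # made-up date for the demonstration
    'year': 2016,
}
prepared = sketch_prepare_paper(paper)
print(len(prepared['label']), prepared['doi'], prepared['publication_date'])
```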

scholia.qs.proceedings_to_quickstatements(proceedings)

Convert proceedings to Quickstatements.

Convert proceedings represented as a dict into Magnus Manske’s Quickstatement format for entry into Wikidata.

Parameters:: proceedings (dict) – Scraped proceedings represented as a dict.
Returns:: qs – Quickstatements as a string
Return type:: str

References

https://quickstatements.toolforge.org

Notes

title, authors (list), date, doi, year, language_q, volume, issue, pages, number_of_pages, url, full_text_url, published_in_q are recognized.

date takes precedence over year.

The label is shortened to 250 characters if the title is longer than that, due to a limitation in Wikidata.

scholia.query module

query.

Usage:

scholia.query arxiv-to-q <arxiv>
scholia.query biorxiv-to-q <biorxiv>
scholia.query chemrxiv-to-q <chemrxiv>
scholia.query cas-to-q <cas>
scholia.query atomic-symbol-to-q <symbol>
scholia.query cordis-to-q <cordis>
scholia.query count-authorships
scholia.query count-scientific-articles
scholia.query doi-to-q <doi>
scholia.query github-to-q <github>
scholia.query inchikey-to-q <inchikey>
scholia.query issn-to-q <issn>
scholia.query lipidmaps-to-q <lmid>
scholia.query atomic-number-to-q <atomicnumber>
scholia.query mesh-to-q <meshid>
scholia.query ncbi-gene-to-q <gene>
scholia.query ncbi-taxon-to-q <taxon>
scholia.query omim-to-q <omimID>
scholia.query orcid-to-q <orcid>
scholia.query pubchem-to-q <cid>
scholia.query pubmed-to-q <pmid>
scholia.query q-to-label <q>
scholia.query q-to-class <q>
scholia.query random-author
scholia.query random-podcast
scholia.query random-work
scholia.query ror-to-q <rorid>
scholia.query twitter-to-q <twitter>
scholia.query uniprot-to-q <protein>
scholia.query viaf-to-q <viaf>
scholia.query website-to-q <url>
scholia.query wikipathways-to-q <wpid>

Examples

$ python -m scholia.query orcid-to-q 0000-0001-6128-3356
Q20980928

$ python -m scholia.query github-to-q vrandezo
Q18618629

$ python -m scholia.query doi-to-q 10.475/123_4
Q41533080

$ python -m scholia.query q-to-label Q80
Tim Berners-Lee

exception scholia.query.QueryResultError

Bases: Exception

Generic query error.

scholia.query.arxiv_to_qs(arxiv)

Convert arxiv ID to Wikidata ID.

Parameters:: arxiv (str) – ArXiv identifier.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> arxiv_to_qs('1507.04180') == ['Q27036443']
True

scholia.query.atomic_number_to_qs(atomic_number)

Look up a chemical element by atomic number and return a Wikidata ID.

Parameters:: atomic_number (str) – Atomic number.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> atomic_number_to_qs('6') == ['Q623']
True

scholia.query.atomic_symbol_to_qs(symbol)

Look up a chemical element by atomic symbol and return a Wikidata ID.

Parameters:: symbol (str) – Atomic symbol.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> atomic_symbol_to_qs('C') == ['Q623']
True

scholia.query.biorxiv_to_qs(biorxiv_id)

Convert bioRxiv ID to Wikidata ID.

Parameters:: biorxiv_id (str) – bioRxiv identifier.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> biorxiv_to_qs('2020.08.20.259226') == ['Q104920313']
True

scholia.query.cas_to_qs(cas)

Convert a CAS registry number to Wikidata ID.

Parameters:: cas (str) – CAS registry number
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> cas_to_qs('50-00-0') == ['Q161210']
True

scholia.query.chemrxiv_to_qs(chemrxiv_id)

Convert ChemRxiv ID to Wikidata ID.

Parameters:: chemrxiv_id (str) – ChemRxiv identifier.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> chemrxiv_to_qs('12791954') == ['Q98577324']
True

scholia.query.cordis_to_qs(cordis)

Convert CORDIS project ID to Wikidata ID.

Parameters:: cordis (str) – CORDIS identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> cordis_to_qs('604134') == ['Q27990087']
True

scholia.query.count_authorships()

Count the number of authorships.

Query the Wikidata Query Service to determine the number of authorships as the number of P50 relationships.

Returns:: count – Number of authorships.
Return type:: int

Notes

The count is determined from the SPARQL query

SELECT (COUNT(*) AS ?count) { [] wdt:P50 [] }

Examples

>>> count_authorships() > 1000000  # More than a million authorships
True
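
The Wikidata Query Service can return results in the SPARQL JSON results format, in which each row is a binding of variable names to typed values. The sketch below parses a count out of such a response. The response text is canned and its number is made up for the demonstration, and sketch_parse_count is a hypothetical helper.

```python
import json


def sketch_parse_count(response_text):
    """Read ?count from a SPARQL JSON results document (illustrative)."""
    data = json.loads(response_text)
    bindings = data['results']['bindings']
    # Values arrive as strings, also for numeric literals.
    return int(bindings[0]['count']['value'])


# Canned response in the SPARQL JSON results layout; the value is invented.
example_response = json.dumps({
    'head': {'vars': ['count']},
    'results': {'bindings': [
        {'count': {
            'type': 'literal',
            'datatype': 'http://www.w3.org/2001/XMLSchema#integer',
            'value': '12345'}}
    ]},
})
print(sketch_parse_count(example_response))
```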

scholia.query.count_scientific_articles()

Return count for the number of scientific articles.

Returns:: count – Number of scientific articles in Wikidata.
Return type:: int

scholia.query.doi_prefix_to_qs(doi)

Convert DOI prefix to Wikidata ID.

Wikidata Query Service is used to resolve the DOI.

The DOI string is converted to uppercase before any query is made. Uppercase DOIs are default in Wikidata.

Parameters:: doi (str) – DOI prefix identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> doi_prefix_to_qs('10.1186') == ['Q463494']
True

>>> doi_prefix_to_qs('10.1016') == ['Q746413']
True

scholia.query.doi_to_qs(doi)

Convert DOI to Wikidata ID.

Wikidata Query Service is used to resolve the DOI.

The DOI string is converted to uppercase before any query is made. Uppercase DOIs are default in Wikidata.

Parameters:: doi (str) – DOI identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> doi_to_qs('10.1186/S13321-016-0161-3') == ['Q26899110']
True

>>> doi_to_qs('10.1016/j.stem.2016.02.016') == ['Q23008981']
True

scholia.query.escape_string(string)

Escape string to be used in SPARQL query.

Parameters:: string (str) – String to be escaped.
Returns:: escaped_string – Escaped string.
Return type:: str

Examples

>>> escape_string('"hello"')
'\\"hello\\"'

>>> escape_string(r'\"hello"')
'\\\\\\"hello\\"'
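
The two doctest results above are consistent with escaping backslashes first and quotation marks second. A minimal sketch of that order of operations follows. It reproduces the documented outputs but is not claimed to be the library's implementation, and the name sketch_escape_string is hypothetical.

```python
def sketch_escape_string(string):
    """Escape backslashes and double quotes for a SPARQL literal."""
    # Backslashes go first, so the backslashes added for the quotation
    # marks are not themselves doubled.
    return string.replace('\\', '\\\\').replace('"', '\\"')


print(sketch_escape_string('"hello"'))
```

Reversing the two replacements would double the backslashes inserted in front of the quotation marks and give a different result.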

scholia.query.github_to_qs(github)

Convert GitHub account name to Wikidata ID.

Parameters:: github (str) – github account identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> github_to_qs('vrandezo') == ['Q18618629']
True

scholia.query.identifier_to_qs(property, identifier)

Convert identifier to Wikidata identifiers.

Convert a specific identifier to a Wikidata identifier given the identifier type.

Parameters:

property (str) – String with Wikidata property identifier for an identifier.
identifier (str) – String with specific identifier.

Returns:

qs – List of zero or more strings with Wikidata IDs matching the identifier.

Return type:

list of str

Notes

The Wikidata Query Service is queried to resolve the given identifier. If an error happens an empty list is returned.

Examples

>>> property = "P10283"  # Property identifier for OpenAlex ID
>>> identifier = "a5060194743"  # Corresponding to Q20895241 (E Willighagen)
>>> qs = identifier_to_qs(property, identifier)
>>> qs == ['Q20895241']
True
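
A lookup of this kind amounts to a one-triple SPARQL query followed by stripping the entity URI prefix from each result. The sketch below shows both steps without contacting the query service. The query shape, the helper names and the canned bindings are assumptions made for illustration.

```python
ENTITY_PREFIX = 'http://www.wikidata.org/entity/'


def sketch_build_query(property_id, identifier):
    """Build a SPARQL query matching items by identifier (illustrative)."""
    escaped = identifier.replace('\\', '\\\\').replace('"', '\\"')
    return 'SELECT ?item {{ ?item wdt:{} "{}" }}'.format(property_id, escaped)


def sketch_bindings_to_qs(bindings):
    """Turn result bindings into a list of Q identifiers (illustrative)."""
    return [binding['item']['value'][len(ENTITY_PREFIX):]
            for binding in bindings
            if binding['item']['value'].startswith(ENTITY_PREFIX)]


query = sketch_build_query('P10283', 'a5060194743')
# Canned bindings in the SPARQL JSON results layout.
bindings = [{'item': {'type': 'uri',
                      'value': ENTITY_PREFIX + 'Q20895241'}}]
print(query)
print(sketch_bindings_to_qs(bindings))
```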

scholia.query.inchikey_to_qs(inchikey)

Convert InChIKey to Wikidata ID.

Parameters:: inchikey (str) – inchikey identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> inchikey_to_qs('UHOVQNZJYSORNB-UHFFFAOYSA-N') == ['Q2270']
True

scholia.query.iso639_to_q(language)

Convert ISO639 to Q item.

Parameters:: language (str) – Language represented in ISO 639 format.
Returns:: q – Language represented as a q identifier.
Return type:: str or None

Examples

>>> iso639_to_q('en') == 'Q1860'
True

>>> iso639_to_q('dan') == 'Q9035'
True

scholia.query.issn_to_qs(issn)

Convert ISSN to Wikidata ID.

Parameters:: issn (str) – ISSN identifier as a string.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> issn_to_qs('1533-7928') == ['Q1660383']
True

scholia.query.lipidmaps_to_qs(lmid)

Convert a LIPID MAPS identifier to Wikidata ID.

Parameters:: lmid (str) – LIPID MAPS identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> lipidmaps_to_qs('LMFA') == ['Q63433687']
True
>>> lipidmaps_to_qs('LMFA00000007') == ['Q27114894']
True

scholia.query.main(): Handle command-line interface.

scholia.query.mesh_to_qs(meshid)

Convert MeSH ID to Wikidata ID.

Parameters:: meshid (str) – MeSH identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> mesh_to_qs('D028441') == ['Q33659470']
True

scholia.query.ncbi_gene_to_qs(gene)

Convert a NCBI gene identifier to Wikidata ID.

Wikidata Query Service is used to resolve the NCBI gene identifier.

The NCBI gene identifier string is converted to uppercase before any query is made.

Parameters:: gene (str) – NCBI gene identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> ncbi_taxon_to_qs('694009') == ['Q278567']
True

scholia.query.ncbi_taxon_to_qs(taxon)

Convert a NCBI taxon identifier to Wikidata ID.

Wikidata Query Service is used to resolve the NCBI taxon identifier.

The NCBI taxon identifier string is converted to uppercase before any query is made.

Parameters:: taxon (str) – NCBI taxon identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> ncbi_taxon_to_qs('694009') == ['Q278567']
True

scholia.query.omim_to_qs(omimID)

Convert OMIM identifier to Wikidata ID.

Parameters:: omimID (str) – OMIM identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> 'Q41112' in omim_to_qs('181500')
True

scholia.query.openalex_to_qs(openalex)

Convert OpenAlex ID to Wikidata identifiers.

Given an identifier from the OpenAlex return zero or more corresponding Wikidata identifiers.

Parameters:: openalex (str) – OpenAlex identifier.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> openalex_to_qs('a5060194743') == ['Q20895241']
True

scholia.query.orcid_to_qs(orcid)

Convert orcid to Wikidata ID.

Parameters:: orcid (str) – ORCID identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> orcid_to_qs('0000-0001-6128-3356') == ['Q20980928']
True

scholia.query.pubchem_to_qs(cid)

Convert a PubChem compound identifier (CID) to Wikidata ID.

Wikidata Query Service is used to resolve the PubChem identifier.

Parameters:: cid (str) – PubChem compound identifier (CID)
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> pubchem_to_qs('14123361') == ['Q289372']
True

scholia.query.pubmed_to_qs(pmid)

Convert a PubMed identifier to Wikidata ID.

Wikidata Query Service is used to resolve the PubMed identifier.

The PubMed identifier string is converted to uppercase before any query is made.

Parameters:: pmid (str) – PubMed identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> pubmed_to_qs('29029422') == ['Q42371516']
True

scholia.query.q_to_class(q)

Return Scholia class of Wikidata item.

Determine the ‘class’ of the item, i.e., which kind of instance it is, by querying the Wikidata Query Service.

Parameters:: q (str) – Wikidata item identifier.
Returns:: class_ – Scholia class represented as a string.
Return type:: ‘author’, ‘venue’, ‘organization’, …

Notes

The Wikidata Query Service will be queried for P31 value. The value is compared against a set of hardcoded matches.
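
Comparing P31 values against hardcoded matches can be pictured as a dictionary lookup. The table below is a hypothetical, much reduced stand-in for whatever scholia.query actually hardcodes. The three Q identifiers (human, scientific journal, university) and the 'topic' default are assumptions of this sketch.

```python
# Hypothetical match table: P31 value -> Scholia class.
CLASS_MATCHES = {
    'Q5': 'author',           # human
    'Q5633421': 'venue',      # scientific journal
    'Q3918': 'organization',  # university
}


def sketch_classes_to_scholia_class(class_qs, default='topic'):
    """Map 'instance of' values to a class name (illustrative)."""
    for q in class_qs:
        if q in CLASS_MATCHES:
            return CLASS_MATCHES[q]
    # Assumed default when nothing matches.
    return default


print(sketch_classes_to_scholia_class(['Q5']))
```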

scholia.query.q_to_dois(q)

Get DOIs for a Q item.

Query the Wikidata Query Service to get zero or more DOIs for a particular Q item identified by the Q identifier.

Parameters:: q (str) – String with Wikidata Q identifier.
Returns:: dois – List with zero or more strings each containing a DOI.
Return type:: list of str

Examples

>>> dois = q_to_dois("Q87191917")
>>> dois == ['10.1016/S0140-6736(20)30211-7']
True

scholia.query.q_to_label(q, language='en')

Get label for Q item.

Parameters:

q (str) – String with Wikidata Q item.
language (str) – String with language identifier

Returns:

label – String with label corresponding to Wikidata item.

Return type:

str

Examples

>>> q_to_label('Q80') == "Tim Berners-Lee"
True

scholia.query.query_to_bindings(query)

Return response bindings from SPARQL query.

Query the Wikidata Query Service with the given query and return the response data as bindings.

Parameters:: query (str) – SPARQL query as string
Returns:: bindings – Data as list of dicts.
Return type:: list

scholia.query.random_author()

Return random author.

Sample a scientific author randomly from Wikidata by a call to the Wikidata Query Service.

Returns:: q – Wikidata identifier.
Return type:: str

Notes

The author returned is not necessarily a scholarly author.

The algorithm uses a somewhat hopeful randomization and if no author is found it falls back on Q18618629.

Examples

>>> q = random_author()
>>> q.startswith('Q')
True

scholia.query.random_podcast()

Return random podcast.

Sample a podcast randomly from Wikidata by a call to the Wikidata Query Service.

Returns:: q – Wikidata identifier.
Return type:: str

Notes

The work returned is not necessarily a podcast.

The algorithm uses a somewhat hopeful randomization and if no work is found it falls back on Q21146099.

Examples

>>> q = random_podcast()
>>> q.startswith('Q')
True

scholia.query.random_work()

Return random work.

Sample a scientific work randomly from Wikidata by a call to the Wikidata Query Service.

Returns:: q – Wikidata identifier.
Return type:: str

Notes

The work returned is not necessarily a scholarly work.

The algorithm uses a somewhat hopeful randomization and if no work is found it falls back on Q21146099.

Examples

>>> q = random_work()
>>> q.startswith('Q')
True

scholia.query.ror_to_qs(rorid)

Convert a ROR identifier to Wikidata ID.

Wikidata Query Service is used to resolve the ROR identifier.

Parameters:: rorid (str) – ROR identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> ror_to_qs('038321296') == ['Q5566337']
True

scholia.query.search_article_titles(q, search_string=None)

Search articles with q item.

Parameters:

q (str) – String with Wikidata Q item.
search_string (str, optional) – String with query string. If it is not provided then the label of q items is used as the query string.

Returns:

results – List of dicts with query result.

Return type:

list of dict

Notes

This function uses the Egon Willighagen trick of iterating over batches of 500’000 articles and searching the (scientific) article titles for the query string via the CONTAINS SPARQL function. Case is ignored.
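
A batched title search of this kind might be generated as sketched below, where a subquery selects one slice of articles with LIMIT and OFFSET and the outer query filters titles with CONTAINS. The query shape, the use of P31/Q13442814 for scholarly articles and P1476 for the title, and lowercasing both sides to ignore case are assumptions of this sketch. The queries issued by scholia.query may be structured differently.

```python
BATCH_SIZE = 500000

QUERY_TEMPLATE = """
SELECT ?work ?title {{
  {{ SELECT ?work {{ ?work wdt:P31 wd:Q13442814 }}
     LIMIT {limit} OFFSET {offset} }}
  ?work wdt:P1476 ?title .
  FILTER(CONTAINS(LCASE(?title), "{needle}"))
}}
"""


def sketch_batch_queries(search_string, n_batches, batch_size=BATCH_SIZE):
    """Yield one SPARQL query per batch of articles (illustrative)."""
    # Lowercase the search string and escape it for a SPARQL literal.
    needle = search_string.lower()
    needle = needle.replace('\\', '\\\\').replace('"', '\\"')
    for batch in range(n_batches):
        yield QUERY_TEMPLATE.format(
            limit=batch_size, offset=batch * batch_size, needle=needle)


queries = list(sketch_batch_queries('Zika', 3))
print(queries[1])
```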

scholia.query.search_article_titles_to_quickstatements(q, search_string=None)

Search article titles and return quickstatements.

Parameters:

q (str) – String with Wikidata Q identifier.
search_string (str, optional) – Search string

Returns:

quickstatements – String with quickstatement formatted commands.

Return type:

str

scholia.query.twitter_to_qs(twitter)

Convert Twitter account name to Wikidata ID.

Parameters:: twitter (str) – Twitter account identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> twitter_to_qs('utafrith') == ['Q8219']
True

scholia.query.uniprot_to_qs(protein)

Convert a UniProt identifier to Wikidata ID.

Wikidata Query Service is used to resolve the UniProt identifier.

The UniProt identifier string is converted to uppercase before any query is made.

Parameters:: protein (str) – UniProt identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> uniprot_to_qs('P02649') == ['Q424728']
True

scholia.query.viaf_to_qs(viaf)

Convert VIAF identifier to Wikidata ID.

Parameters:: viaf (str) – VIAF identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> viaf_to_qs('59976288') == ['Q3259614']
True

scholia.query.website_to_qs(url)

Convert URL for website to Wikidata ID.

Parameters:: url (str) – URL for official website.
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> url = ("https://papers.nips.cc/paper/"
...        "6498-online-and-differentially-private-tensor-decomposition")
>>> qs = website_to_qs(url)
>>> qs == ['Q46994097']
True

scholia.query.wikipathways_to_qs(wpid)

Convert a WikiPathways identifier to Wikidata ID.

Wikidata Query Service is used to resolve the WikiPathways identifier.

Parameters:: wpid (str) – WikiPathways identifier
Returns:: qs – List of strings with Wikidata IDs.
Return type:: list of str

Examples

>>> wikipathways_to_qs('WP111') == ['Q28031254']
True

scholia.rss module

rss.

Usage:

scholia.rss author-latest-works <q>
scholia.rss venue-latest-works <q>
scholia.rss topic-latest-works <q>
scholia.rss organization-latest-works <q>
scholia.rss sponsor-latest-works <q>

Description:: Functions related to feed.

Examples

$ python -m scholia.rss author-latest-works Q27061849 …

$ python -m scholia.rss venue-latest-works Q5936947 …

$ python -m scholia.rss topic-latest-works Q130983 …

$ python -m scholia.rss organization-latest-works Q1137652 …

$ python -m scholia.rss sponsor-latest-works Q1377836

References

https://validator.w3.org/feed/docs/rss2.html

scholia.rss.entities_to_works_rss(entities)

Convert Wikidata entities to works rss.

Parameters:: entities (list) – List of Wikidata items in nested structure.
Returns:: rss – RSS-formatted list of work items.
Return type:: str

Notes

Wikidata entities without a publication date are skipped.
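
The skipping rule can be illustrated with a simplified converter that emits RSS 2.0 item elements. The flat work dicts with 'title', 'link' and 'date' keys are hypothetical stand-ins for the nested Wikidata structures the real function receives, and the titles and links are invented.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def sketch_works_to_rss_items(works):
    """Render works as RSS <item> elements, skipping undated ones."""
    items = []
    for work in works:
        if not work.get('date'):
            continue  # no publication date: skip, as noted above
        published = datetime.strptime(work['date'], '%Y-%m-%d')
        published = published.replace(tzinfo=timezone.utc)
        items.append(
            '<item><title>{}</title><link>{}</link>'
            '<pubDate>{}</pubDate></item>'.format(
                escape(work['title']), escape(work['link']),
                format_datetime(published)))  # RFC 2822 date for RSS
    return '\n'.join(items)


works = [
    {'title': 'Dated & escaped', 'link': 'https://example.org/1',
     'date': '2005-12-15'},
    {'title': 'Undated', 'link': 'https://example.org/2', 'date': None},
]
rss_items = sketch_works_to_rss_items(works)
print(rss_items)
```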

scholia.rss.main(): Handle command-line arguments.

scholia.rss.wb_get_author_latest_works(q)

Return RSS-formatted list of latest work for author.

Query the Wikidata Query Service for the latest work from the author specified with the Wikidata identifier q. Return the list formatted as an RSS feed.

Parameters:: q (str) – Wikidata identifier.
Returns:: rss – Feed in XML.
Return type:: str

Notes

The Wikidata Query Service may have problems for dates before 0. The SPARQL will fail in such instances [1]. This function will then return an empty list.

References

scholia.rss.wb_get_organization_latest_works(q)

Return feed for latest work from an organization.

Parameters:: q (str) – Wikidata identifier
Returns:: rss – RSS-formatted feed with latest work from an organization.
Return type:: str

scholia.rss.wb_get_sponsor_latest_works(q)

Return feed for latest work from a sponsor.

Parameters:: q (str) – Wikidata identifier
Returns:: rss – RSS-formatted feed with latest work from a sponsor.
Return type:: str

scholia.rss.wb_get_topic_latest_works(q)

Return feed for latest work on topic.

Parameters:: q (str) – Wikidata identifier
Returns:: rss – RSS-formatted feed with latest work on topic.
Return type:: str

scholia.rss.wb_get_venue_latest_works(q)

Return feed for latest work from venue.

Parameters:: q (str) – Wikidata identifier
Returns:: rss – RSS-formatted feed with latest work from venue.
Return type:: str

scholia.scrape module

Scrape websites.

scholia.scrape.nips module

Scraper for NIPS.

Usage:

scholia.scrape.nips scrape-paper-from-url <url>
scholia.scrape.nips scrape-paper-urls-from-proceedings-url [options] <url>
scholia.scrape.nips scrape-proceedings-from-url <url>
scholia.scrape.nips paper-url-to-q <url>
scholia.scrape.nips paper-url-to-quickstatements <url>
scholia.scrape.nips paper-urls-to-quickstatements [options] <filename>

Options:

-o --output=file: Output filename, default output to stdout
--oe=encoding: Output encoding [default: utf-8]

Notes

NeurIPS/NIPS papers are available from https://papers.nips.cc. The format of the NIPS/NeurIPS proceeding homepage has changed, so the scraper may not always work.

Papers may be published the year after the conference. Newer conferences seem to publish in the same year, while older conferences published the year after, e.g., NIPS 2008 was published in 2009, while NIPS 2009 was published in the same year, i.e., 2009.

For scrape-paper-urls-from-proceedings-url the proceedings URL should be one listed at https://papers.nips.cc/. It will return a JSON with a list of URLs for the individual papers.

The generated quickstatements from paper-url-to-quickstatements can be submitted to https://quickstatements.toolforge.org/.

scholia.scrape.nips.main(): Handle command-line interface.

scholia.scrape.nips.paper_to_q(paper)

Find Q identifier for paper.

Parameters:: paper (dict) – Paper represented as dictionary.
Returns:: q – Q identifier in Wikidata. None is returned if the paper is not found.
Return type:: str or None

Notes

This function might be used to test if a scraped NIPS paper is already present in Wikidata.

The match on title is using an exact query, meaning that any variation in lowercase/uppercase will not find the Wikidata item.

Examples

>>> paper = {
...     'title': 'Hash Embeddings for Efficient Word Representations',
...     'url': ('https://papers.nips.cc/paper/7078-hash-embeddings-for-'
...             'efficient-word-representations'),
...     'full_text_url': ('https://papers.nips.cc/paper/7078-hash-'
...                       'embeddings-for-efficient-word-'
...                       'representations.pdf')}
>>> paper_to_q(paper)
'Q39502551'

scholia.scrape.nips.paper_url_to_q(url)

Return Q identifier based on URL.

Scrape the NIPS HTML page for the paper and use the extracted information in a query to the Wikidata Query Service to find the Wikidata Q identifier.

Parameters:: url (str) – URL to NIPS HTML page.
Returns:: q – Q identifier for Wikidata or None if not found.
Return type:: str or None

Examples

>>> url = ("https://papers.nips.cc/paper/2020/hash/"
...        "00482b9bed15a272730fcb590ffebddd-Abstract.html")
>>> paper_url_to_q(url)
'Q104089790'

scholia.scrape.nips.paper_url_to_quickstatements(url)

Return Quickstatements for paper URL.

For a given URL pointing to a NIPS paper, scrape the bibliographic information from the NIPS website and return the corresponding Quickstatements command for entry into Wikidata.

Parameters:: url (str) – URL to NIPS paper.
Returns:: qs – String with paper formatted as Quickstatements.
Return type:: str

Notes

The function tests whether the paper is already entered into Wikidata and, if so, returns a comment line with the corresponding Wikidata identifier.
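A minimal sketch of what such output could look like is given below. It is not the Scholia implementation: the function name, the choice of statements and the '#' comment prefix are assumptions. The sketch uses the QuickStatements version 1 tab-separated syntax with the Wikidata properties P31 (instance of), P1476 (title), P577 (publication date), P953 (full work available at URL), P2093 (author name string) and P1545 (series ordinal).

```python
def paper_to_quickstatements_sketch(paper, q=None):
    """Format a scraped paper dict as QuickStatements version 1 commands.

    ``q`` is the Wikidata identifier if the paper is already known. In that
    case only a comment line is returned (the '#' prefix is an assumption).
    """
    if q is not None:
        return f"# {q} already in Wikidata"
    # Replace double quotes so that the quoted title field stays intact.
    title = paper["title"].replace('"', "'")
    lines = [
        "CREATE",
        "LAST\tP31\tQ13442814",              # instance of scholarly article
        f'LAST\tP1476\ten:"{title}"',        # title as English monolingual text
        # Publication date with year precision (/9).
        f"LAST\tP577\t+{paper['year']}-01-01T00:00:00Z/9",
        f'LAST\tP953\t"{paper["full_text_url"]}"',
    ]
    for n, author in enumerate(paper["authors"], start=1):
        # Author name string with the series ordinal as qualifier.
        lines.append(f'LAST\tP2093\t"{author}"\tP1545\t"{n}"')
    return "\n".join(lines)
```

The paper dictionary is assumed to carry the title, year, full_text_url and authors fields listed for the scraped paper elsewhere in this module.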

scholia.scrape.nips.scrape_paper_from_old_url(url)

Scrape NIPS paper from URL.

Download legacy HTML page from https://papers.nips.cc/paper/, extract and return bibliographic metadata.

Parameters:: url (str) – URL to NIPS paper. Should start with https://papers.nips.cc/paper/. The URL may either be to the HTML page or the PDF.
Returns:: paper – Dictionary with paper.
Return type:: dict

Notes

The information is scraped from the individual HTML pages on the website https://papers.nips.cc as it was formatted before 2020. The new format means that the scraping no longer works.

The returned paper dict contains url, title, authors as list, full_text_url, abstract, year and published_in_q. The year is corrected from the nominal to the actual publication year, such that papers published before NIPS 2009 have the publication year set to the year after the conference.

If the abstract is not listed on the papers.nips.cc HTML page then the abstract field is not available in the returned paper variable. Some of the earliest conferences do not list the abstract.
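The year correction described in the notes above amounts to a small rule, sketched here with a hypothetical helper name. The cut-off at 2009 follows the module notes (NIPS 2008 published in 2009, NIPS 2009 published in 2009); the function itself is an illustration, not the Scholia code.

```python
def conference_to_publication_year(conference_year):
    """Return the publication year for a NIPS conference year.

    Proceedings of conferences before 2009 appeared the following year.
    From NIPS 2009 onwards they appear in the conference year.
    """
    if conference_year < 2009:
        return conference_year + 1
    return conference_year


print(conference_to_publication_year(2008))  # 2009
print(conference_to_publication_year(2009))  # 2009
```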

scholia.scrape.nips.scrape_paper_from_url(url)

Scrape NeurIPS paper from URL.

Download HTML page from https://proceedings.neurips.cc, extract and return bibliographic metadata.

Parameters:: url (str) – URL to NeurIPS paper. Should start with https://proceedings.neurips.cc/paper/. The URL should be to the HTML page.
Returns:: paper – Dictionary with paper.
Return type:: dict

Notes

The information is scraped from the individual HTML pages on the website https://proceedings.neurips.cc.

The returned paper dict contains url, title, authors as list, full_text_url, abstract, year and published_in_q. The year is corrected from the nominal to the actual publication year, such that papers published before NIPS 2009 have the publication year set to the year after the conference.

If the abstract is not listed on the papers.nips.cc HTML page then the abstract field is not available in the returned paper variable. Some of the earliest conferences do not list the abstract.

Examples

>>> url = ("https://proceedings.neurips.cc/paper/2020/hash/"
...    "00482b9bed15a272730fcb590ffebddd-Abstract.html")
>>> entry = scrape_paper_from_url(url)
>>> entry['title'].startswith("An Unsupervised Information-Theoretic")
True

scholia.scrape.nips.scrape_paper_urls_from_proceedings_url(url)

Return paper URLs for a proceedings.

Parameters:: url (str) – HTTPS URL for NIPS proceedings
Returns:: urls – Scraped URLs for papers in proceedings.
Return type:: list of str

scholia.scrape.nips.scrape_proceedings_from_url(url)

Scrape all papers from proceedings.

Parameters:: url (str) – HTTPS URL for NIPS proceedings
Returns:: entries – Scraped papers in list of dictionaries
Return type:: list of dict

scholia.scrape.ojs module

Scraping Open Journal Systems.

Usage:

scholia.scrape.ojs scrape-paper-from-url <url>
scholia.scrape.ojs issue-url-to-quickstatements [options] <url>
scholia.scrape.ojs paper-url-to-q <url>
scholia.scrape.ojs paper-url-to-quickstatements [options] <url>

Options:

--iso639=iso639: Overwrite default iso639
-o --output=file: Output filename, default output to stdout
--oe=encoding: Output encoding [default: utf-8]

Examples

$ python -m scholia.scrape.ojs paper-url-to-quickstatements https://journals.uio.no/index.php/osla/article/view/5855

scholia.scrape.ojs.issue_url_to_paper_urls(url)

Scrape paper URLs from issue URL.

Scrape paper (article) URLs from a given Open Journal System issue URL.

Parameters:: url (str) – URL to an OJS issue.
Returns:: urls – List of URLs to papers.
Return type:: list of str

Notes

Based on the URL, the HTML issue webpage will be fetched and the returned HTML parsed. Different matching approaches are tried to extract the article URLs.

scholia.scrape.ojs.issue_url_to_quickstatements(url, iso639=None)

Return Quickstatements for papers in an issue.

From an Open Journal Systems issue URL, extract metadata for individual papers and format them in the Quickstatements format for entry in Wikidata.

Parameters:

url (str) – URL for an OJS issue.
iso639 (str, optional) – String with ISO639 code. Default is None, meaning the iso639 will be read from the metadata.

Returns:

qs – String with quickstatements.

Return type:

str

scholia.scrape.ojs.main(): Handle command-line interface.

scholia.scrape.ojs.paper_to_q(paper)

Find Q identifier for paper.

Parameters:: paper (dict) – Paper represented as dictionary.
Returns:: q – Q identifier in Wikidata. None is returned if the paper is not found.
Return type:: str or None

Notes

This function might be used to test if a scraped OJS paper is already present in Wikidata.

The match on title uses an exact query, meaning that any variation in lowercase/uppercase will not find the Wikidata item. If the title is shorter than 21 characters then only the URL is used to match.

Examples

>>> paper = {
...     'title': ('Linguistic Deviations in the Written Academic Register '
...               'of Danish University Students'),
...     'url': 'https://journals.uio.no/index.php/osla/article/view/5855'}
>>> paper_to_q(paper)
'Q61708017'

scholia.scrape.ojs.paper_url_to_q(url)

Return Q identifier based on URL.

Scrape OJS HTML page with paper and use the extracted information on a query on Wikidata Query Service to find the Wikidata Q identifier.

Parameters:: url (str) – URL to Open Journal System article webpage.
Returns:: q – Q identifier for Wikidata or None if not found.
Return type:: str or None

Examples

>>> url = 'https://journals.uio.no/index.php/osla/article/view/5855'
>>> paper_url_to_q(url)
'Q61708017'

scholia.scrape.ojs.paper_url_to_quickstatements(url, iso639=None)

Scrape OJS paper and return quickstatements.

Given a URL to an HTML web page representing a paper formatted by Open Journal Systems, return quickstatements for data entry in Wikidata with the Magnus Manske Quickstatements tool.

Parameters:

url (str) – URL to OJS paper as a string.
iso639 (str, optional) – String with ISO639 language code. Default is None, meaning the iso639 will be read from the metadata.

Returns:

qs – Quickstatements for paper as a string.

Return type:

str

Notes

If the paper is already entered in Wikidata then only a comment is produced and no quickstatements.

The quickstatement tool is available at https://quickstatements.toolforge.org.

scholia.scrape.ojs.scrape_paper_from_url(url)

Scrape OJS paper from URL.

Parameters:: url (str) – URL to paper as a string
Returns:: paper – Paper represented as a dictionary.
Return type:: dict

Example

>>> url = 'https://tidsskrift.dk/carlnielsenstudies/article/view/27763'
>>> paper = scrape_paper_from_url(url)
>>> paper['authors'] == ['John Fellow']
True

scholia.tex module

tex.

Usage:

scholia.tex extract-qs-from-aux <file>
scholia.tex write-bbl-from-aux <file>
scholia.tex write-bib-from-aux <file>

Description:

Work with latex and bibtex.

The functionality is not complete.

Example latex document:

\documentclass{article}
\pdfoutput=1
\usepackage[utf8]{inputenc}

\begin{document}
Scientific citations \cite{Q26857876,Q21172284}.
Semantic relatedness \cite{Q26973018}.
\bibliographystyle{unsrt}
\bibliography{}
\end{document}

scholia.tex.authors_to_bibtex_authors(authors)

Convert a Wikidata entity to an author in BibTeX.

Parameters:: authors (dict) – Wikidata entity as hierarchical structure.
Returns:: entry – Bibtex entry in Unicode string.
Return type:: str

scholia.tex.entity_to_bibtex_entry(entity, key=None)

Convert Wikidata entity to bibtex-formatted entry.

Parameters:

entity (dict) – Wikidata entity as hierarchical structure.
key (str) – Bibtex key.

Returns:

entry – Bibtex entry in Unicode string.

Return type:

str

scholia.tex.escape_to_tex(string, escape_type='normal')

Escape a text to a tex/latex safe text.

Parameters:

string (str or None) – Unicode string to be escaped.
escape_type (normal or url, default normal) – Type of escaping.

Returns:

escaped_string – Escaped unicode string. If the input is None then an empty string is returned.

Return type:

str

Examples

>>> escape_to_tex("^^") == r'\^{}\^{}'
True

>>> escaped = escape_to_tex('10.1007/978-3-319-18111-0_26', 'url')
>>> escaped == '10.1007/978-3-319-18111-0\\_26'
True

scholia.tex.extract_dois_from_aux_string(string)

Extract DOIs from string.

Parameters:: string (str) – String with aux file content from which DOIs in citations are extracted.
Returns:: dois – List of strings.
Return type:: list of str

Examples

>>> string = "\\citation{10.1186/S13321-016-0161-3}"
>>> extract_dois_from_aux_string(string)
['10.1186/S13321-016-0161-3']

scholia.tex.extract_qs_from_aux_string(string)

Extract qs from string.

Parameters:: string (str) – String with aux file content from which Wikidata identifiers in citations are extracted.
Returns:: qs – List of strings.
Return type:: list of str

Examples

>>> string = "\\citation{Q28042913}"
>>> extract_qs_from_aux_string(string)
['Q28042913']

>>> string = "\\citation{Q28042913,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040']

>>> string = "\\citation{Q28042913,Q27615040,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040', 'Q27615040']

>>> string = "\\citation{Q28042913,NielsenF2002Neuroinformatics,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040']

>>> string = "\\citation{Q28042913,Q27615040.Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913']
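The behaviour shown in the examples above (duplicates kept, non-Q keys dropped, keys with trailing characters dropped) can be obtained with two regular expressions. This is a sketch under assumptions, with a hypothetical function name; the patterns used by Scholia itself may differ.

```python
import re

# Content between the braces of each \citation{...} command.
CITATION_PATTERN = re.compile(r"\\citation\{([^}]*)\}")
# A citation key that consists of nothing but a Wikidata Q identifier.
Q_PATTERN = re.compile(r"Q[1-9]\d*")


def extract_qs_sketch(string):
    """Return Q identifiers cited in aux file content, in order of use."""
    qs = []
    for keys in CITATION_PATTERN.findall(string):
        for key in keys.split(","):
            key = key.strip()
            # fullmatch rejects keys such as 'Q27615040.Q27615040'.
            if Q_PATTERN.fullmatch(key):
                qs.append(key)
    return qs


print(extract_qs_sketch("\\citation{Q28042913,NielsenF2002Neuroinformatics}"))
```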

scholia.tex.guess_bibtex_entry_type(entity)

Guess Bibtex entry type.

Parameters:: entity (dict) – Wikidata item.
Returns:: entry_type – Entry type as a string: ‘Article’, ‘InProceedings’, etc.
Return type:: str

scholia.tex.main(): Handle command-line arguments.

scholia.text module

scholia.text.

Usage:

scholia.text text-to-topic-q-text-setup
scholia.text text-to-topic-qs <text>
scholia.text text-to-topics-url <text>

Options:

-h --help: Help

Description:

Handle text.

The text-to-topic-qs command will set up a matching method that can convert a text to Wikidata Q identifiers associated with topics of scientific articles. The setup will call the Wikidata Query Service to build a regular expression for the matching.

The result of the text-to-topic-qs command-line command can be used to query Scholia:

https://scholia.toolforge.org/topics/<qs>

class scholia.text.TextToTopicQText

Bases: object

Converter of text to Wikidata Q identifier data.

mapper

Dictionary between labels and associated Wikidata Q identifiers.

Type:: dict

pattern

Regular expression pattern for matching Wikidata labels.

Type:: re.SRE_Pattern

get_mapper()

Return mapper between label and Wikidata item.

Query the Wikidata Query Service to get Wikidata identifiers and associated labels and convert them to a dictionary.

Returns:: mapper – Dictionary where the keys are labels associated with Wikidata Q identifiers.
Return type:: dict

Notes

This method queries the Wikidata Query Service with a static SPARQL query. It will take some time to complete, perhaps 30 seconds or more.

In some cases a timeout may occur in the middle of a response, making the returned JSON invalid. The method will then try a second time. If this also fails, the method will raise an exception.
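The retry behaviour in the note above can be sketched generically. The helper name fetch_json_with_retry and the fetch callable are hypothetical: fetch stands in for whatever performs the HTTP request and returns the response text, so the sketch runs without network access.

```python
import json


def fetch_json_with_retry(fetch, attempts=2):
    """Parse JSON from ``fetch()``, retrying when the response is invalid.

    ``fetch`` is any callable that returns the response body as a string.
    The last decoding error is re-raised when all attempts fail.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return json.loads(fetch())
        except json.JSONDecodeError as error:
            # A truncated response (e.g., a timeout mid-stream) ends up here.
            last_error = error
    raise last_error


# Simulated service: the first response is cut off, the second is complete.
responses = iter(['{"results": [', '{"results": []}'])
print(fetch_json_with_retry(lambda: next(responses)))
```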

load_mapper_from_json(filename=None)

Load map from JSON.

Parameters:: filename (str) – Filename for JSON file.

save_mapper_as_json(filename=None)

Save mapper as JSON file.

Parameters:: filename (str) – Filename for JSON file to be written.

save_object_as_pickle(filename=None): Save object.

text_to_topic_q_text(text)

Convert text to q-text.

Parameters:: text (str) – Text to be matched.
Returns:: q_text – Text with words and phrases substituted with Wikidata Q identifiers.
Return type:: str

text_to_topic_qs(text)

Return Wikidata Q identifiers from text matching.

Parameters:: text (str) – Text to be matched.
Returns:: qs – List with Wikidata Q identifiers as strings.
Return type:: list of str
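How a label-to-identifier mapper and a compiled pattern can work together is sketched below. The class LabelMatcherSketch is hypothetical and simplified: it assumes a non-empty mapper with one identifier per label and case-insensitive matching, and the Q identifiers in the usage lines are placeholders, not real label mappings.

```python
import re


class LabelMatcherSketch:
    """Match labels in text and map them to Q identifiers."""

    def __init__(self, mapper):
        # Lowercase the labels so that lookups are case-insensitive.
        self.mapper = {label.lower(): q for label, q in mapper.items()}
        # Longest labels first, so multi-word phrases win over their parts.
        labels = sorted(self.mapper, key=len, reverse=True)
        alternatives = "|".join(re.escape(label) for label in labels)
        self.pattern = re.compile(
            r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)

    def text_to_topic_qs(self, text):
        """Return the identifiers of all matched labels, in text order."""
        return [self.mapper[match.lower()]
                for match in self.pattern.findall(text)]

    def text_to_topic_q_text(self, text):
        """Return the text with matched labels replaced by identifiers."""
        return self.pattern.sub(
            lambda match: self.mapper[match.group(0).lower()], text)


# Placeholder identifiers for illustration only.
matcher = LabelMatcherSketch(
    {"machine learning": "Q111", "learning": "Q222", "brain": "Q333"})
print(matcher.text_to_topic_qs("Machine learning for brain imaging"))
```

Sorting the alternatives by length matters because regular expression alternation takes the first alternative that matches at a position, not the longest one.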

scholia.text.load_pickle_text_to_topic_q_text()

Load an object that is already set up.

Load the TextToTopicQText object from a pickle file and if it is not available set it up from the object.

Returns:: text_to_topic_q_text – Text-to-topic-q-text object that is set up and ready to use.
Return type:: TextToTopicQText

scholia.text.load_text_to_topic_q_text()

Set up an object.

Set up TextToTopicQText.

Returns:: text_to_topic_q_text – Text-to-topic-q-text object that is set up and ready to use.
Return type:: TextToTopicQText

scholia.text.main(): Handle command-line interface.

scholia.utils module

utils.

scholia.utils.escape_string(string)

Escape string.

Parameters:: string (str) – String to be escaped
Returns:: escaped_string – Escaped string
Return type:: str

Examples

>>> string = 'String with " in it'
>>> escape_string(string)
'String with \\" in it'
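A minimal sketch that reproduces the example above is shown here. The function name is hypothetical, and escaping backslashes in addition to double quotes is an assumption; the documented example only confirms the handling of the double quote.

```python
def escape_string_sketch(string):
    """Backslash-escape backslashes and double quotes in a string."""
    # Escape backslashes first so the ones added for quotes are not doubled.
    return string.replace("\\", "\\\\").replace('"', '\\"')


print(escape_string_sketch('String with " in it'))
```

Escaping of this kind is what makes a string safe to place inside a double-quoted literal, for instance in a SPARQL query.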

scholia.utils.pages_to_number_of_pages(pages)

Compute number of pages based on pages represented as string.

Parameters:: pages (str) – Pages represented as a string.
Returns:: number_of_pages – Number of pages returned as an integer. If the conversion is not possible then None is returned.
Return type:: int or None

Examples

>>> pages_to_number_of_pages('61-67')
7
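The computation in the example above is last page minus first page plus one. A sketch under assumptions follows: the function name is hypothetical, and accepting an en-dash as separator and rejecting reversed ranges are choices made for illustration, since the formats Scholia accepts are not listed here.

```python
import re


def pages_to_number_of_pages_sketch(pages):
    """Return the page count for a 'first-last' range, else None."""
    # Accept a hyphen or an en-dash between the two page numbers.
    match = re.fullmatch(r"\s*(\d+)\s*[-\u2013]\s*(\d+)\s*", pages)
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        return None
    return last - first + 1


print(pages_to_number_of_pages_sketch("61-67"))  # 7
```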

scholia.utils.remove_special_characters_url(url)

Remove url encoded characters and normalize non-ascii characters.

Parameters:: url (str) – URL-encoded string
Returns:: formatted_string – Normalized string without non-ascii characters or spaces
Return type:: str

scholia.utils.sanitize_q(q)

Sanitize Wikidata identifier.

Parameters:: q (str or int) – Wikidata identifier as string.
Returns:: sanitized_q – Sanitized Wikidata identifier, empty if not a Wikidata identifier.
Return type:: str

Examples

>>> sanitize_q(' Q5 ')
'Q5'
>>> sanitize_q('Q5"')
'Q5'
>>> sanitize_q('Wikidata')
''
>>> sanitize_q(5)
'Q5'
>>> sanitize_q('5')
'Q5'
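All five examples above are consistent with taking the first run of digits and prefixing it with Q. The sketch below implements that reading under a hypothetical name; it is an assumption about the rule, not the Scholia source.

```python
import re


def sanitize_q_sketch(q):
    """Return 'Q' plus the first run of digits in ``q``, or '' if none."""
    # str() lets integers such as 5 be handled like the string '5'.
    match = re.search(r"\d+", str(q))
    if match is None:
        return ""
    return "Q" + match.group(0)


print(sanitize_q_sketch(' Q5 '))  # Q5
```

Because everything except the digits is discarded, quotes and whitespace from user input cannot reach a later query.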

scholia.utils.string_to_list(string)

Convert comma/space/tab/pipe separated string to list.

Parameters:: string (str) – Query string.
Returns:: elements – List of strings split on the separators
Return type:: list of str

Examples

>>> string_to_list("1, 2 | 3\t4 |5")
['1', '2', '3', '4', '5']
>>> string_to_list(" 10.10,abc|123 ")
['10.10', 'abc', '123']
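Splitting on any run of the listed separators reproduces both examples above. The sketch uses a hypothetical function name and a single regular expression; the real implementation may be written differently.

```python
import re


def string_to_list_sketch(string):
    """Split on runs of whitespace, commas and pipes, dropping empties."""
    # Leading or trailing separators produce empty strings, filtered here.
    return [element for element in re.split(r"[\s,|]+", string) if element]


print(string_to_list_sketch("1, 2 | 3\t4 |5"))
```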

scholia.utils.string_to_type(string)

Guess type of string.

Parameters:: string (str) – Query string.
Returns:: result
Return type:: str

Examples

>>> string_to_type('1121-4545')
'issn'
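Type guessing of this kind is typically pattern based. The sketch below covers only the ISSN case from the example; the function name and the 'string' fallback label are assumptions, and the other types that Scholia recognizes are not listed here.

```python
import re

# Four digits, a hyphen, three digits and a final digit or X.
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dXx]")


def string_to_type_sketch(string):
    """Return 'issn' for ISSN-shaped strings, else a generic label."""
    if ISSN_PATTERN.fullmatch(string.strip()):
        return "issn"
    return "string"  # hypothetical fallback label


print(string_to_type_sketch('1121-4545'))  # issn
```

The pattern checks only the shape of the identifier and does not validate the ISSN check digit.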

scholia.wikipedia module

wikipedia.

Usage:

scholia.wikipedia q-to-bibliography-templates <q> [options]

Options:

--debug: Debug messages.
-h --help: Help message
--oe=encoding: Output encoding [default: utf-8]
-o --output=<file>: Output filename, default output to stdout
--verbose: Verbose messages.

Examples

$ python -m scholia.wikipedia q-to-bibliography-templates --debug Q20980928

scholia.wikipedia.main(): Handle command-line interface.

scholia.wikipedia.q_to_bibliography_templates(q)

Construct bibliography for Wikipedia based on Wikidata identifier.

Parameters:: q (str) – String with Wikidata item identifier.
Returns:: wikitext – String with bibliography formatted as Wikipedia templates.
Return type:: str

References

https://en.wikipedia.org/wiki/Template:Cite_journal

Examples

>>> wikitext = q_to_bibliography_templates("Q28923929")
>>> wikitext.find('Cite journal') != -1
True
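The formatting step can be illustrated with a sketch that renders one work as a Cite journal template. The function name and the flat input dictionary are hypothetical, and the values in the usage lines are placeholders; the parameter names (last1, first1, title, journal, date, volume, pages, doi) are those of the Wikipedia template.

```python
def work_to_cite_journal_sketch(work):
    """Format a flat work dict as a Wikipedia 'Cite journal' template.

    ``work['authors']`` is a list of (last, first) tuples. The other keys
    match template parameter names and are optional.
    """
    parts = ["Cite journal"]
    for n, (last, first) in enumerate(work.get("authors", []), start=1):
        parts.append(f"last{n}={last}")
        parts.append(f"first{n}={first}")
    for field in ["title", "journal", "date", "volume", "pages", "doi"]:
        if work.get(field):
            # Values containing '|' would need escaping. Not handled here.
            parts.append(f"{field}={work[field]}")
    return "{{" + " | ".join(parts) + "}}"


# Placeholder data for illustration only.
work = {"authors": [("Doe", "Jane")], "title": "Example title",
        "journal": "Example Journal", "date": "2017"}
print(work_to_cite_journal_sketch(work))
```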