Scholia packages
Submodules
scholia.api module
api.
- Usage:
scholia.api get <qs>...
scholia.api q-to-classes <q>
scholia.api q-to-name <q>
scholia.api search [options] <query>
- Options:
- --limit=<limit>
Number of search results to return [default: 10]
- Description:
Interface to the Wikidata API and its bibliographic data.
Examples
- $ python -m scholia.api get Q26857876 Q21172284 | wc
2 1289 16174
$ python -m scholia.api q-to-classes Q28133147 Q13442814
- scholia.api.entity_to_authors(entity, return_humanness=False)
Extract authors from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
return_humanness (bool) – Toggle whether the return value should contain a list of strings or a list of tuples with both the name and an indication of whether the author is a human. Some authors are organizations, and formatting of authors may need to distinguish between humans and organizations.
- Returns:
authors – List with each element representing an author. Each element may either be a string with the author name or a tuple with the author name and a boolean indicating humanness of the author.
- Return type:
list of str or list of two-tuple
- scholia.api.entity_to_classes(entity)
Extract ‘instance_of’ classes.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
classes – List of strings.
- Return type:
list of str
Examples
>>> entities = wb_get_entities(['Q28133147'])
>>> classes = entity_to_classes(list(entities.values())[0])
>>> 'Q13442814' in classes
True
- scholia.api.entity_to_doi(entity)
Extract DOI of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
doi – DOI as string. An empty string is returned if the field is not set.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q24239902'])
>>> doi = entity_to_doi(entities['Q24239902'])
>>> doi == '10.1038/438900A'
True
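The lookup can be pictured as a walk through the claims of the Wikidata JSON. The following is a minimal sketch, not Scholia's actual implementation: the helper name `first_string_claim` is hypothetical, the entity is a hand-written stand-in, and the sketch assumes the standard Wikibase JSON layout in which the DOI is stored under property P356.

```python
def first_string_claim(entity, prop):
    """Return the first string value for `prop`, or '' if it is not set.

    Hypothetical helper; assumes the standard Wikibase JSON layout.
    """
    for statement in entity.get('claims', {}).get(prop, []):
        datavalue = statement.get('mainsnak', {}).get('datavalue')
        if datavalue and isinstance(datavalue.get('value'), str):
            return datavalue['value']
    return ''


# Hand-written stand-in for one entity as returned by wb_get_entities
entity = {
    'claims': {
        'P356': [
            {'mainsnak': {'snaktype': 'value', 'property': 'P356',
                          'datavalue': {'type': 'string',
                                        'value': '10.1038/438900A'}}}
        ]
    }
}

doi = first_string_claim(entity, 'P356')      # DOI is set
missing = first_string_claim(entity, 'P304')  # pages are not set here
```

Returning an empty string for an unset field mirrors the documented behaviour of the `entity_to_*` extractors that return `str`.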
- scholia.api.entity_to_full_text_url(entity)
Extract full text URL of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
url – URL as string. An empty string is returned if the field is not set.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q28374293'])
>>> url = entity_to_full_text_url(entities['Q28374293'])
>>> url == ('http://papers.nips.cc/paper/'
...         '5872-efficient-and-robust-automated-machine-learning.pdf')
True
- scholia.api.entity_to_journal_title(entity)
Extract journal of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
journal – Journal as string. An empty string is returned if the field is not set.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q24239902'])
>>> journal = entity_to_journal_title(entities['Q24239902'])
>>> journal == 'Nature'
True
- scholia.api.entity_to_label(entity)
Extract label from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
label – String with label.
- Return type:
str
- scholia.api.entity_to_month(entity, language='en')
Extract month of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item.
language (str) – Language, if none, returns the month as a string with the month number.
- Returns:
month – Month as string. If the month is not specified, i.e., the precision is year, then None is returned.
- Return type:
str or None
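The precision check can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `time_value_to_month` is hypothetical, the input is the `value` part of a Wikidata time datavalue (where precision 9 means year, 10 month and 11 day), and only English month names are supported, with any other language falling back on the month number.

```python
MONTHS_EN = ['January', 'February', 'March', 'April', 'May', 'June',
             'July', 'August', 'September', 'October', 'November',
             'December']


def time_value_to_month(value, language='en'):
    """Return the month of a Wikidata time value, or None.

    Hypothetical helper. `value` is a dict with 'time' and 'precision'.
    """
    if value['precision'] < 10:
        # Year precision (9) or coarser: no month information
        return None
    # '+2005-12-08T00:00:00Z' -> ['2005', '12', '08T00:00:00Z']
    month = int(value['time'].lstrip('+-').split('-')[1])
    if language == 'en':
        return MONTHS_EN[month - 1]
    # Assumption: no language (or an unsupported one) gives the number
    return str(month)


day_precision = {'time': '+2005-12-08T00:00:00Z', 'precision': 11}
year_precision = {'time': '+2005-01-01T00:00:00Z', 'precision': 9}
```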
- scholia.api.entity_to_name(entity)
Extract the name of the item.
- Parameters:
entity (dict) – Dictionary with Wikidata item representing a person.
- Returns:
name – Name of person.
- Return type:
str or None
Examples
>>> entities = wb_get_entities(['Q8219'])
>>> name = entity_to_name(list(entities.values())[0])
>>> name == 'Uta Frith'
True
- scholia.api.entity_to_pages(entity)
Extract pages of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
pages – Pages as string. An empty string is returned if the field is not set.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q24239902'])
>>> pages = entity_to_pages(entities['Q24239902'])
>>> pages == '900-901'
True
- scholia.api.entity_to_smiles(entity)
Extract SMILES of a chemical.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
smiles – SMILES as string.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q48791494'])
>>> smiles = entity_to_smiles(entities['Q48791494'])
>>> smiles == 'CC(C)[C@H]1CC[C@@]2(CO2)[C@@H]3[C@@H]1C=C(COC3=O)C(=O)O'
True
- scholia.api.entity_to_title(entity)
Extract title from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item.
- Returns:
title – Title as string. If the title is not set then None is returned.
- Return type:
str or None
- scholia.api.entity_to_volume(entity)
Extract volume of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
volume – Volume as string. An empty string is returned if the field is not set.
- Return type:
str
Examples
>>> entities = wb_get_entities(['Q21172284'])
>>> volume = entity_to_volume(entities['Q21172284'])
>>> volume == '12'
True
- scholia.api.entity_to_year(entity)
Extract year of publication from entity.
- Parameters:
entity (dict) – Dictionary with Wikidata item
- Returns:
year – Year as string.
- Return type:
str or None
- scholia.api.is_human(entity)
Return true if entity is a human.
- Parameters:
entity (dict) – Structure with Wikidata entity.
- Returns:
result – Result of comparison.
- Return type:
bool
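One plausible form of the comparison is a check of the "instance of" (P31) claims against Q5, the Wikidata item for human. The sketch below is an assumption about the logic rather than the actual implementation; `instance_of_ids` and `looks_human` are hypothetical names and the entities are hand-written stand-ins.

```python
def instance_of_ids(entity):
    """Collect the item identifiers of all P31 ('instance of') claims."""
    ids = []
    for statement in entity.get('claims', {}).get('P31', []):
        datavalue = statement.get('mainsnak', {}).get('datavalue')
        if datavalue:
            # Item-valued claims carry the identifier under value['id']
            ids.append(datavalue['value']['id'])
    return ids


def looks_human(entity):
    """Return True if the entity is an instance of human (Q5)."""
    return 'Q5' in instance_of_ids(entity)


def make_entity(class_q):
    """Build a minimal stand-in entity with a single P31 claim."""
    return {'claims': {'P31': [
        {'mainsnak': {'datavalue': {'type': 'wikibase-entityid',
                                    'value': {'id': class_q}}}}]}}


person = make_entity('Q5')          # instance of human
article = make_entity('Q13442814')  # instance of scholarly article
```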
- scholia.api.main()
Handle command-line arguments.
- scholia.api.search(query, page, limit=10)
Search Wikidata.
- Parameters:
query (str) – Query string.
page (int) – Number of current page.
limit (int, optional) – Number of maximum search results to return.
- Returns:
result
- Return type:
dict
- scholia.api.select_value_by_language_preferences(choices, preferences=('en', 'de', 'fr'))
Select value based on language preference.
- Parameters:
choices (dict) – Dictionary with language as keys and strings as values.
preferences (list or tuple) – Language codes in order of preference.
- Returns:
value – Selected string. Returns an empty string if there are no choices.
- Return type:
str
Examples
>>> choices = {'da': 'Bog', 'en': 'Book', 'de': 'Buch'}
>>> select_value_by_language_preferences(choices)
'Book'
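The selection can be sketched as a walk through the preference list. This is a minimal illustration with a hypothetical function name; in particular, the fallback on an arbitrary available value when no preferred language matches is an assumption, since the documentation only specifies the result for empty input.

```python
def select_by_preference(choices, preferences=('en', 'de', 'fr')):
    """Pick the value of the first preferred language that is available."""
    if not choices:
        return ''
    for language in preferences:
        if language in choices:
            return choices[language]
    # Assumption: with no preferred language present, take any value
    return next(iter(choices.values()))


choices = {'da': 'Bog', 'en': 'Book', 'de': 'Buch'}
selected = select_by_preference(choices)
```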
- scholia.api.wb_get_entities(qs)
Get entities from Wikidata.
Query the Wikidata webservice via its API.
- Parameters:
qs (list of str) – List of strings, each with a Wikidata item identifier.
- Returns:
data – Dictionary of dictionaries.
- Return type:
dict of dict
scholia.arxiv module
arxiv.
- Usage:
scholia.arxiv get-metadata <arxiv>
scholia.arxiv get-quickstatements [options] <arxiv>
- Options:
-o --output=file Output filename, default output to stdout
- scholia.arxiv.get_metadata(arxiv)
Get metadata about an arxiv publication from website.
Scrapes the arXiv webpage corresponding to the paper with the arxiv identifier and returns the metadata for the paper in a dictionary.
- Parameters:
arxiv (str) – ArXiv identifier.
- Returns:
metadata – Dictionary with metadata. None is returned if the identifier is not found.
- Return type:
dict or None
Notes
This function queries arXiv. It must not be used to crawl arXiv. It does not look at robots.txt.
The language is set to English.
Examples
>>> metadata = get_metadata('1503.00759')
>>> metadata['doi'] == '10.1109/JPROC.2015.2483592'
True
>>> 'error' in get_metadata('5432.01234')
True
- scholia.arxiv.main()
Handle command-line interface.
- scholia.arxiv.string_to_arxiv(string)
Extract arxiv id from string.
The arXiv identifier part of string will be extracted, where the identifier pattern should be in the format of a series of digits followed by a period followed by a series of digits. Other formats will not be matched. If multiple identifier patterns are in the input string then only the first is returned.
- Parameters:
string (str) – String with arxiv ID.
- Returns:
arxiv – String with arxiv ID.
- Return type:
str or None
Examples
>>> string = "http://arxiv.org/abs/1103.2903"
>>> arxiv = string_to_arxiv(string)
>>> arxiv == '1103.2903'
True
- scholia.arxiv.string_to_arxivs(string)
Extract arxiv IDs from string.
All arXiv identifiers in the string will be extracted, where the identifier pattern should be in the format of a series of digits followed by a period followed by a series of digits. Other formats will not be matched. If multiple identifier patterns are in the input string then all of them are returned.
- Parameters:
string (str) – String with arxiv ID.
- Returns:
arxivs – List of strings with arxiv IDs.
- Return type:
list of str
Examples
>>> string = "2210.03493 http://arxiv.org/abs/1103.2903"
>>> arxivs = string_to_arxivs(string)
>>> '1103.2903' in arxivs
True
>>> "2210.03493" in arxivs
True
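The identifier pattern described for string_to_arxiv and string_to_arxivs (digits, a period, digits) maps directly onto a regular expression. The following sketch uses hypothetical function names and a pattern written from that description; the pattern actually used by the module may be stricter.

```python
import re

# Series of digits, a period, series of digits (as described above)
ARXIV_PATTERN = re.compile(r'\d+\.\d+')


def find_first_arxiv(string):
    """Return the first identifier in the string, or None."""
    match = ARXIV_PATTERN.search(string)
    return match.group(0) if match else None


def find_all_arxivs(string):
    """Return every identifier in the string, in order of appearance."""
    return ARXIV_PATTERN.findall(string)


first = find_first_arxiv("http://arxiv.org/abs/1103.2903")
several = find_all_arxivs("2210.03493 http://arxiv.org/abs/1103.2903")
```

Old-style identifiers such as `hep-th/9901001` contain no period and are therefore not matched by this pattern, in line with the note that other formats will not be matched.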
scholia.github module
github.
- Usage:
scholia.github get-user <username>
scholia.github get-user-followers <username>
scholia.github get-user-number-of-followers <username>
scholia.github get-user-repos <username>
- scholia.github.get(resource)
Query GitHub API for resource.
- Parameters:
resource (str) – Resource, e.g., “/users/fnielsen” for the user ‘fnielsen’.
- Returns:
data – Data from the GitHub API converted to a Python object from the JSON.
- Return type:
dictionary or list
- scholia.github.get_user(username)
Get user information from GitHub.
- Parameters:
username (str) – GitHub username as a string.
- Returns:
data – User information as a dictionary.
- Return type:
dict
Examples
>>> data = get_user('fnielsen')
>>> data.get('name', '').startswith('Finn') or 'name' not in data
True
- scholia.github.get_user_followers(username)
Get user followers from GitHub.
- Parameters:
username (str) – GitHub username as a string.
- Returns:
data – List of users.
- Return type:
list of dict
- scholia.github.get_user_repos(username)
Get repos for a user from GitHub.
- Parameters:
username (str) – GitHub username as a string.
- Returns:
data – List of repos.
- Return type:
list of dict
- scholia.github.main()
Handle command-line interface.
scholia.googlescholar module
scholia.googlescholar.
- Usage:
scholia.googlescholar get-user-data <user>
- Options:
-h --help Documentation
Example
python -m scholia.googlescholar get-user-data gQVuJh8AAAAJ
- scholia.googlescholar.get_user_data(user)
Return user data scraped from Google Scholar page.
Query Google Scholar with a specific Google Scholar user identifier and get back citation statistics and metadata about the first works.
- Parameters:
user (str) – Google Scholar user identifier.
- Returns:
data – User data.
- Return type:
dict
Notes
Journal and proceedings titles may not be written out completely in Google Scholar, so they are not returned completely.
Also the author list may be abbreviated and missing authors indicated with ‘…’. Year and citation information might also be missing from some of the works.
Only the first 20 works in the list are returned, corresponding to the first page. This function will not page through the results.
Examples
>>> data = get_user_data('9cagBQYAAAAJ')
>>> data['citations'] > 6000  # F.A. Nielsen's citations are above 6,000
True
- scholia.googlescholar.main()
Handle command-line interface.
scholia.model module
model.
- class scholia.model.Work(work=None)
Bases: dict
Encapsulation of a work.
- to_quickstatements()
Convert work to quickstatements.
- Returns:
qs – Quickstatement-formatted work as a string.
- Return type:
str
Examples
>>> work = Work(
...     {'authors': ['Niels Bohr'],
...      'title': 'On the Constitution of Atoms and Molecules'})
>>> qs = work.to_quickstatements()
>>> qs.find('CREATE') != -1
True
- scholia.model.escape_string(string)
Escape string.
- Parameters:
string (str) – String to be escaped
- Returns:
escaped_string – Escaped string
- Return type:
str
Examples
>>> string = 'String with " in it'
>>> escape_string(string)
'String with \\" in it'
- scholia.model.main()
Handle command-line interface.
scholia.network module
network.
- Usage:
scholia.network write-example-pajek-file
- scholia.network.main()
Handle command-line interface.
- scholia.network.write_pajek_from_sparql(filename, sparql)
Write Pajek network file from SPARQL query.
scholia.qs module
Quickstatements.
- scholia.qs.escape_quote(string)
Escape quotation mark.
Escape the quotation mark in a string.
- Parameters:
string (str) – String to be escaped.
- Returns:
escaped_string – Escaped string.
- Return type:
str
- scholia.qs.format_date_for_description(date_str)
Format date string for description.
- Parameters:
date_str (str) – Date as DD-MM-YYYY.
- Returns:
formatted_string – String formatted for description
- Return type:
str
- scholia.qs.normalize_string(string)
Normalize string for Quickstatements.
Strip initial and trailing spaces and convert multiple whitespaces to a single whitespace.
- Parameters:
string (str) – String to be normalized.
- Returns:
normalized_string – Normalized string.
- Return type:
str
Examples
>>> normalize_string(' Finn Nielsen ')
'Finn Nielsen'
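The described normalization (strip the ends, collapse runs of whitespace) can be expressed with a split and join. This is a sketch with a hypothetical function name, not necessarily how the module implements it.

```python
def collapse_whitespace(string):
    """Strip leading/trailing whitespace and collapse inner runs."""
    # str.split() without arguments splits on any whitespace run and
    # drops empty strings, so joining with one space normalizes it.
    return ' '.join(string.split())


normalized = collapse_whitespace('  Finn \t  Nielsen \n')
```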
- scholia.qs.paper_to_quickstatements(paper)
Convert paper to Quickstatements.
Convert a paper represented as a dict into Magnus Manske’s Quickstatements format for entry into Wikidata.
- Parameters:
paper (dict) – Scraped paper represented as a dict.
- Returns:
qs – Quickstatements as a string
- Return type:
str
References
https://quickstatements.toolforge.org
Notes
title, authors (list), date, doi, year, language_q, volume, issue, pages, number_of_pages, url, full_text_url, published_in_q, openreview_id are recognized.
date takes precedence over year.
The label is shortened to 250 characters if the title is longer than that, due to a limitation in Wikidata.
Letters in DOI are uppercased in accordance with Wikidata convention.
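The three notes above (date before year, label truncation, DOI uppercasing) can be illustrated with a reduced sketch. Everything here is an assumption made for illustration: the function name is hypothetical, only `title`, `doi`, `date` and `year` are handled, `date` is assumed to be in YYYY-MM-DD form, and the statements use the tab-separated QuickStatements version 1 syntax with P1476 (title), P356 (DOI) and P577 (publication date). The output of paper_to_quickstatements itself will differ.

```python
def sketch_paper_to_quickstatements(paper):
    """Build a few QuickStatements lines from a paper dict (sketch)."""
    lines = ['CREATE']

    title = paper.get('title')
    if title:
        escaped = title.replace('"', '\\"')
        # The label is cut to 250 characters; the title claim is not
        label = title[:250].replace('"', '\\"')
        lines.append('LAST\tLen\t"{}"'.format(label))
        lines.append('LAST\tP1476\ten:"{}"'.format(escaped))

    if paper.get('doi'):
        # DOIs are uppercased in accordance with Wikidata convention
        lines.append('LAST\tP356\t"{}"'.format(paper['doi'].upper()))

    if paper.get('date'):
        # date takes precedence over year; /11 is day precision
        lines.append('LAST\tP577\t+{}T00:00:00Z/11'.format(paper['date']))
    elif paper.get('year'):
        # /9 is year precision
        lines.append(
            'LAST\tP577\t+{}-01-01T00:00:00Z/9'.format(paper['year']))

    return '\n'.join(lines)


paper = {
    'title': 'A' * 300,
    'doi': '10.1186/s13321-016-0161-3',
    'date': '2016-09-01',
    'year': '2016',
}
qs = sketch_paper_to_quickstatements(paper)
```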
- scholia.qs.proceedings_to_quickstatements(proceedings)
Convert proceedings to Quickstatements.
Convert proceedings represented as a dict into Magnus Manske’s Quickstatements format for entry into Wikidata.
- Parameters:
proceedings (dict) – Scraped proceedings represented as a dict.
- Returns:
qs – Quickstatements as a string
- Return type:
str
References
https://quickstatements.toolforge.org
Notes
title, authors (list), date, doi, year, language_q, volume, issue, pages, number_of_pages, url, full_text_url, published_in_q are recognized.
date takes precedence over year.
The label is shortened to 250 characters if the title is longer than that, due to a limitation in Wikidata.
scholia.query module
query.
- Usage:
scholia.query arxiv-to-q <arxiv>
scholia.query biorxiv-to-q <biorxiv>
scholia.query chemrxiv-to-q <chemrxiv>
scholia.query cas-to-q <cas>
scholia.query atomic-symbol-to-q <symbol>
scholia.query cordis-to-q <cordis>
scholia.query count-authorships
scholia.query count-scientific-articles
scholia.query doi-to-q <doi>
scholia.query github-to-q <github>
scholia.query inchikey-to-q <inchikey>
scholia.query issn-to-q <issn>
scholia.query lipidmaps-to-q <lmid>
scholia.query atomic-number-to-q <atomicnumber>
scholia.query mesh-to-q <meshid>
scholia.query ncbi-gene-to-q <gene>
scholia.query ncbi-taxon-to-q <taxon>
scholia.query omim-to-q <omimID>
scholia.query orcid-to-q <orcid>
scholia.query pubchem-to-q <cid>
scholia.query pubmed-to-q <pmid>
scholia.query q-to-label <q>
scholia.query q-to-class <q>
scholia.query random-author
scholia.query random-podcast
scholia.query random-work
scholia.query ror-to-q <rorid>
scholia.query twitter-to-q <twitter>
scholia.query uniprot-to-q <protein>
scholia.query viaf-to-q <viaf>
scholia.query website-to-q <url>
scholia.query wikipathways-to-q <wpid>
Examples
$ python -m scholia.query orcid-to-q 0000-0001-6128-3356
Q20980928
$ python -m scholia.query github-to-q vrandezo
Q18618629
$ python -m scholia.query doi-to-q 10.475/123_4
Q41533080
$ python -m scholia.query q-to-label Q80
Tim Berners-Lee
- exception scholia.query.QueryResultError
Bases: Exception
Generic query error.
- scholia.query.arxiv_to_qs(arxiv)
Convert arxiv ID to Wikidata ID.
- Parameters:
arxiv (str) – ArXiv identifier.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> arxiv_to_qs('1507.04180') == ['Q27036443']
True
- scholia.query.atomic_number_to_qs(atomic_number)
Look up a chemical element by atomic number and return a Wikidata ID.
- Parameters:
atomic_number (str) – Atomic number.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> atomic_number_to_qs('6') == ['Q623']
True
- scholia.query.atomic_symbol_to_qs(symbol)
Look up a chemical element by atomic symbol and return a Wikidata ID.
- Parameters:
symbol (str) – Atomic symbol.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> atomic_symbol_to_qs('C') == ['Q623']
True
- scholia.query.biorxiv_to_qs(biorxiv_id)
Convert bioRxiv ID to Wikidata ID.
- Parameters:
biorxiv_id (str) – bioRxiv identifier.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> biorxiv_to_qs('2020.08.20.259226') == ['Q104920313']
True
- scholia.query.cas_to_qs(cas)
Convert a CAS registry number to Wikidata ID.
- Parameters:
cas (str) – CAS registry number
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> cas_to_qs('50-00-0') == ['Q161210']
True
- scholia.query.chemrxiv_to_qs(chemrxiv_id)
Convert ChemRxiv ID to Wikidata ID.
- Parameters:
chemrxiv_id (str) – ChemRxiv identifier.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> chemrxiv_to_qs('12791954') == ['Q98577324']
True
- scholia.query.cordis_to_qs(cordis)
Convert CORDIS project ID to Wikidata ID.
- Parameters:
cordis (str) – CORDIS identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> cordis_to_qs('604134') == ['Q27990087']
True
- scholia.query.count_authorships()
Count the number of authorships.
Query the Wikidata Query Service to determine the number of authorships as the number of P50 relationships.
- Returns:
count – Number of authorships.
- Return type:
int
Notes
The count is determined from the SPARQL query
SELECT (COUNT(*) AS ?count) { [] wdt:P50 [] }
Examples
>>> count_authorships() > 1000000  # More than a million authorships
True
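To show how the SPARQL query in the notes turns into an integer, here is a sketch that builds the request URL for the public Wikidata Query Service endpoint and reads the count out of a response in the standard SPARQL JSON results format. The helper names are hypothetical, the sample response is hand-written with a made-up count, and no network request is made.

```python
import json
from urllib.parse import urlencode

WDQS_URL = 'https://query.wikidata.org/sparql'
QUERY = 'SELECT (COUNT(*) AS ?count) { [] wdt:P50 [] }'


def build_request_url(query):
    """Return the GET URL asking the endpoint for JSON results."""
    return WDQS_URL + '?' + urlencode({'query': query, 'format': 'json'})


def response_to_count(data):
    """Extract the ?count binding from a SPARQL JSON result."""
    return int(data['results']['bindings'][0]['count']['value'])


# Hand-written sample response; the value is made up for illustration
sample = json.loads('''
{"head": {"vars": ["count"]},
 "results": {"bindings": [
   {"count": {"type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "1234567"}}]}}
''')

url = build_request_url(QUERY)
count = response_to_count(sample)
```

Values in SPARQL JSON results arrive as strings, which is why the binding is converted with `int`.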
- scholia.query.count_scientific_articles()
Return count for the number of scientific articles.
- Returns:
count – Number of scientific articles in Wikidata.
- Return type:
int
- scholia.query.doi_prefix_to_qs(doi)
Convert DOI prefix to Wikidata ID.
Wikidata Query Service is used to resolve the DOI.
The DOI string is converted to uppercase before any query is made. Uppercase DOIs are default in Wikidata.
- Parameters:
doi (str) – DOI prefix identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> doi_prefix_to_qs('10.1186') == ['Q463494']
True
>>> doi_prefix_to_qs('10.1016') == ['Q746413']
True
- scholia.query.doi_to_qs(doi)
Convert DOI to Wikidata ID.
Wikidata Query Service is used to resolve the DOI.
The DOI string is converted to uppercase before any query is made. Uppercase DOIs are default in Wikidata.
- Parameters:
doi (str) – DOI identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> doi_to_qs('10.1186/S13321-016-0161-3') == ['Q26899110']
True
>>> doi_to_qs('10.1016/j.stem.2016.02.016') == ['Q23008981']
True
- scholia.query.escape_string(string)
Escape string to be used in SPARQL query.
- Parameters:
string (str) – String to be escaped.
- Returns:
escaped_string – Escaped string.
- Return type:
str
Examples
>>> escape_string('"hello"')
'\\"hello\\"'
>>> escape_string(r'\"hello"')
'\\\\\\"hello\\"'
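The two examples are consistent with escaping backslashes first and quotation marks second. The sketch below reproduces exactly those outputs; the function name is hypothetical and the real function may handle further characters.

```python
def escape_for_sparql(string):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    # Backslashes must be doubled before quotes are escaped; otherwise
    # the backslash added in front of each quote would be doubled too.
    return string.replace('\\', '\\\\').replace('"', '\\"')


plain = escape_for_sparql('"hello"')
with_backslash = escape_for_sparql(r'\"hello"')
```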
- scholia.query.github_to_qs(github)
Convert GitHub account name to Wikidata ID.
- Parameters:
github (str) – github account identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> github_to_qs('vrandezo') == ['Q18618629']
True
- scholia.query.identifier_to_qs(property, identifier)
Convert identifier to Wikidata identifiers.
Convert a specific identifier to a Wikidata identifier given the identifier type.
- Parameters:
property (str) – String with Wikidata property identifier for an identifier.
identifier (str) – String with specific identifier.
- Returns:
qs – List of zero or more strings with Wikidata IDs matching the identifier.
- Return type:
list of str
Notes
The Wikidata Query Service is queried to resolve the given identifier. If an error happens an empty list is returned.
Examples
>>> property = "P10283"  # Property identifier for OpenAlex ID
>>> identifier = "a5060194743"  # Corresponding to Q20895241 (E Willighagen)
>>> qs = identifier_to_qs(property, identifier)
>>> qs == ['Q20895241']
True
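A generic identifier lookup of this kind usually has two halves: building a SPARQL query from the property and the identifier, and turning the returned entity URIs into Q identifiers. The sketch below shows both under assumptions: the helper names are hypothetical, the query text is one plausible formulation rather than the one Scholia sends, and the bindings are hand-written in the SPARQL JSON results format.

```python
def build_identifier_query(prop, identifier):
    """Return a SPARQL query matching items with the given identifier."""
    # Escape backslashes and quotes so the identifier is a safe literal
    escaped = identifier.replace('\\', '\\\\').replace('"', '\\"')
    return 'SELECT ?item {{ ?item wdt:{} "{}" }}'.format(prop, escaped)


def bindings_to_qs(bindings):
    """Turn entity URIs from SPARQL bindings into Q identifiers."""
    return [binding['item']['value'].rsplit('/', 1)[-1]
            for binding in bindings]


query = build_identifier_query('P10283', 'a5060194743')

# Hand-written bindings in the shape the query service returns
bindings = [
    {'item': {'type': 'uri',
              'value': 'http://www.wikidata.org/entity/Q20895241'}},
]
qs = bindings_to_qs(bindings)
```

An empty bindings list yields an empty list, which matches the documented return of zero or more identifiers.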
- scholia.query.inchikey_to_qs(inchikey)
Convert InChIKey to Wikidata ID.
- Parameters:
inchikey (str) – inchikey identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> inchikey_to_qs('UHOVQNZJYSORNB-UHFFFAOYSA-N') == ['Q2270']
True
- scholia.query.iso639_to_q(language)
Convert ISO639 to Q item.
- Parameters:
language (str) – Language represented in ISO 639 format.
- Returns:
q – Language represented as a q identifier.
- Return type:
str or None
Examples
>>> iso639_to_q('en') == 'Q1860'
True
>>> iso639_to_q('dan') == 'Q9035'
True
- scholia.query.issn_to_qs(issn)
Convert ISSN to Wikidata ID.
- Parameters:
issn (str) – ISSN identifier as a string.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> issn_to_qs('1533-7928') == ['Q1660383']
True
- scholia.query.lipidmaps_to_qs(lmid)
Convert a LIPID MAPS identifier to Wikidata ID.
- Parameters:
lmid (str) – LIPID MAPS identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> lipidmaps_to_qs('LMFA') == ['Q63433687']
True
>>> lipidmaps_to_qs('LMFA00000007') == ['Q27114894']
True
- scholia.query.main()
Handle command-line interface.
- scholia.query.mesh_to_qs(meshid)
Convert MeSH ID to Wikidata ID.
- Parameters:
meshid (str) – MeSH identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> mesh_to_qs('D028441') == ['Q33659470']
True
- scholia.query.ncbi_gene_to_qs(gene)
Convert an NCBI gene identifier to Wikidata ID.
Wikidata Query Service is used to resolve the NCBI gene identifier.
The NCBI gene identifier string is converted to uppercase before any query is made.
- Parameters:
gene (str) – NCBI gene identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> ncbi_taxon_to_qs('694009') == ['Q278567']
True
- scholia.query.ncbi_taxon_to_qs(taxon)
Convert an NCBI taxon identifier to Wikidata ID.
Wikidata Query Service is used to resolve the NCBI taxon identifier.
The NCBI taxon identifier string is converted to uppercase before any query is made.
- Parameters:
taxon (str) – NCBI taxon identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> ncbi_taxon_to_qs('694009') == ['Q278567']
True
- scholia.query.omim_to_qs(omimID)
Convert OMIM identifier to Wikidata ID.
- Parameters:
omimID (str) – OMIM identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> 'Q41112' in omim_to_qs('181500')
True
- scholia.query.openalex_to_qs(openalex)
Convert OpenAlex ID to Wikidata identifiers.
Given an OpenAlex identifier, return zero or more corresponding Wikidata identifiers.
- Parameters:
openalex (str) – OpenAlex identifier.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> openalex_to_qs('a5060194743') == ['Q20895241']
True
- scholia.query.orcid_to_qs(orcid)
Convert orcid to Wikidata ID.
- Parameters:
orcid (str) – ORCID identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> orcid_to_qs('0000-0001-6128-3356') == ['Q20980928']
True
- scholia.query.pubchem_to_qs(cid)
Convert a PubChem compound identifier (CID) to Wikidata ID.
Wikidata Query Service is used to resolve the PubChem identifier.
- Parameters:
cid (str) – PubChem compound identifier (CID)
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> pubchem_to_qs('14123361') == ['Q289372']
True
- scholia.query.pubmed_to_qs(pmid)
Convert a PubMed identifier to Wikidata ID.
Wikidata Query Service is used to resolve the PubMed identifier.
The PubMed identifier string is converted to uppercase before any query is made.
- Parameters:
pmid (str) – PubMed identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> pubmed_to_qs('29029422') == ['Q42371516']
True
- scholia.query.q_to_class(q)
Return Scholia class of Wikidata item.
Determine the ‘class’, i.e., which kind of instance the item is, by querying the Wikidata Query Service.
- Parameters:
q (str) – Wikidata item identifier.
- Returns:
class_ – Scholia class represented as a string.
- Return type:
‘author’, ‘venue’, ‘organization’, …
Notes
The Wikidata Query Service will be queried for P31 value. The value is compared against a set of hardcoded matches.
- scholia.query.q_to_dois(q)
Get DOIs for a Q item.
Query the Wikidata Query Service to get zero or more DOIs for a particular Q item identified by the Q identifier.
- Parameters:
q (str) – String with Wikidata Q identifier.
- Returns:
dois – List with zero or more strings each containing a DOI.
- Return type:
list of str
Examples
>>> dois = q_to_dois("Q87191917")
>>> dois == ['10.1016/S0140-6736(20)30211-7']
True
- scholia.query.q_to_label(q, language='en')
Get label for Q item.
- Parameters:
q (str) – String with Wikidata Q item.
language (str) – String with language identifier
- Returns:
label – String with label corresponding to Wikidata item.
- Return type:
str
Examples
>>> q_to_label('Q80') == "Tim Berners-Lee"
True
- scholia.query.query_to_bindings(query)
Return response bindings from SPARQL query.
Query the Wikidata Query Service with the given query and return the response data as bindings.
- Parameters:
query (str) – SPARQL query as string
- Returns:
bindings – Data as list of dicts.
- Return type:
list
- scholia.query.random_author()
Return random author.
Sample a scientific author randomly from Wikidata by a call to the Wikidata Query Service.
- Returns:
q – Wikidata identifier.
- Return type:
str
Notes
The author returned is not necessarily a scholarly author.
The algorithm uses a somewhat hopeful randomization and if no author is found it falls back on Q18618629.
Examples
>>> q = random_author()
>>> q.startswith('Q')
True
- scholia.query.random_podcast()
Return random podcast.
Sample a podcast randomly from Wikidata by a call to the Wikidata Query Service.
- Returns:
q – Wikidata identifier.
- Return type:
str
Notes
The work returned is not necessarily a podcast.
The algorithm uses a somewhat hopeful randomization and if no work is found it falls back on Q21146099.
Examples
>>> q = random_podcast()
>>> q.startswith('Q')
True
- scholia.query.random_work()
Return random work.
Sample a scientific work randomly from Wikidata by a call to the Wikidata Query Service.
- Returns:
q – Wikidata identifier.
- Return type:
str
Notes
The work returned is not necessarily a scholarly work.
The algorithm uses a somewhat hopeful randomization and if no work is found it falls back on Q21146099.
Examples
>>> q = random_work()
>>> q.startswith('Q')
True
- scholia.query.ror_to_qs(rorid)
Convert a ROR identifier to Wikidata ID.
Wikidata Query Service is used to resolve the ROR identifier.
- Parameters:
rorid (str) – ROR identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> ror_to_qs('038321296') == ['Q5566337']
True
- scholia.query.search_article_titles(q, search_string=None)
Search articles with q item.
- Parameters:
q (str) – String with Wikidata Q item.
search_string (str, optional) – String with query string. If it is not provided then the label of q items is used as the query string.
- Returns:
results – List of dicts with query result.
- Return type:
list of dict
Notes
This function uses the Egon Willighagen trick of iterating over batches of 500,000 articles and performing a search in the (scientific) article title for the query string via the CONTAINS SPARQL function. Case is ignored.
- scholia.query.search_article_titles_to_quickstatements(q, search_string=None)
Search article titles and return quickstatements.
- Parameters:
q (str) – String with Wikidata Q identifier.
search_string (str, optional) – Search string
- Returns:
quickstatements – String with Quickstatements-formatted commands.
- Return type:
str
- scholia.query.twitter_to_qs(twitter)
Convert Twitter account name to Wikidata ID.
- Parameters:
twitter (str) – Twitter account identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> twitter_to_qs('utafrith') == ['Q8219']
True
- scholia.query.uniprot_to_qs(protein)
Convert a UniProt identifier to Wikidata ID.
Wikidata Query Service is used to resolve the UniProt identifier.
The UniProt identifier string is converted to uppercase before any query is made.
- Parameters:
protein (str) – UniProt identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> uniprot_to_qs('P02649') == ['Q424728']
True
- scholia.query.viaf_to_qs(viaf)
Convert VIAF identifier to Wikidata ID.
- Parameters:
viaf (str) – VIAF identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> viaf_to_qs('59976288') == ['Q3259614']
True
- scholia.query.website_to_qs(url)
Convert URL for website to Wikidata ID.
- Parameters:
url (str) – URL for official website.
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> url = ("https://papers.nips.cc/paper/"
...        "6498-online-and-differentially-private-tensor-decomposition")
>>> qs = website_to_qs(url)
>>> qs == ['Q46994097']
True
- scholia.query.wikipathways_to_qs(wpid)
Convert a WikiPathways identifier to Wikidata ID.
Wikidata Query Service is used to resolve the WikiPathways identifier.
- Parameters:
wpid (str) – WikiPathways identifier
- Returns:
qs – List of strings with Wikidata IDs.
- Return type:
list of str
Examples
>>> wikipathways_to_qs('WP111') == ['Q28031254']
True
scholia.rss module
rss.
- Usage:
scholia.rss author-latest-works <q>
scholia.rss venue-latest-works <q>
scholia.rss topic-latest-works <q>
scholia.rss organization-latest-works <q>
scholia.rss sponsor-latest-works <q>
- Description:
Functions related to feed.
Examples
$ python -m scholia.rss author-latest-works Q27061849
…
$ python -m scholia.rss venue-latest-works Q5936947
…
$ python -m scholia.rss topic-latest-works Q130983
…
$ python -m scholia.rss organization-latest-works Q1137652
…
$ python -m scholia.rss sponsor-latest-works Q1377836
References
https://validator.w3.org/feed/docs/rss2.html
- scholia.rss.entities_to_works_rss(entities)
Convert Wikidata entities to works rss.
- Parameters:
entities (list) – List of Wikidata items in nested structure.
- Returns:
rss – RSS-formatted list of work items.
- Return type:
str
Notes
Wikidata entities without a publication date are skipped.
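The skipping rule in the note can be sketched with a simplified input. This is an illustration only: the function name is hypothetical, each work is a flat dict with placeholder identifiers instead of the nested Wikidata structure the real function takes, dates are assumed to be in YYYY-MM-DD form, and the item layout is a minimal RSS 2.0 `<item>` rather than the exact output of entities_to_works_rss.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def works_to_rss_items(works):
    """Render simplified work records as RSS <item> elements (sketch)."""
    items = []
    for work in works:
        if not work.get('date'):
            # Works without a publication date are skipped
            continue
        published = datetime.strptime(work['date'], '%Y-%m-%d')
        published = published.replace(tzinfo=timezone.utc)
        items.append(
            '<item>\n'
            '  <title>{}</title>\n'
            '  <link>https://www.wikidata.org/wiki/{}</link>\n'
            '  <pubDate>{}</pubDate>\n'
            '</item>'.format(
                escape(work['title']),  # escape &, < and > for XML
                work['q'],
                format_datetime(published, usegmt=True)))
    return '\n'.join(items)


# Placeholder records; identifiers and titles are made up
works = [
    {'q': 'Q1', 'title': 'Dated & escaped', 'date': '2017-01-02'},
    {'q': 'Q2', 'title': 'Undated work'},
]
rss = works_to_rss_items(works)
```

`format_datetime` produces the RFC 822 style date that the RSS 2.0 specification expects in `pubDate`.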
- scholia.rss.main()
Handle command-line arguments.
- scholia.rss.wb_get_author_latest_works(q)
Return RSS-formatted list of latest works for author.
Query the Wikidata Query Service for the latest works of the author specified with the Wikidata identifier q. Return the list formatted as an RSS feed.
- Parameters:
q (str) – Wikidata identifier.
- Returns:
rss – Feed in XML.
- Return type:
str
Notes
The Wikidata Query Service may have problems for dates before 0. The SPARQL will fail in such instances [1]. This function will then return an empty list.
References
- scholia.rss.wb_get_organization_latest_works(q)
Return feed for latest work from an organization.
- Parameters:
q (str) – Wikidata identifier
- Returns:
rss – RSS-formatted feed with latest work from an organization.
- Return type:
str
- scholia.rss.wb_get_sponsor_latest_works(q)
Return feed for latest work from a sponsor.
- Parameters:
q (str) – Wikidata identifier
- Returns:
rss – RSS-formatted feed with latest work from a sponsor.
- Return type:
str
- scholia.rss.wb_get_topic_latest_works(q)
Return feed for latest work on topic.
- Parameters:
q (str) – Wikidata identifier
- Returns:
rss – RSS-formatted feed with latest work on topic.
- Return type:
str
- scholia.rss.wb_get_venue_latest_works(q)
Return feed for latest work from venue.
- Parameters:
q (str) – Wikidata identifier
- Returns:
rss – RSS-formatted feed with latest work from venue.
- Return type:
str
scholia.scrape module
Scrape websites.
scholia.scrape.nips module
Scraper for NIPS.
- Usage:
scholia.scrape.nips scrape-paper-from-url <url>
scholia.scrape.nips scrape-paper-urls-from-proceedings-url [options] <url>
scholia.scrape.nips scrape-proceedings-from-url <url>
scholia.scrape.nips paper-url-to-q <url>
scholia.scrape.nips paper-url-to-quickstatements <url>
scholia.scrape.nips paper-urls-to-quickstatements [options] <filename>
- Options:
-o --output=file  Output filename, default output to stdout
--oe=encoding  Output encoding [default: utf-8]
Notes
NeurIPS/NIPS papers are available from https://papers.nips.cc. The format of the NIPS/NeurIPS proceeding homepage has changed, so the scraper may not always work.
Papers may be published the year after the conference. Newer conferences seem to publish in the same year, while older conferences published the year after. For example, NIPS 2008 was published in 2009, while NIPS 2009 was published in the same year, i.e., 2009.
For scrape-paper-urls-from-proceedings-url the proceedings URL should be one listed at https://papers.nips.cc/. It will return a JSON with a list of URLs for the individual papers.
The generated quickstatements from paper-url-to-quickstatements can be submitted to https://quickstatements.toolforge.org/.
- scholia.scrape.nips.main()
Handle command-line interface.
- scholia.scrape.nips.paper_to_q(paper)
Find Q identifier for paper.
- Parameters:
paper (dict) – Paper represented as dictionary.
- Returns:
q – Q identifier in Wikidata. None is returned if the paper is not found.
- Return type:
str or None
Notes
This function might be used to test if a scraped NIPS paper is already present in Wikidata.
The match on title is using an exact query, meaning that any variation in lowercase/uppercase will not find the Wikidata item.
Examples
>>> paper = {
...     'title': 'Hash Embeddings for Efficient Word Representations',
...     'url': ('https://papers.nips.cc/paper/7078-hash-embeddings-for-'
...             'efficient-word-representations'),
...     'full_text_url': ('https://papers.nips.cc/paper/7078-hash-'
...                       'embeddings-for-efficient-word-'
...                       'representations.pdf')}
>>> paper_to_q(paper)
'Q39502551'
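The case sensitivity mentioned in the notes follows from matching the title as an exact literal. The sketch below shows one way such a query could be built. It is an assumption for illustration: build_title_query is a hypothetical helper, and while P1476 is the Wikidata title property, the query that paper_to_q actually sends may differ.

```python
def build_title_query(title, language='en'):
    """Build a SPARQL query matching a work by its exact title.

    Hypothetical helper, not part of scholia.scrape.nips.
    """
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = title.replace('\\', '\\\\').replace('"', '\\"')
    return ('SELECT ?work WHERE {{ ?work wdt:P1476 "{}"@{} }} LIMIT 1'
            .format(escaped, language))


query = build_title_query('Hash Embeddings for Efficient Word Representations')
```

Because the literal is compared as-is, a title that differs only in capitalization produces a different query and will not find the item.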
- scholia.scrape.nips.paper_url_to_q(url)
Return Q identifier based on URL.
Scrape NIPS HTML page with paper and use the extracted information in a query on Wikidata Query Service to find the Wikidata Q identifier.
- Parameters:
url (str) – URL to NIPS HTML page.
- Returns:
q – Q identifier for Wikidata or None if not found.
- Return type:
str or None
Examples
>>> url = ("https://papers.nips.cc/paper/2020/hash/"
...        "00482b9bed15a272730fcb590ffebddd-Abstract.html")
>>> paper_url_to_q(url)
'Q104089790'
- scholia.scrape.nips.paper_url_to_quickstatements(url)
Return Quickstatements for paper URL.
For a given URL pointing to a NIPS paper, scrape the bibliographic information from the NIPS website and return the corresponding Quickstatements command for entry into Wikidata.
- Parameters:
url (str) – URL to NIPS paper.
- Returns:
qs – String with paper formatted as Quickstatements.
- Return type:
str
Notes
The function tests whether the paper is already entered into Wikidata and, if so, returns a comment line with the corresponding Wikidata identifier.
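As a rough illustration of what Quickstatements for a paper look like, the sketch below formats a paper dict as QuickStatements V1 commands (tab-separated, with CREATE followed by LAST lines). The function name and the choice of statements are assumptions; the actual output of paper_url_to_quickstatements may contain other properties, and this sketch does not check whether the paper already exists in Wikidata.

```python
def paper_to_quickstatements_sketch(paper):
    """Format a scraped paper dict as QuickStatements V1 commands.

    Illustrative only: titles containing double quotes are not handled,
    and authors, venue and abstract are left out.
    """
    title = paper['title']
    lines = [
        'CREATE',
        'LAST\tP31\tQ13442814',                # instance of: scholarly article
        'LAST\tLen\t"{}"'.format(title),       # English label
        'LAST\tP1476\ten:"{}"'.format(title),  # title as monolingual text
    ]
    if paper.get('year'):
        # A trailing /9 marks year precision in QuickStatements dates.
        lines.append(
            'LAST\tP577\t+{}-01-01T00:00:00Z/9'.format(paper['year']))
    if paper.get('full_text_url'):
        # P953: full work available at URL.
        lines.append('LAST\tP953\t"{}"'.format(paper['full_text_url']))
    return '\n'.join(lines)


qs = paper_to_quickstatements_sketch(
    {'title': 'Hash Embeddings', 'year': 2017})
```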
- scholia.scrape.nips.scrape_paper_from_old_url(url)
Scrape NIPS paper from URL.
Download legacy HTML page from https://papers.nips.cc/paper/, extract and return bibliographic metadata.
- Parameters:
url (str) – URL to NIPS paper. Should start with https://papers.nips.cc/paper/. The URL may either be to the HTML page or the PDF.
- Returns:
paper – Dictionary with paper.
- Return type:
dict
Notes
The information is scraped from the individual HTML pages on the website https://papers.nips.cc as it was formatted before 2020. The new format means that the scraping no longer works.
The returned paper dict contains url, title, authors as list, full_text_url, abstract, year and published_in_q. The year is corrected from the nominal to the actual publication year, such that papers published before NIPS 2009 have the publication year set to the year after the conference.
If the abstract is not listed on the papers.nips.cc HTML page then the abstract field is not available in the returned paper variable. Some of the earliest conferences do not list the abstract.
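The year correction described in the notes can be written as a small function. This is a sketch of the stated rule only; the function name is hypothetical and the scraper may implement the correction differently.

```python
def conference_year_to_publication_year(conference_year):
    """Map the nominal conference year to the publication year.

    Proceedings before NIPS 2009 appeared the year after the conference;
    from NIPS 2009 on they appear in the conference year.
    """
    if conference_year < 2009:
        return conference_year + 1
    return conference_year
```

Under this rule both NIPS 2008 and NIPS 2009 map to the publication year 2009, which matches the example given in the module notes.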
- scholia.scrape.nips.scrape_paper_from_url(url)
Scrape NeurIPS paper from URL.
Download HTML page from https://proceedings.neurips.cc, extract and return bibliographic metadata.
- Parameters:
url (str) – URL to NeurIPS paper. Should start with https://proceedings.neurips.cc/paper/. The URL should be to the HTML page.
- Returns:
paper – Dictionary with paper.
- Return type:
dict
Notes
The information is scraped from the individual HTML pages on the website https://proceedings.neurips.cc.
The returned paper dict contains url, title, authors as list, full_text_url, abstract, year and published_in_q. The year is corrected from the nominal to the actual publication year, such that papers published before NIPS 2009 have the publication year set to the year after the conference.
If the abstract is not listed on the papers.nips.cc HTML page then the abstract field is not available in the returned paper variable. Some of the earliest conferences do not list the abstract.
Examples
>>> url = ("https://proceedings.neurips.cc/paper/2020/hash/"
...        "00482b9bed15a272730fcb590ffebddd-Abstract.html")
>>> entry = scrape_paper_from_url(url)
>>> entry['title'].startswith("An Unsupervised Information-Theoretic")
True
- scholia.scrape.nips.scrape_paper_urls_from_proceedings_url(url)
Return paper URLs for proceedings.
- Parameters:
url (str) – HTTPS URL for NIPS proceedings
- Returns:
urls – Scraped URLs for papers in proceedings.
- Return type:
list of str
- scholia.scrape.nips.scrape_proceedings_from_url(url)
Scrape all papers from proceedings.
- Parameters:
url (str) – HTTPS URL for NIPS proceedings
- Returns:
entries – Scraped papers in list of dictionaries
- Return type:
list of dict
scholia.scrape.ojs module
Scraping Open Journal Systems.
- Usage:
scholia.scrape.ojs scrape-paper-from-url <url>
scholia.scrape.ojs issue-url-to-quickstatements [options] <url>
scholia.scrape.ojs paper-url-to-q <url>
scholia.scrape.ojs paper-url-to-quickstatements [options] <url>
- Options:
--iso639=iso639  Overwrite default iso639
-o --output=file  Output filename, default output to stdout
--oe=encoding  Output encoding [default: utf-8]
Examples
- $ python -m scholia.scrape.ojs paper-url-to-quickstatements
- scholia.scrape.ojs.issue_url_to_paper_urls(url)
Scrape paper URLs from issue URL.
Scrape paper (article) URLs from a given Open Journal System issue URL.
- Parameters:
url (str) – URL to an OJS issue.
- Returns:
urls – List of URLs to papers.
- Return type:
list of str
Notes
Based on the URL, the HTML issue webpage will be fetched and the returned HTML parsed. Different matching approaches are tried to extract the article URLs.
- scholia.scrape.ojs.issue_url_to_quickstatements(url, iso639=None)
Return Quickstatements for papers in an issue.
From an Open Journal System issue URL extract metadata for individual papers and format them in the Quickstatement format for entry in Wikidata.
- Parameters:
url (str) – URL for an OJS issue.
iso639 (str, optional) – String with ISO639 code. Default is None, meaning the iso639 will be read from the metadata.
- Returns:
qs – String with quickstatements.
- Return type:
str
- scholia.scrape.ojs.main()
Handle command-line interface.
- scholia.scrape.ojs.paper_to_q(paper)
Find Q identifier for paper.
- Parameters:
paper (dict) – Paper represented as dictionary.
- Returns:
q – Q identifier in Wikidata. None is returned if the paper is not found.
- Return type:
str or None
Notes
This function might be used to test if a scraped OJS paper is already present in Wikidata.
The match on title is using an exact query, meaning that any variation in lowercase/uppercase will not find the Wikidata item. If the title is shorter than 21 characters then only the URL is used to match.
Examples
>>> paper = {
...     'title': ('Linguistic Deviations in the Written Academic Register '
...               'of Danish University Students'),
...     'url': 'https://journals.uio.no/index.php/osla/article/view/5855'}
>>> paper_to_q(paper)
'Q61708017'
- scholia.scrape.ojs.paper_url_to_q(url)
Return Q identifier based on URL.
Scrape OJS HTML page with paper and use the extracted information on a query on Wikidata Query Service to find the Wikidata Q identifier.
- Parameters:
url (str) – URL to Open Journal System article webpage.
- Returns:
q – Q identifier for Wikidata or None if not found.
- Return type:
str or None
Examples
>>> url = 'https://journals.uio.no/index.php/osla/article/view/5855'
>>> paper_url_to_q(url)
'Q61708017'
- scholia.scrape.ojs.paper_url_to_quickstatements(url, iso639=None)
Scrape OJS paper and return quickstatements.
Given a URL to an HTML web page representing a paper formatted by Open Journal Systems, return quickstatements for data entry in Wikidata with Magnus Manske's Quickstatement tool.
- Parameters:
url (str) – URL to OJS paper as a string.
iso639 (str, optional) – String with ISO639 language code. Default is None, meaning the iso639 will be read from the metadata.
- Returns:
qs – Quickstatements for paper as a string.
- Return type:
str
Notes
If the paper is already entered in Wikidata then only a comment is produced, with no quickstatements.
The quickstatement tool is available at https://quickstatements.toolforge.org.
- scholia.scrape.ojs.scrape_paper_from_url(url)
Scrape OJS paper from URL.
- Parameters:
url (str) – URL to paper as a string
- Returns:
paper – Paper represented as a dictionary.
- Return type:
dict
Example
>>> url = 'https://tidsskrift.dk/carlnielsenstudies/article/view/27763'
>>> paper = scrape_paper_from_url(url)
>>> paper['authors'] == ['John Fellow']
True
scholia.tex module
tex.
- Usage:
scholia.tex extract-qs-from-aux <file>
scholia.tex write-bbl-from-aux <file>
scholia.tex write-bib-from-aux <file>
- Description:
Work with latex and bibtex.
The functionality is not complete.
Example latex document:
\documentclass{article}
\pdfoutput=1
\usepackage[utf8]{inputenc}

\begin{document}
Scientific citations \cite{Q26857876,Q21172284}.
Semantic relatedness \cite{Q26973018}.
\bibliographystyle{unsrt}
\bibliography{}
\end{document}
- scholia.tex.authors_to_bibtex_authors(authors)
Convert a Wikidata entity to an author in BibTeX.
- Parameters:
authors (dict) – Wikidata entity as hierarchical structure.
- Returns:
entry – Bibtex entry in Unicode string.
- Return type:
str
- scholia.tex.entity_to_bibtex_entry(entity, key=None)
Convert Wikidata entity to bibtex-formatted entry.
- Parameters:
entity (dict) – Wikidata entity as hierarchical structure.
key (str) – Bibtex key.
- Returns:
entry – Bibtex entry in Unicode string.
- Return type:
str
- scholia.tex.escape_to_tex(string, escape_type='normal')
Escape a text to a tex/latex safe text.
- Parameters:
string (str or None) – Unicode string to be escaped.
escape_type (normal or url, default normal) – Type of escaping.
- Returns:
escaped_string – Escaped unicode string. If the input is None then an empty string is returned.
- Return type:
str
Examples
>>> escape_to_tex("^^") == r'\^{}\^{}'
True
>>> escaped = escape_to_tex('10.1007/978-3-319-18111-0_26', 'url')
>>> escaped == '10.1007/978-3-319-18111-0\\_26'
True
References
- scholia.tex.extract_dois_from_aux_string(string)
Extract DOIs from string.
- Parameters:
string (str) – String with citations, e.g., the content of an aux file, from which DOIs are extracted.
- Returns:
dois – List of strings.
- Return type:
list of str
Examples
>>> string = "\\citation{10.1186/S13321-016-0161-3}"
>>> extract_dois_from_aux_string(string)
['10.1186/S13321-016-0161-3']
- scholia.tex.extract_qs_from_aux_string(string)
Extract qs from string.
- Parameters:
string (str) – Extract Wikidata identifiers from citations.
- Returns:
qs – List of strings.
- Return type:
list of str
Examples
>>> string = "\\citation{Q28042913}"
>>> extract_qs_from_aux_string(string)
['Q28042913']
>>> string = "\\citation{Q28042913,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040']
>>> string = "\\citation{Q28042913,Q27615040,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040', 'Q27615040']
>>> string = "\\citation{Q28042913,NielsenF2002Neuroinformatics,Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913', 'Q27615040']
>>> string = "\\citation{Q28042913,Q27615040.Q27615040}"
>>> extract_qs_from_aux_string(string)
['Q28042913']
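The behavior shown in these examples can be reproduced with a short regular-expression sketch. It is not the scholia.tex implementation; the function name and both patterns are assumptions chosen so that the documented cases come out the same (duplicates kept, non-Q keys and malformed keys dropped).

```python
import re

# \citation{key1,key2,...} lines as written by LaTeX to the aux file.
CITATION_PATTERN = re.compile(r'\\citation\{([^}]*)\}')
# A key counts as a Wikidata identifier only if it is Q plus digits.
Q_PATTERN = re.compile(r'Q\d+')


def extract_qs_sketch(aux_string):
    """Return Wikidata Q identifiers cited in an aux file string."""
    qs = []
    for keys in CITATION_PATTERN.findall(aux_string):
        for key in keys.split(','):
            key = key.strip()
            if Q_PATTERN.fullmatch(key):
                qs.append(key)
    return qs
```

Using fullmatch on each comma-separated key is what rejects a key such as Q27615040.Q27615040, where a plain search would wrongly pick up the first identifier.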
- scholia.tex.guess_bibtex_entry_type(entity)
Guess Bibtex entry type.
- Parameters:
entity (dict) – Wikidata item.
- Returns:
entry_type – Entry type as a string: ‘Article’, ‘InProceedings’, etc.
- Return type:
str
- scholia.tex.main()
Handle command-line arguments.
scholia.text module
scholia.text.
- Usage:
scholia.text text-to-topic-q-text-setup
scholia.text text-to-topic-qs <text>
scholia.text text-to-topics-url <text>
- Options:
-h --help  Help
- Description:
Handle text.
text-to-topic-qs command will set up a matching method that can convert a text to Wikidata Q identifiers associated with topics of scientific articles. The setup will call the Wikidata Query Service to set up a regular expression for the matching.
The result of the text-to-topic-qs command-line command can be used to query Scholia:
- class scholia.text.TextToTopicQText
Bases: object
Converter of text to Wikidata Q identifier data.
- mapper
Dictionary between labels and associated Wikidata Q identifiers.
- Type:
dict
- pattern
Regular expression pattern for matching Wikidata labels.
- Type:
re.SRE_Pattern
- get_mapper()
Return mapper between label and Wikidata item.
Query the Wikidata Query service to get Wikidata identifiers and associated labels and convert them to a dictionary.
- Returns:
mapper – Dictionary where the keys are labels associated with Wikidata Q identifiers.
- Return type:
dict
Notes
This method queries the Wikidata Query Service with a static SPARQL query. It will take some time to complete, perhaps 30 seconds or more.
In some cases a timeout may occur in the middle of a response, making the returned JSON invalid. The method will then try a second time. If this also fails, the method raises an exception.
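The retry behavior described in the notes can be sketched as follows. The helper name and the fetch_text callable are hypothetical stand-ins for the actual request to the Wikidata Query Service; only the try-twice logic is illustrated.

```python
import json


def fetch_json_with_one_retry(fetch_text):
    """Parse JSON from fetch_text(), retrying once on invalid JSON.

    `fetch_text` is a hypothetical zero-argument callable returning the
    raw response body as a string.
    """
    for attempt in range(2):
        try:
            return json.loads(fetch_text())
        except ValueError:
            # json.JSONDecodeError is a ValueError; a truncated response
            # ends up here. Give up after the second failure.
            if attempt == 1:
                raise
```

Passing the request as a callable keeps the sketch testable without network access, for instance with a function that returns a truncated body on the first call.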
- load_mapper_from_json(filename=None)
Load map from JSON.
- Parameters:
filename (str) – Filename for JSON file.
- save_mapper_as_json(filename=None)
Save mapper as JSON file.
- Parameters:
filename (str) – Filename for JSON file to be written.
- save_object_as_pickle(filename=None)
Save object.
- text_to_topic_q_text(text)
Convert text to q-text.
- Parameters:
text (str) – Text to be matched.
- Returns:
q_text – Text with words and phrases substituted with Wikidata Q identifiers.
- Return type:
str
- text_to_topic_qs(text)
Return Wikidata Q identifiers from text matching.
- Parameters:
text (str) – Text to be matched.
- Returns:
qs – List with Wikidata Q identifiers as strings.
- Return type:
list of str
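How a label-to-identifier mapper and a regular expression pattern can work together is sketched below. This is an assumption about the general technique, not the class's actual code: the function names are hypothetical, the mapper is assumed to hold one lowercase label per identifier, and the identifiers in the usage example are illustrative placeholders that have not been verified against Wikidata.

```python
import re


def build_pattern(mapper):
    """Compile one alternation pattern over all labels in the mapper."""
    # Longest labels first, so 'machine learning' wins over 'learning'.
    labels = sorted(mapper, key=len, reverse=True)
    alternation = '|'.join(re.escape(label) for label in labels)
    return re.compile(r'\b(?:' + alternation + r')\b', flags=re.IGNORECASE)


def text_to_qs_sketch(text, mapper, pattern):
    """Return the identifiers of all labels found in the text."""
    return [mapper[match.lower()] for match in pattern.findall(text)]


def text_to_q_text_sketch(text, mapper, pattern):
    """Substitute matched labels in the text with their identifiers."""
    return pattern.sub(lambda m: mapper[m.group().lower()], text)


# Placeholder identifiers for illustration only.
mapper = {'machine learning': 'Q2539', 'learning': 'Q133500',
          'brain': 'Q1073'}
pattern = build_pattern(mapper)
qs = text_to_qs_sketch('Machine learning of the brain', mapper, pattern)
q_text = text_to_q_text_sketch('Machine learning of the brain',
                               mapper, pattern)
```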
- scholia.text.load_pickle_text_to_topic_q_text()
Load an object that is already set up.
Load the TextToTopicQText object from a pickle file. If the pickle file is not available, set up a new object.
- Returns:
text_to_topic_q_text – Text-to-topic-q-text object that is set up and ready to use.
- Return type:
- scholia.text.load_text_to_topic_q_text()
Set up an object.
Set up TextToTopicQText.
- Returns:
text_to_topic_q_text – Text-to-topic-q-text object that is set up and ready to use.
- Return type:
- scholia.text.main()
Handle command-line interface.
scholia.utils module
utils.
- scholia.utils.escape_string(string)
Escape string.
- Parameters:
string (str) – String to be escaped
- Returns:
escaped_string – Escaped string
- Return type:
str
Examples
>>> string = 'String with " in it'
>>> escape_string(string)
'String with \\" in it'
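A minimal sketch that reproduces the documented example is shown below. Escaping backslashes as well as double quotes is an assumption; the actual escape_string may handle a different set of characters.

```python
def escape_string_sketch(string):
    """Backslash-escape backslashes and double quotes in a string."""
    # Backslashes go first so the ones added for quotes are not doubled.
    return string.replace('\\', '\\\\').replace('"', '\\"')
```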
- scholia.utils.pages_to_number_of_pages(pages)
Compute number of pages based on pages represented as string.
- Parameters:
pages (str) – Pages represented as a string.
- Returns:
number_of_pages – Number of pages returned as an integer. If the conversion is not possible then None is returned.
- Return type:
int or None
Examples
>>> pages_to_number_of_pages('61-67')
7
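The computation behind the example is the inclusive length of the page range. The sketch below assumes a plain first-last range with a hyphen or en dash; the function name is hypothetical and other formats that the real function may accept are not covered.

```python
import re


def pages_to_number_of_pages_sketch(pages):
    """Return the inclusive length of a 'first-last' page range, or None."""
    match = re.fullmatch(r'\s*(\d+)\s*[-\u2013]\s*(\d+)\s*', pages)
    if not match:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        # Abbreviated or reversed ranges are left unresolved here.
        return None
    return last - first + 1
```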
- scholia.utils.remove_special_characters_url(url)
Remove url encoded characters and normalize non-ascii characters.
- Parameters:
url (str) – URL-encoded string
- Returns:
formatted_string – Normalized string without non-ascii characters or spaces
- Return type:
str
- scholia.utils.sanitize_q(q)
Sanitize Wikidata identifier.
- Parameters:
q (str or int) – Wikidata identifier as string.
- Returns:
sanitized_q – Sanitized Wikidata identifier, empty if not a Wikidata identifier.
- Return type:
str
Examples
>>> sanitize_q(' Q5 ')
'Q5'
>>> sanitize_q('Q5"')
'Q5'
>>> sanitize_q('Wikidata')
''
>>> sanitize_q(5)
'Q5'
>>> sanitize_q('5')
'Q5'
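One way to obtain the results in these examples is to keep only the first run of digits and prefix it with Q. This is a sketch under that assumption, with a hypothetical function name; the real sanitize_q may apply stricter rules.

```python
import re


def sanitize_q_sketch(q):
    """Return 'Q' plus the first run of digits in q, or '' if none."""
    match = re.search(r'\d+', str(q))
    return 'Q' + match.group() if match else ''
```

Discarding everything except the digits means stray quotes or whitespace cannot end up in a query built from the sanitized value.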
- scholia.utils.string_to_list(string)
Convert comma/space/tab/pipe separated string to list.
- Parameters:
string (str) – Query string.
- Returns:
elements – List of strings split based on separators
- Return type:
list of str
Examples
>>> string_to_list("1, 2 | 3\t4 |5")
['1', '2', '3', '4', '5']
>>> string_to_list(" 10.10,abc|123 ")
['10.10', 'abc', '123']
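The splitting shown in the examples can be sketched with a single regular expression over the separator characters. The function name is hypothetical and the real string_to_list may be implemented differently.

```python
import re


def string_to_list_sketch(string):
    """Split on runs of whitespace, commas and pipes, dropping empties."""
    return [element for element in re.split(r'[\s,|]+', string) if element]
```

Treating a run of separators as one split point is what keeps inputs like "2 | 3" from producing empty elements.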
- scholia.utils.string_to_type(string)
Guess type of string.
- Parameters:
string (str) – Query string.
- Returns:
result
- Return type:
str
Examples
>>> string_to_type('1121-4545')
'issn'
scholia.wikipedia module
wikipedia.
- Usage:
scholia.wikipedia q-to-bibliography-templates <q> [options]
- Options:
- --debug
Debug messages.
-h --help  Help message
--oe=encoding  Output encoding [default: utf-8]
-o --output=<file>  Output filename, default output to stdout
--verbose  Verbose messages.
Examples
$ python -m scholia.wikipedia q-to-bibliography-templates --debug Q20980928
- scholia.wikipedia.main()
Handle command-line interface.
- scholia.wikipedia.q_to_bibliography_templates(q)
Construct bibliography for Wikidata based on Wikidata identifier.
- Parameters:
q (str) – String with Wikidata item identifier.
- Returns:
wikitext – String with wikipedia template formatted bibliography.
- Return type:
str
References
https://en.wikipedia.org/wiki/Template:Cite_journal
Examples
>>> wikitext = q_to_bibliography_templates("Q28923929")
>>> wikitext.find('Cite journal') != -1
True
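To give an idea of the kind of wikitext involved, the sketch below formats a flat metadata dict as a Cite journal template. It is an assumption for illustration only: the function name and the input dict are hypothetical, only a few of the template's parameters are used, and the wikitext produced by q_to_bibliography_templates may be laid out differently.

```python
def work_to_cite_journal_sketch(work):
    """Format a flat metadata dict as a {{Cite journal}} template.

    Values containing a pipe character would need escaping, which this
    sketch does not do.
    """
    fields = ['title', 'journal', 'volume', 'pages', 'doi']
    # Leave out parameters for which no value is available.
    parts = ['{} = {}'.format(field, work[field])
             for field in fields if work.get(field)]
    return '{{Cite journal | ' + ' | '.join(parts) + ' }}'


wikitext = work_to_cite_journal_sketch({
    'title': 'An example title',
    'journal': 'An example journal',
    'pages': '61-67',
    'doi': '10.1000/example',
})
```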