Wikipedia Module in Python

In this article, we will discuss the Wikipedia module in Python and how to use it in a Python script to fetch a variety of information from Wikipedia.

Introduction

The internet is one of the most significant sources of information: with an internet connection, a vast amount of knowledge is just one click away. It is therefore important to know how to gather correct information from the right source. Retrieving information from websites with a program is called data scraping. Most of us have used Wikipedia, which is packed with information.

Wikipedia is one of the largest reference websites on the internet and contains a huge amount of information. It is a free encyclopedia maintained by a community of volunteer editors using a wiki-based editing system, and it is available in many languages.

The third-party wikipedia package for Python provides a simple interface for extracting data from Wikipedia pages. It lets us fetch and parse information from Wikipedia. In simple words, it works as a lightweight scraper intended for retrieving modest amounts of data rather than for bulk downloads. Before we start working with it, we need to install the module on our local machine.

Installation

This module wraps the MediaWiki API that Wikipedia exposes. In the first step, we install the module with pip by typing the following command in the terminal: pip install wikipedia

This command installs the module on the system. Next, we import it in our script with the statement import wikipedia.

Now we are ready to extract data from Wikipedia.

Getting Started with Wikipedia Module

The wikipedia module provides several built-in functions that help us get the desired information.

Search Title and Result

The search() method of the Python wikipedia module searches Wikipedia for the query passed as its argument. It returns a list of matching article titles, up to 10 by default. Let's look at the following example.

Example –

Output:

['India', 'Constitution of India', 'Demographics of India', 'Languages of India', 'Republic Day (India)', 'Government of India', 'Economy of India', 'History of India', 'The Times of India', 'List of prime ministers of India']

As we can see in the above output, the method returned a list of article titles related to the query. We can limit the number of titles by passing a value for the results parameter. Consider the following example.

Example –

Output:

['India', 'Constitution of India', 'Demographics of India', 'Languages of India']

The above code printed only four results because we requested four.

Suggestion

As the name suggests, the suggest() method returns a suggested Wikipedia title for the query, or None if it doesn't find one. Let's see the following example.

Example –

Output:

None

In the above code, we searched for "Coronavirus" but typed it with the wrong spelling. The suggest() method returned None because Wikipedia did not offer a suggestion for that query.

Summary of the Article

The Python wikipedia module provides the summary() method, which returns the summary (the introductory section) of an article as a string. Its most commonly used arguments are title, the article to summarize, and sentences, which limits how many sentences are returned. Let's consider the below example.

Example –

Output:

Rohit Gurunath Sharma (born 30 April 1987) is an Indian international cricketer who plays for Mumbai in domestic cricket and captains Mumbai Indians in the Indian Premier League as a right-handed batsman and an occasional right-arm off break bowler. He is the vice-captain of the Indian national team in limited-overs formats.
Outside cricket, Sharma is an active supporter of animal welfare campaigns. He is the official Rhino Ambassador for WWF-India and is a member of People for the Ethical Treatment of Animals (PETA).

The summary of the given title was printed, and we customized the number of sentences displayed by using the sentences argument.

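Keep in mind that the summary() method raises a DisambiguationError when the title is ambiguous, that is, when it leads to a disambiguation page listing several possible articles. (When no page matches the title at all, it raises a PageError instead.) Let's understand the following example.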

Example – the following call, print(wikipedia.summary("key")), passes the ambiguous title "key":

Output:

Traceback (most recent call last):
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/pillow_image.py", line 194, in <module>
    print(wikipedia.summary("key"))
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/venv/lib/site-packages/wikipedia/util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/venv/lib/site-packages/wikipedia/wikipedia.py", line 231, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/venv/lib/site-packages/wikipedia/wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "C:/Users/DEVANSH SHARMA/PycharmProjects/MyPythonProject/venv/lib/site-packages/wikipedia/wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "Key" may refer to: 
Key (cryptography)
Key (lock)
Key (map)
typewriter
test
Cay
Key, Alabama
Key, Ohio
Key, West Virginia
Keys, Oklahoma
Florida Keys

Extracting the Text Content of a Page

We can get the complete plain-text content of a Wikipedia page, excluding images, tables, etc. The page object returned by page() provides it through its content attribute. Let's see the following example.

Example –

Output:

Sachin Ramesh Tendulkar ( (listen); born 24 April 1973) is an Indian former international cricketer who served as captain of the Indian national team. He is widely regarded as one of the greatest batsmen in the history of cricket. He is the highest run scorer of all time in International cricket. Considered as the world's most prolific batsman of all time, he is the only player to have scored one hundred international centuries, the first batsman to score a double century in a One Day International (ODI), the holder of the record for the most runs in both Test and ODI cricket, and the only player to complete more than 30,000 runs in international cricket. In 2013, he was the only Indian cricketer included in an all-time Test World XI named to mark the 150th anniversary of Wisden Cricketers' Almanac.
............

Getting Full Wikipedia Page Data

The Python wikipedia module allows us to get a full Wikipedia page using the page() function. It returns a WikipediaPage object whose attributes give the page content, categories, coordinates, images, links and other metadata. Let's understand the following example.

Example –

Output:

<WikipediaPage 'United States'>
United States
['.as', '.com', '.edu', '.gov', '.gu', '.mil', '.mp', '.net', '.org', '.pr', '.um', '.us', '.vi', '100th meridian west', '117th United States Congress', '1790 United States Census', '1800 United States Census', '1810 United States Census', '1820 United States Census', '1830 United States Census']

Customizing the Page Language

We can change the language edition of Wikipedia that the module queries. The set_lang() method is used for this. Each language has a standard prefix code, such as "hi" for Hindi, which is passed as the argument. Let's understand the following example.

Example –

Output:

[Hindi-language summary of the article on the Python programming language; the Devanagari text was lost to a character-encoding error in this copy.]

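As we can see in the above output, the summary was fetched from the Hindi edition of Wikipedia. Note that set_lang() does not translate text; it switches the language edition the module queries, and we can select any available edition by passing its prefix.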

Conclusion

We have covered the main features of the wikipedia module using Python code. We have also discussed how to retrieve various kinds of information, such as a page's title, summary, categories and content, from the web.


Original article by ItWorker. If reposting, please credit the source: https://blog.ytso.com/263570.html
