
Abstract—There is an urgent need for efficient and effective retrieval
techniques. Finding relevant information is not the only task of Information
Retrieval systems; they are also expected to rank or organize the retrieved
information according to its degree of relevance to the given query. The main
problem in ranking is to classify which documents are relevant and which are
irrelevant. Existing ranking techniques mostly rely on keywords to judge the
relevance of the data to the given query, where relevance is defined in terms
of the number of times the query words appear in the document, i.e. term
frequency. A semantic search framework was adopted in several studies to
handle this issue. Exploiting domain ontology, a semantic search framework
uses the contextual meaning of terms in documents and queries to match
ambiguous terms. Here, a semantic search framework for semantic pre-processing
of documents and queries is adopted, which considers relationships among the
words in the document and in the query (e.g. is-a, part-of, sub-part, etc.)
with the help of a semantic network generation algorithm. By exploiting
relationships among terms, the possibility of matching users' information
needs with documents can be increased.

Index Terms—Domain ontology, Engineering
document, Personalized search, Ranking, Semantic search.



Engineering document retrieval is crucial to improving engineers' productivity.
During the product development process, a large number of documents are
generated and accessed by engineers. Existing document retrieval approaches
need to be improved to satisfy engineers' information needs. Most manufacturing
corporations use a keyword-based search approach due to its popularity and ease
of use. A keyword-based search approach retrieves documents by exact term
matching. It has difficulty finding engineering documents because they differ
from general documents in their syntax variations and semantic complexities:
abbreviations and acronyms are frequently used, and brevity is a characteristic
of engineering documents. These characteristics make it difficult to retrieve
documents relevant to users' information needs. A semantic search framework
was adopted in several studies to handle this issue. Exploiting domain
ontology, a semantic search framework uses the contextual meaning of terms in
documents and queries to match ambiguous terms. Moreover, the semantic search
framework can enhance retrieval performance by applying several techniques,
such as query expansion, document partitioning, or document ranking. Among
these techniques, it is valuable to investigate and improve existing document
ranking techniques, because with a domain ontology the semantics of a document
can be represented precisely and its relevance estimated correctly. Thus, a
new document ranking technique based on domain ontology is used.

In such an approach, since documents and queries are represented by graphs
using an ontology, dependencies among terms can be explicitly recognized as
specific relationships. Also, by exploiting the ontology, it is possible to
recover non-matched terms that are semantically related to matched terms in an
engineering document; these terms can be important factors of document
relevance. Thus, an ontology-based approach is an appropriate solution for
engineering document ranking. However, since existing ontology-based ranking
approaches use a naive measurement of document relevance, studies of a more
elaborate weighting scheme are necessary. Semantic search seeks to improve the
accuracy of the search engine by understanding the searcher's intent and the
contextual meaning of the terms in the query, so as to retrieve more relevant
results. To find the semantic similarity between the query terms, WordNet is
used as the underlying reference database. Various learning-to-rank approaches
are compared. In the semantic network research area, there have been several
studies on relation-weighting algorithms. These studies have proposed ranking
methodologies for multiple semantic routes between a source node and a
destination node on a huge semantic network. For this purpose, weighting
schemes for relations are proposed to evaluate the relevance of the semantic
routes. Thus, the proposed relation weighting schemes can be rich sources of
inspiration for our document ranking approach.

A. Background

Existing keyword-based approaches do not consider relationships among terms,
even though terms in documents have relationships (e.g. is-a, part-of, etc.).

Matched terms between a query and a document are dominant factors of the
document's relevance score to the query.


Query expansion (QE) is considered a viable solution: it expands the query
keywords with related terms. The analytic hierarchy process (AHP) is an
effective tool for dealing with complex decision making; it helps the decision
maker determine the priorities of the criteria used.

Learning to rank is applied to automatically learn a ranking function from the
training data. In [2], a new hybrid learner based on NN and SVM is introduced
that gives better performance than learning using NN alone.

The system proposed in [3] presents an improved semantic similarity technique
to rank a web page from a given set of web pages; it accesses the user history
to rank the web pages according to the user query. An online interface is
developed using web technologies and is being used as a programming language
tool [3].

The algorithm proposed in [4] calculates the page rank value, or importance,
of web pages based on the visits of incoming links on a page. It was observed
that a page with more visits of incoming links carries a higher rank value
than less visited pages.

The method proposed in [5] uses the concepts, and the relationships between
the concepts, that exist both in the document and in the user query to improve
the retrieval of relevant documents. A different method is used for keyword
extraction, which leads to better results.

The work in [6] explores the problem of maintaining the semantic relationship
between different plain documents over the related encrypted documents and
gives a design method to enhance the performance of semantic search. In [7],
a new ranking strategy known as SemRank is proposed, which uses the SI measure
to calculate the image relevancy weights against the query. It has the
advantage that it can employ the semantics inside the image and the query in
determining the ranking order.


In the proposed work, a document's relevance score is calculated using a
semantic network generated from the document. If the semantic network of a
document contains more relations that are related to the user's information
needs, the document gets a higher relevance score. To calculate the weight of
each relation against the information needs, the proposed approach exploits
multiple measures. Also, user preference information, called the user profile,
is used to personalize the ranking results. The work can be divided into two
steps: (1) generating a document semantic network for a document and (2)
measuring the relevance score of the document semantic network. When a query
is submitted, documents are retrieved by Boolean model matching. To rank the
retrieved documents, a document semantic network is generated for each
retrieved document. Then, the relevance score of each document is calculated
based on its document semantic network. In this calculation, the user profile
is also exploited for personalized relevance scoring. After that, the
retrieved documents are sorted in decreasing order of relevance score.
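
The retrieve-then-rank flow described above can be sketched as follows. This is a minimal illustration, not the proposed system itself: the function names, the toy collection, and the placeholder scoring function are hypothetical, and the Boolean model is assumed here to be a simple AND over the query terms. In the proposed work the score would come from the document semantic network and the user profile.

```python
from typing import Callable, Dict, List, Set, Tuple


def boolean_retrieve(query_terms: Set[str], docs: Dict[str, Set[str]]) -> List[str]:
    """Return ids of documents that contain every query term (assumed Boolean AND)."""
    return [doc_id for doc_id, terms in docs.items() if query_terms <= terms]


def rank_documents(
    query_terms: Set[str],
    docs: Dict[str, Set[str]],
    score: Callable[[str, Set[str]], float],
) -> List[Tuple[str, float]]:
    """Retrieve by Boolean matching, score each hit, sort by decreasing score."""
    hits = boolean_retrieve(query_terms, docs)
    scored = [(doc_id, score(doc_id, query_terms)) for doc_id in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection: document id -> set of terms (hypothetical data).
DOCS = {
    "d1": {"pump", "valve", "seal"},
    "d2": {"pump", "motor"},
    "d3": {"pump", "valve", "motor", "seal"},
}


def toy_score(doc_id: str, query_terms: Set[str]) -> float:
    """Placeholder relevance score: simply the number of terms in the document."""
    return float(len(DOCS[doc_id]))


ranking = rank_documents({"pump", "valve"}, DOCS, toy_score)
print(ranking)
```

Passing the scoring function as a parameter keeps the retrieval step independent of how relevance is measured, so a network-based score can replace the placeholder without touching the rest.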

A. System Architecture

Fig. 1. System Architecture


Importance measurement for terms in a document

To decide which terms are minor or major information elements in the document,
term weighting schemes using three measures are introduced. The three measures
are (1) structural score, (2) tfidf score, and (3) semantic score.

The structural score of term Iij , StS (Iij) , is computed


Second, the tfidf score for term Iij is
calculated using the term frequency and the document frequency. Term frequency
(tf) and inverse document frequency (idf) are the foundations of the most
popular term weighting scheme in IR. The tfidf score of Iij , tfidf(Iij), is
computed as:

$$\mathrm{tfidf}(I_{ij}) = \log\bigl(tf(I_{ij}, d_j) + 1\bigr) \cdot \log\left(\frac{|D|}{1 + df(I_{ij}, D)}\right) \tag{2}$$

where tf(Iij, dj) is the frequency of term Iij within document dj and
df(Iij, D) is the number of documents in the collection D that contain term
Iij. Thus, terms with a high tf and a low df get high tfidf scores.
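
As a small worked example of the tfidf score in (2), the sketch below computes it for a toy collection. The function name and the data are hypothetical, and the natural logarithm is assumed because the text does not state a base.

```python
import math
from typing import List


def tfidf(term: str, doc: List[str], collection: List[List[str]]) -> float:
    """Compute log(tf + 1) * log(|D| / (1 + df)) for one term in one document."""
    tf = doc.count(term)                                # frequency of the term in the document
    df = sum(1 for d in collection if term in d)        # number of documents containing the term
    return math.log(tf + 1) * math.log(len(collection) / (1 + df))


# Hypothetical collection of four tokenized documents.
COLLECTION = [
    ["pump", "gasket", "gasket"],
    ["pump", "valve"],
    ["pump", "motor"],
    ["valve", "motor"],
]

rare = tfidf("gasket", COLLECTION[0], COLLECTION)   # tf = 2, df = 1
common = tfidf("pump", COLLECTION[0], COLLECTION)   # tf = 1, df = 3
print(rare, common)
```

Here the rare term gets a positive score, while the term that occurs in three of the four documents scores zero because |D| / (1 + df) equals 1. With this form of the formula, a term that occurs in every document gets a negative score.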

Third, the semantic score of a term Iij, SSR(Iij), is computed as:


where SD(Iij, Ikj), called the semantic distance, is the minimum number of
hops between Iij and Ikj in the ontology, and R is a parameter that defines
the semantic distance range. The semantic score measures how far each term is
from the core semantics of a document. Through this measure, briefly mentioned
terms in a document can be revealed. The importance score function IS(Iij), a
linear combination of the three measures, is then calculated as:

$$IS(I_{ij}) = w_1 \cdot StS(I_{ij}) + w_2 \cdot \mathrm{tfidf}(I_{ij}) + w_3 \cdot SS_R(I_{ij}) \tag{4}$$
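
The semantic distance and the linear combination in (4) can be illustrated with the sketch below. It rests on several assumptions: the ontology is modeled as an undirected adjacency map, the function names and example values are hypothetical, and the three component scores are passed in as plain numbers because the formulas for the structural and semantic scores are not reproduced here.

```python
from collections import deque
from typing import Dict, Optional, Set


def semantic_distance(ontology: Dict[str, Set[str]], source: str, target: str) -> Optional[int]:
    """Minimum number of hops between two terms, found by breadth-first search."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in ontology.get(node, set()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # the two terms are not connected in the ontology


def importance_score(sts: float, tfidf: float, ssr: float,
                     w1: float, w2: float, w3: float) -> float:
    """Linear combination of the three measures; weights lie in [0, 1] and sum to 1."""
    weights = (w1, w2, w3)
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return w1 * sts + w2 * tfidf + w3 * ssr


# Hypothetical ontology fragment; each relation is stored in both directions.
ONTOLOGY = {
    "engine": {"piston", "crankshaft"},
    "piston": {"engine", "ring"},
    "ring": {"piston"},
    "crankshaft": {"engine"},
}

hops = semantic_distance(ONTOLOGY, "engine", "ring")
score = importance_score(0.5, 0.2, 0.8, w1=0.4, w2=0.3, w3=0.3)
print(hops, score)
```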

In (4), w1, w2, and w3 are weighting parameters in the interval [0, 1], and
their sum is 1.

Document semantic network generation for a document

The main goal of the DSN generation process is to connect separated
term-groups so as to represent the semantics of a document precisely. To make a
DSN in which all major terms in the document are connected, the algorithm
iteratively adds terms one by one until they form a connected graph. Since some
terms can be omitted from a document for brevity, those terms should also be
considered in the relevance measuring process.

To provide personalized ranking results, we exploit users' highly accessed
documents to build a user profile. Using the algorithm, the DSNs of the
accessed documents are combined to form a connected graph. Thus, the user
profile, a portion of the ontology, represents which parts the user frequently
handles and which properties the user mainly considers.
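
One possible reading of the DSN generation step is sketched below. The text does not specify how the next term is chosen or how terms are linked, so this sketch assumes a greedy strategy: it repeatedly picks the major term that is closest to the current network in the ontology and adds the shortest connecting path, which also pulls in intermediate terms that the document omitted. The names `shortest_path` and `build_dsn` and the example ontology are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def shortest_path(ontology: Graph, sources: Set[str], target: str) -> Optional[List[str]]:
    """Breadth-first search from any node in `sources` to `target`; returns the path."""
    if target in sources:
        return [target]
    parent: Dict[str, Optional[str]] = {s: None for s in sources}
    queue = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        for neighbour in sorted(ontology.get(node, set())):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == target:
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # target cannot be reached from the current network


def build_dsn(ontology: Graph, major_terms: List[str]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Greedily connect major terms into one graph (assumed strategy, see lead-in)."""
    if not major_terms:
        return set(), set()
    nodes: Set[str] = {major_terms[0]}
    edges: Set[Tuple[str, str]] = set()
    remaining = list(major_terms[1:])
    while remaining:
        paths = [(term, shortest_path(ontology, nodes, term)) for term in remaining]
        reachable = [(term, path) for term, path in paths if path is not None]
        if not reachable:
            break  # the remaining terms are disconnected from the network
        term, path = min(reachable, key=lambda item: len(item[1]))
        nodes.update(path)  # includes inferred intermediate terms
        for a, b in zip(path, path[1:]):
            edges.add(tuple(sorted((a, b))))
        remaining.remove(term)
    return nodes, edges


# Hypothetical ontology fragment; each relation is stored in both directions.
ONTOLOGY = {
    "engine": {"piston", "crankshaft"},
    "piston": {"engine", "ring"},
    "ring": {"piston"},
    "crankshaft": {"engine", "bearing"},
    "bearing": {"crankshaft"},
}

dsn_nodes, dsn_edges = build_dsn(ONTOLOGY, ["ring", "bearing"])
print(sorted(dsn_nodes), sorted(dsn_edges))
```

In this example the document mentions only "ring" and "bearing", and the resulting network also contains "piston", "engine", and "crankshaft" as inferred terms. Under the same assumption, a user profile could be built by running the same procedure over the major terms of all accessed documents.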


The key idea behind the proposed approach is that relationships between terms
in documents are meaningful sources for measuring relevance. The two main
contributions of this seminar are (1) to represent the semantics of a document
using a document semantic network, in which major terms and their inferred
relationships are included, and (2) to propose relation-based weighting
measures and a ranking function that can consider user interest and intent at
the same time. For this purpose, an ontology-based user profile and user task
context information are exploited. By exploiting relationships among terms,
the possibility of matching a user's information needs with documents can be
increased. By suggesting precise retrieval results relevant to the user's
information needs, the proposed approach contributes substantially to
improving users' productivity.
