Do Intelligent Robots Need Emotion?

What's your opinion?

A Study on Information Retrieval Methods in Text Mining

.

ABSTRACT:

Information in the legal domain is often stored as text in relatively unstructured forms. For example, statutes, judgments and commentaries are typically stored as free-text documents. Discovering knowledge by the automatic analysis of free text is a field of research that is evolving from information retrieval and is often called text mining.

Dozier states that text mining is a new field and that there is still debate about what its definition should be. Jackson observes that text mining involves discovering something interesting about the relationship between the text and the world. Hearst proposes that text mining involves discovering relationships between the content of multiple texts and linking this information together to create new information.

Text information retrieval and data mining have thus become increasingly important. In this paper, various information retrieval techniques based on text mining are presented.

.

KEYWORDS:

Information Retrieval, Information Extraction and Indexing Techniques
.


Read More

How to prepare for writing a research paper on any topic in computer science?

 


.

Start reading the literature.


http://scholar.google.com is your friend. I’ve also found Microsoft Academic useful. Type in the topic area, whatever it is.


Begin looking at the links that are returned. Read the abstracts. Download the papers that interest you - when I download them I add the date of publication and the title of the paper as the file name (because the original file names often give you no contextual clue as to the contents). It is also good practice to capture the bibliographic data (I grab the BibTeX, since I write my CS papers in LaTeX) because you will need it when you start writing.


Start reading the interesting papers. Usually by the time you are done reading the introduction you will know if you want to read further. If you really liked the paper, then use the search engines to find papers that reference the paper you like.


If you are looking for the latest research, restrict your search to the past four years. I usually work backwards from the more recent work to older work. Sometimes I find survey articles, which can be great. Usually I will also find a few papers that are heavily cited (100+ citations), which is often indicative of the importance of the work.


Summarize those papers. For the ones you thought most relevant try to note what you learned from the paper. What did you like? What did you not like? Were there any areas they failed to address? Did they conflict with other work in the field? If so, why?


If you are looking for a research question try to identify something the paper didn’t do: maybe a technique you think they could have used, or a dataset, or specific equipment. Look to see if any of the other papers have done that. If not, you now have a potential research question: “What happens if we use X when addressing problem Y.” Then think up ways that you can “fail fast” - how can you see if there is merit to that approach. If not, you want to know quickly so you can discard that approach before you spend too much time on it. If it looks promising, then push further and see where your research takes you.


As you do this work, write it up. If you are really disciplined, you will start writing your research paper before you have done the research. That will help you think about how you want to present your findings, which in turn helps you focus on what you need to do to generate the necessary data. Of course, you are likely to find out things don’t work the way that you wanted and you’ll have to rewrite your paper. This way you also have a history of what you did. When you’re done, you’ll have a research paper.


That’s the point at which you’ll have to tear it apart. Criticize it. Find its weaknesses. Think about how you will fix those weaknesses or address them (“we did not investigate X and leave it for future work…”). Then ask other people to read it - they won’t have any of your insight, so if you aren’t explaining it so they can understand why your work is important (the abstract and introduction!), then you need to go back and rewrite those sections.


At some point you decide you are done with the paper: the time allotted to it has expired, the work has been accepted for publication, or you’re sick of it and want to do anything else.


Good luck!


.

Tony Mason, PhD from The University of British Columbia (2022)

.

https://www.quora.com/How-do-I-prepare-for-writing-a-research-paper-on-any-topic-in-computer-science/answer/Tony-Mason-10

.

Read More

Introduction to Word Vectors

 


.

Word vectors have enabled breakthroughs in NLP applications and research. They highlight the beauty of neural-network deep learning and the power of learned representations of input data in hidden layers.

.

Word Vectors

Word vectors represent a significant leap forward in advancing our ability to analyze relationships across words, sentences, and documents. In doing so, they advance technology by providing machines with much more information about words than has previously been possible using traditional representations of words. It is word vectors that make technologies such as speech recognition and machine translation possible. There are many excellent explanations of word vectors, but in this one, I want to make the concept accessible to data and research people who aren't very familiar with natural language processing (NLP).


What Are Word Vectors?

Word vectors are simply vectors of numbers that represent the meaning of a word. For now, that's not very clear, but we'll come back to it in a bit. It is useful, first of all, to consider why word vectors are considered such a leap forward from traditional representations of words.


Traditional approaches to NLP, such as one-hot encoding and bag-of-words models (i.e., using dummy variables to represent the presence or absence of a word in an observation such as a sentence), while useful for some machine learning (ML) tasks, do not capture information about a word's meaning or context. This means that potential relationships, such as contextual closeness, are not captured across collections of words. For example, a one-hot encoding cannot capture simple relationships, such as determining that the words "dog" and "cat" both refer to animals that are often discussed in the context of household pets. Such encodings often provide sufficient baselines for simple NLP tasks (for example, email spam classifiers), but lack the sophistication for more complex tasks such as translation and speech recognition. In essence, traditional approaches to NLP such as one-hot encodings do not capture syntactic (structure) and semantic (meaning) relationships across collections of words and, therefore, represent language in a very naive way.
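To make this limitation concrete, here is a tiny pure-Python sketch (the three-word vocabulary is invented for illustration): any two distinct one-hot vectors are orthogonal, so "dog" looks exactly as unrelated to "cat" as it does to "banana".

```python
# Toy illustration: one-hot encodings carry no notion of similarity.
vocab = ["dog", "cat", "banana"]

def one_hot(word, vocab):
    """Return a one-hot vector: 1 at the word's index, 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

dog, cat = one_hot("dog", vocab), one_hot("cat", vocab)

# Every pair of distinct one-hot vectors has dot product 0, so the
# encoding says nothing about which words are related.
print(dot(dog, cat))                        # 0
print(dot(dog, one_hot("banana", vocab)))   # 0
```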


In contrast, word vectors represent words as vectors of continuous floating-point numbers in a multidimensional space, where semantically similar words are mapped to proximate points. In simpler terms, a word vector is a row of real-valued numbers (as opposed to dummy numbers) where each point captures a dimension of the word's meaning and where semantically similar words have similar vectors. This means that words such as wheel and engine should have word vectors similar to that of car (because of the similarity of their meanings), whereas the word banana should be quite distant. Put differently, words that are used in similar contexts will be mapped to proximate points in vector space (we will get to how these word vectors are created below). The beauty of representing words as vectors is that they lend themselves to mathematical operations. For example, we can add and subtract vectors - the canonical example here is showing that by using word vectors we can determine that:


king - man + woman = queen


In other words, we can subtract one meaning from the word vector for king (i.e. maleness), add another meaning (femaleness), and show that this new word vector (king - man + woman) maps most closely to the word vector for queen.
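This arithmetic can be illustrated with a toy example. The vectors below are hand-picked for this sketch, with two invented dimensions (roughly "royalty" and "femaleness"); real word vectors have hundreds of dimensions learned from data, but the mechanics of the analogy are the same.

```python
import math

# Hand-picked toy vectors: [royalty, femaleness]. Values are
# illustrative only, not learned from any corpus.
vectors = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.2, 0.1],
    "woman": [0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Compute king - man + woman, component by component.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# The candidate word whose vector lies closest to the result is "queen".
nearest = max(("queen", "man", "woman"),
              key=lambda w: cosine(target, vectors[w]))
print(nearest)  # queen
```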


The numbers in the word vector represent the word's distributed weight across dimensions. In a simplified sense, each dimension represents a meaning and the word's numerical weight on that dimension captures the closeness of its association with and to that meaning. Thus, the semantics of the word are embedded across the dimensions of the vector.


A Simplified Representation of Word Vectors

.

https://dzone.com/articles/introduction-to-word-vectors

.

Read More

Undergraduates' Interpretation on WhatsApp Smiley Emoji

.

ABSTRACT:

Undergraduate students, who are digital natives, are keen on using emoji (smileys and ideograms) to express themselves emotionally in digital communication such as WhatsApp Messenger. 

Nevertheless, misunderstandings sometimes arise because the sender and the recipient interpret an emoji differently. 

Research investigating emoji is still relatively new, and this study discusses the diverse interpretations of WhatsApp emoji, specifically the smileys, among Malaysian undergraduates at a public university. 

This study investigated 210 undergraduates' interpretations of the meanings of 75 smiley (face-like) emoji in WhatsApp Messenger. 

The respondents gave feedback in a self-administered survey questionnaire designed to gather information on their interpretation of the smileys used in WhatsApp. 

A descriptive analysis of the students' interpretations disclosed that although the students interpreted a few smileys correctly, they did not know the intended meanings of most of them. 

The results suggest that students should know the meaning of the smileys/emoji used in their digital conversations in order to understand their intended use and avoid miscommunication. 

For WhatsApp users, the findings emphasize the need to understand an emoji's intended meaning for more tolerant and wiser use in the future.

.

KEYWORD:

WhatsApp, emoji, smiley, interpretation, undergraduate.

.

PUBREF:

https://ejournals.ukm.my/mjc/article/view/22621/7134

.

SEMREF:

https://www.semanticscholar.org/paper/Undergraduates'-Interpretation-on-WhatsApp-Smiley-Annamalai-Salam/ab53467155181400fb56e64d8c8a19800dad6e2e

.

RGTREF:

https://www.researchgate.net/publication/321969640_Undergraduates%27_Interpretation_on_WhatsApp_Smiley_Emoji

.

DOCREF:

http://repo.uum.edu.my/24505/1/JK%2033%204%202017%2089%20103.pdf

.

GIDREF:

https://app.razzi.my/findgref?gid=1XORRFSqiCVSsWHf6YxvqJ8QTT7y1oHXj

.

Read More

Do Intelligent Robots Need Emotion?

 .

Do Intelligent Robots Need Emotion?

Luiz Pessoa

PMID: 28735707 PMCID: PMC6237080 DOI: 10.1016/j.tics.2017.06.010

Abstract

What is the place of emotion in intelligent robots? Researchers have advocated the inclusion of some emotion-related components in the information-processing architecture of autonomous agents. It is argued here that emotion needs to be merged with all aspects of the architecture: cognitive-emotional integration should be a key design principle.

.

https://www.sciencedirect.com/science/article/abs/pii/S1364661317301341

Read More

How To Build Word Embeddings

 .

Word embeddings are a type of word representation that allows words with similar meaning to have a similar representation.

They are a distributed representation for text that is perhaps one of the key breakthroughs for the impressive performance of deep learning methods on challenging natural language processing problems.

In this post, you will discover the word embedding approach for representing text data.

After completing this post, you will know:

(1) What the word embedding approach for representing text is and how it differs from other feature extraction methods.

(2) That there are 3 main algorithms for learning a word embedding from text data.

(3) That you can either train a new embedding or use a pre-trained embedding on your natural language processing task.
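As a concrete sketch of the core idea (not of any particular library's API), an embedding layer can be seen as nothing more than a learned lookup table: a weight matrix with one row per vocabulary word, which is mathematically equivalent to multiplying a one-hot vector by that matrix. The toy vocabulary and random weights below are placeholders; training would adjust the weights so that similar words end up with similar rows.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat"]   # toy vocabulary, invented for this sketch
embedding_dim = 4

# The embedding "layer" is just a (vocab_size, embedding_dim) weight
# matrix. Here the values are random placeholders; in practice they are
# learned, either by word2vec/GloVe or as part of a larger network.
W = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    """Look up a word's vector: a plain row lookup in W."""
    return W[vocab.index(word)]

# Equivalent formulation: one-hot vector times the weight matrix.
onehot = np.zeros(len(vocab))
onehot[vocab.index("cat")] = 1.0

print(embed("cat").shape)  # (4,)
```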

.

https://machinelearningmastery.com/what-are-word-embeddings/

.

How to Use Word Embedding Layers for Deep Learning with Keras

https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/

Read More

How to Develop Word Embeddings in Python with Gensim

 .

Word embeddings are a modern approach for representing text in natural language processing.

Word embedding algorithms like word2vec and GloVe are key to the state-of-the-art results achieved by neural network models on natural language processing problems like machine translation.

In this tutorial, you will discover how to train and load word embedding models for natural language processing applications in Python using Gensim.

After completing this tutorial, you will know:

(1) How to train your own word2vec word embedding model on text data.

(2) How to visualize a trained word embedding model using Principal Component Analysis.

(3) How to load pre-trained word2vec and GloVe word embedding models from Google and Stanford.

.

https://machinelearningmastery.com/develop-word-embeddings-python-gensim/

Read More