Topic Modeling Best Practices

Analysis: Studying the Properties of Topic Models with Different Alpha Values

In a topic model, the hyper-parameter alpha governs how topics are distributed across the documents. A higher alpha means that a topic will be distributed more widely across the documents, whereas a lower alpha means that a topic will be distributed more narrowly. Wallach et al. (2009) argue that attention to the setting of alpha is important in constructing a robust topic model. Yet many studies that utilize topic models simply leave alpha at its default value (e.g., Carron-Arthur et al., 2016; Székely and vom Brocke, 2017). In gensim the default value for alpha is 'symmetric', which means that alpha is the same for every topic. gensim calculates the symmetric value by dividing 1.0 by the number of topics in the model, so a model with 75 topics has alpha set to 1/75, or approximately 0.013.
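To make the symmetric default concrete, the sketch below computes 1.0 divided by the number of topics and then draws simulated document-topic distributions from a Dirichlet prior, once with alpha = 1/75 and once with alpha = 0.5. It uses only the Python standard library rather than gensim. The helper names `symmetric_alpha`, `sample_dirichlet`, and `mean_largest_share` are hypothetical, and draws from the prior only illustrate the tendency described above; they do not reproduce what a trained model infers from the corpus.

```python
import random

def symmetric_alpha(num_topics):
    """Uniform prior: every topic gets 1.0 / num_topics."""
    return [1.0 / num_topics] * num_topics

def sample_dirichlet(alpha, rng):
    """Draw one topic distribution from Dirichlet(alpha) via normalized gamma draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        if total > 0:  # guard against numerical underflow for very small alpha
            return [d / total for d in draws]

def mean_largest_share(alpha, rng, n_docs=500):
    """Average weight of the single most prominent topic across simulated documents."""
    return sum(max(sample_dirichlet(alpha, rng)) for _ in range(n_docs)) / n_docs

rng = random.Random(0)
low = symmetric_alpha(75)   # about 0.0133 per topic
high = [0.5] * 75           # the deliberately large setting

print('symmetric alpha for 75 topics:', round(low[0], 4))
print('mean largest topic share, alpha = 1/75:', round(mean_largest_share(low, rng), 3))
print('mean largest topic share, alpha = 0.5: ', round(mean_largest_share(high, rng), 3))
```

With the lower alpha, a simulated document tends to be dominated by a small number of topics; with alpha = 0.5 the weight is spread thinly across many topics.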

Here I analyze the properties of topic models with three different alpha values:

  • model_alpha_symmetric: This model takes gensim's default setting for alpha, which here results in a value of 0.013. The number of topics is 75 and the model is based on a noun-only version of the corpus.
  • model_alpha_auto: This model has the alpha value set to 'auto', which means that gensim estimates a separate value for each topic, resulting in asymmetric values for alpha. See below for the alpha values of this model. The number of topics is 75 and the model is based on a noun-only version of the corpus.
  • model_alpha_05: This model has an alpha value of 0.5 for each topic (symmetric). Setting alpha to 0.5 for each topic stretches the model beyond what is perhaps reasonable, but the intent is to show the effects such a setting has on the model. The number of topics for this model is 75 and the model is based on a noun-only version of the corpus.

Set Up: Import Packages and Load Topic Models

In [1]:
from gensim import corpora, models, similarities
import pyLDAvis.gensim
import spacy
import json

path = '../noun_corpus/'

# load metadata for later use
with open('../data/doc2metadata.json', encoding='utf8', mode='r') as f:
    doc2metadata = json.load(f)
    
# load dictionary and corpus for the noun models
dictionary = corpora.Dictionary.load(path + 'noun_corpus.dict')
corpus = corpora.MmCorpus(path + 'noun_corpus.mm')

# load alpha = symmetric model
model_alpha_symmetric = models.ldamodel.LdaModel.load(path + 'noun_75.model')

# load alpha = auto model
model_alpha_auto = models.ldamodel.LdaModel.load(path + 'alphas/noun_auto.model')

# load alpha = 0.5 model
model_alpha_05 = models.ldamodel.LdaModel.load(path + 'alphas/noun_05.model')

Alpha Values

In [2]:
print('Alpha values for symmetric model:\n ', model_alpha_symmetric.alpha)
Alpha values for symmetric model:
  [0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333 0.01333333
 0.01333333 0.01333333 0.01333333]
In [3]:
print('Alpha values for auto model:\n ', model_alpha_auto.alpha)
Alpha values for auto model:
  [0.10270125 0.0592977  0.07087024 0.06412899 0.05616595 0.04951361
 0.03805129 0.22600546 0.12083639 0.05507494 0.06248622 0.06897874
 0.04855374 0.00229159 0.05208761 0.03678485 0.0615502  0.07355741
 0.04548518 0.06304422 0.07587688 0.10141291 0.0303052  0.06570146
 0.07862557 0.03714601 0.05216953 0.06001518 0.03180525 0.05900658
 0.0474006  0.0520789  0.05153614 0.0444841  0.08849999 0.06562751
 0.05767829 0.5677949  0.0429585  0.05217329 0.05128561 0.05068998
 0.03617514 0.04957202 0.07394192 0.04284829 0.05258643 0.05349901
 0.0978125  0.0437447  0.0301297  0.05985987 0.04824056 0.05434861
 0.05524151 0.04822579 0.05105721 0.05232021 0.04474712 0.06868275
 0.04497385 0.13791354 0.05414683 0.02755224 0.05568515 0.06769605
 0.04380047 0.18338257 0.02185606 0.05321367 0.0817384  0.039121
 0.11358655 0.11281471 0.055462  ]
In [4]:
print('Alpha values for 0.5 model:\n ', model_alpha_05.alpha)
Alpha values for 0.5 model:
  [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5]

Topic Coherence Test

Topic Coherence: model_alpha_symmetric

In [5]:
model_alpha_symmetric_viz = pyLDAvis.gensim.prepare(model_alpha_symmetric, corpus, dictionary)
pyLDAvis.display(model_alpha_symmetric_viz)
Out[5]:

model_alpha_symmetric produced 13 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 57 topics which are coherent. Therefore its topics are:

  • 17.3% junk topics
  • 6.7% mixed topics
  • 76% coherent topics

A few examples of junk topics:

  • topic 3: essay, bible, john, commentary, old, theology, james, fortress, david, paul
  • topic 9: faith, hebrews, life, sin, thought, love, sense, people, heart, judgment
  • topic 74: plate, script, hatch, harrison, index, haran, equivalent, print, earthquake, tribution

An example of mixed topics:

  • topic 44: esther, hand, foot, eye, king, garment, head, house, moore, gold (This topic could be thought of as a mix of "body" and the story of Esther).

A few examples of coherent topics:

  • topic 11 (narrative criticism): story, narrative, character, reader, narrator, account, event, motif, element, pattern
  • topic 34 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, document, cave, sect, fragment
  • topic 38 (family): son, father, child, family, mother, brother, marriage, wife, daughter, birth

Topic Coherence: model_alpha_auto

In [6]:
model_alpha_auto_viz = pyLDAvis.gensim.prepare(model_alpha_auto, corpus, dictionary)
pyLDAvis.display(model_alpha_auto_viz)
Out[6]:

model_alpha_auto produced 13 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 57 topics which are coherent. Therefore its topics are:

  • 17.3% junk topics
  • 6.7% mixed topics
  • 76% coherent topics

A few examples of junk topics:

  • topic 3: essay, bible, john, commentary, old, james, fortress, theology, david, paul
  • topic 9: faith, hebrews, life, sin, thought, sense, love, people, heart, mind
  • topic 74: plate, hatch, harrison, haran, equivalent, print, script, index, earthquake, harrelson

An example of mixed topics:

  • topic 44: esther, hand, foot, eye, king, garment, head, house, moore, gold (This topic could be thought of as a mix of "body" and the story of Esther).

A few examples of coherent topics:

  • topic 11 (narrative criticism): story, narrative, character, reader, narrator, account, event, motif, element, pattern
  • topic 34 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, document, cave, sect, fragment
  • topic 37 (family): son, father, child, family, mother, brother, marriage, wife, daughter, birth

Topic Coherence: model_alpha_05

In [8]:
model_alpha_05_viz = pyLDAvis.gensim.prepare(model_alpha_05, corpus, dictionary)
pyLDAvis.display(model_alpha_05_viz)
Out[8]:

model_alpha_05 produced 12 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 58 topics which are coherent. Therefore its topics are:

  • 16% junk topics
  • 6.7% mixed topics
  • 77.3% coherent topics

A few examples of junk topics:

  • topic 3: essay, bible, commentary, fortress, scholars, james, david, old, introduction, theology
  • topic 70: esther, kin, lot, moore, garment, seal, gold, instrument, balaam, judith

An example of mixed topics:

  • topic 29 (family): son, father, child, family, brother, jacob, genesis, abraham, mother, joseph (This topic could be thought of as a mix of "family" and "Patriarchs.")

A few examples of coherent topics:

  • topic 18 (narrative criticism): story, narrative, account, motif, event, episode, tale, theme, scene, joseph
  • topic 42 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, fragment, cave, sect, document
  • topic 24 (justification): faith, christ, promise, salvation, covenant, law, righteousness, gentile, abraham, justification

Topic Coherence: Brief Discussion

Each of these models produced a similar number of coherent topics: model_alpha_symmetric has 57 coherent topics, model_alpha_auto also has 57, and model_alpha_05 has 58. A close examination of the topics in these models reveals that they are very similar to one another (especially model_alpha_symmetric and model_alpha_auto) in terms of the words in topics, although they differ in the order of prominence of words within a topic and in the prominence of topics in the corpus (hence they are numbered differently in the visualizations above). Interestingly, model_alpha_05 identified an important topic in New Testament scholarship: topic 24 (justification).
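The percentages reported for each model follow directly from the manual topic counts. As a small worked example, the sketch below recomputes them from the counts given above; the function name `coherence_shares` is a hypothetical helper, and the counts themselves remain manual judgments rather than the output of an automated coherence measure.

```python
# Manual coherence counts reported above for each 75-topic model.
counts = {
    'model_alpha_symmetric': {'junk': 13, 'mixed': 5, 'coherent': 57},
    'model_alpha_auto': {'junk': 13, 'mixed': 5, 'coherent': 57},
    'model_alpha_05': {'junk': 12, 'mixed': 5, 'coherent': 58},
}

def coherence_shares(model_counts):
    """Convert raw topic counts into percentages of all topics in the model."""
    total = sum(model_counts.values())
    return {label: 100.0 * n / total for label, n in model_counts.items()}

shares = {name: coherence_shares(c) for name, c in counts.items()}
for name, s in shares.items():
    print(name, {label: round(pct, 1) for label, pct in s.items()})
```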

Clustering Test

In [3]:
def cluster_test(corpus, model):
    docs_with_1_topic = 0
    docs_with_multiple_topics = 0
    docs_with_no_topics = 0
    total_docs = 0
    for doc in corpus:
        topics = model.get_document_topics(doc, minimum_probability=0.20)
        total_docs += 1
        if len(topics) == 1:
            docs_with_1_topic += 1
        elif len(topics) > 1:
            docs_with_multiple_topics += 1
        else:
            docs_with_no_topics += 1
    print('Corpus assigned to a single topic:', (docs_with_1_topic / total_docs) * 100, '%')
    print('Corpus assigned to multiple topics:', (docs_with_multiple_topics / total_docs) * 100, '%')
    print('corpus assigned to no topics:', (docs_with_no_topics / total_docs) * 100, '%')
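The thresholding logic in cluster_test can also be sketched without a trained model, which makes it easier to check by hand. In the sketch below, each document is a plain list of (topic_id, probability) pairs standing in for what get_document_topics returns; `classify_document`, `cluster_summary`, and the toy documents are hypothetical and are not drawn from the corpus. Returning the percentages instead of printing them is a design choice that would let the three models be compared programmatically.

```python
from collections import Counter

def classify_document(topic_probs, threshold=0.20):
    """Label a document by how many topics meet the probability threshold."""
    strong = [topic_id for topic_id, prob in topic_probs if prob >= threshold]
    if len(strong) == 1:
        return 'single'
    if len(strong) > 1:
        return 'multiple'
    return 'none'

def cluster_summary(doc_topic_lists, threshold=0.20):
    """Return the percentage of documents that fall into each category."""
    labels = Counter(classify_document(t, threshold) for t in doc_topic_lists)
    total = len(doc_topic_lists)
    return {label: 100.0 * labels[label] / total
            for label in ('single', 'multiple', 'none')}

# Hypothetical document-topic distributions (not taken from the corpus).
toy_docs = [
    [(3, 0.85), (11, 0.10)],            # one dominant topic
    [(3, 0.45), (34, 0.40)],            # two strong topics
    [(1, 0.15), (2, 0.15), (5, 0.12)],  # nothing reaches the threshold
    [(38, 0.60), (44, 0.25)],           # two strong topics
]
print(cluster_summary(toy_docs))
```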

Clustering: model_alpha_symmetric

In [24]:
cluster_test(corpus, model_alpha_symmetric)
Corpus assigned to a single topic: 55.252460419341034 %
Corpus assigned to multiple topics: 30.166880616174584 %
corpus assigned to no topics: 14.58065896448438 %

Clustering: model_alpha_auto

In [25]:
cluster_test(corpus, model_alpha_auto)
Corpus assigned to a single topic: 56.84638425331622 %
Corpus assigned to multiple topics: 26.79717586649551 %
corpus assigned to no topics: 16.356439880188276 %

Clustering: model_alpha_05

In [4]:
cluster_test(corpus, model_alpha_05)
Corpus assigned to a single topic: 45.860077021822846 %
Corpus assigned to multiple topics: 5.2952503209242625 %
corpus assigned to no topics: 48.84467265725289 %

Clustering: Brief Discussion

The results of the cluster test for model_alpha_symmetric and model_alpha_auto are fairly close to one another. model_alpha_symmetric left 14.6% of the documents in the corpus unassigned to any topic at the 0.20 probability threshold, and model_alpha_auto left 16.4% unassigned. model_alpha_05 did not perform nearly as well: it left 48.8% of the documents in the corpus unassigned to a topic.

Information Retrieval Test

In [9]:
# build indices for similarity queries
index_symmetric = similarities.MatrixSimilarity(model_alpha_symmetric[corpus])
index_auto = similarities.MatrixSimilarity(model_alpha_auto[corpus])
index_05 = similarities.MatrixSimilarity(model_alpha_05[corpus])

# define retrieval test
def retrieval_test(new_doc, lda, index):
    new_bow = dictionary.doc2bow(new_doc)  # change new document to bag of words representation
    new_vec = lda[new_bow]  # change new bag of words to a vector
    index.num_best = 10  # set index to generate 10 best results
    matches = (index[new_vec])
    scores = []
    for match in matches:
        score = (match[1])
        scores.append(score)
        score = str(score)
        key = 'doc_' + str(match[0])
        article_dict = doc2metadata[key]
        author = article_dict['author']
        title = article_dict['title']
        year = article_dict['pub_year']
        print(key + ': ' + author.title() + ' (' + year + '). ' + title.title() + '\n\tsimilarity score -> ' + score + '\n')
    
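# set up nlp for new docs
# (older spaCy API; spaCy 3 loads a named pipeline such as 'en_core_web_sm'
# and provides stop words as spacy.lang.en.stop_words.STOP_WORDS)
nlp = spacy.load('en')
stop_words = spacy.en.STOPWORDS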

def get_noun_lemmas(text):
    doc = nlp(text)
    tokens = [token for token in doc]
    noun_tokens = [token for token in tokens if token.tag_ == 'NN' or token.tag_ == 'NNP' or token.tag_ == 'NNS']
    noun_lemmas = [noun_token.lemma_ for noun_token in noun_tokens if noun_token.is_alpha]
    noun_lemmas = [noun_lemma for noun_lemma in noun_lemmas if noun_lemma not in stop_words]
    return noun_lemmas

# load and process Greene, N. E. (2017)
with open('../abstracts/greene.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    greene = get_noun_lemmas(text)
    
# load and process Hollenback, G. M. (2017)
with open('../abstracts/hollenback.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    hollenback = get_noun_lemmas(text)

# load and process Dinkler, M. B. (2017)
with open('../abstracts/dinkler.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    dinkler = get_noun_lemmas(text)

Finding Articles Similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

Information Retrieval: model_alpha_symmetric

In [10]:
retrieval_test(greene, model_alpha_symmetric, index_symmetric)
doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8951160907745361

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.8929029703140259

doc_804: Peters, John P. (1921). Another Folk Song
	similarity score -> 0.8910683393478394

doc_757: Peters, John P. (1916). Ritual In The Psalms
	similarity score -> 0.8900140523910522

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8780790567398071

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8754923939704895

doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.8719313144683838

doc_2970: Liebreich, Leon J. (1955). The Songs Of Ascents And The Priestly Blessing
	similarity score -> 0.8704249262809753

doc_9314: Peters, John P. (1910). Notes On Some Ritual Uses Of The Psalms
	similarity score -> 0.8673655986785889

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8537120223045349

Information Retrieval: model_alpha_auto

In [11]:
retrieval_test(greene, model_alpha_auto, index_auto)
doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.9079644083976746

doc_804: Peters, John P. (1921). Another Folk Song
	similarity score -> 0.9012252688407898

doc_757: Peters, John P. (1916). Ritual In The Psalms
	similarity score -> 0.8905765414237976

doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8876312375068665

doc_8503: Gillingham, S. (1999). Review Of The Message Of The Psalter: An Eschatological Programme In The Book Of Psalms
	similarity score -> 0.8860473036766052

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8855765461921692

doc_7877: Allen, Leslie C. (1989). Review Of The Identity Of The Individual In The Psalms
	similarity score -> 0.8853786587715149

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.8818780183792114

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8806821703910828

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8636859059333801

Information Retrieval: model_alpha_05

In [12]:
retrieval_test(greene, model_alpha_05, index_05)
doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.9121397137641907

doc_8503: Gillingham, S. (1999). Review Of The Message Of The Psalter: An Eschatological Programme In The Book Of Psalms
	similarity score -> 0.8917129039764404

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8912713527679443

doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8875983357429504

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8743528723716736

doc_7648: Mccann,, J. Clinton (1990). Review Of Psalms: Part I With An Introduction To Cultic Poetry
	similarity score -> 0.8730670213699341

doc_7877: Allen, Leslie C. (1989). Review Of The Identity Of The Individual In The Psalms
	similarity score -> 0.8690416216850281

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.867630124092102

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8577131628990173

doc_8075: Landes, George M. (1992). Review Of Jonah: A New Translation With Introduction, Commentary, And Interpretation
	similarity score -> 0.8557348847389221

Brief Discussion: Finding articles similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

These models achieved similar average similarity scores in the first information retrieval task, and each model returned documents about psalms in its results. Six documents from the corpus matched the Greene article in all three models (although their ranks were not consistent across models):

  • doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
  • doc_1411: Berry, George R. (1914). The Titles Of The Psalms
  • doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
  • doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
  • doc_8205: Waltke, Bruce K. (1991). Superscripts, Postscripts, Or Both
  • doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
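The overlap between the three result lists can be checked with a set intersection. The sketch below copies the document ids from the three Greene outputs above, in rank order, and finds the ids returned by every model; `shared_documents` is a hypothetical helper name.

```python
# Document ids returned for the Greene query, copied from the outputs above in rank order.
results = {
    'model_alpha_symmetric': [9217, 1411, 804, 757, 2855, 123, 8205, 2970, 9314, 5418],
    'model_alpha_auto': [8205, 804, 757, 9217, 8503, 2855, 7877, 1411, 123, 5418],
    'model_alpha_05': [8205, 8503, 2855, 9217, 123, 7648, 7877, 1411, 5418, 8075],
}

def shared_documents(results_by_model):
    """Document ids that appear in every model's result list."""
    id_sets = [set(ids) for ids in results_by_model.values()]
    return set.intersection(*id_sets)

shared = shared_documents(results)
print(len(shared), 'documents returned by all three models:')
print(['doc_' + str(doc_id) for doc_id in sorted(shared)])
```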

Finding Articles Similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another look at Leviticus 18:22 and 20:13.

Information Retrieval: model_alpha_symmetric

In [13]:
retrieval_test(hollenback, model_alpha_symmetric, index_symmetric)
doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.8024715185165405

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.7837103009223938

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7089771628379822

doc_1851: Kraemer, Ross S. (1985). Review Of In Memory Of Her: A Feminist Theological Reconstruction Of Christian Origins
	similarity score -> 0.7029827833175659

doc_143: Townsley, Jeramy (2011). Paul, The Goddess Religions, And Queer Sects: Romans 1:23—28
	similarity score -> 0.7025076150894165

doc_1974: Trible, Phyllis (1987). Review Of The Israelite Woman: Social Role And Literary Type In Biblical Narrative
	similarity score -> 0.6900537610054016

doc_284: Lemos, T. M. (2006). Shame And Mutilation Of Enemies In The Hebrew Bible
	similarity score -> 0.689003050327301

doc_3940: Bailey, John A. (1970). Initiation And The Primal Woman In Gilgamesh And Genesis 2-3
	similarity score -> 0.6793158054351807

doc_7634: D'Angelo, Mary Rose (1990). Women In Luke-Acts: A Redactional View
	similarity score -> 0.6689492464065552

doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom?
	similarity score -> 0.6591153144836426

Information Retrieval: model_alpha_auto

In [14]:
retrieval_test(hollenback, model_alpha_auto, index_auto)
doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.764740526676178

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7516112923622131

doc_1851: Kraemer, Ross S. (1985). Review Of In Memory Of Her: A Feminist Theological Reconstruction Of Christian Origins
	similarity score -> 0.7409570217132568

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.7398333549499512

doc_1974: Trible, Phyllis (1987). Review Of The Israelite Woman: Social Role And Literary Type In Biblical Narrative
	similarity score -> 0.7349920868873596

doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.7129771709442139

doc_284: Lemos, T. M. (2006). Shame And Mutilation Of Enemies In The Hebrew Bible
	similarity score -> 0.7090410590171814

doc_2149: Running, Leona Glidden (1983). Review Of Il Femminismo Della Bibbia
	similarity score -> 0.706333577632904

doc_6994: Corley, Kathleen E. (1996). Review Of The Double Message: Patterns Of Gender In Luke-Acts
	similarity score -> 0.695527195930481

doc_9039: Brawley, Robert L. (2001). Review Of Homoeroticism In The Biblical World: A Historical Perspective
	similarity score -> 0.6889135837554932

Information Retrieval: model_alpha_05

In [15]:
retrieval_test(hollenback, model_alpha_05, index_05)
doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.8047031164169312

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.8028905987739563

doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.7987456917762756

doc_225: Miller, James E. (2009). A Critical Response To Karin Adams'S Reinterpretation Of Hosea 4:13-14
	similarity score -> 0.7941276431083679

doc_6466: Friedman, Mordechai A. (1980). Israel'S Response In Hosea 2:17B: "You Are My Husband"
	similarity score -> 0.7828770279884338

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7763488292694092

doc_7805: Bird, Phyllis A. (1993). Review Of Frauen Im Alten Israel: Eine Begriffsgeschichtliche Und Sozialrechtliche Studie Zur Stellung Der Frau Im Alten Testament
	similarity score -> 0.7680765390396118

doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom?
	similarity score -> 0.7636564373970032

doc_9258: Kalmanofsky, Amy (2011). The Dangerous Sisters Of Jeremiah And Ezekiel
	similarity score -> 0.751761794090271

doc_1481: Bassler, Jouette M. (1984). The Widows' Tale: A Fresh Look At 1 Tim 5:3-16
	similarity score -> 0.7390398383140564

Brief Discussion: Finding articles similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another Look at Leviticus 18:22 and 20:13.

Each model returned documents dealing with gender and sexuality, which is appropriate given the nature of the query article. Three documents from the corpus were matches with the Hollenback article in all three models:

  • doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
  • doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
  • doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman

A fourth document, doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity, was returned by model_alpha_auto and model_alpha_05 but not by model_alpha_symmetric.

Interestingly, model_alpha_symmetric and model_alpha_05 both ranked doc_463 as the second most likely match, while model_alpha_auto ranked it fourth. It is also worth noting that doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom? was returned as a match by model_alpha_symmetric and by model_alpha_05, but not by model_alpha_auto. This is the article to which the query article is a response.
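Because the query article responds directly to doc_8757, that document works as a known relevant item for this task. The sketch below looks up where it falls in each model's top ten, using the document ids copied from the Hollenback outputs above; `known_item_rank` is a hypothetical helper name.

```python
# Document ids returned for the Hollenback query, copied from the outputs above in rank order.
results = {
    'model_alpha_symmetric': [8995, 463, 8719, 1851, 143, 1974, 284, 3940, 7634, 8757],
    'model_alpha_auto': [8995, 8719, 1851, 463, 1974, 316, 284, 2149, 6994, 9039],
    'model_alpha_05': [316, 463, 8995, 225, 6466, 8719, 7805, 8757, 9258, 1481],
}

def known_item_rank(target, ranked_ids):
    """Return the 1-based rank of target in ranked_ids, or None when it is absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target:
            return rank
    return None

walsh_ranks = {name: known_item_rank(8757, ids) for name, ids in results.items()}
for name, rank in walsh_ranks.items():
    print(name, '->', 'not in top 10' if rank is None else 'rank ' + str(rank))
```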

Finding Articles Similar to Dinkler, M. B. (2017). Building Character on the Road to Emmaus: Lukan Characterization in Contemporary Literary Perspective.

Information Retrieval: model_alpha_symmetric

In [16]:
retrieval_test(dinkler, model_alpha_symmetric, index_symmetric)
doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.8744149804115295

doc_7866: Lincoln, Andrew T. (1989). The Promise And The Failure: Mark 16:7, 8
	similarity score -> 0.857151210308075

doc_1952: Praeder, Susan Marie (1984). Review Of Mark As Story: An Introduction To The Narrative Of A Gospel
	similarity score -> 0.8532338738441467

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.8529982566833496

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.8522621989250183

doc_264: Ahearne-Kroll, Stephen P. (2010). Audience Inclusion And Exclusion As Rhetorical Technique In The Gospel Of Mark
	similarity score -> 0.8406468629837036

doc_7865: Malbon, Elizabeth Struthers (1989). The Jewish Leaders In The Gospel Of Mark: A Literary Study Of Marcan Characterization
	similarity score -> 0.8112123608589172

doc_6706: Boomershine, Thomas E. (1981). Mark 16:8 And The Apostolic Commission
	similarity score -> 0.7979841828346252

doc_7110: Stegner, William Richard (1995). Review Of Israel'S Scripture Traditions And The Synoptic Gospels: Story Shaping Story
	similarity score -> 0.7967683672904968

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.7895323634147644

Information Retrieval: model_alpha_auto

In [17]:
retrieval_test(dinkler, model_alpha_auto, index_auto)
doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.9522409439086914

doc_1952: Praeder, Susan Marie (1984). Review Of Mark As Story: An Introduction To The Narrative Of A Gospel
	similarity score -> 0.8992326259613037

doc_1951: Kee, Howard Clark (1984). Review Of Jesus Walking On The Sea: Meaning And Gospel Functions Of Matt 14:22-23, Mark 6:45-52 And John 6:15B-21
	similarity score -> 0.8724315166473389

doc_6960: Collins, Adela Yarbro (1994). Review Of Teaching With Authority: Miracles And Christology In The Gospel Of Mark
	similarity score -> 0.8649736046791077

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.8636617064476013

doc_7110: Stegner, William Richard (1995). Review Of Israel'S Scripture Traditions And The Synoptic Gospels: Story Shaping Story
	similarity score -> 0.8630051612854004

doc_8083: Anderson, Janice Capel (1992). Review Of Matthew'S Missionary Discourse: A Literary Critical Analysis
	similarity score -> 0.8462340831756592

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.843693733215332

doc_7109: Moore, Stephen D. (1995). Review Of Deconstructing The New Testament
	similarity score -> 0.842379093170166

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.8411687612533569

Information Retrieval: model_alpha_05

In [18]:
retrieval_test(dinkler, model_alpha_05, index_05)
doc_7506: Moore, Stephen D. (1996). Review Of Reading Mark From The Outside: Eco And Iser Leave Their Marks
	similarity score -> 0.8930432796478271

doc_7978: Collins, Adela Yarbro (1993). Review Of Irony In Mark'S Gospel: Text And Subtext
	similarity score -> 0.8817631006240845

doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.8774893283843994

doc_7819: Collins, Adela Yarbro (1993). Review Of  "Eine Neue Lehre In Vollmacht": Die Streit- Und Schulgespräche Des Markus-Evangeliums 
	similarity score -> 0.8729898929595947

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.8682312965393066

doc_9260: Iverson, Kelly R. (2011). A Centurion'S "Confession": A Performance-Critical Analysis Of Mark 15:39
	similarity score -> 0.8642517924308777

doc_7035: Green, Joel B. (1998). Review Of The Paradox Of Salvation: Luke'S Theology Of The Cross
	similarity score -> 0.850237250328064

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.8369032144546509

doc_5911: Nardoni, Enrique (1980). Review Of  La Transfiguración De Jesús Y El Diálogo Sobre Elías Según El Evangelio De San Marcos 
	similarity score -> 0.8311287760734558

doc_8446: Bautch, Richard J. (2004). Review Of Pontius Pilate: Portraits Of A Roman Governor
	similarity score -> 0.8306546807289124

Brief Discussion: Finding Articles Similar to Dinkler, M. B. (2017). Building character on the road to Emmaus: Lukan characterization in contemporary literary perspective.

Each topic model retrieved documents dealing with the gospels, which on a general level are appropriate for the query article. The similarity scores of the three models are close to one another for this retrieval task. Three documents from the corpus were returned by all three models:

  • doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
  • doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
  • doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel

doc_8158 was ranked as the top match by model_alpha_symmetric and model_alpha_auto.
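How close the similarity scores are can be summarized with a mean and range per model. The sketch below uses the top-ten scores from the three Dinkler outputs above, rounded to four decimal places; `summarize` is a hypothetical helper name. Note that retrieval_test already collects these values in its `scores` list, so a variant of that function could return the list for this kind of summary.

```python
from statistics import mean

# Top-10 similarity scores for the Dinkler query, rounded to four decimals from the outputs above.
scores = {
    'model_alpha_symmetric': [0.8744, 0.8572, 0.8532, 0.8530, 0.8523,
                              0.8406, 0.8112, 0.7980, 0.7968, 0.7895],
    'model_alpha_auto': [0.9522, 0.8992, 0.8724, 0.8650, 0.8637,
                         0.8630, 0.8462, 0.8437, 0.8424, 0.8412],
    'model_alpha_05': [0.8930, 0.8818, 0.8775, 0.8730, 0.8682,
                       0.8643, 0.8502, 0.8369, 0.8311, 0.8307],
}

def summarize(score_list):
    """Mean, highest, and lowest similarity score in one result list."""
    return {'mean': mean(score_list), 'max': max(score_list), 'min': min(score_list)}

summary = {name: summarize(s) for name, s in scores.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```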