Topic Modeling Best Practices


Project on GitHub: msaxton/topic-model-best-practices


Analysis: Studying the Properties of Topic Models with Different Numbers of Topics

A number of researchers have noted that one limitation of LDA is that it cannot determine how many topics a corpus contains, leaving this decision to the human user (Yau et al., 2014; Suominen and Toivanen, 2016). Indeed, there is no way to identify the "correct" number of topics before building the topic model (Carter et al., 2016). If the user specifies too few topics, the topics will be too general to be useful for exploratory analysis or information retrieval. By contrast, if the user specifies too many topics, the topics will be too specific or redundant to be of use, and the model becomes unwieldy to interpret. Therefore, most users experiment with the number of topics and make qualitative judgments about which number is most useful (Chang et al., 2016). Ultimately, the right number of topics depends on how the model is going to be used (Carter et al., 2016). As such, the ratio of documents (n) in a corpus to topics (k) extracted from it varies widely. To give a few examples:

Here, I analyze the properties of three topic models, each with a different number of topics:

  • model_25_topics: This model has 25 topics, is based on a noun-only corpus, and has the alpha value set to symmetric.
  • model_75_topics: This model has 75 topics, is based on a noun-only corpus, and has the alpha value set to symmetric.
  • model_150_topics: This model has 150 topics, is based on a noun-only corpus, and has the alpha value set to symmetric.

Set Up: Import Packages and Load Topic Models

In [4]:
from gensim import corpora, models, similarities
import pyLDAvis.gensim
import json
import spacy

path = '../noun_corpus/'

# load metadata for later use
with open('../data/doc2metadata.json', encoding='utf8', mode='r') as f:
    doc2metadata = json.load(f)
    
# load dictionary and corpus for the noun models
dictionary = corpora.Dictionary.load(path + 'noun_corpus.dict')
corpus = corpora.MmCorpus(path + 'noun_corpus.mm')

# load model_25_topics
model_25_topics = models.ldamodel.LdaModel.load(path + 'noun_25.model')

# load model_75_topics
model_75_topics = models.ldamodel.LdaModel.load(path + 'noun_75.model')

# load model_150_topics
model_150_topics = models.ldamodel.LdaModel.load(path + 'noun_150.model')

Topic Coherence Test

Topic Coherence: model_25_topics

In [10]:
model_25_viz = pyLDAvis.gensim.prepare(model_25_topics, corpus, dictionary)
pyLDAvis.display(model_25_viz)

model_25_topics produced 4 topics which lack semantic or contextual coherence, 2 topics of mixed coherence, and 19 topics which are coherent. Therefore its topics are:

  • 16% junk topics
  • 8% mixed topics
  • 76% coherent topics

To illustrate what is meant by each category, consider the following examples:

Examples of junk topics:

  • Topic 4: essay, bible, john, commentary, theology, james, paul, old, hebrew, introduction
  • Topic 6: spirit, world, revelation, life, angel, enoch, christ, death, idea, philo, lord

Example of mixed topic:

  • Topic 10: job, verb, wisdom, meaning, phrase, proverbs, sense, clause, context, noun

Examples of coherent topics:

  • Topic 2 (narrative criticism): narrative, story, analysis, reader, structure, speech, context, reading, character, function
  • Topic 9 (bible versions): hebrew, lxx, aramaic, translation, mt, greek, version, reading, meaning, targum
  • Topic 11 (textual criticism): reading, manuscript, greek, edition, codex, variant, version, line, e, fragment
  • Topic 18 (Dead Sea Scrolls): qumran, scroll, jerusalem, city, sea, period, dead, site, palestine, wall
  • Topic 23 (Poetry): psalm, psalms, line, song, poem, poetry, prayer, unit, psalter
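
The coherent, mixed, and junk labels above are qualitative judgments. The original analysis does not use an automatic score, but a common complement is the UMass coherence measure. It rewards a topic whose top words tend to appear in the same documents. The sketch below is a stdlib-only illustration. `umass_coherence` and the toy documents are hypothetical; gensim provides its own implementation through `CoherenceModel` with `coherence='u_mass'`.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; higher scores mean more coherent.

    top_words: the topic's top words, ordered from most to least probable.
    documents: the reference corpus as a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # number of documents that contain every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words absent from the reference corpus
            # smoothed log of P(word i | higher-ranked word j), estimated from document counts
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


# toy reference corpus (hypothetical)
toy_docs = [
    ["psalm", "poem", "line"],
    ["psalm", "poem"],
    ["psalm", "codex"],
    ["codex", "variant"],
]
print(round(umass_coherence(["psalm", "poem"], toy_docs), 3))     # 0.0
print(round(umass_coherence(["psalm", "variant"], toy_docs), 3))  # -1.099
```

On the toy corpus, psalm/poem (which co-occur) scores higher than psalm/variant (which never appear together). This mirrors the intuition behind the coherent and junk labels.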

Topic Coherence: model_75_topics

In [23]:
model_75_viz = pyLDAvis.gensim.prepare(model_75_topics, corpus, dictionary)
pyLDAvis.display(model_75_viz)

model_75_topics produced 13 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 57 topics which are coherent. Therefore its topics are:

  • 17.3% junk topics
  • 6.7% mixed topics
  • 76% coherent topics

A number of the topics from model_25_topics reappear in model_75_topics. However, some topics, such as topic 9 from model_25_topics, appear to be given more nuance in model_75_topics, for example:

  • topic 35 (bible versions 1): lxx, hebrew, greek, translation, translator, mt, version, septuagint, reading, bible
  • topic 45 (bible versions 2): version, syriac, mt, targum, reading, manuscript, old, peshita, edition, variant

It would be possible to dismiss these two topics as redundant. However, topic 35 appears to emphasize the Greek versions (lxx, greek, septuagint). Topic 45 instead emphasizes the Masoretic Text (mt), the Aramaic targums (targum), and the Syriac Peshitta (syriac, peshita).
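
One rough way to judge whether two topics are redundant is to count how many top words they share. The sketch below assumes Jaccard similarity over the top-10 word lists shown above. This is an illustrative choice, not the method used in this analysis, and `jaccard` is a hypothetical helper.

```python
def jaccard(words_a, words_b):
    """Share of distinct top words two topics have in common (0 = disjoint, 1 = identical)."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# top words copied from the topic lists above
topic_9_25 = ["hebrew", "lxx", "aramaic", "translation", "mt",
              "greek", "version", "reading", "meaning", "targum"]
topic_35_75 = ["lxx", "hebrew", "greek", "translation", "translator",
               "mt", "version", "septuagint", "reading", "bible"]
topic_45_75 = ["version", "syriac", "mt", "targum", "reading",
               "manuscript", "old", "peshita", "edition", "variant"]

print(round(jaccard(topic_35_75, topic_45_75), 3))  # 0.176
print(round(jaccard(topic_9_25, topic_35_75), 3))   # 0.538
print(round(jaccard(topic_9_25, topic_45_75), 3))   # 0.25
```

On these lists, topics 35 and 45 share only 3 of 17 distinct words. Topic 9 from model_25_topics overlaps more with topic 35 (7 of 13) than with topic 45 (4 of 16).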

model_75_topics also introduces many new coherent topics not found in model_25_topics, for example:

  • topic 27 (apocalyptic literature): revelation, enoch, angel, apocalypse, vision, son, messiah, heaven, baruch, judgment
  • topic 49 (gender): woman, gender, male, husband, mary, sex, wife, body, role, hair
  • topic 59 (midrash): scripture, torah, rabbi, midrash, canon, mishnah, talmud, neusner, mishna, rabbinic

Topic Coherence: model_150_topics

In [4]:
model_150_viz = pyLDAvis.gensim.prepare(model_150_topics, corpus, dictionary)
pyLDAvis.display(model_150_viz)

model_150_topics produced 33 topics which lack semantic or contextual coherence, 8 topics of mixed coherence, and 109 topics which are coherent. Therefore its topics are:

  • 22% junk topics
  • 5% mixed topics
  • 73% coherent topics

The coherent topics found in the previous models are present in model_150_topics, but a large number of other coherent topics are added, for example:

  • topic 78 (Holiness Code): leviticus, p, numbers, holiness, code, priestly, offering, legislation, exodus, milgrom
  • topic 104 (song of songs): song, metaphor, love, songs, lover, odes, beloved, solomon, bride, poem
  • topic 112 (patristics): justin, marcion, eusebius, ireneaus, tertullian, papias, hippolytus, tesitmony, epiphanius, harnack

Topic Coherence: Brief Discussion

model_25_topics and model_75_topics each contained 76% coherent topics, and model_150_topics contained 73% coherent topics. So, relative to the number of topics in each model, the performance was similar. In raw numbers, however, model_150_topics contains far more coherent topics (109) than either model_25_topics (19) or model_75_topics (57). This suggests that model_150_topics provides a more nuanced model of the corpus. Topics which did not register in the other models, such as topic 78 (Holiness Code) and topic 112 (patristics), are revealed in model_150_topics. The utility of nuanced topics needs to be weighed against the difficulty of keeping track of so many topics during an exploratory analysis of a corpus: nuance comes at the cost of efficiency.
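
The percentages in this section follow directly from the counts of junk, mixed, and coherent topics. As an arithmetic check, the hypothetical helper below (not part of the original notebook) recomputes them to one decimal place.

```python
def topic_quality_shares(junk, mixed, coherent):
    """Return the percentages of junk, mixed and coherent topics, rounded to one decimal."""
    total = junk + mixed + coherent
    return tuple(round(100 * n / total, 1) for n in (junk, mixed, coherent))


# (junk, mixed, coherent) counts reported in the coherence sections above
counts = {
    "model_25_topics": (4, 2, 19),
    "model_75_topics": (13, 5, 57),
    "model_150_topics": (33, 8, 109),
}
for name, (junk, mixed, coherent) in counts.items():
    print(name, topic_quality_shares(junk, mixed, coherent))
# model_25_topics (16.0, 8.0, 76.0)
# model_75_topics (17.3, 6.7, 76.0)
# model_150_topics (22.0, 5.3, 72.7)
```

The whole-number figures used for model_150_topics (22%, 5%, 73%) are these values rounded to the nearest percent.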

Clustering Test

In [4]:
def cluster_test(corpus, model):
    """Print the share of documents assigned to one, several, or no topics.

    A document counts as assigned to a topic when that topic's probability is
    at least 0.20; get_document_topics discards topics below this threshold.
    """
    docs_with_1_topic = 0
    docs_with_multiple_topics = 0
    docs_with_no_topics = 0
    total_docs = 0
    for doc in corpus:
        # topics with probability >= 0.20 for this document
        topics = model.get_document_topics(doc, minimum_probability=0.20)
        total_docs += 1
        if len(topics) == 1:
            docs_with_1_topic += 1
        elif len(topics) > 1:
            docs_with_multiple_topics += 1
        else:
            docs_with_no_topics += 1
    print('Corpus assigned to a single topic:', (docs_with_1_topic / total_docs) * 100, '%')
    print('Corpus assigned to multiple topics:', (docs_with_multiple_topics / total_docs) * 100, '%')
    print('corpus assigned to no topics:', (docs_with_no_topics / total_docs) * 100, '%')
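
The logic of `cluster_test` does not depend on gensim. A document counts as assigned to every topic whose probability is at least 0.20. The stdlib-only sketch below restates that logic over plain lists of `(topic_id, probability)` pairs and returns the shares instead of printing them. The input format is an assumption chosen to mirror what `get_document_topics` returns, and `cluster_shares` is a hypothetical name. Because a document's topic probabilities sum to 1, a 0.20 threshold also means that no document can be assigned to more than five topics.

```python
def cluster_shares(doc_topics, threshold=0.20):
    """Percentage of documents assigned to one, several, or no topics.

    doc_topics: one list of (topic_id, probability) pairs per document.
    A document is assigned to every topic whose probability is >= threshold.
    """
    counts = {"single": 0, "multiple": 0, "none": 0}
    for topics in doc_topics:
        n_assigned = sum(1 for _, prob in topics if prob >= threshold)
        if n_assigned == 1:
            counts["single"] += 1
        elif n_assigned > 1:
            counts["multiple"] += 1
        else:
            counts["none"] += 1
    total = len(doc_topics)
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100 * value / total for key, value in counts.items()}


# toy topic distributions for three documents (hypothetical)
toy = [
    [(0, 0.9), (1, 0.1)],            # one dominant topic
    [(0, 0.5), (1, 0.3), (2, 0.2)],  # three topics at or above 0.20
    [(k, 0.1) for k in range(10)],   # mass spread thinly over ten topics
]
print(cluster_shares(toy))
```

With gensim, one way to build the input would be `[model.get_document_topics(doc, minimum_probability=0.0) for doc in corpus]`.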

Clustering: model_25_topics

In [5]:
cluster_test(corpus, model_25_topics)
Corpus assigned to a single topic: 42.55455712451862 %
Corpus assigned to multiple topics: 55.96919127086007 %
corpus assigned to no topics: 1.4762516046213094 %

Clustering: model_75_topics

In [6]:
cluster_test(corpus, model_75_topics)
Corpus assigned to a single topic: 55.26315789473685 %
Corpus assigned to multiple topics: 30.14548566538297 %
corpus assigned to no topics: 14.591356439880187 %

Clustering: model_150_topics

In [7]:
cluster_test(corpus, model_150_topics)
Corpus assigned to a single topic: 54.04364569961489 %
Corpus assigned to multiple topics: 17.811296534017973 %
corpus assigned to no topics: 28.145057766367138 %

Clustering: Brief Discussion

model_25_topics outperforms the other two models in that it left only 1.48% of documents unassigned to a topic. By contrast, model_75_topics left 14.59% of documents unassigned and model_150_topics left 28.15%. Additionally, model_25_topics assigned fewer documents to a single topic than the other two models. However, it assigned far more documents to multiple topics (55.97%, versus 30.15% and 17.81%). This makes it a more robust clustering system, in which a document may belong to more than one topic.

Information Retrieval Test

In [5]:
# build indices for similarity queries
index_25 = similarities.MatrixSimilarity(model_25_topics[corpus])
index_75 = similarities.MatrixSimilarity(model_75_topics[corpus])
index_150 = similarities.MatrixSimilarity(model_150_topics[corpus])

# define retrieval test
def retrieval_test(new_doc, lda, index):
    """Print the 10 corpus documents whose topic vectors are most similar to new_doc."""
    new_bow = dictionary.doc2bow(new_doc)  # change new document to bag of words representation
    new_vec = lda[new_bow]  # change new bag of words to a topic vector
    index.num_best = 10  # set index to generate 10 best results
    matches = index[new_vec]  # list of (document position, cosine similarity) pairs
    for doc_pos, score in matches:
        key = 'doc_' + str(doc_pos)
        article_dict = doc2metadata[key]
        author = article_dict['author']
        title = article_dict['title']
        year = article_dict['pub_year']
        print(key + ': ' + author.title() + ' (' + year + '). ' + title.title() + '\n\tsimilarity score -> ' + str(score) + '\n')

# set up nlp for new docs
# (spaCy 1.x API, as used to produce the results below; in spaCy 2+ the equivalents
# are spacy.load('en_core_web_sm') and spacy.lang.en.stop_words.STOP_WORDS)
nlp = spacy.load('en')
stop_words = spacy.en.STOPWORDS

def get_noun_lemmas(text):
    """Return the alphabetic, non-stop-word lemmas of the nouns in text."""
    doc = nlp(text)
    # keep common nouns (NN, NNS) and singular proper nouns (NNP); plural proper nouns (NNPS) are not kept
    noun_tokens = [token for token in doc if token.tag_ in ('NN', 'NNP', 'NNS')]
    noun_lemmas = [noun_token.lemma_ for noun_token in noun_tokens if noun_token.is_alpha]
    noun_lemmas = [noun_lemma for noun_lemma in noun_lemmas if noun_lemma not in stop_words]
    return noun_lemmas

# load and process Greene, N. E. (2017)
with open('../abstracts/greene.txt', encoding='utf8', mode='r') as f:
    greene = get_noun_lemmas(f.read())

# load and process Hollenback, G. M. (2017)
with open('../abstracts/hollenback.txt', encoding='utf8', mode='r') as f:
    hollenback = get_noun_lemmas(f.read())

# load and process Dinkler, M. B. (2017)
with open('../abstracts/dinkler.txt', encoding='utf8', mode='r') as f:
    dinkler = get_noun_lemmas(f.read())
/Users/msaxton/anaconda3/lib/python3.6/site-packages/gensim/models/ldamodel.py:495: RuntimeWarning: invalid value encountered in multiply
  gammad = self.alpha + expElogthetad * np.dot(cts / phinorm, expElogbetad.T)
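
The "similarity score" printed by `retrieval_test` is the cosine similarity computed by `MatrixSimilarity`. It compares the query's topic vector with each document's topic vector. The stdlib-only sketch below illustrates the ranking idea with hypothetical helper names and toy vectors. It does not reproduce gensim's internals, which normalize the vectors and store them as a dense matrix.

```python
import math


def to_dense(sparse_vec, num_topics):
    """Turn a sparse [(topic_id, weight), ...] list into a dense list of length num_topics."""
    dense = [0.0] * num_topics
    for topic_id, weight in sparse_vec:
        dense[topic_id] = weight
    return dense


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches(query, doc_vectors, num_topics, num_best=10):
    """Rank documents by cosine similarity between their topic vectors and the query's."""
    q = to_dense(query, num_topics)
    scored = [(doc_id, cosine(q, to_dense(vec, num_topics)))
              for doc_id, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_best]


# toy topic vectors for three documents (hypothetical)
toy_index = [
    [(0, 0.8), (1, 0.2)],
    [(1, 0.9), (2, 0.1)],
    [(0, 0.5), (2, 0.5)],
]
best = top_matches([(0, 0.7), (2, 0.3)], toy_index, num_topics=3, num_best=2)
print([doc_id for doc_id, _ in best])  # [2, 0]
```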

Finding Articles Similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

Information Retrieval: model_25_topics

In [6]:
retrieval_test(greene, model_25_topics, index_25)
doc_3297: Muilenburg, James (1944). Psalm 47
	similarity score -> 0.9694811105728149

doc_4673: Globe, Alexander (1974). The Literary Structure And Unity Of The Song Of Deborah
	similarity score -> 0.9577131867408752

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.9295309782028198

doc_5369: Gerstenberger, Erhard (1963). Review Of The Psalms In Israel'S Worship
	similarity score -> 0.9221675395965576

doc_3360: Montgomery, James A. (1945). Stanza-Formation In Hebrew Poetry
	similarity score -> 0.9174731969833374

doc_7288: Limburg, James (1997). Review Of  Jahwe Wird Kommen, Zu Herrschen Über Die Erde: Ps 90-110 Als Komposition 
	similarity score -> 0.910325288772583

doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.9049391746520996

doc_8304: Gladson, Jerry A. (1993). Review Of The Song Of Songs: A Commentary On The Book Of Canticles Or The Song Of Songs
	similarity score -> 0.8977546691894531

doc_2231: Shea, William H. (1986). Chiasmus And The Structure Of David'S Lament
	similarity score -> 0.8907470703125

doc_3092: Hyatt, J. Philip (1950). Review Of The Psalms Translated And Interpreted In The Light Of Hebrew Life And Worship
	similarity score -> 0.8892456293106079

Information Retrieval: model_75_topics

In [7]:
retrieval_test(greene, model_75_topics, index_75)
doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8950629830360413

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.8928478956222534

doc_804: Peters, John P. (1921). Another Folk Song
	similarity score -> 0.8910002708435059

doc_757: Peters, John P. (1916). Ritual In The Psalms
	similarity score -> 0.8899545669555664

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8780089616775513

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8754132986068726

doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.871891438961029

doc_9314: Peters, John P. (1910). Notes On Some Ritual Uses Of The Psalms
	similarity score -> 0.8673380017280579

doc_2970: Liebreich, Leon J. (1955). The Songs Of Ascents And The Priestly Blessing
	similarity score -> 0.8623627424240112

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8536018133163452

Information Retrieval: model_150_topics

In [8]:
retrieval_test(greene, model_150_topics, index_150)
doc_7176: Malchow, Bruce V. (1997). Review Of Psalm 102 Im Kontext Des Vierten Psalmenbuches
	similarity score -> 0.7944641709327698

doc_7288: Limburg, James (1997). Review Of  Jahwe Wird Kommen, Zu Herrschen Über Die Erde: Ps 90-110 Als Komposition 
	similarity score -> 0.7870739102363586

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.7819486260414124

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.7743778824806213

doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.7706409692764282

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.7584921717643738

doc_4324: Buss, Martin J. (1970). Review Of Studien Zur Formgeschichte Von Hymnus Und Danklied In Israel
	similarity score -> 0.7455636858940125

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.740290641784668

doc_7286: Miller, Patrick D. (1997). Review Of Die Komposition Des Psalters: Ein Formgeschichtlicher Ansatz
	similarity score -> 0.7375591397285461

doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.7220237851142883

Brief Discussion: Finding articles similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

Two documents from the corpus appear among the top ten matches for the Greene article in all three models:

  • doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
  • doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both

doc_2855 shows up as the 3rd highest match in model_25_topics (similarity score of 93.0%) and in model_150_topics (78.2%), but as the 5th highest match in model_75_topics (87.8%). doc_8205 shows up as the 7th highest match in model_25_topics (90.5%) and in model_75_topics (87.2%), but as the 10th highest match in model_150_topics (72.2%).
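
Overlaps like these can be read off mechanically. The minimal sketch below uses a hypothetical helper name. It takes each model's ranked list of document ids and reports the documents common to all of them, together with their 1-based rank in each list.

```python
def shared_matches(rankings):
    """Documents that appear in every ranking, with their 1-based rank in each."""
    common = set(rankings[0]).intersection(*rankings[1:])
    return {doc: [ranking.index(doc) + 1 for ranking in rankings] for doc in sorted(common)}


# top-10 document ids for the Greene abstract, copied from the outputs above
model_25 = ["doc_3297", "doc_4673", "doc_2855", "doc_5369", "doc_3360",
            "doc_7288", "doc_8205", "doc_8304", "doc_2231", "doc_3092"]
model_75 = ["doc_9217", "doc_1411", "doc_804", "doc_757", "doc_2855",
            "doc_123", "doc_8205", "doc_9314", "doc_2970", "doc_5418"]
model_150 = ["doc_7176", "doc_7288", "doc_2855", "doc_123", "doc_9217",
             "doc_1411", "doc_4324", "doc_5418", "doc_7286", "doc_8205"]

print(shared_matches([model_25, model_75, model_150]))
# {'doc_2855': [3, 5, 3], 'doc_8205': [7, 7, 10]}
```

The same helper applies to the other queries. For the Dinkler results, it returns only doc_8712, at ranks 8, 5, and 1.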

Finding Articles Similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another look at Leviticus 18:22 and 20:13.

Information Retrieval: model_25_topics

In [9]:
retrieval_test(hollenback, model_25_topics, index_25)
doc_5543: Weiss, David Halivni (1962).  A Note On <Rle>אשר לא ארשה<Pdf> 
	similarity score -> 0.8944346904754639

doc_672: Ginzberg, Louis (1922). Some Observations On The Attitude Of The Synagogue Towards The Apocalyptic-Eschatological Writings
	similarity score -> 0.8484905958175659

doc_2271: Peterj. Haas (1986). Review Of Support For The Poor In The Mishnaic Law Of Agriculture: Tractate Peah
	similarity score -> 0.8407023549079895

doc_7969: Morrow, William (1993). Review Of Property And The Family In Biblical Law
	similarity score -> 0.8349530100822449

doc_1594: Avery-Peck, Alan J. (1986). Review Of A History Of The Mishnaic Law Of Damages: Part 2: Baba Mesia: Translation And Explanation
	similarity score -> 0.8192698955535889

doc_3876: Goodblatt, David (1973). Review Of Theft In Early Jewish Law
	similarity score -> 0.8143787384033203

doc_8207: Kraemer, David (1991). The Formation Of Rabbinic Canon: Authority And Boundaries
	similarity score -> 0.8013389706611633

doc_2682: Lieberman, Saul (1952). The Discipline In The So-Called Dead Sea Manual Of Discipline
	similarity score -> 0.795365571975708

doc_5715: Silberman, Lou H. (1963). Review Of The Book Of Asseverations
	similarity score -> 0.7943026423454285

doc_2056: Aus, Roger David (1985). Luke 15:11-32 And R. Eliezer Ben Hyrcanus'S Rise To Fame
	similarity score -> 0.7869861721992493

Information Retrieval: model_75_topics

In [10]:
retrieval_test(hollenback, model_75_topics, index_75)
doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.7967202663421631

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.7828606367111206

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7453253865242004

doc_284: Lemos, T. M. (2006). Shame And Mutilation Of Enemies In The Hebrew Bible
	similarity score -> 0.7283123731613159

doc_1851: Kraemer, Ross S. (1985). Review Of In Memory Of Her: A Feminist Theological Reconstruction Of Christian Origins
	similarity score -> 0.7240121364593506

doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.7107797265052795

doc_143: Townsley, Jeramy (2011). Paul, The Goddess Religions, And Queer Sects: Romans 1:23—28
	similarity score -> 0.7066832780838013

doc_1974: Trible, Phyllis (1987). Review Of The Israelite Woman: Social Role And Literary Type In Biblical Narrative
	similarity score -> 0.7020665407180786

doc_6994: Corley, Kathleen E. (1996). Review Of The Double Message: Patterns Of Gender In Luke-Acts
	similarity score -> 0.6915348768234253

doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom?
	similarity score -> 0.6825031638145447

Information Retrieval: model_150_topics

In [11]:
retrieval_test(hollenback, model_150_topics, index_150)
doc_7348: Pardee, Dennis (1997). Review Of  Gottes Himmlischer Thronrat: Hintergrund Und Bedeutung Von Sôd Jhwh Im Alten Testament 
	similarity score -> 0.6676588654518127

doc_1830: Buth, Randall (1985).  Luke 19:31-34, Mishnaic Hebrew, And Bible Translation: Is Κύριοι Τοῦ Πώλου Singular? 
	similarity score -> 0.6496256589889526

doc_548: Burrows, Millar (1930). The Original Language Of The Gospel Of John
	similarity score -> 0.5791046619415283

doc_638: Baab, Otto J. (1933). A Theory Of Two Translators For The Greek Genesis
	similarity score -> 0.5729736089706421

doc_4536: Thomas, J. D. (1972). The Greek Text Of Tobit
	similarity score -> 0.5619292259216309

doc_775: Torrey, Charles C. (1935). Professor Marcus On The Aramaic Gospels
	similarity score -> 0.5567817687988281

doc_3212: Lieberman, Saul (1946). Two Lexicographical Notes
	similarity score -> 0.5522226691246033

doc_9254: Lohr, Joel N. (2011). Sexual Desire? Eve, Genesis 3:16, And תשןקה
	similarity score -> 0.5462887287139893

doc_749: Burrows, Millar (1934). Principles For Testing The Translation Hypothesis In The Gospels
	similarity score -> 0.5451783537864685

doc_387: Büchner, Dirk (2010). 'Eξιλ
	similarity score -> 0.5386683344841003

Brief Discussion: Finding articles similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another Look at Leviticus 18:22 and 20:13.

model_25_topics returned results having to do with biblical law and rabbinic interpretation. model_75_topics returned results that focus primarily on issues of gender and sexuality. Finally, model_150_topics returned results focusing on translation issues. Clearly, each model understands this article differently. All three themes (law, gender and sexuality, and translation) are present in the article, so in a sense each model is useful. Interestingly, however, only model_75_topics returned the article to which the present one is a response, and only as its 10th highest match (similarity score of 68.3%): Walsh, J. T. (2001). Leviticus 18:22 and 20:13: Who is Doing What to Whom? Journal of Biblical Literature, 120, 201-9.

Finding Articles Similar to Dinkler, M. B. (2017). Building character on the road to Emmaus: Lukan characterization in contemporary literary perspective.

Information Retrieval: model_25_topics

In [12]:
retrieval_test(dinkler, model_25_topics, index_25)
doc_6960: Collins, Adela Yarbro (1994). Review Of Teaching With Authority: Miracles And Christology In The Gospel Of Mark
	similarity score -> 0.9818847179412842

doc_8418: Chilton, Bruce (1993). Review Of Die Letzten Tage Jesu: Markus Und Johannes, Ihre Traditionen Und Die Historische Frage, Band 1
	similarity score -> 0.9816484451293945

doc_7626: Smith, D. Moody (1990). Review Of The Fourth Gospel And Its Predecessor: From Narrative Source To Present Gospel
	similarity score -> 0.9804047346115112

doc_8262: Malbon, Elizabeth Struthers (1988). Review Of Sense And Absence: Structure And Suspension In The Ending Of Mark'S Gospel
	similarity score -> 0.9803013205528259

doc_7786: Segovia, Fernando F. (1989). Review Of The Humanity Of Jesus In The Fourth Gospel
	similarity score -> 0.9789215326309204

doc_7819: Collins, Adela Yarbro (1993). Review Of  "Eine Neue Lehre In Vollmacht": Die Streit- Und Schulgespräche Des Markus-Evangeliums 
	similarity score -> 0.977328896522522

doc_5909: Donahue, John R. (1980). Review Of Mark'S Treatment Of The Jewish Leaders
	similarity score -> 0.9765888452529907

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.9748431444168091

doc_7428: Senior, Donald (1996). Review Of  Die Älteste Bericht Über Den Tod Jesu: Literarische Analyse Und Historische Kritik Der Passionsdarstellungen Der Evangelien 
	similarity score -> 0.9739815592765808

doc_8227: Black, C. Clifton (1991). Review Of Faith As A Theme In Mark'S Narrative
	similarity score -> 0.9721738696098328

Information Retrieval: model_75_topics

In [13]:
retrieval_test(dinkler, model_75_topics, index_75)
doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.8752881288528442

doc_7866: Lincoln, Andrew T. (1989). The Promise And The Failure: Mark 16:7, 8
	similarity score -> 0.8574842214584351

doc_1952: Praeder, Susan Marie (1984). Review Of Mark As Story: An Introduction To The Narrative Of A Gospel
	similarity score -> 0.8537847995758057

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.8523545265197754

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.852075457572937

doc_264: Ahearne-Kroll, Stephen P. (2010). Audience Inclusion And Exclusion As Rhetorical Technique In The Gospel Of Mark
	similarity score -> 0.839181661605835

doc_7865: Malbon, Elizabeth Struthers (1989). The Jewish Leaders In The Gospel Of Mark: A Literary Study Of Marcan Characterization
	similarity score -> 0.8106918931007385

doc_6706: Boomershine, Thomas E. (1981). Mark 16:8 And The Apostolic Commission
	similarity score -> 0.7990190982818604

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.7910127639770508

doc_3857: Robbins, Vernon K. (1973). The Healing Of Blind Bartimaeus (10:46-52) In The Marcan Theology
	similarity score -> 0.7885808348655701

Information Retrieval: model_150_topics

In [14]:
retrieval_test(dinkler, model_150_topics, index_150)
doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.8000462055206299

doc_8420: Cassidy, Richard J. (1993). Review Of Conflict In Luke: Jesus, Authorities, Disciples
	similarity score -> 0.7992616295814514

doc_7865: Malbon, Elizabeth Struthers (1989). The Jewish Leaders In The Gospel Of Mark: A Literary Study Of Marcan Characterization
	similarity score -> 0.798093855381012

doc_7897: Powell, Mark Allan (1990). The Religious Leaders In Luke: A Literary-Critical Study
	similarity score -> 0.791632890701294

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.7893933057785034

doc_7746: Swartley, Willard M. (1991). Review Of Galilee, Jesus And The Gospels: Literary Approaches And Historical Investigations
	similarity score -> 0.7860599756240845

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.7699451446533203

doc_8446: Bautch, Richard J. (2004). Review Of Pontius Pilate: Portraits Of A Roman Governor
	similarity score -> 0.7637203931808472

doc_8083: Anderson, Janice Capel (1992). Review Of Matthew'S Missionary Discourse: A Literary Critical Analysis
	similarity score -> 0.759093165397644

doc_7819: Collins, Adela Yarbro (1993). Review Of  "Eine Neue Lehre In Vollmacht": Die Streit- Und Schulgespräche Des Markus-Evangeliums 
	similarity score -> 0.7499772906303406

Brief Discussion: Finding Articles Similar to Dinkler, M. B. (2017). Building character on the road to Emmaus: Lukan characterization in contemporary literary perspective.

Each topic model retrieved documents dealing with the gospels, which on a general level is fitting for this article. One document from the corpus was retrieved by all three models:

  • doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel

model_25_topics ranked this document as its 8th highest match (similarity score of 97.5%), model_75_topics as its 5th (85.2%), and model_150_topics as its highest match (80.0%). It may seem strange that model_150_topics ranked this document first, since it is about the Gospel of John while the query article is about the Gospel of Luke. The more nuanced topics of this model appear to be picking up on the theme of literary characterization.