Using the Middle High German Conceptual Database

Revision 1.1 (May 27th, 2003)

Content

Introduction

Preamble

Search Strings

Simple Searches

Simple Search String

Search string for Word Search with preceding $-sign

Search string for Lemma Search with preceding @-sign

Search for Conceptual Categories

Word position in a text line

Punctuation

Grammar Tags

Text Idiosyncrasies
Special Symbols

Boolean Operators

Serial Searches

Context Queries

Representation of search results

Examples

Dictionary

Comments and user mail

Introduction

Preamble

When using MHDBDB you ought to be aware that the entire database system is an ongoing construction site. That means that the dictionary as well as the entire text base is subject to continuous revision and expansion. Texts are being lemmatised and disambiguated, that is, homographs are being separated and ambiguous words are assigned to a specific meaning within their respective context. From time to time, we will inform you about the ongoing work on our News Page.
In addition, we are continuously working on the program system and the help tools. We therefore urge you to send us comments and error reports so that we may improve the system and make it more user-friendly.

If you want to submit a query to MHDBDB, you first have to do three different things:

You have to submit a ("well-formulated") search string according to the following guidelines and

you have to decide whether you want to work with the Analyse Text module or with the Dictionary module.

Once you have decided to work with Analyse Text, you have to determine the texts in which the search is to take place. In doing so, you have to consider whether a text has already been fully integrated with the Dictionary. If it has not, the given text is only partially lemmatised. In that case it is best to search for 'strings' (e.g. "diet*", if you want to retrieve all occurrences of the name "Dietrich"). Please consult the module Text List on the main menu so that you know exactly which text (author, edition, manuscript) you would like to work with.
Some texts may be lemmatised but not yet disambiguated, which means that homographs have not been separated and exact meanings have not been determined on the basis of context. In such cases you may find that certain word forms cannot be found under certain categories, because they constitute a homograph and thus may belong to two different lemmas, but the specific assignments have not yet been made for a given text. If you search for a certain category in the Dictionary, you will receive all lemmas and their variants that have been assigned to this category up to that point. If the frequency shows 0 (null), the given lemma or variant has not yet been assigned a meaning in any text. Thus it is always advisable to back up your search with a search for 'strings' if you are interested in the completeness of your information.
Some texts (editions) include certain idiosyncrasies (caesuras, editorial comments, insertions, missing words/lines), about which you should inform yourself before submitting a query. When you click on the short form for the text (AH=Der arme Heinrich) in Text List, you will get to a page containing text information (edition and text idiosyncrasies). There you will find separate windows for Edition and Description, containing references to all idiosyncrasies of the given text edition. When you click on the author in Text List, you will get to a page containing information on the author.

Text Selection

The box for text selection contains three different types you can choose from:

Alle Texte (all texts) is the default setting in the first line. This selection includes all texts that are currently in the database, whether they have been lemmatised or not. Currently, only 25 texts have been fully lemmatised and categorised, especially the texts whose short code consists of only two letters (example: AH = Der arme Heinrich).
If you select the Statistics module from the main menu, you will see an overview of the current editorial status of the entire project. Naturally, you will retrieve only partial results if you search non-lemmatised texts for lemmas, categories or grammar tags rather than for strings and word forms.

Textgruppen (text groups): these are currently texts grouped by author (Hartmann von Aue, Konrad von Würzburg, Ulrich von Lichtenstein and Wolfram von Eschenbach); groupings according to text types (Heldenepik/heroic epics, Lyrik/lyric poetry, Artusepik/Arthurian epics) will follow later.

Individualised Texts: here you can select all the texts individually by mouse click, including those that are members of the text groups. You may also form your own text groups. For that purpose, simply click on the first text you want to include in that group, then drag the scroll bar on the right down until you see the last text you want to include. Hold down the shift key and click on that text. Now all texts between the first and the last will be highlighted. As a next step, eliminate the texts you do not want to include by holding down the ctrl key and clicking on each of them in turn so that they are no longer highlighted. Repeat that process until you have exactly the text group you want.

When you conduct a context search you may choose the size of your context between 1-99 words or lines before and after your keyword by changing the values in the smaller windows on the right hand side of your screen. In addition, you may adjust the settings for your output by selecting Varianten anzeigen (show variants) or Lemmas anzeigen (show lemmas), which will give you an output showing either lemmas with all their variants or the lemma forms only. Furthermore, you may determine whether your output shows the results for each text individually (Einzeltexte anzeigen) (show individual texts), or according to text groups (Textgruppen anzeigen), or simply as the sum total for all texts (Summe aller Texte).

Please keep in mind that the speed of your returns depends on two things:

the number and the length of texts (for instance, FD, KU, LZ, NB, PZ, TR or WH are very long) you investigate.

the time of the day (at certain times of peak usage the speed of data transport on the Internet may be very slow).

Below the project image you can find links to additional Help Pages for both the Dictionary and the Analyse Text modules. There are also two help functions for finding conceptual categories, Browse Categories and Search Word in Category System, and a page explaining the grammatical categories that may be searched, Explain grammar tags.

Search Strings

Simple Searches

All queries that submit simple search strings only or combine such simple search strings through Boolean operators, are "simple searches." With a "simple search string" you may, for instance, submit a string of letters of which a lemma or one of its variants may consist. Through Boolean operators you may determine which search strings may be connected with each other or may exclude each other. The search possibilities with such "simple search strings" are explained in detail in the following paragraph.

Simple Search String
A query for haben will retrieve all occurrences of this word form as well as other forms that are based on the same lemma, along with all words that include the string of letters "haben" (e.g. the negation "enhaben") and their derivatives. The search string may also include wild cards or jokers (* = word or ? = character, for example, hab*, *abe*, ?aben), but in this case, word combinations containing the string are no longer given. Instead, you may retrieve words which you may not have intended to get (e. g. "habech").
Search string for words preceded by a $-sign
The query for $haben, for instance, provides you exclusively with references for exactly the word form "haben"; neither variants of the lemma "haben" nor compounds with "haben" will be shown. You may use wild cards or jokers, however, which may yield different results.
Search string for lemmas preceded by a @-sign
The search for @haben, for instance, provides you with all references for the lemma "haben" as well as its variants. However, the number of variants only reflects the current status of the lemmatisation of the text base. By using wild cards you may retrieve other variants for the lemma that have not yet been lemmatised. Thus, for instance, a search for @hab* will yield more variants for the lemma "haben", along with variants of other lemmas that begin with the string "hab", such as variants of the lemma "habech" (hawk).
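The effect of the two wild cards can be tried out with a small sketch. It uses Python's standard fnmatch module as a stand-in, which is an assumption: the database's own matching may differ, and in fnmatch the * stands for any run of characters. The word list is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical word forms, used only for illustration.
WORD_FORMS = ["haben", "habe", "habech", "enhaben", "gaben"]

def wildcard_matches(pattern, forms):
    """Return the forms matched by a pattern that uses * and ? as wild cards."""
    return [form for form in forms if fnmatchcase(form, pattern)]

print(wildcard_matches("hab*", WORD_FORMS))   # ['haben', 'habe', 'habech']
print(wildcard_matches("?aben", WORD_FORMS))  # ['haben', 'gaben']
```

Note how hab* also catches "habech", the kind of unintended hit mentioned above.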
Search for Conceptual Categories
Before you search for conceptual categories you should familiarise yourself with the overall conceptual system. Each category (that is, its respective number code) or a combination of categories may be used as a search input. If you search for the category 2322, you will retrieve all words that have something to do with the concept 'horse and horsemanship'. Category numbers may also end in a wild card, since the numbering reflects the hierarchic order of the system. For instance, the search for 1402* will yield word material for all subcategories of the category 1402 ('birds') but not for the category 1402 itself. You may get your number codes by clicking Browse Categories. All you need to do is highlight and copy the selected category number in that window and paste it into your search window. You may also search for words or strings in the category system by clicking on Search Word in Category System. If you enter, for instance, the word Namen into the little search window for that module, you will get all categories for names (family names, place names, etc.) on your screen.
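The wild card rule for category numbers can be sketched as a simple prefix test. This is only an illustration of the rule as stated above, not the database's implementation; the subcategory codes 14021 and 14022 are invented.

```python
def category_matches(query, code):
    """Check a category code against a query such as '2322' or '1402*'."""
    if query.endswith("*"):
        prefix = query[:-1]
        # Per the rule above, the wild card yields subcategories only,
        # not the category itself.
        return code.startswith(prefix) and code != prefix
    return code == query

# 14021 and 14022 are hypothetical subcategory codes.
codes = ["1402", "14021", "14022", "1403", "2322"]
print([c for c in codes if category_matches("1402*", c)])  # ['14021', '14022']
print([c for c in codes if category_matches("2322", c)])   # ['2322']
```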
Word Position Within Text Line
You may also search for words that occur in a certain position within a text line. This is especially interesting in texts that are written in metres and are using rhymes. The search string #3, for instance, yields all words that occur in the third position of each line in a given text. The query inputs #=3 or # = 3 are the equivalents for the first input. The search inputs # <3 or #>3 provide you with all words that occur either immediately before or following the third position in a line. Searching for #a gives you all words that occur at the beginning of a text line (the search string #1 will yield the same results), whereas #e provides you with all words occurring at the end of a line. The search string #m gives you all words at the beginning of a paragraph or section (for instance, words that tend to be written with a beginning capital letter); the input #v or #n finds all words before or after a caesura (a metric phenomenon that occurs in a small number of texts only).
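The numeric position selectors can be pictured with the following sketch, which returns the 1-based word positions a selector picks out in a line of a given length. It assumes that < and > select all positions below or above the given number; the selectors #m, #v and #n are left out because they depend on text markup.

```python
import re

def positions(selector, line_length):
    """Return the 1-based word positions selected in a line of the given length."""
    selector = selector.replace(" ", "")  # '# = 3' and '#=3' are equivalent
    if selector == "#a":
        return [1] if line_length >= 1 else []
    if selector == "#e":
        return [line_length] if line_length >= 1 else []
    m = re.fullmatch(r"#(=|<|>)?(\d+)", selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector}")
    op, n = m.group(1) or "=", int(m.group(2))
    all_positions = range(1, line_length + 1)
    if op == "=":
        return [p for p in all_positions if p == n]
    if op == "<":
        return [p for p in all_positions if p < n]
    return [p for p in all_positions if p > n]

print(positions("#3", 5))     # [3]
print(positions("# = 3", 5))  # [3]
print(positions("#>3", 5))    # [4, 5]
print(positions("#e", 5))     # [5]
```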
Punctuation
Search strings for punctuation marks must always be placed within square brackets. The string [!?.] finds sentence endings marked by an exclamation mark, question mark or full stop. The input [*] yields all punctuation marks; it is equivalent to [.!?,;:-"'()]. The input ["] or [<] yields the opening quotation mark, that is, the beginning of direct speech, whereas [>] finds the closing quotation mark at the end of direct speech. For quotations within direct speech the texts use French quotation marks, << = opening quote and >> = closing quote. You may search for these with the following search strings: [{] or [X] for the opening quote; [}] or [Y] for the closing quote. The single quotation mark or apostrophe appears in contracted word forms (e.g. d'in). You may search for the apostrophe as for any regular character. The search string *'*, for instance, provides you with all words that contain an apostrophe.
Grammar Tags
You may search for words belonging to a certain word type by placing the abbreviated form (3 capital letters) of a grammar tag between the less-than and greater-than symbols. For instance, the string <ADJ> provides you with all adjectives. The grammar tags are: NOM (noun), NAM (name), ADJ (adjective), ADV (adverb), ART (article), CNJ (conjunction), CPA (comparative particle, e. g. als, wie), DET (determiner), DIG (digit), GRA (gradation particle, e. g. vil), INJ (interjection), IPA (interrogative pronoun), NEG (negation), NUM (numeral), POS (possessive pronoun), PRO (personal pronoun), PRP (preposition), VRB (verb), VEX (auxiliary), VEM (modal auxiliary). You may open an extra help window containing this list by clicking Explain grammar tags.
Text Idiosyncrasies
Text idiosyncrasies may also be searched from the Analyse Text module. Omissions or gaps in a text are marked by three dots (...). Those may be searched via the search string [ _ ], that is, an underscore within brackets. Capital letters at the beginning of lines can be searched with #l, and major capitals at the beginning of chapters or sections may be found through the input #c. Emendations or insertions by the editors are set in italics in the electronic texts. They may be searched through the string #m.
Special Characters
Special characters are not represented in all texts in the same way as they appear in the original edition. For instance, the superscripted «o» above the «u» is generally represented as «uo» in the electronic texts. «ß» appears in some texts as «s». «y» with lengthening sign appears as «Ý», that is, as "Y with superscripted accent".
In searches, you may substitute «s.» for «ß» if «ß» is too cumbersome to produce on your keyboard. (Example: $has. finds all occurrences of haß.)
Since producing umlauts and vowels with lengthening sign may be equally cumbersome on some keyboards, you may always use a following «.» (period) for umlauts and a following «-» (hyphen) for the lengthening sign. (Example: ü = u. / â = a-)
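The keyboard notation can be read as a simple two-character substitution, sketched below. Only the pairs u. = ü, a- = â and s. = ß are taken from this guide; the remaining entries of the table extend the same rule to the other vowels and are an assumption.

```python
# Substitution table. Only "u.", "a-" and "s." are documented in this guide;
# the other pairs are assumed to follow the same rule.
SUBSTITUTIONS = {
    "a.": "ä", "o.": "ö", "u.": "ü",  # period after a vowel = umlaut
    "s.": "ß",                        # period after s = ß
    "a-": "â", "e-": "ê", "i-": "î", "o-": "ô", "u-": "û",  # hyphen = lengthening sign
}

def expand_keyboard_notation(query):
    """Replace each two-character keyboard pair with the character it stands for."""
    out = []
    i = 0
    while i < len(query):
        pair = query[i:i + 2]
        if pair in SUBSTITUTIONS:
            out.append(SUBSTITUTIONS[pair])
            i += 2
        else:
            out.append(query[i])
            i += 1
    return "".join(out)

print(expand_keyboard_notation("$has."))   # $haß
print(expand_keyboard_notation("ku.nec"))  # künec
```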

Boolean Operators

By means of the Boolean operators 'und' (and) (&) and 'oder' (or) (|) you may connect individual search strings with each other. For instance, the combined search string $ich & #a retrieves all references for "ich" at the beginning of a text line; the combined string 1402* | 1403* will give you all words that belong to a subcategory of either 1402 (birds) or 1403 (fishes). In an extended search string, you may use as many Boolean operators as you wish in order to combine simple search strings (see above) with each other. A valid extended search string would be, for instance, haben | [;:!] | 1402* & #e. Whenever an extended search string contains the Boolean operator & as well as |, the logical 'und' (and) takes priority over the logical 'oder' (or): e. g. 1402 & #e | $ha* is identical to the statement (1402 & #e) | $ha*. If you want to set a different priority, you have to use parentheses in your statement, for instance, 1402 & (#e | $ha*).
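The priority rule can be tried out with Python sets, whose & and | operators happen to follow the same precedence (& binds more tightly than |). The sets of word ids below are invented and merely stand in for the hits of three simple search strings.

```python
# Hypothetical sets of word ids matched by three simple search strings.
category_1402 = {1, 2, 3}  # stands in for: 1402
line_end = {2, 3, 7}       # stands in for: #e
ha_forms = {3, 9}          # stands in for: $ha*

default = category_1402 & line_end | ha_forms      # no parentheses
explicit = (category_1402 & line_end) | ha_forms   # same grouping, spelled out
regrouped = category_1402 & (line_end | ha_forms)  # different priority

print(sorted(default))    # [2, 3, 9]
print(sorted(explicit))   # [2, 3, 9]
print(sorted(regrouped))  # [2, 3]
```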

Please, be careful when using the Boolean 'and'-operator, since you may not always arrive at meaningful statements. For instance, the extended search string [*] & haben would not be meaningful, since a word cannot be a punctuation mark and a variant of "haben" at the same time. The Boolean operator 'und' (and) should generally be used in order to link otherwise unmarked search strings with conceptual categories or to link conceptual categories with text line positions.

Examples of meaningful uses of the Boolean 'und' (and) operator are: haben & niht, by which you will find all word combinations of "haben" and "nicht"; 21071 & 2322, which searches for intransitive verbs within the conceptual area "Horse and Horsemanship"; <NOM> & <ADJ>, which retrieves words that may be both nouns and adjectives; and #>2 & #<8, which searches for all words in positions 3 to 7 of a text line.

Serial Searches

Serial searches allow you to search for a series of two or more words within the text base. Such serial search statements consist of a series of simple search statements separated by commas, e. g. ich, 21071 | haben, #e. The words you search for must occur in the same text line and in the same sequence as stated in the serial search string. However, you may also allow for intervals of one or more words between the words of the search series by using wild cards, as, for instance, in the query ich, *, 21071 | haben, *, *, #e. This query searches for the word 'ich' or any of its variants followed by any word, followed by a word within the conceptual category 21071 (intransitive verb) or the word 'haben' or any of its variants, followed by any two words, followed by the last word in a text line.

It is also possible to set an upper limit for the number of words in the interval between two words in a serial search. For instance, the query ich, {1}, 21071 | haben, *, *, {3}, #e will find all occurrences of the word 'ich' or any of its variants followed by a maximum of one word, followed by a word within the conceptual category 21071 (intransitive verb) or the word 'haben' or any of its variants, followed by at least two but no more than five words, followed by the last word in a text line.

Naturally, the maximum number of words within a serial search is limited by the context you have set for the search. For instance, if your context is set to "word" and the size of your context to 4, your serial search may not have more than two words between the first word in your query statement and the last word within a text line.
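The sequence and interval rules of serial searches can be sketched as a small recursive matcher over the words of one text line. This is a simplified illustration, not the database's algorithm: plain strings are matched as literal word forms with wild cards (no lemma or category lookup), an integer n stands for {n}, and the sample line is invented.

```python
from fnmatch import fnmatchcase

def match_at(words, start, pattern):
    """True if the pattern matches the words beginning exactly at index start."""
    if not pattern:
        return True
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, int):
        # {n}: skip between 0 and n arbitrary words.
        return any(match_at(words, start + k, rest) for k in range(head + 1))
    if start >= len(words):
        return False
    if head == "*":
        # Exactly one arbitrary word.
        return match_at(words, start + 1, rest)
    if head == "#e":
        # The last word of the line.
        return start == len(words) - 1 and match_at(words, start + 1, rest)
    return fnmatchcase(words[start], head) and match_at(words, start + 1, rest)

def serial_search(words, pattern):
    """True if the pattern matches somewhere within the line."""
    return any(match_at(words, i, pattern) for i in range(len(words)))

line = "ich wil haben guot gemach".split()  # hypothetical text line
print(serial_search(line, ["ich", 1, "hab*", "*", "#e"]))  # True
print(serial_search(line, ["ich", "hab*"]))                # False
```

The second query fails because "ich" and "haben" are not adjacent and no interval was allowed between them.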

Context Queries

Context queries are the most common type of queries. They may include any number of simple searches or 'und/oder' (and/or) queries that must be separated by a plus sign. For instance, the statement ich, habe + <NAM> combines a serial search statement with a simple search statement.

A context query finds any context that combines words determined by the query statements within a given context frame. For example, when the context unit is set to "Zeilen" (lines) and the context scope to "3", the query string alphart + dietrich will find all context areas, where these two names (or their variants) appear with no more than two lines in between.
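The line-based context frame in this example amounts to a distance test on line numbers, as the following sketch shows. It assumes that a scope of 3 lines means a distance of at most 3, which is how "no more than two lines in between" is read here; the line numbers are invented.

```python
def cooccurrences(lines_a, lines_b, scope):
    """Pair up line numbers of two hit lists that lie within the given scope."""
    return [(a, b) for a in lines_a for b in lines_b if abs(a - b) <= scope]

# Hypothetical line numbers of hits for two names.
alphart_lines = [10, 40]
dietrich_lines = [13, 44]

# Lines 10 and 13 have two lines in between; lines 40 and 44 have three.
print(cooccurrences(alphart_lines, dietrich_lines, 3))  # [(10, 13)]
```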

However, you have to consider one special condition, when submitting a context search: If your context search contains simple search strings, it will retrieve only one of these criteria per word for each context frame. For instance, the query $a* + $*e searches for at least one word that begins with the character "a" and one that ends with the character "e" within the context frame. If a word happens to begin with the character "a" and to end with an "e", the query will search at least for one more word that ends with an "e". In other words, the word that fulfills both search criteria will be found only for the search for beginning $a*, not for the search for ending with $*e.

The limitation that simple search criteria may not overlap each other does not apply to serial searches. For example, when you submit the query $a*, $*e, you are searching for a word that begins with "a" followed by a word ending in "e". Since there might be too many overlaps, you must not combine serial searches with each other by using a + sign.
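The non-overlap condition for simple search strings in a context query can be pictured as follows. This is one possible reading of the rule, sketched as a greedy pass in which each word is used up by the first still-open criterion it satisfies; the actual procedure of the system is not documented here, and the sample words are invented.

```python
from fnmatch import fnmatchcase

def frame_matches(frame_words, criteria):
    """Greedy check: each word may satisfy at most one criterion."""
    open_criteria = list(criteria)
    for word in frame_words:
        for pattern in open_criteria:
            if fnmatchcase(word, pattern):
                open_criteria.remove(pattern)  # this word is now used up
                break
    return not open_criteria

# "abe" begins with "a" and ends with "e", but counts only for a*.
print(frame_matches(["abe", "art"], ["a*", "*e"]))  # False
print(frame_matches(["abe", "ere"], ["a*", "*e"]))  # True
```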

The Representation of Query Results

Your query results are furnished either in form of a table (simple searches) or in form of a simple line (serial searches and context searches). In both cases, the figures in the second column and further columns to the right represent the sum of occurrences for each individual text or text group. The column on the far right contains the sum total for all texts or text groups. If a text belongs to more than one text group, there will be two columns for the same text, but the sum total will contain the frequency for this text only once. The columns for text groups and those for the sum total are highlighted in different colors.
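The way the sum total treats a text that belongs to more than one text group can be illustrated with a short sketch. The group names and frequencies are invented; the point is only that group columns count a shared text separately, while the sum total counts it once.

```python
def group_sums(freq_by_text, groups):
    """Sum of occurrences per text group (a shared text counts in each group)."""
    return {name: sum(freq_by_text[t] for t in texts) for name, texts in groups.items()}

def sum_total(freq_by_text, groups):
    """Sum over all selected texts, counting each text only once."""
    selected = set().union(*groups.values())
    return sum(freq_by_text[t] for t in selected)

# Hypothetical frequencies and groups; AH belongs to both groups.
freq = {"AH": 4, "PZ": 5, "TR": 2}
groups = {"Group 1": {"AH", "TR"}, "Group 2": {"AH", "PZ"}}

print(group_sums(freq, groups))  # {'Group 1': 6, 'Group 2': 9}
print(sum_total(freq, groups))   # 11
```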

In simple searches frequencies of occurrence are given for each variant separately. The lemma line, which is highlighted, contains the sum total for the entire lemma/variant group. If you click on the lemma, you will get to the entry for the entire lemma/variant group in the Dictionary. There you will find all variants, not only the ones that occur in the given text selection, along with all possible meanings of the lemma.

When you click on a frequency number, you will arrive at text references only if the number appears in blue. Any frequency number higher than 1000 will appear in black, which means that you cannot go directly to those text references, since the output would simply be too large. For those occurrences you will have to limit your search by setting the maximum number of references to less than 1000. Depending on the number of references you retrieve, you may either see a compacted list containing one line for each reference, or you may directly get full references within a larger context. In all text references the words fulfilling the search criteria are set off from the context by a different color. When your results appear in the form of a compacted list, you may get to the individual full text references by clicking on the blue text line numbers. You also have the option of selecting your own sublist by checking the little boxes next to the line numbers and then clicking the button "Auswahl anzeigen" (Show selection) at the bottom of the list.

Examples

Below you will find a number of query examples that were submitted for test purposes and which yield real results. In parentheses you find the short codes for the texts, which had been selected for the queries. In some cases we also give you the context parameters that had been set for the searches. We recommend that you actually submit these examples in order to get a feel for how to work with the information system.

@guot (KW,ML,UL) gives you all variants for the lemma "guot".

$? | [*] (KW,ML,UL) retrieves all words that consist only of one character, plus all punctuation marks.

<NAM> (KW) finds all names in the given text.

<NAM> & #e (KW) finds all names that occur at the end of a line in the given text.

haben (AT, AX) finds all variants for "haben".

haben & nicht (Alle Texte) retrieves from all texts all compound words that contain the components "haben" and "nicht".

kleinez, puoch (Alle Texte) finds in all texts all references "puoch" (or its variants as well as compounds containing it as a component), preceded by "kleinez" (or its variants as well as compounds containing it as a component).

$a*, $e* (AT) finds all words beginning with the character "a", followed directly by a word beginning with "e".

$a*, *, $e* (AT) is the same query as above allowing one word to stand in between the first and second word defined by the search criteria.

gab (Alle Texte) finds in all texts all variants and compounds containing "gab".

gab + gab (Alle Texte; Kontext: 1 Zeile) finds all lines that contain at least 2 variants of or compounds with "gab".

gab + gab + gab (Alle Texte; Kontext: 1 Zeile) finds all lines that contain at least 3 variants of or compounds with "gab".

$a* + $a* + $e* + $e* (AT; Kontext: 1 Zeile) finds all lines that contain at least 2 words beginning with "a" and at least two words beginning with "e".

sifrit (Alle Texte; Kontext: 1 Zeile) finds all variants for and compounds with "sifrit".

sifrit + gunther (Alle Texte; Kontext: 3 Zeilen) finds all references, where "sifrit" or its variants and compounds cooccur with "gunther" or its variants or compounds within a context frame of 3 lines.

<NAM>+<NAM> (KW, ML, UL; Kontext: 1 Zeile) finds all lines that contain at least 2 names.

231125 + 2412 (Hartmann; Kontext: 1 Zeile) retrieves all personal names that cooccur with a name for a country within one line.

24321 + 23231 (Alle Texte; Kontext: 2 Zeilen) retrieves from all texts references where words from the category "high nobility" cooccur with words from the category "weapons" within a context of 2 lines.

künec + 2433 (Alle Texte; Kontext: 2 Zeilen) retrieves from all texts references where the word "künec" or one of its variants cooccurs with a word from the category "law" within a context of two lines.

24321 + $milt* (Alle Texte; Kontext: 2 Zeilen) retrieves from all texts references where a word from the category "high nobility" cooccurs with the word "milt" or one of its variants within a context of 2 lines.

*en & #m (EIL) finds all words in Eilhart's Tristrant that have been either altered or added by the editor and which end in -en.

Dictionary

You may work with the dictionary in the same way as with the Analyse Text module, except for all queries that deal with context or text idiosyncrasies. The dictionary provides you with lemmas and all their possible variants that have been accounted for within the text base up to the current state of the lemmatisation process. The frequencies behind each variant reflect this current state of lemmatisation, except for homographs that may not have been disambiguated. In addition, the dictionary provides you with all possible meanings for each lemma selected. A "meaning" may consist of one or several conceptual categories (e.g. vuoz = 1. 2103 = Körper und Gliedmaßen (body and bodily parts); 2. 312412 = Formen (forms) / 315 = Raum (space); 3. 3134 = Maße und Gewichte (measures and weights); 4. 2512 = Literatur (literature)). In the search window for the dictionary you may enter character strings containing the wild cards * or ?, lemmas preceded by @, or simply words preceded by $. You will always retrieve a lemma or a selection of lemmas that meet the given search criteria. From the group of selected lemmas you may click on any given lemma to arrive at its full entry. In addition, you may also submit categories (e.g. 231125 = Ehe/Familie/Namen (marriage/family/names)) and retrieve all lemmas (proper names) that have been included in the dictionary according to the current state of lemmatisation. Attention: the list of all proper names exceeds the maximum table space by far! Therefore, you may want to call up the names in groups according to the letter of the alphabet with which they begin. Thus, for instance, the entry 231125 & a* retrieves all names that begin with the letter "a".

Comments and questions:
Horst Pütz - puetz@germsem.uni-kiel.de or
Klaus M. Schmidt - schmidt@bgnet.bgsu.edu
