Using the Middle High German Conceptual Database
Revision 1.1 (May 27th, 2003)
Introduction
Preamble
When using MHDBDB you ought to be aware that the entire database
system is an ongoing construction site. That means that the dictionary
as well as the entire text base is subject to continuous revision and
expansion. Texts are being lemmatised and disambiguated, that is,
homographs are being separated and ambiguous words are assigned to a
specific meaning within their respective context. From time to time, we
will inform you about the ongoing work on our News
Page.
In addition, we are continuously working on the program
system and the help tools. Therefore, we are urging you to send us
comments and error reports so we may
improve the system and make it more user friendly.
If you want to submit a query to MHDBDB, you first have to do
three different things:
- You have to submit a ("well-formulated") search string according to
the following guidelines and
- you have to decide whether you want to work with the Analyse Text
module or with the Dictionary
module.
- Once you have decided to work with
Analyse Text, you have to determine the texts in which the
search is to take place. In doing so, you have to consider
whether a text has already been fully integrated with
the Dictionary
or not. If it has not, the given text
is only partially lemmatised. In that case it is best to search for
'strings' (e.g. "diet*", if you want to retrieve all occurrences of
the name "Dietrich").
Please consult the Text List module on
the main menu so that you know exactly which text (author,
edition, manuscript) you would like to work with.
Some texts may be lemmatised, but not yet disambiguated, which
means homographs have not been separated and exact meanings have not
been determined on the basis of context. In such cases you may find that
certain word forms cannot be found under certain categories, because
they constitute a homograph and thus may belong to two different
lemmas, but the specific assignments have not yet been made for a
given text. If you search for a certain category in the Dictionary,
you will receive all lemmas and their variants that have been assigned to
this category up to
that point. If the frequency shows 0 (zero), the given lemma or
variant
has not been assigned a meaning in any text yet. It is therefore
advisable to back up your search with a search for 'strings' if
completeness matters to you.
Some texts
(editions) include certain idiosyncrasies (caesuras, editorial comments,
insertions,
missing words/lines), which you should check before
submitting a query. When you click on the short form for a text (e.g. AH = Der
arme Heinrich) in Text List,
you will get to a page containing text information (edition and text
idiosyncrasies). There you will find
separate windows for Edition and Description, containing
references to all idiosyncrasies of the given text edition. When you
click on the author in
Text
List, you will get to a page containing information on the author.
Text Selection
The box for text selection contains three different types you can
choose from:
- Alle Texte (all texts)
is the default setting in the
first line. This selection includes all texts that are currently in the
database, whether they have been lemmatised or not. Currently, only 25
texts have been fully lemmatised and categorised, especially the texts
whose short code consists of only two letters (example: AH = Der arme
Heinrich).
If you select the Statistics module from the main menu,
you will see an overview of the current editorial status of the
entire project. Naturally, you will retrieve only partial results if you
search non-lemmatised texts for lemmas, categories, or grammar tags
instead of strings and word forms.
- Textgruppen (text groups):
these are currently texts that are
grouped according to authors (Hartmann von Aue, Konrad
von Würzburg, Ulrich von Lichtenstein and Wolfram von Eschenbach);
groupings according to text types (Heldenepik/heroic epics, Lyrik/lyric poetry,
Artusepik/Arthurian epics) will follow later.
- Individualised Texts:
here you can select any text
individually by mouse click, including those that are members of text groups. You may
also form your own text groups. To do so, click on
the
first text you want to include in the group, then drag the scroll bar on the
right down until you see the last text
you want to include. Hold down the shift key and click on that text.
Now all texts between the first and the
last will be highlighted. Next, remove the
texts you
don't want to include by holding down the ctrl key and clicking
on each of them in turn so that they are no longer highlighted. Repeat this
until you have exactly the text group you want.
When you conduct a context search, you may set the size of your
context to between 1 and 99 words or lines before and after your keyword by
changing the values in the
smaller windows on the right-hand side of your screen. In addition, you
may adjust the settings for your output by selecting Varianten
anzeigen (show variants) or Lemmas anzeigen (show
lemmas), which will give you an output showing either lemmas with all
their variants or the lemma forms only. Furthermore, you may
determine whether your output shows the results for each text
individually (Einzeltexte anzeigen) (show individual
texts), or according to text groups (Textgruppen anzeigen), or simply as
the sum total for all texts (Summe aller Texte).
Please keep in mind that the speed of your returns depends
on two things:
- the number and the length of the texts you investigate (for instance, FD,
KU, LZ, NB, PZ, TR or WH are very long);
- the time of day (at times of peak usage,
data transport on the Internet may be very slow).
You can find links to additional Help Pages for
both the Dictionary and the Analyse Text modules below the project image,
along with two help functions for finding conceptual categories, Browse
Categories and Search Word in
Category System, and a page explaining the grammatical
categories that may be searched, Explain grammar
tags.
Search Strings
Simple Searches
All queries that submit only simple search strings, or combine
such strings through Boolean operators, are "simple
searches." With a "simple search string" you may, for instance, submit a
string
of letters of which a lemma or one of its variants may consist. Through
Boolean operators you may determine which search strings are to be
connected with each other or are to exclude each other. The search
possibilities with such "simple search strings" are explained in detail
in the following list.
- Simple Search String
A query for haben will retrieve all occurrences of
this word form as well as other forms that are based on the same
lemma, along with all words that include the string of letters
"haben" (e.g. the negation "enhaben") and their derivatives. The search
string may also include wild cards or jokers (* = any string of characters, ? = a single character, for
example, hab*,
*abe*, ?aben), but in this case, word
combinations containing the string are no longer given. Instead, you may
retrieve words which you did not intend to get (e.g.
"habech").
- Search string for words
preceded by a $-sign
The query for $haben, for instance, provides you
exclusively with references for exactly the word form "haben";
neither variants of the lemma "haben" nor compounds with "haben" will
be shown. You may still use wild cards or jokers, which may
yield different results.
- Search string for lemmas preceded by a
@-sign
The search for @haben, for instance, provides you with all
references for the lemma "haben" as well as its variants. However, the
number of variants only reflects the current status of the lemmatisation
of the text base. By using wild cards you may retrieve other variants for
the lemma that have not yet been lemmatised. Thus, for instance, a search
for @hab* will yield more variants for the lemma
"haben", along with variants of other lemmas that begin with the string
"hab", such as variants of the lemma "habech" (hawk).
- Search for Conceptual
Categories
Before you try to search for conceptual categories, you should
familiarise yourself with the overall conceptual
system. Each category (that is, its respective number code) or a
combination of categories may be used as a search input. If you search for
the category 2322, you will retrieve all words that have
something to do with the concept 'horse and horsemanship'. Category
numbers may also end in a wild card, since the numbering reflects
the hierarchic order of the system. For instance, the search for
1402* will yield word material for all subcategories of
the category 1402 ('birds') but not for the
category 1402 itself. You may get your number codes by clicking
Browse
Categories. All you need to do is to highlight and copy the
selected category number in that window and paste it into your search
window. You may also search for words or strings in the category system
by clicking on Search Word in
Category System. If you enter, for instance, the
word Namen into the little search window for that module, you
will get all categories for names (family names, place names, etc.) on
your screen.
- Word Position Within Text Line
You may also search for words that occur in a certain position within a
text line. This is especially interesting in texts that are written in
metre and use rhyme. The search string #3, for
instance, yields all words that occur in the third position of each line
in a given text. The query inputs #=3 or # = 3
are equivalent to the first input.
The search inputs #<3
or #>3 provide you with all
words that occur before or after the third
position in a line, respectively. Searching for #a gives you all words
that occur at the beginning of a text line (the search string
#1 will yield the same results), whereas
#e provides you with all words occurring at the end of
a line. The search string #m gives you all words at
the beginning of a paragraph or section (for instance, words that
tend to be written with a beginning capital letter); the input
#v or #n finds all
words before or after a caesura (a metric phenomenon that occurs
in a small number of texts only).
- Punctuation
Search strings for punctuation marks always have to be placed
within square brackets. The string [!?.] finds the respective
sentence endings, where an exclamation mark, question mark or full stop is
used. The input [*] yields all punctuation marks; it
is identical to [.!?,;:-"'()]. The input
["] or [<] yields the opening quotation mark,
that is, the beginning of direct
speech, whereas [>] finds the closing quotation
mark that stands at the end of direct speech. For quotations
within direct speech the texts show the French quotation marks, <<
= opening quote and
>> =
closing quote. You may search for these with the following
search strings: [{] or [X] for the opening quote; [}]
or [Y] for the
closing quote. The single quotation mark or apostrophe appears in
contracted word forms (e.g. d'in). You may search for the apostrophe
as for any regular
character. The search string *'*, for instance, provides you with
all words that contain an apostrophe.
- Grammar Tags
You may search for words belonging to a certain word class by placing
the abbreviated form (3 capital letters) of a grammar tag between
less-than and greater-than signs. For instance, the string
<ADJ> provides you with all adjectives.
The grammar tags are:
NOM (noun), NAM (name), ADJ (adjective),
ADV (adverb), ART (article), CNJ (conjunction),
CPA (comparative particle, e.g. als, wie), DET
(determiner), DIG (digit), GRA (gradation particle, e.g.
vil), INJ (interjection), IPA (interrogative
pronoun),
NEG (negation), NUM (numeral), POS
(possessive pronoun), PRO (personal pronoun), PRP
(preposition), VRB (verb), VEX (auxiliary),
VEM (modal auxiliary). You may open an extra help window
containing this list by clicking Explain
grammar tags.
- Text Idiosyncrasies
Text idiosyncrasies may also be searched from the Analyse Text module.
Omissions or gaps in a text
are marked by three dots (...). Those may be searched via the
search string [ _ ] = underscore within
brackets. Capital letters at the beginning of lines can be
searched with #l and major capitals at the beginning of
chapters or sections may be found through the input #c.
Emendations or insertions by the editors are set in
italics in the electronic texts. They may be searched through the
string #m.
- Special Characters
Special
characters are not
represented in all texts in the same way as they appear in the
original edition. For instance, the superscript «o»
above the «u» is generally represented as
«uo» in the electronic texts.
«ß» is simplified in some texts to
«s». «y» with lengthening sign
appears as «Ý», that is, as "Y with
superscript accent".
In searches, you may substitute «s.» for
«ß» if «ß» is too cumbersome
to produce on your keyboard (example: $has. finds all
occurrences of haß).
Since producing umlauts and vowels
with lengthening sign may be equally cumbersome on some keyboards, you may
always use a following «.» (period) for umlauts and a
following «-» (hyphen) for the lengthening sign
(example: ü =
u. / â = a-).
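The keyboard substitutions described in the last item can be sketched as a small helper. This is only an illustration: the function name `to_keyboard_form` is hypothetical and not part of MHDBDB, and the mapping covers just the cases named above (umlaut = vowel plus period, lengthening sign written as a circumflex = vowel plus hyphen, ß = s plus period).

```python
import unicodedata


def to_keyboard_form(query: str) -> str:
    """Rewrite umlauts, circumflexed vowels and ß with the '.' and '-' conventions.

    Hypothetical helper; it only mirrors the substitutions named in this help page.
    """
    out = []
    for ch in query:
        if ch == "ß":
            out.append("s.")
            continue
        # Split the character into its base letter and any combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if "\u0308" in marks:      # combining diaeresis (umlaut)
            out.append(base + ".")
        elif "\u0302" in marks:    # combining circumflex (lengthening sign)
            out.append(base + "-")
        else:
            out.append(ch)         # everything else is passed through unchanged
    return "".join(out)


print(to_keyboard_form("$haß"))   # $has.
print(to_keyboard_form("künec"))  # ku.nec
```

Whether the database accepts every such rewritten form depends on how the respective text was encoded, so a rewritten query should still be checked against the text information page.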
Boolean Operators
By means of the Boolean operators 'und' (and)
(&) and 'oder' (or)
(|) you may connect individual search strings with
each other. For instance, the combined search string $ich & #a
retrieves all references for "ich" at the beginning of a text
line; the combined string 1402* | 1403* will give you
all words that belong either to a subcategory of 1402 (birds) or
to a subcategory of 1403 (fishes). In an extended search string,
you may use as many Boolean operators as you wish in order to combine
simple search strings (see above) with each other. A valid extended
search string would be, for instance, haben | [;:!] |
1402* & #e. Whenever an extended search string contains both the
Boolean operator & and the operator |,
the logical 'und' (and) takes priority over the logical 'oder'
(or): e.g. 1402 & #e |
$ha*
is identical to the statement (1402 & #e) |
$ha*. If you want to set a different priority, you have to
use parentheses in your statement, for instance, 1402 & (#e |
$ha*).
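The priority rule can be illustrated with a short sketch that makes the implied grouping of a query explicit. It is not the MHDBDB parser: the function name `group` and the tokenising rule are assumptions made for illustration, and only the rule that & binds more tightly than | is taken from the text above.

```python
import re

# Assumed tokeniser: a bracketed punctuation string, a single operator or
# parenthesis, or any other run of non-space characters counts as one token.
TOKEN = re.compile(r"\[[^\]]*\]|[()&|]|[^\s()&|]+")


def group(query: str) -> str:
    """Return the query with the grouping implied by operator priority made explicit."""
    tokens = TOKEN.findall(query)
    pos = 0

    def parse_or() -> str:
        nonlocal pos
        parts = [parse_and()]
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            parts.append(parse_and())
        return parts[0] if len(parts) == 1 else "(" + " | ".join(parts) + ")"

    def parse_and() -> str:
        nonlocal pos
        parts = [parse_atom()]
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            parts.append(parse_atom())
        return parts[0] if len(parts) == 1 else "(" + " & ".join(parts) + ")"

    def parse_atom() -> str:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_or()
            pos += 1  # skip the closing ")"
            return inner
        return token

    return parse_or()


print(group("1402 & #e | $ha*"))    # ((1402 & #e) | $ha*)
print(group("1402 & (#e | $ha*)"))  # (1402 & (#e | $ha*))
```

The sketch does not validate its input; unbalanced parentheses or search strings containing spaces (such as # = 3) are outside what it handles.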
Please be careful when using the Boolean 'and' operator, since you
may not always arrive at meaningful statements. For instance, the
extended search string [*] & haben would not be
meaningful, since a word cannot be a punctuation mark and a variant
of "haben" at the same time. The Boolean operator 'und' (and) should
generally be used in order to link
otherwise unmarked search strings with conceptual categories or to link
conceptual categories with text line positions.
Examples of meaningful uses of the Boolean
'und' (and) operator are: haben & niht, by which
you will find all word combinations of "haben" and
"nicht", or 21071 & 2322, which searches for
intransitive verbs within the conceptual area "Horse and
Horsemanship", or <NOM> &
<ADJ>, which will retrieve words that can be tagged both as nouns
and as adjectives, or #>2 & #<8, which searches
for all words within the positions 3 to 7 in a text line.
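The last example, which selects positions 3 to 7, amounts to combining a lower and an upper position bound. A minimal sketch of that arithmetic follows; the function name `words_in_positions` is hypothetical, and it assumes that positions are counted from 1 and that the words of a line are separated by spaces (how the database counts punctuation is not stated here).

```python
def words_in_positions(line: str, lower: int, upper: int) -> list[str]:
    """Return the words whose 1-based position p satisfies lower < p < upper."""
    words = line.split()
    return [w for p, w in enumerate(words, start=1) if lower < p < upper]


# Placeholder tokens stand in for the words of a text line.
line = "w1 w2 w3 w4 w5 w6 w7 w8 w9"
print(words_in_positions(line, 2, 8))  # ['w3', 'w4', 'w5', 'w6', 'w7']
```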
Serial Searches
Serial searches allow you to search for a series of two or
more words within the text base. Such serial search statements consist of
a series of simple search statements separated by commas, e. g. ich,
21071 | haben, #e. The words you search for must occur in the
same text
line and in the same sequence as stated in the serial search string.
However, you may also allow for intervals of one or more words between
your words within the search series by using wild cards, as for instance,
in the query
ich, *, 21071 | haben, *, *, #e. This query searches for
the word 'ich' or any of its variants followed by any word, followed by a
word within the conceptual category 21071 (intransitive verb) or the word
'haben' or any of its variants, followed by any two
words, followed by the last word in a text line.
It is also possible to operate with an upper limit for the number of
words in
the interval between two words in a serial search. For instance, the query
ich, {1}, 21071 | haben, *, *, {3}, #e will find all
occurrences of
the word 'ich' or any of its variants followed by a maximum of one word,
followed by a word within the conceptual category 21071 (intransitive
verb) or the word 'haben' or any of its variants, followed by at least two
but no more than five words, followed by the last word in a text
line.
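How the interval markers interact can be sketched with a small matcher. This is a simplified illustration, not the database's search engine: the function name `serial_match` is hypothetical, and the sketch only handles exact word forms with wild cards (as with the $ prefix), the single-word joker *, the upper limit {n}, and #e. Lemma, category and Boolean lookups are left out.

```python
import re
from fnmatch import fnmatchcase


def serial_match(query: str, line: str) -> bool:
    """True if the words of `line` contain the sequence described by `query`."""
    steps = [s.strip() for s in query.split(",")]
    words = line.split()

    def match(si: int, wi: int) -> bool:
        if si == len(steps):
            return True
        step = steps[si]
        gap = re.fullmatch(r"\{(\d+)\}", step)
        if gap:  # {n}: skip between 0 and n words
            limit = min(int(gap.group(1)), len(words) - wi)
            return any(match(si + 1, wi + k) for k in range(limit + 1))
        if wi >= len(words):
            return False
        if step == "*":  # exactly one arbitrary word
            return match(si + 1, wi + 1)
        if step == "#e":  # the word must be the last one in the line
            return wi == len(words) - 1 and match(si + 1, wi + 1)
        # Assumption: every other step is a word-form pattern, optionally with "$".
        return fnmatchcase(words[wi], step.lstrip("$")) and match(si + 1, wi + 1)

    return any(match(0, start) for start in range(len(words)))


# "w" is a placeholder for an arbitrary word.
print(serial_match("ich, {1}, haben, *, *, {3}, #e", "ich w haben w w w"))    # True
print(serial_match("ich, {1}, haben, *, *, {3}, #e", "ich w w haben w w w"))  # False
```

In the second call two words stand between "ich" and "haben", which exceeds the upper limit {1}, so the line is rejected.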
Naturally, the maximum number of words within a serial search is
limited by the context you have set for the search. For instance, if
your context is set to "word" and the size of your context to 4, your
serial search may not have more than two words between the first word
in your query statement and the last word within a text line.
Context Queries
Context queries are the most common type of queries.
They may include any number of simple searches or
'und/oder' (and/or) queries that must be separated by a
plus sign. For instance, the statement ich,
habe + <NAM> combines a serial search statement with a
simple search statement.
A context query finds any context that combines words determined by
the query statements within a given context frame.
For example, when the context unit is set to "Zeilen" (lines) and the
context scope to "3", the query string alphart +
dietrich will find all context areas, where these two names
(or their variants) appear with no more than two lines in between.
However, you have to consider one special condition when submitting
a context search: if your context search contains several simple search
strings, each word can satisfy only one of these criteria within a given
context
frame. For instance, the query $a* + $*e searches
for at least one word that begins with the character "a" and one that
ends with the character "e" within the context frame. If a word
happens to begin with "a" and to end with "e", the
query will still look for at least one more word that ends with "e".
In other words, a word that fulfils both search criteria will be
counted only for the criterion
$a*, not for the criterion $*e.
The limitation that simple search criteria may not overlap each other
does not apply to serial searches. For example, when you submit the
query $a*, $*e , you are searching for a word that begins
with "a" followed by a word ending in "e". Since there might be too many
overlaps, you must not combine serial searches with each other by
using a + sign.
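The rule that one word cannot satisfy two criteria of a context query can be sketched as follows. This is one possible reading of the behaviour described above, not the actual implementation: the function name `context_match` is hypothetical, criteria are reduced to word-form patterns without the $ prefix, and the sketch assumes that each word is assigned to the first still-open criterion it fulfils, in query order.

```python
from fnmatch import fnmatchcase


def context_match(patterns: list[str], words: list[str]) -> bool:
    """True if every pattern is satisfied by its own word within the context frame."""
    open_patterns = list(patterns)  # criteria that are not yet satisfied
    for word in words:
        for pattern in open_patterns:
            if fnmatchcase(word, pattern):
                open_patterns.remove(pattern)  # this word is now used up
                break
    return not open_patterns


# A single word beginning with "a" and ending in "e" is not enough ...
print(context_match(["a*", "*e"], ["abe"]))         # False
# ... a second word ending in "e" is required.
print(context_match(["a*", "*e"], ["abe", "ere"]))  # True
```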
The Representation of Query Results
Your query results are presented either in the form of a table (simple
searches) or in the form of a single line (serial searches and context
searches). In both cases, the figures in the second column and
further columns to the right represent the sum of occurrences for
each individual text or text group. The column on the far right contains
the sum total for all texts or text groups. If a text belongs to more
than one text group, there will be two columns for the same text, but the
sum total will contain the frequency for this text only once. The columns for text groups and those for the
sum total are highlighted in different colors.
In simple searches frequencies of occurrence are given for
each variant separately. The lemma line, which is highlighted,
contains the sum total for the entire lemma/variant group. If you
click on the lemma, you will get to the entry for the entire
lemma/variant group in the Dictionary.
There you will find all variants, not only the ones that occur in the
given text selection, along with all possible meanings of the lemma.
When clicking on frequency numbers, you will only arrive at text
references, if the frequency number appears in blue. Any frequency
number higher than 1000 will appear in black, which means you may not
get directly at those text references, since the output would simply
be too large. In other words, for those occurrences you will have to
limit your search by setting the maximum of references to less than
1000. Depending on the number of references you retrieve you may
either see a compacted list, containing one line for each reference or
you may directly get full references within a larger context. In all
text references the words fulfilling the search criteria are set off by a
different color from the context. When your results appear in the form of a
compacted list you may get to the individual full text references by
clicking on the blue text line numbers. You also have the option of
selecting your own sublist by checking the little boxes next to the
line numbers and then clicking the button "Auswahl anzeigen" (Show
selection) at the bottom of the list.
Examples
Below you will find a number of query examples that were
submitted for test purposes and which yield real results. In
parentheses you find the short codes for the texts, which had
been selected for the queries. In some cases we also give you
the context parameters that had been set for the searches. We
recommend that you actually submit these examples in order to
get a feel for how to work with the information system.
- @guot
(KW,ML,UL) gives you all variants for the
lemma "guot".
- $? | [*]
(KW,ML,UL) retrieves all words that
consist only of one character, plus all punctuation marks.
- <NAM>
(KW) finds all names in the given text.
- <NAM> & #e
(KW) finds all names that occur at the end of a line in
the given text.
- haben
(AT, AX) finds all variants for "haben".
- haben & nicht
(Alle Texte) retrieves from all
texts all
compound words that contain the components "haben" and
"nicht".
- kleinez, puoch
(Alle Texte) finds in all texts all
references "puoch" (or its variants as well as compounds containing it
as a component), preceded by "kleinez" (or its variants as well as
compounds containing it as a component).
- $a*, $e*
(AT) finds all words
beginning with the character "a", followed directly by a word beginning
with "e".
- $a*, *, $e*
(AT) is the same query as above allowing
one word to stand in between the first and second word defined by the
search criteria.
- gab
(Alle Texte) finds in all texts all variants and
compounds containing "gab".
- gab + gab
(Alle Texte; Kontext: 1 Zeile) finds all
lines that contain at least 2 variants of or compounds with "gab".
- gab + gab + gab
(Alle Texte; Kontext: 1 Zeile) finds
all
lines that contain at least 3 variants of or compounds with "gab"
- $a* + $a* + $e* + $e*
(AT; Kontext: 1 Zeile) finds all
lines that contain at least 2 words beginning with "a" and at least two
words beginning with "e".
- sifrit
(Alle Texte; Kontext: 1 Zeile) finds all
variants for and compounds
with "sifrit".
- sifrit + gunther
(Alle Texte; Kontext: 3 Zeilen)
finds all references, where "sifrit" or its variants and compounds cooccur
with
"gunther" or its variants or compounds within a context frame of 3
lines.
- <NAM>+<NAM>
(KW, ML, UL; Kontext: 1 Zeile)
finds all lines that contain at least 2 names.
- 231125 + 2412
(Hartmann; Kontext: 1 Zeile)
retrieves all personal names that cooccur with a name for a country within
one line.
- 24321 + 23231
(Alle Texte; Kontext: 2 Zeilen)
retrieves from all texts references where words from the category "high
nobility" cooccur with words from the category "weapons" within a
context of 2 lines.
- künec + 2433
(Alle Texte; Kontext: 2 Zeilen)
retrieves from all texts references where the word "künec" or one of
its variants cooccurs with a word from the category "law" within a context
of two lines.
- 24321 + $milt*
(Alle Texte; Kontext: 2 Zeilen)
retrieves from all texts references where a word from the
category "high nobility" cooccurs with the word "milt" or one of
its variants within a context of 2 lines.
- *en & #m
(EIL) finds all words in Eilhart's
Tristrant that have been either altered or added by the editor and
which end in -en.
You may work with the dictionary in the same way as with the Analyse
Text module, except for all queries that deal with context or text
idiosyncrasies. The dictionary provides you with lemmas and all their
possible variants that have been accounted for within the text base up
to the current state of the lemmatisation process. The frequencies
behind each variant reflect this current state of lemmatisation, except
for homographs that may not have been disambiguated. In addition, the
dictionary provides you with all possible
meanings, for each lemma selected. A "meaning" may consist of one
or several conceptual categories (e.g. vuoz = 1. 2103 = Körper
und Gliedmaßen (body and bodily parts); 2. 312412 = Formen (forms) /
315 = Raum (space); 3. 3134 =
Maße und Gewichte (measures and weights); 4. 2512 = Literatur
(literature)). You may enter into the search window for the dictionary
character strings containing the wild cards * or ?
as well as lemmas preceded by @ or
simply words preceded by $. You will always retrieve a lemma or a
selection of lemmas that meet the given search criteria. From the group of
selected lemmas you may click on any given lemma to arrive at its full
entry. In addition, you may also submit categories (e.g. 231125
= Ehe/Familie/Namen (marriage/family/names)), and retrieve all lemmas
(proper names) that have been included in the dictionary according to
the current state of lemmatisation. Please note that the list of all
proper names by far exceeds the maximum table space. You may therefore
want to call up the names in groups according to the letter of the
alphabet with which they begin. Thus, for instance, the entry
231125 & a* retrieves all names that begin with the
letter "a".
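Calling up a large category letter by letter means submitting one query per initial. The sketch below only builds the query strings; the function name `queries_by_initial` is hypothetical, and it assumes the 26 basic Latin letters are sufficient (names beginning with special characters would need additional queries).

```python
import string


def queries_by_initial(category: str) -> list[str]:
    """Build one dictionary query per initial letter for the given category code."""
    return [f"{category} & {letter}*" for letter in string.ascii_lowercase]


queries = queries_by_initial("231125")
print(queries[0])    # 231125 & a*
print(len(queries))  # 26
```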
Comments and questions:
Horst
Pütz - puetz@germsem.uni-kiel.de
or
Klaus M. Schmidt - schmidt@bgnet.bgsu.edu