Table of Contents
- Raw Data
- MySQL-Views
- What is a Term?
- When is a Term new?
- What is a "useful" term
- What constitutes a View?
- What does num_users_exposed mean?
- What is a full match
- What is a full word partial field match? (against Museum Docs, AAT?, …
- Truncated String Matching
- What is a resource?
- When is review of a tag "complete"?
- Who are the review users?
- What is a User?
- What is a Session?
- What is Session's Length?
- What Constitutes a Unique Term?
- What distinguishes an anonymous user?
- Work Attributes
- Term Attributes
For the purposes of reporting on the stated research hypotheses, we propose adopting the following definitions and terminology around the generation of reports against the data collected.
Raw Data
- PHPMyAdmin can be used to verify counts against the 'raw', uninterpreted data
MySQL-Views
- MySQL-views will be used to pool a common set of rules and filters so as to simplify the report generation in several different directions
- MySQL-views currently consist of:
- report_view_session - for reports that interpret data from a perspective of a session
- report_view_term - for term-centric reporting
- report_view_user - for user-centric reporting
What is a Term?
- For the purposes of the report_view_term MySQL-View
- Do not include corrected or deleted terms. DO include blacklisted terms.
- steve_term.corrected_id is null or steve_term.corrected_id = -2
- Do not include terms from steve team members (user not from steve team)
- steve_user.group_membership is null or steve_user.group_membership not like 'steve team'
- Eliminate TS4
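The two filter rules above can be read as a single predicate. The following is a minimal sketch in Python, not the view's actual SQL. The function name `include_term` and its plain-argument signature are hypothetical, and the SQL test `not like 'steve team'` (which has no wildcards) is approximated here as a case-insensitive equality check. The TS4 exclusion is not modelled because this page does not say how TS4 is identified in the data.

```python
def include_term(corrected_id, group_membership):
    """Return True if a term row would be kept for report_view_term (sketch)."""
    # Keep terms whose corrected_id is NULL (None here) or -2.
    term_ok = corrected_id is None or corrected_id == -2
    # Keep terms entered by users who are not steve team members.
    # Assumption: the membership comparison is case-insensitive.
    user_ok = group_membership is None or group_membership.lower() != "steve team"
    return term_ok and user_ok
```

For example, `include_term(None, None)` keeps the row, while `include_term(17, None)` and `include_term(None, "steve team")` both drop it.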
When is a Term new?
- A term is new when no match is found against any field of the museum_object metadata using FM or FWPFM
- Matches against AAT, ULAN or Search Logs are not considered.
What is a "useful" term
- A term that has been reviewed and marked "useful" by its reviewer
What constitutes a View?
- Did someone see this work?
- A view is counted when a work has been tagged at least once, plus once for every skip of the work.
- Do not include TS4
What does num_users_exposed mean?
- The number of users exposed to a work is the distinct number of users who have tagged plus the distinct number of users who have skipped, ensuring that users are not repeated in these sets
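Counting each user once across both groups amounts to taking the size of a set union. A minimal sketch in Python, where the function name and the two input lists of user ids are hypothetical:

```python
def num_users_exposed(tagger_ids, skipper_ids):
    """Count distinct users who tagged or skipped a work (sketch)."""
    # The set union removes repeats within each list and across both lists.
    return len(set(tagger_ids) | set(skipper_ids))
```

A user who both tagged and skipped the same work is therefore counted once: `num_users_exposed([1, 2, 2, 3], [3, 4])` gives 4.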
What is a full match
- Case insensitive trimmed full string -> Case insensitive trimmed full field match
- " BlUE" == "blue" == " bluE "
- " BluE" != "BluE Man Group "
- " blue?" != " blue"
- an entered string for a tag is atomic
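The rule above can be sketched in Python as follows. The function name `full_match` is hypothetical. Both strings are trimmed and lower-cased, and punctuation is left untouched, so it still counts as a difference.

```python
def full_match(term, field):
    """Case-insensitive, whitespace-trimmed comparison of whole strings (sketch)."""
    # The tag is treated as atomic: it is never split into words here.
    return term.strip().lower() == field.strip().lower()
```

With this sketch, `full_match(" BlUE", " bluE ")` holds, while `full_match(" BluE", "BluE Man Group ")` and `full_match(" blue?", " blue")` do not.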
What is a full word partial field match? (against Museum Docs, AAT?, ULAN?)
- A FWPFM exists when the case insensitive trimmed full string term matches against a case insensitive full word sub-string of a field in the matching resource.
- e.g. the term "boat" partially matches the title "The Boat Builders".
- "boat" != "boating"
- "boat?" == "boat"
- "boat" == "boat?"
- a tag is atomic
- A full match is never also a partial match (e.g. "a blue bell" and "a blue bell" are a full match, not a partial one)
- "blue bell" == "a blue bell rings"
- "blue bell" != "blue bells of scotland"
- "blue bell" != "The blue man group"
- "sunny" matches the field "a sunny sunny day" once in this field.
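One way to reproduce the examples above is to compare whole words. This is a sketch under stated assumptions, not the project's implementation: the function names are hypothetical, words are taken to be runs of letters, digits and underscores (so trailing punctuation such as "?" is ignored), and a multi-word term must appear as a contiguous run of whole words in the field. The result is a boolean, so a field counts once however many times the term occurs in it.

```python
import re


def _words(text):
    """Lower-case the text and split it into whole words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def full_word_partial_field_match(term, field):
    """Return True if term matches a full-word sub-string of field (sketch)."""
    # A full match is never also reported as a partial match.
    if term.strip().lower() == field.strip().lower():
        return False
    term_words = _words(term)
    field_words = _words(field)
    n = len(term_words)
    if n == 0:
        return False
    # Slide a window of n words across the field and compare whole words.
    return any(
        field_words[i:i + n] == term_words
        for i in range(len(field_words) - n + 1)
    )
```

Under these assumptions "boat" matches "The Boat Builders" but not "boating", and "blue bell" matches "a blue bell rings" but not "blue bells of scotland".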
Truncated String Matching
- The character string from the tag exists in the character string in the resource, with right-side truncation, when the entered tag is longer than 3 characters.
- "painting" == "paintings"
- "blue bell" == "blue bells of scotland"
- "man" != "many"
- "them" == "themselves"
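The examples above are consistent with a prefix-style search. The following Python sketch makes two assumptions that this page does not state: the tag must begin at a word boundary in the resource string (only the right side is left open), and the comparison is case-insensitive and trimmed like the other match types. The function name is hypothetical.

```python
import re


def truncated_string_match(tag, field):
    """Right-truncated match for tags longer than 3 characters (sketch)."""
    needle = tag.strip().lower()
    # Tags of 3 characters or fewer are never matched this way.
    if len(needle) <= 3:
        return False
    # Assumption: anchor on the left at a word boundary, leave the right open.
    return re.search(r"\b" + re.escape(needle), field.lower()) is not None
```

This sketch does not exclude strings that are already full matches or full word partial field matches; whether those should also be counted as truncated matches is left open here.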
What is a resource?
- A set of terms we want to compare with tags
- 4 resources (AAT, ULAN, Museum Docs, Search Logs)
- Museum Documentation
- Only includes what's in the steve_museum_object table
- excluding institution_id, museum_object_id, *_id
- matches to include FWM, FWPFM, TSM
- counting a match once per field
- AAT
- match against the AAT term field in the term table?
- need to report the entire term row
- matches to include FWM, FWPFM, TSM
- count a match only once
- report every place at which a match was found in the resource.
- Identify the AAT facet - We need to crawl up the tree of AAT from the matched term to a predefined facet id.
- ULAN
- exactly the same as AAT
- ULAN match needs to deal with extended characters
- Search Logs
- Subset tags by institutions for which we have logs (SFMOMA, MIA, MMA)
- Only match an institution's tags against its own search logs
- matches to include FWM, FWPFM, TSM
- don't need to weight search match counts
- need to record the match as well as the frequency of searches from logs and frequency of the tag that matched
- frequencies should be normalized by the total size of the search vocabulary (sum of all frequencies) and the total number of terms for the institution
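The AAT facet identification mentioned above (crawling up the tree from the matched term to a predefined facet id) could be sketched as a walk over parent links. Everything named here is hypothetical: the `parent_of` mapping from a term id to its parent id, the `facet_ids` set, and the function name. How the hierarchy is actually stored is not described on this page.

```python
def find_facet(term_id, parent_of, facet_ids):
    """Walk up parent links until a predefined facet id is reached (sketch)."""
    seen = set()  # guards against cycles in malformed hierarchy data
    current = term_id
    while current is not None and current not in seen:
        if current in facet_ids:
            return current
        seen.add(current)
        # Assumption: a missing or None parent marks the top of the tree.
        current = parent_of.get(current)
    return None
```

With a hypothetical hierarchy `{30: 20, 20: 10, 10: None}` and `facet_ids = {10}`, `find_facet(30, ...)` returns 10; a term with no path to any facet returns `None`.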
When is review of a tag "complete"?
- Term review is complete if any evaluation has been made (no skips)
- A term review is not complete when there is no evaluation in the database for this term-work pair.
Who are the review users?
- The official review users will have user_status = 2. Data in report_view_review will only contain reviews from these users.
What is a User?
- For the purposes of the report_view_user MySQL-View
- Users always exclude steve team members (user not from steve team)
- steve_user.group_membership is null or steve_user.group_membership not like 'steve team'
- Browser agents that are provably bots will be excluded from User reporting (record the bots that are excluded)
- Anonymous users with their session's user_agent appearing in the steve_bot_user_agent table are excluded
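The user rules above can be combined into one predicate. A minimal sketch in Python, assuming each user comes with the set of user_agent strings seen in their sessions and that the contents of steve_bot_user_agent are available as a set. The function name and argument layout are hypothetical. The anonymous status value 5 comes from the definition of an anonymous user elsewhere on this page.

```python
ANONYMOUS_STATUS = 5  # steve_user.user_status value for anonymous users


def include_user(group_membership, user_status, session_user_agents, bot_user_agents):
    """Return True if a user would be kept for report_view_user (sketch)."""
    # Always exclude steve team members.
    # Assumption: the membership comparison is case-insensitive.
    if group_membership is not None and group_membership.lower() == "steve team":
        return False
    # Exclude anonymous users whose session user_agent is a known bot.
    if user_status == ANONYMOUS_STATUS and any(
        agent in bot_user_agents for agent in session_user_agents
    ):
        return False
    return True
```

Note that in this sketch a registered (non-anonymous) user is kept even if a session's user_agent appears in the bot list, since the rule above names anonymous users only.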
What is a Session?
- For the purposes of the report_view_session MySQL-View
- session reporting is derived from the definition of a User (based on excluding bots, etc...)
- For reporting concerning session statistics we need to include zero-tag sessions
- For reporting concerning users and tags we only need to include sessions with length != 0
What is Session's Length?
- session length is defined by the time of the last tag or skip minus the session start time
- INCORRECT timestampdiff(SECOND, steve_session.session_start_dtm, max(steve_term.entered_dtm))
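The stated definition (time of the last tag or skip minus the session start time) can be sketched in Python as below. Unlike the expression above, which looks only at steve_term.entered_dtm, this sketch takes skip times into account as well. The function name and the two lists of timestamps are hypothetical, and returning 0 for a session with no tags or skips is an assumption.

```python
from datetime import datetime


def session_length_seconds(session_start, tag_times, skip_times):
    """Seconds from session start to the last tag or skip (sketch)."""
    events = list(tag_times) + list(skip_times)
    # Assumption: a session with no tags and no skips has length 0.
    if not events:
        return 0.0
    return (max(events) - session_start).total_seconds()
```

For a session starting at 12:00:00 with a tag at 12:05:00 and a skip at 12:07:30, this gives 450.0 seconds.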
What Constitutes a Unique Term?
- A term not LIKE any other term on the same image
- Two terms that differ only in their use of capital letters ARE NOT unique (terms are case-insensitive)
- Two terms that differ only in that they have whitespace at the beginning and/or the end ARE NOT unique (whitespace is trimmed)
- Punctuation is included as unique
- (see the definition of a full match)
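These rules amount to normalising each term by trimming whitespace and ignoring case, then keeping one copy of each normalised form per image. A minimal sketch in Python, with a hypothetical function name and input list:

```python
def unique_terms(terms_on_image):
    """Return the set of unique terms for one image (sketch)."""
    # Trim whitespace and ignore case; punctuation is kept, so "blue?" != "blue".
    return {term.strip().lower() for term in terms_on_image}
```

For example, `unique_terms(["Blue", " blue ", "blue?", "BLUE"])` contains two terms, "blue" and "blue?".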
What distinguishes an anonymous user?
- An anonymous user has a user status recorded as 5
- steve_user.user_status = 5
- Facebook users who did not link to an existing steve account AND did not register are anonymous
- Users only present in TS4 are not counted.
Work Attributes
- object_types
- 2D / 3D
- representational / non-representational
Term Attributes
- corrected, deleted, blacklisted
- Characterization of Terms (probe examination only)
- Proper name, place, time, pre-iconographic, emotion, color
- Not found in AAT
