For the purposes of reporting on the stated research hypotheses, we propose adopting the following definitions and terminology around the generation of reports against the data collected.

Raw Data

  • phpMyAdmin can be used to verify counts against the 'raw', uninterpreted data

MySQL-Views

  • MySQL-views will be used to pool a common set of rules and filters, so that reports written from several different perspectives share the same definitions
  • MySQL-views currently consist of:
    • report_view_session - for reports that interpret data from a perspective of a session
    • report_view_term - for term-centric reporting
    • report_view_user - for user-centric reporting

What is a Term?

  • For the purposes of the report_view_term MySQL-View
  • Do not include corrected or deleted terms. DO include blacklisted terms.
    • steve_term.corrected_id is null or steve_term.corrected_id = -2
  • Do not include terms from steve team members (user not from steve team)
    • steve_user.group_membership is null or steve_user.group_membership not like 'steve team'
  • Eliminate TS4
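
The term filters above can be sketched as a single predicate. This is a minimal sketch, not the view's actual definition: the `Term` record and `include_term` function are hypothetical names, and `not like 'steve team'` is read as a case-insensitive whole-string comparison because the pattern contains no wildcards. The TS4 exclusion is left out, since the rules above do not say how TS4 rows are identified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    # Hypothetical record combining the steve_term and steve_user columns named above.
    corrected_id: Optional[int]
    group_membership: Optional[str]


def include_term(term: Term) -> bool:
    """Return True when a term passes the two column filters listed above."""
    # Keep terms whose corrected_id is NULL or -2 (the two values the rule allows).
    kept_status = term.corrected_id is None or term.corrected_id == -2
    # Drop terms entered by steve team members (assumed case-insensitive comparison).
    outside_team = (
        term.group_membership is None
        or term.group_membership.lower() != "steve team"
    )
    return kept_status and outside_team
```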

When is a Term new?

  • A term is new when no match is found against any field of the museum_object metadata using FM or FTPFM
  • Matches against AAT, ULAN or Search Logs are not considered.

What is a "useful" term?

  • A term that has been reviewed and marked "useful" by its reviewer

What constitutes a View?

  • Did someone see this work?
  • A view is counted when a work has been tagged at least once, plus once for every skip of the work.
  • Do not include TS4

What does num_users_exposed mean?

  • The number of users exposed to a work is the distinct number of users who have tagged it plus the distinct number of users who have skipped it, ensuring that no user is counted in both sets
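
As an illustration only, the count can be expressed as the size of a set union, which guarantees that a user who both tagged and skipped the work is counted once. The function name and the plain lists of user ids are assumptions made for this sketch.

```python
def num_users_exposed(tagging_user_ids, skipping_user_ids):
    """Count distinct users who tagged or skipped a work, each user once."""
    # The union removes repeats within each list and overlap between the two.
    return len(set(tagging_user_ids) | set(skipping_user_ids))
```

For example, taggers `[1, 2, 2, 3]` and skippers `[3, 4]` give four exposed users.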

What is a full match?

  • Case insensitive trimmed full string -> Case insensitive trimmed full field match
  • " BlUE" == "blue" == " bluE "
  • " BluE" != "BluE Man Group "
  • " blue?" != " blue"
  • An entered string for a tag is atomic
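
A minimal sketch of the full match rule, assuming that "trimmed" means leading and trailing whitespace is removed and that punctuation is left untouched. `is_full_match` is a hypothetical helper name.

```python
def is_full_match(term: str, field: str) -> bool:
    """Case-insensitive, whitespace-trimmed comparison of the whole term with the whole field."""
    return term.strip().casefold() == field.strip().casefold()
```

The examples above behave as listed: `is_full_match(" BlUE", " bluE ")` holds, while `" BluE"` against `"BluE Man Group "` and `" blue?"` against `" blue"` do not.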

What is a full word partial field match? (against Museum Docs, AAT?, ULAN?)

  • A FWPFM exists when the case insensitive trimmed full string term matches against a case insensitive full word sub-string of a field in the matching resource.
  • e.g. the term "boat" partially matches the title "The Boat Builders".
  • "boat" != "boating"
  • "boat?" == "boat"
  • "boat" == "boat?"
  • a tag is atomic
  • a full match is never also counted as a partial match (e.g. "a blue bell" and "a blue bell" are a full match, not a partial match)
  • "blue bell" == "a blue bell rings"
  • "blue bell" != "blue bells of scotland"
  • "blue bell" != "The blue man group"
  • "sunny" matches the field "a sunny sunny day" once in this field.
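
One way to read the FWPFM examples above is as a whole-word, contiguous sub-sequence test. The sketch below makes several assumptions that are not stated in the rules: words are taken to be runs of letters, digits and underscores (so punctuation such as "?" is ignored), and the result is a yes/no answer per field, which is how a repeated word is counted once. `is_fwpfm` and `_words` are hypothetical names.

```python
import re


def _words(text: str) -> list[str]:
    """Split text into lower-cased words, discarding punctuation and whitespace."""
    return re.findall(r"\w+", text.casefold())


def is_fwpfm(term: str, field: str) -> bool:
    """Full word partial field match: the term's words appear, in order, as whole words in the field."""
    # A full match is never also reported as a partial match.
    if term.strip().casefold() == field.strip().casefold():
        return False
    term_words = _words(term)
    field_words = _words(field)
    if not term_words:
        return False
    width = len(term_words)
    # Slide a window over the field's words and compare whole words only.
    return any(
        field_words[i:i + width] == term_words
        for i in range(len(field_words) - width + 1)
    )
```

Because the function returns a boolean per field, "sunny" against "a sunny sunny day" yields a single match for that field.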

Truncated String Matching

  • The character string from the tag exists in the character string of the resource, allowing right-side truncation, when the entered tag is longer than 3 characters.
  • "painting" == "paintings"
  • "blue bell" == "blue bells of scotland"
  • "man" != "many"
  • "them" == "themselves"
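
The truncation rule can be sketched as a prefix search. Two assumptions are made here that the rule does not spell out: the comparison is case-insensitive and trimmed, as in the other match types, and the tag must begin at the start of a word in the resource, so that only the right side is truncated. `is_truncated_match` and `MIN_TAG_LENGTH` are hypothetical names.

```python
import re

MIN_TAG_LENGTH = 4  # the rule applies only to tags longer than 3 characters


def is_truncated_match(tag: str, field: str) -> bool:
    """Right-truncated match: the tag is found in the field, starting at a word start."""
    needle = tag.strip().casefold()
    if len(needle) < MIN_TAG_LENGTH:
        return False
    # (?<!\w) ensures the match does not begin in the middle of a word;
    # nothing constrains what follows, which is the right-side truncation.
    pattern = r"(?<!\w)" + re.escape(needle)
    return re.search(pattern, field.casefold()) is not None
```

Under these assumptions "ting" does not match "painting", because that would be truncation on the left.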

What is a resource?

  • A set of terms we want to compare with tags
  • 4 resources (AAT, ULAN, Museum Docs, Search Logs)
  • Museum Documentation
    • Only includes what's in the steve_museum_object table
      • excluding institution_id, museum_object_id, *_id
    • matches to include FWM, FWPFM, TSM
    • counting a match once per field
  • AAT
    • match against the AAT term field in the term table?
    • need to report the entire term row
    • matches to include FWM, FWPFM, TSM
    • count a match only once
    • report every place at which a match was found in the resource.
    • Identify the AAT facet - We need to crawl up the tree of AAT from the matched term to a predefined facet id.
  • ULAN
    • exactly the same as AAT
    • ULAN match needs to deal with extended characters
  • Search Logs
    • Subset tags by institutions for which we have logs (SFMOMA, MIA, MMA)
    • Only match an institution's tags against its own search logs
    • matches to include FWM, FWPFM, TSM
    • don't need to weight search match counts
    • need to record the match as well as the frequency of searches from logs and frequency of the tag that matched
    • frequencies should be normalized by the total size of the search vocabulary (sum of all frequencies) and the total number of terms for the institution
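
The normalization of search log matches admits more than one reading. The sketch below takes one of them as an assumption: the search frequency is divided by the sum of all search frequencies for the institution, and the tag frequency is divided by the institution's total number of terms. The function name, its parameters and the dictionary of log frequencies are all hypothetical.

```python
def normalize_match(search_freq, tag_freq, search_log_freqs, total_institution_terms):
    """Return (relative search frequency, relative tag frequency) for one matched term.

    search_log_freqs maps each search string in the institution's log to its count.
    """
    total_searches = sum(search_log_freqs.values())
    if total_searches == 0 or total_institution_terms == 0:
        raise ValueError("cannot normalize against an empty log or an empty term set")
    return search_freq / total_searches, tag_freq / total_institution_terms
```

With a log of `{"boat": 6, "blue": 3, "sunny": 1}` and 50 terms for the institution, a match on "boat" that was tagged 5 times would be recorded with relative frequencies of 6/10 and 5/50.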

When is review of a tag "complete"?

  • Term review is complete if any evaluation has been made (no skips)
  • A term review is not complete when there is no evaluation in the database for this term-work pair.

Who are the review users?

  • The official review users will have user_status = 2. Data in report_view_review will only contain reviews from these users.

What is a User?

  • For the purposes of the report_view_user MySQL-View
  • Users always exclude steve team members (user not from steve team)
    • steve_user.group_membership is null or steve_user.group_membership not like 'steve team'
  • Browser agents that are provably bots will be excluded from User reporting (record the bots that are excluded)
  • Anonymous users with their session's user_agent appearing in the steve_bot_user_agent table are excluded
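
The user exclusions can be sketched in the same spirit. This is an illustration under assumptions: `include_user` and its parameters are hypothetical, the set of bot user agents stands in for the contents of steve_bot_user_agent, and the team comparison is taken to be case-insensitive.

```python
from typing import Optional, Set


def include_user(group_membership: Optional[str], is_anonymous: bool,
                 user_agent: Optional[str], bot_user_agents: Set[str]) -> bool:
    """Return True when a user should appear in user reporting."""
    # Steve team members are always excluded.
    if group_membership is not None and group_membership.lower() == "steve team":
        return False
    # Anonymous users are excluded when their session's user agent is a known bot.
    if is_anonymous and user_agent in bot_user_agents:
        return False
    return True
```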

What is a Session?

  • For the purposes of the report_view_session MySQL-View
  • session reporting is derived from the definition of a User (based on excluding bots, etc.)
  • For reporting concerning session statistics we need to include zero-tag sessions
  • For reporting concerning users and tags we only need to include sessions with length != 0

What is a Session's Length?

  • session length is defined by the time of the last tag or skip minus the session start time
    • INCORRECT timestampdiff(SECOND, steve_session.session_start_dtm, max(steve_term.entered_dtm))
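
A sketch of the stated definition, taking the latest event across both tags and skips. Unlike the expression flagged above, it considers skip times as well as tag entry times. The function name and the idea of passing the timestamps as lists are assumptions, as is returning 0 for a session with no tags or skips.

```python
from datetime import datetime
from typing import Iterable


def session_length_seconds(session_start: datetime,
                           tag_times: Iterable[datetime],
                           skip_times: Iterable[datetime]) -> float:
    """Seconds from the session start to the last tag or skip in the session."""
    events = list(tag_times) + list(skip_times)
    if not events:
        # Assumption: a session with no tags and no skips has length 0.
        return 0.0
    return (max(events) - session_start).total_seconds()
```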

What Constitutes a Unique Term?

  • A term not LIKE any other term on the same image
    • Two terms that differ only in their use of capital letters ARE NOT unique (terms are case-insensitive)
    • Two terms that differ only in that they have whitespace at the beginning and/or the end ARE NOT unique (whitespace is trimmed)
    • Punctuation is significant: two terms that differ only in punctuation ARE unique
    • (see the definition of a full match)
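
Reading "unique" as "distinct after normalization", the rules above reduce to trimming whitespace and ignoring case while leaving punctuation in place. `unique_terms` is a hypothetical helper, and the list of strings stands in for the terms entered against one image.

```python
def unique_terms(terms_on_image):
    """Return the set of distinct terms for one image after trimming and case-folding."""
    # Punctuation is deliberately kept, so "blue" and "blue?" stay distinct.
    return {term.strip().casefold() for term in terms_on_image}
```

For the terms `" Blue"`, `"blue"`, `"BLUE "` and `"blue?"`, this gives two unique terms.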

What distinguishes an anonymous user?

  • An anonymous user has a user status recorded as 5
    • steve_user.user_status = 5
  • Facebook users who did not link to an existing steve account AND did not register are anonymous
    • Users only present in TS4 are not counted.

Work Attributes

  • object_types
  • 2D / 3D
  • representational / non-representational

Term Attributes

  • corrected, deleted, blacklisted
  • Characterization of Terms (probe examination only)
    • Proper name, place, time, pre-iconographic, emotion, color
    • Not found in AAT