
MusicXML Introduction and Comparison

Overview of Music XML

MusicXML aims to provide an interchange file format that represents sheet music. MIDI is the primary format for storing and transmitting music performance data; however, it doesn’t store some of the information needed to represent sheet music. MusicXML seeks to fill this gap (where other formats failed to gain wide adoption), and it has largely succeeded: most notation software now imports and exports MusicXML.

History and Status

MusicXML was first conceptualized circa 2000 by Michael Good in a series of whitepapers. It is now developed as an open format by the W3C Music Notation Community Group and supported by MakeMusic, Inc. (makers of Finale), where Michael is now Vice President of MusicXML technologies. Although a number of competing sheet music storage languages existed before MusicXML (or were begun around the same time), it has seen the most widespread adoption thus far, being used or supported by over 250 notation programs.

Basic Description

MusicXML likely found widespread use partly because it is a markup language; that is, it follows a hierarchical structure and provides syntax that is both human-readable and machine-readable.

The structure will be familiar to anyone who has worked with HTML or XML:

<note>
    <pitch>
        <step>C</step>
        <octave>4</octave>
    </pitch>
    <duration>4</duration>
    <type>whole</type>
</note>

A small snippet representing how to specify an individual note. 
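
Because the snippet is plain XML, any XML parser can read it. As a minimal sketch (using Python's standard `xml.etree.ElementTree`; the helper name `describe_note` is our own and not part of MusicXML), we can pull the pitch, duration, and type out of the note above:

```python
import xml.etree.ElementTree as ET

# The note snippet from above, unchanged.
NOTE_XML = """
<note>
    <pitch>
        <step>C</step>
        <octave>4</octave>
    </pitch>
    <duration>4</duration>
    <type>whole</type>
</note>
"""

def describe_note(note):
    """Return (pitch name, duration, note type) for a pitched <note> element."""
    step = note.findtext("pitch/step")
    octave = note.findtext("pitch/octave")
    duration = int(note.findtext("duration"))
    note_type = note.findtext("type")
    return f"{step}{octave}", duration, note_type

note = ET.fromstring(NOTE_XML)
print(describe_note(note))  # ('C4', 4, 'whole')
```

The sketch only handles pitched notes; rests and unpitched notes have no pitch element and would need their own branch.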

Top-level Ambiguity

The hierarchical nature of XML presents an interesting challenge: Sheet music does have a hierarchy, but there is top-level ambiguity. Which part of a score is the overall ‘parent’? Is it the ‘part’ (instrument) or the measure?

MusicXML solves this by having two possible top-level elements, score-partwise and score-timewise. This means we either have parts that contain measures, or measures that contain parts. The latter may not make intuitive sense when imagining a musical score, but technically each part has at least empty or rest-only measures for the entire duration of a score, so the score can be represented with measures as the top-level organization.

Here are two example videos showing a simple MusicXML structure with the corresponding sheet music. They walk through ‘reading’ the MusicXML to better demonstrate the difference between part-wise and time-wise:
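
To make the relationship between the two organizations concrete, here is a rough sketch that regroups a tiny part-wise tree into a time-wise one. The helper `to_timewise` is hypothetical, and the input is stripped down to two parts with rest-only measures; a real score also carries header elements (part list, credits, and so on) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

PARTWISE = """
<score-partwise>
  <part id="P1">
    <measure number="1"><note><rest/><duration>4</duration></note></measure>
    <measure number="2"><note><rest/><duration>4</duration></note></measure>
  </part>
  <part id="P2">
    <measure number="1"><note><rest/><duration>4</duration></note></measure>
    <measure number="2"><note><rest/><duration>4</duration></note></measure>
  </part>
</score-partwise>
"""

def to_timewise(partwise_root):
    """Regroup part > measure into measure > part (hypothetical helper)."""
    timewise = ET.Element("score-timewise")
    measures = {}  # measure number -> new <measure> element, in first-seen order
    for part in partwise_root.findall("part"):
        for measure in part.findall("measure"):
            number = measure.get("number")
            if number not in measures:
                measures[number] = ET.SubElement(timewise, "measure", number=number)
            new_part = ET.SubElement(measures[number], "part", id=part.get("id"))
            new_part.extend(list(measure))  # reuse the measure's child elements
    return timewise

timewise = to_timewise(ET.fromstring(PARTWISE))
for measure in timewise:
    print(measure.get("number"), [p.get("id") for p in measure])
# 1 ['P1', 'P2']
# 2 ['P1', 'P2']
```

The musical content is identical in both trees; only the nesting order of part and measure is swapped.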

Example of ‘reading’ a part-wise score

Example of ‘reading’ a time-wise score

These examples show how the MusicXML would be structured and subsequently read in each scenario. In the first, each part is specified separately, with its measures (and their contents) in order. In the second, each measure is specified in order, with all parts and staves for that measure given.

The part-wise organization is the most common one “in the wild”, and all major notation software exports using it, so we will focus on it for this post.

Part-wise Full Example

A (mostly) complete example of part-wise organization:

<score-partwise>
    ...MusicXML score definition, etc.
    <part id="P1">
        <measure number="1">
            <attributes>
                <divisions>1</divisions>
                <key>
                    <fifths>0</fifths>
                </key>
                <time>
                    <beats>4</beats>
                    <beat-type>4</beat-type>
                </time>
                <clef>
                    <sign>G</sign>
                    <line>2</line>
                </clef>
            </attributes>
            <note>
                <pitch>
                    <step>C</step>
                    <octave>4</octave>
                </pitch>
                <duration>4</duration>
                <type>whole</type>
            </note>
            ... Other Notes
        </measure>
        ...Other measures
    </part>
    ..Other parts
</score-partwise>

Some elements have been abridged to save space in the snippet.

In this example, we see all the basic requirements for a musical score to be constructed: time signature, key signature, and clef. These are organized in the attributes element under the measure tag since they are measure-level attributes. This measure tag exists under the “part” tag, which can be roughly seen as an instrument in sheet music. Then the notes, articulations, ornaments, etc. for that measure would be present. This is the basic structure of MusicXML (part-wise): constructed part by part, measure by measure, note by note.
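
The "part by part, measure by measure, note by note" structure maps directly onto nested loops. The following is a sketch only: the `summarize` helper and the dictionary keys are our own naming, and the score is a compacted copy of the example above with the abridged placeholders removed.

```python
import xml.etree.ElementTree as ET

SCORE = """
<score-partwise>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>
"""

def summarize(root):
    """Collect one summary dict per measure, walking part by part."""
    rows = []
    for part in root.findall("part"):
        for measure in part.findall("measure"):
            row = {"part": part.get("id"), "measure": measure.get("number")}
            attrs = measure.find("attributes")
            if attrs is not None:  # later measures often omit <attributes>
                row["time"] = f'{attrs.findtext("time/beats")}/{attrs.findtext("time/beat-type")}'
                row["fifths"] = int(attrs.findtext("key/fifths"))
                row["clef"] = attrs.findtext("clef/sign") + attrs.findtext("clef/line")
            row["notes"] = [
                n.findtext("pitch/step") + n.findtext("pitch/octave")
                for n in measure.findall("note")
                if n.find("pitch") is not None  # skip rests
            ]
            rows.append(row)
    return rows

print(summarize(ET.fromstring(SCORE))[0])
# {'part': 'P1', 'measure': '1', 'time': '4/4', 'fifths': 0, 'clef': 'G2', 'notes': ['C4']}
```

The sketch assumes that an attributes block, when present, contains key, time, and clef; real files may carry only some of them.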

As we see in the example above, Music XML is quite explicit, often using elements instead of attributes.

For our above example, we would get something like this when actually rendered:

Having touched upon rendering, it is a good time to point out that Music XML does not itself provide any means of rendering. Much like HTML requires a web browser to display, Music XML requires translation from the element structure seen above to actual sheet music. As already mentioned, most music notation software can do this, and it is the primary objective of OSMD (which we use for rendering our examples in this post).

Important Structural Concepts

As shown in the section above, Music XML is quite human-readable and should make intuitive sense to those familiar with sheet music and HTML (or other XML specifications). For this reason, we won’t go through each element available and discuss it in detail (that can be found in the MusicXML specification). Instead, here we will discuss a few structural patterns that may be confusing to those brand new to Music XML.

Slur, Beam, Pedal, and Other “Range” Elements

Elements of sheet music that can cover a “range” of notes all behave in a similar way in Music XML. Here is a code example containing all three mentioned in the heading:

...
<direction placement="below">
  <direction-type>
    <pedal type="start" line="yes"/>
    </direction-type>
  </direction>
<note>
  <pitch>
    <step>G</step>
    <octave>4</octave>
    </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
  <stem>up</stem>
  <beam number="1">begin</beam>
  <notations>
    <slur type="start" placement="below" number="1"/>
    </notations>
  </note>
<note>
  <pitch>
    <step>A</step>
    <octave>4</octave>
    </pitch>
  <duration>1</duration>
  <voice>1</voice>
  <type>eighth</type>
  <stem>up</stem>
  <beam number="1">end</beam>
  <notations>
    <slur type="stop" number="1"/>
    </notations>
  </note>
<direction placement="below">
  <direction-type>
    <pedal type="stop" line="yes"/>
    </direction-type>
  </direction>
...

This would all be contained within a part, measure, etc. structure

In this example, we can see the similarity in behavior for these range elements. Beams are specified on the note element itself:

<beam number="1">begin</beam>
...
<beam number="1">end</beam>

We give each beam a number in case there are several beams in the range (e.g. the additional beams on 16th notes or shorter). Then we simply mark the beginning and end notes with the element’s text content.
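
As a sketch of how a reader might use these markers, the following groups notes into beamed runs by watching the text content of the level-1 beam. The `beam_groups` helper is hypothetical, the notes are trimmed to the elements the sketch reads, and the "continue" value for notes in the middle of a beam is taken from the MusicXML specification rather than from the example above.

```python
import xml.etree.ElementTree as ET

MEASURE = """
<measure number="1">
  <note><pitch><step>G</step><octave>4</octave></pitch><beam number="1">begin</beam></note>
  <note><pitch><step>A</step><octave>4</octave></pitch><beam number="1">continue</beam></note>
  <note><pitch><step>B</step><octave>4</octave></pitch><beam number="1">end</beam></note>
  <note><pitch><step>C</step><octave>5</octave></pitch></note>
</measure>
"""

def beam_groups(measure):
    """Return lists of pitch names that share a level-1 beam."""
    groups, current = [], None
    for note in measure.findall("note"):
        state = note.findtext("beam[@number='1']")  # None when the note is unbeamed
        name = note.findtext("pitch/step") + note.findtext("pitch/octave")
        if state == "begin":
            current = [name]
        elif state in ("continue", "end") and current is not None:
            current.append(name)
            if state == "end":
                groups.append(current)
                current = None
    return groups

print(beam_groups(ET.fromstring(MEASURE)))  # [['G4', 'A4', 'B4']]
```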

Likewise, slurs are also defined under the note element, as a child element of notations:

  <notations>
    <slur type="start" placement="below" number="1"/>
    </notations>
...
  <notations>
    <slur type="stop" number="1"/>
  </notations>

We can again specify a number in case of multiple slurs, but this time the “start” and “stop” of the slur are specified via the type attribute. Also note the placement attribute, which is available on many ornaments and objects that can be rendered in different locations in relation to other objects.

The pedal is not associated with a specific note since it affects any notes in the staff above it (the same is true of octave shift lines, for example). Instead, the start element occurs just before the intended start timestamp, and the stop element occurs immediately after the intended stop timestamp:
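
Matching a slur's start to its stop is then a matter of remembering which numbers are currently open. Here is a minimal sketch, assuming a missing number attribute means 1; the `pair_slurs` helper is our own, and the notes are trimmed to the relevant elements.

```python
import xml.etree.ElementTree as ET

MEASURE = """
<measure number="1">
  <note>
    <pitch><step>G</step><octave>4</octave></pitch>
    <duration>1</duration>
    <notations><slur type="start" placement="below" number="1"/></notations>
  </note>
  <note>
    <pitch><step>A</step><octave>4</octave></pitch>
    <duration>1</duration>
    <notations><slur type="stop" number="1"/></notations>
  </note>
</measure>
"""

def pair_slurs(measure):
    """Return (start_pitch, stop_pitch) pairs, matching slurs by number."""
    open_slurs = {}  # slur number -> pitch name of the note that started it
    pairs = []
    for note in measure.findall("note"):
        name = note.findtext("pitch/step", "rest") + note.findtext("pitch/octave", "")
        for slur in note.findall("notations/slur"):
            number = slur.get("number", "1")  # assumption: default slur number is 1
            if slur.get("type") == "start":
                open_slurs[number] = name
            elif slur.get("type") == "stop" and number in open_slurs:
                pairs.append((open_slurs.pop(number), name))
    return pairs

print(pair_slurs(ET.fromstring(MEASURE)))  # [('G4', 'A4')]
```

A slur that crosses a barline would need the `open_slurs` dictionary to persist across measures; this sketch keeps it local to one measure for brevity.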

...
<direction placement="below">
  <direction-type>
    <pedal type="start" line="yes"/>
    </direction-type>
</direction>
....
<direction placement="below">
  <direction-type>
    <pedal type="stop" line="yes"/>
    </direction-type>
</direction>
...

We also see that it occurs under the direction -> direction-type hierarchy in the measure element. Also of note is that it does not require an ID, since we would never have overlapping pedal lines for a given stave.

Our example here would render like this:

The Backup Element and Its Uses

The backup (and forward) element is used to coordinate multiple voices per staff, as well as multiple staves per part, so it is important to understand. The essential takeaway is that a backup element moves the “parsing cursor” back by the specified duration:

<backup>
  <duration>4</duration>
</backup>

Duration is a somewhat complicated value. In short, it is intended to be MIDI-compatible, so it represents “…a note’s duration in terms of divisions per quarter note.” The divisions per quarter note are set with the divisions element in a measure’s attributes block and stay in effect until they are changed.

To keep it simple for this post, we will always set the number of divisions per quarter note as 1 so that the duration value will specify the number of quarter notes. So the example above would tell us to jump back 4 quarter notes.
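
The arithmetic behind this is a single division. As a small sketch (the function name `quarter_notes` is our own), a duration value divided by the current divisions value gives the length in quarter notes:

```python
from fractions import Fraction

def quarter_notes(duration, divisions):
    """Length of a <duration> value, measured in quarter notes."""
    return Fraction(duration, divisions)

print(quarter_notes(4, 1))  # 4   -> a whole note when divisions is 1
print(quarter_notes(2, 1))  # 2   -> a half note when divisions is 1
print(quarter_notes(1, 2))  # 1/2 -> an eighth note when divisions is 2
```

Using `Fraction` rather than floating point keeps values such as triplets exact.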

Let’s look at some examples with context to better understand this.

Multiple Voices

If we want to have multiple voices per staff, we need to use the backup element. Here is an abridged example:

    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        ...
        </attributes>
      <note>
        <pitch>
          <step>A</step>
          <octave>4</octave>
          </pitch>
        <duration>2</duration>
        <voice>1</voice>
        <type>half</type>
        <stem>up</stem>
        </note>
      <note>
        <rest/>
        <duration>2</duration>
        <voice>1</voice>
        <type>half</type>
        </note>
      <backup>
        <duration>4</duration>
      </backup>
      <note>
        <pitch>
          <step>F</step>
          <octave>4</octave>
          </pitch>
        <duration>2</duration>
        <voice>2</voice>
        <type>half</type>
        <stem>down</stem>
        </note>
      <note>
        <rest/>
        <duration>2</duration>
        <voice>2</voice>
        <type>half</type>
      </note>
    ...

First, notice that our measure attributes block contains the divisions element. This tells us that, for this measure, every duration value is counted with 1 division per quarter note (in this case, the duration unit is the quarter note).

Next observe that our first two notes contain a voice element, putting them into voice 1.

We then see our backup element, which tells us to go back 4 quarter notes. This indicates that the following note, in voice 2, begins 4 quarter notes back (at the beginning of the measure), at the same timestamp as the first voice 1 note. Here is an animation demonstrating how this is parsed sequentially to yield proper rendering:
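
The same sequential parse can also be sketched in code. The `onsets` helper below is hypothetical: it keeps a cursor in quarter notes, advances it after every note, and moves it for backup and forward elements. It deliberately ignores chords and grace notes, which a full reader would have to handle.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Compacted copy of the two-voice measure above.
MEASURE = """
<measure number="1">
  <attributes><divisions>1</divisions></attributes>
  <note><pitch><step>A</step><octave>4</octave></pitch><duration>2</duration><voice>1</voice></note>
  <note><rest/><duration>2</duration><voice>1</voice></note>
  <backup><duration>4</duration></backup>
  <note><pitch><step>F</step><octave>4</octave></pitch><duration>2</duration><voice>2</voice></note>
  <note><rest/><duration>2</duration><voice>2</voice></note>
</measure>
"""

def onsets(measure):
    """Return (voice, name, onset in quarter notes) for each note, in document order."""
    divisions = int(measure.findtext("attributes/divisions"))
    cursor = Fraction(0)
    events = []
    for child in measure:
        if child.tag not in ("note", "backup", "forward"):
            continue
        length = Fraction(int(child.findtext("duration")), divisions)
        if child.tag == "backup":
            cursor -= length  # jump back without emitting anything
        elif child.tag == "forward":
            cursor += length  # skip ahead without emitting anything
        else:
            if child.find("pitch") is not None:
                name = child.findtext("pitch/step") + child.findtext("pitch/octave")
            else:
                name = "rest"
            events.append((child.findtext("voice"), name, cursor))
            cursor += length
    return events

for voice, name, onset in onsets(ET.fromstring(MEASURE)):
    print(voice, name, onset)
# 1 A4 0
# 1 rest 2
# 2 F4 0
# 2 rest 2
```

Both voices start at onset 0, which is exactly what the backup of 4 quarter notes achieves.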

Multiple Staves (Per Part)

The same principle is applied for multiple staves per instrument, as is the case with a piano, for example. Here is a MusicXML example of that in practice:

...
<measure number="1">
  <attributes>
    ...
    <divisions>1</divisions>
    <staves>2</staves>
    <clef number="1">
      <sign>G</sign>
      <line>2</line>
      </clef>
    <clef number="2">
      <sign>F</sign>
      <line>4</line>
      </clef>
    </attributes>
  <note>
    <pitch>
      <step>A</step>
      <octave>4</octave>
      </pitch>
    <duration>2</duration>
    <voice>1</voice>
    <type>half</type>
    <stem>up</stem>
    <staff>1</staff>
    </note>
  <note>
    <rest/>
    <duration>2</duration>
    <voice>1</voice>
    <type>half</type>
    <staff>1</staff>
    </note>
  <backup>
    <duration>4</duration>
    </backup>
  <note>
    <pitch>
      <step>G</step>
      <octave>3</octave>
      </pitch>
    <duration>2</duration>
    <voice>5</voice>
    <type>half</type>
    <stem>up</stem>
    <staff>2</staff>
    </note>
  <note>
    <rest/>
    <duration>2</duration>
    <voice>5</voice>
    <type>half</type>
    <staff>2</staff>
    </note>
</measure>
...

Note that we specify in the measure attributes that two staves exist and give their clefs. We then indicate which staff a note belongs to with its child staff element.

The backup element works in exactly the same way here. If we combine both techniques, we get two staves with two voices per staff, rendered like this:
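
In code, the cursor logic carries over unchanged; the only new step is filing each note under its staff. A short sketch follows, where the `by_staff` helper is our own, onsets are left in raw duration units, and a note without a staff element is assumed to belong to staff 1:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Compacted copy of the two-staff measure above.
MEASURE = """
<measure number="1">
  <attributes><divisions>1</divisions><staves>2</staves></attributes>
  <note><pitch><step>A</step><octave>4</octave></pitch><duration>2</duration><voice>1</voice><staff>1</staff></note>
  <note><rest/><duration>2</duration><voice>1</voice><staff>1</staff></note>
  <backup><duration>4</duration></backup>
  <note><pitch><step>G</step><octave>3</octave></pitch><duration>2</duration><voice>5</voice><staff>2</staff></note>
  <note><rest/><duration>2</duration><voice>5</voice><staff>2</staff></note>
</measure>
"""

def by_staff(measure):
    """Map staff number -> list of (onset, voice, name), using one shared cursor."""
    cursor = 0  # measured in divisions
    staves = defaultdict(list)
    for child in measure:
        if child.tag == "backup":
            cursor -= int(child.findtext("duration"))
        elif child.tag == "forward":
            cursor += int(child.findtext("duration"))
        elif child.tag == "note":
            step = child.findtext("pitch/step")
            name = step + child.findtext("pitch/octave") if step else "rest"
            staff = child.findtext("staff", "1")  # assumption: missing staff means staff 1
            staves[staff].append((cursor, child.findtext("voice"), name))
            cursor += int(child.findtext("duration"))
    return dict(staves)

print(by_staff(ET.fromstring(MEASURE)))
# {'1': [(0, '1', 'A4'), (2, '1', 'rest')], '2': [(0, '5', 'G3'), (2, '5', 'rest')]}
```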

Comparison with MEI

There are a number of alternatives to MusicXML for storing sheet music data; the most notable at present is MEI (Music Encoding Initiative). As mentioned on the MEI website, MEI is also an XML specification. While it provides the same functionality as MusicXML, the philosophy behind it differs. According to the MEI site: “…it can also encode information about the notation and its intellectual content in a structured and systematic way. MEI supports notation systems outside of standard Common Western Notation…”

Practically speaking, MEI has more metadata and score structure elements available (e.g. verse, watermark, distributor, accessRestrict, section and many more), while MusicXML provides just the hierarchy necessary to faithfully recreate and transmit sheet music.

As a result, MEI has been adopted in the academic world (e.g.: https://beethovens-werkstatt.de/, http://www.kb.dk/dcm/cnw/navigation.xq) where metadata is more pertinent. MEI also appears to be more compact, preferring abbreviated XML attributes over elements for data specification.

Here is a simple comparison example to demonstrate some structural differences:

Music XML

<score-partwise>
    ...MusicXML score definition, etc.
  <part id="P1">
    <measure number="1">
      <attributes>
        <!-- 96 divisions per quarter note, so an eighth note has a duration of 48 -->
        <divisions>96</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>E</step>
          <octave>5</octave>
        </pitch>
        <duration>48</duration>
        <voice>1</voice>
        <type>eighth</type>
        <beam number="1">begin</beam>
        <notations>
          <slur number="1" placement="above" type="start"/>
        </notations>
      </note>
      <note>
        <pitch>
          <step>F</step>
          <octave>5</octave>
        </pitch>
        <duration>48</duration>
        <voice>1</voice>
        <type>eighth</type>
        <beam number="1">end</beam>
        <notations>
          <slur number="1" placement="above" type="stop"/>
        </notations>
      </note>
    </measure>
  </part>
</score-partwise>

MEI

<mei>
  ... MEI Metadata, etc.
  <score>
    <scoreDef meter.count="4" meter.unit="4" meter.sym="common" key.sig="0" key.mode="major">
      <staffGrp symbol="brace" bar.thru="true">
        <staffDef n="1" clef.line="2" clef.shape="G" key.sig="0" lines="5">
          <label>Voice</label>
          <labelAbbr>Voice</labelAbbr>
        </staffDef>
      </staffGrp>
    </scoreDef>
    <section>
      <measure n="1">
        <staff n="1">
          <layer n="1">
            <beam>
              <note xml:id="d16531e298" pname="e" oct="5" dur="8"/>
              <note xml:id="d16531e316" pname="f" oct="5" dur="8"/>
            </beam>
          </layer>
        </staff>
        <slur tstamp="0" curvedir="above" startid="#d16531e298" endid="#d16531e316" staff="1" tstamp2="0m+1"/>
      </measure>
      ...
    </section>
  </score>
</mei>

We can see some of the key differences here:

| MusicXML | MEI |
| --- | --- |
| The time signature and clef are defined as elements at the beginning of the measure | The staff definition, including the time signature and clef, is defined first, then matched to the staff later by ID (n attribute) |
| The part/measure structure is the base structure | The staff (and staff group) is the base structure |
| Less semantic structure | Section and layer ‘semantic’ elements are present |
| Key signature, time signature, note pitch, duration, etc. are all defined as elements | Key signature, time signature, note pitch, duration, etc. are all defined as attributes |
| Beams are defined with “begin” and “end” text content on the note range contained within the beam | Beam elements contain the notes that are members of the beam |
| Slurs are defined with the “start” and “stop” type attribute to mark the begin and end note of a slur | Slurs are defined at the end of a measure, pointing to the start and stop notes by ID |
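
The ID-based slur in MEI means a reader has to resolve references instead of tracking open slurs while walking the notes. A rough sketch of that lookup follows; the `resolve_slurs` helper is our own, and the snippet omits the MEI namespace that a complete MEI file would declare on its root element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

MEI_MEASURE = """
<measure n="1">
  <staff n="1">
    <layer n="1">
      <beam>
        <note xml:id="d16531e298" pname="e" oct="5" dur="8"/>
        <note xml:id="d16531e316" pname="f" oct="5" dur="8"/>
      </beam>
    </layer>
  </staff>
  <slur startid="#d16531e298" endid="#d16531e316" curvedir="above" staff="1"/>
</measure>
"""

def resolve_slurs(measure):
    """Return (start_pitch, end_pitch) for each slur by following its ID references."""
    notes = {n.get(XML_ID): n for n in measure.iter("note")}
    pairs = []
    for slur in measure.iter("slur"):
        start = notes[slur.get("startid").lstrip("#")]
        end = notes[slur.get("endid").lstrip("#")]
        pairs.append((start.get("pname") + start.get("oct"),
                      end.get("pname") + end.get("oct")))
    return pairs

print(resolve_slurs(ET.fromstring(MEI_MEASURE)))  # [('e5', 'f5')]
```

Note that the beam needs no lookup at all: its member notes are simply its children.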

Comparison Summary

For human readability, MusicXML has an advantage because abbreviations aren’t typically used for attribute or element names. This makes MusicXML files larger, but MusicXML addresses this by providing a compressed .mxl format (essentially a zipped MusicXML file).
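
Since an .mxl file is a zip archive, it can be opened with ordinary zip tooling. The sketch below assumes the layout described in the MusicXML specification, where a META-INF/container.xml entry names the score file through a rootfile element; the helper names `write_mxl` and `read_mxl` and the file name score.musicxml are our own choices for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def write_mxl(score_xml, score_path="score.musicxml"):
    """Build compressed MusicXML bytes in memory (simplified container)."""
    container = (
        "<container><rootfiles>"
        f'<rootfile full-path="{score_path}"/>'
        "</rootfiles></container>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("META-INF/container.xml", container)
        archive.writestr(score_path, score_xml)
    return buffer.getvalue()

def read_mxl(data):
    """Return the root element of the score stored in compressed MusicXML bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        score_path = container.find("rootfiles/rootfile").get("full-path")
        return ET.fromstring(archive.read(score_path))

mxl_bytes = write_mxl('<score-partwise><part id="P1"/></score-partwise>')
root = read_mxl(mxl_bytes)
print(root.tag, root.find("part").get("id"))  # score-partwise P1
```

Reading the path from the container, rather than guessing a file name, matters because exporters are free to name the score file however they like.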

For academic purposes where querying, linking, and organizing scores by metadata is important, MEI comes out on top with its semantic-focused structure. For the interchange of sheet music across notation programs, MusicXML is the dominant format and most widespread.

As a result, all major music notation programs support MusicXML natively, whereas MEI must be transformed to and from MusicXML (one exception being an MEI plugin available for Sibelius).

While the formats are technically competitors in that they both specify sheet music, their underlying philosophies sort them into different market categories. Unless the development of one or both formats starts to drift away from its stated goal, it seems likely that we will continue to see the academic/wider market split.

