Wednesday, December 05, 2018

How do you pronounce the web?

Thank you to Irfan Ali and Janina Sajka for starting the Spoken Pronunciation Task Force in the W3C Accessible Platform Architectures (APA) Working Group. As an accessibility professional, amateur linguist, and person with a cognitive disability, I'm offering this blog post as a perspective and a starting point for discussion.

What would happen if we used lexical markup in HTML to improve pronunciation in text-to-speech (TTS)? Lexical markup specifies the lexeme of a word or phrase, not its phonemes.

  • Lexical markup can help language learners as a basis for more efficient translation.
  • Lexical markup can help end users with cognitive disabilities through more accurate presentation of lexical synonyms (PDF) or conversion to simplified language.
  • Existing W3C standards already offer lexical markup as a basis for pronunciation. In the Pronunciation Lexicon Specification (PLS) and Speech Synthesis Markup Language (SSML), the role attribute and token element provide this capability.
  • Implementations exist which offer lexical markup as an option, such as Amazon Polly.
  • For authors, lexical specification can be simpler than phonetic specification, which encourages better quality for TTS end users. For example, consider which of these Amazon examples seems easier for authors to get right:

    I read a lot of <w role="amazon:NN">content</w> <!-- interpret the word as a noun -->

    I read a lot of <phoneme alphabet="ipa" ph="ˈkɑn.tɛnt">content</phoneme> <!-- use the specified IPA (International Phonetic Alphabet) pronunciation -->

  • Phonetic markup breaks regional accents in TTS, while lexical markup allows them.
  • Lexical markup makes it easier for search to match meaning.
  • Example of the last two points: An author marks up "Jaguar" and "jaguar" lexically, not phonetically. TTS pronounces each word correctly in both American and British English. Searches for "jaguar car" and "jaguar cat" match the right content.
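
To make the lexical approach more concrete, here is a minimal Python sketch of how a speech engine might resolve a role-tagged word against a PLS-style lexicon. It is only an illustration under assumptions: the "ex:" role names, the example.com namespace, the adjective pronunciation, and the pronunciations() helper are all hypothetical, and the role values are compared as plain strings rather than resolved as namespaced names. It is not how Amazon Polly or any other engine actually works.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Hypothetical lexicon for illustration. The "ex" role vocabulary and the
# IPA strings are assumptions, not taken from any published lexicon.
LEXICON_XML = """<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         xmlns:ex="http://www.example.com/roles"
         alphabet="ipa" xml:lang="en-US">
  <lexeme role="ex:noun">
    <grapheme>content</grapheme>
    <phoneme>ˈkɑn.tɛnt</phoneme>
  </lexeme>
  <lexeme role="ex:adjective">
    <grapheme>content</grapheme>
    <phoneme>kənˈtɛnt</phoneme>
  </lexeme>
</lexicon>
"""


def pronunciations(lexicon_xml, word, role):
    """Return the phoneme strings listed for `word` under the given role."""
    root = ET.fromstring(lexicon_xml)
    ns = {"pls": PLS_NS}
    matches = []
    for lexeme in root.findall("pls:lexeme", ns):
        graphemes = [g.text for g in lexeme.findall("pls:grapheme", ns)]
        # The role attribute may hold several space-separated values.
        roles = (lexeme.get("role") or "").split()
        if word in graphemes and role in roles:
            matches.extend(p.text for p in lexeme.findall("pls:phoneme", ns))
    return matches


if __name__ == "__main__":
    # The author only says "this is the noun"; the lexicon supplies the sounds.
    print(pronunciations(LEXICON_XML, "content", "ex:noun"))
    print(pronunciations(LEXICON_XML, "content", "ex:adjective"))
```

The author's markup stays the same whichever lexicon is loaded, so swapping in a British English lexicon would change the phonemes without touching the page. That is the property that lets lexical markup coexist with regional accents.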

At the same time, the markup specification should also allow authors to express their desired pronunciation phonetically. This is important for some cases, such as:

  • Where a pronunciation lexicon is not yet available, such as specialized technical vocabularies, or quotations of minority languages.
  • Nonce words.
  • Situations where precise pronunciation is more important than interoperability, such as a language comprehension test.

This task force will hit an existing limitation of HTML: there's no way for authors to mark up individual words in a flat text string such as an aria-label value, title attribute value, or <title> element text. Today, this limitation prevents authors from applying a lang attribute to a substring. Is now the time to devise a solution in the HTML spec?

With the rise of voice search and ubiquitous mainstream text-to-speech, accessibility specification writers will be wise to design for simultaneous benefits in mainstream voice applications. Avoiding changes to the mainstream browsing experience was a good choice for ARIA, but this philosophy should not be overextended. In most cases, pronunciation goals are the same for screen readers, literacy software, speech output as a mainstream feature in visual browsers, and voice-first or voice-only mainstream user agents such as smart speakers. Authors might wish to optimize speech for different use cases such as "read this text naturally for my enjoyment" and "speak all information literally," but these minor variants are not enough to justify making assistive speech pronunciation completely separate from mainstream speech pronunciation. On the contrary, if accessibility can pull in the same direction as other technology interests, then good web pronunciation won't end up isolated in an education niche.

I've heard concerns that standardization of pronunciation technology infrastructure could lead to (or could only succeed with) a high degree of pronunciation standardization that does not represent the diversity of real-world human speech. These concerns are valid, but not new. In traditional print dictionaries, lexicographers have always had to choose a specific midpoint between description and prescription. Fortunately, the PLS standard already supports more pronunciation options than print dictionaries could, by allowing authors to choose a pronunciation lexicon appropriate for their content and audience.

  • A highly descriptive lexicon can be derived automatically from statistical analysis of a speech performance corpus.
  • A balanced descriptive–prescriptive lexicon could be derived from community consensus sources such as Wikipedia and Wiktionary.
  • A highly prescriptive lexicon could be written as a speech output style guide for a particular context, such as a foreign language course.

Likewise, TTS developers will expect the freedom to choose pronunciation lexicons that range from descriptive to prescriptive. In the future, TTS developers may want to give users a choice of lexicons along the descriptiveness range, as they already offer voice options for accent or gender. So web standards should apply the principle of "author proposes, user disposes" to the choice of lexicon.

Where privacy allows, speech-to-text (STT) applications are a rich source for descriptive pronunciation lexicons. I don't know how STT might interoperate with pronunciation markup. At a minimum, STT developers should be invited to give their input to the task force.

Enterprise-scale machine learning is already using human speech sources to create pronunciation lexicons. Profit motives have funded spectacular progress, but there have been problematic side effects. Corporations have zealously protected their data sets inside walled gardens, while adding languages slowly. This currently leaves assistive technology TTS users excluded from the web's promise of cultural and linguistic inclusion. I would love to see open, transparent processes for creating descriptive pronunciation lexicons from community sources like Wiktionary and linguistic research corpora.

Lexicon publication standards could yield some useful results. I bet a lot of people would like to define the pronunciation of their own name in their native language and in a dominant culture language. I also bet companies would like to define one or more pronunciations of their brand names – if this provides even a slight boost in organic search engine optimization (SEO) for voice search (analogous to schema markup for traditional search), then somebody will make a lot of money.

Friday, May 18, 2018

Accessibility experts are farmers, not manure shovelers

As an accessibility subject matter expert (SME) watching another Global Accessibility Awareness Day come and go, sometimes I feel like a manure shoveler. No matter how carefully I shovel (or test), it will always stink.

Instead, as SMEs we must remember to be farmers. With patient perseverance, we sow the seeds and nurture the cycles of life – or the lifecycles of digital products.

Actually, many analogies are apt...

  • the accessibility job most days :: the accessibility job as it should be
  • manure shoveler :: farmer
  • cop (traditionally gaining respect through fear) :: firefighter (traditionally gaining respect through protection)
  • factory worker :: global logistics consultant
  • number cruncher :: polymath
  • heart surgeon :: nutritionist

No analogy is perfect. In reality the manure will always need shoveling, the factory will need greasing, numbers will need crunching, and digital products will need auditing. The power of these analogies is to remind us that the left side is just a task we must do, while the right side is who we are.

So the next time I'm in the middle of a hard day of shoveling manure, I will remember to preserve my energy and time, and get up the next day bright and early as the nurturing farmer.

Saturday, February 24, 2018

Learning from Yucatán and Quintana Roo

As we started driving out of Cancún, I couldn't help looking for patterns. How is this place the same or different from the parts of the U.S. and Germany that I know? After a week with my family in Valladolid, Tulum, and Playa del Carmen, I still have more questions than answers.

In Puerto Morelos yesterday, I bought a book called "Mañana Forever? Mexico and the Mexicans" from a lovely bookstore called Alma Libre. I'll see if I can put together some answers.

I'm not looking for the "national character" of Mexico or of any place, just patterns of prevailing beliefs of the people who live there. Before diving into that book, here are my initial observations and hypotheses.

More people smile back at me in Mexico than in San Francisco.

People spend more time face to face with friends and acquaintances. They share a vehicle, sit together without devices, talk and laugh. I saw three young women in the Tulum library doing crafts together.

People "give it everything they've got" as [xxxx] put it in [xxxx link]. Yet they also rest more consistently.

People spend a lot of their time tired or happy or both. Most don't try to hide these feelings, except in more polished service jobs, where, as in the U.S., they put on happy faces that only partially convince themselves and their patrons.

People living in shacks across the street from our Tulum hotel are certainly poor, but in a particular way which rarely exists in the San Francisco Bay Area. The closest I've seen was the squatter settlement on Albany Bulb before it was shut down. These are working poor people, in a lifestyle which could be sustainable if job and health allow. The book [squatter cities xxxx] has influenced my perceptions.

Like anywhere, the written rules and the unwritten ones overlap and coexist. I would like to learn the different ways Mexicans feel and talk about rules and laws.

Official communications from the government make it sound like Mexicans are proud of their government institutions, but I haven't yet noticed that vibe from individuals. National treasures like Chichén Itzá and Sian Ka'an are possible exceptions.

I talked the most with Joaquín. He is writing a multilingual primer on a contemporary Mayan language. Mayans are proud of their culture, but resent a persistent lack of access to opportunities.

People resent smug entrenched wealth, represented not just by US tourists like me but also by Mexican oligarchs and corporations. When I am kind, someone like Santos our boat pilot can feel good about me as a person while continuing to resent the inaccessible wealth that I represent. Building these bridges requires two people to communicate; it doesn't extend to two degrees of separation.

I expected conversations to land more decisively on English or Spanish. It took me a few days to become comfortable switching between Spanish and English within a single conversation or even a single thought, whatever got the message across. This pattern was quite common where we were.

Only on our last evening in Playa del Carmen did I realize that the insulated tourist zones are legitimately part of Mexico, just as San Francisco is so different yet a genuine part of the United States. (Moxie convinced me over dinner.)

These are all first impressions. I've not had enough experiences to be confident about any of these generalizations, even within the communities I visited.