Notes on digital content
(These notes build on earlier notes on "Multimedia Content and Tools"; there are, however, no links here, so far...)
See also: A Brief Note on Knowledge Technologies (no links either, sorry :-( )
-
Digital content
has many aspects, facets, dimensions: contextual (in terms of purposes,
applications and domains), technical, economic, linguistic, social,
cultural, aesthetic, ethical, legal, political, to name but a few. This
note focuses primarily on (some) technical issues.
-
In particular, it does not
address the "driving forces" behind digital content and its associated
technologies (e.g. "who profits in what way from what?", "who needs it
for what?", "who establishes requirements?"), nor the ways in which technical
solutions can help resolve non-technical issues (such as IPR/"digital
rights", multilinguality, data protection, content preservation or
social control of digital content).
-
We speak of digital content as
opposed to analogue content. Digital content is
anything that can be produced/created, stored, processed, managed and
transmitted using digital technologies. (Analogue content has - at
least in principle - nothing to do with these technologies: it was
being produced/created, stored, processed, managed and transmitted long
before (in most OECD countries) digital computers became ubiquitous
office equipment, and digital cameras, CD and DVD players, game consoles,
etc. became household appliances. We note, however, that digital
technologies are also having a great impact on (at least the production
and distribution of) analogue content.)
-
For practical purposes, this may be
slightly too broad a "definition". We exclude
software engineering products in so far as they drive the machines that
process content and make it accessible (among the many other tasks
supported by software). We include above all
databases, electronic texts, images and graphics, audio and video, or -
more generally - "multimedia content". But we also
include software engineering products (e.g. programs and
specifications) in so far as they implement computational services, in
particular services for processing and accessing content (including the
products themselves (!), as pieces of text in "software engineering
environments", for instance).
-
More specifically, we may "define" digital
content as: [[structured]
collections of] digital representations of abstract or real-world
objects and processes, and/or of knowledge thereof; objects and
processes are either natural or man-made.
(Unfortunately, the terminology is currently somewhat blurred, as many
"content providers" and manufacturers of "content management systems"
seem to adhere to too narrow a view of the concept, e.g. limited to the
context of "digital rights management".)
-
Our definition subsumes "multimedia".
In fact, "multimedia content" is
usually understood as being subject to a number of restrictions, such as:
-
to form part of multimedia content, a digital
representation of a real-world object must preserve some or (in the
best case) all sensory features (in terms of seeing, hearing, smelling,
tasting and touching) of that object;
-
to form part of multimedia content, a digital
representation of an abstract object (e.g. data sets, information
spaces) must allow presentations that appeal directly to one, several
or all of the human senses.
-
More often than not, the objects underlying digital
representations are man-made and are themselves analogue representations
(e.g. images, text, sound) of real-world objects. (Strictly
speaking, "text", as a concatenation of characters drawn from a finite
set of symbols, is itself a digital object type; handwritten text,
however, clearly includes a strong analogue component.)
-
Whereas storing and communicating analogue
representations of (real-world) objects require a range of different
physical media (e.g. paper, film, air (!), magnetic tape,
electromagnetic fields), digital representations can in principle
be stored on a single physical medium and communicated via a single
(possibly different) physical medium. (Hence "multimedia" may be
considered a misnomer; what the term alludes to is actually the
replacement of several (multi) physical media by (potentially) a single
one.)
This is what "convergence" is
all about (at least technically speaking).
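The "single medium" argument can be made concrete with a minimal Python sketch. The three toy objects, their encodings (UTF-8 text, 16-bit little-endian PCM audio, 8-bit greyscale pixels) and the length-prefixed layout are assumptions chosen for illustration, not any particular standard file format: each object is reduced to bytes, all three are stored in one buffer, and they are read back unchanged.

```python
import struct

# Hypothetical toy objects of three different kinds.
text = "Notes on digital content"          # characters from a finite symbol set
audio_samples = [0, 1200, -1200, 32767]     # 16-bit PCM samples (assumed format)
image_pixels = [[0, 128], [255, 64]]        # 2x2 greyscale image, 8 bits per pixel

def encode_text(s: str) -> bytes:
    """Text as UTF-8 bytes."""
    return s.encode("utf-8")

def encode_audio(samples: list[int]) -> bytes:
    """Little-endian signed 16-bit integers, one per sample."""
    return struct.pack(f"<{len(samples)}h", *samples)

def encode_image(pixels: list[list[int]]) -> bytes:
    """Row-major greyscale pixels, one unsigned byte each."""
    return bytes(p for row in pixels for p in row)

# Every kind of object ends up as the same kind of thing: a byte sequence.
parts = [encode_text(text), encode_audio(audio_samples), encode_image(image_pixels)]

# One "medium": a single buffer holding all three, each prefixed by its length.
medium = b"".join(struct.pack("<I", len(p)) + p for p in parts)

def read_back(buf: bytes) -> list[bytes]:
    """Split the single buffer back into its length-prefixed parts."""
    out, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        out.append(buf[pos:pos + n])
        pos += n
    return out

restored = read_back(medium)
print(restored[0].decode("utf-8"))              # Notes on digital content
print(list(struct.unpack("<4h", restored[1])))  # [0, 1200, -1200, 32767]
```

The sketch only shows that, once digitised, heterogeneous objects become the same kind of thing; real container formats add headers, type information and compression on top of this basic idea.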
-
Digital content - and multimedia content in
particular - plays a prominent role in the context of,
for example,
-
business and public administrations,
-
science, engineering, medicine, law and (many)
other professional occupations,
-
education and training,
-
(general and specific) information services,
-
workflow (including CSCW) and transaction-oriented
(e.g. for e-commerce) systems,
-
preservation of and access to intellectual and
cultural assets and resources (in public and private "memory
institutions / facilities"),
-
entertainment (games, interactive digital TV, etc.).
-
Digital content (like all forms of content) may or
may not represent explicit knowledge about real-world
phenomena (things, processes, etc.). (In an education/training
environment, for instance, digital content is likely to represent some
explicit real-world knowledge, whereas this is not necessarily the case
with the content underlying the odd action game.)
In any event, it is always possible to have knowledge about (digital
or non-digital) content. Such knowledge is usually descriptive in
nature. Formal content description, representing knowledge about
content, is indeed an important form of digital content. We refer to it
as "meta-content" (as opposed to "domain content"), which
includes "metadata" as well as the (formal) documentation of whatever
"mini-world" is needed to give these metadata meaning (e.g. ontologies).
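A toy illustration of why metadata needs a "mini-world" to be meaningful: a metadata value such as "car" does not by itself tell a machine that a car is a vehicle. In the Python sketch below, which assumes a hypothetical term hierarchy, hypothetical file names and helper functions (real ontologies are usually expressed in richer languages such as RDF Schema or OWL), the ontology supplies that knowledge, so a search for a broader term also finds content described with narrower ones.

```python
# Hypothetical mini-world: each term maps to its broader term (None = top level).
ONTOLOGY = {
    "artefact": None,
    "vehicle": "artefact",
    "car": "vehicle",
    "bicycle": "vehicle",
    "building": "artefact",
}

# Domain content is represented here only by (hypothetical) file names;
# the meta-content consists of metadata records whose subjects are ontology terms.
metadata = {
    "photo_001.jpg": {"subject": "car"},
    "photo_002.jpg": {"subject": "bicycle"},
    "photo_003.jpg": {"subject": "building"},
}

def broader_terms(term: str) -> list[str]:
    """Return the term followed by all its ancestors in the ontology."""
    if term not in ONTOLOGY:
        raise KeyError(f"term {term!r} is not defined in the ontology")
    chain = []
    while term is not None:
        chain.append(term)
        term = ONTOLOGY[term]
    return chain

def find(subject: str) -> list[str]:
    """Find content whose subject is `subject` or any narrower term."""
    return sorted(
        item for item, record in metadata.items()
        if subject in broader_terms(record["subject"])
    )

print(find("vehicle"))   # ['photo_001.jpg', 'photo_002.jpg']
print(find("artefact"))  # all three photos
```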
-
A possible (non-exhaustive) list of (classes of) operations
on digital content could be structured along the
complementary views of content as either output or input, and the
"domain content versus meta-content" dichotomy:
| | content as output | content as input |
| domain content | creation, manipulation, search, access, retrieval, communication, presentation | description, storage, management, manipulation, annotation, analysis |
| meta-content | creation, description, manipulation, annotation, analysis | management, search, access, retrieval, presentation |
An operation can
appear in different quadrants. "Annotation", for instance, would take
"domain content" as input and output "meta-content"; "presentation", to
take another example, can be based on "meta-content" to visualise
"domain content".
The allocation is, of course, fuzzy. None of the
above (classes of) operations can be seen in isolation. Their
application is usually subject to some sequential or hierarchical order
(a sequential order is often called a "life cycle").
They vary in complexity, and there is substantial
overlap between the (mainly technical) issues pertaining to individual
operations. A more detailed list of such issues is annexed to this note.
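The quadrants can be illustrated with a deliberately naive Python sketch. The documents, the stop-word list and the frequency-based keyword "annotation" are hypothetical stand-ins, not a recommended technique: `annotate` takes domain content as input and outputs meta-content, `search` takes meta-content as input, and `present` uses the result to output domain content, forming a simple sequential life cycle.

```python
import re
from collections import Counter

# Hypothetical domain content: a few short texts keyed by identifier.
domain_content = {
    "doc1": "Digital cameras store images as pixels. Images can be compressed.",
    "doc2": "Audio streams are sampled signals; audio can be compressed too.",
}

STOPWORDS = {"a", "are", "as", "be", "can", "the", "too"}  # illustrative only

def annotate(text: str, k: int = 3) -> list[str]:
    """Annotation: domain content in, meta-content (top-k keywords) out."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

def search(meta: dict[str, list[str]], keyword: str) -> list[str]:
    """Search: meta-content in, identifiers of matching content out."""
    return [doc_id for doc_id, keywords in meta.items() if keyword in keywords]

def present(doc_ids: list[str]) -> list[str]:
    """Presentation: guided by meta-content, domain content out."""
    return [domain_content[d] for d in doc_ids]

# A simple sequential "life cycle": annotate -> search -> present.
meta_content = {doc_id: annotate(text) for doc_id, text in domain_content.items()}
print(meta_content)
print(present(search(meta_content, "audio")))
```

The same function can sit in different quadrants depending on what it consumes and produces; here annotation is the only step that creates meta-content, and every later step depends on it.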
-
Tools - in general both hardware
and software - are needed to perform these operations. They can be
grouped into corresponding classes. Tools are either generic or
application-specific: different application areas, and different
applications within these areas, may impose different requirements on
the various types of operations listed above.
The particular features of authoring tools, for
instance, depend on the type of content to be created: requirements for
authoring a set of courseware modules are likely to be quite different
from those for compiling an interactive multimedia newspaper or for
producing a video clip.
Yet the basic features of these operations do
provide common ground for most, if not all, contexts of digital
content: they are largely independent of any given application domain.
-
Technical challenges regarding
digital content and associated tools stem mainly from the evolution of
basic digital technologies, characterized by ever-increasing values of
parameters such as processor speed, storage and memory capacity,
bandwidth and connectivity. This evolution has greatly facilitated the
emergence of specific "technologies for creating and
using digital content". These have brought about:
-
the (well-known) quasi-explosive increase in
digital content production (using tools that are many orders of
magnitude more powerful than pen, paper, the printing press or library
catalogues; in fact, most of what used to be given "analogue" form -
e.g. sound, still and moving images, speech - is now
available as "digital content"), as well as distributed (world-wide)
platforms for the management and use of digital content;
-
a considerable enhancement of our ability to
analyse what is going on in the world (in both nature and society), to
peruse vast amounts of data, searching for structure, thus refining our
models of the world; and - partly as a consequence - machines/agents
that can learn and - to a certain extent - act autonomously in limited
formal environments.
A key technical challenge consists in building on
these developments with a view to creating tools and systems that would
make operations on digital content ever more effective and efficient,
and its use ever more enjoyable (for instance by allowing a higher
degree of interaction).
As suggested above, this challenge is persistent: its
target keeps moving.
There is a "technology - applications" cycle.
Applications pose challenges; technologies are developed in response to
such challenges and may make new applications possible, desirable or
necessary. That cycle takes societal needs as input and yields products
and/or services as output.
-
Presumably the hardest (and hence most
challenging?) problems, from a technical point of view, are those
related to content analysis (or,
for that matter, the analysis of the real world, the ultimate source of
content). (In a nutshell: such analysis aims to derive something
intelligible and actionable from (more or less) raw data or signals, in
the form of ontologically grounded metadata, for instance.) From
an organisational (that is, non-technical) point of view, the hardest
(but perhaps no less challenging) problems are probably those related to
access to and use of digital content (e.g. in the
above-mentioned contexts, cf. 10).
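As a very small example of deriving metadata from a raw signal, the following Python sketch computes the energy of fixed-length frames of an audio signal and reports the time spans that exceed a threshold. The synthetic signal, frame length, threshold and the "sound" label are all assumptions for illustration; real feature detection (speaker identification, emotion detection, etc.) is far harder, and the output here is not ontologically grounded.

```python
import math

def frame_rms(samples: list[float], frame_len: int) -> list[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies

def detect_segments(samples, rate, frame_len, threshold):
    """Turn a raw signal into metadata: time spans whose energy reaches threshold."""
    segments, start = [], None
    for idx, energy in enumerate(frame_rms(samples, frame_len)):
        t = idx * frame_len / rate  # start time of this frame in seconds
        if energy >= threshold and start is None:
            start = t
        elif energy < threshold and start is not None:
            segments.append({"label": "sound", "start_s": start, "end_s": t})
            start = None
    if start is not None:  # signal ended while still above threshold
        segments.append({"label": "sound", "start_s": start, "end_s": len(samples) / rate})
    return segments

# Hypothetical test signal at 8 kHz: 1 s silence, 1 s of a 440 Hz tone, 1 s silence.
RATE = 8000
signal = ([0.0] * RATE
          + [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
          + [0.0] * RATE)

print(detect_segments(signal, RATE, frame_len=800, threshold=0.1))
# [{'label': 'sound', 'start_s': 1.0, 'end_s': 2.0}]
```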
-
There are a number of general
architectural issues related to systems dealing with digital
content. They include: heterogeneity, distribution, federation,
scalability, interoperability, sustainability and commercial viability
of content repositories / systems; (multimedia) content production
systems; embedded multimedia (e.g. background multimedia libraries);
human-centred design techniques; etc.
-
Adherence to standards in the
design and implementation of systems based on digital content is a
necessary condition for (inter alia) interoperability, sustainability
and commercial viability. Standards issues arise in connection with
several of the above classes of operations, in particular creation
(representation formats and languages), description (metadata,
identifiers, languages, semantics, ontologies) and communication
(protocols).
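Description standards are what make metadata exchangeable. As a sketch, the Python snippet below serialises a record using elements from the Dublin Core Metadata Element Set (ISO 15836) with the standard library's `xml.etree.ElementTree`. The wrapper element name `record`, the helper function and the subject/language values are assumptions made for this example; real applications would embed such elements in an established container format rather than this ad hoc wrapper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)          # serialise with the conventional "dc" prefix

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise simple Dublin Core elements as an XML string (hypothetical helper)."""
    root = ET.Element("record")  # wrapper element name is our own choice
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record_xml = dublin_core_record({
    "title": "Notes on digital content",
    "creator": "Hans-Georg Stork",
    "subject": "digital content",  # illustrative value
    "language": "en",              # illustrative value
})
print(record_xml)
```

Rejecting unknown element names is a deliberately strict choice here; it keeps the record within the shared vocabulary that makes interoperability possible.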
-
Summary:
-
The scope of (the "abstract" notion of) digital
content ranges from "domain content" (object and process
representation) to "meta-content" ("knowledge representation"); there
are many application contexts;
-
Specific technologies underlie tools for working
with digital content; tools have generic and application dependent
features;
-
The hardest problems include: (1)
content/real-world analysis; (2) sustainable access to and use of
digital content; (3) social control of digital content;
-
These problems are likely to persist, with
solutions emerging from the "technology - applications" cycle (with
"needs" as input and "products / services" as output).
Annex: Some technical and non-technical issues related to operations on digital representations and collections
| operation | issues |
| creation | representation formats and languages, digitization techniques, editing / authoring / production, protection (e.g. watermarking and encryption), digital representations of not yet commonly handled objects and features (related, for instance, to senses other than seeing and hearing) |
| description | (there is a strong link between description-related issues and those related to other classes of operations, particularly analysis, search and retrieval), classification, metadata, metadata derivation and tools, content semantics, knowledge representation, ontologies, related (formal) language issues |
| storage | distributed storage, near-line storage architectures, caching, I/O bandwidth resource management, OS support for real-time and non-real-time data, compression techniques |
| management | multimedia data models, distributed object management, multimedia fragments management, temporal and spatial databases, multimedia data warehousing, guaranteeing consistency, integrity and authenticity |
| manipulation | code transformation, texture mapping, image perspective transformation, feature enhancement, etc. |
| annotation | video annotation and summarization / abstracting, semantic annotation, etc. |
| analysis | data mining / knowledge discovery in multimedia databases, document analysis and understanding, image analysis, pattern recognition, feature detection (e.g. object motion detection, tracking and characterization; face detection and tracking; understanding acoustic signals; speaker identification and tracking; emotion detection - 'kansei'; ... for content-based retrieval ...) |
| search | resource discovery, meta-search engines, filtering and selection, concept-based browsing, content-based querying, navigation |
| access | indexing methods (e.g. spatial indexing, content-based indexing), access control (including control of access to illegal and harmful content), rights management, user privacy |
| retrieval | multimedia retrieval models (including network and hypermedia models, Web-based multimedia), content-based ('intelligent') retrieval |
| communication | design and implementation of multimedia communication protocols, quality of service, real-time streaming / synchronization, multimedia over the Internet, mobile multimedia, multimedia via satellite, multicasting and security |
| presentation | content personalization, man-machine interaction models, non-standard interfaces (e.g. 'immersive content', multimodality), interface agents, virtual reality / interactive simulations, visualisation techniques, etc. |
Copyright: Hans-Georg Stork