The AI Field Guide / D

Letter D

16 terms, explained without the techno-murk.


Data

Start here

Information that computers can store, examine or learn from.

Data can be words, images, sounds, numbers, clicks or sensor readings. Its quality, legality and representativeness strongly affect what an AI system can do.

For example

Thousands of labelled bird photographs are data for an image-recognition system.

#

Data augmentation

Deeper

Creating varied training examples by making sensible changes to existing data.

It is like practising the same skill under slightly different conditions. A picture might be cropped, brightened or flipped so the model learns the important subject rather than one exact photograph.

For example

A vision dataset gains extra examples by rotating each training image a little.
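
For readers who like to see the idea in code, here is a minimal sketch of three such changes on a tiny greyscale picture stored as rows of numbers. The picture and the helper names (`flip_horizontal`, `brighten`, `crop`) are made up for illustration; real projects usually rely on an image library rather than hand-written helpers.

```python
# A tiny greyscale "image": rows of pixel values from 0 (black) to 255 (white).
# The image and helper names are illustrative assumptions.
image = [
    [10, 50, 90],
    [20, 60, 100],
]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def brighten(img, amount):
    """Add `amount` to every pixel, capped at 255."""
    return [[min(255, pixel + amount) for pixel in row] for row in img]

def crop(img, top, left, height, width):
    """Keep a height x width window starting at row `top`, column `left`."""
    return [row[left:left + width] for row in img[top:top + height]]

# One original picture becomes four training examples.
augmented = [
    image,
    flip_horizontal(image),
    brighten(image, 40),
    crop(image, 0, 1, 2, 2),
]
print(len(augmented))
```

Each version still shows the same subject, so it keeps the same label as the original picture.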

#

Data ethics

Everyday

The principles used to decide whether collecting, sharing and using data is responsible and fair.

Something can be technically possible and legally permitted while still raising ethical concerns. Data ethics asks whether people have meaningful choice, whether the use is proportionate, who benefits, who may be harmed and whether communities have a voice in decisions about their information.

For example

A public body asks whether reusing residents' records to train an AI matches the purpose for which the information was originally collected.

#

Data mining

Everyday

Searching large collections of data for useful patterns or relationships.

The mining comparison suggests digging through a lot of material to find something valuable. It can reveal trends and connections, but a pattern is not automatically an explanation or proof of cause.

For example

A supermarket analyses baskets to find products that are often bought together.
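
The basket example can be sketched in a few lines: count how often each pair of products appears in the same basket. The baskets and the `count_pairs` helper below are invented for illustration, and real systems use more careful measures than raw counts.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets, one set of products per customer visit.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

def count_pairs(baskets):
    """Count how many baskets contain each pair of products."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one consistent order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_pairs(baskets)
most_common_pair, times_seen = pair_counts.most_common(1)[0]
print(most_common_pair, times_seen)
```

A high count shows that two products travel together. It does not show that buying one causes people to buy the other.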

#

Data science

Everyday

Using data, statistics and computing to understand problems and support decisions.

Data science can include collecting and cleaning information, finding patterns, making charts, running experiments and building predictive models. Not every data-science project uses AI.

For example

A data scientist studies sales records to find out why customers are leaving.

#

Data transparency

Everyday

Providing clear information about where data came from and how it was collected, changed and used.

It is the data equivalent of an ingredient label and supply history. Useful transparency describes sources, permissions, missing groups, cleaning steps and limitations in language that affected people can understand, rather than merely stating that 'data was used.'

For example

A model report lists the years, regions and populations represented in its training data and identifies important gaps.

#

Data-centric engineering

Deeper

Building and improving systems by treating data as a central engineering concern.

This approach pays close attention to how data is collected, labelled, checked, maintained and used throughout a system's life. Better data practices can improve an AI system more than repeatedly changing the model.

For example

A factory team improves sensor quality and fault labels before retraining its maintenance model.

#

Dataset

Start here

An organised collection of data used for training, testing or analysis.

A dataset might be a spreadsheet, a library of images or a huge collection of text. Datasets can contain gaps, errors, copyrighted material or private information, so their origins matter.

For example

A speech dataset may contain audio clips paired with accurate transcripts.

#

Decision tree

Everyday

A prediction method that follows a branching series of questions.

It works like a flowchart: take one branch if the answer is yes and another if it is no, then continue until reaching a result. Trees are often easier to inspect than large neural networks.

For example

A tree asks about income, payment history and debt before estimating lending risk.
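
Written as code, a small tree is just a set of nested questions. The sketch below is hand-written to show the shape; the function name `lending_risk` and every threshold in it are invented, whereas a real decision tree learns its questions and cut-off values from data.

```python
def lending_risk(income, missed_payments, debt):
    """Follow a hand-written tree; all thresholds are invented for illustration."""
    # First question: has the applicant missed several payments?
    if missed_payments > 2:
        return "high"
    # Second question: is the debt large compared with income?
    if debt > income * 0.5:
        # Third question, asked only on this branch: is income low?
        if income < 30000:
            return "high"
        return "medium"
    return "low"

print(lending_risk(income=40000, missed_payments=0, debt=5000))
```

Because every answer can be traced back through a handful of questions, a reader can check exactly why the tree reached its result.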

#

Deep learning

Everyday

Machine learning that uses neural networks with many layers.

Deep learning is behind many advances in language, images and speech. The layers learn increasingly useful patterns from large amounts of data instead of relying only on hand-written rules.

For example

A deep-learning system can learn visual patterns that distinguish cats from dogs.
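
To show what "layers" means in practice, here is a minimal sketch of a network with two layers passing numbers forward. The weights are made up for illustration; in real deep learning they are learned from data, and there are usually far more layers and far more numbers.

```python
def relu(values):
    """Keep positive numbers and replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus a bias for each output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Made-up weights; a trained network would have learned these from examples.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]

def forward(inputs):
    """Pass the inputs through the first layer, then the second."""
    hidden = relu(layer(inputs, W1, b1))
    return layer(hidden, W2, b2)

print(forward([2.0, 1.0]))
```

Each layer works on the output of the one before it, which is how later layers can build on patterns found by earlier ones.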

#

Deepfake

Everyday

Convincing fake media made or altered with AI.

Deepfakes can imitate a person's face, voice or actions. They have creative uses, but can also enable fraud, harassment and misinformation, especially when presented without clear labelling.

For example

A scammer clones an executive's voice to request an urgent bank transfer.

#

Diffusion model

Deeper

A generative model that learns to turn random noise into a useful image, sound or other output.

During training it learns how to reverse a gradual noising process. When generating, it starts with noise and repeatedly refines it toward something matching the prompt.

For example

Many text-to-image tools use diffusion to build a picture over a series of denoising steps.
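
The gradual noising process can be sketched on a short list of numbers standing in for an image. This shows only the noising half, using one common formulation; the step size `beta`, the number of steps and the helper names are assumptions for illustration. The learned reverse process needs a trained neural network and is not shown.

```python
import math
import random

def add_noise_step(values, beta, rng):
    """Shrink the signal slightly and mix in a little random noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in values]

def remaining_signal(betas):
    """Scale of the original signal that survives after all the steps."""
    scale = 1.0
    for beta in betas:
        scale *= math.sqrt(1.0 - beta)
    return scale

rng = random.Random(0)           # fixed seed so the run is repeatable
signal = [1.0, -1.0, 0.5, 0.0]   # stand-in for image pixels
betas = [0.05] * 200             # assumed noise level for each step

noisy = signal
for beta in betas:
    noisy = add_noise_step(noisy, beta, rng)

print(remaining_signal(betas))
```

After enough steps almost nothing of the original remains, which is why generation can begin from pure noise and work backwards.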

#

Digital twin

Everyday

A digital representation of a real object, place or process that changes as the real one changes.

A digital twin uses measurements and models to help people monitor, test or predict what may happen in its physical counterpart. It is more than a static 3D picture because it is connected to real or regularly updated data.

For example

Engineers use a digital twin of a wind turbine to watch its condition and test maintenance plans.

#

Discriminator (in a GAN)

Deeper

The part of a GAN that learns to tell real training examples from generated ones.

A discriminator acts like a forgery detective. The generator presents increasingly convincing fakes, and the discriminator tries to expose them. Feedback from the detective helps the forger improve, while better fakes force the detective to become sharper.

For example

A discriminator examines faces and estimates whether each came from the real dataset or the generator.

#

Distillation

Deeper

Training a smaller model to imitate useful behaviour from a larger one.

The smaller 'student' aims to keep much of the larger 'teacher' model's ability while becoming cheaper or faster to run. Some quality is often traded for efficiency.

For example

A company distils a large language model into a smaller assistant that can run on a phone.
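
One common way to set this up is to have the student match the teacher's softened probabilities rather than only its top answer. The sketch below assumes that approach; the scores, the temperature value and the function names are invented for illustration, and real training would repeat this over many examples while adjusting the student.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature softens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtracting the peak avoids overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """Cross-entropy between softened teacher and student probabilities."""
    teacher_probs = softmax(teacher_scores, temperature)
    student_probs = softmax(student_scores, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [4.0, 1.0, 0.5]        # invented teacher scores for three answers
close_student = [3.5, 1.2, 0.4]  # student that roughly agrees
far_student = [0.5, 4.0, 1.0]    # student that prefers a different answer

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is smaller for the student whose preferences resemble the teacher's, so lowering it pushes the student toward the teacher's behaviour.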

#

Double descent

Deeper

A surprising pattern where a model gets worse as it grows, then starts getting better again.

The traditional expectation is that an overly complex model will overfit and keep performing badly. In some modern systems, test error falls twice: it drops as the model grows, rises around a troublesome middle size, then drops again when the model becomes much larger. It is an observed pattern, not a promise that making any model bigger will help.

For example

A sequence of increasingly large models may show falling test error, then a spike, followed by another fall.

#