Techniques for Data Hiding
©Copyright 1996 by International Business Machines Corpora-
tion. Copying in printed form for private use is permitted without
payment of royalty provided that (1) each reproduction is done
without alteration and (2) the Journal reference and IBM copyright
notice are included on the first page. The title and abstract, but no
other portions, of this paper may be copied or distributed royalty
free without further permission by computer-based and other infor-
mation-service systems. Permission to republish any other portion
of this paper must be obtained from the Editor.
IBM SYSTEMS JOURNAL, VOL 35, NOS 3&4, 1996 0018-8670/96/$5.00 1996 IBM BENDER ET AL.
Data hiding, a form of steganography, embeds
data into digital media for the purpose of
identification, annotation, and copyright. Several
constraints affect this process: the quantity of
data to be hidden, the need for invariance of these
data under conditions where a “host” signal is
subject to distortions, e.g., lossy compression,
and the degree to which the data must be immune
to interception, modification, or removal by a third
party. We explore both traditional and novel
techniques for addressing the data-hiding process
and evaluate these techniques in light of three
applications: copyright protection, tamper-
proofing, and augmentation data embedding.
Digital representation of media facilitates access
and potentially improves the portability, effi-
ciency, and accuracy of the information presented.
Undesirable effects of facile data access include an
increased opportunity for violation of copyright and
tampering with or modification of content. The moti-
vation for this work includes the provision of protec-
tion of intellectual property rights, an indication of
content manipulation, and a means of annotation.
Data hiding represents a class of processes used to
embed data, such as copyright information, into vari-
ous forms of media such as image, audio, or text with
a minimum amount of perceivable degradation to the
“host” signal; i.e., the embedded data should be invis-
ible and inaudible to a human observer. Note that data
hiding, while similar to compression, is distinct from
encryption. Its goal is not to restrict or regulate access
to the host signal, but rather to ensure that embedded
data remain inviolate and recoverable.
Two important uses of data hiding in digital media are
to provide proof of the copyright, and assurance of
content integrity. Therefore, the data should stay hid-
den in a host signal, even if that signal is subjected to
manipulation as degrading as filtering, resampling,
cropping, or lossy data compression. Other applica-
tions of data hiding, such as the inclusion of augmen-
tation data, need not be invariant to detection or
removal, since these data are there for the benefit of
both the author and the content consumer. Thus, the
techniques used for data hiding vary depending on the
quantity of data being hidden and the required invari-
ance of those data to manipulation. Since no one
method is capable of achieving all these goals, a class
of processes is needed to span the range of possible
applications.
The technical challenges of data hiding are formida-
ble. Any “holes” to fill with data in a host signal,
either statistical or perceptual, are likely targets for
removal by lossy signal compression. The key to suc-
cessful data hiding is the finding of holes that are not
suitable for exploitation by compression algorithms.
A further challenge is to fill these holes with data in a
way that remains invariant to a large class of host sig-
nal transformations.
Techniques for data hiding
by W. Bender, D. Gruhl, N. Morimoto, and A. Lu
Features and applications
Data-hiding techniques should be capable of embed-
ding data in a host signal with the following restric-
tions and features:
1. The host signal should be nonobjectionally
degraded and the embedded data should be mini-
mally perceptible. (The goal is for the data to
remain hidden. As any magician will tell you, it is
possible for something to be hidden while it
remains in plain sight; you merely keep the person
from looking at it. We will use the words hidden,
inaudible, imperceivable, and invisible to mean
that an observer does not notice the presence of the
data, even if they are perceptible.)
2. The embedded data should be directly encoded
into the media, rather than into a header or wrap-
per, so that the data remain intact across varying
data file formats.
3. The embedded data should be immune to modifi-
cations ranging from intentional and intelligent
attempts at removal to anticipated manipulations,
e.g., channel noise, filtering, resampling, cropping,
encoding, lossy compressing, printing and scan-
ning, digital-to-analog (D/A) conversion, and ana-
log-to-digital (A/D) conversion, etc.
4. Asymmetrical coding of the embedded data is
desirable, since the purpose of data hiding is to
keep the data in the host signal, but not necessarily
to make the data difficult to access.
5. Error correction coding [1] should be used to ensure
data integrity. It is inevitable that there will be
some degradation to the embedded data when the
host signal is modified.
6. The embedded data should be self-clocking or
arbitrarily re-entrant. This ensures that the embed-
ded data can be recovered when only fragments of
the host signal are available, e.g., if a sound bite is
extracted from an interview, data embedded in the
audio segment can be recovered. This feature also
facilitates automatic decoding of the hidden data,
since there is no need to refer to the original host
signal.
Applications. Trade-offs exist between the quantity
of embedded data and the degree of immunity to host
signal modification. By constraining the degree of
host signal degradation, a data-hiding method can
operate with either high embedded data rate, or high
resistance to modification, but not both. As one
increases, the other must decrease. While this can be
shown mathematically for some data-hiding systems
such as spread spectrum, it seems to hold true for all
data-hiding systems. In any system, you can trade
bandwidth for robustness by exploiting redundancy.
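The bandwidth-for-robustness trade can be illustrated with a simple repetition code. This is only a minimal sketch under stated assumptions: the function names `encode_repetition` and `decode_repetition` are hypothetical, the repetition factor `r` is assumed to be odd, and a repetition code is chosen here for brevity rather than because the text prescribes it.

```python
def encode_repetition(bits, r):
    """Repeat every payload bit r times (r is assumed odd)."""
    return [bit for bit in bits for _ in range(r)]


def decode_repetition(coded, r):
    """Recover each payload bit by majority vote over its r copies."""
    return [
        1 if 2 * sum(coded[i:i + r]) > r else 0
        for i in range(0, len(coded), r)
    ]


message = [1, 0, 1, 1]
r = 5
coded = encode_repetition(message, r)  # 20 host slots carry 4 payload bits

# Simulate host-signal damage: flip one copy in each of three blocks.
damaged = list(coded)
for index in (0, 7, 13):
    damaged[index] ^= 1

recovered = decode_repetition(damaged, r)
```

With `r = 5` the payload shrinks to one fifth of the available slots, but up to two flipped copies per block are tolerated.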
The quantity of embedded data and the degree of host
signal modification vary from application to applica-
tion. Consequently, different techniques are employed
for different applications. Several prospective applica-
tions of data hiding are discussed in this section.
An application that requires a minimal amount of
embedded data is the placement of a digital watermark.
The embedded data are used to place an indication
of ownership in the host signal, serving the same
purpose as an author’s signature or a company logo.
Since the information is of a critical nature and the
signal may face intelligent and intentional attempts to
destroy or remove it, the coding techniques used must
be immune to a wide variety of possible modifica-
tions.
A second application for data hiding is tamper-proof-
ing. It is used to indicate that the host signal has been
modified from its authored state. Modification to the
embedded data indicates that the host signal has been
changed in some way.
A third application, feature location, requires more
data to be embedded. In this application, the embed-
ded data are hidden in specific locations within an
image. It enables one to identify individual content
features, e.g., the name of the person on the left versus
the right side of an image. Typically, feature location
data are not subject to intentional removal. However,
it is expected that the host signal might be subjected
to a certain degree of modification, e.g., images are
routinely modified by scaling, cropping, and tone-
scale enhancement. As a result, feature location data-
hiding techniques must be immune to geometrical and
nongeometrical modifications of a host signal.
Image and audio captions (or annotations) may
require a large amount of data. Annotations often
travel separately from the host signal, thus requiring
additional channels and storage. Annotations stored in
file headers or resource sections are often lost if the
file format is changed, e.g., the annotations created in
a Tagged Image File Format (TIFF) may not be present
when the image is transformed to a Graphic Inter-
change Format (GIF). These problems are resolved by
embedding annotations directly into the data structure
of a host signal.
Prior work. Adelson [2] describes a method of data hiding
that exploits the human visual system’s varying
sensitivity to contrast versus spatial frequency. Adelson
substitutes high-spatial frequency image data for
hidden data in a pyramid-encoded still image. While
he is able to encode a large amount of data efficiently,
there is no provision to make the data immune to
detection or removal by typical manipulations such as
filtering and rescaling. Stego [3], one of several widely
available software packages, simply encodes data in
the least-significant bit of the host signal. This technique
suffers from all of the same problems as Adelson’s
method but creates an additional problem of
degrading image or audio quality. Bender [4] modifies
Adelson’s technique by using chaos as a means to
encrypt the embedded data, deterring detection, but
providing no improvement to immunity to host signal
manipulation. Lippman [5] hides data in the chrominance
channel of the National Television Standards
Committee (NTSC) television signal by exploiting the
temporal over-sampling of color in such signals. Typical
of Enhanced Definition Television Systems, this
method encodes a large amount of data, but the data
are lost to most recording, compression, and transcoding
processes. Other techniques, such as Hecht’s
Data-Glyph [6], which adds a bar code to images, are
engineered in light of a predetermined set of geometric
modifications [7]. Spread-spectrum [8-11], a promising
technology for data hiding, is difficult to intercept and
remove but often introduces perceivable distortion
into the host signal.
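The least-significant-bit approach mentioned above can be sketched in a few lines. This is a generic illustration, not the actual implementation of any named package: the helper names `embed_lsb` and `extract_lsb` are hypothetical, and the host is assumed to be a list of 8-bit samples.

```python
def embed_lsb(samples, bits):
    """Overwrite the least-significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("payload is longer than the host signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit  # clear the LSB, then set it
    return marked


def extract_lsb(samples, n_bits):
    """Read the payload back from the least-significant bits."""
    return [sample & 1 for sample in samples[:n_bits]]


host = [200, 13, 77, 54]
payload = [1, 0, 0, 1]
marked = embed_lsb(host, payload)

# Dropping the lowest bit, as a coarse requantization would, erases the payload.
requantized = [(sample // 2) * 2 for sample in marked]
```

Each sample changes by at most one level, yet a single requantization step wipes out the payload, which illustrates the lack of immunity to host signal manipulation noted in the text.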
Problem space. Each application of data hiding
requires a different level of resistance to modification
and a different embedded data rate. These form the
theoretical data-hiding problem space (see Figure 1).
There is an inherent trade-off between bandwidth and
“robustness,” or the degree to which the data are
immune to attack or transformations that occur to the
host signal through normal usage, e.g., compression,
resampling, etc. The more data to be hidden, e.g., a
caption for a photograph, the less secure the encoding.
The less data to be hidden, e.g., a watermark, the more
secure the encoding.
Data hiding in still images
Data hiding in still images presents a variety of chal-
lenges that arise due to the way the human visual sys-
tem (HVS) works and the typical modifications that
images undergo. Additionally, still images provide a
relatively small host signal in which to hide data. A
fairly typical 8-bit picture of 200 × 200 pixels pro-
vides approximately 40 kilobytes (kB) of data space
in which to work. This is equivalent to only around 5
seconds of telephone-quality audio or less than a sin-
gle frame of NTSC television. Also, it is reasonable to
expect that still images will be subject to operations
ranging from simple affine transforms to nonlinear
transforms such as cropping, blurring, filtering, and
lossy compression. Practical data-hiding techniques
need to be resistant to as many of these transforma-
tions as possible.
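The size comparison above can be checked with a little arithmetic. The telephone-quality parameters used here (8000 samples per second at 8 bits per sample) are an assumption made for this sketch; the text does not state them.

```python
# Host-signal size of the "fairly typical" still image from the text.
width, height = 200, 200
bits_per_pixel = 8
image_bytes = width * height * bits_per_pixel // 8

# Assumed telephone-quality audio: 8000 samples/s, 8 bits per sample.
telephone_bytes_per_second = 8000 * 8 // 8
equivalent_seconds = image_bytes / telephone_bytes_per_second

print(image_bytes, "bytes of image data")
print(equivalent_seconds, "seconds of telephone-quality audio")
```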
Despite these challenges, still images are likely candi-
dates for data hiding. There are many attributes of the
HVS that are potential candidates for exploitation in a
data-hiding system, including our varying sensitivity
to contrast as a function of spatial frequency and the
masking effect of edges (both in luminance and
chrominance).
Figure 1 Conceptual data-hiding problem space (axes: robustness, bandwidth; region shown: extent of current techniques)
The HVS has low sensitivity to small
changes in luminance, being able to perceive changes
of no less than one part in 30 for random patterns.
However, in uniform regions of an image, the HVS
is more sensitive to the change of the luminance,
approximately one part in 240. A typical CRT (cathode
ray tube) display or printer has a limited dynamic
range. In an image representation of one part in 256,
e.g., 8-bit gray levels, there is potentially room to hide
data as pseudorandom changes to picture brightness.
Another HVS “hole” is our relative insensitivity to
very low spatial frequencies such as continuous
changes in brightness across an image, i.e., vignett-
ing. An additional advantage of working with still
images is that they are noncausal. Data-hiding tech-
niques can have access to any pixel or block of pixels
at random.
Using these observations, we have developed a variety
of techniques for placing data in still images. Some
techniques are more suited to dealing with small
amounts of data, while others to large amounts. Some
techniques are highly resistant to geometric modifica-
tions, while others are more resistant to nongeometric
modifications, e.g., filtering. We present methods that
explore both of these areas, as well as their combina-
tion.
Low bit-rate data hiding
With low bit-rate encoding, we expect a high level of
robustness in return for low bandwidth. The emphasis
is on resistance to attempts of data removal by a third
party. Both a statistical and a perceptual technique are
discussed in the next sections on Patchwork, texture,
and applications.
Patchwork: A statistical approach
The statistical approach, which we refer to as Patch-
work, is based on a pseudorandom, statistical process.
Patchwork invisibly embeds in a host image a specific
statistic, one that has a Gaussian distribution. Figure 2
shows a single iteration in the Patchwork method.
Two patches are chosen pseudorandomly, the first A,
the second B. The image data in patch A are lightened
while the data in patch B are darkened (exaggerated
for purposes of this illustration). This unique statistic
indicates the presence or absence of a signature.
Patchwork is independent of the contents of the host
image. It shows reasonably high resistance to most
nongeometric image modifications.
For the following analysis, we make the following
simplifying assumptions (these assumptions are not
limiting, as is shown later): We are operating in a 256
level, linearly quantized system starting at 0; all
brightness levels are equally likely; all samples are
independent of all other samples.
The Patchwork algorithm proceeds as follows: take
any two points, A and B, chosen at random in an
image. Let a equal the brightness at point A and b the
brightness at point B. Now, let
$$S = a - b \tag{1}$$
The expected value of S is 0, i.e., the average value of
S after repeating this procedure a large number of
times is expected to be 0.
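Under the simplifying assumptions stated above (256 equally likely brightness levels, independent samples), the distribution of S can be enumerated exactly. The following is a small sketch with illustrative variable names; it confirms that the mean is 0 and shows that a single sample of S is spread over roughly a hundred gray levels.

```python
from math import sqrt

LEVELS = range(256)  # 256-level, linearly quantized system starting at 0

# Every equally likely (a, b) pair contributes one value of S = a - b.
differences = [a - b for a in LEVELS for b in LEVELS]

mean_s = sum(differences) / len(differences)
variance_s = sum((d - mean_s) ** 2 for d in differences) / len(differences)
std_s = sqrt(variance_s)

print("mean of S:", mean_s)
print("standard deviation of S:", round(std_s, 1))
```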
Figure 2 A single iteration in the Patchwork method
(photograph courtesy of Webb Chapel)
Although the expected value is 0, this does not tell us
much about what S will be for a specific case. This is
because the variance is quite high for this procedure.
The variance of S, $\sigma_S^2$, is a measure of how tightly
samples of S will cluster around the expected value of 0.
To compute this, we make the following observation:
Since S = a − b and a and b are assumed independent,
$\sigma_S^2$ can be computed as follows (this, and all other
probability equations are from Drake [12]):
$$\sigma_S^2 = \sigma_a^2 + \sigma_b^2 \tag{2}$$
where $\sigma_a^2$ for a uniform S is:
$$\sigma_a^2 \approx 5418 \tag{3}$$
Now, $\sigma_a^2 = \sigma_b^2$ since a and b are samples from the
same set, taken with replacement. Thus:
$$\sigma_S^2 = 2 \times \sigma_a^2 \approx 2 \times \frac{(255 - 0)^2}{12} \approx 10836 \tag{4}$$
which yields a standard deviation $\sigma_S \approx 104$. This
means that more than half the time, S will be greater
than 43 or less than −43. Assuming a Gaussian
clustering, a single iteration does not tell us much.
However, this is not the case if we perform the procedure
many times.
Let us repeat this procedure n times, letting $a_i$ and $b_i$
be the values a and b take on during the ith iteration,
$S_i$. Now let $S_n$ be defined as:
$$S_n = \sum_{i=1}^{n} S_i = \sum_{i=1}^{n} (a_i - b_i) \tag{5}$$
The expected value of $S_n$ is:
$$\bar{S}_n = n \times \bar{S} = n \times 0 = 0 \tag{6}$$
This makes intuitive sense, since the number of times
$a_i$ is greater than $b_i$ should be offset by the number of
times the reverse is true. Now the variance is:
$$\sigma_{S_n}^2 = n \times \sigma_S^2 \tag{7}$$
And the standard deviation is:
$$\sigma_{S_n} = \sqrt{n} \times \sigma_S \approx \sqrt{n} \times 104 \tag{8}$$
Now, we can compute $S_{10000}$ for a picture, and if it
varies by more than a few standard deviations, we can be
fairly certain that this did not happen by chance. In
fact, since as we will show later $S'_n$ for large n has a
Gaussian distribution, a deviation of even a few $\sigma_{S'}$’s
indicates to a high degree of certainty the presence of
encoding (see Table 1).
The Patchwork method artificially modifies S for a
given picture, such that $S'_n$ is many deviations away
from expected. To encode a picture, we:
1. Use a specific key for a known pseudorandom
number generator to choose $(a_i, b_i)$. This is important,
because the encoder needs to visit the same
points during decoding.
2. Raise the brightness in the patch $a_i$ by an amount δ,
typically in the range of 1 to 5 parts in 256.
3. Lower the brightness in $b_i$ by this same amount δ
(the amounts do not have to be the same, as long as
they are in opposite directions).
4. Repeat this for n steps (n typically ~10 000).
Now, when decoded, $S'_n$ will be:
$$S'_n = \sum_{i=1}^{n} \left[ (a_i + \delta) - (b_i - \delta) \right] \tag{9}$$
or:
$$S'_n = 2\delta n + \sum_{i=1}^{n} (a_i - b_i) \tag{10}$$
So each step of the way we accumulate an expectation
of 2 × δ. Thus after n repetitions, we expect $S'_n$ to be:
Table 1 Degree of certainty of encoding given deviation
from that expected in a Gaussian distribution (δ = 2)

Standard Deviations Away    Certainty    n
0                           50.00%       0
1                           84.13%       679
2                           97.87%       2713
3                           99.87%       6104
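The Patchwork encoding and decoding steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on single pixels rather than patches, picks disjoint index pairs so that each pixel is modified at most once, clamps values to the 0 to 255 range, and reuses the single-pair standard deviation of about 104 quoted in the text. All function names (`choose_pairs`, `encode`, `statistic`, `deviations`, `certainty`) are hypothetical.

```python
import math
import random

SIGMA_S = 104.0  # single-pair standard deviation quoted in the text


def choose_pairs(key, n_pixels, n):
    """Derive n disjoint (a_i, b_i) pixel-index pairs from the key."""
    rng = random.Random(key)
    picks = rng.sample(range(n_pixels), 2 * n)
    return list(zip(picks[:n], picks[n:]))


def encode(pixels, key, n, delta):
    """Raise brightness at each a_i and lower it at each b_i by delta."""
    marked = list(pixels)
    for a, b in choose_pairs(key, len(pixels), n):
        marked[a] = min(255, marked[a] + delta)
        marked[b] = max(0, marked[b] - delta)
    return marked


def statistic(pixels, key, n):
    """Sum of (a_i - b_i) over the pairs selected by the key."""
    return sum(pixels[a] - pixels[b] for a, b in choose_pairs(key, len(pixels), n))


def deviations(s_n, n):
    """Express the statistic in units of sqrt(n) * sigma_S."""
    return s_n / (math.sqrt(n) * SIGMA_S)


def certainty(k):
    """One-sided Gaussian probability of staying below k standard deviations."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))


# Synthetic host: 256 x 256 pixels with uniformly random brightness.
host_rng = random.Random(0)
host = [host_rng.randrange(256) for _ in range(256 * 256)]

key, n, delta = 1996, 10_000, 2
marked = encode(host, key, n, delta)

s_host = statistic(host, key, n)
s_marked = statistic(marked, key, n)
print("unmarked:", round(deviations(s_host, n), 2), "standard deviations")
print("marked:  ", round(deviations(s_marked, n), 2), "standard deviations")
```

Encoding raises the statistic by close to 2δn, with a small shortfall wherever clamping occurs. For k = 1 and k = 3, `certainty` reproduces the 84.13% and 99.87% entries of Table 1; at k = 2 it gives about 97.72%, slightly below the tabulated 97.87%.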