1 Material and methods
(ⅰ) Sample collection and measurement of NIR
spectra
ARTICLES
180 Chinese Science Bulletin Vol. 50 No. 2 January 2005
The materials used in this study are Baizhi (angelicae
dahuricae radix or ADR) and Danshen (salviae miltiorrhizae
radix or SMR). The origins of the Baizhi samples
are Henan, Hebei, Sichuan and Zhejiang provinces of
China, and all are cultivated plants. The origins of the
Danshen samples are Shandong, Shanxi, Henan, Sichuan,
Zhejiang and Hebei provinces of China, including wild
and cultivated plants. All samples were collected and
identified by experts from the China Institute of Traditional
Chinese Medicine. Some of the samples are the same as
those used in our preceding study using MIR[6,7];
we call this sample set Set-A. The remaining samples
became available only after most of the experiments had
been finished. We use these new samples as independent
samples and call them Set-B. The origins,
growth conditions and numbers of all samples are
summarized in Tables 1 and 2 for Baizhi and Danshen,
respectively. When studying the geographic origin of
Danshen, we pooled samples from the same origin
regardless of growth condition; likewise, when studying
growth conditions, we pooled samples with the same
growth condition regardless of origin.
The NIR reflectance spectra of these Baizhi and
Danshen samples were measured with a Spectrum One NTS
instrument (Perkin Elmer Ltd). Sample powders of 200 mesh
were used in the measurement. The spectra cover the
range from 4000 to 10000 cm−1, sampled at intervals of
2 cm−1. All measurements were made at the
Analytical Center of Tsinghua University. To reduce
noise, each sample was measured 2 to 5 times, and the
final spectrum of the sample was the average of the replicate
measurements. According to observations from
previous studies, we calculated the first-order derivative
of each original NIR spectrum and used it as the data
for the discrimination. This preprocessing makes the
variations in the original spectra more distinct and
also removes the effect of baseline offsets between different
spectra. Fig. 1 shows examples of NIR derivative
spectra of Baizhi from different origins. The
differences between the curves are very subtle and
hard to distinguish with the naked eye.
Fig. 1. The NIR derivative spectra of Baizhi from different origins.
(a) Henan; (b) Hebei; (c) Sichuan; (d) Zhejiang.
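The preprocessing described above (averaging the replicate scans, then taking a first-order derivative) can be illustrated with a minimal sketch in plain Python. The text does not state how the derivative was computed, so the central finite difference used here is an assumption, and the function names are hypothetical. If both end points of the 4000 to 10000 cm−1 range are included, a 2 cm−1 spacing gives 3001 points per spectrum.

```python
from statistics import fmean


def wavenumber_axis(start=4000.0, stop=10000.0, step=2.0):
    """Wavenumber grid in cm^-1, assuming both end points are included."""
    n_points = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n_points)]


def average_replicates(replicates):
    """Point-wise mean of the repeated scans of one sample."""
    if not replicates:
        raise ValueError("at least one scan is required")
    length = len(replicates[0])
    if any(len(scan) != length for scan in replicates):
        raise ValueError("all scans must have the same number of points")
    return [fmean(values) for values in zip(*replicates)]


def first_derivative(spectrum, step=2.0):
    """First-order derivative by finite differences (an assumed scheme).

    Central differences are used in the interior and one-sided
    differences at the two ends.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    deriv = [0.0] * n
    deriv[0] = (spectrum[1] - spectrum[0]) / step
    deriv[-1] = (spectrum[-1] - spectrum[-2]) / step
    for i in range(1, n - 1):
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
    return deriv


if __name__ == "__main__":
    # Synthetic values for illustration only, not measured data.
    scan_1 = [0.10, 0.14, 0.20, 0.22]
    scan_2 = [0.12, 0.16, 0.22, 0.24]
    mean_spectrum = average_replicates([scan_1, scan_2])
    print(len(wavenumber_axis()))
    print(first_derivative(mean_spectrum))
```

Because a constant added to every point of a spectrum cancels in each difference, two spectra that differ only by such an offset have the same derivative, which is the baseline effect the preprocessing is meant to remove.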
(ⅱ) Preliminary classification experiments and feature
selection
First, we used Set-A to study the classification of
geographic origins. We applied the nearest neighbor
method, with Pearson’s correlation coefficient as the similarity
measure, and the multi-class SVM method to the
whole range of the spectra to classify the samples’ origins
and growth conditions. Leave-one-out (LOO) cross-validation
accuracies of 99% and 95% were obtained[7]. The
detailed methods and results are described in refs. [6, 7].
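As an illustration of the nearest neighbor step, the following is a minimal sketch of classification by Pearson's correlation coefficient with leave-one-out evaluation. It is not the implementation of refs. [6, 7]: the function names are hypothetical, the data are synthetic, and the multi-class SVM is not sketched here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def nearest_neighbor_label(query, train_spectra, train_labels):
    """Label of the training spectrum most correlated with the query."""
    best = max(range(len(train_spectra)),
               key=lambda i: pearson(query, train_spectra[i]))
    return train_labels[best]


def loo_accuracy(spectra, labels):
    """Leave-one-out accuracy: each sample is classified by all the others."""
    correct = 0
    for i in range(len(spectra)):
        train_x = spectra[:i] + spectra[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(spectra[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(spectra)


if __name__ == "__main__":
    # Synthetic derivative "spectra" for illustration only.
    spectra = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],
        [4.0, 3.0, 2.0, 1.0],
        [8.0, 6.0, 4.0, 2.5],
    ]
    labels = ["origin A", "origin A", "origin B", "origin B"]
    print(loo_accuracy(spectra, labels))
```

Correlation compares the shapes of two curves and ignores their overall scale and offset, which is one reason it is a natural similarity measure for spectra.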
From these preliminary experiments, we observed that not