Quasi-Natural Language Programming
Pu Yin
Computer Science College, Wuhan University
Wuhan, China
yinpu@whu.edu.cn
Abstract-This paper introduces a new kind of programming language, Quasi-Natural Language, and an implementation of it, the Kaimeng language processing platform. Although much effort was made in the early days to program in natural language, it could not succeed without an effective knowledge processing mechanism. Kaimeng represents knowledge as natural language does. It is designed to work at a higher level than traditional programming languages. The elements of the language are an unlimited set of words in a form close to natural language. Both the extension and the intension of the vocabulary are expandable. The alphabet or characters of any natural language may be used, e.g., Chinese, English, or even a mixture of languages.
Keywords-Quasi-Natural Language Programming; Natural language understanding; Knowledge representation
Traditional computer programming languages (hereafter, traditional languages) are awkward at expressing abstract concepts or complicated requirements, whereas natural languages describe abstract and concrete concepts, and intricate and simple cases, with equal ease. In natural language you can ask your wife to “take a pound of spinach home”; in a computer language you can only ask a computer to perform an addition. In natural language you can claim that “the two brothers look alike”; in a computer language you can only have the computer judge whether an integer variable A is larger than a variable B. Information that is expressed simply in natural language may become a painful task in a traditional language. The disparity between these two kinds of language comes from knowledge representation. In natural language we set off from existing knowledge, whereas in traditional languages we start from detailed statements with almost no knowledge. Programming in natural language would therefore be a liberation for programmers.
Researchers explored the feasibility of programming in natural language even in the early stages of computer science [1] [2] [3] [4]. Attempts were made to implement natural language programming [1], and some important ideas were proposed, such as “Action, or non-copular verbs (everything except verbs like to be, and to seem) map to functions, while noun phrases map to classes. Adjectival modifiers map to properties of a class, and adverbial modifiers map to auxiliary arguments to functions”. Other important topics, such as object inheritance and object inference, were also discussed.
Although completely free-style natural language programming is still not practical, programming in natural words under certain lexical and syntactic rules is possible. This paper proposes a language architecture called quasi-natural language and implements it on a language processing platform named Kaimeng.
The most fundamental differences between natural languages and traditional languages lie in two aspects. The first is knowledge representation: natural languages are tools for representing knowledge, whereas traditional languages are merely tools for describing procedures and contain almost no knowledge. The second is openness: natural languages are open, expandable both in intension and in extension, whereas traditional languages are closed, with a limited number of predefined statements or commands whose format and meaning are immutable once the language processor is built.
These two primary differences give rise to several new features in Quasi-Natural Language:
Knowledge is always expressed, implicitly or explicitly, in natural language. When we say “send my regards to John by e-mail”, some implied knowledge is employed: what regards and e-mail are, who John is, what his e-mail address is, and how to send a mail. Words in natural language therefore play a role not only literally but also through the meaning they carry.
Natural languages evolve ceaselessly. New words join the dictionary and old words take on new meanings over time. The words “WWW” and “CPU” entered the vocabulary along with “computer”, and “surf” acquired a new connotation. These expansions do not require the architecture of the language to be rebuilt. Natural languages are therefore open languages.
Computer languages are closed languages composed of dozens of commands that are predefined in compilers or interpreters. Once a compiler or interpreter is built, the number and meaning of the commands are fixed. If a new command, or a new function for an existing command, is needed, the language processor has to be redesigned. Traditional computer languages are therefore closed.
Unlike traditional languages, in which the language and the language processor are integrated, an open language must be independent of its language processor. While the language processor stays unchanged, the language can evolve independently.
When new knowledge is added to the language, previous knowledge should remain untouched and effective unless it is deliberately changed. New knowledge may also be built upon existing knowledge.
The best way to make use of existing knowledge is inheritance, as in OOP. It is also adopted in quasi-natural language.
Although it is not strictly necessary, quasi-natural language takes a form similar to natural language, to ease the burden of learning a new language and to stay close to the way humans think.
Knowledge representation in quasi-natural language mainly describes the data structure of objects and the operations suitable for those objects. All knowledge appears in the form of words, as in natural language. A word consists of a word name and multiple properties. Every word has at least the property “PartofSpeech”, which defines its syntactic role. Other properties are defined as needed.
The formal expression of a word:
<word> → <word name> <property> {<property>}
Data and its structure are represented by nouns. TABLE I shows the structure of the noun “wav”, which represents sound in .wav format.
TABLE I. Definition of the noun “wav”
PropertyName | PropertyValue |
PartofSpeech | noun |
DataType | complex |
wavHeader | |
wavFormat | |
wavData | |
DeviceHandle | |
FileSpecification | |
LeftChannelSignal | |
RightChannelSignal | |
The properties PartofSpeech and DataType are syntax properties, which define the syntactic role of the word and the type of its data. They are predefined in the language interpreter. The other properties in TABLE I are entity properties, which need to be defined further; for example, the property wavHeader is defined in TABLE II.
TABLE II. Definition of the noun “wavHeader”
PropertyName | PropertyValue |
PartofSpeech | noun |
DataType | complex |
ChunckID | RIFF |
FileSize | |
wavID | WAVE |
The properties in TABLE I and TABLE II are in turn defined by further nouns, until every property name corresponds to a noun of the same name. TABLE III shows the definition of the noun “FileSize” from TABLE II. Its type is the elementary type “int”, which is predefined in Kaimeng. At this point all properties in TABLE III are predefined properties, and no further definition is needed.
TABLE III. Definition of the noun “FileSize”
PropertyName | PropertyValue |
PartofSpeech | noun |
DataType | int |
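The chain of definitions described above can be sketched in C++. This is only an illustration, not Kaimeng's internal representation: the `Word` and `Dictionary` types, the function `isFullyDefined` and the helper `headerExample` are hypothetical names, and the sketch assumes that definitions never refer back to themselves.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory form of a word: a set of named properties.
struct Word {
    std::map<std::string, std::string> properties;  // property name -> value
};
using Dictionary = std::map<std::string, Word>;     // word name -> definition

// True if `name` is a noun whose entity properties all resolve, recursively,
// to nouns of the same name, ending at non-complex (e.g. elementary) types.
// Assumes the definitions contain no cycles.
bool isFullyDefined(const Dictionary& dict, const std::string& name) {
    auto word = dict.find(name);
    if (word == dict.end()) return false;
    const auto& props = word->second.properties;
    auto pos = props.find("PartofSpeech");
    if (pos == props.end() || pos->second != "noun") return false;
    auto type = props.find("DataType");
    if (type == props.end()) return false;
    if (type->second != "complex") return true;  // nothing left to expand
    for (const auto& entry : props) {
        // Syntax properties are predefined; only entity properties are expanded.
        if (entry.first == "PartofSpeech" || entry.first == "DataType") continue;
        if (!isFullyDefined(dict, entry.first)) return false;
    }
    return true;
}

// The two definitions the paper spells out: wavHeader (TABLE II) and FileSize (TABLE III).
Dictionary headerExample() {
    Dictionary dict;
    dict["wavHeader"].properties = {
        {"PartofSpeech", "noun"}, {"DataType", "complex"},
        {"ChunckID", "RIFF"}, {"FileSize", ""}, {"wavID", "WAVE"}};
    dict["FileSize"].properties = {{"PartofSpeech", "noun"}, {"DataType", "int"}};
    return dict;
}
```

With only these two definitions, `isFullyDefined` accepts “FileSize” but rejects “wavHeader”, because the nouns “ChunckID” and “wavID” have not been defined yet.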
Besides elementary types and the complex type, a “customer” (custom, user-defined) type is accepted in Kaimeng. TABLE IV shows the noun DeviceHandle, whose type is “customer”. Kaimeng does not handle objects of “customer” type directly; their suitable verbs do the job.
TABLE IV. Definition of the noun “DeviceHandle”
PropertyName | PropertyValue |
PartofSpeech | noun |
DataType | customer |
CreateVerb | Create |
DeleteVerb | Delete |
From the definitions above, a tree-type hierarchy is created at run time for the noun “wav”, as in Figure 1.
Every node in the tree is called an object. A path is used to reference an object; for example, “wav descend wavData descend dataID” addresses the object “dataID”. The path order follows the Chinese convention (Kaimeng was originally designed for programming in Chinese), with the high-level object on the left and the low-level object on the right.
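A path of this kind can be resolved by walking the object tree from left to right. The following C++ sketch is an assumption about how such a lookup might work, not Kaimeng's actual code; the `Object` type and the function `resolve` are hypothetical.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical run-time object: a node whose children are its properties.
struct Object {
    std::map<std::string, std::unique_ptr<Object>> children;
};

// Resolves a path such as "wav descend wavData descend dataID".
// The first token must name the root; each "descend" steps one level down.
// Returns nullptr if the path is malformed or an object is missing.
const Object* resolve(const std::string& rootName, const Object& root,
                      const std::string& path) {
    std::istringstream in(path);
    std::string token;
    if (!(in >> token) || token != rootName) return nullptr;
    const Object* current = &root;
    while (in >> token) {
        if (token != "descend") return nullptr;
        if (!(in >> token)) return nullptr;  // "descend" needs a name after it
        auto child = current->children.find(token);
        if (child == current->children.end()) return nullptr;
        current = child->second.get();
    }
    return current;
}
```

Reading the path left to right matches the convention described above: the high-level object comes first and each “descend” moves to a lower-level object.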
Suitable operations appear in the form of verbs or operators. Verbs in quasi-natural language are implemented by dynamically linked functions or by verb scripts written in quasi-natural language. TABLE V shows the definition of the verb “play”, which is implemented by the DLL function _PlayWav(). A verb may have several “Reference” properties, which define the parameters passed to the verb function or script and are also used by the language interpreter as a validity check for semantic matching. Duplicate noun names are forbidden in quasi-natural language, but verbs with the same name are permitted. For a specific sentence, the interpreter chooses a suitable verb among the same-named candidates according to the object noun and the “Reference” properties of each verb. Other information needed to load or execute the function, such as “VerbPath” and “VerbFile”, is also required.
TABLE V. Definition of the verb “play”
PropertyName | PropertyValue |
PartofSpeech | verb |
VerbPath | c:\Kaimeng\dsp |
VerbFile | wavproc.dll |
VerbFunction | _PlayWav |
Reference | wav |
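The choice among verbs of the same name can be illustrated with a small C++ sketch. It assumes the simplest possible rule, an exact, in-order match between a verb's “Reference” values and the nouns of the operands in the sentence; the real interpreter may well use a richer rule (for example, taking inheritance into account). The `Verb` type and the function `selectVerb` are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one verb definition.
struct Verb {
    std::string name;                     // the word name, e.g. "play"
    std::string function;                 // value of the VerbFunction property
    std::vector<std::string> references;  // values of the Reference properties
};

// Returns the first verb called `name` whose Reference list equals the nouns
// of the operands, or nullptr if no candidate matches.
const Verb* selectVerb(const std::vector<Verb>& verbs, const std::string& name,
                       const std::vector<std::string>& operandNouns) {
    for (const Verb& verb : verbs) {
        if (verb.name == name && verb.references == operandNouns) return &verb;
    }
    return nullptr;
}
```

For a sentence such as “play original”, where “original” is a wav object, the operand list would be `{"wav"}`, which selects the definition in TABLE V.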
Verbs that produce a result have the property “CreatedObject”. In TABLE VI, the verb “ShowSignal” produces an object “SignalWindow”.
TABLE VI. Definition of the verb “ShowSignal”
PropertyName | PropertyValue |
PartofSpeech | verb |
VerbPath | c:\Kaimeng\dsp |
VerbFile | wavproc.dll |
VerbFunction | _ShowSignal |
Reference | Signal |
CreatedObject | SignalWindow |
A methodology similar to object-oriented programming is adopted for knowledge inheritance. Any noun may be inherited by another noun; the former is called the parent noun and the latter the offspring noun. By defining a “WordClass” property whose value is the name of the parent noun, the offspring inherits all properties and suitable verbs of the parent. When an object is created from an offspring noun, it is first created from the parent noun, and then the properties of the offspring noun are applied: a property that is not found in the parent object is added as a new property, while a property with the same name as one in the parent overlays the inherited one. TABLE VII shows the noun “PureTone”. Through its “WordClass” property, “PureTone” inherits all properties and suitable verbs from the parent noun “wav”, and its two other properties, “FileSpecification” and “wavFormat”, overlay the properties of the same name in “wav”.
TABLE VII. Definition of the noun “PureTone”
PropertyName | PropertyValue |
PartofSpeech | noun |
WordClass | wav |
FileSpecification | PureToneSpecification |
wavFormat | StandardFormat |
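The creation order described above (parent first, then add or overlay the offspring's properties) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not Kaimeng's implementation: the names `Properties`, `kParentProperty` and `instantiate` are hypothetical, the inheritance property is taken to be “WordClass” as in the text, and inheritance chains are assumed to be free of cycles.

```cpp
#include <map>
#include <string>

using Properties = std::map<std::string, std::string>;  // property name -> value

// Name of the property that points to the parent noun, as the text calls it.
const char* const kParentProperty = "WordClass";

// Builds the property set of an object created from `noun`: the parent's
// properties come first, then the noun's own properties are added or overlaid.
// Returns an empty set for an unknown noun. Assumes no inheritance cycles.
Properties instantiate(const std::map<std::string, Properties>& dict,
                       const std::string& noun) {
    Properties result;
    auto definition = dict.find(noun);
    if (definition == dict.end()) return result;
    const Properties& own = definition->second;
    auto parent = own.find(kParentProperty);
    if (parent != own.end()) result = instantiate(dict, parent->second);
    for (const auto& property : own) {
        result[property.first] = property.second;  // new property or overlay
    }
    return result;
}
```

Applied to “PureTone”, the result keeps inherited properties such as “LeftChannelSignal” from “wav” while “wavFormat” takes the offspring's value “StandardFormat”.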
Two examples are given to show the features of Kaimeng. The bold text is executable script. In Kaimeng the semicolon serves as the sentence separator, to avoid ambiguity with the decimal point in numbers.
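This example computes a factorial. As written, the loop runs while i<10, so p ends as 9! = 362880; changing the condition to i<11 gives 10!.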
Script: ThereIs an integer, naming p; p be 1; ThereIs an integer, naming i; i be 1; if i<10, repeat {link1}.
Link1: p be p*i; i be i+1.
Each sentence means the following:
ThereIs an integer, naming p: create an integer object and name it p.
p be 1: assign 1 to p.
ThereIs an integer, naming i: create an integer object and name it i.
i be 1: assign 1 to i.
if i<10, repeat {link1}: execute {Link1} repeatedly while the expression “i<10” is true. Kaimeng has only one flow-control sentence, which performs both selection and repetition.
<link> is a piece of executable text written in Kaimeng and embedded in a script, like a compound statement or block in C. The text in {} is the name of the <link>. In this case it is “Link1”, which denotes the following sentences.
p be p*i: assign p*i to p.
i be i+1: assign i+1 to i.
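For comparison, the script can be transcribed sentence by sentence into C++. The function name `runFactorialScript` and its `limit` parameter are introduced only for this illustration; `limit` stands for the constant in the condition “i<10”.

```cpp
// Mirrors the script: p be 1; i be 1; if i<limit, repeat { p be p*i; i be i+1 }.
long long runFactorialScript(int limit) {
    long long p = 1;     // ThereIs an integer, naming p; p be 1
    int i = 1;           // ThereIs an integer, naming i; i be 1
    while (i < limit) {  // if i<limit, repeat {Link1}
        p = p * i;       // p be p*i
        i = i + 1;       // i be i+1
    }
    return p;
}
```

With the condition as given in the script, `runFactorialScript(10)` returns 362880, which is 9!; `runFactorialScript(11)` returns 3628800, which is 10!.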
Evidently, quasi-natural language shows no advantage over traditional languages in low-level programming.
The following script implements voice recording, filtering and replay.
Script: ThereIs a wav, naming original; original descend wavFormat be StandardFormat; record original; FFT original descend LeftChannelSignal, produce spectra; bandfilter spectra 3000 and 20; reFFT spectra, produce FilteredSignal; LeftChannelSignal be FilteredSignal; play wav;
Each sentence means the following:
ThereIs a wav, naming original: create a wav object named “original”. The noun wav is defined in TABLE I.
original descend wavFormat be StandardFormat: assign the noun “StandardFormat” to the object “wavFormat”, which is a property under the object “original”. That is, set the value of the “wavFormat” property under “original” to the noun “StandardFormat”.
record original: apply the verb “record” to the object “original”. The verb “record” performs the recording and writes the voice data into the LeftChannelSignal and RightChannelSignal properties under “original”.
FFT original descend LeftChannelSignal, produce spectra: apply the verb “FFT” to the object “LeftChannelSignal” under the object “original”, and name the new object produced by “FFT” “spectra”.
bandfilter spectra 3000 and 20: apply the verb “bandfilter” to the object “spectra” and the constants 3000 and 20. It performs band filtering on the object “spectra”.
reFFT spectra, produce FilteredSignal: apply the verb “reFFT” to the object “spectra”, and name the new object produced by “reFFT” “FilteredSignal”.
LeftChannelSignal be FilteredSignal: assign the object “FilteredSignal” to the object “LeftChannelSignal”.
play wav: apply the verb “play” to the object “wav”. The number of channels is assumed to be 1 here, so the “RightChannelSignal” property is not processed.
This example indicates that Kaimeng can describe a task simply, building on existing knowledge as natural languages do. Such a task may require thousands of statements in a traditional language.
One sentence in this script may correspond to a large chunk of code in a traditional language. For example, the verb FFT has a kernel written in C together with the necessary shell (the code was originally designed for Chinese, and some of it remains in Chinese).
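#include <cmath>  // cos, sin
// TWO_PI (2*pi) is assumed to be defined elsewhere in the project.
// Recursive radix-2 FFT over N complex samples (N a power of two, N >= 2).
// signal, output and media each hold {real, imaginary} arrays;
// output and media swap roles at each level of recursion.
bool FFT_Recursive(int N,int offset,int interval,long double *signal[2],
                   long double *output[2],long double *media[2])
{
    int half=N/2;
    long double cs,sn;
    int k00,k01,k10,k11;
    long double tmp0,tmp1;
    if (N>2)
    {
        // Transform the even and odd samples; the results land in media.
        FFT_Recursive(half,offset,2*interval,signal,media,output);
        FFT_Recursive(half,offset+interval,2*interval,signal,media,output);
        for (int k=0;k<half;k++)
        {
            k00=offset+k*interval;       // output index of bin k
            k01=k00+half*interval;       // output index of bin k+N/2
            k10=offset+2*k*interval;     // bin k of the even half
            k11=k10+interval;            // bin k of the odd half
            cs=cos(TWO_PI*k/(long double)N);
            sn=sin(TWO_PI*k/(long double)N);
            // Multiply the odd-half bin by the twiddle factor exp(-i*2*pi*k/N).
            tmp0=cs*media[0][k11]+sn*media[1][k11];
            tmp1=cs*media[1][k11]-sn*media[0][k11];
            output[0][k00]=media[0][k10]+tmp0;
            output[1][k00]=media[1][k10]+tmp1;
            output[0][k01]=media[0][k10]-tmp0;
            output[1][k01]=media[1][k10]-tmp1;
        }
    }
    else
    {
        // Base case: two-point butterfly taken directly from the signal.
        k00=offset;
        k01=k00+interval;
        output[0][k00]=signal[0][k00]+signal[0][k01];
        output[1][k00]=signal[1][k00]+signal[1][k01];
        output[0][k01]=signal[0][k00]-signal[0][k01];
        output[1][k01]=signal[1][k00]-signal[1][k01];
    }
    return true;  // the original had no return statement in a bool function
}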
Shell of FFT in C:
// The word names passed to cmd() are entries in the Chinese Kaimeng dictionary:
// 数值类型 = numeric type, 信号长度 = signal length, 信号周期 = signal period,
// 振幅 = amplitude, 相位 = phase, 信号指针 = signal pointer,
// 复变信号 = complex signal, 实部 = real part, 虚部 = imaginary part.
extern "C" __declspec(dllexport) bool FFTShell(Server cmd,CSemanticStack* rf,void** rslt)
{
    bool result=false;
    *rslt=NULL;
    if (rf->GetCount()==1)
    {
        SIngredient ingredient;
        rf->Goto(0);
        rf->GetElement(ingredient);
        // get the signal pointer
        void* start=NULL;
        start=ingredient.Pointer;
        if (start)
        {
            Reference rfr;
            rfr.Start=start;
            strcpy(rfr.Route,"");
            DataType dt;
            Signal signal;
            // read the properties of the input signal object
            signal.Type=*(DataType*)cmd(GET_OBJECT,"数值类型",&rfr,NULL,dt);
            signal.Length=*(int*)cmd(GET_OBJECT,"信号长度",&rfr,NULL,dt);
            signal.Period=*(int*)cmd(GET_OBJECT,"信号周期",&rfr,NULL,dt);
            signal.Amplitude=*(int*)cmd(GET_OBJECT,"振幅",&rfr,NULL,dt);
            signal.Phase=*(int*)cmd(GET_OBJECT,"相位",&rfr,NULL,dt);
            signal.Start=cmd(GET_OBJECT,"信号指针",&rfr,NULL,dt);
            Spectrum spectra;
            result=FFT(signal,&spectra);
            if (result)
            {
                // create the complex-signal result object and fill its real part
                *rslt=cmd(CREATE_OBJECT,"复变信号",NULL,NULL,dt);
                rfr.Start=*rslt;
                strcpy(rfr.Route,"实部");
                cmd(GET_OBJECT,"信号指针",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"信号指针",&rfr,spectra.Re.Start,dt);
                cmd(GET_OBJECT,"信号长度",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"信号长度",&rfr,&spectra.Re.Length,dt);
                int t=spectra.Re.Type;
                cmd(GET_OBJECT,"数值类型",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"数值类型",&rfr,&t,dt);
                // fill the imaginary part
                strcpy(rfr.Route,"虚部");
                cmd(GET_OBJECT,"信号指针",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"信号指针",&rfr,spectra.Im.Start,dt);
                cmd(GET_OBJECT,"信号长度",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"信号长度",&rfr,&spectra.Im.Length,dt);
                cmd(GET_OBJECT,"数值类型",&rfr,NULL,dt);
                cmd(MODIFY_OBJECT,"数值类型",&rfr,&t,dt);
            }
        }
    }
    return result;
}
Other verbs are implemented likewise. Once knowledge about data structures or suitable operations has been acquired, it can be abstracted into simple nouns or verbs and used in simple sentences to replace long stretches of code in traditional languages.
Although free natural language programming is still hard, programming with an unlimited number of words in natural form under restricted lexical and syntactic rules is practical, and may raise computer programming to a higher level.
A new concept and a language system have been put forward. All the ideas and methods are implemented in Kaimeng, which was developed by the author as a prototype for quasi-natural language. It is expected to be a step toward free natural language programming.
[1] Hugo Liu and Henry Lieberman, “Toward a Programmatic Semantics of Natural Language,” in 2004 IEEE Symposium on Visual Languages - Human Centric Computing (VLHCC'04), 2004, pp. 281-282.
[2] Henry Lieberman and Hugo Liu, “Feasibility Studies for Programming in Natural Language,” in Human-Computer Interaction Series, vol. 9, Springer Netherlands, 2006, pp. 459-473.
[3] Hugo Liu and Henry Lieberman, “Programmatic Semantics for Natural Language Interfaces,” in CHI '05 Extended Abstracts on Human Factors in Computing Systems, Portland, OR, USA, 2005, pp. 1597-1600.
[4] Manolis Maragoudakis, Nikolaos Cosmas and Aristogiannis Garbis, “Mining Natural Language Programming Directives with Class-Oriented Bayesian Networks,” in Lecture Notes in Computer Science, vol. 5139, 2008, pp. 15-26.
[5] Pu Yin, “Implementation of Quasi-Natural Language,” unpublished.