The grammar and spelling rules of a programming language constitute its syntax.

Lexical analysis highlights one of the subtle ways in which programming languages differ from natural languages, such as English or Chinese. In natural languages, the relationship between a word's representation—its spelling or its pictogram—and its meaning is not obvious. In English, are is a verb while art is a noun, even though they differ only in the final character. Furthermore, not all combinations of characters are legitimate words. For example, arz differs minimally from are and art, but does not occur as a word in normal English usage.

A scanner for English could use FA-based techniques to recognize potential words, since all English words are drawn from a restricted alphabet. After that, however, it must look up the prospective word in a dictionary to determine if it is, in fact, a word. If the word has a unique part of speech, dictionary lookup will also resolve that issue. However, many English words can be classified with several parts of speech. Examples include buoy and stress; both can be either a noun or a verb. For these words, the part of speech depends on the surrounding context. In some cases, understanding the grammatical context suffices to classify the word. In other cases, it requires an understanding of meaning, for both the word and its context.

In contrast, the words in a programming language are almost always specified lexically. Thus, any string in [1…9][0…9]* is a positive integer. The RE [a…z]([a…z] | [0…9])* defines a subset of the Algol identifiers; arz, are, and art are all identifiers, with no lookup needed to establish the fact. To be sure, some identifiers may be reserved as keywords. However, these exceptions can be specified lexically, as well. No context is required.
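A quick way to see this lexical property in action is to test spellings against the two patterns with an ordinary regex engine. The sketch below is illustrative only; the character-class syntax is transcribed from the ranges given above.

```python
import re

# Patterns transcribed from the text: a positive integer is
# [1-9][0-9]*, and a subset-of-Algol identifier is [a-z]([a-z]|[0-9])*.
INTEGER = re.compile(r"[1-9][0-9]*")
IDENTIFIER = re.compile(r"[a-z](?:[a-z]|[0-9])*")

def classify(word):
    """Classify a word purely by its spelling -- no dictionary lookup."""
    if INTEGER.fullmatch(word):
        return "integer"
    if IDENTIFIER.fullmatch(word):
        return "identifier"
    return "invalid"

# arz, are, and art are all identifiers by spelling alone.
for w in ["arz", "are", "art", "17", "x9", "9x"]:
    print(w, classify(w))
```

Note that arz classifies as an identifier just as readily as are and art: no dictionary is consulted, only the spelling.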

This property results from a deliberate decision in programming language design. The choice to make spelling imply a unique part of speech simplifies scanning, simplifies parsing, and, apparently, gives up little in the expressiveness of the language. Some languages have allowed words with dual parts of speech—for example, PL/I has no reserved keywords. The fact that more recent languages abandoned the idea suggests that the complications outweighed the extra linguistic flexibility.

Regular expressions are closed under many operations—that is, if we apply the operation to an RE or a collection of REs, the result is an RE. Obvious examples are concatenation, union, and closure. The concatenation of two REs x and y is just xy. Their union is x | y. The Kleene closure of x is just x*. From the definition of an RE, all of these expressions are also REs.

These closure properties play a critical role in the use of REs to build scanners. Assume that we have an RE for each syntactic category in the source language, a0, a1, a2, …, an. Then, to construct an RE for all the valid words in the language, we can join them with alternation as a0 | a1 | a2 | … | an. Since REs are closed under union, the result is an RE. Anything that we can do to an RE for a single syntactic category will be equally applicable to the RE for all the valid words in the language.
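The union construction can be sketched directly with a regex library: one pattern per syntactic category, joined by alternation into a single pattern for all valid words. The category names and patterns below are invented for illustration; listing keyword before identifier also shows how reserved words can be handled lexically, by giving their alternative priority.

```python
import re

# One RE per syntactic category; the names and patterns are invented.
categories = [
    ("keyword",    r"if|then|else"),
    ("identifier", r"[a-z](?:[a-z]|[0-9])*"),
    ("integer",    r"[1-9][0-9]*"),
]

# Join the category REs with alternation; named groups record which
# alternative matched. Keyword comes first, so reserved words win.
all_words = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in categories))

def scan_word(word):
    m = all_words.fullmatch(word)
    return m.lastgroup if m else None

print(scan_word("else"))   # keyword
print(scan_word("count"))  # identifier
print(scan_word("42"))     # integer
```

Because REs are closed under union, the combined pattern is still an RE, and the same machinery that recognizes one category recognizes them all.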

Closure under union implies that any finite language is a regular language. We can construct an RE for any finite collection of words by listing them in a large alternation. Because the set of REs is closed under union, that alternation is an RE and the corresponding language is regular.

Closure under concatenation allows us to build complex REs from simpler ones by concatenating them. This property seems both obvious and unimportant. However, it lets us piece together REs in systematic ways. Closure ensures that ab is an RE as long as both a and b are REs. Thus, any techniques that can be applied to either a or b can be applied to ab; this includes constructions that automatically generate a recognizer from REs.

Regular expressions are also closed under both Kleene closure and the finite closures. This property lets us specify particular kinds of large, or even infinite, sets with finite patterns. Kleene closure lets us specify infinite sets with concise finite patterns; examples include the integers and unbounded-length identifiers. Finite closures let us specify large but finite sets with equal ease.

The next section shows a sequence of constructions that build an FA to recognize the language specified by an RE. Section 2.6 shows an algorithm that goes the other way, from an FA to an RE. Together, these constructions establish the equivalence of REs and FAs. The fact that REs are closed under alternation, concatenation, and closure is critical to these constructions.

The equivalence between REs and FAs also suggests other closure properties. For example, given a complete FA, we can construct an FA that recognizes all words w that are not in L(FA), called the complement of L(FA). To build this new FA for the complement, we can swap the designation of accepting and nonaccepting states in the original FA. This result suggests that REs are closed under complement. Indeed, many systems that use REs include a complement operator, such as the ^ operator in lex.

Complete FA

an FA that explicitly includes all error transitions
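The complement construction can be sketched concretely. The DFA below, an invented example for the language a* over the alphabet {a, b}, is complete: it has an explicit error state, so every (state, symbol) pair has a transition, and swapping accepting and nonaccepting states yields a recognizer for the complement.

```python
# Simulate a complete DFA given as (start state, accepting set, delta).
def run(dfa, word):
    start, accepting, delta = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

# Invented example: a complete DFA for a* over the alphabet {a, b};
# "err" is the explicit error state.
delta = {
    ("s0", "a"): "s0", ("s0", "b"): "err",
    ("err", "a"): "err", ("err", "b"): "err",
}
a_star = ("s0", {"s0"}, delta)

# Complement: identical transitions, accepting set swapped.
not_a_star = ("s0", {"s0", "err"} - {"s0"}, delta)

print(run(a_star, "aaa"), run(not_a_star, "aaa"))  # True False
print(run(a_star, "ab"), run(not_a_star, "ab"))    # False True
```

The swap only works because the FA is complete; if some transitions were missing, words that "fall off" the machine would be rejected by both the original and the swapped automaton.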

Section Review

Regular expressions are a concise and powerful notation for specifying the microsyntax of programming languages. REs build on three basic operations over finite alphabets: alternation, concatenation, and Kleene closure. Other convenient operators, such as finite closures, positive closure, and complement, derive from the three basic operations. Regular expressions and finite automata are related; any RE can be realized in an FA, and the language accepted by any FA can be described with an RE. The next section formalizes that relationship.

Review Questions

1.

Recall the RE for a six-character identifier, written using a finite closure.

([A…Z] | [a…z]) ([A…Z] | [a…z] | [0…9])^5

Rewrite it in terms of the three basic RE operations: alternation, concatenation, and closure.

2.

In PL/I, the programmer can insert a quotation mark into a string by writing two quotation marks in a row. Thus, the string

The quotation mark, ", should be typeset in italics.

would be written in a PL/I program as

"The quotation mark, "", should be typeset in italics."

Design an RE and an FA to recognize PL/I strings. Assume that strings begin and end with quotation marks and contain only symbols drawn from an alphabet, designated as Σ. Quotation marks are the only special case.

URL: https://www.sciencedirect.com/science/article/pii/B9780120884780000025

Runtime Modification

Erez Metula, in Managed Code Rootkits, 2011

Publisher Summary

Programming languages were created to describe the computations performed by computers. Each language defines its own format (syntax) declaring how code should be structured, and the meaning (semantics) of the language elements. The compiler's responsibility is to generate machine instructions based on the "contract" between the source code and the language definition. This chapter discusses what happens if one changes the language definition. The chapter establishes a technique that is used to customize the runtimes, namely .NET, Java, and Dalvik. It begins by demonstrating the steps of modifying the implementation of .NET's WriteLine method so that each application calling this method will be influenced by the modified logic. Next, the steps for modifying Java are demonstrated, using a similar example with the println method. Finally, the chapter discusses the steps of modifying Dalvik, this time demonstrating how to eliminate the behavior of a specific method. The significance of these kinds of simple PoCs is that the runtimes can be changed using a variety of techniques.

URL: https://www.sciencedirect.com/science/article/pii/B9781597495745000040

Software Architectures and Tools for Computer Aided Process Engineering

B. Braunschweig, R. Gani, in Computer Aided Chemical Engineering, 2002

Programming languages

Programming languages evolve as well, with new languages appearing periodically, able to take advantage of new hardware and operating systems. Object-oriented languages (e.g., C++, Smalltalk, Sun's Java, Microsoft's C#) have almost completely replaced the traditional procedural code that was the standard ten years ago (i.e., Fortran and C). Developers of CAPE applications increasingly use sophisticated development tools such as Microsoft's Visual Studio or Sun's J2EE toolkit. Source code is often generated from higher-level or abstract models and descriptions, and component libraries supplied with Application Programming Interfaces (APIs) are used for several simple and not-so-simple tasks (handling of lists and arrays, database handling, numerical processing, graphical user interface widgets, etc.). We cannot leave this list without adding the many languages used for the Internet, from simple static HTML descriptions to CGI scripts for database access, and to the JavaScript and Active Server Pages dynamic systems, some of the most popular technologies in this field.

URL: https://www.sciencedirect.com/science/article/pii/S1570794602800049

Virtualization

Rajkumar Buyya, ... S. Thamarai Selvi, in Mastering Cloud Computing, 2013

3.3.1.3 Programming language-level virtualization

Programming language-level virtualization is mostly used to achieve ease of deployment of applications, managed execution, and portability across different platforms and operating systems. It consists of a virtual machine executing the byte code of a program, which is the result of the compilation process. Compilers that implement this technology produce a binary format representing the machine code for an abstract architecture. The characteristics of this architecture vary from implementation to implementation. Generally these virtual machines constitute a simplification of the underlying hardware instruction set and provide some high-level instructions that map some of the features of the languages compiled for them. At runtime, the byte code can be either interpreted or compiled on the fly—or jitted—against the underlying hardware instruction set.

Programming language-level virtualization has a long trail in computer science history; it was originally used in 1966 for the implementation of Basic Combined Programming Language (BCPL), a language for writing compilers and one of the ancestors of the C programming language. Other important examples of the use of this technology have been UCSD Pascal and Smalltalk. Virtual machine programming languages became popular again with Sun's introduction of the Java platform in 1996. Originally created as a platform for developing Internet applications, Java became one of the technologies of choice for enterprise applications, and a large community of developers formed around it. The Java virtual machine was originally designed for the execution of programs written in the Java language, but support for other languages such as Python, Pascal, Groovy, and Ruby was added later. The ability to support multiple programming languages has been one of the key elements of the Common Language Infrastructure (CLI), which is the specification behind the .NET Framework. Currently, the Java platform and the .NET Framework represent the most popular technologies for enterprise application development.

Both Java and the CLI are stack-based virtual machines: The reference model of the abstract architecture is based on an execution stack that is used to perform operations. The byte code generated by compilers for these architectures contains a set of instructions that load operands on the stack, perform some operations with them, and put the result on the stack. Additionally, specific instructions for invoking methods and managing objects and classes are included. Stack-based virtual machines possess the property of being easily interpreted and executed simply by lexical analysis and hence are easily portable over different architectures. An alternative solution is offered by register-based virtual machines, in which the reference model is based on registers. This kind of virtual machine is closer to the underlying architecture we use today. An example of a register-based virtual machine is Parrot, a programming-level virtual machine that was originally designed to support the execution of PERL and then generalized to host the execution of dynamic languages.
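The load-operate-store discipline of a stack-based virtual machine can be sketched in a few lines. The instruction names below are invented for illustration; they are not actual JVM or CIL opcodes.

```python
# A toy stack-based virtual machine: instructions load operands onto
# the stack, perform operations on them, and put the result back.
def execute(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "push":              # load an operand onto the stack
            stack.append(args[0])
        elif op == "add":             # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":             # pop two operands, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# (2 + 3) * 4, expressed as stack operations:
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(execute(program))  # 20
```

A register-based design would instead name explicit operand locations in each instruction, which maps more directly onto today's hardware.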

The main advantage of programming-level virtual machines, also called process virtual machines, is the ability to provide a uniform execution environment across different platforms. Programs compiled into byte code can be executed on any operating system and platform for which a virtual machine able to execute that code has been provided. From a development life-cycle point of view, this simplifies the development and deployment efforts, since it is not necessary to provide different versions of the same code. The implementation of the virtual machine for different platforms is still a costly task, but it is done once per platform rather than once per application. Moreover, process virtual machines allow for more control over the execution of programs, since they do not provide direct access to the memory. Security is another advantage of managed programming languages: by filtering the I/O operations, the process virtual machine can easily support sandboxing of applications. As an example, both Java and .NET provide an infrastructure for pluggable security policies and code access security frameworks. All these advantages come with a price: performance. Virtual machine programming languages generally exhibit lower performance than languages compiled against the real architecture. This performance gap is shrinking, however, and the high compute power available on average processors makes it less and less important.

Implementations of this model are also called high-level virtual machines, since high-level programming languages are compiled to a conceptual ISA, which is further interpreted or dynamically translated against the specific instruction of the hosting platform.

URL: https://www.sciencedirect.com/science/article/pii/B9780124114548000036

Software Agent Systems

Rainer Unland, in Industrial Agents, 2015

1.2.6.2 Agent-Oriented Programming Languages

An agent programming language, sometimes also called agent-oriented programming language (AOP), permits developing and programming intentional agents—in other words, the developed agents usually operate on a semantically higher level than those developed with the help of development toolkits. An AOP usually provides the basic building blocks to design and implement intentional agents by means of a set of programming constructs. These programming constructs facilitate the manipulation of the agents’ beliefs and goals and the structuring of their decision making. The language usually provides an intuitive programming framework based on symbolic or practical reasoning. Shoham (1993) suggests that an AOP system needs the following three elements in order to be complete:

A formal language with clear syntax for describing the mental state. This includes constructs for declaring beliefs and their structure (e.g., based on predicate calculus) and passing messages.

A programming language that permits defining agents. The semantics of this language should be closely related to those of the formal language.

A method for converting neutral applications into agents in order to allow an agent to communicate with a non-agent by attributing intentions.

The most important AOPs are logic-based. Their peak of research interest passed some time ago, which is why many of them are no longer maintained. Table 1.4 lists some relevant AOPs.

Table 1.4. Agent-Oriented Programming Languages

Name                  Reference
3APL                  Dastani et al. (2005)
AgentSpeak(L)/Jason   Kinny et al. (1996), Rao (1996), Bordini et al. (2007)
ASPECS                Cossentino et al. (2007)
GOAL                  Hindriks et al. (2012), GOAL (2011)
Golog                 Levesque et al. (1997)
MetateM               Dennis et al. (2008)
PLACA                 Thomas (1995)

URL: https://www.sciencedirect.com/science/article/pii/B9780128003411000012

The digital computer

Martin Plonus, in Electronics and Communications for Scientists and Engineers (Second Edition), 2020

8.2.3 Communicating with a computer: Programming languages

Programming languages provide the link between human thought processes and the binary words of machine language that control computer actions; in other words, they let a programmer write instructions that the computer can execute. A computer chip understands machine language only, that is, the language of 0's and 1's. Programming in machine language is incredibly slow and easily leads to errors. Assembly languages were developed that express elementary computer operations as mnemonics instead of numeric instructions. For example, to add two numbers, the instruction in assembly language is ADD. Even though programming in assembly language is time consuming, assembly language programs can be very efficient and should be used especially in applications where speed, access to all functions on board, and size of executable code are important. A program called an assembler is used to convert the application program written in assembly language to machine language. Although assembly language is much easier to use, since the mnemonics make it immediately clear what is meant by a certain instruction, it must be pointed out that assembly language is coupled to the specific microprocessor. This is not the case for higher-level languages. Higher-level languages such as C/C++ and Java, and scripting languages like Python, were developed to reduce programming time, which usually is the largest block of time consumed in developing new software. Even though such programs are not as efficient as programs written in assembly language, the savings in product development time when using a language such as C has reduced the use of assembly language programming to special situations where speed and access to all of a computer's features are important. A compiler is used to convert a C program into the machine language of a particular type of microprocessor. A high-level language such as C is frequently used even in software for 8-bit controllers, and C++ and Java are often used in the design of software for 16-, 32-, and 64-bit microcontrollers.

Fig. 8.1 illustrates the translation of human thought to machine language by use of programming languages.

Fig. 8.1. Programming languages provide the link between human thought statements and the 0’s and 1’s of machine code which the computer can execute.

Single statements in a higher-level language, which is close to human thought expressions, can produce hundreds of machine instructions, whereas a single statement in the lower-level assembly language, whose symbolic code more closely resembles machine code, generally produces only one instruction.

Fig. 8.2 shows how a 16-bit processor would execute a simple 16-bit program to add the numbers in memory locations X, Y, and Z and store the sum in memory location D. The first column shows the binary instructions in machine language. Symbolic instructions in assembly language, which have a nearly one-to-one correspondence with the machine language instructions, are shown in the next column. They are quite mnemonic and should be read as "Load the number at location X into the accumulator register; add the number at location Y to the number in the accumulator register; add the number at location Z to the number in the accumulator register; store the number in the accumulator register at location D." The accumulator register in a microcontroller is a special register where most of the arithmetic operations are performed. This series of assembly language statements, therefore, accomplishes the desired result. This sequence of assembly statements would be input to the assembler program that would translate them into the corresponding machine language (first column) needed by the computer. After assembly, the machine language program would be loaded into the machine and the program executed. Because programming in assembly language involves many more details and low-level details relating to the structure of the microcomputer, higher-level languages have been developed. FORTRAN (FORmula TRANslator) was one of the earlier and most widely used programming languages and employs algebraic symbols and formulas as program statements. Thus the familiar algebraic expression for adding numbers becomes a FORTRAN instruction; for example, the last column in Fig. 8.2 is the FORTRAN statement for adding the three numbers and is compiled into the set of corresponding machine language instructions of the first column.
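The four-instruction accumulator program described above can be sketched as a small simulation. The memory values are invented, and the instruction set is reduced to the three operations the example uses.

```python
# Accumulator-machine sketch: LOAD, ADD, and STORE act on a single
# accumulator register, as in the Fig. 8.2 walkthrough.
def run(program, memory):
    acc = 0
    for op, loc in program:
        if op == "LOAD":
            acc = memory[loc]       # load the number at loc into acc
        elif op == "ADD":
            acc += memory[loc]      # add the number at loc to acc
        elif op == "STORE":
            memory[loc] = acc       # store acc at loc
    return memory

memory = {"X": 10, "Y": 20, "Z": 30, "D": 0}
program = [("LOAD", "X"), ("ADD", "Y"), ("ADD", "Z"), ("STORE", "D")]
print(run(program, memory)["D"])  # 60
```

The whole sequence corresponds to the single high-level statement D = X + Y + Z, which is exactly the expansion a compiler performs.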

Fig. 8.2. Three types of program instructions. Machine language gives instructions as 0’s and 1’s and is the only language that the computer understands. Assembly language is more concise but still very cumbersome when programming. A high-level language such as FORTRAN or C facilitates easy programming.

URL: https://www.sciencedirect.com/science/article/pii/B9780128170083000085

Overview of Compilation

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

Checking Syntax

To check the syntax of the input program, the compiler must compare the program's structure against a definition for the language. This requires an appropriate formal definition, an efficient mechanism for testing whether or not the input meets that definition, and a plan for how to proceed on an illegal input.

Mathematically, the source language is a set, usually infinite, of strings defined by some finite set of rules, called a grammar. Two separate passes in the front end, called the scanner and the parser, determine whether or not the input code is, in fact, a member of the set of valid programs defined by the grammar.

Programming language grammars usually refer to words based on their parts of speech, sometimes called syntactic categories. Basing the grammar rules on parts of speech lets a single rule describe many sentences. For example, in English, many sentences have the form

Sentence → Subject verb Object endmark

where verb and endmark are parts of speech, and Sentence, Subject, and Object are syntactic variables. Sentence represents any string with the form described by this rule. The symbol “→” reads “derives” and means that an instance of the right-hand side can be abstracted to the syntactic variable on the left-hand side.

Consider a sentence like “Compilers are engineered objects.” The first step in understanding the syntax of this sentence is to identify distinct words in the input program and to classify each word with a part of speech. In a compiler, this task falls to a pass called the scanner. The scanner takes a stream of characters and converts it to a stream of classified words—that is, pairs of the form (p,s), where p is the word's part of speech and s is its spelling. A scanner would convert the example sentence into the following stream of classified words:

(noun, “Compilers”), (verb, “are”), (adjective, “engineered”), (noun, “objects”), (endmark, “.”)

Scanner

the compiler pass that converts a string of characters into a stream of words

In practice, the actual spelling of the words might be stored in a hash table and represented in the pairs with an integer index to simplify equality tests. Chapter 2 explores the theory and practice of scanner construction.
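A sketch of the scanner's output for the example sentence, with spellings interned in a table so that later equality tests compare small integers rather than strings. The dictionary of parts of speech below is invented for this one sentence; a real scanner classifies words lexically rather than by lookup.

```python
# Invented part-of-speech table covering only the example sentence.
DICTIONARY = {
    "compilers": "noun", "objects": "noun",
    "are": "verb", "engineered": "adjective", ".": "endmark",
}

string_table = {}          # spelling -> integer index

def intern(spelling):
    """Return a small integer that uniquely stands for the spelling."""
    return string_table.setdefault(spelling, len(string_table))

def scan(text):
    """Produce (part of speech, interned spelling) pairs."""
    words = text.replace(".", " .").split()
    return [(DICTIONARY[w.lower()], intern(w)) for w in words]

tokens = scan("Compilers are engineered objects.")
print(tokens)  # [('noun', 0), ('verb', 1), ('adjective', 2), ('noun', 3), ('endmark', 4)]
```

Comparing two interned spellings now costs one integer comparison, regardless of the spellings' lengths.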

In the next step, the compiler tries to match the stream of categorized words against the rules that specify syntax for the input language. For example, a working knowledge of English might include the following grammatical rules:

1 Sentence → Subject verb Object endmark
2 Subject → noun
3 Subject → Modifier noun
4 Object → noun
5 Object → Modifier noun
6 Modifier → adjective

By inspection, we can discover the following derivation for our example sentence:

Rule   Prototype Sentence
—      Sentence
1      Subject verb Object endmark
2      noun verb Object endmark
5      noun verb Modifier noun endmark
6      noun verb adjective noun endmark

The derivation starts with the syntactic variable Sentence. At each step, it rewrites one term in the prototype sentence, replacing the term with a right-hand side that can be derived from that rule. The first step uses Rule 1 to replace Sentence. The second uses Rule 2 to replace Subject. The third replaces Object using Rule 5, while the final step rewrites Modifier with adjective according to Rule 6. At this point, the prototype sentence generated by the derivation matches the stream of categorized words produced by the scanner.

The derivation proves that the sentence “Compilers are engineered objects.” belongs to the language described by Rules 1 through 6. The sentence is grammatically correct. The process of automatically finding derivations is called parsing. Chapter 3 presents the techniques that compilers use to parse the input program.

Parser

the compiler pass that determines if the input stream is a sentence in the source language
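The derivation above can also be found mechanically. The minimal recursive-descent sketch below encodes Rules 1 through 6, collapsing Subject and Object into one helper, since both are a noun with an optional Modifier; the token representation is the (part of speech, spelling) pairs from the scanner.

```python
# Recursive-descent sketch of the toy grammar:
# Sentence -> Subject verb Object endmark, where Subject and Object
# are a noun optionally preceded by a Modifier (an adjective).
def parse_sentence(tokens):
    pos = [p for p, _ in tokens]            # parts of speech only

    def phrase(j):                          # Subject or Object
        if j < len(pos) and pos[j] == "adjective":
            j += 1                          # Modifier -> adjective
        if j < len(pos) and pos[j] == "noun":
            return j + 1
        return None

    i = phrase(0)                           # Subject
    if i is None or i >= len(pos) or pos[i] != "verb":
        return False
    i = phrase(i + 1)                       # Object
    return i is not None and i < len(pos) and pos[i] == "endmark"

tokens = [("noun", "Compilers"), ("verb", "are"),
          ("adjective", "engineered"), ("noun", "objects"), ("endmark", ".")]
print(parse_sentence(tokens))  # True
```

Note that the parser looks only at parts of speech, never at spellings; that is exactly why "Rocks are green vegetables" also parses, as the text observes next.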

A grammatically correct sentence can be meaningless. For example, the sentence “Rocks are green vegetables” has the same parts of speech in the same order as “Compilers are engineered objects,” but has no rational meaning. To understand the difference between these two sentences requires contextual knowledge about software systems, rocks, and vegetables.

The semantic models that compilers use to reason about programming languages are simpler than the models needed to understand natural language. A compiler builds mathematical models that detect specific kinds of inconsistency in a program. Compilers check for consistency of type; for example, the expression

a ← a × 2 × b × c × d

might be syntactically well-formed, but if b and d are character strings, the sentence might still be invalid. Compilers also check for consistency of number in specific situations; for example, an array reference should have the same number of dimensions as the array's declared rank and a procedure call should specify the same number of arguments as the procedure's definition. Chapter 4 explores some of the issues that arise in compiler-based type checking and semantic elaboration.

Type Checking

the compiler pass that checks for type-consistent uses of names in the input program
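A minimal sketch of one such consistency check for the expression above: a product requires numeric operands, so the check fails if b or d names a character string. The environment mapping names to runtime values is an invented representation for illustration.

```python
# Consistency-of-type sketch: a product like a * 2 * b * c * d is valid
# only if every operand is numeric.
def check_product(env, names):
    for n in names:
        if not isinstance(env[n], (int, float)):
            return f"type error: {n} is not numeric"
    return "ok"

print(check_product({"a": 1, "b": 2.0, "c": 3, "d": 4}, "abcd"))  # ok
print(check_product({"a": 1, "b": "hi", "c": 3, "d": 4}, "abcd"))  # type error: b is not numeric
```

A real compiler performs this check on declared types rather than runtime values, but the principle of rejecting an ill-typed, syntactically valid sentence is the same.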

URL: https://www.sciencedirect.com/science/article/pii/B9780120884780000013

The Procedure Abstraction

Keith D. Cooper, Linda Torczon, in Engineering a Compiler (Second Edition), 2012

Scope Rules Across Various Languages

Programming language scope rules vary idiosyncratically from language to language. The compiler writer must understand the specific rules of a source language and must adapt the general translation schemes to work with these specific rules. Most Algol-like languages have similar scope rules. Consider the rules for the languages FORTRAN, C, and Scheme:

FORTRAN has a simple name space. A FORTRAN program creates a single global scope, along with a local scope for each procedure or function. Global variables are grouped together in a “common block”; each common block consists of a name and a list of variables. The global scope holds the names of procedures and common blocks. Global names have lifetimes that match the lifetime of the program. A procedure's scope holds parameter names, local variables, and labels. Local names obscure global names if they conflict. Names in the local scope have, by default, lifetimes that match an invocation of the procedure. The programmer can give a local variable the lifetime of a global variable by listing it in a save statement.

Separate compilation makes it hard for FORTRAN compilers to detect different declarations for a common block in distinct files. Thus, the compiler must translate common-block references into 〈block, offset〉 pairs to produce correct behavior.

C has more complex rules. A C program has a global scope for procedure names and global variables. Each procedure has a local scope for variables, parameters, and labels. The language definition does not allow nested procedures, although some compilers have implemented this feature as an extension. Procedures can contain blocks (set off with left and right braces) that create separate local scopes; blocks can be nested. Programmers often use a block-level scope to create temporary storage for code generated by a preprocessor macro or to create a local variable whose scope is the body of a loop.

C introduces another scope: the file-level scope. This scope includes names declared as static that are not enclosed in a procedure. Thus, static procedures and functions are in the file-level scope, as are any static variables declared at the outermost level in the file. Without the static attribute, these names would be global variables. Names in the file-level scope are visible to any procedure in the file, but are not visible outside the file. Both variables and procedures can be declared static.

Static Name

A variable declared as static retains its value across invocations of its defining procedure.

Variables that are not static are called automatic.

Scheme has a simple set of scope rules. Almost all objects in Scheme reside in a single global space. Objects can be data or executable expressions. System-provided functions, such as cons, live alongside user-written code and data items. Code, which consists of an executable expression, can create private objects by using a let expression. Nesting let expressions inside one another can create nested lexical scopes of arbitrary depth.

URL: https://www.sciencedirect.com/science/article/pii/B9780120884780000062

Introduction to Industrial Control Systems and Operations

Eric D. Knapp, Joel Thomas Langill, in Industrial Network Security (Second Edition), 2015

Sequential function charts

Another programming language used by PLCs and defined within the IEC-61131-3 standard is “sequential logic” or “sequential function charts (SFC).” Sequential logic differs from ladder logic in that each step is executed in isolation and progresses to the next step only upon completion, as opposed to ladder logic, where every step is tested in each scan. This type of sequential programming is very common in batch-oriented operations. Other common languages defined by IEC-61131-3 include “structured text (ST),” “function block diagram (FBD),” and “instruction list (IL)” methods. No matter what programming language is used with a particular PLC, the end goal is ultimately to automate the legacy electromechanical functions common in industrial systems by checking inputs, applying logic (the program), and adjusting outputs as appropriate, as shown in Figure 4.4.

Figure 4.4. PLC operational flow diagram.
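The check-inputs, apply-logic, adjust-outputs cycle of Figure 4.4 can be sketched as a scan loop. The start/stop motor latch below is a generic illustration in Python, not IEC-61131-3 code; the input and output names are invented.

```python
# One PLC scan: read inputs, apply the logic program, write outputs.
def scan_cycle(inputs, state):
    # Logic: the motor runs if start is pressed or it was already
    # running, unless stop is pressed (a classic start/stop latch).
    running = (inputs["start"] or state["running"]) and not inputs["stop"]
    state["running"] = running
    return {"motor": running}             # adjust outputs

state = {"running": False}
print(scan_cycle({"start": True, "stop": False}, state))   # {'motor': True}
print(scan_cycle({"start": False, "stop": False}, state))  # {'motor': True}
print(scan_cycle({"start": False, "stop": True}, state))   # {'motor': False}
```

A real PLC repeats this cycle continuously, typically every few milliseconds, which is what distinguishes ladder-style scanning from the step-at-a-time execution of sequential function charts.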

The logic used by the PLC is created using a software application typically installed on an engineering workstation that combines similar tools, or may be combined with other system functions like the HMI. The program is compiled locally on the computer, and then downloaded from the computer to the PLC by either direct serial (RS-232) or Ethernet connections, where the logic code is loaded onto the PLC. PLCs can support the ability to host both the source and compiled logic programs, meaning that anyone with the appropriate engineering software could potentially access the PLC and “upload” the logic.

What are the rules of a programming language called?

Syntax refers to the rules that define the structure of a language. Syntax in computer programming means the rules that control the structure of the symbols, punctuation, and words of a programming language.

Does every programming language have rules governing its word usage and punctuation?

Every programming language has rules governing its word usage and punctuation. Besides the popular, comprehensive programming languages such as Java and C++, many programmers use scripting languages such as Python, Lua, Perl, and PHP. Professional computer programmers write programs to satisfy their own needs.

Is the set of rules that every programming language must strictly follow?

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be correctly structured statements or expressions in that language.

What is the function of a programming language quizlet?

Programming languages can be used to create programs to control the behavior of a machine or to express algorithms.