1. Introduction
A Programmable Logic Controller (PLC) is a type of industrial automation technology widely used in harsh operating environments because it performs reliably under extreme conditions such as high temperatures, humidity, strong vibrations, and electrical interference [1–3]. PLCs are typically programmed according to the IEC 61131-3 standard, which defines a set of programming languages, instructions, syntax, and semantics [4,5]. However, the internal configuration of the PLC’s microprocessor can introduce significant latency, which reduces the efficiency of instruction execution [6,7]. In industrial applications, speed and reliability are critical, so Central Processing Units (CPUs) must work as efficiently as possible, allowing more complex algorithms to control processes faster and more precisely [8–11]. To address these issues, modern PLCs employ a System-on-Chip (SoC) approach that leverages hardware description languages (HDLs) to model, synthesize, and quickly reconfigure Field-Programmable Gate Array (FPGA) structures [9,10]. The design of a PLC microprocessor can therefore be approached in various ways.
For example, code written in one of the IEC 61131-3 programming languages can be converted to a hardware description language such as Verilog or VHDL [14–18]. Furthermore, these approaches can be executed on architectures synthesized on an FPGA [6,16] or by designing a PLC in an SoC [19,20]. This work presents a new PLC microprocessor designed in Verilog HDL and based on the IEC 61131-3 standard. The microprocessor uses a Harvard multicycle architecture, and its instructions are classified according to Table 1. Specialized timers were included to improve the microprocessor’s efficiency, allowing more precise control over input and output signals. Using a specialized SoC also improves scalability and customization, enabling the creation of PLC systems tailored to specific applications and requirements [21]. A Cyclone V FPGA was used to test the new PLC microprocessor; it provides reconfigurable hardware and an internal architecture that enables direct execution of the instruction list (IL). Likewise, Chmiel et al. [20] analyze the design process for a PLC implemented in an FPGA embedded in a System on Chip (SoC) complying with the IEC 61131-3 standard. A superior architecture with extended instructions, contrasted with current state-of-the-art PLC microprocessors (Siemens S7-1500 family), was then reported by Carlos Hernandez et al. [22].
The developed CPU has three main elements: memories, control units, and execution units. The instruction decoder they propose executes every instruction in four clock cycles: initialization, decoding, execution, and fetch of the next instruction. Shedge et al. [23] propose an instruction list (IL) processor on an FPGA platform dedicated to PLC applications, concluding that such a processor is a useful tool for industrial automation systems because it provides a flexible and efficient way to control machinery and equipment. Rudrawar & Sakhare [6] define an IEC 61131-3 compliant instruction set for a processor using an ALU (arithmetic and logic unit). Their proposed architecture is divided into three functional stages: the fetch unit, the decode unit, and the execution unit. The objectives of the microprocessor proposed here align with the requirements specified in IEC 61131-3, making it well suited for integration into industrial automation systems. In addition, it incorporates dedicated hardware for selecting the data used in operations, which enhances its flexibility. Moreover, its electronic capabilities have been expanded compared with the version described in [22]: it now includes an input/output controller with eight digital inputs and eight digital outputs, along with eight analog inputs and one analog output. These enhancements make the proposed microprocessor versatile and suitable for a wide range of applications. In summary, this work introduces a microprocessor with significant enhancements in both efficiency and performance, as evidenced by comparisons with references [20–23]. These advancements make it an appealing choice for industrial automation applications, thanks to its ability to execute instructions in a reduced number of clock cycles and the added flexibility conferred by its specialized hardware. In essence, this microprocessor offers a compelling blend of high efficiency and adaptability.
2. Design of the Microprocessor
Figure 1 illustrates the 40-bit instruction word structure: the 32 least significant bits store literals or immediate values for ALU operations, the next 6 bits (bits 37 to 32) hold the opcode, and the two most significant bits categorize the data type, distinguishing between inputs/outputs, memory, and literals. The Control Unit incorporates two decoders that rely on the instruction. Figure 2 depicts the modified Harvard microarchitecture of the PLC microprocessor, featuring a dual-port (2P) random access memory (RAM) that enables simultaneous read and write operations. The 2P-RAM stores data in separate stacks for words and bits, referred to as the current result word (CRW) and current result bit (CRB) stacks. Each stack can hold 128 registers, with a length of 32 bits for CRW and 1 bit for CRB. The program memory accommodates up to 256 registers of 40 bits each, and it can be extended to suit the application thanks to the flexibility of the FPGA.
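The field layout described above can be illustrated with a short behavioral sketch in Python. It is not taken from the authors' Verilog sources. The bit positions follow the text (bits 39 to 38 for the data type, bits 37 to 32 for the opcode, bits 31 to 0 for the literal), whereas the numeric data-type codes and the function names `encode` and `decode` are hypothetical.

```python
# Behavioral sketch of the 40-bit instruction word (not the authors' HDL).
# Bit positions follow the text; the data-type codes below are ASSUMED.
TYPE_IO, TYPE_MEMORY, TYPE_LITERAL = 0b00, 0b01, 0b10  # hypothetical encoding


def encode(data_type: int, opcode: int, literal: int) -> int:
    """Pack data type (2 bits), opcode (6 bits) and literal (32 bits)."""
    if not (0 <= data_type < 4 and 0 <= opcode < 64 and 0 <= literal < 2**32):
        raise ValueError("field out of range")
    return (data_type << 38) | (opcode << 32) | literal


def decode(word: int) -> tuple:
    """Split a 40-bit word into (data_type, opcode, literal)."""
    return (word >> 38) & 0b11, (word >> 32) & 0x3F, word & 0xFFFFFFFF


# AND (opcode 000001 in Table 2) applied to address 5 of a memory operand.
word = encode(TYPE_MEMORY, 0b000001, 5)
print(f"{word:010x}")  # 4100000005
```

Keeping the literal in the low 32 bits means a decoder can obtain the operand with a single mask, which matches the way later sections address registers through the least significant bits of the instruction.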
The data path that interconnects all the hardware blocks of the PLC is based on a modified Harvard architecture, compatible with the IEC 61131-3 standard, with a set of 32 instructions, described in Table 2.
| Instruction | Description | Opcode |
|---|---|---|
| LT | Less than | 111011 |
| LD | Load bit | 000000 |
| AND | AND between bits | 000001 |
| OR | OR between bits | 000010 |
| XOR | XOR between bits | 000011 |
| ORN | ORN between bits | 000100 |
| ANDN | ANDN between bits | 000101 |
| XNOR | XNOR between bits | 000110 |
| ST | Bit storage | 000111 |
| LD_NEG | Load negated bit | 001000 |
| JMP | Unconditional jump | 111101 |
Figure 3 depicts the schematic illustrating connections between various modules, including the program counter, program memory, control unit, dual-port RAM, I/O controller, data selector, as well as the Word and Bit ALU. Each of these modules will be elaborated upon in the subsequent sections.
Program Counter
An ascending counter employs an 8-bit register for counting, which is incremented by the control unit through the iINCR_PC signal. The control unit also signals when an instruction jump occurs through the iWR_PC signal. In such instances, the eight least significant bits of the instruction dictate the jump address, which is obtained from the input port iJUMP_ADDR. Finally, the output port oADDR is the program memory pointer.
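The counter behavior described above can be sketched as a small Python model. This is an illustration only: the class name `ProgramCounter`, the priority of the jump over the increment, and the wrap-around at 256 are assumptions that the text does not state.

```python
class ProgramCounter:
    """Behavioral model of the 8-bit ascending program counter (sketch)."""

    def __init__(self):
        self.addr = 0  # value driven on oADDR (program memory pointer)

    def clock(self, incr_pc: bool, wr_pc: bool, instruction: int = 0) -> int:
        """Advance one clock edge and return the new oADDR value."""
        if wr_pc:
            # Jump: the eight least significant bits of the instruction
            # provide the target address (iJUMP_ADDR).
            self.addr = instruction & 0xFF
        elif incr_pc:
            # Increment; wrapping at 256 is an ASSUMPTION for an 8-bit register.
            self.addr = (self.addr + 1) & 0xFF
        return self.addr


pc = ProgramCounter()
pc.clock(incr_pc=True, wr_pc=False)                    # oADDR = 1
pc.clock(incr_pc=False, wr_pc=True, instruction=0xF0)  # jump to 0xF0
```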
Control Unit
The control unit manages the control signals within the architecture [24]. It is implemented as a finite state machine (FSM) that regulates the timing of each instruction. All instructions follow a three-stage process: searching for a new instruction, decoding it, and executing it, as shown in Figure 4. The stages of the instruction cycle are described below:
Search for new instruction: In this state, the signal iINCR_PC activates and increases the program counter, moving it to a new instruction.
Decoding: In this state, the microprocessor deactivates the program counter increment (INC_PC) and turns off signals WR_W and WR_b. These signals are responsible for updating the pointers in the 2P-RAM memory for both the ALU’s current word result (CR_W) and the current bit result (CR_b). Additionally, the signal WR_CC_Enable is activated to write to the command counter. During this state, an instruction reaches the control unit, and the two most significant bits are selected. These bits dictate the type of operations to be performed, with bits 37 to 32 connected to the ALU bit decoder and ALU word decoder. These decoders determine the operation code for the ALUs. Moreover, the data select block determines the input values for the ALU based on the six bits of the operation code.
Execution: After the decoding state, the operation code is linked to the ALU_SEL port of the ALUs, which in turn determines the specific operation to be carried out.
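The three-stage cycle can be summarized as a minimal state-machine sketch in Python. The state names, the choice of the fetch state as the starting point, and the `EXEC` strobe in the execution state are assumptions made for illustration; only the increment in the first state and the `WR_CC_Enable` activation in the decoding state are taken from the description above.

```python
from enum import Enum


class State(Enum):
    FETCH = 0    # search for new instruction
    DECODE = 1   # decoding
    EXECUTE = 2  # execution


# Fixed three-state ring: every instruction takes the same three steps.
NEXT = {
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.FETCH,
}


def control_signals(state: State) -> dict:
    """Return the control outputs asserted in each state (sketch)."""
    return {
        "INCR_PC": state is State.FETCH,        # advance the program counter
        "WR_CC_Enable": state is State.DECODE,  # write the command counter
        "EXEC": state is State.EXECUTE,         # ASSUMED strobe for the ALUs
    }


state = State.FETCH
for _ in range(3):
    print(state.name, control_signals(state))
    state = NEXT[state]
```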
2P-RAM
FPGA-based 2P-RAM plays a crucial role in developing complex systems [25,26]. This 2P-RAM comprises two separate ports designed for storing data with varying bit lengths. Table 3 describes the input and output ports of the 2P-RAM.
Word ALU
The ALU (Arithmetic Logic Unit) is a frequently accessed CPU module involved in executing most instructions [27,28]. The dedicated word ALU can perform 18 different operations, with operation codes obtained from bits 37 to 32 of the 40-bit instruction, as reported in Table 4. The ALU has several ports: iEXEC signals when the operands iA and iB are ready for use, and the 5-bit input port iALU_SEL selects the operation to be performed by the word ALU. Note that while 6 bits in the instruction word determine whether an instruction is executed in the word ALU or the bit ALU, only 5 bits are used for operation selection during decoding. Within the word ALU, an FPU (Floating Point Unit) handles floating-point operations. This FPU was designed following the IEEE 754 standard [29], enabling it to perform floating-point addition, subtraction, and multiplication.
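To illustrate how a floating-point operand fits in a 32-bit word, the following Python sketch converts between a value and its bit pattern. It assumes the FPU uses the single-precision (binary32) format, which the text does not state explicitly; the helper names are hypothetical.

```python
import struct


def float_to_word(x: float) -> int:
    """Return the 32-bit pattern of x in single precision (ASSUMED format)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def word_to_float(w: int) -> float:
    """Interpret a 32-bit word as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", w))[0]


print(hex(float_to_word(1.5)))    # 0x3fc00000
print(word_to_float(0x3F800000))  # 1.0
```

A pattern such as `0x3FC00000` could then be placed in the 32-bit literal field of an instruction that targets the word ALU.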
Bitwise operations ALU
The bit ALU performs the logical operations of the microprocessor [30]. It receives inputs from three ports: the iA port, which gets data from the data selector; the iB port, which receives data from the current result (CR) stack in RAM; and the iALU_SEL port, which carries the instruction code. The bit ALU offers seven different operations, as listed in Table 5. The results of these logical operations are delivered through the oALU_Out port, making them available for further processing by the data selector or for storage in the current result stack. The proposed ALUs execute operations sequentially while exploiting the FPGA’s parallelization capability. This enables the ALUs to complete multiple operations within a single clock cycle, facilitating the complex data processing required by a wide range of applications.
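A behavioral sketch of the two-operand bit operations is given below in Python. The opcodes are those listed in Table 2. The semantics of ORN and ANDN (negating the operand on iA before combining it with the current result on iB) follow the usual IEC 61131-3 instruction list convention and are an assumption about this particular hardware, as is selecting by the full 6-bit opcode.

```python
# Two-operand bit operations keyed by the Table 2 opcodes (sketch).
# a  : operand from the data selector (iA)
# cr : current result bit from the CR stack (iB)
BIT_OPS = {
    0b000001: lambda a, cr: cr & a,        # AND
    0b000010: lambda a, cr: cr | a,        # OR
    0b000011: lambda a, cr: cr ^ a,        # XOR
    0b000100: lambda a, cr: cr | (a ^ 1),  # ORN  (ASSUMED: OR with NOT a)
    0b000101: lambda a, cr: cr & (a ^ 1),  # ANDN (ASSUMED: AND with NOT a)
    0b000110: lambda a, cr: (cr ^ a) ^ 1,  # XNOR
}


def bit_alu(alu_sel: int, a: int, cr: int) -> int:
    """Return the 1-bit result that would appear on oALU_Out."""
    return BIT_OPS[alu_sel](a & 1, cr & 1)


print(bit_alu(0b000101, a=1, cr=1))  # ANDN: 1 AND NOT 1 -> 0
```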
Data selector
The data selector block’s ports are detailed in Table 6. The data selector’s operation begins when the iDEC input port receives a Boolean signal indicating that the control unit has initiated the decoding process. At this stage, the data selector works in coordination with the control unit to determine which data to use. The two most significant bits of the instruction determine whether it works with the M0.X and M1.X register banks or with the input and output register blocks (as shown in Table 7 and Table 8). Subsequently, bits 37 to 32 of the 40-bit instruction are examined to decide whether data should be stored. If the instruction specifies bit storage, the bit is stored in the M0.X memory: data is taken from the input port iALU_BIT_RESULT and placed at the address specified by the instruction’s five least significant bits. The same approach is used when the instruction writes to an output oQ0.x, in which case the four least significant bits of the instruction specify the output to be altered. For word storage, the M1.X memory is used: data from the input port iALU_WORD_RESULT is stored at the address designated by the instruction’s five least significant bits.
When instruction decoding signals the need to read from the M0.X memory bank, the address specified by the five least significant bits of the 40-bit instruction is accessed. The data from this register is then placed in the output port oALU_BIT_DATA_A. Conversely, if the instruction calls for reading an input iI0.X, the input specified by the instruction’s four least significant bits is accessed, and its value is placed in the output port oALU_BIT_DATA_A. Finally, when the instruction indicates a read from the M1.X register bank, the register specified by the address in the instruction’s five least significant bits is read, and its data is deposited into the output port oALU_WORD_DATA_A.
Register Banks M0.X and M1.X
Two register banks have been created to handle temporary data. The M0.X register bank consists of 32 1-bit registers, while the M1.X bank comprises 32 registers, each with a capacity of 32 bits. Each of these banks manages three input ports and one output port. The iADDR port specifies which register to access, iWrite receives the write enable signal for the register bank, iDATA is where incoming data is received and stored in the register bank, and oDATA serves as the output port for reading data from one of the registers.
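The port behavior of the two banks can be sketched in Python as follows. The class and helper names are hypothetical, and the sketch assumes that a write is visible on oDATA in the same step, which the text does not specify. The five-bit address follows the data selector description, where the five least significant bits of the instruction select the register.

```python
class RegisterBank:
    """Behavioral model of a 32-entry register bank (sketch)."""

    def __init__(self, width_bits: int, depth: int = 32):
        self.mask = (1 << width_bits) - 1  # 1 bit for M0.X, 32 bits for M1.X
        self.regs = [0] * depth

    def clock(self, addr: int, write: bool, data: int = 0) -> int:
        """Apply iADDR, iWrite and iDATA for one step and return oDATA."""
        if write:
            self.regs[addr] = data & self.mask
        return self.regs[addr]  # ASSUMED: a write is visible immediately


def bank_address(instruction: int) -> int:
    """Five least significant bits of the instruction select the register."""
    return instruction & 0x1F


m0 = RegisterBank(width_bits=1)   # M0.X: 32 one-bit registers
m1 = RegisterBank(width_bits=32)  # M1.X: 32 registers of 32 bits
m0.clock(addr=3, write=True, data=1)
print(m0.clock(addr=3, write=False))  # 1
```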
I/O Controller
The input/output controller is the intermediary between the proposed microarchitecture and the microprocessor’s inputs and outputs, ensuring efficient and accurate data transmission [31,32]. It transfers data for both digital and analog inputs and outputs, whose ports are detailed in Table 9. Digital inputs and outputs are denoted I0.X and Q0.X, respectively, analog inputs are labeled AI0.X, and the sole analog output is AQ0.0. Because FPGAs have a limited number of input and output pins, the amount of data that can be acquired and processed simultaneously is constrained. To address this, an MCP3208 [33] was incorporated for analog-to-digital conversion (ADC). It reads eight channels in a multiplexed manner and transmits the digital data over the SPI protocol. A prescaler reduces the FPGA’s 50 MHz clock to 1 MHz for the ADC interface, ensuring reliable data reception. A specific communication protocol was designed within the I/O controller to interface with the ADC. The hardware reads the eight input channels and stores the results in a block of registers, which are connected combinationally to the output ports of the I/O controller corresponding to the ADC. Digital-to-analog conversion (DAC) employs the MCP4921 [34], which also uses the SPI protocol but, unlike the ADC, has a single channel. As with the ADC, the clock for the MCP4921 is prescaled, in this case from the 50 MHz working frequency to 25 MHz. Data received via the iAQ0.0 input port of the I/O controller is stored in a register and transmitted to the DAC as 12-bit data over SPI. Hence, our microprocessor can handle both digital and analog I/O, making it suitable for a variety of industrial applications.
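The prescaler ratios and the DAC transfer can be illustrated with a short Python sketch. The divisor arithmetic follows directly from the clock frequencies quoted above. The 16-bit frame layout (bit 15 cleared, then buffer, gain, and shutdown control bits, then the 12 data bits) reflects the MCP4921 write command as commonly documented; it should be checked against the datasheet, and the default configuration bits chosen here are assumptions, since the text does not say which settings the I/O controller uses.

```python
def prescaler_divisor(f_in_hz: int, f_out_hz: int) -> int:
    """Integer clock division ratio needed to obtain f_out from f_in."""
    if f_out_hz <= 0 or f_in_hz % f_out_hz:
        raise ValueError("output clock must divide the input clock exactly")
    return f_in_hz // f_out_hz


def mcp4921_frame(value: int, buffered: bool = False,
                  gain_1x: bool = True, active: bool = True) -> int:
    """Build a 16-bit SPI write frame for a 12-bit DAC value (sketch).

    The default configuration bits are ASSUMED, not taken from the paper.
    """
    if not 0 <= value < 4096:
        raise ValueError("value must fit in 12 bits")
    config = (int(buffered) << 2) | (int(gain_1x) << 1) | int(active)
    return (config << 12) | value  # bit 15 stays 0


print(prescaler_divisor(50_000_000, 1_000_000))   # 50 (ADC interface)
print(prescaler_divisor(50_000_000, 25_000_000))  # 2  (DAC interface)
print(hex(mcp4921_frame(2048)))                   # 0x3800 (mid-scale)
```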
Figure 5 presents the resource utilization of the employed FPGA (5CSEBA6U23I7). The utilization of logic resources remains low relative to the FPGA’s capacity: the design uses a total of 2,508 logic elements, just 6% of the available resources. In contrast, Delgado del Carpio [23] used 6,930 logic elements of the EP4CE10E22C8 FPGA, so the proposed design represents an improvement of 63.84% in resource efficiency. Regarding memory utilization, our design employs 60,416 memory bits, equivalent to a mere 1% of the FPGA’s capacity, whereas Delgado del Carpio’s study reported memory usage of 6,930 bits, constituting 3% of that FPGA’s capacity. This disparity arises from the expanded memory blocks integrated into our proposed microprocessor, which allow more instructions to be stored in the program memory and provide greater storage capacity in data memory, along with expanded working registers. Furthermore, because we did not employ Altera FPUs and instead developed our FPU in Verilog, only 3% of the DSP (Digital Signal Processing) resources were used.
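As a quick check of the logic-element comparison, the relative reduction can be computed from the two counts quoted above. The function name is hypothetical and the formula (difference divided by the reference count) is an assumption about how the percentage was obtained; it yields a value close to the reported figure.

```python
def reduction_percent(reference: int, proposed: int) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in %."""
    return 100.0 * (reference - proposed) / reference


# 6,930 logic elements in [23] versus 2,508 in the proposed design.
print(round(reduction_percent(6930, 2508), 2))  # 63.81
```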
Proof of concept and case studies
To demonstrate the effectiveness and potential of our proposed microprocessor, several case studies are reported and discussed.
Simulation of a water heater tank filling system.
The first case study is a water tank filling problem that uses two valves, two level sensors as digital inputs, and a temperature sensor as an analog input, as shown in Figure 6. A digital output was used for on-off control of the heating resistor.
The machine code and the instruction list (IL) are presented in Table 10.
The data collected by a PT100 sensor was used to simulate the analog signal, using the value of 100 Ω at 0 °C as a reference [35]. The resolution of the MCP3208 is 12 bits. Table 11 shows the characterization of the PT100, with resistance, voltage, and current measured at different temperatures; it also gives the conversion obtained by mapping the 0–100 °C scale onto the 12-bit resolution. The temperature at 100 °C is used as a reference to establish the sensor’s resolution per degree, resulting in Equation (1).
The sequence of the problem statement and solution is the following:
The system initiates upon pressing the “Start” push button. Once activated, the system becomes operational and remains in motion until the “Stop” pushbutton is pressed.
After the system has been initiated, the lower valve is closed when the low-level sensor detects that the tank is empty. Conversely, when the high-level sensor determines that the tank is full, the upper valve is opened, and subsequently, the upper valve is closed.
When the water tank reaches full capacity, an action is triggered for the heating element based on the temperature. If the temperature falls below 100 °C (equivalent to 3434 ADC Digit), the heating element is activated. If the temperature exceeds 150 °C (corresponding to 3183 ADC Digit), the heating element is deactivated, and the lower valve is opened to release the hot water.
INPUTS
Start = I0
Stop = I1
Low-level sensor = I2
High-level sensor = I3
Temperature sensor (Analog) = AI0
OUTPUTS
Lamp Start = Q0
Lamp Stop = Q1
Valve 1 (upper valve) = Q2
Valve 2 (lower valve) = Q3
Resistor = Q4
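The heating step of the sequence above can be sketched as a simple hysteresis rule in Python. The thresholds are expressed in °C as quoted in the sequence rather than in ADC counts. Holding the previous heater state between the two thresholds and keeping the heater off while the tank is not full are assumptions added to make the sketch complete; the function name is hypothetical.

```python
def heater_step(tank_full: bool, temp_c: float, heater_on: bool,
                low_c: float = 100.0, high_c: float = 150.0) -> tuple:
    """Return (heater_on, lower_valve_open) for one scan cycle (sketch)."""
    if not tank_full:
        return False, False  # ASSUMED: no heating until the tank is full
    if temp_c < low_c:
        return True, False   # below the lower threshold: heat (Q4 on)
    if temp_c > high_c:
        return False, True   # above the upper threshold: stop and drain (Q3)
    return heater_on, False  # ASSUMED: hold state between thresholds


print(heater_step(tank_full=True, temp_c=80.0, heater_on=False))
# (True, False)
```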
After developing the code, it was loaded into the instruction memory and deployed on the DE10-Nano development board, which features the Cyclone V 5CSEBA6U23I7 FPGA. Because the case study involved a time-varying analog input, it was crucial to visualize signal changes with minimal latency. For this purpose, the SignalTap II Logic Analyzer was used to debug the FPGA design. This tool is part of the Quartus II software package developed by Intel FPGA. It enables real-time, high-speed debugging of the microarchitecture design, facilitating the verification of correct microprocessor operation during implementation. Note that the architecture design incorporated external hardware for converting analog signals to digital, as detailed in Section 2 under the I/O controller subtopic.
Simulation and implementation of a sorting system for metallic and non-metallic objects
The second case study is a metal parts sorting system implemented on a FESTO conveyor belt station, which includes a three-color lamp, a presence detector, an inductive sensor, and a solenoid as the actuator, as shown in Figure 7.
The sequence of the problem statement and solution is the following.
The belt’s motion should commence upon pressing a “Start” pushbutton and continue until a second “Stop” pushbutton is pressed.
The system incorporates two sensors, a presence sensor and an inductive sensor, to distinguish between metallic and plastic products. Additionally, two pushbuttons are employed: a normally open pushbutton serves as the start button, while a normally closed pushbutton functions as the stop button.
Actuation is achieved through the use of a gear motor to drive the belt and a solenoid to divert metal parts. Furthermore, a three-color tower lamp serves as an indicator.
INPUTS
Start = I0
Stop = I1
Inductive sensor = I3
Presence sensor = I4
OUTPUTS
Solenoid = Q0
Geared motor = Q1
Lamp Color green = Q2
Lamp Color Amber = Q3
Lamp Color red = Q4
For the code describing the second case study, Table 12 displays the results of the first four instructions required to implement motor interlocking. The code utilizes two 8-bit registers: I0_X, representing eight digital inputs, and Q0_X, representing eight digital outputs. In the simulation, I0 represents a normally open pushbutton (START), while I1 represents a normally closed pushbutton (STOP), both set to high initially. The system begins by loading the value of START (I0) and performing an OR operation with the output Q1, which controls the motor.
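The interlocking logic can be sketched in Python as one scan of a seal-in circuit. The first two steps (load I0, OR with Q1) follow the description above. The last two steps, an AND with the normally closed STOP input followed by a store to Q1, are an assumption about the remaining instructions and are not confirmed by the text; the function name is hypothetical.

```python
def motor_latch(start: int, stop_nc: int, q1: int) -> int:
    """One scan of the start/stop interlock; returns the new value of Q1."""
    cr = start & 1          # LD  I0  load START into the current result
    cr = cr | (q1 & 1)      # OR  Q1  keep running once the motor is on
    cr = cr & (stop_nc & 1) # AND I1  ASSUMED: NC STOP reads 0 when pressed
    return cr               # ST  Q1  ASSUMED: store the result to the motor


q1 = 0
q1 = motor_latch(start=1, stop_nc=1, q1=q1)  # START pressed  -> motor on
q1 = motor_latch(start=0, stop_nc=1, q1=q1)  # START released -> stays on
q1 = motor_latch(start=0, stop_nc=0, q1=q1)  # STOP pressed   -> motor off
print(q1)  # 0
```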