fix several errors

Юлия Строева 2023-06-15 21:45:18 +04:00
parent ac0b1636a9
commit 0a2c4e3e6e


@ -90,7 +90,7 @@ The paper \cite{ali2011overview} presents an review of approaches and software t
In the article \cite{chae2013software}, the authors analyze borrowings in the source code according to the sequences of calls to external programming interfaces (external dependencies) and the frequency of such calls. This method is not suitable for the problem of this study because of its educational orientation: some student projects may not use any external dependencies.
Thus, it is necessary to develop an approach to searching for structurally similar projects that is focused on simple software systems and a high speed of analysis.
Thus, it is necessary to develop an approach to searching for structurally similar projects that is focused on simple software systems and a high speed of analysis. In this study, a project is a set of source code files.
\section{The Proposed Algorithm for Analyzing the Structure of the Source Code}
@ -112,6 +112,7 @@ We developed an algorithm to extract the structure of the project in the source
\begin{equation*}
N^{Class} = \lbrace N_{i} \in N | F \left( N_{i}.data \right) = \text{`Class'} \rbrace,
\end{equation*}
\item Select nodes with the `Class field' as the $N^{Vars}$ set from the $N^{Class}$ set:
\begin{equation*}
N^{Vars} = \lbrace N_{i}^{Class} \in N^{Class} | F \left( N_{i}^{Class}.data \right) = \text{`Field'} \rbrace,
@ -132,17 +133,40 @@ We developed an algorithm to extract the structure of the project in the source
\item Save the resulting AST in a graph database (GDB).
\end{enumerate}
Figure \ref{fig:SourceCodeAndAST} shows a fragment of the source code and the resulting AST.
In this algorithm, $F(*)$ is a search function that finds nested nodes. Its input is a node or a subtree, and its output is a node of the desired type.
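The selection steps above can be illustrated with a minimal Java sketch. It is not the authors' implementation: the \texttt{Node} class, the \texttt{select} and \texttt{selectFrom} helpers, and the type labels used in \texttt{sampleTree} are hypothetical names introduced only for this example, and the search function is assumed to be a plain depth-first traversal that collects nested nodes of a requested type.

```java
import java.util.ArrayList;
import java.util.List;

public class AstSelection {

    /** Hypothetical AST node: a type label, a name, and child nodes. */
    static final class Node {
        final String type;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }

        /** Appends a child and returns this node to allow chaining. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects every node of the given type in the subtree rooted at root (root included). */
    static List<Node> select(Node root, String type) {
        List<Node> found = new ArrayList<>();
        collect(root, type, found);
        return found;
    }

    /** Applies the same search to each subtree in roots and merges the results. */
    static List<Node> selectFrom(List<Node> roots, String type) {
        List<Node> found = new ArrayList<>();
        for (Node root : roots) {
            collect(root, type, found);
        }
        return found;
    }

    /** Depth-first traversal that appends matching nodes to out. */
    private static void collect(Node node, String type, List<Node> out) {
        if (node.type.equals(type)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, type, out);
        }
    }

    /** Small hand-built tree: one class with one field and two methods (assumed labels). */
    static Node sampleTree() {
        return new Node("File", "Main.java").add(
                new Node("Class", "Main")
                        .add(new Node("Field", "a"))
                        .add(new Node("Method", "run"))
                        .add(new Node("Method", "show")));
    }

    public static void main(String[] args) {
        List<Node> classes = select(sampleTree(), "Class");   // plays the role of N^{Class}
        List<Node> fields = selectFrom(classes, "Field");     // plays the role of N^{Vars}
        System.out.println(classes.size() + " class(es), " + fields.size() + " field(s)");
    }
}
```

In this sketch the field set is selected from the class nodes rather than from the whole tree, which mirrors the way $N^{Vars}$ is built from $N^{Class}$.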
Figure \ref{fig:SourceCodeAndAST} shows the resulting AST for the following source code:
\begin{lstlisting}
package com.example.demo.simple;

public class Main {
    private String a;

    void run() {
        while(true) {
            int a = 1;
            if (a == 1) {
                this.show("Hello");
            }
        }
        int c = "Foo";
    }

    void show(String text) {
        System.out.println(text);
    }
}
\end{lstlisting}
\begin{figure}
\centering
\includegraphics[width=0.71\textwidth]{images/SourceCodeAndASTVert.png}
\includegraphics[width=0.71\textwidth]{images/AST.png}
\caption{Sample source code and its AST.} \label{fig:SourceCodeAndAST}
\end{figure}
A GDB is a non-relational database based on the topological structure of a network. Graphs represent data sets as nodes, edges, and properties. GDBs are more flexible than relational databases and allow fast retrieval of data of various types while taking numerous relations into account.
We use the Neo4j GDB as the data storage. Neo4j remains fast even with a large amount of stored data.
We use the Neo4j \cite{ref_neo4j} GDB as the data storage. Neo4j is a graph database management system that stores nodes, the edges connecting them, and the attributes of nodes and edges. Neo4j remains fast even with a large amount of stored data.
\section{The Proposed Algorithm for Detecting the Structural Similarity of Software Projects}
@ -265,14 +289,14 @@ Thus, we can calculate the number of matching and not matching paths (see eq. \r
\begin{figure}
\centering
\includegraphics[width=\textwidth]{images/ExampleSystem.png}
\includegraphics[width=\textwidth]{images/ExampleSystemEng.png}
\caption{The main form of the developed system.}
\label{fig:ExampleSystem}
\end{figure}
\section{Experiments}
We conducted experiments to evaluate the speed of source code analysis. We calculated the results relative to the number of lines of code and the number of files in the analyzed project. The main aim of the experiment is to determine the speed of the algorithm, measured as the average number of lines of code processed per minute. We used the IntelliJ IDEA Statistic plugin to get the data for the experiment.
We conducted experiments to evaluate the speed of source code analysis. We calculated the results relative to the number of lines of code and the number of files in the analyzed project. The main aim of the experiment is to determine the speed of the algorithm, measured as the average number of lines of code processed per minute. We used the IntelliJ IDEA Statistic plugin \cite{ref_Statistic} to get the data for the experiment. The plugin reports the file count, size, number of lines, average values, and other information for each file in the project. It also reports the total number of lines, the number of lines of code, the proportion of lines of code, the number of comment lines, the proportion of comment lines, etc.
We selected 10 random Java projects for this experiment. Table \ref{tab:speed} presents the results of experiments for analyzing the speed of the proposed algorithm.