Difference between revisions of "Source code"

From Emulation General Wiki
(I explained the languages more.)
m
(9 intermediate revisions by 8 users not shown)
Line 1: Line 1:
'''Source code''' is any collection of computer instructions written using some [[human-readable]] computer language. The source code is often transformed by a [[compiler]] program into low-level [[machine code]] understood by the computer. Alternatively, an [[interpreter (computing)|interpreter]] can be used to analyze and perform the outcomes of the source code program directly on the fly.
+
'''Source code''' is a collection of text files containing instructions that a computer either runs as-is (interpretation) or translates into an executable file beforehand (compilation / assembly). Source code is written using a human-readable computer language so that it can be translated by a compiler or an interpreter. It's also possible to decompile an executable file, though decompilers aren't as common.
  
Software, and its accompanying source code, typically falls within one of two licensing paradigms: [[open source]] and [[proprietary software]]. Software is ''open source'' if the source code is free to use, distribute, modify and study, and ''proprietary'' if the source code is kept secret, or is privately owned and restricted.  
+
The language used changes how the source code is read, and just like emulation it too has its own high and low-level types. If a program were written in an assembly language (which often involves taking advantage of system-specific attributes), an assembler would write the machine code "word for word," reflecting how the machine takes instructions. If a program were written in a low-level language, a compiler would read the code and translate the equivalent in machine code. If a program were written in a high-level language, it would often work without requiring a compiler. Some compilers and interpreters also do error checking to make sure the programmer's code is either properly written or formatted. Many languages also check that the code won't inherently cause bugs, such as Rust.
  
[[Porting]] software to other computer platforms is usually prohibitively difficult without source code. Without the source code for a particular piece of software, portability is generally computationally expensive.{{Citation needed|date=October 2008}} Possible porting options include [[binary translation]] and emulation of the original platform.
+
Software can be ported to other types of computers but, without the source code, it's often prohibitively difficult to do. Other ways to port software include binary translation and platform emulation.
  
== Licensing ==
+
==Language levels==
 +
Software can be programmed in many different languages (even multiple in one program), and just like [[High/Low level emulation|high and low level emulation]], they have different levels of abstraction. Here are the different ones, from lowest to highest.
  
Emulator software may be open source or closed source. There are many advantages for console emulators:
+
===Assembly===
 +
Assembly is the closest representation of machine code without being machine code. There are basically no abstractions from the architecture, meaning everything is close to what the machine processes. This used to be ideal for platforms at a time when compilers weren't optimized enough to give equivalent performance to assembly, and as a result you'd find that early console games were programmed in assembly more often than higher level languages. Assembly is commonplace in [[dynamic recompilation]] as well because it allows developers to optimize code closer for an architecture than even a low-level language like C or C++.
 +
 
 +
===Low===
 +
A low-level language allows programmers to get closer to the system they work on, taking advantage of architecture or platform-specific quirks without having to learn the architecture like assembly. Low-level languages have the advantage that they're easier to port to other platforms by nature of being more abstract from the hardware.
 +
 
 +
Examples of low-level languages include (but are in no way limited to) C and C++.
 +
 
 +
===Medium===
 +
Medium-level languages have attributes of both low and high-level paradigms like Rust (which is designed to be performant and system-focused but also memory safe). Some high-level languages can also be lower than others.
 +
 
 +
===High===
 +
High-level languages push away most system specific quirks in favor of instructions intended to work on any platform. This was pioneered by Java, whose goal was for developers to "write once, run anywhere".
 +
 
 +
In high-level languages, many of the same instructions can be run across different architectures and platforms. They may have a compiler, a compiler cache, a [[dynamic recompilation|dynamic recompiler]], and/or an interpreter.
  
* Abandoned open source projects can be picked up by other dev teams. Abandoned closed source projects cannot be updated.
+
===Esoteric===
* Easy forking and customization of projects
+
[[wikipedia:Esoteric programming language|Esoteric languages]] are built around a specific idea or a joke, as part of a challenge. These languages are intended to be comedic, confusing, and/or thought-provoking.
* Allows others to examine the source code and offer input, or to fix bugs.
 
  
Many of the most successful emulation projects are ones that are open source.
+
One example includes [[wikipedia:Brainfuck|Brainfuck]], a <abbr title="Meaning it can solve any problem a Turing machine can.">Turing-complete</abbr> programming language with only eight one-character commands (as opposed to the thousands of standard languages and architectures) and one instruction pointer. Another is [[wikipedia:Shakespeare Programming Language|Shakespeare]], a programming language designed to resemble a Shakespearean play. There's also [https://github.com/dylanbeattie/rockstar Rockstar], a language designed around "the lyrical conventions of 1980s hard rock and power ballads", meant to lampoon the software industry's use of "rockstar developers" in recruiting.
  
==Languages==
+
==Version control==
Include stuff about the advantages and disadvantages of each.  
+
Version control refers to the management of data as it changes. A version control system is a program that tracks changes in data. Its most common use is to allow programmers to collaborate on a source code repository without accidentally ruining any components. There are several version control systems, but the most ubiquitous by virtue of ties to the Linux kernel is Git, so much so that a ton of services are built around Git, like GitHub and GitLab. Other systems include CVS (the very first of its kind), Subversion, and another developed alongside Git called Mercurial.
  
===Assembly===
+
==Licensing==
Assembly, being tied to the machine, has the potential of fastest code. However, Assembly language is also tied to machine code, making Assembly language very difficult in programming. Also, due to its being tied to the machine, Assembly code has to be recoded into another language if the programmer wants to use the emulator in another machine, even if the operating system is the same.
+
Software is copyrightable, but the source code can be made available to users however the author chooses. A copyright license is a legal document that tells people how the software can be used and what limitations come with using it.
  
===Java===
+
;Public domain:There is no copyright (i.e. No Rights Reserved). Works enter the public domain when they:
Java is a high-level language. Code written in Java can be run anywhere due to the ubiquitousness of interpreters and is relatively easy to code. However, the Java language is notoriously prone to security exploits, sometimes day-1 exploits.
+
:# were released before the current copyright expiry date. This is why old paintings, plays, and books are so commonly quoted and used in modern works, because they'd have to negotiate the rights with the author otherwise. Most software is not released this way because it is still covered by the current American copyright term.
 +
:# are dedicated through a license like Creative Commons Zero or the Unlicense. This is the only option for modern works to be released into the public domain because, per the Berne Convention, copyright is seen as opt-out, not opt-in. If a public domain dedication can't be made (probably because the jurisdiction doesn't recognize the public domain), then the license grants users the equivalent freedoms.
 +
;Open-source:The program is released under a copyright license that permits four freedoms: that it can be run at any time, studied and modified for the user's own purposes, distributed to anyone, and improved for everyone else. This bypasses most of the issues encountered with public domain works. For anything else copyrightable, the term "open content" often applies.
 +
:It's worth noting that open-source does not replace copyright. And likewise, the license cannot be removed after the work has been released under it. To see the various open-source licenses available, see [https://choosealicense.com/ choosealicense.com]. Also see [https://choosealicense.com/appendix/ the appendix] at the same website.
 +
;Source-available:The program is released under a copyright license more restrictive than an open-source license, but the source code is still publically available. The biggest example is [[Snes9x]], which is released under a non-commercial license. This license makes it not open-source, as it restricts the users' commercial use.
 +
;Closed source / Proprietary:The program's source code isn't available. Often because the ecosystem behind the platform is closed, sometimes by nature (like Windows and Android), or sometimes by force (like every modern console).
 +
:;Freeware:The source code isn't available but the program is still free.
 +
:;Shareware / Trialware:A limited demo version of the program is free. This was common for [[Intel CPUs|DOS]] games.
  
===C++===
+
The more successful emulation projects are often open source (though you definitely will find exceptions).
C++ can be considered a compromise between Assembly and Java, despite C++ being older than (and is an antecedent to) Java. C++ compilers are ubiquitous, hence compiling C++ source code is a readily-available task. C++ is also one of the fastest 3rd generation languages. Also, most programmers already have a working knowledge of C++. However, writing in C++ is still quite complicated in coding. Also, opposed to Java, C++ code has to be explicitly complied before being able to work, lengthening turnaround times.
 
  
== References ==  
+
==See also==
 +
* [[Dynamic recompilation]]
 +
* [[ROM Hacking Resources]]
 +
<!--
 +
==References==
 
{{reflist}}
 
{{reflist}}
* (VEW04) "Using a Decompiler for Real-World Source Recovery", M. Van Emmerik and T. Waddington, the ''Working Conference on Reverse Engineering'', [[Delft]], [[Netherlands]], 9–12 November 2004. [http://www.itee.uq.edu.au/~emmerik/experience_long.pdf Extended version of the paper].
+
-->
 
+
[[Category:FAQs]]
== External links ==
 
* {{cite web| title=Obligatory accreditation system for IT security products (2008-09-22), may start from May 2009, reported by Yomiuri on 2009-04-24.|url=http://www.metafilter.com/75061/Obligatory-accreditation-system-for-IT-security-products|publisher=MetaFilter.com|accessdate=2009-04-24}}
 
* [http://rosettacode.org/wiki/Main_Page Same program written in multiple languages]
 

Revision as of 17:24, 4 February 2019

Source code is a collection of text files containing instructions that a computer either runs as-is (interpretation) or translates into an executable file beforehand (compilation / assembly). Source code is written in a human-readable programming language, which a compiler or an interpreter then translates for the machine. It's also possible to go the other way and decompile an executable file back into approximate source code, though decompilers aren't as common.

The language used changes how the source code is processed and, just like emulation, languages come in high-level and low-level types. If a program is written in an assembly language (which often involves taking advantage of system-specific attributes), an assembler translates it into machine code almost "word for word," reflecting how the machine takes instructions. If a program is written in a low-level language, a compiler reads the code and translates it into equivalent machine code. If a program is written in a high-level language, it is often run by an interpreter or a virtual machine rather than compiled to machine code ahead of time. Compilers and interpreters also check for errors, rejecting code that isn't properly written or formatted. Some languages, such as Rust, go further and check at compile time for whole classes of bugs, such as memory-safety errors.
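As a rough illustration of the difference between translating source code beforehand and running it as-is, the sketch below uses Python's built-in compile() and exec(). It assumes the CPython implementation, where the "translated" form is bytecode for a virtual machine rather than native machine code; the variable names are made up for the example.

```python
# A tiny program held as human-readable source text.
source = "result = 6 * 7"

# Step 1: translate the source into a lower-level form before running it.
# In CPython this is bytecode for a virtual machine, not native machine code.
code_object = compile(source, "<example>", "exec")

# Step 2: run the translated form.
translated_first = {}
exec(code_object, translated_first)

# Alternatively, hand the source text over directly and let it be
# translated on the fly. The outcome is the same.
run_directly = {}
exec(source, run_directly)

print(translated_first["result"], run_directly["result"])
```

Both paths end with the same value stored in result; only the moment of translation differs.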

Software can be ported to other types of computers but, without the source code, doing so is often prohibitively difficult. When the source code isn't available, the alternatives to porting include binary translation and emulation of the original platform.

Language levels

Software can be programmed in many different languages (even several in one program) and, just like high and low level emulation, those languages have different levels of abstraction. The levels below are listed from lowest to highest.

Assembly

Assembly is the closest representation of machine code without being machine code. There are basically no abstractions from the architecture, meaning each instruction the programmer writes corresponds closely to what the machine processes. This was ideal at a time when compilers weren't optimized enough to match the performance of hand-written assembly, and as a result early console games were programmed in assembly more often than in higher-level languages. Assembly is also commonplace in dynamic recompilation because it allows developers to tailor code to an architecture more closely than even a low-level language like C or C++ allows.
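To show what translating assembly "word for word" can look like, here is a minimal sketch of an assembler. The instruction set, the opcode numbers, and the assemble() helper are all invented for this example and do not correspond to any real architecture.

```python
# Hypothetical 8-bit instruction set, invented purely for illustration.
OPCODES = {"NOP": 0x00, "LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(text):
    """Translate each line of assembly into bytes, one instruction at a time."""
    machine_code = bytearray()
    for line in text.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        machine_code.append(OPCODES[mnemonic.upper()])  # one mnemonic -> one opcode byte
        for operand in operands:
            machine_code.append(int(operand, 0) & 0xFF)  # each operand -> one byte
    return bytes(machine_code)

program = """
LOAD 0x10   ; read the value at address 0x10
ADD 0x11    ; add the value at address 0x11
STORE 0x12  ; write the sum to address 0x12
HALT
"""

machine_code = assemble(program)
print(machine_code.hex())
```

Each mnemonic maps to exactly one opcode, which is why assembly gives so little abstraction from the hardware: the programmer is effectively choosing the bytes.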

Low

A low-level language allows programmers to get close to the system they work on, taking advantage of architecture or platform-specific quirks without having to learn the architecture in the detail that assembly requires. Compared to assembly, low-level languages are also easier to port to other platforms because they are more abstracted from the hardware.

Examples of low-level languages include (but are in no way limited to) C and C++.

Medium

Medium-level languages, such as Rust (which is designed to be performant and system-focused but also memory safe), combine attributes of both low-level and high-level languages. The levels aren't strict categories: some high-level languages sit lower than others.

High

High-level languages push away most system-specific quirks in favor of instructions intended to work on any platform. This approach was popularized by Java, whose goal was for developers to "write once, run anywhere".

In high-level languages, many of the same instructions can be run across different architectures and platforms. An implementation of a high-level language may use a compiler, a compiler cache, a dynamic recompiler, and/or an interpreter.

Esoteric

Esoteric languages are built around a specific idea, a joke, or a challenge. These languages are intended to be comedic, confusing, and/or thought-provoking.

One example is Brainfuck, a Turing-complete programming language with only eight one-character commands (far fewer than standard languages and architectures provide), a data pointer, and an instruction pointer. Another is Shakespeare, a programming language designed to resemble a Shakespearean play. There's also Rockstar, a language designed around "the lyrical conventions of 1980s hard rock and power ballads", meant to lampoon the software industry's talk of "rockstar developers" in recruiting.
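Because Brainfuck has only eight commands, a complete interpreter fits in a few dozen lines. The sketch below is one possible implementation, not a reference one; the 30,000-cell tape, the 8-bit wrapping cells, and reading 0 at end of input are common conventions assumed here, and run_brainfuck() is a name made up for this example.

```python
def run_brainfuck(program, input_bytes=b""):
    """Interpret a Brainfuck program and return its output as bytes."""
    # Pre-compute the position of each bracket's partner.
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000          # assumed tape size (a common convention)
    data_ptr = 0                # points at the current tape cell
    ip = 0                      # instruction pointer into the program text
    inputs = iter(input_bytes)
    output = bytearray()

    while ip < len(program):
        ch = program[ip]
        if ch == ">":
            data_ptr += 1
        elif ch == "<":
            data_ptr -= 1
        elif ch == "+":
            tape[data_ptr] = (tape[data_ptr] + 1) % 256   # assumed 8-bit wrapping cells
        elif ch == "-":
            tape[data_ptr] = (tape[data_ptr] - 1) % 256
        elif ch == ".":
            output.append(tape[data_ptr])
        elif ch == ",":
            tape[data_ptr] = next(inputs, 0)              # assumed: 0 at end of input
        elif ch == "[" and tape[data_ptr] == 0:
            ip = jumps[ip]                                # skip past the loop
        elif ch == "]" and tape[data_ptr] != 0:
            ip = jumps[ip]                                # jump back to the loop start
        ip += 1                 # any other character is treated as a comment
    return bytes(output)

# 8 * 8 = 64, plus 1 = 65, which is the character code for "A".
print(run_brainfuck("++++++++[>++++++++<-]>+."))
```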

Version control

Version control refers to the management of data as it changes. A version control system is a program that tracks changes in data. Its most common use is to allow programmers to collaborate on a source code repository without accidentally overwriting or breaking each other's work. There are several version control systems, but the most ubiquitous is Git, which was originally created for Linux kernel development; many services, like GitHub and GitLab, are built around it. Other systems include CVS (one of the earliest widely used systems), Subversion, and Mercurial, which was developed around the same time as Git.
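The idea of tracking changes between two revisions can be sketched with Python's standard difflib module. This only illustrates a line-based comparison; real version control systems store and compute changes in their own ways, and the revision contents and labels below are made up for the example.

```python
import difflib

# Two hypothetical revisions of the same file, as lists of lines.
old_revision = ["Source code is text.\n", "It is compiled.\n"]
new_revision = ["Source code is text.\n", "It is compiled or interpreted.\n"]

# Produce a unified diff: removed lines start with "-", added lines with "+",
# and unchanged context lines start with a space.
diff = list(
    difflib.unified_diff(old_revision, new_revision, fromfile="rev1", tofile="rev2")
)

print("".join(diff), end="")
```

A page like this one, which compares two revisions of an article, presents the same kind of information in a side-by-side layout.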

Licensing

Software is copyrightable, but the source code can be made available to users however the author chooses. A copyright license is a legal document that tells people how the software can be used and what limitations come with using it.

Public domain
The work carries no copyright (i.e. No Rights Reserved). Works enter the public domain when:
  1. their copyright term has expired. This is why old paintings, plays, and books are so commonly quoted and reused in modern works: nobody has to negotiate the rights with the author. Most software is too recent for this to apply, since it is still covered by the current American copyright term.
  2. they are dedicated through a license like Creative Commons Zero or the Unlicense. This is the only option for releasing a modern work into the public domain because, per the Berne Convention, copyright applies automatically and has to be waived (opt-out) rather than claimed (opt-in). If a public domain dedication can't be made (for example, because the jurisdiction doesn't recognize such dedications), the license instead grants users equivalent freedoms.
Open-source
The program is released under a copyright license that grants four freedoms: to run the program at any time, to study and modify it for the user's own purposes, to redistribute it to anyone, and to distribute improved versions for everyone else's benefit. This bypasses most of the issues encountered with public domain works. For other copyrightable works, the term "open content" often applies.
It's worth noting that an open-source license does not replace copyright; it relies on it. Likewise, the license generally cannot be revoked for a version that has already been released under it. To see the various open-source licenses available, see choosealicense.com and the appendix at the same website.
Source-available
The program is released under a copyright license more restrictive than an open-source license, but the source code is still publicly available. A well-known example is Snes9x, which is released under a non-commercial license. Because that license restricts commercial use, it does not qualify as open-source.
Closed source / Proprietary
The program's source code isn't available. This is often because the ecosystem behind the platform is closed, sometimes by nature (like Windows and Android) and sometimes by force (like every modern console).
Freeware
The source code isn't available, but the program is still free of charge.
Shareware / Trialware
A limited demo version of the program is free. This was common for DOS games.

The more successful emulation projects are often open source, though there are exceptions.

See also