Data Processing: Glossary

Key Points

Processing Text Files with grep and awk
  • Both grep and awk are powerful command-line tools for searching, filtering and transforming text files.
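For readers coming from Python, here is a rough stdlib-only sketch of what a pipeline such as grep 'ERROR' log.txt | awk '{ print $1 }' does. It keeps the lines that match a pattern, then splits each kept line on whitespace and prints its first field. The log lines are made-up sample data, and the helper names grep and awk_field are hypothetical. The sketch only mimics the default behaviour of the two tools.

```python
import re

# Made-up sample input standing in for the contents of a text file.
LINES = [
    "2024-01-01 ERROR disk full",
    "2024-01-01 INFO started",
    "2024-01-02 ERROR timeout",
]

def grep(pattern, lines):
    """Keep only the lines matching a regular expression, like `grep PATTERN`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def awk_field(lines, n):
    """Return the n-th whitespace-separated field (1-based), like `awk '{ print $n }'`."""
    result = []
    for line in lines:
        fields = line.split()  # splits on runs of whitespace, as awk does by default
        # awk prints an empty string when the field does not exist
        result.append(fields[n - 1] if len(fields) >= n else "")
    return result

if __name__ == "__main__":
    for field in awk_field(grep(r"ERROR", LINES), 1):
        print(field)
```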

Using Regular Expressions with Python
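  • Python's re module supports essentially the same regular expression syntax as Perl. The surrounding language is what differs. Perl builds matching into the language through operators such as =~, while Python calls functions of the re module, so regular expressions are used in a substantially different way.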
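In Perl, a match is written with an operator, for example `$text =~ /(\d+)-(\d+)/`. In Python, the same pattern is passed as a string to re functions, which return a match object or None. The minimal sketch below shows that difference on made-up input. The pattern and the helper parse_range are illustrative only.

```python
import re

# Raw strings (r"...") keep backslashes intact, so the pattern reads as in Perl.
RANGE = re.compile(r"(\d+)-(\d+)")

def parse_range(text):
    """Return the two numbers of the first 'start-end' range in text, or None."""
    match = RANGE.search(text)   # Perl: $text =~ /(\d+)-(\d+)/
    if match is None:
        return None
    # Perl exposes captured groups as $1 and $2; Python uses match.group(n).
    return int(match.group(1)), int(match.group(2))

print(parse_range("pages 12-34"))          # (12, 34)
print(RANGE.sub(r"\2-\1", "pages 12-34"))  # Perl: s/(\d+)-(\d+)/$2-$1/ -> pages 34-12
```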

Structured Text (XML and JSON)
  • XML is simpler than SGML, and JSON is simpler still. Its grammar is much smaller than XML's, and it maps more directly onto the data structures of modern programming languages (objects/dictionaries, arrays/lists, strings and numbers). Both formats are commonly used to exchange data between applications.
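The sketch below illustrates why JSON maps more directly onto program data. It is a minimal stdlib example that encodes the same made-up record in both formats. json.loads returns ordinary dicts, lists and numbers. With xml.etree.ElementTree, the values come back as strings inside elements and attributes, so they must be reassembled and converted by hand. The element and field names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up record encoded in both formats.
JSON_TEXT = '{"station": "A1", "readings": [1.5, 2.0, 2.5]}'
XML_TEXT = """<record station="A1">
  <reading>1.5</reading>
  <reading>2.0</reading>
  <reading>2.5</reading>
</record>"""

# JSON decodes straight into Python dicts, lists, strings and floats.
from_json = json.loads(JSON_TEXT)

# XML yields a tree of elements; attribute values and text are strings,
# so the structure and the numeric conversion have to be coded explicitly.
root = ET.fromstring(XML_TEXT)
from_xml = {
    "station": root.get("station"),
    "readings": [float(el.text) for el in root.findall("reading")],
}

print(from_json == from_xml)  # True
```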

Binary formats: NetCDF and HDF5
  • Storing large numerical arrays as text is highly inefficient for both humans and machines. NetCDF and HDF5 are de facto standards for storing numerical data together with its metadata.
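To give a feel for the cost of text storage, the stdlib-only sketch below compares the same made-up array of 1000 doubles written as decimal text and as raw 8-byte binary values. It does not use NetCDF or HDF5. Those formats are normally accessed through packages such as netCDF4 or h5py, which add array shapes, metadata, chunking and compression on top of a binary layout like this one.

```python
from array import array

# Made-up data: 1000 double-precision values.
values = [i / 3 for i in range(1000)]

# Text: one decimal number per line. repr() gives the shortest string that
# converts back to exactly the same float.
as_text = "\n".join(repr(v) for v in values).encode("ascii")

# Binary: the raw 8-byte IEEE 754 doubles, in this machine's native byte order
# (NetCDF and HDF5 additionally record byte order, array shapes and metadata).
as_binary = array("d", values).tobytes()

print("text bytes:", len(as_text), "binary bytes:", len(as_binary))

# Both forms round-trip exactly, but the text form must be parsed number by number.
parsed = [float(s) for s in as_text.decode("ascii").split()]
restored = array("d")
restored.frombytes(as_binary)
assert parsed == values
assert list(restored) == values
```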

Creating simple Databases with SQLite
  • SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language.
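The following minimal sketch uses Python's built-in sqlite3 module. It creates a table, inserts rows with parameter placeholders, and lets SQLite compute an aggregate. The table name runs and its columns are hypothetical. The database lives in memory here, and passing a file name to connect() would store it on disk instead.

```python
import sqlite3

# ":memory:" creates a temporary database; pass a file name such as
# "experiments.db" to store it on disk instead (no server process needed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, sample TEXT, value REAL)")

# "?" placeholders let sqlite3 pass values safely instead of pasting them into SQL.
rows = [("A", 1.5), ("B", 2.5), ("A", 3.5)]
conn.executemany("INSERT INTO runs (sample, value) VALUES (?, ?)", rows)
conn.commit()

# Average value per sample, computed by SQLite itself.
query = "SELECT sample, AVG(value) FROM runs GROUP BY sample ORDER BY sample"
averages = conn.execute(query).fetchall()
print(averages)  # [('A', 2.5), ('B', 2.5)]
conn.close()
```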

NoSQL databases with MongoDB
  • MongoDB is a free, source-available, cross-platform, document-oriented database program. It stores data as JSON-like documents, which makes it well suited to storing research data.
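The sketch below shows what "JSON-like documents" means in practice. A record is a nested, self-describing structure, and two records in the same collection need not share a schema. The field names and values are hypothetical. The pymongo call mentioned in the comment (insert_many) requires a running MongoDB server, so it is not executed here. The code only checks that the documents survive a JSON round trip.

```python
import json

# A hypothetical research record. MongoDB stores documents like this
# (internally as BSON, a binary JSON-like encoding) without a fixed table schema.
measurement = {
    "sample_id": "S-001",
    "instrument": {"name": "spectrometer", "serial": 1234},
    "temperatures_K": [293.1, 293.4, 293.2],
    "tags": ["pilot", "calibration"],
}

# A second record in the same collection may carry different fields.
other = {"sample_id": "S-002", "notes": "no temperature readings"}

# With the pymongo driver and a running server, such dicts could be inserted
# directly, e.g. collection.insert_many([measurement, other]) (not executed here).
print(json.dumps(measurement, indent=2))
```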

Machine Learning (scikit-learn, Keras, and TensorFlow)
  • Neural networks and deep learning are areas of very active development. TensorFlow and Keras are two popular tools in that area, while scikit-learn provides many classical machine-learning algorithms behind a common interface.
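Keras and TensorFlow build networks from many connected units and train them with gradient-based optimizers. The pure-Python sketch below shows the basic unit at the smallest possible scale: a single artificial neuron trained on logical AND with the classic perceptron learning rule. It is an illustration only. It uses neither library, and real networks use differentiable activations rather than this step function. The names predict and train_perceptron are hypothetical.

```python
# Training data for logical AND: inputs (x1, x2) and the expected output.
SAMPLES = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = [0, 0, 0, 1]

def predict(weights, bias, x):
    """Step activation: fire (1) when the weighted sum is positive."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, labels, epochs=20, rate=1):
    """Perceptron learning rule: nudge weights by rate * error * input."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

weights, bias = train_perceptron(SAMPLES, LABELS)
print([predict(weights, bias, x) for x in SAMPLES])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron rule is guaranteed to converge. Problems that are not linearly separable, such as XOR, need the multi-layer networks that Keras and TensorFlow are designed for.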

Glossary

FIXME