Reference

incoder edited this page Mar 6, 2018 · 10 revisions

(Under development)

About IO

Io is a moon of Jupiter and, at the same time, IO is the abbreviation of Input/Output. The library is designed to provide universal C++ input and output for well-known industry-standard data formats, such as textual XML and JSON or binary ASN.1 and Google Protocol Buffers.

At the moment only XML is implemented. Other data formats are under construction.

IO is a cross-platform and cross-compiler C++11 (or newer) run-time library. At the moment the following configurations are supported (tested):

Building and installing library

IO is a run-time library that can be used as a static or dynamic (DLL or shared) library. To use it, you need to build it first. There are several build options, which are explained in README.md. This section explains the prerequisites.

Microsoft Windows with GCC (MinGW64) and MSYS2

Download and install MSYS2 if you have not done so yet. Install the GNU Compiler Collection (GCC) as described in the MSYS2 documentation.

Then install the GNU libiconv and gnutls-devel packages with pacman, using the following command:

pacman -S pkg-config mingw-iconv gnutls-devel

Now you can build the library using GNU make or CMake, as described in README.md.

Microsoft Windows with Microsoft Visual C++ (Visual Studio)

To build the library with the Microsoft C++ compiler you need Visual Studio or Visual Studio Build Tools version 15 or newer, with the C++ compiler option installed. Building with nmake is described in README.md. If you'd like to build with CMake, using cmake-gui is recommended.

Linux/POSIX

Make sure that you have the following installed:

  1. GCC with the G++ packages. Version 4.7 is the minimum for C++11; GCC 5 or newer is recommended
  2. The development package of the GNU TLS library. Version 3.0 is the minimum required version
  3. The pkg-config package

Now you can build with GNU make or CMake, as described in README.md.

Byte buffer

Byte buffer is a dynamic memory array container backed by a uint8_t data array. Unlike std::vector or other STL containers, byte buffer is designed especially for input and output operations.

Differences from std::vector<uint8_t>

  • Byte buffer is non-copyable but movable. For example, if you need to pass a buffer as a function parameter, you should use a reference or an rvalue (move) reference;
  • Unlike a vector, byte buffer cannot extend the underlying dynamically allocated memory block without your direct instruction. Memory allocation and reallocation are under your direct control;
  • Unlike STL containers, byte buffer methods never throw, including std::bad_alloc, so the buffer can be used when exception support is turned off by compiler options;
  • Byte buffer has no begin and end iterators. Position and last iterators exist instead. Position points to the first byte where data can be pushed, and last points just past the last filled byte.

Typical usage of byte_buffer:

    std::error_code ec;
    // allocate 1024 bytes; a failure is reported through ec
    io::byte_buffer buff = io::byte_buffer::allocate(ec, 1024);
    io::check_error_code(ec);
    buff.put(123);
    buff.put("Hello world!");
    buff.put(123.456);
    // switch the buffer from filling to draining
    buff.flip();
    out->write(ec, buff.position().get(), buff.size() );
    io::check_error_code(ec);

or

    std::error_code ec;
    io::byte_buffer buff = io::byte_buffer::allocate(ec, 1024);
    io::check_error_code(ec);
    // read directly into the buffer memory
    std::size_t read = in->read(ec, buff.position().get(), buff.capacity() );
    io::check_error_code(ec);
    // advance the position by the number of bytes read
    buff.move(read);
    buff.flip();
    out->write(ec, buff.position().get(), buff.size() );
    io::check_error_code(ec);
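To make the position/last idea more concrete, here is a minimal self-contained sketch of a non-copyable, movable, non-throwing buffer. The class mini_buffer and all of its members are hypothetical stand-ins written for this page, not the real io::byte_buffer, and the flip semantics are an assumption modeled on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for illustration; this is not io::byte_buffer.
class mini_buffer {
public:
    // Allocation is explicit; failure yields an empty buffer instead of a throw.
    static mini_buffer allocate(std::size_t capacity) noexcept {
        mini_buffer result;
        result.data_.reset(new (std::nothrow) std::uint8_t[capacity]);
        if (result.data_) result.capacity_ = capacity;
        return result;
    }

    mini_buffer(const mini_buffer&) = delete;             // non-copyable
    mini_buffer& operator=(const mini_buffer&) = delete;
    mini_buffer(mini_buffer&& other) noexcept             // movable
        : data_(std::move(other.data_)), capacity_(other.capacity_),
          position_(other.position_), last_(other.last_) {
        other.capacity_ = other.position_ = other.last_ = 0;
    }

    // Appends at the write position; the block is never grown implicitly.
    bool put(const void* src, std::size_t n) noexcept {
        assert(src != nullptr || n == 0);
        if (n > capacity_ - position_) return false;
        if (n != 0) std::memcpy(data_.get() + position_, src, n);
        position_ += n;
        if (position_ > last_) last_ = position_;
        return true;
    }

    // Switches from filling to draining: "last" marks the end of the data,
    // "position" goes back to the first byte.
    void flip() noexcept { last_ = position_; position_ = 0; }

    const std::uint8_t* position() const noexcept { return data_.get() + position_; }
    std::size_t length() const noexcept { return last_ - position_; }
    std::size_t capacity() const noexcept { return capacity_; }
    explicit operator bool() const noexcept { return static_cast<bool>(data_); }

private:
    mini_buffer() noexcept : capacity_(0), position_(0), last_(0) {}
    std::unique_ptr<std::uint8_t[]> data_;
    std::size_t capacity_, position_, last_;
};
```

A caller fills the buffer with put, calls flip, and then hands position() and length() to a write call. A put that does not fit returns false rather than reallocating.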

Generic Input and output interface

According to Bjarne Stroustrup, C++ does not have built-in input and output operators, because they can be implemented in the language itself. This means that input and output can be implemented by a library, and the C++ standard library provides input and output streams as the reference implementation.

What is wrong with the C++ standard library streams?

  • Streams were designed for text data formats, i.e. to replace the C printf family of functions. They are not really comfortable for working with binary data.
  • Streams have a somewhat strange abstraction design. For instance, streams depend on the underlying stream buffer implementation, which means a data processing algorithm has a dependency on the data obtaining algorithm. This may lead to bad or over-complicated software design.
  • Streams do not have comfortable error handling methods, e.g. error handling is mostly based on the exception model.
  • Streams have a "close" method, which may lead to unexpected issues in data processing application logic, for example when a closed stream object is used for obtaining data. This is especially risky for multi-threaded data processing algorithms.

IO input output design principles

  • A generic API for input/output operations, wherever data comes from or should be put into;
  • Resource Acquisition Is Initialization (RAII): a call to a constructor or factory method obtains the input/output resource, and the destructor call closes the resource;
  • Input/output errors are not an exceptional case and should be easy to handle and process;
  • A call to any input or output method should be exception safe, i.e. it should not throw (noexcept) and should guarantee not throwing;
  • There should be an option to choose how input/output errors are handled, i.e. to use or not to use C++ exceptions;
  • Textual input/output should be built on top of the binary API;
  • C++11 language and standard library features such as noexcept, move semantics with perfect forwarding, and system error should be used where relevant;
  • Smart pointers (smart references) for polymorphic class objects, rather than raw pointers, references or scoped resource owning like foo(std::ofstream&).

Implementation principles

  • Compile twice run everywhere
  • No side effects where possible, or a guarantee of thread safety where side effects cannot be avoided
  • Constructors and destructors must not throw
  • Reference implementation streams should use operating system calls rather than the C library, to avoid the overhead of C library file buffers;
  • Performance does matter, but as a compromise with the development effort
  • As few external dependencies as possible, to avoid DLL hell

Smart pointer with intrusive reference counting strategy

C++ Technical Report 1 introduced a generic smart pointer library into the C++ standard library. The interfaces were taken from Boost smart pointers. However, the intrusive reference counting strategy is implemented through make_shared/enable_shared_from_this instead of the boost::intrusive_ptr template. make_shared/enable_shared_from_this together with shared_ptr have a benefit: they are generic and can be used with legacy code without modifying it. At the same time, this approach uses 8 or 16 additional bytes per smart reference on 32-bit and 64-bit CPU architectures respectively. IO is a new library, so it can save some memory by using boost::intrusive_ptr.

If you are using Boost in your project, you can define the IO_HAS_BOOST macro and let the IO build system know about Boost (CMake will pick it up automatically if Boost is available). Otherwise IO will use an embedded intrusive_ptr extracted from Boost.
IO provides type definitions for smart references to avoid long type names like boost::intrusive_ptr<io::read_channel> or multiple unreadable definitions like auto rch = f.open_for_read(ec);

Short names always follow the pattern s_[reference_type_name], for example s_read_channel.

IO also provides a base class called io::object to simplify implementing classes that use the intrusive reference counting strategy.

If you are implementing your own interface that is expected to be used with an intrusive smart pointer, you can use the following technique:

    class my_operation: public io::object {
    public:
        constexpr my_operation() noexcept: io::object() {}
        void foo() {}
        virtual ~my_operation() noexcept override;
    };

    my_operation::~my_operation() noexcept {}

    // declares the s_my_operation smart reference type
    DECLARE_IPTR(my_operation);

    // nothrow new returns a null pointer on allocation failure
    s_my_operation op(new (std::nothrow) my_operation() );
    if(op) op->foo();
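For readers unfamiliar with the intrusive strategy, the following is a minimal sketch of how it works in general. The names counted, intrusive_ref and job are hypothetical and exist only for this illustration; they are not io::object or boost::intrusive_ptr, whose real interfaces are richer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical base class: the reference counter lives inside the object.
class counted {
public:
    counted() noexcept : refs_(0) {}
    counted(const counted&) = delete;
    counted& operator=(const counted&) = delete;
    virtual ~counted() noexcept = default;

    void retain() noexcept { refs_.fetch_add(1, std::memory_order_relaxed); }
    void release() noexcept {
        assert(refs_.load() > 0);
        // The last owner destroys the object.
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
    }
    std::size_t use_count() const noexcept { return refs_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> refs_;
};

// Hypothetical smart reference: it stores a single raw pointer and nothing else.
template <class T>
class intrusive_ref {
public:
    intrusive_ref() noexcept : ptr_(nullptr) {}
    explicit intrusive_ref(T* p) noexcept : ptr_(p) { if (ptr_) ptr_->retain(); }
    intrusive_ref(const intrusive_ref& o) noexcept : ptr_(o.ptr_) { if (ptr_) ptr_->retain(); }
    intrusive_ref(intrusive_ref&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
    intrusive_ref& operator=(intrusive_ref o) noexcept { std::swap(ptr_, o.ptr_); return *this; }
    ~intrusive_ref() noexcept { if (ptr_) ptr_->release(); }

    T* operator->() const noexcept { return ptr_; }
    explicit operator bool() const noexcept { return ptr_ != nullptr; }

private:
    T* ptr_;
};

// Example payload type used to exercise the sketch.
struct job : counted {
    int value = 42;
};
```

Because the counter is a member of the object, the smart reference is the size of one raw pointer, which is the memory saving this section describes.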

Channels

The channels.hpp header defines pure virtual classes (interfaces) with the generic input/output API.
The core interfaces are read_channel and write_channel. These provide synchronous low-level binary input/output operations. There is also read_write_channel for resources like TCP network sockets, and the random_access_channel interface for resources like files.

Transferring data between read and write channels

If you need a simple operation, such as copying a file or writing socket input into a file, you can use the transfer function. transfer takes a source read channel and a destination write channel, together with an error code and a temporary buffer size, as function arguments and runs the data transferring loop.
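The general shape of such a transfer loop can be sketched as follows. The reader and writer interfaces, the in-memory endpoints and this transfer signature are assumptions made for the example; they only mimic the "error code first" style of the IO channels and are not the library's declarations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical minimal channel interfaces.
struct reader {
    virtual std::size_t read(std::error_code& ec, std::uint8_t* to, std::size_t max) = 0;
    virtual ~reader() = default;
};
struct writer {
    virtual std::size_t write(std::error_code& ec, const std::uint8_t* from, std::size_t n) = 0;
    virtual ~writer() = default;
};

// In-memory source: returns 0 when all data has been consumed.
class memory_reader final : public reader {
public:
    explicit memory_reader(std::string data) : data_(std::move(data)), pos_(0) {}
    std::size_t read(std::error_code&, std::uint8_t* to, std::size_t max) override {
        const std::size_t n = std::min(max, data_.size() - pos_);
        if (n != 0) std::memcpy(to, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
private:
    std::string data_;
    std::size_t pos_;
};

// In-memory sink that accepts at most 3 bytes per call,
// so the partial-write path of the loop is exercised.
class memory_writer final : public writer {
public:
    std::size_t write(std::error_code&, const std::uint8_t* from, std::size_t n) override {
        const std::size_t accepted = std::min<std::size_t>(n, 3);
        out.append(reinterpret_cast<const char*>(from), accepted);
        return accepted;
    }
    std::string out;
};

// Reads chunks from src and writes each chunk completely to dst.
// Returns the number of bytes written; stops on the first error.
std::size_t transfer(std::error_code& ec, reader& src, writer& dst, std::size_t buffer_size) {
    assert(buffer_size > 0);
    std::vector<std::uint8_t> buffer(buffer_size);
    std::size_t total = 0;
    for (;;) {
        const std::size_t got = src.read(ec, buffer.data(), buffer.size());
        if (ec || got == 0) break;
        std::size_t done = 0;
        while (done < got) {
            const std::size_t put = dst.write(ec, buffer.data() + done, got - done);
            if (ec) return total;
            if (put == 0) {  // no progress: report it instead of spinning forever
                ec = std::make_error_code(std::errc::io_error);
                return total;
            }
            done += put;
            total += put;
        }
    }
    return total;
}
```

The inner loop matters because a write call may accept fewer bytes than requested, which is common for sockets and pipes.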

Compatibility with C++ standard library streams

The <stream.hpp> header contains templates which can be used to quickly build std::istream and std::ostream streams on top of read and write channels. There are predefined type definitions for char, wchar_t, char16_t and char32_t streams. See the iostreams example.
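As a rough illustration of the idea, a std::ostream can be layered over any byte sink by deriving from std::streambuf. The byte_sink and sink_streambuf types below are hypothetical and written only for this sketch; the templates in <stream.hpp> may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical write endpoint standing in for a write channel.
struct byte_sink {
    std::string received;
    std::size_t write(const char* from, std::size_t n) {
        received.append(from, n);
        return n;
    }
};

// Stream buffer that collects characters in a small array and
// forwards them to the sink when the array is full or on flush.
class sink_streambuf final : public std::streambuf {
public:
    explicit sink_streambuf(byte_sink& sink) : sink_(sink) {
        setp(buffer_, buffer_ + sizeof(buffer_));
    }
    ~sink_streambuf() override { sync(); }

protected:
    int_type overflow(int_type ch) override {
        if (sync() != 0) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    int sync() override {
        const std::size_t pending = static_cast<std::size_t>(pptr() - pbase());
        std::size_t done = 0;
        while (done < pending) {
            const std::size_t n = sink_.write(pbase() + done, pending - done);
            if (n == 0) return -1;  // the sink made no progress
            done += n;
        }
        setp(buffer_, buffer_ + sizeof(buffer_));  // the put area is empty again
        return 0;
    }

private:
    byte_sink& sink_;
    char buffer_[16];
};
```

A std::ostream constructed with a pointer to sink_streambuf then formats text as usual, while the bytes end up in the sink.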

Files

Files are the most common use of an input/output system. The files.hpp header provides an operating-system-dependent file descriptor API. The file class has the same API on Microsoft Windows and on POSIX-like operating systems (GNU/Linux, Mac OS X, FreeBSD etc.) but different implementations. You can check file existence, create a new file, and open a file for reading, for writing, or in read/write mode with random access.

Network Sockets

IO provides a TCP/IP sockets channel. The implementation is based on the system sockets API, i.e. Winsock2 on MS Windows and Berkeley sockets on POSIX. At the moment only a synchronous client-side TCP socket channel is implemented.

The sockets.hpp header declares the networking interfaces.

Unlike Asio/Boost.Asio or the proposed C++ standard library networking extension (the Networking TS, based on Asio), IO doesn't care about IPv4 or IPv6 at the API level. You can open a read/write channel from an IPv4 or IPv6 address or from a DNS host name.

For example:

    std::error_code ec;
    const io::net::socket_factory *sf = io::net::socket_factory::instance(ec);
    io::check_error_code(ec);
    // host name or IP address, then the port number
    io::s_socket tcp_socket = sf->client_tcp_socket(ec, "google.com", 80);
    io::check_error_code(ec);
    io::s_read_write_channel raw_ch = tcp_socket->connect(ec);
    io::check_error_code(ec);

SSL and TLS security channels (not provided for MS VC++)

If you need secured, encrypted TCP/IP, i.e. TLS/SSL sockets, you can use the <net/secure_channel.hpp> implementation. This implementation is built on top of the GNU TLS library.

Generally, obtaining a secure channel looks like the following:

    // tcp_socket is created by the socket factory, as shown in the Network Sockets section
    std::error_code ec;
    const io::net::secure::service *sec_service = io::net::secure::service::instance(ec);
    io::check_error_code( ec );
    io::s_read_write_channel raw_ch = tcp_socket->connect(ec);
    io::check_error_code( ec );
    // wrap the raw TCP channel into an encrypted one
    io::s_read_write_channel sch = sec_service->new_client_connection(ec, std::move(raw_ch) );
    io::check_error_code( ec );

Alternatively, you can implement it on top of another TLS/SSL implementation such as Botan, OpenSSL, Mbed TLS etc. Using Windows SChannel/SSPI is not recommended, since TLS version 1.2 is provided only starting from the newest Windows 10.

Uniform Resource Identifier (URI)

IO has a class for working with Uniform Resource Identifiers (URI/URL). The interface can be found in the <net/uri.hpp> header. The class is able to split (parse) a URI into sections and contains a list of ports for well-known protocols.
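To show what "splitting into sections" and "well-known ports" mean in practice, here is a deliberately small sketch. The uri_parts structure and the split_uri and default_port functions are hypothetical and unrelated to the real <net/uri.hpp> interface; the sketch ignores user info, IPv6 literals, queries and fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical result type for this illustration.
struct uri_parts {
    std::string scheme, host, path;
    unsigned port = 0;
};

// A few well-known default ports; 0 means "unknown scheme".
inline unsigned default_port(const std::string& scheme) {
    if (scheme == "http") return 80;
    if (scheme == "https") return 443;
    if (scheme == "ftp") return 21;
    return 0;
}

// Splits "scheme://host[:port][/path]"; returns false on malformed input.
inline bool split_uri(const std::string& text, uri_parts& out) {
    const std::size_t scheme_end = text.find("://");
    if (scheme_end == std::string::npos || scheme_end == 0) return false;
    out.scheme = text.substr(0, scheme_end);

    const std::size_t host_begin = scheme_end + 3;
    std::size_t path_begin = text.find('/', host_begin);
    if (path_begin == std::string::npos) path_begin = text.size();

    std::string authority = text.substr(host_begin, path_begin - host_begin);
    const std::size_t colon = authority.rfind(':');
    if (colon != std::string::npos) {
        const std::string digits = authority.substr(colon + 1);
        if (digits.empty() ||
            digits.find_first_not_of("0123456789") != std::string::npos) return false;
        const unsigned long port = std::strtoul(digits.c_str(), nullptr, 10);
        if (port > 65535) return false;
        out.port = static_cast<unsigned>(port);
        authority.erase(colon);
    } else {
        out.port = default_port(out.scheme);  // fall back to the scheme default
    }
    if (authority.empty()) return false;
    out.host = authority;
    out.path = (path_begin < text.size()) ? text.substr(path_begin) : std::string("/");
    return true;
}
```

When the URI carries no explicit port, the port is taken from the scheme table, which is the role of the list of well-known ports mentioned above.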

Error handling

Generally, the IO API uses the C++ standard library system error functionality for handling errors, rather than exceptions. If a function can fail for some reason, for example because of hardware or networking issues during input/output or an out-of-memory state, the function takes a reference to a std::error_code as its first argument. You can handle the error according to your requirements or, when you are fine with program termination on error, you can use the check_error_code(std::error_code&) function. This function checks the error code and, if it identifies an error, prints the error message to the process error stream. A Windows GUI application will show the error in a MessageBoxEx pop-up dialog. If C++ exceptions were off when the IO binary was built, check_error_code then exits the current process, returning the error code as the process exit result. If exceptions were on, check_error_code throws std::system_error.
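The convention can be illustrated with plain standard library code. Both functions below, copy_bytes and throw_on_error, are hypothetical examples written for this page; throw_on_error only imitates what a check helper might do in an exceptions-enabled build and is not check_error_code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <system_error>

// Hypothetical operation in the "error code first" style:
// it never throws and reports failure through ec.
std::size_t copy_bytes(std::error_code& ec, const char* src, std::size_t n,
                       char* dst, std::size_t dst_size) noexcept {
    if (n > dst_size) {
        ec = std::make_error_code(std::errc::no_buffer_space);
        return 0;
    }
    ec.clear();
    if (n != 0) std::memcpy(dst, src, n);
    return n;
}

// Imitation of a check helper for builds where exceptions are enabled.
void throw_on_error(const std::error_code& ec) {
    if (ec) throw std::system_error(ec);
}
```

A caller that wants to recover inspects ec directly, while a caller that prefers to fail fast passes ec to the check helper right after each call.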

unsafe wrapper template

If you don't want to call check_error_code each time you call a read or write method, part of the API has an unsafe wrapper template. For example, code without unsafe looks like the following:

    std::error_code ec;
    std::size_t read = in->read(ec, array, bytes);
    io::check_error_code(ec);
    std::size_t written = 0, wrt;
    // a single write may accept only a part of the data
    while(written != read) {
        wrt = out->write(ec, array, read-written);
        io::check_error_code(ec);
        written += wrt;
        array += wrt;
    }

And with unsafe it looks like:

    std::error_code ec;
    io::unsafe<io::read_channel> src( std::move(in) );
    io::unsafe<io::write_channel> dst( std::move(out) );
    std::size_t read = src.read(ec, array, bytes);
    std::size_t written = 0, wrt;
    while(written != read) {
        wrt = dst.write(ec, array, read-written);
        written += wrt;
        array += wrt;
    }
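One way such a wrapper could be built is to forward every call and check the error code in one place. The checked template and the flaky_reader type below are hypothetical and written only for this sketch; they are not io::unsafe, and throwing std::system_error is just one of the behaviours the section above describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <system_error>
#include <utility>

// Hypothetical channel that either fills the buffer or fails.
struct flaky_reader {
    bool fail;
    std::size_t read(std::error_code& ec, char* to, std::size_t max) noexcept {
        if (fail) {
            ec = std::make_error_code(std::errc::io_error);
            return 0;
        }
        std::memset(to, 'x', max);
        return max;
    }
};

// Hypothetical wrapper: forwards the call, then checks ec centrally,
// so the calling code does not repeat the check after each operation.
template <class Channel>
class checked {
public:
    explicit checked(Channel ch) : ch_(std::move(ch)) {}

    std::size_t read(std::error_code& ec, char* to, std::size_t max) {
        const std::size_t n = ch_.read(ec, to, max);
        if (ec) throw std::system_error(ec);
        return n;
    }

private:
    Channel ch_;
};
```

The trade-off is that errors are no longer handled at the call site, which is presumably why the wrapper carries the name unsafe.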

Console framework

IO provides console read and write channels for console (terminal) mode. To access the console API, include the <console.hpp> header. The IO console has the following advantages over the standard library std::cin/std::cout/std::cerr streams:

  • Locale is fully under your control, i.e. neither the C library locale nor the default std::locale (imbue) is used.
  • Support for colored input and output
  • Support for Unicode input and output, including the Windows console
  • You can output a huge amount of text to the console without the stream buffers being flushed repeatedly on overflow

The console has the following disadvantages compared to standard streams:

  • On Windows, piping does not work, i.e. myapp.exe >> log.txt will produce a 0-byte file;
  • On Windows, a GUI application will allocate a console, i.e. the application windows plus an additional console window;
  • If you put binary data such as floats or integers directly into the console binary channels without converting them into string values, the result is undefined.

Alternative cin, cout and cerr

Standard-library-like console streams are provided by the io::console class. All of them support UTF-8 input data; character set conversion (transcoding) is done automatically.

Text

Character sets and UNICODE

IO contains an API for converting string data between different code pages. Conversions are built on top of the raw iconv C API. POSIX libc provides iconv out of the box, whereas MS Windows needs iconv as an additional DLL. If you simply need to convert between const char*/const wchar_t*/const char16_t*/const char32_t* raw C character arrays, you can use the transcode family of functions, which can be found in the charsetcvt.hpp header. The API for standard library strings (std::string) can be found in the <text.hpp> header. See the chconv example.

Non cryptographic string hashing

If you need a predictable and fast non-cryptographic hash function for strings or arrays of raw data, you can find an API in the hashing.hpp header. The hash_bytes function provides the 32-bit Murmur hash function on 32-bit CPU architectures, and Google CityHash on 64-bit CPU architectures.

Constant string

const_string is a container for a dynamically allocated raw C-style zero-terminated string. const_string is neither std::string nor C++17 string_view. It is assumed that you will store UTF-8 characters in this string. The benefits of this class are the following:

  • Immutability: it can be used as a class field
  • Works like an intrusive smart pointer
  • Can be converted into a mutable standard library string in UTF-8, UTF-16[LE|BE] or UTF-32[LE|BE] (not exception safe)

Save memory with dynamic string pooling

If your program is expected to work with a large number of identical strings allocated in dynamic memory, you can use IO string pooling. String pooling is built on top of std::unordered_map and stores cached_string objects. cached_string is similar to const_string in many aspects, but a few functions are implemented differently. For example, comparing two cached_string objects compares the underlying pointers instead of calling std::strcmp. See the stringpool example for more details.
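The pooling idea itself fits in a few lines. The string_pool class below is a hypothetical sketch that uses std::unordered_set and std::string for brevity; it is not the IO string pool and does not use cached_string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical pool: every distinct text is stored exactly once.
class string_pool {
public:
    // Returns the address of the single pooled copy of the text.
    // Element addresses in std::unordered_set stay valid across rehashing.
    const std::string* intern(const std::string& text) {
        return &*pool_.insert(text).first;
    }
    std::size_t size() const noexcept { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Two handles obtained from the same pool refer to equal text exactly when the pointers are equal, so equality becomes a pointer comparison instead of a character-by-character one.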

XML

IO contains functionality for reading and writing the eXtensible Markup Language (XML) data format. IO XML differs from most other C/C++ libraries for XML parsing and XML processing.

What is inside

  • Java-like Streaming API for XML parsing, StAX (pull parsing API)
  • No dependencies on other C/C++ XML libraries, i.e. IO XML is not a wrapper around expat/msxml/libxml2 etc.
  • XML reader API to simplify reading XML into POCO structures or primitive classes
  • XML parsing and XML reading work with the exceptions and RTTI compiler options turned off
  • Writing XML from POCO classes using template meta-programming techniques; the XML format can be specified, i.e. whether to use tags or tag attributes
  • Generating an XSD schema from POCO classes using template meta-programming techniques
  • Auto-detection of latin1/ASCII/CP-1252/UTF-8/UTF-[16|32][LE|BE] XML file encoding
  • Lexical cast API for XML

Differences from full XML processors

This is a non-validating parser, i.e. no XML structure validation using DTD or XSD is provided yet.

XML syntax is validated, i.e. the parser checks for valid XML characters, a correct XML prologue and initial section, correct XML names, and W3C attribute rules such as only one attribute with the same name and a balanced root node.

XML Parsing with StAX XML pull API

A pull API for parsing XML. Unlike SAX or SAX-like parsers (for example expat), you don't have to pass any callbacks to the parser. The parsing flow is fully under your control. This API is useful if you need to process huge XML files, or need to extract only specific data from a huge XML document. Memory used internally is limited to a maximum of 16 MiB. Initially the parser uses a memory buffer of one OS page (4 KiB in most cases); the buffer grows exponentially each time the parser needs more data, until the 16 MiB limit is reached. A complete parsing example with comments can be found in the xmlparse example.
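The buffer growth policy described above can be sketched as a small function. The names initial_size, max_size and next_buffer_size are hypothetical, and the growth factor of 2 is an assumption, since the text only says that growth is exponential.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t initial_size = 4096;               // assumed OS page size
constexpr std::size_t max_size = 16u * 1024u * 1024u;    // 16 MiB cap from the text

// Returns the next buffer size, or 0 when the cap has already been reached.
inline std::size_t next_buffer_size(std::size_t current) noexcept {
    assert(current > 0);
    if (current >= max_size) return 0;
    // Double the size, but never go past the cap.
    return (current > max_size / 2) ? max_size : current * 2;
}
```

Under these assumptions a 4 KiB buffer reaches the 16 MiB cap after 12 doublings, so the number of reallocations stays small even for very large documents.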

Reading XML into POCO classes or raw C structures

There is a facade on top of the event reader API to simplify reading data into POCO structures. A complete parsing example with comments can be found in the xml_deserializing example.

Writing XML and generating XSD schema from POCO classes

The library uses template meta-programming for reflection-like serialization of POCO classes into XML. All you need to do is provide the required XML structure to the XML writer; a complete example can be found in xml_marshalling. When C++ RTTI is on, which is a good idea for debug builds, you can also generate an XSD schema. NOTE! This functionality is not exception safe, unlike most parts of the library.
