Monthly Archives: August 2017

8 RULES FOR FAST DATA MANAGEMENT AND ANALYTICS

Data managers need to take proactive measures to build, maintain, and support the current generation of fast data applications. A significant piece of the puzzle is maintaining performance, and this is where storage and database elements converge in real time. The following key elements make the shift to a fast or streaming data environment possible:

1) MIND YOUR STORAGE

Abundant, responsive storage is an essential component of any fast data technology stack. Data managers and their business counterparts must understand where and when the data pulsing through their organizations needs to be read once and discarded, or stored for archival purposes. For many forms of data, such as constant streams of routine sensor readings, archival storage is simply not necessary.

2) CONSIDER ALTERNATIVE DATABASES

Across the enterprise, much of the data being sought these days is of the non-relational variety: unstructured, graphical, video, log data, and so forth. Relational database systems are often slower than required for ingesting unstructured data streams. NoSQL databases, for instance, are lighter weight than established relational database environments.

3) EMPLOY ANALYTICS CLOSE TO THE DATA

It is useful to employ databases with embedded analytics capabilities for many basic queries. Processing queries inside the database gives users faster response times, versus routing data and queries across networks to centralized algorithms, which increases wait times and hurts performance.

4) EXAMINE IN-MEMORY OPTIONS

Highly intelligent, interactive experiences require back-end systems and applications to perform at their peak. Delivering data at blazing speeds means minimizing data movement, because every nanosecond counts in a user interaction. In-memory technology is used to hold entire datasets in memory and deliver them at high speed.

5) EMPLOY MACHINE LEARNING

Behind every analytics-driven interaction is an algorithm that gathers data and performs pattern matching to measure preferences or predict future outcomes.

6) LOOK TO THE CLOUD

Many of the components required for streaming or fast data, such as in-memory technologies and machine-learning algorithms, are supported by today's cloud services. In an OpsClarity survey, 68% of respondents cited hybrid deployments as the preferred mechanism for hosting streaming data pipelines.

7) BOOST THE SKILLS BASE

A stronger skills base is needed as the next generation of fast or streaming data and analytics delivery dawns. Data professionals require greater familiarity with new tools and frameworks such as Apache Spark and Apache Kafka. Organizations must increase training for current data management staff and seek out new skills in the market.

8) LOOK AT DATA LIFECYCLE MANAGEMENT

Data lifecycle management filters the data that needs to be kept for long-term storage from data that is only useful in the moment. Otherwise, the amount of data to be stored would be overwhelming, and most of it unnecessary.

Thus our DBA Course is more than enough for you to make your career in this field.

Stay connected to CRB Tech for more technical optimization and other updates and information.


Monitoring MongoDB Performance

MongoDB is a favorite database among developers. It offers a NoSQL database option with flexible schema design, automated failover, and a developer-familiar input format, namely JSON. NoSQL databases come in several types. Key-value stores retrieve each item by its name (key). Wide column stores, a kind of key-value store, use rows and columns, and the column names and values can vary from row to row within a table.

Document-oriented databases store data as documents, offering more structural flexibility than other databases.

MongoDB is a cross-platform, document-oriented database that stores data as documents in a binary-encoded JSON format (called Binary JSON, or BSON). The binary format increases both the speed and flexibility of JSON and adds more data types.

Reason for Monitoring MongoDB

MongoDB environments can be simple or complicated, distributed or local, on-premises or in the cloud. To make sure the database stays available and performs well, you need to track and monitor metrics in order to:

  • Determine the current state of the database
  • Review data performance to identify any abnormal behavior
  • Provide diagnostic data to resolve identified problems
  • Fix small issues before they grow into big ones
  • Keep the environment running smoothly
  • Ensure ongoing availability and success

Keep your database under observation in a regular and measurable way so you can spot discrepancies, odd behavior, or issues before they affect performance. You can then quickly identify slowdowns, resource limits, and other aberrant behavior, and work to fix the issues before suffering the consequences: slow websites and applications, unavailable data, or frustrated customers.
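To make those checks concrete, here is a minimal PyMongo sketch that pulls a few health metrics from a running server; the connection URI and the mydb database name are illustrative assumptions, not part of any particular deployment.

```python
# A minimal sketch of pulling a few health metrics with PyMongo.
# Assumes a mongod instance on localhost:27017 and `pip install pymongo`;
# adjust the URI and database name for your environment.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# serverStatus returns a large document of runtime metrics.
status = client.admin.command("serverStatus")
print("uptime (s):  ", status["uptime"])
print("connections: ", status["connections"]["current"])
print("ops counters:", status["opcounters"])          # inserts, queries, updates, deletes

# dbStats gives per-database storage figures, useful for spotting growth trends.
stats = client.mydb.command("dbStats")                # "mydb" is a placeholder name
print("data size (bytes):", stats["dataSize"])
```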

Thus our DBA Course over here is more than enough for you to make your career in this field.

Stay connected to CRB Tech for more technical optimization and other updates and information.


TOP 5 NOSQL DATABASES

Gone are the days when one database served the entire company. In today's world, even a normal mobile application requires more than one database. Welcome to the golden age of open source NoSQL databases. Developers have great, readily available open source technologies with amazing communities behind them at their fingertips. The main thing to consider is which database is right for which use case. There are lots of options available, and here are five NoSQL databases that developers should be familiar with.

1) MONGODB

MongoDB is a document-oriented database that supports the JSON format. It is popular among developers because of its ease of use and simple operation; there is no need for a database administrator (DBA) to bootstrap it. MongoDB is functionally robust, with flexible replication and sharding across nodes. It provides multi-version concurrency control, so older versions of data remain consistently available during complex transactions. MongoDB suits scenarios with high loads and Big Data volumes. Sharding, replication, and data center awareness combine with powerful aggregation, index support, and map/reduce functions. It is a very easy NoSQL database to use in the early development phase, when the schema is not yet fully established.
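As a small illustration of that schema flexibility, here is a hedged PyMongo sketch in which two differently shaped documents land in the same collection; the database and collection names are invented for the example.

```python
# Two documents with different shapes go into the same collection without any DDL.
# Assumes pymongo is installed and mongod runs on localhost:27017.
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017").shop.orders  # db/collection names are illustrative

orders.insert_one({"item": "book", "qty": 2, "price": 9.99})
orders.insert_one({"item": "lamp", "qty": 1, "tags": ["home", "sale"],
                   "shipping": {"carrier": "UPS", "days": 3}})   # extra nested fields are fine

# A simple query plus an index to keep it fast as the collection grows.
orders.create_index("item")
for doc in orders.find({"qty": {"$gte": 1}}):
    print(doc["item"], doc.get("tags", []))
```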

2) REDIS

Redis is one of the fastest datastores available today. It is an in-memory, open source NoSQL database known for its speed and performance, and its developer community is vibrant and growing. Redis features several data types, which makes implementing many functionalities and flows very simple. To deliver top performance, Redis requires that stored data fit in RAM; when it comes to speed, Redis is widely considered the winner. If latency is your main concern, this database is a strong choice.
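A minimal redis-py sketch of a few of the data types mentioned above follows; it assumes a local Redis server and uses invented key names.

```python
# Assumes a Redis server on localhost:6379 and `pip install redis`.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Plain key-value with an expiry: useful for caches and sessions.
r.set("session:42", "alice", ex=3600)

# A sorted set works well as a leaderboard or a time-ordered index.
r.zadd("leaderboard", {"alice": 120, "bob": 95})
print(r.zrevrange("leaderboard", 0, 1, withscores=True))

# An atomic in-memory counter.
r.incr("page:views")
print(r.get("page:views"))
```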

3) Cassandra

Cassandra, created at Facebook, is a useful hybrid of a column-oriented database and a key-value store. Column families provide the familiar feel of tables, and Cassandra offers good replication and consistency along with good linear scaling. Cassandra is most effective for managing really big volumes of data. It provides a familiar interface, and the learning curve is not very steep. Cassandra also offers tunable consistency settings.
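Below is a hedged sketch of those tunable consistency settings using the DataStax Python driver; the contact point, keyspace, and table are assumptions for illustration.

```python
# Tunable consistency with the DataStax Python driver (`pip install cassandra-driver`).
# The keyspace "demo_ks" and table "sensor_readings" are assumed to exist.
from datetime import datetime
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo_ks")

# QUORUM trades a little latency for stronger consistency;
# ONE would favor speed, ALL would favor safety.
stmt = SimpleStatement(
    "INSERT INTO sensor_readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(stmt, ("s-101", datetime.utcnow(), 21.5))
```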

4) CouchDB

CouchDB is accessed over HTTP in JSON format, which makes it very simple for Web applications. It is no surprise that CouchDB is best suited to the Web, and it also works well for offline mobile apps. Developers choosing a reliable database should take CouchDB into account: every change is stored on disk as a document revision, so redundancy and conflict resolution are addressed by design. CouchDB also boasts a strong replication model that allows filtered replication streams.
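Because CouchDB is just JSON over HTTP, a plain requests sketch is enough to show the document-revision model described above; the database and document names are invented, and a real deployment would add authentication.

```python
# Assumes CouchDB on localhost:5984 with no auth; names are illustrative.
import requests

base = "http://localhost:5984/notes"
requests.put(base)                                     # create the database if it doesn't exist

# The first write returns a revision id (_rev).
doc = requests.put(base + "/todo-1", json={"text": "buy milk"}).json()
print(doc)                                             # e.g. {'ok': True, 'id': 'todo-1', 'rev': '1-...'}

# Updates must quote the current revision, which is how CouchDB
# detects conflicts between replicas.
requests.put(base + "/todo-1",
             json={"text": "buy milk and bread", "_rev": doc["rev"]})
```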

5) HBase

HBase is considered a powerful database in the Hadoop ecosystem; it spreads data across nodes using HDFS. It is very appropriate for handling huge tables comprising billions of rows. Both HBase and Cassandra follow the Bigtable model. HBase scales linearly by simply adding more nodes to the setup, and it is best suited for real-time querying of Big Data. For more information, join the DBA course to make your career in this field as a DBA professional.

For more information, join the DBA training institute to become a successful DBA Professional in this field.

Stay connected to CRB Tech for more technical optimization and other updates and information.


How Cosmos DB Handles Data Consistency

Only a limited percentage of Cosmos DB users will use strong consistency in the real world. Most will take advantage of three alternative consistency models instead, which are based on the work of Turing Award winner Leslie Lamport. That work was the foundation for building a database that handles real-life situations and delivers distributed applications without the penalties of the traditional consistency model. The first alternative consistency model, bounded staleness, offers a point at which reads and writes are in sync. Before that point there is no guarantee, but after it you always access the latest version. You define the boundary as either a number of versions or a time interval.

Everything outside the boundary is consistent; within it, there is no guarantee that a read returns the latest data. The store keeps an element of strong consistency while offering you low latency, the option of global distribution, and higher reliability.

Use this model if you want to be sure that reads beyond the boundary are consistent. Writes are also fast. If you read in the region where the data was written, you get the correct data.

Session consistency, the second alternative consistency model, works well when reads and writes are driven from a single client app. Clients can read their own writes, and the data replicates across the rest of the network. This gives you low-latency data access, and you know that the data will replicate over a period of time, so your application can fail over and run in any Azure region.

Microsoft has added a third alternative consistency model to Cosmos DB: consistent prefix. Consistent prefix adds predictability to eventual consistency's speed. When you read the data you might not see the latest write, but your reads will never be out of order.

It is a useful feature, both fast and predictable. After writes A, then B, then C, your client will see only A, or A and B, but never A and C.

The Cosmos DB regions will converge on A, B, and C, offering you reliability and speed. Where Cosmos DB is concerned, it is a very different beast from the competition. Some NoSQL offerings provide a limited form of distributed access, but they are aimed at redundancy and disaster recovery.

Google Spanner offers similar features, but only across datacenters in a single region. If your target audience is only the US or the EU, that might be fine, but for global reach you need more from your cloud services.

Strong consistency with low latency is a good option, but it has less value once cross-regional data replication becomes a major bottleneck.

With Cosmos DB, it all comes down to your applications, and that is what your choice of consistency model is about. Is your workload mostly reading or mostly writing data? How is the data used? Each consistency model has advantages and disadvantages that you need to weigh carefully before choosing.

Session consistency is a good place to start for most app-centric data. When you don't require immediate global access to the data, it is worth testing the various choices.
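As a concrete illustration, here is a hedged sketch of requesting session consistency from the azure-cosmos Python SDK; the account endpoint, key, database, and container names are placeholders, and the exact keyword arguments may vary between SDK versions.

```python
# A hedged sketch of a session-consistent Cosmos DB client (azure-cosmos SDK).
# Endpoint, key, and names below are placeholders, not real values.
from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://<your-account>.documents.azure.com:443/",
    credential="<your-primary-key>",
    consistency_level="Session",          # per-client override of the account default
)

db = client.get_database_client("appdb")
container = db.get_container_client("orders")          # assumes partition key path /id

container.upsert_item({"id": "order-1", "status": "placed"})
# Reading from the same client/session sees its own write,
# which is exactly the guarantee session consistency provides.
print(container.read_item(item="order-1", partition_key="order-1"))
```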

For more information join the DBA Course to make your career in this field.

Stay connected to CRB Tech for more technical optimization and other updates and information.


CrateDB In Detail

Crate.io makes CrateDB, a quasi-RDBMS designed to receive sensor data and similar IoT workloads.

CrateDB's creators may have been a little slow to realize that the "R" part was required, and they are playing catch-up in that regard.

Crate.io was founded by Austrians based in Berlin and is being converted into a San Francisco company.

Crate.io has 22 employees and 5 paying customers.

Crate.io claims a larger number of production users, and it clearly has active clusters and substantial overall product downloads.

In essence, open source CrateDB is a less mature alternative to MemSQL. The opening for both MemSQL and CrateDB exists in part because analytic RDBMS vendors did not fill it.

CrateDB's Not-Just-Relational Story Starts With:

  • A column can contain ordinary values or objects, and those objects can be the nested/hierarchical structures common in the NoSQL/internet-backend world.
  • Objects are handled differently when they are BLOBs (Binary Large Objects).
  • With strict schemas, the structure of objects is defined manually, and a syntax for navigating that structure is available in WHERE clauses.
  • Dynamic schemas are inferred automatically; this is simple enough that it is more suitable for development/prototyping than for serious production.

Crate cites an example of data from more than 800 kinds of sensors being collected together in a single table. This produces significant complexity in the queries, but in a fully relational schema the FROM clauses would be at least as complicated and probably worse.
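As a hedged illustration of querying such a table, the sketch below uses the crate Python client and bracket syntax to navigate a nested object column; the table, column, and sensor names are invented for the example.

```python
# Querying a nested object column in CrateDB (`pip install crate`).
# Table "sensor_data" and its columns are illustrative assumptions.
from crate import client

conn = client.connect("localhost:4200")     # CrateDB's default HTTP port
cur = conn.cursor()

# Bracket syntax navigates into the object structure from SQL,
# which is how one wide sensor table avoids hundreds of joins.
cur.execute("""
    SELECT ts, payload['temperature'] AS temp
    FROM sensor_data
    WHERE sensor_type = 'hvac'
      AND payload['temperature'] > 30
    ORDER BY ts DESC
    LIMIT 20
""")
for row in cur.fetchall():
    print(row)
```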

The key to understanding Crate's architectural choices is to observe that they accept different latency/consistency standards for:

  • Single-row lookups and writes
  • Aggregates and joins

Thus It Makes Sense That:

  • Data is banged into CrateDB in a NoSQL-ish way as it arrives, with RYW (read-your-writes) consistency.
  • The indexes required for full SQL functionality are updated in microbatches as soon as possible after that.

CrateDB does not have real multi-statement transactions, but it has lesser levels of isolation that are called "transactions" in some marketing contexts.

Technical Highlights of CrateDB Include:

  • CrateDB records are stored as JSON documents.
  • In the purely relational case, the documents amount to glorified text strings.
  • BLOB storage is somewhat separate from the rest.
  • CrateDB's sharding story starts with consistent hashing.
  • For convenience, a node typically holds many local shards.
  • You can change your shard counts, in which case future inserts will go into the new set of shards.
  • CrateDB has two indexing strategies, matching its two consistency models.
  • Primary-key/single-row lookups have a forward lookup index, whatever its exact form.
  • Tables also have a columnar index.
  • More complex queries and aggregations are commonly done straight against the columnar index.
  • CrateDB's principal columnar indexing strategy looks like an inverted list, which in turn looks like standard text indexing.
  • Geospatial datatypes can be indexed in different ways.

For more information Join the DBA Training Institute in Pune to make your career in this field as a DBA Professional.

Stay connected to CRB Tech for more technical optimization and other updates and information.


How To Avoid Big Data Analytics Failure?

Game-changing Big Data analytics initiatives offer insights that help you blow past the competition, generate new revenue sources, and serve customers better. Colossal failures are also possible: big data and analytics initiatives can waste money and time, not to mention drive away talented technology professionals through management blunders. Assuming you have the basics covered, what divides success from failure in big data analytics is how you deal with the technical issues and challenges of analyzing big data. Here is what you can do to stay on the success side of the equation.

1) Don't Choose Big Data Analytics Tools Hastily

Many technology failures arise from the fact that companies buy and deploy products that are an awful fit for what they want to accomplish. Any vendor can slap the words "big data" or "advanced analytics" on its product descriptions to take advantage of the hype around those terms. All big data analytics tools share some basic capabilities around storage architecture and data transformation, and every data analytics tool requires the development of a data model in the back-end system. The right data should always be used so results can be translated into business language.

2) Make Sure The Tools Are Easy To Use

Big data and advanced analytics are admittedly not simple, but the products users rely on to access and make sense of the data should be. Offer business analytics teams simple, effective tools for data discovery, analytics, and visualization. For domain registrar GoDaddy, the right combination of tools was tough to find: the tools needed to be simple enough for fast visualizations yet capable of deep-dive analytics. Getting that mix right freed up its team to perform more advanced analytics. Don't hand programmer-level tools to nontechnical business users.

3) Project And Data Alignment

Big data analytics efforts may fail because they end up as a solution in search of a problem that doesn't exist. To avoid this, business challenges and needs must be framed so you focus on the right analytical problem. You also need to apply the right data to extract business intelligence and make proper predictions. Therefore the data itself should have a high priority.

4) Don't Skimp On Bandwidth When You Build A Data Lake

Big data involves lots of data. In the past, very few companies could store so much data, let alone organize and analyze it. Today, high-performance storage technologies and large-scale processing are widely available, both in the cloud and in on-premises systems. Real-time analytics, from traffic routing to social media trends, need to be fast enough to be useful, so use the fastest interconnect available when building your data lake.

5) High Security In Every Facet Of Data

The computational infrastructure is highly heterogeneous, with many components, which substantially speeds up the ability to draw meaningful insights from data but also widens the attack surface. Security measures should include deployment of the basic enterprise tools: data encryption wherever practical, identity and access management, and network security.

6) Make Data Management And Quality A Top Priority

Good data management and quality assurance should be the hallmark of all big data analytics projects; otherwise the chances of failure are much higher. A big part of governance and data quality assurance is hiring data management professionals. Given the strategic importance of these initiatives, enterprises need real ownership of data stewardship, management, governance, and policy.

Join the Institute of DBA course to make your career in this field as a DBA Professional.

Stay connected to CRB Tech for more technical optimization and other updates and information.


CockroachDB

The ultimate motto of CockroachDB is to make data easy. Instead of wasting your energy battling database shortcomings, invest that time, money, and engineering into making your company stronger and better.

Horizontal Scaling

To understand the lowest level, consider CockroachDB running on a single machine. Even though the data is organized logically into tables, rows, columns, etc., at the lowest level individual pieces of data are stored on disk in a sorted key-value map.

The key-value data starts off as a single empty range that encompasses the entire key space. As more data lands in this range, it eventually reaches a threshold size. The data then splits into two ranges, each of which can be treated as an individual unit covering a contiguous segment of the key space.

Replication

Each range is replicated to three nodes by default; the number of replicas is configurable, but the most useful counts are three and five. Replicas can be deliberately located in disparate datacenters for survivability.

With data stored on multiple machines, consistency of the data across replicas is very important. CockroachDB uses Raft, a consensus protocol. Each range runs an independent instance of the Raft protocol, so there are many ranges running Raft independently.

Distributed Transactions

Strong consistency and full support for distributed ACID transactions are the basis of CockroachDB; it offers distributed transactions using MVCC (Multi-Version Concurrency Control). The MVCC data is managed on each node's local storage device with the help of RocksDB.
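Because CockroachDB speaks the PostgreSQL wire protocol, a plain psycopg2 client is enough to sketch a distributed transaction; the node address, database, and table are assumptions, and production code would add a retry loop for serialization failures.

```python
# A minimal distributed-transaction sketch against a local, insecure
# CockroachDB node on the default SQL port 26257. Names are illustrative.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257,
                        user="root", dbname="bank")
conn.autocommit = False

with conn.cursor() as cur:
    # Both updates commit atomically even if the rows live in
    # different ranges on different nodes.
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
conn.commit()
```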

Deployment and Management

CockroachDB has been built for simple deployment and fits well with the container model. Nodes are organized symmetrically; there is no single point of failure, no complicated configuration, and no different roles to manage. The only thing required is a single binary running on each node: it joins the cluster, exports its local stores to receive new writes, and rebalances.

CockroachDB can run on a corporate dev cluster, on laptops, or in a private cloud, and it runs on many common infrastructures. You can set up a cluster on Amazon Web Services using cockroach-prod, and it can also be used on Google Compute Engine.

For more details join the Institute of DBA to make your career in this field as a Certified DBA professional.

Stay connected to CRB Tech for more technical optimization and other updates and information.


Apache Spark

Apache Spark is a fast, elegant, and expressive in-memory data processing engine. It provides APIs that allow data workers to efficiently execute streaming, machine learning, and SQL workloads that need rapid, iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can create applications, derive insights, and develop their data science against a single, shared dataset in Hadoop. This architecture provides the foundation that permits Spark and other applications to share a common cluster and dataset while ensuring consistent levels of service and response.

USE CASES IN SPARK

As Apache Spark's momentum increases, customers across various sectors are finding real value in it. Here are a few examples of how Spark is used:

  1. Insurance: Optimize claims reimbursement processes by using Spark's machine learning capabilities to process and analyze claims.
  2. Healthcare: Build a patient care system using Spark Core, Streaming, and SQL.
  3. Retail: Use Spark to analyze point-of-sale data and coupon usage.
  4. Internet: Use Spark's ML capabilities to identify fake profiles and improve the product matches shown to customers.
  5. Banking: Use a machine learning model to predict which retail banking customers are likely to use a given financial product.
  6. Government: Analyze spending across time, geography, and category.
  7. Scientific Research: Analyze earthquake data by time, depth, and geography to predict future events.
  8. Investment Banking: Analyze intra-day stock prices to predict future price movements.
  9. Geospatial Analysis: Study Uber trips by time and geography to predict future demand and pricing.
  10. Twitter Sentiment Analysis: Analyze large volumes of tweets to determine positive, negative, and neutral sentiment for particular products and organizations.
  11. Airlines: Build a model to predict airline travel delays.
  12. Devices: Predict the likelihood of a building exceeding threshold temperatures.

What’s New in Spark 2.0?

This release marks a big milestone for the project, with new targeted feature enhancements based on community feedback. The major areas of improvement in Spark 2.0 include the following.

  • SPARK SQL

SQL is the most popular interface for Apache Spark-based applications. Spark 2.0 offers support for all 99 TPC-DS queries, largely based on the SQL:2003 specification. Existing workloads can be ported to a Spark backend with fewer rewrites of the application stack.

  • Machine Learning

Machine learning is a major emphasis of the new release. The new spark.ml package, based on DataFrames, will replace the current Spark MLlib. Machine learning models and pipelines can now be persisted across all languages supported by Spark. Generalized Linear Models, K-Means, Survival Regression, and Naive Bayes are now supported in R.

  • Datasets:

For the Scala and Java programming languages, DataFrames and Datasets are now unified within the new Dataset class, which also provides the abstraction for Structured Streaming. HiveContext and SQLContext are superseded by the unified SparkSession, and the old APIs have been deprecated but kept for backward compatibility.
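A small PySpark sketch of the unified SparkSession entry point follows; the application name and sample data are invented for illustration.

```python
# One SparkSession serves both the DataFrame and SQL APIs,
# replacing the old SQLContext/HiveContext pair.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("spark2-demo") \
    .getOrCreate()

df = spark.createDataFrame(
    [("alice", 34), ("bob", 45)],
    ["name", "age"],
)

df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 40").show()

spark.stop()
```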

Join the Institute of DBA training course to make your career in this field as a Certified DBA Professional.

Stay connected to CRB Tech for more technical optimization and other updates and information.


Important MySQL Performance Tips

Many MySQL performance issues turn out to have similar solutions, which makes troubleshooting and tuning MySQL a manageable task. Here are 4 tips for getting great performance out of MySQL.

1) MySQL performance tip No.1: Workload Profile

If you profile your server's workload, you can understand how the server spends its time and expose the most costly queries for further tuning. Time is the most important metric: when you issue a query against the server, you care about little except how quickly it completes. The best way to profile your workload is with a tool such as the query analyzer in MySQL Enterprise Monitor. Such tools capture the queries the server executes and return a table of tasks sorted in decreasing order of response time, instantly bubbling the most expensive and time-consuming tasks to the top so you can see where to focus your efforts. Workload-profiling tools also group similar queries together, letting you see both the queries that are slow and the queries that are fast but executed many times.
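If MySQL Enterprise Monitor is not available, a rough profile can also be pulled directly from the statement digest table in performance_schema (MySQL 5.6 and later); this is a different technique from the tool named above, and the sketch below, using mysql-connector-python, treats the credentials and connection details as placeholders.

```python
# Rough workload profile from performance_schema; adjust credentials to taste.
import mysql.connector

cnx = mysql.connector.connect(host="localhost", user="monitor",
                              password="secret", database="performance_schema")
cur = cnx.cursor()

# Statement digests grouped by normalized query text, worst offenders first.
# SUM_TIMER_WAIT is in picoseconds, hence the division by 1e12.
cur.execute("""
    SELECT digest_text,
           count_star            AS executions,
           sum_timer_wait / 1e12 AS total_seconds
    FROM events_statements_summary_by_digest
    ORDER BY sum_timer_wait DESC
    LIMIT 10
""")
for text, execs, secs in cur.fetchall():
    print(f"{secs:10.2f}s  {execs:8d}x  {(text or '')[:60]}")

cnx.close()
```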

2) MySQL performance tip No. 2: Understand The Four Fundamental Resources

A database server needs four fundamental resources to function: CPU, memory, disk, and network. If any of them is erratic, weak, or overloaded, the database server is very likely to perform poorly. Understanding these fundamental resources is important both for choosing hardware and for troubleshooting problems. When choosing hardware for MySQL, ensure good-performing components all around, and balance them reasonably well against each other. Companies often select servers with fast CPUs and disks but starve them of memory.

3) MySQL performance tip No. 3: Don’t Use MySQL As A Queue

Queue-like access patterns can sneak into your application without your knowledge. For instance, if you set the status of an item so that a particular worker process can claim it before acting on it, then you are unwittingly building a queue. Marking emails as unsent, sending them, and then marking them as sent is a familiar example. Queues cause problems for two major reasons: they serialize the workload, preventing tasks from being done in parallel, and they often result in a table that contains work in process alongside historical data from jobs processed long ago.

4) MySQL performance tip No. 4: Filter Results By Cheapest First

A great way to optimize MySQL is to do cheap, imprecise work first, then precise work on the smaller resulting set of data. For instance, suppose you are looking for something within a given radius of a geographical point. The first tool in many programmers' toolboxes is the great-circle (haversine) formula for computing distance on the surface of a sphere. The formula requires a lot of trigonometric operations and is very CPU-intensive: great-circle calculations run slowly and make the machine's CPU utilization skyrocket.

Before applying the great-circle formula, pare down your records to a small subset of the total, and trim the resulting set to a precise circle. A square that contains the circle (precisely or imprecisely) is an easy way to do this. That way, the world outside the square never gets hit with all those costly trig functions.
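Here is a minimal pure-Python sketch of that "cheapest filter first" idea: a crude bounding box prunes candidates before the expensive haversine formula runs; the coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance; the costly, precise step."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(points, lat, lon, radius_km):
    # Cheap step: a square that contains the circle. One degree of latitude
    # is roughly 111 km; longitude degrees shrink with cos(latitude).
    dlat = radius_km / 111.0
    dlon = radius_km / (111.0 * max(math.cos(math.radians(lat)), 1e-6))
    box = [p for p in points
           if abs(p[0] - lat) <= dlat and abs(p[1] - lon) <= dlon]
    # Precise step runs only on the survivors of the box filter.
    return [p for p in box if haversine_km(lat, lon, p[0], p[1]) <= radius_km]

print(nearby([(18.52, 73.85), (19.07, 72.87)], 18.52, 73.86, 5))
```

In SQL the same pattern becomes a BETWEEN predicate on indexed latitude/longitude columns, with the great-circle calculation applied only to the rows that survive the box.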

If you Join The Institute of DBA you will be able to understand and become a DBA Professional in this field.

Stay connected to CRB Tech for more technical optimization and other updates and information.


Apache KAFKA

Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. Kafka is often used in place of traditional message brokers like AMQP and JMS because of its higher throughput, reliability, and replication. Kafka works in combination with Apache Storm and Apache HBase for streaming data and real-time analysis. The messages can be geospatial data from a fleet of long-haul trucks or sensor data from heating and cooling equipment in office buildings. Whatever the scenario, Kafka feeds massive message streams into Enterprise Apache Hadoop for low-latency analysis.

What KAFKA Does?

Apache Kafka supports a wide range of use cases as a general-purpose messaging system for scenarios where high throughput, reliable delivery, and horizontal scalability are important. It works well in combination with Apache HBase and Apache Storm. Common use cases include:

  • Website Activity Tracking
  • Stream Processing
  • Log Aggregation
  • Metrics Collection and Monitoring

Some of the significant characteristics of Kafka that make it an attractive option for these use cases include the following:

1) Scalability: A distributed system that scales easily with no downtime.

2) Durability: Persists messages on disk and provides intra-cluster replication.

3) Reliability: Replicates data, supports multiple subscribers, and automatically rebalances consumers in case of failure.

4) Performance: High throughput for both publishing and subscribing, with disk structures that offer constant performance even with many terabytes of stored messages.

HOW KAFKA WORKS

Kafka can be thought of as a distributed commit log, with incoming data written sequentially to disk. There are four main components involved in moving data in and out of Kafka:

  • Producers
  • Topics
  • Consumers
  • Brokers

A topic is a user-defined category in Kafka to which messages are published. Kafka producers publish messages to one or more topics, and consumers subscribe to topics to process the published messages. A Kafka cluster comprises one or more servers, known as brokers, which handle the persistence and replication of message data.

Part of the reason Kafka's performance is so high is the simplicity of the brokers' responsibilities. Kafka topics consist of one or more partitions, each an ordered, immutable sequence of messages. Because writes to a partition are sequential, this design greatly reduces the number of hard disk seeks.

Another factor contributing to Kafka's performance and scalability is that Kafka brokers do not keep track of which messages have been consumed; that responsibility falls on the consumer. Traditional messaging systems such as JMS have the broker bear that responsibility, severely limiting the system's ability to scale as the number of consumers increases.
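As a concrete illustration of the producer/topic/consumer flow, below is a minimal sketch using the kafka-python client; the broker address, topic name, and consumer group are assumptions for illustration.

```python
# Minimal producer/consumer round trip with kafka-python (`pip install kafka-python`).
# Assumes a broker on localhost:9092; the topic name is illustrative.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("truck-telemetry", key=b"truck-17",
              value=b'{"lat": 18.52, "lon": 73.85}')
producer.flush()                       # block until the broker acknowledges the batch

consumer = KafkaConsumer(
    "truck-telemetry",
    bootstrap_servers="localhost:9092",
    group_id="geo-analytics",          # consumers track their own offsets
    auto_offset_reset="earliest",
)
for msg in consumer:
    print(msg.partition, msg.offset, msg.key, msg.value)
    break                              # just show one record in this sketch
```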

Join the Institute of DBA to know more about this field and become a Certified DBA Professional over here.

Stay connected to CRB Tech for more technical optimization and other updates and information.
